Creating your first extractor
This section of the guide covers how to create your first extractor, so that by the end of this page you can transform your chosen webpage into data. We offer this in two formats: video and text.
If you have watched the video, click here to learn how to run your extractor.
Step 1. Choose your webpage:
First, choose the webpage you wish to extract. In this example I am going to use http://www.bbc.co.uk/news/world-australia-38713957, a news article about the world's finest uncut opal. Go to the webpage you wish to extract data from, click the address box at the top of the screen, highlight the URL, then copy it by right-clicking and selecting copy.
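If you would like to double-check a copied URL before pasting it, the short Python sketch below tests whether the text has an http or https scheme and a host name. It is only a local sanity check written for this guide, not part of Import.io; the function name looks_like_web_url is made up for illustration, and the extractor performs its own check when you enter the URL.

```python
from urllib.parse import urlparse


def looks_like_web_url(url: str) -> bool:
    """Return True if the text has an http(s) scheme and a host name.

    Hypothetical helper for illustration only; it does not contact the
    page and does not reproduce whatever check Import.io itself applies.
    """
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The example article used in this guide.
example_url = "http://www.bbc.co.uk/news/world-australia-38713957"
print(looks_like_web_url(example_url))

# A URL copied without its scheme fails this check.
print(looks_like_web_url("www.bbc.co.uk/news"))
```

A URL that fails this check is usually missing the http:// or https:// at the start, which can happen if only part of the address box was highlighted.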
Step 2. Log into the Import.io dashboard:
After you create your account and sign in, you will be taken to the online dashboard. This is the hub for all Import.io extractors. For more information on navigating and using the dashboard click here.
Step 3. Click the new extractor button and input your URL.
To create a new extractor, click the new extractor button at the top left corner of the screen, as shown circled below:
This will bring up a box requesting your URL. Right-click the white area in the box and paste the URL into it.
After you enter the URL, the go button will change from grey to blue to show that the URL has been accepted. To start the extractor, click the go button. This will launch the extractor, and this screen will be displayed.
Step 4. Entering the editor
After the extractor has finished loading, the data page is displayed. The data you see has been automatically selected as data you may wish to extract from the page. There are two main windows in the editor: the data window and the website window.
You can move between them by clicking either data or website at the top right corner of the screen.
Click the website tab to go to the website page. This will display the webpage you chose with the extractor interface loaded over it. Now, click back to the data page. The first column in my data sample doesn't contain the information I want, so I will delete it. To do this, I click the little arrow next to the name of the column and click delete column. Please note the arrow only appears once you hover over the name of the column.
Step 5. Adding a new column
Adding a new column will let us add more data. Press the add column button in the top right of the editor.
Step 6. Selecting your data
We can now fill this column with data. To do this we simply use the mouse. If you move around the webpage, you will notice that a pink box appears wherever the mouse hovers. This pink box shows what data will be selected if we left-click with the mouse. For example, in my news article I click the title, and the name of the news article appears in the new column.
As you may have noticed, the pink box has become bolder. This shows exactly what data has been extracted. We can also change the name of the column simply by clicking on it. This will highlight the column name, enabling us to delete it and write a new name, in this case article name. Press enter to confirm the name. We can capture more than one column by clicking the add column button at the top right of the screen. Once we have clicked this, a new column will appear.
Again, we use the same process to train the column with the data we want to collect. Repeat this process of creating new columns until you have collected all the data you wish from the website.
Step 7. Saving your extractor
Now we are ready to save the extractor. This is done by clicking the save button at the top right of the screen.
Step 8. Running your extractor
Once you have saved, you will be taken back to the dashboard. Here you can run your URLs by clicking the run URLs box, which appears in the middle of the screen.
Once you have clicked this button, you will be taken to the run history page and your extractor will start running. There are two important sections of the runner: the left and the right hand side. The left shows all the ongoing data about the run, including the number of URLs, the number of successes, the duration, and the total rows.
The right hand side has three symbols. If you click the first symbol, you can select the download format: CSV or JSON. To use the data in Excel, download the CSV version and open the file with Excel. The second symbol is the log file, which shows information about the run. The third symbol, an eye, is the preview symbol. This can be used to view a sample of your data.
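If you would rather work with the downloaded CSV in a script than in Excel, the sketch below shows one way to read it with Python's standard csv module. The sample text, the column names (article name, url) and the function name rows_from_csv are invented stand-ins for illustration; your own download will contain whatever columns you trained in the editor.

```python
import csv
import io


def rows_from_csv(handle) -> list:
    """Read CSV data from an open text handle into a list of dicts.

    Each dict maps a column name (taken from the header row) to the
    value in that row. Hypothetical helper for illustration only.
    """
    return list(csv.DictReader(handle))


# Invented stand-in for the contents of a downloaded CSV file.
# The column names and values here are assumptions, not real output.
sample_csv = (
    "article name,url\r\n"
    "Example title,http://example.com/story\r\n"
)

rows = rows_from_csv(io.StringIO(sample_csv))
for row in rows:
    print(row["article name"], "|", row["url"])
```

To read a real download, pass an opened file instead of the in-memory sample, for example open("my_extractor.csv", newline="", encoding="utf-8"), where the file name and the UTF-8 encoding are assumptions about your download. The JSON version can be parsed in a similar way with the standard json module; its exact layout is not covered in this guide.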
We have now reached the end of the first extractor tutorial. You can learn about more functionality of the extractor here. Alternatively, you can now look at how to use the dashboard for further control in managing and running your extractors.