What can be extracted?
With Import.io you can extract data from the entire web! From e-commerce to science journals, Import.io enables you to get data from the web into a structured format.
In this section, we will cover a few examples of different types of data which can be extracted. To create your own extractor, click here.
Example 1 Amazon purchase page
Amazon purchase pages can contain a large amount of information about a product:
Using the extractor, we can capture the price, the savings, the department in which the product is sold, and many other details directly from the page. We can then capture the same data from other web pages and compare the results. An example of two pages from which we collected data is shown below.
Above is an example of a detail page, which shows information about a single item, in this case a camera. You can use the extracted data to compare different detail pages, as shown by the second row of data displayed. You could use this one extractor on most Amazon sale pages.
Example 2 Trip Advisor
However, sometimes you want to collect data from multiple items on one page. This is known as a list page. Trip Advisor is a good example of a site where you would want to use a list page.
On this page, you collect information about several different items on the same web page, as shown by the multiple pink selection boxes used by our extractor.
The data column for that single page will look like this:
This shows how you can extract a large amount of information from one list page. List pages are a great way of obtaining links that can then be used with detail pages. For example, you could create a detail page extractor for a specific hotel page on Trip Advisor, then use this list of hotel page URLs to extract data from every single page. For more information on the differences between list and detail pages, click here. Or, to create your own extractor, click here.
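To make the list page idea concrete, here is a minimal Python sketch using only the standard library. It is not how Import.io works internally; the HTML snippet, the `hotel` class name, and the `example.com` base URL are all invented for illustration. It shows one page producing several rows, each with a link that could later be passed to a detail page extractor.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical list-page markup: one <a class="hotel"> element per item.
LIST_PAGE_HTML = """
<ul>
  <li><a class="hotel" href="/hotel/1">Hotel Alpha</a></li>
  <li><a class="hotel" href="/hotel/2">Hotel Beta</a></li>
  <li><a class="hotel" href="/hotel/3">Hotel Gamma</a></li>
</ul>
"""
BASE_URL = "https://example.com"  # placeholder site, not a real listing


class ListPageParser(HTMLParser):
    """Collects one row (name, url) for every matching link on the page."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Remember the link target when we enter a matching <a> tag.
        if tag == "a" and attrs.get("class") == "hotel":
            self._current_href = attrs.get("href")

    def handle_data(self, data):
        # The first non-empty text inside the link becomes the item name.
        if self._current_href is not None and data.strip():
            self.rows.append(
                {"name": data.strip(), "url": urljoin(BASE_URL, self._current_href)}
            )
            self._current_href = None


def extract_list_rows(html):
    """Returns every row found on a single list page."""
    parser = ListPageParser()
    parser.feed(html)
    return parser.rows


rows = extract_list_rows(LIST_PAGE_HTML)
# These absolute URLs are what a detail page extractor would be run on.
detail_urls = [row["url"] for row in rows]
```

The key point is the shape of the result: a detail page yields one row per page, while a list page yields one row per item on the page.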
Example 3 BBC news articles
You can also create extractors for media articles, enabling you to quickly obtain information from the mass media. For this example, I am going to use a BBC news article:
From this you could extract key points of data as follows:
Of course, this is just one article; although interesting, it is not that useful on its own. So I am now going to quickly create an extractor that looks at the front page of the BBC news site and obtains the URLs of the top stories.
Now we can link both of these extractors together, so that the first extractor is run on every URL from the list. This creates the following data (opened in Excel):
This extractor could be set to run every day, so that with one click you get information about the newest articles on the BBC front page. Now imagine this extractor combined with a few from other sites: you would be able to quickly gather what is happening in the world without having to open each article one by one.
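The linking of the two extractors in this example can be sketched in Python as follows. This is an illustration of the idea only, not Import.io's implementation: `FAKE_ARTICLES`, `list_extractor`, and `detail_extractor` are hypothetical stand-ins, and the article data is made up so the sketch runs without touching the network. The rows are written as CSV, a format Excel can open.

```python
import csv
import io

# Hypothetical stand-in for the web: URL -> already-extracted article fields.
FAKE_ARTICLES = {
    "https://example.com/news/1": {"headline": "First story", "section": "World"},
    "https://example.com/news/2": {"headline": "Second story", "section": "Business"},
}


def list_extractor():
    """Stands in for the front-page extractor: returns the story URLs."""
    return list(FAKE_ARTICLES)


def detail_extractor(url):
    """Stands in for the article extractor: returns one row for one URL."""
    row = {"url": url}
    row.update(FAKE_ARTICLES[url])
    return row


def run_chain():
    """Feeds every URL from the list extractor into the detail extractor."""
    return [detail_extractor(url) for url in list_extractor()]


def to_csv(rows):
    """Serialises the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "headline", "section"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(run_chain())
```

Running `run_chain()` on a schedule, for example once a day, would correspond to the daily refresh described above.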
Example 4 XE.com
XE.com is a website that lists current currency exchange rates, as shown below:
Using our extractor, this can be extracted into the following format:
This image shows a small subset of the data that was extracted from XE.com, giving you the ability to extract currency information from the web. The extractor can be set up to run continually, keeping you up to date with current currency values. How this data is used is up to you, within copyright law. An economist could continually extract this data to examine changes in the currency exchange market. Alternatively, you could track this data to see whether there is any benefit in investing in another currency.
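As a rough illustration of tracking exchange rates over time, the Python sketch below computes the percentage change between successive extractions. The snapshot labels and rate values are invented sample data, not real XE.com figures, and `percent_changes` is a hypothetical helper name.

```python
# Hypothetical snapshots from repeated extractions:
# (label, units of quote currency per 1 unit of base currency).
snapshots = [
    ("day 1", 1.25),
    ("day 2", 1.30),
    ("day 3", 1.17),
]


def percent_changes(series):
    """Returns (label, percent change versus the previous snapshot)."""
    changes = []
    # Pair each snapshot with the one that follows it.
    for (_, previous), (label, current) in zip(series, series[1:]):
        changes.append((label, (current - previous) / previous * 100))
    return changes


for label, change in percent_changes(snapshots):
    print(f"{label}: {change:+.2f}%")
```

A series of such changes is the kind of input an economist could use to study movements in the exchange market.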
Now that you have seen a few of our examples, try creating your own extractor.