Detail vs. list pages

Previously, we selected only one box per column. However, you may have noticed that you can train multiple sections of a website into the same column by clicking more than one section while that column is selected. This enables two different types of webpage to be extracted: list pages and detail pages. So far, we have been looking at detail pages.

Detail pages

Detail pages have only one item per page, such as a product, a person, or an address. Here is an example of a detail page:

Amazon purchase pages are great examples of detail pages, as they have only one item: the product being sold. In this example, it is a camera.

If the data you want is on a detail page, you will more than likely want the corresponding data from lots of similar detail pages. You can easily do this by adding URLs to your extractor after you have created it. For single-item pages, each URL will return one row of data.
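To make the "one URL, one row" idea concrete, here is a minimal sketch in Python. It is not how the extractor itself is implemented; the URLs, the HTML snippets, the CSS class names, and the helper names (`ClassTextParser`, `extract_detail_row`) are all hypothetical and chosen only for illustration. The pages are stored inline rather than downloaded, and the parser only copes with flat elements that carry a single class.

```python
from html.parser import HTMLParser


class ClassTextParser(HTMLParser):
    """Collect the text of the first element carrying each wanted class.

    Simplification: only flat elements (no nested tags) with exactly one
    class are supported, which is enough for this illustration.
    """

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.values = {}      # class name -> extracted text
        self._current = None  # class of the element being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.wanted and css_class not in self.values:
            self._current = css_class
            self.values[css_class] = ""

    def handle_endtag(self, tag):
        # Any closing tag ends the element we were reading.
        self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.values[self._current] += data


def extract_detail_row(url, html, columns):
    """Return exactly one row (a dict) for one detail page."""
    parser = ClassTextParser(columns.values())
    parser.feed(html)
    row = {"url": url}
    for column_name, css_class in columns.items():
        row[column_name] = parser.values.get(css_class, "").strip()
    return row


# Hypothetical, already-downloaded pages; a real run would fetch each URL.
PAGES = {
    "https://example.com/camera-a":
        '<h1 class="title">Camera A</h1><span class="price">199.00</span>',
    "https://example.com/camera-b":
        '<h1 class="title">Camera B</h1><span class="price">249.00</span>',
}

# Column name -> CSS class (assumed for this example).
COLUMNS = {"name": "title", "price": "price"}

rows = [extract_detail_row(url, html, COLUMNS) for url, html in PAGES.items()]
for row in rows:
    print(row)
```

Each entry in `PAGES` produces a single dictionary, so adding more detail-page URLs simply adds more rows to the result.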

If you are working with a detail page in the editor, it is often worth switching from auto rows to single row, so that all the data for that one item appears in one row. This can be done in the top left corner of the editor:

List pages

If there are multiple items on the same page, such as a list of products, a table of data, or a page of links, we call these list pages. Here is an example of a list page:

Trip Advisor is a good example of a list page, as each page contains multiple products, in this case different London hotels.

If your data is list-based, make sure the editor is in auto rows mode by toggling to auto rows in this box.
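The difference between the two row modes can be pictured with a small Python sketch. This is only an assumption about how the output is shaped, based on the descriptions in this guide; the function names `single_row` and `auto_rows` and the hotel data are hypothetical, and the real editor may align values differently.

```python
from itertools import zip_longest


def single_row(columns):
    """Single row mode: every value found for a column stays in one row."""
    return [{name: list(values) for name, values in columns.items()}]


def auto_rows(columns):
    """Auto rows mode: the i-th value of each column forms the i-th row.

    Shorter columns are padded with None so that no value is dropped.
    """
    names = list(columns)
    return [
        dict(zip(names, values))
        for values in zip_longest(*columns.values(), fillvalue=None)
    ]


# Hypothetical values trained into two columns on a list page.
hotels = {
    "name": ["Hotel A", "Hotel B", "Hotel C"],
    "rating": ["4.5", "4.0", "3.5"],
}

print(single_row(hotels))  # one row holding all the values
print(auto_rows(hotels))   # one row per hotel
```

For a list page, the auto rows shape is usually the one you want, because each item on the page ends up in its own row.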

When training a column, you have to click each different item you are interested in. For the hotel example above, select all the names of the hotels in the column you designate for the hotel name, as shown below. The editor detects that you are doing this. In the example above you can see that the first three names are surrounded by pink boxes. These are the ones I have clicked. The rest of the names are surrounded by green boxes. This is where the editor has detected that I am collecting those names and has added the rest of the names to the column.
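This guide does not describe how the detection works internally. As a rough mental model only, the following Python sketch generalizes from a few clicked examples by matching every element that shares the same tag and class as a clicked one. The `Element` type, the `generalize` function, and the page contents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    tag: str
    css_class: str
    text: str


def generalize(elements, clicked_indexes):
    """Split matching elements into clicked examples and detected ones.

    An element matches if it shares a (tag, class) signature with any
    clicked element. This is an assumed heuristic, not the real algorithm.
    """
    signatures = {
        (elements[i].tag, elements[i].css_class) for i in clicked_indexes
    }
    clicked, detected = [], []
    for index, element in enumerate(elements):
        if (element.tag, element.css_class) not in signatures:
            continue  # unrelated element, not part of this column
        if index in clicked_indexes:
            clicked.append(element.text)   # the "pink" boxes
        else:
            detected.append(element.text)  # the "green" boxes
    return clicked, detected


# Hypothetical list page, flattened into a sequence of elements.
page = [
    Element("h1", "page-title", "London hotels"),
    Element("a", "hotel-name", "Hotel A"),
    Element("span", "price", "120"),
    Element("a", "hotel-name", "Hotel B"),
    Element("a", "hotel-name", "Hotel C"),
    Element("a", "hotel-name", "Hotel D"),
    Element("a", "hotel-name", "Hotel E"),
]

clicked, detected = generalize(page, clicked_indexes={1, 3, 4})
print("clicked:", clicked)
print("detected:", detected)
```

Here three clicked names are enough for the remaining two names to be picked up, while the page title and the price are left out of the column.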

For more information on further enhancing your extractors, please visit this section.
