Training URLs

What are training URLs?

When you select data from your chosen webpage in the editor, you are training the extractor to recognize the data you want to extract. We call this training the first URL. You can add further URLs to train on; this gives the extractor a better understanding of what you want extracted and makes it more accurate.

You don't need to add many URLs for the editor to pick up the differences between pages automatically. On list pages one is usually enough; on detail pages it may take three to five.

How do you add more training URLs?

In the editor, click Add URL at the top left of the screen.

This opens a box where you can enter additional URLs. You can also remove URLs in this box by clicking the trash can to the right of each listed URL.

After adding the URLs, close the box with the x in the top right corner and check that the data is correct. There are two ways to do this. First, you can check the columns manually in the website view. Note: you can change which webpage you are viewing by using the drop-down menu in the middle of the screen and selecting the desired page.

Alternatively, if you are using a single-row data extractor, you can use the data view. In that case, all of the information is displayed on the data page.

If you find a gap in the data, select the affected webpage, go to the website view, click the column with the gap, and then click the missing data using the pink box selection tool. This fills in the gap and helps the extractor collect data more accurately. For more information on selecting data from a website, click here.
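If you prefer to check for gaps outside the editor, the same check can be sketched in a few lines of Python. This is only an illustration: it assumes you have copied the data view into a list of rows, one per training URL, and the function name `find_gaps`, the `url` key, and the column names are all hypothetical rather than part of the product.

```python
def find_gaps(rows, columns):
    """Return {url: [columns with no value]} for rows that have gaps.

    `rows` is a list of dicts, one per training URL (hypothetical layout).
    A value counts as a gap if it is missing, None, or only whitespace.
    """
    gaps = {}
    for row in rows:
        missing = []
        for column in columns:
            value = row.get(column)
            if value is None or str(value).strip() == "":
                missing.append(column)
        if missing:
            gaps[row["url"]] = missing
    return gaps


# Hypothetical rows copied from the data view of a single-row extractor.
rows = [
    {"url": "https://example.com/item/1", "title": "Lamp", "price": "12.00"},
    {"url": "https://example.com/item/2", "title": "Desk", "price": ""},
    {"url": "https://example.com/item/3", "title": None, "price": "40.00"},
]

print(find_gaps(rows, ["title", "price"]))
```

Any URL reported here is one to open in the website view and retrain on the listed columns.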

Unsuccessful additions

Sometimes a URL cannot be added, and an error box appears instead.

There are several reasons this can happen. The website may be down, either for maintenance or because it has crashed. Alternatively, the URL may be incorrect and point to a page that doesn't exist.
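To tell these cases apart before adding a URL, you could run a quick check yourself. The following is a minimal sketch using only the Python standard library; the function names `is_well_formed` and `diagnose` and the returned labels are assumptions made for illustration, and the editor may apply different checks of its own.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def is_well_formed(url: str) -> bool:
    """True if the URL has an http(s) scheme and a host name."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def diagnose(url: str, timeout: float = 10.0) -> str:
    """Classify a URL as 'malformed', 'ok', 'missing', 'unreachable',
    or 'http error <code>' (labels are hypothetical)."""
    if not is_well_formed(url):
        return "malformed"
    try:
        # The response body is not needed; a successful open is enough.
        with urllib.request.urlopen(url, timeout=timeout):
            return "ok"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return "missing" if err.code == 404 else f"http error {err.code}"
    except OSError:
        # Covers URLError, DNS failures, refused connections, and timeouts.
        return "unreachable"


# No network access is needed to catch a malformed URL.
print(diagnose("example.com/page"))
```

A result of "missing" suggests the URL is wrong, while "unreachable" or a 5xx error suggests the site itself is down and worth retrying later.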

