Checklist for improving extractor performance

This checklist lists possible actions for improving your extractor.

1. Train more URLs

By training more URLs, you can provide more information for your extractor to learn from and thus enable it to extract more effectively.

2. Look at the error codes

The log files show the error codes returned by the extractor, which can help you work out why extractions are failing. For more information on the error codes please click here.

3. Use manual XPath

Manual XPath is an advanced tool that uses the elements of a webpage to select specific data.
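To illustrate how an XPath expression picks out data by element and attribute, here is a minimal sketch in Python. It is not the extractor's own XPath engine: it uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup. The page fragment, the class names and the select_text helper are all made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <div class="product">
      <h2 class="title">Blue Widget</h2>
      <span class="price">9.99</span>
    </div>
    <div class="product">
      <h2 class="title">Red Widget</h2>
      <span class="price">12.50</span>
    </div>
  </body>
</html>
"""


def select_text(markup, xpath):
    """Return the stripped text of every element matched by the XPath."""
    root = ET.fromstring(markup)
    return [(el.text or "").strip() for el in root.findall(xpath)]


# Select by element name and class attribute, relative to the document root.
titles = select_text(PAGE, ".//div[@class='product']/h2[@class='title']")
prices = select_text(PAGE, ".//div[@class='product']/span[@class='price']")

print(titles)
print(prices)
```

The same idea carries over to a manual XPath in the extractor: the expression names the path of elements that leads to the data, and attribute filters such as [@class='price'] narrow it to the values you want.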

4. Split extractors

Sometimes the simplest way to solve an extraction problem is to create a second extractor and run only the failed results through it.
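The split approach can be sketched roughly as follows, in Python. Everything here is an assumption for illustration: the two extractors are stand-in functions that return None on failure, and run_with_fallback is a hypothetical helper, not part of any real extractor API.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Assumption: an extractor takes a URL and returns a dict of data,
# or None when it fails on that page.
Extractor = Callable[[str], Optional[dict]]


def run_with_fallback(
    urls: Iterable[str], primary: Extractor, secondary: Extractor
) -> Tuple[Dict[str, dict], List[str]]:
    """Run every URL through primary, then only the failures through secondary."""
    results: Dict[str, dict] = {}
    failed: List[str] = []
    for url in urls:
        data = primary(url)
        if data is None:
            failed.append(url)
        else:
            results[url] = data

    still_failed: List[str] = []
    for url in failed:
        data = secondary(url)
        if data is None:
            still_failed.append(url)
        else:
            results[url] = data
    return results, still_failed


# Stand-in extractors: each one only "understands" one page layout.
def product_extractor(url: str) -> Optional[dict]:
    return {"layout": "product"} if "/product/" in url else None


def item_extractor(url: str) -> Optional[dict]:
    return {"layout": "item"} if "/item/" in url else None


urls = [
    "http://example.com/product/1",
    "http://example.com/item/2",
    "http://example.com/about",
]
results, still_failed = run_with_fallback(urls, product_extractor, item_extractor)
print(results)
print(still_failed)
```

Any URLs left in still_failed after both passes are candidates for the other items in this checklist.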

5. Turn off the website styles and scripts

Certain websites have data that can only be accessed when the styles or scripts are turned off.

