XPath stands for XML Path Language. It is a query language for locating elements in an XML or HTML document using path expressions. The software uses XPath to find the location of any element on a web page through the page's HTML DOM structure.
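To make the idea concrete, here is a minimal sketch of XPath lookups using Python's standard library. The page fragment, class names, and values are invented for illustration, and the standard library supports only a subset of XPath on well-formed markup, so this is not how the software itself evaluates expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (all names and values are made up).
PAGE = """
<html>
  <body>
    <div class="listing">
      <h2 class="name">Alpha Plumbing</h2>
      <span class="phone">555-0101</span>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# A full path from the root element down to the business name.
name = root.findtext("./body/div[@class='listing']/h2")

# A shorter path: any <span> with class "phone", at any depth.
phone = root.findtext(".//span[@class='phone']")

print(name, phone)
```

Both expressions reach their element by describing where it sits in the tree, which is what the software records for each field.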
Some websites load data dynamically as you scroll, so you have to apply auto-scroll on page load before extraction. Set the scroll step points and the delay of each step high enough that the page gets enough time to load. Your internet speed must also be sufficient. For example, if your internet speed is medium, you might set the scroll step points to 300 (meaning 300 pixels) and the delay to 500 milliseconds.
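As a rough aid for choosing these values, the sketch below estimates how long auto-scroll takes for a given page height. The function name and the 6000-pixel page height are assumptions for illustration; pages that keep loading content grow while scrolling, so treat the result as a lower bound.

```python
import math

def scroll_plan(page_height_px, step_px=300, delay_ms=500):
    """Return (number_of_steps, total_wait_seconds) for scrolling a page."""
    steps = math.ceil(page_height_px / step_px)   # partial last step still counts
    total_seconds = steps * delay_ms / 1000       # 1000 milliseconds = 1 second
    return steps, total_seconds

# Hypothetical page that is 6000 pixels tall.
steps, seconds = scroll_plan(6000)
print(steps, seconds)
```

A larger delay gives slow pages more time to load but lengthens every page visit, so increase it only as far as your connection requires.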
When you need to extract specific leads from a website, the software provides an environment for defining each field (such as business name, address, person name, phone number, email address, website URL, etc.) by right-clicking on the web page. When you have defined all the fields you need to extract from the website's page(s), save the configuration in a file called a project.
Some websites display the business name, address, and contact information on the search page itself, and that may be enough information to extract without going to the detail page for the complete profile. This is called a short profile information page, or multi-record-per-page information. Short profile information pages are normally search pages containing multiple records on each page. Extraction is faster if the short profile fulfills your needs. See the screenshot below, where the search page contains 3 records:
Most websites display a short profile on the search page and the complete profile on a separate page that opens when you click the short profile link. Such pages are called detail profile information pages. In such cases the software takes the profile links from the search page and opens each detail profile in a separate window to extract data.
The screenshot below shows a search page displaying multiple short information profiles.
The page below loads the complete profile on a single page.
In the browser, HTML nodes are arranged in a tree structure, and nodes are related to each other as parent, child, sibling, and ancestor, much like a family tree. Parent nodes are nodes that contain some fields as child nodes in their own subtree.
How do you select a parent? You can select any field that contains child nodes as a parent node. First, hover the mouse over the area that contains your required data, as shown in the image below, where the selected area contains all required fields such as business name, address, telephone, rating, and reviews.
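The benefit of a parent node can be sketched as follows: each record is located first, and its fields are then read relative to that record, so values from different records do not get mixed up. The markup and class names are hypothetical, and the code only illustrates the idea rather than the software's internal logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical search page with two records; the second has no phone number.
PAGE = """
<div>
  <div class="record">
    <h2 class="name">Alpha Plumbing</h2>
    <span class="phone">555-0101</span>
  </div>
  <div class="record">
    <h2 class="name">Beta Electric</h2>
  </div>
</div>
"""

root = ET.fromstring(PAGE)
rows = []
for record in root.findall(".//div[@class='record']"):      # parent node
    name = record.findtext("./h2[@class='name']")           # child, relative to parent
    phone = record.findtext("./span[@class='phone']", default="")
    rows.append((name, phone))

print(rows)
```

Because the phone is looked up inside each record, the second record gets an empty phone instead of borrowing the number from the first one.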
Sometimes a field's actual data is not visible, and you need to click on that field to reveal or extract it, as shown in the image below. There, clicking “Telephone” reveals the telephone numbers. When you add a click item to the software configuration during project creation, the software automatically clicks to reveal the number and then extracts it.
To click an HTML field, select the item you want to click, such as “Phone number” in the image above. Right-click on the phone number, and a popup window will open, as shown below:
Sometimes the page takes some milliseconds to load the data or the clicked item after a click, so you can set a wait time in milliseconds here, as shown in the image above. Then enter a name for the selected item and save it.
When you search for a query in the browser, the profile data is shown as in the image below:
To collect these profile URL links, right-click on any field that has a URL link, and a popup window will open, as shown below:
When you are selecting profile links, the field type should be “Get Detail page links”. If the profile links have a parent, select the relevant parent in the “select parent filed” dropdown list.
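Collecting detail page links amounts to reading the link address of one element per record. A minimal sketch is shown here, assuming a hypothetical search URL and markup; relative addresses are resolved against the search page address so that each detail page can be opened later.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

BASE_URL = "https://example.com/search?q=plumber"   # hypothetical search page

# Hypothetical result list; each record (parent) holds one profile link.
PAGE = """
<ul>
  <li class="record"><a class="profile" href="/biz/alpha">Alpha Plumbing</a></li>
  <li class="record"><a class="profile" href="/biz/beta">Beta Electric</a></li>
</ul>
"""

root = ET.fromstring(PAGE)
detail_links = [
    urljoin(BASE_URL, link.get("href"))             # make the address absolute
    for link in root.findall(".//li[@class='record']/a[@class='profile']")
]

print(detail_links)
```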
On some websites there is no direct “next page” link, and the pagination looks like the image below:
In that case you can set the next page by right-clicking on any page number (2, 3, 4, etc.) that is not currently selected. A new popup window will open:
The field type should be “Set the next page item”, and if the selected pagination item has a parent node, select the relevant parent that contains it.
During extraction you may want to display the current page number of the site being extracted, because the software moves automatically from page 1 through all following pages. To create the XPath of the current page number, click on the selected page number, such as “1” in the image below:
Sometimes the required field data is the value of a property (attribute), as shown below:
In this case the telephone number is the value of the “data-visible-number” property. To collect values like the one shown in the picture above:
When you select property data, the field type should be “Get property data”. Type the name of the property or select it from the dropdown list; here the property name is “data-visible-number”, as shown in the image above.
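The difference between visible text and property data can be sketched like this. The property name “data-visible-number” comes from the example above, while the surrounding markup and the phone number are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: the visible text hides the number, the property holds it.
SNIPPET = '<a class="phone" data-visible-number="555-0101">Show number</a>'

node = ET.fromstring(SNIPPET)
visible_text = node.text                       # what a visitor sees on the page
phone = node.get("data-visible-number")        # the value stored in the property

print(visible_text, phone)
```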
To extract an image URL from a site, right-click on the image, and a new popup window will appear, as shown below:
Note: Select the field type “Extract the image source address” when you are selecting an image link from a site.
When you want to select an email address that is available on the web page, right-click on that field. A popup will appear, as shown below:
When the selected field type is “Extract email address”, the software picks email-formatted data from the selected HTML field.
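Picking email-formatted data out of a field can be approximated with a regular expression. The pattern below is a simplified assumption and not the software's actual rule; it covers common addresses but not every valid email format.

```python
import re

# Simplified email pattern (an assumption, not the software's own rule).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(field_text):
    """Return every email-looking substring found in the field text."""
    return EMAIL_RE.findall(field_text)

print(extract_emails("Contact: sales@example.com or call 555-0101"))
```

Text around the address, such as labels or phone numbers, is ignored, which is why this field type is useful when the email sits inside a longer sentence.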
On many sites you need to delete an item before extraction.
For example, when the next page and previous page links in the pagination have the same XPath, you need to delete the previous page item before extraction. If you do not, the software will click the previous page link again and again and never move to the next page, because the previous page link comes first in the pagination and the next page link comes after it.
Select the field type “delete item from search page before extraction” and save it. The software will delete the selected field at runtime before extraction.
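The pagination problem can be reproduced in a small sketch. The markup is hypothetical, and identifying the previous page item by its “Prev” text is an assumption made only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical pager: both links share the same tag and class, so one XPath fits both.
PAGER = """
<div class="pager">
  <a class="nav" href="/page/1">Prev</a>
  <a class="nav" href="/page/3">Next</a>
</div>
"""

pager = ET.fromstring(PAGER)
SHARED_XPATH = "./a[@class='nav']"

# Without deletion, the first match is the previous-page link.
first_match_before = pager.find(SHARED_XPATH).get("href")

# Delete the previous-page item (found here by its text, an assumption).
for link in pager.findall(SHARED_XPATH):
    if link.text == "Prev":
        pager.remove(link)

# After deletion, the same XPath reaches the next-page link.
first_match_after = pager.find(SHARED_XPATH).get("href")

print(first_match_before, first_match_after)
```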
Scrolling on the search page is required to load the web page completely, because some sites load data in the browser only after scrolling. If the software does not scroll on such a page, it will never extract the complete page information, because the complete page is not loaded in the browser. To load the full page, enable this option while creating the project.
Sometimes a site has multiple fields with the same XPath or the same attributes. In that case you need to delete an item to create a unique XPath for the selected field. To delete an item from the detail page, right-click on that field and set the field type to “Delete item from detail page”.
When you need to click the back button during extraction, select the field you want to click, as shown in the image below:
Right-click on the back button shown in the image above, and a new popup window will appear, as shown below:
Note: To click on the previous page, the field type should be “Click item to go on previous page”.
Sometimes the software cannot select a unique XPath because multiple HTML fields have the same attributes. To create a unique and valid XPath, enable this option. The software will build the XPath with the index number of that field, as shown in the picture below:
Note: When you create an XPath with indexing, there is some chance of extracting invalid data. Some pages may not have that field, and then the software picks up whatever data is at the selected index.
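The risk described in the note can be shown with two hypothetical profiles, one of which has no address. The markup and the helper name address_of are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical profiles with attribute-less fields; the second lacks an address.
FULL = "<div><span>Alpha Plumbing</span><span>12 Main St</span><span>555-0101</span></div>"
NO_ADDRESS = "<div><span>Beta Electric</span><span>555-0202</span></div>"

ADDRESS_XPATH = "./span[2]"    # index-based XPath; positions start at 1

def address_of(profile_markup):
    """Read whatever sits at the address index of the profile."""
    return ET.fromstring(profile_markup).findtext(ADDRESS_XPATH)

print(address_of(FULL))        # the real address
print(address_of(NO_ADDRESS))  # the phone number, picked up by mistake
```

When a field can be missing on some pages, an XPath based on attributes or text is safer than one based on position.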
While creating the XPath of a field, you may need the sibling index number of the selected field to make the XPath unique when the selected field has no unique attribute values. To use the sibling index, enable this option.
Sibling attributes are enabled by default when creating a project. Sibling attributes are the attributes of nodes that share the same parent as the selected node. Siblings are used to identify the right-clicked item when the selected HTML node has no unique XPath of its own.
To use siblings in XPath creation, just check this option.
HTML nodes are arranged in a tree structure on the web page. This tree may have multiple parent nodes, and each parent node may have multiple child nodes. To build the XPath of a specific HTML node, you can use the attributes of its child nodes by enabling this option.
On some sites the attributes are the same for multiple fields, or the fields have no valid attributes. In that case you can use the text of child nodes to create a valid XPath, as shown in the image below:
To use child node text, just enable this option.
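A minimal sketch of an XPath built on child node text is given here. The table markup is hypothetical, and the XPath the software generates may look different; the point is only that a label's text can identify a row that has no attributes at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical detail table: rows carry no attributes, only a text label.
PAGE = """
<table>
  <tr><th>Phone</th><td>555-0101</td></tr>
  <tr><th>Fax</th><td>555-0199</td></tr>
</table>
"""

root = ET.fromstring(PAGE)

# Select the row whose <th> child has the text "Phone", then read its <td>.
phone = root.findtext("./tr[th='Phone']/td")
fax = root.findtext("./tr[th='Fax']/td")

print(phone, fax)
```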
Sometimes the attribute values on the target website do not follow a consistent standard. In that case you can use the text of the nearest sibling node of the selected HTML field to create a unique XPath by enabling this option.
Sometimes a valid XPath can be created from the text of the selected field together with its sibling text. With this option enabled, the software creates the XPath from the text of the current node and its sibling node.
When you enable this option, the software uses only attributes that are unique within the HTML tree.
You can change the number of parent nodes used in XPath creation by selecting a value from this dropdown. The software will use that many parents of the selected field when creating the XPath.
Sometimes data is shown only after clicking an item on the web page, and the page may take a few milliseconds to load the actual data after the click. Set the Ajax wait to between 300 and 500 milliseconds; the software will wait for that time after the click and then extract the data of the clicked item.
Note: A website can take more time to load information after a click event, so you can increase the time up to 3000 milliseconds depending on your internet speed and load time.
1000 milliseconds = 1 second
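The click, wait, extract sequence can be sketched as below. The function click_then_extract and its two callables are hypothetical stand-ins for the browser actions, and clamping the wait to the 300 to 3000 millisecond range is an assumption based on the values recommended above.

```python
import time

MIN_WAIT_MS, MAX_WAIT_MS = 300, 3000    # range suggested in the text above

def click_then_extract(click, extract, ajax_wait_ms=500):
    """Click an item, wait for the page to load the data, then extract it."""
    wait_ms = max(MIN_WAIT_MS, min(ajax_wait_ms, MAX_WAIT_MS))
    click()
    time.sleep(wait_ms / 1000)          # 1000 milliseconds = 1 second
    return extract()

# Simulated page: the number becomes available only after the click.
page = {"number_shown": False}
result = click_then_extract(
    click=lambda: page.update(number_shown=True),
    extract=lambda: "555-0101" if page["number_shown"] else None,
    ajax_wait_ms=300,
)
print(result)
```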
When you want to extract only numeric values such as 1, 2, 3, 4, 5, select the data type “Number”; if you want to extract every kind of data available on the web page, select the data type “Text”.
Yes, you can make the software omit a complete row if a specific column value is empty. Just enable this option, and the software will discard the complete row whenever the selected column remains empty.
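The effect of this option can be sketched as a simple filter over extracted rows. The column names and sample rows are invented for illustration, and treating whitespace-only values as empty is an assumption.

```python
def drop_rows_with_empty(rows, required_column):
    """Keep only the rows whose required column has a non-empty value."""
    return [row for row in rows if (row.get(required_column) or "").strip()]

# Hypothetical extracted rows; the second one has no phone number.
rows = [
    {"name": "Alpha Plumbing", "phone": "555-0101"},
    {"name": "Beta Electric", "phone": ""},
    {"name": "Gamma Roofing", "phone": "555-0303"},
]

kept = drop_rows_with_empty(rows, "phone")
print([row["name"] for row in kept])
```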
Yes, you can stop the software from extracting a column value if a specific text value exists in that column. Enter your text values here, one per line:
Note: As you can see in the image above, “new York” and “bay shore” are entered. The software will never extract values for the current column where these defined values exist.
The software has a filter option for a specific column, where you can tell the software to extract a value only if the column contains one of the defined values, as shown below:
If no matching value is found, this column will remain empty.
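The two column rules, blocked values and allowed values, can be sketched together in one small function. The function name and the case-insensitive "contains" matching are assumptions; the software's exact matching rule is not documented here.

```python
def apply_value_rules(value, blocked=(), allowed=()):
    """Return the value, or an empty string if a column rule rejects it."""
    text = value.lower()
    if any(term.lower() in text for term in blocked):
        return ""                  # a disallowed value is present
    if allowed and not any(term.lower() in text for term in allowed):
        return ""                  # none of the required values is present
    return value

blocked_terms = ["new York", "bay shore"]    # values from the example above

print(apply_value_rules("New York, NY", blocked=blocked_terms))
print(apply_value_rules("Boston, MA", blocked=blocked_terms))
print(apply_value_rules("Boston, MA", allowed=["chicago"]))
```

In both rejection cases the column is left empty rather than the row being removed.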
To shorten the XPath, enable this option where possible; the software will create a short XPath for the current field.
Note: If the short XPath changes the output data, do not use it.
Yes, you can also select the XPath of fields whose nodes are not clickable. Just right-click on the field, and a new popup window will appear, as shown below:
Now you can select the required field, such as “telephone”, “Business name”, or “address”, as shown in the image above. Select your field and save it.
You can use the dynamic attributes option when you need only a specific part of an attribute value, as shown in the image below:
Just enable this option, as shown in the image above, and select the number of characters from the dropdown. The software will use the XPath contains() function. In this way you can create an XPath using only a few characters of the selected attribute value.
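A sketch of the idea follows. It assumes that the selected number of characters is counted from the start of the attribute value, which may differ from what the software does. Python's standard library does not support contains() in XPath, so the match itself is emulated in plain Python; the ids and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page where ids end in a number that changes on every load.
PAGE = """
<div>
  <span id="phone_83921">555-0101</span>
  <span id="addr_11870">12 Main St</span>
</div>
"""

def contains_xpath(tag, attribute, value, num_chars):
    """Build an XPath that matches on the first num_chars characters only."""
    return f"//{tag}[contains(@{attribute}, '{value[:num_chars]}')]"

def find_by_partial_attribute(root, tag, attribute, fragment):
    """Emulate contains(): keep elements whose attribute includes the fragment."""
    return [el for el in root.iter(tag) if fragment in el.get(attribute, "")]

root = ET.fromstring(PAGE)
xpath = contains_xpath("span", "id", "phone_83921", 6)
matches = find_by_partial_attribute(root, "span", "id", "phone_")

print(xpath)
print([el.text for el in matches])
```

Matching on the stable part of the value keeps the XPath valid even when the rest of the attribute changes between page loads.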
The software can extract only what a website provides and does not generate any information itself. The software might fail to extract data from a website if the project is not created correctly; therefore, before making a purchase, you must test whether the software works for you, or contact us.
- Note: The software does not work for LinkedIn, Google Maps, and Xing.
- You must use and test the software before purchasing.
See the screenshot for importing a project script into the United Lead Scraper: