

Advanced Scraper (Independent Publisher)

An advanced web scraper API with rotating IPs from 170+ countries.

This connector is available in the following products and regions:

Service Class Regions
Logic Apps Standard All Logic Apps regions except the following:
     -   Azure Government regions
     -   Azure China regions
     -   US Department of Defense (DoD)
Power Automate Premium All Power Automate regions except the following:
     -   US Government (GCC)
     -   US Government (GCC High)
     -   China Cloud operated by 21Vianet
     -   US Department of Defense (DoD)
Power Apps Premium All Power Apps regions except the following:
     -   US Government (GCC)
     -   US Government (GCC High)
     -   China Cloud operated by 21Vianet
     -   US Department of Defense (DoD)
Contact
Name Troy Taylor
URL https://www.hitachisolutions.com
Email ttaylor@hitachisolutions.com
Connector Metadata
Publisher Troy Taylor, Hitachi Solutions
Website https://apilayer.com/marketplace/description/adv_scraper-api
Privacy policy https://www.ideracorp.com/Legal/APILayer/PrivacyStatement
Categories Website

Creating a connection

The connector supports the following authentication types:

Default: Parameters for creating connection. Applicable in all regions. Not shareable.

Default

Applicable: All regions

Parameters for creating connection.

This is not a shareable connection. If the power app is shared with another user, that user will be prompted to create a new connection explicitly.

Name Type Description Required
API Key securestring The API key for this API. True

Throttling Limits

Name Calls Renewal Period
API calls per connection 100 60 seconds
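The connector enforces the limit above, but a script or flow that loops over many URLs can stay under it by budgeting calls on the client side. The following is a minimal Python sketch that assumes a sliding 60-second window. The class CallBudget and the helper acquire_blocking are hypothetical names, and the service may count calls differently (for example, in fixed windows).

```python
import time
from collections import deque
from typing import Callable, Deque


class CallBudget:
    """Sliding-window guard for a budget of max_calls per period seconds.

    Defaults mirror the documented limit (100 calls per 60 seconds); the
    sliding-window algorithm itself is an assumption of this sketch.
    """

    def __init__(self, max_calls: int = 100, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self._calls: Deque[float] = deque()

    def _prune(self, now: float) -> None:
        # Forget calls that are at least one full period old.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()

    def try_acquire(self) -> bool:
        """Record a call and return True if the budget allows it, else False."""
        now = self.clock()
        self._prune(now)
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until a slot frees up (0.0 if one is available now)."""
        now = self.clock()
        self._prune(now)
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])


def acquire_blocking(budget: CallBudget) -> None:
    """Sleep until the budget admits one more call."""
    while not budget.try_acquire():
        time.sleep(budget.wait_time())
```

In a long-running script, call acquire_blocking(budget) once before each action invocation, sharing one CallBudget per connection.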

Actions

Scrape a form page

Scrape a remote page containing an HTML form.

Scrape a remote URL

Scrape a remote URL, optionally specifying the request country, page rendering, a CSS selector, and a timeout.

Scrape a form page

Scrape a remote page containing an HTML form.

Parameters

Name Key Required Type Description
URL
url True string

The URL address to scrape.

Country
country string

An optional two-character country code, for scraping from an IP address in a specific country.

Render
render boolean

Whether to render the remote page. To scrape images, JSON files, PDF files, or XML feeds, set this to false.

Selector
selector string

A CSS selector, for example a.navbar-brand.

Timeout
timeout integer

The time limit, in seconds, before the scraper returns a result. Minimum: 5; maximum: 45.

Body
body True string

The form entries.
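The parameters above map onto a single HTTP request. The Python sketch below builds that request without sending it. This page does not document the underlying HTTP call, so the endpoint URL, the apikey header name, the POST method, and the URL-encoded format of Body are all assumptions; check the APILayer documentation before relying on them. The function name build_form_request is hypothetical.

```python
from typing import Dict, Mapping, Optional
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and header name: verify against the APILayer docs.
FORM_ENDPOINT = "https://api.apilayer.com/adv_scraper/scraper"
API_KEY_HEADER = "apikey"


def build_form_request(api_key: str, url: str, form_fields: Mapping[str, str],
                       country: Optional[str] = None,
                       render: Optional[bool] = None,
                       selector: Optional[str] = None,
                       timeout: Optional[int] = None) -> Request:
    """Build (but do not send) a request for the "Scrape a form page" action."""
    params: Dict[str, str] = {"url": url}
    if country is not None:
        # Documented as a two-character country code.
        if len(country) != 2 or not country.isalpha():
            raise ValueError("country must be a two-letter code")
        params["country"] = country
    if render is not None:
        params["render"] = "true" if render else "false"
    if selector is not None:
        params["selector"] = selector
    if timeout is not None:
        # Documented range: 5 to 45 seconds.
        if not 5 <= timeout <= 45:
            raise ValueError("timeout must be between 5 and 45 seconds")
        params["timeout"] = str(timeout)
    # Assumption: the form entries are sent URL-encoded (field=value&...).
    body = urlencode(form_fields).encode("utf-8")
    return Request(
        FORM_ENDPOINT + "?" + urlencode(params),
        data=body,
        headers={API_KEY_HEADER: api_key,
                 "Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending the result with urllib.request.urlopen(req) would perform the call. Keeping construction separate makes the validation easy to check offline.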

Returns

Name Path Type Description
Data Selector
data-selector array of string

The data matched by the CSS selector.

Country
options.country string

The country requested.

Render
options.render boolean

Whether the page was rendered.

Selector
options.selector string

The selector requested.

Timeout
options.timeout integer

The timeout requested.

Page Title
page_title string

The title of the page.

Referer
request_headers.Referer string

The Referer header of the request.

Result URL
result_url string

The result URL address.

URL
url string

The URL address requested.
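The Path column appears to use dots for nested objects; for example, options.country would be the country field inside an options object. That reading is an assumption. Under it, a form-page response can be mapped to a typed record as sketched below. FormScrapeResult and parse_form_result are hypothetical names, and missing fields default to None or empty values.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FormScrapeResult:
    url: Optional[str]            # url: the URL address requested
    result_url: Optional[str]     # result_url: the result URL address
    page_title: Optional[str]     # page_title: the title of the page
    referer: Optional[str]        # request_headers.Referer
    data_selector: List[str] = field(default_factory=list)  # data-selector
    options: Dict[str, Any] = field(default_factory=dict)   # options.*


def parse_form_result(payload: Dict[str, Any]) -> FormScrapeResult:
    """Map a parsed JSON response onto FormScrapeResult, tolerating missing keys."""
    headers = payload.get("request_headers") or {}
    return FormScrapeResult(
        url=payload.get("url"),
        result_url=payload.get("result_url"),
        page_title=payload.get("page_title"),
        referer=headers.get("Referer"),
        # Note the hyphenated key name used by the Returns table.
        data_selector=list(payload.get("data-selector") or []),
        options=dict(payload.get("options") or {}),
    )
```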

Scrape a remote URL

Scrape a remote URL, optionally specifying the request country, page rendering, a CSS selector, and a timeout.

Parameters

Name Key Required Type Description
URL
url True string

The URL address to scrape.

Country
country string

An optional two-character country code, for scraping from an IP address in a specific country.

Render
render boolean

Whether to render the remote page. To scrape images, JSON files, PDF files, or XML feeds, set this to false.

Selector
selector string

A CSS selector, for example a.navbar-brand.

Timeout
timeout integer

The time limit, in seconds, before the scraper returns a result. Minimum: 5; maximum: 45.
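The Render description implies a rule of thumb: leave rendering on for HTML pages and turn it off for images, JSON, PDF, and XML. The Python sketch below applies that rule based on the file extension when render is not given. It also validates country and timeout against the documented limits and returns the request URL. The endpoint, the extension list, and the function names guess_render and build_scrape_url are assumptions of this sketch, not part of the connector.

```python
import posixpath
from typing import Dict, Optional
from urllib.parse import urlencode, urlsplit

# Hypothetical endpoint: verify against the APILayer docs.
SCRAPE_ENDPOINT = "https://api.apilayer.com/adv_scraper/scraper"

# Content that the connector says needs render=false. Detecting it by
# file extension is a heuristic of this sketch and misses extensionless URLs.
NON_RENDER_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg",
                         ".json", ".pdf", ".xml", ".rss"}


def guess_render(url: str) -> bool:
    """Return False when the URL path looks like an image, JSON, PDF, or XML file."""
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    return ext not in NON_RENDER_EXTENSIONS


def build_scrape_url(url: str, country: Optional[str] = None,
                     render: Optional[bool] = None,
                     selector: Optional[str] = None,
                     timeout: Optional[int] = None) -> str:
    """Build the request URL for the "Scrape a remote URL" action."""
    params: Dict[str, str] = {"url": url}
    if country is not None:
        if len(country) != 2 or not country.isalpha():
            raise ValueError("country must be a two-letter code")
        params["country"] = country
    if render is None:
        render = guess_render(url)  # fall back to the extension heuristic
    params["render"] = "true" if render else "false"
    if selector is not None:
        params["selector"] = selector
    if timeout is not None:
        if not 5 <= timeout <= 45:
            raise ValueError("timeout must be between 5 and 45 seconds")
        params["timeout"] = str(timeout)
    return SCRAPE_ENDPOINT + "?" + urlencode(params)
```

An explicit render argument always overrides the heuristic. This matters for URLs such as API endpoints that return JSON without a .json suffix.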

Returns

Name Path Type Description
Data Selector
data-selector array of string

The data matched by the CSS selector.

Country
options.country string

The country requested.

Render
options.render boolean

Whether the page was rendered.

Selector
options.selector string

The selector requested.

Timeout
options.timeout integer

The timeout requested.

Page Title
page_title string

The page title.

Result URL
result_url string

The result URL address.

URL
url string

The URL address requested.
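Both Returns tables identify fields by a dotted Path. The sketch below is a small helper that follows such a path through a parsed JSON response. It assumes each dot separates the keys of nested objects. Note that data-selector contains a hyphen, not a dot, so the helper reads it as a single top-level key. The name get_path is hypothetical.

```python
from typing import Any, Mapping


def get_path(payload: Mapping[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dotted Path such as "options.timeout" through nested objects.

    Returns default if any segment is missing or a non-object is reached.
    """
    current: Any = payload
    for part in path.split("."):
        if not isinstance(current, Mapping) or part not in current:
            return default
        current = current[part]
    return current
```

For example, get_path(response, "options.render") reads the render flag echoed back in the options object, if the service includes it.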