progragle / scraglehtml / 0.1.3

scraglehtml - HTML of a JavaScript-based web page

scraglehtml loads a given web page, scrolls down a number of times, waits until the page's JavaScript has loaded, then saves the resulting HTML. It is useful for scraping web sites whose pages cannot be loaded by an HTML parser such as Jsoup or fetched with a tool such as cURL, because those tools do not run JavaScript.

Usage

Input

scraglehtml receives a JSON object with the following attributes:

| Parameter | Description |
| --- | --- |
| url | URL of the web page whose HTML is needed |
| scroll | Number of times the web page is scrolled down, with a 100 ms pause after each scroll. If scroll is greater than 100000, 100000 is subtracted from it and the tool waits that many seconds (the "new scroll" value) before saving the web page. |
| cookiesFile | File that stores cookies while the web page is loading |
| injectJsFile | JavaScript file that is loaded into the web page and run on it |
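
As an illustration of the input described above, here is a minimal Python sketch that builds the JSON object and spells out the scroll rule. The helper names `build_input` and `describe_scroll`, the `https://example.com` URL, and the file names are hypothetical. It also assumes that `scroll` is sent as a number and that `cookiesFile` and `injectJsFile` may be omitted; this README does not confirm either.

```python
import json

# Threshold taken from the description of the scroll parameter.
SCROLL_WAIT_THRESHOLD = 100000


def build_input(url, scroll=0, cookies_file=None, inject_js_file=None):
    """Build the JSON input string (hypothetical helper).

    Assumption: optional attributes are simply left out when not set.
    """
    payload = {"url": url, "scroll": scroll}
    if cookies_file is not None:
        payload["cookiesFile"] = cookies_file
    if inject_js_file is not None:
        payload["injectJsFile"] = inject_js_file
    return json.dumps(payload)


def describe_scroll(scroll):
    """Interpret a scroll value according to the rule in the table."""
    if scroll > SCROLL_WAIT_THRESHOLD:
        # Values above the threshold mean "wait this many seconds".
        return ("wait_seconds", scroll - SCROLL_WAIT_THRESHOLD)
    # Otherwise the value is the number of scroll-downs (100 ms pause each).
    return ("scroll_times", scroll)


if __name__ == "__main__":
    print(build_input("https://example.com", scroll=20,
                      cookies_file="cookies.txt", inject_js_file="inject.js"))
    print(describe_scroll(100005))
```

Under this reading, a scroll value of 100005 asks for a 5-second wait, not 100005 scroll-downs.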

Output

scraglehtml returns a JSON object with the following attributes:

| Parameter | Description |
| --- | --- |
| isError | 'true' if error occurs, 'false' if success |
| errorCode | Code name of error |
| errorMessage | Message of error |
| results | Results data |
| results.outputFile | Result file which contains HTML of web page |
| results.outputFileUrl | Download URL of result file which contains HTML of web page |
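
A caller might handle the output object roughly as in the Python sketch below. The function name `read_result` and the sample values are hypothetical. The sketch assumes that `outputFile` and `outputFileUrl` are nested inside `results`, as the dotted names suggest, and that `isError` may arrive either as the string 'true'/'false' or as a JSON boolean.

```python
import json


def read_result(raw):
    """Parse a scraglehtml response; return (outputFile, outputFileUrl).

    Raises RuntimeError carrying errorCode and errorMessage when isError is true.
    """
    response = json.loads(raw)
    # Assumption: accept both the string 'true' and a JSON boolean true.
    is_error = str(response.get("isError")).lower() == "true"
    if is_error:
        raise RuntimeError(
            f"{response.get('errorCode')}: {response.get('errorMessage')}"
        )
    results = response.get("results") or {}
    return results.get("outputFile"), results.get("outputFileUrl")


if __name__ == "__main__":
    # Made-up sample response, for illustration only.
    sample = json.dumps({
        "isError": "false",
        "results": {
            "outputFile": "page.html",
            "outputFileUrl": "https://example.com/page.html",
        },
    })
    print(read_result(sample))
```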