magicanded / newscrape / 0.3.1
newscrape - Scrape various news sites
newscrape scrapes news articles matching a keyword from sites such as Reuters, CNN and WSJ.
Usage
Input
newscrape accepts a JSON object with the following attributes:
Parameter | Description |
---|---|
dataDir | Data collection in which scraped articles are stored |
site | News site to scrape, or 'All' to scrape every supported site. Supported values: 'Reuters', 'CNN', 'CNBC', 'NYTimes', 'Guardian', 'WashingtonPost', 'HuffingtonPost', 'FoxNews', 'BBC', 'DailyMail', 'IndiaTimes', 'USAToday', 'WSJ', 'NBCNews', 'ABCNews' |
keyword | Keyword to search articles for |
grammarly | 'yes' to run Grammarly, 'no' otherwise. This feature is not implemented yet. |
textgears | 'yes' to run Textgears, 'no' otherwise. This feature is not implemented yet. |
copyscape | 'yes' to run Copyscape, 'no' otherwise. This feature is not implemented yet. |
language | Target language of the articles. Articles written in a different language are translated into it. |
maxPage | Maximum number of pages to search. |
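The input table only names and describes each attribute. As a hedged illustration, the TypeScript sketch below writes out one possible input object and a small validator for it. The names `NewscrapeInput`, `validateInput` and `exampleInput`, the field types (for example, `maxPage` as a number) and the sample values are assumptions made for this example, not part of newscrape's documented API.

```typescript
// Hypothetical sketch: NewscrapeInput, validateInput and exampleInput are
// illustrative names, not exports of newscrape.
type YesNo = "yes" | "no";

// Site values copied from the `site` row of the input table.
const SITES = [
  "All", "Reuters", "CNN", "CNBC", "NYTimes", "Guardian", "WashingtonPost",
  "HuffingtonPost", "FoxNews", "BBC", "DailyMail", "IndiaTimes", "USAToday",
  "WSJ", "NBCNews", "ABCNews",
] as const;
type Site = (typeof SITES)[number];

// Assumed field types; the README only names and describes each field.
type NewscrapeInput = {
  dataDir: string;
  site: Site;
  keyword: string;
  grammarly: YesNo; // not implemented yet
  textgears: YesNo; // not implemented yet
  copyscape: YesNo; // not implemented yet
  language: string; // a code from the Supported Languages table, e.g. "en"
  maxPage: number; // assumed to be a positive integer
};

// Returns a list of problems found in a raw input object (empty if none).
function validateInput(raw: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const nonEmpty = (v: unknown): boolean => typeof v === "string" && v !== "";

  if (!nonEmpty(raw.dataDir)) problems.push("dataDir must be a non-empty string");
  if ((SITES as readonly string[]).indexOf(String(raw.site)) === -1) {
    problems.push(`unknown site: ${String(raw.site)}`);
  }
  if (!nonEmpty(raw.keyword)) problems.push("keyword must be a non-empty string");
  for (const flag of ["grammarly", "textgears", "copyscape"]) {
    if (raw[flag] !== "yes" && raw[flag] !== "no") {
      problems.push(`${flag} must be 'yes' or 'no'`);
    }
  }
  if (!nonEmpty(raw.language)) problems.push("language must be a language code");
  const maxPage = raw.maxPage;
  if (typeof maxPage !== "number" || maxPage < 1 || Math.floor(maxPage) !== maxPage) {
    problems.push("maxPage must be a positive integer");
  }
  return problems;
}

const exampleInput: NewscrapeInput = {
  dataDir: "articles", // hypothetical collection name
  site: "Reuters",
  keyword: "climate",
  grammarly: "no",
  textgears: "no",
  copyscape: "no",
  language: "en",
  maxPage: 2,
};
```

With this sketch, `validateInput(exampleInput)` returns an empty list, while replacing `site` with an unsupported value such as `"Bloomberg"` yields exactly one problem.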
Output
newscrape returns a JSON object with the following attributes:
Parameter | Description |
---|---|
isError | 'true' if an error occurred, 'false' on success |
errorCode | Error code name |
errorMessage | Error message |
results | Result data |
results.count | Number of scraped articles |
results.articles | List of scraped articles |
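The output table can be read as the object shape below. This TypeScript sketch is hypothetical: the names `NewscrapeOutput`, `articlesOrThrow` and `sampleOutput` are illustrative, `isError` is accepted either as a boolean or as the string 'true'/'false' because the README quotes the values, and each article is typed as `unknown` since its fields are not documented.

```typescript
// Hypothetical shape of the documented output object.
type NewscrapeOutput = {
  isError: boolean | "true" | "false"; // README quotes the values, so accept both forms
  errorCode?: string;
  errorMessage?: string;
  results?: {
    count: number; // number of scraped articles
    articles: unknown[]; // article fields are not documented
  };
};

// Throws if the output reports an error; otherwise returns the scraped articles.
function articlesOrThrow(output: NewscrapeOutput): unknown[] {
  if (output.isError === true || output.isError === "true") {
    const code = output.errorCode || "UNKNOWN_ERROR";
    const message = output.errorMessage || "no error message";
    throw new Error(`${code}: ${message}`);
  }
  return output.results ? output.results.articles : [];
}

// Hypothetical successful response with two placeholder articles.
const sampleOutput: NewscrapeOutput = {
  isError: false,
  results: { count: 2, articles: [{}, {}] },
};
```

Here `articlesOrThrow(sampleOutput)` returns the two placeholder articles; an output with `isError` set to `true` or `"true"` raises an error built from `errorCode` and `errorMessage` instead.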
Supported Languages
Language | Code |
---|---|
Albanian | sq |
Arabic | ar |
Armenian | hy |
Azeri | az |
Belarusian | be |
Bosnian | bs |
Bulgarian | bg |
Catalan | ca |
Chinese | zh |
Croatian | hr |
Czech | cs |
Danish | da |
Dutch | nl |
English | en |
Estonian | et |
Finnish | fi |
French | fr |
Georgian | ka |
German | de |
Greek | el |
Hebrew | he |
Hungarian | hu |
Icelandic | is |
Indonesian | id |
Italian | it |
Japanese | ja |
Korean | ko |
Latvian | lv |
Lithuanian | lt |
Macedonian | mk |
Malay | ms |
Maltese | mt |
Norwegian | no |
Polish | pl |
Portuguese | pt |
Romanian | ro |
Russian | ru |
Serbian | sr |
Slovak | sk |
Slovenian | sl |
Spanish | es |
Swedish | sv |
Thai | th |
Turkish | tr |
Ukrainian | uk |
Vietnamese | vi |
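The codes in this table are presumably the values accepted by the `language` input attribute. As a hedged sketch, the TypeScript below keeps them in a list and applies the documented rule that an article is translated when its language differs from the target. The helper names `isSupportedLanguage` and `needsTranslation`, and the exact, case-sensitive matching, are assumptions; newscrape's own language handling is not described here.

```typescript
// Language codes copied from the Supported Languages table.
const SUPPORTED_LANGUAGE_CODES: readonly string[] = [
  "sq", "ar", "hy", "az", "be", "bs", "bg", "ca", "zh", "hr", "cs", "da",
  "nl", "en", "et", "fi", "fr", "ka", "de", "el", "he", "hu", "is", "id",
  "it", "ja", "ko", "lv", "lt", "mk", "ms", "mt", "no", "pl", "pt", "ro",
  "ru", "sr", "sk", "sl", "es", "sv", "th", "tr", "uk", "vi",
];

// Hypothetical helper: true if `code` appears in the table (exact match).
function isSupportedLanguage(code: string): boolean {
  return SUPPORTED_LANGUAGE_CODES.indexOf(code) !== -1;
}

// Hypothetical helper mirroring the documented `language` rule: an article is
// translated when its language differs from the target language.
function needsTranslation(articleLanguage: string, targetLanguage: string): boolean {
  if (!isSupportedLanguage(targetLanguage)) {
    throw new Error(`unsupported target language: ${targetLanguage}`);
  }
  return articleLanguage !== targetLanguage;
}
```

For example, `needsTranslation("fr", "en")` returns `true`, `needsTranslation("en", "en")` returns `false`, and an unlisted target code such as `"xx"` throws.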