magicanded / newscrape / 0.3.1

newscrape - Scrape various news sites

newscrape scrapes articles matching a keyword from news sites such as Reuters, CNN, and WSJ.

Usage

Input

newscrape takes a JSON object with the following attributes:

| Parameter | Description |
|---|---|
| dataDir | Data collection in which scraped articles will be stored |
| site | News site to scrape, or 'All' to scrape every supported site. Values include 'Reuters', 'CNN', 'CNBC', 'NYTimes', 'Guardian', 'WashingtonPost', 'HuffingtonPost', 'FoxNews', 'BBC', 'DailyMail', 'IndiaTimes', 'USAToday', 'WSJ', 'NBCNews', 'ABCNews' |
| keyword | Keyword to search articles for |
| grammarly | 'yes' to run Grammarly, 'no' otherwise. This feature is not yet implemented. |
| textgears | 'yes' to run Textgears, 'no' otherwise. This feature is not yet implemented. |
| copyscape | 'yes' to run Copyscape, 'no' otherwise. This feature is not yet implemented. |
| language | Target language of the articles. Articles written in a different language are translated into the target language. |
| maxPage | Maximum number of pages to search |
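The following is a minimal sketch of how a caller might assemble and sanity-check the input object before sending it to newscrape. `build_request` and `SUPPORTED_SITES` are hypothetical helpers, not part of newscrape. The dataDir value `data://.my/articles` is a placeholder. The sketch also assumes that `language` takes a short code such as `en` and that `maxPage` is a positive integer; the README does not state either type.

```python
import json

# Site names taken from the "site" row of the input table, plus 'All'.
SUPPORTED_SITES = {
    "All", "Reuters", "CNN", "CNBC", "NYTimes", "Guardian", "WashingtonPost",
    "HuffingtonPost", "FoxNews", "BBC", "DailyMail", "IndiaTimes", "USAToday",
    "WSJ", "NBCNews", "ABCNews",
}


def build_request(data_dir, site, keyword, language="en", max_page=1):
    """Build a newscrape input object (hypothetical helper, not part of newscrape)."""
    if site not in SUPPORTED_SITES:
        raise ValueError(f"unsupported site: {site!r}")
    # Assumption: maxPage is a positive integer; the README does not give its type.
    if not isinstance(max_page, int) or max_page < 1:
        raise ValueError("max_page must be a positive integer")
    return {
        "dataDir": data_dir,
        "site": site,
        "keyword": keyword,
        # These three checks are documented as not yet implemented, so they are off.
        "grammarly": "no",
        "textgears": "no",
        "copyscape": "no",
        # Assumption: the target language is given as a short code such as "en".
        "language": language,
        "maxPage": max_page,
    }


# "data://.my/articles" is a placeholder collection path, not a documented value.
request = build_request("data://.my/articles", "Reuters", "election", language="en", max_page=2)
print(json.dumps(request, indent=2))
```

The validation runs locally, so a mistyped site name fails before any request is made.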

Output

newscrape returns a JSON object with the following attributes:

| Parameter | Description |
|---|---|
| isError | 'true' if an error occurred, 'false' on success |
| errorCode | Error code name |
| errorMessage | Error message |
| results | Result data |
| results.count | Number of scraped articles |
| results.articles | List of scraped articles |
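Below is a sketch of how a caller might consume this output. It raises on an error and otherwise returns the article list. `parse_response` is a hypothetical helper. The README does not say whether `isError` is a JSON boolean or the string 'true'/'false', so both forms are accepted. The structure of an individual article is not documented, so the sample uses a placeholder string for each article.

```python
import json


def parse_response(raw):
    """Return scraped articles from a newscrape response, or raise on error (sketch)."""
    response = json.loads(raw) if isinstance(raw, str) else raw
    is_error = response.get("isError")
    # Accept both a JSON boolean and the string form shown in the README.
    if is_error is True or is_error == "true":
        raise RuntimeError(f"{response.get('errorCode')}: {response.get('errorMessage')}")
    results = response.get("results", {})
    articles = results.get("articles", [])
    # results.count is documented as the number of scraped articles; flag a mismatch.
    count = results.get("count")
    if count is not None and count != len(articles):
        print(f"warning: count={count} but {len(articles)} articles returned")
    return articles


# Hypothetical successful response; "<article 1>" stands in for an undocumented article object.
sample = '{"isError": "false", "results": {"count": 1, "articles": ["<article 1>"]}}'
print(parse_response(sample))
```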

Supported Languages

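| Language | Code |
|---|---|
| Albanian | sq |
| Arabic | ar |
| Armenian | hy |
| Azeri | az |
| Belarusian | be |
| Bosnian | bs |
| Bulgarian | bg |
| Catalan | ca |
| Croatian | hr |
| Czech | cs |
| Chinese | zh |
| Danish | da |
| Dutch | nl |
| English | en |
| Estonian | et |
| Finnish | fi |
| French | fr |
| Georgian | ka |
| German | de |
| Greek | el |
| Hebrew | he |
| Hungarian | hu |
| Icelandic | is |
| Indonesian | id |
| Italian | it |
| Japanese | ja |
| Korean | ko |
| Latvian | lv |
| Lithuanian | lt |
| Macedonian | mk |
| Malay | ms |
| Maltese | mt |
| Norwegian | no |
| Polish | pl |
| Portuguese | pt |
| Romanian | ro |
| Russian | ru |
| Spanish | es |
| Serbian | sr |
| Slovak | sk |
| Slovenian | sl |
| Swedish | sv |
| Thai | th |
| Turkish | tr |
| Ukrainian | uk |
| Vietnamese | vi |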
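For callers that accept language names from users, a lookup table like the one below could map a name to its code before building the input object. This sketch assumes that the `language` input parameter expects these codes; the README lists them but does not say so explicitly. `LANGUAGE_CODES` and `language_code` are hypothetical names.

```python
# Language names and codes copied from the Supported Languages table.
# Assumption: newscrape's "language" parameter takes these codes.
LANGUAGE_CODES = {
    "Albanian": "sq", "Arabic": "ar", "Armenian": "hy", "Azeri": "az",
    "Belarusian": "be", "Bosnian": "bs", "Bulgarian": "bg", "Catalan": "ca",
    "Croatian": "hr", "Czech": "cs", "Chinese": "zh", "Danish": "da",
    "Dutch": "nl", "English": "en", "Estonian": "et", "Finnish": "fi",
    "French": "fr", "Georgian": "ka", "German": "de", "Greek": "el",
    "Hebrew": "he", "Hungarian": "hu", "Icelandic": "is", "Indonesian": "id",
    "Italian": "it", "Japanese": "ja", "Korean": "ko", "Latvian": "lv",
    "Lithuanian": "lt", "Macedonian": "mk", "Malay": "ms", "Maltese": "mt",
    "Norwegian": "no", "Polish": "pl", "Portuguese": "pt", "Romanian": "ro",
    "Russian": "ru", "Spanish": "es", "Serbian": "sr", "Slovak": "sk",
    "Slovenian": "sl", "Swedish": "sv", "Thai": "th", "Turkish": "tr",
    "Ukrainian": "uk", "Vietnamese": "vi",
}

# Case-insensitive index built once from the table above.
_BY_LOWER_NAME = {name.lower(): code for name, code in LANGUAGE_CODES.items()}


def language_code(name):
    """Return the code for a supported language name, ignoring case and surrounding spaces."""
    try:
        return _BY_LOWER_NAME[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {name!r}") from None


print(language_code(" german "))
```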