hotels / WebsiteSummary / 0.1.4

This algorithm takes a web address and returns a summary of the site's relevant structural details. Specifically, it is intended to identify the relevant pages on a hotel website, returning selected metadata and the relative importance of the site's pages as measured by PageRank.


The returned information includes:
  • url - the original address given, assumed to be the main page of the website.
  • language - the language of the main page. See https://algorithmia.com/algorithms/nlp/LanguageIdentification for a guide to the returned language symbols.
  • tags - important terms from the website.
  • important pages - the pages on the site used for rooms, reservations/booking, photos, and location. Detection currently supports English, Spanish, Italian, German, and Portuguese.
  • pageRanks - a list of the site's pages ordered by PageRank. The higher the rank, the more likely the page is to be important.
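
This README does not say how the PageRank scores are computed. As a rough illustration of the idea behind pageRanks, here is a minimal power-iteration sketch in Python over a small, made-up link graph. The function name `page_rank`, the damping factor of 0.85, the iteration count, and the paths in `site_links` are all assumptions chosen for illustration; they are not part of this algorithm's actual interface or output.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    `links` maps each page to the list of pages it links to.
    Returns (page, score) pairs sorted from highest to lowest score.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        new_ranks = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
        ranks = new_ranks

    return sorted(ranks.items(), key=lambda item: item[1], reverse=True)


# Hypothetical internal link graph for a small hotel site (illustrative only).
site_links = {
    "/": ["/rooms", "/booking", "/photos", "/location"],
    "/rooms": ["/", "/booking"],
    "/booking": ["/"],
    "/photos": ["/", "/rooms"],
    "/location": ["/"],
}

ranked = page_rank(site_links)
for page, score in ranked:
    print(f"{page}: {score:.3f}")
```

In this toy graph every page links back to "/", so the main page ends up with the highest score. The scores sum to 1, which makes them easy to read as relative importance.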