This algorithm takes a web address and returns a summary of the site's key structural details. Specifically, it is intended to identify the relevant pages on a hotel website, returning selected metadata and the relative importance of the site's pages as measured by PageRank.
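The following is a minimal sketch of PageRank in its standard power-iteration form. It is here for readers unfamiliar with how PageRank scores pages. It assumes the site's internal links have already been crawled into a dict that maps each page to the pages it links to. The function name pagerank, the damping factor of 0.85 and the link-dict input are illustrative assumptions. This is not necessarily how this algorithm computes its ranks.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-9):
    """Power-iteration PageRank over a link graph (illustrative sketch).

    links: dict mapping each page to a list of pages it links to.
    Returns a list of (page, score) pairs sorted by descending score.
    """
    # Collect every page that appears as a link source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    if n == 0:
        return []

    # Start from a uniform distribution over all pages.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank to every page it links to.
        for src, targets in links.items():
            unique_targets = set(targets)
            if not unique_targets:
                continue
            share = damping * rank[src] / len(unique_targets)
            for dst in unique_targets:
                new_rank[dst] += share
        # Stop early once the scores have converged.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return sorted(rank.items(), key=lambda item: item[1], reverse=True)


# Example: a tiny hypothetical hotel site where every page links back to the home page.
site = {
    "/": ["/rooms", "/booking", "/photos"],
    "/rooms": ["/", "/booking"],
    "/booking": ["/"],
    "/photos": ["/"],
}
ranks = pagerank(site)
print(ranks[0][0])  # "/" comes first, since every other page links to it
```

The scores always sum to 1. In the example, the home page ranks highest because every other page links to it. "/booking" ranks above "/rooms" because it receives links from both the home page and the rooms page.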

The returned information includes:
  • url - the original address given, assumed to be the main page of the website.
  • language - the language of the main page. See for a guide to the returned language symbols.
  • tags - important terms from the website.
  • important pages - the pages on the site that cover rooms, reservations/booking, photos, and location. This detection currently supports English, Spanish, Italian, German, and Portuguese.
  • pageRanks - the site's pages ordered by PageRank; the higher a page's rank, the more likely it is to be important.

