web / BreadthFirstSiteMap / 0.2.17

README.md

A more efficient site mapper that explores a site breadth-first. Given a URL and a maximum number of pages to return, it crawls outward from that URL until it detects that it has passed the maximum allowed number of pages or until it has run for two minutes, whichever comes first. It returns an adjacency map of URLs: each visited URL is mapped to the URLs it links to.

Optionally, you can provide a list of required terms as a third argument. In that case the algorithm returns a hashmap with two keys: "map", whose value is the site map described above, and "marked", whose value is a list of the URLs whose HTML (including human-readable text) contains at least one of the strings in the third argument. This is most useful for finding specific HTML tags. Be careful with very short or common terms, since a match can occur either in the markup or in the human-readable text.
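The behavior described above can be illustrated with a minimal Python sketch. This is not the algorithm's actual source: the function name `breadth_first_site_map`, the injected `fetch_page` callback, and the `time_limit_s` parameter are all hypothetical, and the sketch stops at exactly `max_pages` rather than possibly overshooting. It only shows the breadth-first traversal, the two stopping conditions, and the two return shapes, using a small in-memory site in place of real HTTP requests.

```python
import time
from collections import deque


def breadth_first_site_map(start_url, max_pages, fetch_page,
                           required_terms=None, time_limit_s=120.0):
    """Illustrative sketch only; not the published implementation.

    fetch_page is a hypothetical callback: fetch_page(url) -> (html, links).
    """
    site_map = {}              # adjacency map: url -> list of linked urls
    marked = []                # urls whose html contains a required term
    queue = deque([start_url])
    seen = {start_url}
    deadline = time.monotonic() + time_limit_s

    # Stop on the page limit or the time limit, whichever comes first.
    while queue and len(site_map) < max_pages and time.monotonic() < deadline:
        url = queue.popleft()  # FIFO order gives breadth-first exploration
        html, links = fetch_page(url)
        site_map[url] = list(links)
        if required_terms and any(term in html for term in required_terms):
            marked.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)

    if required_terms is None:
        return site_map
    return {"map": site_map, "marked": marked}


# A made-up four-page site used instead of real network access.
PAGES = {
    "http://example.test/": ("<html><h1>Home</h1></html>",
                             ["http://example.test/a", "http://example.test/b"]),
    "http://example.test/a": ("<html><form></form></html>",
                              ["http://example.test/c"]),
    "http://example.test/b": ("<html><p>text</p></html>",
                              ["http://example.test/"]),
    "http://example.test/c": ("<html><form></form></html>", []),
}


def fake_fetch(url):
    return PAGES[url]


# Limit of 3 pages, marking any page whose html contains "<form".
result = breadth_first_site_map("http://example.test/", 3, fake_fetch, ["<form"])
```

Here `result["map"]` holds the first three pages reached in breadth-first order, and `result["marked"]` lists the one visited page containing `<form`. Searching for `<form` rather than `form` reflects the caution above: a bare word would also match ordinary page text.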