web / GetLinks / 0.1.5

Table of Contents

  1. Introduction
  2. Examples
  3. Credits

Introduction

Given a URL (as a string), this algorithm scrapes the page for links to other pages and returns them as URL strings. Links to documents, such as PDFs and PPTs, are ignored.

Input:

  • (Required): Website URL.

Output:

  • List of links extracted from the given website URL.
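
The behavior described above can be illustrated with a minimal sketch. This is not the algorithm's actual code (which is built on jsoup); it is a standard-library Python approximation that takes already-downloaded HTML rather than fetching the page. The names `get_links`, `LinkCollector`, and `DOCUMENT_EXTENSIONS` are hypothetical, and the extension list, the restriction to http(s) links, and the removal of duplicates are assumptions, since this README only states that documents such as PDFs and PPTs are ignored.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed list of document extensions to skip; the README only names PDFs and PPTs.
DOCUMENT_EXTENSIONS = (".pdf", ".ppt", ".pptx")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def get_links(html, base_url):
    """Return absolute http(s) links found in html, skipping document links."""
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links against the page URL
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # assumption: skip mailto:, javascript:, and similar schemes
        if parsed.path.lower().endswith(DOCUMENT_EXTENSIONS):
            continue  # skip links that point at documents
        if absolute not in links:
            links.append(absolute)  # assumption: drop duplicates, keep page order
    return links


sample_html = """
<a href="/pricing">Pricing</a>
<a href="https://example.com/slides.PPT">Slides</a>
<a href="report.pdf">Report</a>
<a href="mailto:hi@example.com">Mail</a>
<a href="/pricing">Pricing again</a>
<a href="https://example.org/about">About</a>
"""
print(get_links(sample_html, "https://example.com"))
# prints ['https://example.com/pricing', 'https://example.org/about']
```

In this sketch the document check looks only at the URL path, so a document served from a URL without a file extension would not be filtered out.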

Examples

Example 1.

  • Parameter 1: Algorithmia website URL.
"https://algorithmia.com"

Output:

[
  "http://developers.algorithmia.com",
  "https://algorithmia.com/terms",
  "https://algorithmia.com/algorithms/TimeSeries/OutlierDetection",
  "https://algorithmia.com/algorithms/opencv/FaceDetection",
  "https://algorithmia.com/algorithms/SummarAI/Summarizer",
  "https://algorithmia.com/signin",
  .....
  "https://algorithmia.com/pricing",
  "https://angel.co/algorithmia/jobs/",
  "https://algorithmia.com/signup",
  "http://blog.algorithmia.com/post/121967357859/isitnude",
  "http://www.xconomy.com/seattle/2015/03/12/theres-an-algorithm-for-that-algorithmia-helps-you-find-it/",
  "https://algorithmia.com/algorithms/util/Url2Text"
]

Example 2.

  • Parameter 1: Wikipedia website URL.
"https://wikipedia.org"

Output:

[
  "https://de.wikipedia.org/",
  "https://hsb.wikipedia.org/",
  "https://roa-tara.wikipedia.org/",
  "https://tt.wikipedia.org/",
  "https://cbk-zam.wikipedia.org/",
  "https://kg.wikipedia.org/",
  .....
  "https://tpi.wikipedia.org/",
  "https://bug.wikipedia.org/",
  "https://sv.wikipedia.org/",
  "https://jv.wikipedia.org/",
  "https://av.wikipedia.org/",
  "https://gv.wikipedia.org/"
]

Credits

This algorithm was implemented using jsoup.