util / Url2Text / 0.1.4

Table of Contents

  1. Introduction
  2. Examples
  3. Credits

Introduction

Takes a URL and extracts the main text content from the page. The algorithm attempts to remove non-content text such as navigation menus and footers.

Input:

  • (Required): Website URL.

Output:

  • Extracted text content.

Examples

Example 1.

  • Parameter 1: Algorithmia website URL.
"https://algorithmia.com"

Output:

"Join a community built around state-of-the-art algorithm development. Create, share, and build on other algorithms ..... could use a way to be shared themselves."

Example 2.

  • Parameter 1: A Wikipedia article URL.
"https://en.wikipedia.org/wiki/Dark_matter"

Output:

"Dark matter is a hypothetical substance that is believed by most astronomers to account for around five-sixths of the matter in the universe ..... it is usually attributed extraordinary physical or magical properties. Such descriptions are often inconsistent with the hypothesized properties of dark matter in physics and cosmology."

Credits

The algorithm is based on the jsoup library, a Java HTML parser.