WebArticleScraper

No algorithm description given

Introduction WebArticleSraper as it’s name suggests, is an api that can scrape the main article out of any generic web page. It specializes in being able to determine which text is relevant, and which ones aren’t thus having a surprisingly low false positive score. As a result of it pickiness, there are occasional cases of false negatives, so this api is recommended for users who need to scrape the vast majority of the relevant text on an article site, and are ok with missing a sentence or two. Essentially it performs better for data mining ( where useless comments/advertisements are ignored) opposed to consumer driven application, where 100% the text must be scraped. Another additive feature, is that if the scraper scrapes a site incorrectly, we have a easy to use configuration page where you can explicitly tell it what information is useful and which to ignore. Plus it has the ability to recognize titles. Examples: Test it on any web article or wiki page below!

Tags
(no tags)

Cost Breakdown

2 cr
royalty per call
1 cr
usage per second
avg duration

Cost Calculator

API call duration (sec)
×
API calls
=
Estimated cost
per calls
for large volume discounts
For additional details on how pricing works, see Algorithmia pricing.

Internet access

This algorithm has Internet access. This is necessary for algorithms that rely on external services, however it also implies that this algorithm is able to send your input data outside of the Algorithmia platform.


To understand more about how algorithm permissions work, see the permissions documentation.

1. Type your input

2. See the result

Running algorithm...

3. Use this algorithm

curl -X POST -d '{{input | formatInput:"curl"}}' -H 'Content-Type: application/json' -H 'Authorization: Simple YOUR_API_KEY' https://api.algorithmia.com/v1/algo/Chessgecko/html2textPro/0.2.0
View cURL Docs
algo auth
# Enter API Key: YOUR_API_KEY
algo run algo://Chessgecko/html2textPro/0.2.0 -d '{{input | formatInput:"cli"}}'
View CLI Docs
import com.algorithmia.*;
import com.algorithmia.algo.*;

String input = "{{input | formatInput:"java"}}";
AlgorithmiaClient client = Algorithmia.client("YOUR_API_KEY");
Algorithm algo = client.algo("algo://Chessgecko/html2textPro/0.2.0");
AlgoResponse result = algo.pipeJson(input);
System.out.println(result.asJsonString());
View Java Docs
import com.algorithmia._
import com.algorithmia.algo._

val input = {{input | formatInput:"scala"}}
val client = Algorithmia.client("YOUR_API_KEY")
val algo = client.algo("algo://Chessgecko/html2textPro/0.2.0")
val result = algo.pipeJson(input)
System.out.println(result.asJsonString)
View Scala Docs
var input = {{input | formatInput:"javascript"}};
Algorithmia.client("YOUR_API_KEY")
           .algo("algo://Chessgecko/html2textPro/0.2.0")
           .pipe(input)
           .then(function(output) {
             console.log(output);
           });
View Javascript Docs
var input = {{input | formatInput:"javascript"}};
Algorithmia.client("YOUR_API_KEY")
           .algo("algo://Chessgecko/html2textPro/0.2.0")
           .pipe(input)
           .then(function(response) {
             console.log(response.get());
           });
View NodeJS Docs
import Algorithmia

input = {{input | formatInput:"python"}}
client = Algorithmia.client('YOUR_API_KEY')
algo = client.algo('Chessgecko/html2textPro/0.2.0')
print algo.pipe(input)
View Python Docs
library(algorithmia)

input <- {{input | formatInput:"r"}}
client <- getAlgorithmiaClient("YOUR_API_KEY")
algo <- client$algo("Chessgecko/html2textPro/0.2.0")
result <- algo$pipe(input)$result
print(result)
View R Docs
require 'algorithmia'

input = {{input | formatInput:"ruby"}}
client = Algorithmia.client('YOUR_API_KEY')
algo = client.algo('Chessgecko/html2textPro/0.2.0')
puts algo.pipe(input).result
View Ruby Docs
use algorithmia::*;

let input = {{input | formatInput:"rust"}};
let client = Algorithmia::client("YOUR_API_KEY");
let algo = client.algo('Chessgecko/html2textPro/0.2.0');
let response = algo.pipe(input);
View Rust Docs
Discussion
  • {{comment.username}}