Document Classifier

Note: image courtesy of nltk

Introduction

This algorithm classifies documents using pre-defined labels and document vectors. Under the hood it uses doc2vec and K-nearest neighbours to find the most relevant labels in document space.

I/O

Input

{
 "mode":String,
 "data":String or Array[
 {
 "text":String,
 "label":String
 }
 ],
 "namespace":String,
 "n":Int
}
mode - (required) - Switches between training and prediction: pass train for training and predict for prediction.
data - (required) - Either a list of datapoints (if fewer than 1k), or a URL to a JSON file containing a list of datapoints. The file may be hosted on the web (http://) or through a data API (data://, s3://, dropbox://, etc).
namespace - (optional) - The data collection (or namespace) used for storing state (model files, etc). If not present, this defaults to the algorithm's temp directory.
n - (optional) - In predict mode, the number of elements to return with each prediction. If not present, this defaults to 5. Not used in train mode.

When training, each DataPoint must have a label. You should pre-load the DocumentClassifier state with labelled training data before running any prediction tasks.

Train Output

{
 "uuid":String
}

uuid - the unique identifier of the training event.

Predict Output

Single Output

This endpoint returns a different output depending on whether you passed in a single document or multiple documents, but both versions return the Prediction object.

{
 "text":String,
 "topN":Array[
 {
 "prediction":String,
 "confidence":Double
 }
 ]
}
text - the same input text you provided for this prediction, which makes it easier to match each result with its document later.
topN - an array of the labels predicted by the model.
prediction - the label, in string form, for this predicted class.
confidence - the confidence that this particular label is correct; ranges between 0 and 1.

Multiple Output

If you pass in multiple documents to predict, this endpoint returns a predictions array, which is a wrapper around the single output.

{
 "predictions":Array[ 
 { 
 "text":String,
 "topN":Array[ 
 { 
 "prediction":String,
 "confidence":Double
 }
 ]
 }
 ]
}
predictions - an array of Prediction objects, one for each document.
text - the same input text you provided for this prediction, which makes it easier to match each result with its document later.
topN - an array of the labels predicted by the model.
prediction - the label, in string form, for this predicted class.
confidence - the confidence that this particular label is correct; ranges between 0 and 1.

Datapoint

The main data storage object used for training and predicting.

{
 "text":String,
 "label":Option[String]
}
text - (required) - the document you wish to process. Be careful to scrub the document of any special characters, as they might interfere with processing. Be extra careful to remove any quotation marks " and ' from your document, as they break the JSON string.
label - (optional) - the label you define as valid for this document; mandatory for training. Only single labels are supported for now; multi-label support is coming in the future.

Examples

Files

Here are a couple of examples of how to construct Train and Predict JSON files for large datasets:

Training:
https://gist.github.com/zeryx/0c3ca0d5e6aea1f981a76f55e40a8ef3
Prediction:
https://gist.github.com/zeryx/d185f16d8e8fe5a2130e971affd0bec3
 Train - list of datapoints { 
 "data":[{ 
 "text":"Attention-based neural encoder-decoder frameworks have been widely adopted for image captioning. Most methods force visual attention to be active for every generated word. However, the decoder likely requires little to no visual information from the image to predict non-visual words such as the and of. Other words that may seem visual can often be predicted reliably just from the language model e.g., sign after behind a red stop or phone following talking on a cell. In this paper, we propose a novel adaptive attention model with a visual sentinel. At each time step, our model decides whether to attend to the image (and if so, to which regions) or to the visual sentinel. The model decides whether to attend to the image and where, in order to extract meaningful information for sequential word generation. We test our method on the COCO image captioning 2015 challenge dataset and Flickr30K. Our approach sets the new state-of-the-art by a significant margin",
 "label":"machine learning"
 }],
"mode":"train"
}
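
A training request like the one above can also be assembled programmatically. The following is only a minimal sketch: build_train_request is a hypothetical helper, not part of the algorithm, and its label check simply mirrors the rule that every training datapoint must have a label. Serializing with json.dumps escapes quotation marks inside the text, so, assuming the service accepts standard JSON escaping, the resulting string stays valid JSON.

```python
import json

def build_train_request(pairs, namespace=None):
    """Build a train-mode request from (text, label) pairs (hypothetical helper)."""
    data = []
    for text, label in pairs:
        if not label:
            # The documentation requires a label on every training datapoint.
            raise ValueError("every training datapoint needs a label")
        data.append({"text": text, "label": label})
    request = {"mode": "train", "data": data}
    if namespace is not None:
        # Optional: where model state is stored, e.g. a data:// collection.
        request["namespace"] = namespace
    return request

request = build_train_request(
    [('A paper about "adaptive attention" for image captioning.', "machine learning")]
)
payload = json.dumps(request)  # quotes inside the text are escaped here
print(payload)
```

The same dictionary can be passed directly to a client library that serializes JSON itself; the explicit json.dumps call is only there to show the escaping.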

 Train - url to json file { 
 "data":"data://.my/collection/training_data.json",
 "namespace":"data://.my/classifier",
 "mode":"train"
}
 Output: {"uuid":"6b6a7f1a-883c-45b9-a2d6-a0dc069b58b9"}
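
For larger datasets the datapoints live in a JSON file rather than in the request body. Below is a rough sketch of preparing such a file locally, assuming the file is a plain JSON array of datapoint objects (the gists linked above show the author's exact layout). The helper write_datapoints, the local file name, and the two datapoints are made up for illustration, and uploading the file to a data collection is left out.

```python
import json
import os
import tempfile

def write_datapoints(path, datapoints):
    """Write a list of {"text", "label"} dicts to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(datapoints, handle, ensure_ascii=False)

# Made-up datapoints, for illustration only.
datapoints = [
    {"text": "Doc2vec maps documents to vectors.", "label": "machine learning"},
    {"text": "The stock market closed higher today.", "label": "finance"},
]
path = os.path.join(tempfile.mkdtemp(), "training_data.json")
write_datapoints(path, datapoints)

# Once the file is uploaded, the request only carries its URL.
# Both URLs below are copied from the example above.
request = {
    "data": "data://.my/collection/training_data.json",
    "namespace": "data://.my/classifier",
    "mode": "train",
}
print(json.dumps(request))
```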
 Predict - list of datapoints { 
 "data":[ 
 { 
 "text":"This is a english sentence that contains information useful for testing."
 }
 ],
 "mode":"predict",
 "n":5
}
 Predict - url to json file { 
 "data":"data://.my/collection/testing_data.json",
 "mode":"predict",
 "n":4
}
 Output - single: { 
 "text":"This is a english sentence that contains information useful for testing.",
 "topN":[ 
 { 
 "confidence":0.8295813798904419,
 "prediction":"ALGO"
 },
 { 
 "confidence":0.8327966332435608,
 "prediction":"CONTENT"
 },
 { 
 "confidence":0.8323397040367126,
 "prediction":"DIEGO"
 },
 { 
 "confidence":0.7701727747917175,
 "prediction":"OUTREACH"
 },
 { 
 "confidence":0.8306512832641602,
 "prediction":"NOT_ASSIGNED"
 }
 ]
}
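
In the output above, the topN entries are not sorted by confidence. A small sketch of picking the most confident label from a single-document response follows; best_label is a hypothetical helper, and the response dictionary is a copy of the output above.

```python
def best_label(response):
    """Return the (prediction, confidence) pair with the highest confidence."""
    top = max(response["topN"], key=lambda entry: entry["confidence"])
    return top["prediction"], top["confidence"]

response = {
    "text": "This is a english sentence that contains information useful for testing.",
    "topN": [
        {"confidence": 0.8295813798904419, "prediction": "ALGO"},
        {"confidence": 0.8327966332435608, "prediction": "CONTENT"},
        {"confidence": 0.8323397040367126, "prediction": "DIEGO"},
        {"confidence": 0.7701727747917175, "prediction": "OUTREACH"},
        {"confidence": 0.8306512832641602, "prediction": "NOT_ASSIGNED"},
    ],
}
label, confidence = best_label(response)
print(label, confidence)
```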
 Output - multiple: { 
 "predictions":[ 
 { 
 "text":"This is a english sentence that contains information useful for testing.",
 "topN":[ 
 { 
 "prediction":"machine learning",
 "confidence":1
 }
 ]
 },
 { 
 "text":"As you might be able to tell, this dataset only has a single label called machine learning.",
 "topN":[ 
 { 
 "prediction":"machine learning",
 "confidence":1
 }
 ]
 }
 ]
}
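
Because the response shape depends on whether one or several documents were sent, client code may want to normalize both shapes before processing them. A minimal sketch, assuming the multiple-document response is always keyed by predictions as shown above; as_prediction_list is a hypothetical helper and the sample responses are shortened stand-ins.

```python
def as_prediction_list(response):
    """Return a list of Prediction dicts for either response shape."""
    if "predictions" in response:
        return response["predictions"]  # multiple-document wrapper
    return [response]                   # single-document Prediction object

single = {"text": "doc a", "topN": [{"prediction": "machine learning", "confidence": 1}]}
multiple = {"predictions": [single, {"text": "doc b", "topN": []}]}

for prediction in as_prediction_list(multiple):
    labels = [entry["prediction"] for entry in prediction["topN"]]
    print(prediction["text"], labels)
```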


Cost Breakdown

Royalty per call: 0 cr
Usage per second: 1 cr
This algorithm has permission to call other algorithms which may incur separate royalty and usage costs.

Internet access

This algorithm has Internet access. This is necessary for algorithms that rely on external services, however it also implies that this algorithm is able to send your input data outside of the Algorithmia platform.


Calls other algorithms

This algorithm has permission to call other algorithms. This allows an algorithm to compose sophisticated functionality using other algorithms as building blocks, however it also carries the potential of incurring additional royalty and usage costs from any algorithm that it calls.


To understand more about how algorithm permissions work, see the permissions documentation.

Use this algorithm

curl -X POST -d '{"data":[{"text":"This is a english sentence that contains information useful for testing."}],"mode":"predict","n":5}' -H 'Content-Type: application/json' -H 'Authorization: Simple YOUR_API_KEY' https://api.algorithmia.com/v1/algo/nlp/DocumentClassifier/0.3.2
View cURL Docs
algo auth
# Enter API Key: YOUR_API_KEY
algo run algo://nlp/DocumentClassifier/0.3.2 -d '{"data":[{"text":"This is a english sentence that contains information useful for testing."}],"mode":"predict","n":5}'
View CLI Docs
import (
  algorithmia "github.com/algorithmiaio/algorithmia-go"
)

input := {{input | formatInput:"go"}}

var client = algorithmia.NewClient("YOUR_API_KEY", "")
algo, _ := client.Algo("algo://nlp/DocumentClassifier/0.3.2")
resp, _ := algo.Pipe(input)
response := resp.(*algorithmia.AlgoResponse)
fmt.Println(response.Result)
View Go Docs
import com.algorithmia.*;
import com.algorithmia.algo.*;

String input = "{{input | formatInput:"java"}}";
AlgorithmiaClient client = Algorithmia.client("YOUR_API_KEY");
Algorithm algo = client.algo("algo://nlp/DocumentClassifier/0.3.2");
AlgoResponse result = algo.pipeJson(input);
System.out.println(result.asJsonString());
View Java Docs
import com.algorithmia._
import com.algorithmia.algo._

val input = {{input | formatInput:"scala"}}
val client = Algorithmia.client("YOUR_API_KEY")
val algo = client.algo("algo://nlp/DocumentClassifier/0.3.2")
val result = algo.pipeJson(input)
System.out.println(result.asJsonString)
View Scala Docs
var input = {{input | formatInput:"javascript"}};
Algorithmia.client("YOUR_API_KEY")
           .algo("algo://nlp/DocumentClassifier/0.3.2")
           .pipe(input)
           .then(function(output) {
             console.log(output);
           });
View Javascript Docs
using Algorithmia;

var input = "{{input | formatInput:"cs"}}";
var client = new Client("YOUR_API_KEY");
var algorithm = client.algo("algo://nlp/DocumentClassifier/0.3.2");
var response = algorithm.pipe<object>(input);
Console.WriteLine(response.result);
View .NET/C# Docs
var input = {{input | formatInput:"javascript"}};
Algorithmia.client("YOUR_API_KEY")
           .algo("algo://nlp/DocumentClassifier/0.3.2")
           .pipe(input)
           .then(function(response) {
             console.log(response.get());
           });
View NodeJS Docs
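import Algorithmia

# Example input taken from the "Predict - list of datapoints" example
input = {
    "data": [{"text": "This is a english sentence that contains information useful for testing."}],
    "mode": "predict",
    "n": 5
}
client = Algorithmia.client('YOUR_API_KEY')
algo = client.algo('nlp/DocumentClassifier/0.3.2')
print(algo.pipe(input).result)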
View Python Docs
library(algorithmia)

input <- {{input | formatInput:"r"}}
client <- getAlgorithmiaClient("YOUR_API_KEY")
algo <- client$algo("nlp/DocumentClassifier/0.3.2")
result <- algo$pipe(input)$result
print(result)
View R Docs
require 'algorithmia'

input = {{input | formatInput:"ruby"}}
client = Algorithmia.client('YOUR_API_KEY')
algo = client.algo('nlp/DocumentClassifier/0.3.2')
puts algo.pipe(input).result
View Ruby Docs
use algorithmia::Algorithmia;

let input = {{input | formatInput:"rust"}};
let client = Algorithmia::client("YOUR_API_KEY");
let algo = client.algo("nlp/DocumentClassifier/0.3.2");
let response = algo.pipe(input);
View Rust Docs
import Algorithmia

let input = "{{input | formatInput:"swift"}}";
let client = Algorithmia.client(simpleKey: "YOUR_API_KEY")
let algo = client.algo(algoUri: "nlp/DocumentClassifier/0.3.2") { resp, error in
  print(resp)
}
View Swift Docs