LDA


0. TL;DR

This algorithm takes a group of documents (anything made up of text) and returns a number of topics, each made up of a number of words, that are most relevant to those documents.

1. Introduction

In natural language processing, Latent Dirichlet Allocation (LDA) is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. For example, if the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics. LDA is an example of a topic model and was first presented as a graphical model for topic discovery by David Blei, Andrew Ng, and Michael Jordan in 2003.

Input:
(Required): A list of strings * or a file path or URL **
(Optional): The mode *** (default is Smart Mode) or Custom Algorithm Settings ****
(Optional): A list of stop words ***** or a file path or URL of stop words ******

Output: A list of topics, each containing relevant words and their number of occurrences in the documents.

2. Documents

*A list of strings: A list that contains at least one document string. (key = "docsList")

Example list of documents:
{
 ...,
 "docsList": [
 "this is document 1",
 "this is document 2",
 "this is document 3"
 ],
 ...
}

**File path or URL: A file path refers to a file stored in the Algorithmia Data API; a URL must be an http or https link. The file should contain one document per line. (key = "docsFile")

Example Data API file path:
{
 ...,
 "docsFile": "data://nlp/LDA/lda_demo.txt",
 ...
}

Example URL:
{
 ...,
 "docsFile": "https://s3.amazonaws.com/algorithmia-assets/algo_desc_images/nlp_lda/lda_demo.txt",
 ...
}

3. Modes & Custom Settings

***Mode: A predefined set of parameters. Three modes are currently supported (key = "mode"):
Smart Mode ("mode": "smart"): Optimizes the algorithm parameters adaptively, balancing speed and quality.
Quality Mode ("mode": "quality"): Runs a larger number of iterations for better topic quality, at the cost of a longer execution time.
Fast Mode ("mode": "fast"): Optimized for fast topic extraction; ideal for applications that require fast LDA processing.

Example mode:
{
 ...,
 "mode": "fast",
 ...
}

****Custom Algorithm Settings: Lets you set the LDA parameters directly instead of choosing a mode. (key = "customSettings")
Number of Topics: The number of topics to generate (default is 4). (key = "numTopics")
Number of Iterations: A larger number of iterations generally gives better-quality topics, at the cost of a longer execution time. (key = "numIterations")
Number of Words: The number of words to show for each topic (default is 8). (key = "numWords")

Example custom algorithm settings:
{
 ...,
 "customSettings": {
 "numTopics": 4,
 "numIterations": 100,
 "numWords": 16
 },
 ...
}

4. Stop words

*****A list of stop words: A list of stop words (strings). All stop words are filtered out of the documents before the LDA algorithm runs. (key = "stopWordsList")

Example list of stop words:
{
 ...,
 "stopWordsList": [
 "and",
 "or",
 "the"
 ],
 ...
}

******File path or URL of stop words: An Algorithmia Data API file path or a URL pointing to the list of stop words, one word per line. All stop words are filtered out of the documents before the LDA algorithm runs. (key = "stopWordsFile")

Example Data API file path:
{
 ...,
 "stopWordsFile": "data://.my/collection/stopwords.txt",
 ...
}

Example URLs:
{
 ...,
 "stopWordsFile": "http://site.com/dir/stopwords.txt",
 ...
} {
 ...,
 "stopWordsFile": "https://site.com/dir/stopwords.txt",
 ...
}

5. Output

A list of topics. Each topic maps its most relevant words to their number of occurrences.

Example list of topics:
[
 {
 "biology": 45692,
 "university": 10576,
 "moth": 5304,
 "caterpillar": 4927
 },
 {
 "space": 67019,
 "nasa": 9673,
 "earth": 5674,
 "moon": 3455
 },
 {
 "politics": 24763,
 "washington": 11982,
 "congress": 7261,
 "president": 5820
 }
]

Description: In the example above, each topic is a group of words. LDA doesn't name the topics itself, but their most frequent words usually make it easy to guess what each topic is about. The first topic is probably about moths and caterpillars, and perhaps university research on them. The second topic is clearly about space; its keywords are all closely related. The third topic is clearly about politics, again with closely related keywords.

6. Examples

Example 1:
Parameter 1: An Algorithmia Data API file path
Parameter 2: Not specified (uses the default Smart Mode)
Parameter 3: Not specified (uses the default stop words list)
{
 "docsFile": "data://.my/collection/file.txt"
}

Example 2:
Parameter 1: An http link
Parameter 2: Fast Mode
Parameter 3: Not specified (uses the default stop words list)
{
 "docsFile": "http://site.com/dir/file.txt",
 "mode": "fast"
}

Example 3:
Parameter 1: A list of strings representing observations
Parameter 2: 4 topics, 100 iterations, 16 words
Parameter 3: Not specified (uses the default stop words list)
{
 "docsList": [
 "this is an observation string",
 "this is yet another observation test string"
 ],
 "customSettings": {
 "numTopics": 4,
 "numIterations": 100,
 "numWords": 16
 }
}

Example 4:
Parameter 1: An Algorithmia Data API file path
Parameter 2: Quality Mode
Parameter 3: Custom stop words list
{
 "docsFile": "data://.my/collection/file.txt",
 "mode": "quality",
 "stopWordsList": ["and", "but", "before", "them"]
}

Example 5:
Parameter 1: An Algorithmia Data API file path
Parameter 2: Not specified (uses the default Smart Mode)
Parameter 3: Custom stop words file
{
 "docsFile": "data://.my/collection/file.txt",
 "stopWordsFile": "data://.my/collection/stopWords.txt"
}

7. Credits

For more information, please refer to http://mallet.cs.umass.edu/topics-devel.php or to McCallum, Andrew Kachites. "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu, 2002.
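The credits point to MALLET, whose LDA implementation trains topics with collapsed Gibbs sampling. To show what numTopics, numIterations, numWords, and the stop-word list control, here is a minimal, self-contained sketch of that technique. It is an illustration, not the service's actual code: the whitespace tokenizer, the alpha and beta hyperparameters, and the seed argument are assumptions of this sketch. It returns topics in the same word-to-count shape as the Output section.

```python
import random
from collections import Counter, defaultdict


def lda_gibbs(docs, num_topics=4, num_iterations=100, num_words=8,
              stop_words=(), alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    Returns one {word: count} dict per topic, where count is the number of
    tokens currently assigned to that topic, most frequent words first.
    alpha, beta, and seed are assumed values, not the service's settings.
    """
    rng = random.Random(seed)
    stop = {w.lower() for w in stop_words}
    # Assumed tokenizer: lowercase, split on whitespace, drop stop words.
    tokens = [[w for w in doc.lower().split() if w not in stop] for doc in docs]
    vocab_size = len({w for doc in tokens for w in doc})

    doc_topic = [[0] * num_topics for _ in tokens]              # n[d][k]
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # n[k][w]
    topic_total = [0] * num_topics                              # n[k]
    z = []                                                      # topic of each token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(tokens):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    topics = range(num_topics)
    for _ in range(num_iterations):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(k) is proportional to
                # (n[d][k] + alpha) * (n[k][w] + beta) / (n[k] + V * beta)
                weights = [(doc_topic[d][j] + alpha) * (topic_word[j][w] + beta)
                           / (topic_total[j] + vocab_size * beta) for j in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Keep each topic's num_words most frequent words, as in the Output section.
    return [dict(Counter({w: c for w, c in tw.items() if c > 0}).most_common(num_words))
            for tw in topic_word]


if __name__ == "__main__":
    demo_docs = [
        "space nasa moon earth orbit",
        "congress president washington politics vote",
        "moon earth space rocket nasa",
        "president congress politics senate washington",
    ]
    for topic in lda_gibbs(demo_docs, num_topics=2, num_iterations=200, num_words=4):
        print(topic)
```

Each iteration resamples every token's topic once, so in this sketch run time grows linearly with numIterations, which is consistent with Quality Mode's longer execution time.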


Cost Breakdown

0 cr
royalty per call
1 cr
usage per second

Cost Calculator

Estimated cost = API call duration (sec) × API calls, billed at the rates above.
For additional details on how pricing works, see Algorithmia pricing.
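As a quick way to estimate spend before running large batches, the sketch below applies the Cost Breakdown rates (0 cr royalty per call, 1 cr usage per second) to an assumed average call duration. It assumes a simple linear model (per-call royalty plus per-second usage, summed over all calls) and does not model any rounding, minimum billing increments, or volume discounts; Algorithmia's pricing documentation remains authoritative.

```python
def estimated_cost(duration_sec, num_calls, royalty_per_call=0.0, usage_per_sec=1.0):
    """Estimated credits for num_calls calls that each run for duration_sec seconds.

    Defaults mirror the Cost Breakdown: 0 cr royalty per call and 1 cr of
    usage per second. Assumes a linear model with no rounding, minimum
    billing increments, or volume discounts.
    """
    if duration_sec < 0 or num_calls < 0:
        raise ValueError("duration and call count must be non-negative")
    return num_calls * (royalty_per_call + usage_per_sec * duration_sec)


if __name__ == "__main__":
    # 1,000 calls at an average of 2.5 s each.
    print(estimated_cost(2.5, 1000))  # 2500.0 credits
```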

Internet access

This algorithm has Internet access. This is necessary for algorithms that rely on external services; however, it also means that this algorithm can send your input data outside of the Algorithmia platform.


To understand more about how algorithm permissions work, see the permissions documentation.

Use this algorithm

curl -X POST -d '{"docsFile": "data://nlp/LDA/lda_demo.txt"}' -H 'Content-Type: application/json' -H 'Authorization: Simple YOUR_API_KEY' https://api.algorithmia.com/v1/algo/nlp/LDA/1.0.0
View cURL Docs
algo auth
# Enter API Key: YOUR_API_KEY
algo run algo://nlp/LDA/1.0.0 -d '{"docsFile": "data://nlp/LDA/lda_demo.txt"}'
View CLI Docs
import com.algorithmia.*;
import com.algorithmia.algo.*;

String input = "{\"docsFile\": \"data://nlp/LDA/lda_demo.txt\"}";
AlgorithmiaClient client = Algorithmia.client("YOUR_API_KEY");
Algorithm algo = client.algo("algo://nlp/LDA/1.0.0");
AlgoResponse result = algo.pipeJson(input);
System.out.println(result.asJsonString());
View Java Docs
import com.algorithmia._
import com.algorithmia.algo._

val input = """{"docsFile": "data://nlp/LDA/lda_demo.txt"}"""
val client = Algorithmia.client("YOUR_API_KEY")
val algo = client.algo("algo://nlp/LDA/1.0.0")
val result = algo.pipeJson(input)
System.out.println(result.asJsonString)
View Scala Docs
var input = {"docsFile": "data://nlp/LDA/lda_demo.txt"};
Algorithmia.client("YOUR_API_KEY")
           .algo("algo://nlp/LDA/1.0.0")
           .pipe(input)
           .then(function(output) {
             console.log(output);
           });
View Javascript Docs
var Algorithmia = require("algorithmia");
var input = {"docsFile": "data://nlp/LDA/lda_demo.txt"};
Algorithmia.client("YOUR_API_KEY")
           .algo("algo://nlp/LDA/1.0.0")
           .pipe(input)
           .then(function(response) {
             console.log(response.get());
           });
View NodeJS Docs
import Algorithmia

input = {"docsFile": "data://nlp/LDA/lda_demo.txt"}
client = Algorithmia.client('YOUR_API_KEY')
algo = client.algo('nlp/LDA/1.0.0')
print(algo.pipe(input).result)
View Python Docs
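To go with the Python example above, here is a sketch of two small helpers: one assembles an input dict from the keys documented in the description (docsList, docsFile, mode, customSettings, stopWordsList, stopWordsFile), and one labels each returned topic by its top words. The helper names are hypothetical, and treating each documented "or" (documents list or file, mode or custom settings, stop-word list or file) as mutually exclusive is an assumption, not confirmed service behavior.

```python
import json

VALID_MODES = {"smart", "quality", "fast"}
CUSTOM_SETTING_KEYS = {"numTopics", "numIterations", "numWords"}


def build_lda_input(docs=None, docs_file=None, mode=None, custom_settings=None,
                    stop_words=None, stop_words_file=None):
    """Hypothetical helper: assemble an input dict using the documented keys."""
    if (docs is None) == (docs_file is None):
        raise ValueError("pass exactly one of docs (docsList) or docs_file (docsFile)")
    if docs is not None:
        if not docs:
            raise ValueError("docsList must contain at least one document")
        payload = {"docsList": list(docs)}
    else:
        payload = {"docsFile": docs_file}

    # Assumption: "mode or customSettings" is treated as either/or.
    if mode is not None and custom_settings is not None:
        raise ValueError("pass either mode or custom_settings, not both")
    if mode is not None:
        if mode not in VALID_MODES:
            raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(VALID_MODES)}")
        payload["mode"] = mode
    if custom_settings is not None:
        unknown = set(custom_settings) - CUSTOM_SETTING_KEYS
        if unknown:
            raise ValueError(f"unknown customSettings keys: {sorted(unknown)}")
        payload["customSettings"] = dict(custom_settings)

    # Assumption: the stop-word list and stop-word file are also either/or.
    if stop_words is not None and stop_words_file is not None:
        raise ValueError("pass either stop_words or stop_words_file, not both")
    if stop_words is not None:
        payload["stopWordsList"] = list(stop_words)
    if stop_words_file is not None:
        payload["stopWordsFile"] = stop_words_file
    return payload


def summarize_topics(topics, top_n=3):
    """Hypothetical helper: label each topic with its top_n words, highest count first."""
    labels = []
    for topic in topics:
        # Sort by count instead of relying on JSON key order.
        ranked = sorted(topic.items(), key=lambda item: item[1], reverse=True)
        labels.append(", ".join(word for word, _ in ranked[:top_n]))
    return labels


if __name__ == "__main__":
    # Same input as Example 4 in the description.
    print(json.dumps(build_lda_input(docs_file="data://.my/collection/file.txt",
                                     mode="quality",
                                     stop_words=["and", "but", "before", "them"])))
```

With the client from the Python example, passing the built dict to `algo.pipe(...)` and reading `.result` should give the list of topics that `summarize_topics` expects.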
library(algorithmia)

input <- {{input | formatInput:"r"}}
client <- getAlgorithmiaClient("YOUR_API_KEY")
algo <- client$algo("nlp/LDA/1.0.0")
result <- algo$pipe(input)$result
print(result)
View R Docs
require 'algorithmia'

input = {"docsFile" => "data://nlp/LDA/lda_demo.txt"}
client = Algorithmia.client('YOUR_API_KEY')
algo = client.algo('nlp/LDA/1.0.0')
puts algo.pipe(input).result
View Ruby Docs
use algorithmia::*;

let input = {{input | formatInput:"rust"}};
let client = Algorithmia::client("YOUR_API_KEY");
let algo = client.algo("nlp/LDA/1.0.0");
let response = algo.pipe(input);
View Rust Docs