LuceneTextAnalyzer

No algorithm description given

In general, any analyzer in Lucene is a combination of tokenizer => Stemmer => Stop-words filter. Tokenizer splits your text into chunks, and since different analyzers may use different tokenizers, you can get different output token streams , i.e. sequences of chunks of text. For example, KeywordAnalyzer you mentioned doesn't split the text at all and takes all the field as a single token. At the same time, StandardAnalyzer (and most other analyzers) use spaces and punctuation as a split points. For example, for phrase "I am very happy" it will produce list ["i", "am", "very", "happy"] (or something like that). For more information on specific analyzers/tokenizers see its Java Docs . Stemmers are used to get the base of a word in question. It heavily depends on the language used. For example, for previous phrase in English there will be something like ["i", "be", "veri", "happi"] produced, and for French "Je suis très heureux" some kind of French analyzer (like SnowballAnalyzer , initialized with "French") will produce ["je", "être", "tre", "heur"]. Of course, if you will use analyzer of one language to stem text in another, rules from the other language will be used and stemmer may produce incorrect results. It isn't fail of all the system, but search results then may be less accurate. KeywordAnalyzer do not use any stemmers, it passes all the field unmodified. So, if you are going to search some words in English text, it isn't a good idea to use this analyzer. Stop words are the most frequent and almost useless words. Again, it heavily depends on language. For English these words are "a", "the", "I", "be", "have", etc. Stop-words filters remove them from the token stream to lower noise in search results, so finally our phrase "I'm very happy" with StandardAnalyzer will be transformed to list ["veri", "happi"]. And KeywordAnalyzer again do nothing. So, KeywordAnalyzer is used for things like ID or phone numbers, but not for usual text.]

Tags
(no tags)

Cost Breakdown

0 cr
royalty per call
1 cr
usage per second
avg duration

Cost Calculator

API call duration (sec)
×
API calls
=
Estimated cost
per calls
for large volume discounts
For additional details on how pricing works, see Algorithmia pricing.

Internet access

This algorithm has Internet access. This is necessary for algorithms that rely on external services, however it also implies that this algorithm is able to send your input data outside of the Algorithmia platform.


To understand more about how algorithm permissions work, see the permissions documentation.

1. Type your input

2. See the result

Running algorithm...

3. Use this algorithm

curl -X POST -d '{{input | formatInput:"curl"}}' -H 'Content-Type: application/json' -H 'Authorization: Simple YOUR_API_KEY' https://api.algorithmia.com/v1/algo/codeb34v3r/LuceneTextAnalyzer/0.1.0
View cURL Docs
algo auth
# Enter API Key: YOUR_API_KEY
algo run algo://codeb34v3r/LuceneTextAnalyzer/0.1.0 -d '{{input | formatInput:"cli"}}'
View CLI Docs
import com.algorithmia.*;
import com.algorithmia.algo.*;

String input = "{{input | formatInput:"java"}}";
AlgorithmiaClient client = Algorithmia.client("YOUR_API_KEY");
Algorithm algo = client.algo("algo://codeb34v3r/LuceneTextAnalyzer/0.1.0");
AlgoResponse result = algo.pipeJson(input);
System.out.println(result.asJsonString());
View Java Docs
import com.algorithmia._
import com.algorithmia.algo._

val input = {{input | formatInput:"scala"}}
val client = Algorithmia.client("YOUR_API_KEY")
val algo = client.algo("algo://codeb34v3r/LuceneTextAnalyzer/0.1.0")
val result = algo.pipeJson(input)
System.out.println(result.asJsonString)
View Scala Docs
var input = {{input | formatInput:"javascript"}};
Algorithmia.client("YOUR_API_KEY")
           .algo("algo://codeb34v3r/LuceneTextAnalyzer/0.1.0")
           .pipe(input)
           .then(function(output) {
             console.log(output);
           });
View Javascript Docs
var input = {{input | formatInput:"javascript"}};
Algorithmia.client("YOUR_API_KEY")
           .algo("algo://codeb34v3r/LuceneTextAnalyzer/0.1.0")
           .pipe(input)
           .then(function(response) {
             console.log(response.get());
           });
View NodeJS Docs
import Algorithmia

input = {{input | formatInput:"python"}}
client = Algorithmia.client('YOUR_API_KEY')
algo = client.algo('codeb34v3r/LuceneTextAnalyzer/0.1.0')
print algo.pipe(input)
View Python Docs
library(algorithmia)

input <- {{input | formatInput:"r"}}
client <- getAlgorithmiaClient("YOUR_API_KEY")
algo <- client$algo("codeb34v3r/LuceneTextAnalyzer/0.1.0")
result <- algo$pipe(input)$result
print(result)
View R Docs
require 'algorithmia'

input = {{input | formatInput:"ruby"}}
client = Algorithmia.client('YOUR_API_KEY')
algo = client.algo('codeb34v3r/LuceneTextAnalyzer/0.1.0')
puts algo.pipe(input).result
View Ruby Docs
use algorithmia::*;

let input = {{input | formatInput:"rust"}};
let client = Algorithmia::client("YOUR_API_KEY");
let algo = client.algo('codeb34v3r/LuceneTextAnalyzer/0.1.0');
let response = algo.pipe(input);
View Rust Docs
Discussion
  • {{comment.username}}