nlp / ProfanityDetection / 1.0.0

Profanity Detection

This algorithm implements a profanity detector based on string comparisons. By default it checks whether any substring of the input matches an entry in a list of known obscenities and profanities, and returns a Map whose keys are the profanities found and whose values are the number of times each one appeared. Our default dictionary of profanity includes about 340 words drawn from noswearing.com on 01/28/2015.

Note: for compound profanities this may overcount. A Map is not as straightforward to use as a boolean return value, but it provides additional information that might be useful. For instance, a single use of the word "damn", or references to genitalia in a medical context, may not be considered objectionable, whereas stronger profanity or large volumes of profanity might be. For maximum strictness, accept only input for which the returned Map is empty.

Note: this is word-based only. It may miss some words and certain misspellings, as well as double entendres and other material that is offensive only in context.
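
The counting behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the algorithm's actual implementation: the two-word `DEFAULT_WORDS` set is a hypothetical stand-in for the real dictionary, the name `count_profanity` is made up for this example, and case-insensitive whole-word matching is an assumption, since the exact matching rules are not documented here.

```python
import re
from collections import Counter

# Hypothetical stand-in for the default dictionary (the real one has about 340 entries).
DEFAULT_WORDS = {"damn", "jackass"}

def count_profanity(text, words=DEFAULT_WORDS):
    """Return {word: count} for every listed word that occurs in text."""
    # Assumption: case-insensitive matching on whole words (runs of letters).
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(token for token in tokens if token in words)
    return dict(counts)

hits = count_profanity("He is acting like a damn jackass.")
is_clean = not hits  # maximum strictness: any hit at all rejects the input
```

A more lenient policy would inspect the individual counts instead of only testing whether the Map is empty.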

Note: as of version 1.0.0, Profanity Detection supports batch input and JSON formatting.

Input

{
  documents: String[],
  extraWords: String[],
  customizedExclusive: Boolean
}
{  
   "documents":[  
      "He is acting like a damn jackass, and as far as I'm concerned he can frack off.",
      "It really depends on what kind of type safety you need.",
      "I am following a tutorial on asp.net web api and mongodb here and on step 4 it talks about dependency injection and adding it to the start.cs in the ConfigureServices() method, however this doesnt seem to exist anymore. My web api templates startup.cs looks something like this..."
   ],
   "extraWords":[  
      "frack",
      "damn",
      "api"
   ],
   "customizedExclusive":false
}
  • documents - (required) - An array of strings; it can be of any length.
  • extraWords - (optional) - Any additional words you want to detect.
  • customizedExclusive - (switch) - If true, the input is checked against only the words defined by extraWords. If false, the input is checked against the default dictionary plus the words defined by extraWords. Defaults to false.
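
As a rough illustration of how the input fits together, the sketch below assembles the JSON input and spells out which word list the customizedExclusive switch selects. The helper names `build_input` and `effective_words` are hypothetical and exist only for this example; the field names are taken from the sample input above.

```python
import json

def build_input(documents, extra_words=(), customized_exclusive=False):
    """Assemble the input object using the field names from the sample above."""
    return {
        "documents": list(documents),
        "extraWords": list(extra_words),
        "customizedExclusive": bool(customized_exclusive),
    }

def effective_words(default_words, extra_words, customized_exclusive):
    """Word list the documents would be checked against for a given switch value."""
    extra = set(extra_words)
    # true: only extraWords; false: default dictionary plus extraWords
    return extra if customized_exclusive else set(default_words) | extra

payload = json.dumps(build_input(["It really depends."], ["frack", "api"]))
```

With customizedExclusive left at false, extraWords only extends the default dictionary. Setting it to true replaces the dictionary entirely.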

Output

Sentences[word_counts]
{  
   "I am following a tutorial on asp.net web api and mongodb here and on step 4 it talks about dependency injection and adding it to the start.cs in the ConfigureServices() method, however this doesnt seem to exist anymore. My web api templates startup.cs looks something like this...":{  
      "api":2
   },
   "It really depends on what kind of type safety you need.":{  

   },
   "He is acting like a damn jackass, and as far as I'm concerned he can frack off.":{  
      "damn":1,
      "jackass":1,
      "frack":1
   }
}
  • sentences - each input document, used as the key (label) for its word_counts object.
  • word_counts - a JSON object mapping each detected word to the number of times it was detected; it is empty if nothing was detected.
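
One possible way to act on this output is sketched below. The function name `flag_documents` and its `tolerated` and `max_hits` parameters are assumptions made for this example, not part of the algorithm, and the sample `result` is abbreviated from the output shown above.

```python
def flag_documents(result, tolerated=frozenset(), max_hits=0):
    """Return the documents whose count of non-tolerated hits exceeds max_hits."""
    flagged = []
    for sentence, word_counts in result.items():
        total = sum(n for word, n in word_counts.items() if word not in tolerated)
        if total > max_hits:
            flagged.append(sentence)
    return flagged

# Abbreviated sample in the same shape as the output above.
result = {
    "My web api templates startup.cs looks something like this...": {"api": 2},
    "It really depends on what kind of type safety you need.": {},
    "He is acting like a damn jackass.": {"damn": 1, "jackass": 1},
}

strict = flag_documents(result)                              # any hit flags a document
lenient = flag_documents(result, tolerated={"api", "damn"})  # ignore mild or custom words
```

The defaults correspond to maximum strictness. Passing a `tolerated` set or raising `max_hits` relaxes the policy for milder words or isolated uses.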