PetiteProgrammer / ProgrammingLanguageIdentification / 0.1.3

This algorithm detects the programming language of source code with high accuracy (about 99.4% top-1 accuracy on a GitHub dataset).

It currently supports these languages:

C, C#, C++, CSS, Haskell, HTML, Java, JavaScript, Lua, Objective-C, Perl, PHP, Python, R, Scala, SQL, Swift, VB.

Also see my article on the machine learning techniques used.


Input: the text of a document containing source code.


Output: a list of pairs, each of the form [language name, probability].

For example:

[["javascript", 0.9935536807317678], ["vb", 0.001937278879510437], ["c", 0.0017313291225903907] ...]
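
A result in this shape can be post-processed with a few lines of Python. The sketch below is only an illustration: the helper name `top_languages` and its `k` and `min_probability` parameters are hypothetical and not part of the algorithm, and the sample data is just the three pairs shown above, with the elided entries left out. It does not call the algorithm itself and assumes the result has already been decoded into a list of pairs.

```python
from typing import List, Sequence, Tuple


def top_languages(
    result: Sequence[Sequence],
    k: int = 1,
    min_probability: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k (language, probability) pairs, most probable first.

    Hypothetical helper: `result` is assumed to be a list of
    [language name, probability] pairs as described above.
    """
    pairs = [(str(name), float(prob)) for name, prob in result]
    # Sort explicitly rather than relying on the order of the response.
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    # Drop candidates below the threshold, then keep the k best.
    return [pair for pair in pairs if pair[1] >= min_probability][:k]


# The three pairs from the example output (remaining entries omitted).
sample = [
    ["javascript", 0.9935536807317678],
    ["vb", 0.001937278879510437],
    ["c", 0.0017313291225903907],
]

best_name, best_probability = top_languages(sample)[0]
print(best_name, round(best_probability, 4))
```

Setting `min_probability` is one way to treat low-confidence results as "unknown" instead of accepting whichever language happens to rank first.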