nlp / AnalyzeGithubReadme / 0.1.15

0. TL;DR

This is an algorithm for analyzing GitHub readmes. It also makes recommendations for improving your project readme.

1. Introduction

Creating a good readme can be challenging, especially because there is no official standard to follow. One approach is to go through the top-starred repositories on GitHub and find the features they have in common. But going through hundreds of repositories is a slow, time-consuming process.

This algorithm uses machine learning models to learn from the top 1000 starred repositories for the 10 programming languages that have the most repositories on GitHub. It gives your project readme a score from 1 to 10 and recommends changes that would make your readme more similar to the top readmes on GitHub.


Input:

  • (Required): A GitHub project URL.


Output:

  • Scoring for the given readme.
  • Recommended Changes.
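
As a rough illustration of the input described above, the following Python sketch builds a request payload from a GitHub project URL. The "repo" key is taken from the example in section 4; the function name `build_request` and the validation rules are assumptions made for illustration and are not part of the algorithm itself.

```python
from urllib.parse import urlparse


def build_request(repo_url: str) -> dict:
    """Return an input payload for a GitHub project URL.

    The "repo" key follows the example in this document. The checks
    below are illustrative assumptions, not documented behavior.
    """
    parsed = urlparse(repo_url)
    # Path segments, e.g. ["<owner>", "<project>"]
    parts = [p for p in parsed.path.split("/") if p]

    if parsed.scheme not in ("http", "https"):
        raise ValueError("expected an http(s) URL")
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError("expected a github.com URL")
    if len(parts) < 2:
        raise ValueError("expected github.com/<owner>/<project>")

    return {"repo": repo_url}
```

Checking the URL before sending it lets obvious mistakes (a missing project name, a non-GitHub host) fail locally rather than after a remote call.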

2. Scoring

The algorithm scores 5 distinct features for every given project readme file. These features are:

  1. Titles and Headers (Heading Tags)
  2. Text Content (Paragraphs)
  3. Number of Code Snippets (Pre Tags)
  4. Number of Images and Badges (Img Tags)
  5. Total length of readme

Each feature receives an integer score from 1 to 10. The scoring is done by a regression model that is pre-selected for each feature. Models are selected based on their accuracy.
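
To make the five features more concrete, here is a minimal Python sketch that pulls them out of a readme already rendered to HTML. It only illustrates the tag-to-feature mapping listed above; the class and function names are hypothetical, measuring length in characters of HTML is an assumption, and the algorithm's actual preprocessing and regression models are not shown in this document.

```python
from html.parser import HTMLParser


class ReadmeFeatureParser(HTMLParser):
    """Collect heading text, paragraph text, and pre/img counts."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.header_text = []
        self.paragraph_text = []
        self.pre_count = 0
        self.img_count = 0
        self._bucket = None  # text list we are currently filling, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._bucket = self.header_text
        elif tag == "p":
            self._bucket = self.paragraph_text
        elif tag == "pre":
            self.pre_count += 1
        elif tag == "img":
            self.img_count += 1

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag == "p":
            self._bucket = None

    def handle_data(self, data):
        text = data.strip()
        if self._bucket is not None and text:
            self._bucket.append(text)


def extract_features(readme_html: str) -> dict:
    """Return the five features, keyed like the scores in section 4."""
    parser = ReadmeFeatureParser()
    parser.feed(readme_html)
    parser.close()
    return {
        "header": " ".join(parser.header_text),
        "paragraph": " ".join(parser.paragraph_text),
        "pre": parser.pre_count,
        "img": parser.img_count,
        "length": len(readme_html),  # assumption: characters of HTML
    }
```

The two text features stay as strings and the other three become numbers, which matches the split between word-level and numeric recommendations described in the next section.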

3. Recommendations

The algorithm recommends changes to your features that would improve the overall score of your readme. For titles, headers and text-based content, it recommends words to insert or delete. For all other features, it tells you by how much to increase or decrease the numerical value. For example, if it says increase by 5 for pre, you should probably add 5 more code snippets to your readme.
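
The two kinds of recommendation can be turned into readable text with a small helper. This Python sketch assumes the output shape shown in section 4 (a mapping from feature name to a list of objects with "operation" and "value" keys); the function name `describe_recommendations` is hypothetical.

```python
def describe_recommendations(recommendation: dict) -> list:
    """Turn a recommendation mapping into readable sentences."""
    # Features whose recommendations are word insertions or deletions.
    word_features = {"header", "paragraph"}
    lines = []
    for feature, operations in recommendation.items():
        for op in operations:
            action, value = op["operation"], op["value"]
            if feature in word_features:
                # insert/delete: the value is a word (it may be a stem)
                lines.append(f"{feature}: {action} the word '{value}'")
            else:
                # increase/decrease: the value is a numeric amount
                lines.append(f"{feature}: {action} by {value}")
    return lines


if __name__ == "__main__":
    sample = {
        "pre": [{"operation": "increase", "value": 5}],
        "header": [{"operation": "delete", "value": "instal"}],
        "img": [],
    }
    for line in describe_recommendations(sample):
        print(line)
```

A feature with an empty list, such as "img" in the sample, produces no lines, which can be read as no change being recommended for that feature.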

4. Examples

Example 1.

  • Parameter 1: A GitHub readme

  {
    "repo": ""
  }

Output:

  {
    "score": {
      "pre": 8,
      "header": 7,
      "length": 5,
      "paragraph": 5,
      "img": 5
    },
    "recommendation": {
      "pre": [{"operation": "increase", "value": 8}],
      "header": [
        {"operation": "insert", "value": "featur"},
        {"operation": "insert", "value": "overview"},
        {"operation": "insert", "value": "code"},
        {"operation": "delete", "value": "instal"},
        {"operation": "insert", "value": "develop"},
        {"operation": "insert", "value": "author"},
        {"operation": "insert", "value": "support"},
        {"operation": "insert", "value": "configur"},
        {"operation": "insert", "value": "note"},
        {"operation": "insert", "value": "start"}
      ],
      "length": [{"operation": "decrease", "value": 3426}],
      "paragraph": [
        {"operation": "insert", "value": "code"},
        {"operation": "insert", "value": "queri"},
        {"operation": "insert", "value": "prefix"},
        {"operation": "insert", "value": "implement"},
        {"operation": "insert", "value": "profil"},
        {"operation": "insert", "value": "depend"},
        {"operation": "insert", "value": "send"},
        {"operation": "insert", "value": "easiest"},
        {"operation": "insert", "value": "program"},
        {"operation": "insert", "value": "merchant"}
      ],
      "img": []
    }
  }