HTML Data Extractor

Turn HTML into Structured JSON, with XPath Support

Algorithmia Platform License · Internet Access · Calls Other Algorithms

Sample output (XPath //p against http://algorithmia.com):

[
  {
    "p": {
      "attributes": {
        "class": "small text-darker"
      },
      "children": [
        "Leverage a library of more than 3,500 microservices via an API."
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "small text-darker"
      },
      "children": [
        "Try any microservice in 5-lines of code or less."
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "small text-darker"
      },
      "children": [
        "Secure and scalable infrastructure is ready when you are."
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "fine-print"
      },
      "children": [
        "By signing up, you agree to our",
        {
          "a": {
            "attributes": {
              "href": "/terms"
            },
            "children": [
              "terms and conditions"
            ]
          }
        },
        {
          "a": {
            "attributes": {
              "href": "/privacy"
            },
            "children": [
              "privacy policy"
            ]
          },
          "br": {
            "attributes": {}
          }
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {},
      "children": [
        {
          "strong": "Intelligent APIs meet intelligent apps."
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "service__algo",
        "ng-class": "{active: algo==='dlib/FaceDetection'}",
        "ng-click": "algo='dlib/FaceDetection'"
      },
      "children": [
        "Face Detection"
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "service__algo",
        "ng-class": "{active: algo==='nlp/SocialSentimentAnalysis'}",
        "ng-click": "algo='nlp/SocialSentimentAnalysis'"
      },
      "children": [
        "Social Sentiment Analysis"
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "service__algo",
        "ng-class": "{active: algo==='sfw/NudityDetectioni2v'}",
        "ng-click": "algo='sfw/NudityDetectioni2v'"
      },
      "children": [
        "Nudity Detection i2v"
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "service__algo",
        "ng-class": "{active: algo==='nlp/Word2Vec'}",
        "ng-click": "algo='nlp/Word2Vec'"
      },
      "children": [
        "Word 2 Vec"
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "service__algo",
        "ng-class": "{active: algo==='opencv/SmartThumbnail'}",
        "ng-click": "algo='opencv/SmartThumbnail'"
      },
      "children": [
        "SmartThumbnail"
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "service__comment"
      },
      "children": [
        {
          "a": {
            "attributes": {
              "href": "/algorithms"
            },
            "children": [
              "and thousands more"
            ]
          }
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "feature--title"
      },
      "children": [
        "FIND"
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "m-B-l"
      },
      "children": [
        {
          "b": "Expand your toolbelt"
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "feature--title"
      },
      "children": [
        "TEST"
      ]
    }
  },
  {
    "p": {
      "attributes": {},
      "children": [
        {
          "b": "Language-agnostic to fit your workflow."
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "feature--title"
      },
      "children": [
        "DEPLOY"
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "small m-B-l"
      },
      "children": [
        "Friction-free, secure, auto-scaling cloud infrastructure is there when you need it."
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "small m-B-l"
      },
      "children": [
        "Algorithms, functions, and models as a service with simple, pay-per-execution pricing."
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "small m-B-xxs"
      },
      "children": [
        "Deploy and scale your machine learning, deep learning, and data science models using CPUs or GPUs in the cloud with support for the top frameworks, including TensorFlow and R."
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "small m-B-xxs"
      },
      "children": [
        "Built-in support for 14 languages and clients makes integration quick and simple. Stick to the programming language you use, whether it’s Java, Python, Node, Ruby, Rust, or Scala."
      ]
    }
  },
  {
    "p": {
      "attributes": {},
      "children": [
        {
          "b": "Whether you’re open sourcing your code or deploying it for private use"
        },
        {
          "code": "git clone"
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "footer-terms"
      },
      "children": [
        {
          "a": {
            "attributes": {
              "href": "/privacy"
            },
            "children": [
              "Privacy Policy"
            ]
          }
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "footer-terms"
      },
      "children": [
        {
          "a": {
            "attributes": {
              "href": "/terms"
            },
            "children": [
              "Terms & Conditions"
            ]
          }
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "footer-info"
      },
      "children": [
        "COPYRIGHT",
        {
          "i": {
            "attributes": {
              "aria-hidden": "true",
              "class": "fa fa-copyright"
            }
          }
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "footer-info"
      },
      "children": [
        "ALGORITHMIA"
      ]
    }
  }
]
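
The output above nests each matched element as a one-key object (tag name mapped to "attributes" and "children"), with a few shorthand cases such as {"strong": "..."} and elements with no children at all. As a minimal sketch of how a client might flatten that shape into plain text, the helper below walks the structure recursively. The function name collect_text and the small sample are illustrative assumptions, not part of the algorithm's API, and the shape handling is inferred only from the sample shown here.

```python
def collect_text(node):
    """Recursively gather text strings from a node shaped like the sample output.

    Assumed shapes (inferred from the sample, not from a published schema):
      - a plain string (text content)
      - a list of child nodes
      - a dict mapping tag names to either a string or a dict that may
        contain "attributes" and "children"
    """
    if isinstance(node, str):
        return [node]
    if isinstance(node, list):
        texts = []
        for item in node:
            texts.extend(collect_text(item))
        return texts
    if isinstance(node, dict):
        texts = []
        for _tag, value in node.items():
            if isinstance(value, dict):
                # Element form: only "children" carries text; it may be absent (e.g. <br>).
                texts.extend(collect_text(value.get("children", [])))
            else:
                # Shorthand form such as {"strong": "some text"}.
                texts.extend(collect_text(value))
        return texts
    return []


# Two entries copied from the sample output above.
sample = [
    {"p": {"attributes": {"class": "fine-print"},
           "children": [
               "By signing up, you agree to our",
               {"a": {"attributes": {"href": "/terms"},
                      "children": ["terms and conditions"]}},
               {"a": {"attributes": {"href": "/privacy"},
                      "children": ["privacy policy"]},
                "br": {"attributes": {}}},
           ]}},
    {"p": {"attributes": {},
           "children": [{"strong": "Intelligent APIs meet intelligent apps."}]}},
]

paragraphs = [" ".join(collect_text(entry)) for entry in sample]
for line in paragraphs:
    print(line)
# By signing up, you agree to our terms and conditions privacy policy
# Intelligent APIs meet intelligent apps.
```

Joining with a single space is a simplification; it does not try to reproduce the original whitespace or punctuation between inline elements.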

Install & Use

curl -X POST -d '{
  "URL":"http://algorithmia.com",
  "XPATH":"//p",
  "FORMAT":"cobra"
}' -H 'Content-Type: application/json' -H 'Authorization: Simple YOUR_API_KEY' https://api.algorithmia.com/v1/algo/web/HTMLDataExtractor/0.1.0
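
For callers who prefer a script to curl, the same request can be sketched in Python using only the standard library. The endpoint, headers, and the URL, XPATH, and FORMAT fields are taken from the curl command above; the function names build_request and extract, the 30-second timeout, and the assumption that the response body is JSON are illustrative choices rather than documented behavior.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
API_URL = "https://api.algorithmia.com/v1/algo/web/HTMLDataExtractor/0.1.0"


def build_request(api_key, url, xpath, fmt="cobra"):
    """Build the POST request that mirrors the curl command (hypothetical helper)."""
    payload = {"URL": url, "XPATH": xpath, "FORMAT": fmt}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Simple " + api_key,
        },
        method="POST",
    )


def extract(api_key, url, xpath, fmt="cobra"):
    """Send the request and return the parsed JSON body.

    Assumption: the service answers with JSON; the exact response envelope
    is not shown in this document, so the body is returned unmodified.
    """
    request = build_request(api_key, url, xpath, fmt)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires a valid key and network access):
# result = extract("YOUR_API_KEY", "http://algorithmia.com", "//p")
```

Splitting request construction from sending keeps the payload and headers easy to inspect without making a network call.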