HTML Data Extractor

Turn HTML into Structured JSON, with XPath Support

License: Algorithmia Platform License (APL)
· Internet Access

This algorithm has Internet access.

This is necessary for algorithms that rely on external services. However, it also means that this algorithm can send your input data outside of the Algorithmia platform.
· Calls Other Algorithms

This algorithm has permission to call other algorithms.

This allows an algorithm to compose sophisticated functionality using other algorithms as building blocks. However, it can also incur additional royalty and usage costs from any algorithm that it calls.

Example Output

[
  {
    "p": {
      "attributes": {
        "class": "small text-darker"
      },
      "children": [
        "Leverage a library of more than 3,500 microservices via an API."
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "small text-darker"
      },
      "children": [
        "Try any microservice in 5-lines of code or less."
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "small text-darker"
      },
      "children": [
        "Secure and scalable infrastructure is ready when you are."
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "fine-print"
      },
      "children": [
        "By signing up, you agree to our",
        {
          "a": {
            "attributes": {
              "href": "/terms"
            },
            "children": [
              "terms and conditions"
            ]
          }
        },
        {
          "a": {
            "attributes": {
              "href": "/privacy"
            },
            "children": [
              "privacy policy"
            ]
          },
          "br": {
            "attributes": {}
          }
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {},
      "children": [
        {
          "strong": "Intelligent APIs meet intelligent apps."
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "service__algo",
        "ng-class": "{active: algo==='dlib/FaceDetection'}",
        "ng-click": "algo='dlib/FaceDetection'"
      },
      "children": [
        "Face Detection"
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "service__algo",
        "ng-class": "{active: algo==='nlp/SocialSentimentAnalysis'}",
        "ng-click": "algo='nlp/SocialSentimentAnalysis'"
      },
      "children": [
        "Social Sentiment Analysis"
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "service__algo",
        "ng-class": "{active: algo==='sfw/NudityDetectioni2v'}",
        "ng-click": "algo='sfw/NudityDetectioni2v'"
      },
      "children": [
        "Nudity Detection i2v"
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "service__algo",
        "ng-class": "{active: algo==='nlp/Word2Vec'}",
        "ng-click": "algo='nlp/Word2Vec'"
      },
      "children": [
        "Word 2 Vec"
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "service__algo",
        "ng-class": "{active: algo==='opencv/SmartThumbnail'}",
        "ng-click": "algo='opencv/SmartThumbnail'"
      },
      "children": [
        "SmartThumbnail"
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "service__comment"
      },
      "children": [
        {
          "a": {
            "attributes": {
              "href": "/algorithms"
            },
            "children": [
              "and thousands more"
            ]
          }
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "feature--title"
      },
      "children": [
        "FIND"
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "m-B-l"
      },
      "children": [
        {
          "b": "Expand your toolbelt"
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "feature--title"
      },
      "children": [
        "TEST"
      ]
    }
  },
  {
    "p": {
      "attributes": {},
      "children": [
        {
          "b": "Language-agnostic to fit your workflow."
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "feature--title"
      },
      "children": [
        "DEPLOY"
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "small m-B-l"
      },
      "children": [
        "Friction-free, secure, auto-scaling cloud infrastructure is there when you need it."
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "small m-B-l"
      },
      "children": [
        "Algorithms, functions, and models as a service with simple, pay-per-execution pricing."
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "small m-B-xxs"
      },
      "children": [
        "Deploy and scale your machine learning, deep learning, and data science models using CPUs or GPUs in the cloud with support for the top frameworks, including TensorFlow and R."
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "small m-B-xxs"
      },
      "children": [
        "Built-in support for 14 languages and clients makes integration quick and simple. Stick to the programming language you use, whether it’s Java, Python, Node, Ruby, Rust, or Scala."
      ]
    }
  },
  {
    "p": {
      "attributes": {},
      "children": [
        {
          "b": "Whether you’re open sourcing your code or deploying it for private use"
        },
        {
          "code": "git clone"
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "footer-terms"
      },
      "children": [
        {
          "a": {
            "attributes": {
              "href": "/privacy"
            },
            "children": [
              "Privacy Policy"
            ]
          }
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "footer-terms"
      },
      "children": [
        {
          "a": {
            "attributes": {
              "href": "/terms"
            },
            "children": [
              "Terms & Conditions"
            ]
          }
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "footer-info"
      },
      "children": [
        "COPYRIGHT",
        {
          "i": {
            "attributes": {
              "aria-hidden": "true",
              "class": "fa fa-copyright"
            }
          }
        }
      ]
    }
  },
  {
    "p": {
      "attributes": {
        "class": "footer-info"
      },
      "children": [
        "ALGORITHMIA"
      ]
    }
  }
]
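The output above is a list of nodes. Judging only from this sample, each node maps a tag name either to an object with "attributes" and optional "children", or directly to a text string (as with "strong" and "b"). The following is a minimal Python sketch for pulling the plain text out of such a structure. The function name `extract_text` is hypothetical, and the node layout is inferred from the sample rather than from a documented schema.

```python
def extract_text(node):
    """Recursively collect text strings from a node, a list of nodes, or a string.

    Assumes the layout seen in the sample output: a node is a dict of
    tag -> string, or tag -> {"attributes": {...}, "children": [...]}.
    """
    if isinstance(node, str):
        return [node]
    if isinstance(node, list):
        parts = []
        for item in node:
            parts.extend(extract_text(item))
        return parts
    if isinstance(node, dict):
        parts = []
        for tag, body in node.items():
            if isinstance(body, str):
                # Shorthand form, e.g. {"strong": "some text"}
                parts.append(body)
            elif isinstance(body, dict):
                # Full form; elements such as "br" have no "children" key
                parts.extend(extract_text(body.get("children", [])))
        return parts
    return []


# Two nodes copied from the sample output above.
sample = [
    {
        "p": {
            "attributes": {"class": "fine-print"},
            "children": [
                "By signing up, you agree to our",
                {"a": {"attributes": {"href": "/terms"},
                       "children": ["terms and conditions"]}},
                {"a": {"attributes": {"href": "/privacy"},
                       "children": ["privacy policy"]},
                 "br": {"attributes": {}}},
            ],
        }
    },
    {"p": {"attributes": {}, "children": [
        {"strong": "Intelligent APIs meet intelligent apps."}]}},
]

texts = extract_text(sample)
print(texts)
```

A single object can carry more than one tag key (the "a" and "br" pair in the sample), so the sketch iterates over every key instead of assuming one tag per node.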

Use

curl -X POST -d '{
  "URL":"http://algorithmia.com",
  "XPATH":"//p",
  "FORMAT":"cobra"
}' -H 'Content-Type: application/json' -H 'Authorization: Simple YOUR_API_KEY' https://api.algorithmia.com/v1/algo/web/HTMLDataExtractor/0.1.0
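
The same request can be assembled from Python with only the standard library. This is a sketch, not an official client: `build_request` and `API_URL` are names made up for illustration, and the input fields (URL, XPATH, FORMAT), the "Simple" authorization scheme, and the endpoint are copied from the curl command above. Nothing is assumed about options beyond those shown.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
API_URL = "https://api.algorithmia.com/v1/algo/web/HTMLDataExtractor/0.1.0"


def build_request(api_key, page_url, xpath, fmt="cobra"):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {"URL": page_url, "XPATH": xpath, "FORMAT": fmt}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Simple " + api_key,
        },
        method="POST",
    )


request = build_request("YOUR_API_KEY", "http://algorithmia.com", "//p")

# Sending the request needs a valid API key and network access:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
```

Building the request separately from sending it makes it possible to inspect the payload and headers before any data leaves your machine.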