Crawl, Scrape, and Analyze Websites

Available on GitHub.

The Analyze URL microservice from Algorithmia is a useful tool for developers who need a straightforward way to scrape a single web page and consistently extract structured data from any URL.

Analyze URL works as a simple API endpoint that’s always on and available.
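As a rough illustration of calling the endpoint from Python, the sketch below wraps the call in a small helper. It assumes the algorithm is published under the path `web/AnalyzeURL` (an assumption; check the algorithm's page for the exact path and version) and relies on the Algorithmia client's `client.algo(...).pipe(...).result` call pattern. The helper name `analyze_url` and the `YOUR_API_KEY` placeholder are illustrative only.

```python
# Assumed algorithm path; verify it on the algorithm's page before use.
ANALYZE_URL_ALGO = "web/AnalyzeURL"


def analyze_url(client, url):
    """Send one URL to the Analyze URL endpoint and return its result.

    `client` is any object exposing the Algorithmia client interface,
    i.e. `client.algo(path).pipe(input).result`.
    """
    return client.algo(ANALYZE_URL_ALGO).pipe(url).result


# Typical use (requires network access and a real API key):
#   import Algorithmia
#   client = Algorithmia.client("YOUR_API_KEY")
#   print(analyze_url(client, "https://example.com"))
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before spending API credits.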

This recipe is a simple Python script that crawls a domain, scrapes the metadata of each page, and outputs the results as JSON.
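The crawl, scrape, and JSON steps might be sketched as follows. This is not the recipe's actual script: it assumes (going by the script's name, `sitemap2analyzeUrl.py`) that the crawl starts from the domain's XML sitemap, and the function names `urls_from_sitemap` and `scrape_to_json` are hypothetical. The `analyze` argument is a stand-in for whatever function sends a URL to Analyze URL.

```python
import json
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org XML schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml):
    """Return every <loc> URL listed in a sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def scrape_to_json(sitemap_xml, analyze):
    """Run `analyze` on each sitemap URL and return the results as JSON.

    `analyze` is a caller-supplied function (hypothetical here) that
    takes a URL and returns a JSON-serializable dict of metadata.
    """
    results = [analyze(url) for url in urls_from_sitemap(sitemap_xml)]
    return json.dumps(results, indent=2)
```

Keeping the sitemap parsing separate from the scraping call means the URL list can be inspected, filtered, or capped before any requests are made.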

For the full blog post related to this recipe, see "Web Scraping with Python: How To Crawl, Scrape, and Analyze URLs".

Getting Started

Create a free Algorithmia account, and install the Algorithmia Python client:

  
pip install algorithmia 
 

Detailed instructions can be found in the web scraping with Python blog post.

How To Crawl, Scrape, and Analyze URLs

From the command line, navigate to the folder containing your Python file and run:

 
python sitemap2analyzeUrl.py
 

Built With