jpeck / scrapydemo / 0.1.3

Overview

This is a demo of using Scrapy to process several URLs in parallel. It retrieves the HTML for each URL in a list and extracts the title tag from each page, spawning multiple processes so that the URLs are retrieved and parsed concurrently.

Input

A list of URLs (including the protocol):

["http://www.example.com/","http://www.yahoo.com/","http://google.com"]

Output

The title tags of those pages:

[["<title>Example Domain</title>"],["<title>Yahoo</title>"],["<title>Google</title>"]]
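
The input and output shapes above can be illustrated with a minimal sketch. It is not this project's code: it swaps Scrapy for the Python standard library (`multiprocessing`, `urllib`, and a regular expression) purely to show the fan-out pattern, and the names `extract_titles`, `fetch_titles`, and `titles_for` are hypothetical.

```python
import re
from multiprocessing import Pool
from urllib.request import urlopen

# Assumption: a regex is good enough for this illustration; the real demo
# uses Scrapy, whose selectors handle malformed HTML far more robustly.
TITLE_RE = re.compile(r"<title\b[^>]*>.*?</title>", re.IGNORECASE | re.DOTALL)


def extract_titles(html):
    """Return every <title>...</title> element found in an HTML string."""
    return TITLE_RE.findall(html)


def fetch_titles(url):
    """Download one URL and return its title tags (runs in a worker process)."""
    with urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_titles(html)


def titles_for(urls, processes=4):
    """Fan the URLs out over a process pool; results keep the input order."""
    with Pool(processes=processes) as pool:
        return pool.map(fetch_titles, urls)
```

Calling `titles_for` with the input list shown earlier (from under an `if __name__ == "__main__":` guard, which `multiprocessing` needs on some platforms) returns one inner list per URL, in the same order as the input. That ordering is why the output is a list of lists rather than a flat list.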