wormtooth/miner

Miner

Miner is a very simple web crawler/scraper written in Python, inspired by Sukhoi. Its only dependencies are requests and lxml, and it works under both Python 2 and Python 3.

Usage

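The simplest way to use Miner is to subclass it and override Miner.parse. In the example below, parse receives the fetched page as a DOM object that supports XPath queries, and each extracted record is stored with self.append.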

from miner import Miner

class QuoteMiner(Miner):
    def parse(self, dom):
        # Text of every quote on the page, in document order
        texts = dom.xpath('//div[@class="quote"]//span[@class="text"]/text()')
        # Author name for every quote, in the same order
        authors = dom.xpath('//div[@class="quote"]//small/text()')
        for text, author in zip(texts, authors):
            self.append({
                'author': author,
                'text': text[1:-1],  # drop the surrounding quotation marks
            })

url = 'http://quotes.toscrape.com/'
quotes = QuoteMiner(url)
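
The extraction logic inside parse can be tried offline, without Miner or a network connection. The following is a minimal sketch rather than Miner's own code. It assumes Python 3 and swaps lxml for the standard library's xml.etree.ElementTree so that it has no dependencies (ElementTree only accepts well-formed markup and supports a subset of XPath). SAMPLE is a hand-written snippet that mimics the structure the XPath expressions above expect, and extract_quotes is a hypothetical helper name. The sketch walks the page quote by quote instead of zipping two separate lists, so a quote that lacks an author cannot shift the pairing.

```python
import xml.etree.ElementTree as ET

# Hand-written, well-formed sample (an assumption for illustration);
# it mimics the expected structure but is not a copy of the real page.
SAMPLE = """
<body>
  <div class="quote">
    <span class="text">“First sample quote.”</span>
    <span>by <small class="author">Author One</small></span>
  </div>
  <div class="quote">
    <span class="text">“Second sample quote.”</span>
    <span>by <small class="author">Author Two</small></span>
  </div>
</body>
"""

def extract_quotes(markup):
    """Return a list of {'author', 'text'} dicts, one per quote block."""
    root = ET.fromstring(markup.strip())
    records = []
    # Walk quote by quote so each text stays paired with its own author.
    for quote in root.findall(".//div[@class='quote']"):
        text = quote.find(".//span[@class='text']")
        author = quote.find(".//small")
        if text is None or author is None:
            continue  # skip incomplete blocks instead of misaligning pairs
        records.append({
            'author': author.text,
            'text': text.text[1:-1],  # drop the surrounding quotation marks
        })
    return records

records = extract_quotes(SAMPLE)
for record in records:
    print(record['author'], '-', record['text'])
```

The same per-quote loop should carry over to a parse override by running the relative queries against each quote element, at the cost of a few more XPath calls per page.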

Examples

quote.py: Another QuoteMiner that also fetches information about each author.

quote_thread.py: Uses threads to speed up crawling.

quote_soup.py: Uses BeautifulSoup instead of lxml for parsing.

quote_pony.py: Saves the results to a database using Pony ORM.
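
To show why threads help with crawling, here is a minimal sketch of fetching several pages concurrently. It is not the code in quote_thread.py. It assumes Python 3 and uses concurrent.futures from the standard library. fetch_page is a hypothetical stand-in that sleeps instead of making a request, and the paginated URL pattern is an assumption made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_page(url):
    """Hypothetical stand-in for a network request; sleeps instead of fetching."""
    time.sleep(0.1)  # simulate network latency
    return {'url': url, 'length': len(url)}

# Assumed URL pattern, for illustration only.
urls = ['http://quotes.toscrape.com/page/%d/' % n for n in range(1, 6)]

start = time.time()
with ThreadPoolExecutor(max_workers=5) as pool:
    # map() returns results in input order even though the calls overlap.
    results = list(pool.map(fetch_page, urls))
elapsed = time.time() - start

print('fetched %d pages in %.2fs' % (len(results), elapsed))
```

Because the time is spent waiting rather than computing, the simulated requests overlap and the batch should finish in roughly the time of one request instead of five. Real crawls are limited by the server and by politeness rules, so a small worker count is usually the safer choice.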
