High-performance web crawler
- asynchronous
- search REST API
- throttling (by domain)
- distributed
- auth
- Start docker-compose
docker-compose up -d
- Signup
curl -X POST -d "email=oleg@mail.net&name=Oleg&password=qwerty123" http://localhost:8080/api/v1/signup
- Login
curl -X POST -d "email=oleg@mail.net&password=qwerty123" http://localhost:8080/api/v1/login
- Add crawl task
curl -X POST -H "X-Token: <Token ID>" -d "domain=docs.python.org&https=1" http://localhost:8080/api/v1/index
- Get result
curl "http://localhost:8080/api/v1/search?q=asyncio&limit=20&offset=5"
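The same workflow can be scripted. Below is a minimal Python sketch, not part of this project, that builds the requests from the curl commands above using only the standard library. The helper names (`form_request`, `search_request`, `send`) are hypothetical, and since the response format is not documented here, responses are returned as raw text rather than parsed.

```python
import urllib.parse
import urllib.request

# Host, port and path prefix are taken from the curl examples.
BASE_URL = "http://localhost:8080/api/v1"


def form_request(path, fields, token=None):
    """Build a form-encoded POST request, like `curl -X POST -d ...`."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    request = urllib.request.Request(BASE_URL + path, data=body, method="POST")
    if token is not None:
        # Assumption: the token ID obtained after login goes into X-Token,
        # as in the "Add crawl task" command.
        request.add_header("X-Token", token)
    return request


def search_request(query, limit=20, offset=0):
    """Build the GET request for /search with an encoded query string."""
    params = urllib.parse.urlencode({"q": query, "limit": limit, "offset": offset})
    return urllib.request.Request(f"{BASE_URL}/search?{params}", method="GET")


def send(request, timeout=10.0):
    """Send a request and return the response body as text (no parsing)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the services running, `send(form_request("/login", {"email": "oleg@mail.net", "password": "qwerty123"}))` should return the login response; the token ID from it can then be passed as `token=` to `form_request("/index", {"domain": "docs.python.org", "https": 1}, token=...)`, and `send(search_request("asyncio", limit=20, offset=5))` fetches results.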
                     +-------------+
+---------------+    |             |     +-----------------+
|               |<---|  Rest API   |<--->|                 |
|               |    |             |     |                 |
|               |    +-------------+     |                 |
|               |    +-------------+     |                 |
|               |    |             |     |                 |
|               |--->|   Crawler   |---->|  ElasticSearch  |
|               |    |             |     | (crawled pages) |
|   RabbitMQ    |    +-------------+     |                 |
|               |    +-------------+     |                 |
|               |    |             |     |                 |
|               |--->|   Crawler   |---->|                 |
|               |    |             |     +-----------------+
|               |    +-------------+     +---------------+
|               |    +-------------+     |               |
|               |    |             |     |  PostgreSQL   |
|               |<-->|    Auth     |<--->|  (user data)  |
+---------------+    |             |     |               |
                     +-------------+     +---------------+
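
Throttling by domain, listed among the features, could look roughly like the sketch below inside a Crawler worker. This is an illustration under assumptions, not the project's actual implementation: Python with asyncio is assumed, and the `DomainThrottle` class, its `min_interval` parameter, and the `demo` coroutine are hypothetical names.

```python
import asyncio
import time
from urllib.parse import urlsplit


class DomainThrottle:
    """Enforce a minimum delay between requests to the same domain."""

    def __init__(self, min_interval):
        self.min_interval = min_interval  # seconds between hits on one domain
        self._next_slot = {}              # domain -> earliest time of next request

    async def wait(self, url):
        domain = urlsplit(url).netloc
        now = time.monotonic()
        # Reserve the next free slot before sleeping, so concurrent tasks
        # for the same domain line up behind each other.
        slot = max(now, self._next_slot.get(domain, now))
        self._next_slot[domain] = slot + self.min_interval
        await asyncio.sleep(slot - now)
        return domain


async def demo():
    throttle = DomainThrottle(min_interval=0.1)
    started = time.monotonic()

    async def fetch(url):
        await throttle.wait(url)
        # A real crawler would download and index the page here.
        return url, time.monotonic() - started

    urls = [
        "https://docs.python.org/3/",
        "https://docs.python.org/3/library/asyncio.html",
        "https://example.org/",
    ]
    return await asyncio.gather(*(fetch(u) for u in urls))
```

Running `asyncio.run(demo())` returns each URL with the time at which it was allowed to proceed: the second `docs.python.org` URL waits for the interval, while `example.org` is not delayed. This throttle only covers a single process; coordinating limits across several distributed Crawler workers would need shared state, which this sketch does not cover.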