jedberg/quickparse: A parser for haproxy logs in Python

This program is used by reddit to more easily parse
haproxy log file lines.  It was written because using
awk was starting to get tiring. :)

Simple usage is something like this:

    tail -F logfile | quickparse.py status uri

That will get you something like this:

    200 /user/jedberg/about.json 
    200 /user/reddit 
    200 /static/reddit.js?v=7dd397929b2f6c735455095e17a9eaa5 
    200 /ads/r/Enhancement/ 
    200 /comments/fjgit/.json?sort=top&limit=200 
    304 /static/button/button2.html?width=51
    304 /static/button/button2.html?width=51
    404 /user/lskdjf
    503 /user/foobar
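
To make the field selection idea concrete, here is a minimal
sketch of pulling status and uri out of one log line. This is not
the code in quickparse.py. The regular expression, the
select_fields name, and the sample line are assumptions for
illustration, based on the layout haproxy writes with
"option httplog".

```python
import re

# Assumed layout of an "option httplog" line after the syslog prefix:
#   client_ip:port [accept_date] frontend backend/server timers status bytes
#   ... {captured headers} "METHOD URI PROTOCOL"
HTTPLOG_RE = re.compile(
    r'haproxy\[\d+\]: '
    r'(?P<ip>[\d.]+):(?P<port>\d+) '
    r'\[(?P<date>[^\]]+)\] '
    r'(?P<frontend>\S+) '
    r'(?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<timers>\S+) '
    r'(?P<status>-?\d+) '
    r'(?P<bytes>\+?\d+) '
    r'.*'                                  # cookies, flags, counters, captures
    r'"(?P<method>\S+) (?P<uri>\S+)'       # start of the quoted request line
)


def select_fields(line, fields):
    """Return the requested named fields joined by spaces, or None if no match."""
    match = HTTPLOG_RE.search(line)
    if match is None:
        return None
    return " ".join(match.group(name) for name in fields)


# Hypothetical sample line; hostnames and numbers are made up.
SAMPLE = (
    'Feb  6 12:14:14 localhost haproxy[14389]: 10.0.1.2:33317 '
    '[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 200 2750 '
    '- - ---- 1/1/1/1/0 0/0 {1wt.eu} {} "GET /index.html HTTP/1.1"'
)

if __name__ == "__main__":
    print(select_fields(SAMPLE, ["status", "uri"]))
```

A line that does not fit the assumed layout yields None, so a
caller reading from tail -F can skip it and keep going.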

Also in this repository is a program called log_gen.pl,
which will generate random bogus log lines for your testing
needs.
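
If you would rather generate test lines from Python, the same
idea can be sketched roughly as below. This is not a port of
log_gen.pl. The bogus_line name, the frontend/backend names, the
timestamps, and the status and path lists are all placeholders
chosen for illustration.

```python
import random

# Placeholder values; weight 200 more heavily by repeating it.
STATUSES = [200, 200, 200, 304, 404, 503]
PATHS = ["/user/jedberg/about.json", "/user/reddit", "/ads/r/Enhancement/"]


def bogus_line(rng):
    """Build one fake line in the assumed "option httplog" layout."""
    ip = ".".join(str(rng.randint(1, 254)) for _ in range(4))
    port = rng.randint(1024, 65535)
    status = rng.choice(STATUSES)
    size = rng.randint(200, 5000)
    path = rng.choice(PATHS)
    return (
        "Feb  6 12:14:14 localhost haproxy[14389]: "
        f"{ip}:{port} [06/Feb/2009:12:14:14.655] "
        f"http-in static/srv1 10/0/30/69/109 {status} {size} "
        f'- - ---- 1/1/1/1/0 0/0 "GET {path} HTTP/1.1"'
    )


if __name__ == "__main__":
    rng = random.Random()  # pass a seed here for repeatable output
    for _ in range(5):
        print(bogus_line(rng))
```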

For this program to work, haproxy needs to log the fields
it expects.  Here are the relevant logging bits for your
haproxy configuration file:

In the frontend section:

    option httplog
    log /dev/log local4
    capture request header User-Agent len 150
    capture request header Host len 50
    capture request header Referer len 300
    capture request header X-Forwarded-For len 15
    capture request header If-Modified-Since len 50
    capture cookie session= len 25

In each backend section:

    capture cookie session= len 25

(this may now be unnecessary -- it was there because of
a bug in haproxy)

The local4 in the log option sends all the logs to
the local4 syslog facility, which helps in keeping the
logs separate.
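
With the capture request header lines from the frontend section,
haproxy logs the captured values between braces, separated by
"|", in the order the capture lines are declared.  A minimal
sketch of turning that block back into named values is below.
The split_captures name and the key names are assumptions for
illustration, not part of quickparse.py.

```python
# Key names are hypothetical; the order mirrors the capture lines
# in the frontend section.
CAPTURED_HEADERS = [
    "user_agent",
    "host",
    "referer",
    "x_forwarded_for",
    "if_modified_since",
]


def split_captures(block):
    """Map a '{a|b|c}' capture block to names in declaration order."""
    if not (block.startswith("{") and block.endswith("}")):
        raise ValueError("expected a block wrapped in braces")
    values = block[1:-1].split("|")
    return dict(zip(CAPTURED_HEADERS, values))


if __name__ == "__main__":
    # Hypothetical capture block; the last header was not sent, so it is empty.
    example = "{Mozilla/5.0 (X11)|www.reddit.com|http://www.reddit.com/|10.0.0.1|}"
    for name, value in split_captures(example).items():
        print(name, repr(value))
```

Values longer than the configured len are cut off by haproxy, and
this sketch does not handle a header value that itself contains
a "|".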

The quickparse program has a few support modules that are
reddit-specific.  I left these in as examples.

Enjoy!