
2024-03-01

Webolog: Common/Combined Log Format analyzer [Java, 2001]

In 2001, I had access to the server logs of a Tonigy website. I tried several tools to analyze them (at least Moglan, Webalizer, Analog, ALA, and the hypermart.net service), but I didn't like the results. So I decided to write my own web log analyzer.

Although I wrote Tonigy in the C language, I decided to use the Java language. Java had a much more advanced standard library than C. In C, even simple string manipulation was too verbose.

I was also writing in Perl at the time, but I was never a fan of that language. I didn't see Perl as a language for anything more complex than a few CGI files.

In 2001 I created an analyzer called Webolog. I never published it, but I used it a lot in my work. It was a command-line tool for Java 1.1+. It took logs in Common or Combined Log Format, extracted and aggregated the data, and generated a set of HTML files with statistics.
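To make the input format concrete, here is a minimal sketch of parsing one log line. It is not the original Webolog code: it uses `java.util.regex`, which only appeared in Java 1.4, and the class name `LogLineParser` and the field order of the returned array are my own assumptions. It also assumes the quoted fields contain no escaped quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // host ident user [time] "request" status bytes, optionally followed by
    // "referer" "agent". The optional tail is what distinguishes Combined
    // Log Format from Common Log Format.
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)"
        + "(?: \"([^\"]*)\" \"([^\"]*)\")?\\s*$");

    /**
     * Returns nine fields: host, ident, user, time, request, status, bytes,
     * referer, agent. The last two are null for Common Log Format lines.
     * Returns null if the line does not match either format.
     */
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[9];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line, not taken from real logs.
        String[] f = parse("192.0.2.1 - - [10/May/2001:13:55:36 +0200] "
            + "\"GET /index.html HTTP/1.0\" 200 2326 "
            + "\"http://www.google.com/search?q=tonigy\" "
            + "\"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)\"");
        System.out.println(f[0] + " requested: " + f[4]);
    }
}
```

Making the referer and agent group optional lets one pattern accept both formats, so the caller only has to check for null in the last two fields.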

I do not have logs that old, but I found screenshots from May 2001. Funny, they show MS IE as the browser.

Total visits:




"Referers":




Search queries (from "referers"):




Query extraction was configurable: each line maps a search site's host name to the URL parameter that carries the query. The search sites at that time:

===
web.altavista.com q
www.altavista.com q
www.lycos.com query
www.google.com q
www.google.fr q
www.google.de q
www.search.com qt
google.yahoo.com p
search.163.com key
en.os2.org query
===
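A rough sketch of how such a host/parameter table could be applied to a "referer" URL. Again, this is modern Java rather than the original source; the class `SearchQueryExtractor` and its methods are hypothetical names, and decoding as UTF-8 is an assumption (search sites in 2001 used a variety of encodings).

```java
import java.net.URI;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

public class SearchQueryExtractor {
    // Host name (lower case) -> name of the query-string parameter.
    private final Map<String, String> paramByHost = new HashMap<>();

    /** Registers one config line, e.g. addRule("www.google.com", "q"). */
    public void addRule(String host, String param) {
        paramByHost.put(host.toLowerCase(), param);
    }

    /** Returns the decoded search query, or null if the referer is not a known search URL. */
    public String extract(String referer) {
        try {
            URI uri = new URI(referer);
            String host = uri.getHost();
            String rawQuery = uri.getRawQuery();
            if (host == null || rawQuery == null) {
                return null;
            }
            String param = paramByHost.get(host.toLowerCase());
            if (param == null) {
                return null; // not a configured search site
            }
            for (String pair : rawQuery.split("&")) {
                int eq = pair.indexOf('=');
                if (eq > 0 && pair.substring(0, eq).equals(param)) {
                    // URLDecoder also turns '+' into a space.
                    return URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
                }
            }
            return null;
        } catch (Exception e) {
            // Malformed referer or broken percent-encoding: treat as "no query".
            return null;
        }
    }
}
```

With the rule `www.altavista.com q` registered, a referer such as `http://www.altavista.com/cgi-bin/query?pg=q&q=tonigy` yields `tonigy`: the parameter name is compared exactly, so `pg=q` is skipped.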


Browsers used:




And OS used:




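User-agent data extraction was also configurable, using regular expressions. Each rule consists of a pattern, a browser family, a browser name with version (where $1 is the captured group), and an OS: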

===
; Opera
"Opera/(\S+) \(.*Windows 9(?:5|8).*\)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Windows 95/98"
"Opera/(\S+) \(.*Windows NT 4\.0;.*\)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Windows NT"
"Opera/(\S+) \(.*Windows NT 5\.\d;.*\)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Windows 2000"
"Opera/(\S+) \(.*Linux.*\)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Linux"
"Opera/(\S+) \(.*OS/2.*\)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "OS/2"
"Mozilla/\S+ \(.*Windows 9(?:5|8).*\) Opera (\S+)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Windows 95/98"
"Mozilla/\S+ \(.*Windows 2000.*\) Opera (\S+)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Windows 2000"
"Mozilla/\S+ \(.*Linux.*\) Opera (\S+)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Linux"
"Mozilla/\S+ \(.*Windows 3\.10.*\) Opera (\S+)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Windows 3.1"

; Internet Explorer
"Mozilla/\S+ \(.*MSIE (\S+);.*Windows 3\.1\)" "Internet Explorer" "Internet Explorer $1" "Windows 3.1"
"Mozilla/\S+ \(.*MSIE (\S+);.*Windows 9(?:5|8).*\)" "Internet Explorer" "Internet Explorer $1" "Windows 95/98"
"Mozilla/\S+ \(.*MSIE (\S+);.*Windows NT 5\.\d.*\)" "Internet Explorer" "Internet Explorer $1" "Windows 2000"
"Mozilla/\S+ \(.*MSIE (\S+);.*Windows NT(?:| 4.0).*\)" "Internet Explorer" "Internet Explorer $1" "Windows NT"
"Mozilla/\S+ \(.*MSIE (\S+);.*Mac_PowerPC.*\)" "Internet Explorer" "Internet Explorer $1" "Mac/PowerPC"

; Netscape Navigator
"Mozilla \(OS/2; (?:I|U); OS/2 Warp\)" "Netscape Navigator" "Netscape Navigator" "OS/2"
"Mozilla/(\S+) \(.*OS/2.*\)" "Netscape Navigator" "Netscape Navigator $1" "OS/2"
"Mozilla/(\S+) \(.*Linux.*\)" "Netscape Navigator" "Netscape Navigator $1" "Linux"
"Mozilla/(\S+) \(.*Macintosh.*68K\)" "Netscape Navigator" "Netscape Navigator $1" "Mac/68K"
"Mozilla/(\S+) \(.*Macintosh.*PPC\)" "Netscape Navigator" "Netscape Navigator $1" "Mac/PowerPC"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*OS/2.*\)" "Netscape Navigator" "Netscape Navigator $1" "OS/2"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*Win9(?:5|8).*\)" "Netscape Navigator" "Netscape Navigator $1" "Windows 95/98"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*WinNT.*\)" "Netscape Navigator" "Netscape Navigator $1" "Windows NT"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*Windows NT 5\.\d.*\)" "Netscape Navigator" "Netscape Navigator $1" "Windows 2000"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*Linux.*\)" "Netscape Navigator" "Netscape Navigator $1" "Linux"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*SunOS.*\)" "Netscape Navigator" "Netscape Navigator $1" "SunOS"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*AIX.*\)" "Netscape Navigator" "Netscape Navigator $1" "AIX"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*IRIX.*\)" "Netscape Navigator" "Netscape Navigator $1" "IRIX"

...
===
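Here is a small sketch of how rules like these could be evaluated. It assumes that rules are tried in file order, that the first match wins, and that the pattern may match anywhere in the agent string; none of this is confirmed by the original source. `AgentClassifier` is a hypothetical name, and `java.util.regex` stands in for whatever regex library the Java 1.1 version relied on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AgentClassifier {
    /** One config line: pattern, browser family, version template, OS. */
    private static final class Rule {
        final Pattern pattern;
        final String family;
        final String versionTemplate;
        final String os;

        Rule(String regex, String family, String versionTemplate, String os) {
            this.pattern = Pattern.compile(regex);
            this.family = family;
            this.versionTemplate = versionTemplate;
            this.os = os;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    public void addRule(String regex, String family, String versionTemplate, String os) {
        rules.add(new Rule(regex, family, versionTemplate, os));
    }

    /** Returns {family, browser with version, os}, or null if no rule matches. */
    public String[] classify(String agent) {
        for (Rule r : rules) {
            Matcher m = r.pattern.matcher(agent);
            if (m.find()) { // assumption: first matching rule wins
                String version = r.versionTemplate;
                if (m.groupCount() >= 1 && m.group(1) != null) {
                    version = version.replace("$1", m.group(1));
                }
                return new String[] { r.family, version, r.os };
            }
        }
        return null;
    }
}
```

Under the first-match assumption, rule order matters: the generic `Mozilla/(\S+) \(.*Linux.*\)` Netscape pattern would also match an Opera agent string that identifies itself as Mozilla, which would explain why the Opera rules come first in the file.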


Webolog collected data in a simple database. For simplicity I didn't use an RDBMS. Instead, the data was stored in a ZIP file containing several binary and XML files. Not very efficient, but it was enough.
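For illustration, a ZIP container holding one XML entry and one binary entry can be written and read with `java.util.zip`, which has been part of Java since 1.1. The entry names `meta.xml` and `visits.bin` and the counter layout below are invented for this sketch and are not the real Webolog file format. It works in memory to stay short; a real tool would wrap a `FileOutputStream` instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipStore {
    /** Packs an XML document and an array of counters into one ZIP archive. */
    public static byte[] write(String xml, int[] counters) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            // Hypothetical entry name for the XML part.
            zip.putNextEntry(new ZipEntry("meta.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();

            // Hypothetical binary part: a length followed by big-endian ints.
            zip.putNextEntry(new ZipEntry("visits.bin"));
            DataOutputStream out = new DataOutputStream(zip);
            out.writeInt(counters.length);
            for (int c : counters) {
                out.writeInt(c);
            }
            out.flush();
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Reads the counters back, or returns null if the entry is missing. */
    public static int[] readCounters(byte[] archive) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("visits.bin")) {
                    DataInputStream in = new DataInputStream(zip);
                    int[] counters = new int[in.readInt()];
                    for (int i = 0; i < counters.length; i++) {
                        counters[i] = in.readInt();
                    }
                    return counters;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The inefficiency mentioned above shows up here: a ZIP entry cannot be updated in place, so adding new data means rewriting the whole archive.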

Then Google Analytics became usable and even popular, so I stopped using Webolog.


P.S. When I looked at the Webolog source, I found the variable name "t" in use, as expected:




