In 2001, I had access to the server logs of the Tonigy website. I tried several tools (at least Moglan, Webalizer, Analog, ALA, and the hypermart.net service) to analyze them, but I didn't like the results. So I decided to write my own web log analyzer.
Although I wrote Tonigy in C, I decided to use Java for the analyzer. Java had a much more advanced standard library than C, where even simple string manipulation was too verbose.
I was also writing in Perl at the time, but I was never a fan of that language. I didn't see Perl as a language for anything more complex than a few CGI files.
In 2001 I created an analyzer called Webolog, which I never published, although I used it a lot in my work. It is a command-line tool for Java 1.1+ that takes logs in Common or Combined Log Format, extracts the data, aggregates it, and generates a set of HTML files with statistics.
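To make the input concrete, here is a minimal sketch of parsing one Common or Combined Log Format line. It is not Webolog's code: the class name `LogLineParser` is made up for illustration, and it uses `java.util.regex`, which did not exist in Java 1.1. It also assumes the request, referer, and agent fields contain no escaped quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Webolog.
final class LogLineParser {
    // host ident user [time] "request" status bytes, optionally followed by
    // "referer" "agent" (the Combined Log Format extension).
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\d+|-)"
        + "(?: \"([^\"]*)\" \"([^\"]*)\")?\\s*$");

    /**
     * Returns {host, time, request, status, bytes, referer, agent},
     * or null if the line does not match. For a Common Log Format line
     * the referer and agent elements are null.
     */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group(1), m.group(4), m.group(5), m.group(6),
            m.group(7), m.group(8), m.group(9)
        };
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326 "
            + "\"http://www.example.com/start.html\" "
            + "\"Mozilla/4.08 [en] (Win98; I ;Nav)\"";
        String[] fields = parse(line);
        System.out.println("host=" + fields[0] + " status=" + fields[3]
            + " referer=" + fields[5]);
    }
}
```

The referer and agent fields extracted here are the inputs for the search query and browser statistics.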
I do not have logs that old, but I found screenshots from May 2001. Funny, they show MS IE as the browser.
Total visits:
"Referers":
Search queries (from "referers"):
Query extraction is configurable: each line gives a search site's host name and the URL parameter that carries the query. The search sites at that time:
===
web.altavista.com q
www.altavista.com q
www.lycos.com query
www.google.com q
www.google.fr q
www.google.de q
www.search.com qt
google.yahoo.com p
search.163.com key
en.os2.org query
===
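A configuration like the one above could be applied roughly as follows. This is only a sketch under assumptions: the class `SearchQueryExtractor` is hypothetical, the host is matched exactly, and the query value is decoded as UTF-8 (search engines in 2001 often used other encodings). It needs Java 10+ for `URLDecoder.decode(String, Charset)`.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Webolog.
final class SearchQueryExtractor {
    private final Map<String, String> paramByHost = new HashMap<>();

    /** Reads "host parameter" pairs, one per line; other lines are skipped. */
    SearchQueryExtractor(String config) {
        for (String line : config.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                paramByHost.put(parts[0].toLowerCase(Locale.ROOT), parts[1]);
            }
        }
    }

    /** Returns the decoded search query, or null if the referer is not a known search site. */
    String extract(String referer) {
        URI uri;
        try {
            uri = new URI(referer);
        } catch (URISyntaxException e) {
            return null;
        }
        String host = uri.getHost();
        String query = uri.getRawQuery();
        if (host == null || query == null) {
            return null;
        }
        String param = paramByHost.get(host.toLowerCase(Locale.ROOT));
        if (param == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(param)) {
                try {
                    return URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
                } catch (IllegalArgumentException e) {
                    return null; // malformed percent-encoding
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SearchQueryExtractor extractor =
            new SearchQueryExtractor("www.google.com q\nwww.lycos.com query\n");
        System.out.println(
            extractor.extract("http://www.google.com/search?hl=en&q=os%2F2+cd+player"));
    }
}
```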
Browsers used:
And OS used:
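Agent data extraction is also configurable, using regular expressions. Each rule holds a pattern followed by the browser name, the browser version, and the OS: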
===
; Opera
"Opera/(\S+) \(.*Windows 9(?:5|8).*\)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Windows 95/98"
"Opera/(\S+) \(.*Windows NT 4\.0;.*\)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Windows NT"
"Opera/(\S+) \(.*Windows NT 5\.\d;.*\)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Windows 2000"
"Opera/(\S+) \(.*Linux.*\)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Linux"
"Opera/(\S+) \(.*OS/2.*\)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "OS/2"
"Mozilla/\S+ \(.*Windows 9(?:5|8).*\) Opera (\S+)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Windows 95/98"
"Mozilla/\S+ \(.*Windows 2000.*\) Opera (\S+)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Windows 2000"
"Mozilla/\S+ \(.*Linux.*\) Opera (\S+)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Linux"
"Mozilla/\S+ \(.*Windows 3\.10.*\) Opera (\S+)\s+\[[a-z]{2}\]" "Opera" "Opera $1" "Windows 3.1"
; Internet Explorer
"Mozilla/\S+ \(.*MSIE (\S+);.*Windows 3\.1\)" "Internet Explorer" "Internet Explorer $1" "Windows 3.1"
"Mozilla/\S+ \(.*MSIE (\S+);.*Windows 9(?:5|8).*\)" "Internet Explorer" "Internet Explorer $1" "Windows 95/98"
"Mozilla/\S+ \(.*MSIE (\S+);.*Windows NT 5\.\d.*\)" "Internet Explorer" "Internet Explorer $1" "Windows 2000"
"Mozilla/\S+ \(.*MSIE (\S+);.*Windows NT(?:| 4.0).*\)" "Internet Explorer" "Internet Explorer $1" "Windows NT"
"Mozilla/\S+ \(.*MSIE (\S+);.*Mac_PowerPC.*\)" "Internet Explorer" "Internet Explorer $1" "Mac/PowerPC"
; Netscape Navigator
"Mozilla \(OS/2; (?:I|U); OS/2 Warp\)" "Netscape Navigator" "Netscape Navigator" "OS/2"
"Mozilla/(\S+) \(.*OS/2.*\)" "Netscape Navigator" "Netscape Navigator $1" "OS/2"
"Mozilla/(\S+) \(.*Linux.*\)" "Netscape Navigator" "Netscape Navigator $1" "Linux"
"Mozilla/(\S+) \(.*Macintosh.*68K\)" "Netscape Navigator" "Netscape Navigator $1" "Mac/68K"
"Mozilla/(\S+) \(.*Macintosh.*PPC\)" "Netscape Navigator" "Netscape Navigator $1" "Mac/PowerPC"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*OS/2.*\)" "Netscape Navigator" "Netscape Navigator $1" "OS/2"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*Win9(?:5|8).*\)" "Netscape Navigator" "Netscape Navigator $1" "Windows 95/98"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*WinNT.*\)" "Netscape Navigator" "Netscape Navigator $1" "Windows NT"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*Windows NT 5\.\d.*\)" "Netscape Navigator" "Netscape Navigator $1" "Windows 2000"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*Linux.*\)" "Netscape Navigator" "Netscape Navigator $1" "Linux"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*SunOS.*\)" "Netscape Navigator" "Netscape Navigator $1" "SunOS"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*AIX.*\)" "Netscape Navigator" "Netscape Navigator $1" "AIX"
"Mozilla/(\S+) \[[a-z]{2}\].* \(.*IRIX.*\)" "Netscape Navigator" "Netscape Navigator $1" "IRIX"
...
===
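The sketch below shows one way such rules could be applied. It is a guess at the mechanics, not Webolog's actual code: the class `AgentRules` is hypothetical, it assumes the first matching rule wins and that a pattern may match anywhere in the agent string, and it uses `java.util.regex` from modern Java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Webolog.
final class AgentRules {
    private static final Pattern FIELD = Pattern.compile("\"([^\"]*)\"");
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String[]> outputs = new ArrayList<>();

    /** Adds one rule line; blank lines and ';' comment lines are ignored. */
    void addRule(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith(";")) {
            return;
        }
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(trimmed);
        while (m.find()) {
            fields.add(m.group(1));
        }
        if (fields.size() != 4) {
            throw new IllegalArgumentException("Expected 4 quoted fields: " + line);
        }
        patterns.add(Pattern.compile(fields.get(0)));
        outputs.add(new String[] { fields.get(1), fields.get(2), fields.get(3) });
    }

    /** Returns {browser, browser version, OS} from the first matching rule, or null. */
    String[] classify(String agent) {
        for (int i = 0; i < patterns.size(); i++) {
            Matcher m = patterns.get(i).matcher(agent);
            if (!m.find()) {
                continue;
            }
            String[] out = outputs.get(i).clone();
            for (int k = 0; k < out.length; k++) {
                // Expand $1, $2, ... with the captured groups.
                for (int g = 1; g <= m.groupCount(); g++) {
                    String value = m.group(g) == null ? "" : m.group(g);
                    out[k] = out[k].replace("$" + g, value);
                }
            }
            return out;
        }
        return null;
    }

    public static void main(String[] args) {
        AgentRules rules = new AgentRules();
        rules.addRule("; Opera");
        rules.addRule("\"Mozilla/\\S+ \\(.*Windows 9(?:5|8).*\\) Opera (\\S+)\\s+\\[[a-z]{2}\\]\""
            + " \"Opera\" \"Opera $1\" \"Windows 95/98\"");
        rules.addRule("\"Mozilla/\\S+ \\(.*MSIE (\\S+);.*Windows 9(?:5|8).*\\)\""
            + " \"Internet Explorer\" \"Internet Explorer $1\" \"Windows 95/98\"");
        String[] result =
            rules.classify("Mozilla/4.0 (compatible; MSIE 5.0; Windows 98) Opera 5.12  [en]");
        System.out.println(String.join(" | ", result)); // Opera | Opera 5.12 | Windows 95/98
    }
}
```

Under the first-match assumption, rule order matters: this agent string also satisfies the Internet Explorer pattern, so the Opera rules have to come first, as they do in the listing.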
Webolog collects data in a simple database. For simplicity I didn't use an RDBMS. Instead, the data is stored in a ZIP file containing several binary and XML files. Not very efficient, but it was enough.
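As an illustration of the ZIP-as-database idea, here is a small sketch using `java.util.zip`. The class `ZipStore` and the entry names in `main` are invented for the example; the actual file layout inside Webolog's archive is not described here. The archive is kept in memory so that the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Webolog.
final class ZipStore {
    /** Packs the given files (entry name -> content) into a ZIP archive. */
    static byte[] write(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Unpacks a ZIP archive into a map of entry name -> content. */
    static Map<String, byte[]> read(byte[] archive) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    content.write(chunk, 0, n);
                }
                files.put(entry.getName(), content.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        // Entry names and contents are made up for this example.
        files.put("meta.xml", "<stats month=\"2001-05\"/>".getBytes(StandardCharsets.UTF_8));
        files.put("visits.bin", new byte[] { 0, 0, 1, 42 });
        byte[] archive = write(files);
        Map<String, byte[]> restored = read(archive);
        System.out.println(new String(restored.get("meta.xml"), StandardCharsets.UTF_8));
    }
}
```

One reason such a design is simple but not very efficient: entries in a ZIP cannot be updated in place, so changing one file generally means rewriting the archive.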
Then Google Analytics became usable and even popular, so I stopped using Webolog.
See also related notes:
- Loop counter name: "t" vs "i" (2024-02-21)
- Tonigy source code [OS/2, 2001-2002] (2023-12-03)
- Tonigy 20+ years (2021-12-03)