February 23, 2009
About Log Files: Part 2Stefan Wörthmüller
Extended logging techniques
Stefan is a software developer in Berlin specializing in cross-platform development, user interfaces, and reengineering. He can be contacted at StefanWoe@gmail.com
In Part 1 of this two-part article about log files, I presented log file basics. You've seen how to write log files and how to use them. But how can log files be used in more complex situations? In this installment, I examine how you can use logging with large data sets and how to find rarely occurring bugs. You'll also see how to use log files in threaded or multi-tiered applications.
Rotating Log Files
For server applications, it is important that you don't fill the entire disk. However, it is equally important to keep information for a long time because the original cause of problems might be recorded. In such cases, rotating log files are a proven technique: You create a new log file after a certain amount of timevor when a certain size is reached, then only keep the last couple of log files. The filenames should not just increase an indexed suffix number in a round-robin fashion, but rather include the date and time of creation in a sortable format. This makes it easier to identify the "right" log file. Whenever a new file is created, the oldest files are deleted by a defined policy; for example, keep the last 10 log files, or those of the last three or so days.
Ring Logging
I once had a problem with an assertion that very rarely failed very deep in the database layer of my application. A class function was called in a class state where it wasn't meant to be called. This only happened at my client's machines, and I wasn't able to reproduce the bug -- not even once. This is where ring logging comes in.
Ring logging is implemented via the following steps:
/////////////////////////////////////////////////////////////////////
Listing One: Logging classes and function
Listing One implements a ring buffer, along with macros that simplify the creation of log statements and that automatically added file and line numbers. I added these macros to most of the class members and inserted the call to the log dump function right before the failing assert -- calling the dump only if the asserted condition was True. I shipped the new version to customers who had reported the error and waited. After a while I received a log file that more closely showed what was happening -- but still I wasn't able to reproduce or fix the problem. However, I did have a closer view of where the problem was hidden. With this knowledge I added more ring log statements and shipped another version. When I received the next log file I saw within a few minutes what the problem was -- and I was able to reproduce it on the spot and fix it right away. And I had spenet only a hour (or less) for the log entries. This is why I say that ring logging is something like the secret service of programming.
Working with Large Datasets
A single log file is good, but what do you do with 100MB of them? You may just browse them, usually starting from the end of the last log file. However, the information you are looking for might be spread over several log files. When analyzing a large number of large log files, use grep or a similar search over several files. Searching for all occurrences of a specific ID will probably reduce the amount of data to a small number of lines.
For example, one application I maintain simulates hydraulic networks. Sometimes a problem occurs only in networks with hundreds of thousands of elements. In such situations I use a function that writes the state of all elements to a log file at certain states (i.e., when entering and leaving the simulation code). I only do this in my debugging version by uncommenting some calls. Of course, it then takes time to log the information for all elements processed. But afterwards I just search for all log entries referring to the element where a problem exists. I might add additional statements and rerun the program. But all that is much more feasible than using a debugger. Debuggers are great, but when working with huge amounts of data it sometimes advantagous to be able to search forward and backwards in a log file, to search for the state of related elements, to double check things again, and so on. This is easily done with a text file. You can't browse or search the execution history of your program in a debugger. You'll have to restart the whole program, which can take a while. Using less, grep, and tail
Of course, you need good tools to analyze large log files. UNIX programmers are familiar with tools such as grep, less, or tail. They're also available for Windows(see http://unxutils.sourceforge.net/) and arelightweight, fast, and have no size limit. Still, you need a little practice to work with them:
While you can use Notepad or other editors, large files may be difficult. And also most editors refuse to open a log file that's still is in use. "Less" and the other "unix" commands do not complain.
|
|
||||||||||||||||||||||||||||||
|
|
|
|