SentinelOne | Common Ways People Destroy Their Log Files

For this article, I’m going to set up a hypothetical scenario (but based on reality) that needs logging. We’re writing an application that automates part of a steel factory. In our application, we need to calculate the temperature to which the steel must be heated. This is the responsibility of the TemperatureCalculator class.

The class is fed a lot of parameters that come from external sensors (like current temperature of the furnace, weight of the steel, chemical composition of the steel, etc.). The sensors sometimes provide invalid values, forcing us to be creative. The engineers said that, in such a case, we should use the previous value. This isn’t something that crashes our application, but we do want to log such an event.

So the team has set up a simple logging system, and the following line is appended to the log file:

An invalid value was provided. Using previous value.

Let’s explore how this well-meant log message doesn’t actually help. In fact, combined with similar messages in our log file, the log file ends up being a giant, useless mess.

Not Logging Descriptively

Of course, the first thing you’ll see is that we don’t have a lot of extra information. What is an invalid value exactly? “Null” would probably be invalid. But what about values we know are too high or too low for our calculations? What’s the previous value we’ll use? Who provided the invalid value?

These are probably all things we can find out by looking in other places (other logs, the code, the UI, etc.). But we could make our lives easier by adding some context.

What if we change the line to

The Sensor3000 provided an invalid value (0). This would lead to a DivideByZeroException. Falling back to previous value (88.4).

That’s a lot better, isn’t it?

Forgetting Timestamps

Say there was an issue far down the line of the production facility. Some batch of steel was of a low quality and had to be discarded. Making steel is a complex process that consists of many steps and many machines, so management wants everyone to check their applications for problems that occurred yesterday between 4 and 6 PM.

How can we find that in our logging message? We have no way of finding out when the messages occurred. Obviously, the solution is to add timestamps:

2018-01-22 20:14:35.9834 The Sensor3000 provided an invalid value (0). This would lead to a DivideByZeroException. Falling back to previous value (88.4).

We can now quickly and easily find all messages inside any date and time interval.

It also provides us some insight into possible performance problems. These two lines could indicate a problem with the network or with the external sensor:

2018-01-22 20:16:02.3947 Connecting to Sensor3000
2018-01-22 20:28:44.9771 Connected to Sensor3000

Twelve minutes to connect to the Sensor3000?! We know it’s usually under one minute!

Not Using Log Levels

We now have descriptive log messages, but there’s no way of instantly seeing how important the message is. The message about an invalid value could be something we find quite important (although not critical). But let’s assume it’s surrounded by hundreds of lines like this:

2018-01-22 20:13:57.6290 Started heating steel order 08931-09
2018-01-22 20:21:36.9318 Finished heating steel order 08953-03. Heated to 1257°C.

These are more informational messages, while others are warnings that something abnormal happened. It’s even worse if we have messages like the following, buried in a flood of less important information:

2018-01-22 20:23:12.3087 NullReferenceException: Object reference not set to an instance of an object.

This is an exception that happened in our application, possibly sending it into an invalid state.

I think you get the picture. Messages have different levels of importance, which is why log levels were invented. Use them correctly and you’ll see messages stand out:

2018-01-22 20:13:57.6290 INFO Started heating steel order 08931-09
2018-01-22 20:14:35.9834 WARN The Sensor3000 provided an invalid value (0). This would lead to a DivideByZeroException. Falling back to previous value (88.4).
2018-01-22 20:21:36.9318 INFO Finished heating steel order 08953-03. Heated to 1257°C.
2018-01-22 20:23:12.3087 ERROR NullReferenceException: Object reference not set to an instance of an object.

Using log levels correctly also allows you to treat the messages differently. The WARN message from above might be something you want to email to the engineers or to the technicians of the Sensor3000. The ERROR message is something you should probably be emailing to your team so you can fix the issue and avoid it in the future.

Not Logging Enough

At the very least, you should be logging exceptions. But when things go wrong, this will probably not be enough. You might need some context as to what happened before the exception occurred and maybe also what happened afterwards.

When writing your code, don’t just think about how the code is going to run. Also try to think about what information you’d like to see long after the code has run. When you only have an exception message to go off of, finding the cause of an issue will require walking through the code, making a mental model of what happened. This is very hard because there will be all kinds of things that you need to add to that mental model. You’ll also need to make a lot of assumptions.

Logging will show you a good history of what actually happened.

Logging the Wrong Things

There are certain use cases that developers solve with logging, even though there are better solutions.

One I see often is logging the performance of problematic methods. While this can be OK for a one-time case, you should avoid having your application (and logs) full of these log messages. Investigate performance profiling tools and application performance monitoring tools. They come at a price but will be better able to help you.

Here’s another example of the wrong things to be logging: user data and behavior. I’ve seen log files full of JSON content that was sent to the server. At some point in time, this was added so there was user data that could be analyzed. However, this filled the log files and, in doing so, obscured the lines about the application’s health. A better option here was to use a user analytics tool, which we ended up switching to.

Because logging is usually already set up, it’s an easy entry point to use for these (and similar) use cases. But logging should be about how your application is running. For other data, there are better solutions.

Logging Too Much

I apologize if this is confusing. First, I’m telling you to make sure you log enough. Then I tell you not to log the wrong things. And now I’m warning you not to log too much.

So what is the correct amount of logging? This is something you’ll have to find out as you go. It also depends on factors like the stability and complexity of the application.

But I’ve seen log files where every detail of the application was logged, forcing you to scroll endlessly to get the bigger picture of what was happening. Too much detail will overwhelm you.

Here, log levels can help again. You could use the lower DEBUG and TRACE levels to have an extremely detailed log in case you need it and write those levels to separate files. But I would only advise logging these levels in production if absolutely necessary.

Not Cleaning Up Log Files

Once you have decent logging in place, your application will be spitting out log files every day. Depending on the size of your application, this will amount to quite some megabytes or even gigabytes over time.

Consider having an automated way of cleaning up log files. You probably won’t be interested in log files from more than a year ago. If there’s still useful data in those files, a better option is to store this data elsewhere, like I mentioned above.

Cleaning up log files is basic good housekeeping. I have encountered multiple applications in my career that filled up the entire hard disk with log files. This stopped multiple applications on the server from working.

Most popular frameworks provide convenient configuration options to perform this automatic cleanup.

Evaluate and Evolve

Inevitably, you will encounter situations where you could have used more logging. And after some time, you will feel that other log messages are no longer necessary.

This is why your logging is subject to the same process as your production code. Evaluate your logs from time to time and let them evolve. Add more logging when it’s deemed necessary, and don’t be afraid of removing log statements when you feel they just crowd the log files.

This post was written by Peter Morlion. Peter is a passionate programmer that helps people and companies improve the quality of their code, especially in legacy codebases. He firmly believes that industry best-practices are invaluable when working towards this goal.