When Logging Becomes Noise
I got a gig building an OTP system with a team that could deliver codes via phone calls instead of SMS. Pretty neat idea, right? The flow was straightforward: generate OTP, convert to speech via Amazon Polly, dial the user, read the code. Clean microservice architecture, async processing, proper error handling.
Then we decided to log everything. And I mean everything.
Every OTP generation. Every Polly API call. Every audio conversion result. Every retry attempt. Every error detail. Every API response. We were drowning in observability, convinced that more data meant better debugging. Classic mistake.
The result? We were just having a hard time debugging things. Too many logs, not enough signal. Figuring out why something failed meant scrolling through way too many lines, and half the time you couldn’t even tell what went wrong.
We had turned logging into noise. Testament to skill issue, really.
The fix
So we started fixing it. First thing that clicked was realizing individual events don’t matter as much as what they add up to. Kind of obvious in hindsight, but whatever. Instead of logging every Polly retry attempt:
[INFO] Polly retry attempt 1: failed
[INFO] Polly retry attempt 2: failed
[INFO] Polly retry attempt 3: failed
[ERROR] Polly retry attempt 4: timeout
[INFO] Polly retry attempt 5: success
We aggregated the entire retry sequence into a single, meaningful log entry:
[INFO] Polly TTS completed: 5 attempts, succeeded on 5th, total_latency=2.3s, user_id=12345
This single line tells you everything you need to know: the operation succeeded, it took multiple attempts (so probably network issues), and it took 2.3 seconds total. No noise, just signal. Much better.
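A minimal sketch of that aggregation, assuming a generic `operation` callable standing in for the Polly call (the helper name and log wording are illustrative, not our production code):

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="[%(levelname)s] %(message)s")
log = logging.getLogger("otp-service")

def call_with_aggregated_logging(operation, max_attempts=5):
    """Retry `operation`, emitting one summary line instead of one line per attempt."""
    start = time.monotonic()
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = operation()
            # One line summarizing the whole retry sequence.
            log.info(
                "Polly TTS completed: %d attempts, succeeded on %d, total_latency=%.1fs",
                attempt, attempt, time.monotonic() - start,
            )
            return result
        except Exception as exc:
            last_error = exc
    log.error(
        "Polly TTS failed: %d attempts, total_latency=%.1fs, last_error=%s",
        max_attempts, time.monotonic() - start, last_error,
    )
    raise last_error
```

The per-attempt detail still exists while the call is in flight; it just collapses into one entry once the outcome is known.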
Structured logging
Then we started using structured logging properly. It wasn’t some revolutionary discovery (structured logging already existed); we just weren’t using it. We had Winston but treated it like console.log, dumping strings everywhere. Raw text logs are debugging hell. Consider this absolute gem I spent 7 hours debugging at 2 AM:
[ERROR] null
That’s it. Just “null”. No context, no stack trace, no correlation ID. Nothing. Thanks for that, system.
Structured logging changes everything. Every log entry becomes a structured JSON object with consistent fields:
{
  "timestamp": "2023-03-11T14:30:15Z",
  "level": "INFO",
  "service": "otp-service",
  "event": "otp_call_completed",
  "status": "success",
  "call_id": "call_abc123",
  "user_id": "user_456",
  "tts_latency_ms": 320,
  "polly_attempts": 1,
  "user_completed_flow": true,
  "correlation_id": "req_789"
}
Now you can actually search through logs without losing your mind. CloudWatch Insights stops being useless. Debugging becomes less of a guessing game.
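As a sketch (in Python with the standard library, since our Winston setup isn’t reproducible here), a formatter that emits entries shaped like the one above might look like this — the `service` name and field names come from the example, the rest is assumed:

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with consistent fields."""
    def format(self, record):
        entry = {
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": "otp-service",
            "event": record.getMessage(),
        }
        # Merge any structured fields attached via `extra={"fields": ...}`.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)

log = logging.getLogger("otp-service")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("otp_call_completed", extra={"fields": {
    "status": "success", "call_id": "call_abc123", "correlation_id": "req_789",
}})
```

The point isn’t this particular formatter; it’s that every entry carries the same machine-searchable keys, so queries like “all events for correlation_id req_789” actually work.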
Sampling
The other thing we learned was sampling. High-traffic endpoints don’t need every single request logged. Our OTP service peaked at only around 600 calls per day, yet we still drowned: the code was buggy, every bug spewed logs, and fixes took forever because we couldn’t see through the mess we’d created. Another skill issue we had to fix.
The solution? Intelligent sampling. Log 100% of errors, 10% of successful requests, and 1% of routine operations:
import random

def should_log_request(request, response):
    # Always log errors; sample successes at 10% and everything else at 1%.
    if response.status_code >= 400:
        return True
    if response.status_code == 200:
        return random.random() < 0.1
    return random.random() < 0.01
This approach gives you statistical insight into system behavior without drowning in noise. You can still detect patterns, measure performance, and debug issues just with a representative sample instead of the full firehose. Pretty neat trick.
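One refinement worth sketching (an idea, not something we shipped): sample deterministically on the correlation ID instead of calling `random.random()` in each service. Every service handling the same request then makes the same keep-or-drop decision, so a sampled request keeps its complete log trail end to end:

```python
import zlib

def sample_by_correlation_id(correlation_id, rate):
    """Hash the correlation ID into a bucket in [0, 10000) and keep the
    request if the bucket falls below `rate`. Deterministic: the same ID
    always produces the same decision, across processes and services."""
    bucket = zlib.crc32(correlation_id.encode()) % 10_000
    return bucket < rate * 10_000
```

With `rate=0.1`, roughly 10% of correlation IDs are kept, and either all or none of a given request’s log lines survive, which makes cross-service traces far easier to follow.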
The real lesson
Here’s what most logging discussions miss though: logs are written by humans, for humans. They’re not just data, they’re communication.
A good log entry tells a story. It answers the questions: What happened? When did it happen? Why did it happen? What was the impact?
Bad logging is like bad documentation: technically correct but practically useless. Good logging is like good storytelling: it guides you to the right conclusions quickly. Who knew?
The lesson? Proper logging isn’t just about using a logger. It’s about communicating events in a way that cuts straight to fixes when things break. Focus on what actually matters, structure it consistently, and sample intelligently.
Your logs should tell a story, not create noise.
If you’re interested in diving deeper into logging best practices, I found these resources particularly helpful in my journey:
- The 10 Commandments of Logging by Masterzen
- Logging Wisdom: How to Log by Emil Stenqvist