I started programming professionally when I was 16. It’s not that I was a child prodigy; I just lucked into something I really liked at a time when no one knew anything about programming. As a result, in my early 20s I was pretty cocky about my development chops. I was the youngest guy in every company I worked at. Younger than the secretarial staff… I knew all the latest stuff and was feeling pretty confident. Then I sat down to debug some code with a brilliant young manager. I had used debuggers before, but not like that… I didn’t even know you could change the value of an inspected variable. I was floored by the way he used a tool I thought I understood. I was blown away by the way he read code and instantly understood the forces at play.
As a side note, he was also an amazing manager who was remarkably empathetic and calm. I learned from him how a good manager should act, and every time I feel like I’m about to blow a fuse, I try to channel his calmness. I can’t say I’m perfect at that, but if you get a chance to mentor young developers, a little patience, empathy and positivity go a long way!
I think my success in later years was thanks to that humbling experience. I got that serving of humble pie early enough in my career to learn from it, but not so early that it left me in constant self-doubt. I learned from it and dug deeper into everything I could find, but honestly, there just isn’t all that much material covering debugging.
There’s even less about debugging production systems. This is probably the most important and time-critical task you’ll ever have as a programmer and yet… no one writes about it.
I spend most of my time as a developer reading code. I spend a smaller amount of time writing code, and debugging is just one part of the mix. I’m atypical here, since the statistics say that developers spend 25-50% of their time debugging. But the time we spend debugging impacts everything else we do. If we don’t have Jedi-level mastery of this craft, we’re effectively working with one hand tied behind our backs.
Debugging and dealing with production issues are often treated as a craft, passed on through pair programming from master to novice. We invented weird internal gags like rubber ducks, yak shaving and other lore around this craft. But there's an amazing lack of knowledge about the tools, the tricks and even the very process we need to follow. Many of these things are timeless. They transcend the current tools, languages and fads. Others are new developments that we’re sometimes unfamiliar with. I’ll try to cover all of these and dig especially deep into debugging production-related problems.
Logging is a Form of Debugging
A month or so ago I responded to an article that covered the many frustrations the author experienced with debuggers. This resulted in him abandoning debuggers altogether and even advocating against them. This isn’t a rare case; some pretty prominent and smart people promote logging over debuggers. Personally, I’d argue that logging is a form of debugging. This is especially true with scripting languages, where we can instantly add a log statement to see what’s going on. I commented on that post with some thoughts, but a month later my opinion on this subject is more fleshed out.
Debuggers let us verify assumptions and understand complex systems. I don’t need them as much for a system where I’m the sole author, but when I work in the “real world” they provide a level of insight no UML diagram can rival. Still, in some environments the standard debuggers aren’t great, while logging is pretty easy. We can also add logging to production code to help us inspect the real world.
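To make “logging is a form of debugging” a bit more concrete, here’s a minimal Java sketch of logging the intermediate state we’d otherwise inspect by stepping through the loop in a debugger. The `CartTotals` class and its `total` method are hypothetical, made up purely for illustration. It uses `java.util.logging` from the JDK; the `Supplier` overload of `Logger.log` only builds the message string when the `FINE` level is enabled, which keeps the cost low if the log line stays in production code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example class, used only to illustrate logging as debugging.
public class CartTotals {
    private static final Logger LOG = Logger.getLogger(CartTotals.class.getName());

    static int total(int[] pricesInCents) {
        int sum = 0;
        for (int price : pricesInCents) {
            sum += price;
            // Record the state we'd otherwise inspect in a debugger.
            // The lambda is only evaluated if FINE is enabled for this logger.
            final int running = sum;
            LOG.log(Level.FINE, () -> "added " + price + ", running total " + running);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(total(new int[] {100, 250, 50}));
    }
}
```

With the default logging configuration this prints only `400`. To see the `FINE` messages you’d need to lower the level of both the logger and its handler, for example through a `logging.properties` file.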
Fast Turnaround is Key
One of the biggest problems in solving a bug is getting it to reproduce consistently. Bugs that reproduce in production or in unit tests but don’t show up under a debugger often sour people on debuggers… But there are ways around that which most developers aren't even aware of.
The value of debuggers is in the ability to instantly “see” the problem and solve it. Logs are most valuable when we don’t need to go through an entire CI/CD round trip to see their output. Unfortunately, they sometimes require guesswork and foresight during the initial development stage that we just don’t possess.
So What Will We Cover?
Recently a young developer at Lightrun sent me an obfuscated stack trace and asked if I had any idea what was going on. He was surprised I gave him the answer almost immediately. I could brag and claim “experience,” but it’s really about knowing where to look, which is something I gained through experience. Unfortunately, these guidelines of “where to look” aren’t well documented, so I had to learn them the hard way. I hope to save you the pain of wading through that long, exhausting process of understanding the nuances in our field. The main difficulty here is that these problems don’t fit into the common boxes. They aren’t code. They aren’t DevOps. That gray area of debugging, of reading a crash report, a log, etc., falls between the cracks.
I want this blog to help people of all levels. So one of the first things I want to work on is a complete, from-scratch debugger tutorial. It should also help experienced developers, who I’m sure will learn things they didn’t know in the advanced sections. But I’m aiming for something anyone can pick up.
I’ll focus on IntelliJ IDEA and VSCode for this guide since they’re the tools I use a lot and both are very popular. I’ll cover Java primarily but touch on TypeScript a bit. I’m just more comfortable with Java, and a lot of the ideas are universal.
The second part I want to focus on is production problems. There’s plenty of “production horror” on the internet, but it’s all over the place in terms of guidelines. It reads like a horror story, not like a tutorial. The value of these stories is in the lessons learned. How did we find the problem, narrow it down, fix it and deploy the fix? What did we learn from it?
While I’ll talk a lot about crashing, burning and even almost going bankrupt because of a badly written PaaS… my main focus in these articles is the process of debugging and fixing: a debrief on what we “should have done,” whether in preparation for that day or during the event. Obviously, hindsight is 20/20, but we can still draw lessons learned and best practices from it.
I’d also like to talk about performance in production, but it won’t be a focus. A lot of people talk about performance, and I’m not sure I can provide valuable insight in the high-level general space. I would, however, like to discuss performance in terms of tooling and the way in which performance in production can differ from synthetic performance benchmarks. This is one of those niches in performance that doesn’t get enough attention.
I might sneak in an occasional positioning-style post discussing “shift left” and other terms managers and CTOs like. This blog is more focused on technical “real world” work, but sometimes we need that terminology so we can communicate our needs and status more effectively when presenting to management. So I think those posts can be valuable even in a blog geared towards developers.
Finally, while this will be the overarching theme for the blog, I still intend to write opinion pieces and other stuff of general interest. For example, why I love Java so much, even all these years later.
What I Don’t Want to Talk About
Technically, I’m open to discussing anything, but I don’t think these topics belong in this blog:
- TDD and testing in general – in the debugging books and materials I’ve reviewed, the process quickly moves on to creating a unit test to reproduce the problem, creating an integration test so it won’t come back, etc. Yes, those are important things that you definitely should do. But that’s part of your development process. There are a lot of resources on TDD and I don’t want this blog to turn into that
- Automated tests or any form of QA – for the same reasons I gave in the previous point, these are important things that are covered elsewhere
- DevOps processes – I want to talk about production and deployment, but I don’t want to turn this blog into a DevOps blog. The main audience for the blog is developers, so my discussion of ops will focus on the developer’s angle
- APM – while I will discuss some APM tools, I will focus only on those that are applicable to developers. Most APMs are on the ops side of the chart and probably don’t belong here
To summarize: if developers find it important for their job and it’s related to debugging or production, I want to talk about it. If it’s something that belongs to ops, or a big subject with many third-party resources covering it… well, I don’t want to duplicate that effort.
So What’s Next
Follow this blog for more. Future posts will be more technical and less about the “high level” stuff, I promise! Also, follow me on Twitter; I just created a new account and could use the follows…
It would be great if you commented here with questions, but if you need to reach me personally, you can DM me on Twitter or write to me at shaia (at) lightrun.com.