Review: Effective Monitoring & Alerting

Effective Monitoring and Alerting by Slawek Ligus was not quite the book I thought I was starting. I was looking for something more prescriptive, and what I read sits far higher up the stack. It wasn’t a bad book, but it was different from what I went in expecting.

The book takes a high-level look at how to keep alerting from getting out of hand. Here is the overall message:

  • You need to make sure you monitor the proper things in the proper way. This brings about a deep understanding of the system as a whole and also forces you to really figure out what dependencies the distinct parts of your system have, so you can be certain you are monitoring things that matter.
  • Armed with that information, you move on to mapping out what should be shooting off alerts. This ties directly back to the data about dependencies because we want to be certain that we alert only on the parts of the system that are failing, not on the parts that depend on the failed area.
  • The entire idea is to make sure that the alerts getting sent out are needed and useful. There is talk of standardizing the names of the systems and alerts so you can know exactly what is happening right from the start.
  • There is a huge focus on making sure the alerts are truly actionable and needed so that you don’t give your IT operations staff alert fatigue. The idea is to alert on things that can and need to be fixed and on nothing else.
  • This means monitoring everything but alerting on just a small subset. You can use the monitoring data for capacity planning and for trying to find issues before they start, but you will constantly be adjusting the alerting thresholds so that only the most important alerts are sent through.
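
To make the dependency point concrete, here is a minimal Python sketch of the idea of alerting only on the failed part and staying quiet about whatever depends on it. This is my own illustration, not code from the book. The function name root_cause_alerts, the depends_on mapping, and the component names are all hypothetical, and the sketch assumes the dependency graph has no cycles.

```python
from typing import Dict, List, Set


def root_cause_alerts(depends_on: Dict[str, Set[str]], failing: Set[str]) -> List[str]:
    """Return the failing components that have no failing dependency.

    depends_on maps a component to the components it directly relies on.
    Assumes the graph is acyclic (an assumption of this sketch).
    """

    def has_failing_dependency(component: str) -> bool:
        # Walk direct and transitive dependencies looking for a failure.
        seen: Set[str] = set()
        stack = list(depends_on.get(component, ()))
        while stack:
            dep = stack.pop()
            if dep in seen:
                continue
            seen.add(dep)
            if dep in failing:
                return True
            stack.extend(depends_on.get(dep, ()))
        return False

    # Alert only where the failure is not explained by something upstream.
    return sorted(c for c in failing if not has_failing_dependency(c))


if __name__ == "__main__":
    # Hypothetical three-tier setup: web relies on app, app relies on db.
    graph = {"web": {"app"}, "app": {"db"}, "db": set()}
    print(root_cause_alerts(graph, {"web", "app", "db"}))
```

With all three tiers failing, only the database is reported, which is the part someone can actually go fix. Everything is still monitored; the filter only decides what pages a person.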

That’s the overall look. As far as this review goes, it comes down to this: I would definitely read it again, but be aware of what the book is going to be about. It is NOT prescriptive at all, but it is short enough to be useful even for the smallest of operations departments.

Dark IT

Within any organization you have individuals bringing in IT resources from the outside. This is not always a bad thing, but I have found that it is the root cause of many problems for individual users. I’m not going to talk about the problems that unsupported technology can cause within an organization. That’s boring and it has been rehashed by too many people.

The better topic is this: If you have a proliferation of Dark IT (unsupported technology brought in by another individual or department), why is that happening? What is or is not happening that is causing these people or departments to look outside of IT to find solutions for their problems?

Every time this happens, it is an opportunity to look inward at your department to make things better.

Nate Beran gets to the heart of the matter with IT And The Business Are Indistinguishable, and I am not going to reproduce any of it here because it is so short. Technology and computing have weaseled their way so far into every nook and cranny of virtually every organization that the traditional way of thinking of IT as a separate entity doling out technology to everyone else isn’t going to cut it anymore.

Things change. Needs change. Departments change. Dark IT can be a symptom of the larger problem of IT being pushed out, either consciously or subconsciously, and it is time to sit down with people and figure out what needs to be done. It is not so much that your job is on the line (even though it might be), but that in order to better serve the people around you, you need to get working on mending fences.

Something About SmallOps

DevOps gets all of the press right now (and rightfully so in many respects). If you haven’t already, you should subscribe to DevOps Weekly and enjoy some great, curated articles from around the web specifically targeted at anyone even slightly interested in DevOps.

This post isn’t specifically about DevOps, but it is about how I am trying to navigate the torrent of information about DevOps and apply it to my work at Martin Luther College. As a small liberal arts college focusing on training the next generation of pastors and teachers for a conservative Lutheran synod, many of the items that DevOps proponents talk about don’t directly apply to what we do here.

However, there are things that DO apply … sometimes with only some very minor tweaks. So I’m going to go ahead and coin the term SmallOps so that I have some sort of umbrella term to cover these sorts of topics through a slightly different lens. SmallOps is my attempt to take the overall shift in how IT Operations works in the world and apply it to some of the smallest groups and applications out there. I’m working in higher education, so my bias will be quite apparent.

The idea isn’t that SmallOps is so very different from the larger realm of DevOps, but that SmallOps is a specific part of the greater whole of IT Operations with needs that larger shops don’t necessarily have. I’m hoping future writing can bring some of this to light.