In this article I report my experience working on large applications, and how monitoring came into my journey when I started producing software designed to solve business-critical problems.
Hi, I’m Valerio, a software engineer from Italy and CTO at Inspector.
Solving customers’ critical problems can generate great business opportunities, but in these situations you need to be ready for really high customer expectations.
To serve these customers and seize these business opportunities, I quickly realized that I needed to automate most of the activities that were taking up a lot of my time every day and hurting my productivity.
We understand immediately when the time comes to change something in our way of working. New customers are coming in, applications become more and more complicated, bureaucracy increases as well, and emergencies that previously occurred once a month start to force us to stay at work until late every day.
I can’t afford to find out that my application is broken because a customer reports the error directly to me. Customers don’t report bugs or errors; they simply stop using an application and look for another team that is better organized.
After more than ten years as a software engineer, I have spent a lot of time selecting the best tool set to improve my productivity.
A lot of confusion has arisen in the world of monitoring, probably because so much data can be used in so many different ways. At first it’s not easy for developers to find the best combination of tools to resolve emergencies efficiently while also improving everyday work. In this article I write about my experience, trying to distinguish:
- When, or in which situations, monitoring can be effective
- Why you should monitor certain parts of your system and not others, based on your stage of growth
- Which tool is right for each specific monitoring problem
What are application monitoring tools?
Application monitoring tools generally consist of two parts:
- The Agent
- The Analytics Platform
The agent is a software package that developers install on their servers or in their applications (depending on how the agent is designed). Its goal is to collect relevant information about application behavior and performance.
This information is sent to the remote platform, which analyzes the data and generates visual charts to help developers easily understand what’s happening in their system. It should also be able to alert developers in a convenient way if something goes wrong.
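To make the two parts more concrete, here is a minimal sketch of what an agent could look like. It is only an illustration under my own assumptions: the `Agent` class, its `record` and `flush` methods, and the `send` callable that stands in for the remote platform are all hypothetical names, not the API of any real monitoring product.

```python
import json
import time
from typing import Callable, Dict, List


class Agent:
    """Hypothetical agent: buffers measurements and ships them in batches."""

    def __init__(self, send: Callable[[str], None], batch_size: int = 3):
        self.send = send              # assumed transport to the analytics platform
        self.batch_size = batch_size
        self.buffer: List[Dict] = []

    def record(self, name: str, duration_ms: float) -> None:
        """Collect one measurement about application behavior."""
        self.buffer.append(
            {"name": name, "duration_ms": duration_ms, "timestamp": time.time()}
        )
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send everything collected so far as a single JSON payload."""
        if self.buffer:
            self.send(json.dumps(self.buffer))
            self.buffer = []


# Stand-in for the remote platform: payloads are simply kept in a list.
received: List[str] = []
agent = Agent(send=received.append, batch_size=2)
agent.record("GET /orders", 120.5)
agent.record("GET /users", 48.0)        # reaching batch_size triggers a flush
agent.record("job:send-invoice", 310.2)
agent.flush()                           # ship whatever is left in the buffer
```

Batching is a design choice in this sketch: the agent sends data in groups so that monitoring adds as little overhead as possible to the application it observes.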
What they are not
This is obviously a simplistic description that could cover a huge number of tools out there.
In fact, many tools look like application monitoring tools but have nothing to do with application monitoring. These similarities made it difficult for me to figure out which was the right tool to solve my productivity problems.
Here is what I learned in my journey.
Log management tools
A log management tool is often the first kind of tool we reach for because, from the very beginning of the application development journey, watching application logs is one of the most important daily activities for staying informed about what’s happening inside the most important parts of our application.
But when the application started to scale (running on multiple servers, requiring a more complex architecture, etc.), I realized that it was very difficult to extract relevant information from logs about application performance, or to monitor the impact of code changes over time in terms of stability and resource consumption.
It is like when the car was invented: people were initially looking for a faster horse because they were used to using horses. Then they realized that a different tool was needed to get to the next level.
Uptime monitoring tools
Uptime monitoring tools can be described as a more sophisticated “ping”.
The main purpose is simple: they ping your application endpoints from multiple regions to understand how well (or badly) users located in different geographic areas can reach them.
This information is useful to understand how the cloud infrastructure that brings your application to end users (load balancer, CDN, network, etc.) is working, and whether any of these systems is causing issues. It does not provide any information about what is going on inside your application.
In my case, my application serves users all around the world, so external ping stats helped us understand which regions suffer the highest latency and decide in which regions we should place our servers to improve our customers’ experience.
They monitor the external environment: if your database slows down, you will never know.
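A single uptime check can be sketched in a few lines. This is only an illustration, not how any specific uptime service works: the `check` and `http_probe` functions, the example URL, and the fake probes are names I made up for the example. A real service would run the same check from several regions and compare the results.

```python
import time
import urllib.request
from typing import Callable, Dict


def http_probe(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def check(url: str, probe: Callable[[str], int] = http_probe) -> Dict:
    """Probe one endpoint and report whether it is reachable and how fast."""
    started = time.perf_counter()
    try:
        status = probe(url)
        up = 200 <= status < 400
    except Exception:
        # Timeouts, DNS failures, and HTTP errors all count as "down".
        status, up = None, False
    latency_ms = (time.perf_counter() - started) * 1000
    return {"url": url, "up": up, "status": status, "latency_ms": latency_ms}


# Fake probes keep the example runnable without network access.
def fake_ok(url: str) -> int:
    return 200


def fake_down(url: str) -> int:
    raise TimeoutError("no answer from the endpoint")


healthy = check("https://example.com/health", probe=fake_ok)
broken = check("https://example.com/health", probe=fake_down)
```

Notice what the result contains: reachability, status, and latency as seen from the outside. Nothing in it says which query or function made the response slow.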
Server vs Application monitoring
This is the hardest difference to understand, and I have not found any article that helped me clarify the separation of duties.
The application runs on a server, so they are obviously two closely related components of the system. That’s why it might be confusing at first.
But server and application monitoring address two completely different needs.
Server monitoring focuses on infrastructure, and it’s also basically provided for free by any decent cloud provider.
Google Cloud Platform, AWS, and DigitalOcean provide the most important metrics by default, like CPU usage, storage, bandwidth, and more, at no extra cost beyond running the VM itself.
Image: server monitoring offered for free by cloud providers.
Understanding when your VMs must scale up (or down) is an important need, but having the CPU at 100% could mean everything and nothing. It leaves questions like these unanswered:
- Which part of your application do you need to refactor if it consumes too many resources?
- How can you identify why a certain part of your app is slowing down and causing a negative experience for your users?
- How can you know if your application is throwing exceptions, and why?
As mentioned at the beginning of the article, server monitoring works by installing an agent at the server level, so “outside” of your application. But it’s really hard to look at your application from the outside and know what’s going on inside your code.
Application monitoring finally focuses on the “application” 🙆🙆.
This class of tools provides a software library, not a package to install in the OS. Developers install the integration library in their application like any other dependency, without touching the server’s configuration, and it automatically collects relevant information about your code’s performance, errors, and trends to alert you in case something goes wrong, like a sentinel.
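To give an idea of what “inside the application” means, here is a minimal sketch of library-level instrumentation. It is an assumption-laden illustration and not Inspector’s actual integration: the `monitored` decorator, the `events` list (a stand-in for data that would be sent to the platform), and the `load_invoice` function are all hypothetical.

```python
import functools
import time
from typing import Callable, Dict, List

# Stand-in for the data a real library would send to its platform.
events: List[Dict] = []


def monitored(func: Callable) -> Callable:
    """Record the duration and any raised exception for each call to `func`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = time.perf_counter()
        error = None
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            error = repr(exc)
            raise  # never swallow the application's own errors
        finally:
            events.append({
                "name": func.__name__,
                "duration_ms": (time.perf_counter() - started) * 1000,
                "error": error,
            })

    return wrapper


@monitored
def load_invoice(invoice_id: int) -> dict:
    # Hypothetical application code being observed.
    if invoice_id < 0:
        raise ValueError("invalid invoice id")
    return {"id": invoice_id}


load_invoice(1)
try:
    load_invoice(-1)
except ValueError:
    pass  # the error still reaches the caller, and it is recorded too
```

Because the code runs in the same process as the application, it can see function names, timings, and exceptions that a server-level agent cannot.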
Issues with all-in-one platforms
This market is currently dominated by gigantic, all-in-one platforms like Dynatrace, Instana, AppDynamics, Datadog, and others that provide a single platform containing logs, server metrics, uptime metrics, application metrics, unstructured data, search indices, etc.
During a business event I had the opportunity to present Inspector to one of the big utility companies in Italy (5 billion € in annual revenue) that had already entered into an agreement with Dynatrace for two million euros per year.
I immediately thought this can’t be the case for the millions of software houses and SaaS startups out there. This kind of platform often requires a dedicated engineering team for configuration and maintenance, which makes it even harder for smaller companies to use.
What problem does an application monitoring tool solve?
An application monitoring tool provides metrics and alerts to identify bugs and bottlenecks in your application without waiting for customers to report an issue.
It acts like a sentinel, allowing you to visually explore how your code runs and doing 90% of the analysis work on its own.
Why is application monitoring important?
It is important because happy customers are paying customers.
Having an application is the easy part, relatively speaking; anyone can do it.
The real work starts by building your rapport with the customer and making them number one.
If you put the customer first, they’ll remain loyal fans of your product. On the other hand, one of the worst things for your business is error-prone, buggy software.
Nothing will drive potential paying customers away faster than waiting for the site to load, or finding it down altogether. So do whatever it takes to make them happy and the revenue will follow.
What can you monitor in an application?
You should be able to easily know how long your application takes to fulfill HTTP requests or complete background processes, like jobs, cron tasks, etc., to understand which are the most resource-consuming processes in your system.
Each execution cycle is typically called a “Transaction”. During a transaction the application can perform many different tasks, like SQL queries, reading and writing files, calls to external systems, algorithms, etc.
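The idea of a transaction made of smaller timed tasks can be sketched as a simple data structure. This is only my illustration of the concept: the `Transaction` class, its `segment` and `slowest` methods, and the labels used below are hypothetical and do not reflect Inspector’s real API. The `time.sleep` calls simulate the work of a query and an external call.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Transaction:
    """One execution cycle, e.g. an HTTP request or a background job."""

    name: str
    segments: List[Dict] = field(default_factory=list)

    @contextmanager
    def segment(self, kind: str, label: str):
        """Time one task performed during the transaction."""
        started = time.perf_counter()
        try:
            yield
        finally:
            self.segments.append({
                "type": kind,
                "label": label,
                "duration_ms": (time.perf_counter() - started) * 1000,
            })

    def slowest(self) -> Dict:
        """Return the segment that took the longest."""
        return max(self.segments, key=lambda s: s["duration_ms"])


transaction = Transaction("GET /checkout")
with transaction.segment("sql", "select * from orders"):
    time.sleep(0.01)   # simulated database query
with transaction.segment("http", "POST payment-gateway"):
    time.sleep(0.03)   # simulated call to an external system
```

With this structure, answering “which task consumed most of the request?” becomes a lookup such as `transaction.slowest()` instead of a guess.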
In Inspector you can explore your running code visually, as in the image below:
All of this information is automatically collected by Inspector without any tricky configuration by developers.
Have you ever wished you could watch your code running, instead of just imagining it?
That’s what Inspector is designed to do, and how it is positioned in the monitoring market. It focuses your attention on the code.
I really believe that clear and simple information is the most important thing for making better decisions.
Learning why, when and how to use monitoring tools was one of the most confusing parts of my developer journey, and I hope my experience building Inspector can help you gain a clearer view of your needs and of the right tools to solve your problems and improve your productivity.
Thank you so much for reading. Share this article on your social accounts if you think it would be helpful for others.