With the ever-growing demand for decentralized architectures, anyone designing systems, whether on-premises or in the cloud, must pay special attention to troubleshooting.
The need to have almost all business capabilities online and to serve a global customer base with low latency is driving enterprise architects to rethink existing architectural strategies for scale. In a large distributed system, it’s critical to have visibility at the operational level for troubleshooting issues.
As a solutions architect, I rank observability and tracing among the top requirements for any project deliverable. I have seen projects narrowly escape disaster because they had traceability implemented in time, while others had very bad experiences because tracing received too little attention.
That’s why tracing and observability should be critical parts of any distributed architecture. I have also witnessed observability tools save customers from disastrous downtime that could have cost them hundreds of thousands in revenue.
Why is distributed tracing important?
To give you a clear picture: you may think the system consists only of the code you have written, or of the vendor product you have chosen for your distributed architecture. But software is collaborative work. You may reuse code created by another developer, and one of your functions may call a service built by another department. When you test in development or staging, everything works well. In production, however, things may go wrong, and you are left wondering what caused the problem: did your own function slow down the request, was it a slow database query, or is the other service you call slow?
There are multiple failure points in a distributed system with many microservices. A single developer or team might be responsible for a single microservice that works well in isolation. At runtime, however, it can talk to other microservices written by different teams, or to vendor software systems. Having visibility into this chain of system calls (or HTTP calls) is crucial to understanding the behavior of a production system.
Imagine the following scenario: you are woken at midnight by an alert and a call from your manager, because a request involving five different microservices is failing repeatedly. You jump into the logs, still trying to open your eyes against your ultra-bright screen, looking for errors around the time of the alert, but the stream of data is too big to figure out what happened. It is just taking too long. With distributed tracing, you can find the first service that failed, get the logs from that failure, and retrieve other context (depending on your tracing implementation).
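To make the midnight scenario concrete, here is a minimal sketch in Python of how a tracing backend could point at the first service that failed. The `Span` record, its fields, and the service names are hypothetical and heavily simplified; the sketch also assumes that the innermost failure finishes before the callers that fail because of it, so the failed span with the earliest end time is treated as the starting point.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """One unit of work in a trace (hypothetical, simplified record)."""
    trace_id: str        # shared by every span of the same request
    service: str         # service that did the work
    start_ms: int        # start time, in ms since the request began
    end_ms: int          # end time, in ms since the request began
    error: bool = False  # True if the work ended in an error


def first_failure(spans: List[Span], trace_id: str) -> Optional[Span]:
    """Return the failed span of the trace that ended earliest, or None."""
    failed = [s for s in spans if s.trace_id == trace_id and s.error]
    # Assumption: a callee fails before the callers that depend on it.
    return min(failed, key=lambda s: s.end_ms, default=None)


# Hypothetical spans: request "req-1" crosses five services and fails.
spans = [
    Span("req-1", "gateway", 0, 120, error=True),
    Span("req-1", "orders", 5, 110, error=True),
    Span("req-1", "inventory", 10, 30),
    Span("req-1", "payments", 35, 100, error=True),
    Span("req-1", "bank-connector", 40, 95, error=True),
    Span("req-2", "gateway", 0, 15),
]

culprit = first_failure(spans, "req-1")
if culprit is not None:
    print("Start looking at:", culprit.service)
```

Instead of scanning the logs of all five services, you would start with the logs of the one service the trace points to.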
Distributed tracing capabilities of the WSO2 product stack
In a distributed WSO2 product architecture, with products such as API Manager and Enterprise Integrator, OpenTracing allows you to enable distributed tracing with ease. OpenTracing aims to be an open, vendor-neutral standard for distributed systems instrumentation. It offers a way for developers to follow the thread: to trace requests from beginning to end across touchpoints and understand distributed systems at scale. Several tracing backends are available, such as Jaeger and Zipkin.
Distributed tracing is a mechanism for following the journey a request takes through a complex web of microservices. As an open standard, OpenTracing defines the APIs and libraries used to do this.
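The core idea can be sketched in a few lines of Python: every service records a span, and the trace ID plus the ID of the calling span travel with the request so the backend can stitch the spans into one journey. This is only an illustration of the concept, not the OpenTracing API; the header names `x-trace-id` and `x-parent-span-id`, the `start_span` helper, and the service names are all hypothetical.

```python
import uuid
from typing import Dict, List

# Stands in for a tracing backend such as Jaeger or Zipkin.
collected: List[dict] = []


def start_span(service: str, headers: Dict[str, str]) -> Dict[str, str]:
    """Record a span for `service` and return headers for its outgoing calls."""
    # Reuse the caller's trace ID, or start a new trace at the entry point.
    trace_id = headers.get("x-trace-id") or uuid.uuid4().hex
    span_id = uuid.uuid4().hex[:16]
    collected.append({
        "trace_id": trace_id,
        "span_id": span_id,
        "parent_id": headers.get("x-parent-span-id"),  # None for the root span
        "service": service,
    })
    # Downstream services will see this span as their parent.
    return {"x-trace-id": trace_id, "x-parent-span-id": span_id}


# A request enters at the gateway, which calls a backend, which calls a database service.
gateway_out = start_span("gateway", {})
backend_out = start_span("backend", gateway_out)
start_span("database", backend_out)

for span in collected:
    print(span["service"], span["trace_id"], span["parent_id"])
```

All three spans share one trace ID, and each parent ID points at the caller, which is what lets a tracing UI draw the request as a single tree.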
If you have deployed WSO2 products on-premises or in the cloud, as a solutions architect I recommend revisiting the tracing capabilities we have improved.
Let’s look at tracing with Jaeger for WSO2 API Manager products. If you need to know how to install Jaeger, please see the Jaeger installation documentation.
If you have Docker installed, just run the following command:
docker run -d --name jaeger -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 -p 5775:5775/udp -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778 -p 16686:16686 -p 14268:14268 -p 9411:9411 jaegertracing/all-in-one:1.21
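Before moving on, it can help to confirm that the container is actually listening. The following is a small sketch, assuming Jaeger runs on the local machine with the port mapping from the command above; the helper name `port_is_open` is my own. It only checks TCP ports such as the UI port 16686, not the UDP agent ports.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    # 16686 is the Jaeger UI port published by the docker command above.
    print("Jaeger UI reachable:", port_is_open("localhost", 16686))
```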
For the default gateway, open deployment.toml and add the following lines:
For the Micro Gateway, open micro-gw.conf and add the following lines:
Once the API Manager default gateway or Micro Gateway deployment is set up per the instructions, publish an API from API Manager. If you need to understand how to publish an API from API Manager, or how to publish it to the Micro Gateway, refer to the corresponding WSO2 documentation.
Once the API is published, invoke it as described in the WSO2 documentation. Then open the Jaeger UI, e.g. http://localhost:16686/
Select the service and click Find Traces; you will see a list of recent transactions.
Click a transaction to open its view. You should be able to drill down into the time API Manager took to process the request.
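The same drill-down can be done in code once you have a trace as JSON. The sketch below assumes the JSON layout used by the Jaeger query service (a list of `spans` with `operationName`, `duration` in microseconds, and a `processID` that maps into `processes`). The trace itself, including the service and operation names, is hypothetical sample data, not real API Manager output.

```python
from typing import List, Tuple

# Hypothetical trace, shaped like the JSON behind the Jaeger transaction view.
trace = {
    "traceID": "abc123",
    "spans": [
        {"spanID": "1", "operationName": "handle_request", "duration": 48000, "processID": "p1"},
        {"spanID": "2", "operationName": "authenticate", "duration": 6000, "processID": "p1"},
        {"spanID": "3", "operationName": "call_backend", "duration": 35000, "processID": "p1"},
        {"spanID": "4", "operationName": "query_database", "duration": 29000, "processID": "p2"},
    ],
    "processes": {
        "p1": {"serviceName": "api-gateway"},
        "p2": {"serviceName": "backend"},
    },
}


def slowest_spans(trace: dict, top: int = 3) -> List[Tuple[str, str, float]]:
    """Return (service, operation, duration in ms) for the slowest spans."""
    rows = []
    for span in trace["spans"]:
        service = trace["processes"][span["processID"]]["serviceName"]
        # Jaeger reports durations in microseconds; convert to milliseconds.
        rows.append((service, span["operationName"], span["duration"] / 1000.0))
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top]


for service, operation, millis in slowest_spans(trace):
    print(f"{service:12} {operation:16} {millis:8.1f} ms")
```

Keep in mind that a parent span's duration includes the time spent in its children, so the slowest entry is usually the outermost span; the interesting number is often the largest child.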
With this approach, you can make many internal DevOps decisions when something goes wrong due to the distributed nature of communication. Simple as that.
You already know that pinpointing the exact problem in a deployment can be tricky and painful when traffic is high. A tracing system helps you find which process is causing the bottleneck with far less headache. So make tracing an integrated part of your large-scale deployment: it saves a lot of time, a lot of money, and a few panic attacks.