Distributed Tracing with Spring Cloud Sleuth

Spring Cloud Sleuth is one of the projects under the Spring Cloud umbrella.

The objective of this project is to enable tracing in a distributed system and to make it easily configurable in a Spring Boot project.

Internally, Sleuth uses Brave, Zipkin's tracer library, to generate and report traces. Zipkin is an open-source project that was initially developed at Twitter and is now maintained by the Zipkin community and volunteers.

Tracing becomes a challenge in distributed systems when there is no unique reference to trace a particular activity across multiple systems and services.

Spring Cloud Sleuth addresses this challenge by generating and propagating trace headers automatically.

Each trace includes three key pieces of information:

  1. Application name: as configured in the yml file.
  2. TraceId: an id assigned to a single request or job.
  3. SpanId: an id assigned to a single unit of work. Multiple SpanIds can be traced under the same TraceId.
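
With Sleuth on the classpath, those three values appear between brackets in every log line. Here is an illustrative line; the ids are random, the logger name is hypothetical, and the exact layout depends on the Sleuth version:

```
2021-12-05 10:15:30.902  INFO [bar,6bfd228dc00d216b,6bfd228dc00d216b] 25922 --- [nio-8080-exec-1] c.e.demo.DemoController : handling request
```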

If one microservice calls another microservice, the trace context carries over into the called service.
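
As a minimal sketch, assuming the caller uses RestTemplate: Sleuth automatically instruments any RestTemplate declared as a bean, adding B3 trace headers (such as X-B3-TraceId and X-B3-SpanId) to outgoing requests so the downstream service joins the same trace.

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;

@Configuration
public class ClientConfig {

    // Declare RestTemplate as a bean so Sleuth can instrument it;
    // a RestTemplate created with `new` inside a method is not traced.
    @Bean
    public RestTemplate restTemplate() {
        return new RestTemplate();
    }
}
```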

Let’s see the implementation using a Spring Boot application (and compare it with an application that doesn’t use Sleuth).

If needed, check out the sample Spring Boot code here. Refer to this article to set up the workspace and project. You may use any Spring Boot application to follow this tutorial.

Start the application mentioned above and take a look at the logs it generates; without Sleuth, the log lines carry no trace information.

Stop the application and let’s make the following changes to add cloud dependencies.

1. Modify pom.xml

Note that if you are using a different codebase, pick the Spring Cloud version that corresponds to your Spring Boot version.
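
A minimal sketch of the pom.xml additions, assuming Maven and Spring Boot 2.6, whose matching Spring Cloud release train is 2021.0.x:

```xml
<dependencyManagement>
    <dependencies>
        <!-- The Spring Cloud BOM keeps the cloud dependency versions consistent -->
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>2021.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <!-- Pulls in Sleuth and its default Brave tracer -->
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
    </dependency>
</dependencies>
```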

2. Modify application.yml

Add the application’s name in the yml file; this name will be part of the trace that Sleuth generates.

The name “bar” is copied straight from the Spring docs; you may want to use a more meaningful name.
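
The relevant yml entry is just the application name:

```yaml
spring:
  application:
    name: bar
```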

3. Start the application

In the given example, when the service starts, a scheduler is triggered; after that, you can hit the GET/POST/DELETE APIs to test the trace ids (a sketch of such a scheduled job follows).
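
A rough sketch of such a scheduled job (the class and method names here are hypothetical). Sleuth wraps @Scheduled methods, so each execution starts its own trace:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class DemoScheduler {

    private static final Logger log = LoggerFactory.getLogger(DemoScheduler.class);

    // Sleuth instruments @Scheduled methods, so every execution is logged
    // with its own TraceId and SpanId.
    // (Requires @EnableScheduling on a configuration class.)
    @Scheduled(fixedRate = 30_000)
    public void poll() {
        log.info("Scheduled job running");
    }
}
```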

With each user request, a new TraceId and SpanId are generated, as in the illustrative log excerpt below.
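
Illustrative output for two separate requests; the ids are random and will differ in your logs:

```
INFO [bar,5f8ce7f4a3ab1c2d,5f8ce7f4a3ab1c2d] ... : GET /users handled
INFO [bar,9d21be04c77e5a10,9d21be04c77e5a10] ... : POST /users handled
```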

In distributed environments, a common practice is to generate a unique id and pass it along in a header to trace all the consecutive calls; all of that overhead can now be taken care of by Sleuth.
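
For contrast, here is a hypothetical sketch of the kind of hand-rolled correlation-id filter that practice required; the header name and MDC key are made up for illustration:

```java
import java.io.IOException;
import java.util.UUID;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.slf4j.MDC;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
public class CorrelationIdFilter extends OncePerRequestFilter {

    // Reuse the caller's id if present, otherwise mint a new one,
    // and expose it to the log pattern through the MDC.
    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain) throws ServletException, IOException {
        String id = request.getHeader("X-Correlation-Id");
        if (id == null || id.isEmpty()) {
            id = UUID.randomUUID().toString();
        }
        MDC.put("correlationId", id);
        try {
            chain.doFilter(request, response);
        } finally {
            MDC.remove("correlationId");
        }
    }
}
```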

Configuration is straightforward and the trace output is easy enough to comprehend.

The biggest challenge, though, with passing a unique header or maintaining a unique id is that traceability is lost in asynchronous calls. I do not see that issue being addressed in Sleuth either.