We’ve talked a lot recently about our experience decomposing our monolithic web applications and migrating towards a microservice architecture. In this post, the third part in our ongoing microservices series, we’ll talk about the technology and tools we have chosen to build our common microservice stack.
Choosing a Microservice Stack
As we talked about in our microservices pitfalls post, one of the most important things you should do before building microservices is settle on a microservices stack. A microservice stack is the consistent set of technologies with which all teams will develop their microservices. This is what Mark Richards referred to as a “base image”. As we’ll discuss below, failure to take this important step early on can have lasting impacts on your organization.
The HomeAdvisor microservice stack is based on a variety of open source projects that generally have great community support and proven track records across multiple industries. Choosing your own stack should be based on several criteria:
- Cost: Microservices typically require more CPU and server space than their monolithic counterparts, so use open source wherever possible. And don’t forget to consider both license and support costs.
- Support: Choose products that have active community involvement. Even if a project is not open source, you can often gauge how widely it is used by how frequently questions are asked and answered on community forums. Check Stack Overflow or other community boards to confirm that a project hasn’t been abandoned and that you won’t be one of its earliest adopters.
- Experience: Reduce friction by choosing tools that are written in languages your developers already know. Even better, favor those built on technology you already use in your monolith. Learning new tools and technology is great, but when it comes to the core of your microservice architecture, avoid the temptation to use bleeding-edge frameworks.
- Cloud Ready: If you plan to run your microservices on a third-party cloud service like AWS or App Engine, make sure your desired frameworks are supported. Some platforms, such as Heroku and App Engine, may limit which technologies and frameworks can be deployed, or which programming languages can be used.
There may be other criteria that make sense for your organization, but these are a good starting point. And who exactly should be part of the decision-making? In addition to your development staff, consider including other groups when choosing your stack:
- QA: The choice of technology will most likely play a role in how you test your microservices, so it might make sense to include them in the decision process. Automated testing is something we’re starting to dive more into at HomeAdvisor, and we’re working with our QA team to enhance our microservice stack to make building automated tests as easy as possible.
- DevOps: The people who push code around will almost certainly care about your microservice stack. They may plan to use a cloud service or container environment, which could have an impact on your technology choices.
- Product: In general, product owners don’t care how your stories get implemented, as long as they work. But they might care about aspects of the architecture that affect their ability to verify stories: logging, testing, data persistence, scripting, etc. Your choice of technology may affect one or more of these areas.
- Management: The people in charge of setting business goals and monitoring performance will be affected by your choices. They’re likely to look at the microservice stack from a different viewpoint: How much does it cost? Is it scalable? How long will it take to implement? It’s important to keep these stakeholders involved throughout, or else you risk making choices that won’t meet business needs and ending up back at square one.
The HomeAdvisor Microservice Stack
When we first began moving to microservices, our Instant Booking team was the first to build them. We had not yet finalized our microservice stack, but the Instant Booking team had its own deadlines and business needs and couldn’t wait for a standard to be adopted. So they blazed their own trail, using technology that was easy for their team to work with and that let them create microservices quickly. A few months later, the rest of the teams had adopted a standard microservice stack that benefited from shared knowledge and frequent updates and fixes, and that removed a lot of boilerplate code. While the Instant Booking team had a working set of microservices that met their needs, they had become the black sheep of our agile teams: their microservices had different deployment needs and used different frameworks for building and testing than everyone else’s. That was nearly two years ago, and to this day some remnants of those tools remain in our architecture. Periodically we have to allocate time in our development sprints to clean up this technical debt.
Based on our experience, a common microservice stack has many benefits. First, it simplifies the architecture and development processes. Having fewer tools to learn makes microservices easier to maintain and makes it easier for developers to move around and help other teams when needed. It also means consistent instructions for DevOps. Having a consistent set of procedures to build, test, and deploy every microservice has allowed new team members to acclimate much faster than if each team had its own needs. A standard microservice stack also leads to shared best practices and tribal knowledge: as developers learn more about the technologies, every team benefits from that knowledge. Even better, those best practices can be rolled into the microservice stack itself, where they are enforced and adopted with minimal friction. And as bugs are identified and fixed, teams simply pull the latest code to stay up to date. All of these benefits can be realized only when you commit to a standard microservice stack for all teams.
Hopefully you’re convinced that creating a standard microservice stack is important. The rest of this post will focus on the tools we chose at HomeAdvisor for our common microservice stack.
Spring Boot With Embedded Tomcat and MVC
At the heart and soul of each of our microservices is Spring Boot. This is a lightweight framework for building Java applications using various Spring and third party projects. Spring Boot takes an opinionated view of applications and only includes the libraries you absolutely need. This keeps applications lightweight and quick, and makes it easy to include new features as needed. We find that even our biggest microservices start up in less than 30 seconds.
On top of the Spring Boot framework, we layer both embedded Tomcat and Spring MVC. We chose Tomcat over another embedded HTTP server such as Netty simply because we already use Tomcat with Spring MVC in our monolith. Using the same classes and constructs in our microservices made the transition much easier for developers.
One of the things we really like about Spring Boot is its auto configurations. All of our microservices get a basic set of components right out of the box, without a single line of code or configuration. For example, they all register with our service discovery system on startup, and they all get basic health checks for memory usage and other application-level metrics.
But the real power behind Spring Boot auto configurations is a rich set of conditional logic constructs that let us tailor functionality for each microservice. For example, we can instantiate and register ElasticSearch health checks only for applications that perform operations on our ElasticSearch cluster. Or, if we detect one of our Robusto API clients, we can create a Hystrix metrics publisher, which we can then attach to the Hystrix dashboard. The ability to configure all of these components without a single line of configuration makes it really easy for developers to write microservices.
Using Spring Boot with Spring MVC also allows us to provide standard functionality for all of our microservices. For example, the base image for microservices includes interceptors that can examine or manipulate API calls. This allows us to do things like log every request and response to our logging platform (neglecting this is another important microservice pitfall you should avoid), implement common security checks, or validate request parameters. Basically, anything we don’t want each team to implement on their own, we build into the standard microservice stack.
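Spring Boot expresses these conditions declaratively, for example with its `@ConditionalOnClass` annotation. As a rough, plain-Java illustration of the idea (not how Spring Boot implements it), the sketch below registers a health check only when a marker class can be loaded from the classpath. The `ConditionalHealthChecks` class and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: register a health check only if a "marker" class is on the classpath,
// loosely mimicking what Spring Boot's @ConditionalOnClass does declaratively.
final class ConditionalHealthChecks {
    private final Map<String, Supplier<Boolean>> registered = new LinkedHashMap<>();

    /** Returns true if the named class can be loaded (without running its static initializers). */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ConditionalHealthChecks.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Registers the check under checkName only when markerClass is present; returns whether it did. */
    boolean registerIfPresent(String markerClass, String checkName, Supplier<Boolean> check) {
        if (!isPresent(markerClass)) {
            return false;
        }
        registered.put(checkName, check);
        return true;
    }

    Map<String, Supplier<Boolean>> registered() {
        return registered;
    }
}
```

Passing `false` as the `initialize` argument to `Class.forName` checks for the class without running its static initializers, which keeps the probe cheap and free of side effects.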
Apache ZooKeeper and Curator
Apache ZooKeeper is a distributed service for maintaining configuration data and coordinating distributed systems. Similar to a Unix file system, data is stored in a hierarchical set of nodes known as ZNodes. Each ZNode can have zero or more child ZNodes, along with its own set of permissions. On its own, ZooKeeper is just a way to store arbitrary data in a distributed and highly available fashion. What makes it really powerful is the Apache Curator library, which abstracts away much of the boilerplate code required for managing connections and manipulating data. Curator also provides what it calls recipes, which are APIs that implement common design patterns.
At HomeAdvisor, we use Curator’s Service Discovery recipe as the core of our discovery and registration infrastructure. Using Spring Boot auto configurations, all of our microservices register with ZooKeeper during startup. This auto configuration gathers all the information Curator needs, such as host, port, and name, and also grabs some additional information specific to our environment, like JMX port and application version. This data stays in ZooKeeper as long as the microservice is running. Once the microservice terminates, gracefully or not, ZooKeeper cleans up the registration automatically, because it is stored as an ephemeral ZNode that is removed when the service’s ZooKeeper session ends. This ensures all of our microservices have a relatively recent view of other services with minimal lag.
When it comes time to discover another microservice, we use a number of Curator constructs. First, we use a Down Instance Manager that keeps track of errors that occur while calling other services. If too many errors occur on a particular instance, Curator automatically ignores that instance for a brief period, hopefully giving it time to recover. We also use a number of service filters to eliminate certain instances from discovery. For example, if a developer introduces a breaking change to their microservice, we can instruct service discovery to ignore that particular version or host. Most importantly, we use Curator’s round-robin policy so that whenever multiple instances of a microservice are running, requests are spread across all available instances rather than overloading a single one.
Apache Kafka
Apache Kafka is a distributed commit log that can replace traditional messaging systems such as JMS. Rather than distinguishing between queues and topics, Kafka simply defines everything as a topic. Using consumer groups, a topic can behave as either point-to-point (queue) or publish-subscribe (topic): consumers in the same group split a topic’s partitions among themselves, while every group receives every message. Kafka is designed to be fast, scalable, and fault tolerant, and messages are persisted, so they can be re-read even after they’ve been consumed.
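Curator supplies these pieces as part of its service discovery extension; the standalone sketch below only illustrates how the three ideas fit together: instances that accumulate errors are skipped for a cool-down period, filters remove unwanted versions or hosts, and the remaining instances are chosen in round-robin order. `ServiceInstance`, `InstanceSelector`, and the threshold and cool-down parameters are hypothetical and do not mirror Curator’s API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical view of a registered instance (host, port, and application version).
record ServiceInstance(String host, int port, String version) {}

// Hypothetical selector combining error-based "down" tracking, filters, and round robin.
final class InstanceSelector {
    private final int errorThreshold;
    private final Duration downTime;
    private final List<Predicate<ServiceInstance>> filters = new ArrayList<>();
    private final Map<ServiceInstance, Integer> errorCounts = new HashMap<>();
    private final Map<ServiceInstance, Instant> downUntil = new HashMap<>();
    private int next = 0;

    InstanceSelector(int errorThreshold, Duration downTime) {
        this.errorThreshold = errorThreshold;
        this.downTime = downTime;
    }

    synchronized void addFilter(Predicate<ServiceInstance> filter) {
        filters.add(filter);
    }

    /** Records a failed call; after errorThreshold failures the instance is skipped for downTime. */
    synchronized void noteError(ServiceInstance instance, Instant now) {
        int count = errorCounts.merge(instance, 1, Integer::sum);
        if (count >= errorThreshold) {
            downUntil.put(instance, now.plus(downTime));
            errorCounts.remove(instance);
        }
    }

    /** Returns the next eligible instance in round-robin order, or empty if none qualify. */
    synchronized Optional<ServiceInstance> select(List<ServiceInstance> registered, Instant now) {
        List<ServiceInstance> eligible = new ArrayList<>();
        for (ServiceInstance instance : registered) {
            Instant until = downUntil.get(instance);
            boolean down = until != null && now.isBefore(until);
            boolean accepted = filters.stream().allMatch(filter -> filter.test(instance));
            if (!down && accepted) {
                eligible.add(instance);
            }
        }
        if (eligible.isEmpty()) {
            return Optional.empty();
        }
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next++, eligible.size());
        return Optional.of(eligible.get(index));
    }
}
```

Because the eligible list is rebuilt on every call, an instance that has served its cool-down rejoins the rotation automatically; a production version would also need to forget stale error counts over time.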
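The queue-versus-topic behavior falls out of how partitions are assigned. The toy model below is a simplification and not the Kafka client API: it places message `i` in partition `i % partitions` (real producers typically partition by key), gives each partition to exactly one consumer per group, and delivers every message once to each group. `ConsumerGroupModel` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of Kafka consumer-group delivery (hypothetical; not the Kafka client API).
final class ConsumerGroupModel {
    /**
     * Delivers messages to every group. Message i lands in partition i % partitions, and
     * partition p is owned by consumer p % groupSize within each group.
     *
     * @return group name -> consumer index -> messages that consumer received, in order
     */
    static Map<String, Map<Integer, List<String>>> deliver(
            List<String> messages, int partitions, Map<String, Integer> groupSizes) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be >= 1");
        }
        Map<String, Map<Integer, List<String>>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> group : groupSizes.entrySet()) {
            int groupSize = group.getValue();
            if (groupSize < 1) {
                throw new IllegalArgumentException("each group needs at least one consumer");
            }
            Map<Integer, List<String>> byConsumer = new LinkedHashMap<>();
            for (int i = 0; i < messages.size(); i++) {
                int partition = i % partitions;
                int consumer = partition % groupSize; // exactly one owner per partition per group
                byConsumer.computeIfAbsent(consumer, k -> new ArrayList<>()).add(messages.get(i));
            }
            result.put(group.getKey(), byConsumer);
        }
        return result;
    }
}
```

With one group of several consumers, the topic behaves like a queue whose work is shared; with several single-consumer groups, every group receives every message, as with publish-subscribe. A group with more consumers than partitions leaves some consumers idle, which matches Kafka’s behavior.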
At HomeAdvisor, we have two main use cases for Kafka in our architecture:
- Logging: All of our microservices that expose HTTP endpoints use Kafka to publish details every time they send a response. This is done asynchronously after the response is sent back to the client, so it does not add latency to the request. We log details such as system time, request URI, client IP, response status code, and correlation ID. This lets us run simple SQL queries to find interesting details about our APIs, such as which APIs are called most frequently (or not at all), which ones have high error rates, and more.
- Asynchronous Processing: Whenever we want to offload processing-intensive tasks that don’t require acknowledgement, we typically use Kafka. For example, if we have an ElasticSearch index that needs updating, we’ll use a Kafka topic to handle it. Whenever data that affects the index changes, a message is published on the topic. A consuming microservice listens on the topic and performs the necessary work. This allows the consumer to work at its own pace while messages accumulate in the topic, and makes it easy to add new instances of the microservice to churn through the backlog if needed.
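The Logging use case above can be sketched as follows, assuming a JSON payload: the request thread builds an event and hands it to a background executor, so publishing never delays the response. `AccessLogEvent`, `AccessLogPublisher`, and the field names are hypothetical, and the `Consumer<String>` sink stands in for a real Kafka producer (whose own `send` method is already asynchronous).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical access-log event carrying some of the fields described above.
record AccessLogEvent(long timestampMillis, String requestUri, String clientIp,
                      int statusCode, String correlationId) {

    /** Renders a simple JSON string; assumes field values need no escaping. */
    String toJson() {
        return String.format(
            "{\"timestamp\":%d,\"uri\":\"%s\",\"clientIp\":\"%s\",\"status\":%d,\"correlationId\":\"%s\"}",
            timestampMillis, requestUri, clientIp, statusCode, correlationId);
    }
}

// Hypothetical publisher that moves serialization and sending off the request thread.
final class AccessLogPublisher {
    private final Executor executor;
    private final Consumer<String> sink; // stand-in for a Kafka producer sending to a log topic

    AccessLogPublisher(Executor executor, Consumer<String> sink) {
        this.executor = executor;
        this.sink = sink;
    }

    /** Returns immediately; the future completes once the sink has accepted the event. */
    CompletableFuture<Void> publish(AccessLogEvent event) {
        return CompletableFuture.runAsync(() -> sink.accept(event.toJson()), executor);
    }
}
```

Callers on the request path would normally ignore the returned future; it is exposed mainly so tests and shutdown hooks can wait for pending events.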
In fact, because Kafka is such a critical piece of our infrastructure, we even created our own Kafka monitoring tool.
Robusto
Typically, when a microservice exposes HTTP endpoints or listens for messages, you’ll need to write some sort of client library to facilitate interacting with it. For the same reasons you should standardize the entire microservice stack, at HomeAdvisor we created a framework just for building API clients. This client framework, Robusto, wraps remote calls using both Hystrix and Spring Retry, which together provide fault tolerance and configurable retry policies for failures. You can read more about the Robusto framework on our GitHub page.
Dropwizard Metrics
We use Dropwizard Metrics (formerly Codahale Metrics) to track both metrics and health in our microservices. Metrics give us the ability to collect arbitrary data at runtime. Data can be collected in a number of useful data structures such as counters, gauges, and histograms. All of these metrics are managed by a MetricRegistry class, which provides a single point of access for collecting and querying different metrics within the JVM. Some examples of metrics we collect at HomeAdvisor are Kafka consumer rates, API response times and failure rates, and JVM state. Each microservice is also free to add metrics that make sense for its needs.
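To make the retry half of that idea concrete, here is a minimal, hypothetical retry wrapper with a fixed attempt limit and exponential backoff. It is not the Robusto or Spring Retry API, and it omits the circuit breaking and timeouts that Hystrix adds; `RetryingCall` and its parameters are illustrative assumptions.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical retry wrapper: fixed attempt limit with exponential backoff between attempts.
// Not the Robusto or Spring Retry API, and it has no circuit breaker (which Hystrix would add).
final class RetryingCall {
    private final int maxAttempts;
    private final Duration initialBackoff;
    private final double multiplier;

    RetryingCall(int maxAttempts, Duration initialBackoff, double multiplier) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.initialBackoff = initialBackoff;
        this.multiplier = multiplier;
    }

    /** Runs the call, retrying on any exception; rethrows the last failure when attempts run out. */
    <T> T call(Callable<T> remoteCall) throws Exception {
        long backoffMillis = initialBackoff.toMillis();
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return remoteCall.call();
            } catch (Exception e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis); // wait before the next attempt
                    backoffMillis = (long) (backoffMillis * multiplier);
                }
            }
        }
        throw lastFailure; // non-null: the loop runs at least once and every failure sets it
    }
}
```

Retrying is only safe for idempotent calls, such as reads; a real policy would also decide which exception types are worth retrying.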
The second aspect of monitoring is health. Using Dropwizard, applications create multiple HealthCheck classes, which are managed by a central HealthCheckRegistry class. Health checks are incredibly flexible and simply report an UP or DOWN result. A single health check can report on any number of things:
- JVM (memory pools, thread counts, garbage collection, etc.)
- Server or Operating System (disk usage, network I/O, etc.)
- External connectivity (database, third-party systems, etc.)
- Metrics from the metric registry
Basically, any state that can be observed can be wrapped in a health check. The overall health of the microservice is simply the worst result among its individual health checks. At HomeAdvisor, we have strayed a little from the core health check semantics. We found the binary nature of UP or DOWN a little too restrictive, so we’ve introduced an in-between state that we call SICK. The main difference between DOWN and SICK for us is that the former alerts our on-call staff for immediate attention, while the latter is considered non-urgent.
Using Spring Boot auto configurations, we create singleton metric and health check registries that are available for autowiring in all of our microservices. We also expose those registries as JMX managed beans (MBeans) through Spring, which allows external tools to ask each microservice for specific metrics or its current overall health. At HomeAdvisor, our primary tool for monitoring and alerting is Nagios, but we also have a custom application that uses these JMX endpoints to visually display microservice health.
Every microservice gets a default set of health checks for free, without any configuration or setup:
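The three-level scheme and the worst-result rule can be sketched with an ordered enum. `HealthStatus`, `HealthAggregator`, and the choice to report UP when no checks are registered are assumptions for illustration; Dropwizard’s own `HealthCheck.Result` is simply healthy or unhealthy.

```java
import java.util.Collection;
import java.util.Comparator;

// Hypothetical three-level status, declared from best to worst so a higher ordinal means worse health.
enum HealthStatus {
    UP, SICK, DOWN;

    /** Only DOWN pages on-call staff; SICK is reported but treated as non-urgent. */
    boolean pagesOnCall() {
        return this == DOWN;
    }
}

final class HealthAggregator {
    /** Overall health is the worst individual result; an empty collection is treated as UP. */
    static HealthStatus overall(Collection<HealthStatus> results) {
        return results.stream().max(Comparator.naturalOrder()).orElse(HealthStatus.UP);
    }
}
```

Relying on declaration order keeps the aggregation rule to a single line, at the cost of making the enum order significant: a new state must be inserted at the position that matches its severity.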
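Spring can export beans to JMX for you (for example through its `MBeanExporter`); the sketch below shows the underlying JDK mechanism instead, using a standard MBean registered on the platform MBean server. The `HealthJmx` class, the `OverallHealth` attribute, and the `com.example.health:type=HealthReporter` object name are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical holder for a standard MBean that exposes overall health over JMX.
final class HealthJmx {
    // Standard MBean convention: the interface must be public and named <ImplementationClass>MBean.
    public interface HealthReporterMBean {
        String getOverallHealth(); // exposed as the read-only attribute "OverallHealth"
    }

    public static final class HealthReporter implements HealthReporterMBean {
        private final Supplier<String> status;

        public HealthReporter(Supplier<String> status) {
            this.status = status;
        }

        @Override
        public String getOverallHealth() {
            return status.get(); // e.g. the result of aggregating all registered health checks
        }
    }

    /** Registers a reporter on the platform MBean server under a hypothetical object name. */
    static ObjectName register(Supplier<String> status) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.example.health:type=HealthReporter");
        server.registerMBean(new HealthReporter(status), name);
        return name;
    }

    /** Reads the attribute back through the MBean server, as a JMX client would. */
    static String read(ObjectName name) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        return (String) server.getAttribute(name, "OverallHealth");
    }
}
```

An external tool connecting over JMX would see the same `OverallHealth` attribute, because the standard MBean convention derives attribute names from the interface’s getter methods.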
- Ping: This is a simple health check that always returns UP. Basically, the fact that it returns anything is a simple indication that the process is running.
- Memory Leak: This health check keeps track of the JVM heap after each garbage collection, specifically the old generation, and warns when it continues to grow after multiple garbage collections.
- Errors: This health check monitors the LOG4J appenders in the microservice and warns if the number of ERROR messages exceeds some threshold.
- ZooKeeper: This is a basic connectivity check for our ZooKeeper cluster to ensure the microservice can both register itself and discover other services.
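The Memory Leak check’s heuristic can be sketched as a small state machine fed with old-generation usage measured right after each collection. In a running JVM those numbers are available from `MemoryPoolMXBean.getCollectionUsage()` for the old-generation pool, whose name varies by garbage collector. The `OldGenGrowthDetector` class and its threshold are hypothetical.

```java
// Hypothetical detector: warns when post-GC old-generation usage has grown
// for a configurable number of consecutive collections.
final class OldGenGrowthDetector {
    private final int growthsBeforeWarning;
    private long previousBytes = -1; // -1 means no sample recorded yet
    private int growthStreak = 0;

    OldGenGrowthDetector(int growthsBeforeWarning) {
        if (growthsBeforeWarning < 1) {
            throw new IllegalArgumentException("growthsBeforeWarning must be >= 1");
        }
        this.growthsBeforeWarning = growthsBeforeWarning;
    }

    /** Records one post-GC sample in bytes and returns true if a leak is suspected. */
    boolean record(long oldGenBytesAfterGc) {
        if (previousBytes >= 0 && oldGenBytesAfterGc > previousBytes) {
            growthStreak++;
        } else {
            growthStreak = 0; // any flat or shrinking sample resets the streak
        }
        previousBytes = oldGenBytesAfterGc;
        return growthStreak >= growthsBeforeWarning;
    }
}
```

Requiring a streak rather than a single increase avoids false alarms from normal heap churn; a real check would likely also compare usage against the pool’s maximum size.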
In addition to these “always on” health checks, we provide several more health checks that are enabled only under certain conditions within the microservice. Using Spring Boot’s rich set of conditional annotations, we enable additional health checks based on a number of criteria: the presence of a particular class on the classpath, a configuration setting in the environment, etc. Some examples of these additional health checks:
- Data Source: If the microservice uses a database connection, this health check will periodically run a test query on the data source to confirm connectivity to the RDBMS.
- Coherence: We typically front our database reads with distributed in-memory caches to help ease load on our database. For microservices that utilize one or more of these caches, this health check alerts if the microservice is unable to connect to the cache cluster.
- ElasticSearch: For microservices that use ElasticSearch, this health check monitors the health of whichever indices the application uses. If any of those indices is ever yellow or red, this health check alerts accordingly.
- Kafka: If a microservice consumes from one or more of our Kafka topics, it gets a health check that monitors several aspects of each topic and its message consumption: number of message failures, partition lag, and low message rates. Any of these conditions can trigger an alert for any Kafka topic the microservice consumes from.
- Clients: Any client library that is built using our Robusto framework automatically exposes a health check that alerts when the number of Hystrix failures exceeds some threshold. When these client libraries are included in a microservice, their health checks are automatically registered with the health check registry.
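Partition lag, one of the Kafka signals listed above, is the gap between a partition’s log-end offset and the offset the consumer group has committed; the Kafka consumer client can report both. The sketch below turns the worst per-partition lag into a status. `PartitionOffsets`, `LagStatus`, `KafkaLagCheck`, and the two thresholds are hypothetical.

```java
import java.util.Map;

// Hypothetical snapshot of one partition: latest offset written vs. offset committed by the group.
record PartitionOffsets(long logEndOffset, long committedOffset) {}

// Hypothetical status levels mirroring the UP / SICK / DOWN scheme described earlier.
enum LagStatus { UP, SICK, DOWN }

final class KafkaLagCheck {
    private final long sickLag;
    private final long downLag;

    KafkaLagCheck(long sickLag, long downLag) {
        if (sickLag > downLag) {
            throw new IllegalArgumentException("sickLag must not exceed downLag");
        }
        this.sickLag = sickLag;
        this.downLag = downLag;
    }

    /** Messages written but not yet committed; clamped at zero. */
    static long lag(PartitionOffsets offsets) {
        return Math.max(0, offsets.logEndOffset() - offsets.committedOffset());
    }

    /** Maps the worst lag across partitions to a status. */
    LagStatus evaluate(Map<Integer, PartitionOffsets> byPartition) {
        long worst = 0;
        for (PartitionOffsets offsets : byPartition.values()) {
            worst = Math.max(worst, lag(offsets));
        }
        if (worst >= downLag) return LagStatus.DOWN;
        if (worst >= sickLag) return LagStatus.SICK;
        return LagStatus.UP;
    }
}
```

Lag is only one of the signals listed above; failure counts and message rates would need their own checks.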
And finally, each microservice can create its own health checks and register them with the health check registry. This allows each microservice to report health based on the metrics or external services that matter to it. For example, a big part of our Instant Booking platform is the ability to synchronize service professionals’ calendars with Google and other calendar providers. The microservices responsible for synchronizing calendars create customized health checks that alert whenever we cannot reach the external calendar provider or when there are too many errors during synchronization.
Apache Maven
We use Apache Maven for two primary purposes. The first is to manage each software repository as a project. The basic unit within Maven is the Project Object Model, or POM. All of our Git repositories use POM files to express project metadata such as name, version, and description. We also use POM files to manage dependencies between libraries. Maven also provides a default lifecycle with phases such as compile, test, package, and deploy, which makes it really easy to create a consistent development and deployment process across every repository and environment.
In addition to project management, we maintain a number of Maven archetypes to quickly create and initialize new microservices. Whenever a team needs to create a new microservice, they request a new Git repository from our DevOps team. The DevOps team initializes the repository using one of our custom archetypes, which seeds it with the minimum directory structure and set of files appropriate for the microservice. For example, not all of our microservices need to support HTTP endpoints, so those that don’t will not include embedded Tomcat in their project dependencies. Having only the required dependencies in our microservices allows us to create tailored auto configurations and health checks for each microservice and avoid the one-size-fits-all approach of most monolithic systems.
Jenkins
We maintain a variety of Jenkins jobs to assist with development, deployment, and maintenance. All of our code is verified by QA in a sandbox environment, and each sandbox host has a standard set of Jenkins jobs for building libraries and starting and stopping microservices. Developers use these sandbox Jenkins jobs to build their stories and associated libraries, as well as to manage the microservices and web applications. Once code has passed in sandbox, changes for each repository are pushed to master, where another set of Jenkins jobs takes over to cut official releases of each library. Once a sprint has ended and the release train is ready, Jenkins is used to promote code through our different QA environments, where everything is tested together in an integrated fashion before eventually being promoted into production. After code is in production, we have a dedicated set of Jenkins jobs to help manage production resources. For example, we can bounce microservices or entire hosts if services start to experience problems.
Spock Framework
In addition to JUnit, over the past year or so we’ve adopted the Spock Framework for writing both unit and integration tests for microservices. In fact, we love it so much, we’ve also started using it in our monolithic web applications when writing new tests. However, Spock wasn’t actually our first choice when it came to standardizing unit and integration tests.
Our Instant Booking team initially chose RSpec for their microservices, and for a little while it worked perfectly fine. However, as more teams began writing microservices, it became clear we needed to standardize on a different test framework. RSpec is a Ruby framework, and most of our developers had no experience with Ruby. This made writing and maintaining tests difficult for all but a few developers. We also found the integration with IntelliJ, our IDE of choice, to be less than ideal: developers often spent multiple days trying to configure the RSpec runtime and required gems. After evaluating Spock for a brief period, it was clear that adoption among our Java developers would be much higher than with RSpec. Because Spock is built on Groovy, a JVM language, its syntax was much more familiar, and we have access to all of our favorite Java classes when writing tests.
It took a fair amount of prototyping and a little trial and error to settle on a standard microservice stack, but now that we have, creating and maintaining our microservice architecture is second nature. Every team benefits from shared knowledge and frequent updates of the common components. Every microservice gets a consistent and predictable environment in which to run, and our DevOps team doesn’t have to maintain different deployment instructions for each team. Having each team on a standard set of technologies and tools has accelerated our ability to build new services and meet business goals. If your organization is still in the early stages of microservices or even just considering them, make sure that standardization is a priority.