In our last post, we introduced our villainous software monolith and explained its origin story. To recap, in the early days of growing from a small startup into a large software organization, it made sense to clump all our code together. Builds were less complex (always build everything), software dependencies were never an issue (all the code you need is in one place), and server space was expensive (deploy everything to a few hosts). Life was good.
But times have changed, and just like the pet rock, not all ideas are good forever. We’re no longer a small band of developers trying to build a few applications to secure some more VC. We’ve grown like crazy and hired a ton of people over the past 15 years, and our customer base has grown to levels we never imagined in those early days. On our busiest days we connect upwards of 50,000 consumers to service professionals. And with the introduction of products like Instant Connect and Instant Booking, the demand on our systems will continue to grow.
The monolith has served us well, but it was time to evolve. We needed a more flexible architecture that could meet the needs of our growing network of homeowners and service professionals. One that lets us fully decouple components, while still letting us share common functionality. One that scales horizontally to meet growing demand. One that lets us onboard new developers and make them productive in hours, not days or weeks. One that creates a clear delineation of responsibilities within the architecture.
You might be wondering to yourself, “What architectural style could possibly achieve such a utopian vision?” In case it isn’t obvious by now, we believe the answer is microservices.
What Are Microservices?
So what exactly are microservices? No single definition exists, but Martin Fowler sums them up quite nicely:
An approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery.
If a monolith is a single, large software application that does many things, a microservice, by contrast, is a small application that does a few things, preferably one thing. Instead of a monolithic application doing everything, larger goals in the architecture are accomplished via the interaction of multiple microservices. This interaction between microservices typically happens over well-defined standards such as RESTful web services or distributed messaging.
As an example, consider our monolithic consumer-facing web application. The entire application handles every aspect of the consumer’s experience with our service (at least, it used to before we began our migration to microservices). Let’s consider a particular path through the system: a consumer trying to book an appointment. It should be pointed out that this example is slightly contrived, since in reality we’re somewhere in between software monolith misery and pure microservice nirvana, but it makes for a good example.
In a monolithic world, a single application would handle every step of this:
- Loading a list of tasks for the consumer.
- Finding a service professional that performs the desired work in the consumer’s location.
- Presenting available times from the service professional’s calendar (including syncing with external calendars such as Google Calendar).
- Selecting an appointment time and notifying the service professional.
- Creating an account for the user, or letting them log in if they already have one.
- Notifying the consumer when the service professional confirms the appointment.
- Sending reminders to both the consumer and service professional as the appointment approaches.
That’s a lot of responsibility for a single application, and it’s just one of the many user interactions that could occur. Building, testing, and deploying an application of this size is time-consuming. What happens if we get a flood of visitors trying to log in? We could potentially bog down an entire instance just trying to service those requests, impacting someone who is already logged in and trying to book an appointment. Or what if we find a bug in the appointment module? We’re stuck building and testing the entire application, and then bouncing each instance in production to pick up the code fix. And let’s not forget the poor developer who is asked to create some new functionality for account management. We would point him at the thousands of Java classes that make up the monolith and wish him luck.
Clearly none of this is ideal. So let’s rethink this series of actions from a microservice viewpoint. A first cut of decomposing functionality might result in the following microservices:
- Task: Knows about all the tasks in our system that a user can choose. Allows us to quickly add new tasks or decommission old ones on the fly.
- Service Professional: Manages all aspects of a service professional’s profile, and lets us query for service professionals by task and location.
- Appointment: Provides services for finding and booking appointments.
- Calendar Sync: Works with external calendar providers to ensure the appointment service always knows the latest availability for a service professional.
- Account: Allows us to create and manage user accounts. It could also handle login and authentication, or we could put that part into another microservice on its own.
- Notification: Services for sending consumers and service professionals e-mails, SMS, push notifications, etc.
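To make this decomposition concrete, here is a minimal Java sketch of how the booking path might be composed from a few of these services. The interface names and method signatures are hypothetical, and the calls are in-process for brevity; in the actual architecture each service would run in its own process and be reached over HTTP or Kafka.

```java
import java.util.List;

// Hypothetical service boundaries; in production each would live in its own process.
interface TaskService {
    boolean isValidTask(String taskId);
}

interface ProfessionalService {
    // IDs of service professionals who perform the task near the given zip code.
    List<String> findProfessionals(String taskId, String zipCode);
}

interface AppointmentService {
    // Books a time slot and returns an appointment ID.
    String book(String proId, String consumerId, String timeSlot);
}

interface NotificationService {
    void send(String recipientId, String message);
}

// Orchestrates one user path (booking) by composing the narrower services.
class BookingFlow {
    private final TaskService tasks;
    private final ProfessionalService pros;
    private final AppointmentService appointments;
    private final NotificationService notifications;

    BookingFlow(TaskService tasks, ProfessionalService pros,
                AppointmentService appointments, NotificationService notifications) {
        this.tasks = tasks;
        this.pros = pros;
        this.appointments = appointments;
        this.notifications = notifications;
    }

    String bookFirstAvailable(String consumerId, String taskId, String zipCode, String timeSlot) {
        if (!tasks.isValidTask(taskId)) {
            throw new IllegalArgumentException("Unknown task: " + taskId);
        }
        List<String> candidates = pros.findProfessionals(taskId, zipCode);
        if (candidates.isEmpty()) {
            throw new IllegalStateException("No professionals for " + taskId + " in " + zipCode);
        }
        String proId = candidates.get(0);
        String appointmentId = appointments.book(proId, consumerId, timeSlot);
        notifications.send(proId, "New appointment " + appointmentId);
        return appointmentId;
    }
}
```

Because each dependency is an interface, a test can pass in-memory stubs for the other services, which mirrors how a single microservice can be developed and tested without standing up the rest of the system.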
This already feels cleaner than the software monolith, but let’s take a closer look at the benefits of the microservice approach.
Microservices Are Lightweight
By design, microservices should be lightweight and therefore occupy a very small memory and CPU footprint compared to a monolithic application. As a result, when demand for one component increases, the system can easily scale to meet that demand. New instances of a service can be spun up quickly and without perturbing other components. This is particularly beneficial in cloud environments where you pay per cycle or byte. Running fewer cycles, and only when you need them, can translate to real cost savings.
The lightweight nature of microservices also makes them great candidates for automation. At HomeAdvisor we use Jenkins to drive our continuous builds and not-so-continuous deliveries to production, which are greatly simplified with microservices. Libraries build faster, test failures are identified sooner, and services are deployed rapidly. And when it comes time to push code to production, we can set up a rolling bounce of services to ensure new code is picked up and no downtime is required.
Going back to our example, if we get a flood of login requests we simply spin up more instances of the microservice that handles that functionality, and the users already logged in and browsing the site are none the wiser. And if we do find a bug in the appointment module, we simply patch that microservice, run its regression tests, and do a rolling bounce of those services, all without impacting other pieces of the architecture.
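A rolling bounce can be sketched as a simple loop: restart one instance, wait until it reports healthy, and only then move on, so the rest of the pool keeps serving traffic. This is a minimal sketch under assumptions; `Instance`, `restart()` and `isHealthy()` are hypothetical stand-ins for whatever deployment tooling a Jenkins job would actually call.

```java
import java.util.List;

// Hypothetical handle to one running copy of a microservice.
interface Instance {
    String name();
    void restart();       // stop, pick up the new build, start again
    boolean isHealthy();  // e.g. backed by the service's health checks
}

class RollingBounce {
    // Restarts instances one at a time, waiting for each to report healthy
    // before touching the next. Halts if an instance is still unhealthy
    // after maxChecks polls, leaving the remaining instances untouched.
    static int bounce(List<Instance> instances, int maxChecks, long pollMillis) {
        int restarted = 0;
        for (Instance instance : instances) {
            instance.restart();
            int checks = 0;
            while (!instance.isHealthy()) {
                if (++checks >= maxChecks) {
                    throw new IllegalStateException(
                        instance.name() + " did not become healthy; halting rollout");
                }
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Rollout interrupted", e);
                }
            }
            restarted++;
        }
        return restarted;
    }
}
```

Halting on the first unhealthy instance is one design choice among several; it keeps the rest of the pool on the previous build so traffic keeps flowing while someone investigates.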
Separation of Responsibilities
Microservices typically fall on data or functional boundaries. In fact, as a rule of thumb, microservices should be no bigger than your head. This creates a natural separation of responsibilities among the components, which also leads to well-defined roles among different development teams. Having a clear separation of responsibilities makes it easier for a developer to understand the scope of a microservice. This leads to shorter ramp-up times for new team members because they can focus on a few things rather than being dropped into the middle of a monolithic maze. A rookie developer now has to acclimate to dozens of class files instead of hundreds or thousands.
Additionally, microservices manage only the data they care about. This means data access can be optimized per service, for example using a distributed cache or a denormalized document store such as Elasticsearch. This allows each microservice owner to focus on their specific data needs, and allows new technology to be introduced into the architecture with low risk and easy rollback if needed. It also prevents unnecessary library dependencies in other services. For example, if you don’t use Elasticsearch in your microservice, then you don’t need it on your classpath just to satisfy some other component in the system.
Changes in Isolation
Microservices are a software tester’s best friend. When a bug is discovered in a monolith, the software QA effort alone can be daunting. Identifying the affected paths through the application and the appropriate level of regression is a bit of a moving target, and the cost of deploying a monolith means there are heavy penalties if things don’t go right the first time. With a microservice, fixes can be tested quickly and the updated service deployed rapidly. Running an entire microservice regression will be much quicker than it would be with a software monolith. And on the off chance that the fix doesn’t work, a microservice can be rolled back or re-patched much more easily than a monolith. This should make any DevOps engineer or software tester sing with joy.
The Case Against Microservices
For all the benefits that microservices offer, it’s unrealistic to think any software architecture will solve all your problems. If microservices were the silver bullet we have been searching for all these years, everyone would use them and software architects would be out of work. But as with any software architecture, microservices solve many problems while introducing new ones. At HomeAdvisor, our brief experience with microservices has revealed several pain points. Nothing that’s made us change course and beg the monolith for forgiveness, but there have certainly been a few hurdles that are worth mentioning.
Software Repository Management
Our software monolith lives in a single git repository as a multi-module Maven project. Every module is versioned at the same time, making software builds and dependency management quite easy. Even when a change touches code in only one module, say the consumer web application, every other module gets the same new version. And since every module has the same version, you never have to worry about choosing the “right version” for your needs.
But as we pull more and more functionality out of the monolith and into separate repositories, we find ourselves with more git repositories than we know what to do with. Each repository has its own version, and multiple developers may change it within the same development cycle. When multiple pieces of the monolith require one of these external modules, we often find conflicting versions, which typically don’t manifest until the code reaches an integration environment.
There are several ways to deal with this pain point. First, break apart the monolith. This may seem obvious, and even counterintuitive given the topic of this section, but the reality is that library conflicts are almost inevitable as long as large pieces of your architecture are tightly coupled. Continuing to detangle modules will always be the best long-term strategy. Second, if you cannot achieve total monolithic destruction in one commit, make sure your changes are backwards compatible whenever possible. This means deprecating or overloading functions instead of deleting them, leaving class names and packages unchanged, and following semantic versioning to make it clear to users of your library when it’s safe to upgrade and when caution should be exercised.
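Two of these habits can be sketched in a few lines of Java. `NotificationClient` shows deprecate-and-overload: the old signature stays, delegates to the new one, and is flagged with `@Deprecated`, so existing callers keep compiling. `SemVer` is a simplified reading of semantic versioning (MAJOR.MINOR.PATCH only, ignoring pre-release tags, build metadata, and the special 0.x rules): an upgrade is treated as safe when the major version is unchanged and the new version is not older. Both class names are hypothetical.

```java
// Deprecate-and-overload: old callers keep compiling, new callers get the richer API.
class NotificationClient {
    enum Channel { EMAIL, SMS, PUSH }

    /** @deprecated use {@link #send(String, String, Channel)}; kept so existing callers still compile. */
    @Deprecated
    String send(String recipient, String message) {
        return send(recipient, message, Channel.EMAIL); // preserves the old e-mail-only behavior
    }

    String send(String recipient, String message, Channel channel) {
        return channel + " to " + recipient + ": " + message;
    }
}

// Simplified semantic version: MAJOR.MINOR.PATCH only.
class SemVer implements Comparable<SemVer> {
    final int major, minor, patch;

    SemVer(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH: " + version);
        }
        major = Integer.parseInt(parts[0]);
        minor = Integer.parseInt(parts[1]);
        patch = Integer.parseInt(parts[2]);
    }

    @Override
    public int compareTo(SemVer other) {
        if (major != other.major) return Integer.compare(major, other.major);
        if (minor != other.minor) return Integer.compare(minor, other.minor);
        return Integer.compare(patch, other.patch);
    }

    // Minor and patch releases promise backwards compatibility; a major bump does not.
    boolean isSafeUpgradeTo(SemVer candidate) {
        return candidate.major == major && candidate.compareTo(this) >= 0;
    }
}
```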
Monitoring & Metrics
Monitoring a system that has a few instances of a monolith is far easier than monitoring a system that has dozens of microservices. At HomeAdvisor all of our Java services use JMX as the primary means for exposing application health. Moving from monolith to microservices means more service instances, which means more places to keep track of. And keeping the monitoring system up to date with all running instances can be cumbersome, especially in an environment where new services may come and go without warning.
Our first problem was simply keeping track of where all these new processes were running. When you only run a few monolithic applications, you can typically recite host names from memory. Remembering where things are running is not usually a tough mental exercise. But when you grow your number of applications nearly 10x, and have the ability to add or migrate them without warning, keeping tabs on everything becomes a lot like herding cats.
To solve this, we built a microservice that monitors…microservices. Because we love a good pun, we call this service Komodo Monitor, after the Komodo dragon, a lizard also known as the Komodo monitor. The Komodo Monitor provides both a human-friendly interface that lists which applications are running and how to communicate with them (host and port), and a machine-friendly interface that feeds into our larger reporting and monitoring framework. It does this by tapping into the common service discovery framework we have built using Apache Curator. On startup, every one of our services registers itself through Curator, providing details of its main HTTP interface (if it has one), as well as details for how to ask it for health status and other important information.
In addition to the sheer number of processes that require monitoring, each one may have different needs. In our monolithic applications we would simply expose the same set of health checks for every instance. But microservices are intended to be unique and may require different types of checks. For example, some of our microservices expose API endpoints via HTTP, so it makes sense to monitor response codes, since a large number of 503 responses may be indicative of a problem. Other microservices only use Kafka, either as a producer or consumer, so it makes sense to monitor for failures to publish or consume messages. And in addition to these common patterns, individual microservices may wish to expose health checks and metrics that make sense for their business function. For example, our Instant Connect platform may want to expose call failures, or our Instant Booking platform may need to report failures to synchronize with a third party like Google Calendar.
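The register-on-startup idea can be illustrated with a small in-memory stand-in for the discovery layer. This is not Curator’s API (Curator’s service discovery extension keeps these records in ZooKeeper); `ServiceRegistry`, `ServiceRecord`, and their fields are hypothetical, chosen only to show what each service might publish and how a monitor could list it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// What a service publishes about itself at startup (hypothetical fields).
class ServiceRecord {
    final String name;       // logical service name, e.g. "appointment"
    final String host;
    final int port;          // main HTTP port, or -1 if the service has no HTTP interface
    final String healthUri;  // where to ask for health status

    ServiceRecord(String name, String host, int port, String healthUri) {
        this.name = name;
        this.host = host;
        this.port = port;
        this.healthUri = healthUri;
    }

    @Override
    public String toString() {
        return name + " @ " + host + ":" + port;
    }
}

// In-memory stand-in for a discovery registry; instances may come and go at any time.
class ServiceRegistry {
    private final Map<String, List<ServiceRecord>> byName = new ConcurrentHashMap<>();

    void register(ServiceRecord record) {
        byName.computeIfAbsent(record.name, k -> new CopyOnWriteArrayList<>()).add(record);
    }

    void unregister(ServiceRecord record) {
        List<ServiceRecord> records = byName.get(record.name);
        if (records != null) {
            records.remove(record);
        }
    }

    // Human-friendly view: every running instance and how to reach it.
    List<ServiceRecord> all() {
        List<ServiceRecord> result = new ArrayList<>();
        byName.values().forEach(result::addAll);
        return result;
    }

    List<ServiceRecord> instancesOf(String name) {
        return new ArrayList<>(byName.getOrDefault(name, List.of()));
    }
}
```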
To solve this, we have built a microservices stack that provides generic health checks and metrics out of the box. We’ll go more into these standard checks and metrics in a future post, but essentially all new microservices will get the same set of basic checks simply by using our framework. Things like memory usage, database connectivity, service discovery status, and more are all standard. On top of the standard health checks and metrics, when a microservice uses Kafka or HTTP or Elasticsearch, we enable additional checks specific to those services. And finally, when an application needs to report health or metrics that are specific to its business function, we provide generic extension points that allow developers to create customized health checks and metrics reporting. All of these are accessible via a single JMX endpoint that can quickly tell us if a service is healthy or not, and what in particular is making it sick.
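A minimal sketch of the standard-checks-plus-extension-points idea, assuming a hypothetical `HealthCheck` interface (the post does not show the framework’s actual API). A built-in check such as heap usage and a business-specific check are plugged in through the same interface, and the report names exactly which check is failing. Exposing the report over JMX is left out to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Extension point: every check, built-in or custom, implements this.
interface HealthCheck {
    String name();
    String problem(); // null when healthy, otherwise a short description of what is wrong

    static HealthCheck of(String checkName, Supplier<String> probe) {
        return new HealthCheck() {
            public String name() { return checkName; }
            public String problem() { return probe.get(); }
        };
    }
}

// Built-in check a framework could enable for every service.
class MemoryCheck implements HealthCheck {
    private final double maxUsedFraction;

    MemoryCheck(double maxUsedFraction) {
        this.maxUsedFraction = maxUsedFraction;
    }

    public String name() { return "memory"; }

    public String problem() {
        Runtime rt = Runtime.getRuntime();
        double used = (double) (rt.totalMemory() - rt.freeMemory()) / rt.maxMemory();
        return used > maxUsedFraction
            ? String.format("heap %.0f%% used (limit %.0f%%)", used * 100, maxUsedFraction * 100)
            : null;
    }
}

class HealthReport {
    private final List<HealthCheck> checks = new ArrayList<>();

    HealthReport add(HealthCheck check) {
        checks.add(check);
        return this;
    }

    // Maps each failing check's name to its problem; an empty map means healthy.
    Map<String, String> failures() {
        Map<String, String> failed = new LinkedHashMap<>();
        for (HealthCheck check : checks) {
            String problem;
            try {
                problem = check.problem();
            } catch (RuntimeException e) {
                problem = "check threw " + e; // a broken check counts as a failure
            }
            if (problem != null) {
                failed.put(check.name(), problem);
            }
        }
        return failed;
    }

    boolean isHealthy() {
        return failures().isEmpty();
    }
}
```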
Finally, to tie it all together, our custom monitoring microservice has the ability to invoke all of the health checks above for every microservice it discovers. This allows humans and machines alike to quickly assess the overall health of the system, and identify which services are unhealthy and need attention. When a service identifies itself as down or degraded, the service itself tells us what is wrong, whether it be heavy memory utilization, failure to send a Kafka message, or something else.
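The monitor’s sweep over every discovered service might look roughly like this. Each service is represented by a hypothetical `Supplier` that returns its list of problems; in the real system that call would go over JMX or HTTP, and the list of services would come from service discovery. This sketch runs the calls in parallel with a per-call timeout so one hung service cannot stall the sweep indefinitely, and it treats an unreachable or unresponsive service as unhealthy.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class HealthSweep {
    // Returns only the unhealthy services, mapped to what each one reported.
    static Map<String, List<String>> unhealthy(Map<String, Supplier<List<String>>> services,
                                              long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, services.size()));
        try {
            Map<String, Future<List<String>>> pending = new LinkedHashMap<>();
            services.forEach((name, probe) -> {
                Callable<List<String>> call = probe::get;
                pending.put(name, pool.submit(call));
            });

            Map<String, List<String>> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<List<String>>> entry : pending.entrySet()) {
                List<String> problems;
                try {
                    problems = entry.getValue().get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException ex) {
                    problems = List.of("no response within " + timeoutMillis + " ms");
                } catch (ExecutionException ex) {
                    problems = List.of("health call failed: " + ex.getCause());
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    problems = List.of("health sweep interrupted");
                }
                if (!problems.isEmpty()) {
                    result.put(entry.getKey(), problems);
                }
            }
            return result;
        } finally {
            pool.shutdownNow(); // also interrupts any probe that is still hanging
        }
    }
}
```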
Backwards Compatibility
Backwards compatibility comes in many different flavors. At the unit level, Java classes can become backwards incompatible because method names change, variables change scope, classes move to new packages, etc. Most of these will be exposed at compile time, but in the world of Dependency Injection and Inversion of Control these incompatibilities sometimes remain hidden until runtime. We have found the best way to combat these problems is a strong integration testing environment. At HomeAdvisor, once all code changes are on the release train, we spend nearly a full week testing them together in a single environment that closely mirrors production. This usually irons out issues that individual developers did not see during their own testing and that only appeared in combination with other changes.
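Here is a small illustration of why such breaks can hide until runtime: code that locates an implementation by name, as reflection-driven DI containers and configuration files often do, still compiles after the class is renamed or moved, and only fails when the lookup executes. `ReflectiveLookup` and the legacy class name used in the test are hypothetical.

```java
import java.util.Optional;

class ReflectiveLookup {
    // Resolves an implementation from a fully qualified class name, e.g. one read from config.
    static Optional<Object> instantiate(String className) {
        try {
            Class<?> type = Class.forName(className);
            return Optional.of(type.getDeclaredConstructor().newInstance());
        } catch (ClassNotFoundException e) {
            // The compiler never checked this string, so a renamed or moved class only shows up here.
            System.err.println("Missing implementation at runtime: " + className);
            return Optional.empty();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```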
At the integration level, components that communicate via APIs or distributed messaging (JMS, Kafka, etc.) can also become backwards incompatible. When you’re dealing with two remote systems that communicate, it’s important to be mindful of sending message payloads that can be ingested by older versions of a service (unless you can ensure all of your services upgrade at once, which is nearly impossible in our growing world of microservices and external devices). There are several ways you can try to be backwards compatible in your APIs:
- Never remove or rename fields in your message bodies, only add them. In addition, make sure your clients can tolerate unknown fields in case they start getting data from an updated service. Most JSON and XML serialization libraries can be configured to do the latter.
- Consider adding a version parameter to either your message payload, or URI structure if you’re working with RESTful APIs. Clients should send requests that specify the version that they are written against, and services must be able to accommodate all versions until you’re sure a particular version can be retired.
As a rule of thumb, whenever you’re building two disparate systems that need to communicate, always remember the Robustness Principle: be conservative in what you send, and liberal in what you accept.
Performance
One of the areas of microservices that is easy to overlook is performance: how quickly a task is accomplished. In a software monolith, tasks are accomplished in the same JVM. Objects reside in one large memory pool that is accessible by every module. This means there is no penalty due to serialization of objects or network latency. The opposite is true for microservices. Requests must be serialized by the client, sent across a network connection, and then deserialized by the server. After the server does some work, the process is reversed: the server serializes the response, sends it across the network, and it is deserialized by the client. This series of interactions can be exacerbated by several factors: message size, number of network hops between services, etc. If you’re not careful, all of these interactions can add up quickly.
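The API rules above and the robustness principle can be combined in a consumer-side sketch. The payload is modeled as a plain `Map` to stay dependency-free; with a JSON library you would instead configure it to ignore unknown properties (in Jackson, for example, by disabling `DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES`). The `AppointmentMessage` type, its fields, and the `version` key are hypothetical.

```java
import java.util.Map;

// Hypothetical consumer-side view of an appointment message.
class AppointmentMessage {
    final int version;
    final String appointmentId;
    final String proId;
    final String channel; // added in version 2; version 1 senders omit it

    private AppointmentMessage(int version, String appointmentId, String proId, String channel) {
        this.version = version;
        this.appointmentId = appointmentId;
        this.proId = proId;
        this.channel = channel;
    }

    // Tolerant reader: reads only the fields it knows, ignores anything extra,
    // and supplies defaults for fields that older versions do not send.
    static AppointmentMessage fromPayload(Map<String, Object> payload) {
        int version = ((Number) payload.getOrDefault("version", 1)).intValue();
        String id = require(payload, "appointmentId");
        String pro = require(payload, "proId");
        String channel = version >= 2
            ? String.valueOf(payload.getOrDefault("channel", "EMAIL"))
            : "EMAIL"; // version 1 messages were e-mail only
        return new AppointmentMessage(version, id, pro, channel);
    }

    private static String require(Map<String, Object> payload, String key) {
        Object value = payload.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing required field: " + key);
        }
        return value.toString();
    }
}
```

A version 3 message carrying a field this consumer has never heard of is still accepted, while a missing required field is rejected loudly rather than silently defaulted.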
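A simplified model makes the accumulation explicit (the model is ours, not a measurement). For a request that passes synchronously through a chain of n services, let s_i, t_i and d_i be the time to serialize, transfer, and deserialize the request to service i, p_i its processing time, and s'_i, t'_i, d'_i the same three costs for the response:

```latex
T_{\text{total}} \approx \sum_{i=1}^{n} \Big( \underbrace{s_i + t_i + d_i}_{\text{request}} + p_i + \underbrace{s'_i + t'_i + d'_i}_{\text{response}} \Big)
```

In the monolith only the p_i terms remain. With purely illustrative numbers, if the six overhead terms added up to 2 ms per hop, a chain of five synchronous calls would spend 10 ms on overhead before any business logic ran.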
When building microservices, there are several ways to mitigate performance concerns. First, choose a protocol that is both performant and scalable. At HomeAdvisor we use both HTTP and Apache Kafka for our microservices. Both provide excellent performance, are supported in just about every modern programming language, and scale incredibly well. Most importantly, both of these protocols support arbitrary payloads, meaning we have the flexibility to use raw or compressed text, or even binary data.
Just make sure you choose your protocol carefully and enable features only when necessary. For example, using SSL may be a requirement for some organizations, but it adds overhead to the underlying connection. Adding compression may do more harm than good if your payloads are small: the time spent compressing and decompressing data may exceed the time saved by sending a smaller message, and a very small message can actually end up bigger after it is compressed. In general, only enable the features of your protocol that are necessary, and make sure you are quantifying their utility at every step.
Another way to boost performance in your microservices is to minimize payload sizes. As a developer, make your DTOs as simple as possible, but more importantly choose a representation that is fast and efficient. The two primary representations in modern software architectures are JSON and XML. XML can be better for tabular data where you can rely heavily on attributes and avoid elements with closing tags, but for more complex or nested data structures JSON eliminates some of the redundancy of XML. And if you factor in compression, the difference between the two may be negligible. As with any technology tradeoff, there will be pros and cons to each approach. Just be sure to understand how they impact your applications and data.
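The point about small messages growing under compression is easy to check with the JDK’s built-in GZIP support. The gzip format adds a fixed header and trailer (18 bytes) around the compressed data, so tiny payloads grow while large, repetitive ones shrink. The payloads in the usage check are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class CompressionCheck {
    // Returns the size in bytes of the gzip-compressed UTF-8 encoding of text.
    static int gzipSize(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size(); // read after close() so the gzip trailer is included
    }
}
```

For an 11-byte payload such as `{"ok":true}`, `gzipSize` returns more than 11, while a few kilobytes of repeated JSON objects compress to a small fraction of their original size. Measuring your own payloads this way is a cheap first step before turning compression on.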
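One way to follow this advice is to measure your own DTOs in both representations, raw and compressed. The sketch hand-builds the same hypothetical nested record (a service professional with a list of tasks) as JSON and as element-based XML; the record, field names, and builder methods are made up, values are assumed not to need escaping, and real numbers depend entirely on your data and serializer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPOutputStream;

class PayloadSizes {
    // Hand-built JSON; assumes values contain no characters that need escaping.
    static String toJson(String proId, List<String> tasks) {
        StringBuilder sb = new StringBuilder("{\"proId\":\"").append(proId).append("\",\"tasks\":[");
        for (int i = 0; i < tasks.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"name\":\"").append(tasks.get(i)).append("\"}");
        }
        return sb.append("]}").toString();
    }

    // Element-based XML for the same record; every field repeats its tag name when closing.
    static String toXml(String proId, List<String> tasks) {
        StringBuilder sb = new StringBuilder("<pro><proId>").append(proId).append("</proId><tasks>");
        for (String task : tasks) {
            sb.append("<task><name>").append(task).append("</name></task>");
        }
        return sb.append("</tasks></pro>").toString();
    }

    static int utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int gzipSize(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }
}
```

Comparing `utf8Size` with `gzipSize` for both strings shows how much of XML’s tag overhead compression recovers for your particular payloads.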
But perhaps the most important consideration for microservice performance is freeing yourself from the bonds of synchronous processing. We touched on this in our post about Java Multithreading, but by far the best way to get the most out of your microservices is to perform as many tasks as possible using asynchronous mechanisms. This is why we love Kafka so much at HomeAdvisor (though our Robusto Client Framework is certainly up to this task as well thanks to Java Futures and Observables). We use Kafka for logging messages, maintaining Elasticsearch indices, synchronizing external calendars for our service professionals, and much more. Basically any process that isn’t time critical and requires no response is a good candidate for becoming asynchronous. The services that publish messages are free to do other work instead of blocking, and the services that consume those messages can do so at their own pace. Remember that microservices are scalable, so if the processing does slow down it should be easy to spin up new services to meet the demand.
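The publish-and-move-on pattern can be sketched with an in-memory queue standing in for a Kafka topic. This is deliberately not the Kafka client API: `AsyncPipeline` and the event strings are hypothetical, and a real topic adds durability, partitioning, and consumer groups that a `BlockingQueue` does not. What the sketch does show is the shape of the interaction: the producer enqueues and returns immediately, while a separate consumer thread drains events at its own pace.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class AsyncPipeline implements AutoCloseable {
    private static final Object STOP = new Object(); // end-of-stream marker
    private final BlockingQueue<Object> topic = new LinkedBlockingQueue<>();
    private final Thread consumer;

    AsyncPipeline(Consumer<String> handler) {
        consumer = new Thread(() -> {
            try {
                while (true) {
                    Object item = topic.take();      // blocks only the consumer thread
                    if (item == STOP) return;
                    handler.accept((String) item);   // processed at the consumer's own pace
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "pipeline-consumer");
        consumer.start();
    }

    // Producer side: enqueue and return immediately instead of waiting for the work.
    void publish(String event) {
        topic.add(event);
    }

    // Waits (up to 10 seconds) for every event published so far to be handled.
    @Override
    public void close() {
        topic.add(STOP);
        try {
            consumer.join(TimeUnit.SECONDS.toMillis(10));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```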
What Comes Next?
Now that we’ve hopefully convinced you that microservices are an architecture worth embracing, we’ll spend the next few posts in this series discussing the details of how we’ve built our stack. We’ll discuss the common tools and practices we’ve adopted to help make our transition to microservices as smooth as possible. With several of our critical web applications still tangled in our monolith, we have a long way to go before we are fully free. We definitely have a lot to learn, but based on our experience so far we believe microservices are the right approach for us, and we’re excited to share our experiences as we go.