Today we are happy to announce the release of Robusto: our open source Java API client framework. The goal of this library is to help build resilient API clients while being as flexible and configurable as possible. At HomeAdvisor we use it for building HTTP clients, but the framework is not bound to any particular client or protocol.
You can find the official source code and documentation on our GitHub page. Below we’ll talk about the motivation behind building this framework and how it helps us build a more robust microservice architecture.
The name Robusto is both an homage to the Java programming language in which it is written and a nod to the similarly named coffee bean, which is particularly strong and resilient to pests and disease.
The Need for Robust Clients
As we’ll talk about more in future posts, in the past year or so we’ve begun breaking up our monolithic architecture. Instead of a few applications that do everything, we’re building a large collection of microservices that each do a few things. The interaction of multiple microservices is what accomplishes higher level tasks, such as user authentication or booking an appointment with our Instant Booking platform.
Interactions between microservices in our system are typically done via one of two means: asynchronous Kafka messaging or RESTful APIs. For the latter, we wanted to create a common framework that could be used as a starting point for writing client libraries that access these APIs. We struggled to find a suitable library that met all of our needs:
- Support for both synchronous and asynchronous execution for high throughput.
- Ability to retry failed commands automatically.
- Not tied to any particular protocol or client library.
- Good metrics gathering and reporting mechanisms.
We looked at a number of other libraries, but never found a single library that checked every box. For example, Jersey and Spring Web Client are great for writing RESTful clients, but do not provide any resiliency or retry. We also looked at modules like Netflix Ribbon, Google HTTP Client, and Apache HTTP Client which do provide some retry and fault tolerance. But at the time we did our trade study each library had one or more drawbacks: light documentation and community involvement, some functionality was still in beta, too tightly coupled to HTTP, little or no metrics infrastructure, long dependency list, etc. Essentially no single library we looked at met all of the goals above.
The Decision to Build Robusto
As they say, necessity is the mother of invention, so in the end we wrote our own Java API client framework. The Robusto framework is actually a combination of two other open source projects: Netflix Hystrix and Spring Retry. Together these two libraries meet all of the goals above:
- Ability to execute in synchronous, asynchronous, or reactive modes provided by Hystrix.
- Automatic retry with configurable retry and backoff policies provided by Spring Retry.
- Ability to execute any remote command for any protocol, not just HTTP.
- Robust metrics reporting provided by Hystrix.
Both of these libraries provide key pieces of the Robusto framework without tying us to any particular protocol or client library. Hystrix provides a number of mechanisms that help achieve resiliency. The ability to isolate commands by thread pool creates high throughput and concurrency, while decoupling different remote commands. This ensures that failures of one command do not cause the entire client to back up. A configurable circuit breaker mechanism allows us to fast fail commands after a certain number of failures occurs. Command caching and collapsing prevent duplicate requests to preserve network bandwidth. And command fallbacks allow sophisticated and graceful degradation options when a failure occurs.
One piece we found missing from Hystrix is that it does not perform retries. This is suitable for many use cases because network layer failures are typically retried by the operating system, and retry storms are often counterproductive. However, in a microservice architecture where services may terminate abnormally or be re-shuffled without warning, being able to retry transient errors is critical. Adding in Spring Retry is necessary for Robusto to achieve true resiliency at the application layer. It also gives us a great deal of flexibility over when we retry and adapts easily to any client library or protocol.
How it Works
Every remote command starts with the core class ApiCommand, which extends HystrixCommand. This class encapsulates a remote service call that is prone to failure. It starts by setting up a Spring Retry context, which can be configured any way you like. By default there will be 3 attempts with an exponential backoff starting at 500ms, though this is very much configurable. As we’ll discuss below, retries are driven by exception hierarchies. This gives us very fine-grained control over which failures should be retried, and easily adapts to whichever client or protocol you use.
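To make the default retry behavior concrete, here is a minimal sketch in plain Java. It is not Robusto's or Spring Retry's code: the class `BackoffSketch`, its method names, and the backoff multiplier of 2 are assumptions made for illustration. Only the 3 attempts and the 500ms starting delay come from the defaults described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of retry with exponential backoff; not Robusto's actual code. */
final class BackoffSketch {

    /** Waits (in ms) between attempts; there is one fewer wait than attempts. */
    static List<Long> delays(int maxAttempts, long initialDelayMs, double multiplier) {
        List<Long> waits = new ArrayList<>();
        long delay = initialDelayMs;
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            waits.add(delay);
            delay = (long) (delay * multiplier);
        }
        return waits;
    }

    /** Calls until success or until attempts run out, sleeping between failures. */
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts,
                               long initialDelayMs, double multiplier) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        List<Long> waits = delays(maxAttempts, initialDelayMs, multiplier);
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < waits.size()) {
                    sleep(waits.get(attempt));
                }
            }
        }
        throw last; // every attempt failed; surface the final failure
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", e);
        }
    }
}
```

With the assumed multiplier of 2, `BackoffSketch.delays(3, 500, 2.0)` yields waits of 500ms and 1000ms between the three attempts.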
The first phase of execution is service lookup, which does not involve any particular library. At HomeAdvisor, service registration and lookup is done via Apache Curator, but you can use any service you want. There is also a basic implementation that simply uses the same address for every invocation, which is ideal for virtual IPs, load balancers, DNS aliases, etc.
After service discovery is performed, the service address is passed into a remote service callback. This is where the core of your API call occurs. Using the service address, you can now perform RESTful HTTP commands, SOAP method calls, Elasticsearch queries, or any other command that makes a network call. At HomeAdvisor we use the Spring Web Client library for all of our API clients, and it is available as an extension of the Robusto framework.
All of the core pieces above use Java generics, which allows you to return any data type you want. This gives you great flexibility to use any representation you want, from Java primitives to custom DTOs. As long as your underlying client can support the conversion to and from your custom DTO classes, there is nothing additional you need to do within Robusto.
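The two phases described above (service lookup, then a typed remote callback) can be sketched roughly as follows. The names `TwoPhaseCommand` and `constantLookup` are hypothetical and are not taken from the Robusto API; the sketch only illustrates how generics let the callback return any type.

```java
import java.net.URI;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical two-phase command: look up an address, then run a typed callback. */
final class TwoPhaseCommand<T> {
    private final Supplier<URI> lookup;        // phase 1: service discovery
    private final Function<URI, T> remoteCall; // phase 2: the actual API call

    TwoPhaseCommand(Supplier<URI> lookup, Function<URI, T> remoteCall) {
        this.lookup = lookup;
        this.remoteCall = remoteCall;
    }

    /** Fixed-address lookup, suited to virtual IPs, load balancers or DNS aliases. */
    static Supplier<URI> constantLookup(String address) {
        URI uri = URI.create(address);
        return () -> uri;
    }

    /** Resolves the service address and hands it to the remote callback. */
    T execute() {
        URI address = lookup.get();
        return remoteCall.apply(address);
    }
}
```

In a real client the callback would issue the HTTP, SOAP or other network request against the resolved address; here it can be any function from a `URI` to the result type you choose.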
At a high level, command execution can be visualized in the images below.
Retries in the Robusto Java API client framework are controlled via Java exceptions. Exceptions can occur during service discovery or remote execution, and are first intercepted by the Spring Retry context. Spring Retry uses exception hierarchies to determine if commands should be retried or not. The framework provides two basic exception classes, one for retryable failures and one for non-retryable failures, which you can extend to create more meaningful exceptions for your clients.
The former are types of failures that can reasonably be retried without altering the command (socket timeouts, HTTP 500 responses, etc.), while the latter are failures that should not be retried without being modified (authentication failures, bad requests, etc.). By default, all commands are configured to retry Throwable types, except for NonRetryableApiCommandException and its child classes. You can override this behavior when you build your commands, or you can convert the underlying exceptions of your calls into this hierarchy as appropriate.
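The default rule (retry everything except the non-retryable type and its children) can be illustrated with a small sketch. The classes below are hypothetical stand-ins, not the framework's own exception types, and `RetryClassifier` is an invented helper rather than Spring Retry's classifier.

```java
/** Hypothetical stand-in for the framework's retryable base exception. */
class RetryableFailure extends RuntimeException {
    RetryableFailure(String message) { super(message); }
}

/** Hypothetical stand-in for the framework's non-retryable base exception. */
class NonRetryableFailure extends RuntimeException {
    NonRetryableFailure(String message) { super(message); }
}

/** A client-specific subclass inherits the non-retryable classification. */
class AuthenticationFailure extends NonRetryableFailure {
    AuthenticationFailure(String message) { super(message); }
}

final class RetryClassifier {
    /** Default rule from the text: retry any Throwable unless it is non-retryable. */
    static boolean shouldRetry(Throwable failure) {
        return !(failure instanceof NonRetryableFailure);
    }
}
```

Because the check is based on the class hierarchy, a new exception type such as `AuthenticationFailure` is classified correctly simply by choosing which base class it extends.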
As an example, for our clients based on Spring Web Client, we wrap the underlying HttpStatusCodeException into one of the two framework exception types. This allows us fine-grained control over which specific HTTP responses we want to retry, and keeps the underlying failures in the stack trace for logging and inspection by the calling client. It also means we do not have to directly handle the standard Java exception types like SocketTimeoutException.
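A rough sketch of this kind of wrapping is shown below. It does not use Spring's classes, and the mapping rule (5xx is retryable, everything else is not) is an assumption for illustration; which responses you retry is a decision for your own client. All class names here are hypothetical.

```java
/** Hypothetical retryable wrapper; keeps the original failure as the cause. */
class RetryableHttpFailure extends RuntimeException {
    RetryableHttpFailure(String message, Throwable cause) { super(message, cause); }
}

/** Hypothetical non-retryable wrapper; keeps the original failure as the cause. */
class NonRetryableHttpFailure extends RuntimeException {
    NonRetryableHttpFailure(String message, Throwable cause) { super(message, cause); }
}

final class HttpFailureTranslator {
    /**
     * Wraps a client failure based on its HTTP status code.
     * Assumed rule: server errors (5xx) are retryable, all other statuses are not.
     */
    static RuntimeException translate(int statusCode, Exception cause) {
        String message = "HTTP " + statusCode;
        if (statusCode >= 500) {
            return new RetryableHttpFailure(message, cause);
        }
        return new NonRetryableHttpFailure(message, cause);
    }
}
```

Passing the original exception as the cause is what keeps the underlying failure visible in the stack trace.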
Once Spring Retry has exhausted all possible retries, the exception is passed to Hystrix. Hystrix uses exceptions to maintain the health of the command pool and potentially open the circuit. According to the Hystrix documentation, Hystrix counts all exception types as failures, except for HystrixBadRequestException. For this reason, the NonRetryableApiCommandException class extends HystrixBadRequestException. This prevents things like bad requests and authentication failures from opening a circuit.
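The idea of excluding one exception type from circuit health can be shown with a deliberately simplified sketch. This is not how Hystrix is implemented: the consecutive-failure counter and the names `SimpleCircuit` and `BadRequestFailure` are assumptions chosen to keep the example short.

```java
/** Hypothetical exception type that should never count against circuit health. */
class BadRequestFailure extends RuntimeException {
    BadRequestFailure(String message) { super(message); }
}

/** Simplified circuit that opens after a number of consecutive counted failures. */
final class SimpleCircuit {
    private final int failureThreshold;
    private int consecutiveFailures;

    SimpleCircuit(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    /** Counts the failure unless it is a caller error such as a bad request. */
    void recordFailure(RuntimeException failure) {
        if (!(failure instanceof BadRequestFailure)) {
            consecutiveFailures++;
        }
    }

    /** A success resets the counter. */
    void recordSuccess() {
        consecutiveFailures = 0;
    }

    /** An open circuit means further commands should fail fast. */
    boolean isOpen() {
        return consecutiveFailures >= failureThreshold;
    }
}
```

Caller errors say nothing about the health of the remote service, which is why they are left out of the count in this sketch.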
In addition to the core principles of resiliency and fault tolerance, the Robusto Java API client framework has several other features to help build robust API clients. These features are optional and will only be used when explicitly enabled. See the project page Wiki for more details.
- Command Caching: Responses from remote services can be cached using any mechanism of your choice. This can prevent multiple calls for the same data and save bandwidth. We currently provide two caching mechanisms (Guava and Oracle Coherence).
- Health Checks: Clients can provide any number of health checks using any provider you want. We currently provide a single mechanism (Codahale).
- Configuration: Clients can pull in configuration from a number of different sources such as command line, config files, etc. While we currently do not provide any concrete extension points here, the goal is to allow you to incorporate configuration from any number of sources such as Spring Cloud Config or Netflix Archaius. At HomeAdvisor we use a custom Spring application context that delegates to multiple sources such as command line, config files, etc.
All of these features have interfaces and abstract classes in the core library, and all concrete implementations live in their own modules. This allows you to only use the features and implementations you want without having to import extra libraries and code into your applications. It also makes it easier for you to create extensions that make sense for your stack. For example, you could quickly create an extension of command caching based on Ehcache.
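As an illustration of this split between a core interface and a pluggable implementation, here is a minimal sketch. The interface `ResponseCache`, the in-memory implementation, and the `CachedCall` helper are all hypothetical and do not mirror Robusto's actual caching API; a real extension would delegate to Guava, Coherence, Ehcache or similar instead of a map.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical core abstraction; concrete caches would live in their own modules. */
interface ResponseCache<K, V> {
    Optional<V> get(K key);
    void put(K key, V value);
}

/** Hypothetical extension backed by an in-memory map (non-null values only). */
final class InMemoryResponseCache<K, V> implements ResponseCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();

    @Override
    public Optional<V> get(K key) {
        return Optional.ofNullable(entries.get(key));
    }

    @Override
    public void put(K key, V value) {
        entries.put(key, value);
    }
}

final class CachedCall {
    /** Returns the cached response if present, otherwise calls the service and caches it. */
    static <K, V> V fetch(ResponseCache<K, V> cache, K key, Supplier<V> remoteCall) {
        Optional<V> cached = cache.get(key);
        if (cached.isPresent()) {
            return cached.get();
        }
        V fresh = remoteCall.get();
        cache.put(key, fresh);
        return fresh;
    }
}
```

Code that calls `CachedCall.fetch` depends only on the interface, so swapping the cache provider does not touch the client logic.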
Does it Really Work?
The Robusto Java API client framework has been in development for over a year and is used heavily in our production environment, so in short, yes. We’ve done a lot of work the past few months to shore it up and make it as modular as possible, and we believe it fills a critical need for any microservice architecture. And while we use it for RESTful HTTP APIs, we also believe it is flexible enough to fit many other use cases.
We invite you to try it out for yourself and let us know what you think. This is our first major foray into the open source universe, and the more quality feedback we get the better this framework will become. Check out the GitHub project page for everything you need to know to get started, and feel free to fork us and submit issues. We look forward to seeing this framework grow and hearing about how other organizations are using it in their own stacks.