At OpsGenie, we have been growing aggressively, both in headcount and in product features. To give you some idea, our engineering team grew from 15 to 50 people in the last year alone. To scale up development, we split engineering into teams of eight, following the two-pizza team rule.
As you would expect, our current product is somewhat monolithic. Developing and operating it is challenging in terms of parallel development across teams, the CI/CD (Continuous Integration/Continuous Delivery) process, and so on. We are following the current trend and moving from the monolith to a microservices architecture. You can read more about microservices architecture and its benefits in Martin Fowler’s article on the topic.
There are some recommended architectural patterns for applying microservice concepts. One of them is the API Gateway: a single entry point for all clients. The API gateway handles requests in one of two ways. Some requests are simply proxied/routed to the appropriate service; others are handled by fanning out to multiple services.
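To make the proxy/route case concrete, here is a minimal sketch of prefix-based routing in plain Java. It is not how any of the gateways in this post are implemented; the class name, path prefixes, and backend addresses are all hypothetical and only illustrate how requests for a detached service can be sent elsewhere while everything else still reaches the monolith.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a tiny routing table like the one an API gateway keeps.
// All prefixes and backend addresses below are made up for this example.
class GatewayRouter {
    private final Map<String, String> routes = new LinkedHashMap<>();
    private final String fallbackBackend;

    GatewayRouter(String fallbackBackend) {
        this.fallbackBackend = fallbackBackend;
    }

    // Registers a backend for every path that starts with the given prefix.
    GatewayRouter addRoute(String pathPrefix, String backend) {
        routes.put(pathPrefix, backend);
        return this;
    }

    // Picks the backend with the longest matching prefix;
    // unmatched paths go to the fallback (the monolith).
    String resolve(String path) {
        String best = null;
        for (String prefix : routes.keySet()) {
            if (path.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? fallbackBackend : routes.get(best);
    }

    public static void main(String[] args) {
        GatewayRouter router = new GatewayRouter("http://monolith.internal")
                .addRoute("/v1/alerts", "http://alert-service.internal")
                .addRoute("/v1/alerts/notes", "http://note-service.internal");

        System.out.println(router.resolve("/v1/alerts/42"));
        System.out.println(router.resolve("/v1/alerts/notes/7"));
        System.out.println(router.resolve("/v1/users/1"));
    }
}
```

A real gateway would also forward the request and stream the response back; the sketch only covers the routing decision, which is the part that lets us move services out of the monolith one at a time.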
The API Gateway pattern is a good starting point for a microservice architecture because it lets us route specific requests to the services that we detach from the monolith. The API Gateway is not a new concept for us. So far, we have been using Nginx as the API Gateway in front of our monolithic application, but we wanted to re-evaluate that decision in the context of the microservice transition. We care about performance, ease of extensibility, and additional capabilities such as rate limiting. The first step is to evaluate the performance of the alternatives under heavy load to ensure that they will scale enough to meet our needs.
In this blog post, we explain how we set up our test environment and compare the performance of alternative API Gateways: Zuul 1, Nginx, Spring Cloud Gateway, and Linkerd. There are other alternatives as well, such as Lyft’s Envoy and Undertow. We are going to perform similar tests with these tools and share the results in future blog posts.
Zuul 1 seems promising for us since it is written in Java and has strong support from the Spring framework. There are already some blog posts that compare Zuul with Nginx, but we also want to evaluate the performance of Spring Cloud Gateway and Linkerd. Besides, we plan to perform further load tests, so we decided to set up our own test bench.
To evaluate the performance of the API Gateways on their own, we created an isolated test environment, independent of the OpsGenie product. We used the Apache HTTP server benchmarking tool, ab, for the benchmarks.
We first installed Nginx on an AWS EC2 t2.micro instance according to the official Nginx documentation. This is our initial test environment, and we added Zuul and Spring Cloud Gateway installations to it. The Nginx web server hosts static resources, and we defined reverse proxies to that web server in Nginx, Zuul, and Spring Cloud Gateway. We also started another t2.micro EC2 instance to issue the requests (Client EC2).
The dashed arrows in the figure are our test paths. There are five of them:
Direct access
Access via Nginx reverse proxy
Access via Zuul
Access via Spring Cloud Gateway
Access via Linkerd
We know that you are impatient to see the results, so let’s give the results first and the details later.
Performance Benchmark Summary
Test Strategy
We used the Apache HTTP server benchmarking tool (ab). In each test run, we made 10,000 requests in total at a concurrency level of 200.
ab -n 10000 -c 200 http://<server-address>/<path to resource>
We performed tests on three different AWS EC2 server configurations. We narrowed down test cases at each step for further clarification:
1. We performed an additional direct access test only in the first step to see the overhead of the proxies. Since direct access is not an option for us, we didn’t repeat this test in the following steps.
2. Since Spring Cloud Gateway has not been formally released yet, we evaluated it only in the last step.
3. Zuul performs better on subsequent runs than on the first one. We think this is probably because JIT (Just In Time) optimization takes place during the first run, so we call Zuul’s first run the “warm-up”. The values shown in the summary tables below are the after-warm-up numbers.
4. We know that Linkerd is a resource-intensive proxy, so we evaluated it only in the last step, on the highest resource configuration.
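The warm-up handling in item 3 can be sketched as follows. This is only an illustration of the idea of dropping the first run before reporting; the class and method names are ours for this example, and the two values are the t2.micro Zuul times per request reported under Test Details.

```java
import java.util.Arrays;

// Illustrative only: report the mean time per request after discarding warm-up runs.
class WarmupSummary {

    // Averages all runs except the first `warmupRuns`, which are treated as JVM warm-up.
    static double meanAfterWarmup(double[] runsMs, int warmupRuns) {
        if (warmupRuns < 0 || warmupRuns >= runsMs.length) {
            throw new IllegalArgumentException("need at least one run after warm-up");
        }
        return Arrays.stream(runsMs, warmupRuns, runsMs.length)
                .average()
                .getAsDouble();
    }

    public static void main(String[] args) {
        // Zuul on t2.micro: first run, then the rerun (values from Test Details).
        double[] zuulRunsMs = {388.0, 200.0};
        System.out.println("Reported value (ms): " + meanAfterWarmup(zuulRunsMs, 1));
    }
}
```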
Test Configuration
t2.micro (single-core CPU, 1GB of memory): We ran tests for direct access, Nginx reverse proxy, and Zuul (after warm-up).
m4.large (dual-core CPU, 8GB of memory): We compared the performance of Nginx reverse proxy and Zuul (after warm-up).
m4.2xlarge (8-core CPU, 32GB of memory): We compared the performance of Nginx reverse proxy, Zuul (after warm-up), Spring Cloud Gateway, and Linkerd.
Test Results
The performance benchmark summary is below:
Test Details
Direct Access Request
First, we accessed a static resource directly without any proxy. The results are as follows. Mean time per request is 30 ms.
Access Via Nginx Reverse Proxy
In our second test, we accessed a resource via the Nginx reverse proxy. Mean time per request is 40 ms. In other words, the Nginx reverse proxy added an overhead of about 33% on average compared to the direct access described in the previous section.
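For readers who want to check the arithmetic, the overhead figure is simply the relative increase in mean time per request. The helper below is a small sketch we wrote for this purpose (the names are ours), using the 30 ms and 40 ms values from the two tests above.

```java
// Illustrative only: relative overhead of a proxied request over a direct one.
class ProxyOverhead {

    // Returns the extra time added by the proxy as a percentage of the direct time.
    static double overheadPercent(double directMs, double proxiedMs) {
        if (directMs <= 0) {
            throw new IllegalArgumentException("direct time must be positive");
        }
        return (proxiedMs - directMs) / directMs * 100.0;
    }

    public static void main(String[] args) {
        // (40 - 30) / 30 * 100, rounded to the nearest whole percent
        System.out.println(Math.round(overheadPercent(30.0, 40.0)) + "%");
    }
}
```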
Access via Zuul Reverse Proxy
After that, we created a Spring Boot Application with a main method:
And this is our application.yml file:
The results of the initial Zuul test are as follows:
Time per request was 30ms for direct access and 40ms for the Nginx reverse proxy. Time per request for Zuul on the first run is 388ms. As mentioned in other blog posts, a JVM warm-up may help. When we reran the test, we got the following results:
Zuul proxy performs better after warmup (time per request is 200ms), but it is still not that good when compared to Nginx reverse proxy which has a score of 40ms.
What if we upgrade the server to m4.large?
As shown in Figure 1, the server is a t2.micro EC2 instance, which has a single core and 1GB of memory. Nginx is a native C application and Zuul is Java-based. We know that Java applications are a little bit :) more demanding. So we changed the server to an m4.large instance, which has two CPU cores and 8GB of memory.
We ran the Nginx and Zuul reverse proxy tests again, and the results are given below:
As shown in the figures above, the time per request values are 32ms and 95ms for Nginx and Zuul, respectively. These results are far better than those on t2.micro, which were 40ms and 200ms.
A valid criticism is that we are introducing extra overhead by using Zuul via a Spring Boot application. Most probably it will perform better if we run Zuul as a standalone application.
What if we upgrade the server to m4.2xlarge?
We also evaluated an m4.2xlarge server, which has eight cores and 32GB of memory. The results for Nginx and Zuul are given in the following figures:
Zuul outperformed Nginx on the m4.2xlarge server. We did some research to find out what type of EC2 instances Netflix uses to host Zuul, but we couldn’t find any information about it. In some blog posts, people complain about the performance of Zuul and ask how Netflix scales it. We think this is the answer: as it is said, Zuul is CPU-bound :)
Benchmark for Linkerd
Linkerd is a Cloud Native Computing Foundation member project. It is a service mesh application developed in Scala. It provides reverse proxy capabilities in addition to service mesh capabilities such as service discovery. We evaluated the performance of Linkerd, and the results are given below. The performance of Linkerd is very close to that of Zuul.
Benchmark for Spring Cloud Gateway
The Spring Cloud community is also developing a Gateway module. Although it has not been officially released yet, we think it is worth comparing with the other alternatives. So we adapted the Gateway sample application to our test environment.
We ran the same performance test with the Apache HTTP server benchmarking tool, sending 10,000 requests at a concurrency level of 200. The results are shown in the following figure:
As shown in the figure, Spring Cloud Gateway can handle 873 requests per second, and mean time per request is 229ms. According to our tests, the performance of Spring Cloud Gateway cannot reach the level of Zuul, Linkerd, and Nginx, at least not with its current codebase on GitHub. The comparison of Nginx, Zuul, Linkerd, and Spring Cloud Gateway is given above, at the end of the Benchmark Summary section.
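The two numbers ab reports here are tied together by the concurrency level: the mean time per request is the concurrency divided by the requests per second, expressed in milliseconds. The sketch below (class and method names are ours) shows that relationship with the values from this test.

```java
// Illustrative only: how ab's mean time per request relates to requests per second.
class AbMetrics {

    // Mean time per request in ms, as seen by each of the concurrent clients.
    static double meanTimePerRequestMs(int concurrency, double requestsPerSecond) {
        if (requestsPerSecond <= 0) {
            throw new IllegalArgumentException("requests per second must be positive");
        }
        return concurrency * 1000.0 / requestsPerSecond;
    }

    public static void main(String[] args) {
        // 200 concurrent requests at 873 requests per second
        System.out.println(Math.round(meanTimePerRequestMs(200, 873.0)) + " ms");
    }
}
```

This is also why the time per request values in this post should only be compared across runs that use the same concurrency level.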
From Comparing API Gateway Performances: NGINX vs. ZUUL vs. Spring Cloud Gateway vs. Linkerd