Welcome to the Jungle (Spike Demand)

Spike Demand

Microbenchmarks are great at measuring performance "in the small"; for example, measuring the performance of individual methods. But good results do not necessarily translate into macro-scale performance. Real world access patterns and demand loads often run into deeper, systemic, architectural design issues that cannot be discerned at the micro level.

HikariCP has over 1 million users, so from time to time we are approached with challenges encountered "in the wild". Recently, one such challenge led to a deeper investigation: Spike Demand.

The Challenge

The user has an environment where connection creation is expensive, on the order of 150ms; and yet queries typically execute in ~2ms. Long connection setup times can be the result of various factors, alone or in combination: DNS resolution times, encrypted connections with strong encryption (2048/4096 bit), external authentication, database server load, etc.

Generally speaking, for the best performance in response to spike demands, HikariCP recommends a fixed-size pool.

Unfortunately, the user's application is also in an environment where many other applications are connected to the same database, and therefore dynamically-sized pools are desirable -- where idle applications are allowed to give up some of their connections. The user is running the application with HikariCP configured as minimumIdle=5.

In this environment, the application has periods of quiet, as well as sudden spikes of requests, and periods of sustained activity. The combination of high connection setup times, a dynamically-sized pool requirement, and spike demands is just about the worst case scenario for a connection pool.

The questions ultimately were these:

  • If the pool is sitting idle with 5 connections, and is suddenly hit with 50 requests, what should happen?
  • Given that each new connection is going to take 150ms to establish, and given that each request can ultimately be satisfied in ~2ms, shouldn't even a single one of the idle connections be able to handle all of the requests in ~100ms anyway?
  • So, why is the pool size growing [so much]?
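
The arithmetic behind the second question can be checked in a few lines of Java. This is an idealized sketch: it ignores scheduling and hand-off overhead, and the names SpikeArithmetic and drainMillis are illustrative only, not part of HikariCP.

```java
// Back-of-the-envelope numbers for the spike scenario (values taken from the text).
final class SpikeArithmetic {
    /** Time for `connections` connections to drain `requests` queries of `queryMs` each. */
    static long drainMillis(int requests, int connections, long queryMs) {
        long rounds = (requests + connections - 1) / connections; // ceiling division
        return rounds * queryMs;
    }

    public static void main(String[] args) {
        int requests = 50;    // size of the spike
        long queryMs = 2;     // typical query time
        long connectMs = 150; // connection establishment time

        long oneConnection = drainMillis(requests, 1, queryMs);   // 100 ms
        long fiveConnections = drainMillis(requests, 5, queryMs); // 20 ms

        System.out.println("1 connection drains the spike in  " + oneConnection + " ms");
        System.out.println("5 connections drain the spike in " + fiveConnections + " ms");
        System.out.println("a new connection is ready after  " + connectMs + " ms");
    }
}
```

Under these assumptions, one connection drains the spike in 100ms and the five idle connections in 20ms, both well before a newly requested connection could be ready at 150ms.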

We thought these were interesting questions, and HikariCP was indeed creating more connections than we expected...

3, 2, 1 ... Go!

In order to explore these questions, we built a simulation and started measuring. The simulation harness code is here.

The constraints are simple:

  • Connection establishment takes 150ms.

  • Query execution takes 2ms.

  • The maximum pool size is 50.

  • The minimum idle connections is 5.

And the simulation is fairly simple:

  • Everything is quiet, and then ... Boom! ... 50 threads, at once, wanting a connection and to execute a query.

  • Take measurements every 250μs (microseconds).
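
A scaled-down version of this scenario can be sketched with plain JDK classes. The following is not the actual harness. It is a toy model that assumes a fixed set of 5 pre-established "connections" (integer tokens in a blocking queue), no pool growth, and a 2ms sleep standing in for a query. All names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class SpikeSketch {
    /**
     * Releases `threads` requests at once against `idle` pre-established
     * connections. Returns {completedRequests, elapsedMillis}.
     */
    static long[] run(int idle, int threads, long queryMs) {
        BlockingQueue<Integer> pool = new ArrayBlockingQueue<>(idle);
        for (int id = 0; id < idle; id++) {
            pool.add(id); // each token stands in for one idle connection
        }
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                try {
                    start.await();              // everything is quiet...
                    Integer conn = pool.take(); // block on the pool until a connection is free
                    try {
                        Thread.sleep(queryMs);  // simulated query
                        completed.incrementAndGet();
                    } finally {
                        pool.add(conn);         // return the connection to the pool
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }

        long t0 = System.nanoTime();
        start.countDown();                      // ...and then: Boom!
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
        return new long[] { completed.get(), elapsedMs };
    }

    public static void main(String[] args) {
        long[] result = run(5, 50, 2);
        System.out.println(result[0] + " requests completed in about " + result[1] + " ms");
    }
}
```

The elapsed time depends on the platform's timer granularity and thread scheduling, so treat the printed number as indicative only. The point of the sketch is that all 50 requests complete without the pool growing at all.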

Results

After running HikariCP through the simulation, tweaking the code (ultimately a one-line change), and satisfying ourselves that the behavior is as we would wish, we ran a few other pools through the simulation.

The code was run as follows:

    bash$ ./spiketest.sh 150 <pool> 50

Where 150 is the connection establishment time, <pool> is one of [hikari, dbcp2, vibur, tomcat, c3p0], and 50 is the number of threads/requests. Note that c3p0 was dropped from the analysis here, as its run time was ~120x that of HikariCP.

HikariCP (v2.6.0) raw data

image-20210615225800130.png


Apache DBCP (v2.1.1) raw data

image-20210615225826385.png

Apache Tomcat (v8.0.24) raw data

image-20210615225839502.png

Vibur DBCP (v16.1) raw data

image-20210615225859614.png

Apache DBCP vs HikariCP

In case you did not look closely at the graphs above, here is a full side-by-side comparison.

Apache DBCP is on top, HikariCP on the bottom.

image-20210615225927153.png

Commentary

We'll start by saying that we are not going to comment on the implementation specifics of the other pools, but you may be able to draw inferences from our comments regarding HikariCP.

Looking at the HikariCP graph, we couldn't have wished for a better profile; it's about as close to perfect efficiency as we could expect. It is interesting, though not surprising, that the other pool profiles are so similar to each other. Even though arrived at via different implementations, they are the result of a conventional or obvious approach to pool design.

HikariCP's profile in this case, and the reason for the difference observed between other pools, is the result of our Prime Directive:

💡 User threads should only ever block on the pool itself.

Consider this hypothetical scenario: There is a pool with five connections in-use, and zero idle (available) connections. Then, a new thread comes in requesting a connection. "How does the prime directive apply in this case?" We'll answer with a question of our own:

If the thread is directed to create a new connection, and that connection takes 150ms to establish, what happens if one of the five in-use connections is returned to the pool?
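
One way to picture the alternative is a user thread that blocks only on the pool while a new connection is established in the background. The sketch below is a hypothetical illustration using JDK classes, not HikariCP code: the connection names, the scheduler, and the timeout are all assumptions made for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class BlockOnPoolSketch {
    /** Returns the name of whichever connection reaches the waiting user thread first. */
    static String borrow(long queryMs, long connectMs) {
        BlockingQueue<String> pool = new LinkedBlockingQueue<>(); // zero idle connections
        ScheduledExecutorService background = Executors.newScheduledThreadPool(2);
        try {
            // One of the in-use connections finishes its query and is returned.
            background.schedule(() -> { pool.offer("returned-connection"); },
                    queryMs, TimeUnit.MILLISECONDS);
            // Meanwhile a new connection is being established asynchronously.
            background.schedule(() -> { pool.offer("new-connection"); },
                    connectMs, TimeUnit.MILLISECONDS);
            // The user thread blocks on the pool itself, never on connection setup.
            return pool.poll(connectMs * 2, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            background.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("user thread received: " + borrow(2, 150));
    }
}
```

Because the waiting thread is not tied to the 150ms setup, it can take the connection that comes back after roughly 2ms. A thread that had been directed to create its own connection would still be waiting on that setup.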


Both Apache DBCP2 and Vibur ended the run with 45 connections, Apache Tomcat (inexplicably) with 40 connections, while HikariCP ended the run with 5 (technically six, see below). This has major and measurable effects for real world deployments. That is 35-40 additional connections that are not available to other applications, and 35-40 additional threads and associated memory structures in the database.

We know what you are thinking, "What if the load had been sustained?" The answer is: HikariCP also would have ramped up.

In point of fact, as soon as the pool hit zero available connections, right around 800μs into the run, HikariCP began requesting connections to be added to the pool asynchronously. If the metrics had continued to be collected past the end of the spike -- out beyond 150ms -- you would observe that an additional connection is indeed added to the pool. But only one, because HikariCP employs elision logic; at that point HikariCP would also realize that there is actually no more pending demand, and the remaining connection acquisitions would be elided.
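
The effect of elision can be approximated with a small deterministic model. This is a toy model under stated assumptions, not HikariCP's implementation: it assumes connections are established one at a time, that each takes the full connection time, and that the pool checks for pending demand each time a new connection completes. The class and method names are hypothetical.

```java
final class ElisionModel {
    /** Final pool size after `requests` queries arrive at once, under the toy model. */
    static int finalPoolSize(int idle, int maxPool, int requests, long queryMs, long connectMs) {
        int size = idle;
        long remaining = requests;
        // Demand beyond the idle connections triggers an asynchronous add request.
        boolean creationInFlight = remaining > size;
        while (creationInFlight && size < maxPool) {
            // While the new connection is being established, existing ones keep serving.
            long served = (connectMs / queryMs) * size;
            remaining = Math.max(0, remaining - served);
            size++;                           // the in-flight connection joins the pool
            creationInFlight = remaining > 0; // no pending demand: elide further creations
        }
        return size;
    }

    public static void main(String[] args) {
        // Short spike: 50 requests, drained long before 150 ms have passed.
        System.out.println("spike:     " + finalPoolSize(5, 50, 50, 2, 150));      // 6
        // Sustained load, modelled here as a very large backlog of requests.
        System.out.println("sustained: " + finalPoolSize(5, 50, 100_000, 2, 150)); // 50
    }
}
```

In this model the spike ends with six connections, matching the "technically six" figure above, while a sustained backlog ramps the pool up to its maximum of 50.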

Epilog

This scenario represents only one of many access patterns. HikariCP will continue to research and innovate when presented with challenging problems encountered in real world deployments. As always, thank you for your patronage.
