下午同事反馈,某业务场景性能测试过程中,出现异常,提供日志报: Redis command timed out
1. 先看下日志
org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 minute(s) at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:271) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.connection.lettuce.LettuceConnection.await(LettuceConnection.java:1062) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.connection.lettuce.LettuceConnection.lambda$doInvoke$4(LettuceConnection.java:919) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.connection.lettuce.LettuceInvoker$Synchronizer.invoke(LettuceInvoker.java:665) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.connection.lettuce.LettuceInvoker.just(LettuceInvoker.java:94) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.connection.lettuce.LettuceSetCommands.sMembers(LettuceSetCommands.java:149) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.connection.DefaultedRedisConnection.sMembers(DefaultedRedisConnection.java:805) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.core.DefaultSetOperations.lambda$members$10(DefaultSetOperations.java:214) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:222) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:189) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:96) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.core.DefaultSetOperations.members(DefaultSetOperations.java:214) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.data.redis.core.DefaultBoundSetOperations.members(DefaultBoundSetOperations.java:152) ~[spring-data-redis-2.5.11.jar!/:2.5.11] at org.springframework.session.data.redis.RedisSessionExpirationPolicy.cleanExpiredSessions(RedisSessionExpirationPolicy.java:132) ~[spring-session-data-redis-2.5.6.jar!/:2.5.6] at org.springframework.session.data.redis.RedisIndexedSessionRepository.cleanupExpiredSessions(RedisIndexedSessionRepository.java:424) ~[spring-session-data-redis-2.5.6.jar!/:2.5.6] at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-5.3.20.jar!/:5.3.20] at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:95) [spring-context-5.3.20.jar!/:5.3.20] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_191] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_191] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_191] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_191] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191] Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 minute(s) at io.lettuce.core.internal.ExceptionFactory.createTimeoutException(ExceptionFactory.java:59) ~[lettuce-core-6.1.8.RELEASE.jar!/:6.1.8.RELEASE] at io.lettuce.core.internal.Futures.awaitOrCancel(Futures.java:246) ~[lettuce-core-6.1.8.RELEASE.jar!/:6.1.8.RELEASE] at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:74) ~[lettuce-core-6.1.8.RELEASE.jar!/:6.1.8.RELEASE] at org.springframework.data.redis.connection.lettuce.LettuceConnection.await(LettuceConnection.java:1060) ~[spring-data-redis-2.5.11.jar!/:2.5.11] ... 22 common frames omitted 2022-09-15 15:31:00.427 INFO 16828 --- [xecutorLoop-1-3] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 192.168.0.163:6692 2022-09-15 15:31:00.427 INFO 16828 --- [xecutorLoop-1-2] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 192.168.0.163:6692 2022-09-15 15:31:00.428 WARN 16828 --- [ioEventLoop-8-4] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [192.168.0.163:6692]: Connection refused: /192.168.0.163:6692 2022-09-15 15:31:00.429 WARN 16828 --- [ioEventLoop-8-5] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [192.168.0.163:6692]: Connection refused: /192.168.0.163:6692 2022-09-15 15:31:00.606 INFO 16828 --- [xecutorLoop-4-7] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 192.168.0.163:6692 2022-09-15 15:31:00.608 WARN 16828 --- [ioEventLoop-9-3] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [192.168.0.163:6692]: Connection refused: /192.168.0.163:6692 2022-09-15 15:31:30.527 INFO 16828 --- [xecutorLoop-1-6] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 192.168.0.163:6692 2022-09-15 15:31:30.527 INFO 16828 --- [xecutorLoop-1-7] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 192.168.0.163:6692 2022-09-15 15:31:30.530 WARN 16828 --- [ioEventLoop-8-7] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [192.168.0.163:6692]: Connection refused: /192.168.0.163:6692 2022-09-15 15:31:30.532 WARN 16828 --- [ioEventLoop-8-6] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [192.168.0.163:6692]: Connection refused: /192.168.0.163:6692 2022-09-15 15:31:30.706 INFO 16828 --- [xecutorLoop-4-8] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 192.168.0.163:6692 2022-09-15 15:31:30.714 WARN 16828 --- [ioEventLoop-9-4] i.l.core.protocol.ConnectionWatchdog : Cannot reconnect to [192.168.0.163:6692]: Connection refused: /192.168.0.163:6692 2022-09-15 15:32:00.627 INFO 16828 --- [xecutorLoop-1-2] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 192.168.0.163:6692 2022-09-15 15:32:00.627 INFO 16828 --- [xecutorLoop-1-4] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 192.168.0.163:6692
从日志可以看出:Cannot reconnect to [192.168.0.163:6692]: Connection refused: /192.168.0.163:6692
*********************************************************************************************************************************************************************************************
2. 登入redis集群服务器看下redis服务状态
[root@iZ2ze0gm3scdypc0i15r8yZ redis-cluster]# ps -ef|grep redis root 1718 1 78 09:36 ? 04:49:50 ./redis-server *:6695 [cluster] root 1723 1 36 09:36 ? 02:12:52 ./redis-server *:6693 [cluster] root 1724 1 94 09:36 ? 05:49:34 ./redis-server *:6694 [cluster] root 1726 1 94 09:36 ? 05:48:41 ./redis-server *:6696 [cluster] root 1727 1 35 09:36 ? 02:12:15 ./redis-server *:6691 [cluster] root 2426 1 0 09:48 ? 00:01:16 ./redis-server *:6379 root 22212 22111 0 15:44 pts/4 00:00:00 grep --color=auto redis
可以看出,6692端口的redis节点挂掉了。为什么会挂掉?
*********************************************************************************************************************************************************************************************
3. 先把6692端口对应的redis节点启动,看下内存
[root@iZ2ze0gm3scdypc0i15r8yZ redis-cluster]# ./redis-cli -h 192.168.0.163 -p 6692 -c -a Tiye@123! Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe. 192.168.0.163:6692> info memory # Memory used_memory:5255993312 used_memory_human:4.90G used_memory_rss:6011609088 used_memory_rss_human:5.60G used_memory_peak:5255993312 used_memory_peak_human:4.90G used_memory_peak_perc:100.00% used_memory_overhead:23434632 used_memory_startup:1483784 used_memory_dataset:5232558680 used_memory_dataset_perc:99.58% allocator_allocated:5225666392 allocator_active:6178586624 allocator_resident:6227939328 total_system_memory:33020043264 total_system_memory_human:30.75G used_memory_lua:32768 used_memory_lua_human:32.00K used_memory_scripts:0 used_memory_scripts_human:0B number_of_cached_scripts:0 maxmemory:0 maxmemory_human:0B maxmemory_policy:noeviction allocator_frag_ratio:1.18 allocator_frag_bytes:952920232 allocator_rss_ratio:1.01 allocator_rss_bytes:49352704 rss_overhead_ratio:0.97 rss_overhead_bytes:-216330240 mem_fragmentation_ratio:1.15 mem_fragmentation_bytes:786011688 mem_not_counted_for_evict:0 mem_replication_backlog:0 mem_clients_slaves:0 mem_clients_normal:0 mem_aof_buffer:0 mem_allocator:jemalloc-5.1.0 active_defrag_running:0 lazyfree_pending_objects:0 lazyfreed_objects:0
从6692节点的redis内存可看出,内存已达到最大值:4.9G,内存占用率为100%,这是该节点挂掉的原因。
*********************************************************************************************************************************************************************************************
4. 再来看看6691、6693两个节点的内存使用情况
[root@iZ2ze0gm3scdypc0i15r8yZ redis-cluster]# ./redis-cli -h 192.168.0.163 -p 6691 -c -a Tiye@123! Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe. 192.168.0.163:6691> info memory # Memory used_memory:312651792 used_memory_human:298.17M used_memory_rss:317714432 used_memory_rss_human:303.00M used_memory_peak:313941360 used_memory_peak_human:299.40M used_memory_peak_perc:99.59% used_memory_overhead:38853632 used_memory_startup:1483784 used_memory_dataset:273798160 used_memory_dataset_perc:87.99% allocator_allocated:312643776 allocator_active:313520128 allocator_resident:322052096 total_system_memory:33020043264 total_system_memory_human:30.75G used_memory_lua:32768 used_memory_lua_human:32.00K used_memory_scripts:0 used_memory_scripts_human:0B number_of_cached_scripts:0 maxmemory:0 maxmemory_human:0B maxmemory_policy:noeviction allocator_frag_ratio:1.00 allocator_frag_bytes:876352 allocator_rss_ratio:1.03 allocator_rss_bytes:8531968 rss_overhead_ratio:0.99 rss_overhead_bytes:-4337664 mem_fragmentation_ratio:1.02 mem_fragmentation_bytes:5103624 mem_not_counted_for_evict:0 mem_replication_backlog:0 mem_clients_slaves:0 mem_clients_normal:841560 mem_aof_buffer:0 mem_allocator:jemalloc-5.1.0 active_defrag_running:0 lazyfree_pending_objects:0 lazyfreed_objects:0
[root@iZ2ze0gm3scdypc0i15r8yZ redis-cluster]# ./redis-cli -h 192.168.0.163 -p 6693 -c -a Tiye@123! Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe. 192.168.0.163:6693> info memory # Memory used_memory:313492000 used_memory_human:298.97M used_memory_rss:318951424 used_memory_rss_human:304.18M used_memory_peak:314344552 used_memory_peak_human:299.78M used_memory_peak_perc:99.73% used_memory_overhead:38814608 used_memory_startup:1483784 used_memory_dataset:274677392 used_memory_dataset_perc:88.04% allocator_allocated:313483968 allocator_active:314449920 allocator_resident:322850816 total_system_memory:33020043264 total_system_memory_human:30.75G used_memory_lua:32768 used_memory_lua_human:32.00K used_memory_scripts:0 used_memory_scripts_human:0B number_of_cached_scripts:0 maxmemory:0 maxmemory_human:0B maxmemory_policy:noeviction allocator_frag_ratio:1.00 allocator_frag_bytes:965952 allocator_rss_ratio:1.03 allocator_rss_bytes:8400896 rss_overhead_ratio:0.99 rss_overhead_bytes:-3899392 mem_fragmentation_ratio:1.02 mem_fragmentation_bytes:5500440 mem_not_counted_for_evict:0 mem_replication_backlog:0 mem_clients_slaves:0 mem_clients_normal:821056 mem_aof_buffer:0 mem_allocator:jemalloc-5.1.0 active_defrag_running:0 lazyfree_pending_objects:0 lazyfreed_objects:0
6691、6693两个节点的redis内存占用率也达到了99%以上,但是实际使用内存不超过300M,redis内存在集群内没有做到均衡
*********************************************************************************************************************************************************************************************
5. redis超时问题解决办法
redis超时问题的解决办法:清除6691、6692、6693这三个主节点的缓存。不过,这只是个临时的解决方案。
[root@iZ2ze0gm3scdypc0i15r8yZ redis-cluster]# ./redis-cli -h 192.168.0.163 -p 6691 -c -a Tiye@123!
Warning: Using a password with ‘-a’ or ‘-u’ option on the command line interface may not be safe.
192.168.0.163:6691> flushall
OK
*********************************************************************************************************************************************************************************************
6. 目前问题及个人认为的最佳解决方案
6.1 目前存在两个问题
①redis内存消耗没有在集群内部达到均衡;②重启redis服务,缓存没有清除。
6.2 个人认为的最佳解决方案
查看redis.conf配置文件,最大内存配置( redis.conf默认配置)为:# maxmemory <bytes>
redis服务器总内存大小为32G,根据经验(之前在上海的某个项目,设置maxmemory为固定值,集群内部所有节点的内存消耗均衡),可以给6个redis节点都分配4G内存。即: maxmemory 4096M
重启redis集群。
***************后续待观察,redis集群最大内存是否达到均衡,redis服务是否还会挂掉*************************
原创文章,作者:ItWorker,如若转载,请注明出处:https://blog.ytso.com/tech/java/289637.html