zhangguanzhang's Blog

openstack 虚机上 rabbitmqctl 超时

字数统计: 4.9k阅读时长: 29 min
2023/02/14

记录一下今天遇到的 openstack 上 rabbitmqctl 一直超时的问题

前因

现场遇到的问题,hostNetwork 启动的 rabbitmq 容器,启动后 rabbitmqctl 所有命令都超时报错,使用 ctl 是因为初始化集群后要设置用户。

过程

按照时间线写了。

排查

调用 rabbitmqctl 报错日志为:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
[root@zhxxxx-xxx-20230203150413-0008 ~]# docker exec k8s_rabbitmq_rabbitmq-1-xx_0 rabbitmqctl --quiet add_user root 'xxx'
Error: unable to perform an operation on node 'rabbit@zhxxxx-xxx-20230203150413-0008'. Please see diagnostics information and suggestions below.

Most common reasons for this are:

* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running

In addition to the diagnostics info below:

* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit@zhxxxx-xxx-20230203150413-0008
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools

DIAGNOSTICS
===========

attempted to contact: ['rabbit@zhxxxx-xxx-20230203150413-0008']

rabbit@zhxxxx-xxx-20230203150413-0008:
* connected to epmd (port 4369) on zhxxxx-xxx-20230203150413-0008
* epmd reports node 'rabbit' uses port 25672 for inter-node and CLI tool traffic
* can't establish TCP connection to the target node, reason: timeout (timed out)
* suggestion: check if host 'zhxxxx-xxx-20230203150413-0008' resolves, is reachable and ports 25672, 4369 are not blocked by firewall

Current node details:
* node name: 'rabbitmqcli-2488-rabbit@zhxxxx-xxx-20230203150413-0008'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: xxxx==

查看日志

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
[root@zhxxxx-xxx-20230203150413-0008 ~]# docker logs xxx
{:query, :"rabbit@zhxxxx-xxx-20230203150413-0008", {:badrpc, :timeout}}

2023-02-14 10:29:01.553 [debug] <0.284.0> Lager installed handler error_logger_lager_h into error_logger
2023-02-14 10:29:01.565 [debug] <0.290.0> Lager installed handler lager_forwarder_backend into rabbit_log_lager_event
2023-02-14 10:29:01.565 [debug] <0.293.0> Lager installed handler lager_forwarder_backend into rabbit_log_channel_lager_event
2023-02-14 10:29:01.565 [debug] <0.287.0> Lager installed handler lager_forwarder_backend into error_logger_lager_event
2023-02-14 10:29:01.566 [debug] <0.296.0> Lager installed handler lager_forwarder_backend into rabbit_log_connection_lager_event
2023-02-14 10:29:01.566 [debug] <0.305.0> Lager installed handler lager_forwarder_backend into rabbit_log_ldap_lager_event
2023-02-14 10:29:01.566 [debug] <0.308.0> Lager installed handler lager_forwarder_backend into rabbit_log_mirroring_lager_event
2023-02-14 10:29:01.566 [debug] <0.299.0> Lager installed handler lager_forwarder_backend into rabbit_log_feature_flags_lager_event
2023-02-14 10:29:01.566 [debug] <0.311.0> Lager installed handler lager_forwarder_backend into rabbit_log_prelaunch_lager_event
2023-02-14 10:29:01.566 [debug] <0.302.0> Lager installed handler lager_forwarder_backend into rabbit_log_federation_lager_event
2023-02-14 10:29:01.566 [debug] <0.314.0> Lager installed handler lager_forwarder_backend into rabbit_log_queue_lager_event
2023-02-14 10:29:01.566 [debug] <0.317.0> Lager installed handler lager_forwarder_backend into rabbit_log_ra_lager_event
2023-02-14 10:29:01.566 [debug] <0.323.0> Lager installed handler lager_forwarder_backend into rabbit_log_upgrade_lager_event
2023-02-14 10:29:01.566 [debug] <0.320.0> Lager installed handler lager_forwarder_backend into rabbit_log_shovel_lager_event
2023-02-14 10:29:01.670 [info] <0.267.0> HiPE disabled: no modules were natively recompiled.
2023-02-14 10:29:02.053 [debug] <0.280.0> Lager installed handler lager_backend_throttle into lager_event
2023-02-14 10:29:02.093 [info] <0.267.0>
Starting RabbitMQ 3.8.5 on Erlang 23.0.3
Copyright (c) 2007-2020 VMware, Inc. or its affiliates.
Licensed under the MPL 1.1. Website: https://rabbitmq.com

## ## RabbitMQ 3.8.5
## ##
########## Copyright (c) 2007-2020 VMware, Inc. or its affiliates.
###### ##
########## Licensed under the MPL 1.1. Website: https://rabbitmq.com

Doc guides: https://rabbitmq.com/documentation.html
Support: https://rabbitmq.com/contact.html
Tutorials: https://rabbitmq.com/getstarted.html
Monitoring: https://rabbitmq.com/monitoring.html

Logs: <stdout>

Config file(s): /tmp/rabbitmq.conf

Starting broker...2023-02-14 10:29:02.096 [info] <0.267.0>
node : rabbit@zhxxxx-xxx-20230203150413-0008
home dir : /var/lib/rabbitmq
config file(s) : /tmp/rabbitmq.conf
cookie hash : xxxxxxx==
log(s) : <stdout>
database dir : /var/lib/rabbitmq/mnesia/rabbit@zhxxxx-xxx-20230203150413-0008
2023-02-14 10:29:07.360 [info] <0.267.0> Running boot step pre_boot defined by app rabbit
2023-02-14 10:29:07.360 [info] <0.267.0> Running boot step rabbit_core_metrics defined by app rabbit
2023-02-14 10:29:07.363 [info] <0.267.0> Running boot step rabbit_alarm defined by app rabbit
2023-02-14 10:29:07.373 [info] <0.355.0> Memory high watermark set to 12301 MiB (12898874163 bytes) of 30753 MiB (32247185408 bytes) total
2023-02-14 10:29:07.386 [info] <0.357.0> Enabling free disk space monitoring
2023-02-14 10:29:07.386 [info] <0.357.0> Disk free limit set to 50MB
2023-02-14 10:29:07.396 [info] <0.267.0> Running boot step code_server_cache defined by app rabbit
2023-02-14 10:29:07.396 [info] <0.267.0> Running boot step file_handle_cache defined by app rabbit
2023-02-14 10:29:07.398 [info] <0.360.0> Limiting to approx 1073741719 file handles (966367545 sockets)
2023-02-14 10:29:07.398 [info] <0.361.0> FHC read buffering: OFF
2023-02-14 10:29:07.398 [info] <0.361.0> FHC write buffering: ON
2023-02-14 10:29:07.399 [info] <0.267.0> Running boot step worker_pool defined by app rabbit
2023-02-14 10:29:07.399 [info] <0.343.0> Will use 16 processes for default worker pool
2023-02-14 10:29:07.400 [info] <0.343.0> Starting worker pool 'worker_pool' with 16 processes in it
2023-02-14 10:29:07.402 [info] <0.267.0> Running boot step database defined by app rabbit
2023-02-14 10:29:07.403 [info] <0.267.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@zhxxxx-xxx-20230203150413-0008 is empty. Assuming we need to join an existing cluster or initialise from scratch...
2023-02-14 10:29:07.403 [info] <0.267.0> Configured peer discovery backend: rabbit_peer_discovery_classic_config
2023-02-14 10:29:07.404 [info] <0.267.0> Will try to lock with peer discovery backend rabbit_peer_discovery_classic_config
2023-02-14 10:29:07.404 [info] <0.267.0> Peer discovery backend does not support locking, falling back to randomized delay
2023-02-14 10:29:07.404 [info] <0.267.0> Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping randomized startup delay.
2023-02-14 10:29:07.404 [info] <0.267.0> All discovered existing cluster peers:
2023-02-14 10:29:07.404 [info] <0.267.0> Discovered no peer nodes to cluster with. Some discovery backends can filter nodes out based on a readiness criteria. Enabling debug logging might help troubleshoot.
2023-02-14 10:29:07.410 [info] <0.44.0> Application mnesia exited with reason: stopped
2023-02-14 10:29:15.639 [info] <0.267.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2023-02-14 10:29:15.639 [info] <0.267.0> Successfully synced tables from a peer
2023-02-14 10:29:15.700 [info] <0.267.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2023-02-14 10:29:15.700 [info] <0.267.0> Successfully synced tables from a peer
2023-02-14 10:29:15.701 [info] <0.267.0> Feature flag `implicit_default_bindings`: supported, attempt to enable...
2023-02-14 10:29:15.701 [info] <0.267.0> Feature flag `implicit_default_bindings`: mark as enabled=state_changing
2023-02-14 10:29:15.713 [info] <0.267.0> Feature flags: list of feature flags found:
2023-02-14 10:29:15.713 [info] <0.267.0> Feature flags: [~] implicit_default_bindings
2023-02-14 10:29:15.713 [info] <0.267.0> Feature flags: [ ] quorum_queue
2023-02-14 10:29:15.713 [info] <0.267.0> Feature flags: [ ] virtual_host_metadata
2023-02-14 10:29:15.713 [info] <0.267.0> Feature flags: feature flag states written to disk: yes
2023-02-14 10:29:15.739 [info] <0.267.0> Waiting for Mnesia tables for 30000 ms, 0 retries left
2023-02-14 10:29:15.739 [info] <0.267.0> Successfully synced tables from a peer
2023-02-14 10:29:15.739 [info] <0.267.0> Feature flag `implicit_default_bindings`: mark as enabled=true
2023-02-14 10:29:15.754 [info] <0.267.0> Feature flags: list of feature flags found:
2023-02-14 10:29:15.754 [info] <0.267.0> Feature flags: [x] implicit_default_bindings
2023-02-14 10:29:15.754 [info] <0.267.0> Feature flags: [ ] quorum_queue
2023-02-14 10:29:15.754 [info] <0.267.0> Feature flags: [ ] virtual_host_metadata
2023-02-14 10:29:15.755 [info] <0.267.0> Feature flags: feature flag states written to disk: yes
2023-02-14 10:29:15.784 [info] <0.267.0> Feature flag `quorum_queue`: supported, attempt to enable...
2023-02-14 10:29:15.784 [info] <0.267.0> Feature flag `quorum_queue`: mark as enabled=state_changing
2023-02-14 10:29:15.796 [info] <0.267.0> Feature flags: list of feature flags found:
2023-02-14 10:29:15.797 [info] <0.267.0> Feature flags: [x] implicit_default_bindings
2023-02-14 10:29:15.797 [info] <0.267.0> Feature flags: [~] quorum_queue
2023-02-14 10:29:15.797 [info] <0.267.0> Feature flags: [ ] virtual_host_metadata
2023-02-14 10:29:15.797 [info] <0.267.0> Feature flags: feature flag states written to disk: yes
2023-02-14 10:29:15.824 [info] <0.267.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2023-02-14 10:29:15.824 [info] <0.267.0> Successfully synced tables from a peer
2023-02-14 10:29:15.824 [info] <0.267.0> Feature flag `quorum_queue`: migrating Mnesia table rabbit_queue...
2023-02-14 10:29:15.972 [info] <0.267.0> Feature flag `quorum_queue`: migrating Mnesia table rabbit_durable_queue...
2023-02-14 10:29:16.026 [info] <0.267.0> Feature flag `quorum_queue`: Mnesia tables migration done
2023-02-14 10:29:16.026 [info] <0.267.0> Feature flag `quorum_queue`: mark as enabled=true
2023-02-14 10:29:16.040 [info] <0.267.0> Feature flags: list of feature flags found:
2023-02-14 10:29:16.040 [info] <0.267.0> Feature flags: [x] implicit_default_bindings
2023-02-14 10:29:16.040 [info] <0.267.0> Feature flags: [x] quorum_queue
2023-02-14 10:29:16.041 [info] <0.267.0> Feature flags: [ ] virtual_host_metadata
2023-02-14 10:29:16.041 [info] <0.267.0> Feature flags: feature flag states written to disk: yes
2023-02-14 10:29:16.067 [info] <0.267.0> Feature flag `virtual_host_metadata`: supported, attempt to enable...
2023-02-14 10:29:16.067 [info] <0.267.0> Feature flag `virtual_host_metadata`: mark as enabled=state_changing
2023-02-14 10:29:16.079 [info] <0.267.0> Feature flags: list of feature flags found:
2023-02-14 10:29:16.079 [info] <0.267.0> Feature flags: [x] implicit_default_bindings
2023-02-14 10:29:16.079 [info] <0.267.0> Feature flags: [x] quorum_queue
2023-02-14 10:29:16.079 [info] <0.267.0> Feature flags: [~] virtual_host_metadata
2023-02-14 10:29:16.079 [info] <0.267.0> Feature flags: feature flag states written to disk: yes
2023-02-14 10:29:16.106 [info] <0.267.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2023-02-14 10:29:16.106 [info] <0.267.0> Successfully synced tables from a peer
2023-02-14 10:29:16.162 [info] <0.267.0> Feature flag `virtual_host_metadata`: mark as enabled=true
2023-02-14 10:29:16.176 [info] <0.267.0> Feature flags: list of feature flags found:
2023-02-14 10:29:16.176 [info] <0.267.0> Feature flags: [x] implicit_default_bindings
2023-02-14 10:29:16.176 [info] <0.267.0> Feature flags: [x] quorum_queue
2023-02-14 10:29:16.176 [info] <0.267.0> Feature flags: [x] virtual_host_metadata
2023-02-14 10:29:16.176 [info] <0.267.0> Feature flags: feature flag states written to disk: yes
2023-02-14 10:29:16.200 [info] <0.267.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2023-02-14 10:29:16.200 [info] <0.267.0> Successfully synced tables from a peer
2023-02-14 10:29:16.257 [info] <0.267.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2023-02-14 10:29:16.258 [info] <0.267.0> Successfully synced tables from a peer
2023-02-14 10:29:16.258 [info] <0.267.0> Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping registration.
2023-02-14 10:29:16.258 [info] <0.267.0> Running boot step database_sync defined by app rabbit
2023-02-14 10:29:16.258 [info] <0.267.0> Running boot step feature_flags defined by app rabbit
2023-02-14 10:29:16.259 [info] <0.267.0> Running boot step codec_correctness_check defined by app rabbit
2023-02-14 10:29:16.259 [info] <0.267.0> Running boot step external_infrastructure defined by app rabbit
2023-02-14 10:29:16.259 [info] <0.267.0> Running boot step rabbit_registry defined by app rabbit
2023-02-14 10:29:16.259 [info] <0.267.0> Running boot step rabbit_auth_mechanism_cr_demo defined by app rabbit
2023-02-14 10:29:16.260 [info] <0.267.0> Running boot step rabbit_queue_location_random defined by app rabbit
2023-02-14 10:29:16.260 [info] <0.267.0> Running boot step rabbit_event defined by app rabbit
2023-02-14 10:29:16.261 [info] <0.267.0> Running boot step rabbit_auth_mechanism_amqplain defined by app rabbit
2023-02-14 10:29:16.261 [info] <0.267.0> Running boot step rabbit_auth_mechanism_plain defined by app rabbit
2023-02-14 10:29:16.261 [info] <0.267.0> Running boot step rabbit_exchange_type_direct defined by app rabbit
2023-02-14 10:29:16.261 [info] <0.267.0> Running boot step rabbit_exchange_type_fanout defined by app rabbit
2023-02-14 10:29:16.262 [info] <0.267.0> Running boot step rabbit_exchange_type_headers defined by app rabbit
2023-02-14 10:29:16.262 [info] <0.267.0> Running boot step rabbit_exchange_type_topic defined by app rabbit
2023-02-14 10:29:16.263 [info] <0.267.0> Running boot step rabbit_mirror_queue_mode_all defined by app rabbit
2023-02-14 10:29:16.263 [info] <0.267.0> Running boot step rabbit_mirror_queue_mode_exactly defined by app rabbit
2023-02-14 10:29:16.263 [info] <0.267.0> Running boot step rabbit_mirror_queue_mode_nodes defined by app rabbit
2023-02-14 10:29:16.263 [info] <0.267.0> Running boot step rabbit_priority_queue defined by app rabbit
2023-02-14 10:29:16.263 [info] <0.267.0> Priority queues enabled, real BQ is rabbit_variable_queue
2023-02-14 10:29:16.263 [info] <0.267.0> Running boot step rabbit_queue_location_client_local defined by app rabbit
2023-02-14 10:29:16.264 [info] <0.267.0> Running boot step rabbit_queue_location_min_masters defined by app rabbit
2023-02-14 10:29:16.264 [info] <0.267.0> Running boot step kernel_ready defined by app rabbit
2023-02-14 10:29:16.264 [info] <0.267.0> Running boot step rabbit_sysmon_minder defined by app rabbit
2023-02-14 10:29:16.265 [info] <0.267.0> Running boot step rabbit_epmd_monitor defined by app rabbit
2023-02-14 10:29:24.268 [info] <0.607.0> epmd monitor knows us, inter-node communication (distribution) port: 25672
2023-02-14 10:29:24.269 [info] <0.267.0> Running boot step guid_generator defined by app rabbit
2023-02-14 10:29:24.280 [info] <0.267.0> Running boot step rabbit_node_monitor defined by app rabbit
2023-02-14 10:29:24.281 [info] <0.619.0> Starting rabbit_node_monitor
2023-02-14 10:29:24.281 [info] <0.267.0> Running boot step delegate_sup defined by app rabbit
2023-02-14 10:29:24.283 [info] <0.267.0> Running boot step rabbit_memory_monitor defined by app rabbit
2023-02-14 10:29:24.283 [info] <0.267.0> Running boot step core_initialized defined by app rabbit
2023-02-14 10:29:24.284 [info] <0.267.0> Running boot step upgrade_queues defined by app rabbit
2023-02-14 10:29:24.316 [info] <0.267.0> message_store upgrades: 1 to apply
2023-02-14 10:29:24.316 [info] <0.267.0> message_store upgrades: Applying rabbit_variable_queue:move_messages_to_vhost_store
2023-02-14 10:29:24.316 [info] <0.267.0> message_store upgrades: No durable queues found. Skipping message store migration
2023-02-14 10:29:24.317 [info] <0.267.0> message_store upgrades: Removing the old message store data
2023-02-14 10:29:24.318 [info] <0.267.0> message_store upgrades: All upgrades applied successfully
2023-02-14 10:29:24.358 [info] <0.267.0> Running boot step rabbit_connection_tracking defined by app rabbit
2023-02-14 10:29:24.358 [info] <0.267.0> Running boot step rabbit_connection_tracking_handler defined by app rabbit
2023-02-14 10:29:24.359 [info] <0.267.0> Running boot step rabbit_exchange_parameters defined by app rabbit
2023-02-14 10:29:24.359 [info] <0.267.0> Running boot step rabbit_mirror_queue_misc defined by app rabbit
2023-02-14 10:29:24.360 [info] <0.267.0> Running boot step rabbit_policies defined by app rabbit
2023-02-14 10:29:24.363 [info] <0.267.0> Running boot step rabbit_policy defined by app rabbit
2023-02-14 10:29:24.363 [info] <0.267.0> Running boot step rabbit_queue_location_validator defined by app rabbit
2023-02-14 10:29:24.363 [info] <0.267.0> Running boot step rabbit_quorum_memory_manager defined by app rabbit
2023-02-14 10:29:24.363 [info] <0.267.0> Running boot step rabbit_vhost_limit defined by app rabbit
2023-02-14 10:29:24.364 [info] <0.267.0> Running boot step recovery defined by app rabbit
2023-02-14 10:29:24.366 [info] <0.267.0> Running boot step definition_import_worker_pool defined by app rabbit
2023-02-14 10:29:24.366 [info] <0.343.0> Starting worker pool 'definition_import_pool' with 16 processes in it
2023-02-14 10:29:24.369 [info] <0.267.0> Running boot step load_core_definitions defined by app rabbit
2023-02-14 10:29:24.369 [info] <0.267.0> Running boot step empty_db_check defined by app rabbit
2023-02-14 10:29:24.369 [info] <0.267.0> Adding vhost '/' (description: 'Default virtual host')
2023-02-14 10:29:24.435 [info] <0.674.0> Making sure data directory '/var/lib/rabbitmq/mnesia/rabbit@zhxxxx-xxx-20230203150413-0008/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L' for vhost '/' exists
2023-02-14 10:29:24.446 [info] <0.674.0> Starting message stores for vhost '/'
2023-02-14 10:29:24.446 [info] <0.678.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_transient": using rabbit_msg_store_ets_index to provide index
2023-02-14 10:29:24.451 [info] <0.674.0> Started message store of type transient for vhost '/'
2023-02-14 10:29:24.451 [info] <0.682.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": using rabbit_msg_store_ets_index to provide index
2023-02-14 10:29:24.453 [warning] <0.682.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": rebuilding indices from scratch
2023-02-14 10:29:24.455 [info] <0.674.0> Started message store of type persistent for vhost '/'
2023-02-14 10:29:24.466 [info] <0.267.0> Created user 'root'
2023-02-14 10:29:24.473 [info] <0.267.0> Successfully set user tags for user 'root' to [administrator]
2023-02-14 10:29:24.481 [info] <0.267.0> Successfully set permissions for 'root' in virtual host '/' to '.*', '.*', '.*'
2023-02-14 10:29:24.481 [info] <0.267.0> Running boot step rabbit_looking_glass defined by app rabbit
2023-02-14 10:29:24.481 [info] <0.267.0> Running boot step rabbit_core_metrics_gc defined by app rabbit
2023-02-14 10:29:24.482 [info] <0.267.0> Running boot step background_gc defined by app rabbit
2023-02-14 10:29:24.483 [info] <0.267.0> Running boot step connection_tracking defined by app rabbit
2023-02-14 10:29:24.494 [info] <0.267.0> Setting up a table for connection tracking on this node: 'tracked_connection_on_node_rabbit@zhxxxx-xxx-20230203150413-0008'
2023-02-14 10:29:24.506 [info] <0.267.0> Setting up a table for per-vhost connection counting on this node: 'tracked_connection_per_vhost_on_node_rabbit@zhxxxx-xxx-20230203150413-0008'
2023-02-14 10:29:24.507 [info] <0.267.0> Running boot step routing_ready defined by app rabbit
2023-02-14 10:29:24.507 [info] <0.267.0> Running boot step pre_flight defined by app rabbit
2023-02-14 10:29:24.507 [info] <0.267.0> Running boot step notify_cluster defined by app rabbit
2023-02-14 10:29:24.507 [info] <0.267.0> Running boot step networking defined by app rabbit
2023-02-14 10:29:40.515 [info] <0.745.0> started TCP listener on [::]:5672
2023-02-14 10:29:48.517 [info] <0.267.0> Running boot step cluster_name defined by app rabbit
2023-02-14 10:29:48.518 [info] <0.267.0> Initialising internal cluster ID to 'rabbitmq-cluster-id-AQyx0iUPrMGenjAJo1uO7A'
2023-02-14 10:29:48.527 [info] <0.267.0> Running boot step direct_client defined by app rabbit
2023-02-14 10:29:50.216 [info] <0.760.0> Feature flags: list of feature flags found:
2023-02-14 10:29:50.217 [info] <0.760.0> Feature flags: [ ] drop_unroutable_metric
2023-02-14 10:29:50.217 [info] <0.760.0> Feature flags: [ ] empty_basic_get_metric
2023-02-14 10:29:50.217 [info] <0.760.0> Feature flags: [x] implicit_default_bindings
2023-02-14 10:29:50.217 [info] <0.760.0> Feature flags: [x] quorum_queue
2023-02-14 10:29:50.217 [info] <0.760.0> Feature flags: [x] virtual_host_metadata
2023-02-14 10:29:50.217 [info] <0.760.0> Feature flags: feature flag states written to disk: yes
2023-02-14 10:29:51.162 [info] <0.760.0> Running boot step rabbit_mgmt_db_handler defined by app rabbitmq_management_agent
2023-02-14 10:29:51.162 [info] <0.760.0> Management plugin: using rates mode 'basic'
2023-02-14 10:29:51.326 [info] <0.760.0> Running boot step rabbit_mgmt_reset_handler defined by app rabbitmq_management
2023-02-14 10:29:51.326 [info] <0.760.0> Running boot step rabbit_management_load_definitions defined by app rabbitmq_management
2023-02-14 10:29:59.390 [info] <0.827.0> Management plugin: HTTP (non-TLS) listener started on port 15672
2023-02-14 10:29:59.392 [info] <0.941.0> Statistics database started.
2023-02-14 10:29:59.392 [info] <0.940.0> Starting worker pool 'management_worker_pool' with 3 processes in it
2023-02-14 10:30:07.530 [info] <0.954.0> Prometheus metrics: HTTP (non-TLS) listener started on port 15692
2023-02-14 10:30:08.171 [info] <0.760.0> Server startup complete; 4 plugins started.
* rabbitmq_prometheus
* rabbitmq_management
* rabbitmq_web_dispatch
* rabbitmq_management_agent
completed with 4 plugins.

看日志结尾也有 5672 端口监听,习惯性的从结尾看了,后面没头绪,往前翻看日志,看到了第一行的:

1
{:query, :"rabbit@zhxxxx-xxx-20230203150413-0008", {:badrpc, :timeout}}

如果 rabbitmq 多机的时候,出现 badrpc 一般是网络不通,或者成员之间的 hostname 无法解析,感觉这个单机的 mq 应该也是类似问题,看下 hosts 文件:

1
2
3
4
5
[root@zhxxxx-xxx-20230203150413-0008 ~]# docker exec -ti k8s_rabbitmq_rabbitmq-xxxx_0 bash
root@zhxxxx-xxx-20230203150413-0008:/# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
....
127.0.0.1 zhxxxx-xxx-20230203150413-0008.novalocal

emmm,果然没有 zhxxxx-xxx-20230203150413-0008 的解析记录,添加后就好了

1
2
3
4
5
root@zhxxxx-xxx-20230203150413-0008:/# echo '127.0.0.1 zhxxxx-xxx-20230203150413-0008' >> /etc/hosts
root@zhxxxx-xxx-20230203150413-0008:/# rabbitmqctl list_users
Listing users ...
user tags
root [administrator]

原因

.novalocal 是 openstack nova dhcp_domain 添加的,不知道为啥 mq 不把后缀当作 hostname 的部分,使用 hostnamectl 看是对的:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
[root@zhxxxx-xxx-20230203150413-0008 ~]# hostnamectl status
Static hostname: zhxxxx-xxx-20230203150413-0008.novalocal
Icon name: computer-vm
Chassis: vm
Machine ID: 059dd05a412c4xxxxxxxxxbf9892c16c
Boot ID: 8df788ae62e54xxxxxxxx99ce87178fe
Virtualization: kvm
Operating System: openEuler 22.03 LTS
Kernel: Linux 5.10.0-60.18.0.50.oe2203.aarch64
Architecture: arm64
Hardware Vendor: QEMU
Hardware Model: KVM Virtual Machine
[root@zhxxxx-xxx-20230203150413-0008 ~]# echo $HOSTNAME
zhxxxx-xxx-20230203150413-0008.novalocal
[root@zhxxxx-xxx-20230203150413-0008 ~]# docker exec -ti k8s_rabbitmq_rabbitmq-1-xxxx_0 bash
root@zhxxxx-xxx-20230203150413-0008:/# echo $HOSTNAME
zhxxxx-xxx-20230203150413-0008.novalocal

关于反向解析的,查了下默认是关闭的,暂时宿主机上手动添加 hosts 解析解决

1
2
$ docker exec 0ea rabbitmqctl environment | grep -i dns
{reverse_dns_lookups,false},

后续解决

打算更改下 rabbitmq 的 docker 镜像增加 entrypoint.sh ,增加去掉点后缀的 hosts 记录,避免这种情况,因为 nova 上可以改 dhcp_domain,兼容未知后缀的话就是打地鼠,而且 host 网络容器每次 restart 后都会同步宿主机的 hosts,所以镜像启动脚本兼容是最优解

在修改期间,看到了官方的 docker-entrypoint.sh 脚本里有类似判断:

1
2
3
4
# if long and short hostnames are not the same, use long hostnames
if [ "$(hostname)" != "$(hostname -s)" ]; then
: "${RABBITMQ_USE_LONGNAME:=true}"
fi

机器上输出则是下面,确实不一样

1
2
3
4
[root@zhxxxx-xxx-20230203150413-0008 ~]# hostname
zhxxxx-xxx-20230203150413-0008.novalocal
[root@zhxxxx-xxx-20230203150413-0008 ~]# hostname -s
zhxxxx-xxx-20230203150413-0008

官方文档的 clustering.html 里并没有看懂 RABBITMQ_USE_LONGNAME 描述,还是直接 hack entrypoint.sh 在前面添加下面的了

1
2
3
if [ "$(hostname)" != "$(hostname -s)" ]; then
grep -Eqw `hostname -s`'$' /etc/hosts || echo "127.0.0.1 `hostname -s`" >> /etc/hosts
fi

参考

CATALOG
  1. 1. 前因
  2. 2. 过程
    1. 2.1. 排查
    2. 2.2. 原因
    3. 2.3. 后续解决
  3. 3. 参考