RabbitMQ Cluster Setup

2019-03-08 23:12 | Source: Internet

Machines to prepare:
192.168.56.101 huangyineng
192.168.56.102 slave2
192.168.56.103 slave1


Install RabbitMQ server on all three machines as described in 《RabbitMQ 环境配置-基于linux》 (RabbitMQ environment setup on Linux), then enable the RabbitMQ management plugin:

[hadoop@huangyineng rabbitmq]$ sbin/rabbitmq-plugins enable rabbitmq_management
The following plugins have been enabled:
 mochiweb
 webmachine
 rabbitmq_web_dispatch
 amqp_client
 rabbitmq_management_agent
 rabbitmq_management


1. Understanding .erlang.cookie

RabbitMQ nodes and CLI tools (e.g. rabbitmqctl) use a cookie to determine whether they are allowed to communicate with each other. For two nodes to be able to communicate they must have the same shared secret called the Erlang cookie. The cookie is just a string of alphanumeric characters. It can be as long or short as you like. Every cluster node must have the same cookie.

Erlang VM will automatically create a random cookie file when the RabbitMQ server starts up. The easiest way to proceed is to allow one node to create the file, and then copy it to all the other nodes in the cluster.

On Unix systems, the cookie will be typically located in /var/lib/rabbitmq/.erlang.cookie or $HOME/.erlang.cookie.

On Windows, the locations are C:\Users\Current User\.erlang.cookie (%HOMEDRIVE% + %HOMEPATH%\.erlang.cookie) or C:\Documents and Settings\Current User\.erlang.cookie, and C:\Windows\.erlang.cookie for the RabbitMQ Windows service. If the Windows service is used, the cookie should be placed in both places.

As an alternative, you can insert the option "-setcookie cookie" in the erl call in the rabbitmq-server and rabbitmqctl scripts.

When the cookie is misconfigured (for example, not identical), RabbitMQ will log errors such as "Connection attempt from disallowed node" and "Could not auto-cluster".
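Before clustering, it can be worth confirming that the cookies really are identical. A minimal sketch; the `check_cookie` helper name is my own, not part of RabbitMQ:

```shell
#!/bin/sh
# Hypothetical helper: compare two .erlang.cookie files byte for byte.
# cmp -s exits 0 when the files are identical, non-zero otherwise.
check_cookie() {
  if cmp -s "$1" "$2"; then
    echo "cookies match"
  else
    echo "cookies differ"
  fi
}
```

On the cluster below you might run it against copies fetched with scp, e.g. `check_cookie ~/.erlang.cookie /tmp/slave1.cookie`.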


RabbitMQ clustering is built on top of Erlang clustering. Nodes in an Erlang cluster authenticate each other with a magic cookie stored in $HOME/.erlang.cookie (on my test machines, /home/hadoop/.erlang.cookie), a file with mode 400. All nodes must therefore hold an identical cookie, or they cannot communicate with each other.


2. Synchronizing .erlang.cookie

Since .erlang.cookie has mode 400 and cannot be overwritten in place, first make the file writable on the other two machines:
[hadoop@slave2 sbin]$ chmod 777 ~/.erlang.cookie
[hadoop@slave1 rabbitmq]$ chmod 777 ~/.erlang.cookie

Copy the .erlang.cookie from huangyineng to the two machines above, then restore the permissions:
[hadoop@huangyineng rabbitmq]$ scp ~/.erlang.cookie slave1:/home/hadoop/
[hadoop@huangyineng rabbitmq]$ scp ~/.erlang.cookie slave2:/home/hadoop/

[hadoop@slave1 rabbitmq]$ chmod 400 ~/.erlang.cookie
[hadoop@slave2 sbin]$ chmod 400 ~/.erlang.cookie


3. Starting each node with -detached

If the RabbitMQ server is already running, stop it first:
[hadoop@huangyineng rabbitmq]$ sbin/rabbitmqctl stop
[hadoop@huangyineng rabbitmq]$ sbin/rabbitmq-server -detached
Warning: PID file not written; -detached was passed.


[hadoop@huangyineng rabbitmq]$ ps -ef | grep rabbit

hadoop    3373     1  5 10:24 ?        00:00:02 /data/dn1/erlang/lib/erlang/erts-8.0/bin/beam -W w -A 64 -P 1048576 -K true -- -root /data/dn1/erlang/lib/erlang -progname erl -- -home /home/hadoop -- -pa /data/dn1/rabbitmq/ebin -noshell -noinput -s rabbit boot -sname rabbit@huangyineng -boot start_sasl -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/data/dn1/rabbitmq/var/log/rabbitmq/rabbit@huangyineng.log"} -rabbit sasl_error_logger {file,"/data/dn1/rabbitmq/var/log/rabbitmq/rabbit@huangyineng-sasl.log"} -rabbit enabled_plugins_file "/data/dn1/rabbitmq/etc/rabbitmq/enabled_plugins" -rabbit plugins_dir "/data/dn1/rabbitmq/plugins" -rabbit plugins_expand_dir "/data/dn1/rabbitmq/var/lib/rabbitmq/mnesia/rabbit@huangyineng-plugins-expand" -os_mon start_cpu_sup false -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/data/dn1/rabbitmq/var/lib/rabbitmq/mnesia/rabbit@huangyineng" -kernel inet_dist_listen_min 25672 -kernel inet_dist_listen_max 25672 -noshell -noinput

Run the same commands on the other two machines.

Use rabbitmqctl cluster_status to check the status of each machine:
[hadoop@huangyineng rabbitmq]$ sbin/rabbitmqctl cluster_status
Cluster status of node rabbit@huangyineng ...
[{nodes,[{disc,[rabbit@huangyineng]}]},
{running_nodes,[rabbit@huangyineng]},
{cluster_name,<<"rabbit@huangyineng">>},
{partitions,[]},
{alarms,[{rabbit@huangyineng,[]}]}]
[hadoop@slave1 rabbitmq]$ sbin/rabbitmqctl cluster_status
Cluster status of node rabbit@slave1 ...
[{nodes,[{disc,[rabbit@slave1]}]},
{running_nodes,[rabbit@slave1]},
{cluster_name,<<"rabbit@slave1">>},
{partitions,[]},
{alarms,[{rabbit@slave1,[]}]}]
[hadoop@slave2 rabbitmq]$ sbin/rabbitmqctl cluster_status
Cluster status of node rabbit@slave2 ...
[{nodes,[{disc,[rabbit@slave2]}]},
{running_nodes,[rabbit@slave2]},
{cluster_name,<<"rabbit@slave2">>},
{partitions,[]},
{alarms,[{rabbit@slave2,[]}]}]
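The status output is a raw Erlang term; a quick way to pull the running node list out of it with standard tools is sketched below (the `running_nodes` function name is my own):

```shell
#!/bin/sh
# Hypothetical helper: extract node names from the running_nodes entry
# of `rabbitmqctl cluster_status` output read on stdin, one per line.
running_nodes() {
  grep -o 'running_nodes,\[[^]]*\]' | tr -d '[] ' | cut -d, -f2- | tr ',' '\n'
}
```

Usage: `sbin/rabbitmqctl cluster_status | running_nodes`.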

4. Creating the cluster
Join slave1 and slave2 to huangyineng to form a cluster:
[hadoop@slave1 rabbitmq]$ sbin/rabbitmqctl stop_app
Stopping node rabbit@slave1 ...
[hadoop@slave1 rabbitmq]$ sbin/rabbitmqctl join_cluster rabbit@huangyineng
Clustering node rabbit@slave1 with rabbit@huangyineng ...
[hadoop@slave1 rabbitmq]$ sbin/rabbitmqctl start_app
Starting node rabbit@slave1 ...


Repeat the same steps on slave2:

[hadoop@slave2 rabbitmq]$ sbin/rabbitmqctl stop_app
Stopping node rabbit@slave2 ...
[hadoop@slave2 rabbitmq]$ sbin/rabbitmqctl join_cluster rabbit@huangyineng
Clustering node rabbit@slave2 with rabbit@huangyineng ...
[hadoop@slave2 rabbitmq]$ sbin/rabbitmqctl start_app
Starting node rabbit@slave2 ...


Running rabbitmqctl cluster_status on any of the machines now shows the same state:

[hadoop@huangyineng rabbitmq]$ sbin/rabbitmqctl cluster_status
Cluster status of node rabbit@huangyineng ...
{cluster_name,<<"rabbit@huangyineng">>},
{partitions,[]},
{alarms,[{rabbit@slave2,[]},{rabbit@slave1,[]},{rabbit@huangyineng,[]}]}]


Note: to join as a RAM node instead, pass --ram to join_cluster:

[hadoop@slave2 rabbitmq]$ sbin/rabbitmqctl join_cluster --ram rabbit@huangyineng
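A node that has already joined as a disc node can also be switched to a RAM node (and back) with rabbitmqctl change_cluster_node_type. The sketch below only prints the command sequence for review rather than executing it; the helper name is my own:

```shell
#!/bin/sh
# Hypothetical helper: print the rabbitmqctl sequence that converts the
# local node to the given type ("ram" or "disc"). It prints rather than
# executes so the steps can be reviewed first.
convert_node_type() {
  case "$1" in
    ram|disc) ;;
    *) echo "usage: convert_node_type ram|disc" >&2; return 1 ;;
  esac
  echo "sbin/rabbitmqctl stop_app"
  echo "sbin/rabbitmqctl change_cluster_node_type $1"
  echo "sbin/rabbitmqctl start_app"
}
```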


5. Restarting cluster nodes

Stop the huangyineng node:
[hadoop@huangyineng rabbitmq]$ sbin/rabbitmqctl stop
Stopping and halting node rabbit@huangyineng ...

The cluster status now shows rabbit@slave1 and rabbit@slave2 as the running nodes:
[hadoop@slave2 rabbitmq]$ sbin/rabbitmqctl cluster_status
Cluster status of node rabbit@slave2 ...
{running_nodes,[rabbit@slave1,rabbit@slave2]},
{cluster_name,<<"rabbit@huangyineng">>},
{partitions,[]},
{alarms,[{rabbit@slave1,[]},{rabbit@slave2,[]}]}]


Restart the huangyineng node:

[hadoop@huangyineng rabbitmq]$ sbin/rabbitmq-server -detached
Warning: PID file not written; -detached was passed.
[hadoop@huangyineng rabbitmq]$ sbin/rabbitmqctl cluster_status
Cluster status of node rabbit@huangyineng ...
{cluster_name,<<"rabbit@huangyineng">>},
{partitions,[]},
{alarms,[{rabbit@slave1,[]},{rabbit@slave2,[]},{rabbit@huangyineng,[]}]}]


There are some important caveats:

   When the entire cluster is brought down, the last node to go down must be the first node to be brought online. If this doesn't happen, the nodes will wait 30 seconds for the last disc node to come back online, and fail afterwards. If the last node to go offline cannot be brought back up, it can be removed from the cluster using the forget_cluster_node command - consult the rabbitmqctl manpage for more information.
   If all cluster nodes stop in a simultaneous and uncontrolled manner (for example with a power cut) you can be left with a situation in which all nodes think that some other node stopped after them. In this case you can use the force_boot command on one node to make it bootable again - consult the rabbitmqctl manpage for more information.
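The first caveat can be made concrete: if you record when each node went down, the node with the latest timestamp is the one that must boot first. A hypothetical sketch; the `node epoch_seconds` record format and the function name are my own:

```shell
#!/bin/sh
# Hypothetical helper: read "node epoch_seconds" shutdown records on
# stdin and print the node that stopped last -- the one to start first.
first_to_boot() {
  sort -k2 -n | tail -n 1 | cut -d' ' -f1
}
```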


6. Removing a node from the cluster

6.1. Leaving the cluster voluntarily
Remove slave2 from the cluster:
[hadoop@slave2 rabbitmq]$ sbin/rabbitmqctl stop_app
Stopping node rabbit@slave2 ...
[hadoop@slave2 rabbitmq]$ sbin/rabbitmqctl reset
Resetting node rabbit@slave2 ...
[hadoop@slave2 rabbitmq]$ sbin/rabbitmqctl start_app
Starting node rabbit@slave2 ...


Checking the cluster status on slave2 shows it is now on its own:

[hadoop@slave2 rabbitmq]$ sbin/rabbitmqctl cluster_status
Cluster status of node rabbit@slave2 ...
[{nodes,[{disc,[rabbit@slave2]}]},
{running_nodes,[rabbit@slave2]},
{cluster_name,<<"rabbit@slave2">>},
{partitions,[]},
{alarms,[{rabbit@slave2,[]}]}]


Checking on the other nodes shows that slave2 is no longer part of the cluster:

[hadoop@huangyineng rabbitmq]$ sbin/rabbitmqctl cluster_status
Cluster status of node rabbit@huangyineng ...
[{nodes,[{disc,[rabbit@huangyineng,rabbit@slave1]}]},
{running_nodes,[rabbit@slave1,rabbit@huangyineng]},
{cluster_name,<<"rabbit@huangyineng">>},
{partitions,[]},
{alarms,[{rabbit@slave1,[]},{rabbit@huangyineng,[]}]}]


6.2. Forced removal with forget_cluster_node

Stop slave1, as if it had crashed:
[hadoop@slave1 rabbitmq]$ sbin/rabbitmqctl stop_app
Stopping node rabbit@slave1 ...


Remove it from the cluster on the huangyineng node:

[hadoop@huangyineng rabbitmq]$ sbin/rabbitmqctl forget_cluster_node rabbit@slave1
Removing node rabbit@slave1 from cluster ...


When slave1 is started again, it still believes it belongs to the cluster, while the cluster has already removed it, so start_app fails. Run rabbitmqctl reset to wipe its state, then start it again:

[hadoop@slave1 rabbitmq]$ sbin/rabbitmqctl start_app
Starting node rabbit@slave1 ...
BOOT FAILED
===========
Error description:
  {error,{inconsistent_cluster,"Node rabbit@slave1 thinks it's clustered with node rabbit@huangyineng, but rabbit@huangyineng disagrees"}}
[hadoop@slave1 rabbitmq]$ sbin/rabbitmqctl reset
Resetting node rabbit@slave1 ...
[hadoop@slave1 rabbitmq]$ sbin/rabbitmqctl start_app
Starting node rabbit@slave1 ...
[hadoop@slave1 rabbitmq]$ sbin/rabbitmqctl cluster_status
Cluster status of node rabbit@slave1 ...
[{nodes,[{disc,[rabbit@slave1]}]},
{running_nodes,[rabbit@slave1]},
{cluster_name,<<"rabbit@slave1">>},
{partitions,[]},
{alarms,[{rabbit@slave1,[]}]}]

