
You need to look into using HA queues (i.e., mirroring) and make sure you are using a client that properly supports consumer cancel notifications, and that you are actually handling them correctly.

There is also significant TCP/IP and RabbitMQ tuning that can be done to make failover much faster.

I have used RabbitMQ for years, and though it's not perfect, if you do try something else you will quickly see that it's still far ahead of the competition.

Best of luck!



HA queues have this caveat in the documentation:

    This solution requires a RabbitMQ cluster, which means
    that it will not cope seamlessly with network partitions
    within the cluster and, for that reason, is not
    recommended for use across a WAN

And then:

    However, there is currently no way for a slave to know
    whether or not its queue contents have diverged from the
    master to which it is rejoining (this could happen
    during a network partition, for example). As such, when
    a slave rejoins a mirrored queue, it throws away any
    durable local contents it already has and starts empty.

So I don't think that's helpful to us at all.

We are going to migrate to a client that supports consumer cancel notifications, though. Thanks for the tip.


True. What I do recommend, though, is running RMQ clusters within each cloud (where networking should be a bit more reliable, the exception being AWS, which always sucks in this regard) and then using federation (probably via shovel, but there are other means) to connect to the other clouds.

Ultimately, though, when you get to this stage I question whether your app is big enough to warrant this, and I suggest you use Azure Service Bus, Amazon Simple Queue Service (SQS), or whatever else your providers make available.

If your business really needs such control over messaging, I understand. I have been in the situation where those easy ways out aren't available, and I know your pain. Unfortunately there is no vendor you can go to that will make it go away; Tibco, Sterling, etc. aren't much better than RMQ.

I wish you the best of luck with your multi-cloud federated messaging system. I highly suggest you look at Azure Service Bus; I have nothing but praise for it despite being a devout RMQ zealot.


You misunderstood me, I think. Our clouds aren't connected. The partitioning problem exists within each data center (e.g., DigitalOcean).

So federation/shovel is probably not the solution.

SQS is far too simple for our needs. No idea what Azure is, but if it's SaaS the latency will likely be too high. We need local performance.


Can you point me at a good place to start for RMQ and TCP/IP stack tuning?


Sure, I would recommend the below in sysctl:

  net.ipv4.tcp_keepalive_time=5
  net.ipv4.tcp_keepalive_probes=5
  net.ipv4.tcp_keepalive_intvl=1

This tunes the TCP keepalives to decrease the time it takes for most client stacks to realize a server has gone away. Note that these values only affect sockets that have SO_KEEPALIVE enabled. The same settings should also be applied on the servers participating in the cluster.
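
As a rough sanity check on those numbers, here is a small Python sketch of the arithmetic. It assumes the usual Linux behaviour: the first probe goes out after `tcp_keepalive_time` seconds of idleness, and the connection is dropped once `tcp_keepalive_probes` probes, spaced `tcp_keepalive_intvl` seconds apart, go unanswered. The function name is just for illustration, and real detection time can vary with in-flight data and retransmission timers.

```python
def keepalive_detection_seconds(idle_time: int, probes: int, interval: int) -> int:
    """Approximate seconds before an idle connection to a dead peer is dropped.

    idle_time -> net.ipv4.tcp_keepalive_time
    probes    -> net.ipv4.tcp_keepalive_probes
    interval  -> net.ipv4.tcp_keepalive_intvl
    """
    # Idle period before the first probe, plus one interval per unanswered probe.
    return idle_time + probes * interval


# Values suggested above.
suggested = keepalive_detection_seconds(idle_time=5, probes=5, interval=1)

# Stock Linux defaults (7200 / 9 / 75), for comparison.
linux_defaults = keepalive_detection_seconds(idle_time=7200, probes=9, interval=75)

print(f"suggested settings: ~{suggested} s")
print(f"linux defaults:     ~{linux_defaults} s")
```

So with the suggested values a dead peer should be noticed in about 10 seconds on an idle connection, versus a bit over two hours with the defaults.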

As for RabbitMQ tuning, I recommend these settings at a minimum:

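  [
   {rabbit, [{tcp_listen_options, [binary,
                                  {packet, raw},
                                  {reuseaddr, true},       %% allow rebinding the listen port right after a restart
                                  {backlog, 128},          %% length of the pending-connection queue
                                  {nodelay, true},         %% disable Nagle's algorithm for lower latency
                                  {exit_on_close, false},
                                  {keepalive, true}]}      %% enable SO_KEEPALIVE so the sysctl values above apply
            ]}
  ].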


Thank you! I have some reading to do. Appreciate it!



