Reconnection fails #17
Yes. And the worst part is sometimes it indicates it is connected but won't receive any messages.

Yes, I have seen this condition as well: typically after an attempted reconnection, the status seems to indicate that I am connected, even though no messages are received.

With paho AutoReconnect turned on, I can see it attempting to reconnect while the connection is interrupted:

Ultimately, it does seem to reconnect at the MQTT level, but it doesn't appear to be resubscribing to the previously-subscribed channels. I imagine there is something that needs to happen in the emitter Go client, but I really don't know enough about the underlying code yet.
[CC @tiddnet @mudhoney @paulborile] I've built a self-contained example of the problem: https://github.com/kamermans/emitter-reconnect-issue

To reproduce my issue, clone the repo above and start it up. It will start a stack including an emitter server, a sender client, and a receiver client.

Once it's started, give it about 20 seconds to stabilize so you can get a feel for the output (the sender client sending messages to the channel, the receiver client printing them, pings and keepalives throughout, the server chatting about client status, etc.). Now, run this command in a separate window to stop the server:

Wait a few seconds, and start it again:

You will see both clients complaining about the disconnection and reconnecting; however, after auto-reconnection, the receiver client doesn't display any more messages - it's as if it is no longer subscribed to the channel. This is the problem I'm reporting.
Ok, so I have made some progress here. When the client loses a connection and then auto-reconnects, you need to resubscribe to all of the channels again, using the OnConnect handler:

```go
emitterClient.OnConnect(func(c *emitter.Client) {
	log.Println("OnConnect triggered, subscribing...")

	// Subscribe to the channel(s)
	emitterClient.Subscribe(channelKey, channel, func(client *emitter.Client, msg emitter.Message) {
		log.Printf("Message received: [%s] %s", msg.Topic(), msg.Payload())
	})
})
```

You should resubscribe like this every time the client connects.
I've found a discussion about this problem on the eclipse paho MQTT client project. TL;DR: there is a client option to set; I'll change this to the default and add an Emitter option in a PR.
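The specific option referenced above didn't survive the thread formatting. For context, here is a sketch of the paho.mqtt.golang options that usually govern resubscription across reconnects. This is my own illustration, not code from this thread; it talks to paho directly rather than through the emitter wrapper, and the broker address and topic are placeholders.

```go
package main

import (
	"log"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	opts := mqtt.NewClientOptions().
		AddBroker("tcp://127.0.0.1:8080"). // placeholder broker address
		SetAutoReconnect(true).            // let paho re-dial after a lost connection
		SetCleanSession(false).            // keep session state across reconnects
		SetResumeSubs(true)                // resume stored (un)subscribe requests (requires CleanSession=false)

	// Re-subscribing in the connect handler is still the most robust approach,
	// because it runs on the initial connect and after every auto-reconnect.
	opts.SetOnConnectHandler(func(c mqtt.Client) {
		if token := c.Subscribe("placeholder-key/placeholder-channel/", 0, nil); token.Wait() && token.Error() != nil {
			log.Println("resubscribe failed:", token.Error())
		}
	})

	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		log.Fatal(token.Error())
	}

	select {} // block forever
}
```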
Seems that eclipse-paho/paho.mqtt.golang#22 is fixed and closed. Has this fixed the reconnection issue for Emitter, though?
The issue is still happening, even after using this auto-resubscribe code:

```go
func ConnectionLostHandler(client *emitter.Client, reason error) {
	log.Println("Emitter connection lost, reconnecting. Reason: ", reason.Error())
	emitterClient.Disconnect(100 * time.Millisecond)
	if connErr := emitterClient.Connect(); connErr != nil {
		log.Fatalf("Error on Client.Connect(): %v", connErr)
	}
}

emitterClient.OnConnect(func(client *emitter.Client) {
	// Subscribe to the channel
	if subErr := emitterClient.Subscribe(channelKey, channel, nil); subErr != nil {
		log.Fatalf("Error on Client.Subscribe(): %v", subErr)
	}
	log.Printf("Subscribed to channel '%s'", channel)
	ConnectHandler(client)
})

emitterClient.OnDisconnect(ConnectionLostHandler)
```

I am still seeing output like this in my logs:
I have 48 servers running the emitter client, and 5-10 of them randomly get stuck in this condition. The only fix is for me to restart the application completely.

I am running 3 emitter servers on different continents with latency-based DNS routing, so each client uses the nearest/fastest server. I wonder if my multi-server DNS setup is a factor here. Any ideas?

At this point, I'm seriously considering scrapping emitter for this project and moving to something else, but I really like the simplicity, the channel authorization model, and the native Go code.
I'll have a look over the weekend to see if we can reproduce this issue.
Thanks! I'm also running a couple of versions of it on a separate machine that will Slack me if it disconnects :). I've been using the master branch from this repository.
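As an aside, for anyone who wants to build a similar watchdog, a minimal sketch of the idea might look like the following. This is my own illustration, not the poster's actual monitor; it assumes the emitter-io/go v2 API shown elsewhere in this thread (plus a `Publish(key, channel, payload)` call), and the broker address, key, channel, and timeouts are placeholders. The client publishes to a channel it is itself subscribed to and exits if the echo never arrives, so a process supervisor can restart it; this catches the "looks connected but receives nothing" state.

```go
package main

import (
	"log"
	"time"

	emitter "github.com/emitter-io/go/v2"
)

func main() {
	key := "REPLACE_WITH_CHANNEL_KEY" // placeholder key with read+write access
	channel := "watchdog/"
	echo := make(chan struct{}, 1)

	c := emitter.NewClient(
		emitter.WithBrokers("tcp://127.0.0.1:8080"),
		emitter.WithAutoReconnect(true))

	c.OnConnect(func(_ *emitter.Client) {
		// Re-subscribe on every (re)connect, as discussed above.
		c.Subscribe(key, channel, func(_ *emitter.Client, _ emitter.Message) {
			select {
			case echo <- struct{}{}:
			default:
			}
		})
	})

	if err := c.Connect(); err != nil {
		log.Fatal(err)
	}

	for {
		c.Publish(key, channel, "ping")
		select {
		case <-echo:
			// Round trip succeeded; the client is really receiving messages.
		case <-time.After(30 * time.Second):
			// Connected-looking but silent: exit and let the supervisor restart us.
			log.Fatal("watchdog: no echo received, exiting")
		}
		time.Sleep(10 * time.Second)
	}
}
```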
I have a client on one machine that just runs a loop incrementing `i` and publishing it. The other client, on another machine, has this code:

```go
package main

import (
	"fmt"

	emitter "github.com/emitter-io/go/v2"
)

func main() {
	key := "zG0gBzTYXKYDLRA0n-O6cU2J5pbabnl_"
	channel := "test/"
	done := make(chan bool)

	c := emitter.NewClient(
		emitter.WithBrokers("tcp://192.168.0.10:8080"),
		emitter.WithAutoReconnect(true))

	c.OnError(func(_ *emitter.Client, err emitter.Error) {
		fmt.Println(err.Message)
	})

	c.OnDisconnect(func(client *emitter.Client, e error) {
		fmt.Println("######### Disconnected: ", e.Error())
	})

	c.OnConnect(func(_ *emitter.Client) {
		fmt.Println("######### Connected.")
		c.Subscribe(key, channel, func(_ *emitter.Client, msg emitter.Message) {
			fmt.Println(string(msg.Payload()))
		})
	})

	c.Connect()
	<-done
}
```

Naively, what I do is disconnect the Wi-Fi for 30 seconds or more, then reconnect it. The output looks like this:
Thanks for checking into it @Florimond - I constructed a similar test, which ran successfully for several weeks and reconnected correctly after disconnects (which I introduced with iptables).

In my scenario, nearly all servers are listening to a channel indefinitely. When a message arrives, they perform a task internally and send a confirmation back on a different channel. Some of the clients disconnect and don't reconnect, but for whatever reason it's not 100% reproducible - there seems to be some element of chance.

Is there some sort of verbose logging I can enable on the client to help diagnose the issue?
It seems it may be a paho MQTT issue: eclipse-paho/paho.mqtt.golang#328
Interesting, this certainly looks related, thanks!
I think I've found the issue that's been haunting me, and it's with my code:

```go
func ConnectionLostHandler(client *emitter.Client, reason error) {
	log.Println("Emitter connection lost, reconnecting. Reason: ", reason.Error())
	emitterClient.Disconnect(100 * time.Millisecond)
	if connErr := emitterClient.Connect(); connErr != nil {
		log.Fatalf("Error on Client.Connect(): %v", connErr)
	}
}
```

The problem is the manual `Disconnect()` / `Connect()` in this handler. It turns out that calling `Connect()` myself while the underlying paho client is already auto-reconnecting leaves the client in a weird state. When in this weird state (lost connection, then called `Connect()` manually), the client appears connected but no longer receives messages.

I suspect the message pending on the channel is misinterpreted. I am going to update all of my servers to ensure that this solves the problem. Thanks for hanging in there with me!

For other users, this is how you enable verbose logging of the underlying paho MQTT library:

```go
import (
	"log"
	"os"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func init() {
	logger := log.New(os.Stderr, "", log.LstdFlags)
	mqtt.ERROR = logger
	mqtt.CRITICAL = logger
	mqtt.WARN = logger
	mqtt.DEBUG = logger
}
```
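To summarize the fix as I read it (a sketch of my own, not code from this thread; the broker address, key, and channel are placeholders): with `WithAutoReconnect(true)`, the lost-connection handler should only log, and subscriptions should be re-established in `OnConnect`, which also runs after each successful reconnect.

```go
package main

import (
	"log"

	emitter "github.com/emitter-io/go/v2"
)

func main() {
	channelKey := "REPLACE_WITH_CHANNEL_KEY" // placeholder
	channel := "test/"

	emitterClient := emitter.NewClient(
		emitter.WithBrokers("tcp://127.0.0.1:8080"),
		emitter.WithAutoReconnect(true))

	// Let the built-in auto-reconnect handle reconnection; only log here.
	// Calling Disconnect()/Connect() from this handler is what caused the
	// "connected but silent" state described above.
	emitterClient.OnDisconnect(func(_ *emitter.Client, reason error) {
		log.Println("Connection lost, waiting for auto-reconnect. Reason:", reason.Error())
	})

	// OnConnect runs on the initial connect and after every reconnect,
	// so subscriptions are (re)established in one place.
	emitterClient.OnConnect(func(_ *emitter.Client) {
		if err := emitterClient.Subscribe(channelKey, channel, func(_ *emitter.Client, msg emitter.Message) {
			log.Printf("Message received: [%s] %s", msg.Topic(), msg.Payload())
		}); err != nil {
			log.Printf("Error on Client.Subscribe(): %v", err)
			return
		}
		log.Printf("Subscribed to channel '%s'", channel)
	})

	if err := emitterClient.Connect(); err != nil {
		log.Fatalf("Error on Client.Connect(): %v", err)
	}

	select {} // run forever
}
```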
Hi, I've been trying to use emitter for 4 months now, but am constantly plagued with reconnection issues. Basically, whenever there is a network interruption, the Go client seems to notice the problem, but it fails to reconnect and doesn't panic, which leaves the client in a disconnected state.
This is happening in both the v1 and v2 code, and with both standalone and clustered emitter servers. The problem is much more common when running emitter behind an AWS Application Load Balancer with a client TTL above 30 seconds (I suspect the AWS ALB is killing the connection more frequently in this case).
I previously found some issues in the dependency, paho.mqtt.golang, related to auto-reconnection, so I implemented an `OnConnectionLost()` handler that tries to reconnect, but it doesn't seem to help either. It seems the only way to reliably reconnect is to panic the whole thing and let my process supervisor restart my application.

Is anyone else experiencing this problem, and/or is there something I can do to gather conclusive information about the problem?

I've tried the stable version, and also the latest `master` via my application's `Gopkg.toml`:

Thanks!