FS#253 - Multicast over wireless ceases to work properly after a while on ath9k with many clients connected #5382
Comments
nbd: Please try the latest version |
mamarley: I will, but I can only reproduce this issue on my parents' WAP because I don't have enough devices and I don't want to flash their WAP remotely, so I won't be able to try until this weekend. |
mamarley: I have tried the latest version and it is definitely still a problem. |
mamarley: In fact, it seems to be worse now. Multicast (and therefore IPv6) has stopped working within an hour of each time I reboot the WAP since I updated. Previously (before the wireless-testing update) it would work for about a day or so before it stopped. |
neil: I suspect this bug is responsible for my Chromecasts disappearing every day or so, since they use multicast for discovery. I have a GL-iNet 6416A (Atheros AR9330 rev 1) with about 13 wireless clients. I maintain 2 other identical routers for friends, with 2 or 3 wireless clients each, which don't seem to suffer from the same problem. Edit (2016-12-06): This hasn't happened since November 21. |
mkresin: Michael, would you please test again? According to Neil's edit, this should have been fixed in the meantime. |
mamarley: I just tested again with the latest LEDE nightly. It took about a day this time, but it still stopped passing multicasted RAs over wireless, causing IPv6 to stop working. |
mkresin: And the RAs are still broadcasted via wired ethernet? Just to make sure that you don't see a different bug now. |
mamarley: They are indeed. I can even see the RAs multicasted in response to the RSs I am sending from the wireless PC on the wired PCs, I just can't see any of the RAs from the wireless PCs. They are still disappearing somewhere inside the wireless subsystem on the WAP because I can see them (with tcpdump) coming in on the WAP's wired interface and leaving through its wireless interface, but they never arrive at the clients. And you still can't reproduce this with LEDE/OpenWRT and odhcpd sending the RAs because that program unicasts RAs in response to RSs, while radvd multicasts them. |
nbd: Please test the latest version |
neil: I just passed 100 hours uptime with 16 wifi clients (I had 13 when I last saw multicast stop working). |
mamarley: After turning multicast snooping back on (removing the "igmp_snooping" lines from the /etc/config/network file) (as recommended by Stijn Tintel on IRC), multicast continues to work correctly and IPv6 stays up for me as well. |
nbd: Multicast snooping is no longer enabled by default. Please test with option igmp_snooping 1. |
mamarley: Enabling multicast snooping still breaks it. |
T-X: Hi Michael, would it be possible for you to share those tcpdump's? A capture of all ICMPv6 packets both on the AP interface and the client side for about five minutes would be great. |
mamarley: Sure, but I won't be able to get that until next weekend, since the WAP in question is at my parents' house and I don't have time to both reproduce the problem and capture the data this weekend. |
T-X: Thanks Michael, looking forward to it :-). Two more questions popped into my mind: Does that network somehow involve multiple routers? (I noticed that Felix's patch disabled both multicast_snooping and multicast_querier. If only reenabling igmp_snooping (= multicast snooping of the bridge) changes something for you then this could mean that there is another IGMP/MLD Querier somewhere, maybe another OpenWRT router or an enterprise switch?) Second question: While a specific client stays connected, are the issues temporary (like 2-5min.) or permanent (> 10min.) for that client? |
mamarley: Thank you for investigating the issue. The network has one router (an x86_64 box running pfSense 2.3.2) with the WAP (a Ubiquiti UAP-LR) directly connected to one of its Ethernet ports. The Ethernet connection does use multiple VLANs, if that matters, one for management and three others that become three SSIDs on the WAP. When the multicast issues start, it seems that all multicast packets fail to be transmitted to all clients permanently with the only fix being to reboot the WAP (or do "/etc/init.d/network restart"). Occasionally one will make it through to a single client, but there is no pattern to this behavior. |
T-X: One more thing which popped into my mind, after you mentioned multiple SSIDs: For the client devices having this issue, are more than one SSID entered in the network manager of those devices? Could it be that the issue occurs after a client device roams from one SSID to another? There are some known issues involving roaming devices and snooping switches inside the Linux IPv6 code. See the patch I just submitted to the Linux netdev mailinglist for details: https://patchwork.ozlabs.org/patch/723428/ (However these issues should(tm) be temporary, things should recover with the next MLD query and you mentioned it does not recover even after a couple of minutes. Nevertheless, we should probably check whether your issue somehow involves roaming.) |
mamarley: Nope, none of the devices are configured with more than one of the SSIDs. |
mamarley: This seems to be back in recent trunk builds (for a week or so?). It looks like multicast snooping got disabled by default? |
T-X: Hi Michael, thanks for your update! Yes, it got disabled in LEDE/netifd. So you are still having this issue, which seems to indicate that it is actually not a bug in the bridge multicast snooping code? This is a very interesting issue. Were you able to narrow it down any further? (I've also read somewhere that pfSense has some IGMP proxy code or something like that, which you might be able to disable. Maybe that box has something to do with it?) |
mamarley: What happened for me is that originally (before netifd disabled snooping), with snooping manually disabled, multicast stopped working after a bit. When I removed the option from the /etc/config/network file, implicitly re-enabling snooping, multicast started working properly again. When netifd disabled snooping by default, multicast stopped working again. I then explicitly re-enabled snooping in /etc/config/network file, which caused multicast to start working again. pfSense's configuration didn't have anything to do with it, as I was able to figure out using tcpdump that the multicast packets (IPv6 RAs) were leaving through the wireless interface on the WAP but never arriving on the client. Also, I recently got fed up with pfSense and switched my main router back to LEDE (it's good to be back!). |
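For readers following along, the explicit re-enable described in the comment above can be sketched with UCI commands instead of hand-editing /etc/config/network. This is only a sketch under assumptions: the interface section name "lan" is hypothetical (the WAP in this report bridges three VLAN-backed SSIDs, so its section names will differ), and the option name is the "igmp_snooping" option mentioned earlier in this thread.

```shell
# Assumption: the bridge is defined in an interface section named "lan";
# substitute the section name(s) from your own /etc/config/network.
uci set network.lan.igmp_snooping='1'

# Write the change to /etc/config/network, then restart networking so
# the bridge is re-created with the new setting.
uci commit network
/etc/init.d/network restart
```

Setting the value to '0' instead should reproduce the "snooping disabled" configuration that this report associates with multicast eventually failing.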
T-X: Ah, okay, then I completely misunderstood you! For the broken configuration (so with the igmp_snooping line removed now?), could you paste your:
So to avoid misunderstandings again. EDIT: Added request for hairpin_mode setting. |
T-X: Any news regarding these settings and configuration files? |
mamarley: Oh, sorry, I had forgotten. This weekend when I go home I will grab the configuration from my parents' WAP. |
JurgendW: I noticed a similar problem in my setup running 17.01.1: mDNS discovery stops working from wireless clients. Running "tcpdump -i any port 5353" shows all the multicast messages being passed around, but they just do not reach the wireless clients. Running "service network restart" resolves the problem. I have now explicitly enabled igmp_snooping, as it was turned off (I checked /sys/class/net/br-lan/bridge_multicast_snooping and it was set to 0 by default). It appears to be working now. I will keep monitoring to see if it breaks. |
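The sysfs check mentioned above can be wrapped in a small helper, which may be handy when comparing several bridges. This is a minimal sketch, not anything taken from LEDE itself: the function name snooping_state is made up for illustration, the default bridge name br-lan is an assumption, and the attribute path used here (bridge/multicast_snooping under the bridge device) is where the Linux bridge driver normally exposes it, which differs slightly from the path as written in the comment.

```shell
#!/bin/sh
# Report whether multicast snooping is enabled on a Linux bridge.
#   $1: bridge name (default: br-lan, an assumption)
#   $2: sysfs root  (default: /sys; overridable so the function can be
#       exercised against a fake directory tree)
# Prints "enabled", "disabled", or "unknown"; returns non-zero on "unknown".
snooping_state() {
    bridge="${1:-br-lan}"
    root="${2:-/sys}"
    file="$root/class/net/$bridge/bridge/multicast_snooping"

    # Not a bridge, or the attribute is not readable.
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi

    case "$(cat "$file")" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

On the WAP this would be called as "snooping_state br-lan", once per bridge if several SSIDs are bridged separately.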
pg: Could anyone please report if this recent patch solved the multicast issue? Thanks. |
tofurky: edit: nevermind, the ssdp search target field of the android app i was testing with didn't match that of the service i was trying to discover. if i find a real issue with multicast/ssdp i'll reply then :) sorry. |
mamarley:
I am having a problem on a UBNT UAP-LR where, if many clients stay connected to the router for a long time, multicast packets will stop being reliably sent to the wireless clients. I can reproduce this with LEDE r1953, but it goes back at least as far as OpenWRT 15.05.1 with kernel 3.18. I cannot, however, reproduce it on another network with the same WAP but with only one or two wireless clients.
This originally manifested as IPv6 ceasing to work since the (multicasted) RA packets were not reaching the wireless clients. I used tcpdump on the WAP and determined that the RAs were arriving through the wired interface and leaving through the wireless interface as they should, but tcpdump on the client indicates that a vast majority of the RAs are never received. (Please note that the IPv6 issue is not reproducible with the router running OpenWRT/LEDE and odhcpd, since odhcpd unicasts RAs sent as a response to an RS instead of multicasting them, while it is reproducible when the router is running radvd, for example as pfSense does.)