FS#253 - Multicast over wireless ceases to work properly after a while on ath9k with many clients connected #5382

Open
openwrt-bot opened this issue Oct 27, 2016 · 29 comments

@openwrt-bot

mamarley:

I am having a problem on a UBNT UAP-LR where, if many clients stay connected to the router for a long time, multicast packets stop being reliably sent to the wireless clients. I can reproduce this with LEDE r1953, but it goes back at least as far as OpenWRT 15.05.1 with kernel 3.18. I cannot, however, reproduce it on another network with the same WAP but with only one or two wireless clients.

This originally manifested as IPv6 ceasing to work since the (multicasted) RA packets were not reaching the wireless clients. I used tcpdump on the WAP and determined that the RAs were arriving through the wired interface and leaving through the wireless interface as they should, but tcpdump on the client indicates that a vast majority of the RAs are never received. (Please note that the IPv6 issue is not reproducible with the router running OpenWRT/LEDE and odhcpd, since odhcpd unicasts RAs sent as a response to an RS instead of multicasting them, while it is reproducible when the router is running radvd, for example as pfSense does.)
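For anyone trying to repeat the comparison described above, here is a minimal sketch of the capture. It assumes tcpdump is installed on both the WAP and the client. The function names and the interface name in the usage note are hypothetical, not taken from the report. The filter matches ICMPv6 type 134 (Router Advertisement) by reading the first byte after the fixed 40-byte IPv6 header, so it would miss RAs that carry extension headers.

```shell
#!/bin/sh
# Print a pcap filter matching ICMPv6 Router Advertisements (type 134).
# ip6[40] is the first byte after the fixed 40-byte IPv6 header, which is
# the ICMPv6 type as long as no extension headers are present.
ra_filter() {
    echo 'icmp6 and ip6[40] == 134'
}

# Show RAs live on one interface. "$1" is the interface name.
watch_ras() {
    tcpdump -n -i "$1" "$(ra_filter)"
}
```

Running `watch_ras wlan0` (placeholder interface name) on the WAP's wireless interface and on a client at the same time should show whether RAs that leave the WAP actually reach the client.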

@openwrt-bot

nbd:

Please try the latest version

@openwrt-bot

mamarley:

I will, but I can only reproduce this issue on my parents' WAP because I don't have enough devices and I don't want to flash their WAP remotely, so I won't be able to try until this weekend.

@openwrt-bot

mamarley:

I have tried the latest version and it is definitely still a problem.

@openwrt-bot

mamarley:

In fact, it seems to be worse now. Multicast (and therefore IPv6) has stopped working within an hour of each time I reboot the WAP since I updated. Previously (before the wireless-testing update) it would work for about a day or so before it stopped.

@openwrt-bot

neil:

I suspect this bug is responsible for my Chromecasts disappearing every day or so, since they use multicast for discovery.

I have a GL-iNet 6416A (Atheros AR9330 rev 1) with about 13 wireless clients. I maintain 2 other identical routers for friends, with 2 or 3 wireless clients each, which don't seem to suffer from the same problem.

Edit (2016-12-06): This hasn't happened since November 21.

@openwrt-bot

mkresin:

Michael, would you please test again? According to Neil's edit, this should have been fixed in the meantime.

@openwrt-bot

mamarley:

I just tested again with the latest LEDE nightly. It took about a day this time, but it still stopped passing multicasted RAs over wireless, causing IPv6 to stop working.

@openwrt-bot

mkresin:

And the RAs are still broadcasted via wired ethernet? Just to make sure that you don't see a different bug now.

@openwrt-bot

mamarley:

They are indeed. On the wired PCs I can even see the RAs multicasted in response to the RSs I am sending from the wireless PC; I just can't see any of the RAs on the wireless PCs. They are still disappearing somewhere inside the wireless subsystem on the WAP, because I can see them (with tcpdump) coming in on the WAP's wired interface and leaving through its wireless interface, but they never arrive at the clients.

And you still can't reproduce this with LEDE/OpenWRT and odhcpd sending the RAs because that program unicasts RAs in response to RSs, while radvd multicasts them.

@openwrt-bot

nbd:

Please test the latest version

@openwrt-bot

neil:

I just passed 100 hours uptime with 16 wifi clients (I had 13 when I last saw multicast stop working).

@openwrt-bot

mamarley:

After turning multicast snooping back on by removing the "igmp_snooping" lines from the /etc/config/network file (as recommended by Stijn Tintel on IRC), multicast continues to work correctly and IPv6 stays up for me as well.

@openwrt-bot

nbd:

Multicast snooping is no longer enabled by default. Please test with option igmp_snooping 1.
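As a sketch of what that test could look like, the option can be set with the uci command-line tool. The helper name is illustrative only, and the section name `lan` in the usage note is an assumption. Use whichever bridge interface section in /etc/config/network carries the wireless network.

```shell
#!/bin/sh
# Set igmp_snooping on a bridge interface section of /etc/config/network.
# "$1" is the section name (assumed to be "lan" in the usage note),
# "$2" is 1 to enable snooping or 0 to disable it.
set_igmp_snooping() {
    uci set "network.$1.igmp_snooping=$2" &&
    uci commit network
}
```

After `set_igmp_snooping lan 1`, the change takes effect once the network is restarted with "/etc/init.d/network restart". The section in /etc/config/network should then contain the line `option igmp_snooping '1'`.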

@openwrt-bot

mamarley:

Enabling multicast snooping still breaks it.

@openwrt-bot

T-X:

Hi Michael, would it be possible for you to share those tcpdumps? A capture of all ICMPv6 packets both on the AP interface and the client side for about five minutes would be great.
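A capture like the one requested could be saved to a file roughly as follows. This is only a sketch: the function name, interface names and output paths are placeholders, and it assumes tcpdump is available on both the AP and the client.

```shell
#!/bin/sh
# Save all ICMPv6 traffic seen on interface "$1" to the pcap file "$2".
# tcpdump keeps running until interrupted, so stop it with Ctrl-C after
# roughly five minutes.
capture_icmp6() {
    tcpdump -n -i "$1" -w "$2" icmp6
}
```

For example, `capture_icmp6 wlan0 /tmp/ap-icmp6.pcap` on the AP and the same call with the client's interface on the client, started at about the same time, would give two files that can be compared side by side.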

@openwrt-bot

mamarley:

Sure, but I won't be able to get that until next weekend, since the WAP in question is at my parents' house and I don't have time to both reproduce the problem and capture the data this weekend.

@openwrt-bot

T-X:

Thanks Michael, looking forward to it :-).

Two more questions popped into my mind: Does that network somehow involve multiple routers?

(I noticed that Felix's patch disabled both multicast_snooping and multicast_querier. If only reenabling igmp_snooping (= multicast snooping of the bridge) changes something for you then this could mean that there is another IGMP/MLD Querier somewhere, maybe another OpenWRT router or an enterprise switch?)

Second question: While a specific client stays connected, are the issues temporary (like 2-5min.) or permanent (> 10min.) for that client?

@openwrt-bot

mamarley:

Thank you for investigating the issue.

The network has one router (an x86_64 box running pfSense 2.3.2) with the WAP (a Ubiquiti UAP-LR) directly connected to one of its Ethernet ports. The Ethernet connection does use multiple VLANs, if that matters, one for management and three others that become three SSIDs on the WAP.

When the multicast issues start, it seems that all multicast packets fail to be transmitted to all clients permanently with the only fix being to reboot the WAP (or do "/etc/init.d/network restart"). Occasionally one will make it through to a single client, but there is no pattern to this behavior.

@openwrt-bot

T-X:

One more thing which popped into my mind, after you mentioned multiple SSIDs:

For the client devices having this issue, is more than one SSID entered in the network manager of those devices? Could it be that the issue occurs after a client device roams from one SSID to another?

There are some known issues involving roaming devices and snooping switches inside the Linux IPv6 code. See the patch I just submitted to the Linux netdev mailinglist for details: https://patchwork.ozlabs.org/patch/723428/

(However these issues should(tm) be temporary, things should recover with the next MLD query and you mentioned it does not recover even after a couple of minutes. Nevertheless, we should probably check whether your issue somehow involves roaming.)

@openwrt-bot

mamarley:

Nope, none of the devices are configured with more than one of the SSIDs.

@openwrt-bot

mamarley:

This seems to be back in recent trunk builds (for a week or so?). It looks like multicast snooping got disabled by default?

@openwrt-bot

T-X:

Hi Michael, thanks for your update!

Yes, it got disabled in LEDE/netifd. So you are still having this issue, which seems to indicate that it is actually not a bug in the bridge multicast snooping code?

This is a very interesting issue. Were you able to narrow it down any further?

(I've also read somewhere that pfSense too has some IGMP proxy code or something like that, which you might be able to disable. Maybe that box might have something to do with it?)

@openwrt-bot

mamarley:

What happened for me is that originally (before netifd disabled snooping), with snooping manually disabled, multicast stopped working after a bit. When I removed the option from the /etc/config/network file, implicitly re-enabling snooping, multicast started working properly again. When netifd disabled snooping by default, multicast stopped working again. I then explicitly re-enabled snooping in the /etc/config/network file, which caused multicast to start working again.

pfSense's configuration didn't have anything to do with it, as I was able to figure out using tcpdump that the multicast packets (IPv6 RAs) were leaving through the wireless interface on the WAP but never arriving on the client. Also, I recently got fed up with pfSense and switched my main router back to LEDE (it's good to be back!).

@openwrt-bot

T-X:

Ah, okay, then I completely misunderstood you!

For the broken configuration (so with the igmp_snooping line removed now?), could you paste your:

  • /etc/config/network
  • /etc/config/wireless
  • /etc/config/firewall
  • /tmp/run/hostapd-phyXXX.conf
  • /sys/class/net//bridge/multicast_snooping
  • /sys/class/net//bridge/multicast_querier
  • /sys/class/net//brport/multicast_to_unicast
  • /sys/class/net//brport/hairpin_mode

That should help avoid further misunderstandings.

EDIT: Added request for hairpin_mode setting.
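The sysfs values in the list above could be gathered in one go with something like the following sketch. The function name is hypothetical. It globs over every interface under /sys/class/net rather than naming one, and it silently skips attributes that a given kernel does not expose.

```shell
#!/bin/sh
# Print every bridge and bridge-port multicast setting found under a sysfs
# net directory. "$1" is that directory; it defaults to /sys/class/net.
dump_mcast_settings() {
    root=${1:-/sys/class/net}
    for f in \
        "$root"/*/bridge/multicast_snooping \
        "$root"/*/bridge/multicast_querier \
        "$root"/*/brport/multicast_to_unicast \
        "$root"/*/brport/hairpin_mode
    do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$f" ] || continue
        printf '%s = %s\n' "$f" "$(cat "$f")"
    done
}
```

Calling `dump_mcast_settings` with no argument on the WAP prints one "path = value" line per setting, which can be pasted alongside the configuration files.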

@openwrt-bot

T-X:

Any news regarding these settings and configuration files?

@openwrt-bot

mamarley:

Oh, sorry, I had forgotten. This weekend when I go home I will grab the configuration from my parents' WAP.

@openwrt-bot

JurgendW:

I noticed a similar problem in my setup running 17.01.1: mDNS discovery stops working from wireless clients.

I ran "tcpdump -i any port 5353" to observe all the mDNS multicast traffic, and you can see all the messages being passed around. They just do not reach the wireless clients. Running "service network restart" resolves the problem.

I have now explicitly enabled igmp_snooping, as it was turned off (I checked /sys/class/net/br-lan/bridge/multicast_snooping and it was set to 0 by default). It appears to be working now. I will keep monitoring to see if it breaks.

@openwrt-bot

tofurky:

edit: nevermind, the ssdp search target field of the android app i was testing with didn't match that of the service i was trying to discover. if i find a real issue with multicast/ssdp i'll reply then :) sorry.
