OpenWrt/LEDE Project

Attached to Project: OpenWrt/LEDE Project
Opened by Michael Marley - 27.10.2016

FS#253 - Multicast over wireless ceases to work properly after a while on ath9k with many clients connected

I am having a problem on a UBNT UAP-LR where, if many clients stay connected to the router for a long time, multicast packets stop being reliably sent to the wireless clients. I can reproduce this with LEDE r1953, but it goes back at least as far as OpenWRT 15.05.1 with kernel 3.18. I cannot, however, reproduce it on another network with the same WAP but with only one or two wireless clients.

This originally manifested as IPv6 ceasing to work, since the (multicast) RA packets were not reaching the wireless clients. I used tcpdump on the WAP and determined that the RAs were arriving through the wired interface and leaving through the wireless interface as they should, but tcpdump on the client indicated that the vast majority of the RAs were never received. (Please note that the IPv6 issue is not reproducible with the router running OpenWRT/LEDE and odhcpd, since odhcpd unicasts RAs sent in response to an RS instead of multicasting them, while it is reproducible when the router is running radvd, as pfSense does, for example.)

Project Manager
Felix Fietkau commented on 03.11.2016 10:19

Please try the latest version

Michael Marley commented on 03.11.2016 10:20

I will, but I can only reproduce this issue on my parents' WAP because I don't have enough devices and I don't want to flash their WAP remotely, so I won't be able to try until this weekend.

Michael Marley commented on 04.11.2016 23:45

I have tried the latest version and it is definitely still a problem.

Michael Marley commented on 06.11.2016 13:02

In fact, it seems to be worse now. Multicast (and therefore IPv6) has stopped working within an hour of each time I reboot the WAP since I updated. Previously (before the wireless-testing update) it would work for about a day or so before it stopped.

Neil Crawforth commented on 08.11.2016 12:10

I suspect this bug is responsible for my Chromecasts disappearing every day or so, since they use multicast for discovery.

I have a GL-iNet 6416A (Atheros AR9330 rev 1) with about 13 wireless clients. I maintain 2 other identical routers for friends, with 2 or 3 wireless clients each, which don't seem to suffer from the same problem.

Edit (2016-12-06): This hasn't happened since November 21.

Project Manager
Mathias Kresin commented on 22.12.2016 08:06

Michael, would you please test again? According to Neil's edit, this one should have been fixed in the meantime.

Michael Marley commented on 23.12.2016 11:04

I just tested again with the latest LEDE nightly. It took about a day this time, but it still stopped passing multicasted RAs over wireless, causing IPv6 to stop working.

Project Manager
Mathias Kresin commented on 23.12.2016 11:15

And are the RAs still broadcast via wired Ethernet? Just to make sure that you don't see a different bug now.

Michael Marley commented on 23.12.2016 11:22

They are indeed. On the wired PCs, I can even see the RAs multicast in response to the RSs I am sending from the wireless PC; I just can't see any of the RAs on the wireless PCs. They are still disappearing somewhere inside the wireless subsystem on the WAP, because I can see them (with tcpdump) coming in on the WAP's wired interface and leaving through its wireless interface, but they never arrive at the clients.

And you still can't reproduce this with LEDE/OpenWRT and odhcpd sending the RAs because that program unicasts RAs in response to RSs, while radvd multicasts them.

Project Manager
Felix Fietkau commented on 20.01.2017 10:44

Please test the latest version

Neil Crawforth commented on 28.01.2017 18:04

I just passed 100 hours uptime with 16 wifi clients (I had 13 when I last saw multicast stop working).

Michael Marley commented on 28.01.2017 18:54

After turning multicast snooping back on by removing the "igmp_snooping" lines from the /etc/config/network file (as recommended by Stijn Tintel on IRC), multicast continues to work correctly and IPv6 stays up for me as well.

Project Manager
Felix Fietkau commented on 28.01.2017 18:57

Multicast snooping is no longer enabled by default. Please test with option igmp_snooping 1.
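
For readers following along, one way to apply the setting mentioned above is through the uci command-line tool rather than editing /etc/config/network by hand. This is only a sketch: the helper name set_igmp_snooping is made up for illustration, and "lan" is an assumed section name, so substitute the name of each bridge interface section in your own configuration.

```shell
# Hypothetical helper: set igmp_snooping on one bridge interface section
# of /etc/config/network and commit the change.
# $1: UCI section name (assumed to be "lan" in the example below)
# $2: 1 to enable multicast snooping on the bridge, 0 to disable it
set_igmp_snooping() {
    section=$1
    value=$2
    uci set "network.${section}.igmp_snooping=${value}" &&
        uci commit network
}

# Example, to be run on the device itself (not executed here):
#   set_igmp_snooping lan 1 && /etc/init.d/network restart
```

Setting the value through uci should be equivalent to adding "option igmp_snooping 1" to that interface section manually; the network restart is what makes netifd apply it.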

Michael Marley commented on 28.01.2017 18:58

Enabling multicast snooping still breaks it.

Linus Lüssing commented on 29.01.2017 05:56

Hi Michael, would it be possible for you to share those tcpdump captures? A capture of all ICMPv6 packets both on the AP interface and the client side for about five minutes would be great.
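
A capture like the one requested here could be taken roughly as follows. This is a sketch and not something posted in the report: capture_icmp6 is a hypothetical helper, and the interface name and output path in the example comment are assumptions. The pcap filter "icmp6" matches ICMPv6 traffic such as RAs and RSs.

```shell
# Hypothetical helper: record ICMPv6 packets on one interface for a fixed
# number of seconds and write them to a pcap file.
# $1: interface name, $2: duration in seconds, $3: output file
capture_icmp6() {
    iface=$1
    secs=$2
    out=$3
    # Start the capture in the background and remember its PID.
    tcpdump -i "$iface" -w "$out" icmp6 &
    pid=$!
    # Let it run for the requested time, then stop it. tcpdump closes the
    # output file cleanly when it receives SIGTERM.
    sleep "$secs"
    kill "$pid" 2>/dev/null || true
    wait "$pid" 2>/dev/null || true
}

# Example for a five-minute capture on the AP (assumed names, not executed here):
#   capture_icmp6 wlan0 300 /tmp/ap-icmp6.pcap
```

Running the same helper on a wireless client at the same time would give the two captures that can then be compared side by side.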

Michael Marley commented on 29.01.2017 11:48

Sure, but I won't be able to get that until next weekend, since the WAP in question is at my parents' house and I don't have time to both reproduce the problem and capture the data this weekend.

Linus Lüssing commented on 31.01.2017 02:45

Thanks Michael, looking forward to it :-).

Two more questions popped into my mind: Does that network somehow involve multiple routers?

(I noticed that Felix's patch disabled both multicast_snooping and multicast_querier. If only reenabling igmp_snooping (= multicast snooping of the bridge) changes something for you then this could mean that there is another IGMP/MLD Querier somewhere, maybe another OpenWRT router or an enterprise switch?)

Second question: While a specific client stays connected, are the issues temporary (like 2-5min.) or permanent (> 10min.) for that client?

Michael Marley commented on 31.01.2017 03:38

Thank you for investigating the issue.

The network has one router (an x86_64 box running pfSense 2.3.2) with the WAP (a Ubiquiti UAP-LR) directly connected to one of its Ethernet ports. The Ethernet connection does use multiple VLANs, if that matters, one for management and three others that become three SSIDs on the WAP.

When the multicast issues start, it seems that all multicast packets fail to be transmitted to all clients permanently with the only fix being to reboot the WAP (or do "/etc/init.d/network restart"). Occasionally one will make it through to a single client, but there is no pattern to this behavior.

Linus Lüssing commented on 03.02.2017 07:24

One more thing which popped into my mind, after you mentioned multiple SSIDs:

For the client devices having this issue, are more than one SSID entered in the network manager of those devices? Could it be that the issue occurs after a client device roams from one SSID to another?

There are some known issues involving roaming devices and snooping switches inside the Linux IPv6 code. See the patch I just submitted to the Linux netdev mailing list for details: https://patchwork.ozlabs.org/patch/723428/

(However these issues should™ be temporary, things should recover with the next MLD query and you mentioned it does not recover even after a couple of minutes. Nevertheless, we should probably check whether your issue somehow involves roaming.)

Michael Marley commented on 03.02.2017 11:02

Nope, none of the devices are configured with more than one of the SSIDs.

Michael Marley commented on 20.02.2017 01:23

This seems to be back in recent trunk builds (for a week or so?). It looks like multicast snooping got disabled by default?

Linus Lüssing commented on 13.03.2017 15:05

Hi Michael, thanks for your update!

Yes, it got disabled in LEDE/netifd. So you are still having this issue, which seems to indicate that it is actually not a bug in the bridge multicast snooping code?

This is a very interesting issue. Were you able to narrow it down any further?

(I've also read somewhere that pfSense too has some IGMP proxy code or something like that, which you might be able to disable. Maybe that box might have something to do with it?)

Michael Marley commented on 13.03.2017 17:09

What happened for me is that originally (before netifd disabled snooping), with snooping manually disabled, multicast stopped working after a while. When I removed the option from the /etc/config/network file, implicitly re-enabling snooping, multicast started working properly again. When netifd disabled snooping by default, multicast stopped working again. I then explicitly re-enabled snooping in the /etc/config/network file, which caused multicast to start working again.

pfSense's configuration didn't have anything to do with it, as I was able to figure out using tcpdump that the multicast packets (IPv6 RAs) were leaving through the wireless interface on the WAP but never arriving on the client. Also, I recently got fed up with pfSense and switched my main router back to LEDE (it's good to be back!).

Linus Lüssing commented on 13.03.2017 19:21

Ah, okay, then I completely misunderstood you!

For the broken configuration (so with the igmp_snooping line removed now?), could you paste your:

  • /etc/config/network
  • /etc/config/wireless
  • /etc/config/firewall
  • /tmp/run/hostapd-phyXXX.conf
  • /sys/class/net/<your-bridge-dev>/bridge/multicast_snooping
  • /sys/class/net/<your-bridge-dev>/bridge/multicast_querier
  • /sys/class/net/<your-ap-dev>/brport/multicast_to_unicast
  • /sys/class/net/<your-ap-dev>/brport/hairpin_mode

Just to avoid any further misunderstandings.

EDIT: Added request for hairpin_mode setting.
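
The four sysfs values in the list above could be collected in one go with something like the following. It is a sketch only: show_setting and show_mcast_settings are hypothetical helper names, and br-lan and wlan0 are assumed device names that need to be replaced with the actual bridge and AP devices. The optional third argument exists just so the helper can be pointed at a different directory for testing.

```shell
# Print one setting as "path = value", or note that the file is not readable
# (for example because the device name is wrong or the kernel lacks the knob).
show_setting() {
    if [ -r "$1" ]; then
        echo "$1 = $(cat "$1")"
    else
        echo "$1 (not readable)"
    fi
}

# $1: bridge device, $2: AP (wireless) device,
# $3: sysfs root, defaults to /sys/class/net
show_mcast_settings() {
    bridge=$1
    ap=$2
    root=${3:-/sys/class/net}
    show_setting "$root/$bridge/bridge/multicast_snooping"
    show_setting "$root/$bridge/bridge/multicast_querier"
    show_setting "$root/$ap/brport/multicast_to_unicast"
    show_setting "$root/$ap/brport/hairpin_mode"
}

# br-lan and wlan0 are assumed names; substitute your own devices.
show_mcast_settings br-lan wlan0
```

With several bridges and SSIDs, as in the setup described in this report, the last line would be repeated once per bridge and AP device pair.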

Linus Lüssing commented on 10.04.2017 08:02

Any news regarding these settings and configuration files?

Michael Marley commented on 10.04.2017 10:03

Oh, sorry, I had forgotten. This weekend when I go home I will grab the configuration from my parents' WAP.

JurgendW commented on 02.05.2017 13:38

I noticed a similar problem in my setup running 17.01.1: mDNS discovery stops working for wireless clients.

I ran tcpdump -i any port 5353 to observe all mDNS multicast traffic, and it shows all the messages being passed around. They just do not reach the wireless clients. Doing service network restart resolves the problem.

I have now explicitly enabled igmp_snooping, as it was turned off (I checked /sys/class/net/br-lan/bridge/multicast_snooping and it was set to 0 by default). It appears to be working now. I will keep monitoring to see if it breaks.

tofurky commented on 13.11.2017 01:41

edit: nevermind, the ssdp search target field of the android app i was testing with didn't match that of the service i was trying to discover. if i find a *real* issue with multicast/ssdp i'll reply then :) sorry.
