OpenWrt/LEDE Project

  • Status New
  • Percent Complete 0%
  • Task Type Bug Report
  • Category Kernel
  • Assigned To No-one
  • Operating System All
  • Severity High
  • Priority Medium
  • Reported Version All
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Daniel Golle - 10.03.2019
Last edited by Daniel Golle - 10.03.2019

FS#2176 - ubiquiti loco xw (AR9342 Rev.2) stops receiving on wireless

This bug has existed for as long as the Ubiquiti loco XW 5 GHz hardware has been around; we are probably dealing with a hardware bug in the AR9342 Rev. 2 chip. It is somewhat hard to reproduce on demand, but it hits us reliably every couple of hours, sometimes days, running any version of OpenWrt up to today's master branch. It just happened again, and this time I decided to check whether the bug is actually listed on FS, in addition to creating the usual cron job executing `iw dev wlan0 scan` every minute. There is even a watchdog in the community libremesh repository designed to catch exactly this bug:
https://github.com/libremesh/lime-packages/blob/master/packages/cotonete/Makefile#L21

To reproduce it, we have two Ubiquiti NanoBeam M5 devices running OpenWrt ar71xx/generic loco-m-xw, pointing at each other over a distance of roughly 2 km. The link is acceptable but not perfect, and slightly asymmetric.

Device A (worse RX SNR):

Station f0:9f:c2:xx:xx:7a (on wlan0-mesh)
	inactive time:	0 ms
	rx bytes:	1433997029
	rx packets:	8589913
	tx bytes:	23160025785
	tx packets:	15667682
	tx retries:	677924
	tx failed:	0
	rx drop misc:	30076
	signal:  	-76 [-79, -79] dBm
	signal avg:	-74 [-77, -77] dBm
	Toffset:	4887897799 us
	tx bitrate:	43.3 MBit/s MCS 10 short GI
	rx bitrate:	43.3 MBit/s MCS 4 short GI
	expected throughput:	24.536Mbps
	mesh llid:	0
	mesh plid:	0
	mesh plink:	ESTAB
	mesh local PS mode:	ACTIVE
	mesh peer PS mode:	ACTIVE
	mesh non-peer PS mode:	ACTIVE
	authorized:	yes
	authenticated:	yes
	associated:	yes
	preamble:	long
	WMM/WME:	yes
	MFP:		yes
	TDLS peer:	no
	DTIM period:	2
	beacon interval:100
	connected time:	18117 seconds

Device B (better RX SNR):

Station fc:ec:da:xx:xx:8c (on wlan0-mesh)
	inactive time:	20 ms
	rx bytes:	23442553203
	rx packets:	16047526
	tx bytes:	1274688573
	tx packets:	8251479
	tx retries:	1337700
	tx failed:	2932
	rx drop misc:	45195
	signal:  	-72 [-79, -73] dBm
	signal avg:	-71 [-78, -73] dBm
	Toffset:	18446744068821653812 us
	tx bitrate:	57.8 MBit/s MCS 11 short GI
	rx bitrate:	43.3 MBit/s MCS 10 short GI
	last ack signal:24 dBm
	expected throughput:	24.536Mbps
	mesh llid:	0
	mesh plid:	0
	mesh plink:	ESTAB
	mesh local PS mode:	ACTIVE
	mesh peer PS mode:	ACTIVE
	mesh non-peer PS mode:	ACTIVE
	authorized:	yes
	authenticated:	yes
	associated:	yes
	preamble:	long
	WMM/WME:	yes
	MFP:		yes
	TDLS peer:	no
	DTIM period:	2
	beacon interval:100
	connected time:	18179 seconds
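
The asymmetry between the two dumps can be made explicit with a little arithmetic. The following is a minimal Python sketch, not part of the original report: it parses the relevant fields from the two station dumps above and computes the TX retry ratio for each side. It also reinterprets the `Toffset` value as a signed 64-bit number; that reinterpretation is an assumption (the very large value on device B looks like a negative offset printed as unsigned), not something confirmed here. The helper names `parse_station`, `retry_ratio` and `signed_toffset_us` are made up for illustration.

```python
import re

U64 = 1 << 64

# Excerpts of the two station dumps above (only the fields used here).
DEVICE_A = """Station f0:9f:c2:xx:xx:7a (on wlan0-mesh)
	tx packets:	15667682
	tx retries:	677924
	Toffset:	4887897799 us
"""
DEVICE_B = """Station fc:ec:da:xx:xx:8c (on wlan0-mesh)
	tx packets:	8251479
	tx retries:	1337700
	Toffset:	18446744068821653812 us
"""

def parse_station(text):
    """Return {field name: raw value string} for one station entry."""
    fields = {}
    for line in text.splitlines():
        # Skip the "Station <mac> (on <iface>)" header and non-field lines.
        if line.startswith("Station") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

def first_int(value):
    """Extract the leading integer from a value such as '4887897799 us'."""
    match = re.match(r"-?\d+", value)
    return int(match.group()) if match else None

def retry_ratio(fields):
    """TX retries divided by TX packets."""
    return first_int(fields["tx retries"]) / first_int(fields["tx packets"])

def signed_toffset_us(fields):
    """Toffset read as a 64-bit two's-complement value (assumption)."""
    raw = first_int(fields["Toffset"])
    return raw - U64 if raw >= U64 // 2 else raw

for name, dump in (("A", DEVICE_A), ("B", DEVICE_B)):
    f = parse_station(dump)
    print(f"device {name}: retry ratio {retry_ratio(f):.1%}, "
          f"Toffset {signed_toffset_us(f)} us")
```

With the numbers from the report this gives a retry ratio of about 4.3% on device A and about 16.2% on device B, which fits device A being the side with the worse RX SNR. Under the signed reading, device B's `Toffset` is -4887897804 us, nearly the exact negative of device A's value.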

Now it so happens that device A (and always only device A!) becomes deaf after some hours of mostly sending lots of traffic to device B. It continues to send beacons but loses all associations. Device B keeps trying to set up a link but keeps ending up in the ‘BLOCKED’ state, and dumps taken with a monitor-mode interface show that device A simply doesn’t react at all to any of the frames sent by device B. A simple `iw dev wlan0 scan` on device A (which doesn’t return any results) fixes the problem.
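
The symptom and the workaround described above suggest a simple watchdog rule: if a node that is expected to have a mesh peer shows no established peer link, trigger a scan. The following Python sketch illustrates only that rule. It is not the logic of the libremesh watchdog, and the names `established_peers`, `needs_scan` and `check_once` are hypothetical. Python is usually not installed on these devices, so on the router itself the same check would more likely be a small shell script run from cron. The check relies on the `mesh plink` field, so it does not apply to Ad-Hoc mode as written.

```python
import subprocess

MESH_IFACE = "wlan0-mesh"  # interface name as shown in the station dumps
SCAN_IFACE = "wlan0"       # interface used for the scan in this report

def established_peers(station_dump):
    """Count stations whose mesh peer link is reported as ESTAB."""
    count = 0
    for line in station_dump.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "mesh plink" and value.strip() == "ESTAB":
            count += 1
    return count

def needs_scan(station_dump, expected_peers=1):
    """True if fewer peers are established than this node should have."""
    return established_peers(station_dump) < expected_peers

def check_once():
    """Run one watchdog pass; return True if a scan was triggered."""
    dump = subprocess.run(
        ["iw", "dev", MESH_IFACE, "station", "dump"],
        capture_output=True, text=True, check=False,
    ).stdout
    if not needs_scan(dump):
        return False
    # The scan results are irrelevant; triggering the scan is the workaround.
    subprocess.run(
        ["iw", "dev", SCAN_IFACE, "scan"],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False,
    )
    return True
```

Calling `check_once()` once a minute would scan only when the link looks dead, instead of scanning unconditionally every minute. The trade-off is that a node with no reachable peer for legitimate reasons would then scan on every pass.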

The channel seems rather unused otherwise, and signal quality varies only with weather conditions. Interestingly, this seems to happen on non-DFS channels only. It happens equally in Ad-Hoc mode (unencrypted; never tried encrypted) and 802.11s (open, i.e. set up via the `iw` tool, as well as with SAE, i.e. running `wpa_supplicant`). It doesn’t happen on all nodes, only on those with a rather bad signal or at least one far-off neighbor.

Maybe related to FS#1246

I saw this occurring on the ubnt NanoStation loco M5 XW as well as on all NanoBeam M5 variants (which are supposedly compatible with the loco-m-xw image).

ieee80211 phy0: Atheros AR9340 Rev:2 mem=0xb8100000, irq=47

WiFi EEPROM of the devices:

*
00001000  02 02 F0 9F C2 XX XX XX  00 30 3a 31 35 3a 36 64  |.....XXX.0:15:6d|
00001010  3a 64 64 3a 64 65 3a 61  64 00 00 00 00 00 1f 00  |:dd:de:ad.......|
00001020  33 01 00 00 00 00 04 00  00 00 2d 04 03 00 08 ff  |3.........-.....|
00001030  20 01 00 00 00 20 02 00  00 cc cc 0c 00 50 01 50  | .... .......P.P|
00001040  01 50 01 00 00 00 00 00  00 21 00 a4 00 00 00 00  |.P.......!......|
00001050  ff 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00001060  0e 0e 03 00 2c e2 00 02  0e 1c e0 e0 00 0c e0 e0  |....,...........|
00001070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00001080  00 00 00 00 00 00 00 00  00 00 70 89 ac 00 00 00  |..........p.....|
00001090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000010c0  00 00 00 70 ac 70 89 ac  70 89 ac 70 89 ac 22 22  |...p.p..p..p..""|
000010d0  22 22 22 22 22 22 22 22  20 1c 22 22 20 1c 22 22  |"""""""" ."" .""|
000010e0  20 1c 24 24 20 18 16 14  20 16 14 12 20 20 1c 14  | .$$ ... ...  ..|
000010f0  24 24 20 18 16 14 20 16  14 12 20 20 1c 14 24 24  |$$ ... ...  ..$$|
00001100  20 18 16 14 20 16 14 12  20 20 1c 14 22 22 1e 16  | ... ...  ..""..|
00001110  14 12 1e 14 12 10 20 20  1c 14 22 22 1e 16 14 12  |......  ..""....|
00001120  1e 14 12 10 20 20 1c 14  22 22 1e 16 14 12 1e 14  |....  ..""......|
00001130  12 10 20 20 1c 14 11 12  15 17 41 42 45 47 31 32  |..  ......ABEG12|
00001140  35 37 70 75 ac b8 70 75  ac b8 70 75 ac b8 70 75  |57pu..pu..pu..pu|
00001150  ac b8 70 75 ac b8 70 75  ac b8 70 75 ac b8 70 75  |..pu..pu..pu..pu|
*
00001170  ac b8 3c 7c 3c 7c 3c 7c  3c 7c 3c 7c 3c 7c 3c 7c  |..<|<|<|<|<|<|<||
00001180  3c 7c 3c 7c 3c 7c 3c 7c  3c 7c 3c 7c 3c 7c 3c 7c  |<|<|<|<|<|<|<|<||
*
000011a0  3c 7c 10 01 00 00 22 22  02 00 00 00 00 00 00 00  |<|....""........|
000011b0  00 00 00 00 00 00 44 00  00 00 00 00 00 ff 00 00  |......D.........|
000011c0  00 00 00 00 00 00 00 00  00 00 00 00 ff 0e 0e 03  |................|
000011d0  00 2d e2 00 02 0e 1c 00  00 00 00 00 00 00 00 00  |.-..............|
000011e0  00 00 00 00 00 00 00 00  00 44 44 00 00 00 00 00  |.........DD.....|
000011f0  00 00 00 00 00 00 00 4c  58 68 8c a4 b4 bd cd d9  |.......LXh......|
00001200  00 89 00 00 00 dc 00 89  00 00 00 e0 00 8a 00 00  |................|
00001210  00 e2 00 8b 00 00 00 de  00 8b 00 00 00 de 00 8b  |................|
00001220  00 00 00 dc 00 89 00 00  00 da 00 8b 00 00 00 e0  |................|
00001230  00 89 00 00 00 e4 00 8a  00 00 00 e7 00 8b 00 00  |................|
00001240  00 e6 00 8b 00 00 00 e2  00 8c 00 00 00 e1 00 8c  |................|
00001250  00 00 00 df 00 8b 00 00  00 dd 00 8b 00 00 00 00  |................|
00001260  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001280  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 4c  |...............L|
00001290  54 68 78 8c a0 b4 c5 4c  54 68 78 8c a0 b4 c5 4c  |Thx....LThx....L|
000012a0  54 68 78 8c a0 b4 c5 26  20 1e 1c 26 20 1e 1c 26  |Thx....& ..& ..&|
000012b0  20 1e 1c 26 20 1e 1c 26  20 1e 1c 26 20 1e 1c 26  | ..& ..& ..& ..&|
000012c0  20 1e 1c 26 20 1e 1c 26  22 20 1e 1c 1a 20 1e 1c  | ..& ..&" ... ..|
000012d0  1a 00 00 00 00 26 22 20  1e 1c 1a 20 1e 1c 1a 00  |.....&" ... ....|
000012e0  00 00 00 26 22 20 1e 1c  1a 20 1e 1c 1a 00 00 00  |...&" ... ......|
000012f0  00 26 22 20 1e 1c 1a 20  1e 1c 1a 00 00 00 00 26  |.&" ... .......&|
00001300  22 20 1e 1c 1a 20 1e 1c  1a 00 00 00 00 26 22 20  |" ... .......&" |
00001310  1e 1c 1a 20 1e 1c 1a 00  00 00 00 26 22 20 1e 1c  |... .......&" ..|
00001320  1a 20 1e 1c 1a 00 00 00  00 26 22 20 1e 1c 1a 20  |. .......&" ... |
00001330  1e 1c 1a 00 00 00 00 26  22 20 1e 1c 1a 20 1e 1c  |.......&" ... ..|
00001340  1a 00 00 00 00 26 22 20  1e 1c 1a 20 1e 1c 1a 00  |.....&" ... ....|
00001350  00 00 00 26 22 20 1e 1c  1a 20 1e 1c 1a 00 00 00  |...&" ... ......|
00001360  00 26 22 20 1e 1c 1a 20  1e 1c 1a 00 00 00 00 26  |.&" ... .......&|
00001370  22 20 1e 1c 1a 20 1e 1c  1a 00 00 00 00 26 22 20  |" ... .......&" |
00001380  1e 1c 1a 20 1e 1c 1a 00  00 00 00 26 22 20 1e 1c  |... .......&" ..|
00001390  1a 20 1e 1c 1a 00 00 00  00 26 22 20 1e 1c 1a 20  |. .......&" ... |
000013a0  1e 1c 1a 00 00 00 00 10  16 18 40 46 48 30 36 38  |..........@FH068|
000013b0  4c 54 68 78 8c a0 b9 cd  4c 54 68 78 8c a0 b9 cd  |LThx....LThx....|
*
000013f0  4c 54 68 78 8c a0 b9 cd  3c 7c 3c 7c 3c 7c 3c 7c  |LThx....<|<|<|<||
00001400  3c 7c 3c 7c 3c 7c 3c 7c  3c 7c 3c 7c 3c 7c 3c 7c  |<|<|<|<|<|<|<|<||
*
00001440  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
Project Manager
Daniel Golle commented on 10.03.2019 05:20

I'm speculating that bgscan may indirectly prevent it from occurring (and thereby may have hidden it from QA)

Lars commented on 03.05.2019 20:08

Does this issue occur on the master or the client side of the connection?

(I am just curious, since we are having a similar issue - but only on the master side)

psyborg commented on 05.06.2019 23:16

To rule out the ath9k DFS code, try building an image with the DFS flags disabled/removed and then run it on a DFS channel.

Project Manager
Koen Vandeputte commented on 07.10.2019 21:55

Is there any warning in dmesg when this occurs?
Does it still occur in the latest 19.07 or master?

Thanks
