OpenWrt/LEDE Project

  • Status Unconfirmed
  • Percent Complete 0%
  • Task Type Bug Report
  • Category Kernel
  • Assigned To No-one
  • Operating System All
  • Severity Critical
  • Priority Very Low
  • Reported Version openwrt-21.02
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by slick_diligence - 24.07.2021

FS#3947 - Wifi STA that loses AP signal takes down the whole router, sometimes rebooting it

Initially reported on forum: https://forum.openwrt.org/t/wifi-client-disconnecting-takes-the-whole-wifi-ap-down-on-21-02-snapshot-how-to-debug/102094/8

Device: Asus RT-N56U
Branch: openwrt-21.02, initial report commit 60fad8f (v21.02.0-rc3-74-g60fad8f82b)

Observation: when a wifi client that appears to see the AP with low signal strength disconnects, the whole AP goes down. Verified with airmon/tcpdump that the AP beacons stop. A reboot of the router was also observed when log_level is set to 2 or lower.

Bisecting points to commit https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=a078037ace50 (v21.02.0-rc3-35-ga078037ace) as the cause:

    mac80211: improve rate control performance

    Call rate control handler after intermediate queueing
    Includes follow-up fixes

Test case:

1. Connect to AP
2. Move far away or shield the mobile so that the AP signal, as seen by the mobile, drops significantly
3. Watch the wifi network list on the mobile (Android) as the signal drops below some threshold:
3.1 AP disappears from the list: FAIL [the router may also reboot]
3.2 AP moves from “Connected” to “Saved”: PASS

I did not have these issues at all in May 2021 using the dev snapshot. I updated to the July 2021 dev snapshot, observed the issue, then built openwrt-21.02 and still observed the issue.

slick_diligence commented on 24.07.2021 21:30

More information in case it matters:

- wifi-device: 802.11n HT40 5GHz channel, txpower is limited
- wifi-iface: I changed these options, but none made a difference to the issue: ieee80211w, isolate, wpa_disable_eapol_key_retries

My 23 July 2021 build of openwrt-21.02 with "git revert ccbe535; git revert a07803" has not hit the issue in 1 day, compared to seeing it 5+ times per day before.

dbpalan commented on 23.11.2021 03:59

Exactly the same symptom on openwrt-21.02.0 as well as openwrt-21.02.1:

(1) One wifi client moves far away (low signal) from the AP
(2) ALL wifi clients connected to that AP disconnect and cannot find that AP (another AP on the same router has no problem, i.e. a disconnect from the 2.4GHz AP does not affect the 5GHz AP)
(3) After around 1 minute, the disconnected AP reappears and clients are able to connect again

The same symptom occurred on two different routers.

Image used: https://downloads.openwrt.org/releases/21.02.1/targets/ramips/mt7621/lenovo_newifi-d1-squashfs-sysupgrade.bin

slick_diligence commented on 24.11.2021 01:26

It sounds like there may be some progress from:

https://lkml.org/lkml/2021/11/18/539

The reporter on the kernel list identified the exact same commit that I did: "mac80211: call ieee80211_tx_h_rate_ctrl() when dequeue".

There appears to be a patch available from Felix Fietkau:

https://lkml.org/lkml/2021/11/21/252

---
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1822,15 +1822,15 @@ static int invoke_tx_handlers_late(struct ieee80211_tx_data *tx)
  	struct ieee80211_tx_info *info = IEEE80211_SKB_CB(tx->skb);
  	ieee80211_tx_result res = TX_CONTINUE;
  
+	if (!ieee80211_hw_check(&tx->local->hw, HAS_RATE_CONTROL))
+		CALL_TXH(ieee80211_tx_h_rate_ctrl);
+
  	if (unlikely(info->flags & IEEE80211_TX_INTFL_RETRANSMISSION)) {
  		__skb_queue_tail(&tx->skbs, tx->skb);
  		tx->skb = NULL;
  		goto txh_done;
  	}
  
-	if (!ieee80211_hw_check(&tx->local->hw, HAS_RATE_CONTROL))
-		CALL_TXH(ieee80211_tx_h_rate_ctrl);
-
  	CALL_TXH(ieee80211_tx_h_michael_mic_add);
  	CALL_TXH(ieee80211_tx_h_sequence);
  	CALL_TXH(ieee80211_tx_h_fragment);

I will give this a try and see if it improves.

slick_diligence commented on 24.11.2021 02:05

The above patch as used in OpenWrt at commit https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=d1ea575baa1b53bb477a020974afcec1b1193edc fixes the issue.

With the prior commit, my test case crashed the AP 100% of the time; with the above commit, it crashed the AP 0% of the time.

The commit indicates:

"This showed up primarily on rt2x00"

But based on my report, @dbpalan's, and the LKML report, it occurred with:

- rt2x00usb (Raspberry Pi, not OpenWrt)
- ramips/mt7621 (lenovo newifi-d1)
- ramips/rt3883 (asus rt-n56u)
