OpenWrt/LEDE Project

  • Status Unconfirmed
  • Percent Complete 0%
  • Task Type Bug Report
  • Category Kernel
  • Assigned To No-one
  • Operating System All
  • Severity High
  • Priority Very Low
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Konrad Kosmatka - 28.12.2019

FS#2707 - mt7621/ath9k: unreliable AP with hardware encryption

Board: MikroTik Routerboard RBM33G (ramips/mt7621)
Interface: MikroTik R11e-2HPnD (ath9k/ar9582)
Tested versions: 18.06.5, snapshot

Some background: I’ve been using an RB433AH with a UBNT XR2 (ath5k) as my home AP for many years. It was extremely reliable with OpenWRT, then LEDE, and then OpenWRT once again. However, 802.11g is no longer sufficient, so I decided to upgrade. I’ve acquired an RBM33G board and an R11e-2HPnD module, but this setup is very unreliable for me.

The bug: At some point, the AP cannot receive (decrypt?) packets from a station, but the connection is still established.

Details:

  • It can happen to any active station connected to the AP, but only one station stalls at a given time (not all of them).
  • The AP actually receives data from the STA, as the RX byte counter increases, but the RX packet count is typically stuck at the same value. For example, several subsequent calls to `iw wlan0 station dump` show: rx bytes: 12121338 / 12127266 / 12141740, rx packets: 1282 / 1282 / 1282.
  • The driver discards the packets, as the replay counter increases greatly when this bug occurs (e.g. /sys/kernel/debug/ieee80211/phy0/keys/3/replays: 484947).
  • There is nothing relevant in `dmesg` or `logread`.
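
The stall signature described above (RX bytes growing while RX packets stay constant) can be checked automatically. The following is only a minimal sketch, assuming the usual `iw <dev> station dump` text layout with a `Station <MAC> (on <dev>)` header followed by `rx bytes:` and `rx packets:` lines. The function names `parse_station_dump` and `stalled_stations` are hypothetical helpers, not part of iw or OpenWrt.

```python
import re
from typing import Dict, List

# Assumed layout of `iw <dev> station dump`:
#   Station aa:bb:cc:dd:ee:ff (on wlan0)
#           rx bytes:       12121338
#           rx packets:     1282
STATION_RE = re.compile(r"^Station\s+([0-9A-Fa-f:]{17})\s+\(on\s+\S+\)")
FIELD_RE = re.compile(r"^\s*(rx bytes|rx packets):\s*(\d+)")


def parse_station_dump(text: str) -> Dict[str, Dict[str, int]]:
    """Return {mac: {"rx bytes": n, "rx packets": m}} for each station."""
    stations: Dict[str, Dict[str, int]] = {}
    current = None
    for line in text.splitlines():
        header = STATION_RE.match(line)
        if header:
            current = header.group(1).lower()
            stations[current] = {}
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            stations[current][field.group(1)] = int(field.group(2))
    return stations


def stalled_stations(before: Dict[str, Dict[str, int]],
                     after: Dict[str, Dict[str, int]]) -> List[str]:
    """List stations whose RX bytes grew while RX packets did not change."""
    stalled = []
    for mac, new in after.items():
        old = before.get(mac)
        if not old:
            continue  # station was not present in the earlier sample
        bytes_grew = new.get("rx bytes", 0) > old.get("rx bytes", 0)
        packets_stuck = new.get("rx packets") == old.get("rx packets")
        if bytes_grew and packets_stuck:
            stalled.append(mac)
    return sorted(stalled)
```

Two samples taken a few seconds apart (for example, captured over SSH from the AP) can be passed through `parse_station_dump` and compared with `stalled_stations`. A station may also show this pattern briefly for unrelated reasons, so a stall is only meaningful if it persists over several samples.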

Reproducibility:

  • It happens after several hours of typical traffic.
  • I can trigger it manually much faster by passing high data throughput between clients (e.g. using iperf).
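
While reproducing the problem with iperf, the per-key replay counters in debugfs can be sampled before and after the run to see which key starts rejecting frames. This is a rough sketch, assuming debugfs is mounted at /sys/kernel/debug and that each `replays` file holds a single decimal number, as in the example path quoted earlier. `read_replay_counters` and `replay_deltas` are hypothetical helper names.

```python
import glob
import os
from typing import Dict


def read_replay_counters(debugfs_root: str = "/sys/kernel/debug") -> Dict[str, int]:
    """Read every ieee80211/phy*/keys/*/replays file under debugfs_root."""
    pattern = os.path.join(debugfs_root, "ieee80211", "phy*", "keys", "*", "replays")
    counters: Dict[str, int] = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as fh:
                value = int(fh.read().strip())
        except (OSError, ValueError):
            continue  # key vanished (rekey/disconnect) or file had no number
        counters[os.path.relpath(path, debugfs_root)] = value
    return counters


def replay_deltas(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return the increase per key for counters that grew between samples."""
    return {
        key: after[key] - before[key]
        for key in after
        if key in before and after[key] > before[key]
    }
```

A key whose counter jumps by thousands during a short iperf run, while the affected station stops passing traffic, matches the behaviour described in this report. Reading debugfs normally requires root.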

Configuration:

config wifi-device 'radio0'
        option type 'mac80211'
        option channel '11'
        option hwmode '11g'
        option path 'pci0000:00/0000:00:00.0/0000:01:00.0'
        option htmode 'NOHT'
        option txpower '16'
        option disabled '0'

config wifi-iface 'default_radio0'
        option device 'radio0'
        option network 'lan'
        option mode 'ap'
        option ssid '<SSID>'
        option encryption 'psk2+ccmp'
        option key '<KEY>'

Stations tested:

  • Dell Latitude E7440 with Intel Wireless 7260 (iwlwifi), running linux 5.4.6
  • Dell Latitude E7270 with Intel Wireless 8260 (iwlwifi), running linux 4.19.91
  • RB411U with DBII F20-PRO (ath5k, AR5414), running LEDE 17.01.4 (in WDS link)


What does NOT help:

  • Changing the basic configuration (SSID, key).
  • Disabling 802.11n.
  • Disabling WDS (both AP & STA).
  • Disabling SMP (nosmp kernel cmdline option).
  • Using another version of hostapd, i.e. wpad-mesh.
  • Changing miniPCIe slot on the RBM33G board.
  • Disabling CPU powersave in MikroTik bootloader.

Workarounds:

  • Disable encryption altogether, but that’s obviously not an option.
  • Disable hardware encryption with ath9k’s nohwcrypt=1 parameter.
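
After applying the nohwcrypt workaround, it can be useful to confirm that the loaded module actually picked up the parameter. The sketch below assumes ath9k exposes the parameter read-only at /sys/module/ath9k/parameters/nohwcrypt, which is the usual sysfs location for module parameters but should be verified on the target system. `hw_crypto_disabled` is a hypothetical helper name.

```python
from typing import Optional

# Assumed sysfs location of the ath9k module parameter; verify on the device.
NOHWCRYPT_PATH = "/sys/module/ath9k/parameters/nohwcrypt"


def hw_crypto_disabled(param_path: str = NOHWCRYPT_PATH) -> Optional[bool]:
    """Return True if nohwcrypt is non-zero, False if zero, None if unknown."""
    try:
        with open(param_path) as fh:
            raw = fh.read().strip()
    except OSError:
        return None  # module not loaded or parameter not exposed
    try:
        return int(raw) != 0
    except ValueError:
        return None  # unexpected file content
```

A result of `None` means the state could not be determined, not that hardware encryption is active.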

This bug does not occur with software encryption, at least in 802.11g mode. In 802.11n mode, the throughput is limited to a little over 30 Mbps and there are frequent disconnects. Apparently, the CPU is not the limiting factor, as `top` reports usage of approximately 10%.

Unfortunately, this bug renders the AP unusable for me, because I need the best reliability.

I also tested this module with a different board and (so far) I cannot reproduce this bug on an RW2458N (ar71xx) with the R11e-2HPnD running OpenWRT 18.06.5.

Paul Fertser commented on 29.12.2019 12:18

https://patchwork.kernel.org/cover/10583683/ likely related.
hostapd doesn't seem to be honouring NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 at all, as well as NL80211_EXT_FEATURE_EXT_KEY_ID.

Paul Fertser commented on 10.02.2020 18:04

Please disregard my last comment entirely. The advice I got from the wireless maintainer is that PTK rekeying just shouldn't be enabled: there's no practical need for it, and most drivers can't do it properly.

The original reporter shared with me a workaround to this problem which made his systems fully reliable for days: commenting out this line https://github.com/torvalds/linux/blob/81160dda9a7aad13c04e78bb2cfd3c4630e3afab/net/mac80211/wpa.c#L538 .
