New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FS#1263 - Archer C7 Periodic 2.4G WiFi Failure (ath10k_pci failed to parse phyerr tlv payload at byte 0) #7969
Comments
imac: It looks like we have seen this issue since 17.01.1, we did initially believe a 2.4G printer may have been the culprit as logged in this older closed ticket here. https://bugs.lede-project.org/index.php?do=details&task_id=791 Eventually this printer was hardwired, but the 2.4G dropping issues have persisted. |
avbohemen: The ath10k driver is for the 802.11ac radio, so 5Ghz. The Archer C7 also has an ath9k driver for the 2.4GHz radio. So the messages probably have very little to do with disconnects on the 2.4GHz radio. |
imac: We think this issue was related to the Chromecast Client (https://support.google.com/chromecast/answer/7634752) We have had almost a month now without seeing this issue. |
imac: Unfortunately I posted a day too soon, as the 2.4G dropped again today. At then end of dmesg a bunch of these: [9168557.004798] ath: phy1: Unable to reset channel, reset status -5 In the logs, you can see the three 2.4G devices drop off (wlan1) and the two 5G devices make it to the next handshake. Wed Mar 14 23:15:11 2018 daemon.info hostapd: wlan0: STA a0:cc:2b:15:20:97 WPA: group key handshake completed (RSN) Back to resetting wifi nightly for now via cron. |
imac: The first one to disconnect has a unique error, not sure if that is related or just a symptom Wed Mar 14 23:18:16 2018 daemon.info hostapd: wlan1: STA 6c:ad:f8:4b:a3:52 IEEE 802.11: disconnected due to excessive missing ACKs |
imac: Here is a dump of logread and dmesg from a failure I caught in the last few minutes. I restarted wifi at Wed Apr 11 15:19:35 2018; All that seems to be logged at the time of the outage is the "Unable to reset channel, reset status -5". Logread: Dmesg: Wed Apr 11 15:16:00 2018 kern.err kernel: [12609090.577341] ath: phy1: Unable to reset channel, reset status -5 [12570336.615632] br-lan: port 2(wlan0) entered forwarding state |
psyborg: i've observed similar on ath79: https://bugs.openwrt.org/index.php?do=details&task_id=2172 not sure bootup procedure would have any effect on later device operation but with original fw there are bunch of "check DDR activity HIGH" "RTC reset" messages before wifi interfaces initialise |
zorxd: I just had a similar problem with an Archer c7 v2, OpenWRT 19.07.1 (ath79) ath: phy1: Unable to reset channel, reset status -5 Somehow, it's only after the third error that the radio became unusable. I then have a bunch of "disassociated due to inactivity" messages from hostapd wlan1. I clicked the "restart" button in LuCI and it started working again. A short time ago, I removed the candela tech driver/firmware and installed the non-ct ones. Not sure if it's related since this driver is only used for the 5 GHz radio (phy0) and it's the 2.4 GHz radio (phy1, ath9k) that failed. |
imac:
We run 17.01.4 (with latest package updates) on Archer C7 (4.4.92 mips / ath10k)
We have several similar ArcherC7s that experience a periodic disconnect on 2.4G band. We are fairly certain it occurs on all of them, though about 5 of them actively see this issue where we have to deal with it.
The result is no 2.4G devices can connect (no issue with 5G) until we execute "wifi" or reboot the ArcherC7s.
We have applied a cron job that runs each night and executes "wifi" to workaround this issue since 17.01.2 but believe it may have been present since 17.01.1. It is still present in 17.01.4 unfortunately so we have decided to commit some time to resolving.
In one of our office locations, to monitor this bug with the hope of oneday resolving it, we do not run the cron job, and only have two 2.4G devices connected.
On this device where we have to manually intervene to work around the problem, in the last 52 days (current uptime), the 2.4G failure has happenened three times.
Between the 1st and 2nd occurence we saw these SWBA messages in our dmesg.
[2506537.784853] ath: phy1: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x02100020 DMADBG_7=0x00028800
[2506539.854485] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.862178] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.869849] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.877511] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.885148] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.892821] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.900499] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.908171] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.915809] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.923492] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
Before each occurence of the 2.4G failure, at some point we see these messages noting the three different timestamps:
[290147.737347] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
[463582.984526] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
[2794383.390501] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
[2794532.486186] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
[4541364.695287] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
[4541500.990587] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
The only other message that jumps out in the dmesg is the one below, but like thw SWBA, it is not associated with each occurence.
[3311103.068324] ath: phy1: Unable to reset channel, reset status -5
The 2.4G clients are a D-LINK DCH-S150 motion detector (70:62:b8:93:98:b8) and a Google Chromecast (6C:AD:F8:4B:A3:52)
There is not much else in the dmesg (which I have attached), other then the bridge changes that occur when we execute "wifi" to resolve the issue which help identify which messages occur before resolving the issue each time (which can be days later as we only really notice when Chromecast is offline and we are trying to use it).
Nothing jumps out of our logread (and has not in the past), but typically the buffer has wrapped (as was the case today) and provided no messages related to the 2.4G clients.
Our 2.4G config is below, noting we do have a 40Mhz wide config enabled leveraging ht_coex and htmode.
config wifi-device 'radio1'
option type 'mac80211'
option hwmode '11g'
option path 'platform/qca955x_wmac'
option channel '11'
option htmode 'HT40-'
option txpower '24'
option ht_coex '0'
option noscan '1'
option country 'US'
config wifi-iface
option device 'radio1'
option network 'lan'
option mode 'ap'
option ssid 'xxxx'
option key 'xxxx'
option encryption 'psk2+ccmp'
The text was updated successfully, but these errors were encountered: