OpenWrt/LEDE Project

  • Status Closed
  • Percent Complete
    100%
  • Task Type Bug Report
  • Category Kernel
  • Assigned To No-one
  • Operating System All
  • Severity High
  • Priority Very Low
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Ulrich Mayer - 12.07.2020
Last edited by Petr Štetiar - 05.08.2020

FS#3228 - ath10k_pci 0000:00:00.0: firmware crashed

Closed by  Petr Štetiar
05.08.2020 10:14
Reason for closing:  Different project
Additional comments about closing:  

Requested

Ulrich Mayer commented on 12.07.2020 11:41

This might be an upstream problem; please point me to the right tracking system if this is not the right place to report this bug.

A few weeks ago I decided to try roaming on my wifi at home.
Setup: 2x Archer C7 v5 as AP, 1x Archer C7 v2 as router, DAWN (i.e. 802.11krv, also 802.11w which I currently don't knowingly use)
I experienced firmware crashes with kernel 4.19, but decided to wait until kernel 5.4 made it into snapshots.
The bug report is from an AP.

After more than 20 occurrences of

Wed Jul  8 13:00:41 2020 kern.warn kernel: [ 7797.345447] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon

I get firmware crashes. The log files captured three of them:

kernel-fail2:Wed Jul  8 13:00:44 2020 kern.err kernel: [ 7800.322765] ath10k_pci 0000:00:00.0: Cannot communicate with firmware, previous wmi cmds: 36891:1874286 36904:1874286 36952:1874264 40859:1874264, jiffies: 1875073, attempting to fake crash and restart firmware, dev-flags: 0x42
kernel-fail2:Wed Jul  8 13:00:44 2020 kern.err kernel: [ 7800.393197] ath10k_pci 0000:00:00.0: firmware crashed! (guid ac5c19c8-a67a-4669-bb2c-e7dbd341642b)
kernel-fail2:Wed Jul  8 13:18:06 2020 kern.err kernel: [ 8842.746412] ath10k_pci 0000:00:00.0: Cannot communicate with firmware, previous wmi cmds: 36891:2134867 36904:2134866 36954:2134752 36952:2134657, jiffies: 2135680, attempting to fake crash and restart firmware, dev-flags: 0x42
kernel-fail2:Wed Jul  8 13:18:07 2020 kern.err kernel: [ 8842.830762] ath10k_pci 0000:00:00.0: firmware crashed! (guid 48ca0542-2372-4a1b-a014-32a8037305ab)

The second failure shows up again in kernel-fail3, in addition to the third failure:

kernel-fail3:Wed Jul  8 14:21:45 2020 kern.err kernel: [12662.004832] ath10k_pci 0000:00:00.0: Cannot communicate with firmware, previous wmi cmds: 36891:3089744 36904:3089743 36952:3089576 40859:3089576, jiffies: 3090496, attempting to fake crash and restart firmware, dev-flags: 0x42
kernel-fail3:Wed Jul  8 14:21:46 2020 kern.err kernel: [12662.080578] ath10k_pci 0000:00:00.0: firmware crashed! (guid a5421a2b-97e2-45bc-b152-b2e4804b327f)
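
For anyone comparing several saved logs, here is a minimal Python sketch (meant to run on a workstation, not on the AP) that counts the SWBA overruns, collects the crash GUIDs, and computes how old each listed WMI command was when the driver gave up. It assumes that each `id:value` pair after "previous wmi cmds:" is a command ID followed by the jiffies value at which it was sent; that reading is a guess from the log format and is not confirmed in this report. The function and variable names are made up for illustration.

```python
import re

# Sample lines copied from the report above (without the "kernel-fail2:" grep prefix).
LOG = """\
Wed Jul  8 13:00:41 2020 kern.warn kernel: [ 7797.345447] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
Wed Jul  8 13:00:44 2020 kern.err kernel: [ 7800.322765] ath10k_pci 0000:00:00.0: Cannot communicate with firmware, previous wmi cmds: 36891:1874286 36904:1874286 36952:1874264 40859:1874264, jiffies: 1875073, attempting to fake crash and restart firmware, dev-flags: 0x42
Wed Jul  8 13:00:44 2020 kern.err kernel: [ 7800.393197] ath10k_pci 0000:00:00.0: firmware crashed! (guid ac5c19c8-a67a-4669-bb2c-e7dbd341642b)
"""

# ASSUMPTION: each pair is "<wmi command id>:<jiffies when sent>".
WMI_RE = re.compile(r"previous wmi cmds: (?P<cmds>[\d: ]+), jiffies: (?P<now>\d+)")
CRASH_RE = re.compile(r"firmware crashed! \(guid (?P<guid>[0-9a-f-]+)\)")


def summarize(text):
    """Count beacon overruns and collect crash details from kernel log text."""
    summary = {"swba_overruns": 0, "crash_guids": [], "wmi_ages": []}
    for line in text.splitlines():
        if "SWBA overrun" in line:
            summary["swba_overruns"] += 1
        crash = CRASH_RE.search(line)
        if crash:
            summary["crash_guids"].append(crash.group("guid"))
        wmi = WMI_RE.search(line)
        if wmi:
            now = int(wmi.group("now"))
            # Age of each command in jiffies at the time of the error message.
            ages = [now - int(pair.split(":")[1]) for pair in wmi.group("cmds").split()]
            summary["wmi_ages"].append(ages)
    return summary


if __name__ == "__main__":
    print(summarize(LOG))
```

To process a real log, pass the contents of a saved log file to `summarize()` instead of the sample string.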

uptime:

14:22:29 up  3:31,  load average: 0.46, 0.16, 0.07

Model

TP-Link Archer C7 v5

Architecture

Qualcomm Atheros QCA956X ver 1 rev 0

Firmware Version

OpenWrt SNAPSHOT r13710-7cb721c03f / LuCI Master git-20.186.82618-556e354

Kernel Version

5.4.50

Firmware (unchanged from snapshot):

/lib/firmware/ath10k/QCA988X/hw2.0/firmware-2.bin

Setup:

opkg remove wpad-basic kmod-ath10k-ct
opkg install avahi-daemon-service-http avahi-nodbus-daemon \
             wpad-openssl  \
             kmod-ath10k-ct-smallbuffers
opkg install luci luci-app-dawn

I use

kmod-ath10k-ct-smallbuffers

because of

https://forum.openwrt.org/t/oom-issues-on-archer-c7-v5/60736

I tested roaming by transferring large files from my NAS to the mobile device while walking through the house.
The handover mostly works nicely, but sometimes the client needs to re-authenticate.

Wifi performance is bad after a firmware crash; restarting the network is not sufficient to fix it, but a reboot is.

Please let me know if you need more details or config files.
I'd happily assist debugging by performing experiments.

Ulrich Mayer commented on 20.07.2020 16:07

Adrian Schmutzler mentioned he has a rock solid Archer C7 v5 with non-ct ath10k and firmware on OpenWrt 19.07, so I changed my setup to the following:

opkg remove wpad-basic kmod-ath10k-ct kmod-ath10k-ct ath10k-firmware-qca988x-ct
opkg install kmod-ath10k ath10k-firmware-qca988x
opkg install avahi-daemon-service-http avahi-nodbus-daemon \
             wpad-openssl
opkg install luci luci-app-dawn

based on today's snapshot:

OpenWrt SNAPSHOT r13888-d5a148f5c8 / LuCI Master git-20.186.79919-0c47989

Since then it is much more solid, but I still see kernel issues when doing big file transfers while roaming between different APs.
They do not seem to be directly related to ath10k:

Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.212863] ------------[ cut here ]------------
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.217849] WARNING: CPU: 0 PID: 1194 at backports-5.7-rc3-1/net/mac80211/sta_info.c:1929 ieee80211_sta_update_pending_airtime+0x20c/0x214 [mac80211]
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.231699] STA ea:46:3d:86:c5:e8 AC 2 txq pending airtime underflow: 4294967100, 196
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.231702] Modules linked in: ath9k ath9k_common pppoe ppp_async iptable_nat ath9k_hw ath10k_pci ath10k_core ath xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD xt_CT pppox ppp_generic nf_nat nf_flow_table_hw nf_flow_table nf_conntrack_rtcache nf_conntrack mac80211 ipt_REJECT cfg80211 xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG slhc nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables crc_ccitt compat ledtrig_usbport ledtrig_heartbeat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.306945] CPU: 0 PID: 1194 Comm: hostapd Not tainted 5.4.52 #0
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.313147] Stack : 806a0000 80643a24 00000000 00000000 80642bf4 87c0bc7c 87e86ee0 8067dd83
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.321782]         805e0c98 000004aa 807d32d8 00000842 40000010 00000001 87c0bc30 ff466b54
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.330419]         00000000 00000000 80800000 000000e5 00000030 00000000 6420352e 342e3532
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.339056]         000000e5 8a11d732 00000000 0004af01 80000000 00000009 00000000 8770862c
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.347689]         00000009 00000842 40000010 87212460 00000003 80307718 00000000 807d0000
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.356322]         ...
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.358851] Call Trace:
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.361389] [<80069944>] show_stack+0x30/0x100
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.366010] [<8007e818>] __warn+0xc0/0x10c
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.370249] [<8007e8f0>] warn_slowpath_fmt+0x8c/0xac
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.375493] [<8770862c>] ieee80211_sta_update_pending_airtime+0x20c/0x214 [mac80211]
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.383589] [<877038ec>] ieee80211_tx_monitor+0xf0c/0x120c [mac80211]
Mon Jul 20 18:12:05 2020 kern.warn kernel: [32112.390333] ---[ end trace f6eb2110714d04d3 ]---
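
A note on the two numbers in the "pending airtime underflow" line: they fit a 32-bit unsigned counter that was at 0 when 196 was subtracted from it. The small Python sketch below reproduces that arithmetic. It assumes the first number is the counter after the subtraction and the second is the amount subtracted; this is inferred from the message text, not confirmed from the mac80211 source in this report. The helper name is made up for illustration.

```python
U32_MODULUS = 1 << 32  # 2**32, the number of values a 32-bit unsigned counter can hold


def u32_sub(a, b):
    """Subtract the way a 32-bit unsigned counter would (wraps modulo 2**32)."""
    return (a - b) % U32_MODULUS


# ASSUMPTION: the counter was 0 and 196 was subtracted from it.
print(u32_sub(0, 196))  # prints 4294967100, the first value in the warning
```

If that reading is right, airtime was subtracted for the station once more than it had been added, which would point at the accounting around the roaming event rather than at the radio firmware.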

Btw, the non-ct version also changed the ath10k firmware from 10.1
(likely https://github.com/kvalo/ath10k-firmware/tree/master/QCA988X/hw2.0/10.1)
to 10.2.4
(likely https://github.com/kvalo/ath10k-firmware/tree/master/QCA988X/hw2.0/10.2.4-1.0)
