FS#295 - ath10k_pci 0001:01:00.0: SWBA overrun on vdev #6665
Comments
nbd: Please try the latest version
kraut: I also saw Call Traces for SWBA overrun with ath10k_pci, but on Archer C7 V2 on r3189-12db207.
lede-0x7f: Same router (not sure about the revision), but no issue w/ LEDE Reboot 17.01.0-rc1 r3042
slthomason: We are able to reproduce these readily and quickly on the Archer C7 v2. Here is how we have been able to reproduce it.
We get these errors:
Further, the odd thing is that these errors come from the 5 GHz driver (ath10k), yet we can only reproduce them when connected to the 2.4 GHz band (ath9k). When we connect to 5 GHz and put load on it from our systems and iperf, it handles it fine. However, when we go back to putting load from our system plus iperf on the 2.4 GHz band, we get the errors below. We have seen this with any system that puts load on kernel calls, not just our systems. Whenever the kworker is under any load, this happens.
We have now confirmed this on the following build combinations:
A couple of questions:
hertog_jan: I am also seeing this in my logs, in most recent master: LuCI Master (git-17.249.21998-0c99b64) / LEDE Reboot SNAPSHOT r4797-23317f1. It is so bad that the router is running a constant load of about 1, with 50% sys usage. I have disabled the 5 GHz and 2.4 GHz radios in turn, and the messages keep coming.
imac: We have several Archer C7s that experience periodic disconnects on the 2.4G band. We are fairly certain it occurs on all of them. We do see this SWBA message, though not in direct correlation with each failure of the 2.4G. The result is that no 2.4G devices can connect (no issue with 5G) until we execute "wifi" or reboot the Archer C7s. Since 17.01.2 we have applied a cron job that runs each night and executes "wifi" to work around this issue, but we believe it may have been present since 17.01.1. Unfortunately it is still present in 17.01.4, so we are confident that this is no transient bug and will begin posting to try to resolve it.
In one of our office locations, to monitor this bug with the hope of one day resolving it, we do not run the cron job and only have two 2.4G devices connected. In the last 52 days, the 2.4G failure happened three times. Between the 1st and 2nd occurrence we saw these similar SWBA messages in our dmesg. They could be related to this ticket, so we have provided those details here.
[2506537.784853] ath: phy1: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x02100020 DMADBG_7=0x00028800
Before each occurrence of the 2.4G failure, at some point we see these messages, noting the three different timestamps:
[290147.737347] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
[2794383.390501] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
[4541364.695287] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
Since there is not much else in the dmesg other than the bridge changes when we execute "wifi" to resolve the issue, it is pretty easy to surmise that these messages are related to our dropping of 2.4G, as they seem to be present before the bridge messages that occur when we resolve the problem. The 2.4G clients are a D-LINK DCH-S150 motion detector (70:62:b8:93:98:b8) and a Google Chromecast (6C:AD:F8:4B:A3:52). I also added the complete dmesg for reference.
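For anyone trying to line up messages like these with the time of a 2.4G failure, here is a minimal sketch that groups ath9k/ath10k lines from a saved dmesg by message text and converts the uptime stamps to days. The function names and the regular expression are illustrative assumptions, not part of any OpenWrt tool, and the pattern only covers the two line prefixes quoted above.

```python
import re
from collections import defaultdict

# Hypothetical pattern: matches only the two prefixes quoted in this ticket,
# e.g. "[290147.737347] ath10k_pci 0000:01:00.0: <message>"
#  and "[2506537.784853] ath: phy1: <message>"
LINE_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+(ath10k_pci [0-9a-f:.]+|ath: phy\d+): (.*)$"
)


def summarize_wifi_messages(dmesg_text):
    """Map each driver message to the list of uptime stamps (seconds) it appeared at."""
    seen = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            seen[match.group(3)].append(float(match.group(1)))
    return dict(seen)


def uptime_days(seconds):
    """Convert a dmesg uptime stamp in seconds to days."""
    return seconds / 86400.0


# Sample input copied from the log lines in the comment above.
SAMPLE = """\
[290147.737347] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
[2506537.784853] ath: phy1: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x02100020 DMADBG_7=0x00028800
[2794383.390501] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
[4541364.695287] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
"""

for message, stamps in summarize_wifi_messages(SAMPLE).items():
    days = ", ".join(f"{uptime_days(s):.1f}" for s in stamps)
    print(f"{len(stamps)}x at day(s) {days}: {message}")
```

Run on a workstation against a copy of the router's dmesg; comparing the resulting days with the dates of the observed failures may show whether the messages really precede each drop.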
m0urs: I see the same messages, and I also have the issue that all Wifi devices get disconnected from time to time and I need to restart Wifi in order to fix that. I also do that with a cron job, which checks every few minutes whether there are still some devices connected. If not, it restarts Wifi. However, it would be great to have a permanent solution instead of that workaround ... Device: Archer C7 v2
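The check described above could look roughly like the sketch below. It is not m0urs's actual script: it assumes the `iw` tool is installed, that clients show up as lines starting with "Station " in `iw dev <ifname> station dump`, and that the interfaces are called wlan0 and wlan1 (hypothetical names). On a router this logic would more likely live in a small shell script; Python is used here only to make it easy to read. Note that it would also restart Wifi when no client happens to be in range.

```python
import subprocess


def count_stations(station_dump):
    """Count associated clients in the text output of `iw dev <ifname> station dump`."""
    return sum(
        1 for line in station_dump.splitlines() if line.startswith("Station ")
    )


def needs_restart(dumps):
    """True when at least one interface was checked and none has a client."""
    return bool(dumps) and all(count_stations(d) == 0 for d in dumps)


def check_once(ifnames=("wlan0", "wlan1")):
    """Query each interface and run OpenWrt's `wifi` command if nobody is connected.

    The interface names are assumptions; adjust them to the device.
    """
    dumps = [
        subprocess.run(
            ["iw", "dev", name, "station", "dump"],
            capture_output=True,
            text=True,
        ).stdout
        for name in ifnames
    ]
    if needs_restart(dumps):
        subprocess.run(["wifi"])
        return True
    return False
```

Calling `check_once()` from a cron entry every few minutes would reproduce the workaround; it does nothing about the underlying bug.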
ynezz: Please try again with latest snapshot build.
mfortini: Same behavior with 19.07.4 on a Netgear R7800:
[15652.415824] ath10k_pci 0001:01:00.0: Invalid VHT mcs 15 peer stats
The release is EOL; please comment if this affects you with currently supported releases.
jpereira:
Supply the following if possible:
[61789.980553] ------------[ cut here ]------------
[61789.980600] WARNING: CPU: 0 PID: 3 at net/core/dev.c:4837 net_rx_action+0x154/0x2e4()
[61789.984240] Modules linked in: pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_id xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_NETMAP xt_LOG xt_HL xt_DSCP xt_CT xt_CLASSIFY ts_kmp ts_fsm ts_bm slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack_netlink nf_conntrack iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt br_netfilter em_cmp sch_teql em_nbyte sch_htb sch_pie sch_gred sch_dsmark cls_basic act_ipt sch_prio em_text
[61790.061568] sch_codel sch_tbf sch_sfq em_meta act_police sch_fq sch_red act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ath10k_pci ath10k_core ath mac80211 cfg80211 compat ledtrig_usbport xt_set ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables ifb ip6_tunnel tunnel6 tun snd_compress snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_rawmidi snd_seq_device snd_hwdep snd input_core soundcore
[61790.130320] usb_storage uhci_hcd f2fs ext4 jbd2 mbcache crc32c_generic crypto_hash leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_qcom dwc3_of_simple ohci_platform ohci_hcd phy_qcom_dwc3 ahci ehci_platform ehci_hcd sd_mod ahci_platform libahci_platform libahci libata scsi_mod gpio_button_hotplug usbcore nls_base usb_common
[61790.158798] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.4.30 #0
[61790.159062] Hardware name: Qualcomm (Flattened Device Tree)
[61790.165160] [] (unwind_backtrace) from [] (show_stack+0x14/0x20)
[61790.170534] [] (show_stack) from [] (dump_stack+0x8c/0xa0)
[61790.178521] [] (dump_stack) from [] (warn_slowpath_common+0xa4/0xd0)
[61790.185550] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x1c/0x24)
[61790.193798] [] (warn_slowpath_null) from [] (net_rx_action+0x154/0x2e4)
[61790.202568] [] (net_rx_action) from [] (__do_softirq+0xdc/0x230)
[61790.210636] [] (__do_softirq) from [] (run_ksoftirqd+0x34/0x64)
[61790.218626] [] (run_ksoftirqd) from [] (smpboot_thread_fn+0x190/0x1b8)
[61790.226006] [] (smpboot_thread_fn) from [] (kthread+0xf8/0x100)
[61790.234335] [] (kthread) from [] (ret_from_fork+0x14/0x3c)
[61790.241951] ---[ end trace bbf62e8dea16c714 ]---
[61790.250226] ath10k_pci 0001:01:00.0: SWBA overrun on vdev 1, skipped old beacon