FS#9 - Kernel panic with SQM scripts #7087
Comments
neheb: i think i got a similar issue on my ramips device when it switched from the 3.18 kernel to 4.4. you should try using sched_cake and see if it causes the same problem. |
trismo:
|
Ansuel: Ok, today I updated the router and I have the same problem... any idea? |
mirlang: I can reproduce it on my rspro (ar71xx) and on my wdr4900 (mpc85xx). It just needs a certain amount of UDP traffic (not sure if it depends on UDP or just on small packets)... lots of warnings in the log, and after some time the router becomes inaccessible, then stops forwarding and sometimes even reboots. It's definitely related to HFSC (on linux-4.4)... not sure where to report this bug :( |
Ansuel: exactly me too... so i reported it right at the source of the project |
neheb: This is a kernel problem which has been reported upstream: https://bugzilla.kernel.org/show_bug.cgi?id=109581 If you want to keep using SQM, switch to using the cake shaper(requires kmod-sched-cake). it replaces htsc as well as the other ones(except fq_codel). No crashes here. edit: could also try disabling TSO on your ethernet interfaces. It might work. Requires ethtool. |
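The two workarounds suggested above (turning off TSO with ethtool, and shaping with cake instead of HFSC) can be sketched as command lines. This is only an illustration: the helper name `mitigation_commands`, the interface `eth0` and the rate `20mbit` are assumptions, not values from this thread. sqm-scripts normally configures cake itself, and the `tc` line only works once kmod-sched-cake is installed. The helper just builds and prints the commands; it does not run them.

```python
def mitigation_commands(iface, bandwidth=None):
    """Build (but do not run) the workaround commands for one interface.

    iface and bandwidth are placeholders chosen by the caller.
    """
    # Disable TCP segmentation offload on the interface.
    cmds = [["ethtool", "-K", iface, "tso", "off"]]
    if bandwidth is not None:
        # Replace the root qdisc with cake, shaping to the given rate.
        cmds.append(["tc", "qdisc", "replace", "dev", iface,
                     "root", "cake", "bandwidth", bandwidth])
    return cmds


if __name__ == "__main__":
    # "eth0" and "20mbit" are hypothetical example values.
    for cmd in mitigation_commands("eth0", "20mbit"):
        print(" ".join(cmd))
```

Disabling TSO is described above only as something that "might work", so treat it as an experiment rather than a fix.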
Ansuel: For me TSO is already disabled. |
Ansuel: Ok, I have now tried the cake shaper and installed the extra experimental sqm scripts... So how do I alert the devs that they need to set cake as the default, because the others are broken? |
neheb: cake is still out of tree and you need to install it separately from sqm. won't be default soon. the difference between cake and the default SQM setup is that cake has lower CPU usage. |
Ansuel: If it's better than the default one, why isn't it included? |
moeller0: Dear All, Please let me try to answer a few of the questions above: Why is cake not the default? Why cake is not included? Does cake use less CPU than HTB+fq_codel? What about sqm-scripts-extra scripts? [OT] SQM-scripts-exts: What is the difference between the LAN and WAN variants? |
Ansuel: Thanks for the explanation. Currently I'm using the triple wan script with cake. And you didn't explain the wan variants. Are they the same? |
diizzyy: Also affects qos-scripts on trunk r1242 (ramips, MT7621, DIR-860L B1)
[ 3034.313000] ------------[ cut here ]------------
[ 3034.322000] WARNING: CPU: 3 PID: 0 at net/sched/sch_hfsc.c:1426 0x86921ea0()
[ 3034.336000] Modules linked in: ifb qcserial pppoe ppp_async option iptable_nat cdc_mbim usb_wwan sierra_net sierra rndis_host qmi_wwan pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE huawei_cdc_ncm cdc_subset cdc_ncm cdc_ether cdc_eem xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_id xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CT xt_CLASSIFY usbserial usbnet usblp slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt cdc_wdm act_connmark nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress mt7603e mt76x2e mt76 mac80211 cfg80211 compat ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nfsd nfsv3 nfs tun loop vfat fat lockd sunrpc grace nls_utf8 nls_iso8859_15 nls_iso8859_1 nls_cp437 usb_storage leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd sd_mod scsi_mod gpio_button_hotplug ext4 jbd2 mbcache exfat usbcore nls_base usb_common mii crypto_hash [last unloaded: ifb]
[ 3034.573000] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.15 #5
[ 3034.585000] Stack : 00000000 00000000 804b6862 00000033 00000000 00000000 80460000 804d0000
[ 3034.585000] 8783bf10 8045dc83 803db648 00000003 00000000 804b367c 86d9dc68 86b6a480
[ 3034.585000] 00000008 8006349c 80460000 804d0000 804621d8 804621dc 803dff70 87865c04
[ 3034.585000] 00000003 80061228 86d9dc68 86b6a480 00000008 00000000 00000000 00865c04
[ 3034.585000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 3034.585000] ...
[ 3034.656000] Call Trace:
[ 3034.661000] [<800165c8>] show_stack+0x50/0x84
[ 3034.669000] [<801b5200>] dump_stack+0x84/0xbc
[ 3034.678000] [<8002be00>] warn_slowpath_common+0xa0/0xd0
[ 3034.688000] [<8002beb4>] warn_slowpath_null+0x18/0x24
[ 3034.698000] [<86921ea0>] 0x86921ea0
|
diizzyy: Doesn't occur on trunk r1122 (ar71xx, AR7242, Mikrotik RB750GL) running qos-scripts. |
mirlang: nack, diizzyy, i can still kill my router (means: it stops forwarding traffic) on 4.4.19 by just downloading a well seeded torrent, and slowpath-warnings are still there... doesn't happen with HTB instead of HFSC |
diizzyy: I didn't say it was fixed, it does seem to only occur on certain hardware/SoCs. Also, please state version of trunk and hardware. |
dtaht: I have never been huge on hfsc. HTB is much better tested, as is "cake". |
eTomm: Hello, I posted a crash in FS#277. I was using fq_codel with nxt_routed_hfsc.qos. Unfortunately I'm not able to use cake or simple.qos because they cut my bandwidth from 20 to 2 mbit. I could not find any configuration that avoids this apart from the one above. Obviously nxt_routed_hfsc.qos crashes on the Linksys EA8500. Same error here, and then the router stops accepting traffic from the WAN and LAN interfaces. |
moeller0: @tommaso Ercole HTB should be similar enough to hfsc that a drop from 20 to 2 might indicate some other bug in sqm-scripts. Maybe you could help me debug this? Potentially through the sqm-scripts github site? |
eTomm: If it is not difficult... I think my wife gave me an ultimatum for my "networking" tests |
hnyman: One more me-too report of HFSC crashes. I tested new R7800 with different qdiscs and HFSC seems to cause problems. Netgear R7800, IPQ8065 SoC (ipq806x platform in LEDE). A few slightly different variations of the crash, but are at net/sched/sch_hfsc.c:1426 hfsc_dequeue+0x188/0x568 Example below:
Ps. Somebody with edit rights might add "HFSC qdisc" to be visible in the bug title. |
moeller0: @tommaso ah, I have the same issue, my family appreciates me not taking down our internet connection for testing, so you have my sympathies. So if this is too inconvenient, stick to HFSC (but do try cake if possible; while far from perfect, it still has a number of great ideas that make it worth testing). @hannum Nyman: I concur. HFSC is the root cause, and SQM is only implicated because one of its (non-default) scripts actually sets up an HFSC instance, so it might be the messenger but it is not the cause. Maybe "Kernel panic with HFSC (triggered by SQM scripts)" would be a better name ;) |
DoubleQ: Looks like I have the same problem, but the router seems stable. Found the following in my dmesg. |
Ansuel:
When I select a script other than simplest.qos in the sqm list, the log fills up with kernel panic messages.
If you need more information, tell me.
I have a wdr3600. To reproduce,
just install the sqm package, select the script and watch the log.
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.147796] ------------[ cut here ]------------
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.147817] WARNING: CPU: 0 PID: 20621 at net/sched/sch_hfsc.c:1426 0x871e9e6c()
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.147825] Modules linked in: ath9k ath9k_common pppoe ppp_async iptable_nat ath9k_hw ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 mac80211 ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_id xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_NETMAP xt_LOG xt_HL xt_DSC
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148143] CPU: 0 PID: 20621 Comm: dropbear Tainted: G W 4.4.12 #1
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148151] Stack : 803dc584 00000000 00000001 80430000 8782a080 80426f63 803bdcb0 0000508d
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148151] 80493790 873e9f9 873e9c68 0000000a 00000100 800a6854 803c32bc 80420000
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148151] 00000003 873e9f9 803c16c8 8500fad4 00000100 800a4820 00000000 00000000
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148151] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148151] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148151] ...
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148240] Call Trace:
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148255] [<80071a94>] show_stack+0x50/0x84
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148271] [<80081628>] warn_slowpath_common+0xa0/0xd0
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148286] [<800816dc>] warn_slowpath_null+0x18/0x24
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148315] [<871e9e6c>] 0x871e9e6c
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148326]
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148335] ---[ end trace ce6ca764afdc3e14 ]---
This is the log; I get a lot of these.
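To judge how often a router hits this warning, the `WARNING: ... at <file>:<line>` headers in a saved log can be counted per source location. The following is a minimal sketch, not part of sqm-scripts or any official tool: the function name `count_warnings` is made up for illustration, and the two sample lines are the warning headers quoted in this thread.

```python
import re
from collections import Counter

# Matches the kernel warning header, e.g.
# "WARNING: CPU: 0 PID: 20621 at net/sched/sch_hfsc.c:1426 0x871e9e6c()"
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+):(\d+)")


def count_warnings(log_text):
    """Count kernel warnings per (source file, line number)."""
    counts = Counter()
    for match in WARN_RE.finditer(log_text):
        counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Warning headers copied from the two traces in this thread.
sample = """\
[ 3034.322000] WARNING: CPU: 3 PID: 0 at net/sched/sch_hfsc.c:1426 0x86921ea0()
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.147817] WARNING: CPU: 0 PID: 20621 at net/sched/sch_hfsc.c:1426 0x871e9e6c()
"""

if __name__ == "__main__":
    for (path, line), n in count_warnings(sample).items():
        print(f"{path}:{line} seen {n} time(s)")
```

Feeding it the text of a saved `dmesg` or `logread` dump instead of `sample` would show whether all warnings come from the same line of sch_hfsc.c, as they do in the reports above.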