OpenWrt/LEDE Project

  • Status Closed
  • Percent Complete 100%
  • Task Type Bug Report
  • Category Kernel
  • Assigned To No-one
  • Operating System All
  • Severity Medium
  • Priority Very Low
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
Attached to Project: OpenWrt/LEDE Project
Opened by Ansuel - 13.06.2016
Last edited by Petr Štetiar - 10.04.2019

FS#9 - Kernel panic with SQM scripts

When I select any script other than simplest.qos in the SQM list, the log fills up with kernel panic messages.
If you need more information, tell me.
To reproduce: I have a wdr3600.
Just install the sqm package, select the script, and watch the log.
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.147796] ------------[ cut here ]------------
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.147817] WARNING: CPU: 0 PID: 20621 at net/sched/sch_hfsc.c:1426 0x871e9e6c()
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.147825] Modules linked in: ath9k ath9k_common pppoe ppp_async iptable_nat ath9k_hw ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 mac80211 ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_id xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_NETMAP xt_LOG xt_HL xt_DSC
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148143] CPU: 0 PID: 20621 Comm: dropbear Tainted: G W 4.4.12 #1
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148151] Stack : 803dc584 00000000 00000001 80430000 8782a080 80426f63 803bdcb0 0000508d
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148151] 80493790 873e9f90 873e9c68 0000000a 00000100 800a6854 803c32bc 80420000
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148151] 00000003 873e9f90 803c16c8 8500fad4 00000100 800a4820 00000000 00000000
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148151] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148151] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148151] ...
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148240] Call Trace:
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148255] [<80071a94>] show_stack+0x50/0x84
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148271] [<80081628>] warn_slowpath_common+0xa0/0xd0
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148286] [<800816dc>] warn_slowpath_null+0x18/0x24
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148315] [<871e9e6c>] 0x871e9e6c
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148326]
Mon Jun 13 21:13:07 2016 kern.warn kernel: [30780.148335] ---[ end trace ce6ca764afdc3e14 ]---

That is the log; I get a lot of these.

Closed by  Petr Štetiar
10.04.2019 09:20
Reason for closing:  Works for me
Additional comments about closing:  

Please try to reproduce the problem with the latest snapshot/release images and request that this issue be reopened if the problem still persists.

neheb commented on 17.06.2016 03:08

I think I got a similar issue on my ramips device when it switched from the 3.18 kernel to 4.4. You should try using sched_cake and see if it causes the same problem.

Trismo commented on 18.06.2016 01:04


With kernel 4.4.13 on ar71xx/wdr4300 (the same hardware), I didn't have any problem.
REVISION='r707'

Ansuel commented on 20.06.2016 16:54

OK, I updated the router today and I still have the same problem... any ideas?

mirlang commented on 20.06.2016 16:57

I can reproduce it on my rspro (ar71xx) and on my wdr4900 (mpc85xx). It just needs a certain amount of UDP traffic (not sure if it depends on UDP or just small packets). There are lots of warnings in the log, and after some time the router becomes inaccessible, then stops forwarding, and sometimes even reboots.

It's definitely related to HFSC (on linux-4.4)... not sure where to report this bug :(

Ansuel commented on 20.06.2016 17:43

Exactly, me too... so I reported it right at the source of the project.
If you just set the script, nothing happens, but once Internet traffic starts, the log fills up with that.

neheb commented on 21.06.2016 03:00

This is a kernel problem which has been reported upstream: https://bugzilla.kernel.org/show_bug.cgi?id=109581

If you want to keep using SQM, switch to the cake shaper (requires kmod-sched-cake). It replaces HFSC as well as the other ones (except fq_codel). No crashes here.

Edit: you could also try disabling TSO on your Ethernet interfaces. It might work. Requires ethtool.

Ansuel commented on 21.06.2016 10:37

For me, TSO is already disabled.
I will try the cake shaper... what is the difference between them?

Ansuel commented on 21.06.2016 10:49

OK, I have now tried the cake shaper, and I installed the extra experimental SQM scripts...
Now I have the cake shaper with the test triple wan script and I don't get any errors at all.
I get an A for bufferbloat, and I think it's working because I was downloading while I ran the test.

So now, how do we alert the devs that they need to set cake as the default because the others are broken?

neheb commented on 22.06.2016 00:30

Cake is still out of tree and you need to install it separately from SQM. It won't be the default soon.

The difference between cake and the default SQM setup is that cake has lower CPU usage.

Ansuel commented on 22.06.2016 09:10

If it's better than the default one, why isn't it included?

moeller0 commented on 23.06.2016 10:27

Dear All,

Please let me try to answer a few of the questions above:

Why is cake not the default?
Cake is not per se "better" than htb+fq_codel, and cake is still under (more or less) active development, so it certainly is not yet ready to become the default. The simple.qos script, with its combination of HTB and fq_codel, is still the recommended default, so unless you want to participate in active debugging and development, please stick to simple.qos or simplest.qos.

Why is cake not included?
Cake just became available as an easy to install kernel module in LEDE to allow wider testing. But since it is not (yet) recommended as default it also is not installed by default (people might be unhappy if sqm-scripts would install unnecessary modules wasting their space).

Does cake use less CPU than HTB+fq_codel?
Some time last year, tests with an earlier version of cake indicated that in a CPU-limited situation cake might allow a higher overall shaper bandwidth than the default HTB+fq_codel combination. More recent tests are not that conclusive. In particular, it was shown that HTB behaves differently from HFSC and cake in CPU-limited mode: HTB will keep the added latency low, sacrificing more bandwidth (sometimes considerably more bandwidth), while the other two shapers sacrifice less bandwidth but will also increase the latency under load (or show more bufferbloat).

What about sqm-scripts-extra scripts?
The sqm-scripts-extra scripts are really just for wider testing; please do not rely on them being available for long.

[OT] sqm-scripts-extra: What is the difference between the _LAN_ and _WAN_ variants?
Cake promises much better isolation of internal host IPs from each other than the other qdiscs. But to implement per-internal-host-IP fairness, cake needs to see the internal IPs. In the typical home situation, sqm gets instantiated on the WAN interface, which typically also performs NAT for IPv4. Cake will only be able to see one internal address if instantiated there, so per-internal-IP isolation degrades into the default per-flow isolation. To allow testing whether cake's two relevant isolation options (triple and dual) actually work in the real world, the _LAN_ scripts are prepared to be instantiated on internal LAN interfaces of a home router, since on the LAN ports the internal IPs are still visible. Please note that typically the bridged WLAN interfaces will not be covered by the shaping, making the LAN variant scripts not generally recommended solutions, but pure testing devices. The ideal test would be to hook up another switch/dumb AP behind the shaped LAN port, mix traffic from different hosts, and see whether, for example, heavy BitTorrent use still badly affects the connections of other internal hosts. If anybody actually tests this, please report any results as issues under https://github.com/tohojo/sqm-scripts. Thanks in advance.
[/OT]
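
To illustrate the NAT point above, here is a toy model in Python. It is only a sketch under stated assumptions, not cake's actual algorithm: the function names, the IP addresses, and the 20 Mbit/s link rate are all hypothetical, and "fairness" is reduced to simple equal division. It compares what a shaper could do when it sees the real internal source IPs (LAN side) with what it sees once every flow carries the same post-NAT source address (WAN side).

```python
from collections import defaultdict

def share_per_flow(flows, link_rate):
    """Per-flow fairness: every flow gets an equal slice of the link."""
    per_flow = link_rate / len(flows)
    shares = defaultdict(float)
    for host, _flow_id in flows:
        shares[host] += per_flow
    return dict(shares)

def share_per_host(flows, link_rate, visible_ip):
    """Per-host fairness: equal slice per *visible* source IP,
    then an equal split among the flows behind that IP."""
    buckets = defaultdict(list)
    for host, flow_id in flows:
        buckets[visible_ip(host)].append((host, flow_id))
    per_bucket = link_rate / len(buckets)
    shares = defaultdict(float)
    for members in buckets.values():
        for host, _flow_id in members:
            shares[host] += per_bucket / len(members)
    return dict(shares)

if __name__ == "__main__":
    # Hypothetical traffic mix: one host with 9 torrent flows, one with a single flow.
    flows = [("192.168.1.10", i) for i in range(9)] + [("192.168.1.20", 0)]
    rate = 20.0  # Mbit/s, assumed for illustration

    # LAN side: internal IPs are visible as they are.
    lan = share_per_host(flows, rate, lambda ip: ip)
    # WAN side after NAT: every flow appears to come from one address.
    wan = share_per_host(flows, rate, lambda ip: "203.0.113.1")

    for label, result in (("LAN", lan), ("WAN", wan)):
        print(label, {ip: round(mbit, 2) for ip, mbit in result.items()})
```

In this model the single-flow host gets half of the link on the LAN side but only a tenth on the WAN side, which is the same result as plain per-flow sharing.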
Best Regards

      M.

Ansuel commented on 23.06.2016 12:58

Thanks for the explanation. Currently I'm using the triple wan script with cake.
My connection is PPPoE with ATM overhead, so is the way I set the SQM settings wrong?

And you didn't explain the WAN variants. Are they the same?

diizzyy commented on 05.08.2016 13:11

Also affects qos-scripts on trunk r1242 (ramips, MT7621, DIR-860L B1)

[ 3034.313000] ------------[ cut here ]------------
[ 3034.322000] WARNING: CPU: 3 PID: 0 at net/sched/sch_hfsc.c:1426 0x86921ea0()
[ 3034.336000] Modules linked in: ifb qcserial pppoe ppp_async option iptable_nat cdc_mbim usb_wwan sierra_net sierra rndis_host qmi_wwan pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE huawei_cdc_ncm cdc_subset cdc_ncm cdc_ether cdc_eem xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_id xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CT xt_CLASSIFY usbserial usbnet usblp slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt cdc_wdm act_connmark nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress mt7603e mt76x2e mt76 mac80211 cfg80211 compat ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nfsd nfsv3 nfs tun loop vfat fat lockd sunrpc grace nls_utf8 nls_iso8859_15 nls_iso8859_1 nls_cp437 usb_storage leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd sd_mod scsi_mod gpio_button_hotplug ext4 jbd2 mbcache exfat usbcore nls_base usb_common mii crypto_hash [last unloaded: ifb]
[ 3034.573000] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.15 #5
[ 3034.585000] Stack : 00000000 00000000 804b6862 00000033 00000000 00000000 80460000 804d0000
[ 3034.585000]    8783bf10 8045dc83 803db648 00000003 00000000 804b367c 86d9dc68 86b6a480
[ 3034.585000]    00000008 8006349c 80460000 804d0000 804621d8 804621dc 803dff70 87865c04
[ 3034.585000]    00000003 80061228 86d9dc68 86b6a480 00000008 00000000 00000000 00865c04
[ 3034.585000]    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 3034.585000]    ...
[ 3034.656000] Call Trace:
[ 3034.661000] [<800165c8>] show_stack+0x50/0x84
[ 3034.669000] [<801b5200>] dump_stack+0x84/0xbc
[ 3034.678000] [<8002be00>] warn_slowpath_common+0xa0/0xd0
[ 3034.688000] [<8002beb4>] warn_slowpath_null+0x18/0x24
[ 3034.698000] [<86921ea0>] 0x86921ea0

diizzyy commented on 24.08.2016 05:42

Doesn't occur on trunk r1122 (ar71xx, AR7242, Mikrotik RB750GL) running qos-scripts.

mirlang commented on 25.08.2016 08:04

Nack, diizzyy: I can still kill my router (meaning it stops forwarding traffic) on 4.4.19 just by downloading a well-seeded torrent, and the slowpath warnings are still there... It doesn't happen with HTB instead of HFSC.

diizzyy commented on 29.08.2016 08:47

I didn't say it was fixed; it does seem to occur only on certain hardware/SoCs. Also, please state the trunk version and the hardware.

Dave Täht commented on 04.09.2016 19:12

I have never been huge on hfsc. HTB is much better tested, as is "cake".

Tommaso Ercole commented on 11.11.2016 07:46

Hello, I posted a crash in FS#277. I was using fq_codel with nxt_routed_hfsc.qos. Unfortunately I'm not able to use cake or simple.qos because they cut my bandwidth from 20 to 2 Mbit. I could not find any configuration that avoids this apart from the one above.

Evidently nxt_routed_hfsc.qos crashes on the Linksys EA8500. Same error here, and then the router stops accepting traffic from the WAN and LAN interfaces.

moeller0 commented on 11.11.2016 08:21

@Tommaso Ercole HTB should be similar enough to hfsc that a drop from 20 to 2 might indicate some other bug in sqm-scripts. Maybe you could help me debug this? Potentially through the sqm-scripts github site?

Tommaso Ercole commented on 11.11.2016 20:50

If it is not difficult... I think my wife gave me an ultimatum for my "networking" tests

Hannu Nyman commented on 12.11.2016 15:11

One more me-too report of HFSC crashes. I tested a new R7800 with different qdiscs, and HFSC seems to cause problems.

Netgear R7800, IPQ8065 SoC (ipq806x platform in LEDE).
LEDE Reboot r2154, kernel 4.4.30

There are a few slightly different variations of the crash, but all are at net/sched/sch_hfsc.c:1426 hfsc_dequeue+0x188/0x568

Example below:

[67086.803277] ------------[ cut here ]------------
[67086.806968] WARNING: CPU: 0 PID: 3 at net/sched/sch_hfsc.c:1426 hfsc_dequeue+0x188/0x568 [sch_hfsc]()
[67086.811517] Modules linked in: pppoe ppp_async iptable_nat ip6table_nat pptp pppox ppp_mppe ppp_generic nf_nat_ipv6 nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_id xt_hl xt_helper xt_esp xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CT xt_CLASSIFY usbserial slhc nf_reject_ipv4 nf_nat_rtsp nf_nat_redirect nf_nat_masquerade_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtsp nf_conntrack_rtcache iptable_raw iptable_mangle iptable_filter ipt_ah ipt_ECN ip_tables crc_ccitt sch_cake em_cmp sch_teql em_nbyte sch_htb sch_tbf sch_dsmark sch_pie sch_gred em_meta cls_basic act_ipt sch_prio em_text sch_codel sch_sfq act_police sch_fq sch_red act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ath10k_pci ath10k_core ath mac80211 cfg80211 compat ledtrig_usbport xt_set ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_NPT ip6t_MASQUERADE nf_nat_masquerade_ipv6 nf_nat nf_conntrack ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables msdos ip_gre gre ifb sit tunnel4 ip_tunnel tun vfat fat ntfs hfsplus cifs nls_utf8 nls_iso8859_15 nls_iso8859_1 nls_cp850 nls_cp437 nls_cp1250 sha256_generic sha1_generic md5 md4 hmac ecb des_generic usb_storage leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_qcom dwc3_of_simple ohci_platform ohci_hcd phy_qcom_dwc3 ahci ehci_platform ehci_hcd sd_mod ahci_platform libahci_platform libahci libata scsi_mod gpio_button_hotplug ext4 jbd2 mbcache usbcore 
nls_base usb_common cryptomgr aead crypto_null crc32c_generic crypto_hash
[67086.999791] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G        W       4.4.30 #0
[67087.000143] Hardware name: Qualcomm (Flattened Device Tree)
[67087.007284] [<c022072c>] (unwind_backtrace) from [<c021d21c>] (show_stack+0x14/0x20)
[67087.012830] [<c021d21c>] (show_stack) from [<c03b7c1c>] (dump_stack+0x8c/0xa0)
[67087.020816] [<c03b7c1c>] (dump_stack) from [<c0228a24>] (warn_slowpath_common+0xa4/0xd0)
[67087.027845] [<c0228a24>] (warn_slowpath_common) from [<c0228b04>] (warn_slowpath_null+0x1c/0x24)
[67087.036121] [<c0228b04>] (warn_slowpath_null) from [<bf574490>] (hfsc_dequeue+0x188/0x568 [sch_hfsc])
[67087.044895] [<bf574490>] (hfsc_dequeue [sch_hfsc]) from [<c052a990>] (__qdisc_run+0xcc/0x1b4)
[67087.053983] [<c052a990>] (__qdisc_run) from [<c05083a8>] (net_tx_action+0xf4/0x180)
[67087.062488] [<c05083a8>] (net_tx_action) from [<c022bbc8>] (__do_softirq+0xdc/0x230)
[67087.069948] [<c022bbc8>] (__do_softirq) from [<c022bd50>] (run_ksoftirqd+0x34/0x64)
[67087.077937] [<c022bd50>] (run_ksoftirqd) from [<c02466bc>] (smpboot_thread_fn+0x190/0x1b8)
[67087.085321] [<c02466bc>] (smpboot_thread_fn) from [<c024381c>] (kthread+0xf8/0x100)
[67087.093648] [<c024381c>] (kthread) from [<c0209cb8>] (ret_from_fork+0x14/0x3c)
[67087.101259] ---[ end trace 829b74d04ace8079 ]---
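
For comparing reports like the ones in this thread, a minimal Python sketch that tallies kernel warnings by source location. The regular expression and the function name are assumptions for illustration, based only on the WARNING header lines quoted here; other kernel versions may format that header differently.

```python
import re
from collections import Counter

# Matches the "WARNING: CPU: n PID: n at <file>:<line>" header of each warning.
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+):(\d+)")

def tally_warnings(log_text):
    """Count kernel warnings per (source file, line number)."""
    counts = Counter()
    for match in WARNING_RE.finditer(log_text):
        counts[(match.group(1), int(match.group(2)))] += 1
    return counts

# Header lines copied from the traces in this report.
SAMPLE = (
    "[67086.806968] WARNING: CPU: 0 PID: 3 at net/sched/sch_hfsc.c:1426 "
    "hfsc_dequeue+0x188/0x568 [sch_hfsc]()\n"
    "[ 3034.322000] WARNING: CPU: 3 PID: 0 at net/sched/sch_hfsc.c:1426 "
    "0x86921ea0()\n"
)

if __name__ == "__main__":
    for (path, line), count in tally_warnings(SAMPLE).most_common():
        print(f"{path}:{line} seen {count} time(s)")
```

Feeding it the output of dmesg or logread would show whether all warnings on a device point at the same line of sch_hfsc.c or at different places.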

Ps. Somebody with edit rights might add "HFSC qdisc" to be visible in the bug title.

moeller0 commented on 12.11.2016 21:35

@Tommaso ah, I have the same issue: my family appreciates me not taking down our internet connection for testing, so you have my sympathies. So if this is too inconvenient, stick to HFSC (but do try cake if possible; while far from perfect, it still has a number of great ideas that make it worth testing).

@Hannu Nyman: I concur. HFSC is the root cause, and SQM is only implicated because one of its (non-default) scripts actually sets up an HFSC instance, so it might be the messenger but it is not the cause. Maybe "Kernel panic with HFSC (triggered by SQM scripts)" would be a better name ;)

JD commented on 13.11.2016 17:34

It looks like I have the same problem, but the router seems stable.
I'm not sure how to get the other time format at the start of each line.
I'm on an early archerc5/c7 with LEDE:
Linux Archer-Lede 4.4.30 #0 Wed Nov 9 11:17:52 2016 mips GNU/Linux

Found the following in my dmesg.

[54837.883304] ------------[ cut here ]------------
[54837.888030] WARNING: CPU: 0 PID: 3 at net/core/dev.c:4837 net_rx_action+0x138/0x2c8()
[54837.896015] Modules linked in: pppoe ppp_async iptable_nat ath9k pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE ath9k_common xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_id xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt ath9k_hw ath10k_pci ath10k_core ath mac80211 cfg80211 compat ledtrig_usbport ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
[54837.965308] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.4.30 #0
[54837.971493] Stack : 803ec784 00000000 00000001 80440000 87c28c80 80438e63 803cde2c 00000003
[54837.971493]    804a379c 00000040 00000042 00000102 00000001 800a72f0 803d3490 80430000
[54837.971493]    00000003 00000040 803d189c 87c41d3c 00000001 800a526c 00000000 00000000
[54837.971493]    00000001 801f4b00 00000000 00000000 00000000 00000000 00000000 00000000
[54837.971493]    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[54837.971493]    ...
[54838.007839] Call Trace:
[54838.010339] [<80071c38>] show_stack+0x50/0x84
[54838.014768] [<800819b8>] warn_slowpath_common+0xa0/0xd0
[54838.020077] [<80081a70>] warn_slowpath_null+0x18/0x24
[54838.025208] [<8027810c>] net_rx_action+0x138/0x2c8
[54838.030085] [<80083f34>] __do_softirq+0x250/0x298
[54838.034861] [<80083fa4>] run_ksoftirqd+0x28/0x60
[54838.039555] [<8009a96c>] smpboot_thread_fn+0x158/0x188
[54838.044776] [<8009839c>] kthread+0xd8/0xec
[54838.048941] [<80060878>] ret_from_kernel_thread+0x14/0x1c
[54838.054416]
[54838.055932] ---[ end trace 53e447d7b63cfb5f ]---
