OpenWrt/LEDE Project

  • Status Unconfirmed
  • Percent Complete
    0%
  • Task Type Bug Report
  • Category Kernel
  • Assigned To No-one
  • Operating System All
  • Severity Critical
  • Priority Very Low
  • Reported Version All
  • Due in Version Undecided
  • Due Date Undecided
  • Votes 1
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Kristian Evensen - 15.11.2017

FS#1170 - mt7621: kernel errors - rcu_sched detected stalls on CPUs/tasks - again

After the work done for issue  FS#804 , the rcu_sched error seemed to be gone. However, I am now starting to see it again. Usually, at least for me, the error happens when there is large amounts of traffic and I do something with the network. My most reliable way for reproducing the error is as follows:

- Use iperf to flood a router with small packets. Other ways to stress the CPU also work, I for example triggered the error when I added very aggressive logging to the firewall.
- While the router is being flooded, I restarted networking (I am logged in to the router via UART).
- After a couple of network restarts, the error is trigger and the following is written to syslog at some interval:

[ 2251.870000] INFO: rcu_bh detected stalls on CPUs/tasks:
[ 2251.870000] 2-...: (1 GPs behind) idle=ae1/140000000000001/0 softirq=212487/217796 fqs=4380
[ 2251.870000] (detected by 1, t=6002 jiffies, g=-146, c=-147, q=4)
[ 2251.870000] Task dump for CPU 2:
[ 2251.870000] openvpn-mover.s R running 0 2598 1 0×08100004 [ 2251.870000] Stack : 8fa69998 800ebe38 00000000 8fa69998 57512e2b 000001fd 00000000 80035454
[ 2251.870000] 00000000 800edbd4 8fa69998 804b0000 00000000 00000000 00000004 00000000
[ 2251.870000] 00000000 8ea17850 8efc7ec0 800376d4 00000000 00000000 778b8930 00000012
[ 2251.870000] 00000000 004077cd 778d4000 00000000 778d55e8 778d6f7c 00000000 8002b280
[ 2251.870000] ffbffeff ffffffff 00617772 706d742f 00000000 00000000 00000001 800379dc
[ 2251.870000] ...
[ 2251.870000] Call Trace:
[ 2251.870000] [<8000bc88>] __schedule+0×574/0×758

I am also able to sometimes trigger the issue by simply issuing the reboot-command (while the CPU is stressed). I have not applied any traffic shaping to my interface, and I see the error both with kernel 4.4 and 4.9 (i.e., LEDE 17.01 and master). I don’t quite know how to progress in debugging this.

 


Kristian Evensen commented on 15.11.2017 20:26

Here is a more detailed trace. Trace was triggered by a network restart during heavy load:

[ 686.452081] INFO: rcu_sched self-detected stall on CPU
[ 686.457237] 0-...: (5999 ticks this GP) idle=b89/140000000000002/0 softirq=54653/54653 fqs=2895
[ 686.466064] (t=6000 jiffies g=20069 c=20068 q=101897)
[ 686.471268] Task dump for CPU 0:
[ 686.474477] swapper/0 R running task 0 0 0 0x00100004
[ 686.481498] Stack : 00000000 8006b3cc 00000000 00000000 00000000 8006b3cc 0000001d 00000006
[ 686.489843] 00000006 80471234 00000000 00000000 00000000 00000000 80530000 00000014
[ 686.498186] 00000000 804d3f80 804d1490 804d0000 804d1490 00000000 00000000 00000000
[ 686.506529] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 686.514871] 00000000 00000000 00000000 00000000 00000000 800a7d00 8fc0dc94 00000000
[ 686.523214] ...
[ 686.525650] Call Trace:
[ 686.528115] [<8000f7b4>] show_stack+0x54/0x88
[ 686.532458] [<800a7fc8>] rcu_dump_cpu_stacks+0xdc/0x110
[ 686.537686] [<8007b7a8>] rcu_check_callbacks+0x2cc/0x7c4
[ 686.542977] [<8007dcec>] update_process_times+0x34/0x70
[ 686.548186] [<8008d6d8>] tick_handle_periodic+0x34/0xd0
[ 686.553392] [<802b2a2c>] gic_compare_interrupt+0x2c/0x40
[ 686.558708] [<80071a48>] handle_percpu_devid_irq+0xc4/0x18c
[ 686.564255] [<8006c6cc>] generic_handle_irq+0x24/0x3c
[ 686.569291] [<80209498>] gic_handle_local_int+0x94/0xd4
[ 686.574492] [<80209654>] gic_irq_dispatch+0x10/0x20
[ 686.579346] [<8006c6cc>] generic_handle_irq+0x24/0x3c
[ 686.584390] [<8000c3a8>] do_IRQ+0x1c/0x34
[ 686.588385] [<80208740>] plat_irq_dispatch+0xb4/0xdc
[ 686.593326] [<8000a908>] except_vec_vi_end+0xb8/0xc4
[ 689.252109] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 6269 jiffies s: 245 root: 0x1/.
[ 689.262138] blocking rcu_node structures:
[ 689.266130] Task dump for CPU 0:
[ 689.269343] swapper/0 R running task 0 0 0 0x00100004
[ 689.276424] Stack : 804c5fe0 804d3f80 00000000 00000000 ffffff80 00000000 00000000 00000340
[ 689.284808] 804d148c 00000001 80470cdc 804e0000 00000001 9bff0000 9bfe8000 9bfe8000
[ 689.293224] 00000000 803dc644 00000000 00000000 804c4000 804c5ea8 00000000 803dc978
[ 689.301789] 804d148c 00000001 80470cdc 804e0000 00000001 80064dac 80530000 804d0000
[ 689.310361] 80530000 803da67c 80530000 804d0000 80530000 8051f324 80530000 804f8bac
[ 689.318776] ...
[ 689.321222] Call Trace:
[ 689.323726] [<803dc408>] __schedule+0x5d4/0x7a4
[ 689.328280] [<803dc644>] schedule+0x6c/0x84
[ 689.332524] [<803dc978>] schedule_preempt_disabled+0x10/0x1c
[ 689.338220] [<80064dac>] cpu_startup_entry+0x15c/0x170
[ 689.343434] [<804f8bac>] start_kernel+0x444/0x464

leonardo c vargas commented on 25.01.2018 15:57

having the same problem
dir-860b1

OpenWrt SNAPSHOT r5938-6f425a28a4 / LuCI Master (git-18.023.74248-ee409b6)

Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.048845] ------------[ cut here ]------------
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.058089] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x254/0x2f4
Wed Jan 24 22:53:22 2018 kern.info kernel: [ 7437.074592] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.088469] Modules linked in: pppoe ppp_async uvcvideo pppox ppp_generic nf_conntrack_ipv6 mt76x2e mt76 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY videobuf2_v4l2 slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack_netlink iptable_mangle iptable_filter ipt_ECN ip_tables input_core crc_ccitt compat sch_cake nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.229054]  sch_hfsc sch_ingress videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev ledtrig_usbport xt_set ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb dma_shared_buffer leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.340615] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.77 #0
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.352378] Stack : 00000000 00000000 80527b4a 00000033 803e79e4 00000000 00000000 80520000
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.369016]         804c81bc 804c7da7 80460564 00000000 00000000 80523824 ffffffff 00000200
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.385651]         00200000 8006a9e8 00000001 80520000 804cdec4 804cdec8 80465170 87c0ddcc
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.402289]         00000003 800a78a0 ffffffff 00000200 00200000 00000000 00000006 00c0ddcc
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.418924]         00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.435558]         ...
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.440414] Call Trace:
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.445287] [<8000f714>] show_stack+0x54/0x88
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.453953] [<801da8fc>] dump_stack+0x8c/0xd0
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.462612] [<8002adc4>] __warn+0xe4/0x118
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.470750] [<8002ae28>] warn_slowpath_fmt+0x30/0x3c
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.480622] [<80303968>] dev_watchdog+0x254/0x2f4
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.489973] [<8007ce78>] call_timer_fn.isra.3+0x24/0x84
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.500358] [<8007d0bc>] run_timer_softirq+0x1e4/0x244
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.510575] [<8002e924>] __do_softirq+0x294/0x2e0
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.519922] [<8002ec18>] irq_exit+0x7c/0x98
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.528236] [<80209750>] plat_irq_dispatch+0xb4/0xdc
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.538103] [<8000a968>] except_vec_vi_end+0xb8/0xc4
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.547989] [<80064134>] cpu_startup_entry+0x104/0x170
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.558222] [<804edbe4>] start_kernel+0x47c/0x49c
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.567725] ---[ end trace 37648a34027f878d ]---
Wed Jan 24 22:53:22 2018 kern.err kernel: [ 7437.576938] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Wed Jan 24 22:53:22 2018 kern.info kernel: [ 7437.589281] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
Wed Jan 24 22:53:22 2018 kern.info kernel: [ 7437.601275] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=06e0e000, max=512, ctx=428, dtx=428, fdx=427, next=428
Wed Jan 24 22:53:22 2018 kern.info kernel: [ 7437.622591] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=06e14000, max=512, calc=60, drx=61
[ 7437.048845] ------------[ cut here ]------------
[ 7437.058089] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x254/0x2f4
[ 7437.074592] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[ 7437.088469] Modules linked in: pppoe ppp_async uvcvideo pppox ppp_generic nf_conntrack_ipv6 mt76x2e mt76 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY videobuf2_v4l2 slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack_netlink iptable_mangle iptable_filter ipt_ECN ip_tables input_core crc_ccitt compat sch_cake nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb
[ 7437.229054]  sch_hfsc sch_ingress videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev ledtrig_usbport xt_set ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb dma_shared_buffer leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common
[ 7437.340615] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.77 #0
[ 7437.352378] Stack : 00000000 00000000 80527b4a 00000033 803e79e4 00000000 00000000 80520000
[ 7437.369016]         804c81bc 804c7da7 80460564 00000000 00000000 80523824 ffffffff 00000200
[ 7437.385651]         00200000 8006a9e8 00000001 80520000 804cdec4 804cdec8 80465170 87c0ddcc
[ 7437.402289]         00000003 800a78a0 ffffffff 00000200 00200000 00000000 00000006 00c0ddcc
[ 7437.418924]         00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 7437.435558]         ...
[ 7437.440414] Call Trace:
[ 7437.445287] [<8000f714>] show_stack+0x54/0x88
[ 7437.453953] [<801da8fc>] dump_stack+0x8c/0xd0
[ 7437.462612] [<8002adc4>] __warn+0xe4/0x118
[ 7437.470750] [<8002ae28>] warn_slowpath_fmt+0x30/0x3c
[ 7437.480622] [<80303968>] dev_watchdog+0x254/0x2f4
[ 7437.489973] [<8007ce78>] call_timer_fn.isra.3+0x24/0x84
[ 7437.500358] [<8007d0bc>] run_timer_softirq+0x1e4/0x244
[ 7437.510575] [<8002e924>] __do_softirq+0x294/0x2e0
[ 7437.519922] [<8002ec18>] irq_exit+0x7c/0x98
[ 7437.528236] [<80209750>] plat_irq_dispatch+0xb4/0xdc
[ 7437.538103] [<8000a968>] except_vec_vi_end+0xb8/0xc4
[ 7437.547989] [<80064134>] cpu_startup_entry+0x104/0x170
[ 7437.558222] [<804edbe4>] start_kernel+0x47c/0x49c
[ 7437.567725] ---[ end trace 37648a34027f878d ]---
[ 7437.576938] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[ 7437.589281] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[ 7437.601275] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=06e0e000, max=512, ctx=428, dtx=428, fdx=427, next=428
[ 7437.622591] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=06e14000, max=512, calc=60, drx=61
Project Manager
Felix Fietkau commented on 27.02.2018 08:49

Please try the latest version

camel commented on 20.03.2018 03:17

problem still exists ..

Project Manager
Felix Fietkau commented on 15.04.2018 09:19

Should be fixed now, please test

Loading...

Available keyboard shortcuts

Tasklist

Task Details

Task Editing