Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FS#1170 - mt7621: kernel errors - rcu_sched detected stalls on CPUs/tasks - again #8656

Closed
openwrt-bot opened this issue Nov 15, 2017 · 5 comments
Labels

Comments

@openwrt-bot
Copy link

kristrev:

After the work done for issue FS#804, the rcu_sched error seemed to be gone. However, I am now starting to see it again. Usually, at least for me, the error happens when there is large amounts of traffic and I do something with the network. My most reliable way for reproducing the error is as follows:

  • Use iperf to flood a router with small packets. Other ways to stress the CPU also work, I for example triggered the error when I added very aggressive logging to the firewall.
  • While the router is being flooded, I restarted networking (I am logged in to the router via UART).
  • After a couple of network restarts, the error is trigger and the following is written to syslog at some interval:

[ 2251.870000] INFO: rcu_bh detected stalls on CPUs/tasks:
[ 2251.870000] 2-...: (1 GPs behind) idle=ae1/140000000000001/0 softirq=212487/217796 fqs=4380
[ 2251.870000] (detected by 1, t=6002 jiffies, g=-146, c=-147, q=4)
[ 2251.870000] Task dump for CPU 2:
[ 2251.870000] openvpn-mover.s R running 0 2598 1 0x08100004
[ 2251.870000] Stack : 8fa69998 800ebe38 00000000 8fa69998 57512e2b 000001fd 00000000 80035454
[ 2251.870000] 00000000 800edbd4 8fa69998 804b0000 00000000 00000000 00000004 00000000
[ 2251.870000] 00000000 8ea17850 8efc7ec0 800376d4 00000000 00000000 778b8930 00000012
[ 2251.870000] 00000000 004077cd 778d4000 00000000 778d55e8 778d6f7c 00000000 8002b280
[ 2251.870000] ffbffeff ffffffff 00617772 706d742f 00000000 00000000 00000001 800379dc
[ 2251.870000] ...
[ 2251.870000] Call Trace:
[ 2251.870000] [<8000bc88>] __schedule+0x574/0x758

I am also able to sometimes trigger the issue by simply issuing the reboot-command (while the CPU is stressed). I have not applied any traffic shaping to my interface, and I see the error both with kernel 4.4 and 4.9 (i.e., LEDE 17.01 and master). I don't quite know how to progress in debugging this.

@openwrt-bot
Copy link
Author

kristrev:

Here is a more detailed trace. Trace was triggered by a network restart during heavy load:

[ 686.452081] INFO: rcu_sched self-detected stall on CPU
[ 686.457237] 0-...: (5999 ticks this GP) idle=b89/140000000000002/0 softirq=54653/54653 fqs=2895
[ 686.466064] (t=6000 jiffies g=20069 c=20068 q=101897)
[ 686.471268] Task dump for CPU 0:
[ 686.474477] swapper/0 R running task 0 0 0 0x00100004
[ 686.481498] Stack : 00000000 8006b3cc 00000000 00000000 00000000 8006b3cc 0000001d 00000006
[ 686.489843] 00000006 80471234 00000000 00000000 00000000 00000000 80530000 00000014
[ 686.498186] 00000000 804d3f80 804d1490 804d0000 804d1490 00000000 00000000 00000000
[ 686.506529] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 686.514871] 00000000 00000000 00000000 00000000 00000000 800a7d00 8fc0dc94 00000000
[ 686.523214] ...
[ 686.525650] Call Trace:
[ 686.528115] [<8000f7b4>] show_stack+0x54/0x88
[ 686.532458] [<800a7fc8>] rcu_dump_cpu_stacks+0xdc/0x110
[ 686.537686] [<8007b7a8>] rcu_check_callbacks+0x2cc/0x7c4
[ 686.542977] [<8007dcec>] update_process_times+0x34/0x70
[ 686.548186] [<8008d6d8>] tick_handle_periodic+0x34/0xd0
[ 686.553392] [<802b2a2c>] gic_compare_interrupt+0x2c/0x40
[ 686.558708] [<80071a48>] handle_percpu_devid_irq+0xc4/0x18c
[ 686.564255] [<8006c6cc>] generic_handle_irq+0x24/0x3c
[ 686.569291] [<80209498>] gic_handle_local_int+0x94/0xd4
[ 686.574492] [<80209654>] gic_irq_dispatch+0x10/0x20
[ 686.579346] [<8006c6cc>] generic_handle_irq+0x24/0x3c
[ 686.584390] [<8000c3a8>] do_IRQ+0x1c/0x34
[ 686.588385] [<80208740>] plat_irq_dispatch+0xb4/0xdc
[ 686.593326] [<8000a908>] except_vec_vi_end+0xb8/0xc4
[ 689.252109] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 6269 jiffies s: 245 root: 0x1/.
[ 689.262138] blocking rcu_node structures:
[ 689.266130] Task dump for CPU 0:
[ 689.269343] swapper/0 R running task 0 0 0 0x00100004
[ 689.276424] Stack : 804c5fe0 804d3f80 00000000 00000000 ffffff80 00000000 00000000 00000340
[ 689.284808] 804d148c 00000001 80470cdc 804e0000 00000001 9bff0000 9bfe8000 9bfe8000
[ 689.293224] 00000000 803dc644 00000000 00000000 804c4000 804c5ea8 00000000 803dc978
[ 689.301789] 804d148c 00000001 80470cdc 804e0000 00000001 80064dac 80530000 804d0000
[ 689.310361] 80530000 803da67c 80530000 804d0000 80530000 8051f324 80530000 804f8bac
[ 689.318776] ...
[ 689.321222] Call Trace:
[ 689.323726] [<803dc408>] __schedule+0x5d4/0x7a4
[ 689.328280] [<803dc644>] schedule+0x6c/0x84
[ 689.332524] [<803dc978>] schedule_preempt_disabled+0x10/0x1c
[ 689.338220] [<80064dac>] cpu_startup_entry+0x15c/0x170
[ 689.343434] [<804f8bac>] start_kernel+0x444/0x464

@openwrt-bot
Copy link
Author

elphidium:

having the same problem
dir-860b1

OpenWrt SNAPSHOT r5938-6f425a28a4 / LuCI Master (git-18.023.74248-ee409b6)

Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.048845] ------------[ cut here ]------------
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.058089] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x254/0x2f4
Wed Jan 24 22:53:22 2018 kern.info kernel: [ 7437.074592] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.088469] Modules linked in: pppoe ppp_async uvcvideo pppox ppp_generic nf_conntrack_ipv6 mt76x2e mt76 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY videobuf2_v4l2 slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack_netlink iptable_mangle iptable_filter ipt_ECN ip_tables input_core crc_ccitt compat sch_cake nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.229054] sch_hfsc sch_ingress videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev ledtrig_usbport xt_set ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb dma_shared_buffer leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.340615] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.77 #0
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.352378] Stack : 00000000 00000000 80527b4a 00000033 803e79e4 00000000 00000000 80520000
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.369016] 804c81bc 804c7da7 80460564 00000000 00000000 80523824 ffffffff 00000200
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.385651] 00200000 8006a9e8 00000001 80520000 804cdec4 804cdec8 80465170 87c0ddcc
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.402289] 00000003 800a78a0 ffffffff 00000200 00200000 00000000 00000006 00c0ddcc
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.418924] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.435558] ...
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.440414] Call Trace:
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.445287] [<8000f714>] show_stack+0x54/0x88
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.453953] [<801da8fc>] dump_stack+0x8c/0xd0
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.462612] [<8002adc4>] __warn+0xe4/0x118
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.470750] [<8002ae28>] warn_slowpath_fmt+0x30/0x3c
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.480622] [<80303968>] dev_watchdog+0x254/0x2f4
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.489973] [<8007ce78>] call_timer_fn.isra.3+0x24/0x84
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.500358] [<8007d0bc>] run_timer_softirq+0x1e4/0x244
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.510575] [<8002e924>] __do_softirq+0x294/0x2e0
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.519922] [<8002ec18>] irq_exit+0x7c/0x98
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.528236] [<80209750>] plat_irq_dispatch+0xb4/0xdc
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.538103] [<8000a968>] except_vec_vi_end+0xb8/0xc4
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.547989] [<80064134>] cpu_startup_entry+0x104/0x170
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.558222] [<804edbe4>] start_kernel+0x47c/0x49c
Wed Jan 24 22:53:22 2018 kern.warn kernel: [ 7437.567725] ---[ end trace 37648a34027f878d ]---
Wed Jan 24 22:53:22 2018 kern.err kernel: [ 7437.576938] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Wed Jan 24 22:53:22 2018 kern.info kernel: [ 7437.589281] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
Wed Jan 24 22:53:22 2018 kern.info kernel: [ 7437.601275] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=06e0e000, max=512, ctx=428, dtx=428, fdx=427, next=428
Wed Jan 24 22:53:22 2018 kern.info kernel: [ 7437.622591] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=06e14000, max=512, calc=60, drx=61

[ 7437.048845] ------------[ cut here ]------------
[ 7437.058089] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x254/0x2f4
[ 7437.074592] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[ 7437.088469] Modules linked in: pppoe ppp_async uvcvideo pppox ppp_generic nf_conntrack_ipv6 mt76x2e mt76 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY videobuf2_v4l2 slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack_netlink iptable_mangle iptable_filter ipt_ECN ip_tables input_core crc_ccitt compat sch_cake nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb
[ 7437.229054] sch_hfsc sch_ingress videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev ledtrig_usbport xt_set ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb dma_shared_buffer leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common
[ 7437.340615] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.77 #0
[ 7437.352378] Stack : 00000000 00000000 80527b4a 00000033 803e79e4 00000000 00000000 80520000
[ 7437.369016] 804c81bc 804c7da7 80460564 00000000 00000000 80523824 ffffffff 00000200
[ 7437.385651] 00200000 8006a9e8 00000001 80520000 804cdec4 804cdec8 80465170 87c0ddcc
[ 7437.402289] 00000003 800a78a0 ffffffff 00000200 00200000 00000000 00000006 00c0ddcc
[ 7437.418924] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 7437.435558] ...
[ 7437.440414] Call Trace:
[ 7437.445287] [<8000f714>] show_stack+0x54/0x88
[ 7437.453953] [<801da8fc>] dump_stack+0x8c/0xd0
[ 7437.462612] [<8002adc4>] __warn+0xe4/0x118
[ 7437.470750] [<8002ae28>] warn_slowpath_fmt+0x30/0x3c
[ 7437.480622] [<80303968>] dev_watchdog+0x254/0x2f4
[ 7437.489973] [<8007ce78>] call_timer_fn.isra.3+0x24/0x84
[ 7437.500358] [<8007d0bc>] run_timer_softirq+0x1e4/0x244
[ 7437.510575] [<8002e924>] __do_softirq+0x294/0x2e0
[ 7437.519922] [<8002ec18>] irq_exit+0x7c/0x98
[ 7437.528236] [<80209750>] plat_irq_dispatch+0xb4/0xdc
[ 7437.538103] [<8000a968>] except_vec_vi_end+0xb8/0xc4
[ 7437.547989] [<80064134>] cpu_startup_entry+0x104/0x170
[ 7437.558222] [<804edbe4>] start_kernel+0x47c/0x49c
[ 7437.567725] ---[ end trace 37648a34027f878d ]---
[ 7437.576938] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[ 7437.589281] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[ 7437.601275] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=06e0e000, max=512, ctx=428, dtx=428, fdx=427, next=428
[ 7437.622591] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=06e14000, max=512, calc=60, drx=61

@openwrt-bot
Copy link
Author

nbd:

Please try the latest version

@openwrt-bot
Copy link
Author

camel:

problem still exists ..

@openwrt-bot
Copy link
Author

nbd:

Should be fixed now, please test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant