OpenWrt/LEDE Project

  • Status Unconfirmed
  • Percent Complete
    0%
  • Task Type Bug Report
  • Category Kernel
  • Assigned To No-one
  • Operating System All
  • Severity Critical
  • Priority Very Low
  • Reported Version All
  • Due in Version Undecided
  • Due Date Undecided
  • Votes 1
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Kristian Evensen - 15.11.2017

FS#1170 - mt7621: kernel errors - rcu_sched detected stalls on CPUs/tasks - again

After the work done for issue  FS#804 , the rcu_sched error seemed to be gone. However, I am now starting to see it again. Usually, at least for me, the error happens when there is large amounts of traffic and I do something with the network. My most reliable way for reproducing the error is as follows:

- Use iperf to flood a router with small packets. Other ways to stress the CPU also work, I for example triggered the error when I added very aggressive logging to the firewall.
- While the router is being flooded, I restarted networking (I am logged in to the router via UART).
- After a couple of network restarts, the error is trigger and the following is written to syslog at some interval:

[ 2251.870000] INFO: rcu_bh detected stalls on CPUs/tasks:
[ 2251.870000] 2-...: (1 GPs behind) idle=ae1/140000000000001/0 softirq=212487/217796 fqs=4380
[ 2251.870000] (detected by 1, t=6002 jiffies, g=-146, c=-147, q=4)
[ 2251.870000] Task dump for CPU 2:
[ 2251.870000] openvpn-mover.s R running 0 2598 1 0×08100004 [ 2251.870000] Stack : 8fa69998 800ebe38 00000000 8fa69998 57512e2b 000001fd 00000000 80035454
[ 2251.870000] 00000000 800edbd4 8fa69998 804b0000 00000000 00000000 00000004 00000000
[ 2251.870000] 00000000 8ea17850 8efc7ec0 800376d4 00000000 00000000 778b8930 00000012
[ 2251.870000] 00000000 004077cd 778d4000 00000000 778d55e8 778d6f7c 00000000 8002b280
[ 2251.870000] ffbffeff ffffffff 00617772 706d742f 00000000 00000000 00000001 800379dc
[ 2251.870000] ...
[ 2251.870000] Call Trace:
[ 2251.870000] [<8000bc88>] __schedule+0×574/0×758

I am also able to sometimes trigger the issue by simply issuing the reboot-command (while the CPU is stressed). I have not applied any traffic shaping to my interface, and I see the error both with kernel 4.4 and 4.9 (i.e., LEDE 17.01 and master). I don’t quite know how to progress in debugging this.

 


Kristian Evensen commented on 15.11.2017 20:26

Here is a more detailed trace. Trace was triggered by a network restart during heavy load:

[ 686.452081] INFO: rcu_sched self-detected stall on CPU
[ 686.457237] 0-...: (5999 ticks this GP) idle=b89/140000000000002/0 softirq=54653/54653 fqs=2895
[ 686.466064] (t=6000 jiffies g=20069 c=20068 q=101897)
[ 686.471268] Task dump for CPU 0:
[ 686.474477] swapper/0 R running task 0 0 0 0x00100004
[ 686.481498] Stack : 00000000 8006b3cc 00000000 00000000 00000000 8006b3cc 0000001d 00000006
[ 686.489843] 00000006 80471234 00000000 00000000 00000000 00000000 80530000 00000014
[ 686.498186] 00000000 804d3f80 804d1490 804d0000 804d1490 00000000 00000000 00000000
[ 686.506529] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 686.514871] 00000000 00000000 00000000 00000000 00000000 800a7d00 8fc0dc94 00000000
[ 686.523214] ...
[ 686.525650] Call Trace:
[ 686.528115] [<8000f7b4>] show_stack+0x54/0x88
[ 686.532458] [<800a7fc8>] rcu_dump_cpu_stacks+0xdc/0x110
[ 686.537686] [<8007b7a8>] rcu_check_callbacks+0x2cc/0x7c4
[ 686.542977] [<8007dcec>] update_process_times+0x34/0x70
[ 686.548186] [<8008d6d8>] tick_handle_periodic+0x34/0xd0
[ 686.553392] [<802b2a2c>] gic_compare_interrupt+0x2c/0x40
[ 686.558708] [<80071a48>] handle_percpu_devid_irq+0xc4/0x18c
[ 686.564255] [<8006c6cc>] generic_handle_irq+0x24/0x3c
[ 686.569291] [<80209498>] gic_handle_local_int+0x94/0xd4
[ 686.574492] [<80209654>] gic_irq_dispatch+0x10/0x20
[ 686.579346] [<8006c6cc>] generic_handle_irq+0x24/0x3c
[ 686.584390] [<8000c3a8>] do_IRQ+0x1c/0x34
[ 686.588385] [<80208740>] plat_irq_dispatch+0xb4/0xdc
[ 686.593326] [<8000a908>] except_vec_vi_end+0xb8/0xc4
[ 689.252109] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 6269 jiffies s: 245 root: 0x1/.
[ 689.262138] blocking rcu_node structures:
[ 689.266130] Task dump for CPU 0:
[ 689.269343] swapper/0 R running task 0 0 0 0x00100004
[ 689.276424] Stack : 804c5fe0 804d3f80 00000000 00000000 ffffff80 00000000 00000000 00000340
[ 689.284808] 804d148c 00000001 80470cdc 804e0000 00000001 9bff0000 9bfe8000 9bfe8000
[ 689.293224] 00000000 803dc644 00000000 00000000 804c4000 804c5ea8 00000000 803dc978
[ 689.301789] 804d148c 00000001 80470cdc 804e0000 00000001 80064dac 80530000 804d0000
[ 689.310361] 80530000 803da67c 80530000 804d0000 80530000 8051f324 80530000 804f8bac
[ 689.318776] ...
[ 689.321222] Call Trace:
[ 689.323726] [<803dc408>] __schedule+0x5d4/0x7a4
[ 689.328280] [<803dc644>] schedule+0x6c/0x84
[ 689.332524] [<803dc978>] schedule_preempt_disabled+0x10/0x1c
[ 689.338220] [<80064dac>] cpu_startup_entry+0x15c/0x170
[ 689.343434] [<804f8bac>] start_kernel+0x444/0x464

Loading...

Available keyboard shortcuts

Tasklist

Task Details

Task Editing