OpenWrt/LEDE Project

  • Status Closed
  • Percent Complete
    100%
  • Task Type Bug Report
  • Category Base system
  • Assigned To No-one
  • Operating System All
  • Severity Medium
  • Priority Very Low
  • Reported Version lede-17.01
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Karoly Kovacs - 08.03.2017
Last edited by Yousong Zhou - 01.09.2017

FS#606 - Stall on Mediatek

On D-Link DIR-860L B1, processor MediaTek MT7621
LEDE 17.01.0

The system used connected to WAN and wifi is used (ac and n). WDS is also used (as server) and SQM is used on WAN. Only the WAN switch port is used.

After random time wifi is disconnected and connected again, can be seen in the log as: WARNING: CPU: 1 PID: 0 at net/core/skbuff.c:4194. However, after some time, which can be minutes or days wifi totally disconnects, and even DHCP does nor work on the LAN ports. The log shows: INFO: rcu_sched self-detected stall on CPU. Only a reboot can help. Sometimes the device reboots by itself.

The above could not be linked to any event (wifi connection, change in sqm settings, etc).

Switched back to Openwrt 15.05.1.

A similar issue was reported here: External Link

   dmesg.txt (192.2 KiB)
Closed by  Yousong Zhou
01.09.2017 11:08
Reason for closing:  Fixed
Karoly Kovacs commented on 08.03.2017 16:11

Similar issue reported also here: External Link, relayd, rp_filter is suspected there

Project Manager
Felix Fietkau commented on 08.03.2017 17:55

Please make a build with CONFIG_KERNEL_KALLSYMS enabled and post a fresh log

Karoly Kovacs commented on 08.03.2017 20:28

Hi Felix,

I would love to do that, but, unfortunately, making new builds is way beyond my abilities. I'm just a user. Sorry for that.

Roger commented on 09.03.2017 09:57

Hi Karoly,

I've compiled a sysupgrade image for you with CONFIG_KERNEL_KALLSYMS. Please find it at any of the following links:

Karoly Kovacs commented on 09.03.2017 10:32

Hi Roger,

You were quick, I'm impressed. I'll do the testing this evening and will come back to you.

Thanks a lot,

Karoly Kovacs commented on 10.03.2017 06:01

This is the 1st log, I'll submit further bugs:

[   33.520000] WARNING: CPU: 0 PID: 0 at net/core/skbuff.c:4194 skb_try_coalesce+0x22c/0x36c()
[   33.530000] Modules linked in: pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt sch_cake nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress mt76x2e mt7603e ledtrig_usbport mt76 mac80211 cfg80211 compat ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common
[   33.710000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.52 #0
[   33.710000] Stack : 00000000 00000000 804c6862 00000033 00000000 00000000 80470000 804e0000
[   33.710000] 	  8046a064 80469c83 803e4c74 00000000 00000000 804c367c 804837a0 804fade0
[   33.710000] 	  80466880 80063f7c 80470000 804e0000 8046e198 8046e19c 803e9588 87c0dbdc
[   33.710000] 	  00000003 80061cc8 804837a0 804fade0 80466880 00000000 00000000 00c0dbdc
[   33.710000] 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   33.710000] 	  ...
[   33.710000] Call Trace:
[   33.710000] [<8001684c>] show_stack+0x50/0x84
[   33.710000] [<801b4068>] dump_stack+0x84/0xbc
[   33.710000] [<8002bf50>] warn_slowpath_common+0xa0/0xd0
[   33.710000] [<8002c008>] warn_slowpath_null+0x18/0x24
[   33.710000] [<80290988>] skb_try_coalesce+0x22c/0x36c
[   33.710000] [<802ecf64>] tcp_try_coalesce+0x4c/0xd0
[   33.710000] [<802ed048>] tcp_queue_rcv+0x60/0x12c
[   33.710000] [<802f1d28>] tcp_rcv_established+0x4e4/0x61c
[   33.710000] [<802f9bf8>] tcp_v4_do_rcv+0xa0/0x1f8
[   33.710000] [<802fc104>] tcp_v4_rcv+0x6c0/0xa28
[   33.710000] [<802d941c>] ip_local_deliver_finish+0xd4/0x134
[   33.710000] [<802d9988>] ip_local_deliver+0xd0/0xe8
[   33.710000] [<802d9c54>] ip_rcv+0x2b4/0x320
[   33.710000] [<802a15dc>] __netif_receive_skb_core+0x6f0/0x914
[   33.710000] [<802a31e8>] process_backlog+0xac/0x184
[   33.710000] [<802a2f8c>] net_rx_action+0x130/0x2e0
[   33.710000] [<8002eb64>] __do_softirq+0x294/0x2e0
[   33.710000] [<8002ee4c>] irq_exit+0x78/0x94
[   33.710000] [<801de42c>] plat_irq_dispatch+0xb4/0xdc
[   33.710000] 
[   33.980000] ---[ end trace f57fba055f35649b ]---
Karoly Kovacs commented on 10.03.2017 06:24
[   52.700000] WARNING: CPU: 3 PID: 1980 at net/core/skbuff.c:4194 skb_try_coalesce+0x22c/0x36c()
[   52.720000] Modules linked in: pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt sch_cake nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress mt76x2e mt7603e ledtrig_usbport mt76 mac80211 cfg80211 compat ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common
[   52.890000] CPU: 3 PID: 1980 Comm: uhttpd Not tainted 4.4.52 #0
[   52.910000] Stack : 00000000 00000000 804c6862 00000033 00000000 00000000 80470000 804e0000
[   52.910000] 	  877918ec 80469c83 803e4c74 00000003 000007bc 804c367c 804837a0 804fade0
[   52.910000] 	  80466880 80063f7c 80470000 804e0000 8046e198 8046e19c 803e9588 87c21bdc
[   52.910000] 	  00000003 80061cc8 804837a0 804fade0 80466880 00000000 00000000 00c21bdc
[   52.910000] 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   52.910000] 	  ...
[   52.980000] Call Trace:
[   52.980000] [<8001684c>] show_stack+0x50/0x84
[   52.990000] [<801b4068>] dump_stack+0x84/0xbc
[   53.000000] [<8002bf50>] warn_slowpath_common+0xa0/0xd0
[   53.010000] [<8002c008>] warn_slowpath_null+0x18/0x24
[   53.020000] [<80290988>] skb_try_coalesce+0x22c/0x36c
[   53.030000] [<802ecf64>] tcp_try_coalesce+0x4c/0xd0
[   53.040000] [<802ed048>] tcp_queue_rcv+0x60/0x12c
[   53.050000] [<802f1d28>] tcp_rcv_established+0x4e4/0x61c
[   53.060000] [<802f9bf8>] tcp_v4_do_rcv+0xa0/0x1f8
[   53.070000] [<802fc104>] tcp_v4_rcv+0x6c0/0xa28
[   53.080000] [<802d941c>] ip_local_deliver_finish+0xd4/0x134
[   53.090000] [<802d9988>] ip_local_deliver+0xd0/0xe8
[   53.100000] [<802d9c54>] ip_rcv+0x2b4/0x320
[   53.110000] [<802a15dc>] __netif_receive_skb_core+0x6f0/0x914
[   53.120000] [<802a31e8>] process_backlog+0xac/0x184
[   53.130000] [<802a2f8c>] net_rx_action+0x130/0x2e0
[   53.140000] [<8002eb64>] __do_softirq+0x294/0x2e0
[   53.150000] [<8002ee4c>] irq_exit+0x78/0x94
[   53.160000] [<801de42c>] plat_irq_dispatch+0xb4/0xdc
[   53.160000] 
[   53.170000] ---[ end trace cea7224c49ce0bb3 ]---
Karoly Kovacs commented on 10.03.2017 06:38

under heavy load, the system crashes and reboots

Karoly Kovacs commented on 10.03.2017 08:02

it's still the same. I have the skbuff problem, what you can see above, just 1-2 minutes after boot. Under load (iperf3 or heavy torrent) the router crashes and reboots, I have no chance to look at the DMESG or syslog.

Project Manager
Baptiste Jonglez commented on 10.03.2017 09:12

Karoly, you can try looking at /sys/kernel/debug/crashlog after a crash-reboot, it might hold an interesting stack trace.

Karoly Kovacs commented on 10.03.2017 17:10

Hi Baptiste,

Thanks for the tip, I didn't know that.

I checked the directory. Unfortunately, there was no crashlog file in the /sys/kernel/debug/ directory. I waited for 3 crashes, no results. I also did echo c >/proc/sysrq-trigger to check, whether crashlog settings are OK. This method generated a crash file, but the bug unfortunately, not.

Do you have any other idea to get a crashlog file?

Thanks,

Karoly Kovacs commented on 20.03.2017 23:24

Hi Baptiste,

I could copy-paste this:

[  329.570000] WARNING: CPU: 3 PID: 0 at net/core/skbuff.c:4194 skb_try_coalesce+0x22c/0x36c()
[  329.580000] Modules linked in: pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt sch_cake nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress mt76x2e mt7603e ledtrig_usbport mt76 mac80211 cfg80211 compat ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd gpio_button_hotplug usbcore nls_base usb_common
[  329.760000] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.52 #0
[  329.770000] Stack : 00000000 00000000 804c6862 00000033 00000000 00000000 80470000 804e0000
[  329.770000] 	  87c4bf6c 80469c83 803e4c74 00000003 00000000 804c367c 804837a0 804fade0
[  329.770000] 	  80466880 80063f7c 80470000 804e0000 8046e198 8046e19c 803e9588 87c21bdc
[  329.770000] 	  00000003 80061cc8 804837a0 804fade0 80466880 00000000 00000000 00c21bdc
[  329.770000] 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  329.770000] 	  ...
[  329.840000] Call Trace:
[  329.850000] [<8001684c>] show_stack+0x50/0x84
[  329.860000] [<801b4068>] dump_stack+0x84/0xbc
[  329.860000] [<8002bf50>] warn_slowpath_common+0xa0/0xd0
[  329.880000] [<8002c008>] warn_slowpath_null+0x18/0x24
[  329.890000] [<80290988>] skb_try_coalesce+0x22c/0x36c
[  329.900000] [<802ecf64>] tcp_try_coalesce+0x4c/0xd0
[  329.910000] [<802ed048>] tcp_queue_rcv+0x60/0x12c
[  329.910000] [<802f1d28>] tcp_rcv_established+0x4e4/0x61c
[  329.930000] [<802f9bf8>] tcp_v4_do_rcv+0xa0/0x1f8
[  329.930000] [<802fc104>] tcp_v4_rcv+0x6c0/0xa28
[  329.940000] [<802d941c>] ip_local_deliver_finish+0xd4/0x134
[  329.950000] [<802d9988>] ip_local_deliver+0xd0/0xe8
[  329.960000] [<802d9c54>] ip_rcv+0x2b4/0x320
[  329.970000] [<802a15dc>] __netif_receive_skb_core+0x6f0/0x914
[  329.980000] [<802a31e8>] process_backlog+0xac/0x184
[  329.990000] [<802a2f8c>] net_rx_action+0x130/0x2e0
[  330.000000] [<8002eb64>] __do_softirq+0x294/0x2e0
[  330.010000] [<8002ee4c>] irq_exit+0x78/0x94
[  330.020000] [<801de42c>] plat_irq_dispatch+0xb4/0xdc
[  330.030000] 
[  330.030000] ---[ end trace 8bd5e0303178b765 ]---
[ 1137.390000] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1137.400000] 	2-...: (0 ticks this GP) idle=5f8/0/0 softirq=76940/76940 fqs=0 
[ 1137.410000] 	(detected by 3, t=6004 jiffies, g=32609, c=32608, q=4661)
[ 1137.420000] Task dump for CPU 2:
[ 1137.430000] swapper/2       R running      0     0      1 0x00100000
[ 1137.440000] Stack : 00000000 001c5bc4 00000108 05080000 00000000 18c45114 804b72a4 80460000
[ 1137.440000] 	  8046675c 00000001 00000000 804665e0 80466724 80460000 00000000 800134fc
[ 1137.440000] 	  1100fc03 00000002 87c72000 87c73ec0 80460000 8005dc48 1100fc03 00000002
[ 1137.440000] 	  00000000 80460000 804b72a4 8005dc40 80460000 8001b0c8 1100fc03 00000000
[ 1137.440000] 	  00000004 804664a0 000000a0 8001b0d0 00000000 00000000 00000000 00000000
[ 1137.440000] 	  ...
[ 1137.510000] Call Trace:
[ 1137.520000] [<8000bdb0>] __schedule+0x574/0x758
[ 1137.530000] [<800134fc>] r4k_wait_irqoff+0x0/0x20
[ 1137.540000] 
[ 1137.540000] rcu_sched kthread starved for 6017 jiffies! g32609 c32608 f0x0 s3 ->state=0x1
Karoly Kovacs commented on 20.03.2017 23:25
[ 1317.440000] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1317.450000] 	2-...: (0 ticks this GP) idle=997/1/0 softirq=76940/76940 fqs=1 
[ 1317.460000] 	(detected by 1, t=24009 jiffies, g=32609, c=32608, q=19485)
[ 1317.470000] Task dump for CPU 2:
[ 1317.480000] swapper/2       R running      0     0      1 0x00100000
[ 1317.490000] Stack : 00000000 00000001 06f85030 05080000 3fff0000 00000000 804b72a4 80460000
[ 1317.490000] 	  8046675c 00000001 00000000 804665e0 80466724 80460000 40000000 800134fc
[ 1317.490000] 	  1100fc03 00000002 87c72000 87c73ec0 80460000 8005dc48 1100fc03 00000002
[ 1317.490000] 	  00000000 80460000 c0800400 80013514 80460000 8001b0c8 1100fc03 00000000
[ 1317.490000] 	  00000004 804664a0 000000a0 8001b0d0 00000000 00000000 00000000 00000000
[ 1317.490000] 	  ...
[ 1317.560000] Call Trace:
[ 1317.570000] [<8000bdb0>] __schedule+0x574/0x758
[ 1317.580000] [<800134fc>] r4k_wait_irqoff+0x0/0x20
[ 1317.590000] 
[ 1317.590000] rcu_sched kthread starved for 18003 jiffies! g32609 c32608 f0x0 s3 ->state=0x1
[ 1377.630000] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1377.640000] 	2-...: (0 ticks this GP) idle=117/1/0 softirq=76940/76940 fqs=0 
[ 1377.650000] 	(detected by 3, t=6004 jiffies, g=32610, c=32609, q=23765)
[ 1377.660000] Task dump for CPU 2:
[ 1377.670000] swapper/2       R running      0     0      1 0x00100000
[ 1377.680000] Stack : 00000000 00000001 05eaf910 05080000 3fff0000 00000000 804b72a4 80460000
[ 1377.680000] 	  8046675c 00000001 00000000 804665e0 80466724 80460000 40000000 800134fc
[ 1377.680000] 	  1100fc03 00000002 87c72000 87c73ec0 80460000 8005dc48 1100fc03 00000002
[ 1377.680000] 	  00000000 80460000 804b72a4 8005dc40 80460000 8001b0c8 1100fc03 00000000
[ 1377.680000] 	  00000004 804664a0 000000a0 8001b0d0 00000000 00000000 00000000 00000000
[ 1377.680000] 	  ...
[ 1377.750000] Call Trace:
[ 1377.760000] [<8000bdb0>] __schedule+0x574/0x758
[ 1377.770000] [<800134fc>] r4k_wait_irqoff+0x0/0x20
[ 1377.780000] 
[ 1377.780000] rcu_sched kthread starved for 6017 jiffies! g32610 c32609 f0x0 s3 ->state=0x1
Karoly Kovacs commented on 20.03.2017 23:31

there is still no crashlog, the above test was done on the version provided (LEDE Reboot 17.01-SNAPSHOT r3276-4a405ac8f9 / LuCI lede-17.01 branch (git-17.063.59066-a5191ef)) with SQM cake switched on on WAN. I attached the dmesg output.

   dmesg.txt (23.8 KiB)
Karoly Kovacs commented on 20.03.2017 23:34

Another interesting symptom was that on snapshot r3276 the ac wifi speed was smaller lower than on 17.01 release (80-100 Mbit/s versus 220-300 Mbit/s)

Karoly Kovacs commented on 20.03.2017 23:45

a few minutes after the above stalls the rooter rebooted, no crashlog found in /sys/kernel/debug/ directory:

root@LEDE:~# ls /sys/kernel/debug/ -l
drwxr-xr-x   12 root     root             0 Jan  1  1970 bdi
-r--r--r--    1 root     root             0 Jan  1  1970 bootrom
drwxr-xr-x    4 root     root             0 Jan  1  1970 clk
-rw-r--r--    1 root     root             0 Jan  1  1970 fault_around_bytes
-r--r--r--    1 root     root             0 Jan  1  1970 gpio
drwxr-xr-x    4 root     root             0 Mar 21 00:36 ieee80211
drwxr-xr-x    2 root     root             0 Jan  1  1970 mips
drwxr-xr-x    3 root     root             0 Jan  1  1970 pinctrl
drwxr-xr-x    2 root     root             0 Jan  1  1970 pm_qos
drwxr-xr-x    2 root     root             0 Jan  1  1970 regmap
-r--r--r--    1 root     root             0 Jan  1  1970 sleep_time
drwxr-xr-x    2 root     root             0 Jan  1  1970 ubi
drwxr-xr-x    2 root     root             0 Jan  1  1970 ubifs
drwxr-xr-x    2 root     root             0 Jan  1  1970 usb
Project Manager
Baptiste Jonglez commented on 25.08.2017 13:46

This has been fixed, see  FS#804 

Loading...

Available keyboard shortcuts

Tasklist

Task Details

Task Editing