FS#126 - kernel panic on brcm47xx (netgear wgt634u) when routing at a high speed #5199
Comments
russell:
I turned off the stock firewall, and used a minimal masquerade rule in its place:
russell: I tried installing my oldest OpenWrt build for the WGT634U, a KAMIKAZE r12893, kernel 2.6.25.17, built 2008-10-09, and it panics under the same test conditions. The trace lacks kernel symbols, so it's probably not worth pasting, but this implies the behavior is not new; rather, upstream bandwidth has increased to the point where the problem is more visible these days.
bjonglez: Does the problem still exist with 17.01-rc2?
rnemec: The problem still exists in 17.01.0 (Asus WL-500gP v2, image https://downloads.lede-project.org/releases/17.01.0/targets/brcm47xx/legacy/lede-17.01.0-r3205-59508e3-brcm47xx-legacy-asus-wl-500gp-v2-squashfs.trx). :-(
chaseadam: Seeing similar behavior. Can reliably reproduce when using dslreports speedtest. Reproduced on multiple wgt634u devices. I can send a spare device to someone if it would help. Alternatively, can someone point me to how to get a useful backtrace? I am familiar with custom openwrt builds and kernel builds.
russell: FWIW, as of OpenWrt SNAPSHOT, r5484-69d22a6bf6, I managed to get through an entire tarball, 96.1MB in 44 seconds, on the order of 20Mbps, with no panic.
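For reference, the rate implied by those figures can be checked with a quick calculation. This is only a sketch: the helper name `throughput_mbps` is made up for illustration, and it is an assumption whether "MB" here means 10^6 bytes or 2^20 bytes, so both are computed.

```python
def throughput_mbps(size_mb, seconds, bytes_per_mb=1_000_000):
    """Average throughput in megabits per second (10^6 bits per second)."""
    return size_mb * bytes_per_mb * 8 / seconds / 1_000_000


# 96.1 MB in 44 seconds, as reported above.
decimal_rate = throughput_mbps(96.1, 44)                           # MB = 10^6 bytes
binary_rate = throughput_mbps(96.1, 44, bytes_per_mb=1024 * 1024)  # MB = 2^20 bytes
print(f"{decimal_rate:.1f} Mbps (decimal MB), {binary_rate:.1f} Mbps (binary MB)")
```

Either reading lands a little under 20 Mbps, which matches the "on the order of 20Mbps" estimate.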
kizmoo: Hello. I have a similar problem. My patient: OpenWrt 18.06.2, r7676-cddd7b4c77 on the wgt634u router.
[OpenWrt login banner: W I R E L E S S   F R E E D O M]
ynezz: I think that, in order to move forward with this outstanding issue, you need to reproduce the problem with the latest upstream kernel (or at least with a minimum of patches on top of the upstream kernel) and report the problem upstream to get some help.
kizmoo: Hi, the latest upstream kernel? The bug is reproducible with the latest stable release, 18.06.2, with updates.
ynezz:
Yes
I can see that, but I can't do much about it. The bug was reported on 30.08.2016 and still isn't fixed for some reason, so if I were you, I would simply try to reproduce it with the latest upstream kernel and ask for help on the appropriate kernel development mailing list.
2raghu: If anyone can send me a spare wgt634u router, I will reproduce and work on this bug. Is this specific to the wgt634u? Has anyone observed the issue on other routers? Thanks.
This issue relates to a very old kernel; please comment if it still affects you.
russell:
With LEDE version reboot-1444-g1bb914d, the router panics when data is pulled through the WAN interface to the LAN interface at a sufficiently high speed. For example, with a Raspberry Pi connected by Ethernet to a LAN port and the WAN interface connected to a gigabit internet service, run a command like the following on the Pi:
curl https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.7.2.tar.xz > /dev/null
On the Pi, I see something like:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
9 86.2M 9 7977k 0 0 1834k 0 0:00:48 0:00:04 0:00:44 1835k[ 352.295252] smsc95xx 1-1.1:1.0 eth0: link down
9 86.2M 9 7977k 0 0 1255k 0 0:01:10 0:00:06 0:01:04 1155k[ 353.959380] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
9 86.2M 9 7977k 0 0 43292 0 0:34:48 0:03:08 0:31:40 0
or
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
8 86.2M 8 7440k 0 0 1478k 0 0:00:59 0:00:05 0:00:54 1478k[ 1100.164849] smsc95xx 1-1.1:1.0 eth0: link down
8 86.2M 8 7440k 0 0 1232k 0 0:01:11 0:00:06 0:01:05 1300k[ 1101.836954] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
8 86.2M 8 7440k 0 0 740k 0 0:01:59 0:00:10 0:01:49 0
The "link down" message (which appears to come from the Pi's own kernel log, interleaved with the curl output) is the WGT634U panicking and rebooting. The panic on the WGT634U looks like this:
[ 171.966841] CPU 0 Unable to handle kernel paging request at virtual address 008224d8, epc == 80077ecc, ra == 801d8100
[ 171.977700] Oops[#1]:
[ 171.980093] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.1.20 #0
[ 171.986265] task: 8181e008 ti: 8182a000 task.ti: 8182a000
[ 171.991711] $ 0 : 00000000 1000b800 008224d8 b41c479b
[ 171.997140] $ 4 : 008224d8 00010000 80361ef8 00000000
[ 172.002557] $ 8 : 81018b14 8101da14 00100100 dceb27b8
[ 172.007983] $12 : ffffffff 00000001 ffffff80 000042c6
[ 172.013400] $16 : b41c479b 00000001 81b450a8 81b7d260
[ 172.018817] $20 : 803b5dcc 00000002 00000008 0000000a
[ 172.024235] $24 : 00000000 80072b24
[ 172.029653] $28 : 8182a000 8182bdb8 00000100 801d8100
[ 172.035083] Hi : 00000000
[ 172.038026] Lo : 0000006c
[ 172.041036] epc : 80077ecc put_compound_page+0x78/0x240
[ 172.046547] ra : 801d8100 skb_release_data+0xa8/0x10c
[ 172.051920] Status: 1000b803 KERNEL EXL IE
[ 172.056254] Cause : 00800008
[ 172.059194] BadVA : 008224d8
[ 172.062144] PrId : 00029007 (Broadcom BMIPS3300)
[ 172.066891] Modules linked in: pppoe ppp_async iptable_nat ath5k ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 mac80211 ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_id xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt compat ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables leds_gpio ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common ssb_hcd
[ 172.132913] Process ksoftirqd/0 (pid: 3, threadinfo=8182a000, task=8181e008, tls=00000000)
[ 172.141213] Stack : 1000b803 00000000 0000006c b41c479b 81b45080 801d8100 00000000 0000012c
00000001 801e7d60 81becb60 81b7d260 81b7d260 80335b3c 81bece60 801d818c
80360000 803623e0 81becb60 80335b3c 803623e0 801e63d8 8182be10 8182be10
00000001 00000002 803b5dd0 00000001 803b5dd4 00000003 803b0000 80024fec
81841048 80361100 819bb770 80007664 80364510 8181e008 80362898 04208040
...
[ 172.178058] Call Trace:
[ 172.180640] [<80077ecc>] put_compound_page+0x78/0x240
[ 172.185822] [<801d8100>] skb_release_data+0xa8/0x10c
[ 172.190895] [<801d818c>] __kfree_skb+0x28/0xb4
[ 172.195466] [<801e63d8>] net_tx_action+0xd8/0x140
[ 172.200329] [<80024fec>] __do_softirq+0x184/0x2b0
[ 172.205184] [<80025140>] run_ksoftirqd+0x28/0x80
[ 172.209953] [<8003bae0>] smpboot_thread_fn+0x148/0x178
[ 172.215246] [<80039390>] kthread+0xdc/0xe8
[ 172.219459] [<800010a8>] ret_from_kernel_thread+0x14/0x1c
[ 172.224921]
[ 172.226470]
Code: 30840001 0204100a 00402021 <8c420000> 000211c2 30420001 10400018 00000000 8e020000
[ 172.237022] ---[ end trace 89a3318b662df6d8 ]---
[ 172.250536] Kernel panic - not syncing: Fatal exception in interrupt
[ 172.262338] Rebooting in 3 seconds..
or
[ 1317.958261] Unhandled kernel unaligned access[#1]:
[ 1317.963173] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.1.20 #0
[ 1317.969349] task: 8181e008 ti: 8182a000 task.ti: 8182a000
[ 1317.974795] $ 0 : 00000000 1000b801 00000001 00200000
[ 1317.980232] $ 4 : 647b394a 00010000 00018da4 00000000
[ 1317.985658] $ 8 : 8181e040 b1e1b104 00000017 40000000
[ 1317.991085] $12 : 500018dd 00000000 00000000 010102f2
[ 1317.996510] $16 : 80e4d1e0 00000001 80e4d208 81acb0e0
[ 1318.001936] $20 : 803b5dcc 00000002 00000008 0000000a
[ 1318.007362] $24 : 00000010 8001ead0
[ 1318.012789] $28 : 8182a000 8182bdd0 00000100 801d8100
[ 1318.018220] Hi : 00000001
[ 1318.021160] Lo : 00000001
[ 1318.024169] epc : 80078474 put_page+0x0/0x4c
[ 1318.028732] ra : 801d8100 skb_release_data+0xa8/0x10c
[ 1318.034104] Status: 1000b803 KERNEL EXL IE
[ 1318.038439] Cause : 00800010
[ 1318.041380] BadVA : 647b394a
[ 1318.044330] PrId : 00029007 (Broadcom BMIPS3300)
[ 1318.049076] Modules linked in: pppoe ppp_async iptable_nat ath5k ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 mac80211 ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_id xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt compat ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables leds_gpio ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common ssb_hcd
[ 1318.115124] Process ksoftirqd/0 (pid: 3, threadinfo=8182a000, task=8181e008, tls=00000000)
[ 1318.123424] Stack : 00000000 80c60c94 00db35c0 626a006e ff330000 81acb0e0 81acb0e0 80335b3c
80c28260 801d818c 80360000 8004f368 00000018 0000000a 803623e0 801e63d8
8182be10 8182be10 80361100 00000002 803b5dd0 00000001 803b5dd4 00000013
803b0000 80024fec 80364510 80361100 80360000 8000736c 80364510 8181e008
80362898 04208040 00018da4 80360000 80310000 803b5dcc 803615a0 803114f4
...
[ 1318.160295] Call Trace:
[ 1318.162885] [<80078474>] put_page+0x0/0x4c
[ 1318.167108] [<801d8100>] skb_release_data+0xa8/0x10c
[ 1318.172182] [<801d818c>] __kfree_skb+0x28/0xb4
[ 1318.176752] [<801e63d8>] net_tx_action+0xd8/0x140
[ 1318.181614] [<80024fec>] __do_softirq+0x184/0x2b0
[ 1318.186471] [<80025140>] run_ksoftirqd+0x28/0x80
[ 1318.191238] [<8003bae0>] smpboot_thread_fn+0x148/0x178
[ 1318.196530] [<80039390>] kthread+0xdc/0xe8
[ 1318.200744] [<800010a8>] ret_from_kernel_thread+0x14/0x1c
[ 1318.206200]
[ 1318.207747]
Code: 00003021 0801e0ce 24a57888 <8c820000> 3042c000 10400003 00801821 0801df95 00000000
[ 1318.218397] ---[ end trace 2502a626803fb4b9 ]---
[ 1318.231740] Kernel panic - not syncing: Fatal exception in interrupt
[ 1318.243362] Rebooting in 3 seconds..
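A few fields in the two oopses above can be decoded by hand, and the sketch below does it in Python. It assumes the standard MIPS32 layouts (ExcCode in bits 6..2 of `Cause`, and the I-type encoding with opcode 0x23 for `lw`); the function and variable names are made up for illustration, and only the two exception codes seen in this report are named. The instruction words are the ones marked with `<...>` in the `Code:` lines, and the register values are copied from the dumps.

```python
# Names for the two MIPS32 Cause.ExcCode values that occur in this report.
EXC_NAMES = {
    2: "TLBL: TLB exception on load or instruction fetch",
    4: "AdEL: address error on load or instruction fetch",
}
OPCODE_LW = 0x23  # MIPS32 "load word"


def exc_code(cause):
    """ExcCode is bits 6..2 of the Cause register."""
    return (cause >> 2) & 0x1F


def decode_itype(word):
    """Split an I-type instruction into (opcode, rs, rt, signed 16-bit offset)."""
    offset = word & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000
    return word >> 26, (word >> 21) & 0x1F, (word >> 16) & 0x1F, offset


def analyse(cause, insn, badva, regs):
    """Relate the faulting instruction and its base register to BadVA."""
    opcode, rs, rt, offset = decode_itype(insn)
    address = (regs[rs] + offset) & 0xFFFFFFFF
    return {
        "exception": EXC_NAMES.get(exc_code(cause), "other"),
        "insn": f"lw ${rt}, {offset}(${rs})" if opcode == OPCODE_LW else "not lw",
        "address_matches_badva": address == badva,
        "word_aligned": badva % 4 == 0,
    }


# First oops: Cause 00800008, faulting word 8c420000, BadVA 008224d8, $2 = 008224d8.
first = analyse(0x00800008, 0x8C420000, 0x008224D8, {2: 0x008224D8})
# Second oops: Cause 00800010, faulting word 8c820000, BadVA 647b394a, $4 = 647b394a.
second = analyse(0x00800010, 0x8C820000, 0x647B394A, {4: 0x647B394A})

for report in (first, second):
    print(report)
```

In both cases the faulting instruction is a word load whose base register holds exactly the `BadVA` value. In the first oops the address is word-aligned but lies below 0x80000000, outside the kernel segments; in the second it is not word-aligned. Both observations would be consistent with the kernel following a corrupted pointer, though that is an inference from the dumps rather than a confirmed diagnosis.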
This looks similar to: https://dev.openwrt.org/ticket/11091
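To make it easier to compare the two panics in this report, the call traces can be reduced to their symbol names and diffed. The following is a minimal sketch; the trace text is copied from the logs above with the timestamps dropped, and the regular expression and helper name are illustrative assumptions, not part of any existing tool.

```python
import re

FIRST_TRACE = """
[<80077ecc>] put_compound_page+0x78/0x240
[<801d8100>] skb_release_data+0xa8/0x10c
[<801d818c>] __kfree_skb+0x28/0xb4
[<801e63d8>] net_tx_action+0xd8/0x140
[<80024fec>] __do_softirq+0x184/0x2b0
[<80025140>] run_ksoftirqd+0x28/0x80
[<8003bae0>] smpboot_thread_fn+0x148/0x178
[<80039390>] kthread+0xdc/0xe8
[<800010a8>] ret_from_kernel_thread+0x14/0x1c
"""

SECOND_TRACE = """
[<80078474>] put_page+0x0/0x4c
[<801d8100>] skb_release_data+0xa8/0x10c
[<801d818c>] __kfree_skb+0x28/0xb4
[<801e63d8>] net_tx_action+0xd8/0x140
[<80024fec>] __do_softirq+0x184/0x2b0
[<80025140>] run_ksoftirqd+0x28/0x80
[<8003bae0>] smpboot_thread_fn+0x148/0x178
[<80039390>] kthread+0xdc/0xe8
[<800010a8>] ret_from_kernel_thread+0x14/0x1c
"""

# Matches "[<address>] symbol+0xoffset/0xsize" and captures address and symbol.
FRAME = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x")


def frames(trace_text):
    """Return a list of (symbol, address) tuples, innermost frame first."""
    return [(m.group(2), m.group(1)) for m in FRAME.finditer(trace_text)]


first_frames = frames(FIRST_TRACE)
second_frames = frames(SECOND_TRACE)

print("innermost frames:", first_frames[0][0], "vs", second_frames[0][0])
print("callers identical:", first_frames[1:] == second_frames[1:])
```

Only the innermost frame differs (`put_compound_page` versus `put_page`); every caller, from `skb_release_data` up through `net_tx_action`, is the same in both traces, so both crashes occur while a transmitted skb's data is being released in softirq context.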