FS#2928 - TP-Link TL-WDR3600 v1 on kernel 5.4 boot-loops since change to GCC 8.4.0 #8046

Closed
openwrt-bot opened this issue Mar 25, 2020 · 11 comments

@openwrt-bot

russell:

  • Device problem occurs on

TP-Link TL-WDR3600 v1

  • Software versions of OpenWrt/LEDE release, packages, etc.

Since reboot-12646-gdb70077668 "toolchain: Update GCC 8 to version 8.4.0" and kernel 5.4, WDR3600 boot-loops with the following message:

Starting kernel ...

[ 0.000000] Linux version 5.4.24 (openwrt@hawg) (gcc version 8.4.0 (OpenWrt GCC 8.4.0 r12683-8c33debb52)) #0 Sat Mar 21 21:35:45 2020
[ 0.000000] printk: bootconsole [early0] enabled
[ 0.000000] CPU0 revision is: 0001974c (MIPS 74Kc)
[ 0.000000] MIPS: machine is TP-Link TL-WDR3600 v1
[ 0.000000] SoC: Atheros AR9344 rev 2
[ 0.000000] Initrd not found or empty - disabling initrd
[ 0.000000] Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes.
[ 0.000000] Primary data cache 32kB, 4-way, VIPT, cache aliases, linesize 32 bytes
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000000000000-0x0000000007ffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000000000-0x0000000007ffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000007ffffff]
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 32480
[ 0.000000] Kernel command line: console=ttyS0,115200 rootfstype=squashfs,jffs2
[ 0.000000] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
[ 0.000000] Writing ErrCtl register=00000000
[ 0.000000] Readback ErrCtl register=00000000
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] Memory: 122384K/131072K available (4681K kernel code, 187K rwdata, 1080K rodata, 1212K init, 196K bss, 8688K reserved, 0K cma-reserved)
[ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] NR_IRQS: 51
[ 0.000000] random: get_random_bytes called from start_kernel+0x32c/0x51c with crng_init=0
[ 0.000000] CPU clock: 560.000 MHz
[ 0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6825930166 ns
[ 0.000009] sched_clock: 32 bits at 280MHz, resolution 3ns, wraps every 7669584382ns
[ 0.008305] Calibrating delay loop... 278.93 BogoMIPS (lpj=1394688)
[ 0.084927] pid_max: default: 32768 minimum: 301
[ 0.089999] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.097796] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.107070] Kernel panic - not syncing: Unexpected DSP exception
[ 0.113470] Rebooting in 1 seconds..

@openwrt-bot

realmicu:

I'm experiencing the same issue on a Netgear WNDR4300 (SoC: AR9344). Images compiled with the previous GCC version 8.3.0 were OK, while 8.4.0 produces invalid code. What worked for me was switching GCC from version 8.4.0 to 9.3.0:

CONFIG_TARGET_ath79=y
CONFIG_TARGET_ath79_nand=y
CONFIG_TARGET_ath79_nand_DEVICE_netgear_wndr4300=y
CONFIG_DEVEL=y
CONFIG_TOOLCHAINOPTS=y
CONFIG_CCACHE=y
CONFIG_COLLECT_KERNEL_DEBUG=y
# CONFIG_GCC_USE_VERSION_8 is not set
CONFIG_GCC_USE_VERSION_9=y
CONFIG_GCC_VERSION="9.3.0"
CONFIG_GCC_VERSION_9=y
CONFIG_IMAGEOPT=y
CONFIG_LINUX_5_4=y
CONFIG_TESTING_KERNEL=y

@openwrt-bot

sbrown:

Reverting 7000f11c23e23cf11f96 ("toolchain: Update GCC 8 to version 8.4.0") fixes the problem on my TP-Link Archer A7 v5.

@openwrt-bot

russell:

Fwiw, this is the .config stub I used while bisecting:

CONFIG_TARGET_ath79=y
CONFIG_TARGET_ath79_generic=y
CONFIG_TARGET_ath79_generic_DEVICE_tplink_tl-wdr3600-v1=y
CONFIG_DEVEL=y
CONFIG_BUILD_LOG=y
# CONFIG_BUSYBOX_CONFIG_BRCTL is not set
# CONFIG_BUSYBOX_CONFIG_FREE is not set
# CONFIG_BUSYBOX_CONFIG_PGREP is not set
# CONFIG_BUSYBOX_CONFIG_TOP is not set
# CONFIG_BUSYBOX_CONFIG_UPTIME is not set
# CONFIG_PACKAGE_6relayd is not set
# CONFIG_PACKAGE_firewall is not set
# CONFIG_PACKAGE_firewall3 is not set
CONFIG_PACKAGE_iptables-mod-ipopt=y
CONFIG_PACKAGE_iptables-mod-nat-extra=y
# CONFIG_PACKAGE_odhcp6c is not set
# CONFIG_PACKAGE_ppp is not set
# CONFIG_PACKAGE_ppp-mod-pppoe is not set
CONFIG_TESTING_KERNEL=y

@openwrt-bot

Hauke:

I can reproduce it on a TP-Link TL-WDR4300 v1 with an AR9344.

It is happening in the save_dsp() function:
https://elixir.bootlin.com/linux/v5.4.28/source/arch/mips/include/asm/dsp.h#L50
which is called by arch_dup_task_struct()
https://elixir.bootlin.com/linux/v5.4.28/source/arch/mips/kernel/process.c#L110

The AR9344 says it supports the DSP extension:

root@OpenWrt:/# cat /proc/cpuinfo
system type : Atheros AR9344 rev 2
machine : TP-Link TL-WDR4300 v1
processor : 0
cpu model : MIPS 74Kc V4.12
BogoMIPS : 278.78
wait instruction : yes
microsecond timers : yes
tlb_entries : 32
extra interrupt vector : yes
hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa : mips1 mips2 mips32r1 mips32r2
ASEs implemented : mips16 dsp dsp2
Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc dc_aliases perf_cntr_intr_bit nan_legacy nan_2008 perf
shadow register sets : 1
kscratch registers : 0
package : 0
core : 0
VCED exceptions : not available
VCEI exceptions : not available

root@OpenWrt:/#
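
Note that `dsp` under "ASEs implemented" only shows that the core has the extension. As far as I understand the MIPS32 architecture, DSP instructions such as `rddsp` also require the MX bit in the CP0 Status register to be set; otherwise the CPU raises a "DSP State Disabled" exception, which is presumably what the kernel reports as "Unexpected DSP exception". Below is a minimal user-space sketch of that check. The constants are assumptions based on my reading of `ST0_MX` in the kernel's `mipsregs.h` and of ExcCode 26 (DSPDis) in the MIPS32 manual, and the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, see the note above; not taken from this thread. */
#define ST0_MX          0x01000000u /* Status.MX (bit 24): DSP access enable */
#define CAUSE_EXC_SHIFT 2           /* Cause.ExcCode occupies bits 6..2 */
#define CAUSE_EXC_MASK  0x1fu
#define EXCCODE_DSPDIS  26u         /* DSP State Disabled exception */

/* True if DSP instructions may execute with this Status value. */
static bool dsp_access_enabled(uint32_t status)
{
    return (status & ST0_MX) != 0;
}

/* Extract the exception code from a Cause register value. */
static unsigned int cause_exccode(uint32_t cause)
{
    return (cause >> CAUSE_EXC_SHIFT) & CAUSE_EXC_MASK;
}

/* True if the Cause value describes a DSP State Disabled exception. */
static bool is_dsp_disabled_exception(uint32_t cause)
{
    return cause_exccode(cause) == EXCCODE_DSPDIS;
}
```

On real hardware the Status and Cause values would come from the CP0 registers at the time of the trap; here they are plain integers so the logic can be tried on any host.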

I added this function in between:

void my_save_dsp(void)
{
	save_dsp(current);
}

The disassembly from the working kernel 4.19 build looks like this:

80067b40 <my_save_dsp.part.8>:
80067b40: 8f830000 lw v1,0(gp)
80067b44: 00202810 mfhi a1,$ac1
80067b48: 00202012 mflo a0,$ac1
80067b4c: ac65057c sw a1,1404(v1)
80067b50: 8f830000 lw v1,0(gp)
80067b54: 00403810 mfhi a3,$ac2
80067b58: 00403012 mflo a2,$ac2
80067b5c: ac640580 sw a0,1408(v1)
80067b60: 8f830000 lw v1,0(gp)
80067b64: 00602810 mfhi a1,$ac3
80067b68: 00602012 mflo a0,$ac3
80067b6c: ac670584 sw a3,1412(v1)
80067b70: 8f830000 lw v1,0(gp)
80067b74: ac660588 sw a2,1416(v1)
80067b78: 8f830000 lw v1,0(gp)
80067b7c: ac65058c sw a1,1420(v1)
80067b80: 8f830000 lw v1,0(gp)
80067b84: ac640590 sw a0,1424(v1)
80067b88: 7c3f1cb8 rddsp v1,0x3f
80067b8c: 8f820000 lw v0,0(gp)
80067b90: 03e00008 jr ra
80067b94: ac430594 sw v1,1428(v0)

The disassembly from the crashing kernel 5.4 build looks like this:

80066db0 <my_save_dsp.part.7>:
80066db0: 8f830000 lw v1,0(gp)
80066db4: 00202810 mfhi a1,$ac1
80066db8: 00202012 mflo a0,$ac1
80066dbc: ac65048c sw a1,1164(v1)
80066dc0: 8f830000 lw v1,0(gp)
80066dc4: 00403810 mfhi a3,$ac2
80066dc8: 00403012 mflo a2,$ac2
80066dcc: ac640490 sw a0,1168(v1)
80066dd0: 8f830000 lw v1,0(gp)
80066dd4: 00602810 mfhi a1,$ac3
80066dd8: 00602012 mflo a0,$ac3
80066ddc: ac670494 sw a3,1172(v1)
80066de0: 8f830000 lw v1,0(gp)
80066de4: ac660498 sw a2,1176(v1)
80066de8: 8f830000 lw v1,0(gp)
80066dec: ac65049c sw a1,1180(v1)
80066df0: 8f830000 lw v1,0(gp)
80066df4: ac6404a0 sw a0,1184(v1)
80066df8: 7c3f1cb8 rddsp v1,0x3f
80066dfc: 8f820000 lw v0,0(gp)
80066e00: 03e00008 jr ra
80066e04: ac4304a4 sw v1,1188(v0)

The two listings look nearly identical. Is there some initialization needed for the DSP extension?
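
To make the comparison of the two listings concrete, here is a small host-side sketch that compares the instruction words while ignoring the 16-bit immediate of `sw` instructions. The instruction words are copied from the two disassemblies above; the helper names are hypothetical and the `sw` opcode value (0x2b) is the standard MIPS32 encoding.

```c
#include <stddef.h>
#include <stdint.h>

#define MIPS_OPCODE(w) ((uint32_t)(w) >> 26) /* major opcode, bits 31..26 */
#define MIPS_OP_SW     0x2bu                 /* sw rt, offset(base) */

/* Instruction words from the kernel 4.19 listing. */
static const uint32_t insn_4_19[] = {
    0x8f830000, 0x00202810, 0x00202012, 0xac65057c, 0x8f830000, 0x00403810,
    0x00403012, 0xac640580, 0x8f830000, 0x00602810, 0x00602012, 0xac670584,
    0x8f830000, 0xac660588, 0x8f830000, 0xac65058c, 0x8f830000, 0xac640590,
    0x7c3f1cb8, 0x8f820000, 0x03e00008, 0xac430594,
};

/* Instruction words from the kernel 5.4 listing. */
static const uint32_t insn_5_4[] = {
    0x8f830000, 0x00202810, 0x00202012, 0xac65048c, 0x8f830000, 0x00403810,
    0x00403012, 0xac640490, 0x8f830000, 0x00602810, 0x00602012, 0xac670494,
    0x8f830000, 0xac660498, 0x8f830000, 0xac65049c, 0x8f830000, 0xac6404a0,
    0x7c3f1cb8, 0x8f820000, 0x03e00008, 0xac4304a4,
};

#define N_INSN (sizeof(insn_4_19) / sizeof(insn_4_19[0]))

static int both_sw(uint32_t a, uint32_t b)
{
    return MIPS_OPCODE(a) == MIPS_OP_SW && MIPS_OPCODE(b) == MIPS_OP_SW;
}

/* Count instruction pairs that differ in anything but an sw offset. */
static size_t count_real_differences(const uint32_t *a, const uint32_t *b,
                                     size_t n)
{
    size_t diffs = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = both_sw(a[i], b[i]) ? 0xffff0000u : 0xffffffffu;

        if ((a[i] & mask) != (b[i] & mask))
            diffs++;
    }
    return diffs;
}

/*
 * Return 1 and store the offset difference (a minus b) in *delta if every
 * sw pair differs by the same amount; return 0 otherwise.
 */
static int uniform_store_delta(const uint32_t *a, const uint32_t *b, size_t n,
                               long *delta)
{
    int seen = 0;

    for (size_t i = 0; i < n; i++) {
        if (!both_sw(a[i], b[i]))
            continue;
        long d = (long)(int16_t)(a[i] & 0xffffu) -
                 (long)(int16_t)(b[i] & 0xffffu);
        if (seen && d != *delta)
            return 0;
        *delta = d;
        seen = 1;
    }
    return seen;
}
```

Calling `count_real_differences(insn_4_19, insn_5_4, N_INSN)` gives 0, and `uniform_store_delta()` reports a constant difference of 240 bytes (1404 - 1164). That suggests the generated instructions are the same in both builds and only the offset of the DSP state inside the task structure moved, so this function by itself does not appear to be miscompiled.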

This commit from Linux 4.20 looks interesting:
https://git.kernel.org/linus/edbb4233e7efc37dbebb10f7774b38c64080dd66

@openwrt-bot

Hauke:

I did a git bisect, and the breakage starts with this kernel commit:
http://git.kernel.org/linus/9012d011660ea5cf2a623e1de207a2bc0ca6936d

As this commit changes how the compiler may inline functions, I assume the problem is related to a compiler bug.

@openwrt-bot

bmork:

I can confirm this issue on a Ubiquiti UniFi AP AC Pro, so I don't think there is any reason to limit this bug to a specific device. It's probably target-wide.

Based on the kernel commit and error location @hauke pointed out, I tried forcibly inlining the dsp_init functions, and that solved the problem for me. See attached patch.
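
For readers unfamiliar with the distinction, here is a minimal sketch of "plain inline" versus "forcibly inlined" in GCC terms. This is not the attached patch: the macro and function names are hypothetical stand-ins, and the claim that the kernel's plain `inline` becomes only a hint once CONFIG_OPTIMIZE_INLINING is enabled is my understanding of the kernel commit referenced above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __always_inline macro. */
#define demo_always_inline inline __attribute__((__always_inline__))

/*
 * Plain `inline` is only a hint: the compiler may still emit an
 * out-of-line (or partially inlined) copy of the function.
 */
static inline uint32_t pack_hint(uint32_t hi, uint32_t lo)
{
    return (hi << 16) | (lo & 0xffffu);
}

/*
 * With the always_inline attribute GCC expands the body at every call
 * site, or reports an error if it cannot.
 */
static demo_always_inline uint32_t pack_forced(uint32_t hi, uint32_t lo)
{
    return (hi << 16) | (lo & 0xffffu);
}
```

Both functions compute the same value; the only difference is whether the compiler is allowed to keep a separate function body, which is the kind of decision the workaround takes away from it.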

Still needs someone to figure out why, and write a proper commit message explaining it all...

@openwrt-bot

Hauke:

Thank you, Bjørn Mork, that is helpful.

We already backported the CONFIG_OPTIMIZE_INLINING change to kernel 4.19, but we did not see the problem there. I did an additional bisect and found that these two changes are also needed to cause this problem:

Since this commit the system hangs like this:

[ 0.000000] CPU clock: 560.000 MHz
[ 0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6825930166 ns
[ 0.000008] sched_clock: 32 bits at 280MHz, resolution 3ns, wraps every 7669584382ns

https://git.kernel.org/linus/172dcd935c34b022729f45a7bbaae5cc05231533

This hang was fixed by the following commit, which was added 3 commits later; from that point on we see the DSP exception:
https://git.kernel.org/linus/de56d4c1da3e68f0ca468a55f6677bef3cee6e10

Both results were obtained with this patch from OpenWrt applied manually:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/generic/pending-4.19/220-optimize_inlining.patch;h=ae032709d2729d23c3485c0a4e9ecbbfebd6d6a6;hb=HEAD

My assumption is that the kernel did not handle DSP exceptions correctly before and this was fixed by these patches from Paul.

@openwrt-bot

Hauke:

When I revert this GCC commit it works again:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9fe0f3b6468871448bf40751a4f30cf20118ce6a

I created a bug report for GCC:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94506

I reverted a commit in GCC here:
https://patchwork.ozlabs.org/patch/1267087/

I also see this problem with an unmodified upstream kernel.

@openwrt-bot

thwalker3:

FWIW, I can 100% confirm the problem, and that reverting to GCC 8.3 fixes it, on an Archer C7 v2. Is anyone following through with Jakub in the GCC bugzilla?

@openwrt-bot

xnoreq:

Can confirm as well. Tried flashing master with 5.4 on an Archer C7 v5 last week. Didn't boot.
