OpenWrt/LEDE Project

  • Status Unconfirmed
  • Percent Complete
    0%
  • Task Type Bug Report
  • Category Kernel
  • Assigned To No-one
  • Operating System All
  • Severity Critical
  • Priority Very Low
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Alexander Lochmann - 15.05.2020

FS#3099 - TP-Link C2600 Kernel 5.4 crashes while accessing invalid memory address

The Linux kernel hits an oops from time to time:
- Device: TP-Link Archer C2600
- Kernel Version: 5.4.40
- OpenWRT Version: OpenWrt SNAPSHOT, r13224+16-2308644b0c
- Steps to reproduce: Simply wait. After an unknown amount of time the kernel crashes, leading to a reboot.

 


John T commented on 12.07.2020 19:16

Thanks for linking my bug. It does seem similar. How did you get the full kernel call stack?

Alexander Lochmann commented on 12.07.2020 19:36

John T commented on 12.07.2020 20:22

Thanks! Have you tried reverting to kernel 4.19 and trying to catch this again?

I checked clk-krait.c, the file where "krait_mux_set_parent" is defined, comparing the version introduced by the patch in 4.19 with the one already in 5.4, and they're pretty similar.

FWIW I'm on 5.4.48 and still seeing this issue.

Alexander Lochmann commented on 13.07.2020 08:50

No, I haven't tried this yet.
How did you revert to 4.19?

Which patch are you talking about?
Can you please point me to the location?

John T commented on 13.07.2020 16:22

I haven't, but I'm trying different workarounds in the CPU scaling area. I have 2 identical routers, so I can experiment with different settings. I'll post if I find anything that works; the problem is that it takes days or weeks to crash.

The patch I was referring to is in openwrt\target\linux\ipq806x\patches-4.19:
0034-0007-clk-qcom-Add-support-for-Krait-clocks.patch

It looks like it's already merged into 5.4.

John T commented on 17.07.2020 18:18

It might be too soon to tell, but one of the routers has now survived for over 7 days with no reboots.
I basically disabled CPU scaling on both cores. It might be worth trying; please post any updates. I hope I get to 2 weeks without a power outage.

# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
performance
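For anyone wanting to reproduce this workaround by hand, here is a minimal sketch that pins every core to one governor through the standard cpufreq sysfs files. It assumes the kernel exposes `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor` and that `performance` appears in `scaling_available_governors`. The function name `set_cpufreq_governor` and the `CPU_ROOT` override are hypothetical, added only so the loop can be tried against a scratch directory:

```shell
#!/bin/sh
# Sketch: set every CPU's cpufreq governor to one value (default "performance").
# CPU_ROOT is a hypothetical override for testing; on the router it defaults
# to /sys/devices/system/cpu.
set_cpufreq_governor() {
    governor="${1:-performance}"
    root="${CPU_ROOT:-/sys/devices/system/cpu}"
    rc=0
    for gov_file in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # If the glob matched nothing (no cpufreq support), skip the literal pattern
        [ -e "$gov_file" ] || continue
        if echo "$governor" > "$gov_file"; then
            # Read the value back to show what the kernel actually accepted
            echo "$gov_file: $(cat "$gov_file")"
        else
            echo "failed to set $gov_file" >&2
            rc=1
        fi
    done
    return $rc
}
```

On the router this would be called as `set_cpufreq_governor performance`. A sysfs write only lasts until the next reboot; calling it from `/etc/rc.local` is one way to persist it, though that choice is an assumption rather than what was used in this thread.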

John T commented on 23.07.2020 21:59

Update: 2 weeks uptime now and no reboots with CPU scaling disabled.

Alexander Lochmann commented on 26.07.2020 18:19

Nice. I applied your settings as well. So far, no reboot in two and a half days.
I'll keep you posted.

Alexander Lochmann commented on 03.08.2020 09:30

Update: Uptime reached ~9 days. I then switched back to the ondemand governor and experienced a crash within 12 hours...

Filip Matijević commented on 03.08.2020 19:47

I had no crashes to report for a while (they typically happen after a week or so for me), and because of one power failure and a couple of user-initiated reboots I couldn't catch any logs, but I have finally caught one to report (files attached).

I'll try the performance governor on both cores and report back if I experience any crashes.

John T commented on 03.08.2020 19:58

Filip, your call stacks look a little different, so it might be a different issue.

I'm almost at 30 days uptime with the governor set to "performance".

Filip Matijević commented on 04.08.2020 07:53

I've just had another reboot, and it seems the crashlog doesn't get overwritten on subsequent crashes. I'll do a hard reboot and wait.
I'm not able to say if it's the same problem here, as my crashlogs are not consistent (for example: https://pastebin.com/raw/fhD5vVLQ). In any case, changing the governor had the opposite effect on stability for me. I'm wondering if running the CPU at 100% frequency causes overheating, making it less stable in my case. I'll try the same governor once more to see if that's the case.
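One way to test the overheating theory would be to log temperatures alongside uptime. Below is a rough sketch that assumes this ipq806x build exposes at least one zone under `/sys/class/thermal` (not confirmed anywhere in this thread). The kernel reports each zone's `temp` file in millidegrees Celsius. `print_thermal_zones` and `THERMAL_ROOT` are hypothetical names:

```shell
#!/bin/sh
# Sketch: print "<zone> <degrees C>" for each readable thermal zone.
# THERMAL_ROOT is a hypothetical override for testing; it defaults to /sys/class/thermal.
print_thermal_zones() {
    root="${THERMAL_ROOT:-/sys/class/thermal}"
    for zone in "$root"/thermal_zone*; do
        # Skip the literal glob if no zones exist, and zones without a readable temp
        [ -r "$zone/temp" ] || continue
        milli=$(cat "$zone/temp")
        # Integer division keeps this POSIX-sh friendly; fractional degrees are dropped
        echo "$(basename "$zone") $((milli / 1000))"
    done
}
```

For example, `while true; do date; print_thermal_zones; sleep 60; done >> /path/to/temps.log &` (the log path is a placeholder). On OpenWrt `/tmp` is RAM-backed, so a log kept there will not survive the crash-triggered reboot.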

Alexander Lochmann commented on 03.09.2020 20:43

@John T: Disabling cpufreq did the trick. :-/ Uptime is currently 32 days.

John T commented on 03.09.2020 21:41

Thanks for confirming.

I had some power failures, then had to flash a new image to get the things I needed, so I haven't been able to test it for as long!

I'm on kernel 5.4.60 now, with cpufreq disabled via kernel_menuconfig.
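Disabling cpufreq at build time means the option never reaches the kernel, instead of just changing the governor at runtime. A small sketch for confirming the result in a config file, assuming the relevant kernel option is `CONFIG_CPU_FREQ` and that you pass the path of the config file `make kernel_menuconfig` wrote (its exact location in the OpenWrt tree isn't given here, so it is treated as an input). Kconfig records a disabled option as a `# CONFIG_... is not set` line. `cpufreq_config_state` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: report whether CONFIG_CPU_FREQ is enabled, disabled, or not mentioned
# in a kernel config file. Uses whole-line matches (-x) so that related options
# such as CONFIG_CPU_FREQ_STAT are not mistaken for CONFIG_CPU_FREQ.
cpufreq_config_state() {
    cfg="$1"
    if grep -qx '# CONFIG_CPU_FREQ is not set' "$cfg"; then
        echo disabled
    elif grep -qx 'CONFIG_CPU_FREQ=y' "$cfg"; then
        echo enabled
    else
        # Not listed in this file; the effective value may come from another config
        echo unset
    fi
}
```

The `unset` case matters for OpenWrt, where a target's config file is layered over the generic one, so an option missing from one file may still be enabled by the other.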
