FS#2202 - brcm63xx: Hg556a: 4.14 kernel boot stuck at "random: crng init done" #7864
Comments
gaddman: Confirming this is affecting me too. HG556a_A with 4.14.114 kernel (OpenWRT 18.06.2). |
ynezz:
This information isn't correct, you can't have 4.14.114 kernel in 18.06.2, so maybe it's 18.06-SNAPSHOT?
What exactly does that mean? Some kind of kernel crash? Or do you mean that booting of the device gets stuck at this point? If so, are you running a clean snapshot image, or have you installed some packages on top as well? |
sapi69: Same error in openwrt 19.07 |
ynezz:
Please try latest snapshot image, they contain urngd which should help with |
sapi69: same after flash, boot failed in 19.07 and master. |
ynezz: Can you please attach complete log from the snapshot image? |
sapi69: [ 0.000000] Detected Broadcom 0x6358 CPU revision a1 |
ynezz:
This is not the complete log; please provide the complete log. Can you also tell us the last OpenWrt version that worked on this device? |
sapi69: lede 17.01.7 and openwrt 18.06.4 working fine |
sapi69: log openwrt 19.07
|
sapi69: log trunk
|
sapi69: up up |
ynezz: Thanks for the logs. I don't see any issues in them. Can you provide a log from the working 18.06.4 so I have something to compare them with?
OK, that's probably something we could use as a start. If you're able to compile your own firmware image, you could try to find the offending commit that broke your device with git bisect (https://flaviocopes.com/git-bisect/). It goes something like this:
git bisect start
git bisect good v18.06.4
git bisect bad openwrt-19.07
(compile & flash)
git bisect bad (or git bisect good)
(compile & flash)
git bisect bad (or git bisect good)
etc..
until you get to |
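The bisect procedure above is a binary search over the commit history: each compile-and-flash round halves the range that can contain the breaking commit. The following is only a minimal Python sketch of that idea, not how git implements it. The commit names (`c0` to `c15`), the helper `first_bad`, and the stand-in check `is_bad` are all hypothetical; on real hardware the check is "flash the image and see whether the device boots".

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad (the same assumption
    `git bisect` relies on).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # breakage is at mid or earlier
        else:
            lo = mid  # breakage is after mid
    return commits[hi]


# Hypothetical linear history; we pretend the device stops booting at c11.
commits = [f"c{i}" for i in range(16)]
tested = []


def is_bad(commit: str) -> bool:
    # Stand-in for "compile, flash, and check whether the device boots".
    tested.append(commit)
    return int(commit[1:]) >= 11


culprit = first_bad(commits, is_bad)
print(culprit, "found after", len(tested), "test builds")
```

In this toy history of 16 commits, four builds are enough to isolate the culprit, which is why bisecting is practical even when each round means a full compile and flash. An intermittent failure like the one reported later in this thread violates the "always bad after the break" assumption, so each step may need several boots before it is marked good.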
sapi69: I'm busy now, maybe Sunday for the 18.06.4 log. This is the log from LEDE 17.01.7: |
sapi69: log 18.06.4
|
sapi69: up up |
peperfus: OpenWrt 19.07.0 with kernel 4.14.162 working. |
peperfus: Does anybody know how to extract the firmware from my working HG556a (19.07.0) and then copy it to another HG556a (same model, same version)? Or is it possible to get a build of exactly one past commit of the firmware (and how / where)? Thank you. |
peperfus: Why is this bug set to low priority? |
sapi69: no news? |
Asux30: Please solve this problem |
ruben-p: please |
ynezz:
Just ignore that priority field. My understanding is that this is a community-driven project, so you can't actually force volunteers to prioritize their work. So nobody wastes time on priorities.
It won't change anything anyway. This problem is device specific, so it needs someone with access to the hardware to bisect it down to the problematic commit, then find out which change inside that commit is causing it. When you can point us at the breaking change/file/line number, we might be able to find a fix. |
peperfus: I only know I have a HG556a_A with this version working: |
syscon: Had some time to do this and learn a bit. The bisect result is:
2308b87 is the first bad commit
:040000 040000 29a6b61ecb4a6a79ac382d9ba7fef76e6c07b8e0 a1e3d1b1141f284bee313a2ba2586d8474aef19b M target
Sadly I don't think it will help, as that's likely the point where the 63xx HG556a got remapped to use the new kernel (so it doesn't highlight the root cause of the issue). If anyone has the ability to investigate further, or has suggestions, etc... Matt |
syscon: I've struggled with this for the last few days, and it cost me one of the 3 routers (though if I can get JTAG working on the "C", it will be recovered), so here's where I got to. I can get it to boot from time to time: for example, I left it overnight in the stuck state, did a cold restart (unplugged, etc.), and it completed the boot process. This is using a stuffed kernel that I had played around with to try a few things (i.e. I removed pre_init, as I thought the problem might be in there, but the issue appears to be a lot earlier), so if you notice it doesn't complete all the way, that's most likely because I removed the init. Note that you can still get into the console, though :-) Log of it stuck and working (on the exact same firmware and the exact same device):
@w45260: Flash Manufacture id :c2
|
syscon: Confirming: on the exact same firmware, after another reboot it's now back in the hang state.
CFE version cfe.d081.5003 for BCM96358 (32bit,SP,BE)
Build Date: Wed Nov 11 10:36:35 CST 2009 (Lihua_68693)
Copyright (C) 2006 Huawei Technologies Co. Ltd.
|
syscon: I'm no good at Linux debugging, but after adding a few trace lines, it might be something to do with "local_irq_enable"
|
ynezz:
Well, you're already pretty good, because you know how to compile and run your custom kernel. If I were you, I would simply With |
ynezz: Ah, sorry, you've already done that, this commit is |
russell: Maybe do the same tracing with a working version, so we know what comes after local_irq_enable(). That might help bracket where things are going wrong. |
syscon: Hi Russell / Petr, thanks for your replies. I put print statements before every line, and all indications are that it never returned from local_irq_enable() and never reached the next line, kmem_cache_init_late(), in start_kernel (/build_dir/target-mips_mips32_musl/linux-bcm63xx_generic/linux-5.4.36/init/main.c). The key bits around this call are below:
Going off a hunch, I've done some more digging and think I've got to a point I'm happy with. I've found that by removing the SPI sections of target/linux/bcm63xx/dts/bcm6358.dtsi and recompiling, I have gone from failing to boot 90% of the time to booting successfully 90% of the time (for some reason I still had a few non-boots) in a test of about 30 reboots / cold boots.
I understand that SPI is probably critical to many 63xx boards or to some specific customizations, so this may not be the solution for everyone. For me, as the HG556a A/B/C all use NOR flash (not SPI memory) and I don't need SPI, I'm happy with the trade-off in order to get the new kernel and software offloading working (I'm getting approx. 2x the NAT speed using software offloading, which is gold).
If this does lead to a patch specific to the HG556a, that would be great, but I understand that this may not be the root cause and may just be hiding it. Perhaps someone who knows a bit more about the underlying aspects of this chip knows how all this glues together (possibly something to do with the GPIO pins / IRQs / SPI?).
Another observation: when the router was on the original 19.x or the daily builds, it "seemed" to boot more successfully from a cold start (unplugged for 30 seconds or so) than when power was unplugged and plugged back in within a few seconds. I suspect that people who are having this issue may be able to unplug the device for 10 minutes or so and try again (a few times). If the power light starts flashing after about 30 seconds, it is booting successfully; if it is still solid, it's stuck (repeat the process). The first successful boot takes approx. 3 minutes before the interface is up (it seems to do file system maintenance, etc.); later boots take approx. 1 minute.
For anyone else who gets here, hopefully the above has been useful.
It has been fun on my end playing around with this while in lockdown. Thanks heaps to those working on OpenWrt and supporting it. I'm available for testing any patches as needed (with the B model only, unless I can restore CFE on my C model). |
Noltari: Please, test again with latest OpenWrt master: Cheers! |
Asux30: I have an HG556a ver. A router. I have installed this fix and it is running. |
syscon: Love your work. Tested a few reboots on my HG556a ver. B router and it looks good.
sirrion:
The kernel 4.14.107 is unable to boot on the Hg556a (BCM6358).
Steps to reproduce:
Symptoms:
Boot Log:
*** Press any key to stop auto run (1 seconds) ***
Auto run second count down: 0
boot kernel from be020100
Code Address: 0x80A00000, Entry Address: 0x80a00000
Decompression OK!
Entry at 0x80a00000
Closing network.
Starting program at 0x80a00000
[ 0.000000] Linux version 4.14.107 (hg556a@localhost.localdomain) (gcc version 7.4.0 (OpenWrt GCC 7.4.0 r9015-34696ce25e)) #0 Thu Mar 10 15:47:43 2019
[ 0.000000] Detected Broadcom 0x6358 CPU revision a1
[ 0.000000] CPU frequency is 300 MHz
[ 0.000000] 64MB of RAM installed
[ 0.000000] board_bcm963xx: Boot address 0xbe000000
[ 0.000000] board_bcm963xx: CFE version: d081.5003
[ 0.000000] bcm63xx_nvram: nvram checksum failed, contents may be invalid (expected 33313330, got 3c502ae7)
[ 0.000000] bootconsole [early0] enabled
[ 0.000000] CPU0 revision is: 0002a010 (Broadcom BMIPS4350)
[ 0.000000] board: board name: HW556_B
[ 0.000000] MIPS: machine is Huawei EchoLife HG556a (version B)
[ 0.000000] Determined physical RAM map:
[ 0.000000] memory: 04000000 @ 00000000 (usable)
[ 0.000000] Initrd not found or empty - disabling initrd
[ 0.000000] Primary instruction cache 16kB, VIPT, 2-way, linesize 16 bytes.
[ 0.000000] Primary data cache 16kB, 2-way, VIPT, cache aliases, linesize 16 bytes
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000000000000-0x0000000003ffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000000000-0x0000000003ffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000003ffffff]
[ 0.000000] random: get_random_bytes called from start_kernel+0x80/0x488 with crng_init=0
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 16256
[ 0.000000] Kernel command line: rootfstype=squashfs,jffs2 noinitrd console=ttyS0,115200
[ 0.000000] PID hash table entries: 256 (order: -2, 1024 bytes)
[ 0.000000] Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
[ 0.000000] Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
[ 0.000000] Memory: 54392K/65536K available (6369K kernel code, 343K rwdata, 2132K rodata, 1324K init, 256K bss, 11144K reserved, 0K cma-reserved)
[ 0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] NR_IRQS: 256
[ 0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 12741736309 ns
[ 0.000026] sched_clock: 32 bits at 150MHz, resolution 6ns, wraps every 14316557820ns
[ 1.034043] random: fast init done
[ 53.820705] random: crng init done
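The last two lines of the log are more than 50 seconds apart, which is where the console appears to stall. When comparing logs from working and failing boots, it can help to locate such gaps automatically. The following is a minimal Python sketch under the assumption that every relevant line starts with a `[ seconds.micros]` timestamp as in the log above; the helper name `largest_gap` is hypothetical, and the sample text is copied from the final lines of this report.

```python
import re

# Matches kernel log lines such as "[   53.820705] random: crng init done".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def largest_gap(log_text):
    """Return (gap_seconds, message_before, message_after) for the largest
    pause between consecutive timestamped lines, or None if there are fewer
    than two timestamped lines."""
    prev = None
    best = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip bootloader output and other untimestamped lines
        stamp, message = float(match.group(1)), match.group(2)
        if prev is not None:
            gap = stamp - prev[0]
            if best is None or gap > best[0]:
                best = (gap, prev[1], message)
        prev = (stamp, message)
    return best


sample = """\
[ 0.000000] NR_IRQS: 256
[ 0.000026] sched_clock: 32 bits at 150MHz, resolution 6ns, wraps every 14316557820ns
[ 1.034043] random: fast init done
[ 53.820705] random: crng init done
"""

gap, before, after = largest_gap(sample)
print(f"{gap:.3f}s between '{before}' and '{after}'")
```

A gap like this only shows where console output paused. It does not by itself identify the cause, and the "crng init done" message may simply be the last thing printed before the hang becomes visible.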