FS#542 - ipq806x: unable to boot linux-4.9.10 uboot seeing Bad Magic Number #5565

Closed
openwrt-bot opened this issue Feb 21, 2017 · 16 comments

reiffert:

ipq806x on Netgear Nighthawk X4 / R7500 is unable to start Lede-trunk with Linux-4.9.10.

U-Boot 2012.07 [local,local] (Jun 20 2014 - 17:36:49)

U-boot 2012.07 dni1 V2.2 for DNI HW ID: 29764841 NOR flash 0MB NAND flash 128MB RAM 256MB 1st Radio 3x3 2nd Radio 4x4
smem ram ptable found: ver: 0 len: 5
DRAM: 235 MiB
NAND: 128 MiB
In: serial
Out: serial
Err: serial
131072 bytes read: OK
cdp: get part failed for 0:HLOS
Net: MAC1 addr:6c:b0:ce:24:7d:5
athrs17_reg_init: complete
athrs17_vlan_config ...done
S17 inits done
MAC2 addr:6c:b0:ce:24:7d:4
eth0, eth1
Hit any key to stop autoboot: 0

Client starts...[Listening] for ADVERTISE...TTT
Retry count exceeded; boot the image as usual

nmrp server is stopped or failed !

Loading from device 0: nand0 (offset 0x1340000)

** check kernel image **
Verifying Checksum ... OK

** check rootfs image **

** Bad Magic Number 0x0 **

The sources of this particular U-Boot loader can be found here: http://www.downloads.netgear.com/files/GPL/R7500-and_qtn_gpl_src_V1.0.0.94.zip

u-boot.git/common/cmd_dni.c:1068 indicates that U-Boot is unable to verify the ih_magic in the rootfs header, which should be 0x27051956.

u-boot.git/common/cmd_dni.c:1068
    if (ntohl(hdr->ih_magic) != IH_MAGIC) {
            printf("\n** Bad Magic Number 0x%x **\n", hdr->ih_magic);
            return 1;
    }

Comparing the headers for a working lede-trunk with Linux-4.4.49 and with Linux-4.9.10:

Lede-trunk/Linux-4.4.49:

00000080 27 05 19 56 32 6e b2 0a 58 aa fe 1a 00 1f 50 22 |'..V2n..X.....P"|
00000090 42 20 80 00 42 20 80 00 aa b1 af 1b 05 02 02 00 |B ..B ..........|
000000a0 41 52 4d 20 4c 45 44 45 20 4c 69 6e 75 78 2d 34 |ARM LEDE Linux-4|
000000b0 2e 34 2e 34 39 00 00 00 00 00 00 00 00 00 00 00 |.4.49...........|
...
00200040 27 05 19 56 2f 33 ed 8c 58 aa fe 1a 00 00 00 00 |'..V/3..X.......|
00200050 00 00 00 00 00 00 00 00 00 00 00 00 05 02 07 00 |................|
00200060 41 52 4d 20 4c 45 44 45 20 66 61 6b 65 72 6f 6f |ARM LEDE fakeroo|
00200070 74 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |t...............|

Lede-trunk/Linux-4.9.10:

00000080 27 05 19 56 e1 9e 33 1c 58 aa fe 1a 00 1d 22 22 |'..V..3.X.....""|
00000090 42 20 80 00 42 20 80 00 29 b3 4f cd 05 02 02 00 |B ..B ..).O.....|
000000a0 41 52 4d 20 4c 45 44 45 20 4c 69 6e 75 78 2d 34 |ARM LEDE Linux-4|
000000b0 2e 39 2e 31 30 00 00 00 00 00 00 00 00 00 00 00 |.9.10...........|
...
00200040 27 05 19 56 2f 33 ed 8c 58 aa fe 1a 00 00 00 00 |'..V/3..X.......|
00200050 00 00 00 00 00 00 00 00 00 00 00 00 05 02 07 00 |................|
00200060 41 52 4d 20 4c 45 44 45 20 66 61 6b 65 72 6f 6f |ARM LEDE fakeroo|
00200070 74 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |t...............|

The header according to u-boot.git/include/image.h:189

typedef struct image_header {
    uint32_t ih_magic;            /* Image Header Magic Number */
    uint32_t ih_hcrc;             /* Image Header CRC Checksum */
    uint32_t ih_time;             /* Image Creation Timestamp */
    uint32_t ih_size;             /* Image Data Size */
    uint32_t ih_load;             /* Data Load Address */
    uint32_t ih_ep;               /* Entry Point Address */
    uint32_t ih_dcrc;             /* Image Data CRC Checksum */
    uint8_t  ih_os;               /* Operating System */
    uint8_t  ih_arch;             /* CPU architecture */
    uint8_t  ih_type;             /* Image Type */
    uint8_t  ih_comp;             /* Compression Type */
    uint8_t  ih_name[IH_NMLEN];   /* Image Name */
} image_header_t;
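
To read the hexdumps against this struct, the fields can be decoded by hand or with a small helper. The following is a minimal sketch, not code from the U-Boot tree: the names `uimage_info`, `be32`, `uimage_decode` and `kernel_hdr_4_9_10` are made up for illustration, and it assumes the usual legacy uImage layout of 64 bytes with big-endian fields and IH_NMLEN equal to 32.

```c
#include <stdint.h>
#include <string.h>

#define IH_MAGIC  0x27051956u /* legacy uImage magic, as quoted in the thread */
#define IH_NMLEN  32          /* assumed name field length */
#define IH_HDRLEN 64          /* 7 * 4 + 4 * 1 + IH_NMLEN bytes */

/* Decoded, host-endian copy of the header fields (illustrative type). */
struct uimage_info {
    uint32_t magic, hcrc, time, size, load, ep, dcrc;
    uint8_t os, arch, type, comp;
    char name[IH_NMLEN + 1];
};

/* Read one big-endian 32-bit word; header fields are stored big-endian. */
uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a 64-byte header. Returns 0 if the magic matches, -1 otherwise. */
int uimage_decode(const uint8_t *hdr, struct uimage_info *out)
{
    out->magic = be32(hdr + 0);
    out->hcrc  = be32(hdr + 4);
    out->time  = be32(hdr + 8);
    out->size  = be32(hdr + 12);
    out->load  = be32(hdr + 16);
    out->ep    = be32(hdr + 20);
    out->dcrc  = be32(hdr + 24);
    out->os    = hdr[28];
    out->arch  = hdr[29];
    out->type  = hdr[30];
    out->comp  = hdr[31];
    memcpy(out->name, hdr + 32, IH_NMLEN);
    out->name[IH_NMLEN] = '\0';
    return out->magic == IH_MAGIC ? 0 : -1;
}

/* Bytes 0x80..0xbf of the Linux-4.9.10 dump above (the kernel header). */
const uint8_t kernel_hdr_4_9_10[IH_HDRLEN] = {
    0x27, 0x05, 0x19, 0x56, 0xe1, 0x9e, 0x33, 0x1c,
    0x58, 0xaa, 0xfe, 0x1a, 0x00, 0x1d, 0x22, 0x22,
    0x42, 0x20, 0x80, 0x00, 0x42, 0x20, 0x80, 0x00,
    0x29, 0xb3, 0x4f, 0xcd, 0x05, 0x02, 0x02, 0x00,
    0x41, 0x52, 0x4d, 0x20, 0x4c, 0x45, 0x44, 0x45,
    0x20, 0x4c, 0x69, 0x6e, 0x75, 0x78, 0x2d, 0x34,
    0x2e, 0x39, 0x2e, 0x31, 0x30, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
```

Decoding `kernel_hdr_4_9_10` this way gives ih_size 0x1d2222, load address and entry point 0x42208000, and the name "ARM LEDE Linux-4.9.10". A buffer of zeros, which is what the loader reports reading for the rootfs header, fails the magic check.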

Anything that can be concluded from here?

  • The kernel header exists and U-Boot verifies it OK
  • The rootfs header is at the right spot, but U-Boot is unable to find the ih_magic

hnyman:

Sounds similar to what has happened to me with the Netgear R7800 in my 4.9 trials.

My device ends up in the TFTP recovery mode controlled by U-Boot, so control has never been passed to the kernel.

I have not had a serial cable attached, so great info from you ;-)

Related discussion in:
https://forum.lede-project.org/t/netgear-r7800-exploration-ipq8065-qca9984/285/129

dissent1:

Current LEDE k4.9 has an incorrect NAND DT setup; it's lacking:
dissent1/r7800@d8e127f
dissent1/r7800@3464973
dissent1/r7800@a2d8a5b

Could you please build off my tree and provide a bootlog?
https://github.com/dissent1/r7800/tree/bl49-2

jogo:

While these dts changes look reasonable and correct at a first glance, they do not have anything to do with the issue in this ticket.

The issue here is that somewhere the image/kernel is generated differently, and the bootloader trips over that.

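The main difference I see from the headers is the kernel size: the ih_size fields differ by more than 128k (0x1f5022 for 4.4.49 versus 0x1d2222 for 4.9.10, so the 4.9 kernel is actually the smaller one), which might cause the loader to look at a different ("wrong") offset and see the 0x0.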
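
The size argument can be checked with a little arithmetic on the ih_size fields from the two dumps. This is only a sketch under stated assumptions: the 128 KiB block size is a guess (typical for this kind of NAND, not confirmed anywhere in this ticket), the helper names are hypothetical, and nothing here reproduces what cmd_dni.c actually computes.

```c
#include <stdint.h>

#define UIMAGE_HDR_LEN 64u      /* size of the legacy uImage header */
#define ASSUMED_BLOCK  0x20000u /* 128 KiB; an assumption, not taken from the thread */

/* Offset just past the kernel uImage (header + data), counted from the kernel header. */
uint32_t kernel_end(uint32_t ih_size)
{
    return UIMAGE_HDR_LEN + ih_size;
}

/* Index of the assumed 128 KiB block that a given offset falls into. */
uint32_t block_index(uint32_t offset)
{
    return offset / ASSUMED_BLOCK;
}

/* Absolute difference between two image sizes. */
uint32_t size_delta(uint32_t a, uint32_t b)
{
    return a > b ? a - b : b - a;
}
```

With ih_size 0x1f5022 (4.4.49) and 0x1d2222 (4.9.10), `size_delta` gives 142848 bytes, which is more than one assumed block, and the two kernels end in block 15 and block 14 respectively. If the loader derives the rootfs header position from the kernel size, it would therefore look in different places for the two images even though the header sits at the same file offset in both dumps.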

jogo:

Ah, that commit has the clue:

- KERNEL = kernel-bin | append-dtb | pad-to $$$$(($$(KERNEL_SIZE)-2*64-1)) | uImage none | append-file $(KDIR)/root.dummy
+ KERNEL = kernel-bin | append-dtb | uImage none | pad-offset $$(KERNEL_SIZE) 64 | \
+   append-uImage-fakeroot-hdr

The commit switched the order of wrapping in uImage and padding the kernel, which is likely the issue.

Can you try putting the uImage none after the pad-offset command?
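
To see what the new ordering does to the layout, here is a rough model of the padding step. It assumes that `pad-offset <pad> <offset>` grows the file until size plus offset is a multiple of pad (my reading of the build helper, so treat it as an assumption) and that KERNEL_SIZE is 0x200000, which is inferred from the dump offsets and not stated in this ticket. The function names are hypothetical.

```c
#include <stdint.h>

#define UIMAGE_HDR_LEN 64u /* size of the legacy uImage header */

/* Assumed pad-offset semantics: pad until (size + offset) is a multiple of pad. */
uint32_t pad_offset_size(uint32_t size, uint32_t pad, uint32_t offset)
{
    uint32_t fill = (pad - ((size + offset) % pad)) % pad;
    return size + fill;
}

/*
 * Start of the fake rootfs header when the kernel is wrapped in a uImage
 * first and padded afterwards: the header is appended right after the
 * padded kernel uImage.
 */
uint32_t fakeroot_hdr_offset(uint32_t ih_size, uint32_t kernel_size)
{
    return pad_offset_size(UIMAGE_HDR_LEN + ih_size, kernel_size, UIMAGE_HDR_LEN);
}
```

For both ih_size values in the dumps (0x1f5022 and 0x1d2222) this gives 0x1fffc0, which matches the fake rootfs header at 0x200040 in the dumps if the kernel header at 0x80 is taken as the starting point. Because the uImage is created before the padding is added, ih_size still holds the unpadded size, so the padding between the end of the kernel data and the fake rootfs header is not covered by the kernel header.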

mkresin:

It is the issue. I'm working with Thomas at the moment on a fix.

reiffert:

There is a patch at http://patchwork.ozlabs.org/patch/730653/

dissent1:

Are you able to boot up after that? Or does it just fix this particular issue?

reiffert:

It enables the bootloader to successfully verify the existence of a rootfs, which gets it one step further.

U-Boot 2012.07 [local,local] (Jun 20 2014 - 17:36:49)

U-boot 2012.07 dni1 V2.2 for DNI HW ID: 29764841 NOR flash 0MB NAND flash 128MB RAM 256MB 1st Radio 3x3 2nd Radio 4x4
smem ram ptable found: ver: 0 len: 5
DRAM: 235 MiB
NAND: 128 MiB
In: serial
Out: serial
Err: serial
131072 bytes read: OK
cdp: get part failed for 0:HLOS
Net: MAC1 addr:6c:b0:ce:24:7d:5
athrs17_reg_init: complete
athrs17_vlan_config ...done
S17 inits done
MAC2 addr:6c:b0:ce:24:7d:4
eth0, eth1
Hit any key to stop autoboot: 0

Client starts...[Listening] for ADVERTISE...TTT
Retry count exceeded; boot the image as usual

nmrp server is stopped or failed !

Loading from device 0: nand0 (offset 0x1340000)

** check kernel image **
Verifying Checksum ... OK

** check rootfs image **
Verifying Checksum ... OK

Loading from nand0, offset 0x1340000
Image Name: ARM LEDE Linux-4.9.10
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 1911122 Bytes = 1.8 MiB
Load Address: 42208000
Entry Point: 42208000
Automatic boot of image at addr 0x44000000 ...
Image Name: ARM LEDE Linux-4.9.10
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 1911122 Bytes = 1.8 MiB
Load Address: 42208000
Entry Point: 42208000
Verifying Checksum ... OK
Loading Kernel Image ... OK
OK
mtdparts variable not set, see 'help mtdparts'
no partitions defined

defaults:
mtdids : nand0=msm_nand
mtdparts: mtdparts=msm_nand:12M@0x2d40000(netgear)
info: "mtdparts" not set
Using machid 0x1260 from environment

Starting kernel ...

U-Boot 2012.07 [local,local] (Jun 20 2014 - 17:36:49)

U-boot 2012.07 dni1 V2.2 for DNI HW ID: 29764841 NOR flash 0MB NAND flash 128MB

dissent1:

Is that the full bootlog?

reiffert:

Yes it is.

thess:

FWIW (I don't mean to hijack this thread) - I have seen this problem on my R7800 with the 4.4 kernel when I built an image without IPV6 support. A re-build with IPV6 enabled and no other changes corrected the problem. (I have a console available if you need more info)

Ignore this rambling if it is not relevant.

reiffert:

Hey Ted, please feel free to apply my patch (http://patchwork.ozlabs.org/patch/730653/), then build a linux-4.4 image without IPv6 and let us know if it works.

thess:

Yes, this patch fixes my 4.4 builds w/o IPv6. Kernel is actually 286K smaller - not that the R7800 can't spare it.

reiffert:

dissent1: the bootlogs for https://github.com/dissent1/r7800/tree/bl49-2

U-Boot 2012.07 [local,local] (Jun 20 2014 - 17:36:49)

U-boot 2012.07 dni1 V2.2 for DNI HW ID: 29764841 NOR flash 0MB NAND flash 128MB RAM 256MB 1st Radio 3x3 2nd Radio 4x4
smem ram ptable found: ver: 0 len: 5
DRAM: 235 MiB
NAND: 128 MiB
In: serial
Out: serial
Err: serial
131072 bytes read: OK
cdp: get part failed for 0:HLOS
Net: MAC1 addr:6c:b0:ce:24:7d:5
athrs17_reg_init: complete
athrs17_vlan_config ...done
S17 inits done
MAC2 addr:6c:b0:ce:24:7d:4
eth0, eth1
Hit any key to stop autoboot: 0
(IPQ) # tftpboot lede-ipq806x-R7500-initramfs-uImage
Using eth1 device
TFTP from server 10.20.0.2; our IP address is 10.20.0.3
Filename 'lede-ipq806x-R7500-initramfs-uImage'.
Load address: 0x41000000
Loading: #################################################################
#################################################################
#################################################################
#################################################################
######
done
Bytes transferred = 3894357 (3b6c55 hex)
(IPQ) # bootm
Image Name: ARM LEDE Linux-4.9.10
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 3894293 Bytes = 3.7 MiB
Load Address: 42208000
Entry Point: 42208000
Verifying Checksum ... OK
Loading Kernel Image ... OK
OK
mtdparts variable not set, see 'help mtdparts'
no partitions defined

defaults:
mtdids : nand0=msm_nand
mtdparts: mtdparts=msm_nand:12M@0x2d40000(netgear)
info: "mtdparts" not set
Using machid 0x1260 from environment

Starting kernel ...

dissent1:

Thanks! We have been stuck at this for almost a week already; it seems the kernel fails before the serial driver attaches. An IRQ/clock/timer calibration failure?
