FS#3277 - malta/mipseb64: #6424

Closed
openwrt-bot opened this issue Aug 7, 2020 · 5 comments

@openwrt-bot

guidosarducci:

On current master, the image for target malta (mipseb64) errors during startup of the init process, resulting in a boot loop.

Two points to note:

  • the problem is not seen with the previous 4.19 kernel
  • the malta/mipsel64 image fails to build, so the same init error cannot be verified for the alternate endian platform

The specific error is a SIGSEGV fault:

[ 1.061776] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 1.091522] Freeing unused kernel memory: 21544K
[ 1.091778] This architecture does not have kernel memory protection.
[ 1.092156] Run /init as init process
[ 1.137374] random: fast init done
[ 1.197756] init: Console is alive
[ 1.284807] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[ 1.294466] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[ 1.309498] init: - preinit -
[ 1.321863] do_page_fault(): sending SIGSEGV to init for invalid read access from 0000000000000360
[ 1.322458] epc = 0000000000000360 in init[aaaba5c000+4000]
[ 1.323185] ra = 000000fffd40d5e0 in
[ 1.325513] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 1.326546] Rebooting in 1 seconds..

@openwrt-bot

guidosarducci:

I've confirmed this problem also occurs on malta/mipsel64, once the binutils build failure on that platform is fixed by [[https://github.com//pull/3288|PR#3288]]

Reproducing the issue is trivial using QEMU (image built from defaults):
./scripts/qemustart malta be64 -cpu MIPS64R2-generic

This bug could use input from a developer familiar with procd init.

@openwrt-bot

guidosarducci:

Still seeing the same error after updating to current openwrt commit b59a98b. Enabling debug of procd init shows a little more information.

./scripts/qemustart malta be64 -append "init_debug=4"

[...]

[ 1.264652] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[ 1.276184] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ PQ: 0 ANSI: 5
[ 1.311226] Freeing unused kernel memory: 21628K
[ 1.311469] This architecture does not have kernel memory protection.
[ 1.311787] Run /init as init process
[ 1.418543] init: Console is alive
[ 1.436103] init: Ping
[ 1.447138] init: Ping
[ 1.457733] init: Ping
[ 1.468352] init: Ping
[ 1.478950] init: Ping
[ 1.489642] init: Ping
[ 1.500256] init: Ping
[ 1.506097] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[ 1.511427] init: Ping
[ 1.516345] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[ 1.522048] init: Ping
[ 1.525605] init: - preinit -
[ 1.529807] init: Launched preinit instance, pid=505
[ 1.535329] do_page_fault(): sending SIGSEGV to init for invalid read access from 0000000000000360
[ 1.535755] epc = 0000000000000360 in init[aaab9d8000+4000]
[ 1.536552] ra = 000000fffc2f05f0 in
[ 1.538997] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 1.540118] Rebooting in 1 seconds..

Looking at the [[https://lxr.openwrt.org/source/procd/initd/init.c|procd code]], the main init seems to start OK, then forks kmodloader which also completes without error. The subsequent code does:
uloop_init();
preinit();
uloop_run();

Looking at the [[https://lxr.openwrt.org/source/procd/initd/preinit.c#L125|preinit() code]], it appears to fork plugd:
/sbin/procd -h /etc/hotplug-preinit.json
and then fork a preinit instance:
/bin/sh /etc/preinit
before printing a DEBUG message and returning. The above seems to have worked.

The kernel panic is immediately after, and the logged "exitcode=0x0000000b" is:
EAGAIN 11 Resource temporarily unavailable

Perhaps there's an issue with one of the forked processes, or during uloop_run().

@openwrt-bot

yousong:

Likely the 0xb exitcode is for SIGSEGV. See /bin/kill -L

@openwrt-bot

guidosarducci:

As another experiment I configured init as /bin/sh to test some basic functions:

qemu-system-mips64 -M malta -kernel bpf-openwrt-malta-be64-vmlinux.elf \
  -drive file=bpf-openwrt-malta-be64-rootfs-ext4.img,index=0,media=disk \
  -nographic -m 512 -append "root=/dev/sda rootfstype=ext4 init=/bin/sh"

[...]

[ 0.724245] Freeing unused kernel memory: 252K
[ 0.724501] This architecture does not have kernel memory protection.
[ 0.724920] Run /bin/sh as init process

BusyBox v1.31.1 () built-in shell (ash)

/bin/sh: can't access tty; job control turned off
[ 22.326975] random: fast init done
/ #
/ # echo *
bin dev etc init lib lib64 lost+found mnt overlay proc rom root sbin sys tmp usr var www
/ #
/ # ls
[ 35.314543] do_page_fault(): sending SIGSEGV to ls for invalid read access from 0000000000000360
[ 35.314858] epc = 0000000000000360 in busybox[120000000+82000]
[ 35.315434] ra = 000000fffeb105f0 in
Segmentation fault
/ #

While shell builtins appear to work, commands that require fork/exec are yielding the same SIGSEGV fault. Looks like something very wrong with this system.

@openwrt-bot

guidosarducci:

BTW, I realized the "exitcode" refers to a status returned by wait() which, in the case of signal termination, encodes the signal in the low byte. So yes, 0x0b is SIGSEGV.
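To make the decoding concrete, here is a minimal Python sketch of how a Linux wait()-style status word splits into fields. The function name decode_wait_status is made up for illustration, and the bit layout (signal number in the low 7 bits, core-dump flag in bit 7, exit value in the next byte) is the usual Linux convention rather than anything taken from the procd or kernel sources quoted in this thread.

```python
import signal


def decode_wait_status(status):
    """Decode a Linux wait()-style status word (hypothetical helper).

    Assumed layout: bits 0-6 hold the terminating signal, bit 7 is the
    core-dump flag, bits 8-15 hold the exit() value or the stop signal.
    """
    low = status & 0x7F
    if low == 0:
        # Normal exit: the exit() value lives in the second byte.
        return ("exited", (status >> 8) & 0xFF)
    if low == 0x7F:
        # Stopped (not terminated): the stop signal is in the second byte.
        return ("stopped", (status >> 8) & 0xFF)
    try:
        name = signal.Signals(low).name
    except ValueError:
        # Signal number without a symbolic name on this platform.
        name = "signal %d" % low
    return ("signaled", name)


if __name__ == "__main__":
    # Value taken from the panic line: exitcode=0x0000000b
    print(decode_wait_status(0x0000000B))
```

Read this way, 0x0b is signal 11 rather than errno 11, which is why it matches the SIGSEGV reported just before the panic.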

I also keep coming back to the fact that the invalid read address and the epc are the same. It looks like a wild jump into inaccessible memory, made from somewhere well away from the init code (ra=000000fffc2f05f0).
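A small Python sketch of that comparison, in case it helps when checking other boot logs. The helper names parse_fault and is_wild_jump and the regular expressions are my own, written against the three log lines from the init_debug=4 run above; other kernels may format these lines differently.

```python
import re

# Patterns written against the log lines quoted in this thread (assumption:
# the same wording is used by other builds).
FAULT_RE = re.compile(
    r"sending (SIG\w+) to (\S+) for invalid (\w+) access from ([0-9a-f]+)"
)
REG_RE = re.compile(r"\b(epc|ra) = ([0-9a-f]+)")

# Lines copied from the init_debug=4 boot log above.
LOG = [
    "[ 1.535329] do_page_fault(): sending SIGSEGV to init for invalid read access from 0000000000000360",
    "[ 1.535755] epc = 0000000000000360 in init[aaab9d8000+4000]",
    "[ 1.536552] ra = 000000fffc2f05f0 in",
]


def parse_fault(lines):
    """Collect signal, task, access type, bad address, epc and ra."""
    info = {}
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            info["signal"] = m.group(1)
            info["task"] = m.group(2)
            info["access"] = m.group(3)
            info["badaddr"] = int(m.group(4), 16)
        for name, value in REG_RE.findall(line):
            info[name] = int(value, 16)
    return info


def is_wild_jump(info):
    """True when the faulting address is the program counter itself,
    i.e. the CPU faulted while fetching the instruction at epc."""
    return "epc" in info and info.get("badaddr") == info["epc"]


if __name__ == "__main__":
    fault = parse_fault(LOG)
    print(hex(fault["badaddr"]), hex(fault["epc"]), hex(fault["ra"]))
    print("wild jump:", is_wild_jump(fault))
```

When is_wild_jump() is true, the return address is the more useful value to chase, since it points back toward whatever made the call.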

While booting with "init=/bin/sh" a few times, I could manually mount /proc and look at the self memory map. The previous ra seems very close to the [vvar]/[vdso] regions, which could make sense given a long jump, and gives me some ideas/clues to follow up on.
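For anyone repeating the memory map check, a rough Python sketch of looking up an address in /proc/<pid>/maps output follows. The helper find_mapping is hypothetical, and the SAMPLE_MAPS addresses are invented purely to exercise the parser; they are not the map from the affected system, which was not captured in this report.

```python
def find_mapping(maps_text, addr):
    """Return (start, end, perms, name) for the mapping containing addr,
    or None if addr falls outside every mapping.

    Expects the /proc/<pid>/maps line format:
    start-end perms offset dev inode [pathname]
    """
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= addr < end:
            name = fields[5].strip() if len(fields) > 5 else ""
            return (start, end, fields[1], name)
    return None


# Illustrative values only (assumption): NOT taken from the failing system.
SAMPLE_MAPS = """\
fff7f0c000-fff7f0e000 r--p 00000000 00:00 0 [vvar]
fff7f0e000-fff7f10000 r-xp 00000000 00:00 0 [vdso]
"""

if __name__ == "__main__":
    print(find_mapping(SAMPLE_MAPS, 0xFFF7F0E5F0))
    print(find_mapping(SAMPLE_MAPS, 0x360))
```

On a live system the first argument would be the text read from /proc/self/maps after mounting /proc. A None result for the ra value would mean it sits near, but not inside, the listed regions.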
