New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FS#533 - anon_mount 1 in fstab freezes boot in wrt3200acm #8011
Comments
hnyman: Really strange. The only difference is that now the other partition has been flashed once with LEDE, as yesterday there was still the Linksys OEM firmware in the other partition. Now "/sbin/block mount" works also with anon_mount 1 active, and shows a second ubi partition. Unlike yesterday:
It looks like the syscfg partition gets now mounted both as /tmp/syscfg and /mnt/ubi1_0 So, just a hypothesis, but possibly /sbin/block misdetects something when it scans the partitions, if the other partition contains the Linksys OEM firmware. Possibly some naming or formatting is slightly different? EDIT: |
hnyman: Just reporting here that forum user MagicSimi has reproduced the error condition: He has similar setup as I had when the error materialised: Router has been flashed once with LEDE and the other partition still contains Linksys OEM firmware. MagicSimi:
Based on his experience, having additional defined mount sections in fstab does not change things:
So my guess is that something in the name or structure of the Linksys OEM firmware partition causes "/sbin/block mount" to get confused and try something that causes itself (or the NAND driver) to get stuck in a loop, where the device is unsuccessfully polled every 0.2 seconds (like my kernel log shows). |
hnyman: This looks like a nasty bug as it will hit newcomer users who have flashed LEDE just once and are configuring their router. Simply adding "anon_mount 1" option to fstab will make router unbootable. Failsafe will work, but may be hard to find by new users :-( Summary of findings so far:
The hang can be seen from kernel log if the command was issued from console and another console is used to read the log. The log gets filled with this at 0.2s intervals: So far this has been found in WRT3200ACM, but the same will likely happen with all WRT1900/1200/3200ACx devices that have dual firmware partitions. Longer explanation: I added debug statements to fstools block, and MagicSimi tested in his device: Based on testing, the router is trying to automount "mtdblock6" and gets stuck there. cat /proc/mtd: /sbin/block info: "block info" shows that it is a "jffs2" partition. After that finding, I checked the binary copy of my own router's mtd partitions (which I made immediately after flashing LEDE successfully first time), and the corresponding partition really seems to contain raw jffs2 data. Starting with magic 0x1985.
85 19 01 e0 31 00 00 00 1d d9
Apparently fstools block still tries to mount it, and then the kernel driver hangs for some reason. It expects to be find ubi? The real reason for the hang is so far unclear. Something in the flash driver? jffs2 should not be mounted at all in a NAND system? I have added more debug print commands to block and hopefully MagicSimi will test it again. That should reveal the parameters passed to the mount function etc. So far it likely that the hnag happens at the call to the system mount function. It looks like fstools block mount should be somehow modified. |
hnyman: Update: detailed info on the parameters used by the fstools block mount The bug can also be triggered just by running this in a router that still has Linksys OEM firmware in the other partition: mount -t jffs2 /dev/mtdblock6 /mnt/mtdblock6 (mtdblock6 or mtdblock8 depending on which partition contains OEM firmware) It never finishes, doesn't give an error. Kernel log floods with This sounds like a fix is needed from fstools specialists and/or mvebu gurus. Somehow this wrong(?) jffs2 partition with OEM contents should be skipped from the automatic mount tries. Like I said earlier, this is a nasty bug as it hits newcomer users who have flashed LEDE just once and are configuring their router. Simply adding "anon_mount 1" option to fstab will make that router unbootable. Debugging from MagicSimi: With "anon_mount 1":
root@LEDE:/tmp# /tmp/block mount
block: list_for_each_entry starts
block: mount_device for mtdblock6, type=jffs2, version=(null),
label=(null), uuid=(null)
block: mount_device for mtdblock6 auto_mount test
block: mount_device for mtdblock6 extroot test
block: mount_device for mtdblock6 check anon_mount
block: mount_device for mtdblock6 try anon_mount
block: handle_mount mount source=/dev/mtdblock6, target=/mnt/mtdblock6, fstype=jffs2
...hang...
With "anon_mount 0":
root@LEDE:/tmp# /tmp/block mount
block: list_for_each_entry starts
block: mount_device for mtdblock6, type=jffs2, version=(null),
label=(null), uuid=(null)
block: mount_device for mtdblock6 auto_mount test
block: mount_device for mtdblock6 extroot test
block: mount_device for mtdblock6 check anon_mount
block: mount_device for mtdblock6 ends
block: mount_device for ubiblock0_0, type=squashfs, version=4.0,
label=(null), uuid=643ef9c1-d0d092d2-0fa9f12b-3da3502a
block: mount_device for ubiblock0_0 auto_mount test
block: /dev/ubiblock0_0 is already mounted on /rom
The bug can also be triggered just by running mount -t jffs2 /dev/mtdblock6 /mnt/mtdblock6 It never finishes, doesn't give an error that /mnt/mtdblock6 doesn't exist. In the log is the usual "pxa3xx-nand f10d0000.flash: Wait time out!!!" message. It seems to be a problem with /dev/mtdblock6, which (if I understand the partition layout correctly) is the partition where the original linksys firmware is located. |
hnyman:
I have run into strange trouble with my new WRT3200ACM:
anon_mount 1 option in /etc/config/fstab causes the boot process to hang.
Hang can be triggered manually with "/sbin/block mount" when that option is set.
Kernel log fills from entries "pxa3xx-nand f10d0000.flash: Wait time out!!!"
I am filing this bug report for 17.01, but the bug seems to be also in master.
For some reason enabling the anon_mount option in /etc/config/fstab prevents the boot process from completing. The boot progresses to some stage and router e.g. answers ping, but power led blinks and it does not allow login and does not provide dhcp etc. services.
I have had that option in all my routers (WNDR3700, WNDR3800, R7800) to make USB memory sticks to automount when inserted into the router's USB port. So it is a quite normal setting.
I first compiled a mvebu wrt3200acm firmware image from master with kernel 4.9, and that failed to boot. I thought that the new kernel 4.9 might be the reason. So I installed 17.01.0-rc2 official image and it booted ok. I installed packages, configured, rebooted and everything was ok as long as I did not change that anon_mount to 1 in fstab. After changing that, the next boot again gets stuck. (Editing the option in failsafe enables the router to boot normally.)
Boot completes ok with the output of "block detect" entered into fstab:
Boot does not complete with anon_mount 1 in /etc/config/fstab:
I have no good guess about the reason. My initial guess was some connection to ubi1:syscfg being mounted as /tmp/syscfg as that is different than in my other devices, but it is hard to see how that could be the reason. (That mounting is done in /lib/preinit/81_linksys_syscfg )
After some investigation it looks like the reason is actually "/sbin/block mount".
At the boot the init script /etc/init.d/fstab actually runs "/sbin/block mount".
Running that command manually from console (with anon_mount 1) makes the console to hang. From another console window I can see that system & kernel log fills from entries like these:
And the "block" process can't even be killed with "kill -9" or "kill -15".
I rebooted the router from power switch to get it killed.
With anon_mount 0 in fstab the same command works ok and produces:
root@LEDE:~# /sbin/block mount
block: /dev/ubiblock0_0 is already mounted on /rom
block: /dev/ubi0_1 is already mounted on /overlay
The text was updated successfully, but these errors were encountered: