Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FS#4131 - 21.02.1 squashfs overlay retained rpi4 #9113

Open
openwrt-bot opened this issue Nov 9, 2021 · 25 comments
Open

FS#4131 - 21.02.1 squashfs overlay retained rpi4 #9113

openwrt-bot opened this issue Nov 9, 2021 · 25 comments
Labels

Comments

@openwrt-bot
Copy link

wulfy23:

Upgrading from 21.02.0 to 21.02.1 leads to whacky opkg state where previous packages are listed

see:
https://forum.openwrt.org/t/rpi-4-sysupgrade-21-02-1-squashfs-overlay-retained/110586?u=wulfy23
for further information

@openwrt-bot
Copy link
Author

wulfy23:

note:

selection of keep settings or not makes no difference and user configs are retained even without keep settings...

21.02.0-rc4 > 21.02.0 (no problem)

21.02.0 > 21.02.1 (problem)
21.02.1 > 21.02.1 (problem)

21.02.1 > master@r18xxx+ (no problem)

@openwrt-bot
Copy link
Author

mojifax:

i've noticed this in 21.02.0 > 21.02.0 also

1- factory install
2- set password
3- reboot
4- sysupgrade not retaining settings
5- reboot password i still set

https://forum.openwrt.org/t/raspberry-pi-4-usb-boot/109646/7

@CakeConnoisseur
Copy link

CakeConnoisseur commented Feb 20, 2022

Duplicate of #5087

@TalalMash
Copy link

TalalMash commented Feb 26, 2022

For those looking for a temporary workaround with no graceful shutdown options (not a problem with f2fs-squashfs):
Using ext4 image, add to rc.local mount -o rw,remount,noatime,async,barrier=1,commit=30,errors=continue /
And in /lib/preinit/80_mount_root:

....
do_mount_root() {
       if [ -x '/usr/sbin/fsck.ext4' ]; then
           if [ -e '/dev/<replacewithrootpart>' ]; then
               echo "Checking <replacewithrootpart>"
               /usr/sbin/fsck.ext4 -y /dev/<replacewithrootpart> >> /tmp/fsck-results
           fi
	fi
	mount_root
....

With automated power cycles over ~100 iterations journaling corruptions were the most common but no data corruption, only data loss. Tested with litesql writing to storage every 2 seconds. Platform: RPi4 with sdcard.
E: to add filesystem would switch to read-only between 10-20 iterations with default settings. Note that speeds degrades significantly with 'sync' option.

@wulfy23
Copy link
Contributor

wulfy23 commented Feb 28, 2022

possible overlap/relationships with this issue

#9352

@CakeConnoisseur
Copy link

FYI: Just did a sysupgrade from 21.02.2 > 21.02.3 with the official squashfs image, this time "keep settings" was checked.
Could not reproduce the bug!

Was there a fix implemented?
AFAIK nothing mentioned here or in #9352.

@lespinasse
Copy link

Finally managed to buy a Pi to play with, and hit this issue. Good news, I think I have a fix for it.

When booting, openwrt creates a loop device pointing at some block-aligned offset after the end of the squashfs. If a valid filesystem is found there, it's used as the overlay, if not it is created then used.

When creating disk images, the squashfs is copied at the appropriate place into the disk image, but the disk image is not extended to cover the rest of the root partition.

So, when installing the image with dd (and presumably also with sysupgrade, I'm not entirely sure, but dd is a widely used installation method anyway), the squashfs is copied to the start of the target device's root partition, but the rest of the partition is preserved. Upon booting, the pre-existing overlay fs might or might not be found, depending on wether it falls at the same block-aligned offset as with the previous image or not.

My suggested fix would be to always zero-extend the disk image, prior to compression, so that installing it will always overwrite the entire rootfs partition including the overlay space. The attached diff works for me; I would like to see it merged into trunk and possibly 22.03 branch if there are no objections:

--------------------------------------------------------------8<--------------------------------------------------------------------
Zero-fill the target root partition before copying over its contents.

In the squashfs case, the space after the copied content will be used for
the /overlay loopfs, and we want to ensure it is consistently cleared
when installing new sd card images (potentially with dd).

diff -ru a/target/linux/bcm27xx/image/gen_rpi_sdcard_img.sh b/target/linux/bcm27xx/image/gen_rpi_sdcard_img.sh
--- a/target/linux/bcm27xx/image/gen_rpi_sdcard_img.sh 2022-10-14 15:44:41.000000000 -0700
+++ b/target/linux/bcm27xx/image/gen_rpi_sdcard_img.sh 2022-10-22 04:39:43.448104056 -0700
@@ -22,7 +22,8 @@
ROOTFSOFFSET="$(($3 / 512))"
ROOTFSSIZE="$(($4 / 512))"

dd bs=512 if="$BOOTFS" of="$OUTPUT" seek="$BOOTOFFSET" conv=notrunc
+dd bs=512 if=/dev/zero of="$OUTPUT" seek="$((ROOTFSOFFSET+ROOTFSSIZE-1))" count=1 conv=notrunc
dd bs=512 if="$ROOTFS" of="$OUTPUT" seek="$ROOTFSOFFSET" conv=notrunc

@plan44
Copy link

plan44 commented Jun 1, 2023

Hi @lespinasse,

I ran into the same issue recently and came to the same conclusion. I tried to get some attention in the openwrt-devel mailing list and in the forum, because it was (still is) not entirely clear to me what the correct solution would be.

[...]
So, when installing the image with dd (and presumably also with sysupgrade,

Indeed, also with sysupgrade! With bricking potential... In the end, sysupgrade just dds the image, see target/linux/bcm27xx/base-files/lib/upgrade/platform.sh:64.

My suggested fix [...]

Thanks for that! This makes much sense to me. Before, I thought a fix in target/linux/bcm27xx/base-files/lib/upgrade/platform.sh was needed, but padding the image before it gets used (be it with dd or sysupgrade) is certainly much better.

However - this did not make it into openwrt, for some reason? Do you have an idea why? The last commit for gen_rpi_sdcard_img.sh is from 2022...

regards, Lukas

@plan44
Copy link

plan44 commented Jun 1, 2023

Hi @lespinasse,

I just experimented a bit with your fix, and it took a moment to realize that writing the last block of the partition would fill the in-between with zeroes as well ;-)

However, this looks a bit wasteful to me in terms of expanded image size and flash wear. For example, my sqashfs image is ~15MB, but the partition is 512MB. I think in order to prevent mount_root() falsely detecting a file system, padding 1MB after squashfs should be well enough.

So my suggestion would be:

diff --git target/linux/bcm27xx/image/gen_rpi_sdcard_img.sh target/linux/bcm27xx/image/gen_rpi_sdcard_img.sh
index 5e8fb2769c41ee7a6ea08a93438498fa13574e88..ce3913d27ce7519b939f2ab2ed1d323ee9e0bf25 100755
--- target/linux/bcm27xx/image/gen_rpi_sdcard_img.sh
+++ target/linux/bcm27xx/image/gen_rpi_sdcard_img.sh
@@ -22,8 +22,32 @@ BOOTSIZE="$(($2 / 512))"
 ROOTFSOFFSET="$(($3 / 512))"
 ROOTFSSIZE="$(($4 / 512))"
 
+# for sqashfs + f2fs layout, the rootfs image is only the sqashfs part, which
+# is (usually much) smaller than the rootfs partition.
+# The f2fs part (for the overlay) will be created only at first boot
+# into the new image, when mount_root does not detect a valid filesystem already
+# present. To make sure random data from previous installations are not falsely
+# detected (which could and did happen in earlier versions), we add 1 MB zero padding
+# or up to the end of the rootfs partition, whichever is less.
+# While the padding is required for the squashfs+f2fs layout only, it does not
+# interfere with other rootfs partition types such as ext4.
+
+# the image itself is usually less than the partition size (ROOTFSSIZE)
+ROOTFSBYTES="$(wc -c < $ROOTFS)"
+ROOTFSIMGSIZE="$(( ($ROOTFSBYTES+511) / 512))"
+# we start padding one block BEFORE image ends
+PADOFFSET="$(($ROOTFSOFFSET + $ROOTFSIMGSIZE - 1))"
+PADSIZE="$(($ROOTFSSIZE - ROOTFSIMGSIZE + 1))"
+# limit the padding to 1M, it needs to be just enough to prevent false fs detection by mount_root
+if (( $PADSIZE > 2048 )); then
+  PADSIZE=2048
+fi
+
+# write the boot partition
 dd bs=512 if="$BOOTFS" of="$OUTPUT" seek="$BOOTOFFSET" conv=notrunc
+# pad out some space after rootfs image, overlapping last block
+if (( $PADSIZE > 0 )); then
+  dd bs=512 if=/dev/zero of="$OUTPUT" seek="$PADOFFSET" count="$PADSIZE" conv=notrunc
+fi
+# write the rootfs image which does usually not fill the entire partition
 dd bs=512 if="$ROOTFS" of="$OUTPUT" seek="$ROOTFSOFFSET" conv=notrunc
-
-
-

What do you think?

@lespinasse
Copy link

Hi @plan44, thank you for your comments.

However - this did not make it into openwrt, for some reason? Do you have an idea why?

I would like to see the changes go in, maybe I have not been very active in pushing them due to lack of comments here, but I don't know and haven't heard of any blockers...

I just experimented a bit with your fix, and it took a moment to realize that writing the last block of the partition would fill the in-between with zeroes as well ;-)

Yes, writing with an offset extends the size of the file, without actually allocating disk space for the blocks that were skipped over. The unallocated blocks will later read as zeroes when gzip compresses the image.

However, this looks a bit wasteful to me in terms of expanded image size and flash wear. For example, my sqashfs image is ~15MB, but the partition is 512MB. I think in order to prevent mount_root() falsely detecting a file system, padding 1MB after squashfs should be well enough.

I don't have a strong opinion on this. I went with zeroing the entire partition because that seemed like the most repeatable option - it makes the final state independent of what was previously there. For sure we could overwrite less than that but I don't have any real grasp of whether 1MB would always be enough, if there could be any trailers that also need to be cleared at the end of the partition, etc... I think erasing 1MB as you are proposing would probably work, but I don't know for sure. I think testing is not enough because it could depend on prior state and/or partition size.

I think if we want to limit the size of the cleared space we should go from first principles and find out out much data mount_root() might look at before deciding what to do.

I'm also not sure how much of an issue would flash wear be if we overwrite the entire partition. I would assume even a cheap SD card could still be overwritten a few hundred times before it becomes a problem ?

@plan44
Copy link

plan44 commented Jun 2, 2023

Yes, writing with an offset extends the size of the file, without actually allocating disk space for the blocks that were skipped over. The unallocated blocks will later read as zeroes when gzip compresses the image.

Yeah, I just wasn't aware that this "reads back zeroes from unwritten parts of the file" is actually guaranteed, but it seems it is. And sparse allocation is not something every file system can do.
But as this is only relevant for image creation on the build system, not the target, this is certainly not a problem either way, I agree.

I went with zeroing the entire partition because that seemed like the most repeatable option

Agreed too. But here I see the need for a compromise given that decompression of that image takes place on the target at sysupgrade. It is not unusual to have quite large partitions (in the GBs) on Rpi, so a sysupgrade could end up writing GBs of zeroes.
I don't think mount_root goes very far into the data to find a filesystem marker. My bad luck was that the first superblock of the f2fs was overwritten, but (most likely) the backup superblock a bit into the data was detected. Writing only 250k of zeroes did fix that already, so I figured quadrupling that should be a decent safety margin.

I'm also not sure how much of an issue would flash wear be if we overwrite the entire partition. I would assume even a cheap SD card could still be overwritten a few hundred times before it becomes a problem ?

As flash wear is the only thing that ever killed one of my 24/7 Rpi based devices in the field (thousands), I'm trying to minimize it as much as possible ;-)

@wulfy23
Copy link
Contributor

wulfy23 commented Jun 4, 2023

FYI

imx_sdcard_pre_upgrade() {

@plan44
Copy link

plan44 commented Jun 14, 2023

@wulfy23 Thx, but I do not really get the point - it seems to me this imx_sdcard_pre_upgrade seems to get active only when there is no backup, and does explicitly reset the overlay to avoid it to survive the upgrade by accident, no?

I can image something similar might work on bcm27xx, when it can be done after the backup is created and stored elsewhere (/tmp in RAM).

But killing the current overlay does not guarantee the space where the new overlay will reside is actually free of data that might be taken as a file system by mount_root, as this depends on the new size of the squashfs part.

So I went with @lespinasse's approach padding the image (with a limited padding size, as described above).

This is what all mtd/jffs2-based targets do anyway, using the mtd tool.

As long as there is no equivalent to mtd for the squashfs+f2fs case, having gen_rpi_sdcard_img.sh pad the image seems to me the solution closest to to the "normal OpenWrt way", and one with no risk for making the situation worse.

I will this as a patch to OpenWrt upstream.

wryun added a commit to wryun/openwrt that referenced this issue Oct 26, 2023
This addresses openwrt#9113 by adding extra space to the image
if it doesn't fill the partition.

Credit to @lespinasse and @plan44
wryun added a commit to wryun/openwrt that referenced this issue Oct 26, 2023
This addresses openwrt#9113 by adding extra space to the image
if it doesn't fill the partition. Otherwise if the rootfs
image is exactly the same size, the overlay is in the same
place and is not reinitialised on upgrade (regardless
of whether settings are kept or not).

Credit to @lespinasse and @plan44
@wryun
Copy link

wryun commented Oct 26, 2023

Not sure if you raised your patch on the mailing list, @plan44 , but in case it helps I've raised a PR based on yours that we're using (see above). Thanks!

@plan44
Copy link

plan44 commented Oct 26, 2023

Not sure if you raised your patch on the mailing list, @plan44

I did, but with exactly zero feedback, neither positive nor negative. I have no idea why.

but in case it helps I've raised a PR based on yours that we're using (see above). Thanks!

Thanks @wryun, sure that helps to increase attention a bit - and hopefully to get it merged eventually.

@plan44
Copy link

plan44 commented Oct 28, 2023

@wryun Now I see your PR does something similar, but not entirely the same than what I proposed, and I'm not 100% sure that what happens now is correct when the rootfs is not an integer number of 512 byte blocks (which is often the case with squashfs).

In your PR, you still use the rounded up ROOTFSSIZE, but start zeroing in the next block, which IMHO can leave a gap of up to 511 bytes between end of rootfs and start of zeroed area.

To prevent this, I started the zeroing one block earlier, but did so before writing the rootfs, which would then partially overwrite the first block of the zeroed area.

Maybe all of this could be calulated and executed on the byte rather than block level, but I just did not want to deviate from the 512-block aligned way of the original code too much.

@wryun
Copy link

wryun commented Oct 29, 2023

@plan44 ah, I see, I misunderstood your original PR. I agree that approach is correct, though I believe in practice the overlay code aligns on 64kb blocks.

@plan44
Copy link

plan44 commented Oct 31, 2023

@plan44 ah, I see, I misunderstood your original PR. I agree that approach is correct, though I believe in practice the overlay code aligns on 64kb blocks.

That may be, however the question is what the search strategy of mount_root looks like, e.g. if it starts searching also aligned to 64kb (or 512bytes) blocks or not. I did not want to rely on a particular assumption about mount_root, except that it tries to find a overlay file system marker following the rootfs data end, which is definitely not block-aligned.

@RussellSenior
Copy link
Contributor

This seems to be the same problem as I recently reported in #13980

@RussellSenior
Copy link
Contributor

Fwiw, this seems to have fixed it for me, based on a suggestion on #openwrt-devel irc channel: #13980 (comment)

@lespinasse
Copy link

@RussellSenior - could you clarify what hardware you are running on and what change fixed the issue for you ?

I am confused because, my understanding is that on pi hardware the images are created with target/linux/bcm27xx/image/gen_rpi_sdcard_img.sh , not with scripts/gen_image_generic.sh as with the change you are linking to.

@RussellSenior
Copy link
Contributor

@RussellSenior - could you clarify what hardware you are running on and what change fixed the issue for you ?

I am confused because, my understanding is that on pi hardware the images are created with target/linux/bcm27xx/image/gen_rpi_sdcard_img.sh , not with scripts/gen_image_generic.sh as with the change you are linking to.

It's right there in the name of the issue the linked comment is attached to: watchguard,firebox-m300. Not sure exactly how it translates to the rpi.

@lespinasse
Copy link

Well I asked because this issue is for rpi4 - that's in the title too...

@RussellSenior
Copy link
Contributor

"squashfs overlay retained" is the same problem on my target device (and some other open issues too, afaict). I did notice that this thread was about rpi4. I don't know anything about the specific context for rpi4. I mentioned the parallel issue because it seemed relevantly addressing the same underlying problem.

@lespinasse
Copy link

yes, that makes sense - both image generating scripts have a similar problem and both can be fixed by clearing/padding the proper spots in the generated image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants