OpenWrt/LEDE Project

  • Status Unconfirmed
  • Percent Complete 0%
  • Task Type Bug Report
  • Category Base system
  • Assigned To No-one
  • Operating System All
  • Severity Low
  • Priority Very Low
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by crowston - 20.10.2021

FS#4100 - SQUASHFS errors with OpenWrt 21.02

Supply the following if possible:
- Device problem occurs on

Western Digital My Net N750

- Software versions of OpenWrt/LEDE release, packages, etc.

openwrt-21.02.0

strongswan, dnscrypt-proxy2, avahi-utils, luci-app-ddns

- Steps to reproduce

I installed openwrt-21.02.0-ath79-generic-wd_mynet-n750-squashfs-sysupgrade.bin on a Western Digital My Net N750 that had been running openwrt-19.

The router seemed okay initially but after power cycling, it started reporting errors:

Oct 17 12:20:37 router2 kernel: [ 38.613970] SQUASHFS error: xz decompression failed, data probably corrupt
Oct 17 12:20:37 router2 kernel: [ 38.621029] SQUASHFS error: squashfs_read_data failed to read block 0x23686e
Oct 17 12:20:37 router2 kernel: [ 38.628199] SQUASHFS error: Unable to read fragment cache entry [23686e]
Oct 17 12:20:37 router2 kernel: [ 38.635010] SQUASHFS error: Unable to read page, block 23686e, size 16b28

The filesystem problem would leave some random file damaged, so different services would fail. Over time, the router became less and less functional as more files became inaccessible, and after a few power cycles it wouldn’t boot at all.

I wondered if there was a problem with my old configuration on the new release (though I’m not sure how that could damage the squashfs), so I reinstalled a few more times in different ways, e.g., doing a factory install (openwrt-21.02.0-ath79-generic-wd_mynet-n750-squashfs-factory.bin and then the upgrade) instead of just the upgrade, and configuring from scratch rather than from the backup. But each time I had the same problem with the router.

It wasn’t the same block on different installs, I noticed, but it seemed to be consistent for a particular installation attempt.

Oct 17 16:11:14 router2 kernel: [ 53.182571] SQUASHFS error: xz decompression failed, data probably corrupt
Oct 17 16:11:14 router2 kernel: [ 53.189582] SQUASHFS error: squashfs_read_data failed to read block 0x21e9e6
Oct 17 16:11:14 router2 kernel: [ 53.196749] SQUASHFS error: Unable to read fragment cache entry [21e9e6]
Oct 17 16:11:14 router2 kernel: [ 53.203559] SQUASHFS error: Unable to read page, block 21e9e6, size fd9c

Once, two blocks failed (I think this is a reboot of the install above):

Oct 17 16:29:04 router2 kernel: [ 78.505075] SQUASHFS error: xz decompression failed, data probably corrupt
Oct 17 16:29:04 router2 kernel: [ 78.512103] SQUASHFS error: squashfs_read_data failed to read block 0x1e6e76
Oct 17 16:29:05 router2 kernel: [ 79.111366] SQUASHFS error: xz decompression failed, data probably corrupt
Oct 17 16:29:05 router2 kernel: [ 79.118386] SQUASHFS error: squashfs_read_data failed to read block 0x21e9e6
Oct 17 16:29:05 router2 kernel: [ 79.125565] SQUASHFS error: Unable to read fragment cache entry [21e9e6]
Oct 17 16:29:05 router2 kernel: [ 79.132445] SQUASHFS error: Unable to read page, block 21e9e6, size fd9c
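
To check whether the same block addresses recur across boots of one install, the failing addresses can be tallied from a saved log. The following is a minimal Python sketch and was not part of the original report; the function name `failing_blocks` is hypothetical, and the sample lines are copied from the logs above.

```python
import re
from collections import Counter

# Matches the address in "squashfs_read_data failed to read block 0x..." lines.
BLOCK_RE = re.compile(
    r"squashfs_read_data failed to read block (0x[0-9a-fA-F]+)"
)


def failing_blocks(log_text):
    """Count how often each block address appears in SQUASHFS read failures."""
    return Counter(int(m.group(1), 16) for m in BLOCK_RE.finditer(log_text))


if __name__ == "__main__":
    # Sample lines taken from the kernel logs quoted in this report.
    sample = (
        "Oct 17 16:11:14 router2 kernel: [ 53.189582] SQUASHFS error: "
        "squashfs_read_data failed to read block 0x21e9e6\n"
        "Oct 17 16:29:04 router2 kernel: [ 78.512103] SQUASHFS error: "
        "squashfs_read_data failed to read block 0x1e6e76\n"
        "Oct 17 16:29:05 router2 kernel: [ 79.118386] SQUASHFS error: "
        "squashfs_read_data failed to read block 0x21e9e6\n"
    )
    # Print each failing block address with its occurrence count.
    for block, count in sorted(failing_blocks(sample).items()):
        print(f"{block:#x}: {count}")
```

An address that shows up on every boot of the same install would be consistent with a fixed region of the image failing to decompress, rather than random read errors.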

One time there was first a jffs error, followed by lots of squashfs errors. Sorry, I don’t have the log for that one.

I now realize that I should have tried power cycling a clean install a few times to see if there were errors right away or if they only happened after files were installed/changed.

To check whether the router was just having a hardware problem, I reinstalled openwrt-19.07.8 and configured it the same way. I have not seen any errors after a few power cycles, which points to a problem with the new release. I did not see any bug reports on this tracker that mention squashfs problems, and a web search did not turn up any useful discussions, hence this bug report.

I guess it could be that the new release uses a bad bit of memory that the earlier release managed to miss. I looked for a memory test utility but didn’t find one, so I don’t know how to examine that possibility. However, the fact that different blocks failed on each install makes a hardware problem seem unlikely.

crowston commented on 22.10.2021 15:36

I tried installing on a different router and after a few power cycles saw the same SQUASHFS errors, suggesting it's not just bad memory:

Fri Oct 22 11:30:14 2021 kern.err kernel: [ 97.569402] SQUASHFS error: xz decompression failed, data probably corrupt
Fri Oct 22 11:30:14 2021 kern.err kernel: [ 97.576445] SQUASHFS error: squashfs_read_data failed to read block 0x4b5872
Fri Oct 22 11:30:14 2021 kern.err kernel: [ 97.584696] SQUASHFS error: xz decompression failed, data probably corrupt
Fri Oct 22 11:30:14 2021 kern.err kernel: [ 97.591837] SQUASHFS error: squashfs_read_data failed to read block 0x4b5872

But most of the time it seems to work fine.

M95D commented on 24.10.2021 08:26

I have this exact problem with WRT1900ACv1, OpenWrt built from git master. It won't boot at all with the new firmware.

M95D commented on 24.10.2021 11:57

More debugging:

Apparently, the image is not correctly written to flash. Reading back the squashfs and trying to mount it on an x86 Gentoo Linux machine gives the same decompression errors.

See attachment for details.
The router is booted from the working firmware (mtd5). mtd7 is the new defective firmware.

M95D commented on 24.10.2021 13:07

Even more debugging:

I extracted the squashfs from the original firmware image that was uploaded to the router and compared it with the copy read back from flash. They are identical, except for some extra 0xFF bytes at the end (the ubifs read back from the router's mtd is larger, probably because it extends to the end of the erase block).

So, it's not a flash write issue, and it's not a hardware defect.
Both squashfs images can be extracted with the unsquashfs tool without any errors, so there must be something wrong with the kernel xz decompressor. This affects both my router and my x64 Gentoo machine. Both kernels are v5.10.
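
A comparison of this kind, which treats trailing 0xFF bytes as erased-flash padding, could be sketched as follows. This is an illustrative Python sketch, not the exact procedure used above; the function name `same_image` and the in-memory byte strings are hypothetical stand-ins for the two image files.

```python
def same_image(original: bytes, readback: bytes) -> bool:
    """Return True if readback is original followed only by 0xFF padding.

    Erased NOR/NAND flash reads back as 0xFF, so a dump that runs to the
    end of an erase block is expected to carry such padding.
    """
    head = readback[:len(original)]
    tail = readback[len(original):]
    # The leading part must match byte for byte, and whatever follows
    # must consist solely of 0xFF bytes (an empty tail also qualifies).
    return head == original and tail.strip(b"\xff") == b""


if __name__ == "__main__":
    # Hypothetical stand-ins; in practice these would be the contents of
    # the extracted squashfs and of the mtd dump, read in binary mode.
    original = b"hsqs-example-image"
    readback = original + b"\xff" * 16
    print(same_image(original, readback))
```

If this check passes but the kernel still fails to decompress the image, the write to flash is unlikely to be the cause, which matches the conclusion drawn above.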

M95D commented on 29.10.2021 07:25

It seems that the ARM BCJ filter decoder is needed in the kernel, even on the desktop. Having only the x86 BCJ filter decoder does not help.

Maybe a warning should be put somewhere to alert users who alter the default kernel config.
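
One way to see which BCJ decoders a given kernel was built with is to scan its config text for the `CONFIG_XZ_DEC_*` options. The following is a minimal Python sketch under stated assumptions: the option list is what I believe lib/xz/Kconfig offers in v5.10 kernels and should be checked against the actual source tree, the function name `bcj_status` is hypothetical, and the sample config is made up for illustration. On a real system the text might come from /proc/config.gz or /boot/config-*, if the kernel exposes either.

```python
import re

# BCJ decoder options, assumed to match lib/xz/Kconfig in v5.10 kernels.
BCJ_OPTIONS = (
    "CONFIG_XZ_DEC_X86",
    "CONFIG_XZ_DEC_POWERPC",
    "CONFIG_XZ_DEC_IA64",
    "CONFIG_XZ_DEC_ARM",
    "CONFIG_XZ_DEC_ARMTHUMB",
    "CONFIG_XZ_DEC_SPARC",
)


def bcj_status(config_text):
    """Map each BCJ decoder option to True if it is set to 'y'."""
    enabled = set(
        re.findall(r"^(CONFIG_XZ_DEC_\w+)=y$", config_text, flags=re.MULTILINE)
    )
    return {option: option in enabled for option in BCJ_OPTIONS}


if __name__ == "__main__":
    # Made-up config fragment: x86 decoder enabled, ARM decoder disabled.
    sample_config = (
        "CONFIG_XZ_DEC=y\n"
        "CONFIG_XZ_DEC_X86=y\n"
        "# CONFIG_XZ_DEC_ARM is not set\n"
    )
    for option, is_set in bcj_status(sample_config).items():
        print(f"{option}: {'enabled' if is_set else 'missing'}")
```

A kernel that lacks the decoder for the filter used when the squashfs was built would be expected to fail on exactly those blocks, while a userspace tool such as unsquashfs, which does not depend on the kernel decoder, can still extract the image.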

Brian commented on 17.11.2021 23:53

My WD Mynet N750 is also unstable and also displays these same errors in the log.

Dana commented on 30.11.2021 00:53

I am also seeing this on a WD MyNet N750, starting with 21.02.1. I made an attempt to build a kernel/image with ARM BCJ pinned to the kernel and it did not make a difference.
