OpenWrt/LEDE Project

  • Status Unconfirmed
  • Percent Complete 0%
  • Task Type Bug Report
  • Category Base system
  • Assigned To No-one
  • Operating System All
  • Severity Medium
  • Priority Very Low
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Leon George - 21.07.2020

FS#3241 - temporary flash failure on ipq40xx device (wpj428)

Hello :-)

My employer has noticed a small fraction of devices failing with a trunk-based software image (OpenWrt SNAPSHOT, r13134+521-f57230c4e6) on the WPJ428 platform (ipq40xx).

Messages like these appear in syslog:

```
Tue Jul 21 13:16:39 2020 daemon.err node-comm[27021]: Error loading shared library libevent_openssl-2.1.so.7: I/O error (needed by /usr/bin/node-comm-mqtt)
Tue Jul 21 13:16:39 2020 kern.err kernel: [523126.625066] SQUASHFS error: Unable to read fragment cache entry [3c56aa]
Tue Jul 21 13:16:39 2020 kern.err kernel: [523126.625114] SQUASHFS error: Unable to read page, block 3c56aa, size 1522c
```

After a reboot, the problem goes away (probably because it is very unlikely to occur twice in a row).

The problem occurs with various flash chip revisions, so we believe it is a driver issue.

On an (un-)lucky day, the error occurred on my device and I created two dumps of /dev/mtd8ro (the whole 32M of flash), one while the error was occurring and another after the reboot.
In the error state, 1290 consecutive bytes read as FF (reliably, across multiple dd runs).
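
A comparison of the two dumps could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used for this report. The function name `find_ff_runs` and the file names in the usage note are hypothetical.

```python
def find_ff_runs(bad: bytes, good: bytes) -> list:
    """Return (start_offset, length) for each run where `bad` reads 0xFF
    but `good` holds a different value. Offsets are 0-based."""
    runs = []
    start = None
    for offset, (b, g) in enumerate(zip(bad, good)):
        if b == 0xFF and g != 0xFF:
            # Inside a run of bytes that only read as FF in the bad dump.
            if start is None:
                start = offset
        elif start is not None:
            # The run ended on the previous byte.
            runs.append((start, offset - start))
            start = None
    if start is not None:
        # The run extends to the end of the shorter dump.
        runs.append((start, min(len(bad), len(good)) - start))
    return runs
```

It could be called as `find_ff_runs(open("error.bin", "rb").read(), open("ok.bin", "rb").read())`. A byte that is legitimately FF in the good dump splits a run in two, so the reported lengths are a lower bound.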

The diff between the dump taken in the error state and the one taken after the reboot looks like this (`cmp -l` output converted to hex; xx marks redacted bytes):

```
01BB36BD FF xx
01BB36BE FF xx
01BB36BF FF xx
01BB36C0 FF xx
...
01BB3BC7 FF xx
```
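
For reference, the hex conversion of `cmp -l` output can be sketched as below. `cmp -l` prints the byte number in decimal (counted from 1) and the two differing byte values in octal. The helper name `cmp_line_to_hex` is hypothetical, and the sketch leaves the byte number as `cmp` reports it. Whether the offsets listed above were adjusted to 0-based is not stated in the report.

```python
def cmp_line_to_hex(line: str) -> str:
    """Convert one `cmp -l` output line to hex.

    Input fields: decimal byte number, then the two byte values in octal.
    Output: 8-digit hex offset and two 2-digit hex byte values.
    """
    offset, first, second = line.split()
    return "%08X %02X %02X" % (int(offset), int(first, 8), int(second, 8))
```

For example, `cmp_line_to_hex("29046461 377 141")` gives `01BB36BD FF 61`. The second byte value here is made up for illustration, since the real values are redacted above.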

The syslog above belongs to the same occurrence as the diff.
It’s worth noting that the file that couldn’t be read is in the ROM portion of the flash while the offset of the diff is near the end.

I’ve reached the limits of my knowledge. If there’s anything else that would be interesting to know from the error state, let me know and I’ll see what I can do.

Leon George commented on 21.07.2020 14:27

Here is the information about partition and MTD sizes:

```
$ df -h
Filesystem                Size      Used Available Use% Mounted on
/dev/root                 5.8M      5.8M         0 100% /rom
tmpfs                   121.9M    124.0K    121.8M   0% /tmp
/dev/mtdblock11          21.3M    692.0K     20.6M   3% /overlay
overlayfs:/overlay       21.3M    692.0K     20.6M   3% /
tmpfs                   512.0K         0    512.0K   0% /dev
$ cat /proc/mtd
dev:    size   erasesize  name
mtd0: 00040000 00010000 "0:SBL1"
mtd1: 00020000 00010000 "0:MIBIB"
mtd2: 00060000 00010000 "0:QSEE"
mtd3: 00010000 00010000 "0:CDT"
mtd4: 00010000 00010000 "0:DDRPARAMS"
mtd5: 00010000 00010000 "0:APPSBLENV"
mtd6: 00080000 00010000 "0:APPSBL"
mtd7: 00010000 00010000 "0:ART"
mtd8: 01e80000 00010000 "firmware"
mtd9: 00390000 00010000 "kernel"
mtd10: 01af6bdc 00010000 "rootfs"
mtd11: 01540000 00010000 "rootfs_data"
```
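
To relate the diff offsets to this table, a small parser for `/proc/mtd` plus an erase-block calculation can help. This is only a sketch. The names `parse_proc_mtd` and `erase_block_index` are hypothetical, and it assumes the diff offsets are relative to the start of mtd8 ("firmware").

```python
def parse_proc_mtd(text: str) -> dict:
    """Map partition name -> (device, size_bytes, erasesize_bytes)."""
    parts = {}
    for line in text.splitlines():
        fields = line.split(None, 3)
        # Skip the "dev: size erasesize name" header and blank lines.
        if len(fields) != 4 or not fields[0].startswith("mtd"):
            continue
        dev, size, erasesize, name = fields
        # Sizes in /proc/mtd are hexadecimal without a 0x prefix.
        parts[name.strip().strip('"')] = (
            dev.rstrip(":"), int(size, 16), int(erasesize, 16))
    return parts


def erase_block_index(offset: int, erasesize: int) -> int:
    """Index of the erase block containing `offset` (assumed 0-based)."""
    return offset // erasesize
```

With the erase size of 0x10000 (64 KiB) listed above, both ends of the diff, 01BB36BD and 01BB3BC7, fall into erase block 0x1BB. Under the stated assumption, the affected bytes therefore sit within a single erase block.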

Leon George commented on 21.07.2020 15:45

On two other routers, the same block is affected:

```
Tue Jul 21 15:43:29 2020 kern.err kernel: [1257728.664909] SQUASHFS error: Unable to read fragment cache entry [3c56aa]
Tue Jul 21 15:43:29 2020 kern.err kernel: [1257728.664949] SQUASHFS error: Unable to read page, block 3c56aa, size 1522c

Tue Jul 21 15:40:49 2020 kern.err kernel: [1770438.768700] SQUASHFS error: Unable to read fragment cache entry [3c56aa]
Tue Jul 21 15:40:49 2020 kern.err kernel: [1770438.768741] SQUASHFS error: Unable to read page, block 3c56aa, size 1522c
```
