OpenWrt/LEDE Project

  • Status Assigned
  • Percent Complete
  • Task Type Bug Report
  • Category Base system
  • Assigned To Alexander Couzens
  • Operating System All
  • Severity Critical
  • Priority Very Low
  • Reported Version lede-17.01
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Rob White - 24.07.2017

FS#927 - SQUASHFS error: xz decompression failed

Ubiquiti AirGateway and Bullet M2
LEDE 17.01.2

Runs very well on boot up. Then after some interval ranging from minutes to days, the following typical error occurs, repeated numerous times with differing blocks:
Mon Jul 24 07:09:39 2017 kern.err kernel: [43650.730023] SQUASHFS error: xz decompression failed, data probably corrupt
Mon Jul 24 07:09:39 2017 kern.err kernel: [43650.735459] SQUASHFS error: squashfs_read_data failed to read block 0x1e5d9a
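
Since the errors repeat with differing blocks, it can help to tally which block values actually fail. The following is a minimal sketch, not part of the original report: the regular expression is an assumption based on the wording of the two log lines above, and `failed_blocks` is a hypothetical helper name. If the same few blocks failed every time, that might point towards bad flash; widely varying blocks would fit a transient cause better.

```python
import re
from collections import Counter

# Pattern assumed from the wording of the log lines quoted in the report.
BLOCK_RE = re.compile(r"squashfs_read_data failed to read block (0x[0-9a-fA-F]+)")

def failed_blocks(log_text):
    """Return a Counter mapping block value (int) -> number of read failures."""
    counts = Counter()
    for line in log_text.splitlines():
        match = BLOCK_RE.search(line)
        if match:
            counts[int(match.group(1), 16)] += 1
    return counts

# The two sample lines are copied verbatim from the report.
SAMPLE_LOG = """\
Mon Jul 24 07:09:39 2017 kern.err kernel: [43650.730023] SQUASHFS error: xz decompression failed, data probably corrupt
Mon Jul 24 07:09:39 2017 kern.err kernel: [43650.735459] SQUASHFS error: squashfs_read_data failed to read block 0x1e5d9a
"""

if __name__ == "__main__":
    for block, hits in sorted(failed_blocks(SAMPLE_LOG).items()):
        print(f"block {block:#x}: {hits} failure(s)")
```

On a device, the text passed to `failed_blocks` would come from the saved output of `logread` or `dmesg`.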

Following this, the unit becomes unresponsive or very slow, or reboots itself.
I tried this on two different devices and got the same result.

Images were produced with Imagebuilder, with ipv6, usb, ppp and luci removed to free up space on flash.

The exact same config on OpenWrt CC (but with php5) gives no problems.

root@BlueWave:~# free
           total       used       free     shared    buffers     cached
Mem:       28176      19956       8220        132       1460       4296
-/+ buffers/cache:    14200      13976
Swap:          0          0          0

root@BlueWave:~# df -h
Filesystem               Size      Used Available Use% Mounted on
/dev/root                4.8M      4.8M         0 100% /rom
tmpfs                   13.8M    132.0K     13.6M   1% /tmp
/dev/mtdblock5           1.4M    508.0K    964.0K  35% /overlay
overlayfs:/overlay       1.4M    508.0K    964.0K  35% /
tmpfs                  512.0K         0    512.0K   0% /dev
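
For readers checking the `free` figures above, the "-/+ buffers/cache" row can be recomputed from the "Mem:" row. This is a small worked example and not part of the original report; `buffers_cache_row` is a hypothetical helper name, and the values are in KB as printed by `free`.

```python
def buffers_cache_row(used, free, buffers, cached):
    """Return (used, free) in KB with buffers and page cache counted as free."""
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable

# Values in KB taken from the "Mem:" row of the report.
used, free, buffers, cached = 19956, 8220, 1460, 4296
adj_used, adj_free = buffers_cache_row(used, free, buffers, cached)
print(adj_used, adj_free)  # 14200 13976, matching the "-/+ buffers/cache" row
```

So at the time of this snapshot about 8 MB was strictly free, and about 13.6 MB if buffers and cache are counted as reclaimable.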

psyborg55 commented on 24.07.2017 08:57

32MB of RAM is the problem

Rob White commented on 24.07.2017 09:54

Just saying 32MB of RAM is the problem is not helpful.
Can you be more specific?
I can set up a test to drop free ram to less than 1MB and start to see a slowdown but no errors.
I can believe this is the problem, but have no ACTUAL evidence.

psyborg55 commented on 24.07.2017 10:02

sysupgrading >8MB image on 32MB device just yesterday gave me same errors once or twice - device booted but wifi did not work, other attempts resulted in non-bootable device.

flashing any of the images from u-boot webserver - device booted successfully and wifi worked fine.

Rob White commented on 24.07.2017 10:43

The image I have built is 6.2MB and both tftp and sysupgrade work fine with no issues other than this.
Devices normally sit around 7 to 9 MB free ram, dropping occasionally to 1.5MB under high load, very quickly recovering.
The only problem is the occasional squashfs read errors, some of which are terminal.
My first thoughts were flash failure but multiple devices show the same symptoms.

Currently I am thinking the xz decompression is using excessive amounts of ram and failing.
I do have config and data files built into the squashfs but the largest is only a few KB.

psyborg55 commented on 24.07.2017 11:04

kernels >4 are pure bullshit.

Project Manager
Alexander Couzens commented on 10.08.2017 04:45

@Rob White can you share your images? Or maybe your image config so I can reproduce your problem?

Rob White commented on 10.08.2017 13:05

@Alexander Couzens
Yes, I can build an image with some test scripts to force the problem.
What hardware do you have available?
I have airGateway, airRouter, Bullet M2 and Nanostation M2.
Alternatively I could provide a makeimage script and files folder for Imagebuilder.

It is true that this seems to occur only on 32MB devices. However, the idle free memory does not actually differ much from OpenWrt CC, which leads me to suspect that some issue is causing LEDE to use excessive memory, most likely the xz decompression. If that is found to be true, fixing it could make 32MB devices much more stable with LEDE.

Project Manager
Alexander Couzens commented on 10.08.2017 13:33

@bluewavenet I have a Bullet M2 and a Nanostation M2 available. It might also help to reproduce the bug on a qemu target (e.g. mips qemu) or x86.

Rob White commented on 10.08.2017 14:58

@Alexander Couzens
I'll make an image for the Bullet M2 then. Might take a few days to find the time.... ;-) I could not reproduce the bug on x86 as the smallest ram config I have is 256MB :-D

diizzyy commented on 25.08.2017 09:25

You're most likely running into issues because underlying subsystems are running out of memory, especially if you're running additional services.

Lucian CRISTIAN commented on 03.10.2017 14:57

BeagleBone Black has 512MB RAM and it fails the same way on squashfs

Project Manager
Koen Vandeputte commented on 04.10.2017 20:41

Same error seen a few times on my gw2388 boards. (cns3xxx)

- 16MB NOR using squashfs (5MB free)
- 256MB RAM (>80MB free after full boot)

I recall this suddenly popped up a few months ago.

psyborg commented on 01.12.2018 00:17

have you opened device and modified anything?

Rob White commented on 01.12.2018 20:58

Having posted originally over a year ago and long since moved on, this seems to have sprung back to life!

I did loads of tests at the time and the conclusion was indeed shortage of free RAM.
The decompression needs, at a guess, at least 2 or 3 times the size of the file being decompressed as free RAM.

Particularly when running a web server, lots of RAM is used for buffering and in my case this was happening at the same time as php scripts were being decompressed to serve yet more. At idle, yes there was 5 or 6 MB of free ram but this could easily fall to less than 1MB just briefly and if a compressed script was opened just at that moment - well, no guesses needed for what happened.
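
A brief dip in free RAM like the one described above is easy to miss with a single `free` call. The sketch below is illustrative only and was not used in the original tests: it samples the `MemFree` field of `/proc/meminfo` repeatedly and keeps the lowest value seen. The function names and the sampling defaults are assumptions, and it is meant for a Linux test host with Python available rather than a 32MB router.

```python
import time

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info

def watch_min_free(samples=50, interval=0.1, path="/proc/meminfo"):
    """Sample MemFree repeatedly and return the lowest value seen (kB)."""
    lowest = None
    for _ in range(samples):
        with open(path) as handle:
            free_kb = parse_meminfo(handle.read()).get("MemFree")
        if free_kb is not None and (lowest is None or free_kb < lowest):
            lowest = free_kb
        time.sleep(interval)
    return lowest
```

Calling `watch_min_free()` while the web server is under load gives a rough lower bound on free memory. Dips shorter than the sampling interval can still go unnoticed.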

I could replicate this easily on just about any device, even a PC Engines APU2 with a GB of free RAM, by using up the RAM of course (a PHP script appending a string to itself in memory in a loop, for example).

If free RAM is not enough for the decompression then it fails.
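
The memory-pressure test described above can be approximated with a few lines of code. This is a Python analogue of the PHP loop mentioned in the comment, not the script actually used: `hog_memory` and its parameters are hypothetical names, and the limit should be chosen to leave only a small amount of RAM free on the test device.

```python
def hog_memory(limit_mb, chunk_mb=1):
    """Hold roughly limit_mb of RAM in chunk_mb-sized blocks and return them."""
    held = []
    chunk_bytes = chunk_mb * 1024 * 1024
    while (len(held) + 1) * chunk_mb <= limit_mb:
        # Non-zero fill, so the pages are really written rather than
        # merely reserved.
        held.append(b"\x01" * chunk_bytes)
    return held
```

Keeping the returned list alive (for example `blocks = hog_memory(900)` on a 1 GB machine) holds the memory until the process exits. While it is held, reading files from the squashfs root is the step that would be expected to trigger the decompression errors, if the conclusion above is right.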

I now use GL-AR300s with 128MB RAM and never have the problem.

32 MB RAM devices are just fine with 18.06.1 as long as you just want a simple AP, and this is what I do with Ubiquiti BULLET, particularly now the price is dropping (as it is end of life).
It makes a very good outdoor AP, as does the NANO Station and LOCO. Some good prices on Amazon etc if you want outdoor equipment.

psyborg commented on 01.12.2018 21:38

but you wrote "The exact same config on OpenWrt CC (but with php5) gives no problems."

what if you try to replicate that on APU2 running CC? in addition to possible hardware mod issues looks like a kernel has grown to shit too

Project Manager
Koen Vandeputte commented on 19.06.2019 21:46

I had this issue too in the past on my cns3xxx boards (256MB RAM).
This started happening suddenly when I upgraded to the latest master at that time.

IIRC, it lasted for about a month while I was bumping to master head on a daily basis.
At some point the issue was gone completely.

I guess it got fixed along the way and this hasn't been an issue for over a year now.

So I think it can be closed.

