FS#1926 - MTD partition offset not correctly mapped when bad eraseblocks present #7298

openwrt-bot · 2018-11-02T16:05:30Z

dksl3:

Device problem occurs on: Netgear R6220
Software versions of OpenWrt/LEDE: OpenWrt 18.06.1, r7258-5eb055306f

When OpenWrt detects a bad eraseblock, all following offsets are shifted by one eraseblock.

I'll try to explain this issue better with an example.
Here is the relevant part of the kernel log:
[ 2.853468] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1
[ 2.866112] nand: Macronix NAND 128MiB 3,3V 8-bit
[ 2.875473] nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB si4
[ 2.890555] Scanning device for bad blocks
[ 2.969549] Bad eraseblock 266 at 0x000002140000
[ 3.096049] Bad eraseblock 708 at 0x000005880000
[ 3.189001] 6 fixed-partitions partitions found on MTD device MT7621-NAND
[ 3.202518] Creating 6 MTD partitions on "MT7621-NAND":
[ 3.212922] 0x000000000000-0x000000100000 : "u-boot"
[ 3.223925] 0x000000100000-0x000000200000 : "SC PID"
[ 3.234878] 0x000000200000-0x000000600000 : "kernel"
[ 3.245854] 0x000000600000-0x000002200000 : "ubi"
[ 3.256476] 0x000002e00000-0x000002f00000 : "factory"
[ 3.267585] 0x000004200000-0x000007e00000 : "reserved"
[ 3.279423] [mtk_nand] probe successfully!

As you can see, there are two bad eraseblocks. Let's ignore the second one, since it lies after the 'factory' partition, near the end of the flash.
The kernel states that the 'factory' partition starts at 0x2e00000 (which is correct), but in reality OpenWrt reads the partition from 0x2e20000 (0x2e00000 + 1 * 128 KiB).
People who have three bad eraseblocks before the factory partition reported that the content of their mtd4 (factory) partition matches what is in NAND at 0x2e60000 (0x2e00000 + 3 * 128 KiB).
This issue led to the wrong belief that there is more than one flash layout for this device, as also reported on the OpenWrt device page (https://openwrt.org/toh/netgear/netgear_r6220).
A quick check with the sc_nand r command at the U-Boot prompt can confirm this behavior.


openwrt-bot · 2019-01-09T20:33:17Z

jwh7:

Given the severity of this bug, shouldn't the priority be raised?

openwrt-bot · 2019-01-10T11:55:29Z

rmilecki:

dksl3 wrote: "OpenWrt will search for the partition at 0x2e20000 (2e00000 + (1 * 128KiB))"

This is very vague.

This whole report is about a problem reading Wi-Fi EEPROM data from the flash partition "factory". The R6220 uses the following entries in its DTS:
mediatek,mtd-eeprom = <&factory 0x0000>;
mediatek,mtd-eeprom = <&factory 0x8000>;

If you look at mt76_get_of_eeprom(), you'll see it simply uses mtd_read() to read the EEPROM content from flash. So in the R6220 case it translates into:
mtd_read(mtd, 0x0000, len, ..., ...)
mtd_read(mtd, 0x8000, len, ..., ...)

Internally, MTD then calculates the absolute offset and asks the NAND driver to read flash content from 0x2e00000 and 0x2e08000.

The real problem is the NAND flash driver. It contains something called BMT, which is some crazy translation of NAND pages. It tries to be smart and handle bad blocks magically, as if they didn't exist. That doesn't fit at all with the fixed partition layout the R6220 uses.

Apparently, when the NAND flash driver gets a request to read the page at 0x2e00000, it returns the page containing the data at 0x2e20000. There is nothing wrong with the mt76 driver or the MTD subsystem.

All "solutions" such as adjusting the partitions or the mediatek,mtd-eeprom offset are only hacky workarounds for the unexpected NAND driver behavior.

openwrt-bot · 2019-01-10T16:56:59Z

jow-:

Can you please do a local build with the following change applied and see if it fixes the issue?

Before flashing an image with this change, make sure you're able to recover the device through TFTP if needed.

diff --git a/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch b/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch
index d50e689110..03b2b36db9 100644
--- a/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch
+++ b/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch
@@ -3578,7 +3578,7 @@ Signed-off-by: John Crispin <blogic@openwrt.org>
 if (!err) {
         MSG(INIT, "[mtk_nand] probe successfully!\n");
         nand_disable_clock();
-+              shift_on_bbt = 1;
++              shift_on_bbt = 0;
         if (load_fact_bbt(mtd) == 0) {
                 int i;
                 for (i = 0; i < 0x100; i++)

openwrt-bot · 2019-01-14T17:26:19Z

dksl3:

This patch solved the issue!!!

openwrt-bot · 2019-01-16T06:32:44Z

jow-:

Great - thanks for confirming. I am still waiting for my test hardware to arrive in order to do some more thorough fixing of the NAND driver. I am not sure if there's still some shifting / retry logic left in the write path.

openwrt-bot · 2019-01-20T06:25:59Z

ptpt52:

Disabling shift_on_bbt on NAND flash is a bad idea.

This may cause writes to the flash to fail or the written data to be corrupted.

It depends on where the bad block is and where the data is written.

openwrt-bot · 2019-01-21T10:28:50Z

jogo:

ptpt52 wrote: "This may cause the data written on the flash to fail or be corrupted."

This is just how NAND flash works. NAND-aware filesystems (and FTLs) expect this to happen and can handle it. They also expect the NAND controller driver to read and write exactly where they are told to, and to report whether an operation failed or whether bit errors had to be corrected, not to internally remap the access to some random location.

openwrt-bot · 2019-02-20T18:05:04Z

jwh7:

@Jo-Philipp Wich: were you able to use that test hardware for further investigation/improvements of this fix? Thanks!

openwrt-bot · 2019-03-18T23:28:44Z

superfes:

Just FYI, I've applied this patch to my R6220 and it is able to keep my 5G network up.

After a while, though, the router becomes a little unstable and I have to reboot it (I built from master, so this may have nothing to do with this patch).

I don't know what everybody else's experience with this bug has been, but for me the 5G network would shut down after ~20 minutes or so, and to get it back I had to reboot the router.

openwrt-bot · 2019-05-12T19:56:44Z

frost242:

Hello,
I've applied this patch too, on the 18.06.2 branch. The bad eraseblock problem went away and OpenWrt was able to read the MAC addresses from the flash.
The router is also rock stable and works flawlessly as our home Wi-Fi AP.

openwrt-bot · 2019-07-10T20:37:15Z

jwh7:

jow- wrote: "I am still waiting for my test hardware to arrive in order to do some more thorough fixing of the NAND driver."

@Jo-Philipp Wich: just checking in, has this testing been planned? Thanks!

openwrt-bot · 2019-07-16T08:05:35Z

th0m4s:

I can also confirm that the firmware from https://github.com/jayanta525/openwrt-netgear-r6220-100ins is working fine.

openwrt-bot · 2019-07-20T12:22:10Z

ptpt52:

Tested OK with this patch.

openwrt-bot · 2019-08-31T11:23:24Z

bjonglez:

The patch was committed to master: https://git.openwrt.org/527832e54bf3bc4d699a145ae66f34230246f0a9

It probably needs a backport to 19.07, and possibly also 18.06?

openwrt-bot · 2019-09-09T17:04:38Z

jwh7:

This is ported to 19.07 now:
https://git.openwrt.org/b8b62b8506f5465331e749799c36ef49160036f4

openwrt-bot · 2019-10-12T20:44:00Z

Ingvix:

Hey, I'd like to know whether this patch is already in some prebuilt packages I can use to update my router, or whether I currently need to do some building mumbo jumbo (which isn't really in my comfort zone) to get it onto my system.

openwrt-bot · 2019-10-18T17:41:36Z

jwh7:

@Ingvix You can use the daily snapshots by following the install instructions on the R6220 device page, and then manually install LuCI (see the docs) and any other packages you need.

openwrt-bot · 2019-10-18T17:42:50Z

jwh7:
