OpenWrt/LEDE Project

  • Status Requires testing
  • Percent Complete
    0%
  • Task Type Bug Report
  • Category Kernel
  • Assigned To No-one
  • Operating System All
  • Severity High
  • Priority Very Low
  • Reported Version openwrt-18.06
  • Due in Version Undecided
  • Due Date Undecided
  • Votes 11
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Marco - 02.11.2018

FS#1926 - MTD partition offset not correctly mapped when bad eraseblocks present

- Device problem occurs on

        Netgear R6220

- Software versions of OpenWrt/LEDE

        OpenWrt 18.06.1, r7258-5eb055306f

When OpenWrt detects a bad eraseblock, all following offsets are shifted by one eraseblock.

I’ll try to explain this issue with an example.
This is the situation in the kernel log:

[    2.853468] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1         
[    2.866112] nand: Macronix NAND 128MiB 3,3V 8-bit                            
[    2.875473] nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB si4
[    2.890555] Scanning device for bad blocks                                   
[    2.969549] Bad eraseblock 266 at 0x000002140000                             
[    3.096049] Bad eraseblock 708 at 0x000005880000                             
[    3.189001] 6 fixed-partitions partitions found on MTD device MT7621-NAND    
[    3.202518] Creating 6 MTD partitions on "MT7621-NAND":                      
[    3.212922] 0x000000000000-0x000000100000 : "u-boot"                         
[    3.223925] 0x000000100000-0x000000200000 : "SC PID"                         
[    3.234878] 0x000000200000-0x000000600000 : "kernel"                         
[    3.245854] 0x000000600000-0x000002200000 : "ubi"                            
[    3.256476] 0x000002e00000-0x000002f00000 : "factory"                        
[    3.267585] 0x000004200000-0x000007e00000 : "reserved"                       
[    3.279423] [mtk_nand] probe successfully!                                   
 

As you can see, there are 2 bad eraseblocks. Let’s ignore the second one, since it lies after the ‘factory’ partition (inside ‘reserved’) and does not affect it.
The kernel states that the ‘factory’ partition starts at 0x2e00000 (that’s correct), but in reality OpenWrt will search for the partition at 0x2e20000 (2e00000 + (1 * 128KiB)).
People who have 3 bad eraseblocks before the factory partition reported that the content of their mtd4 (factory) partition reflects what is in NAND at 0x2e60000 (0x2e00000 + (3 * 128KiB)).
This issue led to the wrong belief that there is more than one flash layout for this device, as also reported on the OpenWrt device page.
A quick check with

sc_nand r

from the U-Boot prompt can confirm this behavior.
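
The arithmetic in this report can be checked with a minimal sketch in C. It only restates the observation (one 128 KiB eraseblock of shift per bad eraseblock below the partition start) and is not taken from any driver. The function names are hypothetical, and the eraseblock size comes from the "erase size: 128 KiB" line in the log.

```c
#include <stddef.h>
#include <stdint.h>

/* 128 KiB, taken from "erase size: 128 KiB" in the kernel log. */
#define ERASEBLOCK_SIZE 0x20000u

/* Flash address of an eraseblock, e.g. block 266 -> 0x2140000. */
uint32_t eraseblock_addr(uint32_t block)
{
    return block * ERASEBLOCK_SIZE;
}

/* Number of bad eraseblocks located below part_start. */
unsigned bad_blocks_before(const uint32_t *bad, size_t n, uint32_t part_start)
{
    unsigned count = 0;

    for (size_t i = 0; i < n; i++)
        if (eraseblock_addr(bad[i]) < part_start)
            count++;
    return count;
}

/*
 * Hypothetical helper: where the partition content is actually observed,
 * assuming one eraseblock of shift per preceding bad eraseblock.
 */
uint32_t observed_start(const uint32_t *bad, size_t n, uint32_t part_start)
{
    return part_start + bad_blocks_before(bad, n, part_start) * ERASEBLOCK_SIZE;
}
```

With the bad eraseblocks 266 and 708 from the log above, observed_start() for the partition at 0x2e00000 gives 0x2e20000. With three bad eraseblocks below the partition it gives 0x2e60000, matching the other reports.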


Jeremy commented on 09.01.2019 20:33

Given the severity of this bug, shouldn't the priority be raised?

Project Manager
Rafał Miłecki commented on 10.01.2019 11:55
OpenWrt will search for the partition at 0x2e20000 (2e00000 + (1 * 128KiB))

This is very vague.


This whole report is about a problem with reading WiFi EEPROM data from the flash partition "factory". The R6220 uses the following entries in its DTS:

mediatek,mtd-eeprom = <&factory 0x0000>;
mediatek,mtd-eeprom = <&factory 0x8000>;

If you look at mt76_get_of_eeprom() you'll see that it simply uses mtd_read() to read the EEPROM data from flash. So in the R6220 case this gets translated into:

mtd_read(mtd, 0x0000, len, ..., ...)
mtd_read(mtd, 0x8000, len, ..., ...)

Now, mtd will internally calculate an absolute offset and ask the NAND driver to read the flash content at 0x2e00000 and 0x2e08000.
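
The translation described above can be illustrated with a small sketch. It is a simplified stand-in written for this comment, not the kernel's mtd code: struct part and part_to_abs() are hypothetical names, and the factory partition bounds are taken from the kernel log in the report.

```c
#include <stdint.h>

/* Simplified stand-in for an MTD partition; not the kernel's struct mtd_info. */
struct part {
    const char *name;
    uint64_t start; /* absolute start offset on the flash chip */
    uint64_t size;  /* partition size in bytes */
};

/*
 * Translate a partition-relative read request into an absolute flash offset.
 * Returns 0 and stores the result in *abs, or -1 if the request does not fit
 * inside the partition.
 */
int part_to_abs(const struct part *p, uint64_t rel, uint64_t len, uint64_t *abs)
{
    if (rel > p->size || len > p->size - rel)
        return -1;
    *abs = p->start + rel;
    return 0;
}
```

For a partition starting at 0x2e00000 with size 0x100000, the relative offsets 0x0000 and 0x8000 map to 0x2e00000 and 0x2e08000. No bad block handling is involved at this layer.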

The real problem is the NAND flash driver. It contains something called BMT, which is some crazy translation of NAND pages. It tries to be smart and handle bad blocks magically, as if they didn't exist. That doesn't fit the fixed partitioning layout that the R6220 uses at all.

Apparently, when the NAND flash driver gets a request to read the page holding the flash data at 0x2e00000, it returns the page that contains the data at 0x2e20000. There is nothing wrong with the mt76 driver or the mtd subsystem.
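
One way such a shift could arise is a generic "skip bad blocks" mapping, where logical block N is served from the (N+1)-th good physical block. The sketch below models only that idea. It is an assumption made for illustration and is not the mtk_nand driver's BMT code; all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ERASEBLOCK_SIZE 0x20000u /* 128 KiB, as reported in the kernel log */

/* True if block appears in the list of bad eraseblocks. */
bool is_bad(const uint32_t *bad, size_t n, uint32_t block)
{
    for (size_t i = 0; i < n; i++)
        if (bad[i] == block)
            return true;
    return false;
}

/* Assumed model: logical block N maps to the (N+1)-th good physical block. */
uint32_t logical_to_physical(const uint32_t *bad, size_t n, uint32_t logical)
{
    uint32_t good_seen = 0;

    for (uint32_t phys = 0;; phys++) {
        if (is_bad(bad, n, phys))
            continue;       /* bad blocks are skipped, shifting what follows */
        if (good_seen == logical)
            return phys;
        good_seen++;
    }
}

/* Address that is actually read when addr is requested, under this model. */
uint32_t shifted_addr(const uint32_t *bad, size_t n, uint32_t addr)
{
    uint32_t block = addr / ERASEBLOCK_SIZE;
    uint32_t within = addr % ERASEBLOCK_SIZE;

    return logical_to_physical(bad, n, block) * ERASEBLOCK_SIZE + within;
}
```

Under this model, with bad eraseblocks 266 and 708, a request for 0x2e00000 is served from 0x2e20000, while addresses below eraseblock 266 are unaffected. A mapping like this hides bad blocks from the caller, which is why it clashes with fixed partition offsets.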

All "solutions" like adjusting the partitions or the mediatek,mtd-eeprom offset are only hacky workarounds for the unexpected NAND driver behavior.

Admin
Jo-Philipp Wich commented on 10.01.2019 16:56

Can you please do a local build with the following change applied and see if it fixes the issue?

Before flashing an image with this change, make sure you're able to recover the device through TFTP if needed.

diff --git a/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch b/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch
index d50e689110..03b2b36db9 100644
--- a/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch
+++ b/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch
@@ -3578,7 +3578,7 @@ Signed-off-by: John Crispin <blogic@openwrt.org>
 +      if (!err) {
 +              MSG(INIT, "[mtk_nand] probe successfully!\n");
 +              nand_disable_clock();
-+              shift_on_bbt = 1;
++              shift_on_bbt = 0;
 +              if (load_fact_bbt(mtd) == 0) {
 +                      int i;
 +                      for (i = 0; i < 0x100; i++)

Marco commented on 14.01.2019 17:26

This patch solved the issue!!!

Admin
Jo-Philipp Wich commented on 16.01.2019 06:32

Great - thanks for confirming. I am still waiting for my test hardware to arrive in order to do some more thorough fixing of the NAND driver. I am not sure if there's still some shifting / retry logic left in the write path.
