FS#2097 - mt7621 nand mtd slave fail to read same page twice #6952

Open
openwrt-bot opened this issue Feb 2, 2019 · 4 comments
Labels
flyspray kernel pull request/issue with Linux kernel related changes

Comments

@openwrt-bot

LeonPoon:

In my setup, only the 'firmware' partition is specified in the .dts (no 'kernel' or 'rootfs'), so that mtdsplit can do its job of finding the correct offset for the ubi partition.

mtdsplit found the ubi slave partition, but attaching the UBI failed, so the kernel panicked for lack of a root filesystem.

I narrowed the issue down to an all-zero read during ubi_auto_attach().

The printk I added in nand_do_read_ops() shows below that 0x580000 was read once at 2.028127s and again at 3.682984s, but the data differ:

[ 1.166183] 7 fixed-partitions partitions found on MTD device MT7621-NAND
[ 1.172957] Creating 7 MTD partitions on "MT7621-NAND": ***
[ 1.178513] 0x000000000000-0x000000080000 : "uboot"
[ 1.207245] 0x000000080000-0x0000000c0000 : "uboot_env"
[ 1.237246] 0x0000000c0000-0x000000100000 : "factory"
[ 1.266565] 0x000000100000-0x000000140000 : "s_env"
[ 1.295219] 0x000000140000-0x000000180000 : "devinfo"
[ 1.324541] 0x000000180000-0x000002980000 : "firmware"
[ 1.902188] nand: read 4bytes 34f983(3471747): @8fc35b60=0 (retlen=4)
[ 1.909212] nand: read 4bytes 360000(3538944): @8fc35b60=0 (retlen=4)
[ 1.916224] nand: read 4bytes 380000(3670016): @8fc35b60=0 (retlen=4)
[ 1.923224] nand: read 4bytes 3a0000(3801088): @8fc35b60=0 (retlen=4)
[ 1.930201] nand: read 4bytes 3c0000(3932160): @8fc35b60=0 (retlen=4)
[ 1.937204] nand: read 4bytes 3e0000(4063232): @8fc35b60=0 (retlen=4)
[ 1.944203] nand: read 4bytes 400000(4194304): @8fc35b60=0 (retlen=4)
[ 1.951180] nand: read 4bytes 420000(4325376): @8fc35b60=0 (retlen=4)
[ 1.958183] nand: read 4bytes 440000(4456448): @8fc35b60=0 (retlen=4)
[ 1.965183] nand: read 4bytes 460000(4587520): @8fc35b60=0 (retlen=4)
[ 1.972160] nand: read 4bytes 480000(4718592): @8fc35b60=0 (retlen=4)
[ 1.979163] nand: read 4bytes 4a0000(4849664): @8fc35b60=0 (retlen=4)
[ 1.986162] nand: read 4bytes 4c0000(4980736): @8fc35b60=0 (retlen=4)
[ 1.993172] nand: read 4bytes 4e0000(5111808): @8fc35b60=0 (retlen=4)
[ 2.000150] nand: read 4bytes 500000(5242880): @8fc35b60=0 (retlen=4)
[ 2.007147] nand: read 4bytes 520000(5373952): @8fc35b60=0 (retlen=4)
[ 2.014152] nand: read 4bytes 540000(5505024): @8fc35b60=0 (retlen=4)
[ 2.021130] nand: read 4bytes 560000(5636096): @8fc35b60=0 (retlen=4)
[ 2.028127] nand: read 4bytes 580000(5767168): @8fc35b60=23494255 (retlen=4)
[ 2.035166] mtdsplit: mtd_check_rootfs_magic(firmware, offset=4194304) got UBI_EC_MAGIC (magic=23494255)
[ 2.044625] 2 uimage-fw partitions found on MTD device firmware
[ 2.050514] run_parsers_by_type(firmware, MTD_PARSER_TYPE_FIRMWARE) found 2 parts
[ 2.057997] 0x000000180000-0x000000580000 : "kernel"
[ 2.087096] 0x000000580000-0x000002980000 : "ubi"
[ 2.115345] 0x000002980000-0x000005180000 : "alt_firmware"
[ 2.146730] [mtk_nand] probe successfully!
[ 2.151499] Signature matched and data read!
[ 2.155764] load_fact_bbt success 1023
[ 2.160202] libphy: Fixed MDIO Bus: probed
[ 2.234009] mtk_soc_eth 1e100000.ethernet: generated random MAC address 3e:90:49:dc:0e:f4
[ 2.242383] libphy: mdio: probed
[ 3.645190] mtk_soc_eth 1e100000.ethernet: loaded mt7530 driver
[ 3.651851] mtk_soc_eth 1e100000.ethernet eth0: mediatek frame engine at 0xbe100000, irq 20
[ 3.662579] NET: Registered protocol family 10
[ 3.668281] Segment Routing with IPv6
[ 3.672014] NET: Registered protocol family 17
[ 3.676546] 8021q: 802.1Q VLAN Support v1.8
[ 3.682984] nand: read 4bytes 580000(5767168): @8fc35e1c=0 (retlen=4)
[ 3.689426] UBI mtd_read(ubi, offset=0, 4 bytes) got magic@8fc35e1c bytes: 0
[ 3.696477] UBI error: no valid UBI magic found inside mtd7
[ 3.702054] hctosys: unable to open rtc device (rtc0)
[ 3.707847] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6

I set ops.oobbuf = 1 in nand_read() and the issue went away, seemingly because this disables reading from the chip buffer (near line 1903 in nand_base.c). The printk output below shows that reading the same address now returns the same data both times (at 3.188661s and 4.854364s):

[ 3.153685] nand: read 4bytes 4e0000(5111808): @8fc35b60=0 (retlen=4)
[ 3.160662] nand: read 4bytes 500000(5242880): @8fc35b60=0 (retlen=4)
[ 3.167676] nand: read 4bytes 520000(5373952): @8fc35b60=0 (retlen=4)
[ 3.174678] nand: read 4bytes 540000(5505024): @8fc35b60=0 (retlen=4)
[ 3.181655] nand: read 4bytes 560000(5636096): @8fc35b60=0 (retlen=4)
[ 3.188661] nand: read 4bytes 580000(5767168): @8fc35b60=23494255 (retlen=4)
[ 3.195698] mtdsplit: mtd_check_rootfs_magic(firmware, offset=4194304) got UBI_EC_MAGIC (magic=23494255)
[ 3.205161] 2 uimage-fw partitions found on MTD device firmware
[ 3.211051] run_parsers_by_type(firmware, MTD_PARSER_TYPE_FIRMWARE) found 2 parts
[ 3.218530] 0x000000180000-0x000000580000 : "kernel"
[ 3.247625] 0x000000580000-0x000002980000 : "ubi"
[ 3.275885] 0x000002980000-0x000005180000 : "alt_firmware"
[ 3.307291] [mtk_nand] probe successfully!
[ 3.312061] Signature matched and data read!
[ 3.316326] load_fact_bbt success 1023
[ 3.320776] libphy: Fixed MDIO Bus: probed
[ 3.393926] mtk_soc_eth 1e100000.ethernet: generated random MAC address d6:1b:2e:2e:17:74
[ 3.402252] libphy: mdio: probed
[ 4.815958] mtk_soc_eth 1e100000.ethernet: loaded mt7530 driver
[ 4.822595] mtk_soc_eth 1e100000.ethernet eth0: mediatek frame engine at 0xbe100000, irq 20
[ 4.833311] NET: Registered protocol family 10
[ 4.839073] Segment Routing with IPv6
[ 4.842892] NET: Registered protocol family 17
[ 4.847360] 8021q: 802.1Q VLAN Support v1.8
[ 4.854364] nand: read 4bytes 580000(5767168): @8fc35e1c=23494255 (retlen=4)
[ 4.861399] UBI mtd_read(ubi, offset=0, 4 bytes) got magic@8fc35e1c bytes: 23494255
[ 4.869062] UBI: auto-attach mtd7
[ 4.872414] ubi0: attaching mtd7
[ 4.892381] UBI: EOF marker found, PEBs from 14 will be erased
[ 4.898547] ubi0: scanning is finished
[ 4.940681] ubi0: volume 1 ("rootfs_data") re-sized from 9 to 252 LEBs
[ 4.948006] ubi0: attached mtd7 (name "ubi", size 36 MiB)
[ 4.953428] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[ 4.960270] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[ 4.967044] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[ 4.973995] ubi0: good PEBs: 287, bad PEBs: 1, corrupted PEBs: 0
[ 4.979971] ubi0: user volume: 2, internal volumes: 1, max. volumes count: 128
[ 4.987175] ubi0: max/mean erase counter: 1/0, WL threshold: 4096, image sequence number: 1003525433
[ 4.996288] ubi0: available PEBs: 0, total reserved PEBs: 287, PEBs reserved for bad PEB handling: 19
[ 5.005507] ubi0: background thread "ubi_bgt0d" started, PID 354
[ 5.006121] nand: read 4bytes 5c1000(6033408): @8fc35d8c=73717368 (retlen=4)
[ 5.019595] block ubiblock0_0: created from ubi0:0(rootfs)
[ 5.025122] ubiblock: device ubiblock0_0 (rootfs) set to be root filesystem
[ 5.032069] hctosys: unable to open rtc device (rtc0)
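
Why supplying an OOB buffer would hide the problem can be illustrated with the cache check that nand_do_read_ops() performs for each page. The function below is a simplified paraphrase of that check, assuming the 4.14-era nand_base.c logic; it is not the kernel source, and the function name is invented for illustration.

```c
#include <stdbool.h>

/*
 * Simplified paraphrase (an assumption, not kernel code) of the decision
 * nand_do_read_ops() makes for each page: the chip is accessed only when
 * the requested page is not the one cached in chip->buffers->databuf,
 * or when the caller also asked for OOB data.
 */
bool must_read_from_chip(int realpage, int cached_pagebuf, bool oob_requested)
{
	return realpage != cached_pagebuf || oob_requested;
}
```

For page 2816 (0x580000 / 2048), must_read_from_chip(2816, 2816, false) is false, so whatever currently sits in databuf is returned as-is, while must_read_from_chip(2816, 2816, true) is true. That is consistent with the workaround forcing a real read from the chip, but it only bypasses the cache rather than explaining why the cached data went bad.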

I don't know what oobbuf does or what its wider impact is, so what is the real problem, and what is the proper fix?

Thanks.

Let me know what other information is needed.

@openwrt-bot

LeonPoon:

By the way, the (re-)read works correctly if the ubi partition is defined directly in the .dts, so it seems that the issue occurs only when ubi is a slave of firmware.

And if I boot into initramfs-kernel.bin (with oobbuf==0) and run head -c 32 /dev/mtd7|hexdump -C, it correctly shows the UBI# signature, even when I run the command multiple times. So this probably rules out the chip->buffers->databuf mechanism in nand_base.c as the cause?

@openwrt-bot

LeonPoon:

Forget what I said above about /dev/mtd7 giving correct data when running hexdump. Please look at the attached log instead.

Running the hexdump command gives different results.

Something must be filling chip->buffers->databuf with junk between the time the ubi partition is found and the time UBI tries to auto-attach.

@openwrt-bot

LeonPoon:

It appears that, at the end of mtk_nand_probe(), the driver reads the factory bad block table into chip->buffers->databuf without resetting the cached page number (chip->pagebuf).

This patch fixes the problem for me (written so that it does not change the number of lines):

diff --git a/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch b/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch
index d50e689110..5af384c342 100644
--- a/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch
+++ b/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch
@@ -3297,13 +3297,13 @@ Signed-off-by: John Crispin <blogic@openwrt.org>
 +			printk("compare signature failed %x\n", page);
 +			return -1;
 +		}
-+		if (mtk_nand_exec_read_page(mtd, page, mtd->writesize, chip->buffers->databuf, chip->oob_poi))
++		if (mtk_nand_exec_read_page(mtd, chip->pagebuf = page, mtd->writesize, chip->buffers->databuf, chip->oob_poi))
 +		{
 +			printk("Signature matched and data read!\n");
 +			memcpy(fact_bbt, chip->buffers->databuf, (bbt_size <= mtd->writesize)? bbt_size:mtd->writesize);
 +			return 0;
-+		}
-+
++		} else
++			chip->pagebuf = -1;
 +	}
 +	printk("failed at page %x\n", page);
 +	return -1;
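
The failure mode described above, and the idea behind the patch, can be reproduced in a small user-space simulation. This is a sketch under stated assumptions only: struct sim_chip and all sim_* functions are hypothetical stand-ins, not the kernel's struct nand_chip or the mtk_nand driver, and page 1 and page 3 merely play the roles of 0x580000 and the factory BBT page.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE_SIZE 2048	/* page size reported in the UBI log */
#define SIM_NUM_PAGES 4		/* tiny simulated chip */

/* Simplified stand-in for the relevant nand_chip fields; not the kernel struct. */
struct sim_chip {
	uint8_t flash[SIM_NUM_PAGES][SIM_PAGE_SIZE];	/* simulated NAND array */
	uint8_t databuf[SIM_PAGE_SIZE];			/* like chip->buffers->databuf */
	int pagebuf;					/* page cached in databuf, -1 = none */
};

void sim_init(struct sim_chip *c)
{
	memset(c, 0, sizeof(*c));
	c->pagebuf = -1;
}

/* Generic read path: reuse databuf when it is believed to hold `page`. */
void sim_read(struct sim_chip *c, int page, uint8_t *out, size_t len)
{
	if (page != c->pagebuf) {
		memcpy(c->databuf, c->flash[page], SIM_PAGE_SIZE);
		c->pagebuf = page;
	}
	memcpy(out, c->databuf, len);
}

/*
 * Driver-private read into the shared buffer, in the spirit of the BBT load.
 * With update_pagebuf == false the cache bookkeeping is left stale (the bug);
 * with true it records which page databuf now holds (the idea of the patch).
 */
void sim_driver_read(struct sim_chip *c, int page, bool update_pagebuf)
{
	memcpy(c->databuf, c->flash[page], SIM_PAGE_SIZE);
	if (update_pagebuf)
		c->pagebuf = page;
}

/* True if both reads of the "ubi" page see the magic despite the driver read in between. */
bool sim_reread_ok(bool update_pagebuf)
{
	static struct sim_chip c;	/* static: keeps the large struct off the stack */
	uint8_t first[4], second[4];

	sim_init(&c);
	memcpy(c.flash[1], "UBI#", 4);	/* page 1 plays the role of 0x580000 */
	/* page 3 plays the role of the factory BBT page and stays all zero */

	sim_read(&c, 1, first, 4);		/* mtdsplit probes the magic */
	sim_driver_read(&c, 3, update_pagebuf);	/* probe loads the BBT */
	sim_read(&c, 1, second, 4);		/* UBI auto-attach re-reads */

	return memcmp(first, "UBI#", 4) == 0 && memcmp(second, "UBI#", 4) == 0;
}
```

In this model sim_reread_ok(false) returns false, because the second read is served from a buffer that now holds the other page, mirroring the all-zero magic in the first log. sim_reread_ok(true) returns true, because the bookkeeping forces the page to be fetched again.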
    

@openwrt-bot

iscilyas:

I know that there are alternative mt7621-nand drivers in the works, but in the short term the priority for this bug needs to be raised. We've been seeing almost the exact same symptoms on Xiaomi R3P routers with Micron NAND chips with bad blocks.

Here's a proposed patch:

@aparcar added the kernel (pull request/issue with Linux kernel related changes) label on Feb 22, 2022