FS#3471 - Mikrotik RB911-5Hn propper nand detection failed #8537

Closed
openwrt-bot opened this issue Nov 24, 2020 · 34 comments

@openwrt-bot

acoul:

  • Device problem occurs on tftpboot openwrt-ath79-mikrotik-mikrotik_routerboard-sxt-5nd-r2-initramfs-kernel.bin
  • Software versions of OpenWrt/LEDE release: trunk
  • Steps to reproduce: tftpboot openwrt-ath79-mikrotik-mikrotik_routerboard-sxt-5nd-r2-initramfs-kernel.bin

this is the hardware:
https://openwrt.org/toh/hwdata/mikrotik/mikrotik_rb911-5hn_911_lite5

root@OpenWrt:~# cat /proc/mtd

dev: size erasesize name
mtd0: 00040000 00020000 "booter"
mtd1: 003c0000 00020000 "kernel"
mtd2: 07c00000 00020000 "ubi"
mtd3: 00010000 00001000 "RouterBoot"
mtd4: 00010000 00001000 "bootloader1"
mtd5: 00000000 00001000 "bootloader2"
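As a quick sanity check on a listing like the one above, the hexadecimal sizes can be converted to KiB. This is only a minimal sketch: `mtd_kib` is a helper name made up for illustration (not an OpenWrt tool), and it assumes input in the `/proc/mtd` format on standard input.

```shell
#!/bin/sh
# Print device, name and size in KiB for every "mtdN:" row read from stdin.
# mtd_kib is a hypothetical helper name used only for this illustration.
mtd_kib() {
    while read -r dev size erasesize name; do
        case "$dev" in
            mtd[0-9]*) ;;        # keep only the "mtdN:" rows
            *) continue ;;       # skip the header and blank lines
        esac
        # The size column is hexadecimal without a 0x prefix.
        echo "$dev $name $((0x$size / 1024)) KiB"
    done
}
# Usage on the device: mtd_kib < /proc/mtd
```

For the listing above, booter, kernel and ubi come out as 256, 3840 and 126976 KiB, which add up to 131072 KiB (128 MiB).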

@openwrt-bot

acoul:

using: openwrt-19.07.4-ar71xx-mikrotik-rb-nor-flash-16M-initramfs-kernel.bin

also fails on this board. it looks like the board is mistakenly detected as:

"sxt5n", "MikroTik RouterBOARD SXT Lite5"

rather than:

ATH79_MACH_RB_911L, /* Mikrotik RouterBOARD 911-2Hn/911-5Hn boards */
strstr(arcs_cmdline, "board=911L")
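The detection quoted above keys on the `board=` token of the kernel command line. The same lookup can be sketched in shell, for checking a running device by hand. This is not the kernel's C code: `cmdline_param` is a hypothetical helper, and it assumes the command line is a whitespace-separated list of KEY=VALUE tokens.

```shell
#!/bin/sh
# Print the value of KEY from a kernel command line given as one string.
# Returns 1 if the key is not present.
# cmdline_param is a hypothetical helper name used only for this illustration.
cmdline_param() {
    key=$1
    for word in $2; do           # unquoted on purpose: split on whitespace
        case "$word" in
            "$key"=*)
                echo "${word#*=}"   # strip everything up to the first "="
                return 0
                ;;
        esac
    done
    return 1
}
# Usage on the device: cmdline_param board "$(cat /proc/cmdline)"
```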

@openwrt-bot

acoul:

after flashing routeros on the board, & then tftp either openwrt-19.07.4-ar71xx-mikrotik-rb-nor-flash-16M-initramfs-kernel.bin or openwrt-trunk-ar71xx-mikrotik-rb-nor-flash-16M-initramfs-kernel.bin, the "Bad eraseblock" issue is gone. apparently it's because SXT Lite5 arch is trying to access a 64M nand instead of a 16M nor-flash

it's strange that the board probe insists that the board is an SXT Lite5 instead of an rb-911L

@openwrt-bot

rogerpueyo:

Hi Alexandros,

// - Device problem occurs on tftpboot openwrt-ath79-mikrotik-mikrotik_routerboard-sxt-5nd-r2-initramfs-kernel.bin
//

=> In the legacy ar71xx target, OpenWrt would "autodetect" the device based on what the RouterBoot bootloader was passing. This is no longer valid for the ath79 target, so images are available on a per-supported-device basis and the device name is "hardcoded" in the image. The initramfs image you are using is for the MikroTik RouterBOARD SXT 5nD r2 (SXT Lite5), so that is the device it reports. This is as expected.

// - this is the hardware: https://openwrt.org/toh/hwdata/mikrotik/mikrotik_rb911-5hn_911_lite5 //

=> I can't really understand why but, when booting the ar71xx initramfs, the RouterBoot bootloader is reporting the hardware is an sxt5n (instead of 911L) and the kernel is detecting a 128 MB Toshiba NAND flash chip:

dmesg-19_07_4.txt

[ 0.000000] Kernel command line: no-uart parts=1 boot_part_size=4194304 gpio=216619 HZ=300000000 mem=64M kmac=4C:5E:0C:4A:9E:E2 board=sxt5n boot=0 mlc=5 rootfstype=squashfs noinitrd
...
[ 4.450139] nand: device found, Manufacturer ID: 0x98, Chip ID: 0xf1
[ 4.526221] nand: Toshiba NAND 128MiB 3,3V 8-bit
[ 4.581416] nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64

dmesg-ar71xx-trunk-routeros-format.txt

Kernel command line: no-uart parts=1 boot_part_size=4194304 gpio=216619 HZ=300000000 mem=64M kmac=4C:5E:0C:40:C2:11 board=sxt5n boot=0 mlc=5
...
nand: device found, Manufacturer ID: 0x98, Chip ID: 0xf1
nand: Toshiba NAND 128MiB 3,3V 8-bit
nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64

This is strange, but it could well be that MikroTik was producing these devices with NAND flash and then switched to NOR, or the other way round. Is it a recent purchase?

You can do a couple of things:

  1. In RouterOS, can you check what are the values for "System -> RouterBOARD -> Model", "System -> Resources -> Board Name", "System -> Resources -> Total HDD Size"?
  2. If you boot the ath79-sxt5n initramfs image, there should be the directory /sys/firmware/mikrotik/hard_config with some info about the device (the info the bootloader would pass, and some others). Can you please paste the content of the text files (i.e., not the wlan_data calibration, just plain text).

@openwrt-bot

rogerpueyo:

Last, but not least, if you go to MikroTik's 911 Lite5 device page [[https://mikrotik.com/product/RB911-5Hn#fndtn-gallery|https://mikrotik.com/product/RB911-5Hn]] and zoom in on the board picture, you'll notice a Samsung K9F1G08U0D chip (bottom left).

Well, that's a 128 MB NAND flash chip :-)

@openwrt-bot

acoul:

greetings rogerpueyo,

thank you for the feedback

tftp booting openwrt-19.07.4-ar71xx-mikrotik-rb-nor-flash-16M-initramfs-kernel gets into:
FAILSAFE MODE
here are the contents you asked {empty}
ls -laR /sys/firmware/mikrotik/

/sys/firmware/mikrotik/:
drwxr-xr-x 2 root root 0 Jan 1 00:00 .
drwxr-xr-x 3 root root 0 Jan 1 00:00 ..
and here is the RouterOS system details:
model: 911-5HnD
firmware-type: ar9344
factory-firmware: 3.10
current-firmware: 6.47.7
upgrade-firmware: 6.47.7
and here is my update on the target/linux/ar71xx/base-files/lib/ar71xx.sh
|- *"911-5Hn")
|+ *"911"|
|+ *"911L"|
|+ *"911-5Hn"|
|+ *"911-5HnD")
name="rb-911-5hn"
though, disabling the CONFIG_ATH79_MACH_RBSXTLITE option, I am unable to successfully tftp boot the device (I have no serial access to it yet)

looking at [[https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=eb9e3651dd1a081e0d908e7e5162d6683098c1f3|this commit]] (dated 2018):
Notes:

  • Older versions of these boards might be equipped with a NAND
    flash chip instead of the SPI NOR device. Those boards are not
    supported (yet).

probably the problem & the confusion is due to this [[https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=3519322e6238913eeae717a598c9760ddf40436c|earlier commit]] (dated 2014):

The new RB911L series is also supported as a result

@openwrt-bot

acoul:

here is a picture of the nand-flash. it looks identical to the [[https://i.mt.lv/cdn/rb_images/881_hi_res.png|RB711-5Hn flash]]

here is a dmesg from an rb711 on an older patched lede trunk (my rb711 version has a 64MB nand, not an 128 MB as shown on the above product page).

Mikrotik has a tendency of mixing different peripheral hw under the exact same product model.

@openwrt-bot

acoul:

with the help of rogerpueyo, I realized that the nand flash size is 128MB and NOT 16MB as advertised on the product page.

therefore, I used openwrt-ar71xx-mikrotik-nand-large-initramfs-kernel.bin for tftp boot & openwrt-ar71xx-mikrotik-nand-large-squashfs-sysupgrade.bin for the sysupgrade, and my board functions just fine.

it still insists that the board is:

MikroTik RouterBOARD SXT Lite5

and there is no wlan calibration data on:

/sys/firmware/mikrotik/
in fact the above directory is empty

@openwrt-bot

acoul:

tftp booting openwrt-18.06.9-ar71xx-mikrotik-vmlinux-initramfs-lzma.elf on the device gives the following under the dir /sys/firmware/routerboot/

drwxr-xr-x 2 root root 0 Nov 19 09:50 .
drwxr-xr-x 3 root root 0 Nov 19 09:50 ..
-rw------- 1 root root 65536 Nov 19 09:50 ext_wlan_data

so once again, it looks like "older" is "better".
in this case it's true for both hw & sw !

@openwrt-bot

acoul:

this bug report may well & shamefully close now as Invalid, although it does contain some valuable information & material on the steps & procedures one has to take before rushing into filing bug reports.

may good health & spirits, peace & creativity always guard open-source and its advocates

@openwrt-bot

rogerpueyo:

Hi Alexandros,

I'll try to go through all the topics in order:

// tftp booting openwrt-19.07.4-ar71xx-mikrotik-rb-nor-flash-16M-initramfs-kernel gets into:

FAILSAFE MODE

//

=> OK, this is an issue I'm also experiencing with the "real" SXT Lite5 I have. It seems that the RESET button polarity is reversed, so the kernel believes the button is always pressed. At boot time, when you are prompted to press "f" (or, alternatively, the reset button) to enter failsafe mode, the button is wrongly detected as pressed and the device enters failsafe mode.
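The effect of a reversed polarity can be illustrated with a tiny sketch. It is an illustration only, not how the kernel handles GPIO keys: `button_pressed` is a hypothetical helper, and the assumption that the released button idles at level 1 is made up for the example.

```shell
#!/bin/sh
# Decide whether a button counts as pressed.
#   $1: raw GPIO level (0 or 1)
#   $2: 1 if the button is configured as active-low, 0 if active-high
# button_pressed is a hypothetical helper name used only for this illustration.
button_pressed() {
    level=$1
    active_low=$2
    if [ "$active_low" -eq 1 ]; then
        [ "$level" -eq 0 ]    # active-low: pressed when the line reads 0
    else
        [ "$level" -eq 1 ]    # active-high: pressed when the line reads 1
    fi
}
```

With a released button idling at level 1, `button_pressed 1 1` reports "not pressed", while the same level under the opposite polarity, `button_pressed 1 0`, reports "pressed" all the time. That would be enough to trigger failsafe on every boot.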

I tried to fix it in these images here: https://we.tl/t-sSD1Id4jnY . I wonder if you would be so kind to give it a try.

// here are the contents you asked {empty} //

=> Sorry I didn't make myself clear. On point 2) I was asking you to boot the new ath79 image for the sxt5n, which you can find at [[https://downloads.openwrt.org/snapshots/targets/ath79/mikrotik/openwrt-ath79-mikrotik-mikrotik_routerboard-sxt-5nd-r2-initramfs-kernel.bin|https://downloads.openwrt.org/snapshots/targets/ath79/mikrotik/openwrt-ath79-mikrotik-mikrotik_routerboard-sxt-5nd-r2-initramfs-kernel.bin]]. The ath79 target has a new driver for MikroTik devices that exposes the device information, calibration data, etc. under /sys/firmware/mikrotik/hard_config, which ar71xx does not have.

// and here is my update on the target/linux/ar71xx/base-files/lib/ar71xx.sh

|- *"911-5Hn")
|+ *"911"|
|+ *"911L"|
|+ *"911-5Hn"|
|+ *"911-5HnD")
name="rb-911-5hn"

though, disabling the CONFIG_ATH79_MACH_RBSXTLITE option, I am unable to successfully tftp boot the device (I have no serial access to it yet) //

=> In ar71xx, during boot process, the bootloader passes some commands to the kernel, including the device hardware model, which is different to the "commercial" model (RB911whatever). There are many cases of different MikroTik devices that have the same hardware, so the bootloader passes the same hardware model to the kernel. Your case might be this one: the RB911Lite5 and the SXT Lite5 have the same CPU/RAM/flash/Ethernet/Wifi, so it may make sense to use the same hardware model identifier (sxt5n).

In any case, your bootloader is passing the sxt5n identifier, which under ar71xx OpenWrt corresponds to the SXT Lite5. Therefore, if you remove the support for the sxt5n device, OpenWrt might only partially boot, or might be unable to bring up some devices/resources (in your case, I understand, at least the serial interface). I would say this is expected.
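The translation from the bootloader's identifier to a board name can be pictured as a simple lookup. The sketch below is illustrative only and is not the ar71xx code: `board_name` is a hypothetical helper, and it covers just the two identifiers discussed in this thread, using the names from the source snippets quoted earlier.

```shell
#!/bin/sh
# Map the board identifier passed by the bootloader to a display name.
# Only the two identifiers discussed in this thread are listed.
# board_name is a hypothetical helper name used only for this illustration.
board_name() {
    case "$1" in
        sxt5n) echo "MikroTik RouterBOARD SXT Lite5" ;;
        911L)  echo "MikroTik RouterBOARD 911-2Hn/911-5Hn" ;;
        *)     echo "unknown"; return 1 ;;
    esac
}
```

Since the NAND board reports `sxt5n`, a lookup of this kind can only ever answer "SXT Lite5", whatever label is printed on the case.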

// looking at this commit (dated 2018):

Notes:

  • Older versions of these boards might be equipped with a NAND
    flash chip instead of the SPI NOR device. Those boards are not
    supported (yet).

//

=> There we have it: MikroTik initially made the RB911Lite5 with NAND flash, and then switched to NOR flash for whatever reason. This means you probably have one of these old NAND boards, which are not supported as "normal RB911Lite5 NOR boards", but have exactly the same hardware as the SXT Lite5 (which is what the bootloader is reporting).

// probably the problem & the confusion is due to this earlier commit (dated 2014):

The new RB911L series is also supported as a result

//

=> You found it! This commit added support to the SXT Lite 5, and "accidentally" also added support to the early NAND-RB911Lite5, because they had the same exact hardware. Later, MikroTik switched to NOR-RB911Lite5, which has partially different hardware. Tricky ;)

//here is a picture of the nand-flash. it looks identical to the RB711-5Hn flash

here is a dmesg from an rb711 on an older patched lede trunk (my rb711 version has a 64MB nand, not an 128 MB as shown on the above product page).

Mikrotik has a tendency of mixing different peripheral hw under the exact same product model.
//

=> OK, this is a completely different model. Check the CPU, it's an old Atheros AR7241 ... But yes, you got it right, MikroTik does this quite often, which is inconvenient for OpenWrt. :( Sometimes they change the info on their webpage, but keep the old picture there.

// therefore, I used openwrt-ar71xx-mikrotik-nand-large-initramfs-kernel.bin for tftp boot & openwrt-ar71xx-mikrotik-nand-large-squashfs-sysupgrade.bin for the sysupgrade, and my board functions just fine.

it still insists that the board is:

MikroTik RouterBOARD SXT Lite5

and there is no wlan calibration data on:

/sys/firmware/mikrotik/

in fact the above directory is empty//

=> The "MikroTik RouterBOARD SXT Lite5" board name you are getting is because ar71xx OpenWrt gets "sxt5n" from the bootloader and translates it into the "commercial" name. If you had a NOR-RB911Lite5, the bootloader would be passing something like "911l" and you'd be reading "MikroTik RouterBoard 911L".

Also, you're not seeing the calibration data because you are in failsafe mode, could it be?

Just think that your NAND-RB911Lite 5 is hardware-identical to the SXT Lite 5 and don't worry anymore about it. :)

//
tftp booting openwrt-18.06.9-ar71xx-mikrotik-vmlinux-initramfs-lzma.elf on the device gives the following under the dir /sys/firmware/routerboot/

drwxr-xr-x 2 root root 0 Nov 19 09:50 .
drwxr-xr-x 3 root root 0 Nov 19 09:50 ..
-rw------- 1 root root 65536 Nov 19 09:50 ext_wlan_data

so once again, it looks like "older" is "better".
in this case it's true for both hw & sw !
//

=> I would say that if you tftp boot the ar71xx-19.07.4 image I left at https://we.tl/t-sSD1Id4jnY the device won't enter in failsafe mode and you'll get the calibration data under /sys/firmware/routerboot.

//this bug report may well & shamefully close now as Invalid, although it does contain some valuable information & material on the steps & procedures one has to take before rushing into filing bug reports.
//

No worries, now you know your board is disguised as another thing! :)

@openwrt-bot

acoul:

Greetings rogerpueyo,

// ⇒ OK, this is an issue I'm also experiencing with the "real" SXT Lite5 I have. It seems that the RESET button polarity is reversed and the kernel understands that it is always pressed. Therefore, at boot time, when you are prompted to press "f" to enter failsafe mode (or, alternatively, to press the reset button), since the button is wrongly detected as always pressed, it enters in failsafe mode.//

// I tried to fix it in these images here: https://we.tl/t-sSD1Id4jnY . I wonder if you would be so kind to give it a try.//

the above image tftp boots just fine on my device and it does sysupgrade successfully (no failsafe issue)
OpenWrt 19.07-SNAPSHOT, r11242-6703abb7ca

//Also, you're not seeing the calibration data because you are in failsafe mode, could it be?
//

//⇒ I would say that if you tftp boot the ar71xx-19.07.4 image I left at https://we.tl/t-sSD1Id4jnY the device won't enter in failsafe mode and you'll get the calibration data under /sys/firmware/routerboot.//

I am attaching the dmesg-rb911-19_07-SNAPSHOT_r11242-6703abb7ca.txt

after sysupgrade & booting from nand, the directory /sys/firmware/mikrotik/ is empty. if you see above on dmesg-rb911_18_06_9.txt, ext_wlan_data does exist.

//No worries, now you know your board is disguised as another thing! :)//

MikroTik, what can I say :)

@openwrt-bot

rogerpueyo:

// the above image tftp boots just fine on my device and it does sysupgrade successfully (no failsafe issue)

OpenWrt 19.07-SNAPSHOT, r11242-6703abb7ca//

=> Excellent! I have to give it a try on my SXT still...

// after sysupgrade & booting from nand, the directory /sys/firmware/mikrotik/ is empty. if you see above on dmesg-rb911_18_06_9.txt, ext_wlan_data does exist. //

The driver that dumps the device data to /sys/firmware/mikrotik is only in the ath79 architecture, not in the ar71xx images you are providing the log for. Can you please check it with the ath79 image for the SXT at [[https://downloads.openwrt.org/snapshots/targets/ath79/mikrotik/openwrt-ath79-mikrotik-mikrotik_routerboard-sxt-5nd-r2-initramfs-kernel.bin|https://downloads.openwrt.org/snapshots/targets/ath79/mikrotik/openwrt-ath79-mikrotik-mikrotik_routerboard-sxt-5nd-r2-initramfs-kernel.bin]]

Cheers!

@openwrt-bot

acoul:

//⇒ Sorry I didn't make myself clear. On point 2) I was asking you to boot the new ath79 image for the sxt5n, which you can find at https://downloads.openwrt.org/snapshots/targets/ath79/mikrotik/openwrt-ath79-mikrotik-mikrotik_routerboard-sxt-5nd-r2-initramfs-kernel.bin. The ath79 has a new driver for MikroTik devices to expose the device information, calibration data, etc. under /sys/firmware/mikrotik/hard_config , which is not in ar71xx. //

I am attaching the /tmp/dmesg-rb911-trunk-r15003-b19a684f46.txt

the /sys/firmware/mikrotik directory remains empty. it may be due to the fact that the ath9k device is not recognized due to a bus incompatibility (see attached dmesg)

I am also attaching a full listing under the /sys/firmware (ls_laR_sys_firmware.txt)

sysupgrading openwrt-ath79-mikrotik-mikrotik_routerboard-sxt-5nd-r2-squashfs-sysupgrade.bin is not successful. I have not (yet) a serial access to the board to further debug the problem

@openwrt-bot

rogerpueyo:

Hi Alexandros,

Thanks for testing that.

MikroTik devices with NAND flash also have a tiny NOR flash where the bootloader, the calibration data and the device information are stored.

For example, my SXTLite5 has a 128 kbytes NOR flash chip:

[ 0.594784] spi-nor spi0.0: w25x10 (128 Kbytes)
[ 0.599528] 1 fixed-partitions partitions found on MTD device spi0.0
[ 0.606109] Creating 1 MTD partitions on "spi0.0":
[ 0.611096] 0x000000000000-0x000000020000 : "RouterBoot"
[ 0.619637] 5 routerbootpart partitions found on MTD device RouterBoot
[ 0.626447] Creating 5 MTD partitions on "RouterBoot":
[ 0.631791] 0x000000000000-0x00000000c000 : "bootloader1"
[ 0.638514] 0x00000000c000-0x00000000d000 : "hard_config"
[ 0.645267] 0x00000000d000-0x00000000e000 : "bios"
[ 0.651435] 0x00000000e000-0x00000000f000 : "soft_config"
[ 0.658220] 0x000000010000-0x000000020000 : "bootloader2"

but your NAND-RB911Lite5 has (surprise!) a 64 kbytes chip:

[ 4.534025] spi-nor spi0.0: mx25l512e (64 Kbytes)
[ 4.538906] 1 fixed-partitions partitions found on MTD device spi0.0
[ 4.545368] Creating 1 MTD partitions on "spi0.0":
[ 4.550250] 0x000000000000-0x000000020000 : "RouterBoot"
[ 4.555650] mtd: partition "RouterBoot" extends beyond the end of device "spi0.0" -- size truncated to 0x10000
[ 4.568564] RouterBoot: routerboot partition /ahb/spi@1f000000/flash@0/partitions/partition@0/partition@10000 (/ahb/spi@1f000000/flash@0/partitions/partition@0) "bootloader2" extends past end of segment.
[ 4.586958] RouterBoot: error parsing routerboot partition /ahb/spi@1f000000/flash@0/partitions/partition@0/partition@10000 (/ahb/spi@1f000000/flash@0/partitions/partition@0)
[ 4.602778] 2 fixed-partitions partitions found on MTD device RouterBoot
[ 4.609582] Creating 2 MTD partitions on "RouterBoot":
[ 4.614816] 0x000000000000-0x000000010000 : "bootloader1"
[ 4.621413] 0x000000010000-0x000000020000 : "bootloader2"
[ 4.626971] mtd: partition "bootloader2" is out of reach -- disabled

This is why there's nothing at /sys/firmware/mikrotik: the partition layout is different, so the driver cannot find the calibration data.
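The "extends beyond the end of device" and "out of reach" messages in the log come down to a bounds check: a partition fits only if its offset plus its size does not exceed the flash size. A minimal sketch of that arithmetic follows. `fits_on_device` is a hypothetical helper, not the MTD code, and the numbers in the note after it are the ones from the two logs above.

```shell
#!/bin/sh
# Succeed if a partition starting at OFFSET with length SIZE fits on a
# device of DEV_SIZE bytes. Arguments may be decimal or 0x-prefixed hex.
# fits_on_device is a hypothetical helper name used only for this illustration.
fits_on_device() {
    offset=$(($1))
    size=$(($2))
    dev_size=$(($3))
    [ $((offset + size)) -le "$dev_size" ]
}
```

On the 128 Kbytes chip (0x20000 bytes), `fits_on_device 0x10000 0x10000 0x20000` succeeds, so "bootloader2" at 0x10000-0x20000 is fine. On the 64 Kbytes chip (0x10000 bytes), `fits_on_device 0x10000 0x10000 0x10000` fails, which matches the partition being reported as out of reach.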

I've compiled an ath79 image for your NAND-RB911Lite5 based on the SXTLite5. I wonder if you could give it a try. Specifically, could you tell me what is in the /proc/mtd file (or just paste the dmesg output, whichever you prefer)?

Please find the new images here: https://we.tl/t-hb7qbpHX9n

Cheers!

@openwrt-bot

rogerpueyo:

If you could also test this image and paste /proc/mtd or dmesg, that'd be amazing: https://we.tl/t-Xxw96dbGp5

@openwrt-bot

acoul:

Greetings rogerpueyo,

here is the info you requested

@openwrt-bot

rogerpueyo:

Hi Alexandros,

Thanks for testing. Unfortunately, it seems that didn't work as expected. Still, if you are up to it, I think we can figure out what's inside the flash. :-)

First, I've uploaded the code to support the RB911Lite5NAND to my GitHub, in case you want to take a look at it or compile the images yourself. Now there shouldn't be any //bootloader2// partition out of reach and the driver should be able to detect the //soft_config// and //hard_config// partitions correctly.

I wonder if you could please give this image a try: https://we.tl/t-BoM4ADRMHd (or you can compile it yourself from my branch linked above, if you prefer). In particular, could you please provide the //dmesg// output and the content of the /proc/mtd file?

Also, hopefully, you should see a message like

[ 4.619917] spi-nor spi0.0: w25x10 (128 Kbytes) => Yours will be 64 Kbytes
[ 4.624667] 1 fixed-partitions partitions found on MTD device spi0.0
[ 4.631245] Creating 1 MTD partitions on "spi0.0":
[ 4.636231] 0x000000000000-0x000000020000 : "RouterBoot" => Yours should finish at 0x000000010000
[ 4.644669] 4 routerbootpart partitions found on MTD device RouterBoot
[ 4.651486] Creating 4 MTD partitions on "RouterBoot":
[ 4.656826] 0x000000000000-0x00000000c000 : "bootloader1"
[ 4.663628] 0x00000000c000-0x00000000d000 : "hard_config"
[ 4.670399] 0x00000000d000-0x00000000e000 : "bios"
[ 4.676512] 0x00000000e000-0x00000000f000 : "soft_config" => Your numbers might be different

If the //hard_config// / //soft_config// partitions are detected, then you should find something inside /sys/firmware/mikrotik.

Cheers!

@openwrt-bot

acoul:

Hi rogerpueyo,

great work. we have a success !

@openwrt-bot

rogerpueyo:

Hi,

Now we have the soft_config and hard_config partitions detected. Great! :-)

It would be nice to check a couple of things to make sure the routerboot driver is actually detecting everything correctly.

  1. First, let's see the values in soft_config. Could you please execute this command and paste the output? It lists all the files in /sys/firmware/mikrotik/soft_config and prints their values:
for i in /sys/firmware/mikrotik/soft_config/*; do echo "$i"; cat "$i"; echo ""; done
  2. Then, let's do the same for the hard_config partition. Could you please execute this command and paste the output? It does the same, but skips wlan_data.
for i in /sys/firmware/mikrotik/hard_config/*; do if echo "$i" | grep -v "wlan_data"; then cat "$i"; echo ""; fi; done

**=> Tip**: you may want to alter the MAC address or the serial number before pasting, for privacy reasons.

  3. Since we now have the hard_config data, could you check that the Ethernet and WiFi interfaces have the correct MAC addresses?

  4. Last thing, could you check if the LEDs work as expected? There should be a green power LED (always on), a green status LED (blinks during boot, then off), and five green LEDs that indicate the wifi signal strength.
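For the privacy tip above, one option is to pipe the output through a filter that hides the device-specific half of each MAC address. This is a minimal sketch, assuming colon-separated MAC addresses and a `sed` that supports `-E`. `mask_mac` is a hypothetical helper name.

```shell
#!/bin/sh
# Replace the last three octets of every colon-separated MAC address read
# from stdin with XX:XX:XX, keeping the first three (the vendor prefix).
# mask_mac is a hypothetical helper name used only for this illustration.
mask_mac() {
    sed -E 's/(([0-9A-Fa-f]{2}:){3})([0-9A-Fa-f]{2}:){2}[0-9A-Fa-f]{2}/\1XX:XX:XX/g'
}
# Usage: cat /sys/firmware/mikrotik/hard_config/mac_base | mask_mac
```

The serial number has no fixed format that could be matched the same way, so it would still need to be edited by hand.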

Thanks!

@openwrt-bot

acoul:

//3 Since we now have the hard_config data, could you check the Ethernet and WiFi interfaces have the correct MAC addresses?//

LAN & WLAN MAC addresses are correct

//4 Last thing, could you check if the LEDs work as expected? There should be a green power LED (always on), a green status LED (blink during boot, then off), and five green LEDs that indicate the wifi signal strength.//

LEDs do work as expected (wifi signal strength included)

cheers

@openwrt-bot

acoul:

on the latest stable-19-trunk, ath9k is unable to successfully load on this arch

on your tftpboot-19-snapshot image you sent me though, ath9k is loading & is operational

what am I missing ?

personally, I find access to /proc/config.gz quite useful

I am attaching a diff between the two dmesg from a vanilla 19-snapshot & 19-rogerpueyo images

@openwrt-bot

rogerpueyo:

Hi,

// LAN & WLAN MAC addresses are correct
LEDs do work as expected (wifi signal strength included) //

Brilliant! Sorry, I think I didn't upload the sysupgrade image. It's at https://we.tl/t-tDBQxBkZD1.

Could you please give the sysupgrade image a try and tell me if it works? Or you can compile it from my branch (linked above). Then the patch will be ready.

Can I add "Tested-by: Alexandros C. Couloumbis <your@email.ext>" to the commit?

// on the latest stable-19-trunk, ath9k is unable to successfully load on this arch

on your tftpboot-19-snapshot image you sent me though, ath9k is loading & is operational

what am I missing ?

personally, I find access to /proc/config.gz quite useful

I am attaching a diff between the two dmesg from a vanilla 19-snapshot & 19-rogerpueyo images
//

Sorry, this is difficult to understand! ;)

We have the 19.07.4-ar71xx release. As far as I can recall, that image made your device enter failsafe mode. Was it initramfs or sysupgrade?

I also sent you a 19.07-ar71xx image I compiled from the fresh 19.07 branch. That is, the same as above (19.07.4-ar71xx release) plus a few commits that have been added to the branch since the release. As far as I remember, that one worked. Therefore, if there's a 19.07.5-ar71xx release, it should work.

// dmesg-19-snapshot_vs_19-rogerpueyo-diff.txt //

I am not sure what's going on there, but in the "r11208-ce6496d796" image (I assume it is 19.07.4-ar71xx) the br-lan interface does not appear anywhere. This looks to me like entering failsafe.

Cheers!

@openwrt-bot

acoul:

Hello rogerpueyo,

//Could you please give the sysupgrade image a try and tell me if it works? Or you can compile it from my branch (linked above). Then the patch will be ready. //

sysupgrade works just fine

root@OpenWrt:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 2816 2816 0 100% /rom
tmpfs 29080 56 29024 0% /tmp
/dev/ubi0_2 107840 44 102956 0% /overlay
overlayfs:/overlay 107840 44 102956 0% /
tmpfs 512 0 512 0% /dev

//Can I add "Tested-by: Alexandros C. Couloumbis <your@email.ext>" to the commit? //

yes you can

//Sorry, this is difficult to understand! ;)//

you are right. let me try again.

tftp booting [[https://downloads.openwrt.org/releases/18.06.9/targets/ar71xx/mikrotik/openwrt-18.06.9-ar71xx-mikrotik-vmlinux-initramfs-lzma.elf|18.06.9 image]] on my rb911-nand board, provides /sys/firmware/routerboot/ext_wlan_data, loads ath9k & wlan0 is visible & operational

tftp booting [[https://downloads.openwrt.org/releases/19.07.4/targets/ar71xx/mikrotik/openwrt-19.07.4-ar71xx-mikrotik-vmlinux-initramfs-lzma.elf|19.07.4 image]] gets into FAILSAFE MODE, /sys/firmware/mikrotik is empty & ath9k loads but wlan0 is not visible or operational & no ath/ath9k dmesg is reported

tftp booting the 19-snapshot image you sent me, /sys/firmware/mikrotik is empty while ath9k loads & wlan0 is visible & operational

my question is how come, the older 18.x trunk offers the wifi calibration data while the newer 19.x does not

also, would it be possible to send me your 19.x image with the kernel option "CONFIG_IKCONFIG_PROC=y"

cheers

@openwrt-bot

rogerpueyo:

Hi,

OK, now I see.

//tftp booting 18.06.9 image on my rb911-nand board, provides /sys/firmware/routerboot/ext_wlan_data, loads ath9k & wlan0 is visible & operational//

This means 18.06.9 works just fine. Great.

//tftp booting 19.07.4 image gets into FAILSAFE MODE, /sys/firmware/mikrotik is empty & ath9k loads but wlan0 is not visible or operational & no ath/ath9k dmesg is reported//

Different things here. The FAILSAFE mode issue should no longer occur once 19.07.5 is published: whatever was causing it seems to have been fixed after 19.07.4.

The /sys/firmware/mikrotik folder is only available with ath79, because it includes a driver that parses the device info to that folder. You are booting an ar71xx image here, which does not have the driver, so don't expect to find anything there! :)

In any case, in FAILSAFE mode there's no WiFi: the normal boot process is interrupted before the ath9k driver is loaded. This is normal.

//tftp booting the 19-snapshot image you sent me, /sys/firmware/mikrotik is empty while ath9k loads & wlan0 is visible & operational //

Cool. This means that, whenever 19.07.5 is published, the ar71xx image should work completely again (same as 18.06.9).

//my question is how come, the older 18.x trunk offers the wifi calibration data while the newer 19.x does not //

Both do, and both do it the same way, but you are looking at different places for the same calibration data.
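The two locations mentioned in this thread are /sys/firmware/routerboot/ext_wlan_data (seen with the ar71xx 18.06.9 image) and /sys/firmware/mikrotik/hard_config/wlan_data (ath79). A small sketch that checks both follows. `find_wlan_data` is a hypothetical helper, and its optional root-prefix argument exists only so the function can be tried against a test directory.

```shell
#!/bin/sh
# Print the first known sysfs path that holds non-empty WLAN calibration
# data, or return 1 if neither location has any.
#   $1: optional root prefix (leave empty on a real device)
# find_wlan_data is a hypothetical helper name used only for this illustration.
find_wlan_data() {
    root=${1:-}
    for f in \
        "$root/sys/firmware/routerboot/ext_wlan_data" \
        "$root/sys/firmware/mikrotik/hard_config/wlan_data"; do
        # -s is true only if the file exists and its size is greater than zero
        if [ -s "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}
```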

// also, would it be possible to send me your 19.x image with the kernel option "CONFIG_IKCONFIG_PROC=y" //

Please find them at https://we.tl/t-KXIWYSacjQ

Cheers!

@openwrt-bot

acoul:

Hi rogerpueyo,

thank you so much for your valuable feedback.

I am still trying to figure out the main changes & differences among openwrt 18.x & 19.x on the ar71xx arch.

for example, there is [[https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=ac56d253618c8c60794496518f2522863f24dedf|this 18.x commit]] which is probably based on [[https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=3fecb06fb1b0005a52dc10dba7f5ff8b8abc578b|this 19.x commit]], but in the latter, there is no information about the exclusion of the rb_ext_wlan_data code & what it has been replaced with

I will dig further into this and post here my findings.

thank you also for the latest image with the "CONFIG_IKCONFIG_PROC=y" option. BTW, FWIW, this option is very handy & would help a lot if it became a default on the main releases
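With that option enabled, the running kernel's configuration is readable from /proc/config.gz, so two images can be compared option by option. A minimal sketch of such a lookup follows. `kconfig_get` is a hypothetical helper name, and it assumes `zcat` and a `grep` with `-E` are available on the image.

```shell
#!/bin/sh
# Print the line that sets (or explicitly unsets) a kernel option.
#   $1: option name, e.g. CONFIG_IKCONFIG_PROC
#   $2: gzip-compressed config file (defaults to /proc/config.gz, which only
#       exists when the kernel was built with CONFIG_IKCONFIG_PROC=y)
# kconfig_get is a hypothetical helper name used only for this illustration.
kconfig_get() {
    option=$1
    file=${2:-/proc/config.gz}
    [ -r "$file" ] || { echo "$file not available" >&2; return 2; }
    # Matches both "CONFIG_FOO=y" and "# CONFIG_FOO is not set"
    zcat "$file" | grep -E "^(# )?$option[= ]"
}
# Usage on the device: kconfig_get CONFIG_IKCONFIG_PROC
```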

best regards,

@openwrt-bot

rogerpueyo:

Hi Alexandros,

I created a pull request to add support for the device in the ath79 architecture. Could you please check it at #3652? You may also want to compile it yourself and test it.

Since the ar71xx target is already defunct, I'd focus efforts on making sure the device is correctly supported in ath79 for the next 20.xx release.

Cheers!

@openwrt-bot

acoul:

Hi rogerpueyo,

I used your patches against a fresh trunk today. you may find the results in the attached files.

/sys/firmware/mikrotik/hard_config/wlan_data

is empty though

last working wlan calibration data is on openwrt-18.x for this board

I traced the relevant changes for the empty wlan calibration data to these recent commits:

[[https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=ddae86cc699703dfcdfa59c4e01736223357d786|ddae86cc699703dfcdfa59c4e01736223357d786]]
[[https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=511859de9b4df0e5472c8daa48b5d2cc6ea9ab11|511859de9b4df0e5472c8daa48b5d2cc6ea9ab11]]
[[https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=612b64e6c4ed67e113510cdcb32046f22c2681c7|612b64e6c4ed67e113510cdcb32046f22c2681c7]]
[[https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=fa2369e59bb243136f8e069e3c92d3b14f06b66a|fa2369e59bb243136f8e069e3c92d3b14f06b66a]]
[[https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=b36aa168d8906e24cfde18b5cc05de06f43df56f|b36aa168d8906e24cfde18b5cc05de06f43df56f]]
[[https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=3fecb06fb1b0005a52dc10dba7f5ff8b8abc578b|3fecb06fb1b0005a52dc10dba7f5ff8b8abc578b]]
[[https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=4cd44e5dc73ff9554cf31773b008f7ee94d979ed|4cd44e5dc73ff9554cf31773b008f7ee94d979ed]]

edit: FWIW, reverting the above patches against openwrt-19-trunk did not solve the issue.

/sys/firmware

is empty.

rogerpueyo, is /sys/firmware/mikrotik/hard_config/wlan_data also empty on your board ?

@openwrt-bot

acoul:

Hi rogerpueyo,

it looks like this device works just fine with your patches (thank you)

I am still wondering, if in your device:

/sys/firmware/mikrotik/hard_config/wlan_data

is empty (as in mine) or NOT

finally, any hints, references, pointers on how the wlan calibration data is loaded & used would be quite welcome

cheers

@openwrt-bot

rogerpueyo:

Hi Alexandros,

Yes, I do have the actual calibration data under /sys/firmware/mikrotik/hard_config/wlan_data, or at least it was there the last time I checked. I'll take a look at it again.

A few questions, for the ath79 target:

  • Do the rest of parameters (e.g., board_identifier, mac_base, etc.) still appear in /sys/firmware/mikrotik/hard_config/ ? If so, then it could mean that the wlan_data information is packaged in a format the driver can not parse properly.
  • Does the wifi interface work? Does it have the correct MAC address? Do you get similar RX/TX power as before (with ar71xx)?

The driver to parse the MikroTik soft_config and hard_config info is at https://git.openwrt.org/?p=openwrt/openwrt.git;a=tree;f=target/linux/generic/files/drivers/platform/mikrotik;h=58eb706817ded67ad8aab6849bbc08eae773e981;hb=HEAD.

Cheers

@openwrt-bot

acoul:

Hey Roger,

//Yes, I do have the actual calibration data under /sys/firmware/mikrotik/hard_config/wlan_data//

in my case, this file is 0 bytes
/sys/firmware/mikrotik/hard_config/wlan_data

here is the content of /sys/firmware/mikrotik/hard_config (all files are 4096 bytes, with the exception of wlan_data, which is 0 bytes):

board_identifier sxt5n
board_product_code 911-5HnD
board_serial correct
booter_version 3.10
flash_info 0xc21020c2 0x0000000c 0x00000010 0x02000033 0x02010400
hw_options see below
mac_base correct value
mac_count 0x00000002
mem_size 0x04000000
wlan_data 0 bytes file

cat /sys/firmware/mikrotik/hard_config/hw_options
raw : 0x00200001

no UART : true
has Vreg : false
has usb : false
has ATtiny : false
no NAND : false
has LCD : false
has POE out : false
has MicroSD : false
has SIM : false
has SFP : false
has WiFi : true
has TS ADC : false
has PLC : false
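
As a quick sanity check on the raw value above, the set bits can be listed with a few lines of Python. This is only a sketch: it shows which bit positions are set in 0x00200001, and the two set bits are consistent with exactly two flags ("no UART" and "has WiFi") reading true. The mapping from bit positions to flag names is defined by the driver and is not shown in this thread, so it is deliberately left out here.

```python
def set_bits(value, width=32):
    """Return the positions of the bits that are set in value (LSB is bit 0)."""
    return [bit for bit in range(width) if (value >> bit) & 1]


raw = 0x00200001  # "raw" value printed by hw_options above
print(set_bits(raw))  # [0, 21]
```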

//Does the wifi interface work?//

yes

//Does it have the correct MAC address?//

yes

//Do you get similar RX/TX power as before (with ar71xx)?//

currently, the only operational official ar71xx openwrt release for my 911-5HnD is 18.06.x

official ar71xx openwrt release 19.07.x has the failsafe issue

your ar71xx openwrt release 19.07.x loads ath9k & wifi looks functional but there is no calibration data under /sys/firmware

I am still trying to familiarize myself with the ath9k calibration data. it would be great if there was a utility that could compare wifi performance with & without ath9k calibration data

finally, I am also trying to understand what [[https://cateee.net/lkddb/web-lkddb/ATH9K_PCI_NO_EEPROM.html|this option]] actually does & how it affects ath9k

apparently, the calibration data has nothing to do with the actual performance of the wifi card but rather with the reported numbers as far as signal strength, noise etc. am I correct ? though, still those values may affect the way ath9k selects speed rates etc.

cheers

@openwrt-bot

acoul:

Hi Roger,

// ⇒ OK, this is an issue I'm also experiencing with the "real" SXT Lite5 I have. It seems that the RESET button polarity is reversed and the kernel understands that it is always pressed. Therefore, at boot time, when you are prompted to press "f" to enter failsafe mode (or, alternatively, to press the reset button), since the button is wrongly detected as always pressed, it enters failsafe mode.

I tried to fix it in these images here: https://we.tl/t-sSD1Id4jnY . I wonder if you would be so kind to give it a try. //

can I kindly ask you for the relevant patch so I can test it myself on the current 19.07.x tree?

many thanks

edit: I just synced latest 19.07-trunk, compiled & flashed my device. it looks like the failsafe issue is NOT there anymore, so please discard the above request.

still, I am unable [[https://git.openwrt.org/?p=openwrt/openwrt.git;a=log;h=refs/tags/v19.07.5|to locate]] the patch/change that fixes this issue. I need this specific patch in order to be able to compile & test older 19.07-trunk

edit II:

//The ath79 has a new driver for MikroTik devices to expose the device information, calibration data, etc. under /sys/firmware/mikrotik/hard_config , which is not in ar71xx. //

//The driver that dumps the device data to /sys/firmware/mikrotik is only in the ath79 architecture, not in the ar71xx images you are providing the log for.//

//The /sys/firmware/mikrotik folder is only available with ath79, because it includes a driver that parses the device info to that folder. You are booting an ar71xx image here, which does not have the driver, so don't expect to find anything there! :) //

Roger, sorry I've made you repeat yourself so many times. FWIW, 19.07 trunk does look as though it has the new sysfs driver included since [[https://git.openwrt.org/?p=openwrt%2Fopenwrt.git&a=search&h=refs%2Fheads%2Fopenwrt-19.07&st=author&s=Thibaut+VAR%C3%88NE|2020-05-12 commit]]. I don't know whether that commit is ath79 specific only & negates/breaks the old but working ar71xx functionality, though

@openwrt-bot

rogerpueyo:

Hi,

// edit: I just synced latest 19.07-trunk, compiled & flashed my device. it looks like the failsafe issue is NOT there anymore, so please discard the above request.

still, I am unable to locate the patch/change that fixes this issue. I need this specific patch in order to be able to compile & test older 19.07-trunk
//

=> No worries. It's weird, it just got "fixed" (???) Actually, I just recompiled the ar71xx image like you did, no changes. The setting is in file target/linux/ar71xx/files/arch/mips/ath79/mach-rbsxtlite.c, line #129 (active_low):

static struct gpio_keys_button rbsxtlite_gpio_keys[] __initdata = {
    {
        .desc = "Reset button",
        .type = EV_KEY,
        .code = KEY_RESTART,
        .debounce_interval = SXTLITE_KEYS_DEBOUNCE_INTERVAL,
        .gpio = SXTLITE_GPIO_BTN_RESET,
        .active_low = 0,
    },
};
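
To illustrate why a wrong active_low value makes a button look permanently pressed, here is a minimal Python sketch of the usual polarity logic (the logical state is the electrical level inverted when active_low is set). The idle level used below is a hypothetical example; the thread does not state how the reset line on this board is actually wired.

```python
def button_pressed(raw_level, active_low):
    """Logical 'pressed' state derived from the electrical GPIO level.

    active_low=1 means the line reads 0 while the button is held down.
    """
    return bool(raw_level) != bool(active_low)


# Hypothetical button whose line rests high and is pulled low when pressed.
idle_level = 1
print(button_pressed(idle_level, active_low=1))  # False: correctly seen as idle
print(button_pressed(idle_level, active_low=0))  # True: looks permanently pressed
```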

// Roger, sorry I've made you repeat yourself so many times. FWIW, 19.07 trunk does look as though it has the new sysfs driver included since 2020-05-12 commit. I don't know whether that commit is ath79 specific only & negates/breaks the old but working ar71xx functionality, though //

=> Don't be sorry, you're right! My bad! :-) The new driver was also ported to ar71xx, probably to deal with already supported devices whose wlan_data, etc. format changed in newer versions. I was not aware of it.

If the new sysfs driver is not working in ar71xx with your device and it was working before, this is a regression. Since this bug report has gotten very long, you may want to open a new one specifically for that. Otherwise, it will be a nightmare for Thibaut (or whoever can fix it) to understand what was going on here.

// in my case, this file is 0 bytes

/sys/firmware/mikrotik/hard_config/wlan_data //

=> Mine is also 0 bytes, but if I run "hexdump wlan_data" it actually shows data, e.g.:

root@qMp-SXT5:/sys/firmware/mikrotik/hard_config# hexdump wlan_data | head -n5
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
0001000 0202 0002 0304 0506 0000 0000 0000 0000
0001010 0000 0000 0000 0000 0000 0000 0000 1f00
0001020 3301 0000 0000 0400 1400 6d04 0300 08ff
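
The same observation (a file that lists as 0 bytes but still returns data when read) can be checked from a script. The sketch below is an illustration only: it compares the size reported by stat with the number of bytes actually returned by a read, since the listed size of a sysfs file does not necessarily reflect its content. The helper name read_attribute is made up for this example.

```python
import os


def read_attribute(path):
    """Return (size reported by stat, bytes actually read) for a file."""
    reported = os.stat(path).st_size
    with open(path, "rb") as handle:
        data = handle.read()
    return reported, data


if __name__ == "__main__":
    # Path taken from the thread; it only exists on a device with the driver loaded.
    path = "/sys/firmware/mikrotik/hard_config/wlan_data"
    if os.path.exists(path):
        reported, data = read_attribute(path)
        print(f"stat size: {reported}, bytes read: {len(data)}")
```

If the second number is non-zero while the first is 0, the attribute holds data even though a directory listing suggests otherwise.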

// finally, I am also trying to understand what this option actually does & how it affects ath9k

apparently, the calibration data has nothing to do with the actual performance of the wifi card but rather with the reported numbers as far as signal strength, noise etc. am I correct ? though, still those values may affect the way ath9k selects speed rates etc. //

=> I'm not a kernel/drivers expert. As I understand it, some ath9k radios (e.g., on a PCI card) have the calibration data on the board (so you can move it from one PC to another) while other radios (e.g., those in a wireless router) have the calibration data embedded in the main memory of the device.

From my experience, I once wiped the calibration data of an ath9k radio in a NanoStation and just copy&pasting the caldata from other NanoStations did not work well. The radio worked, but speeds and distance range were very low. Fortunately I found a backup of the flash chip and could recover it.

@openwrt-bot

acoul:

Hi Roger,

// If the new sysfs driver is not working in ar71xx with your device and it was working before, this is a regression. Since this bug report has gotten very long, you may want to open a new one specifically for that. Otherwise, it will be a nightmare for Thibaut (or whoever can fix it) to understand what was going on here. //

//
From my experience, I once wiped the calibration data of an ath9k radio in a NanoStation and just copy&pasting the caldata from other NanoStations did not work well. The radio worked, but speeds and distance range were very low. Fortunately I found a backup of the flash chip and could recover it.//

if I understand correctly, if ath9k can't find the wlan calibration data, it aborts loading for that wifi. so if ath9k loads & the wlan is visible, that means that the wlan calibration data was located & loaded successfully.

openwrt-18.x loads ath9k & also offers a wlan calibration data file under the /sys/firmware directory

openwrt-19.x, ar71xx loads ath9k but there is no wlan calibration data file under the /sys/firmware directory (not even the 0 byte one). somehow though, ath9k is able to locate the art partition & load the relevant data

from the above, the ath9k loading process looks independent from the process of exposing this data under /sys/firmware. am I right?
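
One way to look at the two things separately is a small Python sketch like the one below. It only reports whether wlan_data is exposed under /sys/firmware and whether any wireless phy is registered under /sys/class/ieee80211; it cannot tell where the driver obtained its calibration data. The function name wifi_status and its sys_root parameter are made up for this illustration.

```python
import os


def wifi_status(sys_root="/sys"):
    """Report separately whether wlan_data is exposed and which phys are registered."""
    wlan_data = os.path.join(sys_root, "firmware/mikrotik/hard_config/wlan_data")
    phy_dir = os.path.join(sys_root, "class/ieee80211")
    phys = sorted(os.listdir(phy_dir)) if os.path.isdir(phy_dir) else []
    return {"wlan_data_exposed": os.path.exists(wlan_data), "phys": phys}


print(wifi_status())
```

A result with a non-empty phys list and wlan_data_exposed set to False would match the 19.x ar71xx behaviour described above.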

the above, just for the record.

as for the ath79/trunk, my rb911-5HnD-nand is working rock stable @ the rooftop thanks to your help. I really hope this work gets into the openwrt mainline since these devices are still sold & they are quite cool & inexpensive.

Finally, I do have an rb411, an rb433 & an rb711 that are still not supported under ath79. if you are up for giving them a shot, let me know & I will open new tickets accordingly.

cheers

@openwrt-bot

Luflosi:

Sorry for reviving this old bug report but Alexandros C. Couloumbis wrote:
//Finally, I do have an rb411, an rb433 & an rb711 that are still not supported under ath79. if you are up for giving them a shot, let me know & I will open new tickets accordingly.//

I'm trying to port the rb711 (and also the gigabit version) to ath79 but haven't managed to do so because the bootloader doesn't seem to accept my images. My ar71xx image worked on the first try. See https://forum.openwrt.org/t/adding-support-for-mikrotik-routerboard-711g-5hnd/94162 for a full explanation.
If you have any suggestions what else I could try, that would be very helpful. Thank you.
