
FS#687 - Meraki MR24: Ethernet interface not detected correctly if cable not plugged at boot time #5800

Closed
openwrt-bot opened this issue Apr 8, 2017 · 8 comments

@openwrt-bot

russell:

  • Device problem occurs on

Meraki MR24 (apm821xx)

  • Software versions of LEDE release, packages, etc.

Tested reboot-3921-g3169a6a7ad, but it's been a problem since at least last September.

  • Steps to reproduce

Disconnect the Ethernet cable, apply power, wait until the device has booted, plug in the Ethernet cable, and check the interfaces: no eth0 is listed.
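The final check can be scripted so it is easy to repeat across many boots. This is a minimal sketch that assumes a Python interpreter is available. Default LEDE images typically ship only BusyBox, and there `ls /sys/class/net` does the same job. The function names are hypothetical.

```python
import os
import time


def interface_exists(name, sysfs_net="/sys/class/net"):
    # Every network device registered with the kernel has an entry here.
    return os.path.exists(os.path.join(sysfs_net, name))


def wait_for_interface(name, timeout=10.0, interval=0.5, sysfs_net="/sys/class/net"):
    """Poll until `name` appears or `timeout` seconds pass; return whether it appeared."""
    deadline = time.monotonic() + timeout
    while True:
        if interface_exists(name, sysfs_net):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the failing case described here, `wait_for_interface("eth0", timeout=30)` after plugging in the cable would be expected to return False, because eth0 is never registered.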

This appears to be a problem during probing of the AR8035 PHY chip. When the Ethernet port has no link at boot, PHY detection fails and eth0 is not created. Plugging in the cable later has no effect, because as far as the kernel is concerned there is no interface. The relevant part of the boot log looks like this:

This is the failing case:

[ 0.876611] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[ 0.882532] /plb/opb/ethernet@ef600c00: reset timeout
[ 0.888546] /plb/opb/ethernet@ef600c00: can't find PHY!

And the succeeding case:

[ 0.876672] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[ 0.883952] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:01:73:01:23:41
[ 0.890822] eth0: found Atheros 8035 Gigabit Ethernet PHY (0x01)
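When comparing many boots, it can help to classify the EMAC probe messages automatically. This is a minimal sketch that assumes the messages are worded exactly as in the two logs above. Other kernel versions may phrase them differently. `classify_emac_probe` is a hypothetical helper and not part of any LEDE tool.

```python
import re

# Message fragments taken from the boot logs in this issue.
FOUND_RE = re.compile(r"(eth\d+): found (.+?) PHY \((0x[0-9a-fA-F]+)\)")
MISSING_RE = re.compile(r"(\S+): can't find PHY!")
RESET_TIMEOUT_RE = re.compile(r"(\S+): reset timeout")


def classify_emac_probe(log):
    """Summarise an EMAC probe attempt from a chunk of kernel log text."""
    result = {"reset_timeout": bool(RESET_TIMEOUT_RE.search(log))}
    found = FOUND_RE.search(log)
    if found:
        result.update(status="ok", iface=found.group(1),
                      phy=found.group(2), addr=int(found.group(3), 16))
        return result
    missing = MISSING_RE.search(log)
    if missing:
        result.update(status="no_phy", device=missing.group(1))
        return result
    result["status"] = "unknown"
    return result
```

Applied to the two logs above, the failing case should yield `status="no_phy"` with `reset_timeout=True`, and the succeeding case `status="ok"` with the PHY at address 1.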
@openwrt-bot
Author

chunkeey:

Hello,

I don't have an MR24 myself, but I know that the MyBook Live (same single RGMII PHY setup, but with a Broadcom BCM54610 PHY) has no problem detecting the PHY during startup, even if no Ethernet cable is connected. I've tested today's image from downloads.lede-project.org: r3925-64175ff.

Since I think this is something specific to the MR24, and since you said "but it's been a problem since at least last September", I think it must have been an issue from the beginning. Is this correct? Or do you remember an older version where this was working as intended?

As for debugging: is this PHY detection error only a problem at boot time, or does it persist? You could test this by unbinding and rebinding the emac driver as root once the MR24 has finished booting and is running:

echo "4ef600c00.ethernet" > /sys/bus/platform/drivers/emac/unbind

echo "4ef600c00.ethernet" > /sys/bus/platform/drivers/emac/bind

[ 208.654537] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[ 208.661902] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:90:aa:31:32:25
[ 208.668772] eth0: found Generic MII PHY (0x01)
(The MBL can use the generic PHY driver)
...

If the PHY is still not detected, then it has to be something else.
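Here is the same unbind/bind test written as a reusable helper. On the device itself, the two echo commands above are the practical route, since LEDE images typically ship only BusyBox. This Python sketch assumes an interpreter is available and that the sysfs paths match the ones used above. `rebind` is a hypothetical name.

```python
import os

# Driver directory and device name as used in the echo commands above.
DRIVER_DIR = "/sys/bus/platform/drivers/emac"
DEVICE = "4ef600c00.ethernet"


def rebind(device=DEVICE, driver_dir=DRIVER_DIR):
    """Unbind `device` from its driver (if bound), then bind it again.

    Returns True if the bind write succeeded. A bind rejected by the kernel
    (for example with ENODEV, the "No such device" error ash reports when the
    PHY is not found) surfaces as OSError and yields False.
    """
    if os.path.exists(os.path.join(driver_dir, device)):
        with open(os.path.join(driver_dir, "unbind"), "w") as f:
            f.write(device)
    try:
        with open(os.path.join(driver_dir, "bind"), "w") as f:
            f.write(device)
    except OSError:
        return False
    return True
```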

(Maybe the bootloader disables the PHY via a GPIO? Does Cisco advertise such a feature? In that case, a dump of /sys/kernel/debug/gpio with and without the Ethernet cable attached during boot might help.)

@openwrt-bot
Author

russell:

> Since I think this is something specific to the MR24, and since you said "but it's been a problem since at least last September", I think it must have been an issue from the beginning. Is this correct? Or do you remember an older version where this was working as intended?

I don't know whether it ever worked. I think my first foray into MR24 work dates from the same September period.

I tried the bind/unbind. With ethernet unplugged at boot time, then plugged in, and running the suggested command, I get this:

echo 4ef600c00.ethernet > /sys/bus/platform/drivers/emac/bind
[ 49.562867] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[ 49.568842] /plb/opb/ethernet@ef600c00: reset timeout
[ 49.573925] /plb/opb/ethernet@ef600c00: can't find PHY!
ash: write error: No such device

With ethernet plugged at boot time, I see this:

# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x 2 root root 0 Apr 8 18:33 .
drwxr-xr-x 23 root root 0 Apr 8 18:33 ..
lrwxrwxrwx 1 root root 0 Apr 8 18:33 4ef600c00.ethernet -> ../../../../devices/platform/plb/plb:opb/4ef600c00.ethernet
--w------- 1 root root 4096 Apr 8 18:33 bind
--w------- 1 root root 4096 Apr 8 18:33 uevent
--w------- 1 root root 4096 Apr 8 18:33 unbind

With the MR24 booted with no ethernet, the GPIOs dump as follows:

# cat /sys/kernel/debug/gpio
gpiochip2: GPIOs 448-463, parent: pci/0000:44:00.0, ath9k-phy1:
gpio-458 ( |ath9k-phy1 ) out lo

gpiochip1: GPIOs 464-479, parent: pci/0000:43:00.0, ath9k-phy0:
gpio-474 ( |ath9k-phy0 ) out lo

gpiochip0: GPIOs 480-511, /plb/opb/gpio@ef600b00:
gpio-496 ( |Reset button ) in hi
gpio-497 ( |? ) out hi
gpio-498 ( |? ) out lo
gpio-499 ( |? ) out hi
gpio-500 ( |? ) out lo
gpio-501 ( |? ) out hi
gpio-502 ( |? ) out hi
gpio-503 ( |? ) out hi

With ethernet plugged at boot time, I get these GPIOs:

# cat /sys/kernel/debug/gpio
gpiochip2: GPIOs 448-463, parent: pci/0000:44:00.0, ath9k-phy1:
gpio-458 ( |ath9k-phy1 ) out lo

gpiochip1: GPIOs 464-479, parent: pci/0000:43:00.0, ath9k-phy0:
gpio-474 ( |ath9k-phy0 ) out lo

gpiochip0: GPIOs 480-511, /plb/opb/gpio@ef600b00:
gpio-496 ( |Reset button ) in hi
gpio-497 ( |? ) out lo
gpio-498 ( |? ) out lo
gpio-499 ( |? ) out hi
gpio-500 ( |? ) out lo
gpio-501 ( |? ) out hi
gpio-502 ( |? ) out hi
gpio-503 ( |? ) out hi

I see gpio-497 has a different state, which is interesting.
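Comparing dumps by eye gets error-prone as the list grows. This small Python sketch parses two dumps in the format shown above and reports only the GPIOs whose direction or level changed. The regular expression is an assumption based on these particular dumps and may need adjusting for other kernel versions. The helper names are hypothetical.

```python
import re

# Matches debugfs lines such as "gpio-497 ( |? ) out hi".
GPIO_LINE = re.compile(r"gpio-(\d+)\s*\(.*?\)\s*(in|out)\s+(hi|lo)")


def parse_gpio_dump(text):
    """Map global GPIO number -> (direction, level) from /sys/kernel/debug/gpio output."""
    return {int(n): (direction, level) for n, direction, level in GPIO_LINE.findall(text)}


def diff_gpio_dumps(before, after):
    """Return {gpio: (state_before, state_after)} for GPIOs whose state differs."""
    a, b = parse_gpio_dump(before), parse_gpio_dump(after)
    return {n: (a.get(n), b.get(n))
            for n in sorted(a.keys() | b.keys())
            if a.get(n) != b.get(n)}
```

Fed the two dumps above, this reports a single entry, gpio-497 going from ("out", "hi") without the cable to ("out", "lo") with it.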

@openwrt-bot
Author

russell:

gpio-497 is just the LAN LED, according to target/linux/apm821xx/dts/MR24.dts:

[...]
lan {
        label = "mr24:green:wan";
        gpios = <&GPIO0 17 GPIO_ACTIVE_LOW>;
};
[...]
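The mapping from this DTS entry to gpio-497 is simple arithmetic. `<&GPIO0 17 ...>` is offset 17 on gpiochip0, whose range starts at 480 in the dumps above, and 480 + 17 = 497. Because the LED is declared GPIO_ACTIVE_LOW, a raw "lo" should mean the LED is lit. That fits "out lo" appearing only when the cable was plugged in at boot. This Python sketch shows that reasoning. It assumes debugfs reports the raw pin level rather than the active-low-adjusted value, which is worth double-checking. The helper names are hypothetical.

```python
# gpiochip0 ("/plb/opb/gpio@ef600b00") covers global GPIOs 480-511 per the dump above.
GPIOCHIP0_BASE = 480
GPIOCHIP0_NGPIO = 32


def global_gpio(offset, base=GPIOCHIP0_BASE, ngpio=GPIOCHIP0_NGPIO):
    """Translate a controller-relative offset (as in a DTS "gpios" cell) to the global number."""
    if not 0 <= offset < ngpio:
        raise ValueError(f"offset {offset} outside 0..{ngpio - 1}")
    return base + offset


def led_lit(raw_level, active_low=True):
    """Interpret a raw pin level ("hi"/"lo"); ASSUMES debugfs shows the raw level."""
    return raw_level == "lo" if active_low else raw_level == "hi"
```

With these definitions, `global_gpio(17)` gives 497, and `led_lit("lo")` is True for the active-low LAN LED.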

@openwrt-bot
Author

russell:

When booted with ethernet unplugged:

root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x 2 root root 0 Apr 4 06:13 .
drwxr-xr-x 23 root root 0 Apr 4 06:13 ..
--w------- 1 root root 4096 Apr 4 06:13 bind
--w------- 1 root root 4096 Apr 4 06:13 uevent
--w------- 1 root root 4096 Apr 4 06:13 unbind

Then trying to bind:

root@LEDE:/# echo 4ef600c00.ethernet > /sys/bus/platform/drivers/emac/bind
[ 533.566010] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[ 533.572086] /plb/opb/ethernet@ef600c00: reset timeout
[ 533.577159] /plb/opb/ethernet@ef600c00: can't find PHY!
ash: write error: No such device

When booted with ethernet plugged:

root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x 2 root root 0 Apr 8 23:10 .
drwxr-xr-x 23 root root 0 Apr 8 23:10 ..
lrwxrwxrwx 1 root root 0 Apr 8 23:10 4ef600c00.ethernet -> ../../../../devices/platform/plb/plb:opb/4ef600c00.ethernet
--w------- 1 root root 4096 Apr 8 23:10 bind
--w------- 1 root root 4096 Apr 8 23:10 uevent
--w------- 1 root root 4096 Apr 8 23:10 unbind

Then ls, unbind, ls, and bind again with the cable still plugged in:

root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x 2 root root 0 Apr 8 23:10 .
drwxr-xr-x 23 root root 0 Apr 8 23:10 ..
lrwxrwxrwx 1 root root 0 Apr 8 23:10 4ef600c00.ethernet -> ../../../../devices/platform/plb/plb:opb/4ef600c00.ethernet
--w------- 1 root root 4096 Apr 8 23:10 bind
--w------- 1 root root 4096 Apr 8 23:10 uevent
--w------- 1 root root 4096 Apr 8 23:10 unbind
root@LEDE:/# echo 4ef600c00.ethernet > /sys/bus/platform/drivers/emac/unbind
root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x 2 root root 0 Apr 8 23:10 .
drwxr-xr-x 23 root root 0 Apr 8 23:10 ..
--w------- 1 root root 4096 Apr 8 23:10 bind
--w------- 1 root root 4096 Apr 8 23:10 uevent
--w------- 1 root root 4096 Apr 8 23:11 unbind
root@LEDE:/# echo 4ef600c00.ethernet > /sys/bus/platform/drivers/emac/bind
[ 124.622773] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[ 124.630448] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:01:73:01:23:41
[ 124.637355] eth0: found Atheros 8035 Gigabit Ethernet PHY (0x01)
[ 124.645524] eth0: link is down
[ 124.649110] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
root@LEDE:/# [ 128.178678] eth0: link is up, 1000 FDX, pause enabled
[ 128.183831] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x 2 root root 0 Apr 8 23:10 .
drwxr-xr-x 23 root root 0 Apr 8 23:10 ..
lrwxrwxrwx 1 root root 0 Apr 8 23:11 4ef600c00.ethernet -> ../../../../devices/platform/plb/plb:opb/4ef600c00.ethernet
--w------- 1 root root 4096 Apr 8 23:11 bind
--w------- 1 root root 4096 Apr 8 23:10 uevent
--w------- 1 root root 4096 Apr 8 23:11 unbind

Now unplug the Ethernet cable and repeat:

root@LEDE:/# [ 323.765251] eth0: link is down
root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x 2 root root 0 Apr 8 23:10 .
drwxr-xr-x 23 root root 0 Apr 8 23:10 ..
lrwxrwxrwx 1 root root 0 Apr 8 23:11 4ef600c00.ethernet -> ../../../../devices/platform/plb/plb:opb/4ef600c00.ethernet
--w------- 1 root root 4096 Apr 8 23:11 bind
--w------- 1 root root 4096 Apr 8 23:10 uevent
--w------- 1 root root 4096 Apr 8 23:11 unbind
root@LEDE:/# echo 4ef600c00.ethernet > /sys/bus/platform/drivers/emac/unbind
[ 339.974763] /plb/opb/ethernet@ef600c00: RX disable timeout
root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x 2 root root 0 Apr 8 23:10 .
drwxr-xr-x 23 root root 0 Apr 8 23:10 ..
--w------- 1 root root 4096 Apr 8 23:11 bind
--w------- 1 root root 4096 Apr 8 23:10 uevent
--w------- 1 root root 4096 Apr 8 23:15 unbind
root@LEDE:/# echo 4ef600c00.ethernet > /sys/bus/platform/drivers/emac/bind
[ 352.468859] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
[ 352.474941] /plb/opb/ethernet@ef600c00: reset timeout
[ 352.480974] /plb/opb/ethernet@ef600c00: can't find PHY!
ash: write error: No such device
root@LEDE:/# ls -al /sys/bus/platform/drivers/emac
drwxr-xr-x 2 root root 0 Apr 8 23:10 .
drwxr-xr-x 23 root root 0 Apr 8 23:10 ..
--w------- 1 root root 4096 Apr 8 23:15 bind
--w------- 1 root root 4096 Apr 8 23:10 uevent
--w------- 1 root root 4096 Apr 8 23:15 unbind
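Across all of these listings, the only thing that distinguishes a successful probe from a failed one is the `4ef600c00.ethernet` symlink in the driver directory. This small Python sketch reports which devices a driver currently has bound. It assumes the usual sysfs layout, and the helper name is hypothetical.

```python
import os


def bound_devices(driver_dir="/sys/bus/platform/drivers/emac"):
    """Return the names of devices currently bound to the driver.

    Bound devices appear as symlinks (the lrwxrwxrwx entries above), while the
    control files bind, unbind and uevent are regular files. Drivers built as
    modules also carry a "module" symlink, which is skipped here.
    """
    return sorted(
        entry
        for entry in os.listdir(driver_dir)
        if entry != "module" and os.path.islink(os.path.join(driver_dir, entry))
    )
```

Going by the listings above, this should return `['4ef600c00.ethernet']` after a successful probe and an empty list after a failed bind.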

@openwrt-bot
Author

chunkeey:

Chris Blake was also able to reproduce this issue with his unit. The problem is that we are currently stuck on what's going on with the AR8035.

It's likely that U-Boot is powering down the (inactive) PHY before handing control over to the LEDE installation. We suspect this because the MX60 (same generation, but the router) does disable its ports as some sort of "security measure".

I think that to debug this, it will be necessary to look at what is happening to the PHY chip. Can you probe the individual pins of the chip with a digital oscilloscope? The most important pins are the reset pin (pin 1) and the XI clocks (pins 5 and 4). It would be even better if you could also probe MDC/MDIO (pins 40 and 39) to see whether, and how, U-Boot disables the PHY.

You can find a datasheet with the pinout via Google
(e.g. https://www.redeszone.net/app/uploads/2014/04/AR8035.pdf).

The pinout is on page 4. The power-on sequence is explained on page 26.

Note: We looked into meraki-linux's source as well, but there are no modifications to the emac driver apart from adding the PHY to the list of known PHYs. I don't have an MR24 myself, but given this information, I wonder what Meraki's stock firmware does in this case. Can you reflash the original firmware and provide a boot log? There might be a note or hint in it.
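Here is some context for "can't find PHY!" and "the list of known PHYs". Conceptually, PHY discovery scans the 32 possible MDIO addresses and reads the two standard MII ID registers. It then matches the ID against a table of known PHYs and falls back to a generic driver when there is no match, which is what the MyBook Live log shows. The Python model below simplifies that idea and is not the emac driver's actual code. `mdio_read` is a hypothetical accessor. The AR8035 ID/mask pair is an assumption to check against drivers/net/ethernet/ibm/emac/phy.c.

```python
from typing import Callable, Optional, Tuple

MII_PHYSID1 = 0x02  # standard MII register: PHY identifier, upper 16 bits
MII_PHYSID2 = 0x03  # standard MII register: PHY identifier, lower 16 bits

# (phy_id, mask, name). ASSUMPTION: the AR8035 entry is modelled on the emac
# driver's PHY table; verify the values against the kernel source.
KNOWN_PHYS = [
    (0x004DD070, 0xFFFFFFF0, "Atheros 8035 Gigabit Ethernet"),
]


def scan_mdio(mdio_read: Callable[[int, int], int]) -> Optional[Tuple[int, int, str]]:
    """Return (address, phy_id, driver name) for the first PHY found, or None.

    `mdio_read(addr, reg)` is a hypothetical accessor returning a 16-bit value;
    an address with nothing behind it typically reads back as 0xFFFF.
    """
    for addr in range(32):  # MDIO PHY addresses are 5 bits wide: 0..31
        phy_id = (mdio_read(addr, MII_PHYSID1) << 16) | mdio_read(addr, MII_PHYSID2)
        if phy_id in (0x00000000, 0xFFFFFFFF):
            continue  # nothing answering at this address
        for known_id, mask, name in KNOWN_PHYS:
            if (phy_id & mask) == (known_id & mask):
                return addr, phy_id, name
        # Unknown ID: fall back to a generic driver, as in the MyBook Live log.
        return addr, phy_id, "Generic MII"
    return None
```

A PHY that does not answer on MDIO at all (for example, one held in reset) reads as absent at every address, so the scan returns None. That is only the software-visible symptom, whatever the root cause turns out to be.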

@openwrt-bot
Author

chunkeey:

A patch has been sent to the LEDE-ML:
https://patchwork.ozlabs.org/patch/772731/

The fix has also been accepted upstream by David Miller:
https://patchwork.ozlabs.org/patch/772436/

@openwrt-bot
Author

chunkeey:

@dedeckeh
Can you please look again? The patch is still waiting (marked as new) in LEDE's patchwork: https://patchwork.ozlabs.org/patch/772731/
I can't find it in either the main source.git or your staging tree.

@openwrt-bot
Author

dedeckeh:

Closed by accident; reopened.
Sorry for the noise.
