FS#4032 - Uplink ethernet ports not connecting at gigabit speeds (Ath79) #9078

Closed
openwrt-bot opened this issue Sep 11, 2021 · 17 comments

@openwrt-bot

psherman:

  • Device problem occurs on
    2 devices known at the present time: Ubiquiti RouterStation Pro and Buffalo WZR-HP-AG300H

  • Software versions of OpenWrt/LEDE release, packages, etc.
    21.02.0 stable release (did not test RC versions).

  • Steps to reproduce
    Upgrade device to 21.02.0 and observe the uplink ethernet port speeds. Install ethtool and look at the supported link speeds for eth0 (RSPro) (eth1 on the WZR-HP-AG300H).

  • More info
    This issue presents on 21.02.0, but not on 19.07.x and it affects only eth0 on the RSPro -- limiting the uplink speed to 100Mbps. Forcing the link to 1000Mbps causes the link to fail. This does not affect eth1 (3-port switch) which still works at gigabit speeds.

On the RSPro, I have tested with 19.07.8 (ath79) and 21.02.0 (ath79), both completely fresh installations from official stable images with no changes to the default configs or packages except for the installation of ethtool. I used known good cables connected to the device and did not physically change anything between the different images. Upstream is a Ubiquiti Unifi Flex Mini (gigabit) switch, and downstream (on eth1) is a 2009 Mac Pro (gigabit).

**The following is the ethtool output from 21.02.0:**

root@OpenWrt:~# ethtool eth0
Settings for eth0:
    Supported ports: [ TP MII ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
    Supported pause frame use: Symmetric Receive-only
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                         100baseT/Half 100baseT/Full
    Link partner advertised pause frame use: No
    Link partner advertised auto-negotiation: Yes
    Link partner advertised FEC modes: Not reported
    Speed: 100Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 4
    Transceiver: external
    Auto-negotiation: on
    Current message level: 0x000000ff (255)
                           drv probe link timer ifdown ifup rx_err tx_err
    Link detected: yes

**And this is the ethtool output when running on 19.07.8:**

root@OpenWrt:~# ethtool eth0
Settings for eth0:
    Supported ports: [ TP MII ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supported pause frame use: Symmetric Receive-only
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                         100baseT/Half 100baseT/Full
                                         1000baseT/Full
    Link partner advertised pause frame use: No
    Link partner advertised auto-negotiation: Yes
    Link partner advertised FEC modes: Not reported
    Speed: 1000Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 4
    Transceiver: internal
    Auto-negotiation: on
    Current message level: 0x000000ff (255)
                           drv probe link timer ifdown ifup rx_err tx_err
    Link detected: yes

Details can be found in [[https://forum.openwrt.org/t/possible-21-02-0-bug-with-routerstation-pro-ethernet-speed/105594|my original post]] about the RouterStation Pro and in this one for the [[https://forum.openwrt.org/t/uplink-port-not-negotiating-gigabit-speed-after-21-02-update/105906|Buffalo device]].

@openwrt-bot

psherman:

I have a theory, but this is unconfirmed -- ath79 based devices that have 2 or more distinct interfaces - where one of them is typically used as an uplink - may be affected by this bug. Devices that have one logical interface (i.e. dedicate a port on an internal switch as the WAN/uplink by means of VLANs) may not be affected.

@openwrt-bot

dreamlayers:

I also have this issue on my WZR-600DHP. The WAN port is connected via a cat 6 cable to an Arris TM822 cable modem. When I reboot the WZR-600DHP, while the red diagnostic light is lit or blinking, the modem status light lights up green, indicating a gigabit connection. However, when WZR-600DHP status lights indicate normal operation, the connection is always 100 Mbit. This seems to indicate hardware on both ends and the cable are capable of gigabit, but OpenWrt is causing a 100 Mbit connection.

If I try to force gigabit on eth1 with ethtool, I either get "Speed: 100Mb/s" or "Speed: Unknown!" and "Duplex: Unknown! (255)". This can cause the link to stop functioning until I set it back to 100.

In

I don't see any differences between eth0 and eth1 which might cause this. In I see differences. Eth0, which works at gigabit, has:
    fixed-link {
        speed = <1000>;
        full-duplex;
    };
While eth1 instead has:
    phy-handle = <&phy4>;

Eth1 may have worked at gigabit speed before the ar71xx to ath79 transition, but I'm not certain of that.

@openwrt-bot

psherman:

Boris - can you try running 19.07.8 to see if your ethernet port runs properly at gigabit speeds?

It appears that your device was not supported on the ath79 platform until 21.02, but the ar71xx image is [[https://downloads.openwrt.org/releases/19.07.8/targets/ar71xx/generic/openwrt-19.07.8-ar71xx-generic-wzr-600dhp-squashfs-sysupgrade.bin|here]]. Do not retain settings (make a backup so you can restore quickly later). Test with a completely default installation to the greatest extent possible. Install ethtool and get info on the ethernet ports (similar to what I had done earlier). Then do the same thing with 21.02.

This won't necessarily answer what commit(s) broke the gigabit connectivity, but it will at least point to something in the ath79 target (as compared to something specific to your device or physical setup). My device was ath79 for both 19.07 and 21.02, and something clearly broke gigabit connectivity, but only on the uplink port.

@openwrt-bot

dreamlayers:

Thank you for the quick reply. I will do that testing tonight, when internet access going down for a bit won't be a problem. Right now I was learning about how the various components work together, and I think this problem has to do with the Ethernet phy, not the Ethernet interface. The mii-tool program, available via opkg, provides more information about the phy:

mii-tool -v -v -v -v eth1

Using SIOCGMIIPHY=0x8947
eth1: negotiated 100baseTx-FD, link ok
registers for MII PHY 4:
3100 796d 004d d041 0101 c1e1 000d 2801
0000 0000 1000 0000 0000 0000 0000 2000
0862 7c10 0000 1002 002c 0000 0000 0000
3200 0000 0000 0000 0000 0000 824e 0000
product info: vendor 00:13:74, model 4 rev 1
basic mode: autonegotiation enabled
basic status: autonegotiation complete, link ok
capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
advertising: 100baseTx-FD
link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD

The eth0 port which goes to the switch chip does not have a phy.
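The `advertising:` line can be cross-checked against the raw dump. In the standard MII register layout from `<linux/mii.h>`, the fifth value of the first row is register 4 (`MII_ADVERTISE`), and its bits map directly to the 10/100 modes. Below is a minimal sketch of that decoding; `decode_advertise` is a made-up helper for illustration and is not part of mii-tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit values of MII register 4 (MII_ADVERTISE), as in <linux/mii.h>. */
#define ADVERTISE_10HALF  0x0020
#define ADVERTISE_10FULL  0x0040
#define ADVERTISE_100HALF 0x0080
#define ADVERTISE_100FULL 0x0100

/*
 * Hypothetical helper: write the 10/100 modes advertised in `reg4` to
 * `buf`, space separated, using the names and order mii-tool prints.
 * Returns the number of modes written.
 */
static int decode_advertise(uint16_t reg4, char *buf, size_t len)
{
    static const struct { uint16_t bit; const char *name; } modes[] = {
        { ADVERTISE_100FULL, "100baseTx-FD" },
        { ADVERTISE_100HALF, "100baseTx-HD" },
        { ADVERTISE_10FULL,  "10baseT-FD"   },
        { ADVERTISE_10HALF,  "10baseT-HD"   },
    };
    int count = 0;
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
        if (!(reg4 & modes[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", modes[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;              /* buffer too small; stop here */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Fed the value `0x0101` from the dump above, this yields only `100baseTx-FD`, which agrees with what mii-tool printed.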

@openwrt-bot

psherman:

This is what I get with the mii-tool output on my RouterStation Pro for eth0 (my uplink) and eth1 (LAN, not in use right now)

root@OpenWrt:~# mii-tool -v -v -v -v eth0
Using SIOCGMIIPHY=0x8947
eth0: negotiated 100baseTx-FD, link ok
registers for MII PHY 4:
1000 796d 004d d041 01e1 c1e1 000d 2801
0000 0000 1000 0000 0000 0000 0000 2000
0862 7c52 0000 1002 002c 0000 0000 0000
3200 0000 0000 0000 0000 0000 824e 0000
product info: vendor 00:13:74, model 4 rev 1
basic mode: autonegotiation enabled
basic status: autonegotiation complete, link ok
capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD

root@OpenWrt:~# mii-tool -v -v -v -v eth1
Using SIOCGMIIPHY=0x8947
eth1: no link
registers for MII PHY 0:
3100 7949 004d d041 0de1 0000 0004 2801
0000 0200 0000 0000 0000 0000 0000 2000
0862 0010 0000 0000 002c 0000 0000 0000
3200 0000 0000 0000 0000 0000 02ee 0000
product info: vendor 00:13:74, model 4 rev 1
basic mode: autonegotiation enabled
basic status: no link
capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
advertising: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

@openwrt-bot

dreamlayers:

Yeah, I definitely have gigabit on the WAN port after installing openwrt-19.07.8-ar71xx-generic-wzr-600dhp-squashfs-sysupgrade.bin on my WZR-600DHP. Both ethtool and mii-tool say so, and the LED on the Arris TM822 cable modem is green instead of yellow.

root@OpenWrt:~# ethtool eth1 | sed 's/^/    /'
Settings for eth1:
    Supported ports: [ TP MII ]
    Supported link modes:   10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes:  10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Link partner advertised link modes:  10baseT/Half 10baseT/Full 
                                         100baseT/Half 100baseT/Full 
                                         1000baseT/Full 
    Link partner advertised pause frame use: No
    Link partner advertised auto-negotiation: Yes
    Link partner advertised FEC modes: Not reported
    Speed: 1000Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 4
    Transceiver: internal
    Auto-negotiation: on
    Current message level: 0x000000ff (255)
                           drv probe link timer ifdown ifup rx_err tx_err
    Link detected: yes

Besides the speed differences, the "Transceiver" changed from "external" to "internal".

root@OpenWrt:~# mii-tool -vvvv eth1 | sed 's/^/    /'
Using SIOCGMIIPHY=0x8947
eth1: negotiated 1000baseT-FD flow-control, link ok
  registers for MII PHY 4: 
    1000 796d 004d d041 01e1 c1e1 000d 2001
    0000 0200 3800 0000 0000 0000 0000 2000
    0862 bc50 0000 1000 002c 0000 0000 0000
    3200 0000 0000 0000 0000 0000 824e 0000
  product info: vendor 00:13:74, model 4 rev 1
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  link partner: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD

dmesg also has lines confirming gigabit for both Ethernet interfaces:
[ 63.438575] eth0: link up (1000Mbps/Full duplex)
[ 67.947302] eth1: link up (1000Mbps/Full duplex)

I am able to use "ethtool -s eth1 speed 100 duplex full" and "ethtool -s eth1 speed 1000 duplex full" to change the speed of eth1. The modem LED confirms the speed changes.

The dmesg changes might be relevant. This is from 19.07.8. It was interleaved with some other stuff. Weird how it says using port 4 as switch port and as PHY. But I guess they're two different devices, eth0 being ag71xx-mdio.0:00, eth1 being ag71xx-mdio.0:04.

libphy: Fixed MDIO Bus: probed
switch0: Atheros AR8316 rev. 1 switch registered on ag71xx-mdio.0
ar8316: Using port 4 as switch port
ag71xx ag71xx.0: connected to PHY at ag71xx-mdio.0:00 [uid=004dd041, driver=Atheros AR8216/AR8236/AR8316]
eth0: Atheros AG71xx at 0xb9000000, irq 4, mode: rgmii
ar8316: Using port 4 as PHY
ag71xx ag71xx.1: connected to PHY at ag71xx-mdio.0:04 [uid=004dd041, driver=Atheros AR8216/AR8236/AR8316]
eth1: Atheros AG71xx at 0xba000000, irq 5, mode: rgmii

This was from 21.02.0. Note how eth0 is changed to fixed, like the device tree says, with the Generic PHY driver. It doesn't seem like anything changed regarding eth1.

libphy: Fixed MDIO Bus: probed
libphy: ag71xx_mdio: probed
switch0: Atheros AR8316 rev. 1 switch registered on mdio.0
ag71xx 19000000.eth: connected to PHY at fixed-0:00 [uid=00000000, driver=Generic PHY]
random: fast init done
eth0: Atheros AG71xx at 0xb9000000, irq 4, mode: rgmii
ar8316: Using port 4 as PHY
ag71xx 1a000000.eth: connected to PHY at mdio.0:04 [uid=004dd041, driver=Atheros AR8216/AR8236/AR8316]
eth1: Atheros AG71xx at 0xba000000, irq 5, mode: rgmii

It's weird how in 19.07.8 mii-tool claims eth0 is 100 Mbit. But ethtool and dmesg claim 1000, and I believe them. What is this anyways? I thought eth0 is connected directly to the switch chip, with no phy there.

root@OpenWrt:~# mii-tool -vvvv eth0 | sed 's/^/    /'
Using SIOCGMIIPHY=0x8947
eth0: negotiated 100baseTx-FD flow-control, link ok
  registers for MII PHY 0: 
    1000 796d 004d d041 0de1 4d01 0005 2001
    0000 0200 1000 0000 0000 0000 0000 2000
    0862 7c1e 0000 1002 002c 0000 0000 0000
    3200 0000 0000 0000 0000 0000 02ee 0000
  product info: vendor 00:13:74, model 4 rev 1
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 100baseTx-FD flow-control

They don't seem swapped. Using mii-tool on eth0 affects LAN and eth1 affects WAN.

@openwrt-bot

psherman:

Boris - Thanks for all the new info

All three of the affected routers known thus far use the AR7161 SoC. It would be nice to know if this affects any other SoCs in the family, or if this is unique to this specific chip.

We can also infer from the available data that this issue is caused by some commit that is present in 21.02 and not in 19.07 (the two other devices are available as ath79 targets for both of those versions), so we can reasonably assume that this specific bug is not directly related to the transition from ar71xx to ath79.

I have searched the git repos for changes that could affect this (searching commits for references to ar7161, ar7100, router station pro, WZR-HP-AG300H) -- I was unable to find anything obvious, but I'm not a developer and I don't really know how to identify the root-cause of this type of bug. I am hoping that one of the devs will see the bug and have some ideas.

@openwrt-bot

dreamlayers:

The transition from ar71xx to ath79 involved a change from using C code to configure the system to using device trees. The next thing I'd like to see is whether the last ar71xx build works and the first ath79 fails. Maybe the device tree never properly set up the device like it was set up via the old method?

You can see the old style configuration at

static void __init wzrhpag300h_setup(void)
{
    /* Ethernet-related lines only */
    ath79_eth0_data.phy_if_mode = PHY_INTERFACE_MODE_RGMII;
    ath79_eth0_data.speed = SPEED_1000;
    ath79_eth0_data.duplex = DUPLEX_FULL;
    ath79_eth0_data.phy_mask = BIT(0);

    ath79_eth1_data.phy_if_mode = PHY_INTERFACE_MODE_RGMII;
    ath79_eth1_data.phy_mask = BIT(4);

    ath79_register_eth(0);
    ath79_register_eth(1);
}

The phy_mask there seems to have been forgotten, and some other devices have them in the mdio0 section of the device tree. So, I tried adding "phy-mask = <0x10>;" below "status = "okay";" in target/linux/ath79/dts/ar7161_buffalo_wzr-hp-ag300h.dtsi of openwrt-21.02. It did not help.
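For reference, `BIT(4)` in the old board file and the `0x10` tried in the device tree are the same value: bit n of a PHY mask selects MDIO address n. A minimal sketch of that relationship follows; `BIT` mirrors the kernel macro, while `phy_mask_selects` is a hypothetical helper written only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same idea as the kernel's BIT() macro. */
#define BIT(n) (1UL << (n))

/*
 * Hypothetical helper: does `mask` select the PHY at MDIO address
 * `addr`? MDIO addresses are 5 bits wide, so valid values are 0..31.
 */
static bool phy_mask_selects(uint32_t mask, unsigned int addr)
{
    assert(addr < 32);
    return (mask & BIT(addr)) != 0;
}
```

With this, `phy_mask_selects(0x10, 4)` is true and `phy_mask_selects(0x10, 0)` is false, so the mask tried in the device tree names the same PHY as `BIT(4)` did.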

@openwrt-bot

dreamlayers:

It's weird that "Atheros AR8216/AR8236/AR8316 mdio.0:04" is printing messages about LAN switch ports going up and down, like this:

[ 2598.530695] Atheros AR8216/AR8236/AR8316 mdio.0:04: Port 1 is down
[ 2602.626507] Atheros AR8216/AR8236/AR8316 mdio.0:04: Port 1 is up
[ 2681.473414] Atheros AR8216/AR8236/AR8316 mdio.0:04: Port 2 is down
[ 2687.617192] Atheros AR8216/AR8236/AR8316 mdio.0:04: Port 2 is up

That is supposed to be the phy for eth1, the WAN interface, but those are LAN events, which I can create manually. Why are LAN things being printed? This made me think that maybe the phy were swapped in the old configuration and mdio.0:00 might make gigabit work. So I tried the attached patch, and got a gigabit connection according to both dmesg and the router. However, no data was received on eth1, and so there was no internet access. With the patch, WAN port disconnection and reconnection was reported as "Port 5". With unpatched 21.02.0, there is no such reporting, and instead I see "eth1: link down" and "eth1: link up (100Mbps/Full duplex)".

This patch is attached for exact record of what I did, not as a recommended solution.

@openwrt-bot

dreamlayers:

The phy numbering is correct. You can look at switch ports via "swconfig dev switch0 show", and different phy via "mii-tool -vvv -p # eth1", replacing # with 0 through 4. Switch port 0 connects to the CPU, without a phy. Switch ports 1 to 4 are LAN ports on the back, and correspond to phy 0 to 3. The wan port corresponds to phy 4.
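The port-to-phy mapping described here can be written down as a small function. This is only a sketch: the function name and the port count are illustrative, and treating the WAN port as switch port 5 is an assumption based on the "Port 5" message seen with the experimental patch earlier in this thread.

```c
#include <assert.h>

/* Assumption: six switch ports, numbered 0..5 (see the "Port 5" log). */
#define SWITCH_NUM_PORTS 6

/*
 * Hypothetical helper: return the PHY address behind a switch port,
 * or -1 for port 0, which goes to the CPU and has no PHY.
 * Ports 1..4 (LAN) map to PHY 0..3; port 5 (assumed WAN) maps to PHY 4.
 */
static int switch_port_to_phy(unsigned int port)
{
    assert(port < SWITCH_NUM_PORTS);
    return port == 0 ? -1 : (int)port - 1;
}
```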

The "ath79_eth1_data.phy_mask = BIT(4);" in the old config corresponds to the eth1 phy being at address 4 in the new device tree. That's fine, it selects the correct phy.

The problem can be seen in "registers for MII PHY 4:" output of "mii-tool -p 4 -vvv eth1", the second number in the second row. It's 0200 for phy 0 through 3 and phy 4 in 19.07.8, but 0000 for phy 4 in 21.02.0. This seems to be a real phy hardware register claiming the hardware doesn't support gigabit. I do not know why. The other phy will show gigabit support even when connected at slower speeds to devices which don't support gigabit on the other end.
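Going by the standard MII register numbering in `<linux/mii.h>`, the second value of the second row is register 9 (`MII_CTRL1000`), which holds what the PHY advertises for 1000BASE-T. The last value of that row is register 15 (`MII_ESTATUS`), which reports what the hardware can do, and it reads 2000 in every dump posted in this thread. The sketch below decodes both from a 32-register dump. The register numbers and bit values are the standard ones; `struct gigabit_view`, `mii_reg` and `decode_gigabit` are names invented for this example and do not come from mii-tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register numbers and bits as defined in <linux/mii.h>. */
#define MII_CTRL1000        0x09  /* 1000BASE-T control: what we advertise */
#define MII_STAT1000        0x0a  /* 1000BASE-T status: link partner side  */
#define MII_ESTATUS         0x0f  /* extended status: hardware abilities   */
#define ADVERTISE_1000FULL  0x0200
#define LPA_1000FULL        0x0800
#define ESTATUS_1000_TFULL  0x2000

#define MII_DUMP_REGS 32          /* mii-tool prints 4 rows of 8 registers */

/* Hypothetical summary of the gigabit-related bits in one dump. */
struct gigabit_view {
    bool hw_capable;  /* ESTATUS reports 1000BASE-T full duplex ability */
    bool advertised;  /* CTRL1000 advertises 1000BASE-T full duplex     */
    bool partner;     /* STAT1000: link partner advertises it           */
};

/* Bounds-checked access to one register of a dump. */
static uint16_t mii_reg(const uint16_t *dump, unsigned int reg)
{
    assert(dump != 0 && reg < MII_DUMP_REGS);
    return dump[reg];
}

static struct gigabit_view decode_gigabit(const uint16_t *dump)
{
    struct gigabit_view v;

    v.hw_capable = (mii_reg(dump, MII_ESTATUS) & ESTATUS_1000_TFULL) != 0;
    v.advertised = (mii_reg(dump, MII_CTRL1000) & ADVERTISE_1000FULL) != 0;
    v.partner    = (mii_reg(dump, MII_STAT1000) & LPA_1000FULL) != 0;
    return v;
}
```

If that reading of the layout is right, the 21.02.0 values (register 9 = 0000, register 15 = 2000) describe a PHY that is gigabit capable but not advertising gigabit, while the 19.07.8 values (0200 and 2000) describe one that is advertising it.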

@openwrt-bot

dreamlayers:

After editing mii-tool (make package/net-tools/compile) and then copying staging_dir/target-mips_24kc_musl/root-ath79/usr/sbin/mii-tool to the router, I can reliably establish a real but unusable gigabit link in 21.02.0. I edited the reset code at https://github.com/ecki/net-tools/blob/master/mii-tool.c#L397 to use ar8216.c initialization:

if (opt_reset) {
    printf("reinitializing the transceiver...\n");
    mdio_write(skfd, MII_ADVERTISE,
               ADVERTISE_ALL | ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM);
    mdio_write(skfd, MII_CTRL1000, ADVERTISE_1000FULL);
    mdio_write(skfd, MII_BMCR, BMCR_RESET | BMCR_ANENABLE);
}

By itself "mii-tool -R eth1" still doesn't accomplish anything, but if I do "mii-tool -F 10baseT-FD eth1" before the reset, then the reset always establishes a gigabit connection. Though it's not usable, as ethtool still claims speed is 100 and supported link modes are 100 and below. Seems that for some reason the kernel believes eth1 doesn't support gigabit. I wonder if it's because of failed phy initialization?
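To make the modified reset path easier to compare with the register dumps, here is a sketch that spells out the three values it writes, using the constants from `<linux/mii.h>`. It only fills an in-memory array and never touches hardware; `struct mdio_write_op`, `reset_ops` and `apply_reset_ops` are illustrative names, not part of mii-tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register numbers and bits as defined in <linux/mii.h>. */
#define MII_BMCR             0x00
#define MII_ADVERTISE        0x04
#define MII_CTRL1000         0x09
#define BMCR_ANENABLE        0x1000
#define BMCR_RESET           0x8000
#define ADVERTISE_ALL        0x01e0  /* 10/100, half and full duplex */
#define ADVERTISE_PAUSE_CAP  0x0400
#define ADVERTISE_PAUSE_ASYM 0x0800
#define ADVERTISE_1000FULL   0x0200

/* One MDIO register write (hypothetical type for this sketch). */
struct mdio_write_op {
    uint8_t  reg;
    uint16_t val;
};

/* The three writes from the modified reset code, in the same order. */
static const struct mdio_write_op reset_ops[] = {
    { MII_ADVERTISE,
      ADVERTISE_ALL | ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM },
    { MII_CTRL1000, ADVERTISE_1000FULL },
    { MII_BMCR,     BMCR_RESET | BMCR_ANENABLE },
};

/* Replay the writes into an in-memory register file of `nregs` entries. */
static void apply_reset_ops(uint16_t *regs, size_t nregs)
{
    assert(regs != NULL);
    for (size_t i = 0; i < sizeof(reset_ops) / sizeof(reset_ops[0]); i++) {
        assert(reset_ops[i].reg < nregs);
        regs[reset_ops[i].reg] = reset_ops[i].val;
    }
}
```

The resulting values are 0x0de0 for register 4, 0x0200 for register 9 and 0x9000 for register 0. A real PHY clears the reset bit by itself afterwards, which this sketch does not model.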

@openwrt-bot

dreamlayers:

This problem is already solved in master branch via https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=766e0f584a325b0b80a97bbc86ca515d97c63001

Quoting from the commit message:
Modifying PHY capabilities in the probe function broke with upstream
commit 92ed2eb7f4b7 ("net: phy: probe the PHY before determining the
supported features").

AR8316 switches only support 10/100 Mbit/s link modes because of this
change.

Provide a get_features method for the PHY driver, so Gigabit link mode
will be advertised to link partners again.
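The ordering problem the commit message describes can be pictured with a toy model. This is plain userspace C and not kernel code: every name in it (`toy_phy`, `toy_driver`, `bind_old_order`, `bind_new_order`) is invented, the mode bitmasks are arbitrary, and it only assumes what the commit message says, namely that the supported features are now determined after the driver's probe function has run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODE_10_100 0x0fu  /* toy bits for the four 10/100 modes */
#define MODE_1000FD 0x10u  /* toy bit for 1000baseT/Full         */

struct toy_phy {
    uint32_t supported;
};

struct toy_driver {
    uint32_t features;                 /* static default mask, 0 if unset */
    void (*probe)(struct toy_phy *);   /* may edit phy->supported         */
    uint32_t (*get_features)(void);    /* optional, NULL if absent        */
};

/* Old style: the probe function adds gigabit on top of the defaults. */
static void probe_adds_gigabit(struct toy_phy *phy)
{
    phy->supported |= MODE_1000FD;
}

/* New style: the driver reports its full feature set on request. */
static uint32_t features_with_gigabit(void)
{
    return MODE_10_100 | MODE_1000FD;
}

/* Features first, then probe: the probe's edit survives. */
static uint32_t bind_old_order(const struct toy_driver *drv)
{
    struct toy_phy phy = { drv->features };

    assert(drv != NULL);
    if (drv->probe)
        drv->probe(&phy);
    return phy.supported;
}

/* Probe first, then features: whatever probe set is overwritten. */
static uint32_t bind_new_order(const struct toy_driver *drv)
{
    struct toy_phy phy = { 0 };

    assert(drv != NULL);
    if (drv->probe)
        drv->probe(&phy);
    if (drv->features)
        phy.supported = drv->features;
    else if (drv->get_features)
        phy.supported = drv->get_features();
    return phy.supported;
}
```

In this model a driver with a 10/100 default mask and `probe_adds_gigabit` ends up with gigabit under `bind_old_order` but loses it under `bind_new_order`, while a driver that leaves `features` unset and supplies `features_with_gigabit` keeps it.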

The fix is not in 21.02.0. You can add it before building from source via:
git cherry-pick 766e0f5

It would be nice if this fix could be included in future releases of version 21.

I spent a lot of time on this needlessly, but that's okay, because I learned something. Sorry about posting so much about my investigations.

@openwrt-bot

psherman:

Boris - Thank you for all the work you've done. Incredible job! I never would have found this myself (I'm not a software engineer or anything close to it).

After compiling 21.02.0 + the cherrypicked commit, I can confirm that this has fixed the bug on my RouterStation Pro!

I hope we can get this merged into 21.02.1.

@openwrt-bot

dreamlayers:

You're welcome! I've e-mailed David Bauer, the author of the commit. They also signed off on it and they've made commits to the 21.02 branch, so they can do it. I don't know if it's appropriate to ask, but there doesn't seem to be any documented contact for this sort of thing, so I didn't know what else to do.

@openwrt-bot

psherman:

Thanks for emailing the author. I think that David goes by @blocktrron on the forums, and if so, he's been tagged on one of my threads (thanks to @hnyman). So he should get the message one way or another. If we don't hear back from him, we can create a pull request -- I'm happy to take care of that if it is necessary (I'm guessing a week or so is reasonable to see if David replies and can help, otherwise the PR will bring this to the attention of others who can help merge this cherrypick).

@openwrt-bot

dreamlayers:

OpenWrt 21.02.1 has been released now, and I confirm that this is fixed for me in openwrt-21.02.1-ath79-generic-buffalo_wzr-600dhp-squashfs-sysupgrade.bin

David made the commit: https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=e7c5e08db09a0034a1dc5f013af651339743fd40

Thank you!

@openwrt-bot

psherman:

I can also confirm that this is fixed on my device (Routerstation Pro).

Thanks Boris for the work to chase this down and find the commit, and thank you David for the fix and the cherrypick into 21.02.1!
