OpenWrt/LEDE Project

  • Status Closed
  • Percent Complete 100%
  • Task Type Bug Report
  • Category Base system
  • Assigned To No-one
  • Operating System All
  • Severity Medium
  • Priority Very Low
  • Reported Version openwrt-21.02
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Peter Sherman - 11.09.2021
Last edited by Ted Hess - 20.10.2021

FS#4032 - Uplink ethernet ports not connecting at gigabit speeds (Ath79)

- Device problem occurs on
2 devices known at the present time: Ubiquiti RouterStation Pro and Buffalo WZR-HP-AG300H

- Software versions of OpenWrt/LEDE release, packages, etc.
21.02.0 stable release (did not test RC versions).

- Steps to reproduce
Upgrade device to 21.02.0 and observe the uplink ethernet port speeds. Install ethtool and look at the supported link speeds for eth0 (RSPro) (eth1 on the WZR-HP-AG300H).

- More info
This issue presents on 21.02.0, but not on 19.07.x and it affects only eth0 on the RSPro – limiting the uplink speed to 100Mbps. Forcing the link to 1000Mbps causes the link to fail. This does not affect eth1 (3-port switch) which still works at gigabit speeds.

On the RSPro, I have tested with 19.07.8 (ath79) and 21.02.0 (ath79), both completely fresh installations from official stable images with no changes to the default configs or packages except for the installation of ethtool. I used known good cables connected to the device and did not physically change anything between the different images. Upstream is a Ubiquiti Unifi Flex Mini (gigabit) switch, and downstream (on eth1) is a 2009 Mac Pro (gigabit).


The following is the ethtool output from 21.02.0:

root@OpenWrt:~# ethtool eth0
Settings for eth0:

Supported ports: [ TP MII ]
Supported link modes:   10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes:  10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes:  10baseT/Half 10baseT/Full 
                                     100baseT/Half 100baseT/Full 
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 100Mb/s
Duplex: Full
Port: MII
PHYAD: 4
Transceiver: external
Auto-negotiation: on
Current message level: 0x000000ff (255)
		       drv probe link timer ifdown ifup rx_err tx_err
Link detected: yes

—-
And this is the ethtool output when running on 19.07.8.

root@OpenWrt:~# ethtool eth0
Settings for eth0:

Supported ports: [ TP MII ]
Supported link modes:   10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Full 
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes:  10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Full 
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes:  10baseT/Half 10baseT/Full 
                                     100baseT/Half 100baseT/Full 
                                     1000baseT/Full 
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Port: MII
PHYAD: 4
Transceiver: internal
Auto-negotiation: on
Current message level: 0x000000ff (255)
		       drv probe link timer ifdown ifup rx_err tx_err
Link detected: yes

—-
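
The difference that matters between the two outputs above is whether "1000baseT/Full" appears under "Supported link modes". The following is a minimal sketch of checking that from a script. The function names "link_modes" and "supports_gigabit" are made up for illustration, and the parsing assumes the ethtool text layout shown above.

```shell
# Print the link modes listed under one ethtool field, one per line.
# $1 is the field label (for example "Supported link modes");
# ethtool output is read from stdin.
link_modes() {
    awk -v label="$1" '
    {
        line = $0
        sub(/^[ \t]+/, "", line)              # ignore indentation
        if (index(line, label ":") == 1) {    # the requested field starts here
            collecting = 1
            line = substr(line, length(label) + 2)
        } else if (line ~ /:/) {              # any other "Name: value" line ends it
            collecting = 0
        }
        if (collecting) {
            n = split(line, modes)            # continuation lines hold more modes
            for (i = 1; i <= n; i++) print modes[i]
        }
    }'
}

# Succeed if the interface described on stdin supports any 1000baseT mode.
supports_gigabit() {
    link_modes "Supported link modes" | grep -q '^1000baseT/'
}

# Usage on the router (requires the ethtool package):
#   ethtool eth0 | supports_gigabit && echo "gigabit supported"
```

Run against the two outputs above, this would succeed for 19.07.8 and fail for 21.02.0.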

Details can be found in my original post about the RouterStation Pro and in this one for the Buffalo device.

Closed by  Ted Hess
20.10.2021 11:20
Reason for closing:  Fixed

Peter Sherman commented on 12.09.2021 23:26

I have a theory, but this is unconfirmed – ath79 based devices that have 2 or more distinct interfaces - where one of them is typically used as an uplink - may be affected by this bug. Devices that have one logical interface (i.e. dedicate a port on an internal switch as the WAN/uplink by means of VLANs) may not be affected.

Boris Gjenero commented on 08.10.2021 15:52

I also have this issue on my WZR-600DHP. The WAN port is connected via a cat 6 cable to an Arris TM822 cable modem. When I reboot the WZR-600DHP, while the red diagnostic light is lit or blinking, the modem status light lights up green, indicating a gigabit connection. However, when WZR-600DHP status lights indicate normal operation, the connection is always 100 Mbit. This seems to indicate hardware on both ends and the cable are capable of gigabit, but OpenWrt is causing a 100 Mbit connection.

If I try to force gigabit on eth1 with ethtool, I either get "Speed: 100Mb/s" or "Speed: Unknown!" and "Duplex: Unknown! (255)". This can cause the link to stop functioning until I set it back to 100.

In https://github.com/openwrt/openwrt/blob/ec6293febc244d187e71a6e54f44920be679cde4/target/linux/ath79/dts/ar7100.dtsi#L188 I don't see any differences between eth0 and eth1 which might cause this. In https://github.com/openwrt/openwrt/blob/ec6293febc244d187e71a6e54f44920be679cde4/target/linux/ath79/dts/ar7161_buffalo_wzr-hp-ag300h.dtsi#L248 I see differences. Eth0, which works at gigabit, has:

fixed-link {
	speed = <1000>;
	full-duplex;
};

While eth1 instead has:

phy-handle = <&phy4>;

Eth1 may have worked at gigabit speed before the ar71xx to ath79 transition, but I'm not certain of that.

Peter Sherman commented on 08.10.2021 17:12

Boris - can you try running 19.07.8 to see if your ethernet port runs properly at gigabit speeds?

It appears that your device was not supported on the ath79 platform until 21.02, but the ar71xx image is here. Do not retain settings (make a backup so you can restore quickly later). Test with a completely default installation to the greatest extent possible. Install ethtool and get info on the ethernet ports (similar to what I had done earlier). Then do the same thing with 21.02.

This won't necessarily answer what commit(s) broke the gigabit connectivity, but it will at least point to something in the ath79 target (as compared to something specific to your device or physical setup). My device was ath79 for both 19.07 and 21.02, and something clearly broke gigabit connectivity, but only on the uplink port.

Boris Gjenero commented on 08.10.2021 17:34

Thank you for the quick reply. I will do that testing tonight, when internet access going down for a bit won't be a problem. Right now I was learning about how the various components work together, and I think this problem has to do with the Ethernet phy, not the Ethernet interface. The mii-tool program, available via opkg, provides more information about the phy:
# mii-tool -v -v -v -v eth1
Using SIOCGMIIPHY=0x8947
eth1: negotiated 100baseTx-FD, link ok

registers for MII PHY 4:
  3100 796d 004d d041 0101 c1e1 000d 2801
  0000 0000 1000 0000 0000 0000 0000 2000
  0862 7c10 0000 1002 002c 0000 0000 0000
  3200 0000 0000 0000 0000 0000 824e 0000
product info: vendor 00:13:74, model 4 rev 1
basic mode:   autonegotiation enabled
basic status: autonegotiation complete, link ok
capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
advertising:  100baseTx-FD
link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD

The eth0 port which goes to the switch chip does not have a phy.

Peter Sherman commented on 08.10.2021 21:01

This is what I get with the mii-tool output on my RouterStation Pro for eth0 (my uplink) and eth1 (LAN, not in use right now)

root@OpenWrt:~# mii-tool -v -v -v -v eth0
Using SIOCGMIIPHY=0x8947
eth0: negotiated 100baseTx-FD, link ok

registers for MII PHY 4: 
  1000 796d 004d d041 01e1 c1e1 000d 2801
  0000 0000 1000 0000 0000 0000 0000 2000
  0862 7c52 0000 1002 002c 0000 0000 0000
  3200 0000 0000 0000 0000 0000 824e 0000
product info: vendor 00:13:74, model 4 rev 1
basic mode:   autonegotiation enabled
basic status: autonegotiation complete, link ok
capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD

root@OpenWrt:~# mii-tool -v -v -v -v eth1
Using SIOCGMIIPHY=0x8947
eth1: no link

registers for MII PHY 0: 
  3100 7949 004d d041 0de1 0000 0004 2801
  0000 0200 0000 0000 0000 0000 0000 2000
  0862 0010 0000 0000 002c 0000 0000 0000
  3200 0000 0000 0000 0000 0000 02ee 0000
product info: vendor 00:13:74, model 4 rev 1
basic mode:   autonegotiation enabled
basic status: no link
capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
advertising:  1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

Boris Gjenero commented on 09.10.2021 04:14

Yeah, I definitely have gigabit on the WAN port after installing openwrt-19.07.8-ar71xx-generic-wzr-600dhp-squashfs-sysupgrade.bin on my WZR-600DHP. Both ethtool and mii-tool say so, and the LED on the Arris TM822 cable modem is green instead of yellow.

  root@OpenWrt:~# ethtool eth1 | sed 's/^/    /'
  Settings for eth1:
      Supported ports: [ TP MII ]
      Supported link modes:   10baseT/Half 10baseT/Full 
                              100baseT/Half 100baseT/Full 
                              1000baseT/Full 
      Supported pause frame use: No
      Supports auto-negotiation: Yes
      Supported FEC modes: Not reported
      Advertised link modes:  10baseT/Half 10baseT/Full 
                              100baseT/Half 100baseT/Full 
                              1000baseT/Full 
      Advertised pause frame use: No
      Advertised auto-negotiation: Yes
      Advertised FEC modes: Not reported
      Link partner advertised link modes:  10baseT/Half 10baseT/Full 
                                           100baseT/Half 100baseT/Full 
                                           1000baseT/Full 
      Link partner advertised pause frame use: No
      Link partner advertised auto-negotiation: Yes
      Link partner advertised FEC modes: Not reported
      Speed: 1000Mb/s
      Duplex: Full
      Port: MII
      PHYAD: 4
      Transceiver: internal
      Auto-negotiation: on
      Current message level: 0x000000ff (255)
                             drv probe link timer ifdown ifup rx_err tx_err
      Link detected: yes

Besides the speed differences, the "Transceiver" changed from "external" to "internal".

  root@OpenWrt:~# mii-tool -vvvv eth1 | sed 's/^/    /'
  Using SIOCGMIIPHY=0x8947
  eth1: negotiated 1000baseT-FD flow-control, link ok
    registers for MII PHY 4: 
      1000 796d 004d d041 01e1 c1e1 000d 2001
      0000 0200 3800 0000 0000 0000 0000 2000
      0862 bc50 0000 1000 002c 0000 0000 0000
      3200 0000 0000 0000 0000 0000 824e 0000
    product info: vendor 00:13:74, model 4 rev 1
    basic mode:   autonegotiation enabled
    basic status: autonegotiation complete, link ok
    capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
    advertising:  1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
    link partner: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD

dmesg also has lines confirming gigabit for both Ethernet interfaces:

  [   63.438575] eth0: link up (1000Mbps/Full duplex)
  [   67.947302] eth1: link up (1000Mbps/Full duplex)

I am able to use "ethtool -s eth1 speed 100 duplex full" and "ethtool -s eth1 speed 1000 duplex full" to change the speed of eth1. The modem LED confirms the speed changes.

The dmesg changes might be relevant. This is from 19.07.8. It was interleaved with some other stuff. Weird how it says using port 4 as switch port and as PHY. But I guess they're two different devices, eth0 being ag71xx-mdio.0:00, eth1 being ag71xx-mdio.0:04.

  libphy: Fixed MDIO Bus: probed
  switch0: Atheros AR8316 rev. 1 switch registered on ag71xx-mdio.0
  ar8316: Using port 4 as switch port
  ag71xx ag71xx.0: connected to PHY at ag71xx-mdio.0:00 [uid=004dd041, driver=Atheros AR8216/AR8236/AR8316]
  eth0: Atheros AG71xx at 0xb9000000, irq 4, mode: rgmii
  ar8316: Using port 4 as PHY
  ag71xx ag71xx.1: connected to PHY at ag71xx-mdio.0:04 [uid=004dd041, driver=Atheros AR8216/AR8236/AR8316]
  eth1: Atheros AG71xx at 0xba000000, irq 5, mode: rgmii

This was from 21.02.0. Note how eth0 is changed to fixed, like the device tree says, with the Generic PHY driver. It doesn't seem like anything changed regarding eth1.

  libphy: Fixed MDIO Bus: probed
  libphy: ag71xx_mdio: probed
  switch0: Atheros AR8316 rev. 1 switch registered on mdio.0
  ag71xx 19000000.eth: connected to PHY at fixed-0:00 [uid=00000000, driver=Generic PHY]
  random: fast init done
  eth0: Atheros AG71xx at 0xb9000000, irq 4, mode: rgmii
  ar8316: Using port 4 as PHY
  ag71xx 1a000000.eth: connected to PHY at mdio.0:04 [uid=004dd041, driver=Atheros AR8216/AR8236/AR8316]
  eth1: Atheros AG71xx at 0xba000000, irq 5, mode: rgmii
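
To compare the two logs without reading them line by line, the "connected to PHY" messages can be reduced to one summary line per Ethernet device. This is a sketch only: the function name "phy_bindings" is made up, and the pattern assumes the ag71xx message format quoted above.

```shell
# Summarise ag71xx "connected to PHY" lines read from stdin as
#   dev=<device> phy=<bus address> uid=<PHY id> driver=<driver name>
phy_bindings() {
    sed -n 's/.*ag71xx \([^ ]*\): connected to PHY at \([^ ]*\) \[uid=\([0-9a-f]*\), driver=\(.*\)].*/dev=\1 phy=\2 uid=\3 driver=\4/p'
}

# Usage on the router:
#   dmesg | phy_bindings
```

For the 21.02.0 log above this reports eth0's device on "fixed-0:00" with the Generic PHY driver and eth1's device on "mdio.0:04" with the Atheros driver, which matches the observation that only the eth0 binding changed.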

It's weird how in 19.07.8 mii-tool claims eth0 is 100 Mbit. But ethtool and dmesg claim 1000, and I believe them. What is this anyways? I thought eth0 is connected directly to the switch chip, with no phy there.

  root@OpenWrt:~# mii-tool -vvvv eth0 | sed 's/^/    /'
  Using SIOCGMIIPHY=0x8947
  eth0: negotiated 100baseTx-FD flow-control, link ok
    registers for MII PHY 0: 
      1000 796d 004d d041 0de1 4d01 0005 2001
      0000 0200 1000 0000 0000 0000 0000 2000
      0862 7c1e 0000 1002 002c 0000 0000 0000
      3200 0000 0000 0000 0000 0000 02ee 0000
    product info: vendor 00:13:74, model 4 rev 1
    basic mode:   autonegotiation enabled
    basic status: autonegotiation complete, link ok
    capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
    advertising:  1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
    link partner: 100baseTx-FD flow-control

They don't seem swapped. Using mii-tool on eth0 affects LAN and eth1 affects WAN.

Peter Sherman commented on 09.10.2021 05:05

Boris - Thanks for all the new info

All three affected routers known thus far use the AR7161 SoC. It would be nice to know if this affects any other SoCs in the family, or if this is unique to this specific chip.

We can also infer from the available data that this issue is caused by some commit that is present in 21.02 and not in 19.07 (the two other devices are available as ath79 targets for both of those versions), so we can reasonably assume that this specific bug is not directly related to the transition from ar71xx to ath79.

I have searched the git repos for changes that could affect this (searching commits for references to ar7161, ar7100, router station pro, WZR-HP-AG300H) – I was unable to find anything obvious, but I'm not a developer and I don't really know how to identify the root-cause of this type of bug. I am hoping that one of the devs will see the bug and have some ideas.

Boris Gjenero commented on 09.10.2021 14:40

The transition from ar71xx to ath79 involved a change from using C code to configure the system to using device trees. The next thing I'd like to see is whether the last ar71xx build works and the first ath79 fails. Maybe the device tree never properly set up the device like it was set up via the old method?

You can see the old style configuration at https://github.com/openwrt/openwrt/blob/a5e404d1923d135d335e4ece83f87e6e891396e2/target/linux/ar71xx/files/arch/mips/ath79/mach-wzr-hp-ag300h.c#L157

ath79_eth0_data.phy_if_mode = PHY_INTERFACE_MODE_RGMII;
ath79_eth0_data.speed = SPEED_1000;
ath79_eth0_data.duplex = DUPLEX_FULL;
ath79_eth0_data.phy_mask = BIT(0);
      
ath79_eth1_data.phy_if_mode = PHY_INTERFACE_MODE_RGMII;
ath79_eth1_data.phy_mask = BIT(4);
      
ath79_register_eth(0);
ath79_register_eth(1);

The phy_mask there seems to have been forgotten, and some other devices have them in the mdio0 section of the device tree. So, I tried adding "phy-mask = <0x10>;" below "status = "okay";" in target/linux/ath79/dts/ar7161_buffalo_wzr-hp-ag300h.dtsi of openwrt-21.02. It did not help.
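
For reference, BIT(4) in the old board file and the 0x10 tried in the device tree are the same value: a mask with only bit 4 set, one bit per PHY address. A small sketch of the arithmetic follows. The helper names "phy_mask" and "mask_addrs" are made up, and reading the mask as "one bit per MDIO address" is an assumption based on the old board file.

```shell
# Print the mask that has only bit $1 set, like the C macro BIT(n).
phy_mask() {
    printf '0x%x\n' $(( 1 << $1 ))
}

# Print every PHY address (0-31) whose bit is set in mask $1.
mask_addrs() {
    addr=0
    while [ "$addr" -lt 32 ]; do
        if [ $(( ($1 >> addr) & 1 )) -eq 1 ]; then
            printf '%s\n' "$addr"
        fi
        addr=$(( addr + 1 ))
    done
}

# Usage:
#   phy_mask 4        # mask for the PHY at address 4
#   mask_addrs 0x10   # addresses selected by mask 0x10
```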

Boris Gjenero commented on 09.10.2021 16:00

It's weird that "Atheros AR8216/AR8236/AR8316 mdio.0:04" is printing messages about LAN switch ports going up and down, like this:

  [ 2598.530695] Atheros AR8216/AR8236/AR8316 mdio.0:04: Port 1 is down
  [ 2602.626507] Atheros AR8216/AR8236/AR8316 mdio.0:04: Port 1 is up
  [ 2681.473414] Atheros AR8216/AR8236/AR8316 mdio.0:04: Port 2 is down
  [ 2687.617192] Atheros AR8216/AR8236/AR8316 mdio.0:04: Port 2 is up

That is supposed to be the phy for eth1, the WAN interface, but those are LAN events, which I can create manually. Why are LAN things being printed? This made me think that maybe the phy were swapped in the old configuration and mdio.0:00 might make gigabit work. So I tried the attached patch, and got a gigabit connection according to both dmesg and the router. However, no data was received on eth1, and so there was no internet access. With the patch, WAN port disconnection and reconnection was reported as "Port 5". With unpatched 21.02.0, there is no such reporting, and instead I see "eth1: link down" and "eth1: link up (100Mbps/Full duplex)".

This patch is attached for exact record of what I did, not as a recommended solution.

Boris Gjenero commented on 10.10.2021 03:20

The phy numbering is correct. You can look at switch ports via "swconfig dev switch0 show", and different phy via "mii-tool -vvv -p # eth1", replacing # with 0 through 4. Switch port 0 connects to the CPU, without a phy. Switch ports 1 to 4 are LAN ports on the back, and correspond to phy 0 to 3. The wan port corresponds to phy 4.
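
The per-PHY check described above can be put in a loop. This is a sketch that assumes the net-tools mii-tool used in this thread (with its "-p" option) and the "capabilities:" line it prints. The function name "phy_capabilities" and the MII_TOOL override variable are made up for illustration.

```shell
# Command used to query the PHY; override MII_TOOL to test without hardware.
MII_TOOL=${MII_TOOL:-mii-tool}

# For PHY addresses 0 to 4 reached through interface $1,
# print the "capabilities:" line reported by mii-tool.
phy_capabilities() {
    for phy in 0 1 2 3 4; do
        caps=$("$MII_TOOL" -vvv -p "$phy" "$1" |
               sed -n 's/^[[:space:]]*capabilities:[[:space:]]*//p')
        printf 'phy %s: %s\n' "$phy" "$caps"
    done
}

# Usage on the router:
#   phy_capabilities eth1
```

On an affected 21.02.0 system, the line for phy 4 would be expected to lack 1000baseT-FD while phy 0 to 3 still list it.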

The "ath79_eth1_data.phy_mask = BIT(4);" in the old config corresponds to the eth1 phy being at address 4 in the new device tree. That's fine, it selects the correct phy.

The problem can be seen in "registers for MII PHY 4:" output of "mii-tool -p 4 -vvv eth1", the second number in the second row. It's 0200 for phy 0 through 3 and phy 4 in 19.07.8, but 0000 for phy 4 in 21.02.0. This seems to be a real phy hardware register claiming the hardware doesn't support gigabit. I do not know why. The other phy will show gigabit support even when connected at slower speeds to devices which don't support gigabit on the other end.
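
The register in question can be decoded with a few lines of shell. mii-tool prints eight registers per row, so the second number in the second row is register 9. In the Linux MII definitions (linux/mii.h) register 9 is the 1000BASE-T control register, where bit 0x0200 advertises 1000BASE-T full duplex. Register 10 bit 0x0800 reports the link partner's 1000BASE-T full duplex ability, and register 15 bit 0x2000 reports the PHY's own ability. Read that way, 0000 in register 9 would mean gigabit is not being advertised rather than not supported, which is an interpretation and not something stated in this report. The function names below are made up for illustration.

```shell
# Print register number $1 (0-31) from "mii-tool -vvv" output on stdin.
mii_reg() {
    awk -v want="$1" '
    /registers for MII PHY/ { row = 0; in_dump = 1; next }
    in_dump && NF == 8 && $1 ~ /^[0-9a-f]+$/ {
        if (int(want / 8) == row) print $(want % 8 + 1)   # 8 registers per row
        row++
        if (row == 4) in_dump = 0                         # dump is 4 rows long
    }'
}

# Print yes/no depending on whether hex value $1 has any bit of mask $2 set.
bit_set() {
    if [ $(( 0x$1 & $2 )) -ne 0 ]; then echo yes; else echo no; fi
}

# Summarise the gigabit-related bits of one PHY register dump on stdin.
gigabit_summary() {
    dump=$(cat)
    ctrl1000=$(printf '%s\n' "$dump" | mii_reg 9)    # what this PHY advertises
    stat1000=$(printf '%s\n' "$dump" | mii_reg 10)   # what the partner reported
    estatus=$(printf '%s\n' "$dump" | mii_reg 15)    # what this PHY can do
    if [ -z "$ctrl1000" ]; then
        echo "no register dump found" >&2
        return 1
    fi
    echo "phy_capable=$(bit_set "$estatus" 0x2000)" \
         "advertised=$(bit_set "$ctrl1000" 0x0200)" \
         "partner=$(bit_set "$stat1000" 0x0800)"
}

# Usage on the router:
#   mii-tool -vvv -p 4 eth1 | gigabit_summary
```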

Boris Gjenero commented on 10.10.2021 05:02

After editing mii-tool (make package/net-tools/compile) and then copying staging_dir/target-mips_24kc_musl/root-ath79/usr/sbin/mii-tool to the router, I can reliably establish a real but unusable gigabit link in 21.02.0. I edited the reset code at https://github.com/ecki/net-tools/blob/master/mii-tool.c#L397 to use ar8216.c initialization:

  if (opt_reset) {
      printf("reinitializing the transceiver...\n");
      mdio_write(skfd, MII_ADVERTISE,
                 ADVERTISE_ALL | ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM);
      mdio_write(skfd, MII_CTRL1000, ADVERTISE_1000FULL);
      mdio_write(skfd, MII_BMCR, BMCR_RESET | BMCR_ANENABLE);
  }

By itself "mii-tool -R eth1" still doesn't accomplish anything, but if I do "mii-tool -F 10baseT-FD eth1" before the reset, then the reset always establishes a gigabit connection. Though it's not usable, as ethtool still claims speed is 100 and supported link modes are 100 and below. Seems that for some reason the kernel believes eth1 doesn't support gigabit. I wonder if it's because of failed phy initialization?

Boris Gjenero commented on 10.10.2021 19:01

This problem is already solved in master branch via https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=766e0f584a325b0b80a97bbc86ca515d97c63001

Quoting from the commit message:

  Modifying PHY capabilities in the probe function broke with upstream
  commit 92ed2eb7f4b7 ("net: phy: probe the PHY before determining the
  supported features").
  
  AR8316 switches only support 10/100 Mbit/s link modes because of this
  change.
  
  Provide a get_features method for the PHY driver, so Gigabit link mode
  will be advertised to link partners again.

The fix is not in 21.02.0. You can add it before building from source via:

  git cherry-pick 766e0f584a325b0b80a97bbc86ca515d97c63001
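
For anyone starting from scratch, the steps around that cherry-pick might look like the sketch below. The clone URL and the release tag name "v21.02.0" are assumptions not taken from this report, and the function name "fetch_with_fix" and the GIT override variable are made up. git needs a configured user.name and user.email to create the cherry-picked commit.

```shell
# Command used for git; override GIT to dry-run the steps.
GIT=${GIT:-git}
FIX_COMMIT=766e0f584a325b0b80a97bbc86ca515d97c63001

# Clone OpenWrt into directory $1, check out the 21.02.0 release
# and apply the fix from master on top of it.
fetch_with_fix() {
    "$GIT" clone https://git.openwrt.org/openwrt/openwrt.git "$1" &&
    "$GIT" -C "$1" checkout v21.02.0 &&
    "$GIT" -C "$1" cherry-pick -x "$FIX_COMMIT"
}

# Usage:
#   fetch_with_fix openwrt-21.02-fix
# then configure and build the image as usual from that directory.
```

The "-x" option records the original commit hash in the new commit message, which makes the backport easier to trace later.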

It would be nice if this fix could be included in future releases of version 21.

I spent a lot of time on this needlessly, but that's okay, because I learned something. Sorry about posting so much about my investigations.

Peter Sherman commented on 12.10.2021 03:18

Boris - Thank you for all the work you've done. Incredible job! I never would have found this myself (I'm not a software engineer or anything close to it).

After compiling 21.02.0 + the cherrypicked commit, I can confirm that this has fixed the bug on my RouterStation Pro!

I hope we can get this merged into 21.02.1.

Boris Gjenero commented on 12.10.2021 04:03

You're welcome! I've e-mailed David Bauer, the author of the commit. They also signed off on it and they've made commits to the 21.02 branch, so they can do it. I don't know if it's appropriate to ask, but there doesn't seem to be any documented contact for this sort of thing, so I didn't know what else to do.

Peter Sherman commented on 12.10.2021 06:34

Thanks for emailing the author. I think that David goes by @blocktrron on the forums, and if so, he's been tagged on one of my threads (thanks to @hnyman), so he should get the message one way or another. If we don't hear back from him, we can create a pull request – I'm happy to take care of that if it is necessary (I'm guessing a week or so is reasonable to see if David replies and can help; otherwise the PR will bring this to the attention of others who can help merge this cherry-pick).

Boris Gjenero commented on 28.10.2021 09:08

OpenWrt 21.02.1 has been released now, and I confirm that this is fixed for me in openwrt-21.02.1-ath79-generic-buffalo_wzr-600dhp-squashfs-sysupgrade.bin

David made the commit: https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=e7c5e08db09a0034a1dc5f013af651339743fd40

Thank you!

Peter Sherman commented on 30.10.2021 02:48

I can also confirm that this is fixed on my device (Routerstation Pro).

Thanks Boris for the work to chase this down and find the commit, and thank you David for the fix and the cherrypick into 21.02.1!
