FS#2236 - MikroTik wAP AC ethernet not going up on reboot / 803x_aneg_done: SGMII link is not ok #7229

Closed
openwrt-bot opened this issue Apr 13, 2019 · 12 comments

@openwrt-bot

champtar:

Hello OpenWrt Team,

I'm trying out MikroTik wAP AC (MikroTik RouterBOARD wAP G-5HacT2HnD / Qualcomm Atheros QCA9556 ver 1 rev 0) with both:

  • 18.06 (OpenWrt 18.06-SNAPSHOT r7737-6ac061f319 / LuCI openwrt-18.06 branch (git-19.079.57770-b99e77d) / Linux 4.9.168)
  • trunk (OpenWrt SNAPSHOT r9860-9385ff654e / LuCI Master (f138fc93) / Linux 4.14.111)

On reboot, the Ethernet port:

  • comes up fine with 18.06 (10 successful reboots in a row)
  • most of the time doesn't come up with trunk; connecting via Wi-Fi, I see a lot of "803x_aneg_done: SGMII link is not ok" messages

Unplugging and replugging the cable to the switch/computer doesn't help (the device is powered via PoE).

ip l show eth0

2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel master br-lan state DOWN mode DEFAULT group default qlen 1000
link/ether cc:2d:e0:0f:d5:7e brd ff:ff:ff:ff:ff:ff

The "ETH" LED is green on the MikroTik and my laptop considers the link up.
I've also tried with a switch, but it doesn't seem to help.

Doing a cold boot works most of the time.

rbcfg show

boot_delay=2
boot_device=nandeth
boot_key=any
boot_protocol=dhcp
booter=backup
cpu_mode=powersave
uart_speed=115200
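
The carrier check from the `ip l show eth0` output above can be scripted, which is handy when the device is only reachable over Wi-Fi. This is a minimal sketch, not part of the original report: the function names `has_no_carrier` and `report_carrier` are hypothetical, and it assumes the `ip` tool is available on the device. Note that `ip` prints the NO-CARRIER flag only for interfaces that are administratively up.

```shell
#!/bin/sh

# Succeeds (exit 0) if the `ip link show` output on stdin carries the
# NO-CARRIER flag inside the <...> flag list.
has_no_carrier() {
    grep -q '<[^>]*NO-CARRIER[^>]*>'
}

# Hypothetical helper: print the link state of interface $1.
# Returns 0 if a carrier is present, 1 if not, 2 if the interface is missing.
report_carrier() {
    out=$(ip link show "$1" 2>/dev/null) || { echo "$1: not found"; return 2; }
    if printf '%s\n' "$out" | has_no_carrier; then
        echo "$1: no carrier"
        return 1
    fi
    echo "$1: carrier ok"
}
```

On the device, `report_carrier eth0` would then print a single status line for the port in question.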

@openwrt-bot

champtar:

I just had 1 failure with 18.06, see attached logs

@openwrt-bot

champtar:

# ethtool --reset eth0 all
ETHTOOL_RESET 0xffffffff
Cannot issue ETHTOOL_RESET: Not supported

@openwrt-bot

xback:

Is this still true on the latest master?
I flashed a bunch of these 2 days ago and they seemed to work properly at first glance.

Thanks

@openwrt-bot

champtar:

Hi Koen,

Sorry, I haven't had time to test again since your message.
I hope to try shortly.

Regards
Etienne

@openwrt-bot

champtar:

I got the error after 10 reboots with this build:
[ 0.000000] Linux version 4.14.123 (...) (gcc version 7.4.0 (OpenWrt GCC 7.4.0 r10073-0293aa72d1)) #0 Thu Jun 6 14:41:07 2019

Problem is that I only have 1 MikroTik wAP AC (the idea was to validate it works before buying ~10), so I don't know if it's the hardware, some config (rbcfg ?) or the kernel/driver

My test setup is wAP AC directly connected to my laptop, but in the past I reproduced the error with a switch also.

@Koen, could I ask you to reboot many (10+) times to see if you are able to hit the bug?
If you don't have the issue, could you share a working build + "rbcfg show" output?

Thanks
Etienne

@openwrt-bot

ynezz:

I would try the following patch series: https://patchwork.ozlabs.org/project/openwrt/list/?series=102237

@openwrt-bot

champtar:

Hi Petr,

There is a good chance these patches could fix my issue, but they are for ath79, and the MikroTik wAP AC is still ar71xx :(

@openwrt-bot

champtar:

Attached a dirty patch that helps a lot.
With it I have far fewer connectivity problems (maybe 1 in 10 reboots), and disconnecting/reconnecting the Ethernet cable is enough to recover.

https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=4e39e213af7e3e0cd747403e8c227e145cfef988 assumes that platform_data is "struct at803x_platform_data", but for the wAP AC it's "struct mdio_gpio_platform_data" (see
https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/ar71xx/files/arch/mips/ath79/mach-rbspi.c;h=96511a40808f9c40b35393f5406c2437f68c3dd8;hb=HEAD#l501), so I'm not sure how to detect which type it is.

@openwrt-bot

xback:

I'm currently seeing something similar on my rb922 boards
(qca95xx with an RGMII link).

Once or twice per day, Ethernet suddenly stops working.
Running dmesg via UART doesn't show any error at all.

Running tcpdump shows only ARP traffic.
Pinging the board only accelerates the ARP traffic.

Disabling and re-enabling the interface doesn't bring it back.
A cold boot solves it.

wondering ..

@openwrt-bot

champtar:

Attached a v2 of the patch, moving override_sgmii_aneg into the device struct. I'm not sure it's the right place for it, but it works (tm).

I've been rebooting in a loop for 1h with no issue so far (50+ reboots):
while true; do ssh root@192.168.1.1 -o ConnectTimeout=5s -- "dmesg | grep '803x_aneg_done' ; reboot" 2>/dev/null && date; done
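
The one-liner above can also be split into small functions so that each boot leaves a timestamped error count. This is a minimal sketch under stated assumptions: `count_sgmii_errors` and `check_and_reboot` are hypothetical names, and it assumes key-based root SSH access to the device, as in the loop above.

```shell
#!/bin/sh

# Count "SGMII link is not ok" messages in a kernel log read from stdin.
# grep -c exits non-zero when the count is 0, so the exit status is masked.
count_sgmii_errors() {
    grep -c '803x_aneg_done: SGMII link is not ok' || true
}

# Hypothetical helper: log the error count for host $1, then reboot it.
# Returns 1 if the host is unreachable (for example, Ethernet did not come up).
check_and_reboot() {
    host=$1
    log=$(ssh -o ConnectTimeout=5 "root@$host" dmesg 2>/dev/null) || return 1
    printf '%s errors=%s\n' "$(date)" "$(printf '%s\n' "$log" | count_sgmii_errors)"
    # The connection drops during reboot, so the ssh exit status is ignored.
    ssh -o ConnectTimeout=5 "root@$host" reboot 2>/dev/null
    return 0
}
```

A loop such as `while true; do check_and_reboot 192.168.1.1 || sleep 5; done` would then print one line per successful boot; a long gap between lines suggests a boot where the port stayed down.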

@openwrt-bot

champtar:

Sent as an RFC to the devel mailing list:
https://patchwork.ozlabs.org/patch/1128549/

@openwrt-bot

champtar:

New patch
https://patchwork.ozlabs.org/patch/1131676/
