FS#2563 - Extremely high latency on 5GHz radio ath10k-ct (tplink archer C7 v2) #7570

Closed
openwrt-bot opened this issue Oct 23, 2019 · 26 comments

@openwrt-bot

muddyfeet:

Supply the following if possible:

  • Device problem occurs on
    Archer C7 v2 configured as a simple WPA2-PSK access point only (no WAN routing, wpad-basic, ath10k-ct, and clean radio channels)

  • Software versions of OpenWrt/LEDE release, packages, etc.
    Trunk (ath79)

  • Steps to reproduce
    Latency on builds prior to 11 Sep on the 5 GHz radio (ath10k-ct) is around 200-400 ms (20 ms on the 2.4 GHz ath9k radio), which is still unacceptably high. Despite this high latency I can achieve throughput of up to 50 Mbps.
    Post 11 Sep, latency increases to over 2 sec, making the 5 GHz radio close to unusable (the 2.4 GHz radio is unchanged at around 20 ms). Throughput drops to kbps.

Have compiled and tested builds e667d6f, 9f34bf5, 7bed9bf, and 6d819fa. The major shift in latency occurs with 6d819fa (which makes absolutely no sense as it is a change for the gemini kernel build). The two middle builds have changes to hostapd but do not appear to cause the 2 sec latency. The issue still remains on 7a577e9 (22 Oct), which I have tried with different configurations of hostapd (full-wolfssl and mini).

I have tried different encryption, no encryption, and the non-CT driver and firmware to no benefit.

My testing is with a Microsoft Surface Pro 6, which may be half the problem, as I'm not seeing the 2 sec latency on the other devices I own (Apple iPad and iPhones). BTW, I am aware of the current 5 GHz radio issues with the recent Surface driver (I am using an older driver without these issues).

I have reverted to the 18.06 ar71xx branch to fix the problem for now.

@openwrt-bot

muddyfeet:

Latency on the 18.06 branch is around 100ms for 5GHz, and still around 20ms for the 2.4GHz radio.

@openwrt-bot

muddyfeet:

I should add that I'm testing the latency using http://www.dslreports.com/speedtest, so will include the latency of my firewall, modem and the internet path. It is the bufferbloat latency reported on that website, not idle ping latency.
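To make the distinction between idle ping latency and latency under load concrete, here is a minimal sketch. It is not how dslreports computes its figure; the function name and the sample values are made up for illustration only.

```python
from statistics import median


def bufferbloat_ms(idle_rtts_ms, loaded_rtts_ms):
    """Return how much the median RTT rises under load, in milliseconds."""
    if not idle_rtts_ms or not loaded_rtts_ms:
        raise ValueError("need at least one sample in each list")
    return median(loaded_rtts_ms) - median(idle_rtts_ms)


# Hypothetical samples in milliseconds (not measurements from this report).
idle = [14.0, 15.0, 16.0]
loaded = [180.0, 220.0, 260.0]
print(bufferbloat_ms(idle, loaded))
```

A link can show a low idle ping and still score badly here, which is why the numbers in this report are much larger than a plain ping would suggest.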

@openwrt-bot

muddyfeet:

Just for reference, the latency on a wired connection through to the above site is 15ms with throughput of around 300Mbps.

@openwrt-bot

muddyfeet:

May be the same issue as

FS#2246 - Latency is over 360ms for game PUBG for ath10k-ct firmware

@openwrt-bot

muddyfeet:

I've also just tried building 7a577e9 (22 Oct) as an ar71xx image and it exhibits the same behaviour so it is not the ath79 configuration.

Here's the 2.4 GHz radio result for the ar71xx build connected to the MS Surface Pro client which is fine.

[[http://www.dslreports.com/speedtest/55686780]]

and the 5 GHz radio result (really bad)

[[http://www.dslreports.com/speedtest/55686701]]

and the results of the 5 GHz radio with an iPad (no issues with this client)

[[http://www.dslreports.com/speedtest/55687215]]

Changing the access point back to the 18.06 build with the 5 GHz radio and the MS Surface Pro client it is all good.

[[http://www.dslreports.com/speedtest/55687982]]

@openwrt-bot

psyborg:

please report bug here: https://github.com/greearb/ath10k-ct/issues
it has higher chances of being fixed sooner

@openwrt-bot

muddyfeet:

Thanks psyborg. As I mentioned in the original report the problem exists in the non-CT firmware/driver as well which would point to something else.

My best guess is that something is not quite right in the Marvell driver in the MS Surface, and that it interacts badly with later versions of hostapd or mac80211 in a way the Apple iPad doesn't. The problem really is Microsoft's, but given that the 6d819fa build (gemini: image: fix race condition when building copy-kernel.bin) triggers the problem (which, as I said, makes no sense), I suspect there is something in the OpenWrt system that isn't quite right either.

@openwrt-bot

ldir:

I'm seeing erratic and high latency issues with ath10k-* and mt76 when using 5GHz and Apple devices. I've actually reverted to factory firmware on the Archer C7 v2 boxes that I'm using as access points to solve the problem.

Usually people say 'your fault for using Apple'

@openwrt-bot

psyborg:

I'm not convinced about this. Did you clean the build dir before building again?
The only latency problem I had was fixed with https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=97c37f8dd067fd4750a64bcaa639e6d08462060d

@openwrt-bot

muddyfeet:

Hi psyborg. Yes, I cleaned before each build. In fact I did a distclean before building the ar71xx build, having previously been building ath79 builds. Do you have a Surface to test with? The problem occurs with this device and not with my Apple or Android devices. I’m currently trying to work out if I can capture some on-air packets and debug further.

@openwrt-bot

psyborg:

can you attach png screenshots for surface and ipad on 5ghz on a build before 11 sep?

@openwrt-bot

muddyfeet:

Ok, here are the results for an ath79 build 853e4dd dated 23 Aug. This build used the ath10k-ct driver and firmware. This was my last good regular build. Even then, the latency for the Surface is somewhat excessive, which is impacting throughput.

These results were captured with the devices around 4 metres from the AP with no obstructions.

The 5GHz radio with the surface

[[http://www.dslreports.com/speedtest/55853053]]

and the iPad

[[http://www.dslreports.com/speedtest/55853074]]

My identification of the offending commit was done using git bisect and confirmed separately with clean makes.
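For anyone repeating this, the search that git bisect performs can be sketched roughly as below. This is only an illustration of the idea, not git's implementation; the build labels, the 1000 ms threshold and the latency values are placeholders, not results from my testing.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is
    also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical builds and latencies in ms (placeholders, not measurements).
order = ["build_a", "build_b", "build_c", "build_d"]
latency_ms = {"build_a": 300, "build_b": 300, "build_c": 300, "build_d": 2000}
print(first_bad(order, lambda c: latency_ms[c] > 1000))
```

The search only gives the right answer if the good/bad test is reliable at every step, so a single noisy speed test run on one client can point it at the wrong commit.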

@openwrt-bot

psyborg:

what if you revert these two commits: e8350c9 and 191c3e4

@openwrt-bot

muddyfeet:

I'll give it a try on both an ar71xx and ath79 build and report back.

@openwrt-bot

muddyfeet:

Actually, just realised that these patches were not present in the 6d819fa ath79 build I did as they were applied a few commits onwards. That build exhibited the very high latency.

I'll build a current snapshot without those commits and let you know.

@openwrt-bot

muddyfeet:

I just cloned a brand new snapshot of OpenWrt, updated and installed packages, deleted patches 952 and 953 from the ./target/linux/ar71xx/patches-4.14 directory (corresponding to the commits above), then built an ar71xx target for the Archer C7 v2.

The results are the same - extremely high latency. It is not those patches causing the issue.

@openwrt-bot

psyborg:

try reverting 6d819fa on your latest snapshot. if it doesn't help then try also reverting 9f34bf5, 7bed9bf

@openwrt-bot

muddyfeet:

I've done the same procedure as above, but reverted the changes to the gemini makefiles (6d819fa). The high latency is still there. I am not overly surprised as this is not the target for the archer c7.

[[http://www.dslreports.com/speedtest/55890141]]

Reverting the other commits is quite difficult as there have been many commits since to hostapd including a change in version.

At this stage, I'm just going to leave it and use the 18.06.4 version which works fine.

@openwrt-bot

muddyfeet:

And just for completeness, here's my latest test using my compiled v18.06.4 branch using the above procedure (except adding git checkout v18.06.4).

[[http://www.dslreports.com/speedtest/55890670]]

So, my build system looks fine.

Time for someone else to have a go at fixing.

@openwrt-bot

psyborg:

in that case your bisection to 6d819fa is wrong. there was a kernel update on the same day, maybe you had that in your build

@openwrt-bot

muddyfeet:

Maybe - I'm not a software engineer (I'm a radio engineer) so may have tripped up somewhere. I initially did a bisect to identify the commit where things really broke, followed by a series of clean builds, first using "make clean" then "git reset --hard 6d819fa", to confirm it (I did the three prior commits working up to it, as I actually thought it was probably one of the two prior hostapd commits that might have triggered it).

As I mentioned earlier, the latency I was seeing on my 23 Aug build was still unacceptably high (100-200 ms) but usable. My guess is there is something unstable in the build that doesn't take much to push it over the edge. The 6d819fa commit was the one which did it in my initial testing (still can't understand why though), but it could equally be something else later on.

Given that it is present with both the ath10k-ct and ath10k firmware and driver, my best guess is that the issue is in the mac80211 or hostapd layers. Probably the way to actually find the issue is to bisect looking for the increase in latency from tens of millisecs to hundreds (sometime prior to Aug).

I have also just tried the ath79 snapshot on the openwrt download site. The problem is still there with latency measured in seconds.

Either way the problem still remains and I don't have time to continue debugging. I'm very thankful for your time trying to get this sorted but I need to revert to the v18.06.4 build as it keeps things running here.

@openwrt-bot

muddyfeet:

See also FS#2682 - Similar issues on the same device.

@openwrt-bot

adrianschmutzler:

I've just been informed by a user that antenna_gain is not retrieved from driver on C7 v2 5 GHz, too (resulting in txpower=legal limit)

@openwrt-bot

muddyfeet:

Further update - I now have a Linksys wrt1200ac device running the latest trunk snapshot (r11982-c6e972c877). This exhibits the same behaviour on 5GHz as the Archer C7s.

My feeling is the problem lies not in the ath10k driver (the wrt1200ac has a Marvell chipset) but further up the chain - probably hostapd. There is some evidence for this in the ping tests I did with @hgblob's build of 19.07 with hostapd v2.8 - see https://forum.openwrt.org/t/zyxel-nbg6617-ipq4019-regression-in-wifi-5ghz/50544/26?u=muddyfeet

As I mentioned earlier, it's an interaction with a problematic Microsoft driver on the MS Surface devices, triggered by something in OpenWrt that doesn't play nicely with it.

@openwrt-bot

adrianschmutzler:

BTW: The antenna_gain issue just reported by me is with Non-CT driver/firmware on 19.07.0.

@openwrt-bot

muddyfeet:

See FS#2679. This has now been fixed with https://git.openwrt.org/c07f6e8659ea1348c75c04dac2924616f0042293

Thank you Felix and Petr.
