- Status Closed
- Percent Complete
- Task Type Bug Report
- Category Base system
- Assigned To No-one
- Operating System All
- Severity High
- Priority Very Low
- Reported Version Trunk
- Due in Version Undecided
- Due Date Undecided
- Private
Opened by Warren Linton - 23.10.2019
Last edited by Petr Štetiar - 24.01.2020
FS#2563 - Extremely high latency on 5GHz radio ath10k-ct (tplink archer C7 v2)
Supply the following if possible:
- Device problem occurs on
Archer C7 v2 configured as a simple WPA2-PSK access point only (no WAN routing, wpad-basic, ath10k-ct, and clean radio channels)
- Software versions of OpenWrt/LEDE release, packages, etc.
Trunk (ath79)
- Steps to reproduce
Latency on builds prior to 11 Sep on the 5 GHz radio (ath10k-ct) is around 200-400ms (20ms on the 2.4GHz ath9k radio), which is still unacceptably high. Despite this high latency I can achieve throughput of up to 50Mbps.
Post 11 Sep, latency increases to over 2 sec making the 5GHz radio close to unusable (the 2.4GHz radio is unchanged around 20ms). Throughput drops to kbps.
Have compiled and tested builds e667d6f, 9f34bf5, 7bed9bf, and 6d819fa. The major shift in latency occurs with 6d819fa (which makes absolutely no sense as it is a change for the gemini kernel build). The two middle builds have changes to hostapd but do not appear to cause the 2 sec latency. The issue still remains on 7a577e9 (22 Oct), which I have tried with different configurations of hostapd (full-wolfssl and mini).
I have tried different encryption, no encryption, and the non-CT driver and firmware to no benefit.
My testing is with a Microsoft Surface Pro 6, which may be half the problem as I'm not seeing the 2 sec latency on the other devices I own (Apple iPad and iPhones). BTW I am aware of the current 5GHz radio issues with the recent Surface driver (I am using an older driver without these issues).
I have reverted to the 18.06 ar71xx branch to fix the problem for now.
24.01.2020 07:53
Reason for closing: Fixed
Additional comments about closing:
Fixed in https://git.openwrt.org/c07f6e8659ea1348c75c04dac2924616f0042293
Latency on the 18.06 branch is around 100ms for 5GHz, and still around 20ms for the 2.4GHz radio.
I should add that I'm testing the latency using http://www.dslreports.com/speedtest, so will include the latency of my firewall, modem and the internet path. It is the bufferbloat latency reported on that website, not idle ping latency.
Just for reference, the latency on a wired connection through to the above site is 15ms with throughput of around 300Mbps.
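To make the distinction between idle ping latency and latency under load concrete, here is a minimal Python sketch. It assumes you have collected two sets of round-trip times yourself, one with the link idle and one while a transfer is running. The function name and the use of the median are my own illustrative choices, not the method dslreports uses.

```python
from statistics import median


def bufferbloat_ms(idle_rtts_ms, loaded_rtts_ms):
    """Return the increase in median RTT (ms) when the link is under load.

    Both arguments are lists of round-trip times in milliseconds.
    This is an illustrative metric, not the dslreports calculation.
    """
    if not idle_rtts_ms or not loaded_rtts_ms:
        raise ValueError("need at least one sample in each list")
    return median(loaded_rtts_ms) - median(idle_rtts_ms)
```

A small result means the link copes well under load. A result in the hundreds or thousands of milliseconds corresponds to the kind of behaviour described in this report.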
May be the same issue as
FS#2246 - Latency is over 360ms for game PUBG for ath10k-ct firmware
I've also just tried building 7a577e9 (22 Oct) as an ar71xx image and it exhibits the same behaviour so it is not the ath79 configuration.
Here's the 2.4 GHz radio result for the ar71xx build connected to the MS Surface Pro client which is fine.
http://www.dslreports.com/speedtest/55686780
and the 5 GHz radio result (really bad)
http://www.dslreports.com/speedtest/55686701
and the results of the 5 GHz radio with an iPad (no issues with this client)
http://www.dslreports.com/speedtest/55687215
Changing the access point back to the 18.06 build with the 5 GHz radio and the MS Surface Pro client it is all good.
http://www.dslreports.com/speedtest/55687982
please report bug here: https://github.com/greearb/ath10k-ct/issues it has higher chances of being fixed sooner
Thanks psyborg. As I mentioned in the original report the problem exists in the non-CT firmware/driver as well which would point to something else.
My best guess is there is something not quite right in the Marvell driver in the MS Surface that interacts with later versions of hostapd or mac80211 in a bad way, which the Apple iPad doesn't. The problem really is Microsoft's but given that the 6d819fa build (gemini: image: fix race condition when building copy-kernel.bin) triggers the problem (which as I said makes no sense), I suspect there is something in the OpenWrt system that isn't quite right either.
I'm seeing erratic and high latency issues with ath10k-* and mt76 when using 5GHz and Apple devices. I've actually reverted to factory firmware on my Archer C7 v2 boxes that I'm using as access points to solve the problem.
Usually people say 'your fault for using Apple'
I'm not convinced about this. Did you clean the build directory before building again?
The only latency problem I had was fixed with https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=97c37f8dd067fd4750a64bcaa639e6d08462060d
Hi psyborg. Yes, I cleaned before each build. In fact I did a distclean before building the ar71xx build, having previously been building ath79 builds. Do you have a Surface to test with? The problem occurs with this device and not my Apple or Android devices. I'm currently trying to work out if I can capture some on-air packets and debug further.
can you attach png screenshots for surface and ipad 5ghz on a build before 11 sep ?
Ok, here are the results for an ath79 build 853e4dd dated 23 Aug. This build used the ath10k-ct driver and firmware. This was my last good regular build. Even then, the latency for the Surface is somewhat excessive, which is impacting on throughput.
These results were captured with the devices around 4 metres from the AP with no obstructions.
The 5GHz radio with the surface
http://www.dslreports.com/speedtest/55853053
and the iPad
http://www.dslreports.com/speedtest/55853074
My identification of the offending commit was done using git bisect and confirmed separately with clean makes.
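For readers unfamiliar with bisection, the search it performs can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual git implementation. The `commits` list and the `is_bad` callback are hypothetical placeholders: in practice `is_bad` would mean building and flashing that commit, then checking whether the measured latency exceeds a threshold you choose.

```python
def first_bad(commits, is_bad):
    """Return the first bad commit in a list ordered oldest to newest.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: newest known good, hi: oldest known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]
```

The result is only as reliable as the good/bad test. If the measurement is noisy, or if something else changed between builds, a single wrong verdict sends the search to the wrong commit.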
what if you revert these two commits: e8350c998bbac54f6d17bf809e30386ef3cf7563 and 191c3e49b99911571d3cc66ef8d363d1cafb2b89
I'll give it a try on both an ar71xx and ath79 build and report back.
Actually, just realised that these patches were not present in the 6d819fa ath79 build I did as they were applied a few commits onwards. That build exhibited the very high latency.
I'll build a current snapshot without those commits and let you know.
Just cloned a brand new snapshot of openwrt, updated and installed packages, deleted patch 952 and 953 from the ./target/linux/ar71xx/patches-4.14 directory (corresponding to the commits above), then built an ar71xx target for the archer c7 v2.
The results are the same - extremely high latency. It is not those patches causing the issue.
try reverting 6d819fa on your latest snapshot. if it doesn't help then try also reverting 9f34bf5, 7bed9bf
I've done the same procedure as above, but reverted the changes to the gemini makefiles (6d819fa). The high latency is still there. I am not overly surprised as this is not the target for the archer c7.
http://www.dslreports.com/speedtest/55890141
Reverting the other commits is quite difficult as there have been many commits since to hostapd including a change in version.
At this stage, I'm just going to leave it and use the 18.06.4 version which works fine.
And just for completeness, here's my latest test using my compiled v18.06.4 branch using the above procedure (except adding git checkout v18.06.4).
http://www.dslreports.com/speedtest/55890670
So, my build system looks fine.
Time for someone else to have a go at fixing.
in that case your bisection to 6d819fa is wrong. there was kernel update on the same day, maybe you had that build
Maybe - I'm not a software engineer (I'm a radio engineer) so may have tripped up somewhere. I initially did a bisect to identify the commit where things really broke, followed by a series of clean builds first using "make clean" then "git reset --hard 6d819fa" to confirm it (I did the three prior commits working up to it as I actually thought it was probably one of the two prior hostapd commits that might have triggered it).
As I mentioned earlier, the latency I was seeing on my 23 Aug build was still unacceptably high (100-200ms) but usable. My guess is there is something unstable in the build that doesn't take much to push it over the edge. The 6d819fa commit was the one which did it in my initial testing (still can't understand why though) but it could equally be something else later on.
Given that it is present with both the ath10k-ct and ath10k firmware and driver, my best guess is that the issue is in the mac80211 or hostapd layers. Probably the way to actually find the issue is to bisect looking for the increase in latency from tens of millisecs to hundreds (sometime prior to Aug).
I have also just tried the ath79 snapshot on the openwrt download site. The problem is still there with latency measured in seconds.
Either way the problem still remains and I don't have time to continue debugging. I'm very thankful for your time trying to get this sorted but I need to revert to the v18.06.4 build as it keeps things running here.
See also
FS#2682 - Similar issues on the same device.
I've just been informed by a user that antenna_gain is not retrieved from driver on C7 v2 5 GHz, too (resulting in txpower=legal limit)
Further update - I now have a Linksys wrt1200ac device running the latest trunk snapshot (r11982-c6e972c877). This exhibits the same behaviour on 5GHz as the Archer C7s.
My feeling is the problem lies not in the ath10k driver (the wrt1200ac has a Marvell chipset) but further up the chain - probably hostapd. There is some evidence for this in the ping tests I did with @hgblob's build of 19.07 with hostapd v2.8 - see https://forum.openwrt.org/t/zyxel-nbg6617-ipq4019-regression-in-wifi-5ghz/50544/26?u=muddyfeet
As I mentioned earlier, it's an interaction with a problematic Microsoft driver on the MS Surface devices triggered by something in OpenWrt that doesn't play nice with it.
BTW: The antenna_gain issue just reported by me is with Non-CT driver/firmware on 19.07.0.
See
FS#2679. This has now been fixed with https://git.openwrt.org/c07f6e8659ea1348c75c04dac2924616f0042293
Thank you Felix and Petr.