OpenWrt/LEDE Project

  • Status Unconfirmed
  • Percent Complete 0%
  • Task Type Bug Report
  • Category Base system
  • Assigned To No-one
  • Operating System All
  • Severity High
  • Priority Very Low
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Warren Linton - 23.10.2019

FS#2563 - Extremely high latency on 5GHz radio ath10k-ct (TP-Link Archer C7 v2)

Supply the following if possible:
- Device problem occurs on
Archer C7 v2 configured as a simple WPA2-PSK access point only (no WAN routing, wpad-basic, ath10k-ct, and clean radio channels)

- Software versions of OpenWrt/LEDE release, packages, etc.

Trunk (ath79)

- Steps to reproduce
Latency on builds prior to 11 Sep on the 5 GHz radio (ath10k-ct) is around 200-400ms (versus 20ms on the 2.4GHz ath9k radio), which is still unacceptably high. Despite this high latency I can achieve throughput of up to 50Mbps.
After 11 Sep, latency increases to over 2 sec, making the 5GHz radio close to unusable (the 2.4GHz radio is unchanged at around 20ms). Throughput drops to the kbps range.

I have compiled and tested builds e667d6f, 9f34bf5, 7bed9bf, and 6d819fa. The major shift in latency occurs with 6d819fa (which makes absolutely no sense, as it is a change to the gemini kernel build). The two middle builds have changes to hostapd but do not appear to cause the 2 sec latency. The issue still remains on 7a577e9 (22 Oct), which I have tried with different hostapd configurations (full-wolfssl and mini).

I have tried different encryption, no encryption, and the non-CT driver and firmware to no benefit.

My testing uses a Microsoft Surface Pro 6, which may be half the problem, as I’m not seeing the 2 sec latency on the other devices I own (Apple iPad and iPhones). BTW, I am aware of the current 5GHz radio issues with the recent Surface driver (I am using an older driver without these issues).

I have reverted to the 18.06 ar71xx branch to fix the problem for now.

Warren Linton commented on 23.10.2019 08:01

Latency on the 18.06 branch is around 100ms for 5GHz, and still around 20ms for the 2.4GHz radio.

Warren Linton commented on 23.10.2019 08:11

I should add that I'm testing the latency using http://www.dslreports.com/speedtest, so the figures include the latency of my firewall, modem and the internet path. It is the bufferbloat latency reported on that website, not idle ping latency.

Warren Linton commented on 23.10.2019 08:35

Just for reference, the latency on a wired connection through to the above site is 15ms with throughput of around 300Mbps.

Warren Linton commented on 23.10.2019 11:36

May be the same issue as

FS#2246 - Latency is over 360ms for game PUBG for ath10k-ct firmware

Warren Linton commented on 24.10.2019 12:35

I've also just tried building 7a577e9 (22 Oct) as an ar71xx image and it exhibits the same behaviour so it is not the ath79 configuration.

Here's the 2.4 GHz radio result for the ar71xx build connected to the MS Surface Pro client, which is fine.

http://www.dslreports.com/speedtest/55686780

and the 5 GHz radio result (really bad)

http://www.dslreports.com/speedtest/55686701

and the results of the 5 GHz radio with an iPad (no issues with this client)

http://www.dslreports.com/speedtest/55687215

With the access point changed back to the 18.06 build, the 5 GHz radio with the MS Surface Pro client is all good.

http://www.dslreports.com/speedtest/55687982

psyborg commented on 25.10.2019 00:25

Please report the bug here, as it has a higher chance of being fixed sooner: https://github.com/greearb/ath10k-ct/issues

Warren Linton commented on 25.10.2019 07:34

Thanks psyborg. As I mentioned in the original report the problem exists in the non-CT firmware/driver as well which would point to something else.

My best guess is that something in the Marvell driver in the MS Surface is not quite right and interacts badly with later versions of hostapd or mac80211, whereas the Apple iPad doesn't. The problem really is Microsoft's, but given that the 6d819fa build (gemini: image: fix race condition when building copy-kernel.bin) triggers the problem (which, as I said, makes no sense), I suspect there is something in the OpenWrt system that isn't quite right either.

Project Manager
Kevin 'ldir' Darbyshire-Bryant commented on 25.10.2019 14:59

I'm seeing erratic and high latency issues with ath10k-* and mt76 when using 5GHz and Apple devices. I've actually reverted to factory firmware on my Archer C7 v2 boxes that I'm using as access points to solve the problem.

Usually people say 'your fault for using Apple'.

psyborg commented on 27.10.2019 20:20

I'm not convinced about this. Did you clean the build directory before building again?
The only latency problem I had was fixed with https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=97c37f8dd067fd4750a64bcaa639e6d08462060d

Warren Linton commented on 28.10.2019 06:48

Hi psyborg. Yes, I cleaned before each build. In fact I did a distclean before building the ar71xx build, having previously been building ath79 builds. Do you have a Surface to test with? The problem occurs with this device and not my Apple or Android devices. I’m currently trying to work out if I can capture some over-the-air packets and debug further.

psyborg commented on 29.10.2019 01:16

Can you attach PNG screenshots for the Surface and iPad on 5GHz with a build from before 11 Sep?

Warren Linton commented on 29.10.2019 08:29

OK, here are the results for ath79 build 853e4dd, dated 23 Aug. This build used the ath10k-ct driver and firmware, and was my last good regular build. Even then, the latency for the Surface is somewhat excessive, which is impacting throughput.

These results were captured with the devices around 4 metres from the AP with no obstructions.

The 5GHz radio with the Surface

http://www.dslreports.com/speedtest/55853053

and the iPad

http://www.dslreports.com/speedtest/55853074

My identification of the offending commit was done using git bisect and confirmed separately with clean makes.

psyborg commented on 29.10.2019 09:00

What if you revert these two commits: e8350c998bbac54f6d17bf809e30386ef3cf7563 and 191c3e49b99911571d3cc66ef8d363d1cafb2b89?

Warren Linton commented on 29.10.2019 09:28

I'll give it a try on both an ar71xx and ath79 build and report back.

Warren Linton commented on 29.10.2019 09:37

Actually, just realised that these patches were not present in the 6d819fa ath79 build I did as they were applied a few commits onwards. That build exhibited the very high latency.

I'll build a current snapshot without those commits and let you know.

Warren Linton commented on 29.10.2019 13:44

I just cloned a brand new snapshot of OpenWrt, updated and installed packages, deleted patches 952 and 953 from the ./target/linux/ar71xx/patches-4.14 directory (corresponding to the commits above), then built an ar71xx target for the Archer C7 v2.

The results are the same - extremely high latency. It is not those patches causing the issue.

psyborg commented on 29.10.2019 21:05

Try reverting 6d819fa on your latest snapshot. If it doesn't help, then also try reverting 9f34bf5 and 7bed9bf.

Warren Linton commented on 30.10.2019 11:04

I've done the same procedure as above, but reverted the changes to the gemini makefiles (6d819fa). The high latency is still there. I am not overly surprised as this is not the target for the archer c7.

http://www.dslreports.com/speedtest/55890141

Reverting the other commits is quite difficult as there have been many commits since to hostapd including a change in version.

At this stage, I'm just going to leave it and use the 18.06.4 version which works fine.

Warren Linton commented on 30.10.2019 11:39

And just for completeness, here's my latest test using my compiled v18.06.4 branch using the above procedure (except adding git checkout v18.06.4).

http://www.dslreports.com/speedtest/55890670

So, my build system looks fine.

Time for someone else to have a go at fixing.

psyborg commented on 30.10.2019 12:21

In that case your bisection to 6d819fa is wrong. There was a kernel update on the same day; maybe you had that build.

Warren Linton commented on 30.10.2019 13:57

Maybe - I'm not a software engineer (I'm a radio engineer), so I may have tripped up somewhere. I initially did a bisect to identify the commit where things really broke, followed by a series of clean builds, first using "make clean" then "git reset --hard 6d819fa", to confirm it (I built the three prior commits working up to it, as I actually thought it was probably one of the two prior hostapd commits that might have triggered it).

As I mentioned earlier, the latency I was seeing on my 23 Aug build was still unacceptably high (100-200ms) but usable. My guess is there is something unstable in the build that doesn't take much to push over the edge. The 6d819fa commit was the one that did it in my initial testing (I still can't understand why), but it could equally be something else later on.

Given that it is present with both the ath10k-ct and ath10k firmware and driver, my best guess is that the issue is in the mac80211 or hostapd layers. Probably the way to actually find the issue is to bisect looking for the increase in latency from tens of millisecs to hundreds (sometime prior to Aug).

I have also just tried the ath79 snapshot on the openwrt download site. The problem is still there with latency measured in seconds.

Either way the problem still remains and I don't have time to continue debugging. I'm very thankful for your time trying to get this sorted but I need to revert to the v18.06.4 build as it keeps things running here.
