OpenWrt/LEDE Project

  • Status Closed
  • Percent Complete
    100%
  • Task Type Bug Report
  • Category Kernel
  • Assigned To No-one
  • Operating System All
  • Severity Medium
  • Priority Very Low
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Dave Täht - 20.11.2016
Last edited by Felix Fietkau - 11.01.2017

FS#294 - linksys 1200ac (and most likely other mvneta) has a multi-flow lockout problem

Supply the following if possible:
- Device problem occurs on:

linksys 1200ac

- Software versions of LEDE release, packages, etc.

Reboot (HEAD, r2246)

- Steps to reproduce


Install netperf

and then, from another machine, either:

netperf -H the_device -l 60 -t TCP_MAERTS &

netperf -H the_device -l 60 -t TCP_MAERTS &
netperf -H the_device -l 60 -t TCP_MAERTS &
netperf -H the_device -l 60 -t TCP_MAERTS &

or:

flent -H the_device --test-parameter=download_streams=12 tcp_ndown

The result generally is that you only get one of the flows going, the others starve completely.
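Whether all traffic is being carried by a single queue can be checked on the router itself: with the default mq root, each hardware TX queue has its own fq_codel child, and a starved queue shows a near-zero byte counter (a diagnostic sketch; `eth0` as the LAN interface is an assumption):

```shell
# Per-queue qdisc statistics; under lockout one fq_codel child shows
# nearly all the "Sent" bytes while the others stay close to zero.
tc -s qdisc show dev eth0
```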

I am under the impression that fixes for this arrived in mainline Linux (also adding BQL support).


Closed by  Felix Fietkau
11.01.2017 13:57
Reason for closing:  Fixed
Anonymous Submitter commented on 21.11.2016 11:04
anomeome commented on 29.11.2016 16:52

Seems with 4.4.35 kernel things are working well again.

Project Manager
Mathias Kresin commented on 04.12.2016 08:52

Dave, would you please test whether the issue is fixed for you as well?

Dave Täht commented on 24.12.2016 03:22

Nope. Not fixed. Tried the Dec 23 build just now. I hit it with > 4 flows, it locks out everything else.

(The way I was dealing with it was running cake at 900Mbit on the
internal ethernet using SQM - which works great, aside from burning a ton of CPU.)

I guess it's one way around bugs like this....
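The shaping workaround described above amounts to roughly the following tc one-liner (a sketch; the interface name `eth0` and the availability of sch_cake are assumptions, and the actual setup used the SQM scripts rather than raw tc):

```shell
# Shape the internal ethernet to 900 Mbit with cake so the hardware
# TX queues never fill and the lockout never triggers; "eth0" is an
# assumed interface name.
tc qdisc replace dev eth0 root cake bandwidth 900mbit
```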

Example test using 12 flows from flent and netperf on the router:

root@apu2:~/t# flent -H 172.26.64.1 -t 'isitfixed' --te=download_streams=12 tcp_ndown

Warning: Program exited non-zero (1).
Command: /usr/bin/netperf -P 0 -v 0 -D -0.20 -4 -H 172.26.64.1 -t TCP_MAERTS -l 60 -f m -- -H 172.26.64.1
Program output:

netperf: send_omni: connect_data_socket failed: No route to host

Warning: Command produced no valid data.
Data series: TCP download::5
Runner: NetperfDemoRunner
Command: /usr/bin/netperf -P 0 -v 0 -D -0.20 -4 -H 172.26.64.1 -t TCP_MAERTS -l 60 -f m -- -H 172.26.64.1
Standard error output:

netperf: send_omni: connect_data_socket failed: No route to host
woody77 commented on 05.01.2017 22:20

I'm seeing the same here on my wrt1900ac (v1-Mamba) running the 12/22 snapshot. Here's the graphed output from a 12-stream netperf download test (flent -H <wrt_1900_ac> tcp_12down)

Project Manager
Felix Fietkau commented on 11.01.2017 13:57

Added a workaround for this issue to current master.

Dave Täht commented on 19.01.2017 20:05

I have tested the mvneta with this and it no longer has the lockout behavior; it
can indeed push netperf in one direction or the other at 1Gbit with 12 flows.

I did not test "through" the router at a gbit.

It can't do both 12 flows up and down at the same time (about 600Mbit each way).

It might be good for the driver to stop exposing "mq" at all to the higher level bits of the stack, as allocating 8 fq_codel instances is somewhat wasteful, and confusing.
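Short of a driver change, hiding the hardware queues from the qdisc layer can be approximated from userspace by installing a single root qdisc in place of mq (a sketch; interface name assumed):

```shell
# Replace the mq root (and its eight per-queue fq_codel children)
# with a single fq_codel instance; "eth0" is an assumed interface name.
tc qdisc replace dev eth0 root fq_codel
```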

I have also seen bql "working" in this case.

I support closure until this is better fixed upstream.

2017-01-16 19:05 GMT+01:00 Felix Fietkau nbd@nbd.name:
> On 2017-01-16 18:59, Dave Taht wrote:
>> On Mon, Jan 16, 2017 at 9:28 AM, Marcin Wojtas mw@semihalf.com wrote:
>>> I just took a look in the LEDE master branch and found his work-around:
>>>
>>> https://git.lede-project.org/?p=source.git;a=blob;f=target/linux/mvebu/patches-4.4/400-mvneta-tx-queue-workaround.patch;h=5dba311d93a6d325fc110b8218d56209bd78e9dd;hb=2e1f6f1682d3974d8ea52310e460f1bbe470390f#l1
>>>
>>> He simply uses TXQ0 for entire traffic. I'm not aware of any problem
>>> in HW. Maybe he can send a description of his findings to the kernel
>>> lists and then I'd poke Marvell so that it could at least try to get
>>> into their network team bug system?
> To me the behavior looks like the hardware is configured to service the
> queues in a fixed priority scheme. If I put the system under heavy load,
> one queue gets its packets out all the time, whereas all the other
> queues starve completely (>900 Mbit/s on one queue vs <1 Mbit/s on others).
> I've tried to resolve this myself by looking at the data sheet and
> playing with the queue configuration registers, but didn't get anywhere
> with that.
Dave Täht commented on 19.01.2017 22:57

Also, the current code is locked to the first core.

root@linksys-1200ac:/proc/irq# cd 37
root@linksys-1200ac:/proc/irq/37# echo 2 > smp_affinity
-ash: write error: I/O error
root@linksys-1200ac:/proc/irq/37# ls
affinity_hint node smp_affinity_list
mvneta smp_affinity spurious

root@linksys-1200ac:/proc/irq/105# cat /proc/interrupts

             CPU0       CPU1
  17:    47628305   48244950  GIC  29  Edge   twd
  18:           0          0  armada_370_xp_irq   5  Level  armada_370_xp_per_cpu_tick
  20:         174          0  GIC  34  Level  mv64xxx_i2c
  21:          20          0  GIC  44  Level  serial
  35:    45945135          1  armada_370_xp_irq  12  Level  mvneta
  36:           0          0  GIC  50  Level  ehci_hcd:usb1
  37:    71740156          0  armada_370_xp_irq   8  Level  mvneta
  38:           0          0  GIC  51  Level  f1090000.crypto
  39:           0          0  GIC  52  Level  f1090000.crypto
  40:           0          0  GIC  53  Level  f10a3800.rtc
  41:           0          0  GIC  58  Level  f10a8000.sata
  42:       41658          0  GIC 116  Level  f10d0000.flash
  43:           0          0  GIC  49  Level  xhci-hcd:usb2
  68:           0          0  f1018100.gpio  24  Edge   gpio_keys
  73:           0          0  f1018100.gpio  29  Edge   gpio_keys
 104:   363099418      34760  GIC  61  Level  mwlwifi
 105:   373070016      17906  GIC  65  Level  mwlwifi
 106:           2          0  GIC  54  Level  f1060800.xor
 107:           2          0  GIC  97  Level  f1060900.xor
IPI0:           0          1  CPU wakeup interrupts
IPI1:           0          0  Timer broadcast interrupts
IPI2:     1104300    3646744  Rescheduling interrupts
IPI3:           0          0  Function call interrupts
IPI4:      280997   19353589  Single function call interrupts
IPI5:           0          0  CPU stop interrupts
IPI6:           0          0  IRQ work interrupts
IPI7:           0          0  completion interrupts
Err:            0
