FS#294 - linksys 1200ac (and most likely other mvneta) has a multi-flow lockout problem #5411
Comments
None: I note https://git.lede-project.org/8aa9f6bd71bcfd15e953a0932ed21953ab6d6bbf has just been committed. |
anomeome: Seems with 4.4.35 kernel things are working well again. |
mkresin: Dave, would you please test if the issue is fixed for you as well! |
dtaht: Nope. Not fixed. Tried the Dec 23 build just now. When I hit it with more than 4 flows, it locks out everything else. (The way I was dealing with it was by running cake at 900mbit on the [...] I guess it's one way around bugs like this.) Example test using 12 flows from flent and netperf on the router: root@apu2:~/t# flent -H 172.26.64.1 -t 'isitfixed' Warning: Program exited non-zero (1). Warning: Command produced no valid data. |
woody77: I'm seeing the same here on my wrt1900ac (v1-Mamba) running the 12/22 snapshot. Here's the graphed output from a 12-stream netperf download test (flent -H <wrt_1900_ac> tcp_12down) |
nbd: Added a workaround for this issue to current master. |
dtaht: I have tested the mvneta with this and it no longer has the lockout behavior, though I did not test "through" the router at a gbit. It can't do 12 flows both up and down at the same time (about 600mbit each way). It might be good for the driver to stop exposing "mq" at all to the higher-level bits of the stack, as allocating 8 fq_codel instances is somewhat wasteful and confusing. I have also seen bql "working" in this case. I support closure until this is better fixed upstream.
|
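For anyone wanting to check how many fq_codel instances end up attached under mq, a minimal sketch follows. It only counts lines in `tc qdisc show` output; the helper name `count_qdiscs` and the interface name `eth0` are assumptions for illustration, not something from the driver or the patch.

```shell
#!/bin/sh
# Hypothetical helper: count qdisc instances of a given kind in
# `tc qdisc show` output read from stdin.
count_qdiscs() {
    kind="$1"
    # Each qdisc is reported on a line starting with "qdisc <kind> ...".
    awk -v kind="$kind" '$1 == "qdisc" && $2 == kind { n++ } END { print n + 0 }'
}

# Usage on the router (eth0 is an assumed interface name):
#   tc qdisc show dev eth0 | count_qdiscs fq_codel
```

On a device where the driver exposes 8 tx queues through mq, this would be expected to print 8.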
dtaht: Also, the current code is locked to the first core. root@linksys-1200ac:/proc/irq# cd 37 root@linksys-1200ac:/proc/irq/105# cat /proc/interrupts |
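One way to see whether interrupt handling really lands on a single core is to sum the per-CPU counters in /proc/interrupts for the mvneta lines. The sketch below assumes the usual layout (a header row with one CPUn column per core, then an IRQ label followed by one count per CPU); `irq_per_cpu` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: sum per-CPU interrupt counts for all lines of
# /proc/interrupts-style input (stdin) that contain the given substring.
irq_per_cpu() {
    pattern="$1"
    awk -v pat="$pattern" '
        NR == 1 { ncpu = NF; next }          # header row: one column per CPU
        index($0, pat) {                     # plain substring match
            for (i = 1; i <= ncpu; i++) sum[i] += $(i + 1)
        }
        END {
            for (i = 1; i <= ncpu; i++) printf "CPU%d %d\n", i - 1, sum[i]
        }'
}

# Usage on the router:
#   irq_per_cpu mvneta < /proc/interrupts
```

If everything is locked to the first core, the CPU0 line grows under load while the other CPUs stay at or near zero.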
Here's some more information from IRC:
This was around kernel 5.4 days. Since then, it seems this may have actually been fixed. See torvalds/linux@cf9bf87 That says kernel 5.12. @tmn505 you think it can be removed? edit: speaking of configuring queues, that seems to have been added in kernel 5.17: torvalds/linux@2551dc9 |
If you look at the patch, the workaround is disabled on ARM64, so it doesn't apply to 3700. Also, looking at the referenced PR, you'll see this comment: #12938 (comment), so yes, it's still needed. Maybe fiddling with queues can fix that, but that's not for me to test, since I don't own any devices with an Armada 32-bit SoC. |
If I remember correctly, it was maybe two years ago already that @pali stumbled upon this patch, and you are still trying to maintain it in this repo. Pali or @elkablo investigated it and determined that it works with the vanilla kernel, so something is happening in the downstream repository. This patch was clearly a hack; I think we at Turris dropped it, and things started to work. Also, this patch has been here for ages. Only @nbd168 knows why it was added. Maybe it is time to drop it and ask the OpenWrt community and the Linux kernel developers to work together and figure it out once and for all, because with each kernel bump we still go through the same shit(s) over and over. Just my 2c. |
---- mark ---- :D |
dtaht:
Supply the following if possible:
linksys 1200ac
Reboot (HEAD, r2246)
Install netperf
and then, from another machine, either:
netperf -H the_device -l 60 -t TCP_MAERTS &
netperf -H the_device -l 60 -t TCP_MAERTS &
netperf -H the_device -l 60 -t TCP_MAERTS &
netperf -H the_device -l 60 -t TCP_MAERTS &
or:
flent -H the_device --test-parameter=download_streams=12 tcp_ndown
The result generally is that you only get one of the flows going, the others starve completely.
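The reproduction above can be scripted so the starved flows are counted rather than eyeballed. This is a rough sketch: `run_flows` and `count_starved` are made-up names, the 1 Mbit/s threshold is arbitrary, and it assumes `netperf -P 0` prints a single result line whose last field is the throughput.

```shell
#!/bin/sh
# Hypothetical helper: read one throughput value (Mbit/s) per line on
# stdin and print how many flows fall below the given threshold.
count_starved() {
    threshold="$1"
    awk -v t="$threshold" '$1 + 0 < t { n++ } END { print n + 0 }'
}

# Hypothetical helper: launch N parallel TCP_MAERTS flows against a host
# and print one throughput value per line once they have all finished.
# Assumes the last field of the netperf -P 0 result line is throughput;
# a flow that produced no output at all is reported as 0.
run_flows() {
    host="$1"; n="$2"; secs="$3"
    tmp=$(mktemp -d) || return 1
    i=0
    while [ "$i" -lt "$n" ]; do
        netperf -H "$host" -l "$secs" -t TCP_MAERTS -P 0 > "$tmp/flow.$i" &
        i=$((i + 1))
    done
    wait
    for f in "$tmp"/flow.*; do
        awk 'NF { v = $NF } END { print v + 0 }' "$f"
    done
    rm -rf "$tmp"
}
```

Something like `run_flows the_device 4 60 | count_starved 1` would then print the number of flows that got essentially nothing; with the lockout present, that would be expected to be all but one.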
I am under the impression that fixes for this have arrived in mainline Linux (along with BQL support).