FS#3373 - IPV6 flow offload broken #8239

Closed
openwrt-bot opened this issue Oct 9, 2020 · 54 comments

@openwrt-bot

graphine:

Enabling flow offloading makes IPv6 connections unstable.

Example: connecting to #openwrt-devel from HexChat (Ubuntu) results in constant disconnects.
Disabling flow offload makes the connection stable.

@openwrt-bot

netprince:

I can confirm, using -j FLOWOFFLOAD from ip6tables causes ipv6 connections to stall after a few minutes.

@openwrt-bot

netprince:

Just another datapoint, using the exact same configuration for each build. This was tested using a Xiaomi Mi Router 3G. This occurs with and without hardware offload enabled.

With a build of OpenWrt 19.07-SNAPSHOT, r11285-11f4918ebb (dated 20210125), offloading works great.

With a build of OpenWrt SNAPSHOT, r15599-37752336bd (dated 20210125), IPv6 connections stall after being idle for a while (for example, when switching back and forth between apps on a phone).

When the connection stalls, everything in the app (on my phone) just hangs for about 8-10 seconds before finally coming back to life.

@openwrt-bot

hacc1225:

Tested on ath79 (TP-Link Archer C7 v2) with the same issues.

@openwrt-bot

hacc1225:

When -j FLOWOFFLOAD is enabled, the IPv6 packet loss rate is as high as 50%.
The image was built from commit ddab795.
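
For anyone who wants to put a number on the loss, here is a minimal sketch that pulls the loss percentage out of a ping summary. The helper name `loss_percent` is made up for illustration, and it assumes the usual "N% packet loss" wording that BusyBox and iputils ping print in their summary line.

```shell
#!/bin/sh
# Hypothetical helper: reads ping output on stdin and prints the
# packet-loss percentage from the summary line (for example "50").
# Prints nothing if no summary line is found.
loss_percent() {
    sed -n 's/.*[ ,]\([0-9][0-9.]*\)% packet loss.*/\1/p'
}
```

Piping something like `ping -6 -c 100 <host>` (or `ping6` on older systems) into `loss_percent` from a LAN client, once with the FLOWOFFLOAD rule present and once without, should give comparable figures.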

@openwrt-bot

ijoda:

I'm still seeing this issue in OpenWrt 21.02.0-rc1 r16046-59980f7aaf (netgear r7800). I'd be happy to help diagnose this if given some assistance in how to best track it down.

@openwrt-bot

aag:

I can confirm this exact issue is still happening on r16708-e7249669d2 on an R7800.

@openwrt-bot

kkeijzer:

I also have this issue on a WDR4300 with 21.02.0-rc3. Flow offloading was working fine before, with 19.07.7. Connections break completely after a while, all kinds of elements on web pages are not loaded, and so on.

@openwrt-bot

andybotting:

It took me a week to track this down, and I found this bug report just after I added a new topic to the forum: https://forum.openwrt.org/t/21-02-0-rc3-losing-ipv6-packets-when-flow-offloading-1/100575

In the meantime, I've just added this line to /etc/firewall.user:
ip6tables -D FORWARD -m comment --comment "!fw3: Traffic offloading" -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD --hw
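
A slightly more defensive variant of this workaround could check that the rule exists before deleting it, so a firewall reload does not fail when the rule is already gone. This is only a sketch: the function name `remove_v6_offload` and the overridable `IP6T` variable are made up for illustration, and the rule specification is copied from the line above.

```shell
#!/bin/sh
# Sketch for /etc/firewall.user: delete the fw3-generated IPv6 offload
# rule only if it is currently present.
# IP6T is a hypothetical override so the logic can be dry-run elsewhere.
IP6T="${IP6T:-ip6tables}"

remove_v6_offload() {
    # Optional arguments are appended to the target, e.g. --hw when
    # hardware offloading is enabled.
    set -- FORWARD -m comment --comment "!fw3: Traffic offloading" \
        -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD "$@"
    # -C only succeeds when a rule with exactly this spec exists,
    # so the same argument list is reused for the delete.
    if "$IP6T" -C "$@" 2>/dev/null; then
        "$IP6T" -D "$@"
    fi
}
```

Calling `remove_v6_offload --hw` at the end of /etc/firewall.user (or `remove_v6_offload` without arguments when only software offloading is on) would then leave the IPv4 offload rule untouched.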

@openwrt-bot

farmergreg:

I think I may have a similar issue with IPv4 and flow offloading turned on. Idle SSH connections drop relatively quickly when offloading is enabled. When offloading is turned off, they stay connected indefinitely. Please see issue #3759, where I have documented the problems that I have had.

@openwrt-bot

farmergreg:

I meant to write please see issue #3759. Thanks!

@openwrt-bot

bohdan-s:

Hi, I have confirmed that my bug FS#3973 is linked to this one. Disabling either offloading or IPv6 resolves the issue, but when both are enabled the bug is observed.

@openwrt-bot

Shine-:

OK, seriously. You released 21.02 today. With this bug. I updated. It - well - STOPPED WORKING. You know, most major websites, like, Google. Facebook. Ebay. Are using IPv6. And you are releasing a new major release of OpenWRT that stops supporting IPv6. Wow. Hello? I mean, yes, offloading is disabled by default. But every 19.07 user has it turned on, because OpenWRT is painfully slow without. And now you break it? And nevertheless release the broken version as a new major release? Again, wow. Wow.

@openwrt-bot

cuviper:

The 21.02 release has not been announced yet, but I expect this will be a known issue in the release notes, just as it was in rc4. The developers decided not to let this be an indefinite blocker, as the release is already very late. It sounds like this will still be investigated for a point release though.

@openwrt-bot

fda:

I have multiple VLANs and separate ULAs on every interface. odhcpd announces every subnet to every VLAN!
Now I noticed that this bug is also gone with sw-offloading disabled.

@openwrt-bot

User_nikolad:

Archer C7-V2. Problem confirmed

@openwrt-bot

Cheddoleum:

I'm puzzled at the lack of a substantive description of this bug. There is no detail, no steps to reproduce, not even any anecdotes except a handful contributed in the comments.

For example, does this affect native IPv6 connections only? Tunneled connections such as 6rd or 6in4 use IPv4 for ingress and egress, but then are routed internally to the destination LAN subnet as v6, so I'm guessing that the v6 flow tables, and therefore this bug, would be in effect. But that is just a guess; as this is in general release, it would be helpful to know whether mitigation is needed without doing a bunch of ad-hoc, nearly-uninformed testing.

@openwrt-bot

stephenseiber:

Details: IPv6 is broken when offloading is enabled.

Steps to reproduce: have IPv6 on and turn on software or hardware offloading. This affects visiting IPv6-primary sites like Google, YouTube, Facebook, etc.; it affects all connections to IPv6 sites.

How is IPv6 broken? When you go to an IPv6 website it will sometimes load instantly like IPv4, but every other load or reload will take several seconds or time out.

The fix is to either disable IPv6 or offloading, or:
turn on software offloading
install ipset
For software offloading (I haven't confirmed this works, but I would think it does), go to the custom rules of your firewall and add this:
ip6tables -D FORWARD -m comment --comment "!fw3: Traffic offloading" -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD
then ssh into the device and run:

uci set firewall.@include[0].reload="1"
uci commit firewall
service firewall restart
fw3 flush
fw3 restart

and you should have both offloading and IPv6 working.

For hardware offloading, do this:
turn on software and hardware offloading
go to the custom rules of your firewall and add this:
ip6tables -D FORWARD -m comment --comment "!fw3: Traffic offloading" -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD --hw
then ssh into your router and run:
uci set firewall.@include[0].reload="1"
uci commit firewall
service firewall restart
fw3 flush
fw3 restart

I haven't tried the software offloading variant because it works with hardware offloading for me.
I am using OpenWrt SNAPSHOT r17482-6f2044c2d7 on a MikroTik RouterBOARD 760iGS.
There appears to be another bug in the current snapshot causing speeds to drop off after an initial peak.

@openwrt-bot

fda:

I have native IPv6 but with NAT6 enabled. An Unbound instance behind an OpenWrt router cannot use UDP connections when sw-offloading is enabled! TCP connections work, but from time to time there are errors in syslog.
Could the new DSA be related?

@openwrt-bot

Cheddoleum:

This thread suggests that the change that caused the problem is known and revertible, though I see no confirmation yet:

http://lists.openwrt.org/pipermail/openwrt-devel/2021-August/036223.html

Meanwhile, to perhaps answer my own earlier question, I'm running tunneled IPv6 (Hurricane Electric 6in4) in 21.02.0 and there is as yet no indication of this problem with either HW or SW flow offloading.

@openwrt-bot

stephenseiber:

Only two people have even mentioned it in the 8 days since 21.02 was dropped:
http://lists.openwrt.org/pipermail/openwrt-devel/2021-September/thread.html#start
http://lists.openwrt.org/pipermail/openwrt-devel/2021-September/036238.html

@openwrt-bot

Jacko_neill:

Another affected user here, Xiaomi Mi 3G. Disabling the rule via ip6tables as indicated above works for me as a workaround. I'd still like to keep flow offload enabled for IPv4, as this router isn't powerful enough to handle the 300 Mbps connection my ISP provides.

I can reproduce this pretty consistently on an iPhone when scrolling through Facebook/Instagram and switching apps every minute or so. I'm not sure, though, what sort of logs I could grab to help confirm the bug.

@openwrt-bot

Shine-:

If this is really caused by kernel commit e97d940 (https://github.com/torvalds/linux/commit/e97d940) (Edit: it's not, see below!), as suggested on the mailing list (http://lists.openwrt.org/pipermail/openwrt-devel/2021-August/036223.html), then the attached patch should fix it. It effectively reverts e97d940 and additionally picks all necessary changes from 4592ee7f (https://github.com/torvalds/linux/commit/4592ee7f525c4683ec9e290381601fdee50ae110), skipping stuff that kernel 5.4 doesn't have yet (like configurable offload timeouts).

It compiles, but is otherwise completely untested, since I went back to 19.07 for now and won't be able to perform another upgrade attempt for at least the next 2 weeks. Try at your own risk, reports welcome.

Edit: So I did dig up a spare box to test, and no, it's not working, even with this patch. That would be strange anyway: if that commit were the cause, then both IPv4 and IPv6 would be affected by the bug, not only IPv6, and the packet loss would only happen after 30 or 120 seconds (which are the hardcoded timeouts). However, it occurs pretty much immediately.

Feel free to confirm for yourself, I can't delete the attachment anyway. Maybe a mod can delete it, then I'd re-attach it with proper name and description, for reference. It might be the fix for FS#3759, after all.

@openwrt-bot

aragorn:

Affected too. I can confirm that after I updated my Archer C2600 from 19.07 to 21.02 with software flow offloading on, IPv6 connections broke immediately.
Switching it off fixes the issue; however, WAN-to-LAN routing throughput in 21.02 is then cut in half compared to 19.07.

@openwrt-bot

supersebbo:

I am also intrigued by the lack of definition and priority of this bug.

I'm not currently seeing any issues, as I am only using IPv6 on one internal network, so nothing traverses the router. However, it's concerning to think that if I were to start using IPv6 across multiple networks, this would likely be a hard-to-diagnose issue, and I would probably waste hours trying to debug my IPv6 config. I know it's in the release notes... but who looks at the release notes after they've upgraded?

Feels like a big miss for a major release.

Happy to set up some test networks and hosts if that would help... what is needed?

@openwrt-bot

PepePepe:

Another affected user here. TP-Link Archer C7 v5.
IPv6 throughput is cut to less than half after upgrading to 21.02, as I was forced to remove the firewall rule to disable offloading as suggested above.

@openwrt-bot

Kris:

Same for me on a Xiaomi Mi Router 3G. With 19.07, SW offload was fine and HW offload was broken. On 21.02, SW/HW offload is broken on IPv6, while SW offload is OK on IPv4 (but not HW offload on IPv4). By broken, I mean that permanent connections (SSH, RDP) freeze and are closed.

@openwrt-bot

pmarks:

Is this on track for the 21.02.1 release? My Archer A7 is currently stuck on 19.07.x due to the performance regression.

@openwrt-bot

Hou-dev:

I am running 21.02 on the EdgeRouter X and it has the same issue with IPv6. Disabling IPv6 solves the performance regression. HWNAT with IPv4 does not show any signs of performance loss.

@openwrt-bot

Kris:

@Hou-dev and all,
Where did you deactivate IPv6 for this issue? Globally or on a specific interface?

@openwrt-bot

Hou-dev:

@kris I just disabled it in the Interfaces section of the Network tab. I clicked Stop on WAN6 and disabled "Bring up on boot" via the Edit button.

@openwrt-bot

ritaro:

I have the same problem with a WRC-2533GST2 using ndp-proxy. Some IPv6 UDP packets are dropped only when flow offloading is enabled.
I compared the netfilter source code with openwrt-19.07 and made a patch. Tested on the latest 21.02 commit (266890b), the IPv6 packet drops seem to be fixed.

@openwrt-bot

alexandreteles:

Can confirm. I was able to reproduce the bug on a Xiaomi Mi Router 3G on v21.02.1.

@openwrt-bot

fda:

After some months on an E5480 with flow offload disabled, there were no problems with dual-stack internet access.

@openwrt-bot

Shine-:

Also see GitHub PR#4849 (https://github.com//pull/4849) - likely the same author as above, but a much smaller patch. Did anybody try that yet?

Edit: PR has been updated by author to align with upstream.

Question again: anybody try that yet? I'm back at 19.07 with all my boxes, so I can't test in real life.

@Shine-

Shine- commented Mar 2, 2022

As mentioned above, this issue is fixed for version 21.02 (kernel 5.4) by PR #4849.

@sebdanielsson

PR still open. Is it safe to activate software offloading or do we need to wait for merge?

@Shine-

Shine- commented Mar 2, 2022

Since it's not merged yet, you'll have to apply the patch and compile yourself, obviously...

@Headcrabed

Headcrabed commented Mar 5, 2022

So why hasn't the patch been merged? It has been there for a long time. Or is there still a bug?

@Leo-PL
Contributor

Leo-PL commented Mar 6, 2022

I think it needs more testing, and not everyone is able to test it, since the problem manifests on native IPv6 connections. It could be sort of "simulated" using two routers, though: one running 19.07 acting as the PPPoE/6in4/whatever endpoint and announcing a prefix to the device under test. I might do just that soon, when I find some spare time.

@swg0101
Contributor

swg0101 commented Mar 22, 2022

On master commit af434e0 - getting about 15% packet loss when offload is turned on on my R7800 on kernel 5.10.107, so definitely not just a 5.4 problem.

@bennyzhou88

Since IPv6 doesn't involve NAT, turning "flow-offloading" on will not improve the IPv6 performance. Only IPv4 NAT needs "flow-offloading".

My solution is to turn on the HW-NAT for IPv4 only

@swg0101
Contributor

swg0101 commented Apr 4, 2022

> Since IPv6 doesn't involve NAT, turning "flow-offloading" on will not improve the IPv6 performance. Only IPv4 NAT needs "flow-offloading".
>
> My solution is to turn on the HW-NAT for IPv4 only

I don't believe that's true at all. Enabling software offloading creates a flowtable in which more or less the flow configuration is saved. While NAT is part of that decision, stuff like routing decisions and interfaces are also cached. I think this page explains quite clearly how it works:
https://www.kernel.org/doc/html/v5.8/networking/nf_flowtable.html

@bennyzhou88

> > Since IPv6 doesn't involve NAT, turning "flow-offloading" on will not improve the IPv6 performance. Only IPv4 NAT needs "flow-offloading".
> > My solution is to turn on the HW-NAT for IPv4 only
>
> I don't believe that's true at all. Enabling software offloading creates a flowtable in which more or less the flow configuration is saved. While NAT is part of that decision, stuff like routing decisions and interfaces are also cached. I think this page explains quite clearly how it works: https://www.kernel.org/doc/html/v5.8/networking/nf_flowtable.html

I agree with you. But IPv6 native routing doesn't get as much benefit as IPv4 NAT with "flow-offloading".

Since this FS#3373 issue is still unsolved, the best I can do is to leave IPv6 flow offloading off and only enable it for IPv4.
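
To double-check which address family still carries the offload rule in a setup like this, a small sketch may help. The helper name `has_offload_rule` is hypothetical, and it assumes the rule appears in the FORWARD chain with a `-j FLOWOFFLOAD` target, as in the fw3 rule quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `iptables -S FORWARD` or
# `ip6tables -S FORWARD` output on stdin and succeeds if any rule
# jumps to the FLOWOFFLOAD target.
has_offload_rule() {
    grep -q -e '-j FLOWOFFLOAD'
}
```

For an IPv4-only setup, `iptables -S FORWARD | has_offload_rule` would be expected to succeed and `ip6tables -S FORWARD | has_offload_rule` to fail.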

@Shine-

Shine- commented Apr 4, 2022

bennyzhou88, any chance you could post numbers to prove your claim that flow offloading doesn't improve IPv6 performance for you?

If you look at my numbers here, I do see a HUGE difference. You may disagree, but 600% increase in throughput is definitely a huge difference in my opinion.

As mentioned earlier, FS#3373 is in fact fixed with PR #4849, at least for 21.02 (kernel 5.4). Unfortunately, it's not merged, so you have to patch and build on your own.

P.S. I can't mention it often enough. This issue is about software flow offloading only, not "HW-NAT".

@MrObvious

Does anyone have a build for Archer C7 v2 with this fix?

@woshilaiba

Does anyone know if this is fixed in 22.03?

@Shine-

Shine- commented Jun 9, 2022

Merged just yesterday into 22.03 branch.
21.02 is still unfixed (a PR exists but is not merged yet).

@Neverends4

> Does anyone have a build for Archer C7 v2 with this fix?

Download 22.03 rc4 and you will have this patch automatically.

@dR3b

dR3b commented Aug 7, 2022

I can confirm this on:

Model | MikroTik RouterBOARD 760iGS
Architecture | MediaTek MT7621 ver:1 eco:3
Target Platform | ramips/mt7621
Firmware Version | OpenWrt 21.02.3 r16554-1d4dea6d4f / LuCI openwrt-21.02 branch git-22.213.35964-87836ca
Kernel Version | 5.4.188

@cbeyls

cbeyls commented Sep 10, 2022

My TP-Link Archer C7 V5 shows abysmal performance on 22.03 with flow offloading enabled (~100 Mbps vs ~450 Mbps on 19.07). I have yet to identify the cause.
Would it be possible to have the patch merged in a future 21.02 stable release?

@dR3b

dR3b commented Sep 12, 2022

@cbeyls OpenWrt 22.03 is stable now!

@cbeyls

cbeyls commented Sep 12, 2022

@dR3b OpenWRT 22.03 is stable but has low performance on my router compared to previous releases so I would rather install 21.02 for now

@Shine-

Shine- commented Sep 12, 2022

The fix was merged into 21.02 with #9940 two months ago already.
Perhaps a dev can close this issue, so people will realize it's fixed for all affected versions by now?

@hauke
Member

hauke commented Oct 6, 2022

This was fixed in the following commits:
OpenWrt master: efff485
OpenWrt 22.03: 972160a
OpenWrt 21.02: 6d891ad
