FS#3373 - IPV6 flow offload broken #8239
Comments
netprince: I can confirm that using -j FLOWOFFLOAD from ip6tables causes IPv6 connections to stall after a few minutes. |
netprince: Just another data point, using the exact same configuration for each build. This was tested on a Xiaomi Mi Router 3G, and it occurs with and without hardware offload enabled. With a build of OpenWrt 19.07-SNAPSHOT, r11285-11f4918ebb (dated 20210125), offloading works great. With a build of OpenWrt SNAPSHOT, r15599-37752336bd (dated 20210125), IPv6 connections stall after being idle for a while (switching back and forth between apps on a phone, for example). When the connection stalls, everything in the app (on my phone) just hangs for about 8-10 seconds before finally coming back to life. |
hacc1225: Tested on ath79 (TP-Link Archer C7 v2) with the same issues. |
hacc1225: When -j FLOWOFFLOAD is enabled, the IPv6 packet loss rate is as high as 50%. |
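A loss figure like the one above can be read from the summary line that `ping` prints. This is a minimal sketch, assuming the summary uses the common iputils or BusyBox wording (`N packets transmitted, M [packets] received`). The function name `parse_ping_loss` and the sample line are illustrative and not taken from this thread.

```python
import re

# Matches the summary line printed by iputils ping ("10 received")
# and by BusyBox ping ("10 packets received").
_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def parse_ping_loss(output: str) -> float:
    """Return the packet loss percentage computed from ping's summary line."""
    match = _SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


# Hypothetical sample, shaped like iputils output:
sample = "20 packets transmitted, 10 received, 50% packet loss, time 19027ms"
print(parse_ping_loss(sample))
```

Running the same ping against an IPv6 host with offloading on and then off gives two comparable numbers to attach to a report.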
ijoda: I'm still seeing this issue in OpenWrt 21.02.0-rc1 r16046-59980f7aaf (netgear r7800). I'd be happy to help diagnose this if given some assistance in how to best track it down. |
aag: confirm this exact issue still happening on r16708-e7249669d2 on r7800 |
kkeijzer: I also have this issue on a WDR4300 with 21.02.0-rc3. Flow offloading was working fine before, with 19.07.7. Connections break completely after a while, all kinds of elements on web pages are not loaded, and so on. |
andybotting: It took me a week to discover this issue, and I found this bug report just after adding a new topic to the forum: https://forum.openwrt.org/t/21-02-0-rc3-losing-ipv6-packets-when-flow-offloading-1/100575 In the meantime, I've just added this line to /etc/firewall.user |
farmergreg: I think I may have a similar issue with IPv4 and flow offloading turned on. Idle SSH connections drop relatively quickly when offloading is enabled. When offloading is turned off, they stay connected indefinitely. Please see issue #3759 where I have documented the problems that I have had. |
farmergreg: I meant to write please see issue #3759. Thanks! |
bohdan-s: Hi, I have confirmed that my bug FS#3973 is linked to this bug. Disabling either offloading or IPv6 resolves the issue, but when both are enabled this bug is observed. |
Shine-: OK, seriously. You released 21.02 today. With this bug. I updated. It - well - STOPPED WORKING. You know, most major websites, like, Google. Facebook. Ebay. Are using IPv6. And you are releasing a new major release of OpenWRT that stops supporting IPv6. Wow. Hello? I mean, yes, offloading is disabled by default. But every 19.07 user has it turned on, because OpenWRT is painfully slow without. And now you break it? And nevertheless release the broken version as a new major release? Again, wow. Wow. |
cuviper: The 21.02 release has not been announced yet, but I expect this will be a known issue in the release notes, just as it was in rc4. The developers decided not to let this be an indefinite blocker, as the release is already very late. It sounds like this will still be investigated for a point release though. |
fda: I have multiple vlans and separate ULAs on every interface. odhcpd announces every subnet to every vlan! |
User_nikolad: Archer C7-V2. Problem confirmed |
Cheddoleum: I'm puzzled at the lack of a substantive description of this bug. There is no detail, no steps to reproduce, not even any anecdotes except a handful contributed in the comments. For example, does this affect native IPv6 connections only? Tunneled connections such as 6rd or 6in4 use IPv4 for ingress and egress, but then are routed internally to the destination LAN subnet as v6, so I'm guessing that the v6 flow tables, and therefore this bug, would be in effect. But that is just a guess; as this is in general release, it would be helpful to know whether mitigation is needed without doing a bunch of ad-hoc, nearly-uninformed testing. |
stephenseiber: Details: IPv6 is borked when offloading is enabled. Steps to reproduce: have IPv6 on and turn on software or hardware offloading. This affects visiting IPv6-primary sites like Google, YouTube, Facebook, etc.; it affects all connections to IPv6 sites. How is IPv6 borked? When you go to an IPv6 website it will sometimes load instantly like IPv4, but every other reload, or every reload, will take several seconds to load or will time out.
For hardware offloading do this; I haven't tried with software offloading because it works with hardware offloading for me. |
fda: I have native ipv6 but with nat6 enabled. An unbound behind an openwrt router can not use UDP connections when sw-offloading is enabled! Tcp connections are working, but from time to time there are errors in syslog |
Cheddoleum: This thread suggests that the change that caused the problem is known and revertible, though I see no confirmation yet: http://lists.openwrt.org/pipermail/openwrt-devel/2021-August/036223.html Meanwhile, to perhaps answer my own earlier question, I'm running tunneled IPv6 (Hurricane Electric 6in4) in 21.02.0 and there is as yet no indication of this problem with either HW or SW flow offloading. |
stephenseiber: There have only been two people who have even mentioned it in the 8 days since 21.02 was dropped. |
Jacko_neill: Another affected user here, Xiaomi mi3g. Disabling the rule via ip6tables as indicated above works for me as a workaround. I'd still like to keep flow offload enabled for IPv4, as this router isn't powerful enough to handle the 300 Mbps connection my ISP provides. I can reproduce this pretty consistently on an iPhone when scrolling through Facebook/Instagram and switching apps every minute or so. I'm not sure, though, what sort of logs I could grab to help confirm the bug. |
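One way to turn the idle-then-stall pattern described above into something loggable is to time a TCP connect over IPv6 at fixed intervals and flag slow or failed attempts. This is a rough sketch, not an established diagnostic for this bug. The names `connect_time`, `find_stalls` and `run_probes` are made up for illustration, and the 5-second threshold is an assumption chosen to sit below the 8-10 second hangs reported earlier in the thread.

```python
import socket
import time
from typing import List, Optional


def connect_time(host: str, port: int = 443, timeout: float = 15.0) -> Optional[float]:
    """Time one TCP connect over IPv6; return None on failure or timeout."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return None
    family, socktype, proto, _, sockaddr = infos[0]
    start = time.monotonic()
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except OSError:
        return None
    return time.monotonic() - start


def find_stalls(samples: List[Optional[float]], threshold: float = 5.0) -> List[int]:
    """Return indexes of probes that failed or took longer than threshold seconds."""
    return [i for i, s in enumerate(samples) if s is None or s > threshold]


def run_probes(host: str, count: int = 10, interval: float = 60.0) -> List[int]:
    """Probe `host` `count` times, idling `interval` seconds between attempts."""
    samples: List[Optional[float]] = []
    for i in range(count):
        samples.append(connect_time(host))
        if i + 1 < count:
            time.sleep(interval)  # idle gap, mirroring the app-switching pattern
    return find_stalls(samples)
```

Running `run_probes` against any IPv6-reachable host from a LAN client, once with offloading enabled and once without, would show whether stalls correlate with the setting.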
Shine-: If this is really caused by kernel commit [[https://github.com/torvalds/linux/commit/e97d940|e97d940]] (Edit: it's not, see below!) as suggested in [[http://lists.openwrt.org/pipermail/openwrt-devel/2021-August/036223.html|the mailing list]] then the attached patch should fix it. It effectively reverts e97d940 and additionally picks all necessary changes from [[https://github.com/torvalds/linux/commit/4592ee7f525c4683ec9e290381601fdee50ae110|4592ee7f]], skipping stuff that kernel 5.4 doesn't have yet (like configurable offload timeouts).
Edit: So I did dig up a spare box to test, and no, it's not working, even with this patch. Would be strange anyway, since if that were the cause, then both, IPv4 and IPv6 would be affected by the bug, not only IPv6, and also, the packet loss would only happen after 30 or 120 seconds (which are the hardcoded timeouts). However, it occurs pretty much immediately. Feel free to confirm for yourself, I can't delete the attachment anyway. Maybe a mod can delete it, then I'd re-attach it with proper name and description, for reference. It might be the fix for FS#3759, after all. |
aragorn: Affected, too, I can confirm that after I updated my Archer C2600 from 19.07 to 21.02, software flow offloading on, this broke IPv6 connections immediately. |
supersebbo: I am also intrigued by the lack of definition and priority of this bug. I'm not currently seeing any issues as I am only using IPv6 on one internal network, so nothing traverses the router. However, it's concerning to think that if I were to start using IPv6 across multiple networks, this would likely be a hard-to-diagnose issue on which I would probably waste hours trying to debug my IPv6 config. I know it's in the release notes... but who looks at the release notes after they've upgraded? Feels like a big miss for a major release. Happy to set up some test networks and hosts if that would help... what is needed? |
PepePepe: Another affected user here. TP-Link Archer C7 v5. |
Kris: Same for me on Xiaomi Mi Router 3G. With 19.07, SW offload was fine, HW offload broken. On 21.02, SW/HW offload is broken on IPv6 and SW offload is ok on IPv4 (not HW offload on IPv4). By broken, I mean permanent connections (SSH, RDP) froze and are closed down. |
pmarks: Is this on track for the 21.02.1 release? My Archer A7 is currently stuck on 19.07.x due to the performance regression. |
Hou-dev: I am running 21.02 on the EdgeRouter X and it has the same issue with IPv6. Disabling IPv6 solves the performance regression. HWNAT with IPv4 does not show any signs of performance loss. |
Kris: @Hou-dev and all, |
Hou-dev: @kris I just disabled from the interfaces section in the Network tab. I clicked STOP WAN6 and Disabled "Bring up on boot" in the edit button. |
Shine-: Also see [[https://github.com//pull/4849|GitHub PR#4849]] - likely same author as above, but much smaller patch. Did anybody try that yet? Edit: PR has been updated by author to align with upstream. Question again: anybody try that yet? I'm back at 19.07 with all my boxes, so I can't test in real life. |
As mentioned above, this issue is fixed for version 21.02 (kernel 5.4) by PR #4849. |
PR still open. Is it safe to activate software offloading or do we need to wait for merge? |
Since it's not merged yet, you'll have to apply the patch and compile yourself, obviously... |
So why hasn't the patch been merged? It has been there for a long time. Or is there still a bug? |
I think it needs more testing, and not everyone is capable of testing that. The problem manifests on native IPv6 connections, although this could be sort of "simulated" using two routers, one running 19.07 acting as a PPPoE/6in4/whatever endpoint and announcing a prefix to the device under test. I might do just that soon, when I find some spare time. |
On master commit af434e0 - getting about 15% packet loss when offload is turned on on my R7800 on kernel 5.10.107, so definitely not just a 5.4 problem. |
Since IPv6 doesn't involve NAT, turning "flow-offloading" on will not improve IPv6 performance. Only IPv4 NAT needs "flow-offloading". My solution is to turn on HW-NAT for IPv4 only. |
I don't believe that's true at all. Enabling software offloading creates a flowtable in which, more or less, the flow configuration is saved. While NAT is part of that decision, things like routing decisions and interfaces are also cached. I think this page explains quite clearly how it works: |
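The caching idea described in the comment above can be illustrated with a toy model. It is a conceptual sketch only and not the kernel's flowtable implementation. It ignores when a flow becomes eligible for offload and how entries expire, and every name in it (`ToyFlowTable`, `Verdict`, `route`, the addresses and interface name) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# A flow is identified here by (src, dst, proto, sport, dport).
FlowKey = Tuple[str, str, str, int, int]


@dataclass(frozen=True)
class Verdict:
    """Cached result of the slow path for one flow (illustrative fields)."""
    out_iface: str
    next_hop: str


class ToyFlowTable:
    """Conceptual model only; not how the kernel flowtable is implemented."""

    def __init__(self, slow_path: Callable[[FlowKey], Verdict]) -> None:
        self._slow_path = slow_path
        self._cache: Dict[FlowKey, Verdict] = {}
        self.slow_hits = 0
        self.fast_hits = 0

    def forward(self, key: FlowKey) -> Verdict:
        verdict = self._cache.get(key)
        if verdict is None:
            # Unknown flow: run the full routing/firewall evaluation once.
            self.slow_hits += 1
            verdict = self._slow_path(key)
            self._cache[key] = verdict
        else:
            # Known flow: reuse the cached decision and skip the slow path.
            self.fast_hits += 1
        return verdict


def route(key: FlowKey) -> Verdict:
    # Hypothetical stand-in for routing and firewall rules.
    return Verdict(out_iface="wan6", next_hop="fe80::1")


table = ToyFlowTable(route)
flow = ("2001:db8::10", "2001:db8:ffff::1", "tcp", 40000, 443)
for _ in range(5):
    table.forward(flow)
print(table.slow_hits, table.fast_hits)
```

Nothing in the cached verdict depends on NAT, which is the point being made: a routed IPv6 flow skips the same per-packet evaluation as a NATed IPv4 flow.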
I agree with you. But IPv6 native routing doesn't get as much benefit as IPv4 NAT with "flow-offloading". Since this FS#3373 issue is unsolved by now, the best I can do is to leave the IPv6 flow-offload off and only enable it for IPv4. |
bennyzhou88, any chance you could post numbers to prove your claim that flow offloading doesn't improve IPv6 performance for you? If you look at my numbers here, I do see a HUGE difference. You may disagree, but 600% increase in throughput is definitely a huge difference in my opinion. As mentioned earlier, FS#3373 is in fact fixed with PR #4849, at least for 21.02 (kernel 5.4). Unfortunately, it's not merged, so you have to patch and build on your own. P.S. I can't mention it often enough. This issue is about software flow offloading only, not "HW-NAT". |
Does anyone have a build for Archer C7 v2 with this fix? |
Does anyone know if this is fixed in 22.03? |
Merged just yesterday into the 22.03 branch. |
Download 22.03 rc4 and you will have this patch automatically. |
I can confirm this on: Model | MikroTik RouterBOARD 760iGS
Architecture | MediaTek MT7621 ver:1 eco:3
Target Platform | ramips/mt7621
OpenWrt 21.02.3 r16554-1d4dea6d4f / LuCI openwrt-21.02 branch git-22.213.35964-87836ca
5.4.188 |
My TP-Link Archer C7 V5 shows abysmal performance on 22.03 with flow offloading enabled (~100 Mbps vs ~450 Mbps on 19.07). I have yet to identify the cause. |
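Throughput comparisons in this thread are quoted both as absolute rates and as percentages, so it helps to be explicit about which value is the baseline. A small sketch of the arithmetic, using the approximate figures from the comment above; the helper name `percent_change` is illustrative.

```python
def percent_change(baseline: float, measured: float) -> float:
    """Relative change from baseline to measured, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (measured - baseline) / baseline


# ~450 Mbps on 19.07 (baseline) vs ~100 Mbps on 22.03:
drop = percent_change(450.0, 100.0)
print(f"{drop:.1f}%")

# Read the other way round, the same pair is a 350% increase over 100 Mbps:
gain = percent_change(100.0, 450.0)
print(f"{gain:.1f}%")
```

By the same formula, a "600% increase" means the measured rate is seven times the baseline.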
@cbeyls OpenWrt 22.03 is stable now! |
@dR3b OpenWRT 22.03 is stable but has low performance on my router compared to previous releases so I would rather install 21.02 for now |
The fix was merged into 21.02 with #9940 two months ago already. |
Commit 3b4985ab fixes this issue. This reverts commit 8649de6b3ea8a847ae0c6dbd6026ad7b3fdc367e.
graphine:
Enabling this option makes the IPv6 connection unstable.
Example: when connected to #openwrt-devel from HexChat (Ubuntu), you are constantly disconnected.
Disabling flow offload makes the connection stable.