FS#3759 - Idle ssh Connection exits with: client_loop: send disconnect: Broken pipe #8776

openwrt-bot · 2021-04-28T05:39:17Z

farmergreg:

=====Summary=====

Device problem occurs on: x86-64 based router
Affected Software version: 21.02.0-rc1
- This bug does NOT happen when running 19.07.7

=====Steps to reproduce=====

SSH from one computer to another machine that is on the internet

machine 1 < ---- > openwrt 21.02.0-rc1 <---- internet ----> machine 2

Run this command on machine 1: watch 'sleep 3000;date'

After some time has passed (about 30 minutes, maybe a little less), the ssh connection will exit when you press Enter in the terminal.
The error message will be: "client_loop: send disconnect: Broken pipe"

openwrt-bot · 2021-04-28T12:38:38Z

farmergreg:

Update: it turns out that my simple script above (watch 'sleep 3000;date') also exits on 19.07.7 with the "client_loop: send disconnect: Broken pipe" error.

I have another script (a long-running backup job) that doesn't exit with the error when running 19.07.7, but does exit with the error when running the rc version. That script runs a backup (typically 2 - 3 minutes), sleeps until the next hour starts, and then runs again. So I'm not completely sure this is an openwrt bug, but there is certainly a change in behavior between the rc and the 19.07.x releases.

openwrt-bot · 2021-05-02T22:47:43Z

farmergreg:

Update:
It seems that wireguard in 21.02.0-rc1 might be part of the cause.

SSH without wireguard is working fine for me when using the release candidate.

SSH over a wireguard tunnel is where I start seeing the "client_loop: send disconnect: Broken pipe" error messages.

Network Diagram:

machine1 <-- wireguard --> OpenWRT A <--> internet <--> OpenWRT B <---> wireguard <---> machine2

The wireguard vpn is a site to site link. Both routers involved are running openwrt. Both have ports open so that either router can initiate the wireguard tunnel.

I am able to quickly and easily switch between 19.07.7 and 21.02.0-rc1 on OpenWRT router "A". 19.07.7 works great and ssh connections appear to stay alive indefinitely. 21.02.0-rc1 has the "client_loop: send disconnect: Broken pipe" problem and ssh connections typically exit early/unexpectedly.

openwrt-bot · 2021-06-19T22:45:07Z

farmergreg:

This is present in all rc releases thus far. I installed 21.02.0-rc3 today and an idle ssh connection made over wireguard exits after a little while with "client_loop: send disconnect: Broken pipe".

This does not happen in 19.07.7.

openwrt-bot · 2021-08-03T03:01:12Z

farmergreg:

Update: this may be related to #3373. I turned off "Software flow offloading" on my router. With this change, the ssh connection that used to exit early seems to stay connected (it's been connected for several hours now). This connection is being made over IPv4 (not IPv6 like in #3373).

openwrt-bot · 2021-08-05T14:13:21Z

farmergreg:

I can confirm that this is still an issue in rc4.

The wireguard tunnel in question has ipv4 and ipv6 addresses locally; however, the ssh connection is made using ipv4 to a vm on the remote side that only has ipv4 enabled.

Turning off software flow offloading is the workaround for now.
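To check whether the workaround is in effect, the firewall configuration can be inspected. The following is only a minimal sketch: it assumes the setting lives in the `defaults` section of `/etc/config/firewall` as `option flow_offloading` (the option name and path are not stated in this thread), and the helper `flow_offloading_enabled` is a hypothetical name. It only recognises the value `'1'` as enabled.

```python
def flow_offloading_enabled(config_text: str) -> bool:
    """Return True if the 'defaults' section sets flow_offloading to '1'.

    ASSUMPTION: the option is called 'flow_offloading' and sits in the
    'config defaults' section of the firewall config. Other truthy
    spellings (e.g. 'yes', 'on') are deliberately not handled here.
    """
    in_defaults = False
    for raw in config_text.splitlines():
        parts = raw.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        if parts[0] == "config":
            # A new section starts; remember whether it is 'defaults'.
            in_defaults = len(parts) > 1 and parts[1].strip("'\"") == "defaults"
        elif in_defaults and parts[0] == "option" and len(parts) >= 3:
            if parts[1].strip("'\"") == "flow_offloading":
                return parts[2].strip("'\"") == "1"
    return False


# Hypothetical sample config, for illustration only.
SAMPLE = """
config defaults
	option input 'ACCEPT'
	option flow_offloading '1'

config zone
	option name 'lan'
"""

if __name__ == "__main__":
    print("offloading enabled:", flow_offloading_enabled(SAMPLE))
```

On a router, the text would come from reading the firewall config file instead of `SAMPLE`; a missing option is treated as disabled.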

openwrt-bot · 2021-08-06T07:03:29Z

patrakov:

This is the same as https://forum.openwrt.org/t/software-flow-offloading-and-conntrack-timeouts/74588 and effectively confirmed as a WONTFIX in that thread.

openwrt-bot · 2021-08-06T15:46:04Z

buraktamturk:

I am having the same issue with no wireguard setup (just a pppoe connection). Disabling software flow offloading as a workaround fixed the issue.

Shine- · 2022-03-02T12:30:08Z

Kernel issue, not an OpenWRT issue. Caused by kernel commit e97d940.

It's fixed in Master (kernel 5.15) with kernel commit 4592ee7f.

To fix it for 22.03 (kernel 5.10), you can directly backport 4592ee7f into kernel 5.10.

For 21.02 (kernel 5.4), see the attached patch for a fix.

Note that the attached patch for kernel 5.4 is not a revert or backport, but a hack. Changes in the kernel code before 5.4 prevent e97d940 (which caused the issue) from being reverted directly, while features added later than kernel 5.4 prevent 4592ee7f (which fixed the issue for kernel 5.15) from being backported directly. So this patch essentially picks all necessary changes from 4592ee7f, while skipping / working around stuff that kernel 5.4 doesn't have yet (i.e. the sysctl-configurable offload timeout), to restore the behavior from before e97d940.

remove-timeouts-for-flow-offload-pickup.zip

Shine- · 2022-06-09T13:34:26Z

@Ansuel Hope that mentioning you here triggers a notification to you in GH. I don't want to clutter the 22.03 release thread.

More detail (I hope that I can recall everything properly after all those months):

When conntrack drops a flow from offloaded back to non-offloaded, kernels before e97d940 kept the flow lingering until the actual TCP/UDP connection timeout (then hardcoded, iirc). This was, and still is, the proper behavior. However, the author of e97d940 decided that arbitrarily using timeouts of 2 min (TCP) and 30 sec (UDP) was a much better idea and hardcoded these into his commit.

The effect was that once a flow was given back from offloaded to non-offloaded due to inactivity and then stayed idle for more than the arbitrary 2 min (TCP) or 30 sec (UDP), it was dropped completely, causing this very issue.

Later, these "offload pickup timeouts" were made sysctl-configurable, and finally, with 4592ee7f, luckily someone decided that it's pointless to have different TCP/UDP timeouts for flows released from offloading than for non-offloaded flows. Therefore, "flow offload pickup" timeouts were dropped completely, making normal TCP/UDP timeouts apply again. By then, the timeout for giving back offloaded flows was configurable, so 4592ee7f can't be backported to 5.4 without changes.
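The timeout behavior described above can be sketched as a toy model. This is an illustration only, not kernel code: the 120 s / 30 s pickup timeouts come from the description above, while the "normal" timeouts are assumptions (432000 s is the upstream Linux default for established TCP; the UDP value is a placeholder, and a given router may be configured differently). The function names are hypothetical.

```python
"""Toy model of the conntrack timeout applied to a flow released from offload."""

# Pickup timeouts hardcoded by e97d940, as described in the text above.
PICKUP_TIMEOUT_S = {"tcp": 120, "udp": 30}

# ASSUMPTION: illustrative "normal" conntrack timeouts. 432000 s (5 days) is
# the upstream Linux default for established TCP; the UDP value is only a
# placeholder. Real routers may use different values.
NORMAL_TIMEOUT_S = {"tcp": 432000, "udp": 120}


def timeout_after_release(proto: str, has_pickup_timeout: bool) -> int:
    """Timeout (seconds) applied once a flow falls back to the non-offloaded path."""
    table = PICKUP_TIMEOUT_S if has_pickup_timeout else NORMAL_TIMEOUT_S
    return table[proto]


def survives_idle(proto: str, idle_s: int, has_pickup_timeout: bool) -> bool:
    """True if the conntrack entry is still alive after idle_s seconds of silence."""
    return idle_s < timeout_after_release(proto, has_pickup_timeout)


if __name__ == "__main__":
    idle = 30 * 60  # roughly the idle time from the original report
    for label, pickup in (("with e97d940", True), ("with 4592ee7f", False)):
        print(label, "-> idle SSH entry survives:", survives_idle("tcp", idle, pickup))
```

Under these assumptions, a TCP flow that stays idle for 30 minutes after being released from offload outlives the entry in the e97d940 case (1800 s is more than 120 s) but not in the 4592ee7f case, which matches the reported symptom.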

I wasn't (knowingly) affected by this issue, but when someone insisted on this being the cause for #8239, I decided to test it out. Of course, it wasn't related...

Then I almost forgot that I created this 5.4-hack. But at some point - or rather, when I switched from 19.07 to 21.02 - I noticed that I'm experiencing exactly this issue: unstable SSH connections when they're idle. Since then, I'm using my 5.4-hack in every 21.02 build (everyday use), and the 5.15-backport for every 22.03 build ~~(so far only testing, not everyday use)~~ in daily/production use.

~~So, even though I never made an actual test with 22.03:~~ Kernel 5.10 has the exact same stupid default behavior for flow offload pickups as 5.4 had. Backporting the commit that fixed it for 5.15 will definitely also fix it for 5.10.

Edit: Should've re-read the commit messages again, before writing this text off-hand. This is about offloaded vs. non-offloaded, not hw vs. sw offload. Text above fixed. Sorry.

Ansuel · 2022-06-09T13:57:17Z

@Shine- I don't know how to handle 5.4, but for sure we should be able to fix this for 5.10 with the 22.03 release. I hope to get more comments on the topic.

@aparcar any hint on how to proceed?

Shine- · 2022-06-09T14:05:36Z

I think that I saw this backport as part of a "backport some missing flowtable fixes" commit somewhere in @hauke's staging tree. So perhaps it's already pending for 22.03 release?

As for 5.4, anybody can take my patch and do with it whatever they want - use it, change it, PR it, etc. - I don't need (or want) to be credited.

Shine- · 2022-08-07T09:34:10Z

Ping?
22.03 RC6 still doesn't have the fix... it's a straight kernel 5.15 backport without any regressions!

aparcar · 2022-08-07T11:38:54Z

@Shine- thank you very much for investigating this issue and also for your patience!

@Ansuel if you have the time, please backport this to 22.03 as soon as possible. I think we have more time until the next 21.02 point release.

@hauke does this ring a bell? I didn't see these patches in your latest work, but I may have overlooked something.

Ansuel · 2022-08-07T11:40:25Z

@aparcar ok, will backport this... just to make sure: for both 5.10 and 5.15, right?

aparcar · 2022-08-07T11:44:24Z

Based on the comments I assume it's fixed in 5.15

Shine- · 2022-08-07T11:54:33Z

Thanks for your quick replies!
Yes, 5.15 already has this fix, so only 5.10 backport is needed - and perhaps the 5.4 hack later on, in case it qualifies, in terms of code quality ;-)

P.S.: found Hauke's staging commit, it's backport 613-02 from here: https://git.openwrt.org/?p=openwrt/staging/hauke.git;a=commitdiff;h=ac21c648ad5b16359bea99f2af3b512639b10e3c

Ansuel · 2022-08-07T12:48:26Z

@aparcar considering the commit in staging, how should we proceed? Wait for @hauke or pick just the relevant backport patch?

hauke · 2022-08-08T22:04:41Z

You can take the patch from my branch. I do not have a real test setup here now.

This backports some patches from kernel 5.15 to fix issues with flowtable offloading in kernel 5.10. OpenWrt backports most of the patches related to flowtable offloading from kernel 5.15 already, but we are missing some of the extra fixes. This fixes some connection tracking problems when a flow gets removed from the offload and added to the normal SW path again. The patch 614-v5.18-netfilter-flowtable-fix-TCP-flow-teardown.patch was extended manually with the nf_conntrack_tcp_established() function. All changes are already included in kernel 5.15.

Fixes: openwrt#8776
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>

This backports some patches from kernel 5.15 to fix issues with flowtable offloading in kernel 5.10. OpenWrt backports most of the patches related to flowtable offloading from kernel 5.15 already, but we are missing some of the extra fixes. This fixes some connection tracking problems when a flow gets removed from the offload and added to the normal SW path again. The patch 614-v5.18-netfilter-flowtable-fix-TCP-flow-teardown.patch was extended manually with the nf_conntrack_tcp_established() function. All changes are already included in kernel 5.15.

Fixes: #8776
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
(cherry picked from commit 96ef2da)

aparcar added the release/21.02 label (pull request/issue targeted (also) for OpenWrt 21.02 release) on Feb 22, 2022

Shine- mentioned this issue Jun 9, 2022

[22.03] Upcoming release requirements #9248

Open

5 tasks

hauke mentioned this issue Aug 8, 2022

kernel: Backport upstream flowtable patches from 5.15 #10422

Closed

jow- closed this as completed in 96ef2da Aug 11, 2022
