FS#854 - Unstable Internet caused by frequent PPPoE reconnect, network.wan.keepalive=0 has no effect #7996

Closed
openwrt-bot opened this issue Jun 18, 2017 · 5 comments

@openwrt-bot

ppuzr:

daemon.info pppd[7738]: No response to 5 echo-requests
daemon.notice pppd[7738]: Serial link appears to be disconnected.
daemon.info pppd[7738]: Connect time 1.0 minutes.
daemon.info pppd[7738]: Sent 28006 bytes, received 46170 bytes.
daemon.notice pppd[7738]: Connection terminated.
daemon.info pppd[7738]: Connect time 1.0 minutes.
daemon.info pppd[7738]: Sent 28006 bytes, received 46170 bytes.
daemon.info pppd[7738]: Sent PADT
daemon.info pppd[7738]: Exit.

network.wan.keepalive is not set, and in LuCI the "LCP echo failure threshold" field shows a greyed-out 0. The description below it says "Presume peer to be dead after given amount of LCP echo failures, use 0 to ignore failures", but this description is not consistent with the behaviour: as the log above shows, pppd still drops the link after 5 unanswered echo requests.

After running the commands below, the grey 0 becomes darker, but the problem persists.

root@LEDE:~# uci set network.wan.keepalive=0
root@LEDE:~# /etc/init.d/network restart

When set through LuCI, the value of network.wan.keepalive consists of two numbers separated by a space (apparently in the format '[threshold] [interval]'), but according to the wiki it should be a single number: https://lede-project.org/docs/user-guide/wan_interface_protocols?s[]=pppoe#protocol_pppoe_ppp_over_ethernet

I then tried the following commands, and there were no reconnects during the following hour.

root@LEDE:~# uci set network.wan.keepalive='0 1'
root@LEDE:~# /etc/init.d/network restart

Would it be better for an undefined network.wan.keepalive to default to 0 instead of what appears to be 5? That would also be consistent with LuCI.

@openwrt-bot

ppuzr:

It turns out the last commands I tried below didn't help.

root@LEDE:~# uci set network.wan.keepalive='0 1'
root@LEDE:~# /etc/init.d/network restart

pppd still didn't ignore echo failures:

daemon.info pppd[16382]: No response to 5 echo-requests
daemon.notice pppd[16382]: Serial link appears to be disconnected.
daemon.info pppd[16382]: Connect time 0.4 minutes.
daemon.info pppd[16382]: Sent 881 bytes, received 733 bytes.
daemon.notice pppd[16382]: Connection terminated.
daemon.info pppd[16382]: Connect time 0.4 minutes.
daemon.info pppd[16382]: Sent 881 bytes, received 733 bytes.
daemon.info pppd[16382]: Sent PADT
daemon.info pppd[16382]: Exit.

@openwrt-bot

ppuzr:

Until a proper solution is found, it seems I can work around the problem by setting a large lcp-echo-interval (86400) to reduce the echo request frequency and, optionally, a large echo failure threshold (1024):

root@LEDE:~# uci set network.wan.keepalive='1024 86400'
root@LEDE:~# /etc/init.d/network restart

@openwrt-bot

MomenMamdouh:

The unstable PPPoE connection still exists in lede-17.01.4.

@openwrt-bot

kochstefan:

I first noticed this issue at the beginning of 2016 with OpenWrt. It occurs every time I connect to my SSH server behind the OpenWrt router and transfer a big file (the SSH/SCP server uploads to a remote SSH/SCP client that is downloading).

So the SSH/SCP client does this:
scp domainname.of.server:/path/to/big/file bigfile
bigfile 0% 3712KB 0.0KB/s - stalled -

To reproduce: the SSH server behind the OpenWrt router needs a slower internet connection than the remote client (the server's upload speed is lower than the client's download speed), so that the upload bandwidth of the server's internet connection is fully saturated.

I assume that the keepalive messages are not transmitted properly because the internet connection is saturated. The question is why there is no prioritization or balancing.

The workaround for me was to add "option keepalive '100 30'" to the WAN interface within /etc/config/network.

Recently I noticed a similar failure again. It also occurs every time I transfer a big file via SCP, so I tried your suggestion of '1024 86400' as well. But this seems to be a different error (not the keepalive issue):
Tue May 1 14:32:32 2018 daemon.notice pppd[1905]: Modem hangup
Tue May 1 14:32:32 2018 daemon.info pppd[1905]: Connect time 9.8 minutes.
Tue May 1 14:32:32 2018 daemon.info pppd[1905]: Sent 114234224 bytes, received 2755645 bytes.

If no keepalive option is set, I get these log messages:
Tue May 1 14:54:03 2018 daemon.info pppd[1874]: No response to 5 echo-requests
Tue May 1 14:54:03 2018 daemon.notice pppd[1874]: Serial link appears to be disconnected.
Tue May 1 14:54:03 2018 daemon.info pppd[1874]: Connect time 5.8 minutes.
Tue May 1 14:54:03 2018 daemon.info pppd[1874]: Sent 4718939 bytes, received 539442 bytes.

With either setting, '100 30' or '1024 86400', there is no "No response to X echo-requests" message in the log.

pppd was started with:
/usr/sbin/pppd nodetach ipparam wan ifname pppoe-wan lcp-echo-interval 86400 lcp-echo-failure 1024 lcp-echo-adaptive +ipv6 set AUTOIPV6=1 nodefaultroute usepeerdns maxfail 1 [...]
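Comparing this command line with the setting keepalive='1024 86400' suggests that the first number ends up as lcp-echo-failure and the second as lcp-echo-interval. A minimal sketch of that mapping follows; keepalive_to_pppd_opts is a hypothetical helper written for illustration, not the actual OpenWrt protocol script, and it deliberately says nothing about how a threshold of 0 or a single number is treated:

```shell
# Hypothetical helper, NOT the real OpenWrt ppp handler.
# Turns a "<threshold> <interval>" keepalive string into the two
# pppd options visible on the command line above.
keepalive_to_pppd_opts() {
    # Unquoted on purpose: split the single argument on whitespace.
    set -- $1
    if [ "$#" -ne 2 ]; then
        echo "expected '<threshold> <interval>'" >&2
        return 1
    fi
    echo "lcp-echo-interval $2 lcp-echo-failure $1"
}

keepalive_to_pppd_opts '1024 86400'
# prints: lcp-echo-interval 86400 lcp-echo-failure 1024
```

Checking the running pppd command line this way (for example with ps) is a quick way to confirm which values a given keepalive setting actually produced.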

I use OpenWrt SNAPSHOT, r6013-112f0469c4 on a lantiq xway vrx288 based router with a 25 MBit/s (down), 5 MBit/s (up) VDSL connection.

The following workaround is also possible, and it solves both the keepalive and the "Modem hangup" errors: enable SQM (simple.qos) on pppoe-wan, in my case with 25000 kbit/s download and 5000 kbit/s upload. A LuCI web page for SQM configuration is available, too. VDSL2 syncs at a fixed rate of 25.088 Mb/s down and 5.056 Mb/s up.
With this workaround, the keepalive line can be removed from the WAN interface in /etc/config/network.
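For anyone who prefers the config file over the LuCI page, the SQM setup described above might look roughly like the stanza printed below. This is only a sketch: the section name 'wan' and the fq_codel qdisc are assumptions (the comment only names simple.qos, the interface and the rates), and print_sqm_stanza is a hypothetical helper, not part of sqm-scripts.

```shell
# Hypothetical helper: prints an /etc/config/sqm-style stanza.
# $1 = interface, $2 = download in kbit/s, $3 = upload in kbit/s
print_sqm_stanza() {
    cat <<EOF
config queue 'wan'
    option enabled '1'
    option interface '$1'
    option download '$2'
    option upload '$3'
    option qdisc 'fq_codel'
    option script 'simple.qos'
EOF
}

# Values from the comment above: pppoe-wan, 25000 down, 5000 up.
print_sqm_stanza pppoe-wan 25000 5000
```

The output is meant to be compared against an existing /etc/config/sqm rather than pasted in blindly, since the installed sqm-scripts version may use additional options.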

@openwrt-bot

Sven:

I had the exact same problem on an o2 Box 6431 (Arcadyan VGV7510KW22) and managed to resolve it by activating QoS and limiting pppoe-wan's bandwidth (both uplink and downlink) to 97%. Many thanks @kochstefan!
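As a worked example of the 97% limit: the line rates of the o2 Box are not stated here, so the sketch below assumes the VDSL2 sync rates quoted earlier in the thread (25088 and 5056 kbit/s). shaped_rate is a hypothetical helper that uses plain integer arithmetic.

```shell
# Hypothetical helper: scale a rate in kbit/s to a percentage.
# Integer arithmetic, so the result is rounded down.
shaped_rate() {
    echo $(( $1 * $2 / 100 ))
}

shaped_rate 25088 97   # download, prints 24335
shaped_rate 5056 97    # upload, prints 4904
```

The resulting numbers would go into the QoS/SQM download and upload fields for pppoe-wan.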
