FS#1259 - PPPoE disconnects under high upstream load #5826

Closed
openwrt-bot opened this issue Jan 6, 2018 · 6 comments

@openwrt-bot

llucz:

Hello,

I'm using a TP-Link router running OpenWrt (Barrier Breaker r39582) to connect to the Internet through a DSL modem, via PPPoE.

Everything is fine except under high upstream load, i.e., when I saturate my upstream bandwidth. Under these conditions, the OpenWrt router does not receive the modem's replies to its PPP LCP echo-requests and terminates the PPP connection. This behavior occurs only under heavy upstream traffic, and is pretty deterministic.

This defect is also witnessed by the following log excerpt:

Sat Jan 6 17:47:13 2018 daemon.info dnsmasq[13213]: using nameserver 8.8.4.4#53
Sat Jan 6 17:47:13 2018 daemon.info dnsmasq[13213]: using nameserver 8.8.8.8#53
Sat Jan 6 17:47:13 2018 daemon.info dnsmasq[13213]: using local addresses only for domain lan
Sat Jan 6 17:50:01 2018 daemon.info pppd[12884]: No response to 6 echo-requests
Sat Jan 6 17:50:01 2018 daemon.notice pppd[12884]: Serial link appears to be disconnected.
Sat Jan 6 17:50:01 2018 daemon.info pppd[12884]: Connect time 2.9 minutes.
Sat Jan 6 17:50:01 2018 daemon.info pppd[12884]: Sent 53819998 bytes, received 1176989 bytes

I've tried adjusting the pppd lcp-echo-interval and lcp-echo-failure parameters, but I'm unable to solve the problem through these settings.

I guess the correct solution would be to prioritize PPP LCP echoes (QoS?), but maybe there is a simpler way to do it.

Thanks!

@openwrt-bot

moeller0:

So a stop-gap measure would be to install sqm-scripts and luci-app-sqm and instantiate a shaper on pppoe-wan with a bandwidth slightly below the real available bandwidth. The shaper on pppoe-wan will never actually see the LCP packets and hence will not be able to drop them, and by setting the bandwidth slightly below the available rate there should always be enough PPPoE bandwidth to accommodate the LCP packets.
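
A minimal sketch of that suggestion, in case it helps. The rate (10000 kbit/s) and the 95% margin are placeholder assumptions, shaped_rate is a hypothetical helper, and the option names (interface, upload, enabled) are assumed to match the default /etc/config/sqm shipped by sqm-scripts. The uci part only runs where uci is present, i.e. on the router itself.

```shell
#!/bin/sh
# shaped_rate RATE_KBIT PERCENT
# Hypothetical helper: print PERCENT percent of RATE_KBIT (integer arithmetic).
shaped_rate() {
    echo $(( $1 * $2 / 100 ))
}

UPLOAD_KBIT=10000   # assumption: measured upstream rate in kbit/s, use your own
PERCENT=95          # assumption: "slightly below" the real rate, tune as needed

rate=$(shaped_rate "$UPLOAD_KBIT" "$PERCENT")
echo "shaping upstream on pppoe-wan to ${rate} kbit/s"

# Apply the setting only where uci exists (the router with sqm-scripts installed).
if command -v uci >/dev/null 2>&1; then
    uci set "sqm.@queue[0].interface=pppoe-wan"
    uci set "sqm.@queue[0].upload=${rate}"
    uci set "sqm.@queue[0].enabled=1"
    uci commit sqm
    /etc/init.d/sqm restart
fi
```

The same values can also be entered through luci-app-sqm instead of the command line.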

@openwrt-bot

weedy:

This isn't a bug, it's "by design".

Since you're flooding the upload buffers on the router and modem, the maintenance packets are taking too long and pppd thinks you have a dead link. LEDE borrowed some patches from Debian (I think) that will sidestep this issue for you.

You should still use QoS for interactivity reasons.

# Adaptive/Smart keepalive
option keepalive_adaptive 1 
## lcp_failure,lcp_interval
option keepalive "7,3"

@openwrt-bot

nbd:

The relevant option has been set by default since some time around 2014.

@openwrt-bot

cbz:

I'm not sure the option is always set. The pppd option it maps to is lcp-echo-adaptive, which isn't a default option for pppd, and it's trivial to see that OpenWrt launches pppd without this option set (check the ps output or run strings on /proc/<pppd-pid>/cmdline).
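
For anyone who wants to repeat that check, here is a rough sketch. has_adaptive is a hypothetical helper name. It relies on /proc/<pid>/cmdline holding the arguments separated by NUL bytes, and it assumes pidof is available on the router.

```shell
#!/bin/sh
# has_adaptive FILE
# Succeeds if the NUL-separated argument list in FILE (the format used by
# /proc/<pid>/cmdline) contains the exact argument lcp-echo-adaptive.
has_adaptive() {
    tr '\0' '\n' < "$1" | grep -qx 'lcp-echo-adaptive'
}

# Inspect every running pppd; the loop body is skipped if none is found.
for pid in $(pidof pppd 2>/dev/null); do
    if has_adaptive "/proc/$pid/cmdline"; then
        echo "pppd $pid: lcp-echo-adaptive is set"
    else
        echo "pppd $pid: lcp-echo-adaptive is NOT set"
    fi
done
```

This only shows what the running pppd was started with. It does not say which default the OpenWrt scripts are supposed to apply.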

@openwrt-bot

cbz:

That also seems to be what the logic here does:

https://github.com/openwrt/openwrt/blob/master/package/network/services/ppp/files/ppp.sh#L123

@openwrt-bot

bill888:

fwiw, I witnessed this problem when I started using a BT Home Hub 5A on a 55/10 Mbps VDSL2 connection a year ago. Whenever the 10 Mbps upstream was saturated, the DSL connection would disconnect and 'No response to 5 echo-requests' would be reported in the system log.

In LuCI->Network->Interfaces->WAN->Advanced Settings, there are two default settings:

LCP echo failure threshold 0
LCP echo interval 5

The 'default' LCP echo failure threshold of zero implies that all LCP echo failures would be ignored (see attached image), but I found this to be incorrect/misleading.

The solution was to specify a non-zero value in LuCI, for example:

LCP echo failure threshold 5
LCP echo interval 5

A non-zero value ADDS the option keepalive to /etc/config/network:

config interface 'wan'
	option keepalive '5 5'

This increased the timeout to 25 seconds and resolved my DSL disconnection issues.
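
The 25 seconds is simply the failure threshold multiplied by the echo interval. A small sketch of that arithmetic, assuming the keepalive value has the form "<failures> <interval>" (a comma separator, as in the "7,3" example earlier in the thread, is handled too). keepalive_timeout is a hypothetical helper, not part of OpenWrt.

```shell
#!/bin/sh
# keepalive_timeout "FAILURES INTERVAL"
# Print the approximate number of seconds without echo replies before pppd
# gives up: lcp-echo-failure multiplied by lcp-echo-interval.
keepalive_timeout() {
    keepalive="$1"
    failures="${keepalive%%[, ]*}"   # text before the first space or comma
    interval="${keepalive##*[, ]}"   # text after the last space or comma
    echo $(( failures * interval ))
}

keepalive_timeout '5 5'   # prints 25
keepalive_timeout '7,3'   # prints 21
```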

LuCI doesn't appear to be able to manage this keepalive_adaptive option.
