OpenWrt/LEDE Project

  • Status Researching
  • Percent Complete 0%
  • Task Type Bug Report
  • Category Base system
  • Assigned To No-one
  • Operating System All
  • Severity High
  • Priority Medium
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Carlo - 14.10.2016
Last edited by Mathias Kresin - 25.02.2017

FS#227 - VLAN support mismatch between preinit and default network config

PPPoE is broken on WRT1900ACS

Upgraded from LEDE r578 to the latest LEDE r1814 and PPPoE doesn't work anymore, although the pppd version and the PPPoE plugin version are the same:
- Linksys WRT1900ACS
- LEDE reboot r1814

pppd debug log:

Plugin rp-pppoe.so loaded.
RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
[service-name]
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
[service-name]
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
[service-name]
Timeout waiting for PADO packets
Unable to complete PPPoE Discovery
Plugin rp-pppoe.so loaded.
RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
[service-name]
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
[service-name]
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
[service-name]

While on the same hardware running LEDE r578, the PPPoE module works as expected:

Plugin rp-pppoe.so loaded.
RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
[service-name]
Recv PPPOE Discovery V1T1 PADO session 0x0 length 40
dst c2:56:27:ca:d7:d4 src a0:f3:e4:34:d8:21
[service-name] [AC-name acc-aln1.hac] [AC-cookie 75 58 37 a5 ba 3c e4 a5 2a 61 bb 23 92 5c 1b dc]
Send PPPOE Discovery V1T1 PADR session 0x0 length 24
dst a0:f3:e4:34:d8:21 src c2:56:27:ca:d7:d4
[service-name] [AC-cookie 75 58 37 a5 ba 3c e4 a5 2a 61 bb 23 92 5c 1b dc]
Recv PPPOE Discovery V1T1 PADS session 0x30b length 4
dst c2:56:27:ca:d7:d4 src a0:f3:e4:34:d8:21
[service-name]
PADS: Service-Name: '' PPP session is 779
Connected to a0:f3:e4:34:d8:21 via interface eth0
using channel 2
Using interface pppoe-wan
Connect: pppoe-wan <--> eth0
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0xc6952556>]
rcvd [LCP ConfReq id=0x66 <mru 1492> <auth chap MD5> <magic 0x4cc73648>]
sent [LCP ConfAck id=0x66 <mru 1492> <auth chap MD5> <magic 0x4cc73648>]
rcvd [LCP ConfAck id=0x1 <mru 1492> <magic 0xc6952556>]
sent [LCP EchoReq id=0x0 magic=0xc6952556]
rcvd [CHAP Challenge id=0x1 <7131a44524d1de8f1cd1061cac6d8c071d8bfe7351bc4ea7bd08f56684428475f229ba177a192696ebab32>, name = "acc-aln1.hac"]
sent [CHAP Response id=0x1 <4bb1a418b298790b128ad4d7ef3109ad>, name = "bthomehub@btbroadband.com"]
rcvd [LCP EchoRep id=0x0 magic=0x4cc73648]
rcvd [CHAP Success id=0x1 "CHAP authentication success"]
CHAP authentication succeeded: CHAP authentication success
CHAP authentication succeeded
peer from calling number A0:F3:E4:34:D8:21 authorized
sent [IPCP ConfReq id=0x1 <addr 0.0.0.0> <ms-dns1 0.0.0.0> <ms-dns2 0.0.0.0>]
sent [IPV6CP ConfReq id=0x1 <addr fe80::c595:37d1:3987:1929>]
rcvd [IPV6CP ConfReq id=0x7b <addr fe80::0221:05ff:feb4:8824>]
sent [IPV6CP ConfAck id=0x7b <addr fe80::0221:05ff:feb4:8824>]
rcvd [IPCP ConfReq id=0x38 <addr 172.16.12.12>]
sent [IPCP ConfAck id=0x38 <addr 172.16.12.12>]
rcvd [IPCP ConfNak id=0x1 <addr 81.146.2.155> <ms-dns1 81.139.57.100> <ms-dns2 81.139.56.100>]
sent [IPCP ConfReq id=0x2 <addr 81.146.2.155> <ms-dns1 81.139.57.100> <ms-dns2 81.139.56.100>]
rcvd [IPV6CP ConfAck id=0x1 <addr fe80::c595:37d1:3987:1929>]
local LL address fe80::c595:37d1:3987:1929
remote LL address fe80::0221:05ff:feb4:8824
Script /lib/netifd/ppp-up started (pid 2646)
rcvd [IPCP ConfAck id=0x2 <addr 81.146.2.155> <ms-dns1 81.139.57.100> <ms-dns2 81.139.56.100>]
local IP address 81.146.2.155
remote IP address 172.16.12.12
primary DNS address 81.139.57.100
secondary DNS address 81.139.56.100
ppp.log
secondary DNS address 81.139.56.100
Script /lib/netifd/ppp-up started (pid 2653)
Script /lib/netifd/ppp-up finished (pid 2646), status = 0x9
Script /lib/netifd/ppp-up finished (pid 2653), status = 0x9
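
The two logs differ in one respect: on r1814 the PADI broadcasts are never answered by a PADO, while on r578 the first PADI is answered immediately. A minimal sketch for checking a saved pppd debug log for this pattern follows. The function name check_discovery and the embedded log excerpt are only for illustration and are not part of any LEDE tooling.

```shell
#!/bin/sh
# check_discovery: read a pppd debug log on stdin and count discovery packets.
# Returns 0 if at least one PADO was received, 1 otherwise.
# (Hypothetical helper name, chosen for this example.)
check_discovery() {
    awk '
        /Send PPPOE Discovery .* PADI/ { padi++ }
        /Recv PPPOE Discovery .* PADO/ { pado++ }
        END {
            printf "PADI sent: %d, PADO received: %d\n", padi, pado
            exit (pado > 0 ? 0 : 1)
        }'
}

# Shortened excerpt of the failing r1814 log from this report.
failing_log='Send PPPOE Discovery V1T1 PADI session 0x0 length 4
dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
Send PPPOE Discovery V1T1 PADI session 0x0 length 4
Timeout waiting for PADO packets'

if summary=$(printf '%s\n' "$failing_log" | check_discovery); then
    verdict="discovery answered"
else
    verdict="no PADO reply"
fi
echo "$summary ($verdict)"
```

On the router itself the same function could presumably be fed from the system log (for example via logread), assuming pppd debug output is sent there.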

Project Manager
Mathias Kresin commented on 14.10.2016 12:43

Would you please attach or paste your /etc/config/network? Do you have to VLAN-tag your PPPoE traffic? Which ISP do you use?

Are you able to compile your own image? It would be helpful if you can do a git bisect to find the commit which broke PPPoE on your WRT1900ACS.

Carlo commented on 14.10.2016 13:00

This is the interface config for the PPPoE traffic:

config interface 'wan'

      option ifname 'eth0'
      option proto 'pppoe'
      option username 'bthomehub@btbroadband.com'
      option password 'bt'
      option timeout '10'

I use the same config for the working and the non-working LEDE build.
The provider is BT in the UK.

Yes, I do build my own image, but I am not familiar with git bisect. I will check how to use it and come back on this point.

Project Manager
Mathias Kresin commented on 14.10.2016 13:13

Would you please provide your complete /etc/config/network?

Carlo commented on 14.10.2016 13:16

Full network:

root@OpenWrt:/etc/config# cat network

config interface 'loopback'

      option ifname 'lo'
      option proto 'static'
      option ipaddr '127.0.0.1'
      option netmask '255.0.0.0'

config globals 'globals'

      option ula_prefix 'fd7b:f926:6250::/48'

config interface 'lan'

      option type 'bridge'
      option proto 'static'
      option netmask '255.255.255.0'
      option ip6assign '60'
      option ipaddr '192.168.20.254'
      option igmp_snooping '1'
      option _orig_ifname 'eth1 wlan0 wlan1'
      option _orig_bridge 'true'
      option ifname 'eth1 eth2'

config interface 'wan'

      option ifname 'eth0'
      option proto 'pppoe'
      option username 'bthomehub@btbroadband.com'
      option password 'bt'
      option timeout '10'

config interface 'wan6'

      option ifname 'eth0'
      option proto 'dhcpv6'

config interface 'iptv'

      option ifname 'eth0'
      option proto 'static'
      option ipaddr '10.22.22.1'
      option netmask '255.255.255.0'

config interface 'vpn0'

      option ifname 'tun0'
      option proto 'none'
      option auto '1'

config interface 'guest'

      option _orig_ifname 'radio1.network2'
      option _orig_bridge 'false'
      option proto 'static'
      option ipaddr '192.168.99.254'
      option netmask '255.255.255.0'

root@OpenWrt:/etc/config#

Carlo commented on 14.10.2016 13:37

Just found this topic on the OpenWrt forum:

https://forum.openwrt.org/viewtopic.php?pid=335168#p335168

From the topic:
(BTW: R1297 is running ok, so must be a change of the last week)
edit 1: This seems to be the only change to the PPP package: https://git.lede-project.org/?p=source. … 344006173)
edit 2: just reverted that change and rebuild the setup, still not working so it must be collateral damage from something else.

Project Manager
Mathias Kresin commented on 14.10.2016 14:05

Nice finding.

Version r1297, which was reported as working, has the git commit hash 4e8c6f340751c66a602b98b727af28b2a9004313.

The report in the forum is from 2016-08-20. The last commit of that date has the commit hash 35be9284668d19a565d354a33febb508b0e28131 (r1396).

The first step would be to test both of these commits, to make sure that r1297 works and r1396 is really broken.

$ git checkout master
$ git checkout 4e8c6f340751c66a602b98b727af28b2a9004313
$ make dirclean
$ make menuconfig
$ make
<test the image>

Then do the same with 35be9284668d19a565d354a33febb508b0e28131.

If you have a good and a bad version you can use git bisect (git bisect start <bad> <good>):

$ git checkout master
$ git bisect start 35be9284668d19a565d354a33febb508b0e28131 4e8c6f340751c66a602b98b727af28b2a9004313
$ make dirclean
$ make menuconfig
$ make
<test the image>
$ git bisect good OR git bisect bad
$ make dirclean
$ make menuconfig
$ make
<test the image>
$ git bisect good OR git bisect bad
...

In the end, git bisect will tell you which commit introduced the regression.
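
When the test can be scripted, git bisect run automates the good/bad loop, and a revision that cannot be tested at all (for example an image that builds but does not boot) can be marked with git bisect skip instead of good or bad. The sketch below demonstrates both in a throwaway repository, not in the LEDE tree. The file names "regression" and "untestable" are made up for the demo and merely stand in for "PPPoE is broken" and "image does not boot".

```shell
#!/bin/sh
# Demo: automated bisection in a throwaway repository (not the LEDE tree).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "bisect demo"

# Create 10 commits. Commit 5 is "untestable" (stands in for an image that
# does not boot); commit 7 and all later commits carry the regression.
i=1
while [ "$i" -le 10 ]; do
    echo "$i" > version
    if [ "$i" -eq 5 ]; then touch untestable; else rm -f untestable; fi
    if [ "$i" -ge 7 ]; then touch regression; fi
    git add -A
    git commit -q -m "commit $i"
    i=$((i + 1))
done

good=$(git rev-list --max-parents=0 HEAD)   # the first commit
bad=$(git rev-parse HEAD)                   # the latest commit

# Argument order: bad first, then good.
git bisect start "$bad" "$good" >/dev/null

# The exit status of the test command drives the bisection:
#   0 = good, 125 = skip (cannot be tested), other values up to 127 = bad.
git bisect run sh -c '
    [ -e untestable ] && exit 125
    [ ! -e regression ]
' >/dev/null

# refs/bisect/bad points at the first bad commit once bisection has finished.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad"
git bisect reset >/dev/null 2>&1
```

For firmware images the test step usually has to stay manual, since each image must be flashed and checked on the router; in that case the equivalent of exit status 125 is running git bisect skip by hand.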

Carlo commented on 15.10.2016 11:05

here you go:

carlo@ubuntu:~/source$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[c18edcec4500008a1dabf0b017322eb23b059c58] base-files: add preinit ifname detection based on board.json
carlo@ubuntu:~/source$

Important: while testing the builds, some of them built without errors but did not let the router boot, so I marked them as bad.

Project Manager
Mathias Kresin commented on 15.10.2016 11:23

Good job!

Would you please apply the attached patch on top of the latest git and check if the issue is gone. The patch is not a fix! It's just to confirm that c18edcec4500008a1dabf0b017322eb23b059c58 is really the cause of your issue.

$ git checkout master
$ patch -p1 < fs227_confirmation.patch

# to confirm that the patch is applied successfully
$ git diff

# build and test image
$ make dirclean
$ make menuconfig
$ make

Carlo commented on 15.10.2016 22:55

it works :)
I applied the patch to this build: LEDE Reboot (HEAD, r1845) and got finally the pppoe connection back :)

Thanks for your help, hopefully we will get a permanent fix in the trunk (soon) :)

Project Manager
Mathias Kresin commented on 16.10.2016 07:08

Please attach the output of the following commands from a working image and from a not working image:

$ dmesg
$ swconfig dev switch0 show
$ cat /etc/board.json
$ cat /etc/config/network
$ for iface in $(ls /sys/class/net/);do echo "${iface}: $(cat /sys/class/net/${iface}/carrier)";done

Please do not keep your settings during the test.

Johnny sl commented on 16.10.2016 07:21

FYI: That message on the openwrt forum was mine.
I could eventually trace it to the changes done to enable vlans on the Switch by default, while my config didn't really use those.
After wiping my /etc/config/network, rebooting, reconfiguring from scratch based on switch vlans, everything started to work again.
PPPoE is still quite slow though, often taking multiple attempts over a couple of minutes to log in.

Project Manager
Mathias Kresin commented on 16.10.2016 07:39

According to the code, the vlans were set up already before c18edcec4500008a1dabf0b017322eb23b059c58.

But since commit c18edcec4500008a1dabf0b017322eb23b059c58 vlans are enabled in failsafe/preinit as well. This might cause some unexpected side effects on mvebu boards, since they never had support for failsafe (which is really bad).

Due to your remark regarding a changed vlan config, I've updated the post where I'm asking for some output.

As a general note, please report bugs here and do not hide them in the forum. To my knowledge, no dev is monitoring the forum for bug reports.

Johnny sl commented on 16.10.2016 08:09

Usually i like to understand if it is me, or a bug. Don't want to clutter this page with all issues i run into.
Due to nobody complaining, and me "fixing" it with a reconfigure of /etc/config/network i assumed it was not a real bug...

Carlo commented on 16.10.2016 08:29

Attached is the output of the commands for the same build (working and non-working version).

   debug.rtf (75.8 KiB)

Project Manager
Mathias Kresin commented on 16.10.2016 09:13

> Usually i like to understand if it is me, or a bug. Don't want to clutter this page with all issues i run into.
> Due to nobody complaining, and me "fixing" it with a reconfigure of /etc/config/network i assumed it was not a real bug...

Thanks for that! Indeed, that is the way to go and not to spam the bugtracker with support requests.

> attached there is the commands' output for the same build (working and not working version)

Okay, now I can see the real issue.

It's a bug in "set up vlans in preinit/failsafe" which is revealed by a config that differs from the default network config.

During preinit, VLAN support is enabled ("enable_vlan: 1" in the swconfig output) since it is (now) the default for the board, but VLAN support is not disabled afterwards. Since your /etc/config/network lacks the VLAN part, it can neither disable VLAN support nor set up the desired VLAN config on its own.

That your LAN interfaces are working is more luck than anything else.
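
Whether a given network config is affected can be checked by looking for switch_vlan sections in it. The following is only a sketch: has_switch_vlan is a hypothetical helper, the first sample file is a shortened version of the config posted in this ticket, and the port list in the second sample is illustrative rather than the real WRT1900ACS default.

```shell
#!/bin/sh
# has_switch_vlan FILE: succeed if FILE contains at least one
# "config switch_vlan" section. (Hypothetical helper for this example.)
has_switch_vlan() {
    grep -Eq '^[[:space:]]*config[[:space:]]+switch_vlan([[:space:]]|$)' "$1"
}

tmp=$(mktemp -d)

# Shortened version of the config posted in this ticket: no switch sections.
cat > "$tmp/reported" <<'EOF'
config interface 'wan'
      option ifname 'eth0'
      option proto 'pppoe'
EOF

# A config that does carry a VLAN part (port numbers are illustrative only).
cat > "$tmp/with_vlan" <<'EOF'
config switch
      option name 'switch0'
      option enable_vlan '1'

config switch_vlan
      option device 'switch0'
      option vlan '1'
      option ports '0 1 2 3 5'
EOF

report() { if has_switch_vlan "$1"; then echo yes; else echo no; fi; }
reported=$(report "$tmp/reported")
with_vlan=$(report "$tmp/with_vlan")
echo "reported config has switch_vlan: $reported"
echo "sample config has switch_vlan: $with_vlan"
rm -rf "$tmp"
```

On a router the function would be pointed at /etc/config/network instead of the sample files.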

For now, the best workaround is to disable VLAN support after boot. After that, everything should work with an unmodified LEDE image:

swconfig dev switch0 set enable_vlan 0
swconfig dev switch0 set apply
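
The two commands can be wrapped so that they only run when VLAN support is actually enabled. This is a sketch under assumptions: it assumes that "swconfig dev switch0 get enable_vlan" prints a bare 0 or 1, and the function name disable_vlan_if_enabled is made up for this example. The setting does not survive a reboot, so the workaround has to be repeated after every boot.

```shell
#!/bin/sh
# Disable switch VLAN support only if it is currently enabled.
# "switch0" is the device name used in this ticket; adjust if yours differs.
# Assumption: "get enable_vlan" prints a bare 0 or 1.
disable_vlan_if_enabled() {
    dev="${1:-switch0}"
    state=$(swconfig dev "$dev" get enable_vlan) || return 1
    if [ "$state" = "1" ]; then
        swconfig dev "$dev" set enable_vlan 0 &&
        swconfig dev "$dev" set apply &&
        echo "$dev: VLAN support disabled"
    else
        echo "$dev: VLAN support already disabled"
    fi
}

# Only act when swconfig is actually available (i.e. on the router).
if command -v swconfig >/dev/null 2>&1; then
    disable_vlan_if_enabled switch0
fi
```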

I will try to get in contact with the author of this change to discuss the issue. I don't want to commit a fix that might introduce a new bug.

psyborg commented on 08.07.2018 20:09

your ticket break spacing on 1280x800 screens. also i don't see a point in using tags...
