FS#964 - odhcpd: IPv6 node (Win 10) stops sending Router Solicitation but it still receives RAs from LEDE #5949

Closed
openwrt-bot opened this issue Aug 16, 2017 · 27 comments

@openwrt-bot

freefor:

Router: Buffalo wbmr-hp-g300h
Lede version: Reboot (SNAPSHOT, r4696-df3295f50e)
ISP: TIM Italy - ADSL2+
NIC: Realtek PCIe GBE - Driver 10.19.627.2017 (27/06/2017 - Latest)

I have set up a new PPPoE session to get IPv6 on all my mobile devices (see attachment). Without a dedicated PPPoE session,
my Android phone could not reach IPv4-only hosts.

Note: My ISP delegates a single dynamic /64 via DHCPv6-PD.

Problem:

Everything worked fine until I noticed that my Windows 10 (Build 15063.540) laptop loses its IPv6 default gateway after roughly 3 hours.
If I disable and re-enable the (wired) NIC, the default gateway returns and I can use IPv6 again.

I did a wireshark capture on my laptop and I can see my laptop receiving RAs, but it stops sending RS to the router.
I captured traffic on the router at the same time and indeed there are no RS from my laptop.

I attach the output of the following commands, taken both with and without an IPv6 default gateway:

  • ifstatus wan
  • ifstatus wan6
  • ifstatus lan
  • ubus call network.interface.wan6 status

You can also find my /etc/config/network, /etc/config/firewall and /etc/config/dhcp settings.

Let me know if you need more logs.
I can easily reproduce this problem (I just have to wait 3 hours).

Forum discussion: https://forum.lede-project.org/t/problem-with-ipv6-and-mobile-device/241

@openwrt-bot

dedeckeh:

The attached logs don't reveal any issue.
It would be interesting to see whether odhcpd still transmits RA messages when the default gateway issue is observed, and which router lifetime is included in the RA messages.
Therefore set the odhcpd loglevel to 7 in uci (uci set dhcp.odhcpd.loglevel=7), which will make odhcpd more verbose; the traces can be read via logread.
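
On the router, the full sequence might look like the sketch below. Only the `uci set` line comes from the comment above; the commit, the service restart and the `grep` filter are assumptions about a standard LEDE setup.

```shell
# Raise odhcpd verbosity to debug level (command from the comment above).
uci set dhcp.odhcpd.loglevel=7
# Assumption: persist the change and restart the daemon so it takes effect.
uci commit dhcp
/etc/init.d/odhcpd restart
# Read the traces from the system log, keeping only the odhcpd lines.
logread | grep odhcpd
```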

@openwrt-bot

jogo:

> I did a wireshark capture on my laptop and I can see my laptop receiving RAs, but it stops sending RS to the router.
> I captured traffic on the router at the same time and indeed there are no RS from my laptop.

This is expected behavior that conforms to RFC 4861 (https://tools.ietf.org/html/rfc4861#section-6.3.7): RS are only sent on a link-up event, until an RA is received. RAs are sent at regular intervals by the router (so RS are not actually required; they just allow faster router discovery). The RFC even forbids sending RS except in response to link-up / interface-becomes-available events.

So the question is probably why Windows 10 does not like the RAs. Does it ever update the router lifetime? Do you have any logs from the Windows machine (even if it is just regular output)?
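
The RS behavior described above can be sketched as a small simulation. It is a simplified model, not code from any real IPv6 stack: the two constants are the RFC 4861 protocol constants, the initial random delay is ignored, and the function name `rs_times` is made up for illustration.

```shell
# Simplified model of host Router Solicitation behavior (RFC 4861, 6.3.7).
# The initial random delay (MAX_RTR_SOLICITATION_DELAY) is ignored here.
MAX_RTR_SOLICITATIONS=3        # RFC 4861 protocol constant
RTR_SOLICITATION_INTERVAL=4    # seconds, RFC 4861 protocol constant

# rs_times FIRST_RA_TIME
# Prints the times (seconds after link-up) at which the host sends an RS,
# given the arrival time of the first valid RA ("never" if none arrives).
rs_times() {
    first_ra=$1
    sent=0
    t=0
    out=""
    while [ "$sent" -lt "$MAX_RTR_SOLICITATIONS" ]; do
        # Once an RA has been received, the host stops soliciting.
        if [ "$first_ra" != "never" ] && [ "$t" -ge "$first_ra" ]; then
            break
        fi
        out="$out $t"
        sent=$((sent + 1))
        t=$((t + RTR_SOLICITATION_INTERVAL))
    done
    echo "${out# }"
}

rs_times 1      # RA arrives 1 s after link-up: a single RS at t=0
rs_times never  # no router answers: RS at t=0, 4 and 8, then silence
```

In both cases the host goes quiet after link-up, so a capture that shows RAs but no further RS is what a conforming host is expected to produce.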

@openwrt-bot

freefor:

@hans Dedecker
Tonight I should be able to test again and send you the logs.

@jonas Gorski
Yes, as you said I receive RAs but I don't know why win 10 stops sending RS.
What kind of logs do you need from my windows machine?
I will provide the pcap.

@openwrt-bot

jogo:

> Yes, as you said I receive RAs but I don't know why win 10 stops sending RS.

Windows stops sending RS because the standard says you stop sending RS once you have received an RA.

And as I said, try netsh or a similar tool to find out how the lifetimes of routes etc. evolve on the Windows side (especially whether the RAs change anything).

@openwrt-bot

freefor:

Sorry, I misunderstood you and the RFC. Thank you for pointing it out.
OK, I will check with netsh.

@openwrt-bot

freefor:

I attached the uci odhcpd debug log and a pcap (icmpv6 filter). Log timestamps are in UTC (my timezone is UTC+2).

This is what I did:
1. Rebooted the router
2. Rebooted my Windows machine
3. Started wireshark
4. Opened Facebook and confirmed it was connected via IPv6 (chrome extension)
5. Left the computer at Facebook's homepage (UTC 11:22)
6. Checked my IPv6 default gateway and it was gone (UTC 14:31)

My Windows machine still has an IPv6 address and IPv6 DNS, but no default gateway.
While I am writing this comment, the IPv6 default gateway is still missing.

I also checked the lifetime on my Windows machine using netsh and it gets refreshed every once in a while (it usually shows a duration between 3 and 5 minutes).

Thank you for your time.

@openwrt-bot

dedeckeh:

The odhcpd traces indicate the default IPv6 route on the router is expiring :

Fri Aug 18 14:29:38 2017 daemon.info odhcpd[835]: Raising SIGUSR1 due to default route change
Fri Aug 18 14:29:38 2017 daemon.info odhcpd[835]: Raising SIGUSR1 due to default route change
Fri Aug 18 14:29:39 2017 daemon.info odhcpd[835]: Using a RA lifetime of 0 seconds on br-lan

This results in an RA being sent to the lan with a router lifetime of 0.
Shortly afterwards, odhcpd is informed about a new default IPv6 route, but with a small lifetime (which is confirmed by ifstatus wan6):

Fri Aug 18 14:29:54 2017 daemon.info odhcpd[835]: Raising SIGUSR1 due to default route change
Fri Aug 18 14:29:54 2017 daemon.debug odhcpd[835]: Netlink newaddr 2a01:2000:2000:ed4f::1%pppoe-wan2
Fri Aug 18 14:29:54 2017 daemon.debug odhcpd[835]: Netlink newaddr 2a01:2000:2001:5c3::1%br-lan
Fri Aug 18 14:29:54 2017 daemon.info odhcpd[835]: Raising SIGUSR1 due to address change on br-lan
Fri Aug 18 14:29:54 2017 daemon.info odhcpd[835]: Raising SIGUSR1 due to default route change
Fri Aug 18 14:29:54 2017 daemon.info odhcpd[835]: Raising SIGUSR1 due to default route change
Fri Aug 18 14:29:55 2017 daemon.info odhcpd[835]: Using a RA lifetime of 237 seconds on br-lan

This behavior repeats continuously and is very unusual, as the lifetime of the default IPv6 route is determined by the router lifetime of the RAs received on the wan.
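
To follow the advertised lifetime over time, the relevant lines can be pulled out of the log. The helper below is a sketch only: the name `ra_lifetimes` is made up, and it assumes the logread line layout shown in the traces above (timestamp in the fourth field, interface name last).

```shell
# ra_lifetimes: read syslog lines on stdin and print
# "<time> <lifetime in seconds> <interface>" for each odhcpd RA lifetime line.
ra_lifetimes() {
    awk '/Using a RA lifetime of/ {
        for (i = 1; i <= NF; i++) {
            # The number two fields after "lifetime" is the value in seconds.
            if ($i == "lifetime") { print $4, $(i + 2), $NF; break }
        }
    }'
}

# Demo with two lines copied from the traces above.
ra_lifetimes <<'EOF'
Fri Aug 18 14:29:39 2017 daemon.info odhcpd[835]: Using a RA lifetime of 0 seconds on br-lan
Fri Aug 18 14:29:55 2017 daemon.info odhcpd[835]: Using a RA lifetime of 237 seconds on br-lan
EOF
```

On the router this would be used as `logread | ra_lifetimes`; any line reporting a lifetime of 0 marks a moment when the lan clients are told to drop the default router.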

Are you able to sniff IPv6 packets on the wan (e.g. via tcpdump)? I'm particularly interested in the contents of the RA and DHCPv6 messages received on the PPP link.
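
A capture along these lines should be enough. This is a sketch: it assumes tcpdump is installed on the router, takes the interface name pppoe-wan2 from the odhcpd traces above, and uses an arbitrary output path.

```shell
# Capture ICMPv6 (which includes RAs) and DHCPv6 (UDP ports 546/547)
# on the PPP link and write the packets to a file for later inspection.
# Assumptions: tcpdump is installed; /tmp/wan-ipv6.pcap is an arbitrary path.
tcpdump -i pppoe-wan2 -n -s 0 -w /tmp/wan-ipv6.pcap \
    'icmp6 or udp port 546 or udp port 547'
```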

@openwrt-bot

freefor:

Some changes from last time:
- new build r4707-3e6d303d61
- dnscrypt added

I attached pcaps from the WAN and the Windows 10 machine. There is also an ipconfig output from when IPv6 was working fine.

UTC 19:23 - Started capturing
UTC 21:12 - I checked with ipconfig and the IPv6 default gateway was gone. I ran ipconfig again and it reappeared.
UTC 21:19 - IPv6 default gateway is gone and it never came back (waited 10 minutes).

Unfortunately I forgot to increase the system.log max filesize and I am missing a large amount of the odhcpd debug output (PART1 UTC 19:25 ~ 19:38 and PART2 UTC 21:28 ~ 21:30).

I won't be able to do other tests until 27th August but let me know what you want me to do.

Thanks.

@openwrt-bot

dedeckeh:

RAs arrive very frequently on the wan PPP interface; the interval between them varies between 10 and 23 seconds. On top of that, the router lifetime is set to 40 seconds, resulting in a similar lifetime for the default IPv6 route.
The problem is not situated in the odhcpd daemon but in the odhcp6c DHCPv6 client daemon, as it only accepts RA updates every 30 seconds. This causes the default IPv6 route to time out, resulting in the observed behavior.
Further investigation is required to find out why this limitation was put in place.
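
The interaction between the three numbers can be illustrated with a small simulation. It is a deliberately simplified model and not odhcp6c's actual code: it assumes an RA is ignored if it arrives less than the hold-off time after the last accepted RA, and that each accepted RA refreshes the default route for the advertised lifetime. The function name `route_gaps` is made up.

```shell
# route_gaps HOLDOFF LIFETIME RA_TIME...
# Prints the intervals (start-end, in seconds) during which no default
# route exists, under the simplified model described above.
route_gaps() {
    holdoff=$1
    lifetime=$2
    shift 2
    last=""      # time of the last accepted RA
    expiry=""    # time at which the default route expires
    gaps=""
    for t in "$@"; do
        if [ -n "$last" ] && [ $((t - last)) -lt "$holdoff" ]; then
            continue    # RA arrived too soon and is ignored
        fi
        if [ -n "$expiry" ] && [ "$t" -gt "$expiry" ]; then
            gaps="$gaps $expiry-$t"    # route expired before this refresh
        fi
        last=$t
        expiry=$((t + lifetime))
    done
    echo "${gaps# }"
}

# RAs every 23 s with a 40 s router lifetime (values observed on the wan):
route_gaps 30 40 0 23 46 69 92    # 30 s limit: route missing at 40-46 and 86-92
route_gaps 10 40 0 23 46 69 92    # 10 s limit: every RA accepted, no gaps
```

With a 30 second limit every second RA is dropped, so the route is only refreshed every 46 seconds and outlives its 40 second lifetime; each such gap is a moment where the router has no default IPv6 route.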

@openwrt-bot

EricLuehrsen:

Maybe this is an ISP problem ... Any parameter lifetime under one minute is out of line with the implied best practices in RFC 4861 and later RFCs. A default router lifetime that is sent as 0, or that counts down to 0, removes the router from the host's router list. A lifetime of 40 seconds is not robust against local congestion or missed RA messages. Similarly, unsolicited RAs sent more often than once a minute are a potential congestion source (depending on connection type or speed). While odhcp6c limit of 30 seconds may be arbitrary, it may also be a well considered throttle to the burden from excessive RA events.

@openwrt-bot

freefor:

Great to see you have some ideas.
If it is not an odhcp6c problem I could try asking my ISP, but it will take weeks before I find a competent technician.

As I said, I am available from the 27th August for testing.

Thank you.

@openwrt-bot

jogo:

> While odhcp6c limit of 30 seconds may be arbitrary, it may also be a well considered throttle to the burden from excessive RA events.

Luckily you can change that as a workaround (although not via UCI) with the -m switch. If you modify /lib/netifd/proto/dhcpv6.sh and add e.g. -m 10 (for a 10 second minimum) to the arguments passed to odhcp6c, it should help in the meantime.

@openwrt-bot

dedeckeh:

@ForFree Can you add -m 10 to the proto_run_command in /lib/netifd/proto/dhcpv6.sh, as suggested by Jonas Gorski? This will instruct odhcp6c to accept RA messages spaced with a minimum interval of 10 seconds.
Can you then monitor the IPv6 default route on the router and the lan devices?

@openwrt-bot

dedeckeh:

@eric Luehrsen From experience I know Tim uses network config settings which differ a lot from other ISPs and as such are not seen as best practices.
Checking RFC4861 regarding best practice for the minimum router advertisement interval, I can only find the following statement:

> The minimum time allowed between sending unsolicited multicast Router Advertisements from the interface, in seconds. MUST be no less than 3 seconds and no greater than .75 * MaxRtrAdvInterval.

Neither do I find any requirement in RFC7084.
Tim will argue that the minimum router advertisement interval exceeds 3 seconds and as such is in line with RFC4861.
On the other hand, the router lifetime does not seem to be in line with the best practice described in RFC4861, as 40 seconds is less than 3 * the max router advertisement interval for a max observed interval of 23 seconds (3 * 23 = 69).
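
The two checks above can be written down as a small helper. This is a sketch: the function name `check_ra_params` is made up, and the inputs are the intervals observed in the capture, which are not necessarily the values configured by the ISP. The limits are the RFC 4861 section 6.2.1 values for MinRtrAdvInterval and the default for AdvDefaultLifetime.

```shell
# check_ra_params MIN_INTERVAL MAX_INTERVAL LIFETIME  (whole seconds)
#   MinRtrAdvInterval: at least 3 s and at most 0.75 * MaxRtrAdvInterval
#   AdvDefaultLifetime: the default is 3 * MaxRtrAdvInterval
check_ra_params() {
    min=$1
    max=$2
    life=$3
    # min <= 0.75 * max, written as 4 * min <= 3 * max to stay in integers.
    if [ "$min" -ge 3 ] && [ $((min * 4)) -le $((max * 3)) ]; then
        echo "min interval ${min}s: within the RFC 4861 limits"
    else
        echo "min interval ${min}s: outside the RFC 4861 limits"
    fi
    recommended=$((max * 3))
    if [ "$life" -ge "$recommended" ]; then
        echo "lifetime ${life}s: at least 3 * max interval (${recommended}s)"
    else
        echo "lifetime ${life}s: below 3 * max interval (${recommended}s)"
    fi
}

# Observed on the wan: RAs every 10 to 23 s, router lifetime 40 s.
check_ra_params 10 23 40
```

For the observed values this reports the interval as within the limits and the lifetime as below 3 * 23 = 69 seconds, which matches the argument above.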

@openwrt-bot

EricLuehrsen:

@hans Dedecker, you have hit it right on the head. There is no single specification limit, but rather a set of parametric relationships in a system. There are also simple matters of reality. RAs that are sent too fast cause congestion or waste the link. It is not possible to robustly maintain a link if the information for the link is not refreshed at least 3 times in its lifetime (an RA may go missing or be garbled). At 20-40 odd seconds, it would be even more ridiculous for a client to send an RS as a fall-back measure after the router's half-life and get things further in a knot. This is why I said "implied best practices," because the system of relationships just doesn't work well any other way.

@openwrt-bot

freefor:

@hans Dedecker

I edited dhcpv6.sh like this:
proto_run_command "$config" odhcp6c \
    -m 10 -s /lib/netifd/dhcpv6.script \
    $opts $iface

Saved, then rebooted the router and then my Windows machine.
After 1 hour and 10 minutes I checked the IPv6 default gateway and it was gone.
I can do quick tests if you want, but for collected logs you will have to wait until after 27th Aug.
Sorry.

@openwrt-bot

dedeckeh:

@eric Luehrsen I don't think "implied best practices" will be a strong enough argument to convince Tim; after all it's an interpretation of the RFC and not an explicit written down requirement.

@openwrt-bot

dedeckeh:

@ForFree

Change dhcpv6.sh as follows:
proto_run_command "$config" odhcp6c \
    -s /lib/netifd/dhcpv6.script \
    -v -m 10 $opts $iface

This will start odhcp6c in verbose mode and will instruct odhcp6c to accept RA messages spaced with a minimum interval of 10 seconds.

@openwrt-bot

freefor:

@hans Dedecker

Now my IPv6 default gateway is stable!!
More than 4 hours and I still have full IPv6 connection.
So you have to add "-m 10" after the .script.. :)

I will continue monitoring the situation but it seems ok.

EDIT: 1 hour and 30 minutes after I wrote this comment, my Windows machine no longer has any GUA, only a ULA address.

EDIT2: It seems that the PPPoE session went down completely, which is why I didn't have any GUA address. That is normal.

@openwrt-bot

freefor:

Over 6 hours and my IPv6 default gateway is stable.
I think we have found a valid workaround for TIM Italy.
Would it be possible to make this parameter configurable via LuCI?

Thank you!

@openwrt-bot

dedeckeh:

@ForFree Can you provide the odhcp6c and odhcpd traces for the current configuration when possible?
I will do some more testing before pushing a final solution (next week).

@openwrt-bot

freefor:

@hans Dedecker

I was able to set up remote syslog and also save some pcapng files.

As I wrote in my previous post, today my IPv6 was stable for over 6 hours.
So my IPv6 was UP from UTC 6:00 until UTC 13:32, when I started collecting logs.

UTC 13:32 - started wireshark on windows machine

  • file pppoewan2-part1.pcap

UTC 14:20 - I checked with ipconfig and my PC didn't have any GUA address, only a ULA.

UTC 14:31:11 - I manually shut down the wan2 interface; after I reactivated it, pppoe-wan2 came UP without any IPv6.
I captured a new pcap because pppoe-wan2 going down terminated my previous tcpdump session. I also started capturing on the "nas0" interface because I thought it could be useful for understanding why there was no IPv6 in the PPP request (I can upload it if needed). I waited until 14:48:10 but there was still no IPv6 on pppoe-wan2, so I shut down the wan2 interface again.

UTC 14:48:30~ - I started wan2 again, and here I made a mistake: I used the same filename (file pppoewan2-part2.pcap), overwriting everything from 14:31:11 to 14:48:10.
IPv6 was stable at this point.

UTC 17:50 - I checked with ipconfig and my PC didn't have any GUA address, only a ULA. pppoe-wan2 had gone down again. I don't remember whether the PPPoE session resumed on its own or I shut/unshut the wan2 interface, and I started a new tcpdump (file pppoewan2-part3.pcap).

UTC 19:49 - My IPv6 was stable and still is now that I am writing this comment (UTC 20:50).

I noticed that the RA/RS problem is solved, but sometimes my pppoe-wan2 gets disconnected. I guess this is a coincidence and could be due to my ISP's PPPoE not being stable.

@openwrt-bot

freefor:

5 days and my IPv6 default gateway is stable. Some pppoe disconnections but that's the ISP.

@openwrt-bot

dedeckeh:

@ForFree A patch has been pushed (https://git.lede-project.org/?p=project/odhcp6c.git;a=commit;h=51733a6d3bfe0fb9e8c53aea22231e5b8a1f64c3) which aligns the odhcp6c RA behavior with RFC4861; by default an RA update is now accepted with a minimum interval of 3 seconds.
The commit https://git.lede-project.org/?p=source.git;a=commit;h=05c3647d35bb6fe762b221f36f68d44cce15b963 makes the RA update interval configurable via the uci parameter ra_holdoff.
Can you give this a try in your setup? By default it should now work without any (config) changes.
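
Should a non-default value ever be needed, setting the option could look like the sketch below. The section name wan6 and the value 10 are assumptions for illustration; only the option name ra_holdoff comes from the commit mentioned above.

```shell
# Assumption: the DHCPv6 client interface is the "wan6" section of
# /etc/config/network; adjust the section name to the actual setup.
# The value is the minimum RA update interval in seconds.
uci set network.wan6.ra_holdoff='10'
uci commit network
# Restart the interface so odhcp6c is started with the new value.
ifup wan6
```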

@openwrt-bot

freefor:

@hans Dedecker

Compiled a new build with your patch included.
I will monitor my IPv6 and let you know this Friday.

@openwrt-bot

freefor:

@hans Dedecker

After 3 days of testing r4786-05c3647d35, I can confirm that my IPv6 default gateway is stable without changing the minimum interval.

Thank you!

@openwrt-bot

dedeckeh:

@ForFree Thanks for testing and the feedback
