FS#3056 - odhcpd: "on-link" Router Information Options pollution in Router Advertisements #7848

Closed
openwrt-bot opened this issue Apr 27, 2020 · 5 comments
@openwrt-bot

mherbert:

===Device problem occurs on===

Reproduced on Archer C7 v5 but not hardware specific at all.

===Software versions of OpenWrt/LEDE release, packages, etc.===

openwrt-18.06 branch (git-20.029.49294-41e2258)
odhcpd-ipv6only 1.15-3

===Steps to reproduce===

The function [[https://github.com/openwrt/odhcpd/blob/6db312a698e920f/src/router.c#L430|send_router_advert()]] in src/router.c doesn't seem to know the purpose of Route Information Options. It systematically pollutes its Router Advertisements with bogus RFC4191 Route Information Options (RIOs). RIOs are meant for off-link routes, yet these ones seem to clone the on-link Prefix Information Options (PIOs) found in the same RA packet.

This makes most Linux hosts unreachable over IPv6 from the same LAN.

This can be reproduced very easily with odhcpd-ipv6only 1.15-3 and either ndptool or tcpdump. From a Linux workstation:

$ ndptool monitor -i wlan0 &
$ ndptool send -t rs -i wlan0

Prefix: 2601:2c0:7a00:724::/64, valid_time: 239810s, preferred_time: 239810s, on_link: yes, autonomous_addr_conf: yes, router_addr: no
Route: 2601:2c0:7a00:724::/64, lifetime: 239810s, preference: medium

tcpdump -v icmp6

Prefix Info Option (3), length 32 (4):, 2601:2c0:7a00:724::/64 Flags [onlink, auto], valid time 159616s, pref. time 159616s
Route Info Option (24), length 24 (3): 2601:2c0:7a00:724::/64, pref=medium, lifetime=159616s
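
The duplication visible in the output above can be checked mechanically. The following is only a minimal sketch: it assumes the PIOs and RIOs have already been extracted from the RA by some other means, and the helper name `redundant_rios` and the input shapes are hypothetical, not part of ndptool or odhcpd.

```python
import ipaddress

def redundant_rios(pios, rios):
    """Return the RIO prefixes that exactly duplicate an on-link PIO prefix.

    pios: list of (prefix, on_link) tuples (hypothetical input shape)
    rios: list of prefix strings
    """
    on_link = {ipaddress.ip_network(p) for p, is_on_link in pios if is_on_link}
    return [r for r in rios if ipaddress.ip_network(r) in on_link]

# Values copied from the ndptool output in this report.
pios = [("2601:2c0:7a00:724::/64", True)]
rios = ["2601:2c0:7a00:724::/64"]

flagged = redundant_rios(pios, rios)
print(flagged)
```

A RIO for a shorter, genuinely off-link prefix (for example a /48 covering the /64) would not be flagged by this check.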

These wrong RIOs are harmless for the (many?) RFC4191 Type A or B operating systems that ignore RIOs. However, they are harmful for some Type C hosts like NetworkManager 1.20.10-1.fc31 or NetworkManager 1.22.10-1ubuntu1, which get confused by these RIOs. Most Linux distributions use NetworkManager by default.

RFC4191 RIOs are meant for advertising OFF-LINK routes; they are not meant to duplicate (and fail to duplicate) information already found in ON-LINK prefixes. Quoting https://tools.ietf.org/html/rfc4191

In some network topologies where the host has multiple routers on its Default Router List, the choice of router for an off-link destination is important.

===Consequence===

=> Some RFC4191 Type C hosts like NetworkManager are unreachable on the LAN <=

For some Type C hosts like NetworkManager, the (wrong) RIO takes precedence over the (correct) PIO in the routing table. Let’s call these “Type C-” hosts.

When trying to reach a Type C- host (with ping6 or anything else), packets are sent directly on-link to the Type C- host. The Type C- host then obeys the broken RIO and replies indirectly through the router. Finally, the following ip6tables rule in OpenWrt’s firewall configuration drops the unexpected reply:

FORWARD
DROP all anywhere anywhere ctstate INVALID /* !fw3 */

Confusingly, Type C- hosts can reach other Type C- hosts, through the router. Other hosts can reach each other directly. Only some communications are broken, which made this very difficult to debug.
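
The asymmetric path described above can be illustrated with a toy routing lookup. This is a simplified model under stated assumptions, not NetworkManager's or the kernel's actual logic: the router address `fe80::1`, the two host addresses, and the two routing tables are all hypothetical, and the Type C- table assumes the RIO route has replaced the on-link route as described in this report.

```python
import ipaddress

ROUTER = "fe80::1"  # hypothetical link-local address of the router
LAN = ipaddress.ip_network("2601:2c0:7a00:724::/64")
DEFAULT = ipaddress.ip_network("::/0")

HOST_A = "2601:2c0:7a00:724::a"  # hypothetical ordinary host
HOST_B = "2601:2c0:7a00:724::b"  # hypothetical Type C- host

def next_hop(dest, routes):
    """Longest-prefix match. Returns the gateway, or None for direct on-link delivery."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, gw) for net, gw in routes if addr in net]
    if not matches:
        raise LookupError("no route to " + dest)
    _, gateway = max(matches, key=lambda m: m[0].prefixlen)
    return gateway

# Ordinary host: the PIO yields an on-link route for the /64.
ordinary_table = [(LAN, None), (DEFAULT, ROUTER)]
# Type C- host (assumption): the RIO route via the router took the place of the on-link route.
type_c_minus_table = [(LAN, ROUTER), (DEFAULT, ROUTER)]

request_hop = next_hop(HOST_B, ordinary_table)    # A -> B goes directly on-link
reply_hop = next_hop(HOST_A, type_c_minus_table)  # B -> A goes through the router
print(request_hop, reply_hop)
```

In this model the request never crosses the router while the reply does, which matches the report that the router's firewall sees a reply with no known connection and drops it as INVALID.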

@openwrt-bot

mherbert:

Thanks thaller for sharing the pointer to the relevant NetworkManager code:
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/dec1678fecadf/src/ndisc/nm-ndisc.c#L554

Some (less user friendly) alternatives to NetworkManager avoid the issue:

  • sysctl -w net.ipv6.conf.wlan0.accept_ra=1 ignores RIOs.

  • systemd-networkd does take this RIO pollution into account and adds it to the routing table. Unlike NetworkManager, however, it populates the routing table with BOTH the PIO and the bogus RIO, and the on-link prefix information seems to take precedence for some reason. Windows 10 behaves the same.

@openwrt-bot

brianjmurrell:

Interesting. Pity the Priority is Very Low even though the Severity is Medium.

@openwrt-bot

mherbert:

I didn't see any Severity guideline at https://openwrt.org/bugs
While this is quite "critical" for IPv6, I set the initial Severity to Medium, assuming most people can fall back on IPv4.

"Very Low" is probably just the initial priority, especially for "Unconfirmed" bugs. You don't get to pick a priority when you file a bug, and I don't see in the history tab that anyone changed it.

While Severity could be somewhat objective given some guidelines, Priority is by definition totally relative and subjective :-)

Speaking of "Unconfirmed", it wouldn't hurt if a couple of people could reproduce and confirm here.

@openwrt-bot

dedeckeh:

odhcpd sends RIO RA options to be in line with RFC7084, in particular requirement L-3:

An IPv6 CE router MUST advertise itself as a router for the
delegated prefix(es) (and ULA prefix if configured to provide
ULA addressing) using the "Route Information Option" specified
in Section 2.3 of [RFC4191]. This advertisement is
independent of having or not having IPv6 connectivity on the
WAN interface.

But I agree this can cause issues on Type C hosts if the delegated prefix length is equal to the downstream delegated prefix length.
Therefore I've made a change to exclude the RIO RA option for a given prefix if the delegated prefix length is equal to the downstream delegated prefix length (https://git.openwrt.org/?p=project/odhcpd.git;a=commit;h=5ce077026b991f49d96464587386f93d92f56385).
Can you check if this fixes the issue for you?

@openwrt-bot

mherbert:

Thanks Hans for the quick fix, really appreciated.

> Therefore I've made a change to exclude the RIO RA option for a given prefix if the delegated prefix length is equal to the downstream delegated prefix length...

... the latter being always equal to 64 (for those looking at the code change).
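
The rule as described in this thread can be sketched as follows. This is only an illustration of the description, not the actual C change in the linked commit; the function name `should_advertise_rio` and the constant are hypothetical.

```python
# Per this thread, the downstream prefix length is always 64.
DOWNSTREAM_PREFIX_LEN = 64

def should_advertise_rio(delegated_prefix_len, downstream_prefix_len=DOWNSTREAM_PREFIX_LEN):
    """Skip the RIO when it would cover exactly the same prefix as the on-link PIO."""
    return delegated_prefix_len != downstream_prefix_len

print(should_advertise_rio(48))  # a /48 delegation still gets a RIO
print(should_advertise_rio(64))  # a /64 delegation no longer does
```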

I did indeed observe that the problem happens with the fc00:: ULA only when its prefix is a /64; it didn't happen with fc00::/48. I tried to shorten the relatively long bug description and didn't think this detail was very important. Looks like it is :-)

(You obviously can't change the /64 prefix when it's given by your Internet Provider)

> https://git.openwrt.org/?p=project/odhcpd.git;a=commit;h=5ce077026b991f4
> Can you check if this fixes the issue for you?

For a while I was wondering whether manually downloading and installing https://downloads.openwrt.org/snapshots/packages/mipsel_24kc/base/odhcpd-ipv6only_2020-05-04-5ce07702-3_mipsel_24kc.ipk would work. To my amazement, this dead simple command was enough to do the job instead:

# opkg upgrade odhcpd-ipv6only
Upgrading odhcpd-ipv6only on root from 2019-12-16-e53fec89-3 to 2020-05-03-49e4949c-3...
Downloading http://downloads.openwrt.org/releases/19.07.2/packages/mips_24kc/base/odhcpd-ipv6only_2020-05-03-49e4949c-3_mips_24kc.ipk

I initially got confused by commits e53fec89 and 49e4949c but then I found them in this other branch: https://git.openwrt.org/?p=project/odhcpd.git;a=shortlog;h=refs/heads/openwrt-19.07

Problem fixed, NetworkManager is back! Thanks for rewarding so quickly all the time I spent debugging this.

(Besides having a friendlier graphical UI, one of the smarter things NetworkManager does is setting a higher default metric on wifi when both wifi and wired are connected)
