OpenWrt/LEDE Project

  • Status Closed
  • Percent Complete 100%
  • Task Type Bug Report
  • Category Kernel
  • Assigned To No-one
  • Operating System All
  • Severity Low
  • Priority Very Low
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by plntyk - 10.05.2018
Last edited by Jonas Gorski - 09.08.2018

FS#1541 - Invalid Kernel logspam : sit: non-ECT from <various IPs, Invalid IPs>

The invalid IPs suggest something is wrong?

Logs:

dmesg | grep sit | sed 's/\[\ *[0-9]*\.[0-9]*\]//g' | sort | uniq

sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
sit: non-ECT from 0.14.42.24 with TOS=0x2
sit: non-ECT from 0.14.42.24 with TOS=0x6
sit: non-ECT from 0.14.42.24 with TOS=0xb
sit: non-ECT from 0.14.42.24 with TOS=0xe
sit: non-ECT from 0.14.42.24 with TOS=0xf
sit: non-ECT from 0.206.0.0 with TOS=0x1
sit: non-ECT from 0.206.0.0 with TOS=0x9
sit: non-ECT from 0.206.0.0 with TOS=0xa
sit: non-ECT from 1.30.2.141 with TOS=0x5
sit: non-ECT from 1.30.2.141 with TOS=0x9
sit: non-ECT from 1.30.2.141 with TOS=0xb
sit: non-ECT from 1.30.2.141 with TOS=0xf
sit: non-ECT from 1.30.2.143 with TOS=0x3
sit: non-ECT from 1.30.2.143 with TOS=0x5
sit: non-ECT from 1.30.2.143 with TOS=0x9
sit: non-ECT from 1.30.2.143 with TOS=0xe
sit: non-ECT from 20.22.0.242 with TOS=0x5
sit: non-ECT from 20.22.0.242 with TOS=0x9
sit: non-ECT from 20.22.0.242 with TOS=0xb
sit: non-ECT from 32.0.0.1 with TOS=0x6
sit: non-ECT from 32.0.0.1 with TOS=0x9
sit: non-ECT from 32.72.0.1 with TOS=0x6
sit: non-ECT from 64.1.8.27 with TOS=0x2
sit: non-ECT from 64.1.8.27 with TOS=0xb
sit: non-ECT from 64.1.8.27 with TOS=0xd
sit: non-ECT from 64.1.8.27 with TOS=0xf
sit: non-ECT from 64.14.8.11 with TOS=0xe
sit: non-ECT from 64.14.8.5 with TOS=0x5
sit: non-ECT from 64.14.8.5 with TOS=0xd
sit: non-ECT from 64.14.8.7 with TOS=0x1
sit: non-ECT from 64.14.8.7 with TOS=0x2
sit: non-ECT from 64.14.8.7 with TOS=0x3
sit: non-ECT from 64.14.8.7 with TOS=0x6
sit: non-ECT from 64.14.8.7 with TOS=0xa
sit: non-ECT from 64.14.8.7 with TOS=0xd
sit: non-ECT from 64.14.8.9 with TOS=0xd
sit: non-ECT from 72.96.0.0 with TOS=0x1
sit: non-ECT from 72.96.0.0 with TOS=0x2
sit: non-ECT from 72.96.0.0 with TOS=0x3
sit: non-ECT from 72.96.0.0 with TOS=0x5
sit: non-ECT from 72.96.0.0 with TOS=0x6
sit: non-ECT from 72.96.0.0 with TOS=0x7
sit: non-ECT from 72.96.0.0 with TOS=0x9
sit: non-ECT from 72.96.0.0 with TOS=0xa
sit: non-ECT from 72.96.0.0 with TOS=0xb
sit: non-ECT from 72.96.0.0 with TOS=0xe
sit: non-ECT from 72.96.0.0 with TOS=0xf

Supply the following if possible:
- Device problem occurs on

TP-Link WDR3600

- Software versions of OpenWrt/LEDE release, packages, etc.

root@OpenWrt:~# cat /etc/openwrt_version
r6867-ce4d2fb5cc
root@OpenWrt:~# cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='SNAPSHOT'
DISTRIB_REVISION='r6867-ce4d2fb5cc'
DISTRIB_TARGET='ar71xx/generic'
DISTRIB_ARCH='mips_24kc'
DISTRIB_DESCRIPTION='OpenWrt SNAPSHOT r6867-ce4d2fb5cc'
DISTRIB_TAINTS='no-all'

- Steps to reproduce

Updated router to recent trunk (WDR3600)
kept config

Guessing:
log spam might involve tunnelbroker tunnel or openvpn server

some issues with either comp_lzo or openvpn-mbedtls on updating
were resolved with “compress lzo” and openvpn-openssl on server side

vpn client is LEDE r5099 (OpenVPN 2.4.4 mips-openwrt-linux-gnu [SSL (mbed TLS)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD]
library versions: mbed TLS 2.6.0, LZO 2.10)

Closed by  Jonas Gorski
09.08.2018 06:53
Reason for closing:  Fixed
Additional comments about closing:  

The kernel in the 18.06 branch was updated to a version that includes the revert with https://git.lede-project.org/9e1530b2a35e051664ed243efd1eac942883494a, so it will be fixed in 18.06.1.

plntyk commented on 10.05.2018 08:52

found https://forum.freifunk.net/t/kernel-ip-tunnel-non-ect-from-185-66-194-0-with-tos-0x/6149/5

But there the problem vanished after some random updates

Arjen de Korte commented on 10.05.2018 20:54

FWIW, I see this too. About two weeks ago this started. Fortunately, I still have a build available that is not affected, so I'll dig in next week to see what is the culprit. I guess it is something related to OpenVPN/routing, since I can no longer tunnel IPv6 over an IPv4 OpenVPN connection.

menno commented on 13.05.2018 14:55

I ran into this as well, the workaround for me is to do 'echo N > /sys/module/sit/parameters/log_ecn_error'

My guess is that the default changed at some point but I have not yet looked further into this.

Arjen de Korte commented on 14.05.2018 19:23

I changed the contents of '/etc/modules.d/32-sit' from

  sit

to

  sit log_ecn_error=0

This will suppress the errors too after rebooting. It doesn't fix the underlying problem though.

Kufat commented on 16.05.2018 02:44

I also have this issue on a recent custom build of the 17.01 branch. I have a tunnelbroker IPv6 tunnel but don't use a VPN, if that helps narrow things down.

Rafal commented on 24.05.2018 18:33

Try

sit log_ecn_error=N

as the contents of '/etc/modules.d/32-sit'

Ryan Mounce commented on 08.06.2018 02:04

I am seeing this with kernel 4.9.102 on mpc85xx with a simple 6in4 tunnel configured.

I don't know when the bug was introduced, but the problem is that the inner IPv6 header of inbound packets (from the tunnel broker) is being interpreted as an IPv4 header when looking for the ToS/ECN bits. The debug message is also looking in the IPv4 location for the source address, and is instead printing the second quarter of the v6 source address formatted as a v4 address.
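
To make the byte overlap described above concrete, here is a minimal Python sketch. It is not the kernel's code, only an illustration of the header layout: it builds a 40-byte IPv6 header and then reads it back at the offsets where an IPv4 header keeps its TOS byte (byte 1) and source address (bytes 12 to 15). The function names, the 2001:db8:: documentation addresses, and the flow label 0xb1234 are all made up for the example.

```python
import ipaddress
import struct


def ipv6_header(src, dst, flow_label, traffic_class=0,
                payload_len=0, next_header=58, hop_limit=64):
    """Build a 40-byte IPv6 header (all field values are illustrative)."""
    # First 32 bits: version (4 bits), traffic class (8), flow label (20).
    first_word = (6 << 28) | (traffic_class << 20) | (flow_label & 0xFFFFF)
    fixed = struct.pack("!IHBB", first_word, payload_len, next_header, hop_limit)
    return (fixed
            + ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed)


def misread_as_ipv4(header):
    """Read the bytes at the IPv4 offsets for TOS and source address."""
    tos = header[1]                                # IPv4 TOS lives in byte 1
    saddr = ipaddress.IPv4Address(header[12:16])   # IPv4 source is bytes 12..15
    return tos, str(saddr)


# Hypothetical source address; its second 32-bit quarter is 400e:0807.
hdr = ipv6_header("2001:db8:400e:807::1", "2001:db8::2", flow_label=0xB1234)
tos, saddr = misread_as_ipv4(hdr)
print(f"sit: non-ECT from {saddr} with TOS={tos:#x}")
# prints: sit: non-ECT from 64.14.8.7 with TOS=0xb
```

With a zero traffic class, byte 1 of the IPv6 header holds only the top four bits of the flow label, and bytes 12 to 15 fall on the second quarter of the IPv6 source address, which is why the logged "IPs" look invalid.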

Ubuntu has also identified the issue, a fix will hopefully be backported to 4.9.x upstream.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1772775

Edit: found the commit that backported the bug to 4.9, even after it had been reverted in mainline
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/net/ipv6/sit.c?h=linux-4.9.y&id=92b86857ed7e8be6843d216fefe2178b172dbf63

https://github.com/torvalds/linux/commit/f4eb17e1efe5

Dmitry Tunin commented on 12.07.2018 22:54

I've sent a pull request against the 17.01 branch.
The same is needed for 18.06 for 4.9 kernel.

patrikx3 commented on 23.07.2018 04:58

Hello guys!

Do you have any solution? I still find that DNS is not working 100%. I added compress lzo yes to OpenVPN, but I cannot find where I can set compress lzo on the tunnelbroker tunnel.

This is on a Linksys WRT. At least it works now, but my DNS server (home DNS server via server) sometimes drops packets; when I click in the browser 2-3 times it works. In 17.01.4 this never happened.

Alexander E. Patrakov commented on 29.07.2018 10:02

This bug is much more serious than just log spam. Each logged line is actually one packet dropped for no good reason. The bug is actually about misinterpreting the first byte of the IPv6 Flow Label field as an IPv4 TOS field and then (mistakenly) applying logic from RFC 6040 to drop packets under the supposed explicit congestion.

Turris equivalent: https://forum.turris.cz/t/performance-issue-with-ipv6-6rd-on-turris-omnia/7505

Alexander E. Patrakov commented on 01.08.2018 13:35

In the #openwrt channel I was asked how to reproduce this in lab conditions.

1. Install release 17.01.5 on WRT1200ACv2.
2. Set up a PPPoE WAN
3. Register with tunnelbroker.net, create a tunnel, tune down the MTU as required for PPPoE.
4. Set up a 6in4 tunnel in OpenWrt
5. Modify firewall so that pings from WAN to LAN over IPv6 are accepted
6. Run tcpdump on pppoe-wan (ideally with a filter like "host 1.2.3.4" where 1.2.3.4 is the tunnel server) and on 6in4-wan6, separately
7. Ping some host in LAN from some Linux host in WAN

Important: the host in WAN must have a non-zero flow label on its ping packets. That's why "Linux".

8. See the IPv6 packets captured by tcpdump on pppoe-wan but not on 6in4-wan6, and also see the TOS log spam with invalid IP addresses.

9. If the pings do come through, wait 20 minutes and try again (so that ping chooses a different flow label), or try from a different location. For me, pinging from a 6to4 host running the latest Arch Linux was reliably triggering the issue.
10. Explicitly pass "-F 0" to ping and see that the packets come through.
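
Why the flow label matters in steps 9 and 10 can be sketched in a few lines of Python. This is only header arithmetic, not the kernel's logic, and the function names are hypothetical: when byte 1 of the IPv6 header is read as an IPv4 TOS byte, its two low bits (the ECN field) come from bits 17 and 16 of the flow label.

```python
# ECN codepoints in the two low bits of an IPv4 TOS byte (RFC 3168).
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def misread_tos(flow_label, traffic_class=0):
    """Byte 1 of an IPv6 header: low nibble of the traffic class,
    followed by the top nibble of the 20-bit flow label."""
    return ((traffic_class & 0x0F) << 4) | ((flow_label >> 16) & 0x0F)


def misread_ecn(flow_label, traffic_class=0):
    """Name of the ECN codepoint seen when byte 1 is taken as IPv4 TOS."""
    return ECN_NAMES[misread_tos(flow_label, traffic_class) & 0b11]


# Top flow-label nibbles whose misread ECN field is something other than Not-ECT.
affected = [n for n in range(16) if misread_ecn(n << 16) != "Not-ECT"]
print([hex(n) for n in affected])
print(misread_ecn(0))  # flow label 0, as with "ping -F 0": Not-ECT
```

With a zero traffic class, 12 of the 16 possible top nibbles give non-zero ECN bits, and they are the same values that show up as TOS in the log excerpt at the top of this report (0x1 to 0xf, never 0x0, 0x4, 0x8 or 0xc). A flow label of 0 always reads as Not-ECT, which fits the observation that "-F 0" lets the pings through.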

MorrisMajor commented on 07.08.2018 13:15

If this bug has been fixed upstream in kernel 4.9.112 and the OpenWrt 18.06 codebase was just updated to kernel 4.9.118 that should mean that OpenWrt 18.06.1 will fix this bug.

Correct?

Dmitry Tunin commented on 07.08.2018 14:26

@Alexander PPPoE is not needed for that.

@Maurits You are correct. If you build from the 18.06 branch, you'll get a fix.

Alexander E. Patrakov commented on 25.08.2018 01:32

I confirm that on mvebu (WRT1200ACv2) on OpenWRT 18.06.1 the bug no longer exists. It would still be nice to fix the regression in the LEDE 17.01.x release.

Daniel Gimpelevich commented on 25.08.2018 03:02

Supposedly, this bug has also been fixed upstream in kernel 4.4.143, so 17.01.6 will have it too, since the 17.01 codebase is currently using kernel 4.4.151, which includes that fix.
