OpenWrt/LEDE Project

  • Status Unconfirmed
  • Percent Complete 0%
  • Task Type Bug Report
  • Category Kernel
  • Assigned To No-one
  • Operating System All
  • Severity High
  • Priority Very Low
  • Reported Version openwrt-18.06
  • Due in Version Undecided
  • Due Date Undecided
  • Votes 2
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Paul Oranje - 26.12.2018

FS#2026 - kernel ipq806x Oops

Device: netgear,r7800
Version: 18.06.1

Steps to reproduce:
The crash happens with 100% certainty, but exactly what causes it is hard to say. Large frames arriving on the WAN interface appear to trigger the error.

The WAN interface (eth.2) is untagged; on the LAN ports, besides 1 untagged VLAN, 3 tagged VLANs exist. The only (possibly) relevant extra installed package is ip-tiny; it isn't used for any custom configuration, though.

From the syslog (from boot until the Oops; seemingly unrelated lines redacted):

Wed Dec 26 15:18:27 2018 kern.info kernel: [    0.000000] Booting Linux on physical CPU 0x0
Wed Dec 26 15:18:27 2018 kern.notice kernel: [    0.000000] Linux version 4.14.63 (buildbot@builds-03.infra.lede-project.org) (gcc version 7.3.0 (OpenWrt GCC 7.3.0 r7102-3f3a2c9)) #0 SMP Thu Aug 16 07:51:15 2018
Wed Dec 26 15:18:27 2018 kern.info kernel: [    0.000000] CPU: ARMv7 Processor [512f04d0] revision 0 (ARMv7), cr=10c5787d
Wed Dec 26 15:18:27 2018 kern.info kernel: [    0.000000] CPU: div instructions available: patching division code
Wed Dec 26 15:18:27 2018 kern.info kernel: [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
Wed Dec 26 15:18:27 2018 kern.info kernel: [    0.000000] OF: fdt: Machine model: Netgear Nighthawk X4S R7800
...
Wed Dec 26 15:18:27 2018 kern.info kernel: [    1.304591] libphy: GPIO Bitbanged MDIO: probed
Wed Dec 26 15:18:27 2018 kern.info kernel: [    1.325983] switch0: Atheros AR8337 rev. 2 switch registered on gpio-0
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.210243] libphy: Fixed MDIO Bus: probed
Wed Dec 26 15:18:27 2018 kern.warn kernel: [    2.212378] ipq806x-gmac-dwmac 37200000.ethernet: PTP uses main clock
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.213549] stmmac - user ID: 0x10, Synopsys ID: 0x37
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.219747] ipq806x-gmac-dwmac 37200000.ethernet: Ring mode enabled
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.224876] ipq806x-gmac-dwmac 37200000.ethernet: DMA HW capability register supported
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.230859] ipq806x-gmac-dwmac 37200000.ethernet: Enhanced/Alternate descriptors
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.238924] ipq806x-gmac-dwmac 37200000.ethernet: Enabled extended descriptors
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.246483] ipq806x-gmac-dwmac 37200000.ethernet: RX Checksum Offload Engine supported
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.253511] ipq806x-gmac-dwmac 37200000.ethernet: COE Type 2
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.261323] ipq806x-gmac-dwmac 37200000.ethernet: TX Checksum insertion supported
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.267227] ipq806x-gmac-dwmac 37200000.ethernet: Wake-Up On Lan supported
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.274601] ipq806x-gmac-dwmac 37200000.ethernet: Enable RX Mitigation via HW Watchdog Timer
Wed Dec 26 15:18:27 2018 kern.warn kernel: [    2.282886] ipq806x-gmac-dwmac 37400000.ethernet: PTP uses main clock
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.290081] stmmac - user ID: 0x10, Synopsys ID: 0x37
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.296320] ipq806x-gmac-dwmac 37400000.ethernet: Ring mode enabled
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.301257] ipq806x-gmac-dwmac 37400000.ethernet: DMA HW capability register supported
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.307399] ipq806x-gmac-dwmac 37400000.ethernet: Enhanced/Alternate descriptors
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.315397] ipq806x-gmac-dwmac 37400000.ethernet: Enabled extended descriptors
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.322956] ipq806x-gmac-dwmac 37400000.ethernet: RX Checksum Offload Engine supported
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.329903] ipq806x-gmac-dwmac 37400000.ethernet: COE Type 2
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.337865] ipq806x-gmac-dwmac 37400000.ethernet: TX Checksum insertion supported
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.343699] ipq806x-gmac-dwmac 37400000.ethernet: Wake-Up On Lan supported
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.350995] ipq806x-gmac-dwmac 37400000.ethernet: Enable RX Mitigation via HW Watchdog Timer
...
Wed Dec 26 15:18:27 2018 kern.info kernel: [    2.428253] 8021q: 802.1Q VLAN Support v1.8
...
Wed Dec 26 15:18:27 2018 user.info kernel: [    3.839793] init: - watchdog -
...
Wed Dec 26 15:18:27 2018 user.info kernel: [    4.407941] kmodloader: loading kernel modules from /etc/modules-boot.d/*
...
Wed Dec 26 15:18:27 2018 user.info kernel: [    5.088295] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
Wed Dec 26 15:18:27 2018 user.info kernel: [    5.098476] init: - preinit -
...
Wed Dec 26 15:18:27 2018 kern.info kernel: [    6.799339] Generic PHY fixed-0:01: attached PHY driver [Generic PHY] (mii_bus:phy_addr=fixed-0:01, irq=POLL)
Wed Dec 26 15:18:27 2018 kern.info kernel: [    6.800863] dwmac1000: Master AXI performs any burst length
Wed Dec 26 15:18:27 2018 kern.info kernel: [    6.808373] ipq806x-gmac-dwmac 37400000.ethernet eth1: IEEE 1588-2008 Advanced Timestamp supported
Wed Dec 26 15:18:27 2018 kern.info kernel: [    6.813856] ipq806x-gmac-dwmac 37400000.ethernet eth1: registered PTP clock
Wed Dec 26 15:18:27 2018 kern.info kernel: [    7.834378] ipq806x-gmac-dwmac 37400000.ethernet eth1: Link is Up - 1Gbps/Full - flow control off
...
Wed Dec 26 15:18:31 2018 kern.info kernel: [   35.015710] Generic PHY fixed-0:01: attached PHY driver [Generic PHY] (mii_bus:phy_addr=fixed-0:01, irq=POLL)
Wed Dec 26 15:18:31 2018 kern.info kernel: [   35.016654] dwmac1000: Master AXI performs any burst length
Wed Dec 26 15:18:31 2018 kern.info kernel: [   35.016672] ipq806x-gmac-dwmac 37400000.ethernet eth1: IEEE 1588-2008 Advanced Timestamp supported
Wed Dec 26 15:18:31 2018 kern.info kernel: [   35.016873] ipq806x-gmac-dwmac 37400000.ethernet eth1: registered PTP clock
...
Wed Dec 26 15:18:31 2018 kern.info kernel: [   35.020482] device eth1.1 entered promiscuous mode
Wed Dec 26 15:18:31 2018 kern.info kernel: [   35.020487] device eth1 entered promiscuous mode
Wed Dec 26 15:18:31 2018 kern.info kernel: [   35.066212] device eth1.127 entered promiscuous mode
Wed Dec 26 15:18:31 2018 kern.info kernel: [   35.120009] device eth1.34 entered promiscuous mode
Wed Dec 26 15:18:31 2018 kern.info kernel: [   35.162731] device eth1.10 entered promiscuous mode
...
Wed Dec 26 15:18:31 2018 daemon.notice netifd: Interface 'wan' is enabled
Wed Dec 26 15:18:31 2018 daemon.notice netifd: Interface 'wan' is setting up now
Wed Dec 26 15:18:31 2018 daemon.notice netifd: Interface 'wan' is now up
...
Wed Dec 26 15:18:31 2018 kern.info kernel: [   35.174821] Generic PHY fixed-0:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=fixed-0:00, irq=POLL)
Wed Dec 26 15:18:31 2018 kern.info kernel: [   35.175880] dwmac1000: Master AXI performs any burst length
Wed Dec 26 15:18:31 2018 kern.info kernel: [   35.175902] ipq806x-gmac-dwmac 37200000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
Wed Dec 26 15:18:31 2018 kern.info kernel: [   35.176061] ipq806x-gmac-dwmac 37200000.ethernet eth0: registered PTP clock
...
Wed Dec 26 15:18:32 2018 kern.info kernel: [   36.072469] ipq806x-gmac-dwmac 37400000.ethernet eth1: Link is Up - 1Gbps/Full - flow control off
Wed Dec 26 15:18:32 2018 daemon.notice netifd: Network device 'eth1' link is up
Wed Dec 26 15:18:32 2018 daemon.notice netifd: VLAN 'eth1.1' link is up
Wed Dec 26 15:18:32 2018 daemon.notice netifd: VLAN 'eth1.10' link is up
Wed Dec 26 15:18:32 2018 daemon.notice netifd: VLAN 'eth1.34' link is up
Wed Dec 26 15:18:32 2018 daemon.notice netifd: VLAN 'eth1.127' link is up
...
Wed Dec 26 15:18:32 2018 daemon.notice netifd: Network device 'eth0' link is up
Wed Dec 26 15:18:32 2018 daemon.notice netifd: Interface 'wan' has link connectivity
Wed Dec 26 15:18:32 2018 kern.info kernel: [   36.232391] ipq806x-gmac-dwmac 37200000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
...
Wed Dec 26 15:18:47 2018 daemon.info procd: - init complete -
...
Wed Dec 26 18:09:06 2018 kern.err kernel: [ 1488.045291] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1990 larger than size (1536)
Wed Dec 26 18:09:06 2018 kern.err kernel: [ 1488.546134] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1990 larger than size (1536)
Wed Dec 26 18:09:07 2018 kern.err kernel: [ 1489.546246] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1990 larger than size (1536)
Wed Dec 26 18:09:09 2018 kern.err kernel: [ 1491.546705] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1990 larger than size (1536)
Wed Dec 26 18:09:13 2018 kern.err kernel: [ 1495.548558] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1990 larger than size (1536)
Wed Dec 26 18:09:17 2018 kern.err kernel: [ 1499.548584] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1990 larger than size (1536)
Wed Dec 26 18:09:21 2018 kern.err kernel: [ 1503.548245] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1990 larger than size (1536)
Wed Dec 26 18:09:25 2018 kern.err kernel: [ 1507.549643] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1990 larger than size (1536)
Wed Dec 26 18:09:29 2018 kern.err kernel: [ 1511.551169] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1990 larger than size (1536)
Wed Dec 26 18:09:33 2018 kern.err kernel: [ 1515.552148] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1990 larger than size (1536)
Wed Dec 26 18:09:37 2018 kern.err kernel: [ 1519.553088] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1990 larger than size (1536)
Wed Dec 26 18:24:05 2018 kern.alert kernel: [ 2387.041393] Unable to handle kernel paging request at virtual address 64616f86
Wed Dec 26 18:24:05 2018 kern.alert kernel: [ 2387.041483] pgd = c0204000
Wed Dec 26 18:24:05 2018 kern.alert kernel: [ 2387.047548] [64616f86] *pgd=00000000
Paul Oranje commented on 30.12.2018 21:10

Curious detail: the ipq806x-gmac-dwmac messages refer to faults on eth0, but these faults occur only when the switch is connected to eth1 ...

please IGNORE

Paul Oranje commented on 20.01.2019 12:56

This OOPS also happens on a TP-Link C2600, so it affects more devices of the ipq806x target.

On the C2600, too, the OOPS is preceded by a number of messages like:

kern.err kernel: [   92.235675] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1994 larger than size (1536)

The error happens within a minute after WAN is connected to the modem.

A test where WAN was mapped to a port on eth1 also resulted in the OOPS.

This OOPS is triggered when WAN is connected to a UBEE cable modem; presumably that device outputs frames with an incorrect length. Within a few days the ISP will replace the modem with another DOCSIS 3 modem. (A test behind a Compal modem did not trigger the OOPS.)

Paul Oranje commented on 20.01.2019 13:38

According to the OpenWrt forum the error also hits the NBG6817.

Paul Oranje commented on 20.01.2019 16:36

In drivers/net/ethernet/stmicro/stmmac/stmmac_main.c, when the length of a received frame is larger than the size of the DMA buffer, the message mentioned above is logged and the code breaks out of the loop that processes received frames [1].

This break was introduced by the commit "stmmac: fix oversized frame reception" (commit e527c4a769d375ac0472450c52bde29087f49cd9) [2].

Speculation 1:
Some of the other conditions in the loop are handled by an else branch instead of a break out of the loop, so breaking out may not be the correct step.
Since the OOPS is always preceded by several occurrences of oversized frames, possibly some required resource handling (DMA?) is skipped.

As an aside: directly after the test on the frame length, the code tests the frame type (LLC) and subtracts a few bytes from the length. Could it be that this test should come before the test on the frame length?

Speculation 2:
Possibly the frame length is not correctly determined.

[1] https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L3398

[2] https://github.com/torvalds/linux/commit/e527c4a769d375ac0472450c52bde29087f49cd9

Hannu Nyman commented on 08.02.2019 18:03

Just adding here the best analysis & possible fixes that I have seen so far:

discussion by "ewald" in forum:

https://forum.openwrt.org/t/nbg6817-openwrt-rebooting-constantly/23036/51

(and a few earlier messages in the same thread)

> I have been trying to fix stmmac_main.c by adding code to properly deal with these unexpectedly large packets, but it requires a lot of code changes due to the way the rx ring buffer sizes are (pre)allocated based on MTU size.
> The kernel panic that I hit was due to a missing call to set the proper DMA size (basically DMA would overrun the buffer based on the real size of the packet), after which an skb buffer free would free illegal memory. There is now a patch for this problem: here. I managed to fix this and 2 other issues, but kept hitting new bugs like starvation (basically the ethernet port hangs).
> I believe the correct fix for this is now posted here. Most hangs went away, but some remained...
> When you configure a larger MTU, the driver allocates a 2K, 4K or 16K buffer and in this way manages to bypass a number of defects due to the extra headroom.
> That is why with MTU=4088 my router has been performing flawlessly for the last 18 days, despite all the stress tests thrown at it.
>
> I think it's worth back-porting the master branch patches (a long list...) to 4.14.93 (or whatever the current release is). There are 20 or so major code changes/patches submitted since the 4.14 version. If I have some time I will try to generate a patch set and build a kernel.

Possible fixes:

https://github.com/torvalds/linux/commit/fa0be0a43f101888ac677dba31b590963eafeaa1#diff-11cf855fef243c84239e6e5a90c50fd1

https://github.com/torvalds/linux/commit/4205c88eaf17b5f3ee30032d68df55cd5d9077a1#diff-11cf855fef243c84239e6e5a90c50fd1
