FS#4146 - e1000e: Detected Hardware Unit Hang, Reset adapter unexpectedly #9135

openwrt-bot · 2021-11-22T03:53:02Z

misieck:

System is a Fujitsu Esprimo C5731 with Intel Core2Duo E7500 and 4 GB RAM.

The problem NIC:
00:19.0 Ethernet controller [0200]: Intel Corporation 82567LF-3 Gigabit Network Connection [8086:10df] (rev 02)

OpenWrt:
OpenWrt x86_64 21.02.1 r16325-88151b8303

System is configured as a simple router with the e1000e NIC as WAN and a skge NIC [Ethernet controller [0200]: D-Link System Inc Gigabit Ethernet Adapter [1186:4c00] (rev 11)] as LAN.
When doing a speedtest through the router (bredbandskollen.se) the hang occurs during the upload test (when the e1000e NIC sends data and the skge NIC receives data). The download test does not cause the error.

Similar (identical?) problems were reported previously:
https://serverfault.com/questions/616485/e1000e-reset-adapter-unexpectedly-detected-hardware-unit-hang
https://serverfault.com/questions/193114/linux-e1000e-intel-networking-driver-problems-galore-where-do-i-start
https://web.archive.org/web/20160205153351/http://ehc.ac:80/p/e1000/bugs/378/

Turning TSO off is a workaround:
ethtool -K eth0 tso off

Booting with pcie_aspm=off, however, does not help.
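
To confirm that the TSO workaround actually took effect, the offload state can be read back with `ethtool -k` (lowercase k). The following is a minimal sketch, assuming the interface is `eth0` as in this report; the function name `disable_tso` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: disable_tso is a hypothetical helper name, and eth0 is
# assumed as the default interface because that is the one in the report.

# Turn TSO off on an interface, then print its current TSO state.
# $1 = interface name (default: eth0)
disable_tso() {
    iface="${1:-eth0}"
    ethtool -K "$iface" tso off || return 1
    # ethtool -k lists the current offload settings; top-level features
    # start at column 0, so anchor the match at the start of the line.
    ethtool -k "$iface" | grep '^tcp-segmentation-offload:'
}
```

Settings applied with `ethtool -K` are not kept across a reboot, so the command has to be reapplied at boot, for example from a startup script.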
[49573.954931] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
[49573.954931] TDH <2>
[49573.954931] TDT <1a>
[49573.954931] next_to_use <1a>
[49573.954931] next_to_clean
[49573.954931] buffer_info[next_to_clean]:
[49573.954931] time_stamp <100bbf478>
[49573.954931] next_to_watch <2>
[49573.954931] jiffies <100bbf6f8>
[49573.954931] next_to_watch.status <0>
[49573.954931] MAC Status <80083>
[49573.954931] PHY Status <796d>
[49573.954931] PHY 1000BASE-T Status <3800>
[49573.954931] PHY Extended Status <3000>
[49573.954931] PCI Status <10>
[49575.970909] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
[49575.970909] TDH <2>
[49575.970909] TDT <1a>
[49575.970909] next_to_use <1a>
[49575.970909] next_to_clean
[49575.970909] buffer_info[next_to_clean]:
[49575.970909] time_stamp <100bbf478>
[49575.970909] next_to_watch <2>
[49575.970909] jiffies <100bbf8f0>
[49575.970909] next_to_watch.status <0>
[49575.970909] MAC Status <80083>
[49575.970909] PHY Status <796d>
[49575.970909] PHY 1000BASE-T Status <3800>
[49575.970909] PHY Extended Status <3000>
[49575.970909] PCI Status <10>
[49577.954909] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
[49577.954909] TDH <2>
[49577.954909] TDT <1a>
[49577.954909] next_to_use <1a>
[49577.954909] next_to_clean
[49577.954909] buffer_info[next_to_clean]:
[49577.954909] time_stamp <100bbf478>
[49577.954909] next_to_watch <2>
[49577.954909] jiffies <100bbfae0>
[49577.954909] next_to_watch.status <0>
[49577.954909] MAC Status <80083>
[49577.954909] PHY Status <796d>
[49577.954909] PHY 1000BASE-T Status <3800>
[49577.954909] PHY Extended Status <3000>
[49577.954909] PCI Status <10>
[49578.082559] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
[49578.254005] e1000e: eth0 NIC Link is Down
[49581.083429] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
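
The hex fields in the log above are easier to read after some arithmetic. TDH and TDT are the head and tail of the transmit descriptor ring: software advances the tail as it queues packets, and the hardware advances the head as it sends them. The sketch below only does the subtraction on values copied from the three reports. The reading of the result (that the same buffer stays pending while time passes) is an interpretation, not something the log states.

```shell
#!/bin/sh
# Values copied from the hang reports in the log (hexadecimal).
TDH=0x2
TDT=0x1a
TIME_STAMP=0x100bbf478

# Descriptors queued by the driver but not yet consumed by the NIC.
# A plain subtraction is enough here because TDT is larger than TDH,
# so the ring has not wrapped.
pending=$(( $TDT - $TDH ))
echo "pending descriptors: $pending"

# jiffies value from each of the three reports, minus the unchanged
# time_stamp of the buffer at next_to_clean.
for j in 0x100bbf6f8 0x100bbf8f0 0x100bbfae0; do
    echo "jiffies since time_stamp: $(( $j - $TIME_STAMP ))"
done
```

Run as written, this reports 24 pending descriptors and gaps of 640, 1144 and 1640 jiffies. TDH is 2 in all three reports, so the head did not move between them while the gap kept growing.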

openwrt-bot · 2021-12-19T06:40:24Z

equid0x:

This is known as the "TX Unit Hang" issue, and it's allegedly a bug in silicon that can't be fixed. As far as I recall, Intel released updated microcode (included in the driver) for this series of chips that partially mitigates, but does not completely eliminate, the issue. This is a very, very old issue.

I believe the workaround is to turn off checksum offloading:

ethtool -K eth0 tx off rx off

The bug is probably reproducible if you use something like iPerf or Netcat to totally flood the affected interface with TX traffic for an extended period of time (several minutes).
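
One way to attempt such a reproduction is a sustained iperf3 run while watching the kernel log for the hang message. This is a rough sketch, assuming `iperf3` is installed, an iperf3 server (`iperf3 -s`) is reachable on the far side of the affected interface, and the traffic is routed out through that interface. The function names and the example address are placeholders, not something taken from this report.

```shell
#!/bin/sh
# Sketch only: flood_tx and count_hangs are hypothetical helper names.

# Send TCP traffic to a remote iperf3 server for a number of seconds.
# The iperf3 client is the sending side by default.
# $1 = server address, $2 = duration in seconds (default: 300)
flood_tx() {
    iperf3 -c "$1" -t "${2:-300}"
}

# Count hang reports in kernel log text supplied on stdin.
count_hangs() {
    grep -c 'Detected Hardware Unit Hang'
}

# Example (192.0.2.10 is a documentation-only placeholder address):
#   flood_tx 192.0.2.10 600
#   dmesg | count_hangs
```

Comparing the count before and after the run, with TSO on and then off, would show whether the workaround changes anything on a given machine.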

I did a cursory search on this out of curiosity and, interestingly, at least one user has reported that the issue does not seem to occur under kernel 5.11, so it's possible someone finally tracked down and fixed a long-standing bug in the driver source. This issue has been around since at least 2009 (!).

openwrt-bot · 2021-12-23T06:38:40Z

misieck:

The problem does not show up in OPNsense, at least not in an overly noticeable way. So even if it is a hardware problem, a usable workaround ostensibly exists.

aparcar added the labels release/21.02 (pull request/issue targeted (also) for OpenWrt 21.02 release) and kernel (pull request/issue with Linux kernel related changes) on Feb 22, 2022
