OpenWrt/LEDE Project

  • Status Unconfirmed
  • Percent Complete
  • Task Type Bug Report
  • Category Base system
  • Assigned To No-one
  • Operating System All
  • Severity Low
  • Priority Very Low
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Johan van Zoomeren - 28.01.2017

FS#441 - Kernel crash: eth0 (ag71xx): transmit queue 0 timed out

Device: TL-WR1043ND v1
LEDE: snapshot r3189-12db207

During a simultaneous bidirectional iperf load test, after about 20 minutes, the kernel crashes. I reproduced this several times:

Server 1 <--> 1043ND <--> Laptop via wireless N

LEDE is running a default setup. The only changes:
* Setting wireless encryption to psk with password
* Setting a DNAT rule so that server 1 is able to reach the iperf server on the Laptop

This bug was actually discovered while testing fixes for FS#13 - Ath9k AP stays up for connected clients but doesn't show in scan on new ones

Serial console output:

[ 1294.022551] ------------[ cut here ]------------
[ 1294.027247] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:306 dev_watchdog+0x1dc/0x260()
[ 1294.035754] NETDEV WATCHDOG: eth0 (ag71xx): transmit queue 0 timed out
[ 1294.042319] Modules linked in: ath9k ath9k_common pppoe ppp_async ath9k_hw ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nn
[ 1294.106287] CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.45 #0
[ 1294.112066] Stack : 803e4844 00000000 00000001 80440000 8042f1dc 8042ee63 803c5e64 00000000
          804a378c 8042d4fc 00000200 00100000 0000000a 800a7618 803cb554 80430000
          00000003 8042d4fc 803c9960 81809e34 0000000a 800a5594 00000006 00000000
          00000000 801f5400 00000000 00000000 00000000 00000000 00000000 00000000
          00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1294.148122] Call Trace:
[ 1294.150617] [<800a7618>] vprintk_default+0x24/0x30
[ 1294.155474] [<800a5594>] printk+0x2c/0x38
[ 1294.159515] [<801f5400>] wait_for_xmitr+0x84/0xcc
[ 1294.164289] [<80081c3c>] warn_slowpath_common+0xa0/0xd0
[ 1294.169564] [<801a72dc>] dump_stack+0x14/0x28
[ 1294.173975] [<80071eb0>] show_stack+0x50/0x84
[ 1294.178376] [<80081c3c>] warn_slowpath_common+0xa0/0xd0
[ 1294.183661] [<8028ef3c>] dev_watchdog+0x1dc/0x260
[ 1294.188408] [<80081c98>] warn_slowpath_fmt+0x2c/0x38
[ 1294.193450] [<8028ef3c>] dev_watchdog+0x1dc/0x260
[ 1294.198191] [<8028ed60>] dev_watchdog+0x0/0x260
[ 1294.202782] [<800b08d0>] call_timer_fn.isra.5+0x24/0x80
[ 1294.208051] [<800b0b54>] run_timer_softirq+0x1b4/0x1fc
[ 1294.213248] [<800a89f0>] handle_irq_event_percpu+0x154/0x188
[ 1294.218960] [<800841b8>] __do_softirq+0x250/0x298
[ 1294.223721] [<800abdac>] handle_percpu_irq+0x50/0x80
[ 1294.228746] [<8006a9e0>] plat_irq_dispatch+0xd4/0x10c
[ 1294.233848] [<80060bf4>] handle_int+0x134/0x140
[ 1294.238400] 
[ 1294.239904] ---[ end trace 17bad011a41ccba7 ]---
[ 1294.244567] eth0: tx timeout
[ 1299.022570] eth0: tx timeout
[ 1304.022581] eth0: tx timeout
[ 1309.022588] eth0: tx timeout

The eth0: tx timeout line is repeated every 5 seconds.
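
The spacing of the repeated messages can be checked directly from the console timestamps. A minimal Python sketch, using only the four `eth0: tx timeout` lines quoted above; the helper name `timeout_intervals` is made up for illustration:

```python
import re

# The four "tx timeout" lines copied from the serial console output above.
LOG = """\
[ 1294.244567] eth0: tx timeout
[ 1299.022570] eth0: tx timeout
[ 1304.022581] eth0: tx timeout
[ 1309.022588] eth0: tx timeout
"""

# Kernel log prefix: "[ seconds.microseconds] message"
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+eth0: tx timeout\s*$")


def timeout_intervals(log_text):
    """Return the gaps (in seconds) between consecutive tx timeout lines."""
    stamps = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            stamps.append(float(match.group(1)))
    # Pair each timestamp with its successor and round to milliseconds.
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(timeout_intervals(LOG))
```

For the lines above this prints `[4.778, 5.0, 5.0]`. The first gap is slightly shorter, apparently because the first message was only printed after the warning trace; measured from the start of the warning at 1294.02 the period is 5 seconds throughout.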

Sven Schönhoff commented on 29.01.2017 08:14

@Johan: Regarding your question in FS#13: It's not possible to run the tests on the smartphone simultaneously, but I will try to borrow a laptop to run the tests again.

Sven Schönhoff commented on 29.01.2017 17:40

Is your device overclocked, Johan?
I was able to reproduce a kernel crash/reboot with a TL-WR1043ND v1 unit overclocked at 430 MHz by running a bidirectional iperf load test for a few minutes. Reverting it to 400 MHz fixed it for me.

Johan van Zoomeren commented on 29.01.2017 18:23

Sven, thanks for trying this out too. No, I haven't been overclocking; using the device just as is, only with a serial console added.

[    0.000000] Clocks: CPU:400.000MHz, DDR:400.000MHz, AHB:200.000MHz, Ref:5.000MHz

How long have you been running the test without overclocking?
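
For anyone comparing units, the clock line from the boot log can be checked mechanically. A small Python sketch, assuming 400 MHz is the stock CPU clock of the TL-WR1043ND v1 (as the two comments above suggest); `parse_clocks` and `is_overclocked` are hypothetical helper names:

```python
import re

# Boot log line quoted above.
BOOT_LINE = ("[    0.000000] Clocks: CPU:400.000MHz, DDR:400.000MHz, "
             "AHB:200.000MHz, Ref:5.000MHz")

# Assumption taken from this thread: the stock CPU clock is 400 MHz.
STOCK_CPU_MHZ = 400.0


def parse_clocks(line):
    """Map each clock name in a 'Clocks:' line to its frequency in MHz."""
    pairs = re.findall(r"(\w+):(\d+(?:\.\d+)?)MHz", line)
    return {name: float(mhz) for name, mhz in pairs}


def is_overclocked(line, stock_mhz=STOCK_CPU_MHZ):
    """True if the reported CPU clock is above the assumed stock value."""
    return parse_clocks(line)["CPU"] > stock_mhz


if __name__ == "__main__":
    print(parse_clocks(BOOT_LINE))
    print(is_overclocked(BOOT_LINE))
```

On a device, the same line should be available by filtering the `dmesg` output for `Clocks:`.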

Sven Schönhoff commented on 29.01.2017 18:26

About 20 minutes. I can repeat the test if that wasn't long enough.

Johan van Zoomeren commented on 29.01.2017 18:54

If you have the time, please let the test run a bit longer, and also generate some normal load on the CPU during the test. During the iperf test I've also been using the router for normal internet browsing, running top, etc. I'm hoping that with some extra CPU load the issue will surface.

Installing packages using opkg seems to cause another issue; in my case it once just rebooted without printing anything to the console. All in all, the 1043ND doesn't seem stable on master yet.

Sven Schönhoff commented on 01.02.2017 17:14

Johan, sorry, but I still can't reproduce this issue. I just tested the current snapshot for 40 minutes with bidirectional iperf load, multiple PuTTY windows running top, and multiple browser windows with LuCI open.
I started from a default setup and only enabled wifi with WPA2-PSK, Force CCMP (AES) and a password.
Could you please explain what you mean by DNAT rule? Is it a port forwarding rule in the firewall settings?

Johan van Zoomeren commented on 01.02.2017 19:51

Thanks, Sven, for having another look. I don't want to put you through yet another round of tests. It looks more like a hardware issue with my device than a software issue. But in case you're still interested:

First I start iperf -c (client) on the laptop(, which then connects over wifi and is NATted to my test server ( network). Because of the NAT, the server won't be able to connect back to the laptop without some help. So after the first iperf is started, I enter:

# iptables -t nat -I PREROUTING -p tcp --dport 5001 -j DNAT --to
# iptables -t filter -I FORWARD -j ACCEPT

Then the second iperf -c (client) is started on the server, which connects to the laptop.

I've been performing this test in my mini lab, because this is how the 1043ND is going to be used, when connected to the Internet.
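
For anyone wanting to repeat the setup, the two rules can be generated for their own addresses. A rough Python sketch; the address `192.0.2.10` is only a documentation placeholder (the real addresses are not given in this report), `dnat_rules` is a hypothetical helper, and `--to-destination` is the full name of the DNAT option:

```python
import shlex


def dnat_rules(laptop_ip, port=5001):
    """Build the two iptables commands described above as argument lists.

    laptop_ip is the wireless client that runs the iperf server;
    5001 is the default iperf TCP port.
    """
    return [
        # Redirect incoming iperf connections to the laptop behind the NAT.
        ["iptables", "-t", "nat", "-I", "PREROUTING", "-p", "tcp",
         "--dport", str(port), "-j", "DNAT", "--to-destination", laptop_ip],
        # Accept all forwarded traffic (very permissive, test lab only).
        ["iptables", "-t", "filter", "-I", "FORWARD", "-j", "ACCEPT"],
    ]


if __name__ == "__main__":
    # Placeholder address, NOT the one used in the original test.
    for rule in dnat_rules("192.0.2.10"):
        print(shlex.join(rule))
```

The sketch only prints the commands; they still have to be run as root on the router after the first iperf client has been started.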

psyborg55 commented on 05.03.2017 20:34

Have you both used the same revision for testing?

