FS#441 - Kernel crash: eth0 (ag71xx): transmit queue 0 timed out #5447
Comments
IronicSven: @johan: Regarding your question in FS#13: it's not possible to run the tests on the smartphone simultaneously, but I will try to borrow a laptop to run the tests again.
IronicSven: Is your device overclocked, Johan?
amain: Sven, thanks for trying this out too. No, I haven't overclocked it; I'm using the device as is, with only a serial console added:
[ 0.000000] Clocks: CPU:400.000MHz, DDR:400.000MHz, AHB:200.000MHz, Ref:5.000MHz
How long have you been running the test without overclocking?
IronicSven: About 20 minutes. I can repeat the test if that wasn't long enough.
amain: If you have the time, please let the test run longer, and also generate some normal CPU load during the test. During the iperf test I've also been using the router for normal internet browsing, running top, etc. I'm hoping the issue will surface with some extra CPU load. Installing packages using opkg seems to cause another issue (https://bugs.lede-project.org/index.php?do=details&task_id=120); in my case the router once just rebooted without printing anything to the console. All in all, the 1043ND doesn't seem stable on master yet.
IronicSven: Johan, sorry, but I still can't reproduce this issue. I just tested the current snapshot for 40 minutes with a bidirectional iperf load, multiple PuTTY windows running top, and multiple browser windows with LuCI open.
amain: Thanks, Sven, for having another look. I don't want to put you through yet another round of tests; it looks like this is more a hardware issue with my device than a software issue. But in case you're still interested, here is the setup. First I start iperf -c (client) on the laptop (192.168.1.152), which connects over Wi-Fi and is then NATted to my test server (192.168.100.0/24 network). Because of the NAT, the server can't connect back to the laptop without some help, so after the first iperf is started, I enter:
iptables -t nat -I PREROUTING -p tcp --dport 5001 -j DNAT --to 192.168.1.152
iptables -t filter -I FORWARD -j ACCEPT
Then the second iperf -c (client) is started on the server, which connects to the laptop. I've been performing this test in my mini lab because this is how the 1043ND is going to be used when connected to the Internet.
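The two iptables rules from the comment above can be wrapped in a small script so the lab setup is repeatable. This is a hedged sketch, not part of the original report: the variable names LAPTOP_IP, IPERF_PORT and APPLY and the function names are hypothetical; 5001 is iperf2's default TCP port, and the DNAT option is spelled out as --to-destination, the target's full option name. By default it only prints the commands; with APPLY=1 it executes them (as root on the router).

```shell
#!/bin/sh
# Sketch: NAT port-forward for the reverse iperf stream in the bidirectional test.
# LAPTOP_IP and IPERF_PORT (hypothetical names) default to values from the report.
# Commands are only printed unless APPLY=1 (hypothetical switch) is set.
LAPTOP_IP="${LAPTOP_IP:-192.168.1.152}"
IPERF_PORT="${IPERF_PORT:-5001}"

# Print the command, or execute it when APPLY=1.
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

setup_port_forward() {
    # DNAT TCP connections arriving on the iperf port to the laptop behind the router.
    run iptables -t nat -I PREROUTING -p tcp --dport "$IPERF_PORT" \
        -j DNAT --to-destination "$LAPTOP_IP"
    # Insert a blanket ACCEPT at the top of FORWARD so the DNATed stream passes.
    run iptables -t filter -I FORWARD -j ACCEPT
}

setup_port_forward
```

Setting LAPTOP_IP in the environment prints the same two rules for a different host; printing by default keeps a test run from changing the router's firewall by accident.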
psyborg55: Have you both used the same revision for testing?
amain:
Device: TL-WR1043ND v1
LEDE: snapshot r3189-12db207
During a simultaneous bidirectional iperf load test, after about 20 minutes, the kernel crashes. I reproduced this several times:
Server 1 <---> 1043ND <---> Laptop via wireless N
LEDE is using a default setup. Only changes:
This bug was actually discovered while testing fixes for FS#13 - Ath9k AP stays up for connected clients but doesn't show in scan on new ones
Serial console output:
[ 1294.022551] ------------[ cut here ]------------
[ 1294.027247] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:306 dev_watchdog+0x1dc/0x260()
[ 1294.035754] NETDEV WATCHDOG: eth0 (ag71xx): transmit queue 0 timed out
[ 1294.042319] Modules linked in: ath9k ath9k_common pppoe ppp_async ath9k_hw ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nn
[ 1294.106287] CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.45 #0
[ 1294.112066] Stack : 803e4844 00000000 00000001 80440000 8042f1dc 8042ee63 803c5e64 00000000 804a378c 8042d4fc 00000200 00100000 0000000a 800a7618 803cb554 80430000 00000003 8042d4fc 803c9960 81809e34 0000000a 800a5594 00000006 00000000 00000000 801f5400 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ...
[ 1294.148122] Call Trace:
[ 1294.150617] [<800a7618>] vprintk_default+0x24/0x30
[ 1294.155474] [<800a5594>] printk+0x2c/0x38
[ 1294.159515] [<801f5400>] wait_for_xmitr+0x84/0xcc
[ 1294.164289] [<80081c3c>] warn_slowpath_common+0xa0/0xd0
[ 1294.169564] [<801a72dc>] dump_stack+0x14/0x28
[ 1294.173975] [<80071eb0>] show_stack+0x50/0x84
[ 1294.178376] [<80081c3c>] warn_slowpath_common+0xa0/0xd0
[ 1294.183661] [<8028ef3c>] dev_watchdog+0x1dc/0x260
[ 1294.188408] [<80081c98>] warn_slowpath_fmt+0x2c/0x38
[ 1294.193450] [<8028ef3c>] dev_watchdog+0x1dc/0x260
[ 1294.198191] [<8028ed60>] dev_watchdog+0x0/0x260
[ 1294.202782] [<800b08d0>] call_timer_fn.isra.5+0x24/0x80
[ 1294.208051] [<800b0b54>] run_timer_softirq+0x1b4/0x1fc
[ 1294.213248] [<800a89f0>] handle_irq_event_percpu+0x154/0x188
[ 1294.218960] [<800841b8>] __do_softirq+0x250/0x298
[ 1294.223721] [<800abdac>] handle_percpu_irq+0x50/0x80
[ 1294.228746] [<8006a9e0>] plat_irq_dispatch+0xd4/0x10c
[ 1294.233848] [<80060bf4>] handle_int+0x134/0x140
[ 1294.238400]
[ 1294.239904] ---[ end trace 17bad011a41ccba7 ]---
[ 1294.244567] eth0: tx timeout
[ 1299.022570] eth0: tx timeout
[ 1304.022581] eth0: tx timeout
[ 1309.022588] eth0: tx timeout
The eth0: tx timeout line is repeated every 5 seconds.
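The repeat interval can be checked directly from a saved serial console capture. The following is a minimal awk-based sketch, not part of the original report: the function name tx_timeout_intervals is hypothetical, and it assumes one kernel message per line in the "[ seconds] message" format shown above. It prints the gap, in seconds, between consecutive eth0: tx timeout messages.

```shell
#!/bin/sh
# Sketch: print the time between consecutive "eth0: tx timeout" kernel messages.
# Reads the files given as arguments, or stdin if none are given.
# tx_timeout_intervals is a hypothetical helper name.
tx_timeout_intervals() {
    awk '/eth0: tx timeout/ {
        split($0, f, "]")        # f[1] holds the "[ seconds" prefix
        ts = f[1]
        sub(/^\[ */, "", ts)     # strip the opening bracket and padding spaces
        if (prev != "")
            printf "%.6f\n", ts - prev
        prev = ts
    }' "$@"
}
```

Fed the four tx timeout lines from this report (as a file argument or on stdin), it prints 4.778003, 5.000011 and 5.000007: the first gap is somewhat shorter, and the following messages are 5 s apart. Other lines, such as the NETDEV WATCHDOG warning, are ignored because they don't match the pattern.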