OpenWrt/LEDE Project

  • Status Closed
  • Percent Complete 100%
  • Task Type Bug Report
  • Category Base system
  • Assigned To No-one
  • Operating System All
  • Severity Low
  • Priority Very Low
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Hertog Jan - 10.09.2017
Last edited by Mathias Kresin - 13.09.2017

FS#1004 - TP-Link Archer C2600 very high idle load

After flashing a TP-Link Archer C2600 with a new master image, the device idles with a load of between 0.9 and 1.5. `top` shows a sys% of between 40% and 55%, with a `[kworker/1:1]` taking up most of that at around 38-40%, followed by a `[kworker/0:1]` at around 2-5%. Furthermore, using the device for internet access ‘feels’ sluggish, and via wifi I could not get much more than around 5-8 Mbit/s up and down (whereas I should be able to achieve at least 200/30 Mbit/s via wifi).

The device: TP-Link Archer C2600 v1
Software: LEDE master r4807-bd24d53, kernel 4.9.47.
Steps: perform a clean install and observe the problem.
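
For anyone who wants a number to compare against the `top` readings above, here is a minimal sketch of how a sys% figure can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. The function names are made up for illustration, and it is only an approximation of what `top` prints, not its actual implementation.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(value) for value in fields[1:]]


def sys_percent(before, after):
    """Share of CPU time spent in kernel (system) mode between two samples.

    Only the first eight counters are summed (user, nice, system, idle,
    iowait, irq, softirq, steal); guest time is already part of user time.
    """
    old = parse_cpu_line(before)[:8]
    new = parse_cpu_line(after)[:8]
    deltas = [n - o for o, n in zip(old, new)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    # Index 2 is the 'system' counter.
    return 100.0 * deltas[2] / total
```

To use it, read the first line of `/proc/stat` twice, a few seconds apart, and pass both strings to `sys_percent`.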

As requested in IRC, `ps` output (also attached):

  PID USER       VSZ STAT COMMAND
    1 root      1328 S    /sbin/procd
    2 root         0 SW   [kthreadd]
    3 root         0 SW   [ksoftirqd/0]
    5 root         0 SW<  [kworker/0:0H]
    6 root         0 SW   [kworker/u4:0]
    7 root         0 SW   [rcu_sched]
    8 root         0 SW   [rcu_bh]
    9 root         0 SW   [migration/0]
   10 root         0 SW<  [lru-add-drain]
   11 root         0 SW   [watchdog/0]
   12 root         0 SW   [cpuhp/0]
   13 root         0 SW   [cpuhp/1]
   14 root         0 SW   [watchdog/1]
   15 root         0 SW   [migration/1]
   16 root         0 SW   [ksoftirqd/1]
   18 root         0 SW<  [kworker/1:0H]
   19 root         0 SW   [oom_reaper]
   20 root         0 SW<  [writeback]
   21 root         0 SW<  [crypto]
   22 root         0 SW<  [bioset]
   23 root         0 SW<  [kblockd]
   24 root         0 SW<  [watchdogd]
   25 root         0 SW   [kworker/0:1]
   26 root         0 SW   [kswapd0]
   27 root         0 SW<  [vmstat]
   34 root         0 SW<  [pencrypt]
   35 root         0 SW<  [pdecrypt]
   52 root         0 SW<  [bioset]
   53 root         0 SW<  [bioset]
   54 root         0 SW<  [bioset]
   55 root         0 SW<  [bioset]
   56 root         0 SW<  [bioset]
   57 root         0 SW<  [bioset]
   58 root         0 SW<  [bioset]
   59 root         0 SW<  [bioset]
   60 root         0 SW   [spi32766]
   61 root         0 SW<  [bioset]
   62 root         0 RW   [kworker/1:1]
   63 root         0 SW<  [bioset]
   64 root         0 SW<  [bioset]
   65 root         0 SW<  [bioset]
   66 root         0 SW<  [bioset]
   67 root         0 SW<  [bioset]
   68 root         0 SW<  [bioset]
   69 root         0 SW<  [bioset]
   70 root         0 SW<  [bioset]
   71 root         0 SW<  [bioset]
   72 root         0 SW<  [bioset]
   73 root         0 SW<  [bioset]
   74 root         0 SW<  [bioset]
   75 root         0 SW<  [bioset]
   76 root         0 SW<  [bioset]
   77 root         0 SW<  [bioset]
   78 root         0 SW<  [bioset]
   79 root         0 SW<  [bioset]
   80 root         0 SW<  [bioset]
   81 root         0 SW<  [bioset]
   82 root         0 SW<  [bioset]
   83 root         0 SW<  [bioset]
   84 root         0 SW<  [bioset]
   85 root         0 SW<  [bioset]
   86 root         0 SW<  [bioset]
   87 root         0 SW<  [bioset]
   88 root         0 SW   [kworker/u4:1]
   93 root         0 SW<  [ipv6_addrconf]
   94 root         0 SW<  [kworker/0:1H]
   95 root         0 SW<  [kworker/1:1H]
   98 root         0 SW   [irq/71-gpio-key]
   99 root         0 SW   [irq/86-gpio-key]
  100 root         0 SW   [irq/87-gpio-key]
  101 root         0 SW   [irq/38-gpio-key]
  102 root         0 SW<  [ata_sff]
  179 root       968 S    /sbin/ubusd
  180 root       676 S    /sbin/askfirst /usr/libexec/login.sh
  202 root         0 SW<  [cfg80211]
  203 root         0 SW<  [ath10k_wq]
  204 root         0 SW<  [ath10k_aux_wq]
  222 root         0 SW<  [ath10k_wq]
  223 root         0 SW<  [ath10k_aux_wq]
  524 root      1012 S    /sbin/logd -S 64
  533 root      1320 S    /sbin/rpcd
  584 root      1452 S    /sbin/netifd
  600 root      1216 S    /usr/sbin/odhcpd
  737 root       808 S    odhcp6c -s /lib/netifd/dhcpv6.script -P0 -t120 eth0.2
  739 root      1040 S    udhcpc -p /var/run/udhcpc-eth0.2.pid -s /lib/netifd/dhcp.script -f -t 0 -
  770 root       816 S    /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300 -T 3
  835 dnsmasq   1132 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg02411c -k -x /var/run/dnsma
  852 root      2140 S    /usr/sbin/uhttpd -f -h /www -r LEDE -x /cgi-bin -u /ubus -t 60 -T 30 -k 2
  950 root       884 S    /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300 -T 3
  958 root      1044 S    -ash
 1063 root         0 SWN  [jffs2_gcd_mtd13]
 1124 root      1040 S<   /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.lede.pool.ntp.org -p
 1497 root         0 SW   [kworker/1:0]
 1805 root         0 SW   [kworker/0:2]
 1806 root         0 SW   [kworker/1:2]
 1808 root         0 SW   [kworker/0:0]
 1814 root      1040 R    ps
Closed by  Mathias Kresin
13.09.2017 13:38
Reason for closing:  Fixed
Additional comments about closing:  

Fixed with https://git.lede-project.org/eff3549c5883a9abc5dbff00c084cabbcfdf4437

ashbenz commented on 10.09.2017 21:01

I can confirm this on an Archer C2600 v1.1 that I have access to: a kworker process hogs one of the two cores.
I see normal load and performance on my v1.0, though.

Is your device a v1.1?

Paul Oranje commented on 10.09.2017 21:47

On a C2600 running 17.01-SNAPSHOT, r3428-443d705:

# top
Mem: 122108K used, 359560K free, 316K shrd, 5192K buff, 16920K cached
CPU: 0% usr 7% sys 0% nic 92% idle 0% io 0% irq 0% sirq
Load average: 0.17 0.14 0.11 1/99 8470

I cannot tell which C2600 hardware version this is (the device is deployed remotely).

Project Manager
Baptiste Jonglez commented on 10.09.2017 22:01

I have a C2600 v1.1 running a recent 17.01 build, and I never saw this issue.

So it must be something recently introduced in master.

ashbenz commented on 10.09.2017 23:30

Actually, just reverting a couple dozen commits from current trunk removes the problem, and load returns to normal.

So it's recently introduced (possibly the ubox/busybox bumps?)

Hertog Jan commented on 11.09.2017 05:37

Hmm, I'm willing to `git bisect` those few dozen commits, let me check tonight.

I'm also using a v1.1, according to the sticker on the back. TP-Link software said v1. Never assume, I guess.
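
For context on why bisecting "a few dozen" commits is cheap: `git bisect` does a binary search between a known-good and a known-bad commit, so each flash-and-test round halves the remaining range. The sketch below only illustrates that search and is not git's own code; `first_bad` and `is_bad` are hypothetical names, and `is_bad` stands in for "build, flash, and check whether sys% is high".

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    `commits` is ordered oldest to newest, commits[0] is known good and
    commits[-1] is known bad. Returns (first_bad_commit, commits_tested).
    """
    lo, hi = 0, len(commits) - 1  # lo: newest known good, hi: oldest known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tested
```

With 25 candidate commits this needs at most five test builds.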

Hertog Jan commented on 11.09.2017 22:34

Turns out 4d8a66d9346373c2a7fcac5bdae3f662a9dbd9df raises sys% to around 30%, not sure why yet. I'm going to dig deeper later this week. With the commit before that, my device idles with a load of 0.3 and a sys% of around 5 to 8%, which is already better, but my main device, which I'm currently using, has a load of 0.0 with 98% idle, so I'm sure there's room for improvement.

Project Manager
Baptiste Jonglez commented on 12.09.2017 13:05

There are some explanations about the issue here: https://github.com/lede-project/source/pull/1277#issuecomment-328789438

Hertog Jan commented on 12.09.2017 20:49

On GitHub there was the suggestion to change the LED triggers to none. I have done so on the last good commit I know of, and sys% has dropped from 5-8% to now 1-3%. There is still one kworker process taking a little bit of sys%.

I have flashed the first bad commit (r4776-4d8a66d) and disabled the LED triggers. On this revision that also lowers sys% to 1-3% with a load of around 0.09 to 0.25. This last bit might be caused by another issue, but I think from this test we can be fairly sure there is an interaction between driver and LED trigger (other than the LED blinking, I mean).
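
For anyone who wants to repeat this test, here is a minimal sketch of setting every LED trigger to none through sysfs. It assumes the usual `/sys/class/leds/<name>/trigger` layout, where the file lists the available triggers with the active one in square brackets. The function names are made up, and a stock image may not ship Python at all, so treat it as an illustration of the steps rather than a ready-made tool. The change is runtime only and is lost on reboot.

```python
from pathlib import Path

LED_ROOT = Path("/sys/class/leds")


def active_trigger(trigger_text):
    """Return the trigger shown in [brackets] in a sysfs trigger file."""
    for token in trigger_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return "none"


def disable_all_triggers(led_root=LED_ROOT):
    """Write 'none' to every LED whose trigger is not already none.

    Returns the names of the LEDs that were changed.
    """
    changed = []
    for trigger_file in sorted(led_root.glob("*/trigger")):
        if active_trigger(trigger_file.read_text()) != "none":
            trigger_file.write_text("none")
            changed.append(trigger_file.parent.name)
    return changed
```

Calling `disable_all_triggers()` as root returns the list of LEDs it touched, which makes it easy to restore them afterwards.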

Project Manager
Jonas Gorski commented on 13.09.2017 08:43

The issue might be that on ipq806x, the switch chips are connected through a GPIO bit banged MDIO bus, so any switch register accesses are expensive (relatively).
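
To give a rough feel for "expensive", here is a back-of-the-envelope sketch. The frame layout is the standard IEEE 802.3 Clause 22 MDIO frame (64 bits including the preamble). Everything else is an assumption for illustration: the function names are hypothetical, the per-phase GPIO delay is a free parameter, and the number of frames a driver needs per switch register access is not taken from this report.

```python
# IEEE 802.3 Clause 22 MDIO frame layout, in bits.
CLAUSE22_FIELDS = {
    "preamble": 32,
    "start": 2,
    "opcode": 2,
    "phy_address": 5,
    "register_address": 5,
    "turnaround": 2,
    "data": 16,
}


def frame_bits():
    """Total number of clock cycles needed for one Clause 22 frame."""
    return sum(CLAUSE22_FIELDS.values())


def transfer_time_us(frames, gpio_delay_us):
    """Estimated busy time for `frames` bit-banged MDIO frames.

    Assumes one delay of `gpio_delay_us` for the low phase and one for
    the high phase of every clock cycle (an assumption, not measured).
    """
    return frames * frame_bits() * 2 * gpio_delay_us
```

Under an assumed delay of one microsecond per clock phase, `transfer_time_us(1, 1.0)` gives 128.0 microseconds for a single frame, during which the CPU is busy toggling GPIOs.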

Project Manager
Mathias Kresin commented on 13.09.2017 13:44

@Jonas: I see the same issue on lantiq.

Adriano commented on 19.09.2017 18:24

Quick question.
Is this bug affecting 17.01-SNAPSHOT?
