
FS#1544 - High load on Ubiquiti Nanostation XM - maybe related to "workingset_refault" (vmstat) #6497

Closed
openwrt-bot opened this issue May 11, 2018 · 6 comments

@openwrt-bot

sumpfralle:

Our local wireless community uses a lot of Ubiquiti devices.

They all worked well with Chaos Calmer.

With LEDE 17.01 we started to see load issues on Nanostation M5 XM devices (the older Nanostation model with only 32 MB of RAM). We have not noticed the issue on any other device so far.

After a few hours of uptime the routers start to develop persistent high load (>8) and usually "recover" only after a reboot. "wifi up" / "wifi down" does not seem to affect the issue.

The problem is almost non-existent on devices using only a single ethernet port. Devices using both ethernet ports suffer greatly (problems usually start within 24 hours). Thus I could imagine that issue #296 (https://bugs.openwrt.org/index.php?do=details&task_id=296) is related (just a wild guess).

Traffic on the wireless interface seems to increase the likelihood of the problem (maybe it is CPU utilization in general).

"top" and other tools do not show processes, that could cause the high load.

The only unusual metric that seems to be connected to the high-load situation is "workingset_refault" (see /proc/vmstat).
See the following output:

root@AP-1-96:~# while sleep 10; do grep workingset_ /proc/vmstat; done
workingset_refault 1304983
workingset_activate 392198
workingset_nodereclaim 10330
workingset_refault 1308585
workingset_activate 393391
workingset_nodereclaim 10352
workingset_refault 1308671
workingset_activate 393412
workingset_nodereclaim 10352
workingset_refault 1310284
workingset_activate 393940
workingset_nodereclaim 10374
workingset_refault 1317360
workingset_activate 396226
workingset_nodereclaim 10454
workingset_refault 1317465
workingset_activate 396251
workingset_nodereclaim 10454
workingset_refault 1317540
workingset_activate 396292
workingset_nodereclaim 10454
workingset_refault 1324449
workingset_activate 398402
workingset_nodereclaim 10508
workingset_refault 1328418
workingset_activate 399908
workingset_nodereclaim 10536
workingset_refault 1328796
workingset_activate 400114
workingset_nodereclaim 10536
workingset_refault 1329186
workingset_activate 400213
workingset_nodereclaim 10546
workingset_refault 1333889
workingset_activate 401528
workingset_nodereclaim 10594

Above you see about 13k "workingset_refault" events within 60 seconds. The "workingset_refault" value stays at zero for routers with the same kernel that do not show this problem. Thus I could imagine that this is related to the high load.
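
To put the raw counter into perspective, a small loop like the following (just a sketch using BusyBox ash and awk, not something I ran for the output above) prints the increase of workingset_refault per 10-second interval instead of the absolute value:

# Print how much workingset_refault grows per 10-second interval.
prev=""
while sleep 10; do
    cur=$(awk '$1 == "workingset_refault" { print $2 }' /proc/vmstat)
    [ -n "$prev" ] && echo "workingset_refault: +$((cur - prev)) in 10s"
    prev="$cur"
done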

Now I am running out of ideas, how to research the issue. Maybe someone can give me a hint, what I could try?

Just for reference: we are also discussing this issue in the bug tracker of our local wireless community (https://dev.opennet-initiative.de/ticket/187 - only in German). That discussion may be a bit hard to read, as we were hunting down different potential causes of the problem; sadly, each of our theories dissolved without giving a hint at the root cause.

@openwrt-bot

sumpfralle:

Sorry - there was a confusing typo in my description above:

Above you see 13k “workingset_refault” events within 60 seconds. The “workingset_refault” value stays at zero for routers with the same kernel, that do now show this problem.

"that do now show this problem" -> "that do not show this problem"

@openwrt-bot

sumpfralle:

Does someone have an idea how I could debug this issue?

It is a bit sad to see that we are starting to replace the XM devices in our local wireless community, since they cannot run the latest firmware releases due to the device-specific excessive load described above.

@openwrt-bot

ynezz:

perf top could be a good start

@openwrt-bot

sumpfralle:

perf top could be a good start

Thank you for this suggestion!

I just built an image with perf and tried to run it, but sadly it segfaults :(

I am a bit at a loss here as to how to investigate this, given the limited resources of the device. Do you have any advice for me?

Thank you!

@openwrt-bot

ynezz:

I just built an image with perf and tried to run it, but sadly it segfaults :(

Is this happening on the latest ar71xx/ath79? Do you have perf support enabled in the kernel as well? I'm using the following configuration on ath79 (it should work on ar71xx as well) and perf top works for me:

CONFIG_KERNEL_DYNAMIC_DEBUG=y
CONFIG_KERNEL_DYNAMIC_FTRACE=y
CONFIG_KERNEL_FTRACE=y
CONFIG_KERNEL_FTRACE_SYSCALLS=y
CONFIG_KERNEL_FUNCTION_GRAPH_TRACER=y
CONFIG_KERNEL_FUNCTION_PROFILER=y
CONFIG_KERNEL_FUNCTION_TRACER=y
CONFIG_KERNEL_KPROBES=y
CONFIG_KERNEL_KPROBE_EVENT=y
CONFIG_KERNEL_KPROBE_EVENTS=y
CONFIG_KERNEL_PERF_EVENTS=y
CONFIG_KERNEL_PROFILING=y
CONFIG_PACKAGE_iperf3=y
CONFIG_PACKAGE_perf=y
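
With these options enabled, a typical session on the device could look roughly like this (standard perf invocations; exact behaviour depends on your perf build):

# Live view of the hottest kernel and userspace symbols:
perf top
# Or record a system-wide profile for 30 seconds and inspect it afterwards:
perf record -a -g -- sleep 30
perf report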

@aparcar

aparcar commented Dec 2, 2022

This issue is for an EOL release; please comment if this bug still affects you in a currently supported release.

@aparcar closed this as not planned (won't fix, can't repro, duplicate, stale) on Dec 2, 2022