- Status Closed
- Percent Complete
- Task Type Bug Report
- Category Base system
- Assigned To No-one
- Operating System All
- Severity Low
- Priority Medium
- Reported Version openwrt-18.06
- Due in Version Undecided
- Due Date Undecided
- Private
Opened by Baptiste Jonglez - 13.06.2020
Last edited by Baptiste Jonglez - 06.07.2020
FS#3177 - procd fails to start rpcd on 18.06.8 because of a libubox regression
I’ve been trying to debug this regression in 18.06.8: in some circumstances, rpcd fails to start.
This is mostly visible as it breaks LuCI, see e.g. https://github.com/openwrt/luci/issues/3773 or https://forum.openwrt.org/t/luci-error-after-upgrade-to-r10949-or-r10951-etc-config-luci-seems-to-be-corrupt/56880
To reproduce on 18.06.8:
- remove the rpcd section in /etc/config/rpcd
- reboot (this is important)
- result: rpcd is not started, and the following log message is printed in logread:
Thu Feb 27 21:26:37 2020 daemon.info procd: Not starting instance rpcd::instance1, command not set
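The symptom can be checked from a shell. This is a minimal sketch, assuming a POSIX shell with `grep` (as provided by BusyBox); the helper name `rpcd_start_refused` is hypothetical and only matches the log message shown above when it is fed on standard input:

```shell
# Hypothetical helper: succeeds if the log text on stdin contains the
# procd message quoted in this report for the rpcd instance.
rpcd_start_refused() {
    grep -q 'procd: Not starting instance rpcd::instance1, command not set'
}

# Intended use on an affected device (not run here):
#   logread | rpcd_start_refused && echo "procd refused to start rpcd"
```

The helper reads from standard input rather than calling logread itself, so it can also be run against a saved log file.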
More details:
When this issue happens, it becomes impossible to start rpcd with procd, even when adding back the rpcd section:
root@OpenWrt:~# PROCD_DEBUG=1 /etc/init.d/rpcd start
{ "name": "rpcd", "script": "\/etc\/init.d\/rpcd", "instances": { "instance1": { "command": [ "\/sbin\/rpcd" ] } }, "triggers": [ ], "data": { } }
root@OpenWrt:~# ps | grep rpc
 1614 root 1200 S grep rpc
root@OpenWrt:~# uci add rpcd rpcd
cfg027c4e
root@OpenWrt:~# uci set rpcd.@rpcd[-1].timeout=30
root@OpenWrt:~# uci commit
root@OpenWrt:~# PROCD_DEBUG=1 /etc/init.d/rpcd start
{ "name": "rpcd", "script": "\/etc\/init.d\/rpcd", "instances": { "instance1": { "command": [ "\/sbin\/rpcd", "-t", "30" ] } }, "triggers": [ ], "data": { } }
root@OpenWrt:~# ps | grep rpc
 1636 root 1200 S grep rpc
root@OpenWrt:~# uci set rpcd.@rpcd[-1].socket=/var/run/ubus.sock
root@OpenWrt:~# uci commit
root@OpenWrt:~# PROCD_DEBUG=1 /etc/init.d/rpcd start
{ "name": "rpcd", "script": "\/etc\/init.d\/rpcd", "instances": { "instance1": { "command": [ "\/sbin\/rpcd", "-s", "\/var\/run\/ubus.sock", "-t", "30" ] } }, "triggers": [ ], "data": { } }
root@OpenWrt:~# ps | grep rpc
 1680 root 1200 S grep rpc
However, running rpcd manually works perfectly well (and fixes LuCI):
root@OpenWrt:~# rpcd
Workaround:
To work around the issue, it is necessary to:
- add an rpcd section with either a socket or timeout option
- reboot
At this point, rpcd is started correctly, and everything works fine. It is even possible to delete the rpcd section and restart rpcd: it will still start correctly.
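The first workaround step can be sketched with the same uci commands used in the session earlier in this report. The function name `add_rpcd_section` is hypothetical, and `timeout=30` is simply the value used in that session; a reboot is still needed afterwards.

```shell
# Hypothetical wrapper around the uci commands shown earlier in this report.
# It adds an rpcd section with a timeout option and commits the change.
# It does not reboot; that step is left to the operator.
add_rpcd_section() {
    uci add rpcd rpcd &&
    uci set 'rpcd.@rpcd[-1].timeout=30' &&
    uci commit
}
```

The uci path is quoted so that the shell does not try to expand the `[-1]` index as a glob pattern.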
Finding the root cause:
There are very few commits between 18.06.7 and 18.06.8. None of these commits touches procd or rpcd.
However, there has been a libubox fix in 82fbd857471292ca71dc06b05f11089962f33a4f. This is currently the prime suspect: I will try to revert this commit, and also try with the further libubox fixes that have not yet been backported.
06.07.2020 14:09
Reason for closing: Fixed
Additional comments about closing:
https://git.openwrt.org/2dcf46b079b3e31aa70b1cfe64fd12e727683c16
I could confirm libubox is the root cause:
For this last one, I backported these 4 fixes in libubox:
Patch to backport the libubox fixes sent: https://patchwork.ozlabs.org/project/openwrt/patch/20200613181740.988875-1-baptiste@bitsofnetworks.org/