OpenWrt/LEDE Project

  • Status Closed
  • Percent Complete 100%
  • Task Type Bug Report
  • Category Base system
  • Assigned To No-one
  • Operating System All
  • Severity Low
  • Priority Medium
  • Reported Version openwrt-18.06
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Baptiste Jonglez - 13.06.2020
Last edited by Baptiste Jonglez - 06.07.2020

FS#3177 - procd fails to start rpcd on 18.06.8 because of a libubox regression

I’ve been trying to debug this regression in 18.06.8: in some circumstances, rpcd fails to start.

This is mostly visible because it breaks LuCI, see e.g. https://github.com/openwrt/luci/issues/3773 or https://forum.openwrt.org/t/luci-error-after-upgrade-to-r10949-or-r10951-etc-config-luci-seems-to-be-corrupt/56880

To reproduce on 18.06.8:

  • remove the rpcd section in /etc/config/rpcd
  • reboot (this is important)
  • result: rpcd is not started, and logread shows the following message:
Thu Feb 27 21:26:37 2020 daemon.info procd: Not starting instance rpcd::instance1, command not set
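
A quick way to check for this symptom after the reboot is to filter the system log for the message above. This is only a minimal sketch: the function name rpcd_not_started is hypothetical, and it matches nothing but the message text quoted in this report.

```shell
# Hypothetical helper: exits 0 if the log text on stdin contains the
# procd message quoted above for an rpcd instance, non-zero otherwise.
rpcd_not_started() {
    grep -q 'procd: Not starting instance rpcd::.*command not set'
}

# On the device, feed it the system log:
#   logread | rpcd_not_started && echo "procd did not start rpcd"
```

The helper reads from stdin so that it can also be run against a saved copy of the log.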

More details:

When this issue happens, it becomes impossible to start rpcd with procd, even after adding back the rpcd section:

root@OpenWrt:~# PROCD_DEBUG=1 /etc/init.d/rpcd start
{ "name": "rpcd", "script": "\/etc\/init.d\/rpcd", "instances": { "instance1": { "command": [ "\/sbin\/rpcd" ] } }, "triggers": [ ], "data": { } }
root@OpenWrt:~# ps | grep rpc
 1614 root      1200 S    grep rpc

root@OpenWrt:~# uci add rpcd rpcd
cfg027c4e
root@OpenWrt:~# uci set rpcd.@rpcd[-1].timeout=30
root@OpenWrt:~# uci commit
root@OpenWrt:~# PROCD_DEBUG=1 /etc/init.d/rpcd start
{ "name": "rpcd", "script": "\/etc\/init.d\/rpcd", "instances": { "instance1": { "command": [ "\/sbin\/rpcd", "-t", "30" ] } }, "triggers": [ ], "data": { } }
root@OpenWrt:~# ps | grep rpc
 1636 root      1200 S    grep rpc

root@OpenWrt:~# uci set rpcd.@rpcd[-1].socket=/var/run/ubus.sock
root@OpenWrt:~# uci commit
root@OpenWrt:~# PROCD_DEBUG=1 /etc/init.d/rpcd start
{ "name": "rpcd", "script": "\/etc\/init.d\/rpcd", "instances": { "instance1": { "command": [ "\/sbin\/rpcd", "-s", "\/var\/run\/ubus.sock", "-t", "30" ] } }, "triggers": [ ], "data": { } }
root@OpenWrt:~# ps | grep rpc
 1680 root      1200 S    grep rpc

However, running rpcd manually works perfectly well (and fixes LuCI):

root@OpenWrt:~# rpcd
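
In the transcripts above, the only line matched by `ps | grep rpc` is the grep process itself, which means rpcd is not running. The check can be made less ambiguous with the usual bracket idiom, sketched below. The function name rpcd_in_ps is hypothetical; it reads `ps` output on stdin.

```shell
# Hypothetical helper: exits 0 if the ps output on stdin lists an rpcd
# process. The pattern '[r]pcd' matches the text "rpcd" but not the grep
# command line itself, which contains the literal string "[r]pcd".
rpcd_in_ps() {
    grep -q '[r]pcd'
}

# On the device:
#   ps | rpcd_in_ps && echo "rpcd is running" || echo "rpcd is not running"
```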

Workaround:

To work around the issue, it is necessary to:

  • add an rpcd section with either a socket or timeout option
  • reboot

At this point, rpcd is started correctly, and everything works fine. It is even possible to delete the rpcd section and restart rpcd; it will still start correctly.
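
The workaround steps can be sketched as a small script that reuses the uci commands from the transcript above. The run wrapper and the DRY_RUN switch are assumptions added for illustration (they are not part of OpenWrt): with DRY_RUN=1 the commands are printed instead of executed, which allows a review before rebooting the device.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Workaround as described in this report: add an rpcd section with a
# timeout option (the value 30 is taken from the transcript), commit,
# then reboot.
apply_rpcd_workaround() {
    run uci add rpcd rpcd
    run uci set 'rpcd.@rpcd[-1].timeout=30'   # quoted so the shell does not glob [-1]
    run uci commit
    run reboot
}
```

Running `DRY_RUN=1 apply_rpcd_workaround` prints the four commands without touching the configuration.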

Finding the root cause:

There are very few commits between 18.06.7 and 18.06.8. None of these commits touches procd or rpcd.

However, there has been a libubox fix in 82fbd857471292ca71dc06b05f11089962f33a4f. This is currently the prime suspect: I will try to revert this commit, and also try with the further libubox fixes that have not yet been backported.

Closed by  Baptiste Jonglez
06.07.2020 14:09
Reason for closing:  Fixed
Additional comments about closing:  

https://git.openwrt.org/2dcf46b079b3e31aa70b1cfe64fd12e727683c16

Project Manager
Baptiste Jonglez commented on 13.06.2020 18:06

I can confirm that libubox is the root cause:

  • r8035-b98bfd4e9b (latest 18.06): same issue, rpcd does not start
  • r8035-b98bfd4e9b + reverting 82fbd8574: rpcd now starts correctly
  • r8035-b98bfd4e9b + backport of latest libubox fixes: rpcd now starts correctly

For this last one, I backported these 4 fixes in libubox:

Project Manager
Baptiste Jonglez commented on 13.06.2020 18:22
