
FS#3177 - procd fails to start rpcd on 18.06.8 because of a libubox regression #8048

Closed
openwrt-bot opened this issue Jun 13, 2020 · 2 comments

@openwrt-bot

bjonglez:

I've been trying to debug this regression in 18.06.8: in some circumstances, rpcd fails to start.

This is mostly visible because it breaks LuCI; see e.g. openwrt/luci#3773 or https://forum.openwrt.org/t/luci-error-after-upgrade-to-r10949-or-r10951-etc-config-luci-seems-to-be-corrupt/56880

To reproduce on 18.06.8:

  • remove the rpcd section in /etc/config/rpcd
  • reboot (this is important)
  • result: rpcd is not started, and the following log message is printed in logread:
Thu Feb 27 21:26:37 2020 daemon.info procd: Not starting instance rpcd::instance1, command not set

More details:

When this issue happens, it becomes impossible to start rpcd with procd, even after adding back the rpcd section:

root@OpenWrt:~# PROCD_DEBUG=1 /etc/init.d/rpcd start
{ "name": "rpcd", "script": "\/etc\/init.d\/rpcd", "instances": { "instance1": { "command": [ "\/sbin\/rpcd" ] } }, "triggers": [ ], "data": { } }
root@OpenWrt:~# ps | grep rpc
1614 root 1200 S grep rpc

root@OpenWrt:~# uci add rpcd rpcd
cfg027c4e
root@OpenWrt:~# uci set rpcd.@rpcd[-1].timeout=30
root@OpenWrt:~# uci commit
root@OpenWrt:~# PROCD_DEBUG=1 /etc/init.d/rpcd start
{ "name": "rpcd", "script": "/etc/init.d/rpcd", "instances": { "instance1": { "command": [ "/sbin/rpcd", "-t", "30" ] } }, "triggers": [ ], "data": { } }
root@OpenWrt:~# ps | grep rpc
1636 root 1200 S grep rpc

root@OpenWrt:~# uci set rpcd.@rpcd[-1].socket=/var/run/ubus.sock
root@OpenWrt:~# uci commit
root@OpenWrt:~# PROCD_DEBUG=1 /etc/init.d/rpcd start
{ "name": "rpcd", "script": "/etc/init.d/rpcd", "instances": { "instance1": { "command": [ "/sbin/rpcd", "-s", "/var/run/ubus.sock", "-t", "30" ] } }, "triggers": [ ], "data": { } }
root@OpenWrt:~# ps | grep rpc
1680 root 1200 S grep rpc

However, running rpcd manually works perfectly well (and fixes LuCI):

root@OpenWrt:~# rpcd

Workaround:

To work around the issue, it is necessary to:

  • add an rpcd section with either a socket or a timeout option
  • reboot

At this point, rpcd is started correctly, and everything works fine. It is even possible to delete the rpcd section and restart rpcd; it will still start correctly.

Finding the root cause:

There are very few commits between 18.06.7 and 18.06.8, and none of them touches procd or rpcd.

However, commit 82fbd85 brought in a libubox fix. It is currently the prime suspect: I will try reverting this commit, and also try the later libubox fixes that have not yet been backported.

@openwrt-bot

bjonglez:

I can confirm that libubox is the root cause:

  • r8035-b98bfd4e9b (latest 18.06): same issue, rpcd does not start
  • r8035-b98bfd4e9b + reverting 82fbd85: rpcd now starts correctly
  • r8035-b98bfd4e9b + backport of latest libubox fixes: rpcd now starts correctly

For this last one, I backported these 4 fixes in libubox:
