New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FS#2483 - procd: watchdog timeout is too high for sun8i #6124
Comments
ynezz: Can you please attach (unedited) dmesg log?
indeed, this could be improved
your reasoning doesn't make much sense to me, as the watchdog should use default max. timeout of 16 seconds but it should be pinged every 5 seconds by procd, thus have no reason for reset |
halmartin: //Can you please attach (unedited) dmesg log?// It's hard for me to attach a dmesg log, since the board takes about 7 seconds to reach the uart console and the watchdog reboots it after 10 seconds. I've attached the full boot log as captured from the serial console. I booted with procd in //init_debug=4// //your reasoning doesn't make much sense to me, as the watchdog should use default max. timeout of 16 seconds but it should be pinged every 5 seconds by procd, thus have no reason for reset// I agree, from the code it appears that procd should ping the watchdog every 5 seconds and since the sun8i watchdog timeout is 16 seconds, this shouldn't cause the board to reset. However this is not the experience I've had. If I boot with //init=/bin/sh// the board will not reset. As soon as the unpatched procd starts though, there's only a few seconds before the watchdog resets the SoC. Since I have only a few seconds of console access via uart, it's very difficult to debug the root cause. Apologies, I can't seem to delete the uart log file I've attached. The first boot is my experimenting with //init=/bin/sh// and the following boot is with procd. |
ynezz: So from your dmesg it looks fine:
I don't see a reason why it should reboot after ~10s. Can you please check the return value of the ioctl, something like this? Disabling the ioctl completely would be worth trying as well. diff --git a/watchdog.c b/watchdog.c
index 9d770b4470a9..290432dd08c4 100644
--- a/watchdog.c
+++ b/watchdog.c
@@ -87,10 +87,16 @@ static void watchdog_close(void)
static int watchdog_set_drv_timeout(void)
{
+ int r;
+
if (wdt_fd < 0)
return -1;
- return ioctl(wdt_fd, WDIOC_SETTIMEOUT, &wdt_drv_timeout);
+ r = ioctl(wdt_fd, WDIOC_SETTIMEOUT, &wdt_drv_timeout);
+ if (r < 0)
+ ERROR("WDT failed to set timeout: %m\n");
+
+ return r;
}
void watchdog_set_magicclose(bool val) |
halmartin: I tried applying the above patch, and indeed it appears the ioctl is failing:
Full boot log attached. |
ynezz:
Ok, good to know, but this shouldn't influence the 16s watchdog timeout. Anyway, can you try it without the ioctl altogether, something like this:
Do I understand that log correctly, that it reboots after ~4 seconds? This is really strange and make no sense. |
halmartin: I think the issue is that //procd// is trying to set a timeout value above the maximum that //sunxi-wdt// supports, and thus the kernel is returning -EINVAL. This is confirmed with the following patch:
I believe the kernel side is here: https://github.com/torvalds/linux/blob/master/drivers/watchdog/watchdog_core.c#L138-L139 You can see //procd// is receiving //-EINVAL// until wdt_drv_timeout is below the maximum supported by //sunxi-wdt//:
Unfortunately this doesn't completely resolve the issue for me, the watchdog still resets the SoC a few seconds later. |
halmartin: Even the following patch isn't enough to prevent the watchdog from resetting the SoC:
It is stable when I set //wdt_frequency = 1//, but otherwise it seems to reset after a slightly longer lifetime. :( |
ynezz:
Sure, that's clear.
That's why I've asked you (two times already) to remove that WDIOC_SETTIMEOUT ioctl altogether. |
halmartin: Removing the ioctl does not resolve the issue. Patch: --- a/watchdog.c 2019-09-13 11:04:30.725915275 +0000
+++ b/watchdog.c 2019-09-13 11:46:53.658245623 +0000
@@ -90,7 +90,7 @@
if (wdt_fd < 0)
return -1;
- return ioctl(wdt_fd, WDIOC_SETTIMEOUT, &wdt_drv_timeout);
+ return 0;
}
void watchdog_set_magicclose(bool val)```
It is still resetting shortly after boot. Log attached. |
ynezz:
Ok, so this basically confirms, that the problem is not related to procd. You'll probably get the same result with init=/bin/sh and then opening the watchdog device (cat /dev/watchdog should be probably enough), thus starting the watchdog.
Then the problem is probably somewhere in the kernel and you should be able to figure it out with a bunch of printfs at the obvious places (sunxi_wdt_set_timeout, sunxi_wdt_start, sunxi_wdt_ping). |
Not a bug in procd, too old. |
halmartin:
Device: Orange Pi M2 Ultra
Version: trunk
Steps to reproduce:
Problem:
I was able to mitigate the watchdog reset with the following patch to procd:
--- a/watchdog.c 2019-09-06 06:13:52.293677306 +0000
+++ b/watchdog.c 2019-09-06 06:14:46.716987215 +0000
@@ -30,13 +30,13 @@
static struct uloop_timeout wdt_timeout;
static int wdt_fd = -1;
-static int wdt_drv_timeout = 30;
-static int wdt_frequency = 5;
+static int wdt_drv_timeout = 5;
+static int wdt_frequency = 1;
static bool wdt_magicclose = false;
void watchdog_ping(void)
{
DEBUG(4, "Ping watchdog\n");
if (wdt_fd >= 0 && write(wdt_fd, "X", 1) < 0)
ERROR("WDT failed to write: %m\n");
}
@@ -170,6 +170,7 @@
watchdog_timeout_cb(&wdt_timeout);
DEBUG(4, "Opened watchdog with timeout %ds\n", watchdog_timeout(0));
watchdog_ping();
}
I'm sorry but I don't know enough C to provide a patch that includes a configurable watchdog timeout in the main menuconfig for procd.
The text was updated successfully, but these errors were encountered: