Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FS#2483 - procd: watchdog timeout is too high for sun8i #6124

Closed
openwrt-bot opened this issue Sep 7, 2019 · 11 comments
Closed

FS#2483 - procd: watchdog timeout is too high for sun8i #6124

openwrt-bot opened this issue Sep 7, 2019 · 11 comments
Labels
flyspray target/sunxi pull request/issue for sunxi target

Comments

@openwrt-bot
Copy link

halmartin:

Device: Orange Pi M2 Ultra
Version: trunk
Steps to reproduce:

  • build trunk with the DTS for Orange Pi M2 Ultra
  • boot
  • wait 10 seconds
  • watchdog will reset the SoC

Problem:

I was able to mitigate the watchdog reset with the following patch to procd:

--- a/watchdog.c 2019-09-06 06:13:52.293677306 +0000
+++ b/watchdog.c 2019-09-06 06:14:46.716987215 +0000
@@ -30,13 +30,13 @@

static struct uloop_timeout wdt_timeout;
static int wdt_fd = -1;
-static int wdt_drv_timeout = 30;
-static int wdt_frequency = 5;
+static int wdt_drv_timeout = 5;
+static int wdt_frequency = 1;
static bool wdt_magicclose = false;

void watchdog_ping(void)
{

  • DEBUG(4, "Ping\n");
  • DEBUG(4, "Ping watchdog\n");
    if (wdt_fd >= 0 && write(wdt_fd, "X", 1) < 0)
    ERROR("WDT failed to write: %m\n");
    }
    @@ -170,6 +170,7 @@
    watchdog_timeout_cb(&wdt_timeout);

    DEBUG(4, "Opened watchdog with timeout %ds\n", watchdog_timeout(0));

  • watchdog_ping();
    }

I'm sorry but I don't know enough C to provide a patch that includes a configurable watchdog timeout in the main menuconfig for procd.

@openwrt-bot
Copy link
Author

ynezz:

Can you please attach (unedited) dmesg log?

  • the default wdt_drv_timeout of procd is 30 seconds

indeed, this could be improved

  • wdt_frequency is not low enough to prevent the sun8i watchdog from resetting the SoC

your reasoning doesn't make much sense to me, as the watchdog should use default max. timeout of 16 seconds but it should be pinged every 5 seconds by procd, thus have no reason for reset

@openwrt-bot
Copy link
Author

halmartin:

//Can you please attach (unedited) dmesg log?//

It's hard for me to attach a dmesg log, since the board takes about 7 seconds to reach the uart console and the watchdog reboots it after 10 seconds.

I've attached the full boot log as captured from the serial console. I booted with procd in //init_debug=4//

//your reasoning doesn't make much sense to me, as the watchdog should use default max. timeout of 16 seconds but it should be pinged every 5 seconds by procd, thus have no reason for reset//

I agree, from the code it appears that procd should ping the watchdog every 5 seconds and since the sun8i watchdog timeout is 16 seconds, this shouldn't cause the board to reset.

However this is not the experience I've had. If I boot with //init=/bin/sh// the board will not reset. As soon as the unpatched procd starts though, there's only a few seconds before the watchdog resets the SoC.

Since I have only a few seconds of console access via uart, it's very difficult to debug the root cause.

Apologies, I can't seem to delete the uart log file I've attached. The first boot is my experimenting with //init=/bin/sh// and the following boot is with procd.

@openwrt-bot
Copy link
Author

openwrt-bot commented Sep 10, 2019

ynezz:

So from your dmesg it looks fine:

[ 0.544007] sunxi-wdt 1c20c90.watchdog: Watchdog enabled (timeout=16 sec, nowayout=0)
...
[ 2.398234] init: - watchdog -
[ 2.401334] init: Ping
[ 2.403774] init: Opened watchdog with timeout 30s
...
[ 6.264823] procd: Watchdog handover: fd=3
[ 6.269030] procd: - watchdog -
[ 6.272210] procd: Ping
[ 6.274725] procd: Opened watchdog with timeout 30s

I don't see a reason why it should reboot after ~10s. Can you please check the return value of the ioctl, something like this? Disabling the ioctl completely would be worth trying as well.

diff --git a/watchdog.c b/watchdog.c
index 9d770b4470a9..290432dd08c4 100644
--- a/watchdog.c
+++ b/watchdog.c
@@ -87,10 +87,16 @@ static void watchdog_close(void)
 
 static int watchdog_set_drv_timeout(void)
 {
+       int r;
+
        if (wdt_fd < 0)
                return -1;
 
-       return ioctl(wdt_fd, WDIOC_SETTIMEOUT, &wdt_drv_timeout);
+       r = ioctl(wdt_fd, WDIOC_SETTIMEOUT, &wdt_drv_timeout);
+       if (r < 0)
+               ERROR("WDT failed to set timeout: %m\n");
+
+       return r;
 }
 
 void watchdog_set_magicclose(bool val)

@openwrt-bot
Copy link
Author

halmartin:

I tried applying the above patch, and indeed it appears the ioctl is failing:

[ 2.505875] init: Console is alive
[ 2.509530] init: - watchdog -
[ 2.512638] init: WDT failed to set timeout: Invalid argument
[ 2.518439] init: Ping
[ 2.520882] init: Opened watchdog with timeout 30s

Full boot log attached.

@openwrt-bot
Copy link
Author

openwrt-bot commented Sep 13, 2019

ynezz:

indeed it appears the ioctl is failing

Ok, good to know, but this shouldn't influence the 16s watchdog timeout. Anyway, can you try it without the ioctl altogether, something like this:

static int watchdog_set_drv_timeout(void)
{
        if (wdt_fd < 0)
                return -1;
 
        return 0;
}

Full boot log attached.

Do I understand that log correctly, that it reboots after ~4 seconds? This is really strange and make no sense.

@openwrt-bot
Copy link
Author

openwrt-bot commented Sep 13, 2019

halmartin:

I think the issue is that //procd// is trying to set a timeout value above the maximum that //sunxi-wdt// supports, and thus the kernel is returning -EINVAL.

This is confirmed with the following patch:

--- a/watchdog.c	2019-09-05 21:12:50.000000000 +0000
+++ b/watchdog.c	2019-09-13 10:48:04.070425419 +0000
@@ -30,7 +30,7 @@
 
 static struct uloop_timeout wdt_timeout;
 static int wdt_fd = -1;
-static int wdt_drv_timeout = 30;
+int wdt_drv_timeout = 30;
 static int wdt_frequency = 5;
 static bool wdt_magicclose = false;
 
@@ -87,10 +87,19 @@
 
 static int watchdog_set_drv_timeout(void)
 {
+	int r;
 	if (wdt_fd < 0)
 		return -1;
 
-	return ioctl(wdt_fd, WDIOC_SETTIMEOUT, &wdt_drv_timeout);
+	//return ioctl(wdt_fd, WDIOC_SETTIMEOUT, &wdt_drv_timeout);
+	r = ioctl(wdt_fd, WDIOC_SETTIMEOUT, &wdt_drv_timeout);
+	while (r < 0) {
+		ERROR("WDT failed to set timeout: %m\n");
+		wdt_drv_timeout--;
+		r = ioctl(wdt_fd, WDIOC_SETTIMEOUT, &wdt_drv_timeout);
+	}
+	ERROR("Setting WDT timeout succeeded. Timeout: %d\n", wdt_drv_timeout);
+	return r;
 }
 
 void watchdog_set_magicclose(bool val)

I believe the kernel side is here: https://github.com/torvalds/linux/blob/master/drivers/watchdog/watchdog_core.c#L138-L139

You can see //procd// is receiving //-EINVAL// until wdt_drv_timeout is below the maximum supported by //sunxi-wdt//:

[    2.215693] Run /sbin/init as init process
[    2.493915] init: Console is alive
[    2.497578] init: - watchdog -
[    2.500686] init: WDT failed to set timeout: Invalid argument
[    2.506501] init: WDT failed to set timeout: Invalid argument
[    2.512281] init: WDT failed to set timeout: Invalid argument
[    2.518082] init: WDT failed to set timeout: Invalid argument
[    2.523861] init: WDT failed to set timeout: Invalid argument
[    2.529669] init: WDT failed to set timeout: Invalid argument
[    2.535467] init: WDT failed to set timeout: Invalid argument
[    2.541245] init: WDT failed to set timeout: Invalid argument
[    2.547041] init: WDT failed to set timeout: Invalid argument
[    2.552819] init: WDT failed to set timeout: Invalid argument
[    2.558615] init: WDT failed to set timeout: Invalid argument
[    2.564393] init: WDT failed to set timeout: Invalid argument
[    2.570188] init: WDT failed to set timeout: Invalid argument
[    2.575983] init: WDT failed to set timeout: Invalid argument
[    2.581764] init: Setting WDT timeout succeeded. Timeout: 16
[    2.587465] init: Ping
[    2.589902] init: Opened watchdog with timeout 16s

Unfortunately this doesn't completely resolve the issue for me, the watchdog still resets the SoC a few seconds later.

@openwrt-bot
Copy link
Author

openwrt-bot commented Sep 13, 2019

halmartin:

Even the following patch isn't enough to prevent the watchdog from resetting the SoC:

--- a/watchdog.c	2019-09-13 11:04:30.725915275 +0000
+++ b/watchdog.c	2019-09-13 11:09:31.010271821 +0000
@@ -30,8 +30,8 @@
 
 static struct uloop_timeout wdt_timeout;
 static int wdt_fd = -1;
-static int wdt_drv_timeout = 30;
-static int wdt_frequency = 5;
+int wdt_drv_timeout = 30;
+int wdt_frequency = 5;
 static bool wdt_magicclose = false;
 
 void watchdog_ping(void)
@@ -87,10 +87,25 @@
 
 static int watchdog_set_drv_timeout(void)
 {
+	int r;
 	if (wdt_fd < 0)
 		return -1;
 
-	return ioctl(wdt_fd, WDIOC_SETTIMEOUT, &wdt_drv_timeout);
+	//return ioctl(wdt_fd, WDIOC_SETTIMEOUT, &wdt_drv_timeout);
+	r = ioctl(wdt_fd, WDIOC_SETTIMEOUT, &wdt_drv_timeout);
+	while (r < 0) {
+		ERROR("WDT failed to set timeout: %m\n");
+		wdt_drv_timeout--;
+		r = ioctl(wdt_fd, WDIOC_SETTIMEOUT, &wdt_drv_timeout);
+	}
+	ERROR("Setting WDT timeout succeeded. Timeout: %d\n", wdt_drv_timeout);
+	if ( wdt_drv_timeout/4 > 1 ) {
+		wdt_frequency = wdt_drv_timeout/4;
+	} else {
+		wdt_frequency = 1;
+	}
+	ERROR("Set ping frequency to: %d\n", wdt_frequency);
+	return r;
 }
 
 void watchdog_set_magicclose(bool val)
@@ -170,6 +185,7 @@
 	watchdog_timeout_cb(&wdt_timeout);
 
 	DEBUG(4, "Opened watchdog with timeout %ds\n", watchdog_timeout(0));
+	watchdog_ping();
 }

It is stable when I set //wdt_frequency = 1//, but otherwise it seems to reset after a slightly longer lifetime. :(

@openwrt-bot
Copy link
Author

ynezz:

I think the issue is that procd is trying to set a timeout value above the maximum that sunxi-wdt supports, and thus the kernel is returning -EINVAL.

Sure, that's clear.

Unfortunately this doesn't completely resolve the issue for me, the watchdog still resets the SoC a few seconds later.

That's why I've asked you (two times already) to remove that WDIOC_SETTIMEOUT ioctl altogether.

@openwrt-bot
Copy link
Author

openwrt-bot commented Sep 13, 2019

halmartin:

Removing the ioctl does not resolve the issue.

Patch:

--- a/watchdog.c	2019-09-13 11:04:30.725915275 +0000
+++ b/watchdog.c	2019-09-13 11:46:53.658245623 +0000
@@ -90,7 +90,7 @@
 	if (wdt_fd < 0)
 		return -1;
 
-	return ioctl(wdt_fd, WDIOC_SETTIMEOUT, &wdt_drv_timeout);
+	return 0;
 }
 
 void watchdog_set_magicclose(bool val)```

It is still resetting shortly after boot. Log attached.

@openwrt-bot
Copy link
Author

ynezz:

Removing the ioctl does not resolve the issue.

Ok, so this basically confirms, that the problem is not related to procd. You'll probably get the same result with init=/bin/sh and then opening the watchdog device (cat /dev/watchdog should be probably enough), thus starting the watchdog.

It is still resetting shortly after boot. Log attached.

Then the problem is probably somewhere in the kernel and you should be able to figure it out with a bunch of printfs at the obvious places (sunxi_wdt_set_timeout, sunxi_wdt_start, sunxi_wdt_ping).

@ynezz ynezz added the target/sunxi pull request/issue for sunxi target label Mar 7, 2022
@ynezz
Copy link
Member

ynezz commented Mar 7, 2022

Not a bug in procd, too old.

@ynezz ynezz closed this as completed Mar 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flyspray target/sunxi pull request/issue for sunxi target
Projects
None yet
Development

No branches or pull requests

2 participants