Hi, I have some questions about debugging a random reboot problem caused by a kernel panic.
I'm getting random reboots when running software that inserts PREROUTING iptables rules to redirect traffic. My device is an x86_64 router. I compiled my OpenWrt release myself from a fork of the OpenWrt source at https://github.com/coolsnowwolf/lede . Its kernel version is 4.19.108.
The software causing this problem is called OpenClash. It acts as a transparent proxy: it inserts PREROUTING rules to redirect all TCP traffic from computers on the LAN to its own listening port and sends the traffic through a proxy.
Whenever this software is running, I get random reboots 1-2 times a day. There was nothing abnormal in the saved log files, because the crash happened in the kernel and triggered a reboot almost immediately. So I had to compile the netconsole kernel module to capture the dmesg output at the moment of the crash. You can see the crash logs in crash_dmesg.txt.
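For anyone who wants to capture the same kind of log: netconsole sends kernel messages as UDP datagrams, so the receiving machine only needs a UDP listener. Below is a minimal sketch of such a receiver. The function names `open_log_listener` and `copy_datagrams` are made up for this example, and the port must match whatever target port netconsole was configured with (6666 is the default documented for netconsole).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a UDP socket bound to all local addresses on
 * the given port. Returns the descriptor, or -1 on failure. Passing port 0
 * lets the OS pick a free port. */
int open_log_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: copy each received datagram to `out` until recv()
 * reports an error or an empty datagram. Flushing after every datagram
 * keeps the last lines before a crash on disk. */
void copy_datagrams(int fd, FILE *out)
{
    char buf[2048];
    ssize_t n;

    while ((n = recv(fd, buf, sizeof(buf), 0)) > 0) {
        fwrite(buf, 1, (size_t)n, out);
        fflush(out);
    }
}
```

Calling `copy_datagrams(open_log_listener(6666), stdout)` from a small `main` on another machine in the LAN would print whatever the router's kernel logs, assuming netconsole on the router targets that machine and port.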
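The crash happens in the nf_xfrm_me_harder function. Disassembling the crash code, I get crash_code.png (the highlighted line is the crashing instruction). The crash log reports an illegal memory access at 000000000000113c, and the disassembly shows that the kernel was accessing [rax+0x113c], so I think the problem is rax==0, which should not happen. The source code causing the crash is actually located in a patch, which is also included in trunk OpenWrt: https://github.com/openwrt/openwrt/blob/master/target/linux/generic/pending-4.19/616-net_optimize_xfrm_calls.patch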
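The link between the fault address and a NULL base pointer can be illustrated in user space. The sketch below uses a made-up `struct mock_net` whose padding puts one field at offset 0x113c. This is an assumption for illustration only, since the real `struct net` layout depends on the kernel configuration. It computes the address arithmetically, so nothing is dereferenced.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct net. The padding places the field at
 * offset 0x113c to match the offset seen in the crash log. */
struct mock_net {
    unsigned char preceding_fields[0x113c];
    unsigned int policy_count_out;
};

/* Address the CPU would access for base->policy_count_out. Computed with
 * plain integer arithmetic; no memory is read. */
uintptr_t field_address(uintptr_t base)
{
    return base + offsetof(struct mock_net, policy_count_out);
}
```

With a base of 0, `field_address` returns 0x113c, which is the pattern in the crash log: a fault address equal to a small field offset usually points to a NULL structure pointer.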
After patching, the function nf_xfrm_me_harder looks like this:
int nf_xfrm_me_harder(struct net *net, struct sk_buff *skb, unsigned int family)
{
    struct flowi fl;
    unsigned int hh_len;
    struct dst_entry *dst;
    struct sock *sk = skb->sk;
    int err;

    if (skb->dev && !dev_net(skb->dev)->xfrm.policy_count[XFRM_POLICY_OUT]) // <------- crash
        return 0;

    err = xfrm_decode_session(skb, &fl, family);
    if (err < 0)
        return err;
This means dev_net(skb->dev) is sometimes `NULL`.
I'm not familiar with the networking internals of the Linux kernel, so I'm not sure how to find out why it is NULL. Can this problem be safely worked around by checking the pointer's validity like this?
if (skb->dev && dev_net(skb->dev) && !dev_net(skb->dev)->xfrm.policy_count[XFRM_POLICY_OUT])
If not, can anyone give me some advice on how to debug this problem? I understand it may be difficult for you developers to figure out what's happening just from my description, especially since I'm not using trunk OpenWrt. So I would love to dig into it myself.
You can find other information about my router in dmesg.txt.
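To make the behaviour of the guarded condition concrete, here is a user-space model of it. All `mock_*` types and `can_skip_xfrm_lookup` are invented for this sketch and are not kernel APIs. It only shows that the extra check avoids the dereference; it says nothing about why the pointer is NULL or whether skipping the shortcut is the right reaction in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures involved. */
enum { MOCK_POLICY_IN, MOCK_POLICY_OUT, MOCK_POLICY_FWD, MOCK_POLICY_MAX };

struct mock_netns { unsigned int policy_count[MOCK_POLICY_MAX]; };
struct mock_dev   { struct mock_netns *net; };
struct mock_skb   { struct mock_dev *dev; };

/* Models dev_net(): returns the namespace a device belongs to. */
static struct mock_netns *mock_dev_net(const struct mock_dev *dev)
{
    return dev->net;
}

/* Returns true when the early "return 0" shortcut would be taken.
 * A missing device or a NULL namespace falls through to the slow path
 * instead of being dereferenced. */
bool can_skip_xfrm_lookup(const struct mock_skb *skb)
{
    const struct mock_netns *net;

    if (!skb->dev)
        return false;

    net = mock_dev_net(skb->dev);
    if (!net) /* the additional guard proposed above */
        return false;

    return net->policy_count[MOCK_POLICY_OUT] == 0;
}
```

In this model a NULL namespace makes the function fall through to the normal lookup path, which matches what the proposed one-line condition would do.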
Thanks a lot.
Thank you for your help. I'll give it a try and tell you my result.
I have found another similar ticket, which was also caused by this patch (an earlier version of it) and also occurred when heavy redirect iptables rules were in use: https://dev.archive.openwrt.org/ticket/18462.html