FS#251 - sending SIGSEGV to dnsmasq for invalid read access from 00000000 #5482

Closed
openwrt-bot opened this issue Oct 26, 2016 · 36 comments
@openwrt-bot

koa:

This message shows up frequently on a fresh install of LEDE (kernel 4.4.27) on a wzr-hp-g300nh (Linux robokoa 4.4.27 #0 Wed Oct 26 10:37:47 2016 mips GNU/Linux). The steps to reproduce are unknown, and I'm unsure what initially triggers this error.

[28082.882471] do_page_fault(): sending SIGSEGV to dnsmasq for invalid read access from 00000000
[28082.891127] epc = 00439ff1 in busybox[400000+4a000]
[28082.896083] ra = 00439fe5 in busybox[400000+4a000]
[28082.901018]

@openwrt-bot

IronicSven:

I've got the same message on my TP-Link TL-WR1043N/ND v1 with r2109.

@openwrt-bot

pmalecka:

Same here on mikrotik 493g - r2155

The sigsegv also happens for:

Sun Nov 13 02:27:07 2016 kern.info kernel: [49320.120778]
Sun Nov 13 02:27:07 2016 kern.info kernel: [49320.120778] do_page_fault(): sending SIGSEGV to sysntpd for invalid read access from 00000000
Sun Nov 13 02:27:07 2016 kern.info kernel: [49320.223021] epc = 00439ff1 in busybox[400000+4a000]
Sun Nov 13 02:27:07 2016 kern.info kernel: [49320.281722] ra = 00439fe5 in busybox[400000+4a000]
Sun Nov 13 02:27:07 2016 kern.info kernel: [49320.340351]

Sun Nov 13 08:26:00 2016 kern.info kernel: [70852.962664]
Sun Nov 13 08:26:00 2016 kern.info kernel: [70852.962664] do_page_fault(): sending SIGSEGV to hotplug-call for invalid read access from 00000000
Sun Nov 13 08:26:00 2016 kern.info kernel: [70853.070137] epc = 00439ff1 in busybox[400000+4a000]
Sun Nov 13 08:26:00 2016 kern.info kernel: [70853.128724] ra = 00439fe5 in busybox[400000+4a000]
Sun Nov 13 08:26:00 2016 kern.info kernel: [70853.187376]

@openwrt-bot

mkresin:

It seems to me that the SIGSEGV is related to busybox, since the return address (ra) always points to busybox and always to the same position in busybox.

Would any of you please compile an Image with the following extra option in menuconfig:

Base system ---> <*> busybox ---> [*] Customize busybox options ---> Busybox Settings ---> Debugging Options ---> [*] Build BusyBox with extra Debugging symbols

This **might** print the name of the function that is called in busybox instead of the (not really helpful) position of the function in the binary.
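The "always the same position" observation can be checked mechanically by turning each reported epc/ra address into an offset within its mapping. This is only a rough sketch: the helper name `fault_locations` and the regular expression are made up for illustration, and the sample lines are copied from the reports above.

```python
import re

# Matches lines such as "epc = 00439ff1 in busybox[400000+4a000]".
FAULT_RE = re.compile(
    r"\b(epc|ra)\s*=\s*([0-9a-f]+) in (\S+?)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)

def fault_locations(log_text):
    """Return the set of (register, module, offset) tuples found in log_text."""
    found = set()
    for reg, addr, module, base, size in FAULT_RE.findall(log_text):
        offset = int(addr, 16) - int(base, 16)
        # Sanity check: the address should lie inside the reported mapping.
        if 0 <= offset < int(size, 16):
            found.add((reg, module, offset))
    return found

log = """\
[28082.891127] epc = 00439ff1 in busybox[400000+4a000]
[28082.896083] ra = 00439fe5 in busybox[400000+4a000]
[49320.223021] epc = 00439ff1 in busybox[400000+4a000]
[49320.281722] ra = 00439fe5 in busybox[400000+4a000]
"""

# Identical crashes collapse into one entry per register.
for reg, module, offset in sorted(fault_locations(log)):
    print(f"{reg:3} {module} +{offset:#x}")
```

If reports from different devices collapse to the same entries, they are very likely the same fault in the same build of the binary.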

@openwrt-bot

mamarley:

I did a build with that option and it has been running for a day or so now (more than long enough to reproduce it in the past) and so far there are no segfaults at all. Stupid Heisenbug…

@openwrt-bot

NeoRaider:

I'm seeing the same issue, unfortunately also without debug symbols. I haven't had a closer look yet, but here's some GDB output:

#0  0x00439ff1 in nonblock_immune_read ()
(gdb) bt
#0  0x00439ff1 in nonblock_immune_read ()
#1  0x0041bc23 in argstr ()
#2  0x0041bd89 in expandarg ()
#3  0x0041e7c3 in evalfor ()
#4  0x0041dc37 in evaltreenr ()
#5  0x0041dc37 in evaltreenr ()
#6  0x0041e117 in cmdloop ()
#7  0x0041f8e3 in ash_main ()
#8  0x00407879 in run_applet_no_and_exit ()
#9  0x004078f1 in main ()
(gdb) info registers
          zero       at       v0       v1       a0       a1       a2       a3
 R0   00000000 80480000 00000000 fffffffc 00000000 7fda119c 00000080 00000000
            t0       t1       t2       t3       t4       t5       t6       t7
 R8   00000000 80f7fa80 00000001 00000000 8104217c 00000024 804a0000 ffffff80
            s0       s1       s2       s3       s4       s5       s6       s7
 R16  00000000 00000003 00000003 0040789d 77292000 77292000 77294500 77295e94
            t8       t9       k0       k1       gp       sp       s8       ra
 R24  00000000 772144f8 00000000 00000000 7729b2b0 7fda1108 00000000 00439fe5
            sr       lo       hi      bad    cause       pc
      0000dc13 02400000 000f4537 00000000 00800008 00439ff1
           fsr      fir
      00000000 00000000
(gdb) disas
Dump of assembler code for function nonblock_immune_read:
   0x00439fd5 <+0>: save a0-a2,48,ra,s0-s1
   0x00439fd9 <+4>: move s1,a0
   0x00439fdb <+6>: lw a2,56(sp)
   0x00439fdd <+8>: lw a1,52(sp)
   0x00439fdf <+10>: jal 0x4086c1
   0x00439fe3 <+14>: move a0,s1
   0x00439fe5 <+16>: slti v0,0
   0x00439fe7 <+18>: move s0,v0
   0x00439fe9 <+20>: bteqz 0x43a00d
   0x00439feb <+22>: jal 0x448ff1 <__errno_location@mips16plt>
   0x00439fef <+26>: nop
=> 0x00439ff1 <+28>: lw v0,0(v0)
   0x00439ff3 <+30>: cmpi v0,11
   0x00439ff5 <+32>: btnez 0x43a00d
   0x00439ff7 <+34>: li v0,1
   0x00439ff9 <+36>: li a2,1
   0x00439ffb <+38>: move v1,sp
   0x00439ffd <+40>: neg a2
   0x00439fff <+42>: li a1,1
   0x0043a001 <+44>: addiu a0,sp,24
   0x0043a003 <+46>: sw s1,24(sp)
   0x0043a005 <+48>: jal 0x43a671
   0x0043a009 <+52>: sh v0,28(v1)
   0x0043a00b <+54>: b 0x439fdb
   0x0043a00d <+56>: move v0,s0
   0x0043a00f <+58>: restore 48,ra,s0-s1
   0x0043a011 <+60>: jrc ra
End of assembler dump.

@openwrt-bot

NeoRaider:

It is indeed a Heisenbug: any change to the code to add debug output makes it go away. More weirdness (if the information from the core dump I got is accurate):

  • The whole function is aligned to odd addresses; I've never seen this before. Is this even allowed? More weirdly, gdb dumps the addresses like this, while objdump shows the whole function shifted by one byte, so the addresses are even (it's MIPS16 code, so it is not aligned to 4 bytes)
  • If the program counter is accurate (which I'm not sure about), the only way a NULL dereference can happen here is if __errno_location() has returned NULL (or something even weirder like register corruption). This should not be possible.

I can reproduce the issue fairly easily on a TL-WR1043 v1 by calling "/etc/init.d/network restart" when dnsmasq is restarted by this, but I haven't seen it on a TL-WR841 v7. Either this is hardware-dependent, or something changed because I cleaned my tree when changing the models; I'll have to check again when I have both devices at the same place.

@openwrt-bot

NeoRaider:

I'm not much closer to the root of this issue, but at least I'm a bit less confused.

  • I've found out that the least significant bit of the program counter enables MIPS16 mode, thus explaining the "odd addresses"
  • I've verified that the issue is indeed that __errno_location() returns NULL
  • /etc/init.d/dnsmasq reload will segfault on my TL-WR1043 v1 in about 1 out of 3 runs
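To make the "odd addresses" point concrete: with the least significant bit acting as a mode flag, the address of the instruction itself is the reported value with that bit cleared. That would also account for objdump showing the function one byte lower than gdb does. A tiny sketch, where `split_pc` and `ISA_MODE_BIT` are names invented for this example:

```python
ISA_MODE_BIT = 0x1  # assumption for this sketch: bit 0 set means MIPS16 mode

def split_pc(pc):
    """Split a reported PC into (instruction address, is_mips16)."""
    return pc & ~ISA_MODE_BIT, bool(pc & ISA_MODE_BIT)

# epc value taken from the kernel log in this ticket
addr, mips16 = split_pc(0x00439ff1)
print(f"{addr:#010x} mips16={mips16}")
```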

I've been unable to test this command in gdb (it just hangs). When run in strace, the command doesn't ever segfault.

I'll check with the musl people if they have any idea what is happening.

@openwrt-bot

NeoRaider:

Further increasing severity, as this doesn't only affect init scripts, but all shell scripts using shell expansion ($() or backticks). While testing, I've experienced several crashes of sysupgrade.

Further results of my investigation:

  • __errno_location() is not returning NULL after all; in fact, __errno_location() is not called at all. This seems correct; the branch calling __errno_location() is only taken when safe_read() fails, and it doesn't look like safe_read() fails before the crash. The value of the ra register still holds the return address from safe_read().
  • The whole thing is very fragile; adding a single "nop" instruction before the "jal __errno_location" makes the crash go away.
  • The program counter somehow ends up at 0x00439ff1; it is unclear how it gets there. The preceding instructions have not been executed. While a random jump after memory corruption could be a possible cause, the backtrace up to nonblock_immune_read() looks sane.
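For readers who don't want to decode the MIPS16 listing, the control flow described above (read, and only look at errno when the read fails) can be sketched roughly as follows. This is a Python approximation written for this thread, not busybox's actual C source; treating the comparison against 11 in the disassembly as the EAGAIN check is an assumption.

```python
import os
import select

def nonblock_immune_read(fd, count):
    """Read from fd, waiting with poll() whenever a non-blocking fd has no data."""
    while True:
        try:
            # os.read() retries on EINTR by itself, roughly like safe_read().
            return os.read(fd, count)
        except BlockingIOError:
            # Error path (EAGAIN): only reached when the read itself fails.
            poller = select.poll()
            poller.register(fd, select.POLLIN)
            poller.poll()  # block until the fd is readable, then retry
```

The point of the sketch is that the errno check lives purely on the error path, so a successful read should never get anywhere near it.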

I'm currently looking into possible kernel-side causes for this issue.

@openwrt-bot

None:

Adding this as a 'me too'.

[ 200.009789] do_page_fault(): sending SIGSEGV to odhcpd for invalid read access from 00000000
[ 200.018396] epc = 00407ec1 in odhcpd[400000+d000]
[ 200.023198] ra = 004063bf in odhcpd[400000+d000]

Archer c7 v2 - linux 4.4.34

Can't re-create at will, but occurs 2-3 times every reboot. Let me know how I can be of assistance with running tests etc.

@openwrt-bot

None:

Don't know if this is of any help, but I got a 'strace':

epoll_pwait(3, [], 10, 2000, NULL, 16) = 0
clock_gettime(CLOCK_MONOTONIC, {118, 496503244}) = 0
clock_gettime(CLOCK_MONOTONIC, {118, 496956153}) = 0
clock_gettime(CLOCK_MONOTONIC, {118, 497110062}) = 0
clock_gettime(CLOCK_MONOTONIC, {118, 497561396}) = 0
epoll_pwait(3, [{EPOLLIN, {u32=2002960268, u64=8602648846247395328}}], 10, 2000, NULL, 16) = 1
recvmsg(18, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0\5\0\23\0\0\0\0\0\0\0P", iov_len=12}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 12
poll([{fd=18, events=POLLIN}], 1, -1) = 1 ([{fd=18, revents=POLLIN}])
recvmsg(18, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\3\0\0\10B\353\226\255\4\0\0\24ubus.object.add\0\7\0\0000"..., iov_len=76}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 76
sendmsg(18, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0\1\0\23\0\0\0\0", iov_len=8}, {iov_base="\0\0\0\24\1\0\0\10\0\0\0\0\3\0\0\10B\353\226\255", iov_len=20}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, 0) = 28
recvmsg(18, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {118, 667711369}) = 0
clock_gettime(CLOCK_MONOTONIC, {118, 667914580}) = 0
epoll_pwait(3, [{EPOLLIN, {u32=4313216, u64=18525121660583936}}], 10, 1830, NULL, 16) = 1
recvmsg(13, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000400}, msg_namelen=28->12, msg_iov=[{iov_base=[{{len=116, type=0x18 /* NLMSG_??? */, flags=0, seq=0, pid=0}, "\n\10\0\0\377\3\0\1\0\0\0\0\0\10\0\17\0\0\0\377\0\24\0\1\377\0\0\0\0\0\0\0"...}, {{len=0, type=0x62e3 /* NLMSG_??? */, flags=NLM_F_REQUEST|NLM_F_MULTI|NLM_F_ACK|NLM_F_ECHO|NLM_F_DUMP_INTR|NLM_F_DUMP_FILTERED|0x27c0, seq=4272922192, pid=0}}], iov_len=8192}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT) = 116
recvmsg(13, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000400}, msg_namelen=28->12, msg_iov=[{iov_base=[{{len=116, type=0x18 /* NLMSG_??? */, flags=0, seq=0, pid=0}, "\n@\0\0\376\2\0\1\0\0\0\0\0\10\0\17\0\0\0\376\0\24\0\1\376\200\0\0\0\0\0\0"...}, {{len=0, type=0x62e3 /* NLMSG_??? */, flags=NLM_F_REQUEST|NLM_F_MULTI|NLM_F_ACK|NLM_F_ECHO|NLM_F_DUMP_INTR|NLM_F_DUMP_FILTERED|0x27c0, seq=4272922192, pid=0}}], iov_len=8192}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT) = 116
recvmsg(13, {msg_namelen=28}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {118, 707666461}) = 0
clock_gettime(CLOCK_MONOTONIC, {118, 707849995}) = 0
epoll_pwait(3, [{EPOLLIN, {u32=4313216, u64=18525121660583936}}], 10, 1790, NULL, 16) = 1
recvmsg(13, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000100}, msg_namelen=28->12, msg_iov=[{iov_base=[{{len=72, type=0x14 /* NLMSG_??? */, flags=0, seq=0, pid=0}, "\n\200\0\0\0\0\0\n\0\24\0\1*\2\f\177\22 \277+\0\0\0\0\0\0\0\376\0\24\0\6"...}, {{len=2359308, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\24\0\0\0\0\0\0\0\0"...}], iov_len=8192}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT) = 72
clock_gettime(CLOCK_MONOTONIC, {119, 188109722}) = 0
sendto(7, {{len=24, type=0x16 /* NLMSG_??? */, flags=NLM_F_REQUEST|0x300, seq=1, pid=0}, "\n\0\0\0\0\0\0\n"}, 24, 0, NULL, 0) = 24
recvfrom(7, [{{len=72, type=0x14 /* NLMSG_??? */, flags=NLM_F_MULTI, seq=1, pid=2601}, "\n\200\200\376\0\0\0\1\0\24\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\24\0\6"...}, {{len=72, type=0x14 /* NLMSG_??? */, flags=NLM_F_MULTI, seq=1, pid=2601}, "\n\200\0\0\0\0\0\n\0\24\0\1*\2\f\177\22 \277+\0\0\0\0\0\0\0\376\0\24\0\6"...}, {{len=72, type=0x14 /* NLMSG_??? */, flags=NLM_F_MULTI, seq=1, pid=2601}, "\n@\200\375\0\0\0\n\0\24\0\1\376\200\0\0\0\0\0\0\26\314 \377\376\276\2112\0\24\0\6"...}, {{len=72, type=0x14 /* NLMSG_??? */, flags=NLM_F_MULTI, seq=1, pid=2601}, "\n@\200\375\0\0\0\22\0\24\0\1\376\200\0\0\0\0\0\0\26\314 \377\376\276\2111\0\24\0\6"...}, {{len=72, type=0x14 /* NLMSG_??? */, flags=NLM_F_MULTI, seq=1, pid=2601}, "\n@\300\375\0\0\0\23\0\24\0\1\376\200\0\0\0\0\0\0\26\314 \377\376\276\2110\0\24\0\6"...}], 8192, 0, NULL, NULL) = 360
recvfrom(7, {{len=20, type=NLMSG_DONE, flags=NLM_F_MULTI, seq=1, pid=2601}, "\0\0\0\0"}, 8192, 0, NULL, NULL) = 20
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
+++ killed by SIGSEGV +++

@openwrt-bot

NeoRaider:

What exact options did you use for this strace? It contains lots of syscalls that are not from busybox.

@openwrt-bot

None:

it was an 'strace -p' of odhcpd which is the thing that gets killed on a 'regular' basis. Oh hell, I've just noticed odhcpd was bumped recently.... this might be a red herring.

@openwrt-bot

NeoRaider:

Most likely that is a different bug. All reports in this ticket are about busybox (ash) crashing while running shell scripts. Some strings like "dnsmasq" appear in the logs because those are the names of the scripts (e.g. /etc/init.d/dnsmasq).
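A quick way to tell the two cases apart when triaging a report is to compare the process name in the "sending SIGSEGV to ..." line with the module that contains epc. A minimal sketch, assuming the log format shown in this ticket; `classify` and both patterns are hypothetical helpers, not part of any existing tool:

```python
import re

SIGSEGV_RE = re.compile(r"sending SIGSEGV to (\S+) for invalid")
EPC_RE = re.compile(r"\bepc = [0-9a-f]+ in (\S+?)\[")

def classify(report):
    """Return (process_name, faulting_module) for one kernel fault report."""
    comm = SIGSEGV_RE.search(report)
    epc = EPC_RE.search(report)
    if not comm or not epc:
        return None
    return comm.group(1), epc.group(1)

report = """\
[28082.882471] do_page_fault(): sending SIGSEGV to dnsmasq for invalid read access from 00000000
[28082.891127] epc = 00439ff1 in busybox[400000+4a000]
"""

name, module = classify(report)
if name != module:
    print(f"'{name}' is only the process name; the fault is in {module}")
```

When the two names differ, as in the original report, the crashing code belongs to the module on the epc line, not to the program named in the first line.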

@openwrt-bot

IronicSven:

I've been testing the latest trunk (r0+2321) on my TP-Link TL-WR1043N/ND v1 for a few hours and I can't reproduce the SIGSEGV messages anymore :)

@openwrt-bot

IronicSven:

I just flashed r0+2369 and the SIGSEGV messages are back.

@openwrt-bot

fuzzle:

I have built LEDE several times over the last few days, and at least on a TP-Link 841 I have never seen this (with some days of uptime).
I only see something like this on Gluon:
Sat Dec 3 14:46:38 2016 daemon.crit dnsmasq[2095]: unknown user or group: dnsmasq
Sat Dec 3 14:46:38 2016 daemon.crit dnsmasq[2095]: FAILED to start up

..
some dnsmasq is still running
1016 root 1116 S /usr/sbin/dnsmasq -x /var/run/gluon-wan-dnsmasq.pid -u root -i lo -p 54 -h -r /var/gluon/wan-dnsmasq/resolv.conf

..
logread
Sat Dec 3 14:46:34 2016 daemon.crit dnsmasq[1985]: unknown user or group: dnsmasq
Sat Dec 3 14:46:34 2016 daemon.crit dnsmasq[1985]: FAILED to start up
Sat Dec 3 14:46:35 2016 user.notice firewall: Reloading firewall due to ifup of wan6 (br-wan)
Sat Dec 3 14:46:35 2016 daemon.warn fastd[1472]: sendmsg: Operation not permitted
Sat Dec 3 14:46:35 2016 daemon.warn fastd[1472]: sendmsg: Operation not permitted
Sat Dec 3 14:46:37 2016 daemon.info dnsmasq[1016]: reading /var/gluon/wan-dnsmasq/resolv.conf
Sat Dec 3 14:46:37 2016 daemon.info dnsmasq[1016]: using nameserver fd00::a96:d7ff:fe5d:1026#53
Sat Dec 3 14:46:37 2016 daemon.info dnsmasq[1016]: using nameserver 192.168.0.1#53
Sat Dec 3 14:46:37 2016 daemon.crit dnsmasq[2040]: unknown user or group: dnsmasq
Sat Dec 3 14:46:37 2016 daemon.crit dnsmasq[2040]: FAILED to start up
Sat Dec 3 14:46:38 2016 daemon.crit dnsmasq[2095]: unknown user or group: dnsmasq
Sat Dec 3 14:46:38 2016 daemon.crit dnsmasq[2095]: FAILED to start up
Sat Dec 3 14:46:38 2016 daemon.info procd: Instance dnsmasq::cfg02411c s in a crash loop 6 crashes, 0 seconds since last crash

this particular node:
Linux version 4.4.32 (fffr@v32412.1blu.de) (gcc version 5.4.0 (LEDE GCC 5.4.0 r2187+6) ) #0 Tue Sep 27 01:55:55 2016
based on https://kau.toke.dk/git/lede/commit/?id=18726b0ed2be546d1d2503c903d7d069ae5522d5

@openwrt-bot

NeoRaider:

fuzzle, that's not even close to the issues reported in this ticket. As mentioned in earlier comments, this ticket doesn't have anything to do with dnsmasq, but is about a segfault in busybox.

Also, please don't report Gluon bugs in the LEDE tracker.

@openwrt-bot

NeoRaider:

Small update:

While I mostly see this issue on a TL-WR1043 v1, I've also observed it on a TL-WR841 v9 at least once; so it seems the bug is not hardware-specific after all (at least not limited to specific SoCs).

Unfortunately, I've been busy with other things last week, so I haven't been able to continue debugging the issue.

@openwrt-bot

nbd:

Please test the latest version

@openwrt-bot

IronicSven:

I've tested a few versions since last weekend and couldn't reproduce this issue on my 1043nd v1 anymore. I think it's fixed.

@openwrt-bot

NeoRaider:

Still reproducible with current master (r2695-c9c68c71776).

@openwrt-bot

mjw99:

Just a "Me too". I am seeing this with a NETGEAR WNR2000v1 on r2449-7c47f43:
[30131.723691] do_page_fault(): sending SIGSEGV to dnsmasq for invalid read access from 00000000
[30131.732296] epc = 00439ff1 in busybox[400000+4a000]
[30131.737282] ra = 00439fe5 in busybox[400000+4a000]

@openwrt-bot

IronicSven:

I haven't been able to reproduce this issue for weeks. I've been testing a TL-WR1043ND v1, TL-WR1043ND v2 and Archer C7 during this period.

Is it possible that your images are self-built and that a make dirclean or make distclean might help?

@openwrt-bot

nbd:

If you're still affected by this bug, please try the latest version

@openwrt-bot

mamarley:

I'm not seeing this on my UAP-LR anymore.

@openwrt-bot

mjw99:

I am no longer seeing this on a NETGEAR WNR2000v1 with 17.01-SNAPSHOT, r3045-e038c60.

@openwrt-bot

guidosarducci:

I've just noticed the following several times within the last day or so:
[1461327.495159] do_page_fault(): sending SIGSEGV to dnsmasq for invalid read access from 00000000
[1461327.504081] epc = 0040f28d in dnsmasq[400000+2c000]
[1461327.509252] ra = 0040f273 in dnsmasq[400000+2c000]

I'm running the latest LEDE stable, with all updates applied as of 2017-05-05:

  • LEDE Reboot 17.01.1 r3316-7eb58cf109
  • D-Link DIR-835 rev. A1
  • dnsmasq-full - 2.76-6

The most recent upgrade in the same time frame was to odhcpd-2017-04-28-9268ca65-1. And DNSSEC is enabled.

After a few restart attempts, dnsmasq has continued to run since then.

@openwrt-bot

guidosarducci:

The SIGSEGV crashes continue to happen periodically, and I may have been missing them due to dnsmasq being restarted by procd.

To get a little more info, I rebuilt the stable LEDE and dnsmasq-full with a "-g" CFLAG option. After installing this package, I captured the following crash details:

[1562749.817613] do_page_fault(): sending SIGSEGV to dnsmasq for invalid read access from 00000000
[1562749.826522] epc = 0040f295 in dnsmasq[400000+2c000]
[1562749.831681] ra = 0040f27b in dnsmasq[400000+2c000]

Checking further with gdb yields:
(gdb) info line *0x0040f27b
Line 278 of "forward.c" starts at address 0x40f275 <forward_query+204>
and ends at 0x40f281 <forward_query+216>.

(gdb) info line *0x0040f295
Line 281 of "forward.c" starts at address 0x40f295 <forward_query+236>
and ends at 0x40f29b <forward_query+242>.

And the relevant source (forward.c) looks like:
275 blockdata_retrieve(forward->stash, forward->stash_len, (void *)header);
276 plen = forward->stash_len;
277
278 if (find_pseudoheader(header, plen, NULL, &pheader, &is_sign, NULL) && !is_sign)
279 PUTSHORT(SAFE_PKTSZ, pheader);
280
281 if (forward->sentto->addr.sa.sa_family == AF_INET)
282 log_query(F_NOEXTRA | F_DNSSEC | F_IPV4, "retry", (struct all_addr *)&forward->sentto->addr.in.sin_addr, "dnssec");
283 #ifdef HAVE_IPV6
284 else
285 log_query(F_NOEXTRA | F_DNSSEC | F_IPV6, "retry", (struct all_addr

Any similar reports from others? I'll keep monitoring in the meantime...

@openwrt-bot

NeoRaider:

This ticket is specifically about a crash in busybox, often seen while running the dnsmasq init script (but also in other shell scripts). Your issue is a crash in dnsmasq itself, please open a new ticket.

@openwrt-bot

guidosarducci:

Sure, new ticket created. I'd also like to suggest changing the unfortunately misleading title of this ticket if possible, since it matches my own issue.

@openwrt-bot

ckujau:

For the record, this is still an issue with 17.01.2 and Dnsmasq version 2.77:

kernel: [ 2860.890789]
kernel: [ 2860.890789] do_page_fault(): sending SIGSEGV to dnsmasq for invalid write access to 00552000
kernel: [ 2860.899402] epc = 77cd488c in libc.so[77c62000+92000]
kernel: [ 2860.904552] ra = 00406c41 in dnsmasq[400000+21000]
kernel: [ 2860.909537]

I came across this one while playing around with dnseval from the dnsdiag package (https://github.com/farrokhi/dnsdiag). Simply calling "dnseval foo" was enough to make dnsmasq crash :-|

But, as this crashes the latest git checkout of dnsmasq too, I shall report this upstream, of course.

@openwrt-bot

NeoRaider:

ckujau: please open a new ticket for dnsmasq, I believe your issue hasn't been reported yet.

As mentioned in an earlier comment, this ticket doesn't have anything to do with dnsmasq at all; it is about a segfault in busybox that just happened to occur while running a shell script called "dnsmasq", leading to a somewhat confusing error message.

@openwrt-bot

ckujau:

I think this has been reported in #766 (https://bugs.lede-project.org/index.php?do=details&task_id=766), sorry for the mixup.

@openwrt-bot

NeoRaider:

#766 also looks like an independent issue. While both your crash and #766 are a segfault of dnsmasq, your crash happens in libc.so, while #766 is in dnsmasq itself.

@openwrt-bot

marcin1j:

Christian

I reported the issue you mentioned as FS#994. What target and device does the problem occur on?

@openwrt-bot

ckujau:

I was able to reproduce this on x86 too and bisected it (http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2017q3/011704.html) to upstream commit 0xfa78573778, so it was not LEDE- or architecture-specific and I should have reported this upstream from the start. But yes, the fix mentioned in FS#994 is the same one posted (http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2017q3/011714.html) to the dnsmasq-discuss list. I haven't had a chance to verify it yet (the previous band-aid patch worked); I will report back.

For completeness' sake: my target is ar71xx (a TP-Link AC1750 Wifi router).

Thanks.
