FS#766 - Intermittent SIGSEGV crash of dnsmasq-full #5741
Comments
guidosarducci: After a little more investigation, this is definitely a bug, and it also exists in the latest lede/master, which uses dnsmasq-2.77test5. It is easily triggered by a common Mozilla DNS query and appears related to using split DNS together with DNSSEC. A minimal, standalone dnsmasq.conf that is vulnerable: Removing either of these config lines results in no SIGSEGV: The bug can be triggered simply from a DNS client (e.g. a blank Firefox page!): I also captured a dnsmasq core file from my router and ran it through gdb: The dnsmasq config file, log file, and client log are attached. I'm not sure I can go any further, so I would appreciate the dnsmasq package maintainer taking a look and advising. Thanks! |
None: I've forwarded your message, including the replication procedure, to the dnsmasq list. I was able to replicate it with ease following your instructions; in fact, all I needed to do was add server=/cloudfront.net/50.22.147.234 to my existing config. This makes me think it's a particular type of server that's provoking the issue. Let's see what happens. Kevin |
guidosarducci: Hehe, very optimistic of you to close this... I saw the update from Simon Kelley (thank you!) on the Dnsmasq-discuss mailing list and built an updated LEDE dnsmasq-2.77rc1 package to test. (see required patch attached) The prior minimal test-case passed, but the original production config file now creates a horrible SIGSEGV crash-loop (log attached): Stack trace indicates something to do with logging: This turns out to be easy to reproduce. Simply add I attached all the relevant logs, configs and patches. |
dedeckeh: The bug record has been closed as this is an upstream issue in the dnsmasq project; meaning the issue has to be reported on the dnsmasq mailing list and needs to be fixed by the dnsmasq maintainer. Therefore it makes no sense to keep this bug record open on the Lede project. |
guidosarducci: @Kevin Darbyshire-Bryant: Since these crash-loop bugs are fairly serious and difficult to troubleshoot, do you expect someone would be able to back-port the fixes to LEDE-17.01 once Simon releases dnsmasq-2.77 in the near future? Thanks, @hans Dedecker: Your suggestions that the issue is unrelated to LEDE and doesn't belong here are misleading, unhelpful, and serve to dissuade others from volunteering their time to improve LEDE. I'd like to think that is not your intention. Am I wrong to think so? |
None: No problem. Thanks for doing the hard work with gdb! I've already got a pull request on standby for when Simon tags rc2. We'll see how quickly the release comes out, and I guess someone will then decide how to get it into LEDE 17.01. |
dedeckeh: @tony Ambardar: |
ckujau: For the record, this is still an issue with 17.01.2 and Dnsmasq version 2.77:
kernel: [ 2860.890789]
kernel: [ 2860.890789] do_page_fault(): sending SIGSEGV to dnsmasq for invalid write access to 00552000
kernel: [ 2860.899402] epc = 77cd488c in libc.so[77c62000+92000]
kernel: [ 2860.904552] ra = 00406c41 in dnsmasq[400000+21000]
kernel: [ 2860.909537]
I came across this one while playing around with dnseval from the dnsdiag package (https://github.com/farrokhi/dnsdiag). Simply calling "dnseval foo" was enough to make dnsmasq crash :-| But, as this crashes the latest git checkout from dnsmasq too, I If somebody wants to take a stab at the MIPS core dump (attached), please do, as I don't have a LEDE build environment set up yet. |
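The kernel fault lines above report each address together with the mapping it falls in, in the form "module[base+size]". For a shared library such as libc.so, the offset inside the mapping (address minus base) is usually the more useful number when matching the address against the library file. The following is a minimal sketch of that arithmetic in C. The names fault_loc, parse_fault_line and fault_offset are hypothetical helpers invented for illustration, and the parser assumes the "kernel: [timestamp]" prefix has already been stripped from the line.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper types and names; not part of dnsmasq or the kernel. */
struct fault_loc {
    char module[64];      /* mapping name, e.g. "libc.so" */
    unsigned long addr;   /* absolute address reported for epc or ra */
    unsigned long base;   /* start address of the mapping */
    unsigned long size;   /* length of the mapping */
};

/* Parse "<reg> = <addr> in <module>[<base>+<size>]".
 * Assumes the log prefix (e.g. "kernel: [ 2860.899402]") was removed.
 * Returns 0 on success, -1 if the line does not match. */
int parse_fault_line(const char *line, struct fault_loc *out)
{
    char reg[8];

    assert(line != NULL && out != NULL);
    if (sscanf(line, " %7s = %lx in %63[^[][%lx+%lx]",
               reg, &out->addr, out->module, &out->base, &out->size) != 5)
        return -1;
    return 0;
}

/* Offset of the address inside its mapping, or -1 if it lies outside. */
long fault_offset(const struct fault_loc *loc)
{
    assert(loc != NULL);
    if (loc->addr < loc->base || loc->addr - loc->base >= loc->size)
        return -1;
    return (long)(loc->addr - loc->base);
}
```

Fed the line "epc = 77cd488c in libc.so[77c62000+92000]", parse_fault_line fills in the four fields and fault_offset returns 0x7288c. The dnsmasq binary in these logs is mapped at 0x400000, and the gdb session in the original report resolves the absolute addresses directly, so the subtraction matters mainly for the libc.so frame.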
guidosarducci:
I've noticed the following several times within the last day or so:
[1461327.495159] do_page_fault(): sending SIGSEGV to dnsmasq for invalid read access from 00000000
[1461327.504081] epc = 0040f28d in dnsmasq[400000+2c000]
[1461327.509252] ra = 0040f273 in dnsmasq[400000+2c000]
I'm running the latest LEDE stable, with all updates applied as of 2017-05-05, and have been using DNSSEC for a while:
The most recent upgrade in the same time frame was to odhcpd-2017-04-28-9268ca65-1.
After several restarts by procd and subsequent crashes, dnsmasq will be disabled, leaving me without name resolution until I notice and restart manually.
To get a little more info, I rebuilt the stable LEDE and dnsmasq-full with a "-g" CFLAG option. After installing this package, I captured the following crash details:
[1562749.817613] do_page_fault(): sending SIGSEGV to dnsmasq for invalid read access from 00000000
[1562749.826522] epc = 0040f295 in dnsmasq[400000+2c000]
[1562749.831681] ra = 0040f27b in dnsmasq[400000+2c000]
Checking further with gdb yields:
(gdb) info line *0x0040f27b
Line 278 of "forward.c" starts at address 0x40f275 <forward_query+204>
and ends at 0x40f281 <forward_query+216>.
(gdb) info line *0x0040f295
Line 281 of "forward.c" starts at address 0x40f295 <forward_query+236>
and ends at 0x40f29b <forward_query+242>.
And the relevant source (forward.c) looks like:
275 blockdata_retrieve(forward->stash, forward->stash_len, (void *)header);
276 plen = forward->stash_len;
277
278 if (find_pseudoheader(header, plen, NULL, &pheader, &is_sign, NULL) && !is_sign)
279 PUTSHORT(SAFE_PKTSZ, pheader);
280
281 if (forward->sentto->addr.sa.sa_family == AF_INET)
282 log_query(F_NOEXTRA | F_DNSSEC | F_IPV4, "retry", (struct all_addr *)&forward->sentto->addr.in.sin_addr, "dnssec");
283 #ifdef HAVE_IPV6
284 else
285 log_query(F_NOEXTRA | F_DNSSEC | F_IPV6, "retry", (struct all_addr
Any similar reports from others? I'll keep monitoring in the meantime but this is difficult to reproduce on demand. It seems to happen more with web browsing.
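One reading consistent with the trace above is that the faulting instruction at line 281 dereferences forward->sentto while that pointer is NULL, which would explain the "invalid read access from 00000000". This is an inference from the log and the listing, not something the report confirms. The sketch below illustrates the failure pattern and a defensive check using simplified stand-in types. mock_server, mock_frec and retry_family are hypothetical names, not dnsmasq's real definitions, and this is not the upstream fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for illustration only;
 * dnsmasq's real structures have many more fields. */
struct mock_server {
    unsigned short sa_family;   /* address family of the upstream server */
};

struct mock_frec {
    struct mock_server *sentto; /* upstream the query was last sent to */
};

/* Return the upstream's address family, or -1 when no upstream is
 * recorded. Without the NULL check, reading sentto->sa_family through a
 * NULL pointer is an access near address 0, as in the kernel log. */
int retry_family(const struct mock_frec *forward)
{
    assert(forward != NULL);
    if (forward->sentto == NULL)
        return -1;
    return forward->sentto->sa_family;
}
```

A caller would treat the -1 result as "skip the retry log line" rather than choosing between the IPv4 and IPv6 branches.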