FS#1765 - dnsmasq-full fails to listen on dhcp socket after reboot #8482

Open
openwrt-bot opened this issue Aug 10, 2018 · 9 comments
Labels
flyspray release/19.07 pull request/issue targeted (also) for OpenWrt 19.07 release

Comments

@openwrt-bot

jtlayton:

When I restart my router, dnsmasq comes up and works for IPv6, but fails to serve DHCP for IPv4.

Supply the following if possible:

  • Device problem occurs on

Ubiquiti Edgerouter Lite

  • Software versions of OpenWrt/LEDE release, packages, etc.

OpenWrt 18.06.0, r7188-b0b5c64c22
dnsmasq-full - 2.80test3-1

  • Steps to reproduce

Reboot the router. When it comes up just after reboot, it fails to respond to DHCP requests. lsof shows that it's not listening on port 67, and does not have /tmp/dhcp.leases open:

root@junction:~# lsof -p 1792 -P
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
dnsmasq 1792 dnsmasq cwd DIR 0,15 3488 1166 /
dnsmasq 1792 dnsmasq rtd DIR 0,15 3488 1166 /
dnsmasq 1792 dnsmasq txt REG 7,0 349744 1526 /usr/sbin/dnsmasq
dnsmasq 1792 dnsmasq mem REG 7,0 16776 189 /usr/lib/libmnl.so.0.2.0
dnsmasq 1792 dnsmasq mem REG 8,2 117320 1033 /lib/libgcc_s.so.1
dnsmasq 1792 dnsmasq mem REG 8,2 25440 1114 /lib/libubus.so
dnsmasq 1792 dnsmasq mem REG 8,2 46168 1029 /lib/libubox.so
dnsmasq 1792 dnsmasq mem REG 7,0 466112 696 /usr/lib/libgmp.so.10.3.2
dnsmasq 1792 dnsmasq mem REG 7,0 206656 702 /usr/lib/libhogweed.so.4.3
dnsmasq 1792 dnsmasq mem REG 7,0 231888 704 /usr/lib/libnettle.so.6.3
dnsmasq 1792 dnsmasq mem REG 7,0 22232 216 /usr/lib/libnfnetlink.so.0.2.0
dnsmasq 1792 dnsmasq mem REG 7,0 111864 561 /usr/lib/libnetfilter_conntrack.so.3.6.0
dnsmasq 1792 dnsmasq mem REG 8,2 650896 1012 /lib/libc.so
dnsmasq 1792 dnsmasq mem REG 0,14 23 713 /tmp/TZ
dnsmasq 1792 dnsmasq 0u CHR 1,3 0t0 1226 /dev/null
dnsmasq 1792 dnsmasq 1u CHR 1,3 0t0 1226 /dev/null
dnsmasq 1792 dnsmasq 2u CHR 1,3 0t0 1226 /dev/null
dnsmasq 1792 dnsmasq 3u netlink 0t0 3561 ROUTE
dnsmasq 1792 dnsmasq 4u IPv4 3568 0t0 UDP junction:53
dnsmasq 1792 dnsmasq 5u IPv4 3569 0t0 TCP junction:53 (LISTEN)
dnsmasq 1792 dnsmasq 6u IPv4 3570 0t0 UDP localhost:53
dnsmasq 1792 dnsmasq 7u IPv4 3571 0t0 TCP localhost:53 (LISTEN)
dnsmasq 1792 dnsmasq 8u IPv6 3572 0t0 UDP localhost:53
dnsmasq 1792 dnsmasq 9u IPv6 3573 0t0 TCP localhost:53 (LISTEN)
dnsmasq 1792 dnsmasq 10r FIFO 0,8 0t0 3574 pipe
dnsmasq 1792 dnsmasq 11w FIFO 0,8 0t0 3574 pipe
dnsmasq 1792 dnsmasq 12u unix 0x800000041d16ca00 0t0 3576 type=DGRAM
dnsmasq 1792 dnsmasq 13u a_inode 0,9 0 14 [eventpoll]
dnsmasq 1792 dnsmasq 14r FIFO 0,8 0t0 3598 pipe
dnsmasq 1792 dnsmasq 15w FIFO 0,8 0t0 3598 pipe
dnsmasq 1792 dnsmasq 16u unix 0x800000041d16d900 0t0 3599 type=STREAM
dnsmasq 1792 dnsmasq 21u IPv6 3729 0t0 UDP fe80::822a:a8ff:fe4c:9dc1:53
dnsmasq 1792 dnsmasq 22u IPv6 3730 0t0 TCP fe80::822a:a8ff:fe4c:9dc1:53 (LISTEN)
dnsmasq 1792 dnsmasq 26u IPv6 3745 0t0 UDP fd90:79d3:5065:f00d::1:53
dnsmasq 1792 dnsmasq 27u IPv6 3746 0t0 TCP fd90:79d3:5065:f00d::1:53 (LISTEN)
dnsmasq 1792 dnsmasq 35u IPv6 6049 0t0 UDP cpe-2606-A000-1100-DB-0-0-0-1.dyn6.twc.com:53
dnsmasq 1792 dnsmasq 36u IPv6 6050 0t0 TCP cpe-2606-A000-1100-DB-0-0-0-1.dyn6.twc.com:53 (LISTEN)

...if I then restart it with "/etc/init.d/dnsmasq restart", it comes up properly. The router

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
dnsmasq 6861 dnsmasq cwd DIR 0,15 3488 1166 /
dnsmasq 6861 dnsmasq rtd DIR 0,15 3488 1166 /
dnsmasq 6861 dnsmasq txt REG 7,0 349744 1526 /usr/sbin/dnsmasq
dnsmasq 6861 dnsmasq mem REG 7,0 16776 189 /usr/lib/libmnl.so.0.2.0
dnsmasq 6861 dnsmasq mem REG 8,2 117320 1033 /lib/libgcc_s.so.1
dnsmasq 6861 dnsmasq mem REG 8,2 25440 1114 /lib/libubus.so
dnsmasq 6861 dnsmasq mem REG 8,2 46168 1029 /lib/libubox.so
dnsmasq 6861 dnsmasq mem REG 7,0 466112 696 /usr/lib/libgmp.so.10.3.2
dnsmasq 6861 dnsmasq mem REG 7,0 206656 702 /usr/lib/libhogweed.so.4.3
dnsmasq 6861 dnsmasq mem REG 7,0 231888 704 /usr/lib/libnettle.so.6.3
dnsmasq 6861 dnsmasq mem REG 7,0 22232 216 /usr/lib/libnfnetlink.so.0.2.0
dnsmasq 6861 dnsmasq mem REG 7,0 111864 561 /usr/lib/libnetfilter_conntrack.so.3.6.0
dnsmasq 6861 dnsmasq mem REG 8,2 650896 1012 /lib/libc.so
dnsmasq 6861 dnsmasq mem REG 0,14 23 713 /tmp/TZ
dnsmasq 6861 dnsmasq 0u CHR 1,3 0t0 1226 /dev/null
dnsmasq 6861 dnsmasq 1u CHR 1,3 0t0 1226 /dev/null
dnsmasq 6861 dnsmasq 2u CHR 1,3 0t0 1226 /dev/null
dnsmasq 6861 dnsmasq 3u REG 0,14 0 806 /tmp/dhcp.leases
dnsmasq 6861 dnsmasq 4u IPv4 10060 0t0 UDP *:bootps
dnsmasq 6861 dnsmasq 5u netlink 0t0 10061 ROUTE
dnsmasq 6861 dnsmasq 6u IPv4 10063 0t0 UDP junction:domain
dnsmasq 6861 dnsmasq 7u IPv4 10064 0t0 TCP junction:domain (LISTEN)
dnsmasq 6861 dnsmasq 8u IPv4 10065 0t0 UDP localhost:domain
dnsmasq 6861 dnsmasq 9u IPv4 10066 0t0 TCP localhost:domain (LISTEN)
dnsmasq 6861 dnsmasq 10u IPv6 10067 0t0 UDP fe80::822a:a8ff:fe4c:9dc1:domain
dnsmasq 6861 dnsmasq 11u IPv6 10068 0t0 TCP fe80::822a:a8ff:fe4c:9dc1:domain (LISTEN)
dnsmasq 6861 dnsmasq 12u IPv6 10069 0t0 UDP fd90:79d3:5065:f00d::1:domain
dnsmasq 6861 dnsmasq 13u IPv6 10070 0t0 TCP fd90:79d3:5065:f00d::1:domain (LISTEN)
dnsmasq 6861 dnsmasq 14u IPv6 10071 0t0 UDP cpe-2606-A000-1100-DB-0-0-0-1.dyn6.twc.com:domain
dnsmasq 6861 dnsmasq 15u IPv6 10072 0t0 TCP cpe-2606-A000-1100-DB-0-0-0-1.dyn6.twc.com:domain (LISTEN)
dnsmasq 6861 dnsmasq 16u IPv6 10073 0t0 UDP localhost:domain
dnsmasq 6861 dnsmasq 17u IPv6 10074 0t0 TCP localhost:domain (LISTEN)
dnsmasq 6861 dnsmasq 18r FIFO 0,8 0t0 10075 pipe
dnsmasq 6861 dnsmasq 19w FIFO 0,8 0t0 10075 pipe
dnsmasq 6861 dnsmasq 20u unix 0x800000041c446300 0t0 10077 type=DGRAM
dnsmasq 6861 dnsmasq 21u a_inode 0,9 0 14 [eventpoll]
dnsmasq 6861 dnsmasq 22r FIFO 0,8 0t0 10078 pipe
dnsmasq 6861 dnsmasq 23w FIFO 0,8 0t0 10078 pipe
dnsmasq 6861 dnsmasq 24u unix 0x800000041c445e00 0t0 10079 type=STREAM

When I restart the daemon, I see these messages in the log as well:

Fri Aug 10 09:46:50 2018 daemon.info dnsmasq-dhcp[6861]: DHCP, IP range 192.168.1.100 -- 192.168.1.249, lease time 12h
Fri Aug 10 09:46:50 2018 daemon.info dnsmasq-dhcp[6861]: DHCP, sockets bound exclusively to interface br-lan

...but they are absent when the daemon starts on a clean boot.

Attaching my /etc/config/dhcp, and the output from logread.
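
For anyone who wants to check for the same symptom without reading the full lsof listing, here is a minimal sketch. The function name `has_dhcp_socket` is made up for illustration; it only looks for a UDP socket on port 67, which lsof prints as ":67" with -P and as ":bootps" without it, as in the two listings above.

```shell
# Sketch: read lsof output on stdin and succeed if it shows a DHCP server socket.
# Matches a UDP socket whose port is 67 (numeric, lsof -P) or bootps (symbolic).
has_dhcp_socket() {
    grep -Eq 'UDP .*:(67|bootps)$'
}

# Example use on the router, with the PID from the first listing:
#   lsof -p 1792 -P | has_dhcp_socket || echo "dnsmasq has no DHCP socket"
```

This only tells you whether the socket exists; it says nothing about why it is missing.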

@openwrt-bot
Author

jtlayton:

Another datapoint -- the dnsmasq config files are different after I restart it. Here's a diff:

--- /tmp/dnsmasq.conf.cfg01411c-afterboot 2018-08-10 09:55:17.837267122 -0400
+++ /var/etc/dnsmasq.conf.cfg01411c 2018-08-10 09:55:27.421137742 -0400
@@ -11,13 +11,13 @@
domain=poochiereds.net
server=/poochiereds.net/
server=127.0.0.1#553
+interface=br-lan
dhcp-leasefile=/tmp/dhcp.leases
stop-dns-rebind
rebind-localhost-ok
rebind-domain-ok=sepia.ceph.com
conf-file=/usr/share/dnsmasq/trust-anchors.conf
dnssec
-dnssec-no-timecheck
dhcp-broadcast=tag:needs-broadcast
addn-hosts=/tmp/hosts
conf-dir=/tmp/dnsmasq.d
@@ -29,6 +29,8 @@

bogus-priv
conf-file=/usr/share/dnsmasq/rfc6761.conf
+dhcp-range=set:lan,192.168.1.100,192.168.1.249,255.255.255.0,12h
+no-dhcp-interface=eth1
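
The diff suggests a quick check: after boot, see whether the generated config has any dhcp-range= line at all. A minimal sketch follows, assuming the generated file lives at the path shown in the diff. The function name `conf_has_dhcp_range` is hypothetical.

```shell
# Sketch: succeed only if the given dnsmasq config file contains
# at least one dhcp-range= line.
conf_has_dhcp_range() {
    grep -q '^dhcp-range=' "$1"
}

# Example use on the router (path taken from the diff above):
#   conf_has_dhcp_range /var/etc/dnsmasq.conf.cfg01411c || echo "no dhcp-range in generated config"
```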

@openwrt-bot
Author

jtlayton:

My best guess at this point is that it's some sort of boot-time race between the interfaces coming up and dnsmasq starting. I'm happy to test script patches if anyone can suggest one.

@openwrt-bot
Author

jow-:

This should have been fixed with https://git.openwrt.org/h=bf1b0fad2b788f2e933cbe43740402fba5acaf16 - are you running the exact 18.06.0 release version?

@openwrt-bot
Author

jmartincufre:

Hello! I'm using the OpenWrt 18.06.1 tagged release, custom built with default packages, on a TP-Link TL-WR1043N v5 and a TL-WR1043ND v3, and the issue occurs just as jtlayton describes, except that I'm using dnsmasq (not the -full variant). After a manual “/etc/init.d/dnsmasq restart” it goes back to normal, but before that it takes far too long to hand out a DHCP lease.

@openwrt-bot
Author

reinerotto:

I have the same issue on a GL.iNet MiFi, OpenWrt 18.06.1, r7258-5eb055306f; custom image, compiled without IPv6 for packages.

dhcp-range=set:lan,192.168.8.100,192.168.8.249,255.255.255.0,12h
is missing from /var/etc/dnsmasq.conf.cfg01411c whenever there is no DHCP on 'lan'.

It happens sometimes when eth0 is the WAN,
and more often when a 3G modem is used as the WAN.

@openwrt-bot
Author

sotux:

I have a similar issue on an mt7621 device.
My router is a Newifi-D2 (mt7621 with 512M RAM, 32M SPI flash, mt7603e 2.4G, mt7612e 5G). When I enable STP on the LAN and reboot the router, my computer can't get an IP address from dnsmasq. If STP is not enabled, the issue does not occur. I think enabling STP makes switch initialization take longer, so dnsmasq can't detect the LAN device.
I reviewed the /etc/init.d/network script and found that its startup sequence is 20, while the startup sequence of dnsmasq is 19. Should we change the startup sequence of network from 20 to 18, so that the network starts earlier than the dnsmasq service, to resolve this problem?
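
To compare the two priorities on a given image, something like the following sketch could help. It assumes the init scripts declare their priority as a plain START=<number> line. The function name `init_start_prio` is made up for illustration.

```shell
# Sketch: print the START= priority declared in an init script
# (first match only; prints nothing if there is no such line).
init_start_prio() {
    sed -n 's/^START=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Example use on the router:
#   for s in network dnsmasq; do echo "$s: $(init_start_prio /etc/init.d/$s)"; done
```

This only reads the declared values; whether reordering them actually avoids the problem is a separate question.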

@openwrt-bot
Author

golf247:

This task seems to be on hold, but I think it deserves some attention. I am seeing the exact same issue on an AR300M after installing OpenWrt 18.06.2 r7676-cddd7b4c77 on it (on the NOR part, not the NAND).

I tried installing without keeping settings, and then resetting the settings after that. Both show the same IPv4 issue after reboot. Once I manually set up an IP address on my device's Ethernet interface, I can log in to LuCI and restart dnsmasq. My easy solution was to put sleep 60 and /etc/init.d/dnsmasq restart into /etc/rc.local. My non-expert guess is that it's a boot timing issue with dnsmasq (it starts at 19 by default).

EDIT: sleep 60 and the dnsmasq restart did not work in rc.local. I ended up putting this in rc.local instead, and it seems to be working.

echo sleep 30 > /tmp/dnsrestart
echo /etc/init.d/dnsmasq restart>> /tmp/dnsrestart
chmod +x /tmp/dnsrestart
/tmp/./dnsrestart &
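
A variation on the same idea, as a rough sketch rather than a tested fix: instead of a fixed sleep, poll for a condition and restart dnsmasq once it holds. The helper name `retry_until`, the attempt count and the delay are arbitrary choices for illustration. The example condition (the br-lan device showing up under /sys/class/net) is an assumption about what is worth waiting for.

```shell
# Sketch: run a command repeatedly until it succeeds or the attempts run out.
# Usage: retry_until <attempts> <delay_seconds> <command> [args...]
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        "$@" && return 0                      # condition met, stop polling
        attempts=$((attempts - 1))
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    return 1                                  # gave up
}

# Example use in /etc/rc.local, run in the background so boot is not blocked:
#   ( retry_until 30 2 test -d /sys/class/net/br-lan && /etc/init.d/dnsmasq restart ) &
```

Like the fixed sleep, this works around the timing problem without addressing its cause.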

@openwrt-bot
Author

rembo10:

Hi - I was having the same problem on a GL.iNet AR300M-Lite, which only has one Ethernet port. I noticed in /etc/config/network, it was showing:

config interface 'lan'
    ....
    option ifname 'eth1'
    ....

I only have eth0 on my system, so removing that line fixed the problem. Maybe it removed a delay during startup or something. Thought I would pass the info along; hope it helps.
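
A quick way to spot this kind of mismatch is to list configured device names that do not exist on the system. This is only a sketch: the function name `missing_netdevs` is hypothetical, and it assumes each network device has an entry under /sys/class/net.

```shell
# Sketch: print each named device that has no entry under the given directory.
# Usage: missing_netdevs <sysfs_net_dir> <dev> [dev...]
missing_netdevs() {
    dir=$1
    shift
    for dev in "$@"; do
        [ -e "$dir/$dev" ] || echo "$dev"
    done
}

# Example use on the router, checking the 'lan' ifname from /etc/config/network:
#   missing_netdevs /sys/class/net $(uci -q get network.lan.ifname)
```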

@openwrt-bot
Author

rdowle4rf:

This is still happening for me on 19.07, so I have spent some time analyzing the issue. The sequence of events at bootup seems to be:

  • netifd sends hotplug event for lan bridge interface (br-lan) UP
  • hotplug script toggles uci state for interface UP
  • dnsmasq init script reload is triggered
  • dnsmasq init script calls devstatus, which returns carrier as down, so interface is ignored
  • No further hotplug events

This is intermittent, and when it works there can be additional hotplug events after the lan is up.

So it looks like a race condition where netifd is reporting incorrect carrier information (bridge interfaces should always be carrier up when the interface is up).

A workaround is to make dnsmasq ignore the carrier status for bridge interfaces, since the reported status seems to be buggy.
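
The workaround logic can be sketched as a small decision helper. This is not the actual dnsmasq init script code. The function name `dhcp_iface_usable`, the "1"/"0" carrier encoding, and the assumption that bridge devices are named br-* are all illustrative.

```shell
# Hypothetical helper: decide whether a device should be treated as usable for DHCP.
# $1 = device name, $2 = reported carrier state ("1" = up, "0" = down).
dhcp_iface_usable() {
    case "$1" in
        br-*) return 0 ;;   # bridge: ignore the reported carrier state
    esac
    [ "$2" = "1" ]          # any other device: require carrier up
}

# Example: dhcp_iface_usable br-lan 0 succeeds, dhcp_iface_usable eth0 0 fails.
```

With this rule, a bridge that is up but wrongly reported as carrier down would no longer be skipped.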

@aparcar aparcar added the release/19.07 pull request/issue targeted (also) for OpenWrt 19.07 release label Feb 22, 2022