FS#464 - Syslog dnsmasq errors, where dnsmasq thinks it is running as pid 1 (procd) when using ujail/seccomp #5504

Closed
openwrt-bot opened this issue Feb 4, 2017 · 11 comments

@openwrt-bot

kpv:

For the past several weeks, while testing recent LEDE x86 trunk images in a VirtualBox 5.1 VM (which I've been doing for the past 2+ years), I've been noticing dnsmasq errors in syslog, where dnsmasq thinks it's running as pid 1, logging errors like "daemon.crit dnsmasq[1]: failed to bind DHCP server socket: Address in use".

Occasionally I notice that the ujail dnsmasq process hangs as a zombie (state Z in ps).

The relevant details I can think of (besides CONFIG_PACKAGE_dnsmasq_full_noid=y in recent builds) are that I'm using procd-ujail/seccomp, although I have been using procd-ujail/seccomp for more than a year.

BusyBox v1.25.1 () built-in shell (ash)

Reboot (SNAPSHOT, r3186-9f7fc23)

root@LEDE:~# logread |fgrep dnsm
Sat Jan 28 20:12:10 2017 daemon.info dnsmasq[1]: started, version 2.76 cachesize 150
Sat Jan 28 20:12:10 2017 daemon.info dnsmasq[1]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP no-DHCPv6 no-Lua TFTP conntrack ipset no-auth no-DNSSEC no-ID loop-detect inotify
Sat Jan 28 20:12:10 2017 daemon.info dnsmasq[1]: DNS service limited to local subnets
Sat Jan 28 20:12:10 2017 daemon.info dnsmasq-dhcp[1]: DHCP, IP range 192.168.1.100 -- 192.168.1.249, lease time 12h
Sat Jan 28 20:12:10 2017 daemon.info dnsmasq[1]: using local addresses only for domain lan
Sat Jan 28 20:12:10 2017 daemon.info dnsmasq[1]: reading /tmp/resolv.conf.auto
Sat Jan 28 20:12:10 2017 daemon.info dnsmasq[1]: using local addresses only for domain lan
Sat Jan 28 20:12:10 2017 daemon.info dnsmasq[1]: using nameserver 10.0.3.1#53
Sat Jan 28 20:12:10 2017 daemon.info dnsmasq[1]: read /etc/hosts - 4 addresses
Sat Jan 28 20:12:10 2017 daemon.info dnsmasq[1]: read /tmp/hosts/dhcp.cfg02411c - 2 addresses
Sat Jan 28 20:12:10 2017 daemon.info dnsmasq-dhcp[1]: read /etc/ethers - 0 addresses
Sat Jan 28 20:12:12 2017 daemon.info dnsmasq[1]: exiting on receipt of SIGTERM
Sat Jan 28 20:12:12 2017 daemon.info dnsmasq[1]: started, version 2.76 cachesize 150
Sat Jan 28 20:12:12 2017 daemon.info dnsmasq[1]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP no-DHCPv6 no-Lua TFTP conntrack ipset no-auth no-DNSSEC no-ID loop-detect inotify
Sat Jan 28 20:12:12 2017 daemon.info dnsmasq[1]: DNS service limited to local subnets
Sat Jan 28 20:12:12 2017 daemon.info dnsmasq-dhcp[1]: DHCP, IP range 192.168.1.100 -- 192.168.1.249, lease time 12h
Sat Jan 28 20:12:12 2017 daemon.info dnsmasq[1]: using local addresses only for domain lan
Sat Jan 28 20:12:12 2017 daemon.info dnsmasq[1]: reading /tmp/resolv.conf.auto
Sat Jan 28 20:12:12 2017 daemon.info dnsmasq[1]: using local addresses only for domain lan
Sat Jan 28 20:12:12 2017 daemon.info dnsmasq[1]: using nameserver 10.0.3.1#53
Sat Jan 28 20:12:12 2017 daemon.info dnsmasq[1]: read /etc/hosts - 4 addresses
Sat Jan 28 20:12:12 2017 daemon.info dnsmasq[1]: read /tmp/hosts/dhcp.cfg02411c - 2 addresses
Sat Jan 28 20:12:12 2017 daemon.info dnsmasq-dhcp[1]: read /etc/ethers - 0 addresses
Sat Jan 28 20:12:59 2017 daemon.crit dnsmasq[1]: failed to bind DHCP server socket: Address in use
Sat Jan 28 20:12:59 2017 daemon.crit dnsmasq[1]: FAILED to start up
Sat Jan 28 20:13:04 2017 daemon.crit dnsmasq[1]: failed to bind DHCP server socket: Address in use
Sat Jan 28 20:13:04 2017 daemon.crit dnsmasq[1]: FAILED to start up
Sat Jan 28 20:13:09 2017 daemon.crit dnsmasq[1]: failed to bind DHCP server socket: Address in use
Sat Jan 28 20:13:09 2017 daemon.crit dnsmasq[1]: FAILED to start up
Sat Jan 28 20:13:14 2017 daemon.crit dnsmasq[1]: failed to bind DHCP server socket: Address in use
Sat Jan 28 20:13:14 2017 daemon.crit dnsmasq[1]: FAILED to start up
Sat Jan 28 20:13:19 2017 daemon.crit dnsmasq[1]: failed to bind DHCP server socket: Address in use
Sat Jan 28 20:13:19 2017 daemon.crit dnsmasq[1]: FAILED to start up
Sat Jan 28 20:13:19 2017 daemon.info procd: Instance dnsmasq::cfg02411c s in a crash loop 6 crashes, 0 seconds since last crash
root@LEDE:~# ps
PID USER VSZ STAT COMMAND
1 root 1024 S /sbin/procd
2 root 0 SW [kthreadd]
3 root 0 SW [ksoftirqd/0]
4 root 0 SW [kworker/0:0]
5 root 0 SW< [kworker/0:0H]
7 root 0 SW [rcu_sched]
8 root 0 SW [rcu_bh]
9 root 0 SW [migration/0]
10 root 0 SW< [netns]
11 root 0 SW< [perf]
12 root 0 SW [kworker/u2:1]
13 root 0 SW< [writeback]
318 root 0 SW< [crypto]
319 root 0 SW< [bioset]
321 root 0 SW< [kblockd]
384 root 0 SW< [ata_sff]
496 root 0 SW [kworker/0:1]
509 root 0 SW [kswapd0]
510 root 0 SW< [vmstat]
580 root 0 SW [fsnotify_mark]
597 root 0 SW< [pencrypt]
599 root 0 SW< [pdecrypt]
619 root 0 SW< [acpi_thermal_pm]
657 root 0 SW< [bioset]
660 root 0 SW< [bioset]
663 root 0 SW< [bioset]
666 root 0 SW< [bioset]
669 root 0 SW< [bioset]
672 root 0 SW< [bioset]
675 root 0 SW< [bioset]
678 root 0 SW< [bioset]
692 root 0 SW [scsi_eh_0]
693 root 0 SW< [scsi_tmf_0]
696 root 0 SW [scsi_eh_1]
707 root 0 SW< [scsi_tmf_1]
710 root 0 SW [scsi_eh_2]
711 root 0 SW< [scsi_tmf_2]
714 root 0 SW< [kpsmoused]
715 root 0 SW [kworker/u2:3]
769 root 0 SW< [ipv6_addrconf]
780 root 0 SW< [deferwq]
781 root 0 SW< [bioset]
785 root 0 SW< [bioset]
790 root 0 SW< [kworker/0:1H]
798 root 0 SW< [ext4-rsv-conver]
1057 root 776 S /sbin/ubusd
1075 root 968 S /bin/ash --login
1394 root 0 SW< [cfg80211]
1637 root 880 S /sbin/logd -S 64
1684 root 1172 S /sbin/netifd
1724 root 896 S /usr/sbin/odhcpd
1769 root 848 S /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300
1887 root 964 S udhcpc -p /var/run/udhcpc-eth1.pid -s /lib/netifd/dhcp.script -f -t 0 -i eth1 -C -O 121
1891 root 732 S odhcp6c -s /lib/netifd/dhcpv6.script -P0 -t120 eth1
2145 root 1564 S /usr/lib/ipsec/starter --daemon charon
2147 root 4108 S /usr/lib/ipsec/charon --use-syslog
2292 root 968 S < /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.lede.pool.ntp.org -p 1.lede.pool.ntp.org -p 2.lede.pool.ntp.org -p 3.lede.pool.ntp.org
2506 root 968 S {mwan3track} /bin/sh /usr/sbin/mwan3track wan eth1 2 1 2 5 3 8 208.67.220.220 208.67.222.222 8.8.8.8 8.8.4.4
2707 dnsmasq 1024 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg02411c -k -x /var/run/dnsmasq/dnsmasq.cfg02411c.pid
3814 root 916 S /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300
3838 root 968 S -ash
3943 root 964 S sleep 5
3945 root 964 R ps
root@LEDE:~# cat /var/etc/dnsmasq.conf.cfg02411c

# auto-generated config file from /etc/config/dhcp

conf-file=/etc/dnsmasq.conf
dhcp-authoritative
domain-needed
localise-queries
read-ethers
bogus-priv
expand-hosts
local-service
domain=lan
server=/lan/
dhcp-leasefile=/tmp/dhcp.leases
resolv-file=/tmp/resolv.conf.auto
stop-dns-rebind
rebind-localhost-ok
dhcp-broadcast=tag:needs-broadcast
addn-hosts=/tmp/hosts
conf-dir=/tmp/dnsmasq.d
user=dnsmasq
group=dnsmasq

dhcp-range=lan,192.168.1.100,192.168.1.249,255.255.255.0,12h
no-dhcp-interface=eth1

root@LEDE:~# cat /var/run/dnsmasq/dnsmasq.cfg02411c.pid
1
root@LEDE:~# ip ad li
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br-lan state UP group default qlen 1000
link/ether 08:00:27:3a:49:cc brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:9e:1f:f5 brd ff:ff:ff:ff:ff:ff
inet 10.0.3.105/24 brd 10.0.3.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe9e:1ff5/64 scope link
valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 08:00:27:ee:c9:3a brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 08:00:27:77:06:57 brd ff:ff:ff:ff:ff:ff
6: ifb0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 32
link/ether a6:97:f6:34:f9:07 brd ff:ff:ff:ff:ff:ff
7: ifb1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 32
link/ether 92:11:dc:f8:d0:a4 brd ff:ff:ff:ff:ff:ff
8: gre0@NONE: mtu 1476 qdisc noop state DOWN group default qlen 1
link/gre 0.0.0.0 brd 0.0.0.0
9: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
10: teql0: mtu 1500 qdisc noop state DOWN group default qlen 100
link/void
13: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 08:00:27:3a:49:cc brd ff:ff:ff:ff:ff:ff
inet 192.168.1.1/24 brd 192.168.1.255 scope global br-lan
valid_lft forever preferred_lft forever
inet6 fd5a:d684:ee69::1/60 scope global noprefixroute
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe3a:49cc/64 scope link
valid_lft forever preferred_lft forever
root@LEDE:~#

$ fgrep -i procd .config
CONFIG_PACKAGE_procd=y
# CONFIG_PROCD_SHOW_BOOT is not set
# CONFIG_PROCD_ZRAM_TMPFS is not set
CONFIG_PACKAGE_procd-seccomp=y
CONFIG_PACKAGE_procd-ujail=y
$
$ fgrep -i dnsm .config
CONFIG_DEFAULT_dnsmasq=y
# CONFIG_PACKAGE_dnsmasq is not set
# CONFIG_PACKAGE_dnsmasq-dhcpv6 is not set
CONFIG_PACKAGE_dnsmasq-full=y
# CONFIG_PACKAGE_dnsmasq_full_dhcpv6 is not set
# CONFIG_PACKAGE_dnsmasq_full_dnssec is not set
# CONFIG_PACKAGE_dnsmasq_full_auth is not set
CONFIG_PACKAGE_dnsmasq_full_ipset=y
CONFIG_PACKAGE_dnsmasq_full_conntrack=y
CONFIG_PACKAGE_dnsmasq_full_noid=y
# CONFIG_PACKAGE_dnsmasq_full_broken_rtc is not set
$

$ ./scripts/diffconfig.sh|grep -v ^#
CONFIG_TARGET_x86=y
CONFIG_TARGET_x86_generic=y
CONFIG_TARGET_x86_generic_Generic=y
CONFIG_ALL=y
CONFIG_ALL_KMODS=y
CONFIG_ALL_NONSHARED=y
CONFIG_DEVEL=y
CONFIG_TOOLCHAINOPTS=y
CONFIG_DROPBEAR_ECC=y
CONFIG_EXTRA_OPTIMIZATION="-fno-caller-saves"
CONFIG_IB=y
CONFIG_IB_STANDALONE=y
CONFIG_KERNEL_AIO=y
CONFIG_KERNEL_CC_STACKPROTECTOR_STRONG=y
CONFIG_KERNEL_IPC_NS=y
CONFIG_KERNEL_NAMESPACES=y
CONFIG_KERNEL_NET_NS=y
CONFIG_KERNEL_PID_NS=y
CONFIG_KERNEL_SECCOMP=y
CONFIG_KERNEL_SECCOMP_FILTER=y
CONFIG_KERNEL_USER_NS=y
CONFIG_KERNEL_UTS_NS=y
CONFIG_LIBCURL_OPENSSL=y
CONFIG_MAKE_TOOLCHAIN=y
CONFIG_PACKAGE_ath9k-htc-firmware=y
CONFIG_PACKAGE_ca-certificates=y
CONFIG_PACKAGE_chat=y
CONFIG_PACKAGE_comgt=y
CONFIG_PACKAGE_curl=y
CONFIG_PACKAGE_ddns-scripts=y
CONFIG_PACKAGE_dnsmasq-full=y
CONFIG_PACKAGE_gre=y
CONFIG_PACKAGE_grev4=y
CONFIG_PACKAGE_hostapd-common=y
CONFIG_PACKAGE_iftop=y
CONFIG_PACKAGE_ip-tiny=y
CONFIG_PACKAGE_ipset=y
CONFIG_PACKAGE_iptables-mod-conntrack-extra=y
CONFIG_PACKAGE_iptables-mod-hashlimit=y
CONFIG_PACKAGE_iptables-mod-ipopt=y
CONFIG_PACKAGE_iptables-mod-ipsec=y
CONFIG_PACKAGE_iptables-mod-tee=y
CONFIG_PACKAGE_iw=y
CONFIG_PACKAGE_iwinfo=y
CONFIG_PACKAGE_kmod-ath=y
CONFIG_PACKAGE_kmod-ath9k=y
CONFIG_PACKAGE_kmod-ath9k-common=y
CONFIG_PACKAGE_kmod-ath9k-htc=y
CONFIG_PACKAGE_kmod-cfg80211=y
CONFIG_PACKAGE_kmod-crypto-aead=y
CONFIG_PACKAGE_kmod-crypto-authenc=y
CONFIG_PACKAGE_kmod-crypto-cbc=y
CONFIG_PACKAGE_kmod-crypto-crc32c=y
CONFIG_PACKAGE_kmod-crypto-deflate=y
CONFIG_PACKAGE_kmod-crypto-des=y
CONFIG_PACKAGE_kmod-crypto-echainiv=y
CONFIG_PACKAGE_kmod-crypto-hash=y
CONFIG_PACKAGE_kmod-crypto-hmac=y
CONFIG_PACKAGE_kmod-crypto-iv=y
CONFIG_PACKAGE_kmod-crypto-manager=y
CONFIG_PACKAGE_kmod-crypto-md5=y
CONFIG_PACKAGE_kmod-crypto-null=y
CONFIG_PACKAGE_kmod-crypto-pcompress=y
CONFIG_PACKAGE_kmod-crypto-rng=y
CONFIG_PACKAGE_kmod-crypto-sha1=y
CONFIG_PACKAGE_kmod-crypto-sha256=y
CONFIG_PACKAGE_kmod-crypto-wq=y
CONFIG_PACKAGE_kmod-fs-ext4=y
CONFIG_PACKAGE_kmod-gpio-button-hotplug=y
CONFIG_PACKAGE_kmod-gre=y
CONFIG_PACKAGE_kmod-i2c-algo-bit=y
CONFIG_PACKAGE_kmod-i2c-core=y
CONFIG_PACKAGE_kmod-ifb=y
CONFIG_PACKAGE_kmod-igb=y
CONFIG_PACKAGE_kmod-ipsec=y
CONFIG_PACKAGE_kmod-ipsec4=y
CONFIG_PACKAGE_kmod-ipsec6=y
CONFIG_PACKAGE_kmod-ipt-conntrack-extra=y
CONFIG_PACKAGE_kmod-ipt-hashlimit=y
CONFIG_PACKAGE_kmod-ipt-ipopt=y
CONFIG_PACKAGE_kmod-ipt-ipsec=y
CONFIG_PACKAGE_kmod-ipt-ipset=y
CONFIG_PACKAGE_kmod-ipt-tee=y
CONFIG_PACKAGE_kmod-iptunnel=y
CONFIG_PACKAGE_kmod-iptunnel4=y
CONFIG_PACKAGE_kmod-iptunnel6=y
CONFIG_PACKAGE_kmod-leds-gpio=y
CONFIG_PACKAGE_kmod-ledtrig-default-on=y
CONFIG_PACKAGE_kmod-ledtrig-netdev=y
CONFIG_PACKAGE_kmod-ledtrig-timer=y
CONFIG_PACKAGE_kmod-lib-crc16=y
CONFIG_PACKAGE_kmod-lib-crc32c=y
CONFIG_PACKAGE_kmod-lib-textsearch=y
CONFIG_PACKAGE_kmod-lib-zlib-deflate=y
CONFIG_PACKAGE_kmod-lib-zlib-inflate=y
CONFIG_PACKAGE_kmod-mac80211=y
CONFIG_PACKAGE_kmod-nf-conntrack-netlink=y
CONFIG_PACKAGE_kmod-nf-nathelper=y
CONFIG_PACKAGE_kmod-nf-nathelper-extra=y
CONFIG_PACKAGE_kmod-nfnetlink=y
CONFIG_PACKAGE_kmod-nls-base=y
CONFIG_PACKAGE_kmod-sched=y
CONFIG_PACKAGE_kmod-sched-cake=y
CONFIG_PACKAGE_kmod-sched-core=y
CONFIG_PACKAGE_kmod-tun=y
CONFIG_PACKAGE_kmod-usb-acm=y
CONFIG_PACKAGE_kmod-usb-core=y
CONFIG_PACKAGE_kmod-usb-net=y
CONFIG_PACKAGE_kmod-usb-net-cdc-ncm=y
CONFIG_PACKAGE_kmod-usb-net-huawei-cdc-ncm=y
CONFIG_PACKAGE_kmod-usb-net-qmi-wwan=y
CONFIG_PACKAGE_kmod-usb-ohci=y
CONFIG_PACKAGE_kmod-usb-serial=y
CONFIG_PACKAGE_kmod-usb-serial-option=y
CONFIG_PACKAGE_kmod-usb-serial-wwan=y
CONFIG_PACKAGE_kmod-usb-wdm=y
CONFIG_PACKAGE_kmod-usb2=y
CONFIG_PACKAGE_kmod-vmxnet3=y
CONFIG_PACKAGE_libcurl=y
CONFIG_PACKAGE_libgmp=y
CONFIG_PACKAGE_libiwinfo=y
CONFIG_PACKAGE_liblzo=y
CONFIG_PACKAGE_libmnl=y
CONFIG_PACKAGE_libncurses=y
CONFIG_PACKAGE_libnetfilter-conntrack=y
CONFIG_PACKAGE_libopenssl=y
CONFIG_PACKAGE_libpcap=y
CONFIG_PACKAGE_libseccomp=y
CONFIG_PACKAGE_libusb-1.0=y
CONFIG_PACKAGE_mtr=y
CONFIG_PACKAGE_netstat-nat=y
CONFIG_PACKAGE_openssl-util=y
CONFIG_PACKAGE_openvpn-openssl=y
CONFIG_PACKAGE_procd-seccomp=y
CONFIG_PACKAGE_procd-ujail=y
CONFIG_PACKAGE_resolveip=y
CONFIG_PACKAGE_softflowd=y
CONFIG_PACKAGE_sqm-scripts=y
CONFIG_PACKAGE_sqm-scripts-extra=y
CONFIG_PACKAGE_ssmtp=y
CONFIG_PACKAGE_strongswan=y
CONFIG_PACKAGE_strongswan-charon=y
CONFIG_PACKAGE_strongswan-default=y
CONFIG_PACKAGE_strongswan-mod-aes=y
CONFIG_PACKAGE_strongswan-mod-attr=y
CONFIG_PACKAGE_strongswan-mod-connmark=y
CONFIG_PACKAGE_strongswan-mod-constraints=y
CONFIG_PACKAGE_strongswan-mod-des=y
CONFIG_PACKAGE_strongswan-mod-dnskey=y
CONFIG_PACKAGE_strongswan-mod-fips-prf=y
CONFIG_PACKAGE_strongswan-mod-gmp=y
CONFIG_PACKAGE_strongswan-mod-hmac=y
CONFIG_PACKAGE_strongswan-mod-kernel-netlink=y
CONFIG_PACKAGE_strongswan-mod-md5=y
CONFIG_PACKAGE_strongswan-mod-nonce=y
CONFIG_PACKAGE_strongswan-mod-pem=y
CONFIG_PACKAGE_strongswan-mod-pgp=y
CONFIG_PACKAGE_strongswan-mod-pkcs1=y
CONFIG_PACKAGE_strongswan-mod-pubkey=y
CONFIG_PACKAGE_strongswan-mod-random=y
CONFIG_PACKAGE_strongswan-mod-rc2=y
CONFIG_PACKAGE_strongswan-mod-resolve=y
CONFIG_PACKAGE_strongswan-mod-revocation=y
CONFIG_PACKAGE_strongswan-mod-sha1=y
CONFIG_PACKAGE_strongswan-mod-sha2=y
CONFIG_PACKAGE_strongswan-mod-socket-default=y
CONFIG_PACKAGE_strongswan-mod-sshkey=y
CONFIG_PACKAGE_strongswan-mod-stroke=y
CONFIG_PACKAGE_strongswan-mod-updown=y
CONFIG_PACKAGE_strongswan-mod-x509=y
CONFIG_PACKAGE_strongswan-mod-xauth-generic=y
CONFIG_PACKAGE_strongswan-mod-xcbc=y
CONFIG_PACKAGE_strongswan-utils=y
CONFIG_PACKAGE_swconfig=y
CONFIG_PACKAGE_tc=y
CONFIG_PACKAGE_tcpdump-mini=y
CONFIG_PACKAGE_terminfo=y
CONFIG_PACKAGE_uboot-envtools=y
CONFIG_PACKAGE_uqmi=y
CONFIG_PACKAGE_usb-modeswitch=y
CONFIG_PACKAGE_usbreset=y
CONFIG_PACKAGE_wpad-mini=y
CONFIG_PACKAGE_wwan=y
CONFIG_PKG_CC_STACKPROTECTOR_STRONG=y
CONFIG_KERNEL_DIRECT_IO=y
CONFIG_OPENSSL_ENGINE_CRYPTO=y
$

@openwrt-bot

kpv:

Here is the same output from a more recent LEDE x86 trunk build (same diffconfig) with ujail still running:

BusyBox v1.26.2 () built-in shell (ash)

Reboot (SNAPSHOT, r3285-1a52d11)

root@LEDE:~# ps
PID USER VSZ STAT COMMAND
1 root 1020 S /sbin/procd
2 root 0 SW [kthreadd]
3 root 0 SW [ksoftirqd/0]
4 root 0 SW [kworker/0:0]
5 root 0 SW< [kworker/0:0H]
6 root 0 SW [kworker/u2:0]
7 root 0 SW [rcu_sched]
8 root 0 SW [rcu_bh]
9 root 0 SW [migration/0]
10 root 0 SW< [netns]
11 root 0 SW< [perf]
12 root 0 SW [kworker/u2:1]
316 root 0 SW< [writeback]
318 root 0 SW< [crypto]
319 root 0 SW< [bioset]
321 root 0 SW< [kblockd]
384 root 0 SW< [ata_sff]
406 root 0 SW [kworker/u2:2]
497 root 0 SW [kworker/0:1]
510 root 0 SW [kswapd0]
511 root 0 SW< [vmstat]
581 root 0 SW [fsnotify_mark]
598 root 0 SW< [pencrypt]
600 root 0 SW< [pdecrypt]
620 root 0 SW< [acpi_thermal_pm]
658 root 0 SW< [bioset]
661 root 0 SW< [bioset]
664 root 0 SW< [bioset]
667 root 0 SW< [bioset]
670 root 0 SW< [bioset]
673 root 0 SW< [bioset]
676 root 0 SW< [bioset]
679 root 0 SW< [bioset]
693 root 0 SW [scsi_eh_0]
694 root 0 SW< [scsi_tmf_0]
697 root 0 SW [scsi_eh_1]
707 root 0 SW< [scsi_tmf_1]
710 root 0 SW [scsi_eh_2]
711 root 0 SW< [scsi_tmf_2]
714 root 0 SW< [kpsmoused]
715 root 0 SW [kworker/u2:3]
716 root 0 SW [kworker/u2:4]
771 root 0 SW< [ipv6_addrconf]
780 root 0 SW< [deferwq]
781 root 0 SW< [bioset]
785 root 0 SW< [bioset]
790 root 0 SW< [kworker/0:1H]
798 root 0 SW< [ext4-rsv-conver]
1054 root 776 S /sbin/ubusd
1072 root 968 S /bin/ash --login
1370 root 0 SW< [cfg80211]
1611 root 880 S /sbin/logd -S 64
1657 root 1172 S /sbin/netifd
1696 root 964 S /usr/sbin/odhcpd
1740 root 848 S /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300
1937 root 964 S udhcpc -p /var/run/udhcpc-eth1.pid -s /lib/netifd/dhcp.script -f -t 0 -i eth1 -C -O 121
1940 root 736 S odhcp6c -s /lib/netifd/dhcpv6.script -P0 -t120 eth1
2033 root 1588 S /usr/lib/ipsec/starter --daemon charon
2035 root 4120 S /usr/lib/ipsec/charon --use-syslog
2118 root 964 S < /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.lede.pool.ntp.org -p 1.lede.pool.ntp.org -p 2.lede.pool.ntp.org -p 3.lede.pool.ntp.org
2406 root 968 S {mwan3track} /bin/sh /usr/sbin/mwan3track wan eth1 2 1 2 5 3 8 208.67.220.220 208.67.222.222 8.8.8.8 8.8.4.4
2700 root 1804 S {dnsmasq} /sbin/ujail -n dnsmasq -u -l -r /dev/null -r /dev/urandom -r /etc/TZ -r /etc/dnsmasq.conf -r /etc/ethers -r /etc/group -r /etc/hosts -r /etc/passwd
2703 dnsmasq 1024 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg02411c -k -x /var/run/dnsmasq/dnsmasq.cfg02411c.pid
3267 root 920 S /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300
3274 root 964 S sleep 5
3282 root 968 S -ash
3293 root 964 R ps
root@LEDE:~#
root@LEDE:~# cat /var/run/dnsmasq/dnsmasq.cfg02411c.pid
1
root@LEDE:~# cat /var/etc/dnsmasq.conf.cfg02411c

# auto-generated config file from /etc/config/dhcp

conf-file=/etc/dnsmasq.conf
dhcp-authoritative
domain-needed
localise-queries
read-ethers
bogus-priv
expand-hosts
local-service
domain=lan
server=/lan/
dhcp-leasefile=/tmp/dhcp.leases
resolv-file=/tmp/resolv.conf.auto
stop-dns-rebind
rebind-localhost-ok
dhcp-broadcast=tag:needs-broadcast
addn-hosts=/tmp/hosts
conf-dir=/tmp/dnsmasq.d
user=dnsmasq
group=dnsmasq

dhcp-range=set:lan,192.168.1.100,192.168.1.249,255.255.255.0,12h
no-dhcp-interface=eth1

root@LEDE:~#
root@LEDE:~# logread |fgrep dnsm
Sun Feb 5 00:00:27 2017 daemon.info dnsmasq[1]: started, version 2.76 cachesize 150
Sun Feb 5 00:00:27 2017 daemon.info dnsmasq[1]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP no-DHCPv6 no-Lua TFTP conntrack ipset no-auth no-DNSSEC no-ID loop-detect inotify
Sun Feb 5 00:00:27 2017 daemon.info dnsmasq[1]: DNS service limited to local subnets
Sun Feb 5 00:00:27 2017 daemon.info dnsmasq-dhcp[1]: DHCP, IP range 192.168.1.100 -- 192.168.1.249, lease time 12h
Sun Feb 5 00:00:27 2017 daemon.info dnsmasq[1]: using local addresses only for domain lan
Sun Feb 5 00:00:27 2017 daemon.info dnsmasq[1]: reading /tmp/resolv.conf.auto
Sun Feb 5 00:00:27 2017 daemon.info dnsmasq[1]: using local addresses only for domain lan
Sun Feb 5 00:00:27 2017 daemon.info dnsmasq[1]: using nameserver 10.0.3.1#53
Sun Feb 5 00:00:27 2017 daemon.info dnsmasq[1]: read /etc/hosts - 4 addresses
Sun Feb 5 00:00:27 2017 daemon.info dnsmasq[1]: read /tmp/hosts/dhcp.cfg02411c - 2 addresses
Sun Feb 5 00:00:27 2017 daemon.info dnsmasq-dhcp[1]: read /etc/ethers - 0 addresses
root@LEDE:~#

@openwrt-bot

kpv:

Same build, after a reboot: this time there is no lingering ujail process, and dnsmasq logs the bind error.

Note: the behavior is inconsistent and changes from boot to boot.

BusyBox v1.26.2 () built-in shell (ash)

Reboot (SNAPSHOT, r3285-1a52d11)

root@LEDE:~# ps
PID USER VSZ STAT COMMAND
1 root 1024 S /sbin/procd
2 root 0 SW [kthreadd]
3 root 0 SW [ksoftirqd/0]
4 root 0 SW [kworker/0:0]
5 root 0 SW< [kworker/0:0H]
6 root 0 SW [kworker/u2:0]
7 root 0 SW [rcu_sched]
8 root 0 SW [rcu_bh]
9 root 0 SW [migration/0]
10 root 0 SW< [netns]
11 root 0 SW< [perf]
12 root 0 SW [kworker/u2:1]
13 root 0 SW< [writeback]
318 root 0 SW< [crypto]
319 root 0 SW< [bioset]
321 root 0 SW< [kblockd]
384 root 0 SW< [ata_sff]
496 root 0 SW [kworker/0:1]
509 root 0 SW [kswapd0]
510 root 0 SW< [vmstat]
580 root 0 SW [fsnotify_mark]
597 root 0 SW< [pencrypt]
599 root 0 SW< [pdecrypt]
619 root 0 SW< [acpi_thermal_pm]
657 root 0 SW< [bioset]
660 root 0 SW< [bioset]
663 root 0 SW< [bioset]
666 root 0 SW< [bioset]
669 root 0 SW< [bioset]
672 root 0 SW< [bioset]
675 root 0 SW< [bioset]
678 root 0 SW< [bioset]
692 root 0 SW [scsi_eh_0]
693 root 0 SW< [scsi_tmf_0]
696 root 0 SW [scsi_eh_1]
707 root 0 SW< [scsi_tmf_1]
710 root 0 SW [scsi_eh_2]
711 root 0 SW< [scsi_tmf_2]
714 root 0 SW< [kpsmoused]
771 root 0 SW< [ipv6_addrconf]
780 root 0 SW< [deferwq]
781 root 0 SW< [bioset]
785 root 0 SW< [bioset]
787 root 0 SW< [kworker/0:1H]
799 root 0 SW< [ext4-rsv-conver]
1052 root 776 S /sbin/ubusd
1059 root 968 S /bin/ash --login
1345 root 0 SW< [cfg80211]
1610 root 880 S /sbin/logd -S 64
1657 root 1172 S /sbin/netifd
1696 root 964 S /usr/sbin/odhcpd
1741 root 848 S /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300
1909 root 964 S udhcpc -p /var/run/udhcpc-eth1.pid -s /lib/netifd/dhcp.script -f -t 0 -i eth1 -C -O 121
1926 root 736 S odhcp6c -s /lib/netifd/dhcpv6.script -P0 -t120 eth1
2032 root 1588 S /usr/lib/ipsec/starter --daemon charon
2034 root 4120 S /usr/lib/ipsec/charon --use-syslog
2113 root 968 S < /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.lede.pool.ntp.org -p 1.lede.pool.ntp.org -p 2.lede.pool.ntp.org -p 3.lede.pool.ntp.org
2410 root 968 S {mwan3track} /bin/sh /usr/sbin/mwan3track wan eth1 2 1 2 5 3 8 208.67.220.220 208.67.222.222 8.8.8.8 8.8.4.4
2650 dnsmasq 1024 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg02411c -k -x /var/run/dnsmasq/dnsmasq.cfg02411c.pid
3578 root 920 S /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300
3584 root 968 S -ash
3601 root 964 S sleep 5
3606 root 964 R ps
root@LEDE:~# logread |fgrep dnsm
Sun Feb 5 00:47:38 2017 daemon.info dnsmasq[1]: started, version 2.76 cachesize 150
Sun Feb 5 00:47:38 2017 daemon.info dnsmasq[1]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP no-DHCPv6 no-Lua TFTP conntrack ipset no-auth no-DNSSEC no-ID loop-detect inotify
Sun Feb 5 00:47:38 2017 daemon.info dnsmasq[1]: DNS service limited to local subnets
Sun Feb 5 00:47:38 2017 daemon.info dnsmasq-dhcp[1]: DHCP, IP range 192.168.1.100 -- 192.168.1.249, lease time 12h
Sun Feb 5 00:47:38 2017 daemon.info dnsmasq[1]: using local addresses only for domain lan
Sun Feb 5 00:47:38 2017 daemon.info dnsmasq[1]: reading /tmp/resolv.conf.auto
Sun Feb 5 00:47:38 2017 daemon.info dnsmasq[1]: using local addresses only for domain lan
Sun Feb 5 00:47:38 2017 daemon.info dnsmasq[1]: using nameserver 10.0.3.1#53
Sun Feb 5 00:47:38 2017 daemon.info dnsmasq[1]: read /etc/hosts - 4 addresses
Sun Feb 5 00:47:38 2017 daemon.info dnsmasq[1]: read /tmp/hosts/dhcp.cfg02411c - 2 addresses
Sun Feb 5 00:47:38 2017 daemon.info dnsmasq-dhcp[1]: read /etc/ethers - 0 addresses
Sun Feb 5 00:48:13 2017 daemon.crit dnsmasq[1]: failed to bind DHCP server socket: Address in use
Sun Feb 5 00:48:13 2017 daemon.crit dnsmasq[1]: FAILED to start up
Sun Feb 5 00:48:18 2017 daemon.crit dnsmasq[1]: failed to bind DHCP server socket: Address in use
Sun Feb 5 00:48:18 2017 daemon.crit dnsmasq[1]: FAILED to start up
Sun Feb 5 00:48:23 2017 daemon.crit dnsmasq[1]: failed to bind DHCP server socket: Address in use
Sun Feb 5 00:48:23 2017 daemon.crit dnsmasq[1]: FAILED to start up
Sun Feb 5 00:48:28 2017 daemon.crit dnsmasq[1]: failed to bind DHCP server socket: Address in use
Sun Feb 5 00:48:28 2017 daemon.crit dnsmasq[1]: FAILED to start up
Sun Feb 5 00:48:33 2017 daemon.crit dnsmasq[1]: failed to bind DHCP server socket: Address in use
Sun Feb 5 00:48:33 2017 daemon.crit dnsmasq[1]: FAILED to start up
Sun Feb 5 00:48:33 2017 daemon.info procd: Instance dnsmasq::cfg02411c s in a crash loop 6 crashes, 0 seconds since last crash
root@LEDE:~#

@openwrt-bot

champtar:

Hi @KpapaD

ujail uses Linux kernel namespaces to "jail" processes, like all container technologies (Docker/LXC) and other jail projects such as firejail:
http://man7.org/linux/man-pages/man7/namespaces.7.html

In a new namespace created with CLONE_NEWPID, processes don't see the processes in the parent namespace, and PID numbers start at 1 again inside the new namespace.
So it's totally normal that dnsmasq under ujail thinks it is PID 1 (that's what it is inside the namespace, from its point of view),
and it's also normal that you don't see dnsmasq with PID 1 in ps, because you are looking from the parent namespace (a different point of view).
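
A rough way to see both points of view at once, assuming a kernel new enough (4.1 or later) to expose the `NSpid:` line in `/proc/<pid>/status`: that line lists a process's PID in each nested PID namespace, outermost first. The sketch below is in Python for readability, so it assumes a host where Python is available; `parse_nspid` and `read_nspid` are hypothetical helper names, not part of ujail or procd.

```python
def parse_nspid(status_text):
    """Return the PIDs on the NSpid line, outermost namespace first.

    Returns an empty list when the kernel does not report NSpid.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def read_nspid(pid):
    """Read /proc/<pid>/status from the parent namespace and parse NSpid."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_nspid(handle.read())


# Made-up sample mirroring this report: host PID 2707, PID 1 inside the jail.
sample_status = "Name:\tdnsmasq\nPid:\t2707\nNSpid:\t2707\t1\n"
host_pid, jail_pid = parse_nspid(sample_status)
```

A process that was never placed in a new PID namespace has a single entry on that line, which is one way to tell a jailed dnsmasq from an unjailed one.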

Now for the implementation details: procd (LEDE's PID 1 / init / process manager) launches ujail, which takes care of setting up the namespaces and then launches dnsmasq. So it's normal to have a ujail process for each "jailed" process.

What I think is happening is that, for some reason, the ujail process is killed or dies without killing its child dnsmasq. procd, no longer seeing its own child (ujail), tries to restart the service, but the old dnsmasq is still running, so the new dnsmasq can't bind.

You can try strace to see exactly what is happening (if you can launch strace before the restart loop begins):

strace -f -p 1 -p $(pgrep dnsmasq) -p $(pgrep ujail) -o /tmp/trace
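
As a complement to the trace, a snapshot check can show whether a dnsmasq is currently running without its ujail parent. This is only a sketch under assumptions: it is Python (so it needs a host that has it), it identifies processes by `argv[0]` from `/proc/<pid>/cmdline` because the `ps` output above suggests ujail's process name is set to `dnsmasq`, and the helper names and the parent PIDs in the sample data are hypothetical (the `ps` output in this report does not show PPIDs).

```python
import os


def read_proc(pid):
    """Return (pid, ppid, argv0) for one live process, read from /proc."""
    with open(f"/proc/{pid}/stat") as handle:
        stat = handle.read()
    # The fields after the last ')' are: state, ppid, ...
    ppid = int(stat[stat.rindex(")") + 1:].split()[1])
    with open(f"/proc/{pid}/cmdline", "rb") as handle:
        argv0 = handle.read().split(b"\0")[0].decode(errors="replace")
    return pid, ppid, argv0


def unminded_daemons(procs, daemon="dnsmasq", minder="ujail"):
    """List daemon PIDs whose parent process is not the jail minder."""
    argv0_by_pid = {pid: argv0 for pid, _ppid, argv0 in procs}
    return sorted(
        pid
        for pid, ppid, argv0 in procs
        if os.path.basename(argv0) == daemon
        and os.path.basename(argv0_by_pid.get(ppid, "")) != minder
    )


# Hypothetical snapshots modelled on the two ps listings in this report.
jailed = [(1, 0, "/sbin/procd"), (2700, 1, "/sbin/ujail"),
          (2703, 2700, "/usr/sbin/dnsmasq")]
orphaned = [(1, 0, "/sbin/procd"), (2707, 1, "/usr/sbin/dnsmasq")]
```

On a live system the list could be built with `read_proc` over the numeric entries of `/proc`; a non-empty result on a build that uses ujail would be consistent with the theory above, though it would not prove it.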

Regards
Etienne

@openwrt-bot

None:

I've been able to replicate this on an Archer C7 v2 MIPS box. It's "interesting" because dnsmasq is still running (minus its ujail 'minder' process) and able to service queries, but procd has a few attempts at restarting and then gives up...as the new dnsmasq is unable to bind the port.

dnsmasq startup is increasingly hairy these days because a) it doesn't actually try to start at boot, and b) it starts on interface hotplug events, of which there are probably quite a few.

The other 'fun' aspect of this is that the 'dnsmasq without ujail minder' wasn't killed when I tried /etc/init.d/dnsmasq stop. I think that suggests procd has really lost track of it.

I suspect this is a race condition....which is going to be fun :-/

@openwrt-bot

stintel:

Increased severity and priority. If dnsmasq doesn't work, a lot of people are going to be unhappy. While procd-ujail is not installed by default, it can be easily installed, resulting in broken connectivity if the LEDE device is your main router.

As this was working before, please try to bisect the problem. It's best to do this semi-manually: instead of letting git bisect decide which commit to try next, pick one from "git log --oneline package/system/procd/" and run "git checkout [shorthash]" after each "git bisect good" or "git bisect bad".
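
When picking by hand, choosing the commit halfway between the last known-good and first known-bad procd commits keeps the number of builds low. A minimal sketch of that choice, assuming the list is in the newest-first order that `git log --oneline` prints; `next_candidate` is a hypothetical helper and the hashes are placeholders, not real commits.

```python
def next_candidate(commits, good, bad):
    """Pick the commit halfway between a known-good and a known-bad one.

    commits: short hashes, newest first (the order `git log --oneline` prints).
    Returns None once good and bad are adjacent, i.e. `bad` is the culprit.
    """
    bad_idx = commits.index(bad)
    good_idx = commits.index(good)
    if good_idx - bad_idx < 2:
        return None
    return commits[(good_idx + bad_idx) // 2]


# Placeholder hashes standing in for the procd-only history.
procd_commits = ["eeeeeee", "ddddddd", "ccccccc", "bbbbbbb", "aaaaaaa"]
first_try = next_candidate(procd_commits, good="aaaaaaa", bad="eeeeeee")
```

Each result would then be checked out, built, tested and marked good or bad before asking for the next candidate.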

For me this is a blocker for the 17.01 release. This needs to be fixed, or the commit that broke it has to be reverted.

@openwrt-bot

dtaht:

A suggestion might be to try out the newfangled jailing facilities on daemons less system critical and complicated than dnsmasq, and try to work out some of the bugs separately that way. It might be nice to jail lots of things in the long term.

@openwrt-bot

stintel:

It's not about trying out new facilities; according to the reporter, it worked fine for over a year. We just need to test {better,more} before merging breaking changes.

@openwrt-bot

None:

Can someone humour me and try this total hack job 'locking around dnsmasq startup' patch that I botched together last night?

I've an AP that doesn't need to run dnsmasq, so rc.local has
"
/etc/init.d/dnsmasq disable
/etc/init.d/dnsmasq stop
#sleep 110
#/etc/init.d/dnsmaq stop
"

Without the patch it's a lottery as to whether dnsmasq is left running or not which indicates race type issues (hence the now commented sleep & another stop). I'm wondering if race+jail+dnsmasq is an even worse combination :-)

Just tested it on a ujail capable box - no joy. But definitely dnsmasq's ujail 'minder' process is dying as part of the boot process whilst dnsmasq does not.....which is why procd is trying to start more instances.

@openwrt-bot

None:

OK, another theory, but this one's a corker and fits with the available facts. Looking through the logfile, it's more a case of what's missing than what's present: normally dnsmasq starts with 'dnssec timestamp checks' disabled and waits for a SIGHUP (via procd) triggered by a hotplug/ntpd script signalling that the time is now valid.

The logfile shows dnsmasq starting in 'no timestamp mode', but it never gets the signal that timestamps are valid (via the SIGHUP), even though ntpd has gone through its 'time is valid' hotplug script. I suspect procd is signalling the jailer (ujail) rather than the jailee (dnsmasq), and the jailer responds to SIGHUP by closing shop, as it should.

No idea how to fix...but that's what I think is going on.

@openwrt-bot

None:

A workaround fix: lede-project/source#799
