OpenWrt/LEDE Project

  • Status Closed
  • Percent Complete
    100%
  • Task Type Bug Report
  • Category Base system
  • Assigned To No-one
  • Operating System All
  • Severity High
  • Priority High
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Stijn Tintel - 28.11.2019
Last edited by Stijn Tintel - 06.06.2020

FS#2637 - hostapd ubus reload makes OpenWrt as WiFi client very unreliable

Commit 60fb4c92b6b0d1582d31e02167b90b424185f3a2 (hostapd: add ubus reload) breaks WiFi client connectivity when for some reason the association does not succeed immediately.

I started seeing this problem on all my Raspberry Pi Zero W devices, where WiFi takes a long time to come up due to the lack of entropy:

[   32.392170] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[  104.672670] random: crng init done
[  104.685751] random: 3 urandom warning(s) missed due to ratelimiting
[  105.436537] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready

At this point, the Raspberry Pi is associated to the AP, but the DHCP client is not running. This is the output of ifstatus lan:

{
        "up": false,
        "pending": false,
        "available": false,
        "autostart": true,
        "dynamic": false,
        "proto": "dhcp",
        "data": {

        },
        "errors": [
                {
                        "subsystem": "interface",
                        "code": "NO_DEVICE"
                }
        ]
}

Running ifup lan fixes the problem, but this is of course not easy to do on headless devices.

Later I also experienced this problem on a Raspberry Pi 3 A, which does not have the low entropy problem. In this case the delayed association was caused by a power outage, and the Raspberry Pi booted faster than the APs.

Closed by  Stijn Tintel
06.06.2020 02:22
Reason for closing:  Fixed
Project Manager
Stijn Tintel commented on 03.12.2019 18:31

I just noticed this on one of my devices, and am pretty sure it is caused by the same commit:

root@litebeam01:~# cat /etc/config/wireless

config wifi-device 'radio0'
        option type 'mac80211'
        option channel '36'
        option hwmode '11a'
        option path 'pci0000:00/0000:00:00.0'
        option htmode 'VHT80'
        option disabled '1'

config wifi-iface 'default_radio0'
        option device 'radio0'
        option network 'lan'
        option mode 'ap'
        option ssid 'OpenWrt'
        option encryption 'none'

config wifi-device 'radio1'
        option type 'mac80211'
        option channel '11'
        option hwmode '11g'
        option path 'platform/ahb/18100000.wmac'
        option htmode 'HT20'
        option disabled '1'

config wifi-iface 'default_radio1'
        option device 'radio1'
        option network 'lan'
        option mode 'ap'
        option ssid 'OpenWrt'
        option encryption 'none'
root@litebeam01:~# ps | grep phy
  745 root      3444 S    /usr/sbin/hostapd -s -n phy0 -P /var/run/hostapd-phy0.pid -g /var/run/hostapd-phy0/global -B
  820 root      3444 S    /usr/sbin/hostapd -s -n phy1 -P /var/run/hostapd-phy1.pid -g /var/run/hostapd-phy1/global -B
  871 root      3444 S    /usr/sbin/wpa_supplicant -s -n phy0 -P /var/run/wpa_supplicant-phy0.pid -g /var/run/wpa_supplicant-phy0/global -B
  872 root      3444 S    /usr/sbin/wpa_supplicant -s -n phy1 -P /var/run/wpa_supplicant-phy1.pid -g /var/run/wpa_supplicant-phy1/global -B
 1806 root      1208 S    grep phy

We should not unconditionally start these instances I'd say.

Project Manager
Daniel Golle commented on 06.12.2019 13:04

Running the services unconditionally (to be precise: only in the presence of wifi hardware) was the intention of those commits and believe me, it's a feature, not a bug :)

The cause of the problem described by you above is probably that previously, we have been running hostapd/wpa_supplicant from those scripts and probably those services only fork to background once enough entropy had been collected.
I was hoping that introducing `ubus wait_for ...` would solve that problem, but apparently it didn't – the service is running and the configuration comes up (according to your logs), just netifd had already given up. I will try to find a way to reproduce this, but I don't have any hardware as bad as RasbPi Zero at hand, so it may take a couple of days to find it...

Project Manager
Stijn Tintel commented on 06.12.2019 13:09

If you have a spare AP you could configure a client to connect to it. Turn of that AP, boot the client, and boot the AP 2 minutes afterwards.

Project Manager
Daniel Golle commented on 06.04.2020 23:30

I've now tried exactly that and cannot reproduce the issue. Can it be that the wifi driver (maybe brcmfmac? cfg80211?) doesn't signal link status properly and our 'ip link set wlan0 up' call makes it come up prematurely?

Project Manager
Stijn Tintel commented on 16.04.2020 21:26

Any suggestions on how to check this would be appreciated.

Project Manager
Daniel Golle commented on 25.04.2020 00:13

To test this, please try this:
Have an open/unencrypted AP next to the device.
On the device, boot fresh/unconfigured OpenWrt.
Enter commands:

ip link show dev wlan0
ip link set dev wlan0 up
ip link show dev wlan0
iw dev wlan0 connect "wrong_non_existing_ssid"
ip link show dev wlan0
sleep 1
ip link show dev wlan0
sleep 1
ip link show dev wlan0
iw dev wlan0 connect "ssid_of_open_AP"
ip link show dev wlan0
sleep 1
ip link show dev wlan0
sleep 1
ip link show dev wlan0
Project Manager
Stijn Tintel commented on 06.06.2020 01:09

https://drive.google.com/file/d/1SJQx9ukPT6ZeOmz1GpYwYlYgeWRBTYSO/view?usp=sharing

Since the device was not connected I could not do this over SSH, so here's a photo.

Project Manager
Stijn Tintel commented on 06.06.2020 01:27

As I noticed some changes were made to wifi reconfig recently, I tried with master r13499-7b4877c204 and at first sight I can no longer reproduce.

The crng is initialized a bit faster in this new version, I will try booting the RPi0W with the AP offline for 2 minutes.

[   72.436719] random: crng init done
[   72.453561] random: 2 urandom warning(s) missed due to ratelimiting
[   73.233633] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready

FYI, on this version, I am not seeing this line:

[   32.392170] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
Project Manager
Stijn Tintel commented on 06.06.2020 02:22

Brought up the AP after ~25m uptime on the RPi0W and it instantly connected and requested an IP via DHCP.

Loading...

Available keyboard shortcuts

Tasklist

Task Details

Task Editing