OpenWrt/LEDE Project

  • Status Unconfirmed
  • Percent Complete 0%
  • Task Type Bug Report
  • Category Base system
  • Assigned To No-one
  • Operating System All
  • Severity High
  • Priority Very Low
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by william wortel - 05.09.2020

FS#3319 - erratic behaviour wireless network control on system with three wireless phy

Device: Ubiquiti Routerstation (ar71xx)
Software: OpenWrt SNAPSHOT, r14382+7-ad0f0df909 (local compilation after Sep 3 2020 git pull)
Symptoms: on a system with wlan0, wlan1, and wlan2 (phy0, phy1, phy2), commands like ‘wifi up wlan1’ make wlanN interfaces other than wlan1 disappear.
In addition, dmesg shows messages like:
[ n.n] do_page_fault(): sending SIGSEGV to hostapd for invalid write access to 00000000
[ n.n] epc = 77e79e6c in libc.so[77e4c000+9c000]
[ n.s] ra = 77e1e309 in libubus.so[77e1c000+13000]

Bug found through experimentation:
script: /lib/netifd/wireless/mac80211.sh
function: drv_mac80211_teardown()
line: json_select data
problem: ‘data’ does not produce phy, ‘config’ does.
corrected line: json_select config

While experimenting I also found that there may be problems in
script: /lib/netifd/hostapd.sh
function: wpa_supplicant_run()
line: ubus call wpa_supplicant config_add
and the lines that follow with parameters and line continuations.
Replaced:

ubus call wpa_supplicant config_add "{ \
	\"driver\": \"${_w_driver:-wext}\", \"ctrl\": \"$_rpath\", \
	\"iface\": \"$ifname\", \"config\": \"$_config\" \
	${network_bridge:+, \"bridge\": \"$network_bridge\"} \
	${hostapd_ctrl:+, \"hostapd_ctrl\": \"$hostapd_ctrl\"} \
	}"

By:

local br=${network_bridge:+',"bridge":"'$network_bridge'"'}
local hc=${hostapd_ctrl:+',"hostapd_ctrl":"'$hostapd_ctrl'"'}
local jsonstr='{"driver":"'${_w_driver:-wext}'","iface":"'${ifname}'"'${br}${hc}',"ctrl":"'${_rpath}'","config":"'${_config}'"}'
ubus call wpa_supplicant config_add ${jsonstr}

because I could not get the original to work in a separate test script, written just to observe its behaviour, due to the nested parameter substitution.
I removed all the \" escapes, since " does not need to be escaped inside single quotes.
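
The quoting approach described above can be tried outside netifd with a small stand-alone script. This is only a sketch: the helper name build_config_add_json and its positional argument order are made up for illustration and are not part of hostapd.sh, and the values are assumed to contain no characters that need JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper (not from hostapd.sh): builds the JSON argument for
# "ubus call wpa_supplicant config_add" from plain shell values.
# Arguments: driver iface ctrl config [bridge] [hostapd_ctrl]
build_config_add_json() {
	_driver=${1:-wext}; _iface=$2; _ctrl=$3; _config=$4
	_bridge=$5; _hctrl=$6
	# ${var:+text} expands to text only when var is set and non-empty,
	# so the optional members vanish completely when they are not given.
	_br=${_bridge:+",\"bridge\":\"$_bridge\""}
	_hc=${_hctrl:+",\"hostapd_ctrl\":\"$_hctrl\""}
	printf '{"driver":"%s","iface":"%s"%s%s,"ctrl":"%s","config":"%s"}\n' \
		"$_driver" "$_iface" "$_br" "$_hc" "$_ctrl" "$_config"
}

# Example call with a bridge but without a hostapd control path.
build_config_add_json nl80211 wlan1 /var/run/wpa_supplicant /tmp/w.conf br-lan
```

When the result is handed to ubus, quoting the command substitution, as in ubus call wpa_supplicant config_add "$(build_config_add_json ...)", keeps the JSON a single argument even if a value contains spaces.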


william wortel commented on 12.09.2020 09:10

The hostapd crash is also easily reproducible by just 'wifi down' or 'wifi up'.
Initially I thought that drv_mac80211_teardown() in /lib/netifd/wireless/mac80211.sh selected the wrong 'data' config.
But a json_dump at that point showed that this is only the case for one of the three physical wireless interfaces that are acted upon simultaneously. The other two are served the phy data correctly.
The crash itself happens when drv_mac80211_setup() is called, either as a setup or as a teardown action. But, again, only for one of the three wireless interfaces.
It occurs near the line 'local hostapd_pid=$(ubus call service list '{"name":"wpad"}' | jsonfilter -l 1 -e "@['wpad'].instances['hostapd'].pid")'
Even though the code before that line has waited for hostapd to appear on ubus (ubus wait_for hostapd), by the time the communication takes place hostapd has already crashed, and 'running' is reported as 'false' in the returned data.
But the code does not test for that and just reports that the (empty) pid is not the correct one.
But, again, only for one of the three interfaces. After a crash hostapd is restarted automatically, unless the kernel has detected too many hostapd crashes in a short time, in which case hostapd remains absent.
The missing interface is brought up later.
My preliminary conclusion is that executing three identical mac80211 commands concurrently causes this instability. Apparently the ubus communication channel to hostapd does not interleave the communications properly.
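
The missing test for an empty pid mentioned above could look roughly like the following. This is a sketch, not a proposed patch to mac80211.sh: the helper name pid_is_running is made up for illustration, and the empty hostapd_pid variable merely stands in for the empty jsonfilter result.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if $1 is a non-empty pid of a
# process that currently exists.
pid_is_running() {
	[ -n "$1" ] || return 1     # empty string: the pid lookup returned nothing
	kill -0 "$1" 2>/dev/null    # signal 0 sends nothing, it only checks the target
}

hostapd_pid=""   # stand-in for the empty result seen after the hostapd crash
if pid_is_running "$hostapd_pid"; then
	echo "hostapd is running with pid $hostapd_pid"
else
	echo "hostapd is not running, retry or give up cleanly"
fi
```

A check like this would at least distinguish "hostapd has crashed" from "hostapd runs under an unexpected pid" in the error report.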

Another imperfection I observed is that a command like 'wifi down wlan1' does not bring down only wlan1 when wlan0 and wlan2 are also present.
All three are brought down, and the two that were not named in the command are restarted afterwards. It is very inconvenient that manipulating one radio interrupts the others instead of leaving them untouched.
Earlier OpenWrt versions did not have this problem.
