FS#4098 - MESH-SAE-AUTH-BLOCKED #9082
Comments
nemesisdev: The exact commit of my OpenWrt master build is ade56b8d9e.
Steve-Newcomb: We have this problem too. It occurs in two of our three meshes, and it has become much more frequent lately. I do not know whether it is merely coincidental that we recently upgraded from 21.01 to 21.02.

My current solution is to maintain a pair of openssh tunnels between each dhcp server (in which gw_mode='server') and each client (in which gw_mode='client'). If a dhcp server finds itself with no clients that are (still?) in contact with it, it reboots. If a client finds itself with no dhcp server that is (still?) in contact with it, it reboots. It's a ridiculously heavy solution which is a lot of trouble to set up in a secure manner, but it has the advantage that each node can detect whether it is in contact with the node(s) with which it has one or more critical relationships.

I suspect this problem is actually a driver issue. These are all Archer [CA]7 v [245] routers (affordable!) with QCA "wave1" radios. I haven't been able to use the -CT (Candela Technologies) driver for those radios in a mesh; perhaps I haven't understood the advice I've received about that, or perhaps the advice just doesn't work. Therefore, I have to use the stock (QCA) driver's inherent 802.11s implementation, which has quirks. For example, it always fails, usually within hours or minutes, if I have tweaked the radio's built-in MAC address. Therefore, I suspect the QCA firmware may be insufficiently hardened against the depredations of real-world environments.

On the other hand, this could be a real OpenWrt bug. I have no explanation as to why it is suddenly so much more frequent. If anyone can suggest debugging instrumentation that I haven't already tried, I'll be grateful for the advice.
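The reboot-on-isolation rule described in this comment can be sketched in a few lines of shell. This is only a rough illustration under stated assumptions, not the setup actually in use: a plain ping probe is swapped in for the openssh tunnels, and the function name `any_peer_reachable`, the `PROBE` variable, and the addresses in the usage note are all hypothetical.

```shell
#!/bin/sh
# Sketch only: ping stands in for the ssh-tunnel check described in the comment.
# PROBE (hypothetical) is the command used to test one peer. It receives the
# peer address as its last argument and must exit 0 when the peer answers.

# Return 0 if at least one of the given peers answers the probe, 1 otherwise.
any_peer_reachable() {
    for peer in "$@"; do
        # Intentionally unquoted so that PROBE may carry its own options.
        if ${PROBE:-ping -c 1 -W 2} "$peer" >/dev/null 2>&1; then
            return 0
        fi
    done
    return 1
}
```

On a client, something like `any_peer_reachable 192.0.2.1 192.0.2.2 || reboot` run periodically would approximate the behaviour (the addresses are placeholders). A real deployment would probably want several consecutive misses before rebooting, to avoid reboot loops during short outages.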
EelcoV: Currently I am not using OpenWrt, but I have/had a similar issue. It had to do with "too" many clients trying to connect to the mesh peer at the same time, which then also ended up in the PLINK_BLOCKED state. First of all, I removed the code that sets the PLINK_BLOCKED state when authentication fails several times (I couldn't find it in the IEEE 802.11 standard anyway...). Then I noticed a lot of "anti-clogging" messages (see also chapter 12.4.6 in the IEEE 802.11 standard). This mechanism starts sending tokens along with frames to reduce the number of peers that are allowed to perform authentication at the same time. This then led to peers getting blocked because they were not allowed to authenticate. Maybe you can check your logs for this kind of message. Also, when you try to reproduce the issue, make sure you have a lot of peers (I had to have more than 5 peers...). I have posted my original issue here; maybe it helps to get more insight into the issue: http://lists.infradead.org/pipermail/hostap/2021-December/040095.html
I'm getting the same messages since a recent rebase. The setup is: mesh point with fwding=0,ttl=1 with batman for routing on various mt7621 devices (Cudy WR2100, Cudy M1800, Dual-Q M721), wpad-openssl. There were two seemingly unrelated updates of openssl and a bunch more for mt76. Hostapd is the same.
In my test setup with two mt7621 devices, wpa_supplicant is constantly at 10% CPU - even if both devices have their peer in the 'blocked' state. The moment the other device is turned off or its radio disabled (
I'm giving it another go. If only coz was easy to use on OpenWrt...
I went at it using printf-debugging in I started from
 		sae_set_retransmit_timer(hapd, sta);
 	} else {
+		wpa_printf(MSG_ERROR, "sae_sm_step state confirmed, accepting client. send_confirm: %u", (unsigned) sta->sae->send_confirm);
 		sta->sae->send_confirm = 0xffff;
 		sae_accept_sta(hapd, sta);
 	}
 	break;
.. it didn't hit and I decided to give up for the day.
Reliable handshakes! My only changes are printfs, nothing else. No compiler flags or configuration change 🤔 🤯 So atm, I'm thinking 'race condition'. I won't have time for this until next week, so I'm leaving this here for the moment.
The single statement that decides whether the handshake succeeds or not (provided functioning wifi, correct password etc.) is:
I'm replacing it with a sleep now and will try to move it 'forward' in the hope of getting closer to the step that is susceptible to the timing.
i'm giving up :-( the printf only becomes 'effective' when there's a should anyone want to pick this up: run
nemesisdev:
Device problem occurs on: reported by multiple users on different devices (libremesh/lime-packages#837); I am using a MediaTek-based one (http://www.win-star.com/en_us/product/WS_WN552K1_WN552K2_WN552K3.html)
I am experiencing this on current master, revision r2857+4-9d994f35b4
Steps to reproduce: it occurs randomly, sometimes when the root node of a mesh using plain 802.11s (mesh mode) + SAE/PSK2 authentication is rebooted (or after a power outage). To replicate it, one would have to keep rebooting aggressively until it happens. Turning wifi off and on may be able to replicate it as well.
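The "keep retrying until it happens" procedure could be automated roughly as follows. This is a hedged sketch, not a confirmed reproducer: it uses the wifi off/on variant, which the report only suggests may trigger the problem, rather than a full reboot of the root node. `wifi` and `logread` are the standard OpenWrt commands, while `count_blocked`, `reproduce_blocked`, `WIFI_CMD` and `LOG_CMD` are hypothetical names introduced here. The 60 second default wait is an assumption based on the roughly 45 seconds between the first failure and the block in the log lines quoted in this report.

```shell
#!/bin/sh
# Sketch only. WIFI_CMD and LOG_CMD are hypothetical override hooks; by default
# the OpenWrt `wifi` and `logread` commands are used.

# Count MESH-SAE-AUTH-BLOCKED events currently visible in the log.
count_blocked() {
    ${LOG_CMD:-logread} | awk '/MESH-SAE-AUTH-BLOCKED/ { n++ } END { print n + 0 }'
}

# Restart wifi up to $1 times, waiting $2 seconds after each restart, and stop
# as soon as a new BLOCKED event shows up in the log.
reproduce_blocked() {
    attempts=${1:-20}
    settle=${2:-60}
    baseline=$(count_blocked)   # events logged before we started do not count
    i=1
    while [ "$i" -le "$attempts" ]; do
        ${WIFI_CMD:-wifi} down
        ${WIFI_CMD:-wifi} up
        sleep "$settle"
        if [ "$(count_blocked)" -gt "$baseline" ]; then
            echo "blocked after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
    done
    echo "not reproduced in $attempts attempt(s)"
    return 1
}
```

Calling `reproduce_blocked 50 90` on a non-root node would try up to 50 restarts with a 90 second settle time. Because the system log is a ring buffer, the count can drop if old entries are overwritten during a long run, so a negative result should be treated with caution.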
What happens?
Sometimes the devices in a mesh can't connect to each other after a power outage or a reboot of the root node (the node which is connected to the gateway and allows the rest of the mesh to connect to the internet).
Log lines:
Oct 20 13:04:40 OpenWrt wpa_supplicant[1335]: mesh0: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1a
Oct 20 13:04:47 OpenWrt wpa_supplicant[1335]: mesh1: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1b
Oct 20 13:04:59 OpenWrt wpa_supplicant[1335]: mesh0: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1a
Oct 20 13:05:01 OpenWrt wpa_supplicant[1335]: mesh1: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1b
Oct 20 13:05:11 OpenWrt wpa_supplicant[1335]: mesh0: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1a
Oct 20 13:05:12 OpenWrt wpa_supplicant[1335]: mesh1: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1b
Oct 20 13:05:24 OpenWrt wpa_supplicant[1335]: mesh0: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1a
Oct 20 13:05:24 OpenWrt wpa_supplicant[1335]: mesh0: MESH-SAE-AUTH-BLOCKED addr=*0:3f:5d:::1a duration=300
Oct 20 13:05:26 OpenWrt wpa_supplicant[1335]: mesh1: MESH-SAE-AUTH-FAILURE addr=*0:3f:5d:::1b
Oct 20 13:05:26 OpenWrt wpa_supplicant[1335]: mesh1: MESH-SAE-AUTH-BLOCKED addr=*0:3f:5d:::1b duration=300
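To see at a glance how many failures precede each block, log lines in the format quoted above can be tallied per interface and peer. The following is a minimal sketch that assumes the line layout shown in this report (interface name directly before the event name, followed by `addr=`); the function name `summarise_sae_events` is made up for illustration.

```shell
#!/bin/sh
# Summarise MESH-SAE-AUTH-FAILURE / MESH-SAE-AUTH-BLOCKED events read from stdin.
# Prints one line per interface/peer pair, in order of first appearance:
#   <ifname> <addr> failures=<n> blocked=<n>
summarise_sae_events() {
    awk '
        /MESH-SAE-AUTH-(FAILURE|BLOCKED)/ {
            ifname = ""; addr = ""; event = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^MESH-SAE-AUTH-/) {
                    event = $i
                    if (i > 1) ifname = $(i - 1)   # e.g. "mesh0:"
                }
                if ($i ~ /^addr=/) addr = substr($i, 6)
            }
            sub(/:$/, "", ifname)                  # "mesh0:" -> "mesh0"
            key = ifname " " addr
            if (!(key in seen)) { seen[key] = 1; order[++n] = key }
            if (event == "MESH-SAE-AUTH-FAILURE") fail[key]++
            else block[key]++
        }
        END {
            for (j = 1; j <= n; j++) {
                k = order[j]
                printf "%s failures=%d blocked=%d\n", k, fail[k], block[k]
            }
        }
    '
}
```

On a node, `logread | summarise_sae_events` would feed it the live log. In the ten lines quoted above, each of the two peers shows four failures followed by one block.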
When this happens, the links show up in "iw mesh0 station dump" or "iw mesh1 station dump" but in BLOCKED state.
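The blocked peers can also be pulled out of the station dump mechanically. The sketch below assumes the usual `iw` output layout, in which each peer starts with a `Station <MAC> (on <ifname>)` line and its state appears on a `mesh plink:` line; `blocked_peers` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the MAC address of every peer whose mesh peer link is BLOCKED.
# Reads `iw <ifname> station dump` output on stdin.
blocked_peers() {
    awk '
        $1 == "Station" { mac = $2 }                        # start of a new peer
        $1 == "mesh" && $2 == "plink:" && $3 == "BLOCKED" { print mac }
    '
}
```

For example, `iw mesh0 station dump | blocked_peers` would list the affected peers on mesh0, which could serve as the trigger condition for a monitoring script.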
Rebooting the nodes which have their link blocked at the same time fixes the issue, which seems to rule out an interference issue, because how can a reboot fix an interference issue?
I also tried setting "cell_density '1'" in the configuration of the radios, but the problem keeps happening. It doesn't happen often, but when it does it can wreak havoc.
The mesh configuration is the following:
config wifi-device 'radio0'
	option type 'mac80211'
	option channel '11'
	option hwmode '11g'
	option path '1e140000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0'
	option htmode 'HT20'
	option disabled '0'
	option log_level '0'
	option legacy_rates '0'
	option country 'US'
	option cell_density '1'

config wifi-device 'radio1'
	option type 'mac80211'
	option hwmode '11a'
	option path '1e140000.pcie/pci0000:00/0000:00:01.0/0000:02:00.0'
	option htmode 'VHT80'
	option disabled '0'
	option log_level '0'
	option channel '40'
	option country 'US'
	option cell_density '1'

config wifi-iface 'wifi_mesh0'
	option device 'radio0'
	option ifname 'mesh0'
	option mode 'mesh'
	option encryption 'psk2+ccmp'
	option key ''
	option mesh_id ''
	option network 'lan'
	option mesh_fwding '1'
	option mesh_rssi_threshold '-80'

config wifi-iface 'wifi_mesh1'
	option device 'radio1'
	option ifname 'mesh1'
	option mode 'mesh'
	option encryption 'psk2+ccmp'
	option key ''
	option mesh_id ''
	option network 'lan'
	option mesh_fwding '1'
	option mesh_rssi_threshold '-80'

config wifi-iface 'wifi_wlan0'
	option device 'radio0'
	option ifname 'wlan0'
	option mode 'ap'
	option encryption 'psk2'
	option key ''
	option ssid ''
	option network 'lan'
	option ieee80211r '1'
	option ft_psk_generate_local '1'
	option rsn_preauth '1'
	option reassociation_deadline '20000'
	option ft_over_ds '1'

config wifi-iface 'wifi_wlan1'
	option device 'radio1'
	option ifname 'wlan1'
	option mode 'ap'
	option encryption 'psk2'
	option key ''
	option ssid ''
	option network 'lan'
	option ieee80211r '1'
	option ft_psk_generate_local '1'
	option rsn_preauth '1'
	option reassociation_deadline '20000'
	option ft_over_ds '1'