OpenWrt/LEDE Project

  • Status Unconfirmed
  • Percent Complete 0%
  • Task Type Bug Report
  • Category Base system
  • Assigned To No-one
  • Operating System All
  • Severity High
  • Priority Very Low
  • Reported Version Trunk
  • Due in Version Undecided
  • Due Date Undecided
  • Private
Attached to Project: OpenWrt/LEDE Project
Opened by Kristian Evensen - 09.03.2017

FS#612 - WAN to LAN leakage on MT7620 devices

I am currently testing two MT7620 devices, the ZBT WE826 and the Sanlinking D240. During the first seconds of boot, I see packets leaking between the WAN and LAN ports. Typically, this results in clients receiving a DHCP reply from my upstream router, which leaves them without connectivity once the switch is properly initialized. I have also tested with the default firmware and do not see this behavior. Also, if I stop the devices in the bootloader, packets do not leak until I resume the boot.

To try to solve this bug, I have ported some steps that were (at least to me) missing from the bootloader switch code to the mt7620 switch driver in LEDE. This had no effect, at least not on the packet leak. A work-around I have found is to update u-boot and remove the input delay, so that the device boots immediately. However, this is quite cumbersome to install and not very reliable. I suspect my luck with this work-around is due more to the timing of the DHCP clients in Ubuntu and Windows 10.

Does anyone have any idea as to what could be wrong and where to start looking?

Thanks in advance for any help.

 


Kristian Evensen commented on 09.03.2017 21:58

This bug is probably related to FS#103, which was fixed with commit 566de813c318d6d30ec3645ee46d3e7357e49f5e. However, this fix does not seem to work for me on the devices I am testing.

Kristian Evensen commented on 10.03.2017 00:08

After taking a closer look at the commit mentioned in my previous comment, I forced global_vlan_enable to be 0 in mt7530_probe for MT7620 devices. This seems to have solved the leak without any side effects. My VLANs, etc. still work fine.

I do not know if this is an acceptable solution to the problem and I should send a patch, or if I should regard it as an ugly hack and keep it in my own tree? :)

psyborg55 commented on 10.03.2017 10:55

Do you know why the input delay (boot timeout?) would be relevant in this case?

Also, can you check that some similar commit did not cause this?

Kristian Evensen commented on 10.03.2017 12:35

Hi,

Thanks for the reply. I had already removed the script in your link, but with no effect.

The reason I suspect the timeout has an impact is that, when I remove it, the switch is already properly initialized before my server has time to send a DHCP reply. However, I don't think spending time on the timeout is worth it, since it is not a fix anyway.

-Kristian

Kristian Evensen commented on 10.03.2017 18:48

I did some more testing. I compiled a new bootloader with WAN/LAN partitioning available and then two firmware images, one with my crude fix and another without the fix. For both images, I also instrumented the kernel to write a debug message when mt7530_apply_config() is called. When booting the router, I ran arping querying for the IP of the upstream router.

Without my fix, I see roughly ten ARP replies. The time of the first reply matches the first call to apply_config(), while the number of replies matches pretty well the time between that first call and the point where the actual switch config (i.e., my network config) is set.

With my fix (and WAN/LAN partitioning) I saw no ARP replies from the upstream router across ~50 reboots of the router. I also tried to replicate the partitioning steps of the bootloader in the mt7620 switch driver, but I saw some leakage during some boots.

If anyone is interested in looking at my mt7620 configuration code, please let me know and I will share it here. I suspect this issue can be fixed without flashing the bootloader, just by setting up the switch correctly.

Admin
Jo-Philipp Wich commented on 01.06.2017 23:34

Please share your patch here so that a proper solution can be implemented.

Kristian Evensen commented on 05.06.2017 08:11

Hi,

Sorry for my slow reply, here is the patch (for 4.4). I have to admit that I don't remember all the details and I can't find my notes, but the purpose of the change is to ensure that the code at the top of mt7530_apply_config() is run once on boot. This seemed to be required to isolate the ports properly during boot, before the correct vlan configuration is applied.

This fix is not perfect, in the sense that I still sometimes see a few packets leaking if the bootloader is not updated. As I wrote earlier, I tried to replicate all configuration steps from the Mediatek driver, as well as compare them with the ones in LEDE, but was not successful.

-Kristian
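The patch itself is not reproduced in this thread. As a rough illustration of the idea described above (running the port-isolation step once at boot, before any VLAN configuration is applied), here is a minimal user-space sketch in C. It is one possible reading of the description, not the actual change: struct sw_model, sw_isolate_ports(), sw_probe() and sw_apply_config() are hypothetical names, and the register writes are replaced by counters.

```c
#include <stdbool.h>

/* Hypothetical, simplified switch state; not the driver's real struct. */
struct sw_model {
	bool boot_isolation_done; /* set once the one-time isolation has run */
	int  isolation_runs;      /* how often the isolation step ran */
	int  config_applies;      /* how often a VLAN config was applied */
};

/* Stand-in for the register writes that separate the ports from each other. */
static void sw_isolate_ports(struct sw_model *sw)
{
	sw->isolation_runs++;
	sw->boot_isolation_done = true;
}

/* Model of the probe path: isolate the ports as early as possible. */
static void sw_probe(struct sw_model *sw)
{
	if (!sw->boot_isolation_done)
		sw_isolate_ports(sw);
}

/* Model of the apply-config path: isolate only if probe has not done so. */
static void sw_apply_config(struct sw_model *sw)
{
	if (!sw->boot_isolation_done)
		sw_isolate_ports(sw);
	sw->config_applies++;
}
```

In this model the isolation step runs exactly once, however many times the configuration is applied afterwards.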

JamesT42 commented on 03.07.2017 14:08

This also affects MT7628 devices like the new TP-Link TL-WR841N V13

M.T. commented on 12.10.2017 19:11

This problem also affects the DIR-300 B1 and DIR-615 H1 with LEDE 17.01.3.

Toan commented on 13.10.2017 01:38

Hi,

I just want to share my patch, which resolved the leakage issue and also enables gigabit speed on the mt7530. Feel free to adopt it if you feel it could be useful.

thanks,

TP

------ target/linux/ramips/files/drivers/net/ethernet/ralink/gsw_mt7620a.c
@@ -43,6 +43,8 @@
 #include <ralink_regs.h>
 #include <asm/mach-ralink/mt7620.h>
 
+#include <gpio.h>
+
 #include "ralink_soc_eth.h"
 #include "gsw_mt7620a.h"
 #include "mt7530.h"
@@ -469,10 +471,57 @@ void mt7620_port_init(struct fe_priv *priv, struct device_node *np)
 	}
 }
 
+static void reset_mt7530_gsw(struct mt7620_gsw *gsw)
+{
+	u32 val;
+	int active_low = 0;
+	int reset_gpio = 10;
+	int delay=100000;
+	printk("mt7530: entering reset gws\r\n");
+	if (!gpio_request(reset_gpio, "gsw-reset")) {
+		printk("mt7530: reseting gws\r\n");
+        gpio_direction_output(reset_gpio, active_low ? 1 : 0);
+        udelay(delay);
+        gpio_set_value(reset_gpio, active_low ? 0 : 1);
+        udelay(delay);
+        gpio_set_value(reset_gpio, active_low ? 1 : 0);
+        udelay(delay);
+        gpio_set_value(reset_gpio, active_low ? 0 : 1);
+        udelay(125000);
+     }
+//#ifdef _notuse_
+    if ((rt_sysc_r32(SYSC_REG_CHIP_REV_ID) & 0xFFFF) == 0x0101) {
+        /* (GE1, Force 1000M/FD, FC ON) */
+        gsw_w32(gsw, 0x2005e30b, 0x100);
+        mt7530_mdio_w32(gsw, 0x3600, 0x5e30b); // use 0x5e337 for force link;
+    } else {
+        /* (GE1, Force 1000M/FD, FC ON) */
+        gsw_w32(gsw, 0x2005e33b, 0x100);
+        mt7530_mdio_w32(gsw, 0x3600, 0x5e33b);  // use 0x5e337 for 100mb Force link;
+    }
+
+    /* (GE2, Link down) */
+    gsw_w32(gsw, 0x8000, 0x200);
+
+    //val = 0x117ccf; //Enable Port 6, P5 as GMAC5, P5 disable
+    val = mt7530_mdio_r32(gsw, 0x7804);
+    val &= ~(1<<8); //Enable Port 6
+    val |= (1<<6); //Disable Port 5
+    val |= (1<<13); //Port 5 as GMAC, no Internal PHY
+
+	val |= (1<<16);//change HW-TRAP
+	printk("change HW-TRAP to 0x%x\n", val);
+	mt7530_mdio_w32(gsw, 0x7804, val);
+//#endif
+}
+
 static void gsw_hw_init_mt7620(struct mt7620_gsw *gsw, struct device_node *np)
 {
 	u32 is_BGA = mt7620_is_bga();
 
+	// pham: reset the external switch
+	reset_mt7530_gsw(gsw);
+
 	rt_sysc_w32(rt_sysc_r32(SYSC_REG_CFG1) | BIT(8), SYSC_REG_CFG1);
 	gsw_w32(gsw, gsw_r32(gsw, GSW_REG_CKGCR) & ~(0x3 << 4), GSW_REG_CKGCR);

nani xu commented on 07.07.2019 16:51

This also affects the Mi WiFi Nano. I compiled 15.05 based on the 7628 board, but the firmware is limited to 4M and I don't know how to extend it. If anyone can help (i@nanixu.com), it would be very much appreciated.

Mikael Broström commented on 06.08.2020 10:15

I have this issue as well, and also another problem: the default port map (llllw, wllll, etc.) also creates a leak between the LAN ports, causing loops, err-disabled ports or spanning-tree issues.

I use some ports as trunk ports later in OpenWrt...

I wrote a test patch for the driver that applies the following config instead of the dts port map (wllll, llllw, etc.):
enable_vlan=1
port0=vlan0
port1=vlan1
port2=vlan2
port3=vlan3
port4=vlan4
port5=vlan5
port6=vlan6
port7=vlan7
....
portX=vlanX
....

This is the state that OpenWrt sees before the switch is initialized.

result:
/ # swconfig dev switch0 show|grep -e link -e pvid -e vlan
enable_vlan: 1
pvid: 0
link: port:0 link:down
pvid: 1
link: port:1 link:down
pvid: 2
link: port:2 link:down
pvid: 3
link: port:3 link:down
pvid: 4
link: port:4 link:down
pvid: 5
link: port:5 link:down
pvid: 6
link: port:6 link:up speed:1000baseT full-duplex
pvid: 7
link: port:7 link:down

It works well to prevent leaks between all ports on init (fixes: LAN to WAN leak, LAN to LAN leak).

I then let OpenWrt initialize the switch using my configuration, with the correct CPU port and config.
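The test patch is not attached here. The per-port layout described above (VLANs enabled, port N placed alone in VLAN N) can be sketched as a small user-space model in C. The names struct sw_boot_cfg, sw_boot_cfg_isolate() and sw_ports_can_talk() are hypothetical, the port count of eight is taken from the swconfig output above, and the forwarding rule is a deliberate simplification (two ports can exchange frames only if they share a PVID), not the hardware's actual behaviour.

```c
#include <stdbool.h>

#define SW_NUM_PORTS 8 /* assumption: ports 0..7, as in the swconfig output */

/* Hypothetical boot-time switch configuration. */
struct sw_boot_cfg {
	bool enable_vlan;
	int  pvid[SW_NUM_PORTS];
};

/* Give every port its own VLAN so that no two ports share one. */
static void sw_boot_cfg_isolate(struct sw_boot_cfg *cfg)
{
	cfg->enable_vlan = true;
	for (int port = 0; port < SW_NUM_PORTS; port++)
		cfg->pvid[port] = port;
}

/*
 * Simplified forwarding rule: with VLANs disabled everything is bridged;
 * with VLANs enabled, ports talk only if they have the same PVID.
 */
static bool sw_ports_can_talk(const struct sw_boot_cfg *cfg, int a, int b)
{
	if (!cfg->enable_vlan)
		return true;
	return cfg->pvid[a] == cfg->pvid[b];
}
```

With this layout no pair of distinct ports can exchange frames until the real network configuration replaces it, which covers both the WAN-to-LAN and the LAN-to-LAN case.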
