FS#3561 - hw flow offload cause a kernel panic (oops) in mt7622 (BPI-R64) with trunk #8586

Open
openwrt-bot opened this issue Jan 4, 2021 · 5 comments
Labels
flyspray kernel pull request/issue with Linux kernel related changes

Comments

@openwrt-bot

dixyes:

The device is a Banana Pi R64. The image was built from the r64-emmc branch of https://github.com/graphine27/openwrt

(This branch mainly modifies the image build for eMMC setups; the code related to hw offload is not modified.)

The branch is based on a tree after b59d5c8.

The kernel oopses (and then panics) at random times if hw flow offload is enabled.

The panic log follows. Note that the virtual address "ffffff883e35ffa0" does not change among kernel builds; it is identical in both 5.4.86 and 5.4.85.

[ 1148.631719] Unable to handle kernel paging request at virtual address ffffff883e35ffa0
[ 1148.639650] Mem abort info:
[ 1148.642450] ESR = 0x96000045
[ 1148.645496] EC = 0x25: DABT (current EL), IL = 32 bits
[ 1148.650820] SET = 0, FnV = 0
[ 1148.655255] EA = 0, S1PTW = 0
[ 1148.659777] Data abort info:
[ 1148.664040] ISV = 0, ISS = 0x00000045
[ 1148.669257] CM = 0, WnR = 1
[ 1148.673608] swapper pgtable: 4k pages, 39-bit VAs, pgdp=000000004493b000
[ 1148.681688] [ffffff883e35ffa0] pgd=0000000000000000, pud=0000000000000000
[ 1148.689858] Internal error: Oops: 96000045 [#1] SMP
[ 1148.696102] Modules linked in: iwlmvm iwldvm pppoe pl2303 nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet mt7615e mt7615_common mt76 mac80211 l2tp_ppp iwlwifi huawei_cdc_ncm ebtable_nat ebtable_filter ebtable_broute cp210x ch341 cfg80211 cdc_ncm xt_u32 xt_time xt_tcpmss xt_string xt_statistic xt_state xt_socket xt_recent xt_quota xt_policy xt_pkttype xt_physdev xt_owner xt_nat xt_multiport xt_mark xt_mac xt_lscan xt_limit xt_length2 xt_length xt_ipv4options xt_iprange xt_ipp2p xt_iface xt_hl xt_helper xt_hashlimit xt_geoip xt_fuzzy xt_esp xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_condition xt_comment xt_cluster xt_bpf xt_addrtype xt_TRACE xt_TPROXY xt_TEE xt_TCPMSS xt_SYSRQ xt_REDIRECT xt_PROTO xt_NFQUEUE xt_NFLOG xt_NETMAP xt_MASQUERADE xt_LOGMARK xt_LOG xt_LED xt_IPMARK xt_HL xt_FLOWOFFLOAD xt_DSCP xt_DNETMAP xt_DHCPMAC xt_CT xt_CLASSIFY xt_CHECKSUM xt_DELUDE xt_TARPIT ipt_REJECT xt_tcpudp xt_CHAOS xt_ACCOUNT usbserial usbnet usbhid
[ 1148.696202] ums_usbat ums_sddr55 ums_sddr09 ums_karma ums_jumpshot ums_isd200 ums_freecom ums_datafab ums_cypress ums_alauda ts_fsm ts_bm tcp_hybla tcp_bbr sfp sch_mqprio sch_cake rtl8150 r8152 pptp pppox ppp_mppe ppp_async nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject_bridge nft_reject nft_redir nft_quota nft_queue nft_objref nft_numgen nft_nat nft_meta_bridge nft_masq nft_log nft_limit nft_hash nft_fwd_netdev nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_dup_netdev nft_ct nft_counter nft_chain_nat nfnetlink_queue nfnetlink_log nf_tproxy_ipv6 nf_tproxy_ipv4 nf_tables_set nf_tables nf_socket_ipv6 nf_socket_ipv4 nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_dup_netdev nf_dup_ipv6 nf_dup_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtcache nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp
[ 1148.783268] nf_conntrack_broadcast ts_kmp nf_conntrack_amanda nf_conncount mdio_i2c mdio_gpio mdio_bitbang iptable_raw iptable_nat iptable_mangle iptable_filter ipt_rpfilter ipt_ah ipt_ECN ipt_CLUSTERIP ip6table_raw ip6t_rpfilter ip_tables hid_generic exfat ebtables ebt_vlan ebt_stp ebt_snat ebt_redirect ebt_pkttype ebt_nflog ebt_mark_m ebt_mark ebt_log ebt_limit ebt_ip6 ebt_ip ebt_dnat ebt_arpreply ebt_arp ebt_among ebt_802_3 e1000e crc_ccitt compat_xtables compat cls_flower cdc_wdm br_netfilter asn1_decoder arptable_filter arpt_mangle arp_tables act_vlan sch_teql sch_sfq sch_red sch_prio sch_pie sch_multiq sch_gred sch_fq sch_dsmark sch_codel em_text em_nbyte em_meta em_cmp act_simple act_police act_pedit act_ipt act_gact act_csum em_ipset cls_bpf act_bpf act_ctinfo act_connmark sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred sg hid evdev i2c_gpio i2c_dev ledtrig_usbport ledtrig_heartbeat
[ 1148.870868] ledtrig_gpio ledtrig_activity gpio_beeper xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink sr_mod cdrom ip6table_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6t_NPT ip6t_rt ip6t_mh ip6t_ipv6header ip6t_hbh ip6t_frag ip6t_eui64 ip6t_ah nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 pppoatm ppp_generic slhc bonding ip6_gre ip_gre gre ixgbe igb i2c_algo_bit i2c_core hwmon ifb nat46 l2tp_ip6 l2tp_ip l2tp_eth sctp libcrc32c mdio l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel ipcomp6 xfrm6_tunnel esp6 ah6 xfrm4_tunnel ipcomp esp4 ah4 ipip ip6_tunnel netlink_diag tunnel6 tunnel4 ip_tunnel veth tun loop xfrm_user xfrm_ipcomp af_key xfrm_algo autofs4 dm_mirror
[ 1148.957510] dm_region_hash dm_log dm_crypt dm_mod dax br2684 atm nls_utf8 nls_cp950 nls_cp936 md5 echainiv des_generic libdes cbc authenc arc4 fuse nls_iso8859_1 nls_cp437 uas usb_storage sdhci_pltfm sdhci input_core leds_gpio xhci_plat_hcd ohci_platform ohci_hcd ahci fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug f2fs ptp pps_core mii
[ 1149.076312] CPU: 0 PID: 17994 Comm: kworker/0:1 Not tainted 5.4.85 #0
[ 1149.082741] Hardware name: Bananapi BPI-R64 (DT)
[ 1149.087394] Workqueue: events 0xffffffc008fea260
[ 1149.092004] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 1149.096789] pc : mtk_flow_offload_add+0xd0/0x180
[ 1149.101396] lr : mtk_flow_offload_add+0xbc/0x180
[ 1149.106002] sp : ffffffc013323c60
[ 1149.109306] x29: ffffffc013323c60 x28: 0000000000000000
[ 1149.114609] x27: ffffff803ccb3a38 x26: ffffffc0108838e8
[ 1149.119911] x25: 0000000000000000 x24: 000000000000047c
[ 1149.125212] x23: ffffff80030a2fb8 x22: 0000000000000f7d
[ 1149.130514] x21: ffffff803b033800 x20: 0000000000000000
[ 1149.135816] x19: ffffff80030a2080 x18: 0000000000000014
[ 1149.141118] x17: 0000000080301482 x16: 000000001ea486e4
[ 1149.146419] x15: 00000000d00fee3e x14: 00000000d1faf0f2
[ 1149.151720] x13: 00000000da4ded9b x12: 000000002bc29ec3
[ 1149.157023] x11: 00000000463f697b x10: 00000000000007f0
[ 1149.162325] x9 : 00000000007ff020 x8 : 00000000007ff100
[ 1149.167626] x7 : 00000000090395eb x6 : a00460964dc40000
[ 1149.172928] x5 : 00000000670a7ca3 x4 : ffffff803dc208a0
[ 1149.178230] x3 : 0000000000000002 x2 : 00000000ffffffe4
[ 1149.183532] x1 : ffffff883e35ffa0 x0 : 00000000ffffffe4
[ 1149.188834] Call trace:
[ 1149.191272] mtk_flow_offload_add+0xd0/0x180
[ 1149.195531] mtk_flow_offload+0x4c/0x60
[ 1149.199363] 0xffffffc008fea39c
[ 1149.202497] process_one_work+0x1fc/0x390
[ 1149.206496] worker_thread+0x48/0x4d0
[ 1149.210150] kthread+0x120/0x128
[ 1149.213370] ret_from_fork+0x10/0x1c
[ 1149.216939] Code: 8b364c21 c89ffc35 f947b661 8b204c21 (c89ffc35)
[ 1149.223022] ---[ end trace 6f442c4095dcee79 ]---
[ 1149.227630] Kernel panic - not syncing: Fatal exception
[ 1149.232846] SMP: stopping secondary CPUs
[ 1149.236761] Kernel Offset: disabled
[ 1149.240240] CPU features: 0x0002,04002004
[ 1149.244237] Memory Limit: none
[ 1149.247283] Rebooting in 3 seconds..
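
For anyone decoding the "Mem abort info" lines in the log above, here is a minimal sketch that splits the ESR value into its fields. It assumes the architectural AArch64 ESR_ELx layout for data aborts (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, WnR in ISS bit 6, fault status code in ISS bits 5:0). The function name `decode_esr` is made up for illustration and is not part of any kernel tooling.

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR_ELx value (data abort) into the fields an oops prints."""
    iss = esr & 0x1FFFFFF              # instruction specific syndrome, bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,       # 1 = 32-bit instruction
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,       # 1 = the faulting access was a write
        "DFSC": iss & 0x3F,            # data fault status code
    }


# ESR value taken from the panic log above.
fields = decode_esr(0x96000045)
for name, value in fields.items():
    print(f"{name:4} = {value:#x}")
```

WnR = 1 says the faulting access was a write, and a fault status code of 0x05 is a level 1 translation fault, which is consistent with the empty pgd/pud entries the log prints for the address.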

The relevant output of objdump -d -S drivers/net/ethernet/mediatek/mtk_offload.o is:

...
00000000000000f0 <mtk_flow_offload_add>:
int mtk_flow_offload_add(struct mtk_eth *eth,
enum flow_offload_type type,
struct flow_offload *flow,
struct flow_offload_hw_path *src,
struct flow_offload_hw_path *dest)
{
f0: a9b27bfd stp x29, x30, [sp, #-224]!
f4: 910003fd mov x29, sp
f8: a90153f3 stp x19, x20, [sp, #16]
struct flow_offload_tuple *otuple = &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple;
fc: 91002054 add x20, x2, #0x8
struct flow_offload_tuple *rtuple = &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple;
struct mtk_foe_entry orig, reply;
u32 ohash, rhash, timestamp;
...
rhash = mtk_foe_entry_commit(&eth->ppe, &reply, timestamp);
19c: 2a1803e2 mov w2, w24
1a0: 910243e1 add x1, sp, #0x90
1a4: aa1703e0 mov x0, x23
1a8: 94000000 bl 0 <mtk_foe_entry_commit>
if (rhash < 0) {
mtk_foe_entry_clear(&eth->ppe, ohash);
return -EINVAL;
}

    rcu_assign_pointer(eth->foe_flow_table[ohash], flow);

1ac: f947b661 ldr x1, [x19, #3944]
1b0: 8b364c21 add x1, x1, w22, uxtw #3
1b4: c89ffc35 stlr x21, [x1]
rcu_assign_pointer(eth->foe_flow_table[rhash], flow);
1b8: f947b661 ldr x1, [x19, #3944]
1bc: 8b204c21 add x1, x1, w0, uxtw #3
1c0: c89ffc35 stlr x21, [x1]
...
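
The register dump and the disassembly can be cross-checked with a little arithmetic. The sketch below is one possible reading, not a confirmed root cause. It assumes x0 still holds the value returned by mtk_foe_entry_commit (that is, rhash) when the store at 1c0 executes, which the listing suggests because nothing between 1a8 and 1c0 writes x0. All helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def as_s32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def uxtw_index_address(base: int, w_index: int, shift: int = 3) -> int:
    """Model `add x1, x1, w0, uxtw #3`: zero-extend a 32-bit index, scale it, add it."""
    return (base + ((w_index & 0xFFFFFFFF) << shift)) & MASK64


# Values copied from the oops and the objdump listing.
func_start = 0xF0                      # <mtk_flow_offload_add>
pc_offset = 0xD0                       # pc : mtk_flow_offload_add+0xd0
fault_addr = 0xFFFFFF883E35FFA0        # faulting address, also the value of x1
x0 = 0x00000000FFFFFFE4                # assumed to be rhash

faulting_insn = func_start + pc_offset
rhash_signed = as_s32(x0)
# Work backwards to the table base that `ldr x1, [x19, #3944]` must have loaded.
table_base = (fault_addr - ((x0 & 0xFFFFFFFF) << 3)) & MASK64

print(f"faulting instruction at {faulting_insn:#x} in the object file")
print(f"x0 as a signed 32-bit value: {rhash_signed}")
print(f"implied foe_flow_table base: {table_base:#x}")
print(f"u32 value is < 0: {(x0 & 0xFFFFFFFF) < 0}")
```

If that reading is right, the faulting instruction is the second `stlr` at 1c0, and rhash carried a negative errno-style value (-28 is -ENOSPC on Linux). The `rhash < 0` check could never fire because rhash is declared u32, so the store went to the table base plus 0xffffffe4 * 8. That would also explain why the faulting address is the same across builds, provided the table lands at the same address on each boot.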

@aparcar added the "kernel" label (pull request/issue with Linux kernel related changes) on Feb 22, 2022
@quarkysg
Contributor

I also encountered a kernel panic when I enabled H/W offload on my Linksys E8450. My E8450 is running my own master build as of this commit. I'm still on firewall3, but I don't think that should matter.

My Internet connection is PPPoE over VLAN.

I've disabled H/W offload for now and am sticking to S/W offload.

Here's the stack trace extracted from ramoops:

> <6>[80279.214412] mt7530 mdio-bus:00 lan2: Link is Up - 100Mbps/Full - flow control off
> <6>[80279.221986] sw0: port 2(lan2) entered blocking state
> <6>[80279.226972] sw0: port 2(lan2) entered forwarding state
> <6>[80279.227905] mt7530 mdio-bus:00 lan2: Link is Down
> <6>[80279.236888] sw0: port 2(lan2) entered disabled state
> <6>[80280.880233] mt7530 mdio-bus:00 lan2: Link is Up - 100Mbps/Full - flow control rx/tx
> <6>[80280.887967] sw0: port 2(lan2) entered blocking state
> <6>[80280.892943] sw0: port 2(lan2) entered forwarding state
> <6>[80282.201763] mt7530 mdio-bus:00 lan2: Link is Down
> <6>[80282.206582] sw0: port 2(lan2) entered disabled state
> <1>[80527.045335] Unable to handle kernel paging request at virtual address dead000000000110
> <1>[80527.053276] Mem abort info:
> <1>[80527.056097]   ESR = 0x96000004
> <1>[80527.059149]   EC = 0x25: DABT (current EL), IL = 32 bits
> <1>[80527.064461]   SET = 0, FnV = 0
> <1>[80527.067536]   EA = 0, S1PTW = 0
> <1>[80527.070680]   FSC = 0x04: level 0 translation fault
> <1>[80527.075577] Data abort info:
> <1>[80527.078457]   ISV = 0, ISS = 0x00000004
> <1>[80527.082289]   CM = 0, WnR = 0
> <1>[80527.085299] [dead000000000110] address between user and kernel address ranges
> <0>[80527.092438] Internal error: Oops: 96000004 [#1] SMP
> <7>[80527.097317] Modules linked in: pppoe ppp_async iptable_nat xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD wireguard pppox ppp_generic nf_nat nf_flow_table nf_conntrack mt7915e mt7615e mt7615_common mt76_connac_lib mt76 mac80211 libchacha20poly1305 libblake2s ipt_REJECT ebtable_nat ebtable_filter ebtable_broute chacha_neon cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY slhc poly1305_neon nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcurve25519_generic libchacha libblake2s_generic iptable_mangle iptable_filter ipt_ECN ip_tables hwmon ebtables ebt_vlan ebt_stp ebt_redirect ebt_pkttype ebt_mark_m ebt_mark ebt_limit ebt_among ebt_802_3 crc_ccitt compat sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact xt_set ip_set_list_set
> <7>[80527.097743]  ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ip6_gre ip_gre gre ip6_udp_tunnel udp_tunnel sit ipip ip6_tunnel tunnel6 tunnel4 ip_tunnel tun shortcut_fe_ipv6 shortcut_fe seqiv usb_storage leds_gpio xhci_plat_hcd gpio_button_hotplug
> <7>[80527.233199] CPU: 1 PID: 2855 Comm: kworker/u4:2 Tainted: G S                5.15.35 #0
> <7>[80527.241122] Hardware name: Linksys E8450 (UBI) (DT)
> <7>[80527.245999] Workqueue: nf_ft_offload_stats nf_flow_table_offload_setup [nf_flow_table]
> <7>[80527.253942] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> <7>[80527.260909] pc : nf_flow_offload_ip_hook+0x798/0x14e0 [nf_flow_table]
> <7>[80527.267360] lr : nf_flow_offload_ip_hook+0x7ac/0x14e0 [nf_flow_table]
> <7>[80527.273812] sp : ffffffc00cc7bc30
> <7>[80527.277122] x29: ffffffc00cc7bc30 x28: 0000000000000000 x27: 0000000000000000
> <7>[80527.284268] x26: 0000000000000000 x25: ffffff80024b3600 x24: ffffffc000d4a660
> <7>[80527.291413] x23: ffffffc00cc7bd50 x22: 0000000000000002 x21: ffffffc000d4a650
> <7>[80527.298557] x20: 0000000000000000 x19: dead0000000000f0 x18: 0000000000000000
> <7>[80527.305701] x17: 000000000000001c x16: ffffffc008848680 x15: 000000000000001e
> <7>[80527.312846] x14: ffffffffffffffff x13: 0000000000000018 x12: 0101010101010101
> <7>[80527.319989] x11: 7f7f7f7f7f7f7f7f x10: fefefefefefefeff x9 : ffffff80024b387c
> <7>[80527.327133] x8 : ffffffc00cc7bce8 x7 : 0000000000000000 x6 : ffffffc00cc7bca8
> <7>[80527.334276] x5 : 0000000000000002 x4 : 0000000000000000 x3 : ffffff8003888b00
> <7>[80527.341419] x2 : 0000000000000000 x1 : dead000000000100 x0 : 0000000000000001
> <7>[80527.348564] Call trace:
> <7>[80527.351007]  nf_flow_offload_ip_hook+0x798/0x14e0 [nf_flow_table]
> <7>[80527.357111]  nf_flow_table_offload_setup+0x47c/0x670 [nf_flow_table]
> <7>[80527.363471]  process_one_work+0x200/0x3b4
> <7>[80527.367489]  worker_thread+0x17c/0x4dc
> <7>[80527.371241]  kthread+0x11c/0x130
> <7>[80527.374472]  ret_from_fork+0x10/0x20
> <0>[80527.378056] Code: d1004013 eb0002bf 54000580 52800014 (f9401263) 
> <4>[80527.384147] ---[ end trace 89e1d2f01f783a8b ]---
> <0>[80527.481840] Kernel panic - not syncing: Oops: Fatal exception
> <2>[80527.487593] SMP: stopping secondary CPUs
> <0>[80527.491517] Kernel Offset: disabled
> <0>[80527.494999] CPU features: 0x00003000,00000802
> <0>[80527.499354] Memory Limit: none

@Mushoz
Contributor

Mushoz commented May 19, 2022

@quarkysg Firewall 3 uses iptables as its backend. Netfilter code was written specifically for nftables, and later backported in a relatively hacky way to iptables. There could potentially be a race condition in the iptables version of the netfilter code that is triggering these kernel panics. I would try retesting with Firewall 4 / nftables to see if the issue persists.

@quarkysg
Contributor

@Mushoz My understanding is that iptables is just a userland tool for creating rules for the kernel netfilter stack. Similarly, nftables is another userland tool for creating netfilter rules. How would a userland tool create race conditions for netfilter in the kernel?

Judging from the stack trace, the kernel panic I encountered appears to be caused by a nonexistent offload flow record. It seems to me that there is probably a bug in the kernel code that hands a connection over to flow offload, and that this bug is causing the panic.
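
The reading above can be checked against the register dump in my trace. The sketch below decodes the faulting instruction word from the Code: line (the one in parentheses) and compares the registers with the kernel's list poison value. It assumes the standard A64 encoding of `LDR Xt, [Xn, #imm]` (unsigned offset) and the arm64 default where LIST_POISON1 is 0xdead000000000000 + 0x100. The helper name `decode_ldr_x_uimm` is hypothetical.

```python
# Assumption: arm64 default poison base (CONFIG_ILLEGAL_POINTER_VALUE) plus 0x100.
LIST_POISON1 = 0xDEAD000000000100


def decode_ldr_x_uimm(insn: int) -> tuple:
    """Decode A64 `LDR Xt, [Xn, #imm]` (64-bit, unsigned offset) into (rt, rn, imm)."""
    if (insn & 0xFFC00000) != 0xF9400000:
        raise ValueError("not a 64-bit LDR with unsigned immediate offset")
    imm = ((insn >> 10) & 0xFFF) * 8   # imm12 is scaled by the 8-byte access size
    rn = (insn >> 5) & 0x1F            # base register number
    rt = insn & 0x1F                   # destination register number
    return rt, rn, imm


# Values copied from the oops.
faulting_insn = 0xF9401263             # "(f9401263)" on the Code: line
x1 = 0xDEAD000000000100
x19 = 0xDEAD0000000000F0
fault_addr = 0xDEAD000000000110

rt, rn, imm = decode_ldr_x_uimm(faulting_insn)
print(f"ldr x{rt}, [x{rn}, #{imm:#x}]")
print(f"x{rn} + {imm:#x} = {x19 + imm:#x}")
print(f"x1 is LIST_POISON1: {x1 == LIST_POISON1}")
print(f"x19 is LIST_POISON1 - {LIST_POISON1 - x19:#x}")
```

So the faulting load is `ldr x3, [x19, #0x20]`, with x19 equal to LIST_POISON1 minus 0x10. One plausible interpretation is that the code followed the next pointer of a list entry that had already been removed with list_del() and then read a field of the enclosing structure, which would fit a flow record that no longer exists. This is a guess from the registers only, not something I have confirmed in the source.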

@Mushoz
Contributor

Mushoz commented May 21, 2022

@quarkysg I don't know the exact details. I am merely passing on what I learned from an OpenWrt dev. Specifically, Daniel talks about this issue in this post: https://forum.openwrt.org/t/belkin-rt3200-linksys-e8450-wifi-ax-discussion/94302/1458?u=mushoz

I would suggest asking him your question :) I'm curious about his answer myself!

@quarkysg
Contributor

@Mushoz Thanks for the info. Will test out firewall4 and see how it goes.


4 participants