FS#822 - busybox (ash) sporadically segfaults running shell scripts on ar71xx #6360

Closed
openwrt-bot opened this issue Jun 1, 2017 · 6 comments

@openwrt-bot

NeoRaider:

I'm often seeing this message in my logs:

[ 2183.499756] do_page_fault(): sending SIGSEGV to dhcpv6.script for invalid read access from 00000000
[ 2183.509195] epc = 0041efe9 in busybox[400000+4b000]
[ 2183.514285] ra = 0041efb1 in busybox[400000+4b000]
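To compare crashes across builds, it can help to express epc and ra as offsets into the busybox mapping rather than as absolute addresses. The following is a minimal sketch, assuming the `name[start+size]` notation in these lines gives the start address and length of the mapping that contains the address; the function name `mapping_offsets` is made up for this illustration.

```python
import re

LOG = """\
[ 2183.499756] do_page_fault(): sending SIGSEGV to dhcpv6.script for invalid read access from 00000000
[ 2183.509195] epc = 0041efe9 in busybox[400000+4b000]
[ 2183.514285] ra = 0041efb1 in busybox[400000+4b000]
"""

# Matches lines such as "epc = 0041efe9 in busybox[400000+4b000]".
REG_LINE = re.compile(
    r"\b(?P<reg>epc|ra)\s*=\s*(?P<addr>[0-9a-f]+) in "
    r"(?P<obj>\S+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def mapping_offsets(log):
    """Return {register: (object name, offset into its mapping)}."""
    result = {}
    for m in REG_LINE.finditer(log):
        addr = int(m["addr"], 16)
        base = int(m["base"], 16)
        size = int(m["size"], 16)
        # Sanity check: the address must lie inside the reported mapping.
        if not base <= addr < base + size:
            raise ValueError(f"{m['reg']} is outside {m['obj']}")
        result[m["reg"]] = (m["obj"], addr - base)
    return result


offsets = mapping_offsets(LOG)
for reg, (obj, off) in offsets.items():
    print(f"{reg}: {obj}+0x{off:x}")
```

The offsets are only comparable between devices running the same busybox binary; a different build will place find_command elsewhere.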

The issue might be a new variant of FS#251, which disappeared after a busybox upgrade.

I'm on a recent lede-17.01 version (dfecce6), with the following adjustments to busybox:

CONFIG_BUSYBOX_CUSTOM=y

# CONFIG_BUSYBOX_CONFIG_FEATURE_PREFER_IPV4_ADDRESS is not set

Hardware: TP-Link TL-WR841ND v9 (QCA9533)

My analysis so far:

Registers:

          zero       at       v0       v1       a0       a1       a2       a3
 R0   00000000 7fcfd4c8 00000000 0090b7c0 00910df4 00910df9 2f2f2f2f bcd0a2f0
            t0       t1       t2       t3       t4       t5       t6       t7
 R8   fefefeff 80808080 80083c3c 002f3634 7fcf9698 00000000 00000000 77a6e2c0
            s0       s1       s2       s3       s4       s5       s6       s7
 R16  0090ccc0 7fcf9900 00000004 0040788d 77a64000 77a64000 77a67518 77a68d8c
            t8       t9       k0       k1       gp       sp       s8       ra
 R24  0045b0e8 77a36d44 00000000 00000000 77a6e2c0 7fcf97d0 00000000 0041efb1
            sr       lo       hi      bad    cause       pc
      0000f413 00000048 00000013 00000000 00800008 0041efe9
           fsr      fir
      00000000 00000000

Disassembly:

Dump of assembler code for function find_command:
0x0041ef45 <+0>: save a0-a3,232,ra,s0-s1
0x0041ef49 <+4>: move s1,a1
0x0041ef4b <+6>: jal 0x449c61 strchr@mips16plt
0x0041ef4f <+10>: li a1,47
0x0041ef51 <+12>: beqz v0,0x41ef79 <find_command+52>
0x0041ef53 <+14>: li v0,1
0x0041ef55 <+16>: neg v0
0x0041ef57 <+18>: lw v1,240(sp)
0x0041ef59 <+20>: sw v0,4(s1)
0x0041ef5b <+22>: li v0,2
0x0041ef5d <+24>: and v0,v1
0x0041ef5f <+26>: bnez v0,0x41ef65 <find_command+32>
0x0041ef61 <+28>: li v0,0
0x0041ef63 <+30>: b 0x41ef75 <find_command+48>
0x0041ef65 <+32>: lw a0,232(sp)
0x0041ef67 <+34>: jal 0x44a441 stat@mips16plt
0x0041ef6b <+38>: addiu a1,sp,56
0x0041ef6d <+40>: slti v0,0
0x0041ef6f <+42>: bteqz 0x41ef61 <find_command+28>
0x0041ef71 <+44>: li v0,1
0x0041ef73 <+46>: neg v0
0x0041ef75 <+48>: sb v0,0(s1)
0x0041ef77 <+50>: b 0x41f1af <find_command+618>
0x0041ef79 <+52>: lw v0,0x41f1b4 <find_command+623>
0x0041ef7b <+54>: lw a0,244(sp)
0x0041ef7d <+56>: lw v0,0(v0)
0x0041ef7f <+58>: addiu v0,124
0x0041ef81 <+60>: lw v0,88(v0)
0x0041ef83 <+62>: addiu v0,5
0x0041ef85 <+64>: xor v0,a0
0x0041ef87 <+66>: sltiu v0,1
0x0041ef89 <+68>: move v1,t8
0x0041ef8b <+70>: sw v1,40(sp)
0x0041ef8d <+72>: beqz v0,0x41efa9 <find_command+100>
0x0041ef8f <+74>: lw v0,240(sp)
0x0041ef91 <+76>: li s0,8
0x0041ef93 <+78>: lw a1,0x41f1b8 <find_command+627>
0x0041ef95 <+80>: jal 0x449ba1 strstr@mips16plt
0x0041ef99 <+84>: or s0,v0
0x0041ef9b <+86>: beqz v0,0x41efa7 <find_command+98>
0x0041ef9d <+88>: lw v1,240(sp)
0x0041ef9f <+90>: li v0,40
0x0041efa1 <+92>: or v1,v0
0x0041efa3 <+94>: sw v1,240(sp)
0x0041efa5 <+96>: b 0x41efa9 <find_command+100>
0x0041efa7 <+98>: sw s0,240(sp)
0x0041efa9 <+100>: lw a0,232(sp)
0x0041efab <+102>: jal 0x41a75d
0x0041efaf <+106>: li a1,0
0x0041efb1 <+108>: move s0,v0
0x0041efb3 <+110>: beqz v0,0x41efdd <find_command+152>
0x0041efb5 <+112>: lb v0,8(v0)
0x0041efb7 <+114>: cmpi v0,1
0x0041efb9 <+116>: bteqz 0x41efc3 <find_command+126>
0x0041efbb <+118>: cmpi v0,2
0x0041efbd <+120>: btnez 0x41efc7 <find_command+130>
0x0041efbf <+122>: li v0,32
0x0041efc1 <+124>: b 0x41efc9 <find_command+132>
0x0041efc3 <+126>: li v0,4
0x0041efc5 <+128>: b 0x41efc9 <find_command+132>
0x0041efc7 <+130>: li v0,8
0x0041efc9 <+132>: lw v1,240(sp)
0x0041efcb <+134>: and v0,v1
0x0041efcd <+136>: bnez v0,0x41efd7 <find_command+146>
0x0041efcf <+138>: lbu v0,9(s0)
0x0041efd1 <+140>: beqz v0,0x41f1a3 <find_command+606>
0x0041efd5 <+144>: b 0x41efdd <find_command+152>
0x0041efd7 <+146>: li v0,0
0x0041efd9 <+148>: li s0,0
0x0041efdb <+150>: sw v0,40(sp)
0x0041efdd <+152>: jal 0x41b025 <find_builtin>
0x0041efe1 <+156>: lw a0,232(sp)
0x0041efe3 <+158>: sw v0,44(sp)
0x0041efe5 <+160>: beqz v0,0x41f00f <find_command+202>
0x0041efe7 <+162>: lw v0,0(v0)
=> 0x0041efe9 <+164>: lbu v1,0(v0)
0x0041efeb <+166>: li v0,2
0x0041efed <+168>: and v0,v1
0x0041efef <+170>: bnez v0,0x41f177 <find_command+562>
0x0041eff3 <+174>: lw v1,240(sp)
0x0041eff5 <+176>: li v0,8
0x0041eff7 <+178>: and v0,v1
0x0041eff9 <+180>: beqz v0,0x41f005 <find_command+192>
0x0041effb <+182>: li v0,32
0x0041effd <+184>: and v0,v1
0x0041efff <+186>: beqz v0,0x41f177 <find_command+562>
0x0041f003 <+190>: b 0x41f00f <find_command+202>
0x0041f005 <+192>: lw v0,0x41f1bc <find_command+631>
0x0041f007 <+194>: lw v0,0(v0)
...

As in FS#251, the contents of the registers don't really make sense. Unless I'm overlooking something, it should not be possible for $pc to reach 0x0041efe9 with $ra on 0x0041efb1 (return from cmdlookup); rather, $ra should have the value 0x0041efe3 (return from find_builtin). There are no code paths reaching 0x0041efe9 that don't call find_builtin.
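The return-address argument can be checked mechanically against the listing. This is a small sketch using only addresses copied from the disassembly in this report; it relies on the fact that a MIPS `jal` links to the instruction following its delay slot. The helper names `return_address` and `matching_call` are made up for this illustration, and the label "cmdlookup" for the call at 0x41a75d is taken from the analysis here, not from the symbol table.

```python
# Addresses copied from the find_command listing in this report.
FIND_COMMAND = 0x0041EF45

# name of callee -> [jal, delay-slot instruction, next instruction]
CALL_SITES = {
    "cmdlookup": [0x0041EFAB, 0x0041EFAF, 0x0041EFB1],
    "find_builtin": [0x0041EFDD, 0x0041EFE1, 0x0041EFE3],
}

# Values from the register dump.
OBSERVED_PC = 0x0041EFE9
OBSERVED_RA = 0x0041EFB1


def return_address(call_site):
    """A MIPS jal links to the instruction after its delay slot."""
    _jal, _delay_slot, next_insn = call_site
    return next_insn


def matching_call(ra):
    """Return the callee whose call site would leave this value in $ra."""
    for name, site in CALL_SITES.items():
        if return_address(site) == ra:
            return name
    return None


expected_ra = return_address(CALL_SITES["find_builtin"])
print(f"faulting pc = find_command+{OBSERVED_PC - FIND_COMMAND}")
print(f"observed ra matches a return from: {matching_call(OBSERVED_RA)}")
print(f"ra expected after find_builtin: {expected_ra:#010x}")
```

The observed $ra corresponds to the earlier call, while the faulting instruction sits after the later one, which is the inconsistency described above.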

@openwrt-bot

NeoRaider:

Stack dump around $sp (7fcf97d0):

(gdb) x/64 $sp-128
0x7fcf9750: 0x0000019e 0x00000000 0x00000000 0x00000000
0x7fcf9760: 0x0091a6e8 0x00000000 0x0000000b 0x0041c70b
0x7fcf9770: 0x0090d63b 0x0045b544 0x0091a6f4 0x77a66f98
0x7fcf9780: 0x77a52c8c 0x779ed000 0x77a6e2c0 0x0090d647
0x7fcf9790: 0x00000000 0x00000000 0x77a68de0 0x0041d15f
0x7fcf97a0: 0x77a6e2c0 0x00000003 0x0000002f 0x77a36c48
0x7fcf97b0: 0x00000000 0x0091a6f4 0x00000000 0x0090b7c0
0x7fcf97c0: 0x77a6e2c0 0x7fcfdd78 0x7fcf9900 0x0041efb1
0x7fcf97d0: 0x00910df4 0x00000000 0x77a64000 0x77a64000
0x7fcf97e0: 0x77a67518 0x77a68d8c 0x00000000 0x77a6e2c0
0x7fcf97f0: 0x00000102 0x00000000 0x00000001 0x0090ccc0
0x7fcf9800: 0x00000000 0x00910df4 0x0045b7d8 0x77a3752c
0x7fcf9810: 0x77a6e2c0 0x00418f9d 0x00000000 0x80000400
0x7fcf9820: 0x77a6e2c0 0x00000000 0x00910df4 0x0041ad53
0x7fcf9830: 0x00000000 0x00000008 0x00000004 0x00000008
0x7fcf9840: 0x0045b7d8 0x0041a251 0x77a67518 0x00910df4
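One way to read a dump like this is to pick out the words that fall inside the busybox mapping, since those are candidate saved return addresses. Below is a rough sketch, assuming the mapping range 0x400000 to 0x44b000 from the `busybox[400000+4b000]` kernel message; the name `code_pointers` is made up for this illustration. A hit is only a candidate: it may be a stale value left over from an earlier call.

```python
STACK_DUMP = """\
0x7fcf9750: 0x0000019e 0x00000000 0x00000000 0x00000000
0x7fcf9760: 0x0091a6e8 0x00000000 0x0000000b 0x0041c70b
0x7fcf9770: 0x0090d63b 0x0045b544 0x0091a6f4 0x77a66f98
0x7fcf9780: 0x77a52c8c 0x779ed000 0x77a6e2c0 0x0090d647
0x7fcf9790: 0x00000000 0x00000000 0x77a68de0 0x0041d15f
0x7fcf97a0: 0x77a6e2c0 0x00000003 0x0000002f 0x77a36c48
0x7fcf97b0: 0x00000000 0x0091a6f4 0x00000000 0x0090b7c0
0x7fcf97c0: 0x77a6e2c0 0x7fcfdd78 0x7fcf9900 0x0041efb1
0x7fcf97d0: 0x00910df4 0x00000000 0x77a64000 0x77a64000
0x7fcf97e0: 0x77a67518 0x77a68d8c 0x00000000 0x77a6e2c0
0x7fcf97f0: 0x00000102 0x00000000 0x00000001 0x0090ccc0
0x7fcf9800: 0x00000000 0x00910df4 0x0045b7d8 0x77a3752c
0x7fcf9810: 0x77a6e2c0 0x00418f9d 0x00000000 0x80000400
0x7fcf9820: 0x77a6e2c0 0x00000000 0x00910df4 0x0041ad53
0x7fcf9830: 0x00000000 0x00000008 0x00000004 0x00000008
0x7fcf9840: 0x0045b7d8 0x0041a251 0x77a67518 0x00910df4
"""

# Assumed from the kernel message "busybox[400000+4b000]".
TEXT_START = 0x400000
TEXT_END = TEXT_START + 0x4B000
SP = 0x7FCF97D0  # $sp from the register dump


def code_pointers(dump, lo=TEXT_START, hi=TEXT_END):
    """Return (stack address, value) pairs whose value lies in [lo, hi)."""
    hits = []
    for line in dump.splitlines():
        addr_text, _, words_text = line.partition(":")
        row_addr = int(addr_text, 16)
        for i, word in enumerate(words_text.split()):
            value = int(word, 16)
            if lo <= value < hi:
                hits.append((row_addr + 4 * i, value))  # 32-bit words
    return hits


candidates = code_pointers(STACK_DUMP)
for slot, value in candidates:
    print(f"{slot:#010x} (sp{slot - SP:+d}): {value:#010x}")
```

In this dump the value 0x0041efb1 sits at 0x7fcf97cc, directly below $sp, so the same address that appears in $ra is also stored in the stack slot just under the current frame.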

@openwrt-bot

Sunz3r:

I can confirm this issue on LEDE+Gluon "gluon-v2017.1-2-g4827f2d" with a WR842v3.1.

[ 1571.670269] eth1: link down
[ 1571.674048] br-wan: port 1(eth1) entered disabled state
[ 1671.865533]
[ 1671.865533] do_page_fault(): sending SIGSEGV to dhcpv6.script for invalid read access from 00000000
[ 1671.874977] epc = 0041efe9 in busybox[400000+4b000]
[ 1671.880083] ra = 0041efb1 in busybox[400000+4b000]
[ 1671.885139]

@openwrt-bot

bjonglez:

Kernel 4.4.79 has several commits fixing issues on MIPS: https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.4.79

In particular, commit 4c7d28c1e99d ("MIPS: math-emu: Prevent wrong ISA mode instruction emulation") looks related.

@openwrt-bot

NeoRaider:

Checked with 4.4.79, no change.

I have also added some debug messages to the kernel for trap-related context switches (like math emulation), and unless I overlooked something, no context switches happen anywhere close to these crashes.

@openwrt-bot

NeoRaider:

Not reproducible anymore with latest lede-17.01 or master, closing until further notice.

@stweil

stweil commented Jan 19, 2023

I found this issue because I had a similar segfault on a Zyxel NWA50AX device:

[83403.895297] do_page_fault(): sending SIGSEGV to dhcpv6.script for invalid read access from 0000000b
[83403.895320] epc = 0041b29f in busybox[400000+4f000]
[83403.895356] ra  = 00422463 in busybox[400000+4f000]
