* NULL pointer issue in rcu_do_batch()
@ 2022-12-22 11:34 Mukesh Ojha
  2022-12-22 13:11 ` Joel Fernandes
  0 siblings, 1 reply; 3+ messages in thread
From: Mukesh Ojha @ 2022-12-22 11:34 UTC
  To: rcu; +Cc: quic_mojha

Hi All,

We are observing a NULL pointer dereference in rcu_do_batch() on 5.15,
although it is very hard to hit.

I wanted to check whether this has already been reported and fixed in a recent kernel.


<1>[16.814014] [pid:    58] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
<0>[16.814027] [pid:    58] PC Code: bad value
<0>[16.814034] [pid:    58] LR Code: f81e03a8 b5000068 d10083a8 f81e83a8 aa1f03f6 91127319 d10083b7 f9434b68 d503201f f9400408 910006d6 f900041f d63f0100 (91004308) b8bfc108 374001c8 97ffff2b 9111e308 38bfc108 72001d1f

<4>[16.814359] [pid:    58] CPU: 7 PID: 58 Comm: rcuop/5 Tainted: G S   W  OE     5.15.41-android13-8-25574579-abS911USQU1AVLL #1
<4>[16.814361] [pid:    58] Hardware name: XXXXX
<4>[16.814362] [pid:    58] pstate: 42400805 (nZcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=-c)
<4>[16.814364] [pid:    58] pc : 0x0
<4>[16.814365] [pid:    58] lr : rcu_do_batch+0x328/0xcd8


The rcu_data for CPU 5 contains 12 additional RCU callback heads in the
RCU_DONE_TAIL segment whose func is NULL. This does not look like random
memory corruption, since only rhp->func is NULL, and that holds across
multiple objects.
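
To make the failure mode concrete, here is a minimal userspace model of a callback batch loop. It is only a sketch under stated assumptions: `cb_head`, `count_cb` and `run_batch` are hypothetical stand-ins, not the kernel's `struct rcu_head` or `rcu_do_batch()`, and the NULL check is added purely so that the model can count bad entries instead of branching to address 0 (which would match the "pc : 0x0" with lr inside rcu_do_batch seen above).

```c
#include <stddef.h>

/* Hypothetical stand-in for struct rcu_head: a link plus a callback. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

static int invoked;	/* number of callbacks actually run */

/* Trivial callback that only records that it was called. */
static void count_cb(struct cb_head *head)
{
	(void)head;
	invoked++;
}

/*
 * Walk a list of ready callbacks and invoke each one.
 * Returns the number of entries skipped because func was NULL.
 * An unchecked indirect call on such an entry would jump to address 0.
 */
static int run_batch(struct cb_head *list)
{
	int null_funcs = 0;

	while (list) {
		/* Read the link first: a real callback may free its object. */
		struct cb_head *next = list->next;

		if (list->func)
			list->func(list);
		else
			null_funcs++;
		list = next;
	}
	return null_funcs;
}
```

Calling `run_batch()` on a three-entry list whose middle entry has a NULL `func` runs two callbacks and reports one skipped entry.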

There is one more occurrence with CONFIG_CFI_CLANG enabled.

[123587.101222][   T44] Kernel panic - not syncing: CFI failure (target: 0x0)
[123587.101249][   T44] CPU: 0 PID: 44 Comm: rcuop/3 Tainted: G S WC OE     5.15.41 #1
[123587.101263][   T44] Hardware name: XXXXX
[123587.101274][   T44] Call trace:
[123587.101283][   T44]  dump_backtrace.cfi_jt+0x0/0x8
[123587.101298][   T44]  show_stack+0x1c/0x2c
[123587.101311][   T44]  dump_stack_lvl+0x94/0x100
[123587.101326][   T44]  panic+0x17c/0x450
[123587.101338][   T44]  find_check_fn+0x0/0x210
[123587.101349][   T44]  rcu_do_batch+0x368/0x6f8
[123587.101362][   T44]  nocb_cb_wait+0x80/0x450
[123587.101374][   T44]  rcu_nocb_cb_kthread+0x54/0x90
[123587.101386][   T44]  kthread+0x174/0x1d8
[123587.101398][   T44]  ret_from_fork+0x10/0x20
[123587.101410][   T44] SMP: stopping secondary CPUs
[123587.101670][    C4] VendorHooks: CPU4: stopping

-Mukesh


* Re: NULL pointer issue in rcu_do_batch()
  2022-12-22 11:34 NULL pointer issue in rcu_do_batch() Mukesh Ojha
@ 2022-12-22 13:11 ` Joel Fernandes
  2022-12-22 16:40   ` Paul E. McKenney
  0 siblings, 1 reply; 3+ messages in thread
From: Joel Fernandes @ 2022-12-22 13:11 UTC
  To: Mukesh Ojha; +Cc: rcu



> On Dec 22, 2022, at 6:34 AM, Mukesh Ojha <quic_mojha@quicinc.com> wrote:
> 
> Hi All,
> 
> We are observing a NULL pointer dereference in rcu_do_batch() on 5.15, although it is very hard to hit.
> 
> I wanted to check whether this has already been reported and fixed in a recent kernel.

What is the test case? I have not seen such corruption. Is it possible for you to run with CONFIG_PROVE_RCU?

This looks like an Android kernel; I can tell from the VendorHooks line in the log. So with all that GKI stuff, are we sure it is not causing some unforeseen side effect?

Thanks,

 - Joel



* Re: NULL pointer issue in rcu_do_batch()
  2022-12-22 13:11 ` Joel Fernandes
@ 2022-12-22 16:40   ` Paul E. McKenney
  0 siblings, 0 replies; 3+ messages in thread
From: Paul E. McKenney @ 2022-12-22 16:40 UTC
  To: Joel Fernandes; +Cc: Mukesh Ojha, rcu

On Thu, Dec 22, 2022 at 08:11:23AM -0500, Joel Fernandes wrote:
> 
> 
> > On Dec 22, 2022, at 6:34 AM, Mukesh Ojha <quic_mojha@quicinc.com> wrote:
> > 
> > Hi All,
> > 
> > We are observing a NULL pointer dereference in rcu_do_batch() on 5.15, although it is very hard to hit.
> > 
> > I wanted to check whether this has already been reported and fixed in a recent kernel.
> 
> What is the test case? I have not seen such corruption. Is it possible for you to run with CONFIG_PROVE_RCU?

What Joel said!

Another common cause of this is double call_rcu(), free-after-call_rcu(),
or similar.  CONFIG_DEBUG_OBJECTS_RCU_HEAD can help track these down,
and KASAN can also be helpful.
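
As a rough illustration of the free-after-call_rcu() case, the sketch below models it in userspace. Everything in it is an assumption made for illustration: `qhead`, `struct obj`, `enqueue()` and `reuse_while_queued()` are hypothetical names, the queue is a plain singly linked list rather than the kernel's segmented callback list, and a `memset()` stands in for the memory being freed and handed out again zeroed. Once the object is wiped while still queued, its list entry has a NULL `func`, which is the state a later batch would trip over.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct rcu_head. */
struct qhead {
	struct qhead *next;
	void (*func)(struct qhead *head);
};

/* An object that embeds its callback head, as RCU-protected objects do. */
struct obj {
	int payload;
	struct qhead rh;
};

static struct qhead *queue_head;
static struct qhead **queue_tail = &queue_head;

/* Placeholder release callback; a real one would free the object. */
static void obj_release(struct qhead *head)
{
	(void)head;
}

/* Hypothetical analogue of call_rcu(): append the head to the queue. */
static void enqueue(struct qhead *head, void (*func)(struct qhead *))
{
	head->func = func;
	head->next = NULL;
	*queue_tail = head;
	queue_tail = &head->next;
}

/* Count queued entries whose callback pointer is NULL. */
static int count_null_funcs(void)
{
	int n = 0;

	for (struct qhead *p = queue_head; p; p = p->next)
		if (!p->func)
			n++;
	return n;
}

/* The bug being modelled: the object is wiped while still on the queue. */
static void reuse_while_queued(struct obj *o)
{
	memset(o, 0, sizeof(*o));
}
```

In this model, queueing two objects and then wiping the second leaves one queued entry with a NULL `func`. The sketch only shows how such a state can arise; it does not establish that this is what happened in the report.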

							Thanx, Paul


