RCU Archive on lore.kernel.org
* rcu-torture: Internal error: Oops: 96000006
@ 2021-01-21 17:07 Naresh Kamboju
  2021-01-21 18:55 ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Naresh Kamboju @ 2021-01-21 17:07 UTC (permalink / raw)
  To: rcu, open list, Linux-Next Mailing List, lkft-triage
  Cc: Paul E. McKenney, Peter Zijlstra, Steven Rostedt, Ingo Molnar

While running the rcu-torture test on qemu_arm64 and an arm64 Juno-r2 device,
the following kernel crash was noticed. It occurs on linux-next tags from
next-20210111 through next-20210121.

metadata:
  git branch: master
  git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
  git describe: next-20210111
  kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config

output log:

[  621.538050] mem_dump_obj() slab test: rcu_torture_stats = ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z = ffff8000091ab8e0
[  621.546662] mem_dump_obj(ZERO_SIZE_PTR):
[  621.546696] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[  621.555431] Mem abort info:
[  621.557243]   ESR = 0x96000006
[  621.559074]   EC = 0x25: DABT (current EL), IL = 32 bits
[  621.562240]   SET = 0, FnV = 0
[  621.563626]   EA = 0, S1PTW = 0
[  621.565134] Data abort info:
[  621.566425]   ISV = 0, ISS = 0x00000006
[  621.568064]   CM = 0, WnR = 0
[  621.569571] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101ef0000
[  621.572446] [0000000000000008] pgd=0000000102ee1003, p4d=0000000102ee1003, pud=0000000100b25003, pmd=0000000000000000
[  621.577007] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[  621.579359] Modules linked in: rcutorture(-) torture rfkill crct10dif_ce fuse
[  621.582549] CPU: 2 PID: 422 Comm: rcu_torture_sta Not tainted 5.11.0-rc2-next-20210111 #2
[  621.585294] Hardware name: linux,dummy-virt (DT)
[  621.586671] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[  621.588497] pc : kmem_valid_obj+0x58/0xa8
[  621.589748] lr : kmem_valid_obj+0x40/0xa8
[  621.591022] sp : ffff800012debdc0
[  621.592026] x29: ffff800012debdc0 x28: 0000000000000000
[  621.593652] x27: ffff800012e8b988 x26: ffff0000c634dbc0
[  621.595287] x25: 0000000000000000 x24: ffff8000091a1e60
[  621.596882] x23: ffff8000091ab8e0 x22: ffff0000c0a3ac40
[  621.598464] x21: ffff0000c1f44100 x20: ffff8000091a5e90
[  621.600070] x19: 0000000000000000 x18: 0000000000000010
[  621.601692] x17: 0000000000007fff x16: 00000000ffffffff
[  621.603303] x15: ffff0000c0a3b0b8 x14: 66203d2070687226
[  621.604866] x13: 202c303463613361 x12: 2c30346562656432
[  621.606455] x11: ffff80001246cbc0 x10: ffff800012454b80
[  621.608064] x9 : ffff8000100370c8 x8 : 0000000100000000
[  621.609649] x7 : 0000000000000018 x6 : ffff800012816348
[  621.611253] x5 : ffff800012816348 x4 : 0000000000000001
[  621.612849] x3 : 0000000000000001 x2 : 0000000140000000
[  621.614455] x1 : 0000000000000000 x0 : fffffc0000000000
[  621.616062] Call trace:
[  621.616816]  kmem_valid_obj+0x58/0xa8
[  621.617933]  mem_dump_obj+0x20/0xc8
[  621.619015]  rcu_torture_stats+0xf0/0x298 [rcutorture]
[  621.620578]  kthread+0x120/0x158
[  621.621583]  ret_from_fork+0x10/0x34
[  621.622685] Code: 8b000273 b25657e0 d34cfe73 8b131813 (f9400660)
[  621.624570] ---[ end trace 2a00688830f37ea1 ]---
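
For readers decoding the "Mem abort info" lines by hand, the following is a minimal userspace sketch of how the ESR value splits into fields. It assumes the ESR_ELx layout from the Arm architecture manual (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], and for data aborts WnR in bit 6 and the fault status code in bits [5:0]). The struct and function names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative container for the ESR_ELx fields used in the oops. */
struct esr_fields {
	unsigned int ec;   /* exception class, bits [31:26] */
	unsigned int il;   /* instruction length bit, bit 25 */
	unsigned int iss;  /* instruction specific syndrome, bits [24:0] */
	unsigned int wnr;  /* data aborts only: 1 = write, 0 = read, bit 6 */
	unsigned int dfsc; /* data aborts only: fault status code, bits [5:0] */
};

/* Split a raw ESR value into the fields above. */
static struct esr_fields esr_decode(uint32_t esr)
{
	struct esr_fields f;

	f.ec = (esr >> 26) & 0x3f;
	f.il = (esr >> 25) & 0x1;
	f.iss = esr & 0x1ffffff;
	f.wnr = (esr >> 6) & 0x1;
	f.dfsc = esr & 0x3f;
	return f;
}

/* Cross-check the decode against the values printed in the log above. */
static void esr_check_against_log(void)
{
	struct esr_fields f = esr_decode(0x96000006u);

	assert(f.ec == 0x25); /* "EC = 0x25: DABT (current EL)" */
	assert(f.il == 1);    /* "IL = 32 bits" */
	assert(f.iss == 0x6); /* "ISS = 0x00000006" */
	assert(f.wnr == 0);   /* "WnR = 0", i.e. a read */
	assert(f.dfsc == 0x6);
}
```

Calling esr_check_against_log() from a small main() should pass silently. A fault status code of 0x06 is, as far as I can tell, a level 2 translation fault, which would be consistent with the pmd=0000000000000000 entry in the page-table walk above.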

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>

Full test log:
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20210111/testrun/3716647/suite/linux-log-parser/test/check-kernel-oops-2124993/log
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20210121/testrun/3791289/suite/linux-log-parser/test/check-kernel-oops-2172413/log

-- 
Linaro LKFT
https://lkft.linaro.org

* Re: rcu-torture: Internal error: Oops: 96000006
  2021-01-21 17:07 rcu-torture: Internal error: Oops: 96000006 Naresh Kamboju
@ 2021-01-21 18:55 ` Paul E. McKenney
  2021-01-21 21:31   ` Will Deacon
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2021-01-21 18:55 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: rcu, open list, Linux-Next Mailing List, lkft-triage,
	Peter Zijlstra, Steven Rostedt, Ingo Molnar, catalin.marinas,
	will, linux-arm-kernel

On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> While running the rcu-torture test on qemu_arm64 and an arm64 Juno-r2 device,
> the following kernel crash was noticed. It occurs on linux-next tags from
> next-20210111 through next-20210121.
> 
> metadata:
>   git branch: master
>   git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
>   git describe: next-20210111
>   kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> 
> output log:
> 
> [  621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> = ffff8000091ab8e0
> [  621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> [  621.546696] Unable to handle kernel NULL pointer dereference at
> virtual address 0000000000000008
> [  621.555431] Mem abort info:
> [  621.557243]   ESR = 0x96000006
> [  621.559074]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  621.562240]   SET = 0, FnV = 0
> [  621.563626]   EA = 0, S1PTW = 0
> [  621.565134] Data abort info:
> [  621.566425]   ISV = 0, ISS = 0x00000006
> [  621.568064]   CM = 0, WnR = 0
> [  621.569571] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101ef0000
> [  621.572446] [0000000000000008] pgd=0000000102ee1003,
> p4d=0000000102ee1003, pud=0000000100b25003, pmd=0000000000000000
> [  621.577007] Internal error: Oops: 96000006 [#1] PREEMPT SMP
> [  621.579359] Modules linked in: rcutorture(-) torture rfkill crct10dif_ce fuse
> [  621.582549] CPU: 2 PID: 422 Comm: rcu_torture_sta Not tainted
> 5.11.0-rc2-next-20210111 #2
> [  621.585294] Hardware name: linux,dummy-virt (DT)
> [  621.586671] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
> [  621.588497] pc : kmem_valid_obj+0x58/0xa8
> [  621.589748] lr : kmem_valid_obj+0x40/0xa8
> [  621.591022] sp : ffff800012debdc0
> [  621.592026] x29: ffff800012debdc0 x28: 0000000000000000
> [  621.593652] x27: ffff800012e8b988 x26: ffff0000c634dbc0
> [  621.595287] x25: 0000000000000000 x24: ffff8000091a1e60
> [  621.596882] x23: ffff8000091ab8e0 x22: ffff0000c0a3ac40
> [  621.598464] x21: ffff0000c1f44100 x20: ffff8000091a5e90
> [  621.600070] x19: 0000000000000000 x18: 0000000000000010
> [  621.601692] x17: 0000000000007fff x16: 00000000ffffffff
> [  621.603303] x15: ffff0000c0a3b0b8 x14: 66203d2070687226
> [  621.604866] x13: 202c303463613361 x12: 2c30346562656432
> [  621.606455] x11: ffff80001246cbc0 x10: ffff800012454b80
> [  621.608064] x9 : ffff8000100370c8 x8 : 0000000100000000
> [  621.609649] x7 : 0000000000000018 x6 : ffff800012816348
> [  621.611253] x5 : ffff800012816348 x4 : 0000000000000001
> [  621.612849] x3 : 0000000000000001 x2 : 0000000140000000
> [  621.614455] x1 : 0000000000000000 x0 : fffffc0000000000
> [  621.616062] Call trace:
> [  621.616816]  kmem_valid_obj+0x58/0xa8
> [  621.617933]  mem_dump_obj+0x20/0xc8
> [  621.619015]  rcu_torture_stats+0xf0/0x298 [rcutorture]
> [  621.620578]  kthread+0x120/0x158
> [  621.621583]  ret_from_fork+0x10/0x34
> [  621.622685] Code: 8b000273 b25657e0 d34cfe73 8b131813 (f9400660)
> [  621.624570] ---[ end trace 2a00688830f37ea1 ]---
> 
> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> 
> Full test log:
> https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20210111/testrun/3716647/suite/linux-log-parser/test/check-kernel-oops-2124993/log
> https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20210121/testrun/3791289/suite/linux-log-parser/test/check-kernel-oops-2172413/log

Huh.  I am relying on virt_addr_valid() rejecting NULL pointers and
things like ZERO_SIZE_PTR, which is defined as ((void *)16).  It looks
like your configuration rejects NULL as an invalid virtual address,
but does not reject ZERO_SIZE_PTR.  Is this the intent, given that you
are not allowed to dereference a ZERO_SIZE_PTR?

Adding the ARM64 guys on CC for their thoughts.

It is easy enough for me to make kmem_valid_obj() return false for any
address less than (say) PAGE_SIZE for the upcoming merge window, but I
figured I should check for the longer term.
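
As background for the ZERO_SIZE_PTR discussion, here is a small userspace sketch of the convention. The two macros mirror the ones in the kernel's include/linux/slab.h, except that uintptr_t replaces the kernel's unsigned long for portability. The helper may_dereference() is a hypothetical name used only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as the kernel's ZERO_SIZE_PTR, which kmalloc(0) returns. */
#define ZERO_SIZE_PTR ((void *)16)

/*
 * Same idea as the kernel's ZERO_OR_NULL_PTR(): true for NULL and for
 * every address up to and including ZERO_SIZE_PTR.
 */
#define ZERO_OR_NULL_PTR(x) ((uintptr_t)(x) <= (uintptr_t)ZERO_SIZE_PTR)

/* Hypothetical helper: is it legal to dereference this allocation result? */
static bool may_dereference(const void *p)
{
	return !ZERO_OR_NULL_PTR(p);
}
```

With these definitions, may_dereference(NULL) and may_dereference(ZERO_SIZE_PTR) are both false, while any address above 16 passes this particular test. It says nothing about whether such an address is actually mapped.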

							Thanx, Paul

* Re: rcu-torture: Internal error: Oops: 96000006
  2021-01-21 18:55 ` Paul E. McKenney
@ 2021-01-21 21:31   ` Will Deacon
  2021-01-21 21:43     ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Will Deacon @ 2021-01-21 21:31 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Naresh Kamboju, rcu, open list, Linux-Next Mailing List,
	lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar,
	catalin.marinas, linux-arm-kernel, vincenzo.frascino,
	mark.rutland

On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > While running the rcu-torture test on qemu_arm64 and an arm64 Juno-r2 device,
> > the following kernel crash was noticed. It occurs on linux-next tags from
> > next-20210111 through next-20210121.
> > 
> > metadata:
> >   git branch: master
> >   git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> >   git describe: next-20210111
> >   kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> > 
> > output log:
> > 
> > [  621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> > ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> > = ffff8000091ab8e0
> > [  621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > [  621.546696] Unable to handle kernel NULL pointer dereference at
> > virtual address 0000000000000008

[...]

> Huh.  I am relying on virt_addr_valid() rejecting NULL pointers and
> things like ZERO_SIZE_PTR, which is defined as ((void *)16).  It looks
> like your configuration rejects NULL as an invalid virtual address,
> but does not reject ZERO_SIZE_PTR.  Is this the intent, given that you
> are not allowed to dereference a ZERO_SIZE_PTR?
> 
> Adding the ARM64 guys on CC for their thoughts.

Spooky timing, there was a thread _today_ about that:

https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com

Will

* Re: rcu-torture: Internal error: Oops: 96000006
  2021-01-21 21:31   ` Will Deacon
@ 2021-01-21 21:43     ` Paul E. McKenney
  2021-01-22  9:51       ` Naresh Kamboju
  2021-01-22 10:02       ` Mark Rutland
  0 siblings, 2 replies; 10+ messages in thread
From: Paul E. McKenney @ 2021-01-21 21:43 UTC (permalink / raw)
  To: Will Deacon
  Cc: Naresh Kamboju, rcu, open list, Linux-Next Mailing List,
	lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar,
	catalin.marinas, linux-arm-kernel, vincenzo.frascino,
	mark.rutland

On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > > While running the rcu-torture test on qemu_arm64 and an arm64 Juno-r2 device,
> > > the following kernel crash was noticed. It occurs on linux-next tags from
> > > next-20210111 through next-20210121.
> > > 
> > > metadata:
> > >   git branch: master
> > >   git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > >   git describe: next-20210111
> > >   kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> > > 
> > > output log:
> > > 
> > > [  621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> > > ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> > > = ffff8000091ab8e0
> > > [  621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > > [  621.546696] Unable to handle kernel NULL pointer dereference at
> > > virtual address 0000000000000008
> 
> [...]
> 
> > Huh.  I am relying on virt_addr_valid() rejecting NULL pointers and
> > things like ZERO_SIZE_PTR, which is defined as ((void *)16).  It looks
> > like your configuration rejects NULL as an invalid virtual address,
> > but does not reject ZERO_SIZE_PTR.  Is this the intent, given that you
> > are not allowed to dereference a ZERO_SIZE_PTR?
> > 
> > Adding the ARM64 guys on CC for their thoughts.
> 
> Spooky timing, there was a thread _today_ about that:
> 
> https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com

Very good, then my workaround (shown below for Naresh's ease of testing)
is only a short-term workaround.  Yay!  ;-)

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/mm/slab_common.c b/mm/slab_common.c
index cefa9ae..a8375d1 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -550,7 +550,8 @@ bool kmem_valid_obj(void *object)
 {
 	struct page *page;
 
-	if (!virt_addr_valid(object))
+	/* Some arches consider ZERO_SIZE_PTR to be a valid address. */
+	if (object < (void *)PAGE_SIZE || !virt_addr_valid(object))
 		return false;
 	page = virt_to_head_page(object);
 	return PageSlab(page);
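
The effect of the extra comparison can be modelled in userspace. This is only a sketch under stated assumptions: model_virt_addr_valid() is a hypothetical stand-in that rejects NULL and nothing else, which imitates the behaviour reported in this thread and is not how any architecture actually implements virt_addr_valid(). The PageSlab() step is left out, and addresses are carried as uintptr_t so that the comparison is well defined in ISO C.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u                   /* "4k pages" per the log */
#define MODEL_ZERO_SIZE_PTR ((uintptr_t)16)     /* value of ZERO_SIZE_PTR */

/* Hypothetical stand-in: accepts every address except NULL. */
static bool model_virt_addr_valid(uintptr_t addr)
{
	return addr != 0;
}

/* Address check as it looked before the workaround. */
static bool model_valid_obj_before(uintptr_t addr)
{
	return model_virt_addr_valid(addr);
}

/* Address check with the workaround: small pointers are rejected first. */
static bool model_valid_obj_after(uintptr_t addr)
{
	if (addr < MODEL_PAGE_SIZE || !model_virt_addr_valid(addr))
		return false;
	return true;
}
```

In this model, model_valid_obj_before(MODEL_ZERO_SIZE_PTR) returns true, so the caller would go on to look up a page for address 16, whereas model_valid_obj_after(MODEL_ZERO_SIZE_PTR) returns false before any lookup happens.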

* Re: rcu-torture: Internal error: Oops: 96000006
  2021-01-21 21:43     ` Paul E. McKenney
@ 2021-01-22  9:51       ` Naresh Kamboju
  2021-01-22 15:37         ` Paul E. McKenney
  2021-01-22 10:02       ` Mark Rutland
  1 sibling, 1 reply; 10+ messages in thread
From: Naresh Kamboju @ 2021-01-22  9:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Will Deacon, rcu, open list, Linux-Next Mailing List,
	lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar,
	Catalin Marinas, Linux ARM, Vincenzo Frascino, Mark Rutland

On Fri, 22 Jan 2021 at 03:13, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> > On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> > > On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > > > While running the rcu-torture test on qemu_arm64 and an arm64 Juno-r2 device,
> > > > the following kernel crash was noticed. It occurs on linux-next tags from
> > > > next-20210111 through next-20210121.
> > > >
> > > > metadata:
> > > >   git branch: master
> > > >   git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > > >   git describe: next-20210111
> > > >   kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> > > >
> > > > output log:
> > > >
> > > > [  621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> > > > ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> > > > = ffff8000091ab8e0
> > > > [  621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > > > [  621.546696] Unable to handle kernel NULL pointer dereference at
> > > > virtual address 0000000000000008
> >
> > [...]
> >
> > > Huh.  I am relying on virt_addr_valid() rejecting NULL pointers and
> > > things like ZERO_SIZE_PTR, which is defined as ((void *)16).  It looks
> > > like your configuration rejects NULL as an invalid virtual address,
> > > but does not reject ZERO_SIZE_PTR.  Is this the intent, given that you
> > > are not allowed to dereference a ZERO_SIZE_PTR?
> > >
> > > Adding the ARM64 guys on CC for their thoughts.
> >
> > Spooky timing, there was a thread _today_ about that:
> >
> > https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
>
> Very good, then my workaround (shown below for Naresh's ease of testing)
> is only a short-term workaround.  Yay!  ;-)

Paul, thanks for your (short-term workaround) patch.

I have applied your patch and run the rcu-torture test on qemu_arm64, and
the reported issue has been fixed.

- Naresh

* Re: rcu-torture: Internal error: Oops: 96000006
  2021-01-21 21:43     ` Paul E. McKenney
  2021-01-22  9:51       ` Naresh Kamboju
@ 2021-01-22 10:02       ` Mark Rutland
  2021-01-22 11:45         ` Vincenzo Frascino
  1 sibling, 1 reply; 10+ messages in thread
From: Mark Rutland @ 2021-01-22 10:02 UTC (permalink / raw)
  To: Paul E. McKenney, vincenzo.frascino
  Cc: Will Deacon, Naresh Kamboju, rcu, open list,
	Linux-Next Mailing List, lkft-triage, Peter Zijlstra,
	Steven Rostedt, Ingo Molnar, catalin.marinas, linux-arm-kernel

On Thu, Jan 21, 2021 at 01:43:14PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> > On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> > > On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > > > While running the rcu-torture test on qemu_arm64 and an arm64 Juno-r2 device,
> > > > the following kernel crash was noticed. It occurs on linux-next tags from
> > > > next-20210111 through next-20210121.
> > > > 
> > > > metadata:
> > > >   git branch: master
> > > >   git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > > >   git describe: next-20210111
> > > >   kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> > > > 
> > > > output log:
> > > > 
> > > > [  621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> > > > ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> > > > = ffff8000091ab8e0
> > > > [  621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > > > [  621.546696] Unable to handle kernel NULL pointer dereference at
> > > > virtual address 0000000000000008
> > 
> > [...]
> > 
> > > Huh.  I am relying on virt_addr_valid() rejecting NULL pointers and
> > > things like ZERO_SIZE_PTR, which is defined as ((void *)16).  It looks
> > > like your configuration rejects NULL as an invalid virtual address,
> > > but does not reject ZERO_SIZE_PTR.  Is this the intent, given that you
> > > are not allowed to dereference a ZERO_SIZE_PTR?
> > > 
> > > Adding the ARM64 guys on CC for their thoughts.
> > 
> > Spooky timing, there was a thread _today_ about that:
> > 
> > https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
> 
> Very good, then my workaround (shown below for Naresh's ease of testing)
> is only a short-term workaround.  Yay!  ;-)

Hopefully, though we might need to check other architectures beyond
arm64, ppc, and x86, to be certain!

Is there any other latent use of virt_addr_valid() that needs this
semantic? If so we'll probably want to backport the changes to arm64's
implementation, at least for v5.10.

Vincenzo, would you mind taking a look?

Thanks,
Mark.

* Re: rcu-torture: Internal error: Oops: 96000006
  2021-01-22 10:02       ` Mark Rutland
@ 2021-01-22 11:45         ` Vincenzo Frascino
  0 siblings, 0 replies; 10+ messages in thread
From: Vincenzo Frascino @ 2021-01-22 11:45 UTC (permalink / raw)
  To: Mark Rutland, Paul E. McKenney
  Cc: Will Deacon, Naresh Kamboju, rcu, open list,
	Linux-Next Mailing List, lkft-triage, Peter Zijlstra,
	Steven Rostedt, Ingo Molnar, catalin.marinas, linux-arm-kernel



On 1/22/21 10:02 AM, Mark Rutland wrote:
> On Thu, Jan 21, 2021 at 01:43:14PM -0800, Paul E. McKenney wrote:
>> On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
>>> On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
>>>> On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
>>>>> While running the rcu-torture test on qemu_arm64 and an arm64 Juno-r2 device,
>>>>> the following kernel crash was noticed. It occurs on linux-next tags from
>>>>> next-20210111 through next-20210121.
>>>>>
>>>>> metadata:
>>>>>   git branch: master
>>>>>   git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
>>>>>   git describe: next-20210111
>>>>>   kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
>>>>>
>>>>> output log:
>>>>>
>>>>> [  621.538050] mem_dump_obj() slab test: rcu_torture_stats =
>>>>> ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
>>>>> = ffff8000091ab8e0
>>>>> [  621.546662] mem_dump_obj(ZERO_SIZE_PTR):
>>>>> [  621.546696] Unable to handle kernel NULL pointer dereference at
>>>>> virtual address 0000000000000008
>>>
>>> [...]
>>>
>>>> Huh.  I am relying on virt_addr_valid() rejecting NULL pointers and
>>>> things like ZERO_SIZE_PTR, which is defined as ((void *)16).  It looks
>>>> like your configuration rejects NULL as an invalid virtual address,
>>>> but does not reject ZERO_SIZE_PTR.  Is this the intent, given that you
>>>> are not allowed to dereference a ZERO_SIZE_PTR?
>>>>
>>>> Adding the ARM64 guys on CC for their thoughts.
>>>
>>> Spooky timing, there was a thread _today_ about that:
>>>
>>> https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
>>
>> Very good, then my workaround (shown below for Naresh's ease of testing)
>> is only a short-term workaround.  Yay!  ;-)
> 
> Hopefully, though we might need to check other architectures beyond
> arm64, ppc, and x86, to be certain!
> 

Which other architectures do you propose to verify?

> Is there any other latent use of virt_addr_valid() that needs this
> semantic? If so we'll probably want to backport the changes to arm64's
> implementation, at least for v5.10.
> 
> Vincenzo, would you mind taking a look?
> 

I am happy to have a look at it, but due to previous commitments I will only be
able to get to it after -rc1. A quick grep shows that there are ~32 cases in the
common code (leaving out arch/ and drivers/) that might be affected by the same
semantic. I will post the improvement for arm64 in the meantime though.

> Thanks,
> Mark.
> 

-- 
Regards,
Vincenzo

* Re: rcu-torture: Internal error: Oops: 96000006
  2021-01-22  9:51       ` Naresh Kamboju
@ 2021-01-22 15:37         ` Paul E. McKenney
  2021-01-22 15:46           ` Naresh Kamboju
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2021-01-22 15:37 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Will Deacon, rcu, open list, Linux-Next Mailing List,
	lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar,
	Catalin Marinas, Linux ARM, Vincenzo Frascino, Mark Rutland

On Fri, Jan 22, 2021 at 03:21:07PM +0530, Naresh Kamboju wrote:
> On Fri, 22 Jan 2021 at 03:13, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> > > On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > > > > While running the rcu-torture test on qemu_arm64 and an arm64 Juno-r2 device,
> > > > > the following kernel crash was noticed. It occurs on linux-next tags from
> > > > > next-20210111 through next-20210121.
> > > > >
> > > > > metadata:
> > > > >   git branch: master
> > > > >   git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > > > >   git describe: next-20210111
> > > > >   kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> > > > >
> > > > > output log:
> > > > >
> > > > > [  621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> > > > > ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> > > > > = ffff8000091ab8e0
> > > > > [  621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > > > > [  621.546696] Unable to handle kernel NULL pointer dereference at
> > > > > virtual address 0000000000000008
> > >
> > > [...]
> > >
> > > > Huh.  I am relying on virt_addr_valid() rejecting NULL pointers and
> > > > things like ZERO_SIZE_PTR, which is defined as ((void *)16).  It looks
> > > > like your configuration rejects NULL as an invalid virtual address,
> > > > but does not reject ZERO_SIZE_PTR.  Is this the intent, given that you
> > > > are not allowed to dereference a ZERO_SIZE_PTR?
> > > >
> > > > Adding the ARM64 guys on CC for their thoughts.
> > >
> > > Spooky timing, there was a thread _today_ about that:
> > >
> > > https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
> >
> > Very good, then my workaround (shown below for Naresh's ease of testing)
> > is only a short-term workaround.  Yay!  ;-)
> 
> Paul, thanks for your (short-term workaround) patch.
> 
> I have applied your patch and run the rcu-torture test on qemu_arm64, and
> the reported issue has been fixed.

May I add your Tested-by?

And before I forget again, good to see the rcutorture testing on a
non-x86 platform!

							Thanx, Paul

* Re: rcu-torture: Internal error: Oops: 96000006
  2021-01-22 15:37         ` Paul E. McKenney
@ 2021-01-22 15:46           ` Naresh Kamboju
  2021-01-22 23:23             ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Naresh Kamboju @ 2021-01-22 15:46 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Will Deacon, rcu, open list, Linux-Next Mailing List,
	lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar,
	Catalin Marinas, Linux ARM, Vincenzo Frascino, Mark Rutland

On Fri, 22 Jan 2021 at 21:07, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Jan 22, 2021 at 03:21:07PM +0530, Naresh Kamboju wrote:
> > On Fri, 22 Jan 2021 at 03:13, Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> > > > On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> > > > > On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > > > > > While running the rcu-torture test on qemu_arm64 and an arm64 Juno-r2 device,
> > > > > > the following kernel crash was noticed. It occurs on linux-next tags from
> > > > > > next-20210111 through next-20210121.
> > > > > >
> > > > > > metadata:
> > > > > >   git branch: master
> > > > > >   git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > > > > >   git describe: next-20210111
> > > > > >   kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> > > > > >
> > > > > > output log:
> > > > > >
> > > > > > [  621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> > > > > > ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> > > > > > = ffff8000091ab8e0
> > > > > > [  621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > > > > > [  621.546696] Unable to handle kernel NULL pointer dereference at
> > > > > > virtual address 0000000000000008
> > > >
> > > > [...]
> > > >
> > > > > Huh.  I am relying on virt_addr_valid() rejecting NULL pointers and
> > > > > things like ZERO_SIZE_PTR, which is defined as ((void *)16).  It looks
> > > > > like your configuration rejects NULL as an invalid virtual address,
> > > > > but does not reject ZERO_SIZE_PTR.  Is this the intent, given that you
> > > > > are not allowed to dereference a ZERO_SIZE_PTR?
> > > > >
> > > > > Adding the ARM64 guys on CC for their thoughts.
> > > >
> > > > Spooky timing, there was a thread _today_ about that:
> > > >
> > > > https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
> > >
> > > Very good, then my workaround (shown below for Naresh's ease of testing)
> > > is only a short-term workaround.  Yay!  ;-)
> >
> > Paul, thanks for your (short-term workaround) patch.
> >
> > I have applied your patch and run the rcu-torture test on qemu_arm64, and
> > the reported issue has been fixed.
>
> May I add your Tested-by?

Yes.  Please add Reported-by and Tested-by.

>
> And before I forget again, good to see the rcutorture testing on a
> non-x86 platform!

We are running rcutorture tests on arm, arm64, i386 and x86_64.

Happy to test !

- Naresh

* Re: rcu-torture: Internal error: Oops: 96000006
  2021-01-22 15:46           ` Naresh Kamboju
@ 2021-01-22 23:23             ` Paul E. McKenney
  0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2021-01-22 23:23 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Will Deacon, rcu, open list, Linux-Next Mailing List,
	lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar,
	Catalin Marinas, Linux ARM, Vincenzo Frascino, Mark Rutland

On Fri, Jan 22, 2021 at 09:16:38PM +0530, Naresh Kamboju wrote:
> On Fri, 22 Jan 2021 at 21:07, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Fri, Jan 22, 2021 at 03:21:07PM +0530, Naresh Kamboju wrote:
> > > On Fri, 22 Jan 2021 at 03:13, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> > > > > On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> > > > > > On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > > > > > > While running the rcu-torture test on qemu_arm64 and an arm64 Juno-r2 device,
> > > > > > > the following kernel crash was noticed. It occurs on linux-next tags from
> > > > > > > next-20210111 through next-20210121.
> > > > > > >
> > > > > > > metadata:
> > > > > > >   git branch: master
> > > > > > >   git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > > > > > >   git describe: next-20210111
> > > > > > >   kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> > > > > > >
> > > > > > > output log:
> > > > > > >
> > > > > > > [  621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> > > > > > > ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> > > > > > > = ffff8000091ab8e0
> > > > > > > [  621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > > > > > > [  621.546696] Unable to handle kernel NULL pointer dereference at
> > > > > > > virtual address 0000000000000008
> > > > >
> > > > > [...]
> > > > >
> > > > > > Huh.  I am relying on virt_addr_valid() rejecting NULL pointers and
> > > > > > things like ZERO_SIZE_PTR, which is defined as ((void *)16).  It looks
> > > > > > like your configuration rejects NULL as an invalid virtual address,
> > > > > > but does not reject ZERO_SIZE_PTR.  Is this the intent, given that you
> > > > > > are not allowed to dereference a ZERO_SIZE_PTR?
> > > > > >
> > > > > > Adding the ARM64 guys on CC for their thoughts.
> > > > >
> > > > > Spooky timing, there was a thread _today_ about that:
> > > > >
> > > > > https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
> > > >
> > > > Very good, then my workaround (shown below for Naresh's ease of testing)
> > > > is only a short-term workaround.  Yay!  ;-)
> > >
> > > Paul, thanks for your (short-term workaround) patch.
> > >
> > > I have applied your patch and run the rcu-torture test on qemu_arm64, and
> > > the reported issue has been fixed.
> >
> > May I add your Tested-by?
> 
> Yes.  Please add Reported-by and Tested-by.

Very good!  I have added:

Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>

Because I folded the workaround into the first commit in the series,
instead of adding your Reported-by, I added the following to that commit:

[ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]

> > And before I forget again, good to see the rcutorture testing on a
> > non-x86 platform!
> 
> We are running rcutorture tests on arm, arm64, i386 and x86_64.

Nice!!!

Some ARMv8 people are getting bogus (but harmless) error messages
because parts of rcutorture think that all the world is an x86.
I am looking at a fix, but need to work out what the system is.
To that end, could you please run the following on the arm, arm64,
and i386 systems and tell me what the output is?

	gcc -dumpmachine

> Happy to test !

And thank you very much for your testing efforts!!!

							Thanx, Paul
