* Re: rcu-torture: Internal error: Oops: 96000006
@ 2021-01-21 21:43 ` Paul E. McKenney
0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-01-21 21:43 UTC (permalink / raw)
To: Will Deacon
Cc: mark.rutland, Peter Zijlstra, catalin.marinas, Naresh Kamboju,
open list, lkft-triage, rcu, Linux-Next Mailing List,
Steven Rostedt, vincenzo.frascino, Ingo Molnar, linux-arm-kernel
On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > > While running rcu-torture test on qemu_arm64 and arm64 Juno-r2 device
> > > the following kernel crash noticed. This started happening from Linux next
> > > next-20210111 tag to next-20210121.
> > >
> > > metadata:
> > > git branch: master
> > > git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > > git describe: next-20210111
> > > kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> > >
> > > output log:
> > >
> > > [ 621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> > > ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> > > = ffff8000091ab8e0
> > > [ 621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > > [ 621.546696] Unable to handle kernel NULL pointer dereference at
> > > virtual address 0000000000000008
>
> [...]
>
> > Huh. I am relying on virt_addr_valid() rejecting NULL pointers and
> > things like ZERO_SIZE_PTR, which is defined as ((void *)16). It looks
> > like your configuration rejects NULL as an invalid virtual address,
> > but does not reject ZERO_SIZE_PTR. Is this the intent, given that you
> > are not allowed to dereference a ZERO_SIZE_PTR?
> >
> > Adding the ARM64 guys on CC for their thoughts.
>
> Spooky timing, there was a thread _today_ about that:
>
> https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
Very good, then my workaround (shown below for Naresh's ease of testing)
is only a short-term workaround. Yay! ;-)
Thanx, Paul
------------------------------------------------------------------------
diff --git a/mm/slab_common.c b/mm/slab_common.c
index cefa9ae..a8375d1 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -550,7 +550,8 @@ bool kmem_valid_obj(void *object)
 {
 	struct page *page;
 
-	if (!virt_addr_valid(object))
+	/* Some arches consider ZERO_SIZE_PTR to be a valid address. */
+	if (object < (void *)PAGE_SIZE || !virt_addr_valid(object))
 		return false;
 	page = virt_to_head_page(object);
 	return PageSlab(page);
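
For readers following along outside a kernel tree, the small-pointer test that the workaround adds can be sketched in userspace C. This is only an illustration under stated assumptions: SKETCH_PAGE_SIZE is an assumed 4 KiB value (the kernel's PAGE_SIZE depends on architecture and configuration), SKETCH_ZERO_SIZE_PTR mirrors the ((void *)16) definition quoted above, and the sketch_* names are hypothetical, not kernel symbols. It does not model virt_addr_valid() at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration; the kernel's PAGE_SIZE varies. */
#define SKETCH_PAGE_SIZE 4096UL
/* Mirrors the ((void *)16) definition of ZERO_SIZE_PTR quoted in the thread. */
#define SKETCH_ZERO_SIZE_PTR ((void *)16)

/* Hypothetical helper: true if the pointer value falls within the first page. */
static bool sketch_is_small_pointer(const void *object)
{
	/* Compare as integers; ordering unrelated pointers is undefined in ISO C. */
	return (uintptr_t)object < SKETCH_PAGE_SIZE;
}

/* Exercises the cases discussed in the thread: NULL, ZERO_SIZE_PTR, a real object. */
static void sketch_check_examples(void)
{
	int on_stack = 0;

	assert(sketch_is_small_pointer(NULL));
	assert(sketch_is_small_pointer(SKETCH_ZERO_SIZE_PTR));
	assert(!sketch_is_small_pointer(&on_stack));
}
```

Both NULL and the ZERO_SIZE_PTR value land below the page-size threshold, so a check of this shape rejects them before any architecture-specific address validation is consulted.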
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: rcu-torture: Internal error: Oops: 96000006
2021-01-21 21:43 ` Paul E. McKenney
@ 2021-01-22 9:51 ` Naresh Kamboju
-1 siblings, 0 replies; 19+ messages in thread
From: Naresh Kamboju @ 2021-01-22 9:51 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Will Deacon, rcu, open list, Linux-Next Mailing List,
lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar,
Catalin Marinas, Linux ARM, Vincenzo Frascino, Mark Rutland
On Fri, 22 Jan 2021 at 03:13, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> > On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> > > On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > > > While running rcu-torture test on qemu_arm64 and arm64 Juno-r2 device
> > > > the following kernel crash noticed. This started happening from Linux next
> > > > next-20210111 tag to next-20210121.
> > > >
> > > > metadata:
> > > > git branch: master
> > > > git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > > > git describe: next-20210111
> > > > kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> > > >
> > > > output log:
> > > >
> > > > [ 621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> > > > ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> > > > = ffff8000091ab8e0
> > > > [ 621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > > > [ 621.546696] Unable to handle kernel NULL pointer dereference at
> > > > virtual address 0000000000000008
> >
> > [...]
> >
> > > Huh. I am relying on virt_addr_valid() rejecting NULL pointers and
> > > things like ZERO_SIZE_PTR, which is defined as ((void *)16). It looks
> > > like your configuration rejects NULL as an invalid virtual address,
> > > but does not reject ZERO_SIZE_PTR. Is this the intent, given that you
> > > are not allowed to dereference a ZERO_SIZE_PTR?
> > >
> > > Adding the ARM64 guys on CC for their thoughts.
> >
> > Spooky timing, there was a thread _today_ about that:
> >
> > https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
>
> Very good, then my workaround (shown below for Naresh's ease of testing)
> is only a short-term workaround. Yay! ;-)
Paul, thanks for your (short-term workaround) patch.
I have applied your patch and run the rcu-torture test on qemu_arm64, and
the reported issue has been fixed.
- Naresh
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: rcu-torture: Internal error: Oops: 96000006
2021-01-22 9:51 ` Naresh Kamboju
@ 2021-01-22 15:37 ` Paul E. McKenney
-1 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-01-22 15:37 UTC (permalink / raw)
To: Naresh Kamboju
Cc: Will Deacon, rcu, open list, Linux-Next Mailing List,
lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar,
Catalin Marinas, Linux ARM, Vincenzo Frascino, Mark Rutland
On Fri, Jan 22, 2021 at 03:21:07PM +0530, Naresh Kamboju wrote:
> On Fri, 22 Jan 2021 at 03:13, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> > > On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > > > > While running rcu-torture test on qemu_arm64 and arm64 Juno-r2 device
> > > > > the following kernel crash noticed. This started happening from Linux next
> > > > > next-20210111 tag to next-20210121.
> > > > >
> > > > > metadata:
> > > > > git branch: master
> > > > > git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > > > > git describe: next-20210111
> > > > > kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> > > > >
> > > > > output log:
> > > > >
> > > > > [ 621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> > > > > ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> > > > > = ffff8000091ab8e0
> > > > > [ 621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > > > > [ 621.546696] Unable to handle kernel NULL pointer dereference at
> > > > > virtual address 0000000000000008
> > >
> > > [...]
> > >
> > > > Huh. I am relying on virt_addr_valid() rejecting NULL pointers and
> > > > things like ZERO_SIZE_PTR, which is defined as ((void *)16). It looks
> > > > like your configuration rejects NULL as an invalid virtual address,
> > > > but does not reject ZERO_SIZE_PTR. Is this the intent, given that you
> > > > are not allowed to dereference a ZERO_SIZE_PTR?
> > > >
> > > > Adding the ARM64 guys on CC for their thoughts.
> > >
> > > Spooky timing, there was a thread _today_ about that:
> > >
> > > https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
> >
> > Very good, then my workaround (shown below for Naresh's ease of testing)
> > is only a short-term workaround. Yay! ;-)
>
> Paul, thanks for your (short-term workaround) patch.
>
> I have applied your patch and tested rcu-torture test on qemu_arm64 and
> the reported issues has been fixed.
May I add your Tested-by?
And before I forget again, good to see the rcutorture testing on a
non-x86 platform!
Thanx, Paul
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: rcu-torture: Internal error: Oops: 96000006
2021-01-22 15:37 ` Paul E. McKenney
@ 2021-01-22 15:46 ` Naresh Kamboju
-1 siblings, 0 replies; 19+ messages in thread
From: Naresh Kamboju @ 2021-01-22 15:46 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Will Deacon, rcu, open list, Linux-Next Mailing List,
lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar,
Catalin Marinas, Linux ARM, Vincenzo Frascino, Mark Rutland
On Fri, 22 Jan 2021 at 21:07, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Jan 22, 2021 at 03:21:07PM +0530, Naresh Kamboju wrote:
> > On Fri, 22 Jan 2021 at 03:13, Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> > > > On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> > > > > On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > > > > > While running rcu-torture test on qemu_arm64 and arm64 Juno-r2 device
> > > > > > the following kernel crash noticed. This started happening from Linux next
> > > > > > next-20210111 tag to next-20210121.
> > > > > >
> > > > > > metadata:
> > > > > > git branch: master
> > > > > > git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > > > > > git describe: next-20210111
> > > > > > kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> > > > > >
> > > > > > output log:
> > > > > >
> > > > > > [ 621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> > > > > > ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> > > > > > = ffff8000091ab8e0
> > > > > > [ 621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > > > > > [ 621.546696] Unable to handle kernel NULL pointer dereference at
> > > > > > virtual address 0000000000000008
> > > >
> > > > [...]
> > > >
> > > > > Huh. I am relying on virt_addr_valid() rejecting NULL pointers and
> > > > > things like ZERO_SIZE_PTR, which is defined as ((void *)16). It looks
> > > > > like your configuration rejects NULL as an invalid virtual address,
> > > > > but does not reject ZERO_SIZE_PTR. Is this the intent, given that you
> > > > > are not allowed to dereference a ZERO_SIZE_PTR?
> > > > >
> > > > > Adding the ARM64 guys on CC for their thoughts.
> > > >
> > > > Spooky timing, there was a thread _today_ about that:
> > > >
> > > > https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
> > >
> > > Very good, then my workaround (shown below for Naresh's ease of testing)
> > > is only a short-term workaround. Yay! ;-)
> >
> > Paul, thanks for your (short-term workaround) patch.
> >
> > I have applied your patch and tested rcu-torture test on qemu_arm64 and
> > the reported issues has been fixed.
>
> May I add your Tested-by?
Yes. Please add Reported-by and Tested-by.
>
> And before I forget again, good to see the rcutorture testing on a
> non-x86 platform!
We are running rcutorture tests on arm, arm64, i386 and x86_64.
Happy to test!
- Naresh
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: rcu-torture: Internal error: Oops: 96000006
2021-01-22 15:46 ` Naresh Kamboju
@ 2021-01-22 23:23 ` Paul E. McKenney
-1 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2021-01-22 23:23 UTC (permalink / raw)
To: Naresh Kamboju
Cc: Will Deacon, rcu, open list, Linux-Next Mailing List,
lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar,
Catalin Marinas, Linux ARM, Vincenzo Frascino, Mark Rutland
On Fri, Jan 22, 2021 at 09:16:38PM +0530, Naresh Kamboju wrote:
> On Fri, 22 Jan 2021 at 21:07, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Fri, Jan 22, 2021 at 03:21:07PM +0530, Naresh Kamboju wrote:
> > > On Fri, 22 Jan 2021 at 03:13, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> > > > > On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> > > > > > On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > > > > > > While running rcu-torture test on qemu_arm64 and arm64 Juno-r2 device
> > > > > > > the following kernel crash noticed. This started happening from Linux next
> > > > > > > next-20210111 tag to next-20210121.
> > > > > > >
> > > > > > > metadata:
> > > > > > > git branch: master
> > > > > > > git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > > > > > > git describe: next-20210111
> > > > > > > kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> > > > > > >
> > > > > > > output log:
> > > > > > >
> > > > > > > [ 621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> > > > > > > ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> > > > > > > = ffff8000091ab8e0
> > > > > > > [ 621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > > > > > > [ 621.546696] Unable to handle kernel NULL pointer dereference at
> > > > > > > virtual address 0000000000000008
> > > > >
> > > > > [...]
> > > > >
> > > > > > Huh. I am relying on virt_addr_valid() rejecting NULL pointers and
> > > > > > things like ZERO_SIZE_PTR, which is defined as ((void *)16). It looks
> > > > > > like your configuration rejects NULL as an invalid virtual address,
> > > > > > but does not reject ZERO_SIZE_PTR. Is this the intent, given that you
> > > > > > are not allowed to dereference a ZERO_SIZE_PTR?
> > > > > >
> > > > > > Adding the ARM64 guys on CC for their thoughts.
> > > > >
> > > > > Spooky timing, there was a thread _today_ about that:
> > > > >
> > > > > https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
> > > >
> > > > Very good, then my workaround (shown below for Naresh's ease of testing)
> > > > is only a short-term workaround. Yay! ;-)
> > >
> > > Paul, thanks for your (short-term workaround) patch.
> > >
> > > I have applied your patch and tested rcu-torture test on qemu_arm64 and
> > > the reported issues has been fixed.
> >
> > May I add your Tested-by?
>
> Yes. Please add Reported-by and Tested-by.
Very good! I have added:
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Because I folded the workaround into the first commit in the series,
instead of adding your Reported-by, I added the following to that commit:
[ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
> > And before I forget again, good to see the rcutorture testing on a
> > non-x86 platform!
>
> We are running rcutorture tests on arm, arm64, i386 and x86_64.
Nice!!!
Some ARMv8 people are getting bogus (but harmless) error messages
because parts of rcutorture think that all the world is an x86.
I am looking at a fix, but need to work out what the system is.
To that end, could you please run the following on the arm, arm64,
and i386 systems and tell me what the output is?
gcc -dumpmachine
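
The command above reports the target the compiler was built for. The same kind of information is also visible at compile time through predefined macros. The following is a minimal sketch, assuming the macros that GCC-compatible compilers commonly predefine; sketch_target_arch() is a hypothetical helper, and this is not how the rcutorture scripts detect the system.

```c
#include <assert.h>

/* Hypothetical helper: names the architecture this file is being compiled for. */
static const char *sketch_target_arch(void)
{
#if defined(__aarch64__)
	return "arm64";
#elif defined(__arm__)
	return "arm";
#elif defined(__x86_64__)
	return "x86_64";
#elif defined(__i386__)
	return "i386";
#else
	return "unknown";	/* Any target this sketch does not cover. */
#endif
}

/* Sanity check: the helper always yields a non-empty name. */
static void sketch_check_arch(void)
{
	assert(sketch_target_arch()[0] != '\0');
}
```

Note that this describes the target of the compiler doing the build, which can differ from the machine running the build when cross-compiling.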
> Happy to test !
And thank you very much for your testing efforts!!!
Thanx, Paul
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: rcu-torture: Internal error: Oops: 96000006
2021-01-21 21:43 ` Paul E. McKenney
@ 2021-01-22 10:02 ` Mark Rutland
-1 siblings, 0 replies; 19+ messages in thread
From: Mark Rutland @ 2021-01-22 10:02 UTC (permalink / raw)
To: Paul E. McKenney, vincenzo.frascino
Cc: Will Deacon, Naresh Kamboju, rcu, open list,
Linux-Next Mailing List, lkft-triage, Peter Zijlstra,
Steven Rostedt, Ingo Molnar, catalin.marinas, linux-arm-kernel
On Thu, Jan 21, 2021 at 01:43:14PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> > On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> > > On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > > > While running rcu-torture test on qemu_arm64 and arm64 Juno-r2 device
> > > > the following kernel crash noticed. This started happening from Linux next
> > > > next-20210111 tag to next-20210121.
> > > >
> > > > metadata:
> > > > git branch: master
> > > > git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > > > git describe: next-20210111
> > > > kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> > > >
> > > > output log:
> > > >
> > > > [ 621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> > > > ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> > > > = ffff8000091ab8e0
> > > > [ 621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > > > [ 621.546696] Unable to handle kernel NULL pointer dereference at
> > > > virtual address 0000000000000008
> >
> > [...]
> >
> > > Huh. I am relying on virt_addr_valid() rejecting NULL pointers and
> > > things like ZERO_SIZE_PTR, which is defined as ((void *)16). It looks
> > > like your configuration rejects NULL as an invalid virtual address,
> > > but does not reject ZERO_SIZE_PTR. Is this the intent, given that you
> > > are not allowed to dereference a ZERO_SIZE_PTR?
> > >
> > > Adding the ARM64 guys on CC for their thoughts.
> >
> > Spooky timing, there was a thread _today_ about that:
> >
> > https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
>
> Very good, then my workaround (shown below for Naresh's ease of testing)
> is only a short-term workaround. Yay! ;-)
Hopefully, though we might need to check other architectures beyond
arm64, ppc, and x86, to be certain!
Is there any other latent use of virt_addr_valid() that needs this
semantic? If so we'll probably want to backport the changes to arm64's
implementation, at least for v5.10.
Vincenzo, would you mind taking a look?
Thanks,
Mark.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: rcu-torture: Internal error: Oops: 96000006
2021-01-22 10:02 ` Mark Rutland
@ 2021-01-22 11:45 ` Vincenzo Frascino
-1 siblings, 0 replies; 19+ messages in thread
From: Vincenzo Frascino @ 2021-01-22 11:45 UTC (permalink / raw)
To: Mark Rutland, Paul E. McKenney
Cc: Will Deacon, Naresh Kamboju, rcu, open list,
Linux-Next Mailing List, lkft-triage, Peter Zijlstra,
Steven Rostedt, Ingo Molnar, catalin.marinas, linux-arm-kernel
On 1/22/21 10:02 AM, Mark Rutland wrote:
> On Thu, Jan 21, 2021 at 01:43:14PM -0800, Paul E. McKenney wrote:
>> On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
>>> On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
>>>> On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
>>>>> While running rcu-torture test on qemu_arm64 and arm64 Juno-r2 device
>>>>> the following kernel crash noticed. This started happening from Linux next
>>>>> next-20210111 tag to next-20210121.
>>>>>
>>>>> metadata:
>>>>> git branch: master
>>>>> git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
>>>>> git describe: next-20210111
>>>>> kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
>>>>>
>>>>> output log:
>>>>>
>>>>> [ 621.538050] mem_dump_obj() slab test: rcu_torture_stats =
>>>>> ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
>>>>> = ffff8000091ab8e0
>>>>> [ 621.546662] mem_dump_obj(ZERO_SIZE_PTR):
>>>>> [ 621.546696] Unable to handle kernel NULL pointer dereference at
>>>>> virtual address 0000000000000008
>>>
>>> [...]
>>>
>>>> Huh. I am relying on virt_addr_valid() rejecting NULL pointers and
>>>> things like ZERO_SIZE_PTR, which is defined as ((void *)16). It looks
>>>> like your configuration rejects NULL as an invalid virtual address,
>>>> but does not reject ZERO_SIZE_PTR. Is this the intent, given that you
>>>> are not allowed to dereference a ZERO_SIZE_PTR?
>>>>
>>>> Adding the ARM64 guys on CC for their thoughts.
>>>
>>> Spooky timing, there was a thread _today_ about that:
>>>
>>> https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
>>
>> Very good, then my workaround (shown below for Naresh's ease of testing)
>> is only a short-term workaround. Yay! ;-)
>
> Hopefully, though we might need to check other architectures beyond
> arm64, ppc, and x86, to be certain!
>
Which other architectures do you propose to verify?
> Is there any other latent use of virt_addr_valid() that needs this
> semantic? If so we'll probably want to backport the changes to arm64's
> implementation, at least for v5.10.
>
> Vincenzo, would you mind taking a look?
>
I am happy to take a look, but due to previous commitments I will only be able
to get to it after -rc1. A quick grep shows ~32 cases in the common code
(excluding arch/ and drivers/) that might be affected by the same semantic.
In the meantime, I will post the improvement for arm64.
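
For illustration only, here is a minimal user-space sketch of the kind of
caller-side guard that avoids depending on virt_addr_valid() to reject
ZERO_SIZE_PTR. The two macros use the same definitions as
include/linux/slab.h; the sketch_* functions are hypothetical stand-ins and
are neither the arm64 implementation nor the workaround posted in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same definitions as include/linux/slab.h. */
#define ZERO_SIZE_PTR ((void *)16)
#define ZERO_OR_NULL_PTR(x) \
	((unsigned long)(x) <= (unsigned long)ZERO_SIZE_PTR)

/*
 * Hypothetical stand-in (assumption) for an architecture's
 * virt_addr_valid() that rejects NULL but lets other tiny
 * addresses such as ZERO_SIZE_PTR through.
 */
static bool sketch_virt_addr_valid(const void *p)
{
	return p != NULL;
}

/*
 * Hypothetical caller-side guard: filter out NULL and ZERO_SIZE_PTR
 * before asking whether the address is a valid virtual address, so the
 * result does not depend on how strict the architecture's check is.
 */
static bool sketch_object_is_dumpable(const void *p)
{
	if (ZERO_OR_NULL_PTR(p))
		return false;
	return sketch_virt_addr_valid(p);
}
```

With this guard, sketch_object_is_dumpable(ZERO_SIZE_PTR) returns false even
though the permissive stand-in check alone would accept that address.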
> Thanks,
> Mark.
>
--
Regards,
Vincenzo
^ permalink raw reply [flat|nested] 19+ messages in thread