* rcu-torture: Internal error: Oops: 96000006
From: Naresh Kamboju @ 2021-01-21 17:07 UTC
To: rcu, open list, Linux-Next Mailing List, lkft-triage
Cc: Paul E. McKenney, Peter Zijlstra, Steven Rostedt, Ingo Molnar

While running the rcu-torture test on qemu_arm64 and on an arm64 Juno-r2
device, the following kernel crash was noticed. It has been happening on
linux-next from tag next-20210111 through next-20210121.

metadata:
git branch: master
git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
git describe: next-20210111
kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config

output log:

[ 621.538050] mem_dump_obj() slab test: rcu_torture_stats = ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z = ffff8000091ab8e0
[ 621.546662] mem_dump_obj(ZERO_SIZE_PTR):
[ 621.546696] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 621.555431] Mem abort info:
[ 621.557243] ESR = 0x96000006
[ 621.559074] EC = 0x25: DABT (current EL), IL = 32 bits
[ 621.562240] SET = 0, FnV = 0
[ 621.563626] EA = 0, S1PTW = 0
[ 621.565134] Data abort info:
[ 621.566425] ISV = 0, ISS = 0x00000006
[ 621.568064] CM = 0, WnR = 0
[ 621.569571] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101ef0000
[ 621.572446] [0000000000000008] pgd=0000000102ee1003, p4d=0000000102ee1003, pud=0000000100b25003, pmd=0000000000000000
[ 621.577007] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 621.579359] Modules linked in: rcutorture(-) torture rfkill crct10dif_ce fuse
[ 621.582549] CPU: 2 PID: 422 Comm: rcu_torture_sta Not tainted 5.11.0-rc2-next-20210111 #2
[ 621.585294] Hardware name: linux,dummy-virt (DT)
[ 621.586671] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[ 621.588497] pc : kmem_valid_obj+0x58/0xa8
[ 621.589748] lr : kmem_valid_obj+0x40/0xa8
[ 621.591022] sp : ffff800012debdc0
[ 621.592026] x29: ffff800012debdc0 x28: 0000000000000000
[ 621.593652] x27: ffff800012e8b988 x26: ffff0000c634dbc0
[ 621.595287] x25: 0000000000000000 x24: ffff8000091a1e60
[ 621.596882] x23: ffff8000091ab8e0 x22: ffff0000c0a3ac40
[ 621.598464] x21: ffff0000c1f44100 x20: ffff8000091a5e90
[ 621.600070] x19: 0000000000000000 x18: 0000000000000010
[ 621.601692] x17: 0000000000007fff x16: 00000000ffffffff
[ 621.603303] x15: ffff0000c0a3b0b8 x14: 66203d2070687226
[ 621.604866] x13: 202c303463613361 x12: 2c30346562656432
[ 621.606455] x11: ffff80001246cbc0 x10: ffff800012454b80
[ 621.608064] x9 : ffff8000100370c8 x8 : 0000000100000000
[ 621.609649] x7 : 0000000000000018 x6 : ffff800012816348
[ 621.611253] x5 : ffff800012816348 x4 : 0000000000000001
[ 621.612849] x3 : 0000000000000001 x2 : 0000000140000000
[ 621.614455] x1 : 0000000000000000 x0 : fffffc0000000000
[ 621.616062] Call trace:
[ 621.616816] kmem_valid_obj+0x58/0xa8
[ 621.617933] mem_dump_obj+0x20/0xc8
[ 621.619015] rcu_torture_stats+0xf0/0x298 [rcutorture]
[ 621.620578] kthread+0x120/0x158
[ 621.621583] ret_from_fork+0x10/0x34
[ 621.622685] Code: 8b000273 b25657e0 d34cfe73 8b131813 (f9400660)
[ 621.624570] ---[ end trace 2a00688830f37ea1 ]---

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>

Full test log:
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20210111/testrun/3716647/suite/linux-log-parser/test/check-kernel-oops-2124993/log
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20210121/testrun/3791289/suite/linux-log-parser/test/check-kernel-oops-2172413/log

--
Linaro LKFT
https://lkft.linaro.org
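The instruction in parentheses on the Code: line, f9400660, is the one at the faulting pc. A minimal decoding sketch, assuming only the standard A64 encoding of LDR (immediate, unsigned offset) with a 64-bit destination, shows it is ldr x0, [x19, #8]. Since the register dump reports x19 = 0000000000000000, the load address is 0x8, which matches the reported fault address. The struct and function names are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 64-bit A64 "LDR Xt, [Xn, #imm]" (unsigned-offset form). */
struct ldr_x_imm {
	unsigned int rt;  /* destination register number */
	unsigned int rn;  /* base register number */
	uint32_t offset;  /* byte offset: imm12 scaled by 8 */
};

/*
 * Return true and fill *out if insn is LDR (immediate, unsigned offset)
 * with a 64-bit destination. Bits 31..22 of that encoding are fixed at
 * 1111100101, i.e. (insn & 0xffc00000) == 0xf9400000.
 */
static bool decode_ldr_x_imm(uint32_t insn, struct ldr_x_imm *out)
{
	if ((insn & 0xffc00000u) != 0xf9400000u)
		return false;
	out->rt = insn & 0x1f;                     /* bits 4..0 */
	out->rn = (insn >> 5) & 0x1f;              /* bits 9..5 */
	out->offset = ((insn >> 10) & 0xfff) << 3; /* bits 21..10, times 8 */
	return true;
}
```

Decoding 0xf9400660 yields rt = 0, rn = 19 and offset = 8, while b25657e0 from the same Code: line is not an LDR and is rejected.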
* Re: rcu-torture: Internal error: Oops: 96000006
From: Paul E. McKenney @ 2021-01-21 18:55 UTC
To: Naresh Kamboju
Cc: rcu, open list, Linux-Next Mailing List, lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar, catalin.marinas, will, linux-arm-kernel

On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> [...]
> [ 621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> [ 621.546696] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> [...]
> [ 621.588497] pc : kmem_valid_obj+0x58/0xa8
> [...]
> [ 621.616062] Call trace:
> [ 621.616816] kmem_valid_obj+0x58/0xa8
> [ 621.617933] mem_dump_obj+0x20/0xc8
> [ 621.619015] rcu_torture_stats+0xf0/0x298 [rcutorture]
> [...]

Huh. I am relying on virt_addr_valid() rejecting NULL pointers and
things like ZERO_SIZE_PTR, which is defined as ((void *)16). It looks
like your configuration rejects NULL as an invalid virtual address,
but does not reject ZERO_SIZE_PTR. Is this the intent, given that you
are not allowed to dereference a ZERO_SIZE_PTR?

Adding the ARM64 guys on CC for their thoughts.

It is easy enough for me to make kmem_valid_obj() return false for any
address less than (say) PAGE_SIZE for the upcoming merge window, but I
figured I should check for the longer term.

Thanx, Paul
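The ZERO_SIZE_PTR convention can be modelled in userspace. In this sketch ZERO_SIZE_PTR and ZERO_OR_NULL_PTR() use the same values as the kernel's include/linux/slab.h (which casts to unsigned long rather than uintptr_t); sketch_kmalloc() is a hypothetical stand-in for kmalloc() that returns ZERO_SIZE_PTR for a zero-byte request, and a 4 KiB page size is assumed to match the "4k pages" in the oops. It contrasts the exact ZERO_OR_NULL_PTR() test with the broader "below PAGE_SIZE" guard Paul suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Same values as the kernel's include/linux/slab.h definitions. */
#define ZERO_SIZE_PTR ((void *)16)
#define ZERO_OR_NULL_PTR(x) ((uintptr_t)(x) <= (uintptr_t)ZERO_SIZE_PTR)

/* Assumption: 4 KiB pages, as in the "4k pages" line of the oops. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for kmalloc(): a zero-byte request returns the
 * non-NULL but non-dereferenceable ZERO_SIZE_PTR. */
static void *sketch_kmalloc(size_t size)
{
	return size ? malloc(size) : ZERO_SIZE_PTR;
}

/* The proposed guard: nothing in the first page is a valid object. */
static bool below_first_page(const void *p)
{
	return (uintptr_t)p < SKETCH_PAGE_SIZE;
}
```

The page-size guard is deliberately coarser: besides NULL and ZERO_SIZE_PTR it also rejects other small offsets from NULL (for example (void *)24), none of which can be a slab object.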
* Re: rcu-torture: Internal error: Oops: 96000006
From: Will Deacon @ 2021-01-21 21:31 UTC
To: Paul E. McKenney
Cc: Naresh Kamboju, rcu, open list, Linux-Next Mailing List, lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar, catalin.marinas, linux-arm-kernel, vincenzo.frascino, mark.rutland

On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > [...]
> > [ 621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > [ 621.546696] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008

[...]

> Huh. I am relying on virt_addr_valid() rejecting NULL pointers and
> things like ZERO_SIZE_PTR, which is defined as ((void *)16). It looks
> like your configuration rejects NULL as an invalid virtual address,
> but does not reject ZERO_SIZE_PTR. Is this the intent, given that you
> are not allowed to dereference a ZERO_SIZE_PTR?
>
> Adding the ARM64 guys on CC for their thoughts.

Spooky timing, there was a thread _today_ about that:

https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com

Will
* Re: rcu-torture: Internal error: Oops: 96000006
From: Paul E. McKenney @ 2021-01-21 21:43 UTC
To: Will Deacon
Cc: Naresh Kamboju, rcu, open list, Linux-Next Mailing List, lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar, catalin.marinas, linux-arm-kernel, vincenzo.frascino, mark.rutland

On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> > [...]
> > Huh. I am relying on virt_addr_valid() rejecting NULL pointers and
> > things like ZERO_SIZE_PTR, which is defined as ((void *)16). It looks
> > like your configuration rejects NULL as an invalid virtual address,
> > but does not reject ZERO_SIZE_PTR. Is this the intent, given that you
> > are not allowed to dereference a ZERO_SIZE_PTR?
> >
> > Adding the ARM64 guys on CC for their thoughts.
>
> Spooky timing, there was a thread _today_ about that:
>
> https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com

Very good, then my workaround (shown below for Naresh's ease of testing)
is only a short-term workaround. Yay! ;-)

Thanx, Paul

------------------------------------------------------------------------

diff --git a/mm/slab_common.c b/mm/slab_common.c
index cefa9ae..a8375d1 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -550,7 +550,8 @@ bool kmem_valid_obj(void *object)
 {
 	struct page *page;
 
-	if (!virt_addr_valid(object))
+	/* Some arches consider ZERO_SIZE_PTR to be a valid address. */
+	if (object < (void *)PAGE_SIZE || !virt_addr_valid(object))
 		return false;
 	page = virt_to_head_page(object);
 	return PageSlab(page);
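The control flow of the patched kmem_valid_obj() can be exercised in userspace with mocks. This is a sketch under stated assumptions: mock_virt_addr_valid() is hypothetical and is written to reproduce the behaviour described in the report (NULL rejected, ZERO_SIZE_PTR accepted), and mock_page_slab() is a hypothetical stand-in for the virt_to_head_page()/PageSlab() step, which must not be reached with ZERO_SIZE_PTR. The counters show that, because || short-circuits, a small pointer never reaches either mock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ZERO_SIZE_PTR ((void *)16)
#define SKETCH_PAGE_SIZE 4096u    /* assumption: 4 KiB pages */

static int virt_addr_valid_calls; /* times the address check ran */
static int page_lookups;          /* times the page-lookup step ran */

/* Hypothetical mock reproducing the reported behaviour: NULL is
 * rejected, but ZERO_SIZE_PTR (and anything else) is accepted. */
static bool mock_virt_addr_valid(const void *p)
{
	virt_addr_valid_calls++;
	return p != NULL;
}

/* Hypothetical stand-in for virt_to_head_page() + PageSlab(). */
static bool mock_page_slab(const void *p)
{
	(void)p;
	page_lookups++;
	return false;
}

/* Same control flow as the patched kmem_valid_obj(). */
static bool sketch_kmem_valid_obj(const void *object)
{
	/* Some arches consider ZERO_SIZE_PTR to be a valid address. */
	if ((uintptr_t)object < SKETCH_PAGE_SIZE || !mock_virt_addr_valid(object))
		return false;
	return mock_page_slab(object);
}
```

Placing the cheap range check first means the workaround does not depend on how any architecture's virt_addr_valid() treats low addresses.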
* Re: rcu-torture: Internal error: Oops: 96000006
From: Naresh Kamboju @ 2021-01-22 9:51 UTC
To: Paul E. McKenney
Cc: Will Deacon, rcu, open list, Linux-Next Mailing List, lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar, Catalin Marinas, Linux ARM, Vincenzo Frascino, Mark Rutland

On Fri, 22 Jan 2021 at 03:13, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> > [...]
> > Spooky timing, there was a thread _today_ about that:
> >
> > https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
>
> Very good, then my workaround (shown below for Naresh's ease of testing)
> is only a short-term workaround. Yay! ;-)

Paul, thanks for your (short-term workaround) patch.

I have applied your patch and run the rcu-torture test on qemu_arm64,
and the reported issue has been fixed.

- Naresh
* Re: rcu-torture: Internal error: Oops: 96000006
From: Paul E. McKenney @ 2021-01-22 15:37 UTC
To: Naresh Kamboju
Cc: Will Deacon, rcu, open list, Linux-Next Mailing List, lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar, Catalin Marinas, Linux ARM, Vincenzo Frascino, Mark Rutland

On Fri, Jan 22, 2021 at 03:21:07PM +0530, Naresh Kamboju wrote:
> On Fri, 22 Jan 2021 at 03:13, Paul E. McKenney <paulmck@kernel.org> wrote:
> [...]
> > Very good, then my workaround (shown below for Naresh's ease of testing)
> > is only a short-term workaround. Yay! ;-)
>
> Paul, thanks for your (short-term workaround) patch.
>
> I have applied your patch and run the rcu-torture test on qemu_arm64,
> and the reported issue has been fixed.

May I add your Tested-by?

And before I forget again, good to see the rcutorture testing on a
non-x86 platform!

Thanx, Paul
* Re: rcu-torture: Internal error: Oops: 96000006
From: Naresh Kamboju @ 2021-01-22 15:46 UTC
To: Paul E. McKenney
Cc: Will Deacon, rcu, open list, Linux-Next Mailing List, lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar, Catalin Marinas, Linux ARM, Vincenzo Frascino, Mark Rutland

On Fri, 22 Jan 2021 at 21:07, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Jan 22, 2021 at 03:21:07PM +0530, Naresh Kamboju wrote:
> > [...]
> > I have applied your patch and run the rcu-torture test on qemu_arm64,
> > and the reported issue has been fixed.
>
> May I add your Tested-by?

Yes. Please add Reported-by and Tested-by.

>
> And before I forget again, good to see the rcutorture testing on a
> non-x86 platform!

We are running rcutorture tests on arm, arm64, i386 and x86_64.

Happy to test!

- Naresh
* Re: rcu-torture: Internal error: Oops: 96000006
From: Paul E. McKenney @ 2021-01-22 23:23 UTC
To: Naresh Kamboju
Cc: Will Deacon, rcu, open list, Linux-Next Mailing List, lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar, Catalin Marinas, Linux ARM, Vincenzo Frascino, Mark Rutland

On Fri, Jan 22, 2021 at 09:16:38PM +0530, Naresh Kamboju wrote:
> On Fri, 22 Jan 2021 at 21:07, Paul E. McKenney <paulmck@kernel.org> wrote:
> [...]
> > May I add your Tested-by?
>
> Yes. Please add Reported-by and Tested-by.

Very good! I have added:

Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>

Because I folded the workaround into the first commit in the series,
instead of adding your Reported-by, I added the following to that commit:

[ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]

> > And before I forget again, good to see the rcutorture testing on a
> > non-x86 platform!
>
> We are running rcutorture tests on arm, arm64, i386 and x86_64.

Nice!!!

Some ARMv8 people are getting bogus (but harmless) error messages
because parts of rcutorture think that all the world is an x86. I am
looking at a fix, but need to work out what the system is. To that end,
could you please run the following on the arm, arm64, and i386 systems
and tell me what the output is?

	gcc -dumpmachine

> Happy to test!

And thank you very much for your testing efforts!!!

Thanx, Paul
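The gcc -dumpmachine output is a target triplet that begins with the CPU name, for example x86_64-linux-gnu or aarch64-linux-gnu. A hypothetical helper for pulling out that CPU field and deciding whether it is an x86 is sketched below; it assumes that triplet shape and is not taken from the rcutorture scripts, which may handle this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy the CPU field of a target triplet (the text before the first
 * '-', e.g. "aarch64" from "aarch64-linux-gnu") into buf.
 * Returns false if buf is too small. */
static bool triplet_cpu(const char *triplet, char *buf, size_t len)
{
	size_t n = strcspn(triplet, "-");

	if (n + 1 > len)
		return false;
	memcpy(buf, triplet, n);
	buf[n] = '\0';
	return true;
}

/* Hypothetical classifier: x86_64 and i386..i686 count as x86. */
static bool cpu_is_x86(const char *cpu)
{
	if (strcmp(cpu, "x86_64") == 0)
		return true;
	return strlen(cpu) == 4 && cpu[0] == 'i' &&
	       cpu[1] >= '3' && cpu[1] <= '6' && strcmp(cpu + 2, "86") == 0;
}
```

Keying off the CPU field alone keeps the check independent of the vendor, OS and ABI parts, which vary between distributions (for example arm-linux-gnueabihf).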
* Re: rcu-torture: Internal error: Oops: 96000006
From: Mark Rutland @ 2021-01-22 10:02 UTC
To: Paul E. McKenney, vincenzo.frascino
Cc: Will Deacon, Naresh Kamboju, rcu, open list, Linux-Next Mailing List, lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar, catalin.marinas, linux-arm-kernel

On Thu, Jan 21, 2021 at 01:43:14PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> > [...]
> > Spooky timing, there was a thread _today_ about that:
> >
> > https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
>
> Very good, then my workaround (shown below for Naresh's ease of testing)
> is only a short-term workaround. Yay! ;-)

Hopefully, though we might need to check other architectures beyond
arm64, ppc, and x86, to be certain!

Is there any other latent use of virt_addr_valid() that needs this
semantic? If so we'll probably want to backport the changes to arm64's
implementation, at least for v5.10.

Vincenzo, would you mind taking a look?

Thanks,
Mark.
* Re: rcu-torture: Internal error: Oops: 96000006
From: Vincenzo Frascino @ 2021-01-22 11:45 UTC
To: Mark Rutland, Paul E. McKenney
Cc: Will Deacon, Naresh Kamboju, rcu, open list, Linux-Next Mailing List, lkft-triage, Peter Zijlstra, Steven Rostedt, Ingo Molnar, catalin.marinas, linux-arm-kernel

On 1/22/21 10:02 AM, Mark Rutland wrote:
> On Thu, Jan 21, 2021 at 01:43:14PM -0800, Paul E. McKenney wrote:
>> [...]
>> Very good, then my workaround (shown below for Naresh's ease of testing)
>> is only a short-term workaround. Yay! ;-)
>
> Hopefully, though we might need to check other architectures beyond
> arm64, ppc, and x86, to be certain!
>

Which other architectures do you propose to verify?

> Is there any other latent use of virt_addr_valid() that needs this
> semantic? If so we'll probably want to backport the changes to arm64's
> implementation, at least for v5.10.
>
> Vincenzo, would you mind taking a look?
>

I am happy to have a look at it, but due to previous commitments I will
only be able to get to it after -rc1.

A quick grep shows that there are ~32 cases in the common code (excluding
arch/ and drivers/) that might rely on the same semantic.

I will post the improvement for arm64 in the meantime, though.

> Thanks,
> Mark.
>

--
Regards,
Vincenzo
Thread overview: 10+ messages
2021-01-21 17:07 rcu-torture: Internal error: Oops: 96000006 Naresh Kamboju
2021-01-21 18:55 ` Paul E. McKenney
2021-01-21 21:31   ` Will Deacon
2021-01-21 21:43     ` Paul E. McKenney
2021-01-22  9:51       ` Naresh Kamboju
2021-01-22 15:37         ` Paul E. McKenney
2021-01-22 15:46           ` Naresh Kamboju
2021-01-22 23:23             ` Paul E. McKenney
2021-01-22 10:02       ` Mark Rutland
2021-01-22 11:45         ` Vincenzo Frascino