linuxppc-dev.lists.ozlabs.org archive mirror
* [-merge] BUG followed by oops running ndctl tests
@ 2019-11-15  6:06 Sachin Sant
  2019-11-15 12:04 ` Michael Ellerman
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Sachin Sant @ 2019-11-15  6:06 UTC
  To: linuxppc-dev; +Cc: harish, Aneesh Kumar K. V

The following oops is seen on the latest powerpc merge branch (commit 3b4852888d)
while running the ndctl (test_namespace) tests.

Commit 85c5b0984e was good.

 (06/12) avocado-misc-tests/memory/ndctl.py:NdctlTest.test_namespace:
[  213.570536] memmap_init_zone_device initialised 1636608 pages in 10ms
[  213.570835] pmem0: detected capacity change from 0 to 107256741888
[  216.488983] BUG: Unable to handle kernel data access at 0xc000043900000000
[  216.488996] Faulting instruction address: 0xc000000000087510
[  216.489002] Oops: Kernel access of bad area, sig: 11 [#1]
[  216.489007] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[  216.489019] Dumping ftrace buffer:
[  216.489029]    (ftrace buffer empty)
[  216.489033] Modules linked in: dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip6_tables nft_compat ip_set nf_tables nfnetlink sunrpc sg pseries_rng papr_scm uio_pdrv_genirq uio sch_fq_codel ip_tables sd_mod ibmvscsi ibmveth scsi_transport_srp
[  216.489059] CPU: 8 PID: 17523 Comm: lt-ndctl Not tainted 5.4.0-rc7-autotest #1
[  216.489065] NIP:  c000000000087510 LR: c00000000008752c CTR: 01ffffffce800000
[  216.489071] REGS: c000007ca84a37d0 TRAP: 0300   Not tainted  (5.4.0-rc7-autotest)
[  216.489076] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 42048224  XER: 00000000
[  216.489086] CFAR: c000000000087518 DAR: c000043900000000 DSISR: 40000000 IRQMASK: 0 
[  216.489086] GPR00: c00000000008752c c000007ca84a3a60 c00000000159bb00 0000000000000000 
[  216.489086] GPR04: 40066bdea7010e15 0000605530000194 0000000000000000 0000000000000080 
[  216.489086] GPR08: c000043900000000 ffffffffc000007f 01ffffffff800000 0000000000000000 
[  216.489086] GPR12: 0000000000008000 c00000001ec5d200 00007ffff897f9e9 000000001002e088 
[  216.489086] GPR16: 0000000000000000 0000000010050d88 000000001002f778 000000001002f770 
[  216.489086] GPR20: 0000000000000000 000000001002e048 0000000010050e3d 0000000010050e40 
[  216.489086] GPR24: 0000000000000000 c000007c8d0a6c10 c000007cced28a20 c000000001463048 
[  216.489086] GPR28: c000042080000000 c000042040000000 c000043900000000 c000042000000000 
[  216.489137] NIP [c000000000087510] arch_remove_memory+0x100/0x1b0
[  216.489143] LR [c00000000008752c] arch_remove_memory+0x11c/0x1b0
[  216.489148] Call Trace:
[  216.489151] [c000007ca84a3a60] [c00000000008752c] arch_remove_memory+0x11c/0x1b0 (unreliable)
[  216.489159] [c000007ca84a3b00] [c000000000407258] memunmap_pages+0x188/0x2c0
[  216.489167] [c000007ca84a3b80] [c0000000007b0810] devm_action_release+0x30/0x50
[  216.489174] [c000007ca84a3ba0] [c0000000007b18f8] release_nodes+0x2f8/0x3e0
[  216.489180] [c000007ca84a3c50] [c0000000007aa698] device_release_driver_internal+0x168/0x270
[  216.489187] [c000007ca84a3c90] [c0000000007a6ad0] unbind_store+0x130/0x170
[  216.489193] [c000007ca84a3cd0] [c0000000007a5c34] drv_attr_store+0x44/0x60
[  216.489200] [c000007ca84a3cf0] [c0000000004fa0d8] sysfs_kf_write+0x68/0x80
[  216.489205] [c000007ca84a3d10] [c0000000004f9530] kernfs_fop_write+0xf0/0x270
[  216.489212] [c000007ca84a3d60] [c00000000040cbdc] __vfs_write+0x3c/0x70
[  216.489217] [c000007ca84a3d80] [c00000000041052c] vfs_write+0xcc/0x240
[  216.489223] [c000007ca84a3dd0] [c00000000041090c] ksys_write+0x7c/0x140
[  216.489229] [c000007ca84a3e20] [c00000000000b278] system_call+0x5c/0x68
[  216.489233] Instruction dump:
[  216.489238] 80fb0008 815b000c 7d0700d0 7d08e038 7c0004ac 4c00012c 3927ffff 7d29ea14 
[  216.489245] 7d284850 7d2a5437 41820014 7d4903a6 <7c0040ac> 7d083a14 4200fff8 7c0004ac 
[  216.489254] ---[ end trace d9a4dfc9e158858a ]---

Thanks
-Sachin

* Re: [-merge] BUG followed by oops running ndctl tests
  2019-11-15  6:06 [-merge] BUG followed by oops running ndctl tests Sachin Sant
@ 2019-11-15 12:04 ` Michael Ellerman
  2019-11-15 18:55 ` Aneesh Kumar K.V
  2019-12-03  8:58 ` Aneesh Kumar K.V
  2 siblings, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2019-11-15 12:04 UTC
  To: Sachin Sant, linuxppc-dev; +Cc: harish, Aneesh Kumar K. V, Alastair D'Silva

Sachin Sant <sachinp@linux.vnet.ibm.com> writes:
> Following Oops is seen on latest (commit 3b4852888d) powerpc merge branch
> code while running ndctl (test_namespace) tests
>
> 85c5b0984e was good.

The obvious suspect is:

  076265907cf9 ("powerpc: Chunk calls to flush_dcache_range in arch_*_memory")

Though it's not obvious why it would cause that oops.

cheers


* Re: [-merge] BUG followed by oops running ndctl tests
  2019-11-15  6:06 [-merge] BUG followed by oops running ndctl tests Sachin Sant
  2019-11-15 12:04 ` Michael Ellerman
@ 2019-11-15 18:55 ` Aneesh Kumar K.V
  2019-11-19  5:24   ` Sachin Sant
  2019-12-03  8:58 ` Aneesh Kumar K.V
  2 siblings, 1 reply; 6+ messages in thread
From: Aneesh Kumar K.V @ 2019-11-15 18:55 UTC
  To: Sachin Sant, linuxppc-dev; +Cc: harish

On 11/15/19 11:36 AM, Sachin Sant wrote:
> Following Oops is seen on latest (commit 3b4852888d) powerpc merge branch
> code while running ndctl (test_namespace) tests
> 
> 85c5b0984e was good.
> 



Were the namespaces created with a size that is a multiple of 16M?

I'm wondering whether this is related to
https://patchwork.kernel.org/patch/11215049/

-aneesh

* Re: [-merge] BUG followed by oops running ndctl tests
  2019-11-15 18:55 ` Aneesh Kumar K.V
@ 2019-11-19  5:24   ` Sachin Sant
  2019-11-29  7:57     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 6+ messages in thread
From: Sachin Sant @ 2019-11-19  5:24 UTC
  To: Aneesh Kumar K.V; +Cc: harish, linuxppc-dev



> On 16-Nov-2019, at 12:25 AM, Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote:
> 
> On 11/15/19 11:36 AM, Sachin Sant wrote:
>> Following Oops is seen on latest (commit 3b4852888d) powerpc merge branch
>> code while running ndctl (test_namespace) tests
>> 85c5b0984e was good.
> 
> 
> 
> Are the namespace size created with size that is multiple of 16M size?
> 
> Wondering whether this is related to https://patchwork.kernel.org/patch/11215049/

This patch series doesn’t seem to help. I can still reproduce the problem with the patches applied.

Thanks
-Sachin
> 
> -aneesh


* Re: [-merge] BUG followed by oops running ndctl tests
  2019-11-19  5:24   ` Sachin Sant
@ 2019-11-29  7:57     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 6+ messages in thread
From: Aneesh Kumar K.V @ 2019-11-29  7:57 UTC
  To: Sachin Sant; +Cc: harish, linuxppc-dev

Sachin Sant <sachinp@linux.vnet.ibm.com> writes:

>> On 16-Nov-2019, at 12:25 AM, Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote:
>> 
>> On 11/15/19 11:36 AM, Sachin Sant wrote:
>>> Following Oops is seen on latest (commit 3b4852888d) powerpc merge branch
>>> code while running ndctl (test_namespace) tests
>>> 85c5b0984e was good.
>> 
>> 
>> 
>> Are the namespace size created with size that is multiple of 16M size?
>> 
>> Wondering whether this is related to https://patchwork.kernel.org/patch/11215049/
>
> This patch series doesn’t seem to help. I can still recreate the problem with the patches applied.
>

Are the namespaces 16MB size-aligned? If not, you need to recreate all of
them with 16MB-aligned sizes.

-aneesh

* Re: [-merge] BUG followed by oops running ndctl tests
  2019-11-15  6:06 [-merge] BUG followed by oops running ndctl tests Sachin Sant
  2019-11-15 12:04 ` Michael Ellerman
  2019-11-15 18:55 ` Aneesh Kumar K.V
@ 2019-12-03  8:58 ` Aneesh Kumar K.V
  2 siblings, 0 replies; 6+ messages in thread
From: Aneesh Kumar K.V @ 2019-12-03  8:58 UTC
  To: Sachin Sant, linuxppc-dev; +Cc: harish

Sachin Sant <sachinp@linux.vnet.ibm.com> writes:

> Following Oops is seen on latest (commit 3b4852888d) powerpc merge branch
> code while running ndctl (test_namespace) tests
>
> 85c5b0984e was good.

Can you try this patch?

commit 0eb3f28de8ad769c1c559f1269f9a9447af08005
Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Date:   Tue Dec 3 14:23:58 2019 +0530

    powerpc/pmem: Fix kernel crash due to wrong range value usage in flush_dcache_range
    
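    This patch fixes the kernel crash below. flush_dcache_range_chunked()
    computed the end of each chunk as min(stop, start + chunk) instead of
    min(stop, i + chunk), so the end stopped advancing after the first
    chunk: the second chunk flushed nothing, and from the third chunk on
    the end was below i. flush_dcache_range() then computed a wrapped,
    huge unsigned length and the dcbf loop ran past the range into
    unmapped memory.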
    
     BUG: Unable to handle kernel data access on read at 0xc000000380000000
     Faulting instruction address: 0xc00000000008b6f0
    cpu 0x5: Vector: 300 (Data Access) at [c0000000d8587790]
        pc: c00000000008b6f0: arch_remove_memory+0x150/0x210
        lr: c00000000008b720: arch_remove_memory+0x180/0x210
        sp: c0000000d8587a20
       msr: 800000000280b033
       dar: c000000380000000
     dsisr: 40000000
      current = 0xc0000000d8558600
      paca    = 0xc00000000fff8f00   irqmask: 0x03   irq_happened: 0x01
        pid   = 1220, comm = ndctl
    enter ? for help
     memunmap_pages+0x33c/0x410
     devm_action_release+0x30/0x50
     release_nodes+0x30c/0x3a0
     device_release_driver_internal+0x178/0x240
     unbind_store+0x74/0x190
     drv_attr_store+0x44/0x60
     sysfs_kf_write+0x74/0xa0
     kernfs_fop_write+0x1b0/0x260
     __vfs_write+0x3c/0x70
     vfs_write+0xe4/0x200
     ksys_write+0x7c/0x140
     system_call+0x5c/0x68
    
    Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index ad299e72ec30..9488b63dfc87 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -121,7 +121,7 @@ static void flush_dcache_range_chunked(unsigned long start, unsigned long stop,
 	unsigned long i;
 
 	for (i = start; i < stop; i += chunk) {
-		flush_dcache_range(i, min(stop, start + chunk));
+		flush_dcache_range(i, min(stop, i + chunk));
 		cond_resched();
 	}
 }
