linux-nvdimm.lists.01.org archive mirror
* [5.6.0-rc7] Kernel crash while running ndctl tests
@ 2020-03-24  5:55 Sachin Sant
  2020-03-24  7:07 ` Baoquan He
  2020-03-24  9:15 ` Aneesh Kumar K.V
  0 siblings, 2 replies; 6+ messages in thread
From: Sachin Sant @ 2020-03-24  5:55 UTC (permalink / raw)
  To: LKML, linuxppc-dev; +Cc: Baoquan He, linux-nvdimm

While running the ndctl[1] tests against 5.6.0-rc7, the following crash is encountered.

Bisection points to commit d41e2f3bd546
("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case").

Reverting this commit fixes the problem, and the tests complete without any crash.

pmem0: detected capacity change from 0 to 10720641024
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc000000000c3447c
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in: dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip6_tables nft_compat ip_set rfkill nf_tables nfnetlink sunrpc sg pseries_rng papr_scm uio_pdrv_genirq uio sch_fq_codel ip_tables sd_mod t10_pi ibmvscsi scsi_transport_srp ibmveth
CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
NIP:  c000000000c3447c LR: c000000000088354 CTR: c00000000018e990
REGS: c0000006223fb630 TRAP: 0300   Not tainted  (5.6.0-rc7-autotest)
MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24048888  XER: 00000000
CFAR: c00000000000dec4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 
GPR00: c0000000003c5820 c0000006223fb8c0 c000000001684900 0000000004000000 
GPR04: c00c000101000000 0000000007ffffff c00000067ff20900 c00c000000000000 
GPR08: 0000000000000000 c00c000100000000 0000000000000000 c000000003f00000 
GPR12: 0000000000008000 c00000001ec70200 00007fffc102f9e8 000000001002e088 
GPR16: 0000000000000000 0000000010050d88 000000001002f778 000000001002f770 
GPR20: 0000000000000000 0000000000000100 0000000000000001 0000000000001000 
GPR24: 0000000000000008 0000000000000000 0000000004000000 c00c000100004000 
GPR28: c000000003101aa0 c00c000100000000 0000000001000000 0000000004000100 
NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
LR [c000000000088354] vmemmap_free+0x144/0x320
Call Trace:
[c0000006223fb8c0] [c0000006223fb960] 0xc0000006223fb960 (unreliable)
[c0000006223fb980] [c0000000003c5820] section_deactivate+0x220/0x240
[c0000006223fba30] [c0000000003dc1d8] __remove_pages+0x118/0x170
[c0000006223fba80] [c000000000086e5c] arch_remove_memory+0x3c/0x150
[c0000006223fbb00] [c00000000041a3bc] memunmap_pages+0x1cc/0x2f0
[c0000006223fbb80] [c0000000007d6d00] devm_action_release+0x30/0x50
[c0000006223fbba0] [c0000000007d7de8] release_nodes+0x2f8/0x3e0
[c0000006223fbc50] [c0000000007d0b38] device_release_driver_internal+0x168/0x270
[c0000006223fbc90] [c0000000007ccf50] unbind_store+0x130/0x170
[c0000006223fbcd0] [c0000000007cc0b4] drv_attr_store+0x44/0x60
[c0000006223fbcf0] [c00000000051fdb8] sysfs_kf_write+0x68/0x80
[c0000006223fbd10] [c00000000051f200] kernfs_fop_write+0x100/0x290
[c0000006223fbd60] [c00000000042037c] __vfs_write+0x3c/0x70
[c0000006223fbd80] [c00000000042404c] vfs_write+0xcc/0x240
[c0000006223fbdd0] [c00000000042442c] ksys_write+0x7c/0x140
[c0000006223fbe20] [c00000000000b278] system_call+0x5c/0x68
Instruction dump:
2ea80000 4196003c 794a2428 7d685215 41820030 7d48502a 71480002 41820024 
714a0008 4082002c e90b0008 786adf62 <e8680000> 7c635436 70630001 4c820020 
---[ end trace 579b48162da1b890 ]---
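
The faulting instruction is the word in angle brackets in the instruction dump above. A minimal sketch of decoding it by hand, assuming the standard Power ISA DS-form layout (6-bit primary opcode, 5-bit RT, 5-bit RA, 14-bit displacement field, 2-bit extended opcode); `struct ds_form` and `decode_ds_form()` are illustrative names, not part of the kernel or of this report:

```c
#include <stdint.h>

/* Fields of a Power ISA DS-form instruction such as ld/ldu/lwa. */
struct ds_form {
    unsigned opcode;  /* primary opcode; 58 covers ld, ldu and lwa */
    unsigned rt;      /* target register number */
    unsigned ra;      /* base address register number */
    unsigned xo;      /* extended opcode; 0 selects ld under opcode 58 */
    int32_t  disp;    /* signed byte displacement, low two bits always 0 */
};

/* Split a 32-bit instruction word into its DS-form fields. */
static struct ds_form decode_ds_form(uint32_t insn)
{
    struct ds_form d;
    int32_t low = (int32_t)(insn & 0xfffcu);

    d.opcode = insn >> 26;
    d.rt = (insn >> 21) & 0x1f;
    d.ra = (insn >> 16) & 0x1f;
    d.xo = insn & 0x3;
    /* Sign-extend the 16-bit displacement by hand. */
    d.disp = (low & 0x8000) ? low - 0x10000 : low;
    return d;
}
```

For the word 0xe8680000 this yields opcode 58, RT 3, RA 8, extended opcode 0 and displacement 0, that is `ld r3,0(r8)`. GPR08 is 0 in the register dump, which matches DAR 0 and the reported NULL pointer read.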

Thanks
-Sachin

[1] https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org


* Re: [5.6.0-rc7] Kernel crash while running ndctl tests
  2020-03-24  5:55 [5.6.0-rc7] Kernel crash while running ndctl tests Sachin Sant
@ 2020-03-24  7:07 ` Baoquan He
  2020-03-24  7:45   ` Sachin Sant
  2020-03-24  9:15 ` Aneesh Kumar K.V
  1 sibling, 1 reply; 6+ messages in thread
From: Baoquan He @ 2020-03-24  7:07 UTC (permalink / raw)
  To: Sachin Sant; +Cc: LKML, linuxppc-dev, linux-nvdimm

Hi Sachin,

On 03/24/20 at 11:25am, Sachin Sant wrote:
> While running ndctl[1] tests against 5.6.0-rc7 following crash is encountered.
> 
> Bisect leads me to  commit d41e2f3bd546 
> mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
> 
> Reverting this commit helps and the tests complete without any crash.

Could you paste your kernel config and the boot log?

If it's confidential, a private attachment is also OK.

Thanks
Baoquan


* Re: [5.6.0-rc7] Kernel crash while running ndctl tests
  2020-03-24  7:07 ` Baoquan He
@ 2020-03-24  7:45   ` Sachin Sant
  0 siblings, 0 replies; 6+ messages in thread
From: Sachin Sant @ 2020-03-24  7:45 UTC (permalink / raw)
  To: Baoquan He; +Cc: linuxppc-dev, LKML, linux-nvdimm


> On 24-Mar-2020, at 12:37 PM, Baoquan He <bhe@redhat.com> wrote:
> 
> Hi Sachin,
> 
> On 03/24/20 at 11:25am, Sachin Sant wrote:
>> While running ndctl[1] tests against 5.6.0-rc7 following crash is encountered.
>> 
>> Bisect leads me to  commit d41e2f3bd546 
>> mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
>> 
>> Reverting this commit helps and the tests complete without any crash.
> 
> Could you paste your kernel config and the boot log?
> 

I have attached boot.log as well as kernel config.

Thanks
-Sachin


* Re: [5.6.0-rc7] Kernel crash while running ndctl tests
  2020-03-24  5:55 [5.6.0-rc7] Kernel crash while running ndctl tests Sachin Sant
  2020-03-24  7:07 ` Baoquan He
@ 2020-03-24  9:15 ` Aneesh Kumar K.V
  2020-03-24  9:36   ` Sachin Sant
  1 sibling, 1 reply; 6+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-24  9:15 UTC (permalink / raw)
  To: Sachin Sant, LKML, linuxppc-dev; +Cc: Baoquan He, linux-nvdimm

Sachin Sant <sachinp@linux.vnet.ibm.com> writes:

> While running ndctl[1] tests against 5.6.0-rc7 following crash is encountered.
>
> Bisect leads me to  commit d41e2f3bd546 
> mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
>
> Reverting this commit helps and the tests complete without any crash.
>
> pmem0: detected capacity change from 0 to 10720641024
> BUG: Kernel NULL pointer dereference on read at 0x00000000
> Faulting instruction address: 0xc000000000c3447c
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in: dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip6_tables nft_compat ip_set rfkill nf_tables nfnetlink sunrpc sg pseries_rng papr_scm uio_pdrv_genirq uio sch_fq_codel ip_tables sd_mod t10_pi ibmvscsi scsi_transport_srp ibmveth
> CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
> NIP:  c000000000c3447c LR: c000000000088354 CTR: c00000000018e990
> REGS: c0000006223fb630 TRAP: 0300   Not tainted  (5.6.0-rc7-autotest)
> MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24048888  XER: 00000000
> CFAR: c00000000000dec4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 
> GPR00: c0000000003c5820 c0000006223fb8c0 c000000001684900 0000000004000000 
> GPR04: c00c000101000000 0000000007ffffff c00000067ff20900 c00c000000000000 
> GPR08: 0000000000000000 c00c000100000000 0000000000000000 c000000003f00000 
> GPR12: 0000000000008000 c00000001ec70200 00007fffc102f9e8 000000001002e088 
> GPR16: 0000000000000000 0000000010050d88 000000001002f778 000000001002f770 
> GPR20: 0000000000000000 0000000000000100 0000000000000001 0000000000001000 
> GPR24: 0000000000000008 0000000000000000 0000000004000000 c00c000100004000 
> GPR28: c000000003101aa0 c00c000100000000 0000000001000000 0000000004000100 
> NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
> LR [c000000000088354] vmemmap_free+0x144/0x320
> Call Trace:
> [c0000006223fb8c0] [c0000006223fb960] 0xc0000006223fb960 (unreliable)
> [c0000006223fb980] [c0000000003c5820] section_deactivate+0x220/0x240
> [c0000006223fba30] [c0000000003dc1d8] __remove_pages+0x118/0x170
> [c0000006223fba80] [c000000000086e5c] arch_remove_memory+0x3c/0x150
> [c0000006223fbb00] [c00000000041a3bc] memunmap_pages+0x1cc/0x2f0
> [c0000006223fbb80] [c0000000007d6d00] devm_action_release+0x30/0x50
> [c0000006223fbba0] [c0000000007d7de8] release_nodes+0x2f8/0x3e0
> [c0000006223fbc50] [c0000000007d0b38] device_release_driver_internal+0x168/0x270
> [c0000006223fbc90] [c0000000007ccf50] unbind_store+0x130/0x170
> [c0000006223fbcd0] [c0000000007cc0b4] drv_attr_store+0x44/0x60
> [c0000006223fbcf0] [c00000000051fdb8] sysfs_kf_write+0x68/0x80
> [c0000006223fbd10] [c00000000051f200] kernfs_fop_write+0x100/0x290
> [c0000006223fbd60] [c00000000042037c] __vfs_write+0x3c/0x70
> [c0000006223fbd80] [c00000000042404c] vfs_write+0xcc/0x240
> [c0000006223fbdd0] [c00000000042442c] ksys_write+0x7c/0x140
> [c0000006223fbe20] [c00000000000b278] system_call+0x5c/0x68
> Instruction dump:
> 2ea80000 4196003c 794a2428 7d685215 41820030 7d48502a 71480002 41820024 
> 714a0008 4082002c e90b0008 786adf62 <e8680000> 7c635436 70630001 4c820020 
> ---[ end trace 579b48162da1b890 ]---


Can you try this change?

diff --git a/mm/sparse.c b/mm/sparse.c
index aadb7298dcef..3012d1f3771a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 			ms->usage = NULL;
 		}
 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+		/* Mark the section invalid */
+		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
 	}
 
 	if (section_is_early && memmap)

A pfn_valid() check involves a pfn_section_valid() check if the section has
MEM_MAP. In this case we ended up setting ms->usage = NULL, so when we do
that, update the section to not have MEM_MAP.
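
The reasoning above can be sketched as a small userspace model. This is not kernel code: the `model_*` names, the struct layout and the flag value are simplified stand-ins assumed for illustration, and the early-section shortcut of the real pfn_valid() is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for SECTION_HAS_MEM_MAP; the bit value here is illustrative. */
#define MODEL_SECTION_HAS_MEM_MAP (1UL << 1)

/* Stand-in for the per-section usage data (hypothetical, reduced to one word). */
struct model_usage {
    unsigned long subsection_map;
};

/* Stand-in for struct mem_section, reduced to the two fields discussed. */
struct model_section {
    unsigned long section_mem_map;
    struct model_usage *usage;
};

/* A section counts as valid only while the MEM_MAP flag is set. */
static bool model_valid_section(const struct model_section *ms)
{
    return ms && (ms->section_mem_map & MODEL_SECTION_HAS_MEM_MAP);
}

/* Dereferences ms->usage unconditionally: this is where a NULL usage faults. */
static bool model_pfn_section_valid(const struct model_section *ms,
                                    unsigned int subsection)
{
    return (ms->usage->subsection_map >> subsection) & 1UL;
}

/* The subsection check is reached only if the section still has MEM_MAP. */
static bool model_pfn_valid(const struct model_section *ms,
                            unsigned int subsection)
{
    if (!model_valid_section(ms))
        return false;
    return model_pfn_section_valid(ms, subsection);
}

/* Deactivation with the proposed change: drop usage AND clear the flag. */
static void model_section_deactivate(struct model_section *ms)
{
    ms->usage = NULL;
    ms->section_mem_map &= ~MODEL_SECTION_HAS_MEM_MAP;
}
```

In this model, if model_section_deactivate() only set usage to NULL and left the flag set, a later model_pfn_valid() call would dereference a NULL pointer. With the flag cleared, the call returns false before usage is touched.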

-aneesh

* Re: [5.6.0-rc7] Kernel crash while running ndctl tests
  2020-03-24  9:15 ` Aneesh Kumar K.V
@ 2020-03-24  9:36   ` Sachin Sant
  2020-03-24 10:14     ` Baoquan He
  0 siblings, 1 reply; 6+ messages in thread
From: Sachin Sant @ 2020-03-24  9:36 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: LKML, linuxppc-dev, Baoquan He, linux-nvdimm



> On 24-Mar-2020, at 2:45 PM, Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote:
> 
> Sachin Sant <sachinp@linux.vnet.ibm.com> writes:
> 
>> While running ndctl[1] tests against 5.6.0-rc7 following crash is encountered.
>> 
>> Bisect leads me to  commit d41e2f3bd546 
>> mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
>> 
>> Reverting this commit helps and the tests complete without any crash.
> 
> 
> Can you try this change?
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index aadb7298dcef..3012d1f3771a 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> 			ms->usage = NULL;
> 		}
> 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> +		/* Mark the section invalid */
> +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> 	}
> 
> 	if (section_is_early && memmap)
> 

This patch works for me. The test ran successfully without any crash/failure.

Thanks
-Sachin

> A pfn_valid() check involves a pfn_section_valid() check if the section has
> MEM_MAP. In this case we ended up setting ms->usage = NULL, so when we do
> that, update the section to not have MEM_MAP.
> 
> -aneesh

* Re: [5.6.0-rc7] Kernel crash while running ndctl tests
  2020-03-24  9:36   ` Sachin Sant
@ 2020-03-24 10:14     ` Baoquan He
  0 siblings, 0 replies; 6+ messages in thread
From: Baoquan He @ 2020-03-24 10:14 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Sachin Sant; +Cc: LKML, linuxppc-dev, linux-nvdimm

On 03/24/20 at 03:06pm, Sachin Sant wrote:
> 
> 
> > On 24-Mar-2020, at 2:45 PM, Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote:
> > 
> > Sachin Sant <sachinp@linux.vnet.ibm.com> writes:
> > 
> >> While running ndctl[1] tests against 5.6.0-rc7 following crash is encountered.
> >> 
> >> Bisect leads me to  commit d41e2f3bd546 
> >> mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
> >> 
> >> Reverting this commit helps and the tests complete without any crash.
> > 
> > 
> > Can you try this change?
> > 
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index aadb7298dcef..3012d1f3771a 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> > 			ms->usage = NULL;
> > 		}
> > 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> > +		/* Mark the section invalid */
> > +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> > 	}
> > 
> > 	if (section_is_early && memmap)
> > 
> 
> This patch works for me. The test ran successfully without any crash/failure.

Hi Aneesh,

Could you make a formal patch to post, since Sachin has tested and
confirmed it works?

> 
> Thanks
> -Sachin
> 
> > A pfn_valid() check involves a pfn_section_valid() check if the section has
> > MEM_MAP. In this case we ended up setting ms->usage = NULL, so when we do
> > that, update the section to not have MEM_MAP.
> > 
> > -aneesh
> 