linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
@ 2016-08-23 18:43 Toshi Kani
  2016-08-23 20:42 ` Andrew Morton
  2016-08-23 22:32 ` Dan Williams
  0 siblings, 2 replies; 17+ messages in thread
From: Toshi Kani @ 2016-08-23 18:43 UTC (permalink / raw)
  To: dan.j.williams
  Cc: m.abhilash-kumar, linux-nvdimm, linux-kernel, Toshi Kani,
	Andrew Morton, Ard Biesheuvel, Brian Starkey

The following BUG was observed while starting up KVM with nvdimm
device as memory-backend-file to /dev/dax.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
 IP: [<ffffffff811ac851>] get_zone_device_page+0x11/0x30
 Call Trace:
   follow_devmap_pmd+0x298/0x2c0
   follow_page_mask+0x275/0x530
   __get_user_pages+0xe3/0x750
   __gfn_to_pfn_memslot+0x1b2/0x450 [kvm]
   ? hrtimer_try_to_cancel+0x2c/0x120
   ? kvm_read_l1_tsc+0x55/0x60 [kvm]
   try_async_pf+0x66/0x230 [kvm]
   ? kvm_host_page_size+0x90/0xa0 [kvm]
   tdp_page_fault+0x130/0x280 [kvm]
   kvm_mmu_page_fault+0x5f/0xf0 [kvm]
   handle_ept_violation+0x94/0x180 [kvm_intel]
   vmx_handle_exit+0x1d3/0x1440 [kvm_intel]
   ? atomic_switch_perf_msrs+0x6f/0xa0 [kvm_intel]
   ? vmx_vcpu_run+0x2d1/0x490 [kvm_intel]
   kvm_arch_vcpu_ioctl_run+0x81d/0x16a0 [kvm]
   ? wake_up_q+0x44/0x80
   kvm_vcpu_ioctl+0x33c/0x620 [kvm]
   ? __vfs_write+0x37/0x160
   do_vfs_ioctl+0xa2/0x5d0
   SyS_ioctl+0x79/0x90
   entry_SYSCALL_64_fastpath+0x1a/0xa4

devm_memremap_pages() calls for_each_device_pfn() to walk through
all pfns in page_map.  pfn_first(), however, returns a wrong pfn
that leaves page->pgmap uninitialized.

Since arch_add_memory() has set up direct mappings to the NVDIMM
range with altmap, pfn_first() should not modify the start pfn.
Change pfn_first() to simply return pfn of res->start.

Reported-and-tested-by: Abhilash Kumar Mulumudi <m.abhilash-kumar@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Brian Starkey <brian.starkey@arm.com>
---
 kernel/memremap.c |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 251d16b..50ea577 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -210,15 +210,9 @@ static void pgmap_radix_release(struct resource *res)
 
 static unsigned long pfn_first(struct page_map *page_map)
 {
-	struct dev_pagemap *pgmap = &page_map->pgmap;
 	const struct resource *res = &page_map->res;
-	struct vmem_altmap *altmap = pgmap->altmap;
-	unsigned long pfn;
 
-	pfn = res->start >> PAGE_SHIFT;
-	if (altmap)
-		pfn += vmem_altmap_offset(altmap);
-	return pfn;
+	return res->start >> PAGE_SHIFT;
 }
 
 static unsigned long pfn_end(struct page_map *page_map)

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-23 18:43 [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() Toshi Kani
@ 2016-08-23 20:42 ` Andrew Morton
  2016-08-23 21:25   ` Kani, Toshimitsu
  2016-08-23 22:32 ` Dan Williams
  1 sibling, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2016-08-23 20:42 UTC (permalink / raw)
  To: Toshi Kani
  Cc: dan.j.williams, m.abhilash-kumar, linux-nvdimm, linux-kernel,
	Ard Biesheuvel, Brian Starkey

On Tue, 23 Aug 2016 12:43:20 -0600 Toshi Kani <toshi.kani@hpe.com> wrote:

> The following BUG was observed while starting up KVM with nvdimm
> device as memory-backend-file to /dev/dax.
> 
>  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>
> ...
>
> devm_memremap_pages() calls for_each_device_pfn() to walk through
> all pfns in page_map.  pfn_first(), however, returns a wrong pfn
> that leaves page->pgmap uninitialized.
> 
> Since arch_add_memory() has set up direct mappings to the NVDIMM
> range with altmap, pfn_first() should not modify the start pfn.
> Change pfn_first() to simply return pfn of res->start.

Which kernel version(s) do you think need fixing?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-23 20:42 ` Andrew Morton
@ 2016-08-23 21:25   ` Kani, Toshimitsu
  0 siblings, 0 replies; 17+ messages in thread
From: Kani, Toshimitsu @ 2016-08-23 21:25 UTC (permalink / raw)
  To: akpm
  Cc: dan.j.williams, Mulumudi, Abhilash Kumar, linux-kernel,
	ard.biesheuvel, linux-nvdimm, brian.starkey

On Tue, 2016-08-23 at 13:42 -0700, Andrew Morton wrote:
> On Tue, 23 Aug 2016 12:43:20 -0600 Toshi Kani <toshi.kani@hpe.com>
> wrote:
> 
> > 
> > The following BUG was observed while starting up KVM with nvdimm
> > device as memory-backend-file to /dev/dax.
> > 
> >  BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000008
> > 
> > ...
> > 
> > devm_memremap_pages() calls for_each_device_pfn() to walk through
> > all pfns in page_map.  pfn_first(), however, returns a wrong pfn
> > that leaves page->pgmap uninitialized.
> > 
> > Since arch_add_memory() has set up direct mappings to the NVDIMM
> > range with altmap, pfn_first() should not modify the start pfn.
> > Change pfn_first() to simply return pfn of res->start.
> 
> Which kernel version(s) do you think need fixing?

The fix applies to v4.5 and newer versions.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-23 18:43 [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() Toshi Kani
  2016-08-23 20:42 ` Andrew Morton
@ 2016-08-23 22:32 ` Dan Williams
  2016-08-23 23:47   ` Kani, Toshimitsu
  1 sibling, 1 reply; 17+ messages in thread
From: Dan Williams @ 2016-08-23 22:32 UTC (permalink / raw)
  To: Toshi Kani
  Cc: m.abhilash-kumar, linux-nvdimm@lists.01.org, linux-kernel,
	Andrew Morton, Ard Biesheuvel, Brian Starkey

On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com> wrote:
> The following BUG was observed while starting up KVM with nvdimm
> device as memory-backend-file to /dev/dax.
>
>  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>  IP: [<ffffffff811ac851>] get_zone_device_page+0x11/0x30
>  Call Trace:
>    follow_devmap_pmd+0x298/0x2c0
>    follow_page_mask+0x275/0x530
>    __get_user_pages+0xe3/0x750
>    __gfn_to_pfn_memslot+0x1b2/0x450 [kvm]
>    ? hrtimer_try_to_cancel+0x2c/0x120
>    ? kvm_read_l1_tsc+0x55/0x60 [kvm]
>    try_async_pf+0x66/0x230 [kvm]
>    ? kvm_host_page_size+0x90/0xa0 [kvm]
>    tdp_page_fault+0x130/0x280 [kvm]
>    kvm_mmu_page_fault+0x5f/0xf0 [kvm]
>    handle_ept_violation+0x94/0x180 [kvm_intel]
>    vmx_handle_exit+0x1d3/0x1440 [kvm_intel]
>    ? atomic_switch_perf_msrs+0x6f/0xa0 [kvm_intel]
>    ? vmx_vcpu_run+0x2d1/0x490 [kvm_intel]
>    kvm_arch_vcpu_ioctl_run+0x81d/0x16a0 [kvm]
>    ? wake_up_q+0x44/0x80
>    kvm_vcpu_ioctl+0x33c/0x620 [kvm]
>    ? __vfs_write+0x37/0x160
>    do_vfs_ioctl+0xa2/0x5d0
>    SyS_ioctl+0x79/0x90
>    entry_SYSCALL_64_fastpath+0x1a/0xa4
>
> devm_memremap_pages() calls for_each_device_pfn() to walk through
> all pfns in page_map.  pfn_first(), however, returns a wrong pfn
> that leaves page->pgmap uninitialized.
>
> Since arch_add_memory() has set up direct mappings to the NVDIMM
> range with altmap, pfn_first() should not modify the start pfn.
> Change pfn_first() to simply return pfn of res->start.
>
> Reported-and-tested-by: Abhilash Kumar Mulumudi <m.abhilash-kumar@hpe.com>
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Brian Starkey <brian.starkey@arm.com>
> ---
>  kernel/memremap.c |    8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index 251d16b..50ea577 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -210,15 +210,9 @@ static void pgmap_radix_release(struct resource *res)
>
>  static unsigned long pfn_first(struct page_map *page_map)
>  {
> -       struct dev_pagemap *pgmap = &page_map->pgmap;
>         const struct resource *res = &page_map->res;
> -       struct vmem_altmap *altmap = pgmap->altmap;
> -       unsigned long pfn;
>
> -       pfn = res->start >> PAGE_SHIFT;
> -       if (altmap)
> -               pfn += vmem_altmap_offset(altmap);
> -       return pfn;
> +       return res->start >> PAGE_SHIFT;
>  }

I'm not sure about this fix.  The point of honoring
vmem_altmap_offset() is because a portion of the resource that is
passed to devm_memremap_pages() also contains the metadata info block
for the device.  The offset says "use everything past this point for
pages".  This may work for avoiding a crash, but it may corrupt info
block metadata in the process.  Can you provide more information about
the failing scenario to be sure that we are not triggering a fault on
an address that is not meant to have a page mapping?  I.e. what is the
host physical address of the page that caused this fault, and is it
valid?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-23 22:32 ` Dan Williams
@ 2016-08-23 23:47   ` Kani, Toshimitsu
  2016-08-24  0:34     ` Dan Williams
  2016-08-24  1:00     ` Dan Williams
  0 siblings, 2 replies; 17+ messages in thread
From: Kani, Toshimitsu @ 2016-08-23 23:47 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org,
	ard.biesheuvel, linux-kernel, akpm, brian.starkey

On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote:
> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com>
> wrote:
 :
> I'm not sure about this fix.  The point of honoring 
> vmem_altmap_offset() is because a portion of the resource that is
> passed to devm_memremap_pages() also contains the metadata info block
> for the device.  The offset says "use everything past this point for
> pages".  This may work for avoiding a crash, but it may corrupt info
> block metadata in the process.  Can you provide more information
> about the failing scenario to be sure that we are not triggering a
> fault on an address that is not meant to have a page mapping?  I.e.
> what is the host physical address of the page that caused this fault,
> and is it valid?

The fault address in question was the 2nd page of an NVDIMM range.  I
assumed this fault address was valid and needed to be handled.  Here is
some info about the base and patched cases.  Let me know if you need
more info.

Base
====

The following NVDIMM range was set to /dev/dax.

/proc/iomem
480000000-87fffffff : Persistent Memory

devm_memremap_pages() initialized struct page from 0x490200-0x87ffff.
This left 0x48000-0x4901ff uninitialized for page->pgmap.

 devm_memremap_pages: pgmap 0xffff88046d0453f0
 [0]  : pfn 0x490200, page ffffea0012408000, pgmap ffff88046d0453f0
 [1]  : pfn 0x490201, page ffffea0012408040, pgmap ffff88046d0453f0
 [2]  : pfn 0x490202, page ffffea0012408080, pgmap ffff88046d0453f0
 [3]  : pfn 0x490203, page ffffea00124080c0, pgmap ffff88046d0453f0
 [4]  : pfn 0x490204, page ffffea0012408100, pgmap ffff88046d0453f0
  :
 [E+1]: pfn 0x880000, page ffffea0021ffffc0, pgmap ffff88046d0453f0

The faulted page was pfn 0x480001, which was the 2nd page in the NVDIMM
range and did not have valid pgmap.  This led the BUG.

 pfn              0x480001
 page             0xffffea0012000040
 page->pgmap      0xffffea0012000060
 page->pgmap->ref (null)

Patch
=====

With the patch, devm_memremap_pages() initializes as follows.

 devm_memremap_pages: pgmap ffff880462b3b4b0
 [0]  : pfn 0x480000, page ffffea0012000000, pgmap ffff880462b3b4b0
 [1]  : pfn 0x480001, page ffffea0012000040, pgmap ffff880462b3b4b0
 [2]  : pfn 0x480002, page ffffea0012000080, pgmap ffff880462b3b4b0
 [3]  : pfn 0x480003, page ffffea00120000c0, pgmap ffff880462b3b4b0
 [4]  : pfn 0x480004, page ffffea0012000100, pgmap ffff880462b3b4b0
  :
 [E+1]: pfn 0x880000, page ffffea0021ffffc0, pgmap ffff880462b3b4b0

A page fault to pfn 0x480001 is handled as it has valid pgmap.

 pfn              0x480001
 page             0xffffea0012000040
 page->pgmap      0xffff880462b3b4b0
 page->pgmap->ref 0xffff880462b3b530

Its dev_pagemap and vmem_altmap are as follows.

crash> p {struct dev_pagemap} 0xffff880462b3b4b0
$2 = {
  altmap = 0xffff880462b3b4d0,
  res = 0xffff880462b3b468,
  ref = 0xffff880462b3b530,
  dev = 0xffff880463e37010
}

crash> p {struct vmem_altmap} 0xffff880462b3b4d0
$3 = {
  base_pfn = 0x480000,
  reserve = 0x2,
  free = 0x101fe,
  align = 0x1fe,
  alloc = 0x10000
}

This page entry is physically located at 0x480200040.

crash> vtop 0xffffea0012000040
VIRTUAL           PHYSICAL
ffffea0012000040  480200040

PML4 DIRECTORY: ffffffff81c06000
PAGE DIRECTORY: 47ffe6067
   PUD: 47ffe6000 => 47ffe5067
   PMD: 47ffe5480 => 80000004802001e3
  PAGE: 480200000  (2MB)

      PTE         PHYSICAL   FLAGS
80000004802001e3  480200000  (PRESENT|RW|ACCESSED|DIRTY|PSE|GLOBAL|NX)

      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffea0012008000 480200000                0        0  1 4fffe000000400
reserved

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-23 23:47   ` Kani, Toshimitsu
@ 2016-08-24  0:34     ` Dan Williams
  2016-08-24  1:29       ` Kani, Toshimitsu
  2016-08-24  1:00     ` Dan Williams
  1 sibling, 1 reply; 17+ messages in thread
From: Dan Williams @ 2016-08-24  0:34 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org,
	ard.biesheuvel, linux-kernel, akpm, brian.starkey

On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote:
>> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com>
>> wrote:
>  :
>> I'm not sure about this fix.  The point of honoring
>> vmem_altmap_offset() is because a portion of the resource that is
>> passed to devm_memremap_pages() also contains the metadata info block
>> for the device.  The offset says "use everything past this point for
>> pages".  This may work for avoiding a crash, but it may corrupt info
>> block metadata in the process.  Can you provide more information
>> about the failing scenario to be sure that we are not triggering a
>> fault on an address that is not meant to have a page mapping?  I.e.
>> what is the host physical address of the page that caused this fault,
>> and is it valid?
>
> The fault address in question was the 2nd page of an NVDIMM range.  I
> assumed this fault address was valid and needed to be handled.  Here is
> some info about the base and patched cases.  Let me know if you need
> more info.
>
> Base
> ====
>
> The following NVDIMM range was set to /dev/dax.

With ndctl create-namespace or manually via sysfs?  Specifically I'm
looking for what the 'align' attribute was set to when the
configuration was established.  Can you provide a dump of the sysfs
attributes for the /dev/dax parent device?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-23 23:47   ` Kani, Toshimitsu
  2016-08-24  0:34     ` Dan Williams
@ 2016-08-24  1:00     ` Dan Williams
  2016-08-24  1:38       ` Kani, Toshimitsu
  1 sibling, 1 reply; 17+ messages in thread
From: Dan Williams @ 2016-08-24  1:00 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org,
	ard.biesheuvel, linux-kernel, akpm, brian.starkey

On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote:
>> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com>
>> wrote:
>  :
>> I'm not sure about this fix.  The point of honoring
>> vmem_altmap_offset() is because a portion of the resource that is
>> passed to devm_memremap_pages() also contains the metadata info block
>> for the device.  The offset says "use everything past this point for
>> pages".  This may work for avoiding a crash, but it may corrupt info
>> block metadata in the process.  Can you provide more information
>> about the failing scenario to be sure that we are not triggering a
>> fault on an address that is not meant to have a page mapping?  I.e.
>> what is the host physical address of the page that caused this fault,
>> and is it valid?
>
> The fault address in question was the 2nd page of an NVDIMM range.  I
> assumed this fault address was valid and needed to be handled.  Here is
> some info about the base and patched cases.  Let me know if you need
> more info.
>
> Base
> ====
>
> The following NVDIMM range was set to /dev/dax.
>
> /proc/iomem
> 480000000-87fffffff : Persistent Memory
>
> devm_memremap_pages() initialized struct page from 0x490200-0x87ffff.

This seems like the start of the trouble.  What happened to the first
1GB of the address range?  I'm assuming the 'align' attribute is set
to 2MB because we start at a pfn offset of 0x200, but this should be
starting at 0x480200.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-24  0:34     ` Dan Williams
@ 2016-08-24  1:29       ` Kani, Toshimitsu
  2016-08-24  2:53         ` Dan Williams
  0 siblings, 1 reply; 17+ messages in thread
From: Kani, Toshimitsu @ 2016-08-24  1:29 UTC (permalink / raw)
  To: Dan Williams
  Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org,
	ard.biesheuvel, linux-kernel, akpm, brian.starkey

> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@hpe.com>
> wrote:
> > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote:
> >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com>
> >> wrote:
> >  :
> >> I'm not sure about this fix.  The point of honoring
> >> vmem_altmap_offset() is because a portion of the resource that is
> >> passed to devm_memremap_pages() also contains the metadata info
> block
> >> for the device.  The offset says "use everything past this point for
> >> pages".  This may work for avoiding a crash, but it may corrupt info
> >> block metadata in the process.  Can you provide more information
> >> about the failing scenario to be sure that we are not triggering a
> >> fault on an address that is not meant to have a page mapping?  I.e.
> >> what is the host physical address of the page that caused this fault,
> >> and is it valid?
> >
> > The fault address in question was the 2nd page of an NVDIMM range.  I
> > assumed this fault address was valid and needed to be handled.  Here is
> > some info about the base and patched cases.  Let me know if you need
> > more info.
> >
> > Base
> > ====
> >
> > The following NVDIMM range was set to /dev/dax.
> 
> With ndctl create-namespace or manually via sysfs?  Specifically I'm
> looking for what the 'align' attribute was set to when the
> configuration was established.  Can you provide a dump of the sysfs
> attributes for the /dev/dax parent device?

I used the ndctl command below.
ndctl create-namespace -f -e namespace0.0 -m dax

Here is additional info from my note for the base case.

p {struct dev_pagemap} 0xffff88046d0453f0
$3 = {
  altmap = 0xffff88046d045410,
  res = 0xffff88046d0453a8,
  ref = 0xffff88046d0452f0,
  dev = 0xffff880464790410
}

crash> p {struct vmem_altmap} 0xffff88046d045410
$6 = {
  base_pfn = 0x480000,
  reserve = 0x2,        // PHYS_PFN(SZ_8K)
  free = 0x101fe,
  align = 0x1fe,
  alloc = 0x10000
}

crash> p {struct resource} 0xffff88046d0453a8
$4 = {
  start = 0x480000000,
  end = 0x87fffffff,
  name = 0xffff880c7da5d4a8 "region0",
  flags = 0x200,
  desc = 0x0,
  parent = 0x0,
  sibling = 0x0,
  child = 0x0
}

crash> p {struct percpu_ref} 0xffff88046d0452f0
$7 = {
  count = {
    counter = 0x8000000000000001
  },
  percpu_count_ptr = 0x60f380403a98,
  release = 0xffffffffa008a0a0,
  confirm_switch = 0x0,
  force_atomic = 0x0,
  rcu = {
    next = 0x0,
    func = 0x0
  }
}

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-24  1:00     ` Dan Williams
@ 2016-08-24  1:38       ` Kani, Toshimitsu
  0 siblings, 0 replies; 17+ messages in thread
From: Kani, Toshimitsu @ 2016-08-24  1:38 UTC (permalink / raw)
  To: Dan Williams
  Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org,
	ard.biesheuvel, linux-kernel, akpm, brian.starkey

> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@hpe.com>
> wrote:
> > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote:
> >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com>
> >> wrote:
> >  :
> >> I'm not sure about this fix.  The point of honoring
> >> vmem_altmap_offset() is because a portion of the resource that is
> >> passed to devm_memremap_pages() also contains the metadata info
> block
> >> for the device.  The offset says "use everything past this point for
> >> pages".  This may work for avoiding a crash, but it may corrupt info
> >> block metadata in the process.  Can you provide more information
> >> about the failing scenario to be sure that we are not triggering a
> >> fault on an address that is not meant to have a page mapping?  I.e.
> >> what is the host physical address of the page that caused this fault,
> >> and is it valid?
> >
> > The fault address in question was the 2nd page of an NVDIMM range.  I
> > assumed this fault address was valid and needed to be handled.  Here is
> > some info about the base and patched cases.  Let me know if you need
> > more info.
> >
> > Base
> > ====
> >
> > The following NVDIMM range was set to /dev/dax.
> >
> > /proc/iomem
> > 480000000-87fffffff : Persistent Memory
> >
> > devm_memremap_pages() initialized struct page from 0x490200-0x87ffff.
> 
> This seems like the start of the trouble.  What happened to the first
> 1GB of the address range?  I'm assuming the 'align' attribute is set
> to 2MB because we start at a pfn offset of 0x200, but this should be
> starting at 0x480200.

pfn_first() adds an offset from vmem_altmap_offset(), which is
altmap->reserve + altmap->free.  You can see why it ended up
with this offset from the dumps in my previous email.

Thanks
-Toshi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-24  1:29       ` Kani, Toshimitsu
@ 2016-08-24  2:53         ` Dan Williams
  2016-08-24  3:58           ` Dan Williams
  0 siblings, 1 reply; 17+ messages in thread
From: Dan Williams @ 2016-08-24  2:53 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org,
	ard.biesheuvel, linux-kernel, akpm, brian.starkey

On Tue, Aug 23, 2016 at 6:29 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
>> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@hpe.com>
>> wrote:
>> > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote:
>> >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com>
>> >> wrote:
>> >  :
>> >> I'm not sure about this fix.  The point of honoring
>> >> vmem_altmap_offset() is because a portion of the resource that is
>> >> passed to devm_memremap_pages() also contains the metadata info
>> block
>> >> for the device.  The offset says "use everything past this point for
>> >> pages".  This may work for avoiding a crash, but it may corrupt info
>> >> block metadata in the process.  Can you provide more information
>> >> about the failing scenario to be sure that we are not triggering a
>> >> fault on an address that is not meant to have a page mapping?  I.e.
>> >> what is the host physical address of the page that caused this fault,
>> >> and is it valid?
>> >
>> > The fault address in question was the 2nd page of an NVDIMM range.  I
>> > assumed this fault address was valid and needed to be handled.  Here is
>> > some info about the base and patched cases.  Let me know if you need
>> > more info.
>> >
>> > Base
>> > ====
>> >
>> > The following NVDIMM range was set to /dev/dax.
>>
>> With ndctl create-namespace or manually via sysfs?  Specifically I'm
>> looking for what the 'align' attribute was set to when the
>> configuration was established.  Can you provide a dump of the sysfs
>> attributes for the /dev/dax parent device?
>
> I used the ndctl command below.
> ndctl create-namespace -f -e namespace0.0 -m dax
>
> Here is additional info from my note for the base case.
>
> p {struct dev_pagemap} 0xffff88046d0453f0
> $3 = {
>   altmap = 0xffff88046d045410,
>   res = 0xffff88046d0453a8,
>   ref = 0xffff88046d0452f0,
>   dev = 0xffff880464790410
> }
>
> crash> p {struct vmem_altmap} 0xffff88046d045410
> $6 = {
>   base_pfn = 0x480000,
>   reserve = 0x2,        // PHYS_PFN(SZ_8K)
>   free = 0x101fe,
>   align = 0x1fe,
>   alloc = 0x10000
> }

Ah, so, on second look the 0x490200000 data offset looks correct.  The
total size of the address range is 16GB which equates to 256MB needed
for struct page, plus 2MB more to re-align the data on the next 2MB
boundary.

The question now is why is the guest faulting on an access to an
address less than 0x490200000?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-24  2:53         ` Dan Williams
@ 2016-08-24  3:58           ` Dan Williams
  2016-08-24  4:28             ` Kani, Toshimitsu
  2016-08-24  5:48             ` Dan Williams
  0 siblings, 2 replies; 17+ messages in thread
From: Dan Williams @ 2016-08-24  3:58 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org,
	ard.biesheuvel, linux-kernel, akpm, brian.starkey

[-- Attachment #1: Type: text/plain, Size: 2608 bytes --]

On Tue, Aug 23, 2016 at 7:53 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Aug 23, 2016 at 6:29 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
>>> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@hpe.com>
>>> wrote:
>>> > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote:
>>> >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com>
>>> >> wrote:
>>> >  :
>>> >> I'm not sure about this fix.  The point of honoring
>>> >> vmem_altmap_offset() is because a portion of the resource that is
>>> >> passed to devm_memremap_pages() also contains the metadata info
>>> block
>>> >> for the device.  The offset says "use everything past this point for
>>> >> pages".  This may work for avoiding a crash, but it may corrupt info
>>> >> block metadata in the process.  Can you provide more information
>>> >> about the failing scenario to be sure that we are not triggering a
>>> >> fault on an address that is not meant to have a page mapping?  I.e.
>>> >> what is the host physical address of the page that caused this fault,
>>> >> and is it valid?
>>> >
>>> > The fault address in question was the 2nd page of an NVDIMM range.  I
>>> > assumed this fault address was valid and needed to be handled.  Here is
>>> > some info about the base and patched cases.  Let me know if you need
>>> > more info.
>>> >
>>> > Base
>>> > ====
>>> >
>>> > The following NVDIMM range was set to /dev/dax.
>>>
>>> With ndctl create-namespace or manually via sysfs?  Specifically I'm
>>> looking for what the 'align' attribute was set to when the
>>> configuration was established.  Can you provide a dump of the sysfs
>>> attributes for the /dev/dax parent device?
>>
>> I used the ndctl command below.
>> ndctl create-namespace -f -e namespace0.0 -m dax
>>
>> Here is additional info from my note for the base case.
>>
>> p {struct dev_pagemap} 0xffff88046d0453f0
>> $3 = {
>>   altmap = 0xffff88046d045410,
>>   res = 0xffff88046d0453a8,
>>   ref = 0xffff88046d0452f0,
>>   dev = 0xffff880464790410
>> }
>>
>> crash> p {struct vmem_altmap} 0xffff88046d045410
>> $6 = {
>>   base_pfn = 0x480000,
>>   reserve = 0x2,        // PHYS_PFN(SZ_8K)
>>   free = 0x101fe,
>>   align = 0x1fe,
>>   alloc = 0x10000
>> }
>
> Ah, so, on second look the 0x490200000 data offset looks correct.  The
> total size of the address range is 16GB which equates to 256MB needed
> for struct page, plus 2MB more to re-align the data on the next 2MB
> boundary.
>
> The question now is why is the guest faulting on an access to an
> address less than 0x490200000?

Does the attached patch fix this for you?

[-- Attachment #2: 0001-dax-fix-device-dax-region-base.patch --]
[-- Type: text/x-patch, Size: 3372 bytes --]

From 506cdd2c4ec0f9283ac4eb92259f2e8a71c5b637 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams@intel.com>
Date: Tue, 23 Aug 2016 19:59:31 -0700
Subject: [PATCH] dax: fix device-dax region base

The data offset for a dax region needs to account for an altmap
reservation in the resource range.  Otherwise, device-dax is allowing
mappings directly into the memmap or device info-block area, with crash
signatures like the following:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
 IP: [<ffffffff811ac851>] get_zone_device_page+0x11/0x30
 Call Trace:
   follow_devmap_pmd+0x298/0x2c0
   follow_page_mask+0x275/0x530
   __get_user_pages+0xe3/0x750
   __gfn_to_pfn_memslot+0x1b2/0x450 [kvm]
   ? hrtimer_try_to_cancel+0x2c/0x120
   ? kvm_read_l1_tsc+0x55/0x60 [kvm]
   try_async_pf+0x66/0x230 [kvm]
   ? kvm_host_page_size+0x90/0xa0 [kvm]
   tdp_page_fault+0x130/0x280 [kvm]
   kvm_mmu_page_fault+0x5f/0xf0 [kvm]
   handle_ept_violation+0x94/0x180 [kvm_intel]
   vmx_handle_exit+0x1d3/0x1440 [kvm_intel]
   ? atomic_switch_perf_msrs+0x6f/0xa0 [kvm_intel]
   ? vmx_vcpu_run+0x2d1/0x490 [kvm_intel]
   kvm_arch_vcpu_ioctl_run+0x81d/0x16a0 [kvm]
   ? wake_up_q+0x44/0x80
   kvm_vcpu_ioctl+0x33c/0x620 [kvm]
   ? __vfs_write+0x37/0x160
   do_vfs_ioctl+0xa2/0x5d0
   SyS_ioctl+0x79/0x90
   entry_SYSCALL_64_fastpath+0x1a/0xa4

Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: ab68f2622136 ("/dev/dax, pmem: direct access to persistent memory")
Reported-by: Abhilash Kumar Mulumudi <m.abhilash-kumar@hpe.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/pmem.c       | 2 ++
 include/linux/memremap.h | 1 +
 kernel/memremap.c        | 9 +++++++++
 3 files changed, 12 insertions(+)

diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index dfb168568af1..0603637df162 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -116,6 +116,8 @@ static int dax_pmem_probe(struct device *dev)
 	if (rc)
 		return rc;
 
+	res.start += vmem_altmap_resource_offset(altmap);
+
 	nd_region = to_nd_region(dev->parent);
 	dax_region = alloc_dax_region(dev, nd_region->id, &res,
 			le32_to_cpu(pfn_sb->align), addr, PFN_DEV|PFN_MAP);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 93416196ba64..7265b83cdbea 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -24,6 +24,7 @@ struct vmem_altmap {
 };
 
 unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
+resource_size_t vmem_altmap_resource_offset(struct vmem_altmap *altmap);
 void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
 
 #ifdef CONFIG_ZONE_DEVICE
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 251d16b4cb41..941e897c52a8 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -384,6 +384,15 @@ unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
 	return altmap->reserve + altmap->free;
 }
 
+resource_size_t vmem_altmap_resource_offset(struct vmem_altmap *altmap)
+{
+	/* number of bytes from base where data starts after the memmap */
+	if (!altmap)
+		return 0;
+	return vmem_altmap_offset(altmap) * PAGE_SIZE;
+}
+EXPORT_SYMBOL_GPL(vmem_altmap_resource_offset);
+
 void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns)
 {
 	altmap->alloc -= nr_pfns;
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* RE: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-24  3:58           ` Dan Williams
@ 2016-08-24  4:28             ` Kani, Toshimitsu
  2016-08-24  4:48               ` Dan Williams
  2016-08-24  5:48             ` Dan Williams
  1 sibling, 1 reply; 17+ messages in thread
From: Kani, Toshimitsu @ 2016-08-24  4:28 UTC (permalink / raw)
  To: Dan Williams
  Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org,
	ard.biesheuvel, linux-kernel, akpm, brian.starkey

> On Tue, Aug 23, 2016 at 7:53 PM, Dan Williams <dan.j.williams@intel.com>
> wrote:
> > On Tue, Aug 23, 2016 at 6:29 PM, Kani, Toshimitsu <toshi.kani@hpe.com>
> wrote:
> >>> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu
> <toshi.kani@hpe.com>
> >>> wrote:
 :
> >>
> >> crash> p {struct vmem_altmap} 0xffff88046d045410
> >> $6 = {
> >>   base_pfn = 0x480000,
> >>   reserve = 0x2,        // PHYS_PFN(SZ_8K)
> >>   free = 0x101fe,
> >>   align = 0x1fe,
> >>   alloc = 0x10000
> >> }
> >
> > Ah, so, on second look the 0x490200000 data offset looks correct.  The
> > total size of the address range is 16GB which equates to 256MB needed
> > for struct page, plus 2MB more to re-align the data on the next 2MB
> > boundary.
> >
> > The question now is why is the guest faulting on an access to an
> > address less than 0x490200000?
> 
> Does the attached patch fix this for you?

Yeah, that makes sense.  I will test it tomorrow.

BTW, why does devm_memremap_pages() put a whole range to pgmap_radix
as device memory, but only initialize page->pgmap for its data range?  Is there
particular reason for this inconsistency?

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-24  4:28             ` Kani, Toshimitsu
@ 2016-08-24  4:48               ` Dan Williams
  2016-08-24 14:40                 ` Kani, Toshimitsu
  0 siblings, 1 reply; 17+ messages in thread
From: Dan Williams @ 2016-08-24  4:48 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org,
	ard.biesheuvel, linux-kernel, akpm, brian.starkey

On Tue, Aug 23, 2016 at 9:28 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
>> On Tue, Aug 23, 2016 at 7:53 PM, Dan Williams <dan.j.williams@intel.com>
>> wrote:
>> > On Tue, Aug 23, 2016 at 6:29 PM, Kani, Toshimitsu <toshi.kani@hpe.com>
>> wrote:
>> >>> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu
>> <toshi.kani@hpe.com>
>> >>> wrote:
>  :
>> >>
>> >> crash> p {struct vmem_altmap} 0xffff88046d045410
>> >> $6 = {
>> >>   base_pfn = 0x480000,
>> >>   reserve = 0x2,        // PHYS_PFN(SZ_8K)
>> >>   free = 0x101fe,
>> >>   align = 0x1fe,
>> >>   alloc = 0x10000
>> >> }
>> >
>> > Ah, so, on second look the 0x490200000 data offset looks correct.  The
>> > total size of the address range is 16GB which equates to 256MB needed
>> > for struct page, plus 2MB more to re-align the data on the next 2MB
>> > boundary.
>> >
>> > The question now is why is the guest faulting on an access to an
>> > address less than 0x490200000?
>>
>> Does the attached patch fix this for you?
>
> Yeah, that makes sense.  I will test it tomorrow.
>
> BTW, why does devm_memremap_pages() put a whole range to pgmap_radix
> as device memory, but only initialize page->pgmap for its data range?  Is there
> particular reason for this inconsistency?
>

The radix tree is indexed by section number, but we don't always
initialize a full section.  The cases when we don't use a full section
is when it overlaps device metadata, or if a platform multiplexes the
device memory range with another resource within the same section.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-24  3:58           ` Dan Williams
  2016-08-24  4:28             ` Kani, Toshimitsu
@ 2016-08-24  5:48             ` Dan Williams
  2016-08-24 15:21               ` Kani, Toshimitsu
  1 sibling, 1 reply; 17+ messages in thread
From: Dan Williams @ 2016-08-24  5:48 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org,
	ard.biesheuvel, linux-kernel, akpm, brian.starkey

[-- Attachment #1: Type: text/plain, Size: 2854 bytes --]

On Tue, Aug 23, 2016 at 8:58 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Aug 23, 2016 at 7:53 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Tue, Aug 23, 2016 at 6:29 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
>>>> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.kani@hpe.com>
>>>> wrote:
>>>> > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote:
>>>> >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.kani@hpe.com>
>>>> >> wrote:
>>>> >  :
>>>> >> I'm not sure about this fix.  The point of honoring
>>>> >> vmem_altmap_offset() is because a portion of the resource that is
>>>> >> passed to devm_memremap_pages() also contains the metadata info
>>>> block
>>>> >> for the device.  The offset says "use everything past this point for
>>>> >> pages".  This may work for avoiding a crash, but it may corrupt info
>>>> >> block metadata in the process.  Can you provide more information
>>>> >> about the failing scenario to be sure that we are not triggering a
>>>> >> fault on an address that is not meant to have a page mapping?  I.e.
>>>> >> what is the host physical address of the page that caused this fault,
>>>> >> and is it valid?
>>>> >
>>>> > The fault address in question was the 2nd page of an NVDIMM range.  I
>>>> > assumed this fault address was valid and needed to be handled.  Here is
>>>> > some info about the base and patched cases.  Let me know if you need
>>>> > more info.
>>>> >
>>>> > Base
>>>> > ====
>>>> >
>>>> > The following NVDIMM range was set to /dev/dax.
>>>>
>>>> With ndctl create-namespace or manually via sysfs?  Specifically I'm
>>>> looking for what the 'align' attribute was set to when the
>>>> configuration was established.  Can you provide a dump of the sysfs
>>>> attributes for the /dev/dax parent device?
>>>
>>> I used the ndctl command below.
>>> ndctl create-namespace -f -e namespace0.0 -m dax
>>>
>>> Here is additional info from my note for the base case.
>>>
>>> p {struct dev_pagemap} 0xffff88046d0453f0
>>> $3 = {
>>>   altmap = 0xffff88046d045410,
>>>   res = 0xffff88046d0453a8,
>>>   ref = 0xffff88046d0452f0,
>>>   dev = 0xffff880464790410
>>> }
>>>
>>> crash> p {struct vmem_altmap} 0xffff88046d045410
>>> $6 = {
>>>   base_pfn = 0x480000,
>>>   reserve = 0x2,        // PHYS_PFN(SZ_8K)
>>>   free = 0x101fe,
>>>   align = 0x1fe,
>>>   alloc = 0x10000
>>> }
>>
>> Ah, so, on second look the 0x490200000 data offset looks correct.  The
>> total size of the address range is 16GB which equates to 256MB needed
>> for struct page, plus 2MB more to re-align the data on the next 2MB
>> boundary.
>>
>> The question now is why is the guest faulting on an access to an
>> address less than 0x490200000?
>
> Does the attached patch fix this for you?

Sorry, should be this much simpler patch that also mirrors what
driver/nvdimm/pmem.c is doing...

[-- Attachment #2: 0001-dax-fix-device-dax-region-base.patch --]
[-- Type: text/x-patch, Size: 2245 bytes --]

From 3369f0e825c12bb2f17c0f7d3ccecb7c60f645e0 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams@intel.com>
Date: Tue, 23 Aug 2016 19:59:31 -0700
Subject: [PATCH] dax: fix device-dax region base

The data offset for a dax region needs to account for an altmap
reservation in the resource range.  Otherwise, device-dax is allowing
mappings directly into the memmap or device info-block area, with crash
signatures like the following:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
 IP: [<ffffffff811ac851>] get_zone_device_page+0x11/0x30
 Call Trace:
   follow_devmap_pmd+0x298/0x2c0
   follow_page_mask+0x275/0x530
   __get_user_pages+0xe3/0x750
   __gfn_to_pfn_memslot+0x1b2/0x450 [kvm]
   ? hrtimer_try_to_cancel+0x2c/0x120
   ? kvm_read_l1_tsc+0x55/0x60 [kvm]
   try_async_pf+0x66/0x230 [kvm]
   ? kvm_host_page_size+0x90/0xa0 [kvm]
   tdp_page_fault+0x130/0x280 [kvm]
   kvm_mmu_page_fault+0x5f/0xf0 [kvm]
   handle_ept_violation+0x94/0x180 [kvm_intel]
   vmx_handle_exit+0x1d3/0x1440 [kvm_intel]
   ? atomic_switch_perf_msrs+0x6f/0xa0 [kvm_intel]
   ? vmx_vcpu_run+0x2d1/0x490 [kvm_intel]
   kvm_arch_vcpu_ioctl_run+0x81d/0x16a0 [kvm]
   ? wake_up_q+0x44/0x80
   kvm_vcpu_ioctl+0x33c/0x620 [kvm]
   ? __vfs_write+0x37/0x160
   do_vfs_ioctl+0xa2/0x5d0
   SyS_ioctl+0x79/0x90
   entry_SYSCALL_64_fastpath+0x1a/0xa4

Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: ab68f2622136 ("/dev/dax, pmem: direct access to persistent memory")
Reported-by: Abhilash Kumar Mulumudi <m.abhilash-kumar@hpe.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/pmem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index dfb168568af1..1f01e98c83c7 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -116,6 +116,9 @@ static int dax_pmem_probe(struct device *dev)
 	if (rc)
 		return rc;
 
+	/* adjust the dax_region resource to the start of data */
+	res.start += le64_to_cpu(pfn_sb->dataoff);
+
 	nd_region = to_nd_region(dev->parent);
 	dax_region = alloc_dax_region(dev, nd_region->id, &res,
 			le32_to_cpu(pfn_sb->align), addr, PFN_DEV|PFN_MAP);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-24  4:48               ` Dan Williams
@ 2016-08-24 14:40                 ` Kani, Toshimitsu
  2016-08-24 14:55                   ` Dan Williams
  0 siblings, 1 reply; 17+ messages in thread
From: Kani, Toshimitsu @ 2016-08-24 14:40 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org,
	ard.biesheuvel, linux-kernel, akpm, brian.starkey

On Tue, 2016-08-23 at 21:48 -0700, Dan Williams wrote:
> On Tue, Aug 23, 2016 at 9:28 PM, Kani, Toshimitsu <toshi.kani@hpe.com
> > 
> > BTW, why does devm_memremap_pages() put a whole range to
> > pgmap_radix as device memory, but only initialize page->pgmap for
> > its data range?  Is there particular reason for this inconsistency?
>
> The radix tree is indexed by section number, but we don't always
> initialize a full section.  The cases when we don't use a full
> section is when it overlaps device metadata, or if a platform
> multiplexes the device memory range with another resource within the
> same section.

I see, but I still feel odd about making get_dev_pagemap() to work for
metadata, but get_page() -> get_zone_device_page() to crash like this.
 follow_devmap_pmd() assumes get_page() to work when get_dev_pagemap()
returns a valid pgmap...

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-24 14:40                 ` Kani, Toshimitsu
@ 2016-08-24 14:55                   ` Dan Williams
  0 siblings, 0 replies; 17+ messages in thread
From: Dan Williams @ 2016-08-24 14:55 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org,
	ard.biesheuvel, linux-kernel, akpm, brian.starkey

[ adding Konstantin ]

On Wed, Aug 24, 2016 at 7:40 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> On Tue, 2016-08-23 at 21:48 -0700, Dan Williams wrote:
>> On Tue, Aug 23, 2016 at 9:28 PM, Kani, Toshimitsu <toshi.kani@hpe.com
>> >
>> > BTW, why does devm_memremap_pages() put a whole range to
>> > pgmap_radix as device memory, but only initialize page->pgmap for
>> > its data range?  Is there particular reason for this inconsistency?
>>
>> The radix tree is indexed by section number, but we don't always
>> initialize a full section.  The cases when we don't use a full
>> section is when it overlaps device metadata, or if a platform
>> multiplexes the device memory range with another resource within the
>> same section.
>
> I see, but I still feel odd about making get_dev_pagemap() to work for
> metadata, but get_page() -> get_zone_device_page() to crash like this.
>  follow_devmap_pmd() assumes get_page() to work when get_dev_pagemap()
> returns a valid pgmap...
>

We could leave the unmapped portions of a section out of the radix,
but I'm also worried about keeping the get_dev_pagemap() lookup cheap.
I saw that Konstantin has some proposed changes to the radix api to
make it easier to fill a range [1].  I might switch to
radix_tree_fill_range() when it becomes available.

[1]: https://github.com/koct9i/linux/commits/radix-tree

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page()
  2016-08-24  5:48             ` Dan Williams
@ 2016-08-24 15:21               ` Kani, Toshimitsu
  0 siblings, 0 replies; 17+ messages in thread
From: Kani, Toshimitsu @ 2016-08-24 15:21 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Mulumudi, Abhilash Kumar, linux-nvdimm@lists.01.org,
	ard.biesheuvel, linux-kernel, akpm, brian.starkey

On Tue, 2016-08-23 at 22:48 -0700, Dan Williams wrote:
> On Tue, Aug 23, 2016 at 8:58 PM, Dan Williams <dan.j.williams@intel.c
> 
> > 
> > Does the attached patch fix this for you?
> 
> Sorry, should be this much simpler patch that also mirrors what
> driver/nvdimm/pmem.c is doing...

Yes, this change works fine. :-)

Tested-by: Toshi Kani <toshi.kani@hpe.com>

Thanks for the quick fix!
-Toshi

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-08-24 15:39 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-23 18:43 [PATCH] memremap: Fix NULL pointer BUG in get_zone_device_page() Toshi Kani
2016-08-23 20:42 ` Andrew Morton
2016-08-23 21:25   ` Kani, Toshimitsu
2016-08-23 22:32 ` Dan Williams
2016-08-23 23:47   ` Kani, Toshimitsu
2016-08-24  0:34     ` Dan Williams
2016-08-24  1:29       ` Kani, Toshimitsu
2016-08-24  2:53         ` Dan Williams
2016-08-24  3:58           ` Dan Williams
2016-08-24  4:28             ` Kani, Toshimitsu
2016-08-24  4:48               ` Dan Williams
2016-08-24 14:40                 ` Kani, Toshimitsu
2016-08-24 14:55                   ` Dan Williams
2016-08-24  5:48             ` Dan Williams
2016-08-24 15:21               ` Kani, Toshimitsu
2016-08-24  1:00     ` Dan Williams
2016-08-24  1:38       ` Kani, Toshimitsu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).