From: Jerome Glisse <jglisse@redhat.com>
To: Balbir Singh <bsingharora@gmail.com>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, John Hubbard <jhubbard@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	David Nellans <dnellans@nvidia.com>,
	Evgeny Baskakov <ebaskakov@nvidia.com>,
	Mark Hairgrove <mhairgrove@nvidia.com>,
	Sherry Cheung <SCheung@nvidia.com>,
	Subhash Gutti <sgutti@nvidia.com>
Subject: Re: [HMM 14/16] mm/hmm/devmem: device memory hotplug using ZONE_DEVICE
Date: Fri, 7 Apr 2017 12:26:37 -0400	[thread overview]
Message-ID: <20170407162636.GB15945@redhat.com> (raw)
In-Reply-To: <20170407020254.GA13927@redhat.com>

On Thu, Apr 06, 2017 at 10:02:55PM -0400, Jerome Glisse wrote:
> On Fri, Apr 07, 2017 at 11:37:34AM +1000, Balbir Singh wrote:
> > On Wed, 2017-04-05 at 16:40 -0400, Jérôme Glisse wrote:
> > > This introduces a simple struct and associated helpers for device drivers
> > > to use when hotplugging un-addressable device memory as ZONE_DEVICE. It
> > > will find an unused physical address range and trigger memory hotplug for
> > > it, which allocates and initializes struct page for the device memory.
> > > 
> > > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> > > Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
> > > Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> > > Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
> > > Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
> > > Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
> > > ---
> > >  include/linux/hmm.h | 114 +++++++++++++++
> > >  mm/Kconfig          |   9 ++
> > >  mm/hmm.c            | 398 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 521 insertions(+)
> > > 
> > > +/*
> > > + * To add (hotplug) device memory, HMM assumes that there is no real resource
> > > + * that reserves a range in the physical address space (this is intended to be
> > > + * used by unaddressable device memory). It will reserve a physical range big
> > > + * enough and allocate struct page for it.
> > 
> > I've found that the implementation of this is quite non-portable, in that
> > starting from iomem_resource.end+1-size (which is effectively -size) on
> > my platform (powerpc) does not give expected results. It could be that
> > additional changes are needed to arch_add_memory() to support this
> > use case.
> 
> The CDM version does not use that part. That being said, isn't -size a valid
> value, since we only care about unsigned arithmetic here? What is the end value
> on powerpc? In any case this sounds more like an unsigned/signed arithmetic
> issue; I will look into it.
> 
> > 
> > > +
> > > +	size = ALIGN(size, SECTION_SIZE);
> > > +	addr = (iomem_resource.end + 1ULL) - size;
> > 
> > 
> > Why don't we allocate_resource() with the right constraints and get a new
> > unused region?
> 
> The issue with allocate_resource() is that it scans the resource tree from
> lower addresses to higher ones. I was told that conflicts with later memory
> hotplug are less likely if I pick the highest physical address for the
> device memory, hence why I do my own scan from the end toward the start.
> 
> Again, none of this applies to PPC; it can be hidden behind an x86 config
> option if you prefer.
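
For context, the scan in the patch looks roughly like this (a simplified
sketch, not the exact patch code; error handling is omitted and "device" and
"size" come from the caller):

	/* Simplified sketch of the top-down scan over the iomem space. */
	struct resource *res = NULL;
	resource_size_t addr;

	size = ALIGN(size, SECTION_SIZE);
	addr = (iomem_resource.end + 1ULL) - size;
	for (; addr > size && addr >= iomem_resource.start; addr -= size) {
		/* Skip any range that overlaps an existing resource. */
		if (region_intersects(addr, size, 0, IORES_DESC_NONE) !=
		    REGION_DISJOINT)
			continue;
		/* Claim the first free range so later hotplug cannot collide. */
		res = devm_request_mem_region(device, addr, size,
					      dev_name(device));
		if (res)
			break;
	}

It scans from the top of the physical address space downward, section by
section, which is exactly what allocate_resource() would not do.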

OK, so I have looked into it and there is no arithmetic bug in my code; the
issue is simpler than that. It seems only x86 clamps iomem_resource.end to
MAX_PHYSMEM_BITS, so using allocate_resource() would just hide the issue.
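
To spell out what Balbir is hitting (assuming powerpc leaves iomem_resource.end
at its default of all-ones), the scan start degenerates to:

	size = ALIGN(size, SECTION_SIZE);
	addr = (iomem_resource.end + 1ULL) - size;	/* (~0ULL + 1) - size == -size */

-size is a perfectly valid unsigned value, but as a physical address it is far
above anything the architecture can actually map, so the subsequent hotplug
has no chance of succeeding.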

It is fine not to clamp if you know that you won't get a resource with a funky
physical address, but in the UNADDRESSABLE case I do not get any physical
address, so I have to pick one, and I want to pick one that is unlikely to
cause trouble later on when someone hotplugs memory.

If we care about the UNADDRESSABLE case on powerpc, I see two ways to fix this:
clamp iomem_resource.end to MAX_PHYSMEM_BITS, or restrict my scan in HMM to
MIN(iomem_resource.end, 1UL << MAX_PHYSMEM_BITS). The latter is probably safer
and more bullet-proof with respect to other architectures getting interested
in this.
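
Concretely, the second option would be something along these lines (an
untested sketch, just to illustrate the clamping):

	/* Untested sketch of option two: never scan above MAX_PHYSMEM_BITS. */
	resource_size_t end, addr;

	end = min_t(resource_size_t, iomem_resource.end,
		    (1ULL << MAX_PHYSMEM_BITS) - 1);
	size = ALIGN(size, SECTION_SIZE);
	addr = (end + 1ULL) - size;
	/* ... then scan downward from addr exactly as before ... */

That keeps the top-down scan but guarantees the chosen address fits within
MAX_PHYSMEM_BITS on every architecture, not just x86.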

Cheers,
Jérôme
