linux-mm.kvack.org archive mirror
From: Dan Williams <dan.j.williams@intel.com>
To: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: "James Morris" <jmorris@namei.org>,
	"Sasha Levin" <sashal@kernel.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Linux MM" <linux-mm@kvack.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Michal Hocko" <mhocko@suse.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Keith Busch" <keith.busch@intel.com>,
	"Vishal L Verma" <vishal.l.verma@intel.com>,
	"Dave Jiang" <dave.jiang@intel.com>,
	"Ross Zwisler" <zwisler@kernel.org>,
	"Tom Lendacky" <thomas.lendacky@amd.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	"Fengguang Wu" <fengguang.wu@intel.com>,
	"Borislav Petkov" <bp@suse.de>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Yaowei Bai" <baiyaowei@cmss.chinamobile.com>,
	"Takashi Iwai" <tiwai@suse.de>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"David Hildenbrand" <david@redhat.com>
Subject: Re: [v4 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM
Date: Thu, 2 May 2019 08:54:25 -0700
Message-ID: <CAPcyv4iPzpP-gzuDtPB2ixd6_uTuO8-YdVSfGw_Dq=igaKuOEg@mail.gmail.com>
In-Reply-To: <20190501191846.12634-3-pasha.tatashin@soleen.com>

On Wed, May 1, 2019 at 12:19 PM Pavel Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> It is now allowed to use persistent memory like regular RAM, but
> currently there is no way to remove this memory until the machine is
> rebooted.
>
> This work expands the functionality to also allow hotremoving
> previously hotplugged persistent memory, and to recover the device for
> other purposes.
>
> To hotremove persistent memory, the management software must first
> offline all memory blocks of the dax region, and then unbind it from
> device-dax/kmem driver. So, operations should look like this:
>
> echo offline > /sys/devices/system/memory/memoryN/state
> ...
> echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
>
> Note: if unbind is done without offlining memory beforehand, it won't be
> possible to do dax0.0 hotremove, and dax's memory is going to be part of
> System RAM until reboot.
>
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
>  drivers/dax/dax-private.h |  2 +
>  drivers/dax/kmem.c        | 99 +++++++++++++++++++++++++++++++++++++--
>  2 files changed, 97 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index a45612148ca0..999aaf3a29b3 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
> @@ -53,6 +53,7 @@ struct dax_region {
>   * @pgmap - pgmap for memmap setup / lifetime (driver owned)
>   * @ref: pgmap reference count (driver owned)
>   * @cmp: @ref final put completion (driver owned)
> + * @dax_kmem_res: physical address range of hotadded DAX memory
>   */
>  struct dev_dax {
>         struct dax_region *region;
> @@ -62,6 +63,7 @@ struct dev_dax {
>         struct dev_pagemap pgmap;
>         struct percpu_ref ref;
>         struct completion cmp;
> +       struct resource *dax_kmem_res;
>  };
>
>  static inline struct dev_dax *to_dev_dax(struct device *dev)
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 4c0131857133..72b868066026 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -71,21 +71,112 @@ int dev_dax_kmem_probe(struct device *dev)
>                 kfree(new_res);
>                 return rc;
>         }
> +       dev_dax->dax_kmem_res = new_res;
>
>         return 0;
>  }
>
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +static int
> +check_devdax_mem_offlined_cb(struct memory_block *mem, void *arg)
> +{
> +       /* Memory block device */
> +       struct device *mem_dev = &mem->dev;
> +       bool is_offline;
> +
> +       device_lock(mem_dev);
> +       is_offline = mem_dev->offline;
> +       device_unlock(mem_dev);
> +
> +       /*
> +        * Check that device-dax's memory_blocks are offline. If a memory_block
> +        * is not offline a warning is printed and an error is returned.
> +        */
> +       if (!is_offline) {
> +               /* Dax device */
> +               struct device *dev = (struct device *)arg;
> +               struct dev_dax *dev_dax = to_dev_dax(dev);
> +               struct resource *res = &dev_dax->region->res;
> +               unsigned long spfn = section_nr_to_pfn(mem->start_section_nr);
> +               unsigned long epfn = section_nr_to_pfn(mem->end_section_nr) +
> +                                                      PAGES_PER_SECTION - 1;
> +               phys_addr_t spa = spfn << PAGE_SHIFT;
> +               phys_addr_t epa = epfn << PAGE_SHIFT;
> +
> +               dev_err(dev,
> +                       "DAX region %pR cannot be hotremoved until the next reboot. Memory block [%pa-%pa] is not offline.\n",
> +                       res, &spa, &epa);
> +
> +               return -EBUSY;
> +       }
> +
> +       return 0;
> +}
> +
> +static int dev_dax_kmem_remove(struct device *dev)
> +{
> +       struct dev_dax *dev_dax = to_dev_dax(dev);
> +       struct resource *res = dev_dax->dax_kmem_res;
> +       resource_size_t kmem_start;
> +       resource_size_t kmem_size;
> +       unsigned long start_pfn;
> +       unsigned long end_pfn;
> +       int rc;
> +
> +       kmem_start = res->start;
> +       kmem_size = resource_size(res);
> +       start_pfn = kmem_start >> PAGE_SHIFT;
> +       end_pfn = start_pfn + (kmem_size >> PAGE_SHIFT) - 1;
> +
> +       /*
> +        * Keep hotplug lock while checking memory state, and also required
> +        * during __remove_memory() call. Admin can't change memory state via
> +        * sysfs while this lock is kept.
> +        */
> +       lock_device_hotplug();
> +
> +       /*
> +        * Walk and check that every single memory_block of dax region is
> +        * offline. Hotremove can succeed only when every memory_block is
> +        * offlined beforehand.
> +        */
> +       rc = walk_memory_range(start_pfn, end_pfn, dev,
> +                              check_devdax_mem_offlined_cb);
> +
> +       /*
> +        * If admin has not offlined memory beforehand, we cannot hotremove dax.
> +        * Unfortunately, because unbind will still succeed there is no way for
> +        * user to hotremove dax after this.
> +        */
> +       if (rc) {
> +               unlock_device_hotplug();
> +               return rc;
> +       }
> +
> +       /* Hotremove memory, cannot fail because memory is already offlined */
> +       __remove_memory(dev_dax->target_node, kmem_start, kmem_size);
> +       unlock_device_hotplug();

Currently the kmem driver can be built as a module, and I don't see a
need to drop that flexibility. What about wrapping these core
routines:

    lock_device_hotplug
    walk_memory_range
    __remove_memory
    unlock_device_hotplug

...into a common exported (GPL) helper like:

    int try_remove_memory(int nid, struct resource *res)

Because as far as I can see there's nothing device-dax specific about
this "try remove iff offline" functionality outside of looking up the
related 'struct resource'. The check_devdax_mem_offlined_cb callback
can be made generic if the callback argument is the resource pointer.
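
To make the suggested shape concrete, here is a minimal userspace sketch of
the "try remove iff offline" flow. It is not kernel code: every model_* name
is a hypothetical stand-in for the corresponding core routine, the block
table is invented for illustration, and only the try_remove_memory()
prototype comes from the text above. The callback name
check_memblock_offlined_cb is likewise an assumption for the generic variant
that takes the resource pointer as its argument.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for kernel objects; every model_* name is hypothetical. */
struct resource {
	uint64_t start;
	uint64_t end;		/* inclusive, as in the kernel's struct resource */
};

struct model_memory_block {
	uint64_t start;
	uint64_t end;		/* inclusive */
	bool present;		/* still part of "System RAM" in this model */
	bool offline;
};

#define MODEL_NR_BLOCKS 4
static struct model_memory_block model_blocks[MODEL_NR_BLOCKS];
static int model_hotplug_lock_depth;	/* models the device hotplug lock */

static void model_lock_device_hotplug(void)   { model_hotplug_lock_depth++; }
static void model_unlock_device_hotplug(void) { model_hotplug_lock_depth--; }

static bool model_block_in_res(const struct model_memory_block *mem,
			       const struct resource *res)
{
	return mem->present && mem->start <= res->end && mem->end >= res->start;
}

/* Generic callback: the argument is the resource, not a device-dax device. */
static int check_memblock_offlined_cb(struct model_memory_block *mem, void *arg)
{
	(void)arg;	/* a kernel version would use it for the error message */
	return mem->offline ? 0 : -EBUSY;
}

/* Call cb on each present block overlapping res; stop at the first error. */
static int model_walk_memory_range(struct resource *res,
				   int (*cb)(struct model_memory_block *, void *))
{
	for (size_t i = 0; i < MODEL_NR_BLOCKS; i++) {
		if (!model_block_in_res(&model_blocks[i], res))
			continue;
		int rc = cb(&model_blocks[i], res);
		if (rc)
			return rc;
	}
	return 0;
}

/* Drop every block overlapping res from the model's "System RAM". */
static void model_remove_memory(int nid, const struct resource *res)
{
	(void)nid;
	for (size_t i = 0; i < MODEL_NR_BLOCKS; i++)
		if (model_block_in_res(&model_blocks[i], res))
			model_blocks[i].present = false;
}

/* Remove the range only if every block in it is already offline. */
int try_remove_memory(int nid, struct resource *res)
{
	int rc;

	model_lock_device_hotplug();
	rc = model_walk_memory_range(res, check_memblock_offlined_cb);
	if (!rc)
		model_remove_memory(nid, res);
	model_unlock_device_hotplug();

	return rc;
}
```

In this sketch the lock is held across both the offline check and the
removal, so the state cannot change between the two steps, and the helper
knows nothing about device-dax beyond the resource it is handed.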


Thread overview: 9+ messages
2019-05-01 19:18 [v4 0/2] "Hotremove" persistent memory Pavel Tatashin
2019-05-01 19:18 ` [v4 1/2] device-dax: fix memory and resource leak if hotplug fails Pavel Tatashin
2019-05-01 19:18 ` [v4 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM Pavel Tatashin
2019-05-02 14:14   ` David Hildenbrand
2019-05-02 14:16     ` Pavel Tatashin
2019-05-02 15:54   ` Dan Williams [this message]
2019-05-02 16:47     ` Pavel Tatashin
2019-05-02 17:34   ` Sasha Levin
2019-05-02 17:44     ` Pavel Tatashin
