From: Dan Williams <dan.j.williams@intel.com>
To: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: "James Morris" <jmorris@namei.org>,
"Sasha Levin" <sashal@kernel.org>,
"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
"Linux MM" <linux-mm@kvack.org>,
linux-nvdimm <linux-nvdimm@lists.01.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Michal Hocko" <mhocko@suse.com>,
"Dave Hansen" <dave.hansen@linux.intel.com>,
"Keith Busch" <keith.busch@intel.com>,
"Vishal L Verma" <vishal.l.verma@intel.com>,
"Dave Jiang" <dave.jiang@intel.com>,
"Ross Zwisler" <zwisler@kernel.org>,
"Tom Lendacky" <thomas.lendacky@amd.com>,
"Huang, Ying" <ying.huang@intel.com>,
"Fengguang Wu" <fengguang.wu@intel.com>,
"Borislav Petkov" <bp@suse.de>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Yaowei Bai" <baiyaowei@cmss.chinamobile.com>,
"Takashi Iwai" <tiwai@suse.de>,
"Jérôme Glisse" <jglisse@redhat.com>
Subject: Re: [v1 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM
Date: Sat, 20 Apr 2019 09:18:16 -0700
Message-ID: <CAPcyv4j9sG6Wy3EfTuPb0uZ2qp=gr9UgUhpnXQA_g6Ko9KFmLA@mail.gmail.com>
In-Reply-To: <20190420153148.21548-3-pasha.tatashin@soleen.com>
On Sat, Apr 20, 2019 at 8:36 AM Pavel Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> It is now possible to use persistent memory like regular RAM, but
> currently there is no way to remove this memory until the machine is
> rebooted.
>
> This work expands the functionality to also allow hot removing
> previously hotplugged persistent memory, and to recover the device for
> other uses.
>
> To hotremove persistent memory, the management software must unbind it
> from the device-dax/kmem driver:
>
> echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
>
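The unbind step above can also be driven from C by management software that does not shell out. This is a minimal sketch, assuming the sysfs path from the commit message; dax_kmem_unbind() is a hypothetical helper name, not an existing library API, and error reporting is reduced to perror().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: the C equivalent of
 *   echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
 * Writes dev_name to the sysfs "unbind" attribute at unbind_path.
 * Returns 0 on success, -1 on failure (the cause is reported via perror()).
 */
static int dax_kmem_unbind(const char *unbind_path, const char *dev_name)
{
	assert(unbind_path != NULL && dev_name != NULL);

	FILE *f = fopen(unbind_path, "w");
	size_t len = strlen(dev_name);

	if (!f) {
		perror(unbind_path);
		return -1;
	}
	if (fwrite(dev_name, 1, len, f) != len) {
		perror("fwrite");
		fclose(f);
		return -1;
	}
	/*
	 * stdio buffers the write, so an error returned by the kernel's
	 * store handler may only show up when fclose() flushes the
	 * buffer; check it rather than ignoring it.
	 */
	if (fclose(f) != 0) {
		perror("fclose");
		return -1;
	}
	return 0;
}
```

Call it as dax_kmem_unbind("/sys/bus/dax/drivers/kmem/unbind", "dax0.0") with sufficient privileges. A 0 return only means the unbind request was accepted; as the review of this patch points out, it does not by itself guarantee that the memory was offlined.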
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
> drivers/dax/dax-private.h | 2 +
> drivers/dax/kmem.c | 77 +++++++++++++++++++++++++++++++++++++--
> 2 files changed, 75 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index a45612148ca0..999aaf3a29b3 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
> @@ -53,6 +53,7 @@ struct dax_region {
> * @pgmap - pgmap for memmap setup / lifetime (driver owned)
> * @ref: pgmap reference count (driver owned)
> * @cmp: @ref final put completion (driver owned)
> + * @dax_kmem_res: physical address range of hotadded DAX memory
> */
> struct dev_dax {
> struct dax_region *region;
> @@ -62,6 +63,7 @@ struct dev_dax {
> struct dev_pagemap pgmap;
> struct percpu_ref ref;
> struct completion cmp;
> + struct resource *dax_kmem_res;
> };
>
> static inline struct dev_dax *to_dev_dax(struct device *dev)
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 4c0131857133..026c34f93df5 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -71,21 +71,90 @@ int dev_dax_kmem_probe(struct device *dev)
> kfree(new_res);
> return rc;
> }
> + dev_dax->dax_kmem_res = new_res;
>
> return 0;
> }
>
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +/*
> + * Offline device-dax's memory_blocks. If a memory_block cannot be offlined
> + * a warning is printed and an error is returned. dax hotremove can succeed
> + * only when every memory_block is offline.
> + */
> +static int
> +offline_memblock_cb(struct memory_block *mem, void *arg)
> +{
> + struct device *dev = (struct device *)arg;
> + int rc = device_offline(&mem->dev);
> +
> + if (rc < 0) {
> + unsigned long spfn = section_nr_to_pfn(mem->start_section_nr);
> + unsigned long epfn = section_nr_to_pfn(mem->end_section_nr);
> + phys_addr_t spa = spfn << PAGE_SHIFT;
> + phys_addr_t epa = epfn << PAGE_SHIFT;
> +
> + dev_warn(dev, "could not offline memory block [%pa-%pa]\n",
> + &spa, &epa);
> +
> + return rc;
> + }
> +
> + return 0;
> +}
> +
> +static int dev_dax_kmem_remove(struct device *dev)
> +{
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> + struct resource *res = dev_dax->dax_kmem_res;
> + resource_size_t kmem_start;
> + resource_size_t kmem_size;
> + unsigned long start_pfn;
> + unsigned long end_pfn;
> + int rc;
> +
> + /*
> + * dax kmem resource does not exist, means memory was never hotplugged.
> + * So, nothing to do here.
> + */
> + if (!res)
> + return 0;
> +
> + kmem_start = res->start;
> + kmem_size = resource_size(res);
> + start_pfn = kmem_start >> PAGE_SHIFT;
> + end_pfn = start_pfn + (kmem_size >> PAGE_SHIFT) - 1;
> +
> + /* Walk and offline every single memory_block of the dax region. */
> + lock_device_hotplug();
> + rc = walk_memory_range(start_pfn, end_pfn, dev, offline_memblock_cb);
> + unlock_device_hotplug();
> + if (rc)
> + return rc;
This potential early return is the reason memory hotremove is not
reliable with respect to the driver core. If this walk fails to offline
the memory, the memory stays online, yet the driver core has no notion
of a device unbind failing. The unbind will proceed while the memory
stays pinned.