linux-nvdimm.lists.01.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>, akpm@linux-foundation.org
Cc: Ard Biesheuvel <ardb@kernel.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Borislav Petkov <bp@alien8.de>, David Airlie <airlied@linux.ie>,
	Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Joao Martins <joao.m.martins@oracle.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ben Skeggs <bskeggs@redhat.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Jason Gunthorpe <jgg@mellanox.com>, Jia He <justin.he@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Paul Mackerras <paulus@ozlabs.org>,
	Brice Goglin <Brice.Goglin@inria.fr>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Daniel Vetter <daniel@ffwll.ch>,
	Andy Lutomirski <luto@kernel.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	linux-mm@kvack.org, linux-nvdimm@lists.01.org,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v4 00/23] device-dax: Support sub-dividing soft-reserved ranges
Date: Mon, 3 Aug 2020 09:47:39 +0200
Message-ID: <c59111f9-7c94-8b9e-2b8c-4cb96b9aa848@redhat.com>
In-Reply-To: <159643094279.4062302.17779410714418721328.stgit@dwillia2-desk3.amr.corp.intel.com>

[...]

> Well, no v5.8-rc8 to line this up for v5.9, so next best is early
> integration into -mm before other collisions develop.
> 
> Chatted with Justin offline and it currently appears that the missing
> numa information is the fault of the platform firmware to populate all
> the necessary NUMA data in the NFIT.

I'm planning on looking at some bits of this series this week, but some
questions upfront ...

> 
> ---
> Cover:
> 
> The device-dax facility allows an address range to be directly mapped
> through a chardev, or optionally hotplugged to the core kernel page
> allocator as System-RAM. It is the mechanism for converting persistent
> memory (pmem) to be used as another volatile memory pool i.e. the
> current Memory Tiering hot topic on linux-mm.
> 
> In the case of pmem the nvdimm-namespace-label mechanism can sub-divide
> it, but that labeling mechanism is not available / applicable to
> soft-reserved ("EFI specific purpose") memory [3]. This series provides
> a sysfs-mechanism for the daxctl utility to enable provisioning of
> volatile-soft-reserved memory ranges.
> 
> The motivations for this facility are:
> 
> 1/ Allow performance differentiated memory ranges to be split between
>    kernel-managed and directly-accessed use cases.
> 
> 2/ Allow physical memory to be provisioned along performance relevant
>    address boundaries. For example, divide a memory-side cache [4] along
>    cache-color boundaries.
> 
> 3/ Parcel out soft-reserved memory to VMs using device-dax as a security
>    / permissions boundary [5]. Specifically I have seen people (ab)using
>    memmap=nn!ss (mark System-RAM as Persistent Memory) just to get the
>    device-dax interface on custom address ranges. A follow-on for the VM
>    use case is to teach device-dax to dynamically allocate 'struct page' at
>    runtime to reduce the duplication of 'struct page' space in both the
>    guest and the host kernel for the same physical pages.


I think I am missing some important pieces. Bear with me.

1. On x86-64, e820 indicates "soft-reserved" memory. This memory is not
automatically used in the buddy during boot, but remains untouched
(similar to pmem). But as it involves ACPI as well, it could also be
used on arm64 (without e820), correct?

2. Soft-reserved memory is volatile RAM with differing performance
characteristics ("performance differentiated memory"). What would be
examples of such memory? Like, memory that is faster than RAM (scratch
pad), or slower (pmem)? Or both? :) Is it a valid use case to use pmem
in a hypervisor to back this memory?

3. There seem to be use cases where "soft-reserved" memory is used via
DAX. What is an example use case? I assume it's *not* to treat it like
PMEM but instead e.g., use it as a fast buffer inside applications or
similar.

4. There seem to be use cases where some part of "soft-reserved" memory
is used via DAX, some other is given to the buddy. What is an example
use case? Is this really necessary or only some theoretical use case?

5. The "provisioned along performance relevant address boundaries." part
is unclear to me. Can you give an example of what this would look like
from user space? Like, split that memory in blocks of size X with
alignment Y and give them to separate applications?

6. If you add such memory to the buddy, is there any way the system can
differentiate it from other memory? E.g., via fake/other NUMA nodes?
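
To make question 5 concrete, here is a minimal sketch of what I mean by
"blocks of size X with alignment Y". It is plain address arithmetic, not
the interface this series adds; split_range() and its parameters are
hypothetical names used only for illustration.

```python
def split_range(start, size, block_size, align):
    """Split [start, start + size) into blocks of block_size bytes whose
    start addresses are multiples of align (hypothetical helper)."""
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a power of two")
    if block_size <= 0 or block_size % align:
        raise ValueError("block_size must be a positive multiple of align")

    end = start + size
    # Round the first block up to the next aligned address.
    addr = (start + align - 1) & ~(align - 1)

    blocks = []
    while addr + block_size <= end:
        # Store inclusive end addresses, as /proc/iomem does.
        blocks.append((addr, addr + block_size - 1))
        addr += block_size
    return blocks


if __name__ == "__main__":
    GiB = 1 << 30
    # Assumed example: a 4 GiB range at 4 GiB, split into 1 GiB blocks.
    for lo, hi in split_range(4 * GiB, 4 * GiB, GiB, GiB):
        print(f"{lo:#x}-{hi:#x}")
```

Any unaligned head or tail of the range is simply left unassigned in
this sketch; whether that matches the intended semantics is part of the
question.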


Also, can you give examples of how kmem-added memory is represented in
/proc/iomem for a) pmem and b) soft-reserved memory after this series
(skimming over the patches, I think there is a change for pmem, right?)?
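
For comparing the two cases, a small sketch that turns /proc/iomem style
text into (name, ancestors) pairs might help. It assumes the usual
layout of "start-end : name" with two spaces of indentation per nesting
level. The SAMPLE text below is made up purely to exercise the parser;
it is not a claim about what this series produces.

```python
import re

# One /proc/iomem line: optional indentation, hex range, " : ", name.
LINE_RE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def parse_iomem(text):
    """Return a list of (depth, start, end, name) tuples."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        indent, start, end, name = m.groups()
        # Assumption: two spaces of indentation per nesting level.
        entries.append((len(indent) // 2, int(start, 16), int(end, 16),
                        name.strip()))
    return entries


def ancestry(entries):
    """Pair each resource name with the tuple of its parent names."""
    stack, out = [], []
    for depth, _start, _end, name in entries:
        del stack[depth:]          # drop siblings and deeper levels
        out.append((name, tuple(stack)))
        stack.append(name)
    return out


# Hypothetical input, NOT real output from any kernel.
SAMPLE = """\
100000000-13fffffff : System RAM
140000000-1bfffffff : Soft Reserved
  140000000-1bfffffff : dax0.0
    140000000-1bfffffff : System RAM
"""

if __name__ == "__main__":
    for name, parents in ancestry(parse_iomem(SAMPLE)):
        print(" / ".join(parents + (name,)))
```

On a real system one would read /proc/iomem as root, since unprivileged
reads show zeroed addresses.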

I am really wondering if it's the right approach to squeeze this into
our pmem/nvdimm infrastructure just because it's easy to do. E.g., the
ndctl man page ("ndctl - Manage "libnvdimm" subsystem devices
(Non-volatile Memory)") speaks explicitly about non-volatile memory.


> 
> [2]: http://lore.kernel.org/r/20200713160837.13774-11-joao.m.martins@oracle.com
> [3]: http://lore.kernel.org/r/157309097008.1579826.12818463304589384434.stgit@dwillia2-desk3.amr.corp.intel.com
> [4]: http://lore.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
> [5]: http://lore.kernel.org/r/20200110190313.17144-1-joao.m.martins@oracle.com
> 
> ---
> 
> Dan Williams (19):
>       x86/numa: Cleanup configuration dependent command-line options
>       x86/numa: Add 'nohmat' option
>       efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance
>       ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device
>       resource: Report parent to walk_iomem_res_desc() callback
>       mm/memory_hotplug: Introduce default phys_to_target_node() implementation
>       ACPI: HMAT: Attach a device for each soft-reserved range
>       device-dax: Drop the dax_region.pfn_flags attribute
>       device-dax: Move instance creation parameters to 'struct dev_dax_data'
>       device-dax: Make pgmap optional for instance creation
>       device-dax: Kill dax_kmem_res
>       device-dax: Add an allocation interface for device-dax instances
>       device-dax: Introduce 'seed' devices
>       drivers/base: Make device_find_child_by_name() compatible with sysfs inputs
>       device-dax: Add resize support
>       mm/memremap_pages: Convert to 'struct range'
>       mm/memremap_pages: Support multiple ranges per invocation
>       device-dax: Add dis-contiguous resource support
>       device-dax: Introduce 'mapping' devices
> 
> Joao Martins (4):
>       device-dax: Make align a per-device property
>       device-dax: Add an 'align' attribute
>       dax/hmem: Introduce dax_hmem.region_idle parameter
>       device-dax: Add a range mapping allocation attribute
> 
> 
>  Documentation/x86/x86_64/boot-options.rst |    4 
>  arch/powerpc/kvm/book3s_hv_uvmem.c        |   14 
>  arch/x86/include/asm/numa.h               |    8 
>  arch/x86/kernel/e820.c                    |   16 
>  arch/x86/mm/numa.c                        |   11 
>  arch/x86/mm/numa_emulation.c              |    3 
>  arch/x86/xen/enlighten_pv.c               |    2 
>  drivers/acpi/numa/hmat.c                  |   76 --
>  drivers/acpi/numa/srat.c                  |    9 
>  drivers/base/core.c                       |    2 
>  drivers/dax/Kconfig                       |    4 
>  drivers/dax/Makefile                      |    3 
>  drivers/dax/bus.c                         | 1046 +++++++++++++++++++++++++++--
>  drivers/dax/bus.h                         |   28 -
>  drivers/dax/dax-private.h                 |   60 +-
>  drivers/dax/device.c                      |  134 ++--
>  drivers/dax/hmem.c                        |   56 --
>  drivers/dax/hmem/Makefile                 |    6 
>  drivers/dax/hmem/device.c                 |  100 +++
>  drivers/dax/hmem/hmem.c                   |   65 ++
>  drivers/dax/kmem.c                        |  199 +++---
>  drivers/dax/pmem/compat.c                 |    2 
>  drivers/dax/pmem/core.c                   |   22 -
>  drivers/firmware/efi/x86_fake_mem.c       |   12 
>  drivers/gpu/drm/nouveau/nouveau_dmem.c    |   15 
>  drivers/nvdimm/badrange.c                 |   26 -
>  drivers/nvdimm/claim.c                    |   13 
>  drivers/nvdimm/nd.h                       |    3 
>  drivers/nvdimm/pfn_devs.c                 |   13 
>  drivers/nvdimm/pmem.c                     |   27 -
>  drivers/nvdimm/region.c                   |   21 -
>  drivers/pci/p2pdma.c                      |   12 
>  include/acpi/acpi_numa.h                  |   14 
>  include/linux/dax.h                       |    8 
>  include/linux/memory_hotplug.h            |    5 
>  include/linux/memremap.h                  |   11 
>  include/linux/numa.h                      |   11 
>  include/linux/range.h                     |    6 
>  kernel/resource.c                         |   11 
>  lib/test_hmm.c                            |   15 
>  mm/memory_hotplug.c                       |   10 
>  mm/memremap.c                             |  299 +++++---
>  tools/testing/nvdimm/dax-dev.c            |   22 -
>  tools/testing/nvdimm/test/iomap.c         |    2 
>  44 files changed, 1825 insertions(+), 601 deletions(-)
>  delete mode 100644 drivers/dax/hmem.c
>  create mode 100644 drivers/dax/hmem/Makefile
>  create mode 100644 drivers/dax/hmem/device.c
>  create mode 100644 drivers/dax/hmem/hmem.c
> 
> base-commit: 01830e6c042e8eb6eb202e05d7df8057135b4c26
> 


-- 
Thanks,

David / dhildenb