Linux-mm Archive on lore.kernel.org
From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: David Hildenbrand <david@redhat.com>,
	Ira Weiny <ira.weiny@intel.com>, Ard Biesheuvel <ardb@kernel.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Borislav Petkov <bp@alien8.de>,
	Vishal Verma <vishal.l.verma@intel.com>,
	David Airlie <airlied@linux.ie>, Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Joao Martins <joao.m.martins@oracle.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Dave Jiang <dave.jiang@intel.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Wei Yang <richardw.yang@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ben Skeggs <bskeggs@redhat.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Jason Gunthorpe <jgg@mellanox.com>, Jia He <justin.he@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Paul Mackerras <paulus@ozlabs.org>,
	Brice Goglin <Brice.Goglin@inria.fr>,
	Jeff Moyer <jmoyer@redhat.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Daniel Vetter <daniel@ffwll.ch>,
	Andy Lutomirski <luto@kernel.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	vishal.l.verma@intel.com, linux-mm@kvack.org,
	linux-nvdimm@lists.01.org, joao.m.martins@oracle.com,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	dri-devel@lists.freedesktop.org
Subject: [PATCH v3 00/23] device-dax: Support sub-dividing soft-reserved ranges
Date: Fri, 31 Jul 2020 20:24:58 -0700
Message-ID: <159625229779.3040297.11363509688097221416.stgit@dwillia2-desk3.amr.corp.intel.com>

Changes since v2 [1]:
- Rebase on next/master to resolve conflicts with pending mem-hotplug
  and memremap_pages() changes in -mm

- Drop attempt at a generic phys_to_target_node() implementation and
  just follow the default fallback approach taken with
  memory_add_physaddr_to_nid() (Mike)

- Fix test_hmm and other compilation fixups (Ralph)

- Integrate Joao's extensions to the device-dax sub-division interface
  (per-device align, user-directed extent allocation). (Joao)

[1]: http://lore.kernel.org/r/159457116473.754248.7879464730875147365.stgit@dwillia2-desk3.amr.corp.intel.com

---
Merge notes:

Andrew, this series is rebased on today's next/master to resolve
conflicts with some pending patches in -mm. I'd like to take it through
your tree given the intersections with memremap_pages() and memory
hotplug. If at all possible I'd like to see it in v5.10, but I realize
time is short. Beyond the Intel-identified use cases for this, Joao has
identified a use case for Oracle as well.

I would have sent this earlier save for the fact I am mostly offline
tending to a newborn these days. Vishal has stepped up to take on care
and feeding of this patchset if additional review / integration fixups
are needed.

The one piece of test feedback this still needs is from Justin
(justin.he@arm.com): whether this lights up dax_kmem, and now dax_hmem,
for him on arm64. Otherwise, Joao has written unit tests for this in his
enabling of the daxctl userspace utility [2].

---
Cover:

The device-dax facility allows an address range to be directly mapped
through a chardev, or optionally hotplugged to the core kernel page
allocator as System-RAM. It is the mechanism for converting persistent
memory (pmem) into another volatile memory pool, i.e. the current
Memory Tiering hot topic on linux-mm.

In the case of pmem, the nvdimm-namespace-label mechanism can sub-divide
it, but that labeling mechanism is not available / applicable to
soft-reserved ("EFI specific purpose") memory [3]. This series provides
a sysfs mechanism for the daxctl utility to enable provisioning of
volatile soft-reserved memory ranges.
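
As a rough illustration of what "sub-dividing" a range means, the sketch
below models one soft-reserved range being carved into aligned instances.
It is a toy userspace model, not the kernel implementation: the Region
class, its method names, the 2MiB default alignment, the first-fit policy,
and the "dax0.N" instance names are all assumptions made for the example.
It also only hands out contiguous extents, whereas the series adds
dis-contiguous resource support.

```python
from dataclasses import dataclass, field

GIB = 1 << 30
ALIGN_2M = 2 << 20  # assumed alignment for this sketch


@dataclass
class Region:
    """Toy model of one soft-reserved range (hypothetical, not dax_region)."""
    start: int
    size: int
    align: int = ALIGN_2M
    devices: dict = field(default_factory=dict)  # name -> (start, size)

    def available_size(self) -> int:
        # Bytes of the range not yet assigned to any instance.
        return self.size - sum(sz for _, sz in self.devices.values())

    def create(self, name: str, size: int) -> tuple:
        """Carve out `size` bytes, first-fit from the low end of the range."""
        if size <= 0 or size % self.align:
            raise ValueError("size must be a positive multiple of align")
        cursor = self.start
        for dev_start, dev_size in sorted(self.devices.values()):
            if dev_start - cursor >= size:
                break  # the gap in front of this instance is big enough
            cursor = dev_start + dev_size
        if cursor + size > self.start + self.size:
            raise MemoryError("no contiguous free extent of that size")
        self.devices[name] = (cursor, size)
        return self.devices[name]

    def delete(self, name: str) -> None:
        # Return the instance's extent to the free pool.
        del self.devices[name]


# Usage: split an 8GiB range that starts at the 4GiB physical boundary.
region = Region(start=0x100000000, size=8 * GIB)
region.create("dax0.1", 2 * GIB)   # occupies [4GiB, 6GiB)
region.create("dax0.2", 4 * GIB)   # occupies [6GiB, 10GiB)
region.delete("dax0.1")            # frees the low 2GiB again
region.create("dax0.3", 1 * GIB)   # reuses the freed space at 4GiB
```

The point of the model is only that instance sizes are validated against
an alignment and that freed extents become available for later
allocations; the real allocation policy is whatever the patches implement.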

The motivations for this facility are:

1/ Allow performance-differentiated memory ranges to be split between
   kernel-managed and directly-accessed use cases.

2/ Allow physical memory to be provisioned along performance-relevant
   address boundaries. For example, divide a memory-side cache [4] along
   cache-color boundaries.

3/ Parcel out soft-reserved memory to VMs using device-dax as a security
   / permissions boundary [5]. Specifically I have seen people (ab)using
   memmap=nn!ss (mark System-RAM as Persistent Memory) just to get the
   device-dax interface on custom address ranges. A follow-on for the VM
   use case is to teach device-dax to dynamically allocate 'struct page' at
   runtime to reduce the duplication of 'struct page' space in both the
   guest and the host kernel for the same physical pages.
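
To put a number on the 'struct page' duplication mentioned in 3/, here is
a back-of-the-envelope calculation. It assumes a 64-byte 'struct page' and
4KiB base pages, which is a common x86-64 configuration but not something
this series defines; both constants are assumptions of the example.

```python
PAGE_SIZE = 4096        # assumed base page size in bytes
STRUCT_PAGE_SIZE = 64   # assumed sizeof(struct page); depends on config

GIB = 1 << 30
TIB = 1 << 40


def struct_page_overhead(range_bytes: int) -> int:
    """Bytes of 'struct page' metadata needed to describe range_bytes."""
    return (range_bytes // PAGE_SIZE) * STRUCT_PAGE_SIZE


one_kernel = struct_page_overhead(TIB)  # metadata in a single kernel
host_and_guest = 2 * one_kernel         # same pages described twice

print(one_kernel // GIB, host_and_guest // GIB)  # prints "16 32"
```

Under those assumptions, each kernel spends about 1.6% of the range on
metadata, so describing the same 1TiB in both host and guest costs 32GiB
rather than 16GiB.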

[2]: http://lore.kernel.org/r/20200713160837.13774-11-joao.m.martins@oracle.com
[3]: http://lore.kernel.org/r/157309097008.1579826.12818463304589384434.stgit@dwillia2-desk3.amr.corp.intel.com
[4]: http://lore.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
[5]: http://lore.kernel.org/r/20200110190313.17144-1-joao.m.martins@oracle.com

---

Dan Williams (19):
      x86/numa: Cleanup configuration dependent command-line options
      x86/numa: Add 'nohmat' option
      efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance
      ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device
      resource: Report parent to walk_iomem_res_desc() callback
      mm/memory_hotplug: Introduce default phys_to_target_node() implementation
      ACPI: HMAT: Attach a device for each soft-reserved range
      device-dax: Drop the dax_region.pfn_flags attribute
      device-dax: Move instance creation parameters to 'struct dev_dax_data'
      device-dax: Make pgmap optional for instance creation
      device-dax: Kill dax_kmem_res
      device-dax: Add an allocation interface for device-dax instances
      device-dax: Introduce 'seed' devices
      drivers/base: Make device_find_child_by_name() compatible with sysfs inputs
      device-dax: Add resize support
      mm/memremap_pages: Convert to 'struct range'
      mm/memremap_pages: Support multiple ranges per invocation
      device-dax: Add dis-contiguous resource support
      device-dax: Introduce 'mapping' devices

Joao Martins (4):
      device-dax: Make align a per-device property
      device-dax: Add an 'align' attribute
      dax/hmem: Introduce dax_hmem.region_idle parameter
      device-dax: Add a range mapping allocation attribute


 arch/powerpc/kvm/book3s_hv_uvmem.c     |   14 
 arch/x86/include/asm/numa.h            |    8 
 arch/x86/kernel/e820.c                 |   16 
 arch/x86/mm/numa.c                     |   11 
 arch/x86/mm/numa_emulation.c           |    3 
 arch/x86/xen/enlighten_pv.c            |    2 
 drivers/acpi/numa/hmat.c               |   76 --
 drivers/acpi/numa/srat.c               |    9 
 drivers/base/core.c                    |    2 
 drivers/dax/Kconfig                    |    4 
 drivers/dax/Makefile                   |    3 
 drivers/dax/bus.c                      | 1055 ++++++++++++++++++++++++++++++--
 drivers/dax/bus.h                      |   28 +
 drivers/dax/dax-private.h              |   40 +
 drivers/dax/device.c                   |  132 ++--
 drivers/dax/hmem.c                     |   56 --
 drivers/dax/hmem/Makefile              |    6 
 drivers/dax/hmem/device.c              |  100 +++
 drivers/dax/hmem/hmem.c                |   65 ++
 drivers/dax/kmem.c                     |  199 +++---
 drivers/dax/pmem/compat.c              |    2 
 drivers/dax/pmem/core.c                |   22 -
 drivers/firmware/efi/x86_fake_mem.c    |   12 
 drivers/gpu/drm/nouveau/nouveau_dmem.c |   15 
 drivers/nvdimm/badrange.c              |   26 -
 drivers/nvdimm/claim.c                 |   13 
 drivers/nvdimm/nd.h                    |    3 
 drivers/nvdimm/pfn_devs.c              |   13 
 drivers/nvdimm/pmem.c                  |   27 -
 drivers/nvdimm/region.c                |   21 -
 drivers/pci/p2pdma.c                   |   12 
 include/acpi/acpi_numa.h               |   14 
 include/linux/dax.h                    |    8 
 include/linux/memory_hotplug.h         |    5 
 include/linux/memremap.h               |   11 
 include/linux/range.h                  |    6 
 kernel/resource.c                      |   11 
 lib/test_hmm.c                         |   15 
 mm/memory_hotplug.c                    |   10 
 mm/memremap.c                          |  299 +++++----
 tools/testing/nvdimm/dax-dev.c         |   22 -
 tools/testing/nvdimm/test/iomap.c      |    2 
 42 files changed, 1810 insertions(+), 588 deletions(-)
 delete mode 100644 drivers/dax/hmem.c
 create mode 100644 drivers/dax/hmem/Makefile
 create mode 100644 drivers/dax/hmem/device.c
 create mode 100644 drivers/dax/hmem/hmem.c

base-commit: 01830e6c042e8eb6eb202e05d7df8057135b4c26


Thread overview: 32+ messages
2020-08-01  3:24 Dan Williams [this message]
2020-08-01  3:25 ` [PATCH v3 01/23] x86/numa: Cleanup configuration dependent command-line options Dan Williams
2020-08-01  3:25 ` [PATCH v3 02/23] x86/numa: Add 'nohmat' option Dan Williams
2020-08-01  3:51   ` Randy Dunlap
2020-08-01 16:36     ` Dan Williams
2020-08-01  3:25 ` [PATCH v3 03/23] efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance Dan Williams
2020-08-01  3:25 ` [PATCH v3 04/23] ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device Dan Williams
2020-08-01  3:25 ` [PATCH v3 05/23] resource: Report parent to walk_iomem_res_desc() callback Dan Williams
2020-08-01  3:25 ` [PATCH v3 06/23] mm/memory_hotplug: Introduce default phys_to_target_node() implementation Dan Williams
2020-08-01  6:24   ` kernel test robot
2020-08-01 16:39   ` kernel test robot
2020-08-01  3:25 ` [PATCH v3 07/23] ACPI: HMAT: Attach a device for each soft-reserved range Dan Williams
2020-08-01  3:25 ` [PATCH v3 08/23] device-dax: Drop the dax_region.pfn_flags attribute Dan Williams
2020-08-01  3:25 ` [PATCH v3 09/23] device-dax: Move instance creation parameters to 'struct dev_dax_data' Dan Williams
2020-08-01  3:25 ` [PATCH v3 10/23] device-dax: Make pgmap optional for instance creation Dan Williams
2020-08-01  3:26 ` [PATCH v3 11/23] device-dax: Kill dax_kmem_res Dan Williams
2020-08-01  3:26 ` [PATCH v3 12/23] device-dax: Add an allocation interface for device-dax instances Dan Williams
2020-08-01  3:26 ` [PATCH v3 13/23] device-dax: Introduce 'seed' devices Dan Williams
2020-08-01  3:26 ` [PATCH v3 14/23] drivers/base: Make device_find_child_by_name() compatible with sysfs inputs Dan Williams
2020-08-01  3:26 ` [PATCH v3 15/23] device-dax: Add resize support Dan Williams
2020-08-01  3:26 ` [PATCH v3 16/23] mm/memremap_pages: Convert to 'struct range' Dan Williams
2020-08-01  3:26 ` [PATCH v3 17/23] mm/memremap_pages: Support multiple ranges per invocation Dan Williams
2020-08-01  3:26 ` [PATCH v3 18/23] device-dax: Add dis-contiguous resource support Dan Williams
2020-08-01  3:26 ` [PATCH v3 19/23] device-dax: Introduce 'mapping' devices Dan Williams
2020-08-01  3:26 ` [PATCH v3 20/23] device-dax: Make align a per-device property Dan Williams
2020-08-01  7:23   ` kernel test robot
2020-08-01  3:26 ` [PATCH v3 21/23] device-dax: Add an 'align' attribute Dan Williams
2020-08-01  6:14   ` kernel test robot
2020-08-01  6:18   ` kernel test robot
2020-08-01  3:27 ` [PATCH v3 22/23] dax/hmem: Introduce dax_hmem.region_idle parameter Dan Williams
2020-08-01  3:27 ` [PATCH v3 23/23] device-dax: Add a range mapping allocation attribute Dan Williams
2020-08-04 17:02 ` [PATCH v3 00/23] device-dax: Support sub-dividing soft-reserved ranges Jason Gunthorpe
