Linux-ACPI Archive on lore.kernel.org
 help / color / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: linux-nvdimm@lists.01.org
Cc: David Hildenbrand <david@redhat.com>,
	Ira Weiny <ira.weiny@intel.com>, Ard Biesheuvel <ardb@kernel.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Borislav Petkov <bp@alien8.de>,
	Vishal Verma <vishal.l.verma@intel.com>,
	David Airlie <airlied@linux.ie>, Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Joao Martins <joao.m.martins@oracle.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Dave Jiang <dave.jiang@intel.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Wei Yang <richardw.yang@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ben Skeggs <bskeggs@redhat.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Jason Gunthorpe <jgg@mellanox.com>, Jia He <justin.he@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Paul Mackerras <paulus@ozlabs.org>,
	Brice Goglin <Brice.Goglin@inria.fr>,
	Jeff Moyer <jmoyer@redhat.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Daniel Vetter <daniel@ffwll.ch>,
	Andy Lutomirski <luto@kernel.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org, hch@lst.de,
	joao.m.martins@oracle.com
Subject: [PATCH v2 00/22] device-dax: Support sub-dividing soft-reserved ranges
Date: Sun, 12 Jul 2020 09:26:05 -0700
Message-ID: <159457116473.754248.7879464730875147365.stgit@dwillia2-desk3.amr.corp.intel.com> (raw)

Changes since v1 [1]:

- Combine this series with "Manual definition of Soft Reserved memory
  devices" [2] as the pre-requisites are required to test the device-dax
  facility, the device-dax changes are part of the justification for the
  numa-info reworks.

- Provide a generic version of numa data retrieval based on memblock for
  arm64 rather than adding a new / empty phys_to_target_node() stub
  alongside memory_add_physaddr_to_nid(). (Will)

- Fix several corner case allocation bugs and pass the unit test written
  by Joao.

- Lift the restriction that a 'seed' device must be activated before a
  new seed can be created. This minor sanity check was to prevent userspace
  spamming devices, but it gets in the way of some of allocation
  scenarios like allocating a memory-range that is guaranteed to never be
  evicted due to memory-side-cache conflicts. (Iqbal)

- Add debug prints for space allocation decisions (Joao)

- Rebased on v5.8-rc2 which included resolving conflicts in the kmem
  driver and memremap_pages().

[1]: http://lore.kernel.org/r/158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com
[2]: http://lore.kernel.org/r/158489354353.1457606.8327903161927980740.stgit@dwillia2-desk3.amr.corp.intel.com/

---

The device-dax facility allows an address range to be directly mapped
through a chardev, or optionally hotplugged to the core kernel page
allocator as System-RAM. It is the mechanism for converting persistent
memory (pmem) to be used as another volatile memory pool i.e. the
current Memory Tiering hot topic on linux-mm.

In the case of pmem the nvdimm-namespace-label mechanism can sub-divide
it, but that labeling mechanism is not available / applicable to
soft-reserved ("EFI specific purpose") memory [3]. This series provides
a sysfs-mechanism for the daxctl utility to enable provisioning of
volatile-soft-reserved memory ranges.

The motivations for this facility are:

1/ Allow performance differentiated memory ranges to be split between
   kernel-managed and directly-accessed use cases.

2/ Allow physical memory to be provisioned along performance relevant
   address boundaries. For example, divide a memory-side cache [4] along
   cache-color boundaries.

3/ Parcel out soft-reserved memory to VMs using device-dax as a security
   / permissions boundary [5]. Specifically I have seen people (ab)using
   memmap=nn!ss (mark System-RAM as Persistent Memory) just to get the
   device-dax interface on custom address ranges. A follow-on for the VM
   use case is to teach device-dax to dynamically allocate 'struct page' at
   runtime to reduce the duplication of 'struct page' space in both the
   guest and the host kernel for the same physical pages.

Given the intersections of arm64, x86, and core memremap_pages() changes
I'd like to explore taking this through the libnvdimm tree, but that is
step 2. Any concerns with the proposed infrastructure changes
(memblock-numainfo and multi-range-memremap-pages)?

[3]: http://lore.kernel.org/r/157309097008.1579826.12818463304589384434.stgit@dwillia2-desk3.amr.corp.intel.com
[4]: http://lore.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
[5]: http://lore.kernel.org/r/20200110190313.17144-1-joao.m.martins@oracle.com

---

Dan Williams (22):
      x86/numa: Cleanup configuration dependent command-line options
      x86/numa: Add 'nohmat' option
      efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance
      ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device
      resource: Report parent to walk_iomem_res_desc() callback
      x86: Move NUMA_KEEP_MEMINFO and related definition to x86-internals
      numa: Introduce a generic memory_add_physaddr_to_nid()
      memblock: Introduce a generic phys_addr_to_target_node()
      arm64: Convert to generic memblock for numa-info
      ACPI: HMAT: Attach a device for each soft-reserved range
      device-dax: Drop the dax_region.pfn_flags attribute
      device-dax: Move instance creation parameters to 'struct dev_dax_data'
      device-dax: Make pgmap optional for instance creation
      device-dax: Kill dax_kmem_res
      device-dax: Add an allocation interface for device-dax instances
      device-dax: Introduce 'seed' devices
      drivers/base: Make device_find_child_by_name() compatible with sysfs inputs
      device-dax: Add resize support
      mm/memremap_pages: Convert to 'struct range'
      mm/memremap_pages: Support multiple ranges per invocation
      device-dax: Add dis-contiguous resource support
      device-dax: Introduce 'mapping' devices

 arch/arm64/Kconfig                     |    1 
 arch/arm64/mm/numa.c                   |   10 
 arch/powerpc/kvm/book3s_hv_uvmem.c     |   14 
 arch/x86/Kconfig                       |    7 
 arch/x86/include/asm/numa.h            |    8 
 arch/x86/kernel/e820.c                 |   16 +
 arch/x86/mm/numa.c                     |   12 
 arch/x86/mm/numa_emulation.c           |    3 
 arch/x86/mm/numa_internal.h            |    7 
 arch/x86/xen/enlighten_pv.c            |    2 
 drivers/acpi/numa/hmat.c               |   76 ---
 drivers/acpi/numa/srat.c               |    9 
 drivers/base/core.c                    |    2 
 drivers/dax/Kconfig                    |    6 
 drivers/dax/Makefile                   |    3 
 drivers/dax/bus.c                      |  902 ++++++++++++++++++++++++++++++--
 drivers/dax/bus.h                      |   28 +
 drivers/dax/dax-private.h              |   39 +
 drivers/dax/device.c                   |   97 ++-
 drivers/dax/hmem/Makefile              |    6 
 drivers/dax/hmem/device.c              |  100 ++++
 drivers/dax/hmem/hmem.c                |   20 -
 drivers/dax/kmem.c                     |  199 ++++---
 drivers/dax/pmem/compat.c              |    2 
 drivers/dax/pmem/core.c                |   22 +
 drivers/firmware/efi/x86_fake_mem.c    |   12 
 drivers/gpu/drm/nouveau/nouveau_dmem.c |    4 
 drivers/nvdimm/badrange.c              |   26 -
 drivers/nvdimm/claim.c                 |   13 
 drivers/nvdimm/nd.h                    |    3 
 drivers/nvdimm/pfn_devs.c              |   13 
 drivers/nvdimm/pmem.c                  |   27 +
 drivers/nvdimm/region.c                |   21 -
 drivers/pci/p2pdma.c                   |   12 
 include/acpi/acpi_numa.h               |   14 
 include/linux/dax.h                    |    8 
 include/linux/memblock.h               |    4 
 include/linux/memremap.h               |   11 
 include/linux/mm.h                     |   13 
 include/linux/numa.h                   |    9 
 include/linux/range.h                  |    6 
 kernel/resource.c                      |   11 
 mm/Kconfig                             |    7 
 mm/memblock.c                          |   22 +
 mm/memremap.c                          |  300 ++++++-----
 mm/page_alloc.c                        |   82 +++
 tools/testing/nvdimm/dax-dev.c         |   22 +
 tools/testing/nvdimm/test/iomap.c      |    2 
 48 files changed, 1705 insertions(+), 528 deletions(-)
 create mode 100644 drivers/dax/hmem/Makefile
 create mode 100644 drivers/dax/hmem/device.c
 rename drivers/dax/{hmem.c => hmem/hmem.c} (74%)

base-commit: 48778464bb7d346b47157d21ffde2af6b2d39110

             reply index

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-12 16:26 Dan Williams [this message]
2020-07-12 16:26 ` [PATCH v2 01/22] x86/numa: Cleanup configuration dependent command-line options Dan Williams
2020-07-12 16:26 ` [PATCH v2 02/22] x86/numa: Add 'nohmat' option Dan Williams
2020-07-12 16:58   ` Randy Dunlap
2020-07-12 16:26 ` [PATCH v2 03/22] efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance Dan Williams
2020-07-12 16:26 ` [PATCH v2 04/22] ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device Dan Williams
2020-07-12 16:26 ` [PATCH v2 05/22] resource: Report parent to walk_iomem_res_desc() callback Dan Williams
2020-07-12 16:26 ` [PATCH v2 06/22] x86: Move NUMA_KEEP_MEMINFO and related definition to x86-internals Dan Williams
2020-07-12 16:26 ` [PATCH v2 07/22] numa: Introduce a generic memory_add_physaddr_to_nid() Dan Williams
2020-07-13  6:58   ` Mike Rapoport
2020-07-13 15:42     ` Dan Williams
2020-07-12 16:26 ` [PATCH v2 08/22] memblock: Introduce a generic phys_addr_to_target_node() Dan Williams
2020-07-13  7:03   ` Mike Rapoport
2020-07-13 15:48     ` Dan Williams
2020-07-14  1:36       ` Justin He
2020-07-12 16:26 ` [PATCH v2 09/22] arm64: Convert to generic memblock for numa-info Dan Williams
2020-07-12 16:26 ` [PATCH v2 10/22] ACPI: HMAT: Attach a device for each soft-reserved range Dan Williams
2020-07-12 16:27 ` [PATCH v2 11/22] device-dax: Drop the dax_region.pfn_flags attribute Dan Williams
2020-07-12 16:27 ` [PATCH v2 12/22] device-dax: Move instance creation parameters to 'struct dev_dax_data' Dan Williams
2020-07-12 16:27 ` [PATCH v2 13/22] device-dax: Make pgmap optional for instance creation Dan Williams
2020-07-12 16:27 ` [PATCH v2 14/22] device-dax: Kill dax_kmem_res Dan Williams
2020-07-12 16:27 ` [PATCH v2 15/22] device-dax: Add an allocation interface for device-dax instances Dan Williams
2020-07-12 16:27 ` [PATCH v2 16/22] device-dax: Introduce 'seed' devices Dan Williams
2020-07-12 16:27 ` [PATCH v2 17/22] drivers/base: Make device_find_child_by_name() compatible with sysfs inputs Dan Williams
2020-07-12 17:09   ` Greg Kroah-Hartman
2020-07-13 15:39     ` Dan Williams
2020-07-13 15:52       ` Greg Kroah-Hartman
2020-07-13 16:09         ` Dan Williams
2020-07-13 16:12           ` Greg Kroah-Hartman
2020-07-13 16:36             ` Dan Williams
2020-07-12 16:27 ` [PATCH v2 18/22] device-dax: Add resize support Dan Williams
2020-07-12 16:27 ` [PATCH v2 19/22] mm/memremap_pages: Convert to 'struct range' Dan Williams
2020-07-13 16:36   ` Ralph Campbell
2020-07-13 16:54     ` Dan Williams
2020-07-12 16:27 ` [PATCH v2 20/22] mm/memremap_pages: Support multiple ranges per invocation Dan Williams
2020-07-12 16:27 ` [PATCH v2 21/22] device-dax: Add dis-contiguous resource support Dan Williams
2020-07-12 16:28 ` [PATCH v2 22/22] device-dax: Introduce 'mapping' devices Dan Williams
2020-07-16 13:18   ` Joao Martins
2020-07-16 16:00     ` Dan Williams
2020-07-16 19:04       ` Joao Martins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=159457116473.754248.7879464730875147365.stgit@dwillia2-desk3.amr.corp.intel.com \
    --to=dan.j.williams@intel.com \
    --cc=Brice.Goglin@inria.fr \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=airlied@linux.ie \
    --cc=akpm@linux-foundation.org \
    --cc=ard.biesheuvel@linaro.org \
    --cc=ardb@kernel.org \
    --cc=benh@kernel.crashing.org \
    --cc=bp@alien8.de \
    --cc=bskeggs@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=daniel@ffwll.ch \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave.jiang@intel.com \
    --cc=david@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@lst.de \
    --cc=hpa@zytor.com \
    --cc=ira.weiny@intel.com \
    --cc=jgg@mellanox.com \
    --cc=jmoyer@redhat.com \
    --cc=joao.m.martins@oracle.com \
    --cc=justin.he@arm.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=pasha.tatashin@soleen.com \
    --cc=paulus@ozlabs.org \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rafael@kernel.org \
    --cc=richardw.yang@linux.intel.com \
    --cc=rjw@rjwysocki.net \
    --cc=rppt@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=vishal.l.verma@intel.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ACPI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-acpi/0 linux-acpi/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-acpi linux-acpi/ https://lore.kernel.org/linux-acpi \
		linux-acpi@vger.kernel.org
	public-inbox-index linux-acpi

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-acpi


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git