NVDIMM Device and Persistent Memory development
* [PATCH v2 0/4] ndctl: Add ipmregion tool with ipmregion list and reconfigure-region commands
@ 2021-07-20 15:51 James Anandraj
  2021-07-20 15:51 ` [PATCH v2 1/4] Documentation/ipmregion: Add documentation for ipmregion tool and commands James Anandraj
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: James Anandraj @ 2021-07-20 15:51 UTC
  To: nvdimm, james.sushanth.anandraj

From: James Sushanth Anandraj <james.sushanth.anandraj@intel.com>

The Intel Optane Persistent Memory OS provisioning specification
describes how to support basic provisioning for Intel Optane
persistent memory 100 and 200 series for use in different
operating modes using OS software.

This patch set introduces a new utility ipmregion that implements
basic provisioning as described in the provisioning specification
document at https://cdrdv2.intel.com/v1/dl/getContent/634430 .

The ipmregion utility provides enumeration and region reconfiguration
commands for "nvdimm" subsystem devices (Non-volatile Memory). It is
implemented as a separate tool rather than as a feature of ndctl
because the provisioning steps are specific to Intel Optane devices:
1. Generate a new region configuration request using this utility.
2. Reset the platform.
3. Use this utility to list the status of the operation.

Since v1:
 * Change name of tool to ipmregion.
 * Change reconfigure-region modes to fault-isolation-pmem and performance-pmem.

James Sushanth Anandraj (4):
  Documentation/ipmregion: Add documentation for ipmregion tool and
    commands
  ipmregion/list: Add ipmregion-list command to enumerate 'nvdimm'
    devices
  ipmregion/reconfigure: Add ipmregion-reconfigure-region command
  ipmregion/reconfigure: Add support for different pmem region modes

 Documentation/ipmregion/Makefile.am           |   59 +
 .../ipmregion/asciidoctor-extensions.rb       |   30 +
 Documentation/ipmregion/ipmregion-list.txt    |   56 +
 .../ipmregion-reconfigure-region.txt          |   51 +
 Documentation/ipmregion/ipmregion.txt         |   40 +
 .../ipmregion/theory-of-operation.txt         |   29 +
 Makefile.am                                   |    4 +-
 configure.ac                                  |    2 +
 ipmregion/Makefile.am                         |   18 +
 ipmregion/builtin.h                           |    9 +
 ipmregion/ipmregion.c                         |   88 +
 ipmregion/list.c                              |  114 ++
 ipmregion/list.h                              |   11 +
 ipmregion/pcat.c                              |   59 +
 ipmregion/pcat.h                              |   13 +
 ipmregion/pcd.h                               |  381 +++++
 ipmregion/reconfigure.c                       | 1458 +++++++++++++++++
 ipmregion/reconfigure.h                       |   12 +
 util/main.h                                   |    1 +
 19 files changed, 2433 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ipmregion/Makefile.am
 create mode 100644 Documentation/ipmregion/asciidoctor-extensions.rb
 create mode 100644 Documentation/ipmregion/ipmregion-list.txt
 create mode 100644 Documentation/ipmregion/ipmregion-reconfigure-region.txt
 create mode 100644 Documentation/ipmregion/ipmregion.txt
 create mode 100644 Documentation/ipmregion/theory-of-operation.txt
 create mode 100644 ipmregion/Makefile.am
 create mode 100644 ipmregion/builtin.h
 create mode 100644 ipmregion/ipmregion.c
 create mode 100644 ipmregion/list.c
 create mode 100644 ipmregion/list.h
 create mode 100644 ipmregion/pcat.c
 create mode 100644 ipmregion/pcat.h
 create mode 100644 ipmregion/pcd.h
 create mode 100644 ipmregion/reconfigure.c
 create mode 100644 ipmregion/reconfigure.h

-- 
2.20.1



* [PATCH v2 1/4] Documentation/ipmregion: Add documentation for ipmregion tool and commands
  2021-07-20 15:51 [PATCH v2 0/4] ndctl: Add ipmregion tool with ipmregion list and reconfigure-region commands James Anandraj
@ 2021-07-20 15:51 ` James Anandraj
  2021-07-20 15:51 ` [PATCH v2 2/4] ipmregion/list: Add ipmregion-list command to enumerate 'nvdimm' devices James Anandraj
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: James Anandraj @ 2021-07-20 15:51 UTC
  To: nvdimm, james.sushanth.anandraj

From: James Sushanth Anandraj <james.sushanth.anandraj@intel.com>

Add man page files for the ipmregion tool and its ipmregion-list and
ipmregion-reconfigure-region commands. ipmregion is a tool that
helps with region reconfiguration for 'nvdimm' devices. It modifies
a portion of the PCD region on 'nvdimm' devices to reconfigure
regions. The module Platform Configuration Data (PCD) is a section
of the PMem module that is used to store metadata. The metadata
stored in the PCD is the architected interface between software and
platform firmware to support PMem provisioning.
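
As an illustration of the PCD layout described above, here is a minimal
sketch (not the code in this series) of locating one table inside a PCD
partition image from an offset/size pair such as those stored in the
configuration header. The helper name demo_pcd_table_at and the bounds
check are assumptions made for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Return a pointer to the table occupying [offset, offset + size) inside
 * a PCD image of pcd_len bytes. Return NULL when the table is absent
 * (size == 0) or when it would not fit inside the image.
 */
const uint8_t *demo_pcd_table_at(const uint8_t *pcd, size_t pcd_len,
				 uint32_t offset, uint32_t size)
{
	if (size == 0)
		return NULL;
	/* Written so that offset + size cannot overflow. */
	if (offset > pcd_len || size > pcd_len - offset)
		return NULL;
	return pcd + offset;
}
```

Checking the offset against the image length before dereferencing is a
design choice of this sketch; a table whose data size is zero is treated
as not present.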

Signed-off-by: James Sushanth Anandraj <james.sushanth.anandraj@intel.com>
---
 Documentation/ipmregion/Makefile.am           | 59 +++++++++++++++++++
 .../ipmregion/asciidoctor-extensions.rb       | 30 ++++++++++
 Documentation/ipmregion/ipmregion-list.txt    | 56 ++++++++++++++++++
 .../ipmregion-reconfigure-region.txt          | 51 ++++++++++++++++
 Documentation/ipmregion/ipmregion.txt         | 40 +++++++++++++
 .../ipmregion/theory-of-operation.txt         | 29 +++++++++
 Makefile.am                                   |  2 +-
 configure.ac                                  |  1 +
 8 files changed, 267 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ipmregion/Makefile.am
 create mode 100644 Documentation/ipmregion/asciidoctor-extensions.rb
 create mode 100644 Documentation/ipmregion/ipmregion-list.txt
 create mode 100644 Documentation/ipmregion/ipmregion-reconfigure-region.txt
 create mode 100644 Documentation/ipmregion/ipmregion.txt
 create mode 100644 Documentation/ipmregion/theory-of-operation.txt

diff --git a/Documentation/ipmregion/Makefile.am b/Documentation/ipmregion/Makefile.am
new file mode 100644
index 0000000..baadad5
--- /dev/null
+++ b/Documentation/ipmregion/Makefile.am
@@ -0,0 +1,59 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2015-2020 Intel Corporation. All rights reserved.
+
+if USE_ASCIIDOCTOR
+
+do_subst = sed -e 's,@Utility@,Ipmregion,g' -e's,@utility@,ipmregion,g'
+CONFFILE = asciidoctor-extensions.rb
+asciidoctor-extensions.rb: ../asciidoctor-extensions.rb.in
+	$(AM_V_GEN) $(do_subst) < $< > $@
+
+else
+
+do_subst = sed -e 's,UTILITY,ipmregion,g'
+CONFFILE = asciidoc.conf
+asciidoc.conf: ../asciidoc.conf.in
+	$(AM_V_GEN) $(do_subst) < $< > $@
+
+endif
+
+man1_MANS = \
+	ipmregion.1 \
+	ipmregion-list.1 \
+	ipmregion-reconfigure-region.1
+
+EXTRA_DIST = $(man1_MANS)
+
+CLEANFILES = $(man1_MANS)
+
+XML_DEPS = \
+	../../version.m4 \
+	../copyright.txt \
+	Makefile \
+	$(CONFFILE)
+
+RM ?= rm -f
+
+if USE_ASCIIDOCTOR
+
+%.1: %.txt $(XML_DEPS)
+	$(AM_V_GEN)$(RM) $@+ $@ && \
+		$(ASCIIDOC) -b manpage -d manpage -acompat-mode \
+		-I. -rasciidoctor-extensions \
+		-amansource=ipmregion -amanmanual="ipmregion Manual" \
+		-andctl_version=$(VERSION) -o $@+ $< && \
+		mv $@+ $@
+
+else
+
+%.xml: %.txt $(XML_DEPS)
+	$(AM_V_GEN)$(RM) $@+ $@ && \
+		$(ASCIIDOC) -b docbook -d manpage -f asciidoc.conf \
+		--unsafe -aipmregion_version=$(VERSION) -o $@+ $< && \
+		mv $@+ $@
+
+%.1: %.xml $(XML_DEPS)
+	$(AM_V_GEN)$(RM) $@ && \
+		$(XMLTO) -o . -m ../manpage-normal.xsl man $<
+
+endif
diff --git a/Documentation/ipmregion/asciidoctor-extensions.rb b/Documentation/ipmregion/asciidoctor-extensions.rb
new file mode 100644
index 0000000..fa9b9f6
--- /dev/null
+++ b/Documentation/ipmregion/asciidoctor-extensions.rb
@@ -0,0 +1,30 @@
+require 'asciidoctor'
+require 'asciidoctor/extensions'
+
+module Ipmregion
+  module Documentation
+    class LinkIpmregionProcessor < Asciidoctor::Extensions::InlineMacroProcessor
+      use_dsl
+
+      named :chrome
+
+      def process(parent, target, attrs)
+        if parent.document.basebackend? 'html'
+          prefix = parent.document.attr('ipmregion-relative-html-prefix')
+          %(<a href="#{prefix}#{target}.html">#{target}(#{attrs[1]})</a>\n)
+        elsif parent.document.basebackend? 'manpage'
+          "#{target}(#{attrs[1]})"
+        elsif parent.document.basebackend? 'docbook'
+          "<citerefentry>\n" \
+            "<refentrytitle>#{target}</refentrytitle>" \
+            "<manvolnum>#{attrs[1]}</manvolnum>\n" \
+          "</citerefentry>\n"
+        end
+      end
+    end
+  end
+end
+
+Asciidoctor::Extensions.register do
+  inline_macro Ipmregion::Documentation::LinkIpmregionProcessor, :linkipmregion
+end
diff --git a/Documentation/ipmregion/ipmregion-list.txt b/Documentation/ipmregion/ipmregion-list.txt
new file mode 100644
index 0000000..799ccbb
--- /dev/null
+++ b/Documentation/ipmregion/ipmregion-list.txt
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+
+ipmregion-list(1)
+=================
+
+NAME
+----
+ipmregion-list - dump the platform nvdimm device topology and region
+reconfiguration attributes in json.
+
+include::theory-of-operation.txt[]
+
+SYNOPSIS
+--------
+[verse]
+'ipmregion list' [<options>]
+
+Walk all the nvdimm buses in the system and list all attached devices
+along with some of their major attributes, including region reconfiguration
+attributes. Region reconfiguration involves writing to the pcd region,
+followed by a platform reset. The reconfiguration attributes are obtained
+from fields in the pcd region. The attributes are reconfigure_status and
+reconfigure_pending. reconfigure_status is a human-readable status
+string for the last region reconfiguration action. reconfigure_pending
+is a boolean that indicates whether a region reconfiguration request
+has been written to the pcd region and not yet processed.
+
+Options can be specified to limit the output to objects of a certain
+bus.
+
+EXAMPLE
+-------
+----
+# ipmregion list
+[
+    {
+        "dev":"nmem1",
+        "id":"8089-a2-1823-00000043",
+        "handle":17,
+        "phys_id":33,
+        "reconfigure_status":"success"
+    }
+]
+----
+
+OPTIONS
+-------
+-b::
+--bus=::
+include::../ndctl/xable-bus-options.txt[]
+
+-v::
+--verbose::
+    Emit debug messages from the devices when reading pcd data.
+
+include::../copyright.txt[]
diff --git a/Documentation/ipmregion/ipmregion-reconfigure-region.txt b/Documentation/ipmregion/ipmregion-reconfigure-region.txt
new file mode 100644
index 0000000..8eab072
--- /dev/null
+++ b/Documentation/ipmregion/ipmregion-reconfigure-region.txt
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+
+ipmregion-reconfigure-region(1)
+===============================
+
+NAME
+----
+ipmregion-reconfigure-region - reconfigure non-volatile memory device capacity
+into regions
+
+include::theory-of-operation.txt[]
+
+SYNOPSIS
+--------
+[verse]
+'ipmregion reconfigure-region' [<options>]
+
+EXAMPLES
+--------
+Request interleaved persistent memory region(s) on the default bus using
+maximum possible interleave ways to maximize bandwidth.
+[verse]
+ipmregion reconfigure-region
+
+Request non-interleaved persistent memory region(s) on the default bus.
+[verse]
+ipmregion reconfigure-region -m fault-isolation-pmem
+
+Request volatile memory region on the default bus.
+[verse]
+ipmregion reconfigure-region -m ram
+
+OPTIONS
+-------
+-m::
+--mode::
+   Region reconfiguration request mode. Each region’s
+   capacity will be restricted to a single non-volatile memory device. The
+   possible values for this option are ram, performance-pmem and
+   fault-isolation-pmem. If this option is not specified the default is
+   fault-isolation-pmem.
+
+-b::
+--bus=::
+include::../ndctl/xable-bus-options.txt[]
+
+-v::
+--verbose::
+    Emit debug messages for the region configuration request process.
+
+include::../copyright.txt[]
diff --git a/Documentation/ipmregion/ipmregion.txt b/Documentation/ipmregion/ipmregion.txt
new file mode 100644
index 0000000..a3e10b0
--- /dev/null
+++ b/Documentation/ipmregion/ipmregion.txt
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0
+
+ipmregion(1)
+============
+
+NAME
+----
+ipmregion - Provides enumeration and region reconfiguration commands for "nvdimm"
+subsystem devices (Non-volatile Memory)
+
+include::theory-of-operation.txt[]
+
+SYNOPSIS
+--------
+[verse]
+'ipmregion' [--version] [--help] COMMAND [ARGS]
+
+OPTIONS
+-------
+-v::
+--version::
+  Display ipmregion version.
+
+-h::
+--help::
+  Run ipmregion help command.
+
+DESCRIPTION
+-----------
+The ipmregion utility provides enumeration and region reconfiguration commands for
+"nvdimm" subsystem devices (Non-volatile Memory). Operations
+supported by the tool include region reconfiguration and enumeration of the
+devices and their region reconfiguration status.
+
+include::../copyright.txt[]
+
+SEE ALSO
+--------
+linkipmregion:ipmregion-list[1],
+linkipmregion:ipmregion-reconfigure-region[1]
diff --git a/Documentation/ipmregion/theory-of-operation.txt b/Documentation/ipmregion/theory-of-operation.txt
new file mode 100644
index 0000000..7af5267
--- /dev/null
+++ b/Documentation/ipmregion/theory-of-operation.txt
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+
+THEORY OF OPERATION
+-------------------
+A region is persistent memory from one or more non-volatile memory devices that
+is mapped into the system physical address (SPA) space. For some device vendors,
+reconfiguring regions is a multi-step process as follows.
+1. Generate a new region configuration request using this command.
+2. Reset the platform.
+3. Platform firmware (BIOS) processes the region configuration request and
+presents the new region configuration via ACPI NFIT tables. The status of this
+BIOS operation can be retrieved using the ipmregion-list command.
+
+Region types are as follows:
+1. Performance Persistent Memory Region (performance-pmem)
+This is a persistent memory region that utilizes hardware interleaving across
+non-volatile memory devices. This configuration maximizes bandwidth.
+2. Fault Isolation Persistent Memory Region (fault-isolation-pmem)
+This is a persistent memory region that does not utilize hardware interleaving
+across non-volatile memory devices. This configuration maximizes the isolation
+and resiliency to faults in individual modules.
+3. Volatile Memory Region (ram)
+The portion of persistent memory in the system that is used in a volatile
+memory region is treated as volatile 'system-ram' to expand the overall system
+memory. This type of region is entirely managed by platform firmware (BIOS) and
+is no longer visible in 'ndctl' nor is it usable by applications as persistent
+storage. Additionally, in this mode, some portion of DRAM in the system is
+'consumed' by the platform firmware to act as a cache that fronts the
+persistent memory.
diff --git a/Makefile.am b/Makefile.am
index 60a1998..f9fec0c 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -3,7 +3,7 @@ include Makefile.am.in
 ACLOCAL_AMFLAGS = -I m4 ${ACLOCAL_FLAGS}
 SUBDIRS = . daxctl/lib ndctl/lib ndctl daxctl
 if ENABLE_DOCS
-SUBDIRS += Documentation/ndctl Documentation/daxctl
+SUBDIRS += Documentation/ndctl Documentation/daxctl Documentation/ipmregion
 endif
 SUBDIRS += test
 
diff --git a/configure.ac b/configure.ac
index 5ec8d2f..9f16b01 100644
--- a/configure.ac
+++ b/configure.ac
@@ -228,6 +228,7 @@ AC_CONFIG_FILES([
         test/Makefile
         Documentation/ndctl/Makefile
         Documentation/daxctl/Makefile
+        Documentation/ipmregion/Makefile
 ])
 
 AC_OUTPUT
-- 
2.20.1



* [PATCH v2 2/4] ipmregion/list: Add ipmregion-list command to enumerate 'nvdimm' devices
  2021-07-20 15:51 [PATCH v2 0/4] ndctl: Add ipmregion tool with ipmregion list and reconfigure-region commands James Anandraj
  2021-07-20 15:51 ` [PATCH v2 1/4] Documentation/ipmregion: Add documentation for ipmregion tool and commands James Anandraj
@ 2021-07-20 15:51 ` James Anandraj
  2021-07-20 15:51 ` [PATCH v2 3/4] ipmregion/reconfigure: Add ipmregion-reconfigure-region command James Anandraj
  2021-07-20 15:51 ` [PATCH v2 4/4] ipmregion/reconfigure: Add support for different pmem region modes James Anandraj
  3 siblings, 0 replies; 5+ messages in thread
From: James Anandraj @ 2021-07-20 15:51 UTC
  To: nvdimm, james.sushanth.anandraj

From: James Sushanth Anandraj <james.sushanth.anandraj@intel.com>

Add ipmregion-list command to enumerate 'nvdimm' devices. The command
reads pcd data from the 'nvdimm' devices to display information
related to region reconfiguration, such as whether a reconfiguration
request is pending and the status of the last request.
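
The pending check can be sketched as follows. This is a simplified,
self-contained illustration of the comparison this patch makes in
is_pcd_reconfigure_pending(), not a copy of it: struct demo_pcd_state
and demo_reconfigure_pending() are hypothetical names that flatten the
fields consulted into one structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of the PCD fields that are consulted. */
struct demo_pcd_state {
	uint32_t cin_data_size;  /* 0 when no configuration input table */
	uint32_t cin_sequence;   /* sequence number of the software request */
	uint32_t cout_data_size; /* 0 when no configuration output table */
	uint32_t cout_sequence;  /* sequence number firmware last answered */
};

/*
 * A request is pending when a configuration input table with a nonzero
 * sequence number exists and firmware has not yet produced a
 * configuration output table carrying the same sequence number.
 */
bool demo_reconfigure_pending(const struct demo_pcd_state *s)
{
	if (s->cin_data_size == 0)
		return false; /* no request has been written */
	if (s->cin_sequence == 0)
		return false;
	if (s->cout_data_size == 0)
		return true; /* request written, no firmware response yet */
	return s->cin_sequence != s->cout_sequence;
}
```

After a platform reset, firmware is expected to write a configuration
output table whose sequence number matches the request, at which point
the check above returns false and the status of the request is reported
instead.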

Signed-off-by: James Sushanth Anandraj <james.sushanth.anandraj@intel.com>
---
 Makefile.am             |   2 +-
 configure.ac            |   1 +
 ipmregion/Makefile.am   |  17 ++
 ipmregion/builtin.h     |   8 +
 ipmregion/ipmregion.c   |  87 +++++++++
 ipmregion/list.c        | 114 ++++++++++++
 ipmregion/list.h        |  11 ++
 ipmregion/pcd.h         | 160 +++++++++++++++++
 ipmregion/reconfigure.c | 379 ++++++++++++++++++++++++++++++++++++++++
 ipmregion/reconfigure.h |  12 ++
 util/main.h             |   1 +
 11 files changed, 791 insertions(+), 1 deletion(-)
 create mode 100644 ipmregion/Makefile.am
 create mode 100644 ipmregion/builtin.h
 create mode 100644 ipmregion/ipmregion.c
 create mode 100644 ipmregion/list.c
 create mode 100644 ipmregion/list.h
 create mode 100644 ipmregion/pcd.h
 create mode 100644 ipmregion/reconfigure.c
 create mode 100644 ipmregion/reconfigure.h

diff --git a/Makefile.am b/Makefile.am
index f9fec0c..3ef2a98 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -1,7 +1,7 @@
 include Makefile.am.in
 
 ACLOCAL_AMFLAGS = -I m4 ${ACLOCAL_FLAGS}
-SUBDIRS = . daxctl/lib ndctl/lib ndctl daxctl
+SUBDIRS = . daxctl/lib ndctl/lib ndctl daxctl ipmregion
 if ENABLE_DOCS
 SUBDIRS += Documentation/ndctl Documentation/daxctl Documentation/ipmregion
 endif
diff --git a/configure.ac b/configure.ac
index 9f16b01..222eda2 100644
--- a/configure.ac
+++ b/configure.ac
@@ -225,6 +225,7 @@ AC_CONFIG_FILES([
         ndctl/lib/Makefile
         ndctl/Makefile
         daxctl/Makefile
+        ipmregion/Makefile
         test/Makefile
         Documentation/ndctl/Makefile
         Documentation/daxctl/Makefile
diff --git a/ipmregion/Makefile.am b/ipmregion/Makefile.am
new file mode 100644
index 0000000..4a17a69
--- /dev/null
+++ b/ipmregion/Makefile.am
@@ -0,0 +1,17 @@
+include $(top_srcdir)/Makefile.am.in
+
+bin_PROGRAMS = ipmregion
+
+ipmregion_SOURCES =\
+		ipmregion.c \
+		list.c \
+		reconfigure.c \
+		../util/json.c \
+		builtin.h
+
+ipmregion_LDADD =\
+	../ndctl/lib/libndctl.la \
+	../libutil.a \
+	$(UUID_LIBS) \
+	$(KMOD_LIBS) \
+	$(JSON_LIBS)
diff --git a/ipmregion/builtin.h b/ipmregion/builtin.h
new file mode 100644
index 0000000..4ea5650
--- /dev/null
+++ b/ipmregion/builtin.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2015-2021 Intel Corporation. All rights reserved. */
+#ifndef _IPMREGION_BUILTIN_H_
+#define _IPMREGION_BUILTIN_H_
+
+struct ndctl_ctx;
+int cmd_list(int argc, const char **argv, struct ndctl_ctx *ctx);
+#endif /* _IPMREGION_BUILTIN_H_ */
diff --git a/ipmregion/ipmregion.c b/ipmregion/ipmregion.c
new file mode 100644
index 0000000..7726974
--- /dev/null
+++ b/ipmregion/ipmregion.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2015-2021 Intel Corporation. All rights reserved.
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <util/parse-options.h>
+#include <util/main.h>
+#include <ccan/array_size/array_size.h>
+#include <ipmregion/builtin.h>
+#include <ndctl/libndctl.h>
+
+static const char ipmregion_usage_string[] =
+	"ipmregion [--version] [--help] COMMAND [ARGS]";
+static const char ipmregion_more_info_string[] =
+	"See 'ipmregion help COMMAND' for more information on a specific command."
+	"\n ipmregion --list-cmds to see all available commands";
+
+static int cmd_version(int argc, const char **argv, struct ndctl_ctx *ctx)
+{
+	printf("%s\n", VERSION);
+	return 0;
+}
+
+static int cmd_help(int argc, const char **argv, struct ndctl_ctx *ctx)
+{
+	const char *const builtin_help_subcommands[] = {
+		"list",
+		NULL,
+	};
+	struct option builtin_help_options[] = {
+		OPT_END(),
+	};
+	static const char *builtin_help_usage[] = { "ipmregion help [command]",
+						    NULL };
+
+	argc = parse_options_subcommand(argc, argv, builtin_help_options,
+					builtin_help_subcommands,
+					builtin_help_usage, 0);
+
+	if (!argv[0]) {
+		printf("\n usage: %s\n\n", ipmregion_usage_string);
+		printf("\n %s\n\n", ipmregion_more_info_string);
+		return 0;
+	}
+
+	return help_show_man_page(argv[0], "ipmregion", "ipmregion_MAN_VIEWER");
+}
+
+static struct cmd_struct commands[] = {
+	{ "version", { cmd_version } },
+	{ "list", { cmd_list } },
+	{ "help", { cmd_help } },
+};
+
+int main(int argc, const char **argv)
+{
+	struct ndctl_ctx *ctx;
+	int rc;
+
+	/* Look for flags.. */
+	argv++;
+	argc--;
+	main_handle_options(&argv, &argc, ipmregion_usage_string, commands,
+			    ARRAY_SIZE(commands));
+
+	if (argc > 0) {
+		if (!prefixcmp(argv[0], "--"))
+			argv[0] += 2;
+	} else {
+		/* The user didn't specify a command; give them help */
+		printf("\n usage: %s\n\n", ipmregion_usage_string);
+		printf("\n %s\n\n", ipmregion_more_info_string);
+		goto out;
+	}
+
+	rc = ndctl_new(&ctx);
+	if (rc)
+		goto out;
+	main_handle_internal_command(argc, argv, ctx, commands,
+				     ARRAY_SIZE(commands), PROG_ipmregion);
+	ndctl_unref(ctx);
+	fprintf(stderr, "Unknown command: '%s'\n", argv[0]);
+out:
+	return 1;
+}
diff --git a/ipmregion/list.c b/ipmregion/list.c
new file mode 100644
index 0000000..242999a
--- /dev/null
+++ b/ipmregion/list.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2015-2021 Intel Corporation. All rights reserved.
+#include <util/json.h>
+#include <util/filter.h>
+#include <json-c/json.h>
+#include <ndctl/libndctl.h>
+#include <ipmregion/builtin.h>
+#include <util/parse-options.h>
+#include <reconfigure.h>
+
+static struct list_param {
+	const char *bus;
+	bool verbose;
+} param;
+
+struct json_object *ipmregion_list_dimm_to_json(struct ndctl_dimm *dimm)
+{
+	struct json_object *jdimm = json_object_new_object();
+	const char *id = ndctl_dimm_get_unique_id(dimm);
+	unsigned int handle = ndctl_dimm_get_handle(dimm);
+	unsigned short phys_id = ndctl_dimm_get_phys_id(dimm);
+	struct json_object *jobj;
+
+	if (!jdimm)
+		return NULL;
+
+	jobj = json_object_new_string(ndctl_dimm_get_devname(dimm));
+	if (!jobj)
+		goto err;
+	json_object_object_add(jdimm, "dev", jobj);
+	if (id) {
+		jobj = json_object_new_string(id);
+		if (!jobj)
+			goto err;
+		json_object_object_add(jdimm, "id", jobj);
+	}
+	if (handle < UINT_MAX) {
+		jobj = util_json_object_hex(handle, 0);
+		if (!jobj)
+			goto err;
+		json_object_object_add(jdimm, "handle", jobj);
+	}
+	if (phys_id < USHRT_MAX) {
+		jobj = util_json_object_hex(phys_id, 0);
+		if (!jobj)
+			goto err;
+		json_object_object_add(jdimm, "phys_id", jobj);
+	}
+	if (ipmregion_dimm_reconfigure_region_pending(dimm)) {
+		jobj = json_object_new_boolean(true);
+		if (!jobj)
+			goto err;
+		json_object_object_add(jdimm, "reconfigure_pending", jobj);
+	} else {
+		const char *r_status_str = NULL;
+		const int r_status = ipmregion_dimm_reconfigure_status(dimm);
+
+		r_status_str = ipmregion_dimm_reconfigure_status_string(dimm);
+		if (r_status_str) {
+			jobj = json_object_new_string(r_status_str);
+			if (!jobj)
+				goto err;
+			json_object_object_add(jdimm, "reconfigure_status",
+					       jobj);
+		}
+
+		if (r_status >= 0) {
+			jobj = json_object_new_int(r_status);
+			if (!jobj)
+				goto err;
+			json_object_object_add(jdimm, "reconfigure_err_status",
+					       jobj);
+		}
+	}
+	return jdimm;
+err:
+	json_object_put(jdimm);
+	return NULL;
+}
+
+int cmd_list(int argc, const char **argv, struct ndctl_ctx *ctx)
+{
+	const struct option options[] = {
+		OPT_STRING('b', "bus", &param.bus, "bus-id",
+			   "filter by <bus-id>"),
+		OPT_BOOLEAN('v', "verbose", &param.verbose, "turn on debug"),
+		OPT_END(),
+	};
+	const char *const u[] = { "ipmregion list [<options>]", NULL };
+	struct ndctl_bus *bus = NULL;
+	struct json_object *j_dimms = NULL;
+
+	argc = parse_options(argc, argv, options, u, 0);
+	if (param.verbose)
+		ndctl_set_log_priority(ctx, LOG_DEBUG);
+	j_dimms = json_object_new_array();
+	if (!j_dimms)
+		return -ENOMEM;
+	ndctl_bus_foreach(ctx, bus) {
+		struct ndctl_dimm *dimm = NULL;
+
+		if (!util_bus_filter(bus, param.bus))
+			continue;
+		ndctl_dimm_foreach(bus, dimm) {
+			struct json_object *j_dimm = NULL;
+
+			j_dimm = ipmregion_list_dimm_to_json(dimm);
+			if (j_dimm)
+				json_object_array_add(j_dimms, j_dimm);
+		}
+	}
+	util_display_json_array(stdout, j_dimms, 0);
+	return 0;
+}
diff --git a/ipmregion/list.h b/ipmregion/list.h
new file mode 100644
index 0000000..eabd710
--- /dev/null
+++ b/ipmregion/list.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
+
+#ifndef _LIST_H_
+#define _LIST_H_
+
+#include <ndctl/lib/private.h>
+#include <ndctl/libndctl.h>
+#include <util/json.h>
+struct json_object *ipmregion_list_dimm_to_json(struct ndctl_dimm *dimm);
+#endif /* _LIST_H_ */
diff --git a/ipmregion/pcd.h b/ipmregion/pcd.h
new file mode 100644
index 0000000..f49f7eb
--- /dev/null
+++ b/ipmregion/pcd.h
@@ -0,0 +1,160 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
+#ifndef _PCD_H_
+#define _PCD_H_
+#include <stdint.h>
+#include <ccan/short_types/short_types.h>
+#include <string.h>
+#include <acpi.h>
+/**
+ * The module Platform Configuration Data (PCD) refers to a section of the
+ * PMem module that is used to store metadata. The metadata stored in the PCD
+ * is the architected interface between software and platform firmware to
+ * support PMem provisioning. The format of PCD partition ID #1 used for
+ * provisioning is as follows. This is from Section 3 Figure 31 of Provisioning
+ * specification document.
+ * PCD Partition ID 1 - Configuration management usage (64KB)
+ *
+ *			+---------------------+    +---------------+
+ *	 +------------->│Current Configuration+--->│Extension table│
+ *	 │		+---------------------+    +---------------+
+ *	 │
+ * +-----+-------+      +---------------------+    +---------------+
+ * │Configuration+----->│ Configuration Input +--->│Extension table│
+ * │   Header    │      +---------------------+    +---------------+
+ * +-----+-------+
+ *	 │		+---------------------+    +---------------+
+ *	 +------------->│Configuration Output +--->│Extension table│
+ *			+---------------------+    +---------------+
+ * Glossary
+ * --------
+ * PCD - Platform Configuration Data
+ * Config Header - Configuration header
+ * CCUR - Current Configuration
+ * CIN - Configuration Input
+ * COUT - Configuration Output
+ */
+/**
+ * struct pcd_config_header - configuration header
+ * @header: ACPI header
+ * @ccur_data_size: current configuration data size
+ * @ccur_offset: current configuration start offset
+ * @cin_data_size: configuration input data size
+ * @cin_offset: configuration input start offset
+ * @cout_data_size: configuration output data size
+ * @cout_offset: configuration output start offset
+ * The configuration header structure has two parts: the ACPI header
+ * and a body that contains pointers to the current configuration,
+ * configuration input, and configuration output. The structure and its
+ * fields are described in the configuration header section 3.1 and Table 31
+ * in the provisioning specification document.
+ */
+struct pcd_config_header {
+	struct acpi_header header;
+	u32 ccur_data_size;
+	u32 ccur_offset;
+	u32 cin_data_size;
+	u32 cin_offset;
+	u32 cout_data_size;
+	u32 cout_offset;
+} __attribute__((packed));
+/**
+ * struct pcd_ccur - current configuration
+ * @header: ACPI header
+ * @status: configuration status
+ * @r1: reserved
+ * @volatile_size: volatile memory size mapped into SPA
+ * @persistent_size: persistent memory size mapped into SPA
+ * The current configuration structure consists of two parts - ACPI header
+ * and the body fields that are created by the platform firmware and
+ * updated on each PMem module during the memory configuration phase of
+ * the platform firmware. The structure and its fields are described in the
+ * section 3.2 and Table 32 in the provisioning specification document.
+ */
+struct pcd_ccur {
+	struct acpi_header header;
+	u16 status;
+	u8 r1[2];
+	u64 volatile_size;
+	u64 persistent_size;
+} __attribute__((packed));
+/**
+ * struct pcd_cin - configuration input
+ * @header: ACPI header
+ * @sequence: sequence number
+ * @r1: reserved
+ * The configuration input structure consists of two parts - ACPI header and
+ * the body fields that represent a configuration request created by the
+ * software. The platform firmware processes this table on the next system
+ * reboot. The structure and its fields are described in the section 3.3 and
+ * Table 33 in the provisioning specification document.
+ */
+struct pcd_cin {
+	struct acpi_header header;
+	u32 sequence;
+	u8 r1[8];
+} __attribute__((packed));
+/**
+ * struct pcd_cout - configuration output
+ * @header: ACPI header
+ * @sequence: sequence number
+ * @status: validation status
+ * @r1: reserved
+ * The configuration output structure consists of two parts - ACPI header and
+ * the body fields that are created by the platform firmware in response to the
+ * configuration input table written by software. The structure and its
+ * fields are described in the section 3.4 and Table 34 in the provisioning
+ * specification document.
+ */
+struct pcd_cout {
+	struct acpi_header header;
+	u32 sequence;
+	u8 status;
+	u8 r1[7];
+} __attribute__((packed));
+/**
+ * struct pcd_get_pcd_input - get pcd input
+ * @partition_id: partition id
+ * @payload_type: payload type
+ * @retrieve_option: retrieve option
+ * @reserved: reserved
+ * @offset: offset
+ * The structure represents the input parameters to the get
+ * pcd vendor specific command. The structure and its fields are described in
+ * section 4.1 and Table 41 in the provisioning specification document.
+ */
+struct pcd_get_pcd_input {
+	u8 partition_id;
+	struct {
+		u8 payload_type : 1;
+		u8 retrieve_option : 1;
+		u8 reserved : 6;
+	} __attribute__((packed)) options;
+	u32 offset;
+} __attribute__((packed));
+/**
+ * Human readable pcd status string
+ */
+static const char *const pcd_status_str[] = {
+	"undefined",
+	"success",
+	"reserved",
+	"configuration input error",
+};
+
+/**
+ * PCD small payload size. The value is mentioned in section 4.1 of
+ * provisioning document.
+ */
+#define PCD_SP_SIZE 128u
+/**
+ * PCD dimm partition size. The value is mentioned in section 3 figure 31 of
+ * provisioning document.
+ */
+#define MAX_PCD_SIZE 0x10000u
+/**
+ * The opcode value for get pcd vendor specific command. The value is mentioned
+ * in section 4.1 Table 42 of provisioning document.
+ */
+#define PCD_OPCODE_GET_PCD ((u16)0x0601)
+#endif /* _PCD_H_ */
diff --git a/ipmregion/reconfigure.c b/ipmregion/reconfigure.c
new file mode 100644
index 0000000..17703e8
--- /dev/null
+++ b/ipmregion/reconfigure.c
@@ -0,0 +1,379 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2021 Intel Corporation. All rights reserved.
+#include <ndctl/libndctl.h>
+#include <pcd.h>
+#include <stdbool.h>
+#include <reconfigure.h>
+#include <stdlib.h>
+#include <ccan/minmax/minmax.h>
+
+/**
+ * The location of the configuration input in the pcd partition
+ * is determined by configuration input start offset field. The
+ * field is described in Section 3.1 and Table 31 in provisioning
+ * document
+ */
+static struct pcd_cin *get_cin(struct pcd_config_header const *c)
+{
+	return (struct pcd_cin *)((u8 *)c + c->cin_offset);
+}
+
+/**
+ * The location of the current configuration in the pcd partition
+ * is determined by current configuration start offset field. The
+ * field is described in Section 3.1 and Table 31 in provisioning
+ * document
+ */
+static struct pcd_ccur *get_ccur(struct pcd_config_header const *c)
+{
+	return (struct pcd_ccur *)((u8 *)c + c->ccur_offset);
+}
+
+/**
+ * The location of the configuration output in the pcd partition
+ * is determined by configuration output start offset field. The
+ * field is described in Section 3.1 and Table 31 in provisioning
+ * document
+ */
+static struct pcd_cout *get_cout(struct pcd_config_header const *c)
+{
+	return (struct pcd_cout *)((u8 *)c + c->cout_offset);
+}
+
+/**
+ * To determine if a reconfiguration request is pending we can look at
+ * configuration input data size and sequence number fields.
+ */
+static bool is_pcd_reconfigure_pending(struct pcd_config_header const *ch)
+{
+	struct pcd_cin *cin = NULL;
+	struct pcd_cout *cout = NULL;
+
+	/**
+	 * There is a pending request if configuration input table is
+	 * present and the sequence number in the configuration input
+	 * table is not same as the sequence number in configuration
+	 * output table
+	 */
+	cin = get_cin(ch);
+	cout = get_cout(ch);
+	if (ch->cin_data_size == 0) {
+		return false;
+	} else if (cin->sequence > 0) {
+		if (ch->cout_data_size == 0)
+			return true;
+		else if (cin->sequence != cout->sequence)
+			return true;
+	}
+	return false;
+}
+
+/**
+ * Map the current configuration status value to human readable strings. The
+ * current configuration status field is described in Section 3.2 and Table 32
+ * in provisioning document
+ */
+static const char *get_ccur_status_string(struct pcd_config_header const *ch)
+{
+	struct pcd_ccur *ccur = NULL;
+
+	ccur = get_ccur(ch);
+	if (ch->ccur_data_size == 0)
+		return NULL;
+	if (ccur->status <= 3)
+		return pcd_status_str[ccur->status];
+	if (ccur->status == 5)
+		return pcd_status_str[2];
+	/* conf status is cin error */
+	if (ccur->status >= 4 && ccur->status < 16)
+		return pcd_status_str[3];
+	/* all other values are reserved */
+	return pcd_status_str[2];
+}
+
+/**
+ * Get configuration status field from current configuration structure. The
+ * field is described in Section 3.2 and Table 32 in provisioning document
+ */
+static int get_pcd_ccur_status(struct pcd_config_header const *ch)
+{
+	struct pcd_ccur *ccur = NULL;
+
+	ccur = get_ccur(ch);
+	if (ch->ccur_data_size == 0)
+		return -ENOTTY;
+	return ccur->status;
+}
+
+/**
+ * Function to execute a vendor specific command where input data
+ * can be sent and output data can be received
+ */
+static int execute_vendor_specific_cmd(struct ndctl_dimm *dimm,
+				       const u32 op_code, void *inp,
+				       const u32 inp_size, void *op,
+				       u32 op_size)
+{
+	struct ndctl_cmd *cmd = NULL;
+	int ret = -ENOTTY;
+
+	if (!dimm || !inp || inp_size == 0 || (op_size > 0 && !op) ||
+	    op_size > PCD_SP_SIZE) {
+		fprintf(stderr, "%s: dimm: %#x vendor cmd param incorrect\n",
+			__func__, dimm ? ndctl_dimm_get_handle(dimm) : 0);
+		return ret;
+	}
+	cmd = ndctl_dimm_cmd_new_vendor_specific(dimm, op_code, inp_size,
+						 op_size);
+	if (!cmd)
+		return ret;
+	/* From here on every exit path must drop the command reference */
+	if (ndctl_cmd_vendor_set_input(cmd, inp, inp_size) !=
+	    (ssize_t)inp_size)
+		goto out;
+	if (ndctl_cmd_submit(cmd) < 0)
+		goto out;
+	if (op_size > 0 &&
+	    ndctl_cmd_vendor_get_output(cmd, op, op_size) < (ssize_t)op_size)
+		goto out;
+	ret = 0;
+out:
+	ndctl_cmd_unref(cmd);
+	return ret;
+}
+
+static inline u32 max_of_three(u32 a, u32 b, u32 c)
+{
+	return max(a, b) > c ? max(a, b) : c;
+}
+
+/**
+ * The maximum pcd table size that needs to be read for purpose of reconfigure
+ * regions is the entire 64 kb configuration management usage sub partition.
+ * The actual table structures could occupy less space. The function helps
+ * to calculate the size that needs to be read to get all the pcd table
+ * structures. PCD format is explained in section 3 of provisioning document.
+ */
+static u32 get_table_size(struct pcd_config_header const *ch)
+{
+	u32 size = 0;
+
+	/**
+	 * Find which table among ccur, cin and cout is the furthest in
+	 * the sub partition. The offset + data size of the furthest
+	 * table rounded up to pcd small payload size and bounded by the maximum
+	 * pcd size would be the furthest we need to read to get all the pcd
+	 * table structures. The fields used are explained in section 3.1 and
+	 * table 31 of the provisioning document.
+	 */
+	size = max_of_three(ch->ccur_offset + ch->ccur_data_size,
+			    ch->cin_offset + ch->cin_data_size,
+			    ch->cout_offset + ch->cout_data_size);
+	size = size > PCD_SP_SIZE ? (size + PCD_SP_SIZE - 1) / PCD_SP_SIZE *
+				    PCD_SP_SIZE : PCD_SP_SIZE;
+	size = min(size, MAX_PCD_SIZE);
+	return size;
+}
+
+/**
+ * Given a ndctl dimm object read the pcd and calculate table size to read
+ * and get all pcd table structures.
+ */
+static u32 read_pcd_size(struct ndctl_dimm *dimm)
+{
+	u32 op_code = 0;
+	char inp[PCD_SP_SIZE];
+	char op[PCD_SP_SIZE];
+	struct pcd_get_pcd_input *in = NULL;
+
+	memset(inp, 0, PCD_SP_SIZE);
+	memset(op, 0, PCD_SP_SIZE);
+	in = (struct pcd_get_pcd_input *)inp;
+	in->partition_id = 1;
+	in->options.payload_type = 1;
+	in->options.retrieve_option = 0;
+	in->offset = 0;
+	op_code = cpu_to_be16(PCD_OPCODE_GET_PCD);
+	if (execute_vendor_specific_cmd(dimm, op_code, inp, PCD_SP_SIZE, op,
+					PCD_SP_SIZE) != 0)
+		return 0;
+	return get_table_size((struct pcd_config_header *)op);
+}
+
+/**
+ * Given ndctl dimm object and a preallocated buffer read the pcd up to
+ * the number of bytes mentioned in size field into the buffer
+ */
+static int read_pcd(struct ndctl_dimm *dimm, struct pcd_config_header **pcd,
+		    u32 size)
+{
+	u32 op_code = 0;
+	char inp[PCD_SP_SIZE];
+	char op[PCD_SP_SIZE];
+	char **buf = (char **)pcd;
+	struct pcd_get_pcd_input *in = NULL;
+
+	memset(inp, 0, PCD_SP_SIZE);
+	memset(op, 0, PCD_SP_SIZE);
+	in = (struct pcd_get_pcd_input *)inp;
+	in->partition_id = 1;
+	in->options.payload_type = 1;
+	in->options.retrieve_option = 0;
+	in->offset = 0;
+	op_code = cpu_to_be16(PCD_OPCODE_GET_PCD);
+	while (size > 0) {
+		if (execute_vendor_specific_cmd(dimm, op_code, inp, PCD_SP_SIZE,
+						op, PCD_SP_SIZE) != 0)
+			return -ENOTTY;
+		memcpy((*buf) + in->offset, op, PCD_SP_SIZE);
+		size = size - PCD_SP_SIZE;
+		in->offset = in->offset + PCD_SP_SIZE;
+	}
+	return 0;
+}
+
+/**
+ * Validate checksum field of configuration header.
+ */
+static inline int validate_config_header(struct pcd_config_header *c)
+{
+	return acpi_checksum(c, c->header.length);
+}
+
+/**
+ * Validate signature and checksum fields of current configuration,
+ * configuration input and configuration output tables when they are present.
+ * These tables and the fields are explained in Section 3 of provisioning
+ * document.
+ */
+static inline int validate_config_data(struct pcd_config_header const *ch)
+{
+	int ret = 0;
+
+	if (ch->ccur_data_size > 0) {
+		const char *ccur_sig = "CCUR";
+		struct pcd_ccur *ccur = get_ccur(ch);
+
+		if (memcmp(ccur_sig, &ccur->header.signature, 4) != 0)
+			ret = -ENOTTY;
+		if (acpi_checksum(ccur, ccur->header.length) != 0)
+			ret = -ENOTTY;
+	}
+	if (ch->cin_data_size > 0) {
+		const char *cin_sig = "CIN_";
+		struct pcd_cin *cin = get_cin(ch);
+
+		if (memcmp(cin_sig, &cin->header.signature, 4) != 0)
+			ret = -ENOTTY;
+		if (acpi_checksum(cin, cin->header.length) != 0)
+			ret = -ENOTTY;
+	}
+	if (ch->cout_data_size > 0) {
+		const char *cout_sig = "COUT";
+		struct pcd_cout *cout = get_cout(ch);
+
+		if (memcmp(cout_sig, &cout->header.signature, 4) != 0)
+			ret = -ENOTTY;
+		if (acpi_checksum(cout, cout->header.length) != 0)
+			ret = -ENOTTY;
+	}
+	return ret;
+}
+
+/**
+ * Given a pcd buffer validate checksum and signature of sub tables. The pcd
+ * tables and subtables are explained in section 3 of provisioning document.
+ */
+static int validate_pcd(struct pcd_config_header *pcd)
+{
+	int ret = -ENOTTY;
+
+	if (!pcd)
+		return ret;
+	if (validate_config_header(pcd) != 0)
+		return ret;
+	if (validate_config_data(pcd) != 0)
+		return ret;
+	ret = 0;
+	return ret;
+}
+
+/**
+ * Read pcd data from dimm and if valid pcd is present check if there is
+ * a pending region reconfigure request.
+ */
+bool ipmregion_dimm_reconfigure_region_pending(struct ndctl_dimm *dimm)
+{
+	struct pcd_config_header *buf = NULL;
+	bool ret = false;
+	u32 size = read_pcd_size(dimm);
+
+	if (!size)
+		return ret;
+	buf = (struct pcd_config_header *)calloc(size, sizeof(char));
+	if (!(buf))
+		return ret;
+	if (read_pcd(dimm, &buf, size))
+		goto out;
+	if (validate_pcd(buf))
+		goto out;
+	ret = is_pcd_reconfigure_pending(buf);
+out:
+	free(buf);
+	return ret;
+}
+
+/**
+ * Read pcd data from dimm and if valid pcd and reconfigure request is present,
+ * return human readable configuration status string. This field is explained in
+ * section 3.2 and Table 32 of provisioning specification.
+ */
+const char *ipmregion_dimm_reconfigure_status_string(struct ndctl_dimm *dimm)
+{
+	struct pcd_config_header *buf = NULL;
+	const char *status = NULL;
+	u32 size = read_pcd_size(dimm);
+
+	if (!size)
+		return status;
+	buf = (struct pcd_config_header *)calloc(size, sizeof(char));
+	if (!(buf))
+		return status;
+	if (read_pcd(dimm, &buf, size))
+		goto out;
+	if (validate_pcd(buf))
+		goto out;
+	status = get_ccur_status_string(buf);
+out:
+	free(buf);
+	return status;
+}
+
+/**
+ * Read pcd data from dimm and if valid pcd and reconfiguration request is
+ * present return the configuration status value if it is not success. This
+ * field is explained in section 3.2 and Table 32 of provisioning specification.
+ */
+int ipmregion_dimm_reconfigure_status(struct ndctl_dimm *dimm)
+{
+	struct pcd_config_header *buf = NULL;
+	int status = -1;
+	u32 size = read_pcd_size(dimm);
+
+	if (!size)
+		return status;
+	buf = (struct pcd_config_header *)calloc(size, sizeof(char));
+	if (!(buf))
+		return status;
+	if (read_pcd(dimm, &buf, size))
+		goto out;
+	if (validate_pcd(buf))
+		goto out;
+	status = get_pcd_ccur_status(buf);
+	/* No need to display if status is success */
+	if (status == 1)
+		status = -1;
+out:
+	free(buf);
+	return status;
+}
diff --git a/ipmregion/reconfigure.h b/ipmregion/reconfigure.h
new file mode 100644
index 0000000..84b3340
--- /dev/null
+++ b/ipmregion/reconfigure.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0*/
+/* Copyright(c) 2021 Intel Corporation. All rights reserved.*/
+
+#ifndef _RECONFIGURE_H_
+#define _RECONFIGURE_H_
+
+#include <ndctl/lib/private.h>
+#include <ndctl/libndctl.h>
+bool ipmregion_dimm_reconfigure_region_pending(struct ndctl_dimm *dimm);
+const char *ipmregion_dimm_reconfigure_status_string(struct ndctl_dimm *dimm);
+int ipmregion_dimm_reconfigure_status(struct ndctl_dimm *dimm);
+#endif /* _RECONFIGURE_H_ */
diff --git a/util/main.h b/util/main.h
index c89a843..f723c6e 100644
--- a/util/main.h
+++ b/util/main.h
@@ -10,6 +10,7 @@
 enum program {
 	PROG_NDCTL,
 	PROG_DAXCTL,
+	PROG_ipmregion
 };
 
 struct ndctl_ctx;
-- 
2.20.1
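
Reviewer note (not part of the patch): the rule that
is_pcd_reconfigure_pending() applies can be exercised on its own. The sketch
below is only an illustration; struct pcd_view and request_pending() are
hypothetical names that flatten the fields the patch reads from the
configuration header and from the CIN and COUT tables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of the fields the check reads. */
struct pcd_view {
	uint32_t cin_data_size;   /* 0 means no configuration input table */
	uint32_t cout_data_size;  /* 0 means no configuration output table */
	uint32_t cin_sequence;
	uint32_t cout_sequence;
};

/*
 * A request is pending when a CIN table with a non-zero sequence number
 * exists and either no COUT table exists yet or the COUT table carries a
 * different sequence number.
 */
static bool request_pending(const struct pcd_view *v)
{
	if (v->cin_data_size == 0 || v->cin_sequence == 0)
		return false;
	if (v->cout_data_size == 0)
		return true;
	return v->cin_sequence != v->cout_sequence;
}
```

A CIN with sequence 3 next to a COUT with sequence 2 is reported as pending;
once the platform writes a COUT with sequence 3 the request is no longer
pending.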



* [PATCH v2 3/4] ipmregion/reconfigure: Add ipmregion-reconfigure-region command
  2021-07-20 15:51 [PATCH v2 0/4] ndctl: Add ipmregion tool with ipmregion list and reconfigure-region commands James Anandraj
  2021-07-20 15:51 ` [PATCH v2 1/4] Documentation/ipmregion: Add documentation for ipmregion tool and commands James Anandraj
  2021-07-20 15:51 ` [PATCH v2 2/4] ipmregion/list: Add ipmregion-list command to enumerate 'nvdimm' devices James Anandraj
@ 2021-07-20 15:51 ` James Anandraj
  2021-07-20 15:51 ` [PATCH v2 4/4] ipmregion/reconfigure: Add support for different pmem region modes James Anandraj
  3 siblings, 0 replies; 5+ messages in thread
From: James Anandraj @ 2021-07-20 15:51 UTC (permalink / raw)
  To: nvdimm, james.sushanth.anandraj

From: James Sushanth Anandraj <james.sushanth.anandraj@intel.com>

Add the ipmregion-reconfigure-region command and its helper functions. The
command reads the pcd data from the 'nvdimm' devices and writes back a new pcd
that carries the region reconfiguration request. This patch implements only
the request that reconfigures the devices into volatile (ram) regions.

Signed-off-by: James Sushanth Anandraj <james.sushanth.anandraj@intel.com>
---
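Reviewer note (not part of the commit): finalize_pcd() zeroes each checksum
field and then stores the value returned by acpi_checksum(), and the
validation helpers treat a return value of 0 as valid. acpi_checksum() itself
is defined elsewhere in the series and is not shown in this patch, so the
sketch below only assumes the usual ACPI convention that all bytes of a table
sum to zero modulo 256. sum_to_zero() and seal() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the byte that makes the sum of all len bytes equal 0 modulo 256,
 * assuming the checksum byte inside buf is currently zero. On a table that
 * already carries a correct checksum this returns 0, so the same function
 * can serve as the validity test.
 */
static uint8_t sum_to_zero(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint8_t sum = 0;

	while (len--)
		sum += *p++;
	return (uint8_t)(0u - sum);
}

/* Mirror of the finalize step: clear the field, compute, store. */
static void seal(uint8_t *table, size_t len, size_t checksum_off)
{
	table[checksum_off] = 0;
	table[checksum_off] = sum_to_zero(table, len);
}
```

Under that assumption any later change to a sealed table makes
sum_to_zero() return a non-zero value, which is what the validate helpers
rely on.
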
 ipmregion/Makefile.am   |   1 +
 ipmregion/builtin.h     |   1 +
 ipmregion/ipmregion.c   |   1 +
 ipmregion/pcat.c        |  59 ++++++
 ipmregion/pcat.h        |  13 ++
 ipmregion/pcd.h         |  58 ++++++
 ipmregion/reconfigure.c | 390 +++++++++++++++++++++++++++++++++++++++-
 7 files changed, 515 insertions(+), 8 deletions(-)
 create mode 100644 ipmregion/pcat.c
 create mode 100644 ipmregion/pcat.h
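
Reviewer note (not part of the commit): the write length computed by
set_pcd_length() is the read length plus the new tables, capped at the
partition size and rounded up to a multiple of the small payload size. A
minimal standalone sketch of that arithmetic follows; SP_SIZE, MAX_SIZE and
write_length() are hypothetical stand-ins for PCD_SP_SIZE, MAX_PCD_SIZE and
the patch's helper, and the inputs are assumed to be at most 64 KiB each so
the addition cannot wrap.

```c
#include <stdint.h>

#define SP_SIZE  128u      /* small payload size, as PCD_SP_SIZE */
#define MAX_SIZE 0x10000u  /* partition size, as MAX_PCD_SIZE */

/* Cap at the partition size, then round up to a payload multiple. */
static uint32_t write_length(uint32_t read_len, uint32_t new_len)
{
	uint32_t len = read_len + new_len;

	if (len > MAX_SIZE)
		len = MAX_SIZE;
	/* MAX_SIZE is a multiple of SP_SIZE, so rounding cannot exceed it */
	return (len + SP_SIZE - 1) / SP_SIZE * SP_SIZE;
}
```

With a 128 byte read and a 16 byte partition size change table this gives
256 bytes, which write_pcd() then sends in 64 byte chunks.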

diff --git a/ipmregion/Makefile.am b/ipmregion/Makefile.am
index 4a17a69..ad18103 100644
--- a/ipmregion/Makefile.am
+++ b/ipmregion/Makefile.am
@@ -6,6 +6,7 @@ ipmregion_SOURCES =\
 		ipmregion.c \
 		list.c \
 		reconfigure.c \
+		pcat.c \
 		../util/json.c \
 		builtin.h
 
diff --git a/ipmregion/builtin.h b/ipmregion/builtin.h
index 4ea5650..c31fc48 100644
--- a/ipmregion/builtin.h
+++ b/ipmregion/builtin.h
@@ -5,4 +5,5 @@
 
 struct ndctl_ctx;
 int cmd_list(int argc, const char **argv, struct ndctl_ctx *ctx);
+int cmd_reconfigure_region(int argc, const char **argv, struct ndctl_ctx *ctx);
 #endif /* _IPMREGION_BUILTIN_H_ */
diff --git a/ipmregion/ipmregion.c b/ipmregion/ipmregion.c
index 7726974..1541348 100644
--- a/ipmregion/ipmregion.c
+++ b/ipmregion/ipmregion.c
@@ -51,6 +51,7 @@ static int cmd_help(int argc, const char **argv, struct ndctl_ctx *ctx)
 static struct cmd_struct commands[] = {
 	{ "version", { cmd_version } },
 	{ "list", { cmd_list } },
+	{ "reconfigure-region", { cmd_reconfigure_region } },
 	{ "help", { cmd_help } },
 };
 
diff --git a/ipmregion/pcat.c b/ipmregion/pcat.c
new file mode 100644
index 0000000..3320784
--- /dev/null
+++ b/ipmregion/pcat.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2020 Intel Corporation. All rights reserved.
+#include <pcat.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <string.h>
+#include <acpi.h>
+#include <errno.h>
+
+static u8 pcat_rev;
+
+/**
+ * Function to read pcat revision from acpi table
+ */
+static u8 read_pcat_rev(void)
+{
+	const char *sysfs_path = "/sys/firmware/acpi/tables/PCAT";
+	struct acpi_header acpi;
+	int fd = open(sysfs_path, O_RDONLY);
+	ssize_t bytes = 0;
+
+	if (fd < 0)
+		return 0;
+	memset(&acpi, 0, sizeof(struct acpi_header));
+	bytes = read(fd, &acpi, sizeof(struct acpi_header));
+	if (bytes < (ssize_t)sizeof(struct acpi_header)) {
+		close(fd);
+		return 0;
+	}
+	close(fd);
+	return acpi.revision;
+}
+
+/**
+ * Check if we have already read the pcat revision else read it and return
+ */
+u8 get_pcat_rev(void)
+{
+	if (!pcat_rev)
+		pcat_rev = read_pcat_rev();
+	return pcat_rev;
+}
+
+/**
+ * Check whether the pcat revision allows creating a region reconfiguration
+ * request. Only two revisions (0.2 and 1.2) are supported and described in
+ * provisioning document. This is mentioned in section 3.1 of provisioning
+ * document.
+ */
+int validate_pcat_rev(void)
+{
+	if (!pcat_rev)
+		pcat_rev = read_pcat_rev();
+	if (pcat_rev == PCAT_REV_0_2 || pcat_rev == PCAT_REV_1_2)
+		return 0;
+	return -EOPNOTSUPP;
+}
diff --git a/ipmregion/pcat.h b/ipmregion/pcat.h
new file mode 100644
index 0000000..3f6f8ae
--- /dev/null
+++ b/ipmregion/pcat.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0*/
+/* Copyright(c) 2021 Intel Corporation. All rights reserved.*/
+
+#ifndef _PCAT_H_
+#define _PCAT_H_
+#include <stdint.h>
+#include <ccan/short_types/short_types.h>
+#define PCAT_REV_0_2 ((u8)0x02)
+#define PCAT_REV_1_2 ((u8)0x12)
+
+int validate_pcat_rev(void);
+u8 get_pcat_rev(void);
+#endif /* _PCAT_H_ */
diff --git a/ipmregion/pcd.h b/ipmregion/pcd.h
index f49f7eb..e16c1e0 100644
--- a/ipmregion/pcd.h
+++ b/ipmregion/pcd.h
@@ -33,6 +33,7 @@
  * CCUR - Current Configuration
  * CIN - Configuration Input
  * COUT - Configuration Output
+ * PSCT - Partition Size Change Table
  */
 /**
  * struct pcd_config_header - configuration header
@@ -112,6 +113,23 @@ struct pcd_cout {
 	u8 status;
 	u8 r1[7];
 } __attribute__((packed));
+/**
+ * struct pcd_psct - partition size change table (type 4)
+ * @type: table type
+ * @length: length in bytes for entire table
+ * @status: partition size change status
+ * @size: persistent memory partition size
+ * The Partition Size Change Table is used for changing the module partition
+ * size, and it is used in the configuration input and output structures. The
+ * structure and its fields are described in 3.7 and Table 37 in the
+ * provisioning specification document.
+ */
+struct pcd_psct {
+	u16 type;
+	u16 length;
+	u32 status;
+	u64 size;
+} __attribute__((packed));
 /**
  * struct pcd_get_pcd_input - get pcd input
  * @partition_id: partition id
@@ -132,6 +150,24 @@ struct pcd_get_pcd_input {
 	} __attribute__((packed)) options;
 	u32 offset;
 } __attribute__((packed));
+/**
+ * struct pcd_set_pcd_input - set pcd input
+ * @partition_id: partition id
+ * @payload_type: payload type
+ * @offset: offset
+ * @reserved: reserved
+ * @data: data
+ * The structure represents the input parameters to set pcd vendor specific
+ * command. The structure and its fields are described in section 4.2 and Table
+ * 45 in the provisioning specification document.
+ */
+struct pcd_set_pcd_input {
+	u8 partition_id;
+	u8 payload_type;
+	u32 offset;
+	u8 reserved[58];
+	u8 data[64];
+} __attribute__((packed));
 /**
  * Human readable pcd status string
  */
@@ -147,6 +183,11 @@ static const char *const pcd_status_str[] = {
  * provisioning document.
  */
 #define PCD_SP_SIZE 128u
+/**
+ * PCD write payload data size. The value is mentioned in section 4.2 and Table
+ * 45 of provisioning document.
+ */
+#define PCD_WP_SIZE 64u
 /**
  * PCD dimm partition size.The value is mentioned in section 3 figure 31 of
  * provisioning document.
@@ -157,4 +198,21 @@ static const char *const pcd_status_str[] = {
  * in section 4.1 Table 42 of provisioning document.
  */
 #define PCD_OPCODE_GET_PCD ((u16)0x0601)
+/**
+ * The opcode value for set pcd vendor specific command. The value is mentioned
+ * in section 4.2 Table 45 of provisioning document.
+ */
+#define PCD_OPCODE_SET_PCD ((u16)0x0701)
+/**
+ * Defines for PCD revision values
+ */
+#define PCD_REV_0_1 ((u8)0x01)
+#define PCD_REV_0_2 ((u8)0x02)
+#define PCD_REV_1_0 ((u8)0x10)
+#define PCD_REV_1_2 ((u8)0x12)
+/**
+ * Define for header type value for Partition Size Change Table. The value is
+ * mentioned in Table 37 of provisioning document.
+ */
+#define PSCT_TYPE ((u16)0x4)
 #endif /* _PCD_H_ */
diff --git a/ipmregion/reconfigure.c b/ipmregion/reconfigure.c
index 17703e8..0a7fd3c 100644
--- a/ipmregion/reconfigure.c
+++ b/ipmregion/reconfigure.c
@@ -6,9 +6,53 @@
 #include <reconfigure.h>
 #include <stdlib.h>
 #include <ccan/minmax/minmax.h>
+#include <util/parse-options.h>
+#include <util/json.h>
+#include <util/filter.h>
+#include <json-c/json.h>
+#include <pcat.h>
+#include <list.h>
+
+static struct reconfigure_param {
+	const char *bus;
+	bool verbose;
+	const char *mode;
+} param;
+static const struct option reconfigure_options[] = {
+	OPT_STRING('b', "bus", &param.bus, "bus-id", "filter by <bus-id>"),
+	OPT_BOOLEAN('v', "verbose", &param.verbose, "turn on debug"),
+	OPT_STRING('m', "mode", &param.mode, "mode", "reconfigure region mode"),
+	OPT_END()
+};
 
 /**
- * The location of the configuration input in the pcd partition
+ * Return the Configuration Header revision based on pcat revision.
+ * 0.1: Used with PCAT revision 0.2
+ * 1.2: Used with PCAT revision 1.2
+ * The configuration header contains pointers to the current configuration,
+ * configuration input, and configuration output. Its structure and fields are
+ * described in Section 3.1 and Table 31 in provisioning document
+ */
+static inline u8 get_cfg_header_revision(u8 pcat_revision)
+{
+	return pcat_revision > PCAT_REV_0_2 ? PCD_REV_1_2 : PCD_REV_0_1;
+}
+
+/**
+ * Return the PCD table revision based on config header revision.
+ * 0.2: Used with config header revision 0.1
+ * 1.2: Used with config header revision 1.2
+ * Current configuration, configuration input, and configuration output contain
+ * revision fields. The structures and fields are described in Table 32, Table
+ * 33 and Table 34 in provisioning document.
+ */
+static inline u8 get_pcd_table_revision(u8 cfg_header_revision)
+{
+	return cfg_header_revision > PCD_REV_1_0 ? PCD_REV_1_2 : PCD_REV_0_2;
+}
+
+/**
+ * The location of the configuration input(cin) in the pcd partition
  * is determined by configuration input start offset field. The
  * field is described in Section 3.1 and Table 31 in provisioning
  * document
@@ -19,7 +63,7 @@ static struct pcd_cin *get_cin(struct pcd_config_header const *c)
 }
 
 /**
- * The location of the current configuration in the pcd partition
+ * The location of the current configuration(ccur) in the pcd partition
  * is determined by current configuration start offset field. The
  * field is described in Section 3.1 and Table 31 in provisioning
  * document
@@ -30,7 +74,7 @@ static struct pcd_ccur *get_ccur(struct pcd_config_header const *c)
 }
 
 /**
- * The location of the configuration output in the pcd partition
+ * The location of the configuration output(cout) in the pcd partition
  * is determined by configuration output start offset field. The
  * field is described in Section 3.1 and Table 31 in provisioning
  * document
@@ -40,6 +84,154 @@ static struct pcd_cout *get_cout(struct pcd_config_header const *c)
 	return (struct pcd_cout *)((u8 *)c + c->cout_offset);
 }
 
+/**
+ * Determine the max bytes to write in PCD based on number of bytes
+ * read (pcd_length) and the size of the new tables (buf_length) to be written.
+ */
+static u32 set_pcd_length(u32 pcd_length, u32 buf_length)
+{
+	u32 length = 0;
+
+	/**
+	 * Here the total of read PCD length and size of new tables is rounded
+	 * down to MAX_PCD_SIZE and rounded up to PCD_SP_SIZE multiple. These
+	 * values are mentioned in section 3 figure 31 and section 4.1 of
+	 * provisioning document.
+	 */
+	length = pcd_length + buf_length;
+	length = length > MAX_PCD_SIZE ? MAX_PCD_SIZE : length;
+	if (length % PCD_SP_SIZE != 0) {
+		length = length + PCD_SP_SIZE - (length % PCD_SP_SIZE);
+		length = length > MAX_PCD_SIZE ? MAX_PCD_SIZE : length;
+	}
+	return length;
+}
+
+/**
+ * When creating a new request a new configuration input(cin) table needs to be
+ * created. This table can be placed at the end of ccur or between current
+ * configuration and configuration header if the gap is big enough
+ */
+static u32 calc_cin_offset(u32 ccur_offset, u32 ccur_size, u32 buf_length)
+{
+	u32 offset = 0;
+	u32 total_length = buf_length + sizeof(struct pcd_cin);
+	u32 space = ccur_offset - sizeof(struct pcd_config_header);
+
+	/**
+	 * Here space is the gap between configuration header and current
+	 * configuration. Total length is the size of the new configuration
+	 * input table and the extension tables to be written for the new
+	 * request. See section 5 in provisioning document for examples
+	 * on how pcd is structured for new requests.
+	 */
+	if (space > total_length)
+		offset = sizeof(struct pcd_config_header);
+	else
+		offset = ccur_offset + ccur_size;
+	return offset;
+}
+
+/**
+ * Given an empty buffer, its length and size of new tables to be written for
+ * a request. Initialize the buffer based on the read pcd. The configuration
+ * header and current configuration have to be copied over to the new buffer.
+ * Some values of configuration input can be set to default values. Section 3
+ * and Section 5 in provisioning document explain the fields and have examples
+ * of pcd when a new request is created.
+ */
+static void init_pcd(struct pcd_config_header const *buf, u32 set_pcd_length,
+		     u32 cin_length, struct pcd_config_header *ch)
+{
+	struct pcd_cin *cin = NULL;
+	struct pcd_cout *cout = NULL;
+
+	/**
+	 * The steps to initialize pcd from read pcd are as follows
+	 * 1) Copy configuration header
+	 * 2) Copy current configuration if it exists
+	 * 3) Update the revision, configuration input data size and
+	 *    configuration input offset fields
+	 * 4) Zero out configuration output data size and configuration
+	 *    output offset fields
+	 * 5) Copy header section of configuration header to configuration
+	 *    input table
+	 * 6) Initialize signature, length, revision, id fields and sequence of
+	 *    configuration input table
+	 * Section 3 and Section 5 in provisioning document explain the fields
+	 * and have examples of pcd when a new request is created.
+	 */
+	memset(ch, 0, set_pcd_length);
+	/* copy over config header */
+	memcpy(ch, buf, sizeof(struct pcd_config_header));
+	/* copy over ccur */
+	if (ch->ccur_data_size > 0)
+		memcpy(((char *)ch + ch->ccur_offset),
+		       ((char *)buf + ch->ccur_offset), ch->ccur_data_size);
+	ch->header.revision = get_cfg_header_revision(get_pcat_rev());
+	ch->cin_data_size = sizeof(struct pcd_cin) + cin_length;
+	ch->cin_offset = calc_cin_offset(ch->ccur_offset, ch->ccur_data_size,
+					 cin_length);
+	ch->cout_data_size = 0;
+	ch->cout_offset = 0;
+	cin = get_cin(ch);
+	/* prefill cin with ch header */
+	memcpy(cin, ch, sizeof(struct acpi_header));
+	memcpy(&cin->header.signature, "CIN_", 4);
+	cin->header.length = sizeof(struct pcd_cin) + cin_length;
+	cin->header.revision = get_pcd_table_revision(ch->header.revision);
+	memcpy(&cin->header.asl_id, "PCDC", 4);
+	cin->header.asl_revision = 1;
+	/**
+	 * The sequence value is 1 if no configuration output table exists or
+	 * one more than sequence value of configuration output table
+	 */
+	cin->sequence = 1;
+	if (buf->cout_data_size > 0) {
+		cout = get_cout(buf);
+		cin->sequence = cout->sequence + 1;
+	}
+}
+
+/**
+ * This helper function provides the location after configuration input (cin)
+ * table. New tables listed in section 3.5 of provisioning document can
+ * be placed at this location.
+ */
+static void *get_cin_tables_start(struct pcd_config_header const *c)
+{
+	return (void *)(get_cin(c) + 1);
+}
+
+/**
+ * This helper function fills the fields of the partition size change table
+ * (psct). The structure and the fields are described in section 3.7 of the
+ * provisioning document.
+ */
+static void fill_psct(struct pcd_psct *psct, const u64 size)
+{
+	psct->type = PSCT_TYPE;
+	psct->length = sizeof(struct pcd_psct);
+	psct->status = 0;
+	psct->size = size;
+}
+
+/**
+ * Update the checksum fields in the configuration input and configuration
+ * header. These fields are described in Table 31 and Table 33 of provisioning
+ * document.
+ */
+static void finalize_pcd(struct pcd_config_header *ch)
+{
+	struct pcd_cin *cin = NULL;
+
+	cin = get_cin(ch);
+	cin->header.checksum = 0;
+	cin->header.checksum = acpi_checksum(cin, cin->header.length);
+	ch->header.checksum = 0;
+	ch->header.checksum = acpi_checksum(ch, ch->header.length);
+}
+
 /**
  * To determine if a reconfiguration request is pending we can look at
  * configuration input data size and sequence number fields.
@@ -69,9 +261,9 @@ static bool is_pcd_reconfigure_pending(struct pcd_config_header const *ch)
 }
 
 /**
- * Map the current configuration status value to human readable strings. The
- * current configuration status field is described in Section 3.2 and Table 32
- * in provisioning document
+ * Map the current configuration (ccur) status value to human readable strings.
+ * The current configuration status field is described in Section 3.2 and
+ * Table 32 in provisioning document
  */
 static const char *get_ccur_status_string(struct pcd_config_header const *ch)
 {
@@ -92,8 +284,8 @@ static const char *get_ccur_status_string(struct pcd_config_header const *ch)
 }
 
 /**
- * Get configuration status field from current configuration structure. The
- * field is described in Section 3.2 and Table 32 in provisioning document
+ * Get configuration status field from current configuration(ccur) structure.
+ * The field is described in Section 3.2 and Table 32 in provisioning document
  */
 static int get_pcd_ccur_status(struct pcd_config_header const *ch)
 {
@@ -232,6 +424,45 @@ static int read_pcd(struct ndctl_dimm *dimm, struct pcd_config_header **pcd,
 	return 0;
 }
 
+/**
+ * Given an ndctl dimm object, a pcd buffer and its length, write the pcd
+ * to the dimm using set pcd vendor specific command. The structure of command
+ * and its fields are described in section 4.2 and Table 45 in the provisioning
+ * specification document.
+ */
+static int write_pcd(struct ndctl_dimm *dimm, const char *buf, u32 buf_length)
+{
+	u32 op_code = 0;
+	char inp[PCD_SP_SIZE];
+	char op[PCD_SP_SIZE];
+	struct pcd_set_pcd_input *in = NULL;
+	int ret = 0;
+
+	memset(inp, 0, PCD_SP_SIZE);
+	memset(op, 0, PCD_SP_SIZE);
+	in = (struct pcd_set_pcd_input *)inp;
+	in->partition_id = 1;
+	in->payload_type = 1;
+	in->offset = 0;
+	op_code = cpu_to_be16(PCD_OPCODE_SET_PCD);
+	while (buf_length > 0) {
+		/**
+		 * Write PCD_WP_SIZE bytes of pcd in every iteration. The value
+		 * is mentioned in section 4.2 and Table 45 of provisioning
+		 * document.
+		 */
+		memcpy(in->data, buf + in->offset, PCD_WP_SIZE);
+		ret = execute_vendor_specific_cmd(dimm, op_code, inp,
+						  PCD_SP_SIZE, op, PCD_SP_SIZE);
+		if (ret)
+			return ret;
+		in->offset = in->offset + PCD_WP_SIZE;
+		buf_length =
+			buf_length < PCD_WP_SIZE ? 0 : buf_length - PCD_WP_SIZE;
+	}
+	return 0;
+}
+
 /**
  * Validate checksum field of configuration header.
  */
@@ -377,3 +608,146 @@ out:
 	free(buf);
 	return status;
 }
+
+/**
+ * Given a dimm object read its pcd and create a pcd structure with a
+ * configuration input table that requests creation of volatile region.
+ * To create the request, a new configuration input table and partition size
+ * change table are added to the pcd and the existing configuration output
+ * table is removed. Section 5.1 in provisioning document provides an example
+ * pcd when creating this request.
+ * PCD Partition ID 1 - Configuration management usage (64KB)
+ *
+ * +-------------+	+---------------------+    +---------------+
+ * |		 +----->|Current Configuration+--->|Extension table|
+ * |Configuration|	+---------------------+    +---------------+
+ * |	Header	 |	+---------------------+    +---------------------------+
+ * |		 +----->| Configuration Input +--->|Partition Size Change table|
+ * +-------------+	+---------------------+    +---------------------------+
+ */
+static int reconfigure_volatile(struct ndctl_dimm *dimm)
+{
+	struct pcd_config_header *buf = NULL;
+	u32 length = 0;
+	struct pcd_config_header *ch = NULL;
+	struct pcd_psct *psct = NULL;
+	int ret = 0;
+	u32 pcd_size = read_pcd_size(dimm);
+
+	/**
+	 * Here are the steps to create a 100% memory mode request
+	 * 1) Read current pcd
+	 * 2) Copy configuration header, current configuration tables to a new
+	 *    pcd
+	 * 3) Create the Configuration input table and partition size change
+	 *    table and add them to the pcd
+	 * 4) Write the new pcd.
+	 * The value of persistent memory partition size in the partition size
+	 * change table is set to 0 to indicate that all memory is to be used
+	 * in memory mode. Section 5.1 in provisioning document provides an
+	 * example pcd when creating this request.
+	 */
+	if (!pcd_size)
+		return -ENOTTY;
+	buf = (struct pcd_config_header *)calloc(pcd_size, sizeof(char));
+	if (!(buf))
+		return -ENOMEM;
+	ret = read_pcd(dimm, &buf, pcd_size);
+	if (ret != 0)
+		goto out;
+	/**
+	 * Create new partition size change table for memory mode. The length
+	 * of pcd to be written would depend on sum of read pcd size and length
+	 * of psct table for 100% memory mode request. The configuration input
+	 * table would take the same space as zeroed out configuration output
+	 * table from read pcd. Section 5.1 in provisioning document provides
+	 * an example pcd when creating this request.
+	 */
+	length = sizeof(struct pcd_psct);
+	pcd_size = set_pcd_length(pcd_size, length);
+	ch = malloc(pcd_size);
+	if (!ch) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	init_pcd(buf, pcd_size, length, ch);
+	psct = (struct pcd_psct *)get_cin_tables_start(ch);
+	fill_psct(psct, 0);
+	finalize_pcd(ch);
+	ret = write_pcd(dimm, (char *)ch, pcd_size);
+out:
+	free(buf);
+	free(ch);
+	return ret;
+}
+
+/**
+ * Given the mode option perform the region reconfiguration action on
+ * the dimm object
+ */
+static int do_reconfigure(struct ndctl_dimm *dimm, const char *mode)
+{
+	if (!mode)
+		return -EOPNOTSUPP;
+	if (strncmp(mode, "ram", 3) == 0)
+		return reconfigure_volatile(dimm);
+	return -EOPNOTSUPP;
+}
+
+/**
+ * Function to implement region reconfiguration command based on user
+ * options
+ */
+int cmd_reconfigure_region(int argc, const char **argv, struct ndctl_ctx *ctx)
+{
+	struct ndctl_bus *bus = NULL;
+	struct json_object *j_dimms = NULL;
+	u32 n_obj = 0;
+	int i, ret = 0;
+	char *usage = "ipmregion reconfigure-region [<options>]";
+	const char *const u[] = { usage, NULL };
+
+	argc = parse_options(argc, argv, reconfigure_options, u, 0);
+	if (param.verbose)
+		ndctl_set_log_priority(ctx, LOG_DEBUG);
+	for (i = 1; i < argc; i++)
+		fprintf(stderr, "unknown extra parameter \"%s\"\n", argv[i]);
+	if (argc > 1) {
+		usage_with_options(u, reconfigure_options);
+		return -EOPNOTSUPP;
+	}
+	ret = validate_pcat_rev();
+	if (ret) {
+		fprintf(stderr, "error: Invalid pcat revision %u\n",
+			get_pcat_rev());
+		return ret;
+	}
+	j_dimms = json_object_new_array();
+	if (!j_dimms)
+		return -ENOMEM;
+	ndctl_bus_foreach(ctx, bus) {
+		struct ndctl_dimm *dimm = NULL;
+
+		if (!util_bus_filter(bus, param.bus))
+			continue;
+		if (!ndctl_bus_has_nfit(bus))
+			continue;
+		ndctl_dimm_foreach(bus, dimm) {
+			struct json_object *j_dimm = NULL;
+
+			ret = do_reconfigure(dimm, param.mode);
+			if (ret) {
+				fprintf(stderr, "error: %s on dimm: %u\n",
+					strerror(-ret), n_obj);
+				return ret;
+			}
+			j_dimm = ipmregion_list_dimm_to_json(dimm);
+			if (j_dimm)
+				json_object_array_add(j_dimms, j_dimm);
+			n_obj++;
+		}
+	}
+	util_display_json_array(stdout, j_dimms, 0);
+	fprintf(stderr, "%u nmems reconfig submitted\n", n_obj);
+	return 0;
+}
-- 
2.20.1



* [PATCH v2 4/4] ipmregion/reconfigure: Add support for different pmem region modes
  2021-07-20 15:51 [PATCH v2 0/4] ndctl: Add ipmregion tool with ipmregion list and reconfigure-region commands James Anandraj
                   ` (2 preceding siblings ...)
  2021-07-20 15:51 ` [PATCH v2 3/4] ipmregion/reconfigure: Add ipmregion-reconfigure-region command James Anandraj
@ 2021-07-20 15:51 ` James Anandraj
  3 siblings, 0 replies; 5+ messages in thread
From: James Anandraj @ 2021-07-20 15:51 UTC (permalink / raw)
  To: nvdimm, james.sushanth.anandraj

From: James Sushanth Anandraj <james.sushanth.anandraj@intel.com>

Implement ipmregion-reconfigure-region fault-isolation-pmem and
performance-pmem support. This patch adds helper functions
to support processing of different pmem reconfigure-region
requests. The fault-isolation-pmem reconfigure-region request
results in regions that do not utilize hardware interleaving
across non-volatile memory devices. The performance-pmem request
results in regions that utilize hardware interleaving. The command
reads pcd data from the 'nvdimm' devices and writes a new pcd
reflecting the new region reconfiguration request.

Signed-off-by: James Sushanth Anandraj <james.sushanth.anandraj@intel.com>
---
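Note on the dimm handle decoding used for interleave set ordering: the patch
unpacks the handle through a packed bitfield union, whose layout is left to the
compiler. The block below is a minimal sketch, not part of the patch, of the
same decoding done with shifts and masks, together with the two ordering keys
described in the comparator comments. The names handle_fields, decode_handle,
iset_order_key and iset_order_key_parity are hypothetical, and the bit
positions are an assumption taken from the field order of struct nfit_handle
(dimm in bits 3:0, channel in 7:4, iMC in 11:8, socket in 15:12, node in
27:16).

```c
#include <stdint.h>

/* Hypothetical helper type, mirrors the fields of struct nfit_handle. */
struct handle_fields {
	uint8_t dimm;		/* assumed bits 3:0 */
	uint8_t mem_chnl;	/* assumed bits 7:4 */
	uint8_t mem_ctrlr;	/* assumed bits 11:8 */
	uint8_t socket_id;	/* assumed bits 15:12 */
	uint16_t node;		/* assumed bits 27:16 */
};

/* Unpack a 32-bit dimm handle without relying on bitfield layout. */
static struct handle_fields decode_handle(uint32_t handle)
{
	struct handle_fields f;

	f.dimm = handle & 0xf;
	f.mem_chnl = (handle >> 4) & 0xf;
	f.mem_ctrlr = (handle >> 8) & 0xf;
	f.socket_id = (handle >> 12) & 0xf;
	f.node = (handle >> 16) & 0xfff;
	return f;
}

/* Default ordering key: socket, then channel, then iMC. */
static uint32_t iset_order_key(uint32_t handle)
{
	struct handle_fields f = decode_handle(handle);

	return (uint32_t)f.socket_id << 8 | (uint32_t)f.mem_chnl << 4 |
	       f.mem_ctrlr;
}

/* Six-way ordering key: socket, then (channel + iMC) % 2, then channel. */
static uint32_t iset_order_key_parity(uint32_t handle)
{
	struct handle_fields f = decode_handle(handle);
	uint32_t parity = (uint32_t)(f.mem_chnl + f.mem_ctrlr) % 2;

	return (uint32_t)f.socket_id << 8 | parity << 4 | f.mem_chnl;
}
```

A qsort() comparator could compare two such keys with (a > b) - (a < b), which
avoids depending on unsigned subtraction being converted to int.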
 ipmregion/pcd.h         | 163 +++++++++
 ipmregion/reconfigure.c | 709 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 870 insertions(+), 2 deletions(-)
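
Note on sizes and the interleave ways field: the helpers added in
reconfigure.c convert the raw capacity to bytes, align it down to 1 GiB, take
the smallest aligned value across an interleave set, and map the module count
to a single bit. The block below is a standalone sketch of that arithmetic
under stated assumptions, not the code from the patch. The function names are
hypothetical, and the 4 KiB capacity unit is inferred only from the shift by
12 in raw_capacity_to_bytes().

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: raw capacity counts 4 KiB units (shift by 12 in the patch). */
static uint64_t capacity_units_to_bytes(uint32_t units)
{
	return (uint64_t)units << 12;
}

/* Round a byte count down to a multiple of 1 GiB. */
static uint64_t align_down_gib(uint64_t bytes)
{
	return bytes & ~((UINT64_C(1) << 30) - 1);
}

/* Map a module count to one set bit; 0 means the count is unsupported. */
static uint16_t interleave_ways_bit(uint32_t modules)
{
	static const uint32_t ways[] = { 1, 2, 3, 4, 6, 8, 12, 16, 24 };
	size_t i;

	for (i = 0; i < sizeof(ways) / sizeof(ways[0]); i++)
		if (ways[i] == modules)
			return (uint16_t)(1u << i);
	return 0;
}

/* Smallest 1 GiB aligned capacity across the modules of one set. */
static uint64_t iset_partition_size(const uint32_t *raw, size_t n)
{
	uint64_t best = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t sz = align_down_gib(capacity_units_to_bytes(raw[i]));

		if (i == 0 || sz < best)
			best = sz;
	}
	return best;
}
```

Using the smallest aligned capacity keeps every module's partition the same
size, which is what get_iset_psize() in the patch computes for a set.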

diff --git a/ipmregion/pcd.h b/ipmregion/pcd.h
index e16c1e0..dccc03e 100644
--- a/ipmregion/pcd.h
+++ b/ipmregion/pcd.h
@@ -26,6 +26,11 @@
  *	 │		+---------------------+    +---------------+
  *	 +------------->│Configuration Output +--->│Extension table│
  *			+---------------------+    +---------------+
+ *
+ * The extension table for Configuration Input is determined by the region
+ * reconfiguration request. See section 5 in provisioning document for examples
+ * for each request type.
+ *
  * Glossary
  * --------
  * PCD - Platform Configuration Data
@@ -34,6 +39,11 @@
  * CIN - Configuration Input
  * COUT - Configuration Output
  * PSCT - Partition Size Change Table
+ * IIT - Interleave Information Table
+ * MIIIS - Module Identification Information for Interleave Set
+ * MUI - Module Unique Identifier
+ * ML - Module location
+ * MPIO - Module partition information output
  */
 /**
  * struct pcd_config_header - configuration header
@@ -130,6 +140,136 @@ struct pcd_psct {
 	u32 status;
 	u64 size;
 } __attribute__((packed));
+/**
+ * struct pcd_mui - module unique identifier
+ * @id: subsystem vendor id
+ * @location: manufacturing location
+ * @date: manufacturing date
+ * @serial: serial number
+ * The module unique identifier is a field in module identification information
+ * for interleave set. The field and its sub-fields are described in section
+ * 3.8.1.1 Table 311 in the provisioning specification document.
+ */
+struct pcd_mui {
+	u16 id;
+	u8 location;
+	u16 date;
+	u32 serial;
+} __attribute__((packed));
+/**
+ * struct pcd_ml - module location
+ * @socket_id: module socket id
+ * @die_id: module die id
+ * @mem_controller_id: module memory controller id
+ * @channel_id: memory channel number
+ * @slot_id: dimm number
+ * @r: reserved
+ * The module location is a field in module identification information for
+ * interleave set. The field and its sub-fields are described in section
+ * 3.8.1.1 Table 311 in the provisioning specification document.
+ */
+struct pcd_ml {
+	u8 socket_id;
+	u8 die_id;
+	u8 mem_controller_id;
+	u8 channel_id;
+	u8 slot_id;
+	u8 r[3];
+} __attribute__((packed));
+/**
+ * struct pcd_miiis - module identification information for interleave set
+ * @mui: module unique identifier
+ * @ml: module location only used in pcat v1.2
+ * @r: reserved
+ * @offset: partition offset
+ * @size: partition size
+ * The module identification information for interleave set is used for
+ * identifying the modules that are present in the interleave set.
+ * The structure and fields of table are described in section 3.8.1.1 Table 310
+ * and Table 311 in the provisioning specification document.
+ */
+struct pcd_miiis {
+	struct pcd_mui mui;
+	struct pcd_ml ml;
+	u8 r[15];
+	u64 offset;
+	u64 size;
+} __attribute__((packed));
+/**
+ * struct pcd_iit - interleave information table (type 5)
+ * @type: table type
+ * @length: length in bytes for the entire table
+ * @index: interleave set index
+ * @modules: number of modules in interleave set
+ * @itype: interleave memory type
+ * @isize: interleave size
+ * @iways: interleave ways - this field is ignored for pcat v1.2
+ * @r1: reserved
+ * @status: interleave change status
+ * @r2: reserved
+ * The interleave information table describes an interleave set. The structure
+ * and fields of the table are described in section 3.8 Table 38 and 39 in the
+ * provisioning specification document.
+ */
+struct pcd_iit {
+	u16 type;
+	u16 length;
+	u16 index;
+	u8 modules;
+	u8 itype;
+	u16 isize;
+	u16 iways;
+	u8 r1;
+	u8 status;
+	u8 r2[10];
+} __attribute__((packed));
+/**
+ * struct nfit_handle - nfit handle
+ * @dimm: dimm number
+ * @mem_chnl: memory channel number
+ * @mem_ctrlr: memory controller id
+ * @socket_id: socket id
+ * @node: node id
+ * @r: reserved
+ * @value: unsigned value
+ * This structure is used to unpack the subfields of the dimm handle.
+ */
+struct nfit_handle {
+	union {
+		struct {
+			u8 dimm : 4;
+			u8 mem_chnl : 4;
+			u8 mem_ctrlr : 4;
+			u8 socket_id : 4;
+			u16 node : 12;
+			u8 r : 4;
+		} __attribute__((packed));
+		u32 value;
+	};
+} __attribute__((packed));
+/**
+ * struct pcd_mpio - module partition information output payload
+ * @volatile_capacity: volatile capacity
+ * @r1: reserved
+ * @volatile_start: volatile start location
+ * @persistent_capacity: persistent capacity
+ * @r2: reserved
+ * @persistent_start: persistent start location
+ * @raw_capacity: raw capacity
+ * The structure represents the output parameters of get module partition
+ * information vendor specific command. The structure and its fields are
+ * described in section 4.3 and Table 48 in the provisioning specification
+ * document.
+ */
+struct pcd_mpio {
+	u32 volatile_capacity;
+	u8 r1[4];
+	u64 volatile_start;
+	u32 persistent_capacity;
+	u8 r2[4];
+	u64 persistent_start;
+	u32 raw_capacity;
+} __attribute__((packed));
 /**
  * struct pcd_get_pcd_input - get pcd input
  * @partition_id: partition id
@@ -203,6 +343,12 @@ static const char *const pcd_status_str[] = {
  * in section 4.2 Table 45 of provisioning document.
  */
 #define PCD_OPCODE_SET_PCD ((u16)0x0701)
+/**
+ * The opcode value for get module partition information vendor specific
+ * command. The value is mentioned in section 4.3 Table 47 of provisioning
+ * document.
+ */
+#define PCD_OPCODE_GET_MPI ((u16)0x0602)
 /**
  * Defines for PCD revision values
  */
@@ -215,4 +361,21 @@ static const char *const pcd_status_str[] = {
  * mentioned in Table 37 of provisioning document.
  */
 #define PSCT_TYPE ((u16)0x4)
+/**
+ * Define for header type value for Interleave Information Table. The value is
+ * mentioned in Table 38 of provisioning document.
+ */
+#define IIT_TYPE ((u16)0x5)
+/**
+ * Define for interleave memory type field app direct mode value in Interleave
+ * information table. The value is mentioned in Table 38 of provisioning
+ * document.
+ */
+#define APP_DIRECT_MODE ((u8)0x2)
+/**
+ * Define for interleave size field value in Interleave information table.
+ * The value is mentioned in Table 38 of provisioning document.
+ */
+#define INTERLEAVE_SIZE ((u16)0x4040)
+
 #endif /* _PCD_H_ */
diff --git a/ipmregion/reconfigure.c b/ipmregion/reconfigure.c
index 0a7fd3c..08543e8 100644
--- a/ipmregion/reconfigure.c
+++ b/ipmregion/reconfigure.c
@@ -9,6 +9,7 @@
 #include <util/parse-options.h>
 #include <util/json.h>
 #include <util/filter.h>
+#include <util/bitmap.h>
 #include <json-c/json.h>
 #include <pcat.h>
 #include <list.h>
@@ -25,6 +26,20 @@ static const struct option reconfigure_options[] = {
 	OPT_END()
 };
 
+/**
+ * Global interleave set array pointer. This stores the different
+ * interleave sets for the system. The interleave sets are ordered by socket
+ * id of the dimms in the interleave set. Within a set the dimms are ordered
+ * as they would be in the module identification information field. See Section
+ * 3.8 Table 38 in provisioning document for more information on this field.
+ */
+static struct ndctl_dimm **iset_list;
+/**
+ * Global variable to store number of dimms in the interleave set array.
+ * It should be equal to the number of dimms in the system.
+ */
+static u32 iset_count;
+
 /**
  * Return the Configuration Header revision based on pcat revision.
  * 0.1: Used with PCAT revision 0.2
@@ -216,6 +231,125 @@ static void fill_psct(struct pcd_psct *psct, const u64 size)
 	psct->size = size;
 }
 
+/**
+ * The first module identification information for interleave set structure
+ * comes immediately after the location of interleave information table
+ * in PCD. These structures are explained in Section 3.8 Table 38, 39 and
+ * Section 3.8.1.1 Table 310, 311 of the provisioning document
+ */
+static struct pcd_miiis *get_miiis(struct pcd_iit const *i)
+{
+	return (struct pcd_miiis *)(i + 1);
+}
+
+/**
+ * Helper function to convert raw capacity to bytes. The shift by 12 treats the
+ * input as a count of 4 KiB units. The partition size change table needs the
+ * size in bytes. See section 3.7 Table 37 of the provisioning document.
+ */
+static u64 raw_capacity_to_bytes(u32 raw_capacity)
+{
+	u64 capacity = raw_capacity;
+
+	return capacity << 12;
+}
+
+/**
+ * Helper function to align byte to 1-GiB. The partition size field in module
+ * identification information for Interleave set needs to be filled in as bytes
+ * aligned to 1-GiB size. See section 3.8.1.1 Table 310 and 311 of the
+ * provisioning document.
+ */
+static inline u64 align_to_gb(u64 byte)
+{
+	return (byte >> 30) << 30;
+}
+
+/**
+ * Helper function to get the Interleave ways field value of Interleave
+ * information table, given number of modules in interleave set. See section
+ * 3.8 Table 38 of the provisioning document.
+ */
+static u32 get_interleave_ways(u32 m_count)
+{
+	u32 i;
+	/**
+	 * The array has the number of modules in the interleave set given an
+	 * index position. The index position is used to determine the bit to
+	 * be set for the interleave ways field. See section 3.8 Table 38 of
+	 * the provisioning document.
+	 */
+	u32 i_ways[] = { 1, 2, 3, 4, 6, 8, 12, 16, 24 };
+
+	/**
+	 * Search the array to find the index position the number of modules
+	 * is at. Return BIT(index) as the interleave ways field value. See
+	 * section 3.8 Table 38 of the provisioning document.
+	 */
+	for (i = 0; i < 9; i++)
+		if (i_ways[i] == m_count)
+			return BIT(i);
+	return 0;
+}
+
+/**
+ * This helper function fills the fields of the interleave information table
+ * (iit). The structure and the fields are described in section 3.8 Table 38
+ * of the provisioning document.
+ */
+static void fill_iit(struct pcd_iit *iit, u8 modules, u16 index)
+{
+	iit->type = IIT_TYPE;
+	iit->length =
+		sizeof(struct pcd_iit) + (modules * sizeof(struct pcd_miiis));
+	iit->index = index;
+	iit->modules = modules;
+	iit->itype = APP_DIRECT_MODE;
+	iit->isize = INTERLEAVE_SIZE;
+	/**
+	 * This field is only present in 0.2 version of the interleave
+	 * information table. In version 1.2 the field becomes reserved.
+	 * See section 3.8 Table 38 of the provisioning document
+	 */
+	iit->iways =
+		get_pcat_rev() < PCD_REV_1_0 ? get_interleave_ways(modules) : 0;
+}
+
+/**
+ * This helper function fills the fields of the module identification
+ * information for interleave set table (miiis). The structure and the
+ * fields are described in section 3.8.1.1 Table 310, 311
+ * of the provisioning document.
+ */
+static void fill_miiis(struct ndctl_dimm *dimm, struct pcd_miiis *m,
+		       u8 revision, u64 size)
+{
+	struct nfit_handle n_handle;
+
+	m->mui.id = ndctl_dimm_get_vendor(dimm);
+	m->mui.id = cpu_to_be16(m->mui.id);
+	m->mui.location = ndctl_dimm_get_manufacturing_location(dimm);
+	m->mui.date = ndctl_dimm_get_manufacturing_date(dimm);
+	m->mui.date = cpu_to_be16(m->mui.date);
+	m->mui.serial = ndctl_dimm_get_serial(dimm);
+	m->mui.serial = cpu_to_be32(m->mui.serial);
+	m->offset = 0;
+	m->size = size;
+	n_handle.value = ndctl_dimm_get_handle(dimm);
+	/**
+	 * The module location field only exists in v1.2 of the table.
+	 * In v0.2 the field is reserved. See section 3.8.1.1 Table 311
+	 * of the provisioning document.
+	 */
+	if (revision > PCD_REV_1_0) {
+		m->ml.socket_id = n_handle.socket_id;
+		m->ml.die_id = 0;
+		m->ml.mem_controller_id = n_handle.mem_ctrlr;
+		m->ml.channel_id = n_handle.mem_chnl;
+		m->ml.slot_id = n_handle.dimm;
+	}
+}
+
 /**
  * Update the checksum fields in the configuration input and configuration
  * header. These fields are described in Table 31 and Table 33 of provisioning
@@ -463,6 +597,30 @@ static int write_pcd(struct ndctl_dimm *dimm, const char *buf, u32 buf_length)
 	return 0;
 }
 
+/**
+ * Given the ndctl dimm object obtain the module partition information
+ * output structure by using a vendor specific command. See section 4.3
+ * Table 47 and 48 for more information on the command and the structure
+ * and field information of command input and output.
+ */
+static int get_module_partition_info(struct ndctl_dimm *dimm,
+				     struct pcd_mpio *buf)
+{
+	u32 op_code = 0;
+	char inp[PCD_SP_SIZE];
+	char op[PCD_SP_SIZE];
+
+	memset(buf, 0, sizeof(struct pcd_mpio));
+	memset(inp, 0, PCD_SP_SIZE);
+	memset(op, 0, PCD_SP_SIZE);
+	op_code = cpu_to_be16(PCD_OPCODE_GET_MPI);
+	if (execute_vendor_specific_cmd(dimm, op_code, inp, PCD_SP_SIZE, op,
+					PCD_SP_SIZE) != 0)
+		return -ENOTTY;
+	memcpy(buf, op, sizeof(struct pcd_mpio));
+	return 0;
+}
+
 /**
  * Validate checksum field of configuration header.
  */
@@ -609,6 +767,323 @@ out:
 	return status;
 }
 
+static u8 get_socket_id(struct ndctl_dimm *dimm)
+{
+	struct nfit_handle n_handle;
+
+	n_handle.value = ndctl_dimm_get_handle(dimm);
+	return n_handle.socket_id;
+}
+
+static int compare_dimm_socket(const void *p1, const void *p2)
+{
+	struct ndctl_dimm *dimm1 = *(struct ndctl_dimm **)p1;
+	struct ndctl_dimm *dimm2 = *(struct ndctl_dimm **)p2;
+	u8 sid1 = get_socket_id(dimm1);
+	u8 sid2 = get_socket_id(dimm2);
+
+	return sid1 - sid2;
+}
+
+static int compare_dimm_iset_parity(const void *p1, const void *p2)
+{
+	struct ndctl_dimm *dimm1 = *(struct ndctl_dimm **)p1;
+	struct ndctl_dimm *dimm2 = *(struct ndctl_dimm **)p2;
+	struct nfit_handle n_handle;
+	u32 order1 = 0;
+	u32 order2 = 0;
+
+	n_handle.value = ndctl_dimm_get_handle(dimm1);
+	/**
+	 * For six-way interleave sets (which can occur in sockets that have
+	 * two iMCs, each with three channels), the order is determined as
+	 * follows: Modules are first ordered by
+	 * “(channel number + iMC number) modulus 2”
+	 * and then ordered by channel number. See section 3.8 Table 38 of the
+	 * provisioning document.
+	 */
+	order1 = n_handle.socket_id << 8 |
+		 (n_handle.mem_chnl + n_handle.mem_ctrlr) % 2 << 4 |
+		 n_handle.mem_chnl;
+	n_handle.value = ndctl_dimm_get_handle(dimm2);
+	order2 = n_handle.socket_id << 8 |
+		 (n_handle.mem_chnl + n_handle.mem_ctrlr) % 2 << 4 |
+		 n_handle.mem_chnl;
+	return order1 - order2;
+}
+
+static int compare_dimm_iset(const void *p1, const void *p2)
+{
+	struct ndctl_dimm *dimm1 = *(struct ndctl_dimm **)p1;
+	struct ndctl_dimm *dimm2 = *(struct ndctl_dimm **)p2;
+	struct nfit_handle n_handle;
+	u32 order1 = 0;
+	u32 order2 = 0;
+
+	n_handle.value = ndctl_dimm_get_handle(dimm1);
+	/**
+	 * In interleave set the modules are first ordered by channel number
+	 * and then ordered by iMC number. See section 3.8 Table 38 of the
+	 * provisioning document.
+	 */
+	order1 = n_handle.socket_id << 8 | n_handle.mem_chnl << 4 |
+		 n_handle.mem_ctrlr;
+	n_handle.value = ndctl_dimm_get_handle(dimm2);
+	order2 = n_handle.socket_id << 8 | n_handle.mem_chnl << 4 |
+		 n_handle.mem_ctrlr;
+	return order1 - order2;
+}
+
+/**
+ * Helper function to rearrange dimms in the global interleave set dimm array
+ * with the same socket id into the order in which they would be in a
+ * module identification information field. See Section 3.8 Table 38 in
+ * provisioning document for more information on this field.
+ */
+static void rearrange_iset(u32 loc)
+{
+	static u8 tsocket;
+	static u32 dcount;
+	u8 csocket = 0;
+
+	/**
+	 * Here are the steps
+	 * 1) Check if there is dimm in the location
+	 * 2) If first location store the socket id
+	 * 3) If the socket id is same as stored socket id increment dcount
+	 * 4) If new socket id is seen one interleave set is complete and
+	 *    can be sorted
+	 * 5) Reset dcount and stored socket id for next interleave set
+	 */
+	if (iset_list[loc]) {
+		csocket = get_socket_id(iset_list[loc]);
+		if (loc == 0)
+			tsocket = csocket;
+		if (tsocket == csocket)
+			dcount++;
+	}
+	if (!iset_list[loc] || csocket != tsocket) {
+		/* All devices within the socket have been found */
+		if (dcount == 6)
+			qsort(&iset_list[loc - dcount], dcount,
+			      sizeof(struct ndctl_dimm *),
+			      compare_dimm_iset_parity);
+		else
+			qsort(&iset_list[loc - dcount], dcount,
+			      sizeof(struct ndctl_dimm *), compare_dimm_iset);
+		dcount = 1;
+		tsocket = csocket;
+	}
+}
+
+/**
+ * Helper function to initialize the global interleave set dimm array
+ * with dimms sorted first in increasing order of socket id. Dimms with same
+ * socket id belong to same interleave set. Within each interleave set the
+ * dimms are stored in the order in which they would be in a
+ * module identification information field. See Section 3.8 Table 38 in
+ * provisioning document for more information on this field.
+ */
+static int initialize_adi(struct ndctl_dimm *dimm, u32 adcount)
+{
+	struct ndctl_bus *const bus = ndctl_dimm_get_bus((void *)dimm);
+	u32 i = 0;
+
+	/**
+	 * Here are the steps
+	 * 1) Do initialization, if this is the first dimm.
+	 * 2) Count number of dimms
+	 * 3) Create a global array with size number of dimms + 1
+	 * 4) Copy dimms into the global array and assign null to last element
+	 * 5) Sort the dimms by socket id to create interleave sets with dimms
+	 *    having the same socket id
+	 * 6) Run the helper function over all dimms to identify interleave sets
+	 *    and sort the dimms within an interleave set in order in which
+	 *    they would be in a module identification information field.
+	 *    See Section 3.8 Table 38 in provisioning document.
+	 */
+	if (adcount != 0)
+		return 0;
+	dimm = NULL;
+	ndctl_dimm_foreach((void *)bus, dimm) {
+		iset_count++;
+	}
+	iset_list = malloc((iset_count + 1) * sizeof(dimm));
+	if (!iset_list)
+		return -ENOMEM;
+	i = 0;
+	ndctl_dimm_foreach((void *)bus, dimm) {
+		iset_list[i++] = dimm;
+	}
+	iset_list[i] = NULL;
+	qsort(iset_list, iset_count, sizeof(dimm), compare_dimm_socket);
+	for (i = 0; i < iset_count + 1; i++)
+		rearrange_iset(i);
+	return 0;
+}
+
+/**
+ * Given an iset location, socket id of dimm, and previously returned
+ * interleave set dimm, see if the dimm in iset location can be returned
+ * as the next dimm in the interleave set
+ */
+static int check_next_iset_dimm(u32 loc, u8 sid, struct ndctl_dimm **b)
+{
+	struct ndctl_dimm *d = (void *)iset_list[loc];
+
+	/**
+	 * Here are the steps
+	 * 1) If previous returned dimm is null, return dimm at current
+	 *    iset location as the next dimm in interleave set.
+	 * 2) If previous returned dimm is at current iset location, check to
+	 *    see if dimm in next iset location exists and belongs to same
+	 *    iset (same socket id), if so return the dimm in loc + 1 as the
+	 *    next dimm in interleave set
+	 * 3) If dimm at loc + 1 does not exist or has different socket id. The
+	 *    end of interleave set is reached and return null as next dimm.
+	 * See Section 3.8 Table 38 in provisioning document for ordering an
+	 * interleave set.
+	 */
+	if (!(*b)) {
+		/* inp is null, return first elem */
+		*b = d;
+		return 0;
+	} else if (*b == d) {
+		/* reached last returned elem */
+		struct ndctl_dimm *dn = iset_list[loc + 1];
+
+		if (!dn) {
+			/* No more elements in isetlist */
+			*b = NULL;
+			return 0;
+		} else if (dn) {
+			/* Next element in isetlist present */
+			u8 isid = get_socket_id(dn);
+
+			if (isid != sid) {
+				/* No next element to return */
+				*b = NULL;
+				return 0;
+			} else if (isid == sid) {
+				/* Next element is same sid */
+				*b = dn;
+				return 0;
+			}
+		}
+	}
+	return -1;
+}
+
+/**
+ * Given a dimm object and a dimm in the interleave set b, get the
+ * next dimm in the same interleave set after b. If next dimm does not exist
+ * or belongs to another interleave set return null. If b is null then
+ * the first dimm in interleave set to which dimm object belongs is returned.
+ */
+static int get_next_iset_dimm(struct ndctl_dimm *dimm, struct ndctl_dimm **b)
+{
+	u8 sid = 0;
+	struct ndctl_dimm *di = NULL;
+	u32 i = 0;
+	int ret = 0;
+
+	/**
+	 * Here are the steps
+	 * 1) Get socket id of dimm
+	 * 2) Loop through dimms in global interleave set array till dimm of
+	 *    same socket id can be found.
+	 * 3) For each dimm of same socket id, call check_next_iset_dimm
+	 *    to see if the next dimm in global interleave set array should
+	 *    be returned as the next dimm in interleave set after dimm b
+	 * See Section 3.8 Table 38 in provisioning document for ordering an
+	 * interleave set.
+	 */
+	di = (void *)iset_list[0];
+	if (!b || !dimm || !di)
+		return -ENOTTY;
+	sid = get_socket_id(dimm);
+	while (di) {
+		u8 isid = 0;
+
+		isid = get_socket_id(di);
+		if (sid == isid) {
+			ret = check_next_iset_dimm(i, sid, b);
+			if (!ret)
+				return ret;
+		}
+		di = (void *)iset_list[++i];
+	}
+	*b = NULL;
+	return -1;
+}
+
+/**
+ * Given a dimm object, get the number of dimms in the interleave set to which
+ * it belongs. See Section 3.8 Table 38 in provisioning document for ordering
+ * an interleave set.
+ */
+static int get_iset_dimm_count(struct ndctl_dimm *dimm, u32 *count)
+{
+	struct ndctl_dimm *d = NULL;
+	int ret = -1;
+
+	if (!count)
+		return ret;
+	*count = 0;
+	/**
+	 * Iterate through the dimms in the interleave set and keep count
+	 * of dimms. If next dimm in interleave set is NULL, then return count
+	 */
+	while (!(ret = get_next_iset_dimm(dimm, &d))) {
+		if (!d)
+			break;
+		*count = *count + 1;
+	}
+	return ret;
+}
+
+/**
+ * Given a dimm object calculate partition size field in module identification
+ * information for interleave set table. See section 3.8.1.1 Table 310 and 311
+ * of provisioning document.
+ */
+static u64 get_iset_psize(struct ndctl_dimm *dimm)
+{
+	struct ndctl_dimm *d = NULL;
+	int count = 0;
+	struct pcd_mpio mpio;
+	u64 i_size = 0;
+	u64 r_size = 0;
+	int ret = 0;
+
+	/**
+	 * Iterate over dimms in the same interleave set as dimm object and
+	 * return the lowest raw capacity aligned to 1 GiB
+	 */
+	while (!(ret = get_next_iset_dimm(dimm, &d))) {
+		if (!d)
+			break;
+		if (get_module_partition_info(d, &mpio) != 0)
+			break;
+		i_size = align_to_gb(raw_capacity_to_bytes(mpio.raw_capacity));
+		r_size = (count == 0) ? i_size : min(r_size, i_size);
+		count++;
+	}
+	return r_size;
+}
+
+/**
+ * Free the global interleave set array. Using the passed in count, free is
+ * called only after all dimms are processed for the reconfigure request.
+ */
+static void deinitialize_adi(u32 ad_count)
+{
+	if ((ad_count == iset_count || ad_count == 0) && iset_list) {
+		free(iset_list);
+		iset_list = NULL;
+	}
+}
+
 /**
  * Given a dimm object read its pcd and create a pcd structure with a
  * configuration input table that requests creation of volatile region.
@@ -681,6 +1156,234 @@ out:
 	return ret;
 }
 
+/**
+ * Given a dimm object read its pcd and create a pcd structure with a
+ * configuration input table that requests creation of
+ * non-interleaved region.
+ * To create the request a new Configuration input table, Partition size
+ * change table, interleave information table and module identification
+ * information for interleave set table are added to the pcd. The existing
+ * configuration output table is removed.
+ * PCD Partition ID 1 - Configuration management usage (64KB)
+ *
+ * +-------------+	+---------------------+    +---------------+
+ * |		 +----->│Current Configuration+--->│Extension table│
+ * |Configuration|	+---------------------+    +---------------+
+ * │	Header	 |	+---------------------+    +--------------------------+
+ * │		 +----->│ Configuration Input +--->│ Request Extension tables │
+ * +-------------+	+---------------------+    +--------------------------+
+ *
+ * The request extension tables for non-interleaved regions are
+ * 1) Partition size change table (PSCT)
+ * 2) Interleave information table (IIT)
+ * 3) Module identification information for interleave set (MIIIS)
+ * These tables are placed one after another. Section 5.2 in provisioning
+ * document provides an example pcd when creating this request.
+ */
+static int reconfigure_adni(struct ndctl_dimm *dimm)
+{
+	struct pcd_config_header *buf = NULL;
+	u32 length = 0;
+	struct pcd_mpio mpio;
+	struct pcd_config_header *ch = NULL;
+	struct pcd_psct *psct = NULL;
+	struct pcd_iit *iit = NULL;
+	struct pcd_miiis *m = NULL;
+	static u32 isi;
+	int ret = 0;
+	u64 msize = 0;
+	u32 pcd_size = read_pcd_size(dimm);
+
+	/**
+	 * Here are the steps to create a non-interleaved region request
+	 * 1) Read current pcd and module partition information
+	 * 2) Copy configuration header, current configuration tables to a new
+	 *    pcd
+	 * 3) Create the Configuration input table, partition size change
+	 *    table, interleave information table, module identification
+	 *    information for interleave set and add them to the pcd
+	 * 4) Write the new pcd.
+	 * The value of persistent memory partition size in the partition size
+	 * change table is set to raw capacity of the dimm to indicate that all
+	 * memory is to be used for interleave set. The number of modules in
+	 * interleave set field of interleave information table is set to 1 to
+	 * indicate there is no interleaving. Section 5.2 in provisioning
+	 * document provides an example pcd when creating this request.
+	 */
+	if (!pcd_size)
+		goto out;
+	buf = (struct pcd_config_header *)calloc(pcd_size, sizeof(char));
+	if (!buf)
+		return -ENOMEM;
+	ret = read_pcd(dimm, &buf, pcd_size);
+	if (ret != 0)
+		goto out;
+	/**
+	 * For a non-interleaved request, the length of the pcd to be written
+	 * is the sum of
+	 * 1) read pcd size
+	 * 2) length of psct table
+	 * 3) length of iit table
+	 * 4) length of one miiis
+	 * The configuration input table would take the same space as zeroed
+	 * out configuration output table from read pcd. Section 5.2 in
+	 * provisioning document provides an example pcd when creating
+	 * this request.
+	 */
+	length = sizeof(struct pcd_psct) + sizeof(struct pcd_iit) +
+		 sizeof(struct pcd_miiis);
+	pcd_size = set_pcd_length(pcd_size, length);
+	ret = get_module_partition_info(dimm, &mpio);
+	if (ret != 0)
+		goto out;
+	ch = malloc(pcd_size);
+	if (!ch) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	init_pcd(buf, pcd_size, length, ch);
+	psct = (struct pcd_psct *)get_cin_tables_start(ch);
+	fill_psct(psct, raw_capacity_to_bytes(mpio.raw_capacity));
+	iit = (struct pcd_iit *)(psct + 1);
+	fill_iit(iit, 1, isi++);
+	m = get_miiis(iit);
+	msize = align_to_gb(raw_capacity_to_bytes(mpio.raw_capacity));
+	fill_miiis(dimm, m, ch->header.revision, msize);
+	finalize_pcd(ch);
+	ret = write_pcd(dimm, (char *)ch, pcd_size);
+out:
+	free(ch);
+	free(buf);
+	return ret;
+}
+
+/**
+ * Given a dimm object read its pcd and create a pcd structure with a
+ * configuration input table that requests creation of
+ * interleaved region.
+ * To create the request a new Configuration input table, Partition size
+ * change table, interleave information table and module identification
+ * information for interleave set tables for each dimm in the interleave set
+ * are added to the pcd. The existing configuration output table is removed.
+ * PCD Partition ID 1 - Configuration management usage (64KB)
+ *
+ * +-------------+	+---------------------+    +---------------+
+ * |		 +----->│Current Configuration+--->│Extension table│
+ * |Configuration|	+---------------------+    +---------------+
+ * │	Header	 |	+---------------------+    +--------------------------+
+ * │		 +----->│ Configuration Input +--->│ Request Extension tables │
+ * +-------------+	+---------------------+    +--------------------------+
+ *
+ * The request extension tables for interleaved regions are
+ * 1) Partition size change table (PSCT)
+ * 2) Interleave information table (IIT)
+ * 3) N * Module identification information for interleave set (MIIIS). N is
+ *    number of dimms in the same interleave set as dimm object.
+ * Here an interleave set is all dimms with same socket id as dimm object.
+ * These tables are placed one after another. Section 5.3 in provisioning
+ * document provides an example pcd when creating this request.
+ */
+static int reconfigure_ad(struct ndctl_dimm *dimm)
+{
+	struct pcd_config_header *buf = NULL;
+	static u32 ad_count;
+	u32 mii_count = 0;
+	u32 length = 0;
+	struct pcd_mpio mpio;
+	int ret = 0;
+	struct ndctl_dimm *idimm = NULL;
+	struct pcd_config_header *ch = NULL;
+	struct pcd_psct *psct = NULL;
+	struct pcd_iit *iit = NULL;
+	struct pcd_miiis *m = NULL;
+	u64 msize = 0;
+	u32 pcd_size = read_pcd_size(dimm);
+
+	/**
+	 * Here are the steps to create an interleave request:
+	 * 1) Create interleave sets for the system if not already created.
+	 * 2) Read the current pcd and module partition information.
+	 * 3) Copy the configuration header and current configuration tables
+	 *    to a new pcd.
+	 * 4) Create the configuration input table, partition size change
+	 *    table and interleave information table.
+	 * 5) Identify the other dimms in the same interleave set and for each
+	 *    dimm create a module identification information for interleave
+	 *    set table and add all of them to interleave information table.
+	 * 6) Add all the new tables to the new pcd.
+	 * 7) Write the new pcd.
+	 * The value of persistent memory partition size in the partition size
+	 * change table is set to the raw capacity of the dimm to indicate that
+	 * all memory is to be used for the interleave set. The number of
+	 * modules in interleave set field of the interleave information table
+	 * is set to the number of dimms in the interleave set to indicate that
+	 * there is interleaving. Section 5.3 in the provisioning
+	 * document provides an example pcd when creating this request.
+	 */
+	ret = initialize_adi(dimm, ad_count++);
+	if (ret != 0)
+		goto out;
+	ret = get_iset_dimm_count(dimm, &mii_count);
+	if (ret != 0)
+		goto out;
+	ret = get_next_iset_dimm(dimm, &idimm);
+	if (ret != 0)
+		goto out;
+	ret = get_module_partition_info(dimm, &mpio);
+	if (ret != 0)
+		goto out;
+	ret = pcd_size ? -ENOMEM : -ENXIO; /* never return 0 on failure */
+	if (pcd_size)
+		buf = calloc(pcd_size, sizeof(char));
+	if (!buf)
+		goto out;
+	ret = read_pcd(dimm, &buf, pcd_size);
+	if (ret != 0)
+		goto out;
+	/**
+	 * For an interleave request, the length of the pcd table
+	 * to be written is the sum of
+	 * 1) the size of the pcd that was read
+	 * 2) length of psct table
+	 * 3) length of iit table
+	 * 4) length of miiis * number of dimms in interleave set
+	 * The configuration input table takes the same space as the zeroed
+	 * out configuration output table from the read pcd. Section 5.3 in
+	 * the provisioning document provides an example pcd when creating
+	 * this request.
+	 */
+	length = sizeof(struct pcd_psct) + sizeof(struct pcd_iit) +
+		 (mii_count * sizeof(struct pcd_miiis));
+	pcd_size = set_pcd_length(pcd_size, length);
+	ch = malloc(pcd_size);
+	if (!ch) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	init_pcd(buf, pcd_size, length, ch);
+	psct = (struct pcd_psct *)get_cin_tables_start(ch);
+	fill_psct(psct, raw_capacity_to_bytes(mpio.raw_capacity));
+	iit = (struct pcd_iit *)(psct + 1);
+	fill_iit(iit, mii_count, get_socket_id(dimm) + 1);
+	m = get_miiis(iit);
+	msize = get_iset_psize(dimm);
+	while (idimm) {
+		fill_miiis(idimm, m, ch->header.revision, msize);
+		get_next_iset_dimm(dimm, &idimm);
+		m = m + 1;
+	}
+	finalize_pcd(ch);
+	ret = write_pcd(dimm, (char *)ch, pcd_size);
+out:
+	free(ch);
+	free(buf);
+	/* Force deinitialization if returning error */
+	if (ret != 0)
+		ad_count = 0;
+	deinitialize_adi(ad_count);
+	return ret;
+}
+
 /**
  * Given the mode option perform the region reconfiguration action on
  * the dimm object
@@ -688,10 +1391,12 @@ out:
 static int do_reconfigure(struct ndctl_dimm *dimm, const char *mode)
 {
 	if (!mode)
-		return -EOPNOTSUPP;
+		return reconfigure_ad(dimm);
 	if (strncmp(mode, "ram", 3) == 0)
 		return reconfigure_volatile(dimm);
-	return -EOPNOTSUPP;
+	if (strncmp(mode, "fault-isolation-pmem", 20) == 0)
+		return reconfigure_adni(dimm);
+	return reconfigure_ad(dimm);
 }
 
 /**
-- 
2.20.1

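For readers following the length calculation in reconfigure_ad(), the
arithmetic for the request extension tables (PSCT + IIT + N * MIIIS) can be
sketched on its own. This is only an illustration under stated assumptions:
struct ex_table_sizes, ex_request_length() and ex_miiis_offset() are
hypothetical names that do not exist in the patch, the table sizes are passed
in rather than taken from struct pcd_psct, struct pcd_iit and struct pcd_miiis,
and the MIIIS entries are assumed to start immediately after the IIT, as the
comment "placed one after another" suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte sizes of the three request extension tables. */
struct ex_table_sizes {
	size_t psct;	/* partition size change table */
	size_t iit;	/* interleave information table, without MIIIS entries */
	size_t miiis;	/* one module identification entry */
};

/*
 * Length of PSCT + IIT + dimm_count * MIIIS in bytes.
 * Returns 0 if the sum would overflow size_t.
 */
static size_t ex_request_length(const struct ex_table_sizes *s,
				uint32_t dimm_count)
{
	size_t fixed;

	assert(s != NULL);
	if (s->psct > SIZE_MAX - s->iit)
		return 0;
	fixed = s->psct + s->iit;
	if (s->miiis != 0 && dimm_count > (SIZE_MAX - fixed) / s->miiis)
		return 0;
	return fixed + (size_t)dimm_count * s->miiis;
}

/* Byte offset of the index-th MIIIS entry, counted from the PSCT start. */
static size_t ex_miiis_offset(const struct ex_table_sizes *s, uint32_t index)
{
	assert(s != NULL);
	return s->psct + s->iit + (size_t)index * s->miiis;
}
```

With placeholder sizes of 16, 24 and 24 bytes and four dimms in the set, the
request tables would occupy 16 + 24 + 4 * 24 bytes, and the third MIIIS entry
would start 88 bytes after the PSCT. The real sizes depend on the structure
definitions elsewhere in this series.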

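A note on the mode handling in do_reconfigure(): because the comparison uses
strncmp() with a fixed length, any string that merely begins with "ram" takes
the volatile path, and any unrecognised string falls through to
reconfigure_ad(). Below is a minimal sketch of a stricter alternative that
matches whole strings and reports unknown modes. It is not what the patch
does. enum ex_mode, ex_parse_mode() and ex_handler_name() are hypothetical
names, "performance-pmem" is taken from the cover letter, and mapping it (and
a NULL mode) to reconfigure_ad() is an assumption based on the fall-through in
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed form of the reconfigure-region mode option. */
enum ex_mode {
	EX_MODE_INVALID = -1,
	EX_MODE_RAM,
	EX_MODE_FAULT_ISOLATION_PMEM,
	EX_MODE_PERFORMANCE_PMEM,
};

/* Match the whole string; a NULL mode selects the default, as in the patch. */
static enum ex_mode ex_parse_mode(const char *mode)
{
	if (!mode)
		return EX_MODE_PERFORMANCE_PMEM;
	if (strcmp(mode, "ram") == 0)
		return EX_MODE_RAM;
	if (strcmp(mode, "fault-isolation-pmem") == 0)
		return EX_MODE_FAULT_ISOLATION_PMEM;
	if (strcmp(mode, "performance-pmem") == 0)
		return EX_MODE_PERFORMANCE_PMEM;
	return EX_MODE_INVALID;
}

/* Name of the patch function that would handle a parsed mode, or NULL. */
static const char *ex_handler_name(enum ex_mode m)
{
	switch (m) {
	case EX_MODE_RAM:
		return "reconfigure_volatile";
	case EX_MODE_FAULT_ISOLATION_PMEM:
		return "reconfigure_adni";
	case EX_MODE_PERFORMANCE_PMEM:
		return "reconfigure_ad";
	default:
		assert(m == EX_MODE_INVALID);
		return NULL;
	}
}
```

With this sketch a string such as "ramdisk" is rejected instead of selecting
the volatile path. Whether that is desirable depends on how the option parser
validates the mode before do_reconfigure() is reached, which is not visible in
this hunk.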