From: Dan Williams <dan.j.williams@intel.com>
To: "Alastair D'Silva" <alastair@d-silva.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
	"Paul Mackerras" <paulus@samba.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Frederic Barrat" <fbarrat@linux.ibm.com>,
	"Andrew Donnellan" <ajd@linux.ibm.com>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Mauro Carvalho Chehab" <mchehab+samsung@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Rob Herring" <robh@kernel.org>,
	"Anton Blanchard" <anton@ozlabs.org>,
	"Krzysztof Kozlowski" <krzk@kernel.org>,
	"Mahesh Salgaonkar" <mahesh@linux.vnet.ibm.com>,
	"Madhavan Srinivasan" <maddy@linux.vnet.ibm.com>,
	"Cédric Le Goater" <clg@kaod.org>,
	"Anju T Sudhakar" <anju@linux.vnet.ibm.com>,
	"Hari Bathini" <hbathini@linux.ibm.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Greg Kurz" <groug@kaod.org>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Masahiro Yamada" <yamada.masahiro@socionext.com>,
	"Alexey Kardashevskiy" <aik@ozlabs.ru>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	"Linux MM" <linux-mm@kvack.org>
Subject: Re: [PATCH v4 16/25] nvdimm/ocxl: Implement the Read Error Log command
Date: Thu, 2 Apr 2020 17:54:56 -0700
Message-ID: <CAPcyv4go3Ufv91E4yuVJ8x9eRU+rdOBZLC2SM9FVr-2o5iRZDw@mail.gmail.com>
In-Reply-To: <20200327071202.2159885-17-alastair@d-silva.org>

On Tue, Mar 31, 2020 at 1:59 AM Alastair D'Silva <alastair@d-silva.org> wrote:
>
> The read error log command extracts information from the controller's
> internal error log.
>
> This patch exposes this information in 2 ways:
> - During probe, if an error occurs & a log is available, print it to the
>   console
> - After probe, make the error log available to userspace via an IOCTL.
>   Userspace is notified of pending error logs in a later patch
>   ("powerpc/powernv/pmem: Forward events to userspace")

So, have a look at the recent papr_scm patches that add health flags and
smart data retrieval. I'd prefer to extend the existing nvdimm device
retrieval mechanisms rather than invent new ones.


>
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> ---
>  .../userspace-api/ioctl/ioctl-number.rst      |   1 +
>  drivers/nvdimm/ocxl/main.c                    | 240 ++++++++++++++++++
>  include/uapi/nvdimm/ocxlpmem.h                |  46 ++++
>  3 files changed, 287 insertions(+)
>  create mode 100644 include/uapi/nvdimm/ocxlpmem.h
>
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 9425377615ce..ba0ce7dca643 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -340,6 +340,7 @@ Code  Seq#    Include File                                           Comments
>  0xC0  00-0F  linux/usb/iowarrior.h
>  0xCA  00-0F  uapi/misc/cxl.h
>  0xCA  10-2F  uapi/misc/ocxl.h
> +0xCA  30-3F  uapi/nvdimm/ocxlpmem.h                                  OpenCAPI Persistent Memory
>  0xCA  80-BF  uapi/scsi/cxlflash_ioctl.h
>  0xCB  00-1F                                                          CBM serial IEC bus in development:
>                                                                       <mailto:michael.klein@puffin.lb.shuttle.de>
> diff --git a/drivers/nvdimm/ocxl/main.c b/drivers/nvdimm/ocxl/main.c
> index 9b85fcd3f1c9..e6be0029f658 100644
> --- a/drivers/nvdimm/ocxl/main.c
> +++ b/drivers/nvdimm/ocxl/main.c
> @@ -13,6 +13,7 @@
>  #include <linux/fs.h>
>  #include <linux/mm_types.h>
>  #include <linux/memory_hotplug.h>
> +#include <uapi/nvdimm/ocxlpmem.h>
>  #include "ocxlpmem.h"
>
>  static const struct pci_device_id pci_tbl[] = {
> @@ -401,10 +402,190 @@ static int file_release(struct inode *inode, struct file *file)
>         return 0;
>  }
>
> +/**
> + * error_log_header_parse() - Parse the first 64 bits of the error log command response
> + * @ocxlpmem: the device metadata
> + * @length: out, returns the number of bytes in the response (excluding the 64 bit header)
> + */
> +static int error_log_header_parse(struct ocxlpmem *ocxlpmem, u16 *length)
> +{
> +       int rc;
> +       u64 val;
> +       u16 data_identifier;
> +       u32 data_length;
> +
> +       rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> +                                    ocxlpmem->admin_command.data_offset,
> +                                    OCXL_LITTLE_ENDIAN, &val);
> +       if (rc)
> +               return rc;
> +
> +       data_identifier = val >> 48;
> +       data_length = val & 0xFFFF;
> +
> +       if (data_identifier != 0x454C) { // 'EL'
> +               dev_err(&ocxlpmem->dev,
> +                       "Bad data identifier for error log data, expected 'EL', got '%2s' (%#x), data_length=%u\n",
> +                       (char *)&data_identifier,
> +                       (unsigned int)data_identifier, data_length);
> +               return -EINVAL;
> +       }
> +
> +       *length = data_length;
> +       return 0;
> +}
> +
> +static int read_error_log(struct ocxlpmem *ocxlpmem,
> +                         struct ioctl_ocxlpmem_error_log *log,
> +                         bool buf_is_user)
> +{
> +       u64 val;
> +       u16 user_buf_length;
> +       u16 buf_length;
> +       u64 *buf = (u64 *)log->buf_ptr;
> +       u16 i;
> +       int rc;
> +
> +       if (log->buf_size % 8)
> +               return -EINVAL;
> +
> +       rc = ocxlpmem_chi(ocxlpmem, &val);
> +       if (rc)
> +               return rc;
> +
> +       if (!(val & GLOBAL_MMIO_CHI_ELA))
> +               return -EAGAIN;
> +
> +       user_buf_length = log->buf_size;
> +
> +       mutex_lock(&ocxlpmem->admin_command.lock);
> +
> +       rc = admin_command_execute(ocxlpmem, ADMIN_COMMAND_ERRLOG);
> +       if (rc != STATUS_SUCCESS) {
> +               warn_status(ocxlpmem,
> +                           "Unexpected status from retrieve error log", rc);
> +               goto out;
> +       }
> +
> +       rc = error_log_header_parse(ocxlpmem, &log->buf_size);
> +       if (rc)
> +               goto out;
> +       // log->buf_size now contains the returned buffer size, not the user size
> +
> +       rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> +                                    ocxlpmem->admin_command.data_offset + 0x08,
> +                                    OCXL_LITTLE_ENDIAN, &val);
> +       if (rc)
> +               goto out;
> +
> +       log->log_identifier = val >> 32;
> +       log->program_reference_code = val & 0xFFFFFFFF;
> +
> +       rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> +                                    ocxlpmem->admin_command.data_offset + 0x10,
> +                                    OCXL_LITTLE_ENDIAN, &val);
> +       if (rc)
> +               goto out;
> +
> +       log->error_log_type = val >> 56;
> +       log->action_flags = (log->error_log_type == OCXLPMEM_ERROR_LOG_TYPE_GENERAL) ?
> +                           (val >> 32) & 0xFFFFFF : 0;
> +       log->power_on_seconds = val & 0xFFFFFFFF;
> +
> +       rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> +                                    ocxlpmem->admin_command.data_offset + 0x18,
> +                                    OCXL_LITTLE_ENDIAN, &log->timestamp);
> +       if (rc)
> +               goto out;
> +
> +       rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> +                                    ocxlpmem->admin_command.data_offset + 0x20,
> +                                    OCXL_LITTLE_ENDIAN, &log->wwid[0]);
> +       if (rc)
> +               goto out;
> +
> +       rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> +                                    ocxlpmem->admin_command.data_offset + 0x28,
> +                                    OCXL_LITTLE_ENDIAN, &log->wwid[1]);
> +       if (rc)
> +               goto out;
> +
> +       rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> +                                    ocxlpmem->admin_command.data_offset + 0x30,
> +                                    OCXL_HOST_ENDIAN, (u64 *)log->fw_revision);
> +       if (rc)
> +               goto out;
> +       log->fw_revision[8] = '\0';
> +
> +       buf_length = (user_buf_length < log->buf_size) ?
> +                     user_buf_length : log->buf_size;
> +       for (i = 0; i < buf_length / (sizeof(u64)); i++) {
> +               rc = ocxl_global_mmio_read64(ocxlpmem->ocxl_afu,
> +                                            ocxlpmem->admin_command.data_offset +
> +                                                       i * sizeof(u64),
> +                                            OCXL_HOST_ENDIAN, &val);
> +               if (rc)
> +                       goto out;
> +
> +               if (buf_is_user) {
> +                       if (copy_to_user((u64 __user *)&buf[i], &val,
> +                                        sizeof(u64))) {
> +                               rc = -EFAULT;
> +                               goto out;
> +                       }
> +               } else {
> +                       buf[i] = val;
> +               }
> +       }
> +
> +       rc = admin_response_handled(ocxlpmem);
> +       if (rc)
> +               goto out;
> +
> +out:
> +       mutex_unlock(&ocxlpmem->admin_command.lock);
> +       return rc;
> +}
> +
> +static int ioctl_error_log(struct ocxlpmem *ocxlpmem,
> +                          struct ioctl_ocxlpmem_error_log __user *uarg)
> +{
> +       struct ioctl_ocxlpmem_error_log args;
> +       int rc;
> +
> +       if (copy_from_user(&args, uarg, sizeof(args)))
> +               return -EFAULT;
> +
> +       rc = read_error_log(ocxlpmem, &args, true);
> +       if (rc)
> +               return rc;
> +
> +       if (copy_to_user(uarg, &args, sizeof(args)))
> +               return -EFAULT;
> +
> +       return 0;
> +}
> +
> +static long file_ioctl(struct file *file, unsigned int cmd, unsigned long args)
> +{
> +       struct ocxlpmem *ocxlpmem = file->private_data;
> +       int rc = -EINVAL;
> +
> +       switch (cmd) {
> +       case IOCTL_OCXLPMEM_ERROR_LOG:
> +               rc = ioctl_error_log(ocxlpmem,
> +                                    (struct ioctl_ocxlpmem_error_log __user *)args);
> +               break;
> +       }
> +       return rc;
> +}
> +
>  static const struct file_operations fops = {
>         .owner          = THIS_MODULE,
>         .open           = file_open,
>         .release        = file_release,
> +       .unlocked_ioctl = file_ioctl,
> +       .compat_ioctl   = file_ioctl,
>  };
>
>  /**
> @@ -493,6 +674,60 @@ static int read_device_metadata(struct ocxlpmem *ocxlpmem)
>         return 0;
>  }
>
> +static const char *decode_error_log_type(u8 error_log_type)
> +{
> +       switch (error_log_type) {
> +       case 0x00:
> +               return "general";
> +       case 0x01:
> +               return "predictive failure";
> +       case 0x02:
> +               return "thermal warning";
> +       case 0x03:
> +               return "data loss";
> +       case 0x04:
> +               return "health & performance";
> +       default:
> +               return "unknown";
> +       }
> +}
> +
> +static void dump_error_log(struct ocxlpmem *ocxlpmem)
> +{
> +       struct ioctl_ocxlpmem_error_log log;
> +       u32 buf_size;
> +       u8 *buf;
> +       int rc;
> +
> +       if (ocxlpmem->admin_command.data_size == 0)
> +               return;
> +
> +       buf_size = ocxlpmem->admin_command.data_size - 0x48;
> +       buf = kzalloc(buf_size, GFP_KERNEL);
> +       if (!buf)
> +               return;
> +
> +       log.buf_ptr = (u64)buf;
> +       log.buf_size = buf_size;
> +
> +       rc = read_error_log(ocxlpmem, &log, false);
> +       if (rc < 0)
> +               goto out;
> +
> +       dev_warn(&ocxlpmem->dev,
> +                "OCXL PMEM Error log: WWID=0x%016llx%016llx LID=0x%x PRC=%x type=0x%x %s, Uptime=%u seconds timestamp=0x%llx\n",
> +                log.wwid[0], log.wwid[1],
> +                log.log_identifier, log.program_reference_code,
> +                log.error_log_type,
> +                decode_error_log_type(log.error_log_type),
> +                log.power_on_seconds, log.timestamp);
> +       print_hex_dump(KERN_WARNING, "buf", DUMP_PREFIX_OFFSET, 16, 1, buf,
> +                      log.buf_size, false);
> +
> +out:
> +       kfree(buf);
> +}
> +
>  /**
>   * probe_function0() - Set up function 0 for an OpenCAPI persistent memory device
>   * This is important as it enables templates higher than 0 across all other
> @@ -656,6 +891,11 @@ static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>         pci_set_drvdata(pdev, NULL);
>
>  err:
> +       if (ocxlpmem &&
> +           (ocxlpmem_chi(ocxlpmem, &chi) == 0) &&
> +           (chi & GLOBAL_MMIO_CHI_ELA))
> +               dump_error_log(ocxlpmem);
> +
>         /*
>          * Further cleanup is done in the release handler via free_ocxlpmem()
>          * This allows us to keep the character device live to handle IOCTLs to
> diff --git a/include/uapi/nvdimm/ocxlpmem.h b/include/uapi/nvdimm/ocxlpmem.h
> new file mode 100644
> index 000000000000..5d3a03ea1e08
> --- /dev/null
> +++ b/include/uapi/nvdimm/ocxlpmem.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/* Copyright 2020 IBM Corp. */
> +#ifndef _UAPI_OCXL_SCM_H
> +#define _UAPI_OCXL_SCM_H
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +#define OCXLPMEM_ERROR_LOG_ACTION_RESET        (1 << (32 - 32))
> +#define OCXLPMEM_ERROR_LOG_ACTION_CHKFW        (1 << (53 - 32))
> +#define OCXLPMEM_ERROR_LOG_ACTION_REPLACE      (1 << (54 - 32))
> +#define OCXLPMEM_ERROR_LOG_ACTION_DUMP         (1 << (55 - 32))
> +
> +#define OCXLPMEM_ERROR_LOG_TYPE_GENERAL                (0x00)
> +#define OCXLPMEM_ERROR_LOG_TYPE_PREDICTIVE_FAILURE     (0x01)
> +#define OCXLPMEM_ERROR_LOG_TYPE_THERMAL_WARNING        (0x02)
> +#define OCXLPMEM_ERROR_LOG_TYPE_DATA_LOSS              (0x03)
> +#define OCXLPMEM_ERROR_LOG_TYPE_HEALTH_PERFORMANCE     (0x04)
> +
> +struct ioctl_ocxlpmem_error_log {
> +       __u32 log_identifier; /* out */
> +       __u32 program_reference_code; /* out */
> +       __u32 action_flags; /* out, recommended course of action */
> +       __u32 power_on_seconds; /* out, Number of seconds the controller has been on when the error occurred */
> +       __u64 timestamp; /* out, relative time since the current IPL */
> +       __u64 wwid[2]; /* out, the NAA formatted WWID associated with the controller */
> +       char  fw_revision[8 + 1]; /* out, firmware revision as null terminated text */
> +       __u8  reserved0[7];
> +       __u16 buf_size; /* in/out, buffer size provided/required.
> +                        * If required is greater than provided, the buffer
> +                        * will be truncated to the amount provided. If its
> +                        * less, then only the required bytes will be populated.
> +                        * If it is 0, then there are no more error log entries.
> +                        */
> +       __u8  error_log_type;
> +       __u8  reserved1[5];
> +       __u64 buf_ptr; /* coerced pointer to output buffer */
> +       __u64 reserved2[2];
> +};
> +
> +/* ioctl numbers */
> +#define OCXLPMEM_MAGIC 0xCA
> +/* OpenCAPI Persistent memory devices */
> +#define IOCTL_OCXLPMEM_ERROR_LOG                       _IOWR(OCXLPMEM_MAGIC, 0x30, struct ioctl_ocxlpmem_error_log)
> +
> +#endif /* _UAPI_OCXL_SCM_H */
> --
> 2.24.1
>
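
For readers following the quoted patch, the header handling in
error_log_header_parse() and the buffer sizing in read_error_log() can be
sketched in plain userspace C. This is only an illustration under
assumptions: the helper names errlog_header_decode() and
errlog_words_to_copy() are hypothetical and do not exist in the driver, and
the field layout (identifier in the top 16 bits, length in the low 16 bits,
identifier 0x454C for 'EL') is taken from the quoted code, not from the
device specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASCII 'E','L', the identifier the quoted patch checks for. */
#define ERRLOG_ID_EL 0x454C

/*
 * Hypothetical helper: split the first 64-bit response word into its
 * identifier and length fields, mirroring the shifts and masks in the
 * quoted error_log_header_parse(). Returns 0 on success and -1 if the
 * identifier is not 'EL'.
 */
static int errlog_header_decode(uint64_t word, uint16_t *length)
{
	uint16_t id = (uint16_t)(word >> 48);

	assert(length != NULL);

	if (id != ERRLOG_ID_EL)
		return -1;

	*length = (uint16_t)(word & 0xFFFF);
	return 0;
}

/*
 * Hypothetical helper: number of whole 64-bit words to copy out, given the
 * size the caller provided and the size the device reported. As in the
 * quoted read_error_log(), the smaller of the two wins.
 */
static size_t errlog_words_to_copy(uint16_t provided, uint16_t required)
{
	uint16_t bytes = provided < required ? provided : required;

	return bytes / sizeof(uint64_t);
}
```

With these helpers, a header word of 0x454C000000000040 decodes to a length
of 0x40 bytes, and a caller that provides 64 bytes for a 16-byte log would
copy two 64-bit words. Whatever retrieval mechanism ends up carrying the
log, this truncation rule is the part of the interface that userspace has
to rely on.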
