Subject: Re: [PATCH v3 10/27] powerpc: Add driver for OpenCAPI Persistent Memory
From: "Alastair D'Silva"
To: Frederic Barrat
Cc: "Aneesh Kumar K . V", "Oliver O'Halloran", Benjamin Herrenschmidt,
 Paul Mackerras, Michael Ellerman, Andrew Donnellan, Arnd Bergmann,
 Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang, Ira Weiny,
 Andrew Morton, Mauro Carvalho Chehab, "David S. Miller", Rob Herring,
 Anton Blanchard, Krzysztof Kozlowski, Mahesh Salgaonkar,
 Madhavan Srinivasan, Cédric Le Goater, Anju T Sudhakar, Hari Bathini,
 Thomas Gleixner, Greg Kurz, Nicholas Piggin, Masahiro Yamada,
 Alexey Kardashevskiy, linux-kernel@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, linux-nvdimm@lists.01.org,
 linux-mm@kvack.org
Date: Fri, 28 Feb 2020 11:54:52 +1100
References: <20200221032720.33893-1-alastair@au1.ibm.com>
 <20200221032720.33893-11-alastair@au1.ibm.com>
Organization: IBM Australia

On Thu, 2020-02-27 at 21:44 +0100, Frederic Barrat wrote:
>
> On 21/02/2020 at 04:27, Alastair D'Silva wrote:
> > From: Alastair D'Silva
> >
> > This
> > driver exposes LPC memory on OpenCAPI pmem cards
> > as an NVDIMM, allowing the existing nvram infrastructure
> > to be used.
> >
> > Namespace metadata is stored on the media itself, so
> > scm_reserve_metadata() maps 1 section's worth of PMEM storage
> > at the start to hold this. The rest of the PMEM range is registered
> > with libnvdimm as an nvdimm. scm_ndctl_config_read/write/size() provide
> > callbacks to libnvdimm to access the metadata.
> >
> > Signed-off-by: Alastair D'Silva
> > ---
> >  arch/powerpc/platforms/powernv/Kconfig        |   3 +
> >  arch/powerpc/platforms/powernv/Makefile       |   1 +
> >  arch/powerpc/platforms/powernv/pmem/Kconfig   |  15 +
> >  arch/powerpc/platforms/powernv/pmem/Makefile  |   7 +
> >  arch/powerpc/platforms/powernv/pmem/ocxl.c    | 473 ++++++++++++++++++
> >  .../platforms/powernv/pmem/ocxl_internal.h    |  28 ++
> >  6 files changed, 527 insertions(+)
> >  create mode 100644 arch/powerpc/platforms/powernv/pmem/Kconfig
> >  create mode 100644 arch/powerpc/platforms/powernv/pmem/Makefile
> >  create mode 100644 arch/powerpc/platforms/powernv/pmem/ocxl.c
> >  create mode 100644 arch/powerpc/platforms/powernv/pmem/ocxl_internal.h
> >
> > diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
> > index 938803eab0ad..fc8976af0e52 100644
> > --- a/arch/powerpc/platforms/powernv/Kconfig
> > +++ b/arch/powerpc/platforms/powernv/Kconfig
> > @@ -50,3 +50,6 @@ config PPC_VAS
> >  config SCOM_DEBUGFS
> >  	bool "Expose SCOM controllers via debugfs"
> >  	depends on DEBUG_FS
> > +
> > +source "arch/powerpc/platforms/powernv/pmem/Kconfig"
> > +
> > diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
> > index c0f8120045c3..0bbd72988b6f 100644
> > --- a/arch/powerpc/platforms/powernv/Makefile
> > +++ b/arch/powerpc/platforms/powernv/Makefile
> > @@ -21,3 +21,4 @@ obj-$(CONFIG_PPC_VAS)	+= vas.o vas-window.o vas-debug.o
> >  obj-$(CONFIG_OCXL_BASE)	+= ocxl.o
> >  obj-$(CONFIG_SCOM_DEBUGFS) += opal-xscom.o
> >  obj-$(CONFIG_PPC_SECURE_BOOT) += opal-secvar.o
> > +obj-$(CONFIG_LIBNVDIMM)	+= pmem/
> > diff --git a/arch/powerpc/platforms/powernv/pmem/Kconfig b/arch/powerpc/platforms/powernv/pmem/Kconfig
> > new file mode 100644
> > index 000000000000..c5d927520920
> > --- /dev/null
> > +++ b/arch/powerpc/platforms/powernv/pmem/Kconfig
> > @@ -0,0 +1,15 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +if LIBNVDIMM
> > +
> > +config OCXL_PMEM
> > +	tristate "OpenCAPI Persistent Memory"
> > +	depends on LIBNVDIMM && PPC_POWERNV && PCI && EEH && ZONE_DEVICE && OCXL
> > +	help
> > +	  Exposes devices that implement the OpenCAPI Storage Class Memory
> > +	  specification as persistent memory regions. You may also want
> > +	  DEV_DAX, DEV_DAX_PMEM & FS_DAX if you plan on using DAX devices
> > +	  stacked on top of this driver.
> > +
> > +	  Select N if unsure.
> > +
> > +endif
> > diff --git a/arch/powerpc/platforms/powernv/pmem/Makefile b/arch/powerpc/platforms/powernv/pmem/Makefile
> > new file mode 100644
> > index 000000000000..1c55c4193175
> > --- /dev/null
> > +++ b/arch/powerpc/platforms/powernv/pmem/Makefile
> > @@ -0,0 +1,7 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +ccflags-$(CONFIG_PPC_WERROR)	+= -Werror
> > +
> > +obj-$(CONFIG_OCXL_PMEM) += ocxlpmem.o
> > +
> > +ocxlpmem-y := ocxl.o
> > diff --git a/arch/powerpc/platforms/powernv/pmem/ocxl.c b/arch/powerpc/platforms/powernv/pmem/ocxl.c
> > new file mode 100644
> > index 000000000000..3c4eeb5dcc0f
> > --- /dev/null
> > +++ b/arch/powerpc/platforms/powernv/pmem/ocxl.c
> > @@ -0,0 +1,473 @@
> > +// SPDX-License-Id
> > +// Copyright 2019 IBM Corp.
> > +
> > +/*
> > + * A driver for OpenCAPI devices that implement the Storage Class
> > + * Memory specification.
> > + */
> > +
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include "ocxl_internal.h"
> > +
> > +
> > +static const struct pci_device_id ocxlpmem_pci_tbl[] = {
> > +	{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0625), },
> > +	{ }
> > +};
> > +
> > +MODULE_DEVICE_TABLE(pci, ocxlpmem_pci_tbl);
> > +
> > +#define NUM_MINORS 256 // Total to reserve
> > +
> > +static dev_t ocxlpmem_dev;
> > +static struct class *ocxlpmem_class;
> > +static struct mutex minors_idr_lock;
> > +static struct idr minors_idr;
> > +
> > +/**
> > + * ndctl_config_write() - Handle a ND_CMD_SET_CONFIG_DATA command from ndctl
> > + * @ocxlpmem: the device metadata
> > + * @command: the incoming data to write
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int ndctl_config_write(struct ocxlpmem *ocxlpmem,
> > +			      struct nd_cmd_set_config_hdr *command)
> > +{
> > +	if (command->in_offset + command->in_length > LABEL_AREA_SIZE)
> > +		return -EINVAL;
> > +
> > +	memcpy_flushcache(ocxlpmem->metadata_addr + command->in_offset, command->in_buf,
> > +			  command->in_length);
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * ndctl_config_read() - Handle a ND_CMD_GET_CONFIG_DATA command from ndctl
> > + * @ocxlpmem: the device metadata
> > + * @command: the read request
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int ndctl_config_read(struct ocxlpmem *ocxlpmem,
> > +			     struct nd_cmd_get_config_data_hdr *command)
> > +{
> > +	if (command->in_offset + command->in_length > LABEL_AREA_SIZE)
> > +		return -EINVAL;
> > +
> > +	memcpy_mcsafe(command->out_buf, ocxlpmem->metadata_addr + command->in_offset,
> > +		      command->in_length);
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * ndctl_config_size() - Handle a ND_CMD_GET_CONFIG_SIZE command from ndctl
> > + * @command: the read request
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int ndctl_config_size(struct nd_cmd_get_config_size *command)
> > +{
> > +	command->status = 0;
> > +	command->config_size = LABEL_AREA_SIZE;
> > +	command->max_xfer = PAGE_SIZE;
> > +
> > +	return 0;
> > +}
> > +
> > +static int ndctl(struct nvdimm_bus_descriptor *nd_desc,
> > +		 struct nvdimm *nvdimm,
> > +		 unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc)
> > +{
> > +	struct ocxlpmem *ocxlpmem = container_of(nd_desc, struct ocxlpmem, bus_desc);
> > +
> > +	switch (cmd) {
> > +	case ND_CMD_GET_CONFIG_SIZE:
> > +		*cmd_rc = ndctl_config_size(buf);
> > +		return 0;
> > +
> > +	case ND_CMD_GET_CONFIG_DATA:
> > +		*cmd_rc = ndctl_config_read(ocxlpmem, buf);
> > +		return 0;
> > +
> > +	case ND_CMD_SET_CONFIG_DATA:
> > +		*cmd_rc = ndctl_config_write(ocxlpmem, buf);
> > +		return 0;
> > +
> > +	default:
> > +		return -ENOTTY;
> > +	}
> > +}
> > +
> > +/**
> > + * reserve_metadata() - Reserve space for nvdimm metadata
> > + * @ocxlpmem: the device metadata
> > + * @lpc_mem: The resource representing the LPC memory of the OpenCAPI device
> > + */
> > +static int reserve_metadata(struct ocxlpmem *ocxlpmem,
> > +			    struct resource *lpc_mem)
> > +{
> > +	ocxlpmem->metadata_addr = devm_memremap(&ocxlpmem->dev, lpc_mem->start,
> > +						LABEL_AREA_SIZE, MEMREMAP_WB);
> > +	if (IS_ERR(ocxlpmem->metadata_addr))
> > +		return PTR_ERR(ocxlpmem->metadata_addr);
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * register_lpc_mem() - Discover persistent memory on a device and register it with the NVDIMM subsystem
> > + * @ocxlpmem: the device metadata
> > + * Return: 0 on success
> > + */
> > +static int register_lpc_mem(struct ocxlpmem *ocxlpmem)
> > +{
> > +	struct nd_region_desc region_desc;
> > +	struct nd_mapping_desc nd_mapping_desc;
> > +	struct resource *lpc_mem;
> > +	const struct ocxl_afu_config *config;
> > +	const struct ocxl_fn_config *fn_config;
> > +	int rc;
> > +	unsigned long nvdimm_cmd_mask = 0;
> > +	unsigned long nvdimm_flags = 0;
> > +	int target_node;
> > +	char serial[16+1];
> > +
> > +	// Set up the reserved metadata area
> > +	rc = ocxl_afu_map_lpc_mem(ocxlpmem->ocxl_afu);
> > +	if (rc < 0)
> > +		return rc;
> > +
> > +	lpc_mem = ocxl_afu_lpc_mem(ocxlpmem->ocxl_afu);
> > +	if (lpc_mem == NULL || lpc_mem->start == 0)
> > +		return -EINVAL;
> > +
> > +	config = ocxl_afu_config(ocxlpmem->ocxl_afu);
> > +	fn_config = ocxl_function_config(ocxlpmem->ocxl_fn);
> > +
> > +	rc = reserve_metadata(ocxlpmem, lpc_mem);
> > +	if (rc)
> > +		return rc;
> > +
> > +	ocxlpmem->bus_desc.provider_name = "ocxl-pmem";
> > +	ocxlpmem->bus_desc.ndctl = ndctl;
> > +	ocxlpmem->bus_desc.module = THIS_MODULE;
> > +
> > +	ocxlpmem->nvdimm_bus = nvdimm_bus_register(&ocxlpmem->dev,
> > +						   &ocxlpmem->bus_desc);
> > +	if (!ocxlpmem->nvdimm_bus)
> > +		return -EINVAL;
> > +
> > +	ocxlpmem->pmem_res.start = (u64)lpc_mem->start + LABEL_AREA_SIZE;
> > +	ocxlpmem->pmem_res.end = (u64)lpc_mem->start + config->lpc_mem_size - 1;
> > +	ocxlpmem->pmem_res.name = "OpenCAPI persistent memory";
> > +
> > +	set_bit(ND_CMD_GET_CONFIG_SIZE, &nvdimm_cmd_mask);
> > +	set_bit(ND_CMD_GET_CONFIG_DATA, &nvdimm_cmd_mask);
> > +	set_bit(ND_CMD_SET_CONFIG_DATA, &nvdimm_cmd_mask);
> > +
> > +	set_bit(NDD_ALIASING, &nvdimm_flags);
> > +
> > +	snprintf(serial, sizeof(serial), "%llx", fn_config->serial);
> > +	nd_mapping_desc.nvdimm = nvdimm_create(ocxlpmem->nvdimm_bus, ocxlpmem,
> > +				 NULL, nvdimm_flags, nvdimm_cmd_mask,
> > +				 0, NULL);
> > +	if (!nd_mapping_desc.nvdimm)
> > +		return -ENOMEM;
> > +
> > +	if (nvdimm_bus_check_dimm_count(ocxlpmem->nvdimm_bus, 1))
> > +		return -EINVAL;
> > +
> > +	nd_mapping_desc.start = ocxlpmem->pmem_res.start;
> > +	nd_mapping_desc.size = resource_size(&ocxlpmem->pmem_res);
> > +	nd_mapping_desc.position = 0;
> > +
> > +	ocxlpmem->nd_set.cookie1 = fn_config->serial + 1; // allow for empty serial
> > +	ocxlpmem->nd_set.cookie2 = fn_config->serial + 1;
> > +
> > +	target_node = of_node_to_nid(ocxlpmem->pdev->dev.of_node);
> > +
> > +	memset(&region_desc, 0, sizeof(region_desc));
> > +	region_desc.res = &ocxlpmem->pmem_res;
> > +	region_desc.numa_node = NUMA_NO_NODE;
> > +	region_desc.target_node = target_node;
> > +	region_desc.num_mappings = 1;
> > +	region_desc.mapping = &nd_mapping_desc;
> > +	region_desc.nd_set = &ocxlpmem->nd_set;
> > +
> > +	set_bit(ND_REGION_PAGEMAP, &region_desc.flags);
> > +	/*
> > +	 * NB: libnvdimm copies the data from ndr_desc into it's own
> > +	 * structures so passing a stack pointer is fine.
> > +	 */
> > +	ocxlpmem->nd_region = nvdimm_pmem_region_create(ocxlpmem->nvdimm_bus,
> > +							&region_desc);
> > +	if (!ocxlpmem->nd_region)
> > +		return -EINVAL;
> > +
> > +	dev_info(&ocxlpmem->dev,
> > +		 "Onlining %lluMB of persistent memory\n",
> > +		 nd_mapping_desc.size / SZ_1M);
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * allocate_minor() - Allocate a minor number to use for an OpenCAPI pmem device
> > + * @ocxlpmem: the device metadata
> > + * Return: the allocated minor number
> > + */
> > +static int allocate_minor(struct ocxlpmem *ocxlpmem)
> > +{
> > +	int minor;
> > +
> > +	mutex_lock(&minors_idr_lock);
> > +	minor = idr_alloc(&minors_idr, ocxlpmem, 0, NUM_MINORS, GFP_KERNEL);
> > +	mutex_unlock(&minors_idr_lock);
> > +	return minor;
> > +}
> > +
> > +static void free_minor(struct ocxlpmem *ocxlpmem)
> > +{
> > +	mutex_lock(&minors_idr_lock);
> > +	idr_remove(&minors_idr, MINOR(ocxlpmem->dev.devt));
> > +	mutex_unlock(&minors_idr_lock);
> > +}
> > +
> > +/**
> > + * free_ocxlpmem() - Free all members of an ocxlpmem struct
> > + * @ocxlpmem: the device struct to clear
> > + */
> > +static void free_ocxlpmem(struct ocxlpmem *ocxlpmem)
> > +{
> > +	int rc;
> > +
> > +	if (ocxlpmem->nvdimm_bus)
> > +		nvdimm_bus_unregister(ocxlpmem->nvdimm_bus);
> > +
> > +	free_minor(ocxlpmem);
> > +
> > +	if (ocxlpmem->metadata_addr)
> > +		devm_memunmap(&ocxlpmem->dev, ocxlpmem->metadata_addr);
> > +
> > +	if (ocxlpmem->ocxl_context) {
> > +		rc = ocxl_context_detach(ocxlpmem->ocxl_context);
> > +		if (rc == -EBUSY)
> > +			dev_warn(&ocxlpmem->dev, "Timeout detaching ocxl context\n");
> > +		else
> > +			ocxl_context_free(ocxlpmem->ocxl_context);
> > +
> > +	}
> > +
> > +	if (ocxlpmem->ocxl_afu)
> > +		ocxl_afu_put(ocxlpmem->ocxl_afu);
> > +
> > +	if (ocxlpmem->ocxl_fn)
> > +		ocxl_function_close(ocxlpmem->ocxl_fn);
> > +
> > +	kfree(ocxlpmem);
> > +}
> > +
> > +/**
> > + * free_ocxlpmem_dev() - Free an OpenCAPI persistent memory device
> > + * @dev: The device struct
> > + */
> > +static void free_ocxlpmem_dev(struct device *dev)
> > +{
> > +	struct ocxlpmem *ocxlpmem = container_of(dev, struct ocxlpmem, dev);
> > +
> > +	free_ocxlpmem(ocxlpmem);
> > +}
> > +
> > +/**
> > + * ocxlpmem_register() - Register an OpenCAPI pmem device with the kernel
> > + * @ocxlpmem: the device metadata
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int ocxlpmem_register(struct ocxlpmem *ocxlpmem)
> > +{
> > +	int rc;
> > +	int minor = allocate_minor(ocxlpmem);
> > +
> > +	if (minor < 0)
> > +		return minor;
> > +
> > +	ocxlpmem->dev.release = free_ocxlpmem_dev;
> > +	rc = dev_set_name(&ocxlpmem->dev, "ocxlpmem%d", minor);
> > +	if (rc < 0)
> > +		return rc;
> > +
> > +	ocxlpmem->dev.devt = MKDEV(MAJOR(ocxlpmem_dev), minor);
> > +	ocxlpmem->dev.class = ocxlpmem_class;
>
>
> This function, as well as allocate_minor() and free_minor() above,
> references resources (the IDR, the file class, ...) which are not
> initialized yet. The function file_init() is coming in a later patch.
>

Thanks, I caught this at runtime when I booted a kernel with (just)
this patch :)

Fixed in v4.

>
>
> > +	ocxlpmem->dev.parent = &ocxlpmem->pdev->dev;
> > +
> > +	return device_register(&ocxlpmem->dev);
> > +}
> > +
> > +/**
> > + * ocxlpmem_remove() - Free an OpenCAPI persistent memory device
> > + * @pdev: the PCI device information struct
> > + */
> > +static void ocxlpmem_remove(struct pci_dev *pdev)
> > +{
> > +	if (PCI_FUNC(pdev->devfn) == 0) {
> > +		struct ocxlpmem_function0 *func0 = pci_get_drvdata(pdev);
> > +
> > +		if (func0) {
> > +			ocxl_function_close(func0->ocxl_fn);
> > +			func0->ocxl_fn = NULL;
> > +		}
>
>
> The struct ocxlpmem_function0 allocated on probe() should be freed.
>

I've dropped the struct, as per the thread from Andrew Donnellan.

>
>
> > +	} else {
> > +		struct ocxlpmem *ocxlpmem = pci_get_drvdata(pdev);
> > +
> > +		if (ocxlpmem)
> > +			device_unregister(&ocxlpmem->dev);
> > +	}
> > +}
> > +
> > +/**
> > + * probe_function0() - Set up function 0 for an OpenCAPI persistent memory device
> > + * This is important as it enables templates higher than 0 across all other functions,
> > + * which in turn enables higher bandwidth accesses
> > + * @pdev: the PCI device information struct
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int probe_function0(struct pci_dev *pdev)
> > +{
> > +	struct ocxlpmem_function0 *func0 = NULL;
> > +	struct ocxl_fn *fn;
> > +
> > +	func0 = kzalloc(sizeof(*func0), GFP_KERNEL);
> > +	if (!func0)
> > +		return -ENOMEM;
> > +
> > +	func0->pdev = pdev;
>
>
> Storing the struct pci_dev for function 0 appears to be useless.
>

Yup

>
>
> > +	fn = ocxl_function_open(pdev);
> > +	if (IS_ERR(fn)) {
> > +		kfree(func0);
> > +		dev_err(&pdev->dev, "failed to open OCXL function\n");
> > +		return PTR_ERR(fn);
> > +	}
> > +	func0->ocxl_fn = fn;
> > +
> > +	pci_set_drvdata(pdev, func0);
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * probe() - Init an OpenCAPI persistent memory device
> > + * @pdev: the PCI device information struct
> > + * @ent: The entry from ocxlpmem_pci_tbl
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> > +{
> > +	struct ocxlpmem *ocxlpmem;
> > +	int rc;
> > +
> > +	if (PCI_FUNC(pdev->devfn) == 0)
> > +		return probe_function0(pdev);
> > +	else if (PCI_FUNC(pdev->devfn) != 1)
> > +		return 0;
> > +
> > +	ocxlpmem = kzalloc(sizeof(*ocxlpmem), GFP_KERNEL);
> > +	if (!ocxlpmem) {
> > +		dev_err(&pdev->dev, "Could not allocate OpenCAPI persistent memory metadata\n");
> > +		rc = -ENOMEM;
> > +		goto err;
> > +	}
> > +	ocxlpmem->pdev = pdev;
>
> We should probably call pci_dev_get() here if we store the struct
> pci_dev pointer. We could debate how useful it really is, considering
> we're registering a device, which will also take a reference, but it
> looks like the safe thing to do considering all those resources don't
> have exactly the same life cycle and it is standard practice to
> guarantee that we won't have a dangling pointer.
>

Ok

>
>
> > +
> > +	pci_set_drvdata(pdev, ocxlpmem);
> > +
> > +	ocxlpmem->ocxl_fn = ocxl_function_open(pdev);
> > +	if (IS_ERR(ocxlpmem->ocxl_fn)) {
> > +		kfree(ocxlpmem);
>
> ocxlpmem is freed...
>
>
> > +		pci_set_drvdata(pdev, NULL);
> > +		dev_err(&pdev->dev, "failed to open OCXL function\n");
> > +		rc = PTR_ERR(ocxlpmem->ocxl_fn);
>
> ... and then referenced.
>

Ok

>
>
> > +		goto err;
> > +	}
> > +
> > +	ocxlpmem->ocxl_afu = ocxl_function_fetch_afu(ocxlpmem->ocxl_fn, 0);
> > +	if (ocxlpmem->ocxl_afu == NULL) {
> > +		dev_err(&pdev->dev, "Could not get OCXL AFU from function\n");
>
> The error path here should match the above, to free struct ocxlpmem.
>

Yup, I've factored out err_unregiseterd to unify the error paths.

>
> > +		rc = -ENXIO;
> > +		goto err;
> > +	}
> > +
> > +	ocxl_afu_get(ocxlpmem->ocxl_afu);
> > +
> > +	// Resources allocated below here are cleaned up in the release handler
> > +
> > +	rc = ocxlpmem_register(ocxlpmem);
> > +	if (rc) {
> > +		dev_err(&pdev->dev, "Could not register OpenCAPI persistent memory device with the kernel\n");
> > +		goto err;
> > +	}
> > +
> > +	rc = ocxl_context_alloc(&ocxlpmem->ocxl_context, ocxlpmem->ocxl_afu, NULL);
> > +	if (rc) {
> > +		dev_err(&pdev->dev, "Could not allocate OCXL context\n");
> > +		goto err;
> > +	}
> > +
> > +	rc = ocxl_context_attach(ocxlpmem->ocxl_context, 0, NULL);
> > +	if (rc) {
> > +		dev_err(&pdev->dev, "Could not attach ocxl context\n");
> > +		goto err;
> > +	}
> > +
> > +	rc = register_lpc_mem(ocxlpmem);
> > +	if (rc) {
> > +		dev_err(&pdev->dev, "Could not register OpenCAPI persistent memory with libnvdimm\n");
> > +		goto err;
> > +	}
> > +
> > +	return 0;
> > +
> > +err:
> > +	/*
> > +	 * Further cleanup is done in the release handler via free_ocxlpmem()
> > +	 * This allows us to keep the character device live to handle IOCTLs to
> > +	 * investigate issues if the card has an error
> > +	 */
>
> If we fail probe, we don't call device_unregister() and the data
> structures will never be freed. The comment seems to indicate it's done
> on purpose but that looks surprising and wrong. If we fail probe, the
> kernel thinks the driver is _not_ handling the device, so we need to
> exit probe() cleanly. We're not supposed to be able to make some debug
> ioctl calls.
> Once we fail probe, the kernel is free to do whatever it
> wants with the pci device. If you manage to extract some debug info
> during development, then fine, but it's not something we can rely on and
> upstream.
> If the card enters an error state after probe(), then we don't need that
> anyway. We have all the time in the world to call ioctl's, as long as we
> don't call the remove callback of the driver.
>
>

Ok

>
>
> > +
> > +	dev_err(&pdev->dev,
> > +		"Error detected, will not register OpenCAPI persistent memory\n");
> > +	return rc;
> > +}
> > +
> > +static struct pci_driver pci_driver = {
> > +	.name = "ocxl-pmem",
> > +	.id_table = ocxlpmem_pci_tbl,
> > +	.probe = probe,
> > +	.remove = ocxlpmem_remove,
> > +	.shutdown = ocxlpmem_remove,
>
>
> nitpick: why doesn't the probe callback follow the same naming
> convention? It's all static and doesn't really matter, but...

I had dropped the prefix when I renamed from scm as it doesn't really
add value, but clearly, I missed some :)

I'll fix it.

>
> Fred
>
>
>
> > +};
> > +
> > +static int __init ocxlpmem_init(void)
> > +{
> > +	int rc = 0;
> > +
> > +	rc = pci_register_driver(&pci_driver);
> > +	if (rc)
> > +		return rc;
> > +
> > +	return 0;
> > +}
> > +
> > +static void ocxlpmem_exit(void)
> > +{
> > +	pci_unregister_driver(&pci_driver);
> > +}
> > +
> > +module_init(ocxlpmem_init);
> > +module_exit(ocxlpmem_exit);
> > +
> > +MODULE_DESCRIPTION("OpenCAPI Persistent Memory");
> > +MODULE_LICENSE("GPL");
> > diff --git a/arch/powerpc/platforms/powernv/pmem/ocxl_internal.h b/arch/powerpc/platforms/powernv/pmem/ocxl_internal.h
> > new file mode 100644
> > index 000000000000..0faf3740e9b8
> > --- /dev/null
> > +++ b/arch/powerpc/platforms/powernv/pmem/ocxl_internal.h
> > @@ -0,0 +1,28 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +// Copyright 2019 IBM Corp.
> > +
> > +#include
> > +#include
> > +#include
> > +#include
> > +
> > +#define LABEL_AREA_SIZE	(1UL << PA_SECTION_SHIFT)
> > +
> > +struct ocxlpmem_function0 {
> > +	struct pci_dev *pdev;
> > +	struct ocxl_fn *ocxl_fn;
> > +};
> > +
> > +struct ocxlpmem {
> > +	struct device dev;
> > +	struct pci_dev *pdev;
> > +	struct ocxl_fn *ocxl_fn;
> > +	struct nd_interleave_set nd_set;
> > +	struct nvdimm_bus_descriptor bus_desc;
> > +	struct nvdimm_bus *nvdimm_bus;
> > +	struct ocxl_afu *ocxl_afu;
> > +	struct ocxl_context *ocxl_context;
> > +	void *metadata_addr;
> > +	struct resource pmem_res;
> > +	struct nd_region *nd_region;
> > +};
> >

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
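
The probe() error-path remarks in this thread (ocxlpmem being freed and then dereferenced to compute rc, and the AFU failure path not freeing the struct) can be sketched in user-space C. This is only a minimal illustration of the usual "record rc first, then unwind through goto labels" pattern, not the v4 driver code: struct widget, tracked_alloc(), tracked_free(), widget_probe(), widget_remove() and the fail_at parameter are all hypothetical stand-ins, and calloc()/free() take the place of the ocxl calls.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ocxlpmem; not the driver's real layout. */
struct widget {
	int *fn;	/* stands in for ocxl_fn */
	int *afu;	/* stands in for ocxl_afu */
};

/* Counts outstanding allocations so that leaks become visible. */
static int live_allocs;

static void *tracked_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/*
 * fail_at selects which step is made to fail:
 * 0 = none, 1 = "open function", 2 = "fetch AFU".
 */
static int widget_probe(int fail_at, struct widget **out)
{
	struct widget *w;
	int rc;

	*out = NULL;

	w = tracked_alloc(sizeof(*w));
	if (!w)
		return -ENOMEM;

	w->fn = (fail_at == 1) ? NULL : tracked_alloc(sizeof(*w->fn));
	if (!w->fn) {
		rc = -ENODEV;	/* record the error before anything is freed */
		goto err_free;
	}

	w->afu = (fail_at == 2) ? NULL : tracked_alloc(sizeof(*w->afu));
	if (!w->afu) {
		rc = -ENXIO;
		goto err_close_fn;
	}

	*out = w;
	return 0;

	/* Unwind in reverse order of acquisition; w is never touched after its free. */
err_close_fn:
	tracked_free(w->fn);
err_free:
	tracked_free(w);
	return rc;
}

static void widget_remove(struct widget *w)
{
	if (!w)
		return;
	tracked_free(w->afu);
	tracked_free(w->fn);
	tracked_free(w);
}
```

With this shape every failing step leaves live_allocs at zero, and the struct is only dereferenced while it is still allocated, which is the property the review comments ask of probe().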