linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/4]  cxl/mem: Fix memdev device setup
@ 2021-03-30 23:36 Dan Williams
  2021-03-30 23:36 ` [PATCH v3 1/4] cxl/mem: Use sysfs_emit() for attribute show routines Dan Williams
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Dan Williams @ 2021-03-30 23:36 UTC (permalink / raw)
  To: linux-cxl
  Cc: Ben Widawsky, Jason Gunthorpe, linux-kernel, vishal.l.verma,
	ira.weiny, alison.schofield

Changes since v2: [1]
- switch from non-idiomatic srcu synchronization of the device
  registration state to rwsem protection of the cxlmd->cxlm pointer.
  (Jason)

[1]: http://lore.kernel.org/r/161707245893.2072157.6743322596719518693.stgit@dwillia2-desk3.amr.corp.intel.com

---

A collection of fixes initially inspired by Jason's recognition of
dev_set_name() error handling mistakes on other driver review, but also
from a deeper discussion of idiomatic device operation shutdown flows.
The end result is easier to reason about and validate. Thank you, Jason.

The sysfs_emit() fixup and unpublishing of device power management files
are independent sanity cleanups.

---

Dan Williams (4):
      cxl/mem: Use sysfs_emit() for attribute show routines
      cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations
      cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
      cxl/mem: Disable cxl device power management


 drivers/cxl/mem.c |  141 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 83 insertions(+), 58 deletions(-)

base-commit: a38fd8748464831584a19438cbb3082b5a2dab15

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/4] cxl/mem: Use sysfs_emit() for attribute show routines
  2021-03-30 23:36 [PATCH v3 0/4] cxl/mem: Fix memdev device setup Dan Williams
@ 2021-03-30 23:36 ` Dan Williams
  2021-03-30 23:36 ` [PATCH v3 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations Dan Williams
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2021-03-30 23:36 UTC (permalink / raw)
  To: linux-cxl
  Cc: Jason Gunthorpe, Ben Widawsky, Jason Gunthorpe, linux-kernel,
	vishal.l.verma, ira.weiny, alison.schofield

While none the CXL sysfs attributes are threatening to overrun a
PAGE_SIZE of output, it is good form to use the recommended helpers.

Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices")
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/161661971101.1721612.16412318662284948582.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/mem.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index ecfc9ccdba8d..30bf4f0f3c17 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -1066,7 +1066,7 @@ static ssize_t firmware_version_show(struct device *dev,
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_mem *cxlm = cxlmd->cxlm;
 
-	return sprintf(buf, "%.16s\n", cxlm->firmware_version);
+	return sysfs_emit(buf, "%.16s\n", cxlm->firmware_version);
 }
 static DEVICE_ATTR_RO(firmware_version);
 
@@ -1076,7 +1076,7 @@ static ssize_t payload_max_show(struct device *dev,
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_mem *cxlm = cxlmd->cxlm;
 
-	return sprintf(buf, "%zu\n", cxlm->payload_size);
+	return sysfs_emit(buf, "%zu\n", cxlm->payload_size);
 }
 static DEVICE_ATTR_RO(payload_max);
 
@@ -1087,7 +1087,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
 	struct cxl_mem *cxlm = cxlmd->cxlm;
 	unsigned long long len = range_len(&cxlm->ram_range);
 
-	return sprintf(buf, "%#llx\n", len);
+	return sysfs_emit(buf, "%#llx\n", len);
 }
 
 static struct device_attribute dev_attr_ram_size =
@@ -1100,7 +1100,7 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr,
 	struct cxl_mem *cxlm = cxlmd->cxlm;
 	unsigned long long len = range_len(&cxlm->pmem_range);
 
-	return sprintf(buf, "%#llx\n", len);
+	return sysfs_emit(buf, "%#llx\n", len);
 }
 
 static struct device_attribute dev_attr_pmem_size =


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations
  2021-03-30 23:36 [PATCH v3 0/4] cxl/mem: Fix memdev device setup Dan Williams
  2021-03-30 23:36 ` [PATCH v3 1/4] cxl/mem: Use sysfs_emit() for attribute show routines Dan Williams
@ 2021-03-30 23:36 ` Dan Williams
  2021-03-31 13:07   ` Jason Gunthorpe
  2021-03-30 23:36 ` [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures Dan Williams
  2021-03-30 23:36 ` [PATCH v3 4/4] cxl/mem: Disable cxl device power management Dan Williams
  3 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2021-03-30 23:36 UTC (permalink / raw)
  To: linux-cxl
  Cc: Jason Gunthorpe, linux-kernel, vishal.l.verma, ira.weiny,
	alison.schofield

The percpu_ref to gate whether cxl_memdev_ioctl() is free to use the
driver context (@cxlm) to issue I/O is overkill, implemented incorrectly
(missing a device reference before accessing the percpu_ref), and the
complexities of shutting down a percpu_ref contributed to a bug in the
error unwind in cxl_mem_add_memdev() (missing put_device() to be fixed
separately).

Use an rwsem to explicitly synchronize the usage of cxlmd->cxlm, and add
the missing reference counting for cxlmd in cxl_memdev_open() and
cxl_memdev_release_file().

Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices")
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/mem.c |   97 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 51 insertions(+), 46 deletions(-)

diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 30bf4f0f3c17..2cf620d201a6 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -96,21 +96,18 @@ struct mbox_cmd {
  * @dev: driver core device object
  * @cdev: char dev core object for ioctl operations
  * @cxlm: pointer to the parent device driver data
- * @ops_active: active user of @cxlm in ops handlers
- * @ops_dead: completion when all @cxlm ops users have exited
  * @id: id number of this memdev instance.
  */
 struct cxl_memdev {
 	struct device dev;
 	struct cdev cdev;
 	struct cxl_mem *cxlm;
-	struct percpu_ref ops_active;
-	struct completion ops_dead;
 	int id;
 };
 
 static int cxl_mem_major;
 static DEFINE_IDA(cxl_memdev_ida);
+static DECLARE_RWSEM(cxl_memdev_rwsem);
 static struct dentry *cxl_debugfs;
 static bool cxl_raw_allow_all;
 
@@ -776,26 +773,43 @@ static long __cxl_memdev_ioctl(struct cxl_memdev *cxlmd, unsigned int cmd,
 static long cxl_memdev_ioctl(struct file *file, unsigned int cmd,
 			     unsigned long arg)
 {
-	struct cxl_memdev *cxlmd;
-	struct inode *inode;
-	int rc = -ENOTTY;
+	struct cxl_memdev *cxlmd = file->private_data;
+	int rc = -ENXIO;
 
-	inode = file_inode(file);
-	cxlmd = container_of(inode->i_cdev, typeof(*cxlmd), cdev);
+	down_read(&cxl_memdev_rwsem);
+	if (cxlmd->cxlm)
+		rc = __cxl_memdev_ioctl(cxlmd, cmd, arg);
+	up_read(&cxl_memdev_rwsem);
 
-	if (!percpu_ref_tryget_live(&cxlmd->ops_active))
-		return -ENXIO;
+	return rc;
+}
 
-	rc = __cxl_memdev_ioctl(cxlmd, cmd, arg);
+static int cxl_memdev_open(struct inode *inode, struct file *file)
+{
+	struct cxl_memdev *cxlmd =
+		container_of(inode->i_cdev, typeof(*cxlmd), cdev);
 
-	percpu_ref_put(&cxlmd->ops_active);
+	get_device(&cxlmd->dev);
+	file->private_data = cxlmd;
 
-	return rc;
+	return 0;
+}
+
+static int cxl_memdev_release_file(struct inode *inode, struct file *file)
+{
+	struct cxl_memdev *cxlmd =
+		container_of(inode->i_cdev, typeof(*cxlmd), cdev);
+
+	put_device(&cxlmd->dev);
+
+	return 0;
 }
 
 static const struct file_operations cxl_memdev_fops = {
 	.owner = THIS_MODULE,
 	.unlocked_ioctl = cxl_memdev_ioctl,
+	.open = cxl_memdev_open,
+	.release = cxl_memdev_release_file,
 	.compat_ioctl = compat_ptr_ioctl,
 	.llseek = noop_llseek,
 };
@@ -1049,7 +1063,6 @@ static void cxl_memdev_release(struct device *dev)
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 
-	percpu_ref_exit(&cxlmd->ops_active);
 	ida_free(&cxl_memdev_ida, cxlmd->id);
 	kfree(cxlmd);
 }
@@ -1150,24 +1163,28 @@ static const struct device_type cxl_memdev_type = {
 	.groups = cxl_memdev_attribute_groups,
 };
 
-static void cxlmdev_unregister(void *_cxlmd)
+static void cxl_memdev_activate(struct cxl_memdev *cxlmd, struct cxl_mem *cxlm)
 {
-	struct cxl_memdev *cxlmd = _cxlmd;
-	struct device *dev = &cxlmd->dev;
+	cxlmd->cxlm = cxlm;
+	down_write(&cxl_memdev_rwsem);
+	up_write(&cxl_memdev_rwsem);
+}
 
-	percpu_ref_kill(&cxlmd->ops_active);
-	cdev_device_del(&cxlmd->cdev, dev);
-	wait_for_completion(&cxlmd->ops_dead);
+static void cxl_memdev_shutdown(struct cxl_memdev *cxlmd)
+{
+	down_write(&cxl_memdev_rwsem);
 	cxlmd->cxlm = NULL;
-	put_device(dev);
+	up_write(&cxl_memdev_rwsem);
 }
 
-static void cxlmdev_ops_active_release(struct percpu_ref *ref)
+static void cxl_memdev_unregister(void *_cxlmd)
 {
-	struct cxl_memdev *cxlmd =
-		container_of(ref, typeof(*cxlmd), ops_active);
+	struct cxl_memdev *cxlmd = _cxlmd;
+	struct device *dev = &cxlmd->dev;
 
-	complete(&cxlmd->ops_dead);
+	cdev_device_del(&cxlmd->cdev, dev);
+	cxl_memdev_shutdown(cxlmd);
+	put_device(dev);
 }
 
 static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
@@ -1181,17 +1198,6 @@ static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
 	cxlmd = kzalloc(sizeof(*cxlmd), GFP_KERNEL);
 	if (!cxlmd)
 		return -ENOMEM;
-	init_completion(&cxlmd->ops_dead);
-
-	/*
-	 * @cxlm is deallocated when the driver unbinds so operations
-	 * that are using it need to hold a live reference.
-	 */
-	cxlmd->cxlm = cxlm;
-	rc = percpu_ref_init(&cxlmd->ops_active, cxlmdev_ops_active_release, 0,
-			     GFP_KERNEL);
-	if (rc)
-		goto err_ref;
 
 	rc = ida_alloc_range(&cxl_memdev_ida, 0, CXL_MEM_MAX_DEVS, GFP_KERNEL);
 	if (rc < 0)
@@ -1209,23 +1215,22 @@ static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
 	cdev = &cxlmd->cdev;
 	cdev_init(cdev, &cxl_memdev_fops);
 
+	cxl_memdev_activate(cxlmd, cxlm);
 	rc = cdev_device_add(cdev, dev);
 	if (rc)
 		goto err_add;
 
-	return devm_add_action_or_reset(dev->parent, cxlmdev_unregister, cxlmd);
+	return devm_add_action_or_reset(&pdev->dev, cxl_memdev_unregister,
+					cxlmd);
 
 err_add:
-	ida_free(&cxl_memdev_ida, cxlmd->id);
-err_id:
 	/*
-	 * Theoretically userspace could have already entered the fops,
-	 * so flush ops_active.
+	 * The cdev was briefly live, shutdown any ioctl operations that
+	 * saw that state.
 	 */
-	percpu_ref_kill(&cxlmd->ops_active);
-	wait_for_completion(&cxlmd->ops_dead);
-	percpu_ref_exit(&cxlmd->ops_active);
-err_ref:
+	cxl_memdev_shutdown(cxlmd);
+	ida_free(&cxl_memdev_ida, cxlmd->id);
+err_id:
 	kfree(cxlmd);
 
 	return rc;


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
  2021-03-30 23:36 [PATCH v3 0/4] cxl/mem: Fix memdev device setup Dan Williams
  2021-03-30 23:36 ` [PATCH v3 1/4] cxl/mem: Use sysfs_emit() for attribute show routines Dan Williams
  2021-03-30 23:36 ` [PATCH v3 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations Dan Williams
@ 2021-03-30 23:36 ` Dan Williams
  2021-03-31 13:09   ` Jason Gunthorpe
  2021-03-30 23:36 ` [PATCH v3 4/4] cxl/mem: Disable cxl device power management Dan Williams
  3 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2021-03-30 23:36 UTC (permalink / raw)
  To: linux-cxl
  Cc: Jason Gunthorpe, linux-kernel, vishal.l.verma, ira.weiny,
	alison.schofield

While device_add() will happen to catch dev_set_name() failures it is a
broken pattern to follow given that the core may try to fall back to a
different name.

Add explicit checking for dev_set_name() failures to be cleaned up by
put_device(). Skip cdev_device_add() and proceed directly to
put_device() if the name set fails.

This type of bug is easier to see if 'alloc' is split from 'add'
operations that require put_device() on failure. So cxl_memdev_alloc()
is split out as a result.

Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices")
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/mem.c |   41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 2cf620d201a6..759713b619ab 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -1187,7 +1187,7 @@ static void cxl_memdev_unregister(void *_cxlmd)
 	put_device(dev);
 }
 
-static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
+static struct cxl_memdev *cxl_memdev_alloc(struct cxl_mem *cxlm)
 {
 	struct pci_dev *pdev = cxlm->pdev;
 	struct cxl_memdev *cxlmd;
@@ -1197,11 +1197,11 @@ static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
 
 	cxlmd = kzalloc(sizeof(*cxlmd), GFP_KERNEL);
 	if (!cxlmd)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	rc = ida_alloc_range(&cxl_memdev_ida, 0, CXL_MEM_MAX_DEVS, GFP_KERNEL);
 	if (rc < 0)
-		goto err_id;
+		goto err;
 	cxlmd->id = rc;
 
 	dev = &cxlmd->dev;
@@ -1210,29 +1210,48 @@ static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
 	dev->bus = &cxl_bus_type;
 	dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
 	dev->type = &cxl_memdev_type;
-	dev_set_name(dev, "mem%d", cxlmd->id);
 
 	cdev = &cxlmd->cdev;
 	cdev_init(cdev, &cxl_memdev_fops);
+	return cxlmd;
+
+err:
+	kfree(cxlmd);
+	return ERR_PTR(rc);
+}
+
+static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
+{
+	struct cxl_memdev *cxlmd;
+	struct device *dev;
+	struct cdev *cdev;
+	int rc;
+
+	cxlmd = cxl_memdev_alloc(cxlm);
+	if (IS_ERR(cxlmd))
+		return PTR_ERR(cxlmd);
+
+	dev = &cxlmd->dev;
+	rc = dev_set_name(dev, "mem%d", cxlmd->id);
+	if (rc)
+		goto err;
 
+	cdev = &cxlmd->cdev;
 	cxl_memdev_activate(cxlmd, cxlm);
 	rc = cdev_device_add(cdev, dev);
 	if (rc)
-		goto err_add;
+		goto err;
 
-	return devm_add_action_or_reset(&pdev->dev, cxl_memdev_unregister,
+	return devm_add_action_or_reset(dev->parent, cxl_memdev_unregister,
 					cxlmd);
 
-err_add:
+err:
 	/*
 	 * The cdev was briefly live, shutdown any ioctl operations that
 	 * saw that state.
 	 */
 	cxl_memdev_shutdown(cxlmd);
-	ida_free(&cxl_memdev_ida, cxlmd->id);
-err_id:
-	kfree(cxlmd);
-
+	put_device(dev);
 	return rc;
 }
 


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 4/4] cxl/mem: Disable cxl device power management
  2021-03-30 23:36 [PATCH v3 0/4] cxl/mem: Fix memdev device setup Dan Williams
                   ` (2 preceding siblings ...)
  2021-03-30 23:36 ` [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures Dan Williams
@ 2021-03-30 23:36 ` Dan Williams
  3 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2021-03-30 23:36 UTC (permalink / raw)
  To: linux-cxl
  Cc: Ben Widawsky, linux-kernel, vishal.l.verma, ira.weiny, alison.schofield

There is no power management of cxl virtual devices, disable
device-power-management and runtime-power-management to prevent
userspace from growing expectations of those attributes appearing. They
can be added back in the future if needed.

Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/mem.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 759713b619ab..a8f750a9e2e4 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -1210,6 +1210,7 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_mem *cxlm)
 	dev->bus = &cxl_bus_type;
 	dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
 	dev->type = &cxl_memdev_type;
+	device_set_pm_not_required(dev);
 
 	cdev = &cxlmd->cdev;
 	cdev_init(cdev, &cxl_memdev_fops);


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations
  2021-03-30 23:36 ` [PATCH v3 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations Dan Williams
@ 2021-03-31 13:07   ` Jason Gunthorpe
  2021-03-31 15:45     ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2021-03-31 13:07 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, linux-kernel, vishal.l.verma, ira.weiny, alison.schofield

On Tue, Mar 30, 2021 at 04:36:37PM -0700, Dan Williams wrote:
> -static void cxlmdev_unregister(void *_cxlmd)
> +static void cxl_memdev_activate(struct cxl_memdev *cxlmd, struct cxl_mem *cxlm)
>  {
> -	struct cxl_memdev *cxlmd = _cxlmd;
> -	struct device *dev = &cxlmd->dev;
> +	cxlmd->cxlm = cxlm;
> +	down_write(&cxl_memdev_rwsem);
> +	up_write(&cxl_memdev_rwsem);
> +}

No reason not to put the assignment inside the lock. Though using the
lock at all is overkill as the pointer hasn't left the local stack
frame at this point.

>  err_add:
> -	ida_free(&cxl_memdev_ida, cxlmd->id);
> -err_id:
>  	/*
> -	 * Theoretically userspace could have already entered the fops,
> -	 * so flush ops_active.
> +	 * The cdev was briefly live, shutdown any ioctl operations that
> +	 * saw that state.
>  	 */

Wow it is really subtle that cdev_device_add has this tiny hole where
it can fail but have already allowed open() :<

Other than the lock it looks OK

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
  2021-03-30 23:36 ` [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures Dan Williams
@ 2021-03-31 13:09   ` Jason Gunthorpe
  2021-03-31 16:04     ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2021-03-31 13:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, linux-kernel, vishal.l.verma, ira.weiny, alison.schofield

On Tue, Mar 30, 2021 at 04:36:42PM -0700, Dan Williams wrote:
> +static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
> +{
> +	struct cxl_memdev *cxlmd;
> +	struct device *dev;
> +	struct cdev *cdev;
> +	int rc;
> +
> +	cxlmd = cxl_memdev_alloc(cxlm);
> +	if (IS_ERR(cxlmd))
> +		return PTR_ERR(cxlmd);
> +
> +	dev = &cxlmd->dev;
> +	rc = dev_set_name(dev, "mem%d", cxlmd->id);
> +	if (rc)
> +		goto err;
>  
> +	cdev = &cxlmd->cdev;
>  	cxl_memdev_activate(cxlmd, cxlm);
>  	rc = cdev_device_add(cdev, dev);
>  	if (rc)
> -		goto err_add;
> +		goto err;

It might read nicer to have the error unwind here just call cxl_memdev_unregister()

> -	return devm_add_action_or_reset(&pdev->dev, cxl_memdev_unregister,
> +	return devm_add_action_or_reset(dev->parent, cxl_memdev_unregister,
>  					cxlmd);

Since that is what the error unwind does at this point.

>  
> -err_add:
> +err:
>  	/*
>  	 * The cdev was briefly live, shutdown any ioctl operations that
>  	 * saw that state.
>  	 */
>  	cxl_memdev_shutdown(cxlmd);

Then this doesn't need to be a function

But it is OK as is

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations
  2021-03-31 13:07   ` Jason Gunthorpe
@ 2021-03-31 15:45     ` Dan Williams
  0 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2021-03-31 15:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-cxl, Linux Kernel Mailing List, Vishal L Verma, Weiny, Ira,
	Schofield, Alison

On Wed, Mar 31, 2021 at 6:07 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Mar 30, 2021 at 04:36:37PM -0700, Dan Williams wrote:
> > -static void cxlmdev_unregister(void *_cxlmd)
> > +static void cxl_memdev_activate(struct cxl_memdev *cxlmd, struct cxl_mem *cxlm)
> >  {
> > -     struct cxl_memdev *cxlmd = _cxlmd;
> > -     struct device *dev = &cxlmd->dev;
> > +     cxlmd->cxlm = cxlm;
> > +     down_write(&cxl_memdev_rwsem);
> > +     up_write(&cxl_memdev_rwsem);
> > +}
>
> No reason not to put the assignment inside the lock. Though using the
> lock at all is overkill as the pointer hasn't left the local stack
> frame at this point.

Right, I was considering just leaving it as a bare pointer assignment,
in fact that must be sufficient as publishing the cdev needs to depend
on all cdev init having completed. So if this write somehow leaks into
cdev_device_add() there are much larger problems afoot.

>
> >  err_add:
> > -     ida_free(&cxl_memdev_ida, cxlmd->id);
> > -err_id:
> >       /*
> > -      * Theoretically userspace could have already entered the fops,
> > -      * so flush ops_active.
> > +      * The cdev was briefly live, shutdown any ioctl operations that
> > +      * saw that state.
> >        */
>
> Wow it is really subtle that cdev_device_add has this tiny hole where
> it can fail but have already allowed open() :<

Yes, this was something I wanted to address in the cdev api proposal
integrating the debugfs fops proxy / reference counting aproach. I
want a cdev api that does not allow open until after the associated
device has registered and fired the KOBJ_ADD event.

>
> Other than the lock it looks OK
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks, Jason.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
  2021-03-31 13:09   ` Jason Gunthorpe
@ 2021-03-31 16:04     ` Dan Williams
  2021-03-31 16:17       ` Jason Gunthorpe
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2021-03-31 16:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-cxl, Linux Kernel Mailing List, Vishal L Verma, Weiny, Ira,
	Schofield, Alison

On Wed, Mar 31, 2021 at 6:10 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Mar 30, 2021 at 04:36:42PM -0700, Dan Williams wrote:
> > +static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
> > +{
> > +     struct cxl_memdev *cxlmd;
> > +     struct device *dev;
> > +     struct cdev *cdev;
> > +     int rc;
> > +
> > +     cxlmd = cxl_memdev_alloc(cxlm);
> > +     if (IS_ERR(cxlmd))
> > +             return PTR_ERR(cxlmd);
> > +
> > +     dev = &cxlmd->dev;
> > +     rc = dev_set_name(dev, "mem%d", cxlmd->id);
> > +     if (rc)
> > +             goto err;
> >
> > +     cdev = &cxlmd->cdev;
> >       cxl_memdev_activate(cxlmd, cxlm);
> >       rc = cdev_device_add(cdev, dev);
> >       if (rc)
> > -             goto err_add;
> > +             goto err;
>
> It might read nicer to have the error unwind here just call cxl_memdev_unregister()

Perhaps, but I don't think cdev_del() and device_del() are prepared to
deal with an object that was not successfully added.

>
> > -     return devm_add_action_or_reset(&pdev->dev, cxl_memdev_unregister,
> > +     return devm_add_action_or_reset(dev->parent, cxl_memdev_unregister,
> >                                       cxlmd);
>
> Since that is what the error unwind does at this point.

Right, but at this point the code knows that cdev_del() and
device_del() will receive an object in the appropriate state.

>
> >
> > -err_add:
> > +err:
> >       /*
> >        * The cdev was briefly live, shutdown any ioctl operations that
> >        * saw that state.
> >        */
> >       cxl_memdev_shutdown(cxlmd);
>
> Then this doesn't need to be a function
>
> But it is OK as is

Unless I'm missing something I think it's required to use only
put_device() to cleanup after cdev_device_add() failure, but yes I
don't like that cxl_memdev_shutdown() needs to be open coded like
this.

>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Appreciate it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
  2021-03-31 16:04     ` Dan Williams
@ 2021-03-31 16:17       ` Jason Gunthorpe
  2021-03-31 16:32         ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2021-03-31 16:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Linux Kernel Mailing List, Vishal L Verma, Weiny, Ira,
	Schofield, Alison

On Wed, Mar 31, 2021 at 09:04:32AM -0700, Dan Williams wrote:
> On Wed, Mar 31, 2021 at 6:10 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Tue, Mar 30, 2021 at 04:36:42PM -0700, Dan Williams wrote:
> > > +static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
> > > +{
> > > +     struct cxl_memdev *cxlmd;
> > > +     struct device *dev;
> > > +     struct cdev *cdev;
> > > +     int rc;
> > > +
> > > +     cxlmd = cxl_memdev_alloc(cxlm);
> > > +     if (IS_ERR(cxlmd))
> > > +             return PTR_ERR(cxlmd);
> > > +
> > > +     dev = &cxlmd->dev;
> > > +     rc = dev_set_name(dev, "mem%d", cxlmd->id);
> > > +     if (rc)
> > > +             goto err;
> > >
> > > +     cdev = &cxlmd->cdev;
> > >       cxl_memdev_activate(cxlmd, cxlm);
> > >       rc = cdev_device_add(cdev, dev);
> > >       if (rc)
> > > -             goto err_add;
> > > +             goto err;
> >
> > It might read nicer to have the error unwind here just call cxl_memdev_unregister()
> 
> Perhaps, but I don't think cdev_del() and device_del() are prepared to
> deal with an object that was not successfully added.

Oh, probably not, yuk yuk yuk.

Ideally cdev_device_add should not fail in a way that allows an open,
I think that is just an artifact of it being composed of smaller
functions..

For instance if we replace the kobj_map with xarray then we can
use xa_reserve and xa_store to avoid this condition.

This actually looks like a good fit because the dev_t has pretty
"lumpy" allocations and this isn't really performance sensitive.

A clever person could then make the dev_t self allocating and solve
another pain point with this interface. Hum..

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
  2021-03-31 16:17       ` Jason Gunthorpe
@ 2021-03-31 16:32         ` Dan Williams
  0 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2021-03-31 16:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-cxl, Linux Kernel Mailing List, Vishal L Verma, Weiny, Ira,
	Schofield, Alison

On Wed, Mar 31, 2021 at 9:18 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Wed, Mar 31, 2021 at 09:04:32AM -0700, Dan Williams wrote:
> > On Wed, Mar 31, 2021 at 6:10 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > On Tue, Mar 30, 2021 at 04:36:42PM -0700, Dan Williams wrote:
> > > > +static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
> > > > +{
> > > > +     struct cxl_memdev *cxlmd;
> > > > +     struct device *dev;
> > > > +     struct cdev *cdev;
> > > > +     int rc;
> > > > +
> > > > +     cxlmd = cxl_memdev_alloc(cxlm);
> > > > +     if (IS_ERR(cxlmd))
> > > > +             return PTR_ERR(cxlmd);
> > > > +
> > > > +     dev = &cxlmd->dev;
> > > > +     rc = dev_set_name(dev, "mem%d", cxlmd->id);
> > > > +     if (rc)
> > > > +             goto err;
> > > >
> > > > +     cdev = &cxlmd->cdev;
> > > >       cxl_memdev_activate(cxlmd, cxlm);
> > > >       rc = cdev_device_add(cdev, dev);
> > > >       if (rc)
> > > > -             goto err_add;
> > > > +             goto err;
> > >
> > > It might read nicer to have the error unwind here just call cxl_memdev_unregister()
> >
> > Perhaps, but I don't think cdev_del() and device_del() are prepared to
> > deal with an object that was not successfully added.
>
> Oh, probably not, yuk yuk yuk.
>
> Ideally cdev_device_add should not fail in a way that allows an open,
> I think that is just an artifact of it being composed of smaller
> functions..
>
> For instance if we replace the kobj_map with xarray then we can
> use xa_reserve and xa_store to avoid this condition.
>
> This actually looks like a good fit because the dev_t has pretty
> "lumpy" allocations and this isn't really performance sensitive.
>
> A clever person could then make the dev_t self allocating and solve
> another pain point with this interface. Hum..
>

...not a bad idea.

/me bookmarks this thread for future consideration.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-03-31 16:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-30 23:36 [PATCH v3 0/4] cxl/mem: Fix memdev device setup Dan Williams
2021-03-30 23:36 ` [PATCH v3 1/4] cxl/mem: Use sysfs_emit() for attribute show routines Dan Williams
2021-03-30 23:36 ` [PATCH v3 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations Dan Williams
2021-03-31 13:07   ` Jason Gunthorpe
2021-03-31 15:45     ` Dan Williams
2021-03-30 23:36 ` [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures Dan Williams
2021-03-31 13:09   ` Jason Gunthorpe
2021-03-31 16:04     ` Dan Williams
2021-03-31 16:17       ` Jason Gunthorpe
2021-03-31 16:32         ` Dan Williams
2021-03-30 23:36 ` [PATCH v3 4/4] cxl/mem: Disable cxl device power management Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).