All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/4]  cxl/mem: Fix memdev device setup
@ 2021-03-30 23:36 Dan Williams
  2021-03-30 23:36 ` [PATCH v3 1/4] cxl/mem: Use sysfs_emit() for attribute show routines Dan Williams
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Dan Williams @ 2021-03-30 23:36 UTC (permalink / raw)
  To: linux-cxl
  Cc: Ben Widawsky, Jason Gunthorpe, linux-kernel, vishal.l.verma,
	ira.weiny, alison.schofield

Changes since v2: [1]
- switch from non-idiomatic srcu synchronization of the device
  registration state to rwsem protection of the cxlmd->cxlm pointer.
  (Jason)

[1]: http://lore.kernel.org/r/161707245893.2072157.6743322596719518693.stgit@dwillia2-desk3.amr.corp.intel.com

---

A collection of fixes initially inspired by Jason's recognition of
dev_set_name() error handling mistakes on other driver review, but also
from a deeper discussion of idiomatic device operation shutdown flows.
The end result is easier to reason about and validate. Thank you, Jason.

The sysfs_emit() fixup and unpublishing of device power management files
are independent sanity cleanups.

---

Dan Williams (4):
      cxl/mem: Use sysfs_emit() for attribute show routines
      cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations
      cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
      cxl/mem: Disable cxl device power management


 drivers/cxl/mem.c |  141 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 83 insertions(+), 58 deletions(-)

base-commit: a38fd8748464831584a19438cbb3082b5a2dab15

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/4] cxl/mem: Use sysfs_emit() for attribute show routines
  2021-03-30 23:36 [PATCH v3 0/4] cxl/mem: Fix memdev device setup Dan Williams
@ 2021-03-30 23:36 ` Dan Williams
  2021-03-30 23:36 ` [PATCH v3 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations Dan Williams
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2021-03-30 23:36 UTC (permalink / raw)
  To: linux-cxl
  Cc: Jason Gunthorpe, Ben Widawsky, Jason Gunthorpe, linux-kernel,
	vishal.l.verma, ira.weiny, alison.schofield

While none the CXL sysfs attributes are threatening to overrun a
PAGE_SIZE of output, it is good form to use the recommended helpers.

Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices")
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/161661971101.1721612.16412318662284948582.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/mem.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index ecfc9ccdba8d..30bf4f0f3c17 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -1066,7 +1066,7 @@ static ssize_t firmware_version_show(struct device *dev,
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_mem *cxlm = cxlmd->cxlm;
 
-	return sprintf(buf, "%.16s\n", cxlm->firmware_version);
+	return sysfs_emit(buf, "%.16s\n", cxlm->firmware_version);
 }
 static DEVICE_ATTR_RO(firmware_version);
 
@@ -1076,7 +1076,7 @@ static ssize_t payload_max_show(struct device *dev,
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_mem *cxlm = cxlmd->cxlm;
 
-	return sprintf(buf, "%zu\n", cxlm->payload_size);
+	return sysfs_emit(buf, "%zu\n", cxlm->payload_size);
 }
 static DEVICE_ATTR_RO(payload_max);
 
@@ -1087,7 +1087,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
 	struct cxl_mem *cxlm = cxlmd->cxlm;
 	unsigned long long len = range_len(&cxlm->ram_range);
 
-	return sprintf(buf, "%#llx\n", len);
+	return sysfs_emit(buf, "%#llx\n", len);
 }
 
 static struct device_attribute dev_attr_ram_size =
@@ -1100,7 +1100,7 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr,
 	struct cxl_mem *cxlm = cxlmd->cxlm;
 	unsigned long long len = range_len(&cxlm->pmem_range);
 
-	return sprintf(buf, "%#llx\n", len);
+	return sysfs_emit(buf, "%#llx\n", len);
 }
 
 static struct device_attribute dev_attr_pmem_size =


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations
  2021-03-30 23:36 [PATCH v3 0/4] cxl/mem: Fix memdev device setup Dan Williams
  2021-03-30 23:36 ` [PATCH v3 1/4] cxl/mem: Use sysfs_emit() for attribute show routines Dan Williams
@ 2021-03-30 23:36 ` Dan Williams
  2021-03-31 13:07   ` Jason Gunthorpe
  2021-03-30 23:36 ` [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures Dan Williams
  2021-03-30 23:36 ` [PATCH v3 4/4] cxl/mem: Disable cxl device power management Dan Williams
  3 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2021-03-30 23:36 UTC (permalink / raw)
  To: linux-cxl
  Cc: Jason Gunthorpe, linux-kernel, vishal.l.verma, ira.weiny,
	alison.schofield

The percpu_ref to gate whether cxl_memdev_ioctl() is free to use the
driver context (@cxlm) to issue I/O is overkill, implemented incorrectly
(missing a device reference before accessing the percpu_ref), and the
complexities of shutting down a percpu_ref contributed to a bug in the
error unwind in cxl_mem_add_memdev() (missing put_device() to be fixed
separately).

Use an rwsem to explicitly synchronize the usage of cxlmd->cxlm, and add
the missing reference counting for cxlmd in cxl_memdev_open() and
cxl_memdev_release_file().

Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices")
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/mem.c |   97 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 51 insertions(+), 46 deletions(-)

diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 30bf4f0f3c17..2cf620d201a6 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -96,21 +96,18 @@ struct mbox_cmd {
  * @dev: driver core device object
  * @cdev: char dev core object for ioctl operations
  * @cxlm: pointer to the parent device driver data
- * @ops_active: active user of @cxlm in ops handlers
- * @ops_dead: completion when all @cxlm ops users have exited
  * @id: id number of this memdev instance.
  */
 struct cxl_memdev {
 	struct device dev;
 	struct cdev cdev;
 	struct cxl_mem *cxlm;
-	struct percpu_ref ops_active;
-	struct completion ops_dead;
 	int id;
 };
 
 static int cxl_mem_major;
 static DEFINE_IDA(cxl_memdev_ida);
+static DECLARE_RWSEM(cxl_memdev_rwsem);
 static struct dentry *cxl_debugfs;
 static bool cxl_raw_allow_all;
 
@@ -776,26 +773,43 @@ static long __cxl_memdev_ioctl(struct cxl_memdev *cxlmd, unsigned int cmd,
 static long cxl_memdev_ioctl(struct file *file, unsigned int cmd,
 			     unsigned long arg)
 {
-	struct cxl_memdev *cxlmd;
-	struct inode *inode;
-	int rc = -ENOTTY;
+	struct cxl_memdev *cxlmd = file->private_data;
+	int rc = -ENXIO;
 
-	inode = file_inode(file);
-	cxlmd = container_of(inode->i_cdev, typeof(*cxlmd), cdev);
+	down_read(&cxl_memdev_rwsem);
+	if (cxlmd->cxlm)
+		rc = __cxl_memdev_ioctl(cxlmd, cmd, arg);
+	up_read(&cxl_memdev_rwsem);
 
-	if (!percpu_ref_tryget_live(&cxlmd->ops_active))
-		return -ENXIO;
+	return rc;
+}
 
-	rc = __cxl_memdev_ioctl(cxlmd, cmd, arg);
+static int cxl_memdev_open(struct inode *inode, struct file *file)
+{
+	struct cxl_memdev *cxlmd =
+		container_of(inode->i_cdev, typeof(*cxlmd), cdev);
 
-	percpu_ref_put(&cxlmd->ops_active);
+	get_device(&cxlmd->dev);
+	file->private_data = cxlmd;
 
-	return rc;
+	return 0;
+}
+
+static int cxl_memdev_release_file(struct inode *inode, struct file *file)
+{
+	struct cxl_memdev *cxlmd =
+		container_of(inode->i_cdev, typeof(*cxlmd), cdev);
+
+	put_device(&cxlmd->dev);
+
+	return 0;
 }
 
 static const struct file_operations cxl_memdev_fops = {
 	.owner = THIS_MODULE,
 	.unlocked_ioctl = cxl_memdev_ioctl,
+	.open = cxl_memdev_open,
+	.release = cxl_memdev_release_file,
 	.compat_ioctl = compat_ptr_ioctl,
 	.llseek = noop_llseek,
 };
@@ -1049,7 +1063,6 @@ static void cxl_memdev_release(struct device *dev)
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 
-	percpu_ref_exit(&cxlmd->ops_active);
 	ida_free(&cxl_memdev_ida, cxlmd->id);
 	kfree(cxlmd);
 }
@@ -1150,24 +1163,28 @@ static const struct device_type cxl_memdev_type = {
 	.groups = cxl_memdev_attribute_groups,
 };
 
-static void cxlmdev_unregister(void *_cxlmd)
+static void cxl_memdev_activate(struct cxl_memdev *cxlmd, struct cxl_mem *cxlm)
 {
-	struct cxl_memdev *cxlmd = _cxlmd;
-	struct device *dev = &cxlmd->dev;
+	cxlmd->cxlm = cxlm;
+	down_write(&cxl_memdev_rwsem);
+	up_write(&cxl_memdev_rwsem);
+}
 
-	percpu_ref_kill(&cxlmd->ops_active);
-	cdev_device_del(&cxlmd->cdev, dev);
-	wait_for_completion(&cxlmd->ops_dead);
+static void cxl_memdev_shutdown(struct cxl_memdev *cxlmd)
+{
+	down_write(&cxl_memdev_rwsem);
 	cxlmd->cxlm = NULL;
-	put_device(dev);
+	up_write(&cxl_memdev_rwsem);
 }
 
-static void cxlmdev_ops_active_release(struct percpu_ref *ref)
+static void cxl_memdev_unregister(void *_cxlmd)
 {
-	struct cxl_memdev *cxlmd =
-		container_of(ref, typeof(*cxlmd), ops_active);
+	struct cxl_memdev *cxlmd = _cxlmd;
+	struct device *dev = &cxlmd->dev;
 
-	complete(&cxlmd->ops_dead);
+	cdev_device_del(&cxlmd->cdev, dev);
+	cxl_memdev_shutdown(cxlmd);
+	put_device(dev);
 }
 
 static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
@@ -1181,17 +1198,6 @@ static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
 	cxlmd = kzalloc(sizeof(*cxlmd), GFP_KERNEL);
 	if (!cxlmd)
 		return -ENOMEM;
-	init_completion(&cxlmd->ops_dead);
-
-	/*
-	 * @cxlm is deallocated when the driver unbinds so operations
-	 * that are using it need to hold a live reference.
-	 */
-	cxlmd->cxlm = cxlm;
-	rc = percpu_ref_init(&cxlmd->ops_active, cxlmdev_ops_active_release, 0,
-			     GFP_KERNEL);
-	if (rc)
-		goto err_ref;
 
 	rc = ida_alloc_range(&cxl_memdev_ida, 0, CXL_MEM_MAX_DEVS, GFP_KERNEL);
 	if (rc < 0)
@@ -1209,23 +1215,22 @@ static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
 	cdev = &cxlmd->cdev;
 	cdev_init(cdev, &cxl_memdev_fops);
 
+	cxl_memdev_activate(cxlmd, cxlm);
 	rc = cdev_device_add(cdev, dev);
 	if (rc)
 		goto err_add;
 
-	return devm_add_action_or_reset(dev->parent, cxlmdev_unregister, cxlmd);
+	return devm_add_action_or_reset(&pdev->dev, cxl_memdev_unregister,
+					cxlmd);
 
 err_add:
-	ida_free(&cxl_memdev_ida, cxlmd->id);
-err_id:
 	/*
-	 * Theoretically userspace could have already entered the fops,
-	 * so flush ops_active.
+	 * The cdev was briefly live, shutdown any ioctl operations that
+	 * saw that state.
 	 */
-	percpu_ref_kill(&cxlmd->ops_active);
-	wait_for_completion(&cxlmd->ops_dead);
-	percpu_ref_exit(&cxlmd->ops_active);
-err_ref:
+	cxl_memdev_shutdown(cxlmd);
+	ida_free(&cxl_memdev_ida, cxlmd->id);
+err_id:
 	kfree(cxlmd);
 
 	return rc;


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
  2021-03-30 23:36 [PATCH v3 0/4] cxl/mem: Fix memdev device setup Dan Williams
  2021-03-30 23:36 ` [PATCH v3 1/4] cxl/mem: Use sysfs_emit() for attribute show routines Dan Williams
  2021-03-30 23:36 ` [PATCH v3 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations Dan Williams
@ 2021-03-30 23:36 ` Dan Williams
  2021-03-31 13:09   ` Jason Gunthorpe
  2021-03-30 23:36 ` [PATCH v3 4/4] cxl/mem: Disable cxl device power management Dan Williams
  3 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2021-03-30 23:36 UTC (permalink / raw)
  To: linux-cxl
  Cc: Jason Gunthorpe, linux-kernel, vishal.l.verma, ira.weiny,
	alison.schofield

While device_add() will happen to catch dev_set_name() failures it is a
broken pattern to follow given that the core may try to fall back to a
different name.

Add explicit checking for dev_set_name() failures to be cleaned up by
put_device(). Skip cdev_device_add() and proceed directly to
put_device() if the name set fails.

This type of bug is easier to see if 'alloc' is split from 'add'
operations that require put_device() on failure. So cxl_memdev_alloc()
is split out as a result.

Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices")
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/mem.c |   41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 2cf620d201a6..759713b619ab 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -1187,7 +1187,7 @@ static void cxl_memdev_unregister(void *_cxlmd)
 	put_device(dev);
 }
 
-static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
+static struct cxl_memdev *cxl_memdev_alloc(struct cxl_mem *cxlm)
 {
 	struct pci_dev *pdev = cxlm->pdev;
 	struct cxl_memdev *cxlmd;
@@ -1197,11 +1197,11 @@ static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
 
 	cxlmd = kzalloc(sizeof(*cxlmd), GFP_KERNEL);
 	if (!cxlmd)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	rc = ida_alloc_range(&cxl_memdev_ida, 0, CXL_MEM_MAX_DEVS, GFP_KERNEL);
 	if (rc < 0)
-		goto err_id;
+		goto err;
 	cxlmd->id = rc;
 
 	dev = &cxlmd->dev;
@@ -1210,29 +1210,48 @@ static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
 	dev->bus = &cxl_bus_type;
 	dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
 	dev->type = &cxl_memdev_type;
-	dev_set_name(dev, "mem%d", cxlmd->id);
 
 	cdev = &cxlmd->cdev;
 	cdev_init(cdev, &cxl_memdev_fops);
+	return cxlmd;
+
+err:
+	kfree(cxlmd);
+	return ERR_PTR(rc);
+}
+
+static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
+{
+	struct cxl_memdev *cxlmd;
+	struct device *dev;
+	struct cdev *cdev;
+	int rc;
+
+	cxlmd = cxl_memdev_alloc(cxlm);
+	if (IS_ERR(cxlmd))
+		return PTR_ERR(cxlmd);
+
+	dev = &cxlmd->dev;
+	rc = dev_set_name(dev, "mem%d", cxlmd->id);
+	if (rc)
+		goto err;
 
+	cdev = &cxlmd->cdev;
 	cxl_memdev_activate(cxlmd, cxlm);
 	rc = cdev_device_add(cdev, dev);
 	if (rc)
-		goto err_add;
+		goto err;
 
-	return devm_add_action_or_reset(&pdev->dev, cxl_memdev_unregister,
+	return devm_add_action_or_reset(dev->parent, cxl_memdev_unregister,
 					cxlmd);
 
-err_add:
+err:
 	/*
 	 * The cdev was briefly live, shutdown any ioctl operations that
 	 * saw that state.
 	 */
 	cxl_memdev_shutdown(cxlmd);
-	ida_free(&cxl_memdev_ida, cxlmd->id);
-err_id:
-	kfree(cxlmd);
-
+	put_device(dev);
 	return rc;
 }
 


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 4/4] cxl/mem: Disable cxl device power management
  2021-03-30 23:36 [PATCH v3 0/4] cxl/mem: Fix memdev device setup Dan Williams
                   ` (2 preceding siblings ...)
  2021-03-30 23:36 ` [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures Dan Williams
@ 2021-03-30 23:36 ` Dan Williams
  3 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2021-03-30 23:36 UTC (permalink / raw)
  To: linux-cxl
  Cc: Ben Widawsky, linux-kernel, vishal.l.verma, ira.weiny, alison.schofield

There is no power management of cxl virtual devices, disable
device-power-management and runtime-power-management to prevent
userspace from growing expectations of those attributes appearing. They
can be added back in the future if needed.

Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/mem.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 759713b619ab..a8f750a9e2e4 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -1210,6 +1210,7 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_mem *cxlm)
 	dev->bus = &cxl_bus_type;
 	dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
 	dev->type = &cxl_memdev_type;
+	device_set_pm_not_required(dev);
 
 	cdev = &cxlmd->cdev;
 	cdev_init(cdev, &cxl_memdev_fops);


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations
  2021-03-30 23:36 ` [PATCH v3 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations Dan Williams
@ 2021-03-31 13:07   ` Jason Gunthorpe
  2021-03-31 15:45     ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2021-03-31 13:07 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, linux-kernel, vishal.l.verma, ira.weiny, alison.schofield

On Tue, Mar 30, 2021 at 04:36:37PM -0700, Dan Williams wrote:
> -static void cxlmdev_unregister(void *_cxlmd)
> +static void cxl_memdev_activate(struct cxl_memdev *cxlmd, struct cxl_mem *cxlm)
>  {
> -	struct cxl_memdev *cxlmd = _cxlmd;
> -	struct device *dev = &cxlmd->dev;
> +	cxlmd->cxlm = cxlm;
> +	down_write(&cxl_memdev_rwsem);
> +	up_write(&cxl_memdev_rwsem);
> +}

No reason not to put the assignment inside the lock. Though using the
lock at all is overkill as the pointer hasn't left the local stack
frame at this point.

>  err_add:
> -	ida_free(&cxl_memdev_ida, cxlmd->id);
> -err_id:
>  	/*
> -	 * Theoretically userspace could have already entered the fops,
> -	 * so flush ops_active.
> +	 * The cdev was briefly live, shutdown any ioctl operations that
> +	 * saw that state.
>  	 */

Wow it is really subtle that cdev_device_add has this tiny hole where
it can fail but have already allowed open() :<

Other than the lock it looks OK

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
  2021-03-30 23:36 ` [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures Dan Williams
@ 2021-03-31 13:09   ` Jason Gunthorpe
  2021-03-31 16:04     ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2021-03-31 13:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, linux-kernel, vishal.l.verma, ira.weiny, alison.schofield

On Tue, Mar 30, 2021 at 04:36:42PM -0700, Dan Williams wrote:
> +static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
> +{
> +	struct cxl_memdev *cxlmd;
> +	struct device *dev;
> +	struct cdev *cdev;
> +	int rc;
> +
> +	cxlmd = cxl_memdev_alloc(cxlm);
> +	if (IS_ERR(cxlmd))
> +		return PTR_ERR(cxlmd);
> +
> +	dev = &cxlmd->dev;
> +	rc = dev_set_name(dev, "mem%d", cxlmd->id);
> +	if (rc)
> +		goto err;
>  
> +	cdev = &cxlmd->cdev;
>  	cxl_memdev_activate(cxlmd, cxlm);
>  	rc = cdev_device_add(cdev, dev);
>  	if (rc)
> -		goto err_add;
> +		goto err;

It might read nicer to have the error unwind here just call cxl_memdev_unregister()

> -	return devm_add_action_or_reset(&pdev->dev, cxl_memdev_unregister,
> +	return devm_add_action_or_reset(dev->parent, cxl_memdev_unregister,
>  					cxlmd);

Since that is what the error unwind does at this point.

>  
> -err_add:
> +err:
>  	/*
>  	 * The cdev was briefly live, shutdown any ioctl operations that
>  	 * saw that state.
>  	 */
>  	cxl_memdev_shutdown(cxlmd);

Then this doesn't need to be a function

But it is OK as is

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations
  2021-03-31 13:07   ` Jason Gunthorpe
@ 2021-03-31 15:45     ` Dan Williams
  0 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2021-03-31 15:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-cxl, Linux Kernel Mailing List, Vishal L Verma, Weiny, Ira,
	Schofield, Alison

On Wed, Mar 31, 2021 at 6:07 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Mar 30, 2021 at 04:36:37PM -0700, Dan Williams wrote:
> > -static void cxlmdev_unregister(void *_cxlmd)
> > +static void cxl_memdev_activate(struct cxl_memdev *cxlmd, struct cxl_mem *cxlm)
> >  {
> > -     struct cxl_memdev *cxlmd = _cxlmd;
> > -     struct device *dev = &cxlmd->dev;
> > +     cxlmd->cxlm = cxlm;
> > +     down_write(&cxl_memdev_rwsem);
> > +     up_write(&cxl_memdev_rwsem);
> > +}
>
> No reason not to put the assignment inside the lock. Though using the
> lock at all is overkill as the pointer hasn't left the local stack
> frame at this point.

Right, I was considering just leaving it as a bare pointer assignment,
in fact that must be sufficient as publishing the cdev needs to depend
on all cdev init having completed. So if this write somehow leaks into
cdev_device_add() there are much larger problems afoot.

>
> >  err_add:
> > -     ida_free(&cxl_memdev_ida, cxlmd->id);
> > -err_id:
> >       /*
> > -      * Theoretically userspace could have already entered the fops,
> > -      * so flush ops_active.
> > +      * The cdev was briefly live, shutdown any ioctl operations that
> > +      * saw that state.
> >        */
>
> Wow it is really subtle that cdev_device_add has this tiny hole where
> it can fail but have already allowed open() :<

Yes, this was something I wanted to address in the cdev api proposal
integrating the debugfs fops proxy / reference counting aproach. I
want a cdev api that does not allow open until after the associated
device has registered and fired the KOBJ_ADD event.

>
> Other than the lock it looks OK
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks, Jason.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
  2021-03-31 13:09   ` Jason Gunthorpe
@ 2021-03-31 16:04     ` Dan Williams
  2021-03-31 16:17       ` Jason Gunthorpe
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2021-03-31 16:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-cxl, Linux Kernel Mailing List, Vishal L Verma, Weiny, Ira,
	Schofield, Alison

On Wed, Mar 31, 2021 at 6:10 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Mar 30, 2021 at 04:36:42PM -0700, Dan Williams wrote:
> > +static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
> > +{
> > +     struct cxl_memdev *cxlmd;
> > +     struct device *dev;
> > +     struct cdev *cdev;
> > +     int rc;
> > +
> > +     cxlmd = cxl_memdev_alloc(cxlm);
> > +     if (IS_ERR(cxlmd))
> > +             return PTR_ERR(cxlmd);
> > +
> > +     dev = &cxlmd->dev;
> > +     rc = dev_set_name(dev, "mem%d", cxlmd->id);
> > +     if (rc)
> > +             goto err;
> >
> > +     cdev = &cxlmd->cdev;
> >       cxl_memdev_activate(cxlmd, cxlm);
> >       rc = cdev_device_add(cdev, dev);
> >       if (rc)
> > -             goto err_add;
> > +             goto err;
>
> It might read nicer to have the error unwind here just call cxl_memdev_unregister()

Perhaps, but I don't think cdev_del() and device_del() are prepared to
deal with an object that was not successfully added.

>
> > -     return devm_add_action_or_reset(&pdev->dev, cxl_memdev_unregister,
> > +     return devm_add_action_or_reset(dev->parent, cxl_memdev_unregister,
> >                                       cxlmd);
>
> Since that is what the error unwind does at this point.

Right, but at this point the code knows that cdev_del() and
device_del() will receive an object in the appropriate state.

>
> >
> > -err_add:
> > +err:
> >       /*
> >        * The cdev was briefly live, shutdown any ioctl operations that
> >        * saw that state.
> >        */
> >       cxl_memdev_shutdown(cxlmd);
>
> Then this doesn't need to be a function
>
> But it is OK as is

Unless I'm missing something I think it's required to use only
put_device() to cleanup after cdev_device_add() failure, but yes I
don't like that cxl_memdev_shutdown() needs to be open coded like
this.

>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Appreciate it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
  2021-03-31 16:04     ` Dan Williams
@ 2021-03-31 16:17       ` Jason Gunthorpe
  2021-03-31 16:32         ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2021-03-31 16:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, Linux Kernel Mailing List, Vishal L Verma, Weiny, Ira,
	Schofield, Alison

On Wed, Mar 31, 2021 at 09:04:32AM -0700, Dan Williams wrote:
> On Wed, Mar 31, 2021 at 6:10 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Tue, Mar 30, 2021 at 04:36:42PM -0700, Dan Williams wrote:
> > > +static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
> > > +{
> > > +     struct cxl_memdev *cxlmd;
> > > +     struct device *dev;
> > > +     struct cdev *cdev;
> > > +     int rc;
> > > +
> > > +     cxlmd = cxl_memdev_alloc(cxlm);
> > > +     if (IS_ERR(cxlmd))
> > > +             return PTR_ERR(cxlmd);
> > > +
> > > +     dev = &cxlmd->dev;
> > > +     rc = dev_set_name(dev, "mem%d", cxlmd->id);
> > > +     if (rc)
> > > +             goto err;
> > >
> > > +     cdev = &cxlmd->cdev;
> > >       cxl_memdev_activate(cxlmd, cxlm);
> > >       rc = cdev_device_add(cdev, dev);
> > >       if (rc)
> > > -             goto err_add;
> > > +             goto err;
> >
> > It might read nicer to have the error unwind here just call cxl_memdev_unregister()
> 
> Perhaps, but I don't think cdev_del() and device_del() are prepared to
> deal with an object that was not successfully added.

Oh, probably not, yuk yuk yuk.

Ideally cdev_device_add should not fail in a way that allows an open,
I think that is just an artifact of it being composed of smaller
functions..

For instance if we replace the kobj_map with xarray then we can
use xa_reserve and xa_store to avoid this condition.

This actually looks like a good fit because the dev_t has pretty
"lumpy" allocations and this isn't really performance sensitive.

A clever person could then make the dev_t self allocating and solve
another pain point with this interface. Hum..

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
  2021-03-31 16:17       ` Jason Gunthorpe
@ 2021-03-31 16:32         ` Dan Williams
  0 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2021-03-31 16:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-cxl, Linux Kernel Mailing List, Vishal L Verma, Weiny, Ira,
	Schofield, Alison

On Wed, Mar 31, 2021 at 9:18 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Wed, Mar 31, 2021 at 09:04:32AM -0700, Dan Williams wrote:
> > On Wed, Mar 31, 2021 at 6:10 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > On Tue, Mar 30, 2021 at 04:36:42PM -0700, Dan Williams wrote:
> > > > +static int cxl_mem_add_memdev(struct cxl_mem *cxlm)
> > > > +{
> > > > +     struct cxl_memdev *cxlmd;
> > > > +     struct device *dev;
> > > > +     struct cdev *cdev;
> > > > +     int rc;
> > > > +
> > > > +     cxlmd = cxl_memdev_alloc(cxlm);
> > > > +     if (IS_ERR(cxlmd))
> > > > +             return PTR_ERR(cxlmd);
> > > > +
> > > > +     dev = &cxlmd->dev;
> > > > +     rc = dev_set_name(dev, "mem%d", cxlmd->id);
> > > > +     if (rc)
> > > > +             goto err;
> > > >
> > > > +     cdev = &cxlmd->cdev;
> > > >       cxl_memdev_activate(cxlmd, cxlm);
> > > >       rc = cdev_device_add(cdev, dev);
> > > >       if (rc)
> > > > -             goto err_add;
> > > > +             goto err;
> > >
> > > It might read nicer to have the error unwind here just call cxl_memdev_unregister()
> >
> > Perhaps, but I don't think cdev_del() and device_del() are prepared to
> > deal with an object that was not successfully added.
>
> Oh, probably not, yuk yuk yuk.
>
> Ideally cdev_device_add should not fail in a way that allows an open,
> I think that is just an artifact of it being composed of smaller
> functions..
>
> For instance if we replace the kobj_map with xarray then we can
> use xa_reserve and xa_store to avoid this condition.
>
> This actually looks like a good fit because the dev_t has pretty
> "lumpy" allocations and this isn't really performance sensitive.
>
> A clever person could then make the dev_t self allocating and solve
> another pain point with this interface. Hum..
>

...not a bad idea.

/me bookmarks this thread for future consideration.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-03-31 16:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-30 23:36 [PATCH v3 0/4] cxl/mem: Fix memdev device setup Dan Williams
2021-03-30 23:36 ` [PATCH v3 1/4] cxl/mem: Use sysfs_emit() for attribute show routines Dan Williams
2021-03-30 23:36 ` [PATCH v3 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations Dan Williams
2021-03-31 13:07   ` Jason Gunthorpe
2021-03-31 15:45     ` Dan Williams
2021-03-30 23:36 ` [PATCH v3 3/4] cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures Dan Williams
2021-03-31 13:09   ` Jason Gunthorpe
2021-03-31 16:04     ` Dan Williams
2021-03-31 16:17       ` Jason Gunthorpe
2021-03-31 16:32         ` Dan Williams
2021-03-30 23:36 ` [PATCH v3 4/4] cxl/mem: Disable cxl device power management Dan Williams

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.