All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] cxl/core: Hold the region rwsem during poison ops
@ 2023-11-14  2:53 alison.schofield
  2023-11-14 18:20 ` Dave Jiang
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: alison.schofield @ 2023-11-14  2:53 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
added a lockdep_assert_held() to make sure all callers hold the
region state stable while doing work that depends on the number
of committed decoders.

That lockdep assert triggered in poison list, inject, and clear
functions highlighting a gap between region attach and decoder
commit where holding the dpa_rwsem is not enough to assure that
a DPA is not added to a region. In such a case, if poison is
found in at that DPA, the trace event omits the region info
that users expect.

Close the gap by snapshotting an unchangeable region state during
all poison ops. Hold the region_rwsem in all the places that hold
the dpa_rwsem rather than in the region specific function only.

Fixes: 7ff6ad107588 ("cxl/memdev: Add trigger_poison_list sysfs attribute")
Fixes: f0832a586396 ("cxl/region: Provide region info to the cxl_poison trace event")
Fixes: d2fbc4865802 ("cxl/memdev: Add support for the Inject Poison mailbox command")
Fixes: 9690b07748d1 ("cxl/memdev: Add support for the Clear Poison mailbox command")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/core/memdev.c | 18 +++++++++---------
 drivers/cxl/core/region.c |  5 -----
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index fc5c2b414793..961da365b097 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -227,9 +227,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
 	if (!port || !is_cxl_endpoint(port))
 		return -EINVAL;
 
-	rc = down_read_interruptible(&cxl_dpa_rwsem);
-	if (rc)
-		return rc;
+	down_read(&cxl_region_rwsem);
+	down_read(&cxl_dpa_rwsem);
 
 	if (cxl_num_decoders_committed(port) == 0) {
 		/* No regions mapped to this memdev */
@@ -239,6 +238,7 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
 		rc =  cxl_get_poison_by_endpoint(port);
 	}
 	up_read(&cxl_dpa_rwsem);
+	up_read(&cxl_region_rwsem);
 
 	return rc;
 }
@@ -324,9 +324,8 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
 	if (!IS_ENABLED(CONFIG_DEBUG_FS))
 		return 0;
 
-	rc = down_read_interruptible(&cxl_dpa_rwsem);
-	if (rc)
-		return rc;
+	down_read(&cxl_region_rwsem);
+	down_read(&cxl_dpa_rwsem);
 
 	rc = cxl_validate_poison_dpa(cxlmd, dpa);
 	if (rc)
@@ -355,6 +354,7 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
 	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_INJECT);
 out:
 	up_read(&cxl_dpa_rwsem);
+	up_read(&cxl_region_rwsem);
 
 	return rc;
 }
@@ -372,9 +372,8 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
 	if (!IS_ENABLED(CONFIG_DEBUG_FS))
 		return 0;
 
-	rc = down_read_interruptible(&cxl_dpa_rwsem);
-	if (rc)
-		return rc;
+	down_read(&cxl_region_rwsem);
+	down_read(&cxl_dpa_rwsem);
 
 	rc = cxl_validate_poison_dpa(cxlmd, dpa);
 	if (rc)
@@ -412,6 +411,7 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
 	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_CLEAR);
 out:
 	up_read(&cxl_dpa_rwsem);
+	up_read(&cxl_region_rwsem);
 
 	return rc;
 }
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 56e575c79bb4..3e817a6f94c6 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2467,10 +2467,6 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
 	struct cxl_poison_context ctx;
 	int rc = 0;
 
-	rc = down_read_interruptible(&cxl_region_rwsem);
-	if (rc)
-		return rc;
-
 	ctx = (struct cxl_poison_context) {
 		.port = port
 	};
@@ -2480,7 +2476,6 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
 		rc = cxl_get_poison_unmapped(to_cxl_memdev(port->uport_dev),
 					     &ctx);
 
-	up_read(&cxl_region_rwsem);
 	return rc;
 }
 

base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] cxl/core: Hold the region rwsem during poison ops
  2023-11-14  2:53 [PATCH] cxl/core: Hold the region rwsem during poison ops alison.schofield
@ 2023-11-14 18:20 ` Dave Jiang
  2023-11-14 18:25 ` [PATCH] cxl: Convert pioson ops rwsem usages to scope based resource management Dave Jiang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Dave Jiang @ 2023-11-14 18:20 UTC (permalink / raw)
  To: alison.schofield, Davidlohr Bueso, Jonathan Cameron,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl



On 11/13/23 19:53, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
> added a lockdep_assert_held() to make sure all callers hold the
> region state stable while doing work that depends on the number
> of committed decoders.
> 
> That lockdep assert triggered in poison list, inject, and clear
> functions highlighting a gap between region attach and decoder
> commit where holding the dpa_rwsem is not enough to assure that
> a DPA is not added to a region. In such a case, if poison is
> found in at that DPA, the trace event omits the region info
> that users expect.
> 
> Close the gap by snapshotting an unchangeable region state during
> all poison ops. Hold the region_rwsem in all the places that hold
> the dpa_rwsem rather than in the region specific function only.
> 
> Fixes: 7ff6ad107588 ("cxl/memdev: Add trigger_poison_list sysfs attribute")
> Fixes: f0832a586396 ("cxl/region: Provide region info to the cxl_poison trace event")
> Fixes: d2fbc4865802 ("cxl/memdev: Add support for the Inject Poison mailbox command")
> Fixes: 9690b07748d1 ("cxl/memdev: Add support for the Clear Poison mailbox command")
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  drivers/cxl/core/memdev.c | 18 +++++++++---------
>  drivers/cxl/core/region.c |  5 -----
>  2 files changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index fc5c2b414793..961da365b097 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -227,9 +227,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
>  	if (!port || !is_cxl_endpoint(port))
>  		return -EINVAL;
>  
> -	rc = down_read_interruptible(&cxl_dpa_rwsem);
> -	if (rc)
> -		return rc;
> +	down_read(&cxl_region_rwsem);
> +	down_read(&cxl_dpa_rwsem);
>  
>  	if (cxl_num_decoders_committed(port) == 0) {
>  		/* No regions mapped to this memdev */
> @@ -239,6 +238,7 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
>  		rc =  cxl_get_poison_by_endpoint(port);
>  	}
>  	up_read(&cxl_dpa_rwsem);
> +	up_read(&cxl_region_rwsem);
>  
>  	return rc;
>  }
> @@ -324,9 +324,8 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
>  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
>  		return 0;
>  
> -	rc = down_read_interruptible(&cxl_dpa_rwsem);
> -	if (rc)
> -		return rc;
> +	down_read(&cxl_region_rwsem);
> +	down_read(&cxl_dpa_rwsem);
>  
>  	rc = cxl_validate_poison_dpa(cxlmd, dpa);
>  	if (rc)
> @@ -355,6 +354,7 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
>  	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_INJECT);
>  out:
>  	up_read(&cxl_dpa_rwsem);
> +	up_read(&cxl_region_rwsem);
>  
>  	return rc;
>  }
> @@ -372,9 +372,8 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
>  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
>  		return 0;
>  
> -	rc = down_read_interruptible(&cxl_dpa_rwsem);
> -	if (rc)
> -		return rc;
> +	down_read(&cxl_region_rwsem);
> +	down_read(&cxl_dpa_rwsem);
>  
>  	rc = cxl_validate_poison_dpa(cxlmd, dpa);
>  	if (rc)
> @@ -412,6 +411,7 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
>  	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_CLEAR);
>  out:
>  	up_read(&cxl_dpa_rwsem);
> +	up_read(&cxl_region_rwsem);
>  
>  	return rc;
>  }
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 56e575c79bb4..3e817a6f94c6 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2467,10 +2467,6 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
>  	struct cxl_poison_context ctx;
>  	int rc = 0;
>  
> -	rc = down_read_interruptible(&cxl_region_rwsem);
> -	if (rc)
> -		return rc;
> -
>  	ctx = (struct cxl_poison_context) {
>  		.port = port
>  	};
> @@ -2480,7 +2476,6 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
>  		rc = cxl_get_poison_unmapped(to_cxl_memdev(port->uport_dev),
>  					     &ctx);
>  
> -	up_read(&cxl_region_rwsem);
>  	return rc;
>  }
>  
> 
> base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] cxl: Convert pioson ops rwsem usages to scope based resource management
  2023-11-14  2:53 [PATCH] cxl/core: Hold the region rwsem during poison ops alison.schofield
  2023-11-14 18:20 ` Dave Jiang
@ 2023-11-14 18:25 ` Dave Jiang
  2023-11-15 23:32   ` Alison Schofield
  2023-11-16 22:14 ` [PATCH] cxl/core: Hold the region rwsem during poison ops Davidlohr Bueso
  2023-11-23  1:48 ` Dan Williams
  3 siblings, 1 reply; 13+ messages in thread
From: Dave Jiang @ 2023-11-14 18:25 UTC (permalink / raw)
  To: linux-cxl, alison.schofield
  Cc: dave, jonathan.cameron, vishal.l.verma, ira.weiny, dan.j.williams

Cleanup the rwsem usages in the poison ops to reduce complexity and reduce
code lines.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---

Hi Alison, follow on patch to yours. Can't be backported, but nice clean
up going forward.

 drivers/cxl/core/memdev.c |   32 ++++++++++++--------------------
 1 file changed, 12 insertions(+), 20 deletions(-)

diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 961da365b097..9ab748fadb38 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -227,8 +227,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
 	if (!port || !is_cxl_endpoint(port))
 		return -EINVAL;
 
-	down_read(&cxl_region_rwsem);
-	down_read(&cxl_dpa_rwsem);
+	guard(rwsem_read)(&cxl_region_rwsem);
+	guard(rwsem_read)(&cxl_dpa_rwsem);
 
 	if (cxl_num_decoders_committed(port) == 0) {
 		/* No regions mapped to this memdev */
@@ -237,8 +237,6 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
 		/* Regions mapped, collect poison by endpoint */
 		rc =  cxl_get_poison_by_endpoint(port);
 	}
-	up_read(&cxl_dpa_rwsem);
-	up_read(&cxl_region_rwsem);
 
 	return rc;
 }
@@ -324,12 +322,12 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
 	if (!IS_ENABLED(CONFIG_DEBUG_FS))
 		return 0;
 
-	down_read(&cxl_region_rwsem);
-	down_read(&cxl_dpa_rwsem);
+	guard(rwsem_read)(&cxl_region_rwsem);
+	guard(rwsem_read)(&cxl_dpa_rwsem);
 
 	rc = cxl_validate_poison_dpa(cxlmd, dpa);
 	if (rc)
-		goto out;
+		return rc;
 
 	inject.address = cpu_to_le64(dpa);
 	mbox_cmd = (struct cxl_mbox_cmd) {
@@ -339,7 +337,7 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
 	};
 	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
 	if (rc)
-		goto out;
+		return rc;
 
 	cxlr = cxl_dpa_to_region(cxlmd, dpa);
 	if (cxlr)
@@ -352,11 +350,8 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
 		.length = cpu_to_le32(1),
 	};
 	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_INJECT);
-out:
-	up_read(&cxl_dpa_rwsem);
-	up_read(&cxl_region_rwsem);
 
-	return rc;
+	return 0;
 }
 EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
 
@@ -372,12 +367,12 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
 	if (!IS_ENABLED(CONFIG_DEBUG_FS))
 		return 0;
 
-	down_read(&cxl_region_rwsem);
-	down_read(&cxl_dpa_rwsem);
+	guard(rwsem_read)(&cxl_region_rwsem);
+	guard(rwsem_read)(&cxl_dpa_rwsem);
 
 	rc = cxl_validate_poison_dpa(cxlmd, dpa);
 	if (rc)
-		goto out;
+		return rc;
 
 	/*
 	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
@@ -396,7 +391,7 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
 
 	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
 	if (rc)
-		goto out;
+		return rc;
 
 	cxlr = cxl_dpa_to_region(cxlmd, dpa);
 	if (cxlr)
@@ -409,11 +404,8 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
 		.length = cpu_to_le32(1),
 	};
 	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_CLEAR);
-out:
-	up_read(&cxl_dpa_rwsem);
-	up_read(&cxl_region_rwsem);
 
-	return rc;
+	return 0;
 }
 EXPORT_SYMBOL_NS_GPL(cxl_clear_poison, CXL);
 



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] cxl: Convert pioson ops rwsem usages to scope based resource management
  2023-11-14 18:25 ` [PATCH] cxl: Convert pioson ops rwsem usages to scope based resource management Dave Jiang
@ 2023-11-15 23:32   ` Alison Schofield
  2023-11-15 23:55     ` Dave Jiang
  0 siblings, 1 reply; 13+ messages in thread
From: Alison Schofield @ 2023-11-15 23:32 UTC (permalink / raw)
  To: Dave Jiang
  Cc: linux-cxl, dave, jonathan.cameron, vishal.l.verma, ira.weiny,
	dan.j.williams

On Tue, Nov 14, 2023 at 11:25:20AM -0700, Dave Jiang wrote:
> Cleanup the rwsem usages in the poison ops to reduce complexity and reduce
> code lines.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
> 
> Hi Alison, follow on patch to yours. Can't be backported, but nice clean
> up going forward.

Tell me more about your backport worry.  Are we expected to avoid using
the new guard any place where there is a backport possibility?  

I'm thinking I should just rev the patch with your updates.

> 
>  drivers/cxl/core/memdev.c |   32 ++++++++++++--------------------
>  1 file changed, 12 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 961da365b097..9ab748fadb38 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -227,8 +227,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
>  	if (!port || !is_cxl_endpoint(port))
>  		return -EINVAL;
>  
> -	down_read(&cxl_region_rwsem);
> -	down_read(&cxl_dpa_rwsem);
> +	guard(rwsem_read)(&cxl_region_rwsem);
> +	guard(rwsem_read)(&cxl_dpa_rwsem);
>  
>  	if (cxl_num_decoders_committed(port) == 0) {
>  		/* No regions mapped to this memdev */
> @@ -237,8 +237,6 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
>  		/* Regions mapped, collect poison by endpoint */
>  		rc =  cxl_get_poison_by_endpoint(port);
>  	}
> -	up_read(&cxl_dpa_rwsem);
> -	up_read(&cxl_region_rwsem);
>  
>  	return rc;
>  }
> @@ -324,12 +322,12 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
>  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
>  		return 0;
>  
> -	down_read(&cxl_region_rwsem);
> -	down_read(&cxl_dpa_rwsem);
> +	guard(rwsem_read)(&cxl_region_rwsem);
> +	guard(rwsem_read)(&cxl_dpa_rwsem);
>  
>  	rc = cxl_validate_poison_dpa(cxlmd, dpa);
>  	if (rc)
> -		goto out;
> +		return rc;
>  
>  	inject.address = cpu_to_le64(dpa);
>  	mbox_cmd = (struct cxl_mbox_cmd) {
> @@ -339,7 +337,7 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
>  	};
>  	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
>  	if (rc)
> -		goto out;
> +		return rc;
>  
>  	cxlr = cxl_dpa_to_region(cxlmd, dpa);
>  	if (cxlr)
> @@ -352,11 +350,8 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
>  		.length = cpu_to_le32(1),
>  	};
>  	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_INJECT);
> -out:
> -	up_read(&cxl_dpa_rwsem);
> -	up_read(&cxl_region_rwsem);
>  
> -	return rc;
> +	return 0;
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
>  
> @@ -372,12 +367,12 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
>  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
>  		return 0;
>  
> -	down_read(&cxl_region_rwsem);
> -	down_read(&cxl_dpa_rwsem);
> +	guard(rwsem_read)(&cxl_region_rwsem);
> +	guard(rwsem_read)(&cxl_dpa_rwsem);
>  
>  	rc = cxl_validate_poison_dpa(cxlmd, dpa);
>  	if (rc)
> -		goto out;
> +		return rc;
>  
>  	/*
>  	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> @@ -396,7 +391,7 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
>  
>  	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
>  	if (rc)
> -		goto out;
> +		return rc;
>  
>  	cxlr = cxl_dpa_to_region(cxlmd, dpa);
>  	if (cxlr)
> @@ -409,11 +404,8 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
>  		.length = cpu_to_le32(1),
>  	};
>  	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_CLEAR);
> -out:
> -	up_read(&cxl_dpa_rwsem);
> -	up_read(&cxl_region_rwsem);
>  
> -	return rc;
> +	return 0;
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_clear_poison, CXL);
>  
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] cxl: Convert pioson ops rwsem usages to scope based resource management
  2023-11-15 23:32   ` Alison Schofield
@ 2023-11-15 23:55     ` Dave Jiang
  2023-11-16  2:17       ` Alison Schofield
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Jiang @ 2023-11-15 23:55 UTC (permalink / raw)
  To: Alison Schofield
  Cc: linux-cxl, dave, jonathan.cameron, vishal.l.verma, ira.weiny,
	dan.j.williams



On 11/15/23 16:32, Alison Schofield wrote:
> On Tue, Nov 14, 2023 at 11:25:20AM -0700, Dave Jiang wrote:
>> Cleanup the rwsem usages in the poison ops to reduce complexity and reduce
>> code lines.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>>
>> Hi Alison, follow on patch to yours. Can't be backported, but nice clean
>> up going forward.
> 
> Tell me more about your backport worry.  Are we expected to avoid using
> the new guard any place where there is a backport possibility?  

Given that there's none of the cleanup.h support in stable kernels, I don't see how we can backport the guard() code automatically. Thus your original fix with a fixes tag plus a new cleanup patch follow on w/o backport issues seems necessary. Otherwise a separate backport patch would be needed no?
> 
> I'm thinking I should just rev the patch with your updates.
> 
>>
>>  drivers/cxl/core/memdev.c |   32 ++++++++++++--------------------
>>  1 file changed, 12 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
>> index 961da365b097..9ab748fadb38 100644
>> --- a/drivers/cxl/core/memdev.c
>> +++ b/drivers/cxl/core/memdev.c
>> @@ -227,8 +227,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
>>  	if (!port || !is_cxl_endpoint(port))
>>  		return -EINVAL;
>>  
>> -	down_read(&cxl_region_rwsem);
>> -	down_read(&cxl_dpa_rwsem);
>> +	guard(rwsem_read)(&cxl_region_rwsem);
>> +	guard(rwsem_read)(&cxl_dpa_rwsem);
>>  
>>  	if (cxl_num_decoders_committed(port) == 0) {
>>  		/* No regions mapped to this memdev */
>> @@ -237,8 +237,6 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
>>  		/* Regions mapped, collect poison by endpoint */
>>  		rc =  cxl_get_poison_by_endpoint(port);
>>  	}
>> -	up_read(&cxl_dpa_rwsem);
>> -	up_read(&cxl_region_rwsem);
>>  
>>  	return rc;
>>  }
>> @@ -324,12 +322,12 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
>>  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
>>  		return 0;
>>  
>> -	down_read(&cxl_region_rwsem);
>> -	down_read(&cxl_dpa_rwsem);
>> +	guard(rwsem_read)(&cxl_region_rwsem);
>> +	guard(rwsem_read)(&cxl_dpa_rwsem);
>>  
>>  	rc = cxl_validate_poison_dpa(cxlmd, dpa);
>>  	if (rc)
>> -		goto out;
>> +		return rc;
>>  
>>  	inject.address = cpu_to_le64(dpa);
>>  	mbox_cmd = (struct cxl_mbox_cmd) {
>> @@ -339,7 +337,7 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
>>  	};
>>  	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
>>  	if (rc)
>> -		goto out;
>> +		return rc;
>>  
>>  	cxlr = cxl_dpa_to_region(cxlmd, dpa);
>>  	if (cxlr)
>> @@ -352,11 +350,8 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
>>  		.length = cpu_to_le32(1),
>>  	};
>>  	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_INJECT);
>> -out:
>> -	up_read(&cxl_dpa_rwsem);
>> -	up_read(&cxl_region_rwsem);
>>  
>> -	return rc;
>> +	return 0;
>>  }
>>  EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
>>  
>> @@ -372,12 +367,12 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
>>  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
>>  		return 0;
>>  
>> -	down_read(&cxl_region_rwsem);
>> -	down_read(&cxl_dpa_rwsem);
>> +	guard(rwsem_read)(&cxl_region_rwsem);
>> +	guard(rwsem_read)(&cxl_dpa_rwsem);
>>  
>>  	rc = cxl_validate_poison_dpa(cxlmd, dpa);
>>  	if (rc)
>> -		goto out;
>> +		return rc;
>>  
>>  	/*
>>  	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
>> @@ -396,7 +391,7 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
>>  
>>  	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
>>  	if (rc)
>> -		goto out;
>> +		return rc;
>>  
>>  	cxlr = cxl_dpa_to_region(cxlmd, dpa);
>>  	if (cxlr)
>> @@ -409,11 +404,8 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
>>  		.length = cpu_to_le32(1),
>>  	};
>>  	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_CLEAR);
>> -out:
>> -	up_read(&cxl_dpa_rwsem);
>> -	up_read(&cxl_region_rwsem);
>>  
>> -	return rc;
>> +	return 0;
>>  }
>>  EXPORT_SYMBOL_NS_GPL(cxl_clear_poison, CXL);
>>  
>>
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] cxl: Convert pioson ops rwsem usages to scope based resource management
  2023-11-15 23:55     ` Dave Jiang
@ 2023-11-16  2:17       ` Alison Schofield
  2023-11-16 16:27         ` Dave Jiang
  2023-11-23  1:12         ` Dan Williams
  0 siblings, 2 replies; 13+ messages in thread
From: Alison Schofield @ 2023-11-16  2:17 UTC (permalink / raw)
  To: Dave Jiang
  Cc: linux-cxl, dave, jonathan.cameron, vishal.l.verma, ira.weiny,
	dan.j.williams

On Wed, Nov 15, 2023 at 04:55:11PM -0700, Dave Jiang wrote:
> 
> 
> On 11/15/23 16:32, Alison Schofield wrote:
> > On Tue, Nov 14, 2023 at 11:25:20AM -0700, Dave Jiang wrote:
> >> Cleanup the rwsem usages in the poison ops to reduce complexity and reduce
> >> code lines.
> >>
> >> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> >> ---
> >>
> >> Hi Alison, follow on patch to yours. Can't be backported, but nice clean
> >> up going forward.
> > 
> > Tell me more about your backport worry.  Are we expected to avoid using
> > the new guard any place where there is a backport possibility?  
> 
> Given that there's none of the cleanup.h support in stable kernels, I don't see how we can backport the guard() code automatically. Thus your original fix with a fixes tag plus a new cleanup patch follow on w/o backport issues seems necessary. Otherwise a separate backport patch would be needed no?

Sure, it would be needed. I guess I'm looking for why this backport
issue is so special. (not being sarcastic). Is there specific guidance
not to use the cleanup stuff if we think a patch might be backported?
I don't usually consider backportability when adding a Fixes tag to a
Patch. Have 'backport folks' asked us not to use it?

I'm imagining the slippery slope of not fixing something the best way
because we are worried that backport folks can't figure out how to
merge it.

> > 
> > I'm thinking I should just rev the patch with your updates.
> > 
> >>
> >>  drivers/cxl/core/memdev.c |   32 ++++++++++++--------------------
> >>  1 file changed, 12 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> >> index 961da365b097..9ab748fadb38 100644
> >> --- a/drivers/cxl/core/memdev.c
> >> +++ b/drivers/cxl/core/memdev.c
> >> @@ -227,8 +227,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
> >>  	if (!port || !is_cxl_endpoint(port))
> >>  		return -EINVAL;
> >>  
> >> -	down_read(&cxl_region_rwsem);
> >> -	down_read(&cxl_dpa_rwsem);
> >> +	guard(rwsem_read)(&cxl_region_rwsem);
> >> +	guard(rwsem_read)(&cxl_dpa_rwsem);
> >>  
> >>  	if (cxl_num_decoders_committed(port) == 0) {
> >>  		/* No regions mapped to this memdev */
> >> @@ -237,8 +237,6 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
> >>  		/* Regions mapped, collect poison by endpoint */
> >>  		rc =  cxl_get_poison_by_endpoint(port);
> >>  	}
> >> -	up_read(&cxl_dpa_rwsem);
> >> -	up_read(&cxl_region_rwsem);
> >>  
> >>  	return rc;
> >>  }
> >> @@ -324,12 +322,12 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
> >>  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> >>  		return 0;
> >>  
> >> -	down_read(&cxl_region_rwsem);
> >> -	down_read(&cxl_dpa_rwsem);
> >> +	guard(rwsem_read)(&cxl_region_rwsem);
> >> +	guard(rwsem_read)(&cxl_dpa_rwsem);
> >>  
> >>  	rc = cxl_validate_poison_dpa(cxlmd, dpa);
> >>  	if (rc)
> >> -		goto out;
> >> +		return rc;
> >>  
> >>  	inject.address = cpu_to_le64(dpa);
> >>  	mbox_cmd = (struct cxl_mbox_cmd) {
> >> @@ -339,7 +337,7 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
> >>  	};
> >>  	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> >>  	if (rc)
> >> -		goto out;
> >> +		return rc;
> >>  
> >>  	cxlr = cxl_dpa_to_region(cxlmd, dpa);
> >>  	if (cxlr)
> >> @@ -352,11 +350,8 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
> >>  		.length = cpu_to_le32(1),
> >>  	};
> >>  	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_INJECT);
> >> -out:
> >> -	up_read(&cxl_dpa_rwsem);
> >> -	up_read(&cxl_region_rwsem);
> >>  
> >> -	return rc;
> >> +	return 0;
> >>  }
> >>  EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
> >>  
> >> @@ -372,12 +367,12 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
> >>  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
> >>  		return 0;
> >>  
> >> -	down_read(&cxl_region_rwsem);
> >> -	down_read(&cxl_dpa_rwsem);
> >> +	guard(rwsem_read)(&cxl_region_rwsem);
> >> +	guard(rwsem_read)(&cxl_dpa_rwsem);
> >>  
> >>  	rc = cxl_validate_poison_dpa(cxlmd, dpa);
> >>  	if (rc)
> >> -		goto out;
> >> +		return rc;
> >>  
> >>  	/*
> >>  	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> >> @@ -396,7 +391,7 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
> >>  
> >>  	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> >>  	if (rc)
> >> -		goto out;
> >> +		return rc;
> >>  
> >>  	cxlr = cxl_dpa_to_region(cxlmd, dpa);
> >>  	if (cxlr)
> >> @@ -409,11 +404,8 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
> >>  		.length = cpu_to_le32(1),
> >>  	};
> >>  	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_CLEAR);
> >> -out:
> >> -	up_read(&cxl_dpa_rwsem);
> >> -	up_read(&cxl_region_rwsem);
> >>  
> >> -	return rc;
> >> +	return 0;
> >>  }
> >>  EXPORT_SYMBOL_NS_GPL(cxl_clear_poison, CXL);
> >>  
> >>
> >>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] cxl: Convert pioson ops rwsem usages to scope based resource management
  2023-11-16  2:17       ` Alison Schofield
@ 2023-11-16 16:27         ` Dave Jiang
  2023-11-23  1:12         ` Dan Williams
  1 sibling, 0 replies; 13+ messages in thread
From: Dave Jiang @ 2023-11-16 16:27 UTC (permalink / raw)
  To: Alison Schofield
  Cc: linux-cxl, dave, jonathan.cameron, vishal.l.verma, ira.weiny,
	dan.j.williams



On 11/15/23 19:17, Alison Schofield wrote:
> On Wed, Nov 15, 2023 at 04:55:11PM -0700, Dave Jiang wrote:
>>
>>
>> On 11/15/23 16:32, Alison Schofield wrote:
>>> On Tue, Nov 14, 2023 at 11:25:20AM -0700, Dave Jiang wrote:
>>>> Cleanup the rwsem usages in the poison ops to reduce complexity and reduce
>>>> code lines.
>>>>
>>>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>>> ---
>>>>
>>>> Hi Alison, follow on patch to yours. Can't be backported, but nice clean
>>>> up going forward.
>>>
>>> Tell me more about your backport worry.  Are we expected to avoid using
>>> the new guard any place where there is a backport possibility?  
>>
>> Given that there's none of the cleanup.h support in stable kernels, I don't see how we can backport the guard() code automatically. Thus your original fix with a fixes tag plus a new cleanup patch follow on w/o backport issues seems necessary. Otherwise a separate backport patch would be needed no?
> 
> Sure, it would be needed. I guess I'm looking for why this backport
> issue is so special. (not being sarcastic). Is there specific guidance
> not to use the cleanup stuff if we think a patch might be backported?
> I don't usually consider backportability when adding a Fixes tag to a
> Patch. Have 'backport folks' asked us not to use it?
> 
> I'm imagining the slippery slope of not fixing something the best way
> because we are worried that backport folks can't figure out how to
> merge it.

Just auto backport vs manual backport. And if you don't mind doing the work, then I guess use the new calls. 


> 
>>>
>>> I'm thinking I should just rev the patch with your updates.
>>>
>>>>
>>>>  drivers/cxl/core/memdev.c |   32 ++++++++++++--------------------
>>>>  1 file changed, 12 insertions(+), 20 deletions(-)
>>>>
>>>> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
>>>> index 961da365b097..9ab748fadb38 100644
>>>> --- a/drivers/cxl/core/memdev.c
>>>> +++ b/drivers/cxl/core/memdev.c
>>>> @@ -227,8 +227,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
>>>>  	if (!port || !is_cxl_endpoint(port))
>>>>  		return -EINVAL;
>>>>  
>>>> -	down_read(&cxl_region_rwsem);
>>>> -	down_read(&cxl_dpa_rwsem);
>>>> +	guard(rwsem_read)(&cxl_region_rwsem);
>>>> +	guard(rwsem_read)(&cxl_dpa_rwsem);
>>>>  
>>>>  	if (cxl_num_decoders_committed(port) == 0) {
>>>>  		/* No regions mapped to this memdev */
>>>> @@ -237,8 +237,6 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
>>>>  		/* Regions mapped, collect poison by endpoint */
>>>>  		rc =  cxl_get_poison_by_endpoint(port);
>>>>  	}
>>>> -	up_read(&cxl_dpa_rwsem);
>>>> -	up_read(&cxl_region_rwsem);
>>>>  
>>>>  	return rc;
>>>>  }
>>>> @@ -324,12 +322,12 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
>>>>  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
>>>>  		return 0;
>>>>  
>>>> -	down_read(&cxl_region_rwsem);
>>>> -	down_read(&cxl_dpa_rwsem);
>>>> +	guard(rwsem_read)(&cxl_region_rwsem);
>>>> +	guard(rwsem_read)(&cxl_dpa_rwsem);
>>>>  
>>>>  	rc = cxl_validate_poison_dpa(cxlmd, dpa);
>>>>  	if (rc)
>>>> -		goto out;
>>>> +		return rc;
>>>>  
>>>>  	inject.address = cpu_to_le64(dpa);
>>>>  	mbox_cmd = (struct cxl_mbox_cmd) {
>>>> @@ -339,7 +337,7 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
>>>>  	};
>>>>  	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
>>>>  	if (rc)
>>>> -		goto out;
>>>> +		return rc;
>>>>  
>>>>  	cxlr = cxl_dpa_to_region(cxlmd, dpa);
>>>>  	if (cxlr)
>>>> @@ -352,11 +350,8 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
>>>>  		.length = cpu_to_le32(1),
>>>>  	};
>>>>  	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_INJECT);
>>>> -out:
>>>> -	up_read(&cxl_dpa_rwsem);
>>>> -	up_read(&cxl_region_rwsem);
>>>>  
>>>> -	return rc;
>>>> +	return 0;
>>>>  }
>>>>  EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
>>>>  
>>>> @@ -372,12 +367,12 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
>>>>  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
>>>>  		return 0;
>>>>  
>>>> -	down_read(&cxl_region_rwsem);
>>>> -	down_read(&cxl_dpa_rwsem);
>>>> +	guard(rwsem_read)(&cxl_region_rwsem);
>>>> +	guard(rwsem_read)(&cxl_dpa_rwsem);
>>>>  
>>>>  	rc = cxl_validate_poison_dpa(cxlmd, dpa);
>>>>  	if (rc)
>>>> -		goto out;
>>>> +		return rc;
>>>>  
>>>>  	/*
>>>>  	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
>>>> @@ -396,7 +391,7 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
>>>>  
>>>>  	rc = cxl_internal_send_cmd(mds, &mbox_cmd);
>>>>  	if (rc)
>>>> -		goto out;
>>>> +		return rc;
>>>>  
>>>>  	cxlr = cxl_dpa_to_region(cxlmd, dpa);
>>>>  	if (cxlr)
>>>> @@ -409,11 +404,8 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
>>>>  		.length = cpu_to_le32(1),
>>>>  	};
>>>>  	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_CLEAR);
>>>> -out:
>>>> -	up_read(&cxl_dpa_rwsem);
>>>> -	up_read(&cxl_region_rwsem);
>>>>  
>>>> -	return rc;
>>>> +	return 0;
>>>>  }
>>>>  EXPORT_SYMBOL_NS_GPL(cxl_clear_poison, CXL);
>>>>  
>>>>
>>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] cxl/core: Hold the region rwsem during poison ops
  2023-11-14  2:53 [PATCH] cxl/core: Hold the region rwsem during poison ops alison.schofield
  2023-11-14 18:20 ` Dave Jiang
  2023-11-14 18:25 ` [PATCH] cxl: Convert pioson ops rwsem usages to scope based resource management Dave Jiang
@ 2023-11-16 22:14 ` Davidlohr Bueso
  2023-11-20 19:07   ` Alison Schofield
  2023-11-23  1:48 ` Dan Williams
  3 siblings, 1 reply; 13+ messages in thread
From: Davidlohr Bueso @ 2023-11-16 22:14 UTC (permalink / raw)
  To: alison.schofield
  Cc: Jonathan Cameron, Dave Jiang, Vishal Verma, Ira Weiny,
	Dan Williams, linux-cxl

On Mon, 13 Nov 2023, alison.schofield@intel.com wrote:

>From: Alison Schofield <alison.schofield@intel.com>
>
>Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
>added a lockdep_assert_held() to make sure all callers hold the
>region state stable while doing work that depends on the number
>of committed decoders.

Unrelated, but that kind of helper would be better as a static
inline in a header file.

>
>That lockdep assert triggered in poison list, inject, and clear
>functions highlighting a gap between region attach and decoder
>commit where holding the dpa_rwsem is not enough to assure that
>a DPA is not added to a region. In such a case, if poison is
>found in at that DPA, the trace event omits the region info
>that users expect.
>
>Close the gap by snapshotting an unchangeable region state during
>all poison ops. Hold the region_rwsem in all the places that hold
>the dpa_rwsem rather than in the region specific function only.

Makes sense and lock oder is consistent with that of attach_target().
I do think that the interruptible semantics should be kept considering
this is from sysfs/debugfs.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] cxl/core: Hold the region rwsem during poison ops
  2023-11-16 22:14 ` [PATCH] cxl/core: Hold the region rwsem during poison ops Davidlohr Bueso
@ 2023-11-20 19:07   ` Alison Schofield
  0 siblings, 0 replies; 13+ messages in thread
From: Alison Schofield @ 2023-11-20 19:07 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Jonathan Cameron, Dave Jiang, Vishal Verma, Ira Weiny,
	Dan Williams, linux-cxl

On Thu, Nov 16, 2023 at 02:14:37PM -0800, Davidlohr Bueso wrote:
> On Mon, 13 Nov 2023, alison.schofield@intel.com wrote:
> 
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
> > added a lockdep_assert_held() to make sure all callers hold the
> > region state stable while doing work that depends on the number
> > of committed decoders.
> 
> Unrelated, but that kind of helper would be better as a static
> inline in a header file.
> 
> > 
> > That lockdep assert triggered in poison list, inject, and clear
> > functions highlighting a gap between region attach and decoder
> > commit where holding the dpa_rwsem is not enough to assure that
> > a DPA is not added to a region. In such a case, if poison is
> > found in at that DPA, the trace event omits the region info
> > that users expect.
> > 
> > Close the gap by snapshotting an unchangeable region state during
> > all poison ops. Hold the region_rwsem in all the places that hold
> > the dpa_rwsem rather than in the region specific function only.
> 
> Makes sense and lock oder is consistent with that of attach_target().
> I do think that the interruptible semantics should be kept considering
> this is from sysfs/debugfs.

Good eye, bad me!  It was intentional to remove the interruptible,
but I should have noted it in the commit log.

At the point the lock is taken, the driver is committed to completing
the action. It is not interruptible once begun. This notion of a poison
state was first introduced here:
d0abf5787adc ("cxl/mbox: Initialize the poison state")

The basic premise is that the driver will synchronize reads of poison
so as not to return incomplete lists.

> 
> Thanks,
> Davidlohr

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] cxl: Convert pioson ops rwsem usages to scope based resource management
  2023-11-16  2:17       ` Alison Schofield
  2023-11-16 16:27         ` Dave Jiang
@ 2023-11-23  1:12         ` Dan Williams
  2023-11-23  1:33           ` Dan Williams
  1 sibling, 1 reply; 13+ messages in thread
From: Dan Williams @ 2023-11-23  1:12 UTC (permalink / raw)
  To: Alison Schofield, Dave Jiang
  Cc: linux-cxl, dave, jonathan.cameron, vishal.l.verma, ira.weiny,
	dan.j.williams

Alison Schofield wrote:
> On Wed, Nov 15, 2023 at 04:55:11PM -0700, Dave Jiang wrote:
> > 
> > 
> > On 11/15/23 16:32, Alison Schofield wrote:
> > > On Tue, Nov 14, 2023 at 11:25:20AM -0700, Dave Jiang wrote:
> > >> Cleanup the rwsem usages in the poison ops to reduce complexity and reduce
> > >> code lines.
> > >>
> > >> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> > >> ---
> > >>
> > >> Hi Alison, follow on patch to yours. Can't be backported, but nice clean
> > >> up going forward.
> > > 
> > > Tell me more about your backport worry.  Are we expected to avoid using
> > > the new guard any place where there is a backport possibility?  
> > 
> > Given that there's none of the cleanup.h support in stable kernels, I don't see how we can backport the guard() code automatically. Thus your original fix with a fixes tag plus a new cleanup patch follow on w/o backport issues seems necessary. Otherwise a separate backport patch would be needed no?
> 
> Sure, it would be needed. I guess I'm looking for why this backport
> issue is so special. (not being sarcastic). Is there specific guidance
> not to use the cleanup stuff if we think a patch might be backported?
> I don't usually consider backportability when adding a Fixes tag to a
> Patch. Have 'backport folks' asked us not to use it?
> 
> I'm imagining the slippery slope of not fixing something the best way
> because we are worried that backport folks can't figure out how to
> merge it.

Upstream should be fixing things "the best way" in the current kernel.
If the best way requires some work when backporting, so be it.

The general rule for making backports easier is for when a fix
identifies additional cleanups. In that scenario the fix should be made
first and then the cleanups layered on top.

The cleanup.h helpers are an interesting case because they allow adding
new locking calls and defining the scope of the lock at the same time. I
submit that cleanup.h helpers are as easy / difficult to backport as
open-coded lock / unlock calls.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] cxl: Convert pioson ops rwsem usages to scope based resource management
  2023-11-23  1:12         ` Dan Williams
@ 2023-11-23  1:33           ` Dan Williams
  0 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2023-11-23  1:33 UTC (permalink / raw)
  To: Dan Williams, Alison Schofield, Dave Jiang
  Cc: linux-cxl, dave, jonathan.cameron, vishal.l.verma, ira.weiny,
	dan.j.williams

Dan Williams wrote:
> Alison Schofield wrote:
> > On Wed, Nov 15, 2023 at 04:55:11PM -0700, Dave Jiang wrote:
> > > 
> > > 
> > > On 11/15/23 16:32, Alison Schofield wrote:
> > > > On Tue, Nov 14, 2023 at 11:25:20AM -0700, Dave Jiang wrote:
> > > >> Cleanup the rwsem usages in the poison ops to reduce complexity and reduce
> > > >> code lines.
> > > >>
> > > >> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> > > >> ---
> > > >>
> > > >> Hi Alison, follow on patch to yours. Can't be backported, but nice clean
> > > >> up going forward.
> > > > 
> > > > Tell me more about your backport worry.  Are we expected to avoid using
> > > > the new guard any place where there is a backport possibility?  
> > > 
> > > Given that there's none of the cleanup.h support in stable kernels, I don't see how we can backport the guard() code automatically. Thus your original fix with a fixes tag plus a new cleanup patch follow on w/o backport issues seems necessary. Otherwise a separate backport patch would be needed no?
> > 
> > Sure, it would be needed. I guess I'm looking for why this backport
> > issue is so special. (not being sarcastic). Is there specific guidance
> > not to use the cleanup stuff if we think a patch might be backported?
> > I don't usually consider backportability when adding a Fixes tag to a
> > Patch. Have 'backport folks' asked us not to use it?
> > 
> > I'm imagining the slippery slope of not fixing something the best way
> > because we are worried that backport folks can't figure out how to
> > merge it.
> 
> Upstream should be fixing things "the best way" in the current kernel.
> If the best way requires some work when backporting, so be it.
> 
> The general rule for making backports easier is for when a fix
> identifies additional cleanups. In that scenario the fix should be made
> first and then the cleanups layered on top.
> 
> The cleanup.h helpers are an interesting case because they allow adding
> new locking calls and defining the scope of the lock at the same time. I
> submit that cleanup.h helpers are as easy / difficult to backport as
> open-coded lock / unlock calls.

The other concern here though is mixing a conversion to use cleanup.h
with a cleanup to use scope-based locking, and the fact that the
_interruptible version of the scope based locking is not available until
v6.8. So while I think it is ok to introduce new locking as a fix with
the cleanup.h headers. The old-style should be used when the fix overlaps
a conversion.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH] cxl/core: Hold the region rwsem during poison ops
  2023-11-14  2:53 [PATCH] cxl/core: Hold the region rwsem during poison ops alison.schofield
                   ` (2 preceding siblings ...)
  2023-11-16 22:14 ` [PATCH] cxl/core: Hold the region rwsem during poison ops Davidlohr Bueso
@ 2023-11-23  1:48 ` Dan Williams
  2023-11-27  0:03   ` Alison Schofield
  3 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2023-11-23  1:48 UTC (permalink / raw)
  To: alison.schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
> added a lockdep_assert_held() to make sure all callers hold the
> region state stable while doing work that depends on the number
> of committed decoders.
> 
> That lockdep assert triggered in poison list, inject, and clear
> functions highlighting a gap between region attach and decoder
> commit where holding the dpa_rwsem is not enough to assure that
> a DPA is not added to a region. In such a case, if poison is
> found in at that DPA, the trace event omits the region info
> that users expect.
> 
> Close the gap by snapshotting an unchangeable region state during
> all poison ops. Hold the region_rwsem in all the places that hold
> the dpa_rwsem rather than in the region specific function only.
> 
> Fixes: 7ff6ad107588 ("cxl/memdev: Add trigger_poison_list sysfs attribute")
> Fixes: f0832a586396 ("cxl/region: Provide region info to the cxl_poison trace event")
> Fixes: d2fbc4865802 ("cxl/memdev: Add support for the Inject Poison mailbox command")
> Fixes: 9690b07748d1 ("cxl/memdev: Add support for the Clear Poison mailbox command")

I think this is an indication that the fixes would benefit from being
broken up into at least 2 commits so that the specific side effect of
each can be commented upon.

For example:

- Fix walking committed decoders in cxl_trigger_poison_list()

- Fix walking dpa to region lookups in cxl_{inject,clear}_poison()

...look like 2 separate topics in this combined patch.

> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  drivers/cxl/core/memdev.c | 18 +++++++++---------
>  drivers/cxl/core/region.c |  5 -----
>  2 files changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index fc5c2b414793..961da365b097 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -227,9 +227,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
>  	if (!port || !is_cxl_endpoint(port))
>  		return -EINVAL;
>  
> -	rc = down_read_interruptible(&cxl_dpa_rwsem);
> -	if (rc)
> -		return rc;

What the rationale for dropping interruptible, it seems appropriate here
since this function is directly servicing a debugfs trigger and maybe
someone gets tired of waiting.

> +	down_read(&cxl_region_rwsem);
> +	down_read(&cxl_dpa_rwsem);
>  
>  	if (cxl_num_decoders_committed(port) == 0) {
>  		/* No regions mapped to this memdev */
> @@ -239,6 +238,7 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
>  		rc =  cxl_get_poison_by_endpoint(port);
>  	}
>  	up_read(&cxl_dpa_rwsem);
> +	up_read(&cxl_region_rwsem);
>  
>  	return rc;
>  }
> @@ -324,9 +324,8 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
>  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
>  		return 0;
>  
> -	rc = down_read_interruptible(&cxl_dpa_rwsem);
> -	if (rc)
> -		return rc;
> +	down_read(&cxl_region_rwsem);
> +	down_read(&cxl_dpa_rwsem);
>  
>  	rc = cxl_validate_poison_dpa(cxlmd, dpa);
>  	if (rc)
> @@ -355,6 +354,7 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
>  	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_INJECT);
>  out:
>  	up_read(&cxl_dpa_rwsem);
> +	up_read(&cxl_region_rwsem);
>  
>  	return rc;
>  }
> @@ -372,9 +372,8 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
>  	if (!IS_ENABLED(CONFIG_DEBUG_FS))
>  		return 0;
>  
> -	rc = down_read_interruptible(&cxl_dpa_rwsem);
> -	if (rc)
> -		return rc;
> +	down_read(&cxl_region_rwsem);
> +	down_read(&cxl_dpa_rwsem);
>  
>  	rc = cxl_validate_poison_dpa(cxlmd, dpa);
>  	if (rc)
> @@ -412,6 +411,7 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
>  	trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_CLEAR);
>  out:
>  	up_read(&cxl_dpa_rwsem);
> +	up_read(&cxl_region_rwsem);
>  
>  	return rc;
>  }
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 56e575c79bb4..3e817a6f94c6 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2467,10 +2467,6 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
>  	struct cxl_poison_context ctx;
>  	int rc = 0;
>  
> -	rc = down_read_interruptible(&cxl_region_rwsem);
> -	if (rc)
> -		return rc;
> -
>  	ctx = (struct cxl_poison_context) {
>  		.port = port
>  	};
> @@ -2480,7 +2476,6 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
>  		rc = cxl_get_poison_unmapped(to_cxl_memdev(port->uport_dev),
>  					     &ctx);
>  
> -	up_read(&cxl_region_rwsem);
>  	return rc;

This hunk deserves to be called out that region locking is being
upleveled as part of topic 1, and this reinforces splitting the 2 topics
into 2 patches.

Keep the _interruptible versions throughout, if you want to drop
interruptible that should be a separate follow-on behavior change patch.
The need to keep _interruptible also obviates conversion to cleanup.h
helpers for now.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] cxl/core: Hold the region rwsem during poison ops
  2023-11-23  1:48 ` Dan Williams
@ 2023-11-27  0:03   ` Alison Schofield
  0 siblings, 0 replies; 13+ messages in thread
From: Alison Schofield @ 2023-11-27  0:03 UTC (permalink / raw)
  To: Dan Williams
  Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
	Ira Weiny, linux-cxl

On Wed, Nov 22, 2023 at 05:48:45PM -0800, Dan Williams wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>

snip

> > 
> > 
> > Fixes: 7ff6ad107588 ("cxl/memdev: Add trigger_poison_list sysfs attribute")
> > Fixes: f0832a586396 ("cxl/region: Provide region info to the cxl_poison trace event")
> > Fixes: d2fbc4865802 ("cxl/memdev: Add support for the Inject Poison mailbox command")
> > Fixes: 9690b07748d1 ("cxl/memdev: Add support for the Clear Poison mailbox command")
> 
> I think this is an indication that the fixes would benefit from being
> broken up into at least 2 commits so that the specific side effect of
> each can be commented upon.
> 

Agree. I've split into 2 patches and expanded the impact in each commit
message.

> For example:
> 
> - Fix walking committed decoders in cxl_trigger_poison_list()

That's not where I found the lock issue. When walking committed
decoders, the region_rwsem was held. It is when we think there
are no regions mapped, and collect by memdev only, that I found
a gap.

> 
> - Fix walking dpa to region lookups in cxl_{inject,clear}_poison()

We're probably in sync here, albeit different words.

> 
> ...look like 2 separate topics in this combined patch.
> 
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  drivers/cxl/core/memdev.c | 18 +++++++++---------
> >  drivers/cxl/core/region.c |  5 -----
> >  2 files changed, 9 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index fc5c2b414793..961da365b097 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -227,9 +227,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
> >  	if (!port || !is_cxl_endpoint(port))
> >  		return -EINVAL;
> >  
> > -	rc = down_read_interruptible(&cxl_dpa_rwsem);
> > -	if (rc)
> > -		return rc;
> 
> What the rationale for dropping interruptible, it seems appropriate here
> since this function is directly servicing a debugfs trigger and maybe
> someone gets tired of waiting.
> 

This one is the sysfs, not debugfs, trigger. They shouldn't get tired of
waiting as it is only reading static info - the existing poison list of 
the device.

However, I did put back the _interruptible...for now.

I am not clear on how it all trickles down if this gets interrupted and
need to understand that further. We are trying to maintain a poison
state, and prevent partial reads of poison lists. That is not part of 
this patch, so I will study that some more and come back at it if I 
still think it's an issue.


> > +	down_read(&cxl_region_rwsem);
> > +	down_read(&cxl_dpa_rwsem);
> >  

snip

> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -2467,10 +2467,6 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
> >  	struct cxl_poison_context ctx;
> >  	int rc = 0;
> >  
> > -	rc = down_read_interruptible(&cxl_region_rwsem);
> > -	if (rc)
> > -		return rc;
> > -
> >  	ctx = (struct cxl_poison_context) {
> >  		.port = port
> >  	};
> > @@ -2480,7 +2476,6 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
> >  		rc = cxl_get_poison_unmapped(to_cxl_memdev(port->uport_dev),
> >  					     &ctx);
> >  
> > -	up_read(&cxl_region_rwsem);
> >  	return rc;
> 
> This hunk deserves to be called out that region locking is being
> upleveled as part of topic 1, and this reinforces splitting the 2 topics
> into 2 patches.

done.

> 
> Keep the _interruptible versions throughout, if you want to drop
> interruptible that should be a separate follow-on behavior change patch.
> The need to keep _interruptible also obviates conversion to cleanup.h
> helpers for now.

done.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-11-27  0:06 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-14  2:53 [PATCH] cxl/core: Hold the region rwsem during poison ops alison.schofield
2023-11-14 18:20 ` Dave Jiang
2023-11-14 18:25 ` [PATCH] cxl: Convert pioson ops rwsem usages to scope based resource management Dave Jiang
2023-11-15 23:32   ` Alison Schofield
2023-11-15 23:55     ` Dave Jiang
2023-11-16  2:17       ` Alison Schofield
2023-11-16 16:27         ` Dave Jiang
2023-11-23  1:12         ` Dan Williams
2023-11-23  1:33           ` Dan Williams
2023-11-16 22:14 ` [PATCH] cxl/core: Hold the region rwsem during poison ops Davidlohr Bueso
2023-11-20 19:07   ` Alison Schofield
2023-11-23  1:48 ` Dan Williams
2023-11-27  0:03   ` Alison Schofield

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.