All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] cxl: Force context lock during EEH flow
@ 2017-04-05  6:12 Vaibhav Jain
  2017-04-05 10:10   ` Frederic Barrat
  0 siblings, 1 reply; 4+ messages in thread
From: Vaibhav Jain @ 2017-04-05  6:12 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev
  Cc: Vaibhav Jain, Andrew Donnellan, Ian Munsie, Christophe Lombard,
	Philippe Bergheaud, Greg Kurz, stable

During an eeh event when the cxl card is fenced and card sysfs attr
perst_reloads_same_image is set following warning message is seen in the
kernel logs:

 [   60.622727] Adapter context unlocked with 0 active contexts
 [   60.622762] ------------[ cut here ]------------
 [   60.622771] WARNING: CPU: 12 PID: 627 at
 ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]

Even though this warning is harmless, it clutters the kernel log
during an eeh event. This warning is triggered as the EEH callback
cxl_pci_error_detected doesn't obtain a context-lock before forcibly
detaching all active context and when context-lock is released during
call to cxl_configure_adapter from cxl_pci_slot_reset, a warning in
cxl_adapter_context_unlock is triggered.

To fix this warning, we acquire the adapter context-lock via
cxl_adapter_context_lock() in the eeh callback
cxl_pci_error_detected() once all the virtual AFU PHBs are notified
and their contexts detached. After the EEH flow concludes with
call to cxl_pci_resume() the context-lock is released with a call to
cxl_adapter_context_unlock() just before notifying all virtual AFU
PHBs to resume so that they can re-activate any contexts if needed.

Cc: stable@vger.kernel.org
Fixes: 70b565bbdb91("cxl: Prevent adapter reset if an active context exists")
Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
---
Change-Log:

v2..v3
- As discussed with Fred removed function
cxl_adapter_context_force_lock() which may potentially expose the code
to deadlock in the future.
-  Other details of changes in cxl_pci_error_detected() to fix an
earlier issue of eeh callbacks not being passed on to all slices, is
being reworked as a separate patch.

v2..v1
- Moved the call to cxl_adapter_context_force_lock() from
cxl_pci_error_detected() to cxl_remove. (Fred)
---
 drivers/misc/cxl/pci.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index b27ea98..1e98245 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1496,8 +1496,6 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
 	if ((rc = cxl_native_register_psl_err_irq(adapter)))
 		goto err;
 
-	/* Release the context lock as adapter is configured */
-	cxl_adapter_context_unlock(adapter);
 	return 0;
 
 err:
@@ -1596,6 +1594,9 @@ static struct cxl *cxl_pci_init_adapter(struct pci_dev *dev)
 	if ((rc = cxl_sysfs_adapter_add(adapter)))
 		goto err_put1;
 
+	/* Release the context lock as adapter is configured */
+	cxl_adapter_context_unlock(adapter);
+
 	return adapter;
 
 err_put1:
@@ -1895,6 +1896,13 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
 		cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
 		pci_deconfigure_afu(afu);
 	}
+
+	/* should take the context lock here */
+	if (cxl_adapter_context_lock(adapter) != 0)
+		dev_warn(&adapter->dev,
+			 "Couldn't take context lock with %d active-contexts\n",
+			 atomic_read(&adapter->contexts_num));
+
 	cxl_deconfigure_adapter(adapter);
 
 	return result;
@@ -1977,6 +1985,9 @@ static void cxl_pci_resume(struct pci_dev *pdev)
 	struct pci_dev *afu_dev;
 	int i;
 
+	/* Unlock context activation for the adapter */
+	cxl_adapter_context_unlock(adapter);
+
 	/* Everything is back now. Drivers should restart work now.
 	 * This is not the place to be checking if everything came back up
 	 * properly, because there's no return value: do that in slot_reset.
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] cxl: Force context lock during EEH flow
  2017-04-05  6:12 [PATCH v3] cxl: Force context lock during EEH flow Vaibhav Jain
@ 2017-04-05 10:10   ` Frederic Barrat
  0 siblings, 0 replies; 4+ messages in thread
From: Frederic Barrat @ 2017-04-05 10:10 UTC (permalink / raw)
  To: Vaibhav Jain, linuxppc-dev
  Cc: Andrew Donnellan, Ian Munsie, Christophe Lombard,
	Philippe Bergheaud, Greg Kurz, stable

Hi Vaibhav,

I like the simplified version better. However, I think it breaks 
cxlflash's recovery. With this patch, we are also unlocking the adapter 
later. It doesn't matter when the driver initializes the first time, but 
on EEH, we now call the "slot reset" callback of the virtual device 
while locked (we unlock on "resume"). And cxlflash re-attach's on "slot 
reset", see cxlflash_pci_slot_reset()->init_afu()->init_mc().
That was also true on the previous versions of the patch.

   Fred

Le 05/04/2017 � 08:12, Vaibhav Jain a �crit :
> During an eeh event when the cxl card is fenced and card sysfs attr
> perst_reloads_same_image is set following warning message is seen in the
> kernel logs:
>
>  [   60.622727] Adapter context unlocked with 0 active contexts
>  [   60.622762] ------------[ cut here ]------------
>  [   60.622771] WARNING: CPU: 12 PID: 627 at
>  ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]
>
> Even though this warning is harmless, it clutters the kernel log
> during an eeh event. This warning is triggered as the EEH callback
> cxl_pci_error_detected doesn't obtain a context-lock before forcibly
> detaching all active context and when context-lock is released during
> call to cxl_configure_adapter from cxl_pci_slot_reset, a warning in
> cxl_adapter_context_unlock is triggered.
>
> To fix this warning, we acquire the adapter context-lock via
> cxl_adapter_context_lock() in the eeh callback
> cxl_pci_error_detected() once all the virtual AFU PHBs are notified
> and their contexts detached. After the EEH flow concludes with
> call to cxl_pci_resume() the context-lock is released with a call to
> cxl_adapter_context_unlock() just before notifying all virtual AFU
> PHBs to resume so that they can re-activate any contexts if needed.
>
> Cc: stable@vger.kernel.org
> Fixes: 70b565bbdb91("cxl: Prevent adapter reset if an active context exists")
> Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
> ---
> Change-Log:
>
> v2..v3
> - As discussed with Fred removed function
> cxl_adapter_context_force_lock() which may potentially expose the code
> to deadlock in the future.
> -  Other details of changes in cxl_pci_error_detected() to fix an
> earlier issue of eeh callbacks not being passed on to all slices, is
> being reworked as a separate patch.
>
> v2..v1
> - Moved the call to cxl_adapter_context_force_lock() from
> cxl_pci_error_detected() to cxl_remove. (Fred)
> ---
>  drivers/misc/cxl/pci.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index b27ea98..1e98245 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -1496,8 +1496,6 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
>  	if ((rc = cxl_native_register_psl_err_irq(adapter)))
>  		goto err;
>
> -	/* Release the context lock as adapter is configured */
> -	cxl_adapter_context_unlock(adapter);
>  	return 0;
>
>  err:
> @@ -1596,6 +1594,9 @@ static struct cxl *cxl_pci_init_adapter(struct pci_dev *dev)
>  	if ((rc = cxl_sysfs_adapter_add(adapter)))
>  		goto err_put1;
>
> +	/* Release the context lock as adapter is configured */
> +	cxl_adapter_context_unlock(adapter);
> +
>  	return adapter;
>
>  err_put1:
> @@ -1895,6 +1896,13 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
>  		cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
>  		pci_deconfigure_afu(afu);
>  	}
> +
> +	/* should take the context lock here */
> +	if (cxl_adapter_context_lock(adapter) != 0)
> +		dev_warn(&adapter->dev,
> +			 "Couldn't take context lock with %d active-contexts\n",
> +			 atomic_read(&adapter->contexts_num));
> +
>  	cxl_deconfigure_adapter(adapter);
>
>  	return result;
> @@ -1977,6 +1985,9 @@ static void cxl_pci_resume(struct pci_dev *pdev)
>  	struct pci_dev *afu_dev;
>  	int i;
>
> +	/* Unlock context activation for the adapter */
> +	cxl_adapter_context_unlock(adapter);
> +
>  	/* Everything is back now. Drivers should restart work now.
>  	 * This is not the place to be checking if everything came back up
>  	 * properly, because there's no return value: do that in slot_reset.
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] cxl: Force context lock during EEH flow
@ 2017-04-05 10:10   ` Frederic Barrat
  0 siblings, 0 replies; 4+ messages in thread
From: Frederic Barrat @ 2017-04-05 10:10 UTC (permalink / raw)
  To: Vaibhav Jain, linuxppc-dev
  Cc: Andrew Donnellan, Ian Munsie, Christophe Lombard,
	Philippe Bergheaud, Greg Kurz, stable

Hi Vaibhav,

I like the simplified version better. However, I think it breaks 
cxlflash's recovery. With this patch, we are also unlocking the adapter 
later. It doesn't matter when the driver initializes the first time, but 
on EEH, we now call the "slot reset" callback of the virtual device 
while locked (we unlock on "resume"). And cxlflash re-attach's on "slot 
reset", see cxlflash_pci_slot_reset()->init_afu()->init_mc().
That was also true on the previous versions of the patch.

   Fred

Le 05/04/2017 à 08:12, Vaibhav Jain a écrit :
> During an eeh event when the cxl card is fenced and card sysfs attr
> perst_reloads_same_image is set following warning message is seen in the
> kernel logs:
>
>  [   60.622727] Adapter context unlocked with 0 active contexts
>  [   60.622762] ------------[ cut here ]------------
>  [   60.622771] WARNING: CPU: 12 PID: 627 at
>  ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]
>
> Even though this warning is harmless, it clutters the kernel log
> during an eeh event. This warning is triggered as the EEH callback
> cxl_pci_error_detected doesn't obtain a context-lock before forcibly
> detaching all active context and when context-lock is released during
> call to cxl_configure_adapter from cxl_pci_slot_reset, a warning in
> cxl_adapter_context_unlock is triggered.
>
> To fix this warning, we acquire the adapter context-lock via
> cxl_adapter_context_lock() in the eeh callback
> cxl_pci_error_detected() once all the virtual AFU PHBs are notified
> and their contexts detached. After the EEH flow concludes with
> call to cxl_pci_resume() the context-lock is released with a call to
> cxl_adapter_context_unlock() just before notifying all virtual AFU
> PHBs to resume so that they can re-activate any contexts if needed.
>
> Cc: stable@vger.kernel.org
> Fixes: 70b565bbdb91("cxl: Prevent adapter reset if an active context exists")
> Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
> ---
> Change-Log:
>
> v2..v3
> - As discussed with Fred removed function
> cxl_adapter_context_force_lock() which may potentially expose the code
> to deadlock in the future.
> -  Other details of changes in cxl_pci_error_detected() to fix an
> earlier issue of eeh callbacks not being passed on to all slices, is
> being reworked as a separate patch.
>
> v2..v1
> - Moved the call to cxl_adapter_context_force_lock() from
> cxl_pci_error_detected() to cxl_remove. (Fred)
> ---
>  drivers/misc/cxl/pci.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index b27ea98..1e98245 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -1496,8 +1496,6 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
>  	if ((rc = cxl_native_register_psl_err_irq(adapter)))
>  		goto err;
>
> -	/* Release the context lock as adapter is configured */
> -	cxl_adapter_context_unlock(adapter);
>  	return 0;
>
>  err:
> @@ -1596,6 +1594,9 @@ static struct cxl *cxl_pci_init_adapter(struct pci_dev *dev)
>  	if ((rc = cxl_sysfs_adapter_add(adapter)))
>  		goto err_put1;
>
> +	/* Release the context lock as adapter is configured */
> +	cxl_adapter_context_unlock(adapter);
> +
>  	return adapter;
>
>  err_put1:
> @@ -1895,6 +1896,13 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
>  		cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
>  		pci_deconfigure_afu(afu);
>  	}
> +
> +	/* should take the context lock here */
> +	if (cxl_adapter_context_lock(adapter) != 0)
> +		dev_warn(&adapter->dev,
> +			 "Couldn't take context lock with %d active-contexts\n",
> +			 atomic_read(&adapter->contexts_num));
> +
>  	cxl_deconfigure_adapter(adapter);
>
>  	return result;
> @@ -1977,6 +1985,9 @@ static void cxl_pci_resume(struct pci_dev *pdev)
>  	struct pci_dev *afu_dev;
>  	int i;
>
> +	/* Unlock context activation for the adapter */
> +	cxl_adapter_context_unlock(adapter);
> +
>  	/* Everything is back now. Drivers should restart work now.
>  	 * This is not the place to be checking if everything came back up
>  	 * properly, because there's no return value: do that in slot_reset.
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] cxl: Force context lock during EEH flow
  2017-04-05 10:10   ` Frederic Barrat
  (?)
@ 2017-04-06  8:41   ` Vaibhav Jain
  -1 siblings, 0 replies; 4+ messages in thread
From: Vaibhav Jain @ 2017-04-06  8:41 UTC (permalink / raw)
  To: Frederic Barrat, Uma Krishnan
  Cc: Philippe Bergheaud, stable, Ian Munsie, Andrew Donnellan,
	Christophe Lombard, Greg Kurz, linuxppc-dev


Thanks for reviewing the patch Fred. I have sent a v4 addressing your
comments regarding cxlflash module.

@Uma,
Can you please take a look at v4 of this patch(
https://patchwork.ozlabs.org/patch/747236/) and see if this change is ok
from cxlflash module prespetive.

Thanks,
-- 
Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Linux Technology Center, IBM India Pvt. Ltd.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-04-06  8:41 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-05  6:12 [PATCH v3] cxl: Force context lock during EEH flow Vaibhav Jain
2017-04-05 10:10 ` Frederic Barrat
2017-04-05 10:10   ` Frederic Barrat
2017-04-06  8:41   ` Vaibhav Jain

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.