linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
@ 2020-09-17 18:32 Russ Weight
  2020-09-17 20:28 ` Tom Rix
  2020-09-18  2:08 ` Wu, Hao
  0 siblings, 2 replies; 14+ messages in thread
From: Russ Weight @ 2020-09-17 18:32 UTC (permalink / raw)
  To: mdf, linux-fpga, linux-kernel
  Cc: trix, lgoncalv, yilun.xu, hao.wu, matthew.gerlach, Russ Weight

Port enable is not complete until ACK = 0. Change
__afu_port_enable() to guarantee that the enable process
is complete by polling for ACK == 0.

Signed-off-by: Russ Weight <russell.h.weight@intel.com>
---
 drivers/fpga/dfl-afu-error.c |  2 +-
 drivers/fpga/dfl-afu-main.c  | 29 +++++++++++++++++++++--------
 drivers/fpga/dfl-afu.h       |  2 +-
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
index c4691187cca9..0806532a3e9f 100644
--- a/drivers/fpga/dfl-afu-error.c
+++ b/drivers/fpga/dfl-afu-error.c
@@ -103,7 +103,7 @@ static int afu_port_err_clear(struct device *dev, u64 err)
 	__afu_port_err_mask(dev, false);
 
 	/* Enable the Port by clear the reset */
-	__afu_port_enable(pdev);
+	ret = __afu_port_enable(pdev);
 
 done:
 	mutex_unlock(&pdata->lock);
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 753cda4b2568..f73b06cdf13c 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -21,6 +21,9 @@
 
 #include "dfl-afu.h"
 
+#define RST_POLL_INVL 10 /* us */
+#define RST_POLL_TIMEOUT 1000 /* us */
+
 /**
  * __afu_port_enable - enable a port by clear reset
  * @pdev: port platform device.
@@ -32,7 +35,7 @@
  *
  * The caller needs to hold lock for protection.
  */
-void __afu_port_enable(struct platform_device *pdev)
+int __afu_port_enable(struct platform_device *pdev)
 {
 	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
 	void __iomem *base;
@@ -41,7 +44,7 @@ void __afu_port_enable(struct platform_device *pdev)
 	WARN_ON(!pdata->disable_count);
 
 	if (--pdata->disable_count != 0)
-		return;
+		return 0;
 
 	base = dfl_get_feature_ioaddr_by_id(&pdev->dev, PORT_FEATURE_ID_HEADER);
 
@@ -49,10 +52,20 @@ void __afu_port_enable(struct platform_device *pdev)
 	v = readq(base + PORT_HDR_CTRL);
 	v &= ~PORT_CTRL_SFTRST;
 	writeq(v, base + PORT_HDR_CTRL);
-}
 
-#define RST_POLL_INVL 10 /* us */
-#define RST_POLL_TIMEOUT 1000 /* us */
+	/*
+	 * HW clears the ack bit to indicate that the port is fully out
+	 * of reset.
+	 */
+	if (readq_poll_timeout(base + PORT_HDR_CTRL, v,
+			       !(v & PORT_CTRL_SFTRST_ACK),
+			       RST_POLL_INVL, RST_POLL_TIMEOUT)) {
+		dev_err(&pdev->dev, "timeout, failure to enable device\n");
+		return -ETIMEDOUT;
+	}
+
+	return 0;
+}
 
 /**
  * __afu_port_disable - disable a port by hold reset
@@ -111,7 +124,7 @@ static int __port_reset(struct platform_device *pdev)
 
 	ret = __afu_port_disable(pdev);
 	if (!ret)
-		__afu_port_enable(pdev);
+		ret = __afu_port_enable(pdev);
 
 	return ret;
 }
@@ -872,11 +885,11 @@ static int afu_dev_destroy(struct platform_device *pdev)
 static int port_enable_set(struct platform_device *pdev, bool enable)
 {
 	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
-	int ret = 0;
+	int ret;
 
 	mutex_lock(&pdata->lock);
 	if (enable)
-		__afu_port_enable(pdev);
+		ret = __afu_port_enable(pdev);
 	else
 		ret = __afu_port_disable(pdev);
 	mutex_unlock(&pdata->lock);
diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
index 576e94960086..e5020e2b1f3d 100644
--- a/drivers/fpga/dfl-afu.h
+++ b/drivers/fpga/dfl-afu.h
@@ -80,7 +80,7 @@ struct dfl_afu {
 };
 
 /* hold pdata->lock when call __afu_port_enable/disable */
-void __afu_port_enable(struct platform_device *pdev);
+int __afu_port_enable(struct platform_device *pdev);
 int __afu_port_disable(struct platform_device *pdev);
 
 void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
  2020-09-17 18:32 [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic Russ Weight
@ 2020-09-17 20:28 ` Tom Rix
  2020-09-17 21:38   ` Moritz Fischer
  2021-02-02 20:32   ` Russ Weight
  2020-09-18  2:08 ` Wu, Hao
  1 sibling, 2 replies; 14+ messages in thread
From: Tom Rix @ 2020-09-17 20:28 UTC (permalink / raw)
  To: Russ Weight, mdf, linux-fpga, linux-kernel
  Cc: lgoncalv, yilun.xu, hao.wu, matthew.gerlach


On 9/17/20 11:32 AM, Russ Weight wrote:
> Port enable is not complete until ACK = 0. Change
> __afu_port_enable() to guarantee that the enable process
> is complete by polling for ACK == 0.
>
> Signed-off-by: Russ Weight <russell.h.weight@intel.com>
> ---
>  drivers/fpga/dfl-afu-error.c |  2 +-
>  drivers/fpga/dfl-afu-main.c  | 29 +++++++++++++++++++++--------
>  drivers/fpga/dfl-afu.h       |  2 +-
>  3 files changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
> index c4691187cca9..0806532a3e9f 100644
> --- a/drivers/fpga/dfl-afu-error.c
> +++ b/drivers/fpga/dfl-afu-error.c
> @@ -103,7 +103,7 @@ static int afu_port_err_clear(struct device *dev, u64 err)
>  	__afu_port_err_mask(dev, false);
>  

There is an earlier bit that sets ret = -EINVAL.

This error will be lost or not handled well.

Right now it doesn't seem to be handled.

>  	/* Enable the Port by clear the reset */
> -	__afu_port_enable(pdev);
> +	ret = __afu_port_enable(pdev);
>  
>  done:
>  	mutex_unlock(&pdata->lock);
> diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> index 753cda4b2568..f73b06cdf13c 100644
> --- a/drivers/fpga/dfl-afu-main.c
> +++ b/drivers/fpga/dfl-afu-main.c
> @@ -21,6 +21,9 @@
>  
>  #include "dfl-afu.h"
>  
> +#define RST_POLL_INVL 10 /* us */
> +#define RST_POLL_TIMEOUT 1000 /* us */
> +
>  /**
>   * __afu_port_enable - enable a port by clear reset
>   * @pdev: port platform device.
> @@ -32,7 +35,7 @@
>   *
>   * The caller needs to hold lock for protection.
>   */
> -void __afu_port_enable(struct platform_device *pdev)
> +int __afu_port_enable(struct platform_device *pdev)
>  {
>  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
>  	void __iomem *base;
> @@ -41,7 +44,7 @@ void __afu_port_enable(struct platform_device *pdev)
>  	WARN_ON(!pdata->disable_count);
>  
>  	if (--pdata->disable_count != 0)
> -		return;
> +		return 0;
Is this really a success ? Maybe -EBUSY ?
>  
>  	base = dfl_get_feature_ioaddr_by_id(&pdev->dev, PORT_FEATURE_ID_HEADER);
>  
> @@ -49,10 +52,20 @@ void __afu_port_enable(struct platform_device *pdev)
>  	v = readq(base + PORT_HDR_CTRL);
>  	v &= ~PORT_CTRL_SFTRST;
>  	writeq(v, base + PORT_HDR_CTRL);
> -}
>  
> -#define RST_POLL_INVL 10 /* us */
> -#define RST_POLL_TIMEOUT 1000 /* us */
> +	/*
> +	 * HW clears the ack bit to indicate that the port is fully out
> +	 * of reset.
> +	 */
> +	if (readq_poll_timeout(base + PORT_HDR_CTRL, v,
> +			       !(v & PORT_CTRL_SFTRST_ACK),
> +			       RST_POLL_INVL, RST_POLL_TIMEOUT)) {
> +		dev_err(&pdev->dev, "timeout, failure to enable device\n");
> +		return -ETIMEDOUT;
> +	}
> +
> +	return 0;
> +}
>  
>  /**
>   * __afu_port_disable - disable a port by hold reset
> @@ -111,7 +124,7 @@ static int __port_reset(struct platform_device *pdev)
>  
>  	ret = __afu_port_disable(pdev);
>  	if (!ret)
> -		__afu_port_enable(pdev);
> +		ret = __afu_port_enable(pdev);
>  
>  	return ret;
>  }
> @@ -872,11 +885,11 @@ static int afu_dev_destroy(struct platform_device *pdev)
>  static int port_enable_set(struct platform_device *pdev, bool enable)
>  {
>  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
> -	int ret = 0;
> +	int ret;
>  
>  	mutex_lock(&pdata->lock);
>  	if (enable)
> -		__afu_port_enable(pdev);
> +		ret = __afu_port_enable(pdev);
>  	else
>  		ret = __afu_port_disable(pdev);
>  	mutex_unlock(&pdata->lock);
> diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
> index 576e94960086..e5020e2b1f3d 100644
> --- a/drivers/fpga/dfl-afu.h
> +++ b/drivers/fpga/dfl-afu.h
> @@ -80,7 +80,7 @@ struct dfl_afu {
>  };
>  
>  /* hold pdata->lock when call __afu_port_enable/disable */
> -void __afu_port_enable(struct platform_device *pdev);
> +int __afu_port_enable(struct platform_device *pdev);
>  int __afu_port_disable(struct platform_device *pdev);

The other functions in this file are named afu_*. Since __afu_port_enable/disable
are used in other places, would it make sense to remove the '__' prefix?

If you think so, maybe a cleanup patch later.

Tom

>  
>  void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);



* Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
  2020-09-17 20:28 ` Tom Rix
@ 2020-09-17 21:38   ` Moritz Fischer
  2020-09-18  1:23     ` Xu Yilun
                       ` (2 more replies)
  2021-02-02 20:32   ` Russ Weight
  1 sibling, 3 replies; 14+ messages in thread
From: Moritz Fischer @ 2020-09-17 21:38 UTC (permalink / raw)
  To: Tom Rix
  Cc: Russ Weight, mdf, linux-fpga, linux-kernel, lgoncalv, yilun.xu,
	hao.wu, matthew.gerlach

On Thu, Sep 17, 2020 at 01:28:22PM -0700, Tom Rix wrote:
> 
> On 9/17/20 11:32 AM, Russ Weight wrote:
> > Port enable is not complete until ACK = 0. Change
> > __afu_port_enable() to guarantee that the enable process
> > is complete by polling for ACK == 0.
> >
> > Signed-off-by: Russ Weight <russell.h.weight@intel.com>
General note: Please keep a changelog if you send updated versions of a
patch. This can be added here, between the Signed-off-by line and the
diffstat, as an extra '---' line followed by the text:

---
Changes from v1:
- Foo
- Bar
> > ---
> >  drivers/fpga/dfl-afu-error.c |  2 +-
> >  drivers/fpga/dfl-afu-main.c  | 29 +++++++++++++++++++++--------
> >  drivers/fpga/dfl-afu.h       |  2 +-
> >  3 files changed, 23 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
> > index c4691187cca9..0806532a3e9f 100644
> > --- a/drivers/fpga/dfl-afu-error.c
> > +++ b/drivers/fpga/dfl-afu-error.c
> > @@ -103,7 +103,7 @@ static int afu_port_err_clear(struct device *dev, u64 err)
> >  	__afu_port_err_mask(dev, false);
> >  
> 
> There is an earlier bit that sets ret = -EINVAL.
> 
> This error will be lost or not handled well.
> 
> Right now it doesn't seem to be handled.

Ultimately you'd want to report *at least* one of them; the current code
seems to continue and enable the port in either case. Is that what it
should be doing?

Is the timeout more severe than the invalid value? Do you want to print
a warning?

Either way a comment explaining why this is ok would be appreciated :)
> 
> >  	/* Enable the Port by clear the reset */
> > -	__afu_port_enable(pdev);
> > +	ret = __afu_port_enable(pdev);
> >  
> >  done:
> >  	mutex_unlock(&pdata->lock);
> > diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> > index 753cda4b2568..f73b06cdf13c 100644
> > --- a/drivers/fpga/dfl-afu-main.c
> > +++ b/drivers/fpga/dfl-afu-main.c
> > @@ -21,6 +21,9 @@
> >  
> >  #include "dfl-afu.h"
> >  
> > +#define RST_POLL_INVL 10 /* us */
> > +#define RST_POLL_TIMEOUT 1000 /* us */
> > +
> >  /**
> >   * __afu_port_enable - enable a port by clear reset
> >   * @pdev: port platform device.
> > @@ -32,7 +35,7 @@
> >   *
> >   * The caller needs to hold lock for protection.
> >   */
> > -void __afu_port_enable(struct platform_device *pdev)
> > +int __afu_port_enable(struct platform_device *pdev)
> >  {
> >  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
> >  	void __iomem *base;
> > @@ -41,7 +44,7 @@ void __afu_port_enable(struct platform_device *pdev)
> >  	WARN_ON(!pdata->disable_count);
> >  
> >  	if (--pdata->disable_count != 0)
> > -		return;
> > +		return 0;
> Is this really a success ? Maybe -EBUSY ?
Seems like if it's severe enough for a warning you'd probably want to
return an error.
> >  
> >  	base = dfl_get_feature_ioaddr_by_id(&pdev->dev, PORT_FEATURE_ID_HEADER);
> >  
> > @@ -49,10 +52,20 @@ void __afu_port_enable(struct platform_device *pdev)
> >  	v = readq(base + PORT_HDR_CTRL);
> >  	v &= ~PORT_CTRL_SFTRST;
> >  	writeq(v, base + PORT_HDR_CTRL);
> > -}
> >  
> > -#define RST_POLL_INVL 10 /* us */
> > -#define RST_POLL_TIMEOUT 1000 /* us */
> > +	/*
> > +	 * HW clears the ack bit to indicate that the port is fully out
> > +	 * of reset.
> > +	 */
> > +	if (readq_poll_timeout(base + PORT_HDR_CTRL, v,
> > +			       !(v & PORT_CTRL_SFTRST_ACK),
> > +			       RST_POLL_INVL, RST_POLL_TIMEOUT)) {
> > +		dev_err(&pdev->dev, "timeout, failure to enable device\n");
> > +		return -ETIMEDOUT;
> > +	}
> > +
> > +	return 0;
> > +}
> >  
> >  /**
> >   * __afu_port_disable - disable a port by hold reset
> > @@ -111,7 +124,7 @@ static int __port_reset(struct platform_device *pdev)
> >  
> >  	ret = __afu_port_disable(pdev);
> >  	if (!ret)
> > -		__afu_port_enable(pdev);
> > +		ret = __afu_port_enable(pdev);
> >  
> >  	return ret;
> >  }
> > @@ -872,11 +885,11 @@ static int afu_dev_destroy(struct platform_device *pdev)
> >  static int port_enable_set(struct platform_device *pdev, bool enable)
> >  {
> >  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
> > -	int ret = 0;
> > +	int ret;
> >  
> >  	mutex_lock(&pdata->lock);
> >  	if (enable)
> > -		__afu_port_enable(pdev);
> > +		ret = __afu_port_enable(pdev);
> >  	else
> >  		ret = __afu_port_disable(pdev);
> >  	mutex_unlock(&pdata->lock);
> > diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
> > index 576e94960086..e5020e2b1f3d 100644
> > --- a/drivers/fpga/dfl-afu.h
> > +++ b/drivers/fpga/dfl-afu.h
> > @@ -80,7 +80,7 @@ struct dfl_afu {
> >  };
> >  
> >  /* hold pdata->lock when call __afu_port_enable/disable */
> > -void __afu_port_enable(struct platform_device *pdev);
> > +int __afu_port_enable(struct platform_device *pdev);
> >  int __afu_port_disable(struct platform_device *pdev);
> 
> The other functions in this file have afu_*  since the __afu_port_enable/disable
> 
> are used other places would it make sense to remove the '__' prefix ?

The idea behind those is to indicate that the caller needs to be cautious
(often a lock / mutex is required). I think keeping them as is is fine.

> 
> If you think so, maybe a cleanup patch later.
> 
> Tom
> 
> >  
> >  void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);
> 

Thanks,
Moritz


* Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
  2020-09-17 21:38   ` Moritz Fischer
@ 2020-09-18  1:23     ` Xu Yilun
  2020-09-18  2:00     ` Wu, Hao
  2021-02-02 20:44     ` Russ Weight
  2 siblings, 0 replies; 14+ messages in thread
From: Xu Yilun @ 2020-09-18  1:23 UTC (permalink / raw)
  To: Moritz Fischer
  Cc: Tom Rix, Russ Weight, linux-fpga, linux-kernel, lgoncalv, hao.wu,
	matthew.gerlach, yilun.xu

> > >  /**
> > >   * __afu_port_enable - enable a port by clear reset
> > >   * @pdev: port platform device.
> > > @@ -32,7 +35,7 @@
> > >   *
> > >   * The caller needs to hold lock for protection.
> > >   */
> > > -void __afu_port_enable(struct platform_device *pdev)
> > > +int __afu_port_enable(struct platform_device *pdev)
> > >  {
> > >  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
> > >  	void __iomem *base;
> > > @@ -41,7 +44,7 @@ void __afu_port_enable(struct platform_device *pdev)
> > >  	WARN_ON(!pdata->disable_count);
> > >  
> > >  	if (--pdata->disable_count != 0)
> > > -		return;
> > > +		return 0;
> > Is this really a success ? Maybe -EBUSY ?
> Seems like if it's severe enough for a warning you'd probably want to
> return an error.

This code handles port enable/disable requests from multiple
users. It is a voting mechanism: the port is not physically
enabled while there is still a disable vote. The --disable_count != 0 check
works for this purpose. So I think it should be OK here, since the voting
mechanism is working as expected.

Thanks,
Yilun


* RE: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
  2020-09-17 21:38   ` Moritz Fischer
  2020-09-18  1:23     ` Xu Yilun
@ 2020-09-18  2:00     ` Wu, Hao
  2021-02-02 20:44     ` Russ Weight
  2 siblings, 0 replies; 14+ messages in thread
From: Wu, Hao @ 2020-09-18  2:00 UTC (permalink / raw)
  To: Moritz Fischer, Tom Rix
  Cc: Weight, Russell H, linux-fpga, linux-kernel, lgoncalv, Xu, Yilun,
	Gerlach, Matthew

> Subject: Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
> 
> On Thu, Sep 17, 2020 at 01:28:22PM -0700, Tom Rix wrote:
> >
> > On 9/17/20 11:32 AM, Russ Weight wrote:
> > > Port enable is not complete until ACK = 0. Change
> > > __afu_port_enable() to guarantee that the enable process
> > > is complete by polling for ACK == 0.
> > >
> > > Signed-off-by: Russ Weight <russell.h.weight@intel.com>
> General note: Please keep a changelog if you send updated versions of a
> patch. This can be added here with an extra '---' + Text between Signed-off and
> diffstat:
> 
> ---
> Changes from v1:
> - FOo
> - Bar
> > > ---
> > >  drivers/fpga/dfl-afu-error.c |  2 +-
> > >  drivers/fpga/dfl-afu-main.c  | 29 +++++++++++++++++++++--------
> > >  drivers/fpga/dfl-afu.h       |  2 +-
> > >  3 files changed, 23 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
> > > index c4691187cca9..0806532a3e9f 100644
> > > --- a/drivers/fpga/dfl-afu-error.c
> > > +++ b/drivers/fpga/dfl-afu-error.c
> > > @@ -103,7 +103,7 @@ static int afu_port_err_clear(struct device *dev, u64 err)
> > >  	__afu_port_err_mask(dev, false);
> > >
> >
> > There is an earlier bit that sets ret = -EINVAL.
> >
> > This error will be lost or not handled well.
> >
> > Right now it doesn't seem to be handled.
> 
> Ultimately you'd want to report *at least* one of them, the current code
> seems to continue and enable the port either case. Is that what it
> should be doing?

In order to clear errors, we have to put the port into reset first, and then
take the port out of reset after error clearing is done. If we see a failure
during error clearing, we still want to at least get the port back to work.
As we know, if the port is still in reset, the accelerator connected to it
won't work.

> 
> Is the timeout more severe than the invalid value? Do you want to print
> a warning?

Yes. It's a very bad case if the port cannot be enabled any more (the
accelerator may no longer be accessible); the hardware should already be in an
error state, so it's better to have some warning messages here.

> 
> Either way a comment explaining why this is ok would be appreciated :)
> >
> > >  	/* Enable the Port by clear the reset */
> > > -	__afu_port_enable(pdev);
> > > +	ret = __afu_port_enable(pdev);
> > >
> > >  done:
> > >  	mutex_unlock(&pdata->lock);
> > > diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> > > index 753cda4b2568..f73b06cdf13c 100644
> > > --- a/drivers/fpga/dfl-afu-main.c
> > > +++ b/drivers/fpga/dfl-afu-main.c
> > > @@ -21,6 +21,9 @@
> > >
> > >  #include "dfl-afu.h"
> > >
> > > +#define RST_POLL_INVL 10 /* us */
> > > +#define RST_POLL_TIMEOUT 1000 /* us */
> > > +
> > >  /**
> > >   * __afu_port_enable - enable a port by clear reset
> > >   * @pdev: port platform device.
> > > @@ -32,7 +35,7 @@
> > >   *
> > >   * The caller needs to hold lock for protection.
> > >   */
> > > -void __afu_port_enable(struct platform_device *pdev)
> > > +int __afu_port_enable(struct platform_device *pdev)
> > >  {
> > >  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
> > >  	void __iomem *base;
> > > @@ -41,7 +44,7 @@ void __afu_port_enable(struct platform_device *pdev)
> > >  	WARN_ON(!pdata->disable_count);
> > >
> > >  	if (--pdata->disable_count != 0)
> > > -		return;
> > > +		return 0;
> > Is this really a success ? Maybe -EBUSY ?
> Seems like if it's severe enough for a warning you'd probably want to
> return an error.

As Yilun mentioned, this is just a reference count operation, so we don't
need to return an error code.

> > >
> > >  	base = dfl_get_feature_ioaddr_by_id(&pdev->dev, PORT_FEATURE_ID_HEADER);
> > >
> > > @@ -49,10 +52,20 @@ void __afu_port_enable(struct platform_device *pdev)
> > >  	v = readq(base + PORT_HDR_CTRL);
> > >  	v &= ~PORT_CTRL_SFTRST;
> > >  	writeq(v, base + PORT_HDR_CTRL);
> > > -}
> > >
> > > -#define RST_POLL_INVL 10 /* us */
> > > -#define RST_POLL_TIMEOUT 1000 /* us */
> > > +	/*
> > > +	 * HW clears the ack bit to indicate that the port is fully out
> > > +	 * of reset.
> > > +	 */
> > > +	if (readq_poll_timeout(base + PORT_HDR_CTRL, v,
> > > +			       !(v & PORT_CTRL_SFTRST_ACK),
> > > +			       RST_POLL_INVL, RST_POLL_TIMEOUT)) {
> > > +		dev_err(&pdev->dev, "timeout, failure to enable device\n");
> > > +		return -ETIMEDOUT;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > >
> > >  /**
> > >   * __afu_port_disable - disable a port by hold reset
> > > @@ -111,7 +124,7 @@ static int __port_reset(struct platform_device *pdev)
> > >
> > >  	ret = __afu_port_disable(pdev);
> > >  	if (!ret)
> > > -		__afu_port_enable(pdev);
> > > +		ret = __afu_port_enable(pdev);
> > >
> > >  	return ret;
> > >  }
> > > @@ -872,11 +885,11 @@ static int afu_dev_destroy(struct platform_device *pdev)
> > >  static int port_enable_set(struct platform_device *pdev, bool enable)
> > >  {
> > >  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
> > > -	int ret = 0;
> > > +	int ret;
> > >
> > >  	mutex_lock(&pdata->lock);
> > >  	if (enable)
> > > -		__afu_port_enable(pdev);
> > > +		ret = __afu_port_enable(pdev);
> > >  	else
> > >  		ret = __afu_port_disable(pdev);
> > >  	mutex_unlock(&pdata->lock);
> > > diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
> > > index 576e94960086..e5020e2b1f3d 100644
> > > --- a/drivers/fpga/dfl-afu.h
> > > +++ b/drivers/fpga/dfl-afu.h
> > > @@ -80,7 +80,7 @@ struct dfl_afu {
> > >  };
> > >
> > >  /* hold pdata->lock when call __afu_port_enable/disable */
> > > -void __afu_port_enable(struct platform_device *pdev);
> > > +int __afu_port_enable(struct platform_device *pdev);
> > >  int __afu_port_disable(struct platform_device *pdev);
> >
> > The other functions in this file have afu_*  since the __afu_port_enable/disable
> >
> > are used other places would it make sense to remove the '__' prefix ?
> 
> The idea on those is to indicate that the caller need to be cautious
> (often a lock / mutex) is required. I think keeping them as is is fine.

Yes. That's why we added the prefix to these functions.

Thanks
Hao

> 
> >
> > If you think so, maybe a cleanup patch later.
> >
> > Tom
> >
> > >
> > >  void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);
> >
> 
> Thanks,
> Moritz


* RE: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
  2020-09-17 18:32 [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic Russ Weight
  2020-09-17 20:28 ` Tom Rix
@ 2020-09-18  2:08 ` Wu, Hao
  2021-02-02 20:16   ` Russ Weight
  1 sibling, 1 reply; 14+ messages in thread
From: Wu, Hao @ 2020-09-18  2:08 UTC (permalink / raw)
  To: Weight, Russell H, mdf, linux-fpga, linux-kernel
  Cc: trix, lgoncalv, Xu, Yilun, Gerlach, Matthew, Weight, Russell H

> -----Original Message-----
> From: Russ Weight <russell.h.weight@intel.com>
> Sent: Friday, September 18, 2020 2:32 AM
> To: mdf@kernel.org; linux-fpga@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: trix@redhat.com; lgoncalv@redhat.com; Xu, Yilun <yilun.xu@intel.com>;
> Wu, Hao <hao.wu@intel.com>; Gerlach, Matthew
> <matthew.gerlach@intel.com>; Weight, Russell H
> <russell.h.weight@intel.com>
> Subject: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
> 
> Port enable is not complete until ACK = 0. Change
> __afu_port_enable() to guarantee that the enable process
> is complete by polling for ACK == 0.

The description of this port reset ack bit is

" After initiating a Port soft reset, SW should monitor this bit. HW 
will set this bit when all outstanding requests initiated by this port
have been drained, and the minimum soft reset pulse width has 
elapsed. "

But there is no description of what to do when clearing a Port soft reset
to enable the port.

So we need to clearly understand why we need this change
(e.g. what may happen without this change), and whether it applies to all
existing and future DFL devices, or just to one specific card.
Could you please help? : )

> 
> Signed-off-by: Russ Weight <russell.h.weight@intel.com>
> ---
>  drivers/fpga/dfl-afu-error.c |  2 +-
>  drivers/fpga/dfl-afu-main.c  | 29 +++++++++++++++++++++--------
>  drivers/fpga/dfl-afu.h       |  2 +-
>  3 files changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
> index c4691187cca9..0806532a3e9f 100644
> --- a/drivers/fpga/dfl-afu-error.c
> +++ b/drivers/fpga/dfl-afu-error.c
> @@ -103,7 +103,7 @@ static int afu_port_err_clear(struct device *dev, u64 err)
>  	__afu_port_err_mask(dev, false);
> 
>  	/* Enable the Port by clear the reset */
> -	__afu_port_enable(pdev);
> +	ret = __afu_port_enable(pdev);
> 
>  done:
>  	mutex_unlock(&pdata->lock);
> diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> index 753cda4b2568..f73b06cdf13c 100644
> --- a/drivers/fpga/dfl-afu-main.c
> +++ b/drivers/fpga/dfl-afu-main.c
> @@ -21,6 +21,9 @@
> 
>  #include "dfl-afu.h"
> 
> +#define RST_POLL_INVL 10 /* us */
> +#define RST_POLL_TIMEOUT 1000 /* us */
> +
>  /**
>   * __afu_port_enable - enable a port by clear reset
>   * @pdev: port platform device.
> @@ -32,7 +35,7 @@
>   *
>   * The caller needs to hold lock for protection.
>   */
> -void __afu_port_enable(struct platform_device *pdev)
> +int __afu_port_enable(struct platform_device *pdev)
>  {
>  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
>  	void __iomem *base;
> @@ -41,7 +44,7 @@ void __afu_port_enable(struct platform_device *pdev)
>  	WARN_ON(!pdata->disable_count);
> 
>  	if (--pdata->disable_count != 0)
> -		return;
> +		return 0;
> 
>  	base = dfl_get_feature_ioaddr_by_id(&pdev->dev, PORT_FEATURE_ID_HEADER);
> 
> @@ -49,10 +52,20 @@ void __afu_port_enable(struct platform_device *pdev)
>  	v = readq(base + PORT_HDR_CTRL);
>  	v &= ~PORT_CTRL_SFTRST;
>  	writeq(v, base + PORT_HDR_CTRL);
> -}
> 
> -#define RST_POLL_INVL 10 /* us */
> -#define RST_POLL_TIMEOUT 1000 /* us */
> +	/*
> +	 * HW clears the ack bit to indicate that the port is fully out
> +	 * of reset.
> +	 */
> +	if (readq_poll_timeout(base + PORT_HDR_CTRL, v,
> +			       !(v & PORT_CTRL_SFTRST_ACK),
> +			       RST_POLL_INVL, RST_POLL_TIMEOUT)) {
> +		dev_err(&pdev->dev, "timeout, failure to enable device\n");
> +		return -ETIMEDOUT;
> +	}
> +
> +	return 0;
> +}
> 
>  /**
>   * __afu_port_disable - disable a port by hold reset
> @@ -111,7 +124,7 @@ static int __port_reset(struct platform_device *pdev)
> 
>  	ret = __afu_port_disable(pdev);
>  	if (!ret)
> -		__afu_port_enable(pdev);
> +		ret = __afu_port_enable(pdev);
> 
>  	return ret;

What about:

	ret = __afu_port_disable(pdev);
	if (ret)
		return ret;

	return __afu_port_enable(pdev);

Thanks
Hao

>  }
> @@ -872,11 +885,11 @@ static int afu_dev_destroy(struct platform_device *pdev)
>  static int port_enable_set(struct platform_device *pdev, bool enable)
>  {
>  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
> -	int ret = 0;
> +	int ret;
> 
>  	mutex_lock(&pdata->lock);
>  	if (enable)
> -		__afu_port_enable(pdev);
> +		ret = __afu_port_enable(pdev);
>  	else
>  		ret = __afu_port_disable(pdev);
>  	mutex_unlock(&pdata->lock);
> diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
> index 576e94960086..e5020e2b1f3d 100644
> --- a/drivers/fpga/dfl-afu.h
> +++ b/drivers/fpga/dfl-afu.h
> @@ -80,7 +80,7 @@ struct dfl_afu {
>  };
> 
>  /* hold pdata->lock when call __afu_port_enable/disable */
> -void __afu_port_enable(struct platform_device *pdev);
> +int __afu_port_enable(struct platform_device *pdev);
>  int __afu_port_disable(struct platform_device *pdev);
> 
>  void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);
> --
> 2.17.1



* Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
  2020-09-18  2:08 ` Wu, Hao
@ 2021-02-02 20:16   ` Russ Weight
  2021-02-03  9:28     ` Wu, Hao
  0 siblings, 1 reply; 14+ messages in thread
From: Russ Weight @ 2021-02-02 20:16 UTC (permalink / raw)
  To: Wu, Hao, mdf, linux-fpga, linux-kernel
  Cc: trix, lgoncalv, Xu, Yilun, Gerlach, Matthew

Sorry for the delay on this patch. It seemed like a lower priority patch than
others, since we haven't seen any issues with current products. Please see my
responses inline.

On 9/17/20 7:08 PM, Wu, Hao wrote:
>> -----Original Message-----
>> From: Russ Weight <russell.h.weight@intel.com>
>> Sent: Friday, September 18, 2020 2:32 AM
>> To: mdf@kernel.org; linux-fpga@vger.kernel.org; linux-kernel@vger.kernel.org
>> Cc: trix@redhat.com; lgoncalv@redhat.com; Xu, Yilun <yilun.xu@intel.com>;
>> Wu, Hao <hao.wu@intel.com>; Gerlach, Matthew
>> <matthew.gerlach@intel.com>; Weight, Russell H
>> <russell.h.weight@intel.com>
>> Subject: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
>>
>> Port enable is not complete until ACK = 0. Change
>> __afu_port_enable() to guarantee that the enable process
>> is complete by polling for ACK == 0.
> The description of this port reset ack bit is
>
> " After initiating a Port soft reset, SW should monitor this bit. HW
> will set this bit when all outstanding requests initiated by this port
> have been drained, and the minimum soft reset pulse width has
> elapsed. "
>
> But no description about what to do when clearing a Port soft reset
> to enable the port.
>
> So we need to understand clearly on why we need this change
> (e.g. what may happen without this change), and will it apply for all
> existing DFL devices and future ones, or just for one specific card.
> Could you please help? : )
I touched base with the hardware engineers. The recommendation to wait
for ACK to be cleared is new with OFS and is documented in the latest
OFS specification as follows (see step #4):

> 3.7.1 AFU Soft Resets
> Software may cause a soft reset to be issued to the AFU as follows:
> 1. Assert the PortSoftReset field of the PORT_CONTROL register
> 2. Wait for the Port to acknowledge the soft reset by monitoring the
> PortSoftResetAck field of the PORT_CONTROL register, i.e. PortSoftResetAck=1
> 3. Deasserting the PortSoftReset field
> 4. Wait for the Port to acknowledge the soft reset de-assertion by monitoring the
> PortSoftResetAck field of the PORT_CONTROL register, i.e. PortSoftResetAck=0
>
> This sequence ensures that outstanding transactions are suitably flushed and
> that the FIM minimum reset pulse width is respected. Failing to follow this 
> sequence leaves the AFU in an undefined state.

The OFS specification has not been posted publicly, yet.

Also, this is how it was explained to me:

> In most scenarios, the port will be able to get out of reset soon enough
> when SW releases the port reset, especially on all the PAC products,
> which have been verified before release.
>
> Polling for HW to clear the ACK is meant to handle the following scenarios:
>
>   * Different platforms can take a variable period of time to get out of reset
>   * A bug in the HW that holds the port in reset

So this change is not required for the currently released PAC cards,
but it is needed for OFS based products. I don't think there is any reason
to hold off on the patch, as it is still valid for current products.

>> Signed-off-by: Russ Weight <russell.h.weight@intel.com>
>> ---
>>  drivers/fpga/dfl-afu-error.c |  2 +-
>>  drivers/fpga/dfl-afu-main.c  | 29 +++++++++++++++++++++--------
>>  drivers/fpga/dfl-afu.h       |  2 +-
>>  3 files changed, 23 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
>> index c4691187cca9..0806532a3e9f 100644
>> --- a/drivers/fpga/dfl-afu-error.c
>> +++ b/drivers/fpga/dfl-afu-error.c
>> @@ -103,7 +103,7 @@ static int afu_port_err_clear(struct device *dev, u64 err)
>>  __afu_port_err_mask(dev, false);
>>
>>  /* Enable the Port by clear the reset */
>> -__afu_port_enable(pdev);
>> +ret = __afu_port_enable(pdev);
>>
>>  done:
>>  mutex_unlock(&pdata->lock);
>> diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
>> index 753cda4b2568..f73b06cdf13c 100644
>> --- a/drivers/fpga/dfl-afu-main.c
>> +++ b/drivers/fpga/dfl-afu-main.c
>> @@ -21,6 +21,9 @@
>>
>>  #include "dfl-afu.h"
>>
>> +#define RST_POLL_INVL 10 /* us */
>> +#define RST_POLL_TIMEOUT 1000 /* us */
>> +
>>  /**
>>   * __afu_port_enable - enable a port by clear reset
>>   * @pdev: port platform device.
>> @@ -32,7 +35,7 @@
>>   *
>>   * The caller needs to hold lock for protection.
>>   */
>> -void __afu_port_enable(struct platform_device *pdev)
>> +int __afu_port_enable(struct platform_device *pdev)
>>  {
>>  struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
>>  void __iomem *base;
>> @@ -41,7 +44,7 @@ void __afu_port_enable(struct platform_device *pdev)
>>  WARN_ON(!pdata->disable_count);
>>
>>  if (--pdata->disable_count != 0)
>> -return;
>> +return 0;
>>
>>  base = dfl_get_feature_ioaddr_by_id(&pdev->dev,
>> PORT_FEATURE_ID_HEADER);
>>
>> @@ -49,10 +52,20 @@ void __afu_port_enable(struct platform_device *pdev)
>>  v = readq(base + PORT_HDR_CTRL);
>>  v &= ~PORT_CTRL_SFTRST;
>>  writeq(v, base + PORT_HDR_CTRL);
>> -}
>>
>> -#define RST_POLL_INVL 10 /* us */
>> -#define RST_POLL_TIMEOUT 1000 /* us */
>> +/*
>> + * HW clears the ack bit to indicate that the port is fully out
>> + * of reset.
>> + */
>> +if (readq_poll_timeout(base + PORT_HDR_CTRL, v,
>> +       !(v & PORT_CTRL_SFTRST_ACK),
>> +       RST_POLL_INVL, RST_POLL_TIMEOUT)) {
>> +dev_err(&pdev->dev, "timeout, failure to enable device\n");
>> +return -ETIMEDOUT;
>> +}
>> +
>> +return 0;
>> +}
>>
>>  /**
>>   * __afu_port_disable - disable a port by hold reset
>> @@ -111,7 +124,7 @@ static int __port_reset(struct platform_device *pdev)
>>
>>  ret = __afu_port_disable(pdev);
>>  if (!ret)
>> -__afu_port_enable(pdev);
>> +ret = __afu_port_enable(pdev);
>>
>>  return ret;
> What about:
>
> ret = __afu_port_disable(pdev);
> if (ret)
> return ret;
>
> return __afu_port_enable(pdev);
Sure - I'll make this change.
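A minimal sketch of the early-return shape suggested above. The stand-ins `struct fake_port`, `fake_port_disable()` and `fake_port_enable()` are hypothetical replacements for __afu_port_disable()/__afu_port_enable(), so the control flow can be compiled outside the kernel:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins: each call returns the configured result. */
struct fake_port {
	int disable_ret;
	int enable_ret;
	int enable_calls;	/* how often enable was attempted */
};

static int fake_port_disable(struct fake_port *p)
{
	return p->disable_ret;
}

static int fake_port_enable(struct fake_port *p)
{
	p->enable_calls++;
	return p->enable_ret;
}

/* Early-return form: stop if disable fails, otherwise propagate
 * whatever enable returns (e.g. -ETIMEDOUT from the ACK poll). */
static int port_reset(struct fake_port *p)
{
	int ret;

	ret = fake_port_disable(p);
	if (ret)
		return ret;

	return fake_port_enable(p);
}
```

The behavior matches `if (!ret) ret = __afu_port_enable(pdev);`; the early return just makes it explicit that enable is skipped when disable fails.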

Thanks,
- Russ
>
> Thanks
> Hao
>
>>  }
>> @@ -872,11 +885,11 @@ static int afu_dev_destroy(struct platform_device *pdev)
>>  static int port_enable_set(struct platform_device *pdev, bool enable)
>>  {
>>  struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
>> -int ret = 0;
>> +int ret;
>>
>>  mutex_lock(&pdata->lock);
>>  if (enable)
>> -__afu_port_enable(pdev);
>> +ret = __afu_port_enable(pdev);
>>  else
>>  ret = __afu_port_disable(pdev);
>>  mutex_unlock(&pdata->lock);
>> diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
>> index 576e94960086..e5020e2b1f3d 100644
>> --- a/drivers/fpga/dfl-afu.h
>> +++ b/drivers/fpga/dfl-afu.h
>> @@ -80,7 +80,7 @@ struct dfl_afu {
>>  };
>>
>>  /* hold pdata->lock when call __afu_port_enable/disable */
>> -void __afu_port_enable(struct platform_device *pdev);
>> +int __afu_port_enable(struct platform_device *pdev);
>>  int __afu_port_disable(struct platform_device *pdev);
>>
>>  void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);
>> --
>> 2.17.1



* Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
  2020-09-17 20:28 ` Tom Rix
  2020-09-17 21:38   ` Moritz Fischer
@ 2021-02-02 20:32   ` Russ Weight
  2021-02-02 20:38     ` Russ Weight
  1 sibling, 1 reply; 14+ messages in thread
From: Russ Weight @ 2021-02-02 20:32 UTC (permalink / raw)
  To: Tom Rix, mdf, linux-fpga, linux-kernel
  Cc: lgoncalv, yilun.xu, hao.wu, matthew.gerlach



On 9/17/20 1:28 PM, Tom Rix wrote:
> On 9/17/20 11:32 AM, Russ Weight wrote:
>> Port enable is not complete until ACK = 0. Change
>> __afu_port_enable() to guarantee that the enable process
>> is complete by polling for ACK == 0.
>>
>> Signed-off-by: Russ Weight <russell.h.weight@intel.com>
>> ---
>>  drivers/fpga/dfl-afu-error.c |  2 +-
>>  drivers/fpga/dfl-afu-main.c  | 29 +++++++++++++++++++++--------
>>  drivers/fpga/dfl-afu.h       |  2 +-
>>  3 files changed, 23 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
>> index c4691187cca9..0806532a3e9f 100644
>> --- a/drivers/fpga/dfl-afu-error.c
>> +++ b/drivers/fpga/dfl-afu-error.c
>> @@ -103,7 +103,7 @@ static int afu_port_err_clear(struct device *dev, u64 err)
>>  	__afu_port_err_mask(dev, false);
>>  
> There is an earlier bit that sets ret = -EINVAL.
>
> This error will be lost or not handled well.
>
> Right now it doesn't seem to be handled.
Good catch. I'll give priority to -EINVAL in the next version of the
patch, as it is more informative in the context of this function.
>
>>  	/* Enable the Port by clear the reset */
>> -	__afu_port_enable(pdev);
>> +	ret = __afu_port_enable(pdev);
>>  
>>  done:
>>  	mutex_unlock(&pdata->lock);
>> diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
>> index 753cda4b2568..f73b06cdf13c 100644
>> --- a/drivers/fpga/dfl-afu-main.c
>> +++ b/drivers/fpga/dfl-afu-main.c
>> @@ -21,6 +21,9 @@
>>  
>>  #include "dfl-afu.h"
>>  
>> +#define RST_POLL_INVL 10 /* us */
>> +#define RST_POLL_TIMEOUT 1000 /* us */
>> +
>>  /**
>>   * __afu_port_enable - enable a port by clear reset
>>   * @pdev: port platform device.
>> @@ -32,7 +35,7 @@
>>   *
>>   * The caller needs to hold lock for protection.
>>   */
>> -void __afu_port_enable(struct platform_device *pdev)
>> +int __afu_port_enable(struct platform_device *pdev)
>>  {
>>  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
>>  	void __iomem *base;
>> @@ -41,7 +44,7 @@ void __afu_port_enable(struct platform_device *pdev)
>>  	WARN_ON(!pdata->disable_count);
>>  
>>  	if (--pdata->disable_count != 0)
>> -		return;
>> +		return 0;
> Is this really a success? Maybe -EBUSY?
Yilun addressed this question in his previous response. This is essentially a
reference count for nested disable calls. We only do the enable if the
disable count has gone to zero, so this isn't an error condition.
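To make the reference-count semantics concrete, here is a small userspace model. It is a sketch under assumptions, not driver code: `struct port_model` and its `in_reset` flag stand in for pdata->disable_count and the PORT_CTRL_SFTRST bit, and assert() stands in for WARN_ON() (unlike WARN_ON(), it aborts).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the reset bit is only touched on the
 * 0 -> 1 (disable) and 1 -> 0 (enable) count transitions. */
struct port_model {
	int disable_count;
	bool in_reset;		/* stands in for PORT_CTRL_SFTRST */
};

static void port_disable(struct port_model *p)
{
	if (p->disable_count++ != 0)
		return;		/* already held in reset by an outer caller */
	p->in_reset = true;
}

static int port_enable(struct port_model *p)
{
	assert(p->disable_count > 0);	/* role of WARN_ON(): bad refcount */
	if (--p->disable_count != 0)
		return 0;	/* another disabler still active: not an error */
	p->in_reset = false;	/* the patch then polls for ACK == 0 here */
	return 0;
}
```

A nested disable/disable/enable leaves the port in reset; only the final enable releases it.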
>>  
>>  	base = dfl_get_feature_ioaddr_by_id(&pdev->dev, PORT_FEATURE_ID_HEADER);
>>  
>> @@ -49,10 +52,20 @@ void __afu_port_enable(struct platform_device *pdev)
>>  	v = readq(base + PORT_HDR_CTRL);
>>  	v &= ~PORT_CTRL_SFTRST;
>>  	writeq(v, base + PORT_HDR_CTRL);
>> -}
>>  
>> -#define RST_POLL_INVL 10 /* us */
>> -#define RST_POLL_TIMEOUT 1000 /* us */
>> +	/*
>> +	 * HW clears the ack bit to indicate that the port is fully out
>> +	 * of reset.
>> +	 */
>> +	if (readq_poll_timeout(base + PORT_HDR_CTRL, v,
>> +			       !(v & PORT_CTRL_SFTRST_ACK),
>> +			       RST_POLL_INVL, RST_POLL_TIMEOUT)) {
>> +		dev_err(&pdev->dev, "timeout, failure to enable device\n");
>> +		return -ETIMEDOUT;
>> +	}
>> +
>> +	return 0;
>> +}
>>  
>>  /**
>>   * __afu_port_disable - disable a port by hold reset
>> @@ -111,7 +124,7 @@ static int __port_reset(struct platform_device *pdev)
>>  
>>  	ret = __afu_port_disable(pdev);
>>  	if (!ret)
>> -		__afu_port_enable(pdev);
>> +		ret = __afu_port_enable(pdev);
>>  
>>  	return ret;
>>  }
>> @@ -872,11 +885,11 @@ static int afu_dev_destroy(struct platform_device *pdev)
>>  static int port_enable_set(struct platform_device *pdev, bool enable)
>>  {
>>  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
>> -	int ret = 0;
>> +	int ret;
>>  
>>  	mutex_lock(&pdata->lock);
>>  	if (enable)
>> -		__afu_port_enable(pdev);
>> +		ret = __afu_port_enable(pdev);
>>  	else
>>  		ret = __afu_port_disable(pdev);
>>  	mutex_unlock(&pdata->lock);
>> diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
>> index 576e94960086..e5020e2b1f3d 100644
>> --- a/drivers/fpga/dfl-afu.h
>> +++ b/drivers/fpga/dfl-afu.h
>> @@ -80,7 +80,7 @@ struct dfl_afu {
>>  };
>>  
>>  /* hold pdata->lock when call __afu_port_enable/disable */
>> -void __afu_port_enable(struct platform_device *pdev);
>> +int __afu_port_enable(struct platform_device *pdev);
>>  int __afu_port_disable(struct platform_device *pdev);
> The other functions in this file have an afu_* prefix. Since __afu_port_enable/disable
> are used in other places, would it make sense to remove the '__' prefix?
>
> If you think so, maybe a cleanup patch later.
Yilun and Hao addressed this comment in their previous responses. We are using the
'__' prefix to highlight the fact that the caller needs to take care in managing
the locking associated with these functions.
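A sketch of the convention, using a hypothetical boolean flag in place of pdata->lock so it runs in userspace. In the kernel, a `__` helper could also check the requirement with lockdep_assert_held() on the mutex. `struct pdata_model`, `model_lock()`, `model_unlock()` and the function names are illustrative only (the `__` name mirrors the kernel style).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for pdata and its mutex. */
struct pdata_model {
	bool locked;
	int disable_count;
};

static void model_lock(struct pdata_model *pd)
{
	assert(!pd->locked);
	pd->locked = true;
}

static void model_unlock(struct pdata_model *pd)
{
	assert(pd->locked);
	pd->locked = false;
}

/* "__" variant: the caller must already hold the lock. */
static int __port_enable(struct pdata_model *pd)
{
	assert(pd->locked);		/* like lockdep_assert_held() */
	assert(pd->disable_count > 0);
	pd->disable_count--;
	return 0;
}

/* Public wrapper, same shape as port_enable_set() in the patch. */
static int port_enable(struct pdata_model *pd)
{
	int ret;

	model_lock(pd);
	ret = __port_enable(pd);
	model_unlock(pd);
	return ret;
}

/* A caller that holds the lock across a longer sequence (as
 * afu_port_err_clear() does) calls the "__" variant directly. */
static int locked_sequence(struct pdata_model *pd)
{
	int ret;

	model_lock(pd);
	pd->disable_count++;		/* stands in for __afu_port_disable() */
	ret = __port_enable(pd);
	model_unlock(pd);
	return ret;
}
```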

Thanks,
- Russ
>
> Tom
>
>>  
>>  void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);



* Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
  2021-02-02 20:32   ` Russ Weight
@ 2021-02-02 20:38     ` Russ Weight
  0 siblings, 0 replies; 14+ messages in thread
From: Russ Weight @ 2021-02-02 20:38 UTC (permalink / raw)
  To: Tom Rix, mdf, linux-fpga, linux-kernel
  Cc: lgoncalv, yilun.xu, hao.wu, matthew.gerlach



On 2/2/21 12:32 PM, Russ Weight wrote:
>
> On 9/17/20 1:28 PM, Tom Rix wrote:
>> On 9/17/20 11:32 AM, Russ Weight wrote:
>>> Port enable is not complete until ACK = 0. Change
>>> __afu_port_enable() to guarantee that the enable process
>>> is complete by polling for ACK == 0.
>>>
>>> Signed-off-by: Russ Weight <russell.h.weight@intel.com>
>>> ---
>>>  drivers/fpga/dfl-afu-error.c |  2 +-
>>>  drivers/fpga/dfl-afu-main.c  | 29 +++++++++++++++++++++--------
>>>  drivers/fpga/dfl-afu.h       |  2 +-
>>>  3 files changed, 23 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
>>> index c4691187cca9..0806532a3e9f 100644
>>> --- a/drivers/fpga/dfl-afu-error.c
>>> +++ b/drivers/fpga/dfl-afu-error.c
>>> @@ -103,7 +103,7 @@ static int afu_port_err_clear(struct device *dev, u64 err)
>>>  	__afu_port_err_mask(dev, false);
>>>  
>> There is an earlier bit that sets ret = -EINVAL.
>>
>> This error will be lost or not handled well.
>>
>> Right now it doesn't seem to be handled.
> Good catch. I'll give priority to -EINVAL in the next version of the
> patch, as it is more informative in the context of this function.
Actually - Hao pointed out in his response that the failure to re-enable the port
is a more serious error, so the code flow is OK, but it needs a comment.
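One way the tail of afu_port_err_clear() could keep both errors while letting the re-enable failure win, sketched as a pure function. `err_clear_model()` and its parameters are hypothetical; in the driver the inputs would come from readq(base + PORT_ERROR) and __afu_port_enable().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of the error prioritization. */
static int err_clear_model(uint64_t current_err, uint64_t requested_err,
			   int enable_result)
{
	int ret = 0, enable_ret;

	/* Mismatch between the caller's value and the port's errors. */
	if (current_err != requested_err)
		ret = -EINVAL;

	/* The port is re-enabled either way, so it is not left in reset. */
	enable_ret = enable_result;

	/* A port stuck in reset (likely a HW failure) is reported first;
	 * otherwise the -EINVAL, if any, is preserved. */
	return enable_ret ? enable_ret : ret;
}
```

Keeping two variables avoids the original problem of `ret = __afu_port_enable(pdev)` silently overwriting -EINVAL with 0.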

- Russ
>>>  	/* Enable the Port by clear the reset */
>>> -	__afu_port_enable(pdev);
>>> +	ret = __afu_port_enable(pdev);
>>>  
>>>  done:
>>>  	mutex_unlock(&pdata->lock);
>>> diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
>>> index 753cda4b2568..f73b06cdf13c 100644
>>> --- a/drivers/fpga/dfl-afu-main.c
>>> +++ b/drivers/fpga/dfl-afu-main.c
>>> @@ -21,6 +21,9 @@
>>>  
>>>  #include "dfl-afu.h"
>>>  
>>> +#define RST_POLL_INVL 10 /* us */
>>> +#define RST_POLL_TIMEOUT 1000 /* us */
>>> +
>>>  /**
>>>   * __afu_port_enable - enable a port by clear reset
>>>   * @pdev: port platform device.
>>> @@ -32,7 +35,7 @@
>>>   *
>>>   * The caller needs to hold lock for protection.
>>>   */
>>> -void __afu_port_enable(struct platform_device *pdev)
>>> +int __afu_port_enable(struct platform_device *pdev)
>>>  {
>>>  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
>>>  	void __iomem *base;
>>> @@ -41,7 +44,7 @@ void __afu_port_enable(struct platform_device *pdev)
>>>  	WARN_ON(!pdata->disable_count);
>>>  
>>>  	if (--pdata->disable_count != 0)
>>> -		return;
>>> +		return 0;
>> Is this really a success? Maybe -EBUSY?
> Yilun addressed this question in his previous response. This is essentially a
> reference count for nested disable calls. We only do the enable if the
> disable count has gone to zero, so this isn't an error condition.
>>>  
>>>  	base = dfl_get_feature_ioaddr_by_id(&pdev->dev, PORT_FEATURE_ID_HEADER);
>>>  
>>> @@ -49,10 +52,20 @@ void __afu_port_enable(struct platform_device *pdev)
>>>  	v = readq(base + PORT_HDR_CTRL);
>>>  	v &= ~PORT_CTRL_SFTRST;
>>>  	writeq(v, base + PORT_HDR_CTRL);
>>> -}
>>>  
>>> -#define RST_POLL_INVL 10 /* us */
>>> -#define RST_POLL_TIMEOUT 1000 /* us */
>>> +	/*
>>> +	 * HW clears the ack bit to indicate that the port is fully out
>>> +	 * of reset.
>>> +	 */
>>> +	if (readq_poll_timeout(base + PORT_HDR_CTRL, v,
>>> +			       !(v & PORT_CTRL_SFTRST_ACK),
>>> +			       RST_POLL_INVL, RST_POLL_TIMEOUT)) {
>>> +		dev_err(&pdev->dev, "timeout, failure to enable device\n");
>>> +		return -ETIMEDOUT;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>>  
>>>  /**
>>>   * __afu_port_disable - disable a port by hold reset
>>> @@ -111,7 +124,7 @@ static int __port_reset(struct platform_device *pdev)
>>>  
>>>  	ret = __afu_port_disable(pdev);
>>>  	if (!ret)
>>> -		__afu_port_enable(pdev);
>>> +		ret = __afu_port_enable(pdev);
>>>  
>>>  	return ret;
>>>  }
>>> @@ -872,11 +885,11 @@ static int afu_dev_destroy(struct platform_device *pdev)
>>>  static int port_enable_set(struct platform_device *pdev, bool enable)
>>>  {
>>>  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
>>> -	int ret = 0;
>>> +	int ret;
>>>  
>>>  	mutex_lock(&pdata->lock);
>>>  	if (enable)
>>> -		__afu_port_enable(pdev);
>>> +		ret = __afu_port_enable(pdev);
>>>  	else
>>>  		ret = __afu_port_disable(pdev);
>>>  	mutex_unlock(&pdata->lock);
>>> diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
>>> index 576e94960086..e5020e2b1f3d 100644
>>> --- a/drivers/fpga/dfl-afu.h
>>> +++ b/drivers/fpga/dfl-afu.h
>>> @@ -80,7 +80,7 @@ struct dfl_afu {
>>>  };
>>>  
>>>  /* hold pdata->lock when call __afu_port_enable/disable */
>>> -void __afu_port_enable(struct platform_device *pdev);
>>> +int __afu_port_enable(struct platform_device *pdev);
>>>  int __afu_port_disable(struct platform_device *pdev);
>> The other functions in this file have an afu_* prefix. Since __afu_port_enable/disable
>> are used in other places, would it make sense to remove the '__' prefix?
>>
>> If you think so, maybe a cleanup patch later.
> Yilun and Hao addressed this comment in their previous responses. We are using the
> '__' prefix to highlight the fact that the caller needs to take care in managing
> the locking associated with these functions.
>
> Thanks,
> - Russ
>> Tom
>>
>>>  
>>>  void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);



* Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
  2020-09-17 21:38   ` Moritz Fischer
  2020-09-18  1:23     ` Xu Yilun
  2020-09-18  2:00     ` Wu, Hao
@ 2021-02-02 20:44     ` Russ Weight
  2 siblings, 0 replies; 14+ messages in thread
From: Russ Weight @ 2021-02-02 20:44 UTC (permalink / raw)
  To: Moritz Fischer, Tom Rix
  Cc: linux-fpga, linux-kernel, lgoncalv, yilun.xu, hao.wu, matthew.gerlach



On 9/17/20 2:38 PM, Moritz Fischer wrote:
> On Thu, Sep 17, 2020 at 01:28:22PM -0700, Tom Rix wrote:
>> On 9/17/20 11:32 AM, Russ Weight wrote:
>>> Port enable is not complete until ACK = 0. Change
>>> __afu_port_enable() to guarantee that the enable process
>>> is complete by polling for ACK == 0.
>>>
>>> Signed-off-by: Russ Weight <russell.h.weight@intel.com>
> General note: Please keep a changelog if you send updated versions of a
> patch. This can be added here with an extra '---' + Text between Signed-off and
> diffstat:
>
> --- 
> Changes from v1:
> - Foo
> - Bar
Yes - I'll do that on future patch updates. In this case v2 just fixed a typo
in the commit message, so the patch was essentially the same as v1.
>>> ---
>>>  drivers/fpga/dfl-afu-error.c |  2 +-
>>>  drivers/fpga/dfl-afu-main.c  | 29 +++++++++++++++++++++--------
>>>  drivers/fpga/dfl-afu.h       |  2 +-
>>>  3 files changed, 23 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
>>> index c4691187cca9..0806532a3e9f 100644
>>> --- a/drivers/fpga/dfl-afu-error.c
>>> +++ b/drivers/fpga/dfl-afu-error.c
>>> @@ -103,7 +103,7 @@ static int afu_port_err_clear(struct device *dev, u64 err)
>>>  	__afu_port_err_mask(dev, false);
>>>  
>> There is an earlier bit that sets ret = -EINVAL.
>>
>> This error will be lost or not handled well.
>>
>> Right now it doesn't seem to be handled.
> Ultimately you'd want to report *at least* one of them, the current code
> seems to continue and enable the port either case. Is that what it
> should be doing? 
>
> Is the timeout more severe than the invalid value? Do you want to print
> a warning?
>
> Either way a comment explaining why this is ok would be appreciated :)
Yes - I'll add a comment explaining how the errors are being prioritized.
I'll give priority to the timeout, as it is likely a HW failure.

>>>  	/* Enable the Port by clear the reset */
>>> -	__afu_port_enable(pdev);
>>> +	ret = __afu_port_enable(pdev);
>>>  
>>>  done:
>>>  	mutex_unlock(&pdata->lock);
>>> diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
>>> index 753cda4b2568..f73b06cdf13c 100644
>>> --- a/drivers/fpga/dfl-afu-main.c
>>> +++ b/drivers/fpga/dfl-afu-main.c
>>> @@ -21,6 +21,9 @@
>>>  
>>>  #include "dfl-afu.h"
>>>  
>>> +#define RST_POLL_INVL 10 /* us */
>>> +#define RST_POLL_TIMEOUT 1000 /* us */
>>> +
>>>  /**
>>>   * __afu_port_enable - enable a port by clear reset
>>>   * @pdev: port platform device.
>>> @@ -32,7 +35,7 @@
>>>   *
>>>   * The caller needs to hold lock for protection.
>>>   */
>>> -void __afu_port_enable(struct platform_device *pdev)
>>> +int __afu_port_enable(struct platform_device *pdev)
>>>  {
>>>  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
>>>  	void __iomem *base;
>>> @@ -41,7 +44,7 @@ void __afu_port_enable(struct platform_device *pdev)
>>>  	WARN_ON(!pdata->disable_count);
>>>  
>>>  	if (--pdata->disable_count != 0)
>>> -		return;
>>> +		return 0;
>> Is this really a success? Maybe -EBUSY?
> Seems like if it's severe enough for a warning you'd probably want to
> return an error.
As mentioned by Hao and Yilun, the disable_count is a reference count. The
WARN_ON() is checking for a different condition - an invalid reference count.
We should never call port_enable if the port is not disabled. Do you think a
comment is needed here?

Thanks,
- Russ

>>>  
>>>  	base = dfl_get_feature_ioaddr_by_id(&pdev->dev, PORT_FEATURE_ID_HEADER);
>>>  
>>> @@ -49,10 +52,20 @@ void __afu_port_enable(struct platform_device *pdev)
>>>  	v = readq(base + PORT_HDR_CTRL);
>>>  	v &= ~PORT_CTRL_SFTRST;
>>>  	writeq(v, base + PORT_HDR_CTRL);
>>> -}
>>>  
>>> -#define RST_POLL_INVL 10 /* us */
>>> -#define RST_POLL_TIMEOUT 1000 /* us */
>>> +	/*
>>> +	 * HW clears the ack bit to indicate that the port is fully out
>>> +	 * of reset.
>>> +	 */
>>> +	if (readq_poll_timeout(base + PORT_HDR_CTRL, v,
>>> +			       !(v & PORT_CTRL_SFTRST_ACK),
>>> +			       RST_POLL_INVL, RST_POLL_TIMEOUT)) {
>>> +		dev_err(&pdev->dev, "timeout, failure to enable device\n");
>>> +		return -ETIMEDOUT;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>>  
>>>  /**
>>>   * __afu_port_disable - disable a port by hold reset
>>> @@ -111,7 +124,7 @@ static int __port_reset(struct platform_device *pdev)
>>>  
>>>  	ret = __afu_port_disable(pdev);
>>>  	if (!ret)
>>> -		__afu_port_enable(pdev);
>>> +		ret = __afu_port_enable(pdev);
>>>  
>>>  	return ret;
>>>  }
>>> @@ -872,11 +885,11 @@ static int afu_dev_destroy(struct platform_device *pdev)
>>>  static int port_enable_set(struct platform_device *pdev, bool enable)
>>>  {
>>>  	struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
>>> -	int ret = 0;
>>> +	int ret;
>>>  
>>>  	mutex_lock(&pdata->lock);
>>>  	if (enable)
>>> -		__afu_port_enable(pdev);
>>> +		ret = __afu_port_enable(pdev);
>>>  	else
>>>  		ret = __afu_port_disable(pdev);
>>>  	mutex_unlock(&pdata->lock);
>>> diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
>>> index 576e94960086..e5020e2b1f3d 100644
>>> --- a/drivers/fpga/dfl-afu.h
>>> +++ b/drivers/fpga/dfl-afu.h
>>> @@ -80,7 +80,7 @@ struct dfl_afu {
>>>  };
>>>  
>>>  /* hold pdata->lock when call __afu_port_enable/disable */
>>> -void __afu_port_enable(struct platform_device *pdev);
>>> +int __afu_port_enable(struct platform_device *pdev);
>>>  int __afu_port_disable(struct platform_device *pdev);
>> The other functions in this file have an afu_* prefix. Since __afu_port_enable/disable
>> are used in other places, would it make sense to remove the '__' prefix?
> The idea behind those is to indicate that the caller needs to be cautious
> (often a lock / mutex is required). I think keeping them as is is fine.
>
>> If you think so, maybe a cleanup patch later.
>>
>> Tom
>>
>>>  
>>>  void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);
> Thanks,
> Moritz



* RE: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
  2021-02-02 20:16   ` Russ Weight
@ 2021-02-03  9:28     ` Wu, Hao
  2021-02-03 22:43       ` Russ Weight
  0 siblings, 1 reply; 14+ messages in thread
From: Wu, Hao @ 2021-02-03  9:28 UTC (permalink / raw)
  To: Weight, Russell H, mdf, linux-fpga, linux-kernel
  Cc: trix, lgoncalv, Xu, Yilun, Gerlach, Matthew

> Subject: Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
> 
> Sorry for the delay on this patch. It seemed like a lower priority patch than
> others, since we haven't seen any issues with current products. Please see my
> responses inline.
> 
> On 9/17/20 7:08 PM, Wu, Hao wrote:
> >> -----Original Message-----
> >> From: Russ Weight <russell.h.weight@intel.com>
> >> Sent: Friday, September 18, 2020 2:32 AM
> >> To: mdf@kernel.org; linux-fpga@vger.kernel.org; linux-kernel@vger.kernel.org
> >> Cc: trix@redhat.com; lgoncalv@redhat.com; Xu, Yilun <yilun.xu@intel.com>;
> >> Wu, Hao <hao.wu@intel.com>; Gerlach, Matthew
> >> <matthew.gerlach@intel.com>; Weight, Russell H
> >> <russell.h.weight@intel.com>
> >> Subject: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
> >>
> >> Port enable is not complete until ACK = 0. Change
> >> __afu_port_enable() to guarantee that the enable process
> >> is complete by polling for ACK == 0.
> > The description of this port reset ack bit is
> >
> > " After initiating a Port soft reset, SW should monitor this bit. HW
> > will set this bit when all outstanding requests initiated by this port
> > have been drained, and the minimum soft reset pulse width has
> > elapsed. "
> >
> > But no description about what to do when clearing a Port soft reset
> > to enable the port.
> >
> > So we need to understand clearly on why we need this change
> > (e.g. what may happen without this change), and will it apply for all
> > existing DFL devices and future ones, or just for one specific card.
> > Could you please help? : )
> I touched base with the hardware engineers. The recommendation to wait
> for ACK to be cleared is new with OFS and is documented in the latest
> OFS specification as follows (see step #4):
> 
> > 3.7.1 AFU Soft Resets
> > Software may cause a soft reset to be issued to the AFU as follows:
> > 1. Assert the PortSoftReset field of the PORT_CONTROL register
> > 2. Wait for the Port to acknowledge the soft reset by monitoring the
> > PortSoftResetAck field of the PORT_CONTROL register, i.e. PortSoftResetAck=1
> > 3. Deasserting the PortSoftReset field
> > 4. Wait for the Port to acknowledge the soft reset de-assertion by monitoring the
> > PortSoftResetAck field of the PORT_CONTROL register, i.e. PortSoftResetAck=0
> >
> > This sequence ensures that outstanding transactions are suitably flushed and
> > that the FIM minimum reset pulse width is respected. Failing to follow this
> > sequence leaves the AFU in an undefined state.
> 
> The OFS specification has not been posted publicly, yet.
> 
> Also, this is how it was explained to me:
> 
> > In most scenarios, the port will be able to get out of reset soon enough
> > when SW releases the port reset, especially on all the PAC products
> > which have been verified before release.
> >
> > Polling for HW to clear the ACK is meant to handle the following scenarios:
> >
> >   * Different platforms can take a variable amount of time to get out of reset
> >   * A bug in the HW that holds the port in reset
> 
> So this change is not required for the currently released PAC cards,
> but it is needed for OFS based products. I don't think there is any reason
> to hold off on the patch, as it is still valid for current products.

As you know, this driver is used for different cards, and we need to make
sure that changes introduced in a new version of the spec don't break old
products, as they share the same driver. We are also not sure whether some
future products will still use the old spec, in which case things may break
if the driver always performs the new flow. Another option is to introduce a
bit in a hardware register that tells the driver to perform the additional
steps; that would avoid impacting the old products. If this can't be done,
then we at least need to verify this change on all existing hardware and
advise users to follow only the new spec.
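The hardware-bit alternative could look roughly like the sketch below. The capability bit is purely hypothetical (nothing in this thread defines such a field in the DFL port header), and `port_enable_gated()` only models the decision, not the register accesses.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL capability bit, for illustration only. */
#define HYP_PORT_CAP_ACK_ON_DEASSERT (1ULL << 0)

/* True when the driver should poll for PortSoftResetAck == 0 after
 * deasserting PortSoftReset. */
static bool need_deassert_ack_poll(uint64_t port_capability)
{
	return (port_capability & HYP_PORT_CAP_ACK_ON_DEASSERT) != 0;
}

/* Models only the decision: legacy devices keep the old
 * write-and-return behavior, devices advertising the bit poll. */
static int port_enable_gated(uint64_t port_capability,
			     bool ack_eventually_clears)
{
	/* ... PortSoftReset would be deasserted here ... */
	if (!need_deassert_ack_poll(port_capability))
		return 0;				/* old flow */
	return ack_eventually_clears ? 0 : -ETIMEDOUT;	/* new flow */
}
```

The trade-off is an extra hardware field in exchange for leaving existing devices on exactly the code path they were validated with.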

Hao


* Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
  2021-02-03  9:28     ` Wu, Hao
@ 2021-02-03 22:43       ` Russ Weight
  2021-02-03 23:07         ` matthew.gerlach
  0 siblings, 1 reply; 14+ messages in thread
From: Russ Weight @ 2021-02-03 22:43 UTC (permalink / raw)
  To: Wu, Hao, mdf, linux-fpga, linux-kernel
  Cc: trix, lgoncalv, Xu, Yilun, Gerlach, Matthew



On 2/3/21 1:28 AM, Wu, Hao wrote:
>> Subject: Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
>>
>> Sorry for the delay on this patch. It seemed like a lower priority patch than
>> others, since we haven't seen any issues with current products. Please see my
>> responses inline.
>>
>> On 9/17/20 7:08 PM, Wu, Hao wrote:
>>>> -----Original Message-----
>>>> From: Russ Weight <russell.h.weight@intel.com>
>>>> Sent: Friday, September 18, 2020 2:32 AM
>>>> To: mdf@kernel.org; linux-fpga@vger.kernel.org; linux-kernel@vger.kernel.org
>>>> Cc: trix@redhat.com; lgoncalv@redhat.com; Xu, Yilun <yilun.xu@intel.com>;
>>>> Wu, Hao <hao.wu@intel.com>; Gerlach, Matthew
>>>> <matthew.gerlach@intel.com>; Weight, Russell H
>>>> <russell.h.weight@intel.com>
>>>> Subject: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
>>>>
>>>> Port enable is not complete until ACK = 0. Change
>>>> __afu_port_enable() to guarantee that the enable process
>>>> is complete by polling for ACK == 0.
>>> The description of this port reset ack bit is
>>>
>>> " After initiating a Port soft reset, SW should monitor this bit. HW
>>> will set this bit when all outstanding requests initiated by this port
>>> have been drained, and the minimum soft reset pulse width has
>>> elapsed. "
>>>
>>> But no description about what to do when clearing a Port soft reset
>>> to enable the port.
>>>
>>> So we need to understand clearly on why we need this change
>>> (e.g. what may happen without this change), and will it apply for all
>>> existing DFL devices and future ones, or just for one specific card.
>>> Could you please help? : )
>> I touched base with the hardware engineers. The recommendation to wait
>> for ACK to be cleared is new with OFS and is documented in the latest
>> OFS specification as follows (see step #4):
>>
>>> 3.7.1 AFU Soft Resets
>>> Software may cause a soft reset to be issued to the AFU as follows:
>>> 1. Assert the PortSoftReset field of the PORT_CONTROL register
>>> 2. Wait for the Port to acknowledge the soft reset by monitoring the
>>> PortSoftResetAck field of the PORT_CONTROL register, i.e. PortSoftResetAck=1
>>> 3. Deasserting the PortSoftReset field
>>> 4. Wait for the Port to acknowledge the soft reset de-assertion by monitoring the
>>> PortSoftResetAck field of the PORT_CONTROL register, i.e. PortSoftResetAck=0
>>> This sequence ensures that outstanding transactions are suitably flushed and
>>> that the FIM minimum reset pulse width is respected. Failing to follow this
>>> sequence leaves the AFU in an undefined state.
>> The OFS specification has not been posted publicly, yet.
>>
>> Also, this is how it was explained to me:
>>
>>> In most scenarios, the port will be able to get out of reset soon enough
>>> when SW releases the port reset, especially on all the PAC products
>>> which have been verified before release.
>>>
>>> Polling for HW to clear the ACK is meant to handle the following scenarios:
>>>
>>>   * Different platforms can take a variable amount of time to get out of reset
>>>   * A bug in the HW that holds the port in reset
>> So this change is not required for the currently released PAC cards,
>> but it is needed for OFS based products. I don't think there is any reason
>> to hold off on the patch, as it is still valid for current products.
> As you know, this driver is used for different cards, and we need to make
> sure that changes introduced in a new version of the spec don't break old
> products, as they share the same driver. We are also not sure whether some
> future products will still use the old spec, in which case things may break
> if the driver always performs the new flow. Another option is to introduce a
> bit in a hardware register that tells the driver to perform the additional
> steps; that would avoid impacting the old products. If this can't be done,
> then we at least need to verify this change on all existing hardware and
> advise users to follow only the new spec.

According to the HW engineers, the RTL implementation has not changed; it is
the same as the RTL in the current PAC products. Polling for HW to clear the
ACK is something we could have (should have?) been doing all along. The timing
hasn't been an issue for the current PAC products, as proven by our testing.
However, with OFS we cannot anticipate what the timing will be for customer
designed products, so the specification is calling out this requirement as a
precaution.

I am using a development machine that has the older PAC devices installed. I
cleared port errors on these cards as a quick check, and the reset completes
without hanging - which indicates that the ACK bit is in fact getting cleared.
So there is no need for any device-specific conditional statements here.

- Russ

>
> Hao



* Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
  2021-02-03 22:43       ` Russ Weight
@ 2021-02-03 23:07         ` matthew.gerlach
  2021-02-04  1:55           ` Wu, Hao
  0 siblings, 1 reply; 14+ messages in thread
From: matthew.gerlach @ 2021-02-03 23:07 UTC (permalink / raw)
  To: Russ Weight
  Cc: Wu, Hao, mdf, linux-fpga, linux-kernel, trix, lgoncalv, Xu,
	Yilun, Gerlach, Matthew



On Wed, 3 Feb 2021, Russ Weight wrote:

>
>
> On 2/3/21 1:28 AM, Wu, Hao wrote:
>>> Subject: Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
>>>
>>> Sorry for the delay on this patch. It seemed like a lower priority patch than
>>> others, since we haven't seen any issues with current products. Please see my
>>> responses inline.
>>>
>>> On 9/17/20 7:08 PM, Wu, Hao wrote:
>>>>> -----Original Message-----
>>>>> From: Russ Weight <russell.h.weight@intel.com>
>>>>> Sent: Friday, September 18, 2020 2:32 AM
>>>>> To: mdf@kernel.org; linux-fpga@vger.kernel.org; linux-kernel@vger.kernel.org
>>>>> Cc: trix@redhat.com; lgoncalv@redhat.com; Xu, Yilun <yilun.xu@intel.com>;
>>>>> Wu, Hao <hao.wu@intel.com>; Gerlach, Matthew
>>>>> <matthew.gerlach@intel.com>; Weight, Russell H
>>>>> <russell.h.weight@intel.com>
>>>>> Subject: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
>>>>>
>>>>> Port enable is not complete until ACK = 0. Change
>>>>> __afu_port_enable() to guarantee that the enable process
>>>>> is complete by polling for ACK == 0.
>>>> The description of this port reset ack bit is
>>>>
>>>> " After initiating a Port soft reset, SW should monitor this bit. HW
>>>> will set this bit when all outstanding requests initiated by this port
>>>> have been drained, and the minimum soft reset pulse width has
>>>> elapsed. "
>>>>
>>>> But there is no description of what to do when clearing a Port soft reset
>>>> to enable the port.
>>>>
>>>> So we need to clearly understand why we need this change
>>>> (e.g. what may happen without it), and whether it applies to all
>>>> existing and future DFL devices, or just to one specific card.
>>>> Could you please help? : )
>>> I touched base with the hardware engineers. The recommendation to wait
>>> for ACK to be cleared is new with OFS and is documented in the latest
>>> OFS specification as follows (see step #4):
>>>
>>>> 3.7.1 AFU Soft Resets
>>>> Software may cause a soft reset to be issued to the AFU as follows:
>>>> 1. Assert the PortSoftReset field of the PORT_CONTROL register
>>>> 2. Wait for the Port to acknowledge the soft reset by monitoring the
>>>> PortSoftResetAck field of the PORT_CONTROL register, i.e.
>>>> PortSoftResetAck=1
>>>> 3. Deasserting the PortSoftReset field
>>>> 4. Wait for the Port to acknowledge the soft reset de-assertion by monitoring
>>>> the PortSoftResetAck field of the PORT_CONTROL register, i.e.
>>>> PortSoftResetAck=0
>>>> This sequence ensures that outstanding transactions are suitably flushed and
>>>> that the FIM minimum reset pulse width is respected. Failing to follow this
>>>> sequence leaves the AFU in an undefined state.
>>> The OFS specification has not yet been posted publicly.
>>>
>>> Also, this is how it was explained to me:
>>>
>>>> In most scenarios, the port will be able to get out of reset soon enough
>>>> when SW releases the port reset, especially on all the PAC products
>>>> which have been verified before release.
>>>>
>>>> Polling for HW to clear the ACK is meant to handle the following scenarios:
>>>>
>>>>   * Different platforms can take variable periods of time to get out of reset
>>>>   * A bug in the HW that holds the port in reset
>>> So this change is not required for the currently released PAC cards,
>>> but it is needed for OFS-based products. I don't think there is any reason
>>> to hold off on the patch, as it is still valid for current products.
>> As you know, this driver is used for different cards, and we need to make
>> sure that changes introduced in a new version of the spec don't break old
>> products, since they share the same driver. We are also not sure whether
>> some future products will still use the old spec, in which case things may
>> break if the driver always performs the new flow. Another option is to
>> introduce a bit in a hardware register that tells the driver to perform the
>> additional steps; this would avoid impacting the old products. If this
>> can't be done, then we at least need to verify this change on all existing
>> hardware and advise users to follow only the new spec.
>
> According to the HW engineers, the RTL implementation has not changed; it is
> the same as the RTL in the current PAC products. Polling for HW to clear the
> ACK is something we could have (should have?) been doing all along. The timing

I also confirmed with HW engineers.  The original specification was 
not precise.  The code should have been doing this all along.

Matthew Gerlach

> hasn't been an issue for the current PAC products, as proven by our testing.
> However, with OFS we cannot anticipate what the timing will be for customer
> designed products, so the specification is calling out this requirement as a
> precaution.
>
> I am using a development machine that has the older PAC devices installed. I
> cleared port errors on these cards as a quick check, and the reset completes
> without hanging - which indicates that the ACK bit is in fact getting cleared.
> So there is no need for any device-specific conditional statements here.
>
> - Russ
>
>>
>> Hao
>
>

* RE: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
  2021-02-03 23:07         ` matthew.gerlach
@ 2021-02-04  1:55           ` Wu, Hao
  0 siblings, 0 replies; 14+ messages in thread
From: Wu, Hao @ 2021-02-04  1:55 UTC (permalink / raw)
  To: matthew.gerlach, Weight, Russell H
  Cc: mdf, linux-fpga, linux-kernel, trix, lgoncalv, Xu, Yilun,
	Gerlach, Matthew

> On Wed, 3 Feb 2021, Russ Weight wrote:
> 
> >
> >
> > On 2/3/21 1:28 AM, Wu, Hao wrote:
> >>> Subject: Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
> >>>
> >>> Sorry for the delay on this patch. It seemed like a lower priority patch than
> >>> others, since we haven't seen any issues with current products. Please see my
> >>> responses inline.
> >>>
> >>> On 9/17/20 7:08 PM, Wu, Hao wrote:
> >>>>> -----Original Message-----
> >>>>> From: Russ Weight <russell.h.weight@intel.com>
> >>>>> Sent: Friday, September 18, 2020 2:32 AM
> >>>>> To: mdf@kernel.org; linux-fpga@vger.kernel.org; linux-
> >>>>> kernel@vger.kernel.org
> >>>>> Cc: trix@redhat.com; lgoncalv@redhat.com; Xu, Yilun <yilun.xu@intel.com>;
> >>>>> Wu, Hao <hao.wu@intel.com>; Gerlach, Matthew
> >>>>> <matthew.gerlach@intel.com>; Weight, Russell H
> >>>>> <russell.h.weight@intel.com>
> >>>>> Subject: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
> >>>>>
> >>>>> Port enable is not complete until ACK = 0. Change
> >>>>> __afu_port_enable() to guarantee that the enable process
> >>>>> is complete by polling for ACK == 0.
> >>>> The description of this port reset ack bit is
> >>>>
> >>>> " After initiating a Port soft reset, SW should monitor this bit. HW
> >>>> will set this bit when all outstanding requests initiated by this port
> >>>> have been drained, and the minimum soft reset pulse width has
> >>>> elapsed. "
> >>>>
> >>>> But there is no description of what to do when clearing a Port soft reset
> >>>> to enable the port.
> >>>>
> >>>> So we need to clearly understand why we need this change
> >>>> (e.g. what may happen without it), and whether it applies to all
> >>>> existing and future DFL devices, or just to one specific card.
> >>>> Could you please help? : )
> >>> I touched base with the hardware engineers. The recommendation to wait
> >>> for ACK to be cleared is new with OFS and is documented in the latest
> >>> OFS specification as follows (see step #4):
> >>>
> >>>> 3.7.1 AFU Soft Resets
> >>>> Software may cause a soft reset to be issued to the AFU as follows:
> >>>> 1. Assert the PortSoftReset field of the PORT_CONTROL register
> >>>> 2. Wait for the Port to acknowledge the soft reset by monitoring the
> >>>> PortSoftResetAck field of the PORT_CONTROL register, i.e.
> >>>> PortSoftResetAck=1
> >>>> 3. Deasserting the PortSoftReset field
> >>>> 4. Wait for the Port to acknowledge the soft reset de-assertion by
> >>>> monitoring the PortSoftResetAck field of the PORT_CONTROL register, i.e.
> >>>> PortSoftResetAck=0
> >>>> This sequence ensures that outstanding transactions are suitably flushed
> >>>> and that the FIM minimum reset pulse width is respected. Failing to follow
> >>>> this sequence leaves the AFU in an undefined state.
> >>> The OFS specification has not yet been posted publicly.
> >>>
> >>> Also, this is how it was explained to me:
> >>>
> >>>> In most scenarios, the port will be able to get out of reset soon enough
> >>>> when SW releases the port reset, especially on all the PAC products
> >>>> which have been verified before release.
> >>>>
> >>>> Polling for HW to clear the ACK is meant to handle the following scenarios:
> >>>>
> >>>>   * Different platforms can take variable periods of time to get out of reset
> >>>>   * A bug in the HW that holds the port in reset
> >>> So this change is not required for the currently released PAC cards,
> >>> but it is needed for OFS-based products. I don't think there is any reason
> >>> to hold off on the patch, as it is still valid for current products.
> >> As you know, this driver is used for different cards, and we need to make
> >> sure that changes introduced in a new version of the spec don't break old
> >> products, since they share the same driver. We are also not sure whether
> >> some future products will still use the old spec, in which case things may
> >> break if the driver always performs the new flow. Another option is to
> >> introduce a bit in a hardware register that tells the driver to perform the
> >> additional steps; this would avoid impacting the old products. If this
> >> can't be done, then we at least need to verify this change on all existing
> >> hardware and advise users to follow only the new spec.
> >
> > According to the HW engineers, the RTL implementation has not changed; it is
> > the same as the RTL in the current PAC products. Polling for HW to clear the
> > ACK is something we could have (should have?) been doing all along. The timing
> 
> I also confirmed with HW engineers.  The original specification was
> not precise.  The code should have been doing this all along.

Thanks for the confirmation; then it sounds good to me. I think only Intel
hardware uses this driver right now, so if the hardware side has confirmed
this, we should be safe to take this one.

Hao

> 
> Matthew Gerlach
> 
> > hasn't been an issue for the current PAC products, as proven by our testing.
> > However, with OFS we cannot anticipate what the timing will be for customer
> > designed products, so the specification is calling out this requirement as a
> > precaution.
> >
> > I am using a development machine that has the older PAC devices installed. I
> > cleared port errors on these cards as a quick check, and the reset completes
> > without hanging - which indicates that the ACK bit is in fact getting cleared.
> > So there is no need for any device-specific conditional statements here.
> >
> > - Russ
> >
> >>
> >> Hao
> >
> >

end of thread, other threads:[~2021-02-04  1:56 UTC | newest]

Thread overview: 14+ messages
2020-09-17 18:32 [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic Russ Weight
2020-09-17 20:28 ` Tom Rix
2020-09-17 21:38   ` Moritz Fischer
2020-09-18  1:23     ` Xu Yilun
2020-09-18  2:00     ` Wu, Hao
2021-02-02 20:44     ` Russ Weight
2021-02-02 20:32   ` Russ Weight
2021-02-02 20:38     ` Russ Weight
2020-09-18  2:08 ` Wu, Hao
2021-02-02 20:16   ` Russ Weight
2021-02-03  9:28     ` Wu, Hao
2021-02-03 22:43       ` Russ Weight
2021-02-03 23:07         ` matthew.gerlach
2021-02-04  1:55           ` Wu, Hao
