linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init
@ 2021-10-19 23:15 Russ Weight
  2021-10-19 23:15 ` [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init Russ Weight
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Russ Weight @ 2021-10-19 23:15 UTC (permalink / raw)
  To: mdf, linux-fpga, linux-kernel
  Cc: trix, lgoncalv, yilun.xu, hao.wu, matthew.gerlach, Russ Weight

These patches address a request to log and clear any prexisting errors on
FPGA cards when the drivers load. Any existing errors will result in print
statements to the kernel error log before the errors are cleared. These
changes specifically affect the fme and port error registers.

Russ Weight (2):
  fpga: dfl: afu: Clear port errors in afu init
  fpga: dfl: fme: Clear fme global errors at driver init

 drivers/fpga/dfl-afu-error.c |  26 ++++---
 drivers/fpga/dfl-fme-error.c | 128 +++++++++++++++++++++++------------
 2 files changed, 100 insertions(+), 54 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init
  2021-10-19 23:15 [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init Russ Weight
@ 2021-10-19 23:15 ` Russ Weight
  2021-10-20  4:15   ` Xu Yilun
  2021-10-20 12:18   ` Tom Rix
  2021-10-19 23:15 ` [PATCH v1 2/2] fpga: dfl: fme: Clear fme global errors at driver init Russ Weight
  2021-10-20  4:44 ` [PATCH v1 0/2] fpga: dfl: Log and clear errors on " Wu, Hao
  2 siblings, 2 replies; 10+ messages in thread
From: Russ Weight @ 2021-10-19 23:15 UTC (permalink / raw)
  To: mdf, linux-fpga, linux-kernel
  Cc: trix, lgoncalv, yilun.xu, hao.wu, matthew.gerlach, Russ Weight

When the AFU driver initializes, log any pre-existing errors and clear them.

Signed-off-by: Russ Weight <russell.h.weight@intel.com>
---
 drivers/fpga/dfl-afu-error.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
index ab7be6217368..0dc60bf49902 100644
--- a/drivers/fpga/dfl-afu-error.c
+++ b/drivers/fpga/dfl-afu-error.c
@@ -47,13 +47,13 @@ static void afu_port_err_mask(struct device *dev, bool mask)
 }
 
 /* clear port errors. */
-static int afu_port_err_clear(struct device *dev, u64 err)
+static int afu_port_err_clear(struct device *dev, u64 err, bool clear_all)
 {
 	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
 	struct platform_device *pdev = to_platform_device(dev);
+	u64 v, port_error, port_first_error;
 	void __iomem *base_err, *base_hdr;
 	int enable_ret = 0, ret = -EBUSY;
-	u64 v;
 
 	base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
 	base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
@@ -88,16 +88,21 @@ static int afu_port_err_clear(struct device *dev, u64 err)
 	__afu_port_err_mask(dev, true);
 
 	/* Clear errors if err input matches with current port errors.*/
-	v = readq(base_err + PORT_ERROR);
+	port_error = readq(base_err + PORT_ERROR);
 
-	if (v == err) {
-		writeq(v, base_err + PORT_ERROR);
+	if (clear_all || port_error == err) {
+		port_first_error = readq(base_err + PORT_FIRST_ERROR);
 
-		v = readq(base_err + PORT_FIRST_ERROR);
-		writeq(v, base_err + PORT_FIRST_ERROR);
+		if (clear_all && (port_error || port_first_error))
+			dev_warn(dev,
+				 "Port Error: 0x%llx, First Error 0x%llx\n",
+				 port_error, port_first_error);
+
+		writeq(port_error, base_err + PORT_ERROR);
+		writeq(port_first_error, base_err + PORT_FIRST_ERROR);
 	} else {
 		dev_warn(dev, "%s: received 0x%llx, expected 0x%llx\n",
-			 __func__, v, err);
+			 __func__, port_error, err);
 		ret = -EINVAL;
 	}
 
@@ -137,7 +142,7 @@ static ssize_t errors_store(struct device *dev, struct device_attribute *attr,
 	if (kstrtou64(buff, 0, &value))
 		return -EINVAL;
 
-	ret = afu_port_err_clear(dev, value);
+	ret = afu_port_err_clear(dev, value, false);
 
 	return ret ? ret : count;
 }
@@ -211,7 +216,8 @@ const struct attribute_group port_err_group = {
 static int port_err_init(struct platform_device *pdev,
 			 struct dfl_feature *feature)
 {
-	afu_port_err_mask(&pdev->dev, false);
+	if (afu_port_err_clear(&pdev->dev, 0, true))
+		afu_port_err_mask(&pdev->dev, false);
 
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v1 2/2] fpga: dfl: fme: Clear fme global errors at driver init
  2021-10-19 23:15 [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init Russ Weight
  2021-10-19 23:15 ` [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init Russ Weight
@ 2021-10-19 23:15 ` Russ Weight
  2021-10-20 12:59   ` Tom Rix
  2021-10-20  4:44 ` [PATCH v1 0/2] fpga: dfl: Log and clear errors on " Wu, Hao
  2 siblings, 1 reply; 10+ messages in thread
From: Russ Weight @ 2021-10-19 23:15 UTC (permalink / raw)
  To: mdf, linux-fpga, linux-kernel
  Cc: trix, lgoncalv, yilun.xu, hao.wu, matthew.gerlach, Russ Weight

When the fme-error driver initializes, log any pre-existing errors and
clear them. To avoid code replication, common code is gathered into
a common fme_err_clear() function and a structure (err_reg) is created
to describe each of the error registers, the corresponding mask
registers, and the default mask for each register.

Signed-off-by: Russ Weight <russell.h.weight@intel.com>
---
 drivers/fpga/dfl-fme-error.c | 128 +++++++++++++++++++++++------------
 1 file changed, 84 insertions(+), 44 deletions(-)

diff --git a/drivers/fpga/dfl-fme-error.c b/drivers/fpga/dfl-fme-error.c
index 51c2892ec06d..44da7b30c469 100644
--- a/drivers/fpga/dfl-fme-error.c
+++ b/drivers/fpga/dfl-fme-error.c
@@ -39,6 +39,27 @@
 
 #define ERROR_MASK		GENMASK_ULL(63, 0)
 
+struct err_reg {
+	char *name;
+	u32 err_offset;
+	u32 mask_offset;
+	u32 mask_value;
+};
+
+static struct err_reg pcie0_reg = {
+	.name = "PCIE0",
+	.err_offset = PCIE0_ERROR,
+	.mask_offset = PCIE0_ERROR_MASK,
+	.mask_value = 0ULL
+};
+
+static struct err_reg pcie1_reg = {
+	.name = "PCIE1",
+	.err_offset = PCIE1_ERROR,
+	.mask_offset = PCIE1_ERROR_MASK,
+	.mask_value = 0ULL
+};
+
 static ssize_t pcie0_errors_show(struct device *dev,
 				 struct device_attribute *attr, char *buf)
 {
@@ -55,31 +76,48 @@ static ssize_t pcie0_errors_show(struct device *dev,
 	return sprintf(buf, "0x%llx\n", (unsigned long long)value);
 }
 
-static ssize_t pcie0_errors_store(struct device *dev,
-				  struct device_attribute *attr,
-				  const char *buf, size_t count)
+static int fme_err_clear(struct device *dev, struct err_reg *reg,
+			 u64 err, bool clear_all)
 {
 	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
 	void __iomem *base;
 	int ret = 0;
-	u64 v, val;
-
-	if (kstrtou64(buf, 0, &val))
-		return -EINVAL;
+	u64 v;
 
 	base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR);
 
 	mutex_lock(&pdata->lock);
-	writeq(GENMASK_ULL(63, 0), base + PCIE0_ERROR_MASK);
+	writeq(GENMASK_ULL(63, 0), base + reg->mask_offset);
 
-	v = readq(base + PCIE0_ERROR);
-	if (val == v)
-		writeq(v, base + PCIE0_ERROR);
-	else
+	v = readq(base + reg->err_offset);
+	if (clear_all || err == v) {
+		if (clear_all && v)
+			dev_warn(dev, "%s: %s Errors: 0x%llx\n",
+				 __func__, reg->name, v);
+
+		writeq(v, base + reg->err_offset);
+	} else {
 		ret = -EINVAL;
+	}
 
-	writeq(0ULL, base + PCIE0_ERROR_MASK);
+	writeq(reg->mask_value, base + reg->mask_offset);
 	mutex_unlock(&pdata->lock);
+
+	return ret;
+}
+
+static ssize_t pcie0_errors_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	u64 val;
+	int ret;
+
+	if (kstrtou64(buf, 0, &val))
+		return -EINVAL;
+
+	ret = fme_err_clear(dev, &pcie0_reg, val, false);
+
 	return ret ? ret : count;
 }
 static DEVICE_ATTR_RW(pcie0_errors);
@@ -104,27 +142,14 @@ static ssize_t pcie1_errors_store(struct device *dev,
 				  struct device_attribute *attr,
 				  const char *buf, size_t count)
 {
-	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
-	void __iomem *base;
-	int ret = 0;
-	u64 v, val;
+	u64 val;
+	int ret;
 
 	if (kstrtou64(buf, 0, &val))
 		return -EINVAL;
 
-	base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR);
+	ret = fme_err_clear(dev, &pcie1_reg, val, false);
 
-	mutex_lock(&pdata->lock);
-	writeq(GENMASK_ULL(63, 0), base + PCIE1_ERROR_MASK);
-
-	v = readq(base + PCIE1_ERROR);
-	if (val == v)
-		writeq(v, base + PCIE1_ERROR);
-	else
-		ret = -EINVAL;
-
-	writeq(0ULL, base + PCIE1_ERROR_MASK);
-	mutex_unlock(&pdata->lock);
 	return ret ? ret : count;
 }
 static DEVICE_ATTR_RW(pcie1_errors);
@@ -218,29 +243,26 @@ static ssize_t fme_errors_store(struct device *dev,
 				struct device_attribute *attr,
 				const char *buf, size_t count)
 {
-	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+	static struct err_reg fme_reg = {
+		.name = "FME",
+		.err_offset = FME_ERROR,
+		.mask_offset = FME_ERROR_MASK,
+		.mask_value = 0ULL
+	};
 	void __iomem *base;
-	u64 v, val;
-	int ret = 0;
+	u64 val;
+	int ret;
 
 	if (kstrtou64(buf, 0, &val))
 		return -EINVAL;
 
+	/* Workaround: disable MBP_ERROR if feature revision is 0 */
 	base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR);
+	if (!dfl_feature_revision(base))
+		fme_reg.mask_value = MBP_ERROR;
 
-	mutex_lock(&pdata->lock);
-	writeq(GENMASK_ULL(63, 0), base + FME_ERROR_MASK);
+	ret = fme_err_clear(dev, &fme_reg, val, false);
 
-	v = readq(base + FME_ERROR);
-	if (val == v)
-		writeq(v, base + FME_ERROR);
-	else
-		ret = -EINVAL;
-
-	/* Workaround: disable MBP_ERROR if feature revision is 0 */
-	writeq(dfl_feature_revision(base) ? 0ULL : MBP_ERROR,
-	       base + FME_ERROR_MASK);
-	mutex_unlock(&pdata->lock);
 	return ret ? ret : count;
 }
 static DEVICE_ATTR_RW(fme_errors);
@@ -338,6 +360,24 @@ static void fme_err_mask(struct device *dev, bool mask)
 static int fme_global_err_init(struct platform_device *pdev,
 			       struct dfl_feature *feature)
 {
+	static struct err_reg fme_reg = {
+		.name = "FME",
+		.err_offset = FME_ERROR,
+		.mask_offset = FME_ERROR_MASK,
+		.mask_value = 0ULL
+	};
+	void __iomem *base;
+
+	/* Workaround: disable MBP_ERROR if feature revision is 0 */
+	base = dfl_get_feature_ioaddr_by_id(&pdev->dev,
+					    FME_FEATURE_ID_GLOBAL_ERR);
+	if (!dfl_feature_revision(base))
+		fme_reg.mask_value = MBP_ERROR;
+
+	(void)fme_err_clear(&pdev->dev, &pcie0_reg, 0ULL, true);
+	(void)fme_err_clear(&pdev->dev, &pcie1_reg, 0ULL, true);
+	(void)fme_err_clear(&pdev->dev, &fme_reg, 0ULL, true);
+
 	fme_err_mask(&pdev->dev, false);
 
 	return 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init
  2021-10-19 23:15 ` [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init Russ Weight
@ 2021-10-20  4:15   ` Xu Yilun
  2021-10-20 12:18   ` Tom Rix
  1 sibling, 0 replies; 10+ messages in thread
From: Xu Yilun @ 2021-10-20  4:15 UTC (permalink / raw)
  To: Russ Weight
  Cc: mdf, linux-fpga, linux-kernel, trix, lgoncalv, hao.wu, matthew.gerlach

On Tue, Oct 19, 2021 at 04:15:44PM -0700, Russ Weight wrote:
> When the AFU driver initializes, log any pre-existing errors and clear them.

Please wrapper the commit message to pass checkpatch.pl

> 
> Signed-off-by: Russ Weight <russell.h.weight@intel.com>
> ---
>  drivers/fpga/dfl-afu-error.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
> index ab7be6217368..0dc60bf49902 100644
> --- a/drivers/fpga/dfl-afu-error.c
> +++ b/drivers/fpga/dfl-afu-error.c
> @@ -47,13 +47,13 @@ static void afu_port_err_mask(struct device *dev, bool mask)
>  }
>  
>  /* clear port errors. */
> -static int afu_port_err_clear(struct device *dev, u64 err)
> +static int afu_port_err_clear(struct device *dev, u64 err, bool clear_all)

how about "clear_all_on_init"?

>  {
>  	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
>  	struct platform_device *pdev = to_platform_device(dev);
> +	u64 v, port_error, port_first_error;
>  	void __iomem *base_err, *base_hdr;
>  	int enable_ret = 0, ret = -EBUSY;
> -	u64 v;
>  
>  	base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
>  	base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
> @@ -88,16 +88,21 @@ static int afu_port_err_clear(struct device *dev, u64 err)
>  	__afu_port_err_mask(dev, true);
>  
>  	/* Clear errors if err input matches with current port errors.*/
> -	v = readq(base_err + PORT_ERROR);
> +	port_error = readq(base_err + PORT_ERROR);
>  
> -	if (v == err) {
> -		writeq(v, base_err + PORT_ERROR);
> +	if (clear_all || port_error == err) {
> +		port_first_error = readq(base_err + PORT_FIRST_ERROR);
>  
> -		v = readq(base_err + PORT_FIRST_ERROR);
> -		writeq(v, base_err + PORT_FIRST_ERROR);
> +		if (clear_all && (port_error || port_first_error))
> +			dev_warn(dev,
> +				 "Port Error: 0x%llx, First Error 0x%llx\n",
> +				 port_error, port_first_error);

The log could be more specific that we are reporting errors on device
initialization.

Thanks,
Yilun

> +
> +		writeq(port_error, base_err + PORT_ERROR);
> +		writeq(port_first_error, base_err + PORT_FIRST_ERROR);
>  	} else {
>  		dev_warn(dev, "%s: received 0x%llx, expected 0x%llx\n",
> -			 __func__, v, err);
> +			 __func__, port_error, err);
>  		ret = -EINVAL;
>  	}
>  
> @@ -137,7 +142,7 @@ static ssize_t errors_store(struct device *dev, struct device_attribute *attr,
>  	if (kstrtou64(buff, 0, &value))
>  		return -EINVAL;
>  
> -	ret = afu_port_err_clear(dev, value);
> +	ret = afu_port_err_clear(dev, value, false);
>  
>  	return ret ? ret : count;
>  }
> @@ -211,7 +216,8 @@ const struct attribute_group port_err_group = {
>  static int port_err_init(struct platform_device *pdev,
>  			 struct dfl_feature *feature)
>  {
> -	afu_port_err_mask(&pdev->dev, false);
> +	if (afu_port_err_clear(&pdev->dev, 0, true))
> +		afu_port_err_mask(&pdev->dev, false);
>  
>  	return 0;
>  }
> -- 
> 2.25.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init
  2021-10-19 23:15 [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init Russ Weight
  2021-10-19 23:15 ` [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init Russ Weight
  2021-10-19 23:15 ` [PATCH v1 2/2] fpga: dfl: fme: Clear fme global errors at driver init Russ Weight
@ 2021-10-20  4:44 ` Wu, Hao
  2021-10-21  0:05   ` Russ Weight
  2 siblings, 1 reply; 10+ messages in thread
From: Wu, Hao @ 2021-10-20  4:44 UTC (permalink / raw)
  To: Weight, Russell H, mdf, linux-fpga, linux-kernel
  Cc: trix, lgoncalv, Xu, Yilun, Gerlach, Matthew

> Subject: [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init
> 
> These patches address a request to log and clear any prexisting errors on
> FPGA cards when the drivers load. Any existing errors will result in print
> statements to the kernel error log before the errors are cleared. These
> changes specifically affect the fme and port error registers.

Could you please explain more about why we need this change? 
As we have user interface to log and clear errors already, is it a better choice
to let userspace log and clear them during AFU initialization?

Hao

> 
> Russ Weight (2):
>   fpga: dfl: afu: Clear port errors in afu init
>   fpga: dfl: fme: Clear fme global errors at driver init
> 
>  drivers/fpga/dfl-afu-error.c |  26 ++++---
>  drivers/fpga/dfl-fme-error.c | 128 +++++++++++++++++++++++------------
>  2 files changed, 100 insertions(+), 54 deletions(-)
> 
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init
  2021-10-19 23:15 ` [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init Russ Weight
  2021-10-20  4:15   ` Xu Yilun
@ 2021-10-20 12:18   ` Tom Rix
  2021-10-20 16:02     ` Russ Weight
  1 sibling, 1 reply; 10+ messages in thread
From: Tom Rix @ 2021-10-20 12:18 UTC (permalink / raw)
  To: Russ Weight, mdf, linux-fpga, linux-kernel
  Cc: lgoncalv, yilun.xu, hao.wu, matthew.gerlach


On 10/19/21 4:15 PM, Russ Weight wrote:
> When the AFU driver initializes, log any pre-existing errors and clear them.
>
> Signed-off-by: Russ Weight <russell.h.weight@intel.com>
> ---
>   drivers/fpga/dfl-afu-error.c | 26 ++++++++++++++++----------
>   1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
> index ab7be6217368..0dc60bf49902 100644
> --- a/drivers/fpga/dfl-afu-error.c
> +++ b/drivers/fpga/dfl-afu-error.c
> @@ -47,13 +47,13 @@ static void afu_port_err_mask(struct device *dev, bool mask)
>   }
>   
>   /* clear port errors. */
> -static int afu_port_err_clear(struct device *dev, u64 err)
> +static int afu_port_err_clear(struct device *dev, u64 err, bool clear_all)
>   {
>   	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
>   	struct platform_device *pdev = to_platform_device(dev);
> +	u64 v, port_error, port_first_error;
v is only used now by the read of PORT_HDR_STS, could v be changed to a 
more descriptive variable like hdr_sts ?
>   	void __iomem *base_err, *base_hdr;
>   	int enable_ret = 0, ret = -EBUSY;
> -	u64 v;
>   
>   	base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
>   	base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
> @@ -88,16 +88,21 @@ static int afu_port_err_clear(struct device *dev, u64 err)
>   	__afu_port_err_mask(dev, true);
>   
>   	/* Clear errors if err input matches with current port errors.*/
> -	v = readq(base_err + PORT_ERROR);
> +	port_error = readq(base_err + PORT_ERROR);
>   
> -	if (v == err) {
> -		writeq(v, base_err + PORT_ERROR);
> +	if (clear_all || port_error == err) {
> +		port_first_error = readq(base_err + PORT_FIRST_ERROR);
>   
> -		v = readq(base_err + PORT_FIRST_ERROR);
> -		writeq(v, base_err + PORT_FIRST_ERROR);
> +		if (clear_all && (port_error || port_first_error))

likely with clear_all that this dev_warn will over report.

how about removing clear_all && from if-check ?

> +			dev_warn(dev,
> +				 "Port Error: 0x%llx, First Error 0x%llx\n",
> +				 port_error, port_first_error);
> +
> +		writeq(port_error, base_err + PORT_ERROR);
> +		writeq(port_first_error, base_err + PORT_FIRST_ERROR);
>   	} else {
>   		dev_warn(dev, "%s: received 0x%llx, expected 0x%llx\n",
> -			 __func__, v, err);
> +			 __func__, port_error, err);
>   		ret = -EINVAL;
>   	}
>   
> @@ -137,7 +142,7 @@ static ssize_t errors_store(struct device *dev, struct device_attribute *attr,
>   	if (kstrtou64(buff, 0, &value))
>   		return -EINVAL;
>   
> -	ret = afu_port_err_clear(dev, value);
> +	ret = afu_port_err_clear(dev, value, false);
>   
>   	return ret ? ret : count;
>   }
> @@ -211,7 +216,8 @@ const struct attribute_group port_err_group = {
>   static int port_err_init(struct platform_device *pdev,
>   			 struct dfl_feature *feature)
>   {
> -	afu_port_err_mask(&pdev->dev, false);
> +	if (afu_port_err_clear(&pdev->dev, 0, true))
> +		afu_port_err_mask(&pdev->dev, false);

There is a __afu_port_err_mask at the end of afu_port_err_clear so this 
call isn't needed.

Tom

>   
>   	return 0;
>   }


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 2/2] fpga: dfl: fme: Clear fme global errors at driver init
  2021-10-19 23:15 ` [PATCH v1 2/2] fpga: dfl: fme: Clear fme global errors at driver init Russ Weight
@ 2021-10-20 12:59   ` Tom Rix
  0 siblings, 0 replies; 10+ messages in thread
From: Tom Rix @ 2021-10-20 12:59 UTC (permalink / raw)
  To: Russ Weight, mdf, linux-fpga, linux-kernel
  Cc: lgoncalv, yilun.xu, hao.wu, matthew.gerlach


On 10/19/21 4:15 PM, Russ Weight wrote:
> When the fme-error driver initializes, log any pre-existing errors and
> clear them. To avoid code replication, common code is gathered into
> a common fme_err_clear() function and a structure (err_reg) is created
> to describe each of the error registers, the corresponding mask
> registers, and the default mask for each register.
>
> Signed-off-by: Russ Weight <russell.h.weight@intel.com>
> ---
>   drivers/fpga/dfl-fme-error.c | 128 +++++++++++++++++++++++------------
>   1 file changed, 84 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/fpga/dfl-fme-error.c b/drivers/fpga/dfl-fme-error.c
> index 51c2892ec06d..44da7b30c469 100644
> --- a/drivers/fpga/dfl-fme-error.c
> +++ b/drivers/fpga/dfl-fme-error.c
> @@ -39,6 +39,27 @@
>   
>   #define ERROR_MASK		GENMASK_ULL(63, 0)
>   
> +struct err_reg {
> +	char *name;
> +	u32 err_offset;
> +	u32 mask_offset;
> +	u32 mask_value;
> +};
> +
> +static struct err_reg pcie0_reg = {
> +	.name = "PCIE0",
> +	.err_offset = PCIE0_ERROR,
> +	.mask_offset = PCIE0_ERROR_MASK,
> +	.mask_value = 0ULL
> +};
> +
> +static struct err_reg pcie1_reg = {
> +	.name = "PCIE1",
> +	.err_offset = PCIE1_ERROR,
> +	.mask_offset = PCIE1_ERROR_MASK,
> +	.mask_value = 0ULL
> +};
> +
Can these be stored in an array ?
>   static ssize_t pcie0_errors_show(struct device *dev,
>   				 struct device_attribute *attr, char *buf)
>   {
> @@ -55,31 +76,48 @@ static ssize_t pcie0_errors_show(struct device *dev,
>   	return sprintf(buf, "0x%llx\n", (unsigned long long)value);
>   }
>   
> -static ssize_t pcie0_errors_store(struct device *dev,
> -				  struct device_attribute *attr,
> -				  const char *buf, size_t count)
> +static int fme_err_clear(struct device *dev, struct err_reg *reg,
> +			 u64 err, bool clear_all)
>   {
>   	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
>   	void __iomem *base;
>   	int ret = 0;
> -	u64 v, val;
> -
> -	if (kstrtou64(buf, 0, &val))
> -		return -EINVAL;
> +	u64 v;
>   
>   	base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR);
>   
>   	mutex_lock(&pdata->lock);
> -	writeq(GENMASK_ULL(63, 0), base + PCIE0_ERROR_MASK);
> +	writeq(GENMASK_ULL(63, 0), base + reg->mask_offset);
>   
> -	v = readq(base + PCIE0_ERROR);
> -	if (val == v)
> -		writeq(v, base + PCIE0_ERROR);
> -	else
> +	v = readq(base + reg->err_offset);
> +	if (clear_all || err == v) {
> +		if (clear_all && v)
clear_all will over report.
> +			dev_warn(dev, "%s: %s Errors: 0x%llx\n",
> +				 __func__, reg->name, v);
> +
> +		writeq(v, base + reg->err_offset);
> +	} else {
>   		ret = -EINVAL;
> +	}
>   
> -	writeq(0ULL, base + PCIE0_ERROR_MASK);
> +	writeq(reg->mask_value, base + reg->mask_offset);
>   	mutex_unlock(&pdata->lock);
> +
> +	return ret;
> +}
> +
> +static ssize_t pcie0_errors_store(struct device *dev,
> +				  struct device_attribute *attr,
> +				  const char *buf, size_t count)
> +{
> +	u64 val;
> +	int ret;
> +
> +	if (kstrtou64(buf, 0, &val))
> +		return -EINVAL;
> +
> +	ret = fme_err_clear(dev, &pcie0_reg, val, false);
> +
>   	return ret ? ret : count;
>   }
>   static DEVICE_ATTR_RW(pcie0_errors);
> @@ -104,27 +142,14 @@ static ssize_t pcie1_errors_store(struct device *dev,
>   				  struct device_attribute *attr,
>   				  const char *buf, size_t count)
>   {
This looks like a copy of pcie0_errors_store can these functions be 
consolidated ?
> -	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> -	void __iomem *base;
> -	int ret = 0;
> -	u64 v, val;
> +	u64 val;
> +	int ret;
>   
>   	if (kstrtou64(buf, 0, &val))
>   		return -EINVAL;
>   
> -	base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR);
> +	ret = fme_err_clear(dev, &pcie1_reg, val, false);
>   
> -	mutex_lock(&pdata->lock);
> -	writeq(GENMASK_ULL(63, 0), base + PCIE1_ERROR_MASK);
> -
> -	v = readq(base + PCIE1_ERROR);
> -	if (val == v)
> -		writeq(v, base + PCIE1_ERROR);
> -	else
> -		ret = -EINVAL;
> -
> -	writeq(0ULL, base + PCIE1_ERROR_MASK);
> -	mutex_unlock(&pdata->lock);
>   	return ret ? ret : count;
>   }
>   static DEVICE_ATTR_RW(pcie1_errors);
> @@ -218,29 +243,26 @@ static ssize_t fme_errors_store(struct device *dev,
>   				struct device_attribute *attr,
>   				const char *buf, size_t count)
>   {
> -	struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> +	static struct err_reg fme_reg = {
> +		.name = "FME",
> +		.err_offset = FME_ERROR,
> +		.mask_offset = FME_ERROR_MASK,
> +		.mask_value = 0ULL
> +	};
>   	void __iomem *base;
> -	u64 v, val;
> -	int ret = 0;
> +	u64 val;
> +	int ret;
>   
>   	if (kstrtou64(buf, 0, &val))
>   		return -EINVAL;
>   
> +	/* Workaround: disable MBP_ERROR if feature revision is 0 */
>   	base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR);
> +	if (!dfl_feature_revision(base))
> +		fme_reg.mask_value = MBP_ERROR;
>   
> -	mutex_lock(&pdata->lock);
> -	writeq(GENMASK_ULL(63, 0), base + FME_ERROR_MASK);
> +	ret = fme_err_clear(dev, &fme_reg, val, false);
>   
> -	v = readq(base + FME_ERROR);
> -	if (val == v)
> -		writeq(v, base + FME_ERROR);
> -	else
> -		ret = -EINVAL;
> -
> -	/* Workaround: disable MBP_ERROR if feature revision is 0 */
> -	writeq(dfl_feature_revision(base) ? 0ULL : MBP_ERROR,
> -	       base + FME_ERROR_MASK);
> -	mutex_unlock(&pdata->lock);
>   	return ret ? ret : count;
>   }
>   static DEVICE_ATTR_RW(fme_errors);
> @@ -338,6 +360,24 @@ static void fme_err_mask(struct device *dev, bool mask)
>   static int fme_global_err_init(struct platform_device *pdev,
>   			       struct dfl_feature *feature)
>   {
> +	static struct err_reg fme_reg = {
> +		.name = "FME",
> +		.err_offset = FME_ERROR,
> +		.mask_offset = FME_ERROR_MASK,
> +		.mask_value = 0ULL
> +	};
> +	void __iomem *base;
> +
> +	/* Workaround: disable MBP_ERROR if feature revision is 0 */
> +	base = dfl_get_feature_ioaddr_by_id(&pdev->dev,
> +					    FME_FEATURE_ID_GLOBAL_ERR);
> +	if (!dfl_feature_revision(base))
> +		fme_reg.mask_value = MBP_ERROR;

A similar block above to set fme_reg.mask_value.

These should be consolidated.

> +
> +	(void)fme_err_clear(&pdev->dev, &pcie0_reg, 0ULL, true);

remove casting return to (void)

Tom

> +	(void)fme_err_clear(&pdev->dev, &pcie1_reg, 0ULL, true);
> +	(void)fme_err_clear(&pdev->dev, &fme_reg, 0ULL, true);
> +
>   	fme_err_mask(&pdev->dev, false);
>   
>   	return 0;


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init
  2021-10-20 12:18   ` Tom Rix
@ 2021-10-20 16:02     ` Russ Weight
  0 siblings, 0 replies; 10+ messages in thread
From: Russ Weight @ 2021-10-20 16:02 UTC (permalink / raw)
  To: Tom Rix, mdf, linux-fpga, linux-kernel
  Cc: lgoncalv, yilun.xu, hao.wu, matthew.gerlach



On 10/20/21 5:18 AM, Tom Rix wrote:
>
> On 10/19/21 4:15 PM, Russ Weight wrote:
>> When the AFU driver initializes, log any pre-existing errors and clear them.
>>
>> Signed-off-by: Russ Weight <russell.h.weight@intel.com>
>> ---
>>   drivers/fpga/dfl-afu-error.c | 26 ++++++++++++++++----------
>>   1 file changed, 16 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
>> index ab7be6217368..0dc60bf49902 100644
>> --- a/drivers/fpga/dfl-afu-error.c
>> +++ b/drivers/fpga/dfl-afu-error.c
>> @@ -47,13 +47,13 @@ static void afu_port_err_mask(struct device *dev, bool mask)
>>   }
>>     /* clear port errors. */
>> -static int afu_port_err_clear(struct device *dev, u64 err)
>> +static int afu_port_err_clear(struct device *dev, u64 err, bool clear_all)
>>   {
>>       struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
>>       struct platform_device *pdev = to_platform_device(dev);
>> +    u64 v, port_error, port_first_error;
> v is only used now by the read of PORT_HDR_STS, could v be changed to a more descriptive variable like hdr_sts ?
>>       void __iomem *base_err, *base_hdr;
>>       int enable_ret = 0, ret = -EBUSY;
>> -    u64 v;
>>         base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
>>       base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
>> @@ -88,16 +88,21 @@ static int afu_port_err_clear(struct device *dev, u64 err)
>>       __afu_port_err_mask(dev, true);
>>         /* Clear errors if err input matches with current port errors.*/
>> -    v = readq(base_err + PORT_ERROR);
>> +    port_error = readq(base_err + PORT_ERROR);
>>   -    if (v == err) {
>> -        writeq(v, base_err + PORT_ERROR);
>> +    if (clear_all || port_error == err) {
>> +        port_first_error = readq(base_err + PORT_FIRST_ERROR);
>>   -        v = readq(base_err + PORT_FIRST_ERROR);
>> -        writeq(v, base_err + PORT_FIRST_ERROR);
>> +        if (clear_all && (port_error || port_first_error))
>
> likely with clear_all that this dev_warn will over report.
>
> how about removing clear_all && from if-check ?
I think that would make it report more often? clear_all is only set
at the time that the driver initializes. With the current condition,
errors will only be logged when the driver loads, if and only if
there are actually errors to report.


>
>> +            dev_warn(dev,
>> +                 "Port Error: 0x%llx, First Error 0x%llx\n",
>> +                 port_error, port_first_error);
>> +
>> +        writeq(port_error, base_err + PORT_ERROR);
>> +        writeq(port_first_error, base_err + PORT_FIRST_ERROR);
>>       } else {
>>           dev_warn(dev, "%s: received 0x%llx, expected 0x%llx\n",
>> -             __func__, v, err);
>> +             __func__, port_error, err);
>>           ret = -EINVAL;
>>       }
>>   @@ -137,7 +142,7 @@ static ssize_t errors_store(struct device *dev, struct device_attribute *attr,
>>       if (kstrtou64(buff, 0, &value))
>>           return -EINVAL;
>>   -    ret = afu_port_err_clear(dev, value);
>> +    ret = afu_port_err_clear(dev, value, false);
>>         return ret ? ret : count;
>>   }
>> @@ -211,7 +216,8 @@ const struct attribute_group port_err_group = {
>>   static int port_err_init(struct platform_device *pdev,
>>                struct dfl_feature *feature)
>>   {
>> -    afu_port_err_mask(&pdev->dev, false);
>> +    if (afu_port_err_clear(&pdev->dev, 0, true))
>> +        afu_port_err_mask(&pdev->dev, false);
>
> There is a __afu_port_err_mask at the end of afu_port_err_clear so this call isn't needed.

The condition statements says to only clear the mask explicitly if afu_port_err_clear()
fails. If it succeeds, then the explicit call will not happen, because as you have
pointed out, afu_port_err_clear() clear the mask.

Thanks,
- Russ

>
> Tom
>
>>         return 0;
>>   }
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init
  2021-10-20  4:44 ` [PATCH v1 0/2] fpga: dfl: Log and clear errors on " Wu, Hao
@ 2021-10-21  0:05   ` Russ Weight
  2021-10-21  0:43     ` Wu, Hao
  0 siblings, 1 reply; 10+ messages in thread
From: Russ Weight @ 2021-10-21  0:05 UTC (permalink / raw)
  To: Wu, Hao, mdf, linux-fpga, linux-kernel
  Cc: trix, lgoncalv, Xu, Yilun, Gerlach, Matthew



On 10/19/21 9:44 PM, Wu, Hao wrote:
>> Subject: [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init
>>
>> These patches address a request to log and clear any prexisting errors on
>> FPGA cards when the drivers load. Any existing errors will result in print
>> statements to the kernel error log before the errors are cleared. These
>> changes specifically affect the fme and port error registers.
> Could you please explain more about why we need this change? 
> As we have user interface to log and clear errors already, is it a better choice
> to let userspace log and clear them during AFU initialization?
In the new architecture we are offering more flexibility to customers
for adding functions. With these designs it becomes nearly impossible
to design the AFU interface handler to recover from errors and resume
operation afterwards. The proposed solution is to flag the source of
the error and then capture it in sticky registers so that they can be
read out from SW in the event of a crash/warm boot. To ensure that we
capture these errors, the proposal is to log them in the kernel log and
clear them at driver initialization.

- Russ

> Hao
>
>> Russ Weight (2):
>>   fpga: dfl: afu: Clear port errors in afu init
>>   fpga: dfl: fme: Clear fme global errors at driver init
>>
>>  drivers/fpga/dfl-afu-error.c |  26 ++++---
>>  drivers/fpga/dfl-fme-error.c | 128 +++++++++++++++++++++++------------
>>  2 files changed, 100 insertions(+), 54 deletions(-)
>>
>> --
>> 2.25.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init
  2021-10-21  0:05   ` Russ Weight
@ 2021-10-21  0:43     ` Wu, Hao
  0 siblings, 0 replies; 10+ messages in thread
From: Wu, Hao @ 2021-10-21  0:43 UTC (permalink / raw)
  To: Weight, Russell H, mdf, linux-fpga, linux-kernel
  Cc: trix, lgoncalv, Xu, Yilun, Gerlach, Matthew

> On 10/19/21 9:44 PM, Wu, Hao wrote:
> >> Subject: [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init
> >>
> >> These patches address a request to log and clear any prexisting errors on
> >> FPGA cards when the drivers load. Any existing errors will result in print
> >> statements to the kernel error log before the errors are cleared. These
> >> changes specifically affect the fme and port error registers.
> > Could you please explain more about why we need this change?
> > As we have user interface to log and clear errors already, is it a better choice
> > to let userspace log and clear them during AFU initialization?
> In the new architecture we are offering more flexibility to customers
> for adding functions. With these designs it becomes nearly impossible
> to design the AFU interface handler to recover from errors and resume
> operation afterwards. The proposed solution is to flag the source of
> the error and then capture it in sticky registers so that they can be
> read out from SW in the event of a crash/warm boot. To ensure that we
> capture these errors, the proposal is to log them in the kernel log and
> clear them at driver initialization.

The error can be logged and cleared at user space driver initialization,
as current usage model is let userspace handle everything including error.

Hao

> 
> - Russ
> 
> > Hao
> >
> >> Russ Weight (2):
> >>   fpga: dfl: afu: Clear port errors in afu init
> >>   fpga: dfl: fme: Clear fme global errors at driver init
> >>
> >>  drivers/fpga/dfl-afu-error.c |  26 ++++---
> >>  drivers/fpga/dfl-fme-error.c | 128 +++++++++++++++++++++++------------
> >>  2 files changed, 100 insertions(+), 54 deletions(-)
> >>
> >> --
> >> 2.25.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-10-21  0:43 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-19 23:15 [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init Russ Weight
2021-10-19 23:15 ` [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init Russ Weight
2021-10-20  4:15   ` Xu Yilun
2021-10-20 12:18   ` Tom Rix
2021-10-20 16:02     ` Russ Weight
2021-10-19 23:15 ` [PATCH v1 2/2] fpga: dfl: fme: Clear fme global errors at driver init Russ Weight
2021-10-20 12:59   ` Tom Rix
2021-10-20  4:44 ` [PATCH v1 0/2] fpga: dfl: Log and clear errors on " Wu, Hao
2021-10-21  0:05   ` Russ Weight
2021-10-21  0:43     ` Wu, Hao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).