* [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init @ 2021-10-19 23:15 Russ Weight 2021-10-19 23:15 ` [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init Russ Weight ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Russ Weight @ 2021-10-19 23:15 UTC (permalink / raw) To: mdf, linux-fpga, linux-kernel Cc: trix, lgoncalv, yilun.xu, hao.wu, matthew.gerlach, Russ Weight These patches address a request to log and clear any prexisting errors on FPGA cards when the drivers load. Any existing errors will result in print statements to the kernel error log before the errors are cleared. These changes specifically affect the fme and port error registers. Russ Weight (2): fpga: dfl: afu: Clear port errors in afu init fpga: dfl: fme: Clear fme global errors at driver init drivers/fpga/dfl-afu-error.c | 26 ++++--- drivers/fpga/dfl-fme-error.c | 128 +++++++++++++++++++++++------------ 2 files changed, 100 insertions(+), 54 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init 2021-10-19 23:15 [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init Russ Weight @ 2021-10-19 23:15 ` Russ Weight 2021-10-20 4:15 ` Xu Yilun 2021-10-20 12:18 ` Tom Rix 2021-10-19 23:15 ` [PATCH v1 2/2] fpga: dfl: fme: Clear fme global errors at driver init Russ Weight 2021-10-20 4:44 ` [PATCH v1 0/2] fpga: dfl: Log and clear errors on " Wu, Hao 2 siblings, 2 replies; 10+ messages in thread From: Russ Weight @ 2021-10-19 23:15 UTC (permalink / raw) To: mdf, linux-fpga, linux-kernel Cc: trix, lgoncalv, yilun.xu, hao.wu, matthew.gerlach, Russ Weight When the AFU driver initializes, log any pre-existing errors and clear them. Signed-off-by: Russ Weight <russell.h.weight@intel.com> --- drivers/fpga/dfl-afu-error.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c index ab7be6217368..0dc60bf49902 100644 --- a/drivers/fpga/dfl-afu-error.c +++ b/drivers/fpga/dfl-afu-error.c @@ -47,13 +47,13 @@ static void afu_port_err_mask(struct device *dev, bool mask) } /* clear port errors. */ -static int afu_port_err_clear(struct device *dev, u64 err) +static int afu_port_err_clear(struct device *dev, u64 err, bool clear_all) { struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); struct platform_device *pdev = to_platform_device(dev); + u64 v, port_error, port_first_error; void __iomem *base_err, *base_hdr; int enable_ret = 0, ret = -EBUSY; - u64 v; base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR); base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); @@ -88,16 +88,21 @@ static int afu_port_err_clear(struct device *dev, u64 err) __afu_port_err_mask(dev, true); /* Clear errors if err input matches with current port errors.*/ - v = readq(base_err + PORT_ERROR); + port_error = readq(base_err + PORT_ERROR); - if (v == err) { - writeq(v, base_err + PORT_ERROR); + if (clear_all || port_error == err) { + port_first_error = readq(base_err + PORT_FIRST_ERROR); - v = readq(base_err + PORT_FIRST_ERROR); - writeq(v, base_err + PORT_FIRST_ERROR); + if (clear_all && (port_error || port_first_error)) + dev_warn(dev, + "Port Error: 0x%llx, First Error 0x%llx\n", + port_error, port_first_error); + + writeq(port_error, base_err + PORT_ERROR); + writeq(port_first_error, base_err + PORT_FIRST_ERROR); } else { dev_warn(dev, "%s: received 0x%llx, expected 0x%llx\n", - __func__, v, err); + __func__, port_error, err); ret = -EINVAL; } @@ -137,7 +142,7 @@ static ssize_t errors_store(struct device *dev, struct device_attribute *attr, if (kstrtou64(buff, 0, &value)) return -EINVAL; - ret = afu_port_err_clear(dev, value); + ret = afu_port_err_clear(dev, value, false); return ret ? ret : count; } @@ -211,7 +216,8 @@ const struct attribute_group port_err_group = { static int port_err_init(struct platform_device *pdev, struct dfl_feature *feature) { - afu_port_err_mask(&pdev->dev, false); + if (afu_port_err_clear(&pdev->dev, 0, true)) + afu_port_err_mask(&pdev->dev, false); return 0; } -- 2.25.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init 2021-10-19 23:15 ` [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init Russ Weight @ 2021-10-20 4:15 ` Xu Yilun 2021-10-20 12:18 ` Tom Rix 1 sibling, 0 replies; 10+ messages in thread From: Xu Yilun @ 2021-10-20 4:15 UTC (permalink / raw) To: Russ Weight Cc: mdf, linux-fpga, linux-kernel, trix, lgoncalv, hao.wu, matthew.gerlach On Tue, Oct 19, 2021 at 04:15:44PM -0700, Russ Weight wrote: > When the AFU driver initializes, log any pre-existing errors and clear them. Please wrapper the commit message to pass checkpatch.pl > > Signed-off-by: Russ Weight <russell.h.weight@intel.com> > --- > drivers/fpga/dfl-afu-error.c | 26 ++++++++++++++++---------- > 1 file changed, 16 insertions(+), 10 deletions(-) > > diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c > index ab7be6217368..0dc60bf49902 100644 > --- a/drivers/fpga/dfl-afu-error.c > +++ b/drivers/fpga/dfl-afu-error.c > @@ -47,13 +47,13 @@ static void afu_port_err_mask(struct device *dev, bool mask) > } > > /* clear port errors. */ > -static int afu_port_err_clear(struct device *dev, u64 err) > +static int afu_port_err_clear(struct device *dev, u64 err, bool clear_all) how about "clear_all_on_init"? > { > struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); > struct platform_device *pdev = to_platform_device(dev); > + u64 v, port_error, port_first_error; > void __iomem *base_err, *base_hdr; > int enable_ret = 0, ret = -EBUSY; > - u64 v; > > base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR); > base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); > @@ -88,16 +88,21 @@ static int afu_port_err_clear(struct device *dev, u64 err) > __afu_port_err_mask(dev, true); > > /* Clear errors if err input matches with current port errors.*/ > - v = readq(base_err + PORT_ERROR); > + port_error = readq(base_err + PORT_ERROR); > > - if (v == err) { > - writeq(v, base_err + PORT_ERROR); > + if (clear_all || port_error == err) { > + port_first_error = readq(base_err + PORT_FIRST_ERROR); > > - v = readq(base_err + PORT_FIRST_ERROR); > - writeq(v, base_err + PORT_FIRST_ERROR); > + if (clear_all && (port_error || port_first_error)) > + dev_warn(dev, > + "Port Error: 0x%llx, First Error 0x%llx\n", > + port_error, port_first_error); The log could be more specific that we are reporting errors on device initialization. Thanks, Yilun > + > + writeq(port_error, base_err + PORT_ERROR); > + writeq(port_first_error, base_err + PORT_FIRST_ERROR); > } else { > dev_warn(dev, "%s: received 0x%llx, expected 0x%llx\n", > - __func__, v, err); > + __func__, port_error, err); > ret = -EINVAL; > } > > @@ -137,7 +142,7 @@ static ssize_t errors_store(struct device *dev, struct device_attribute *attr, > if (kstrtou64(buff, 0, &value)) > return -EINVAL; > > - ret = afu_port_err_clear(dev, value); > + ret = afu_port_err_clear(dev, value, false); > > return ret ? ret : count; > } > @@ -211,7 +216,8 @@ const struct attribute_group port_err_group = { > static int port_err_init(struct platform_device *pdev, > struct dfl_feature *feature) > { > - afu_port_err_mask(&pdev->dev, false); > + if (afu_port_err_clear(&pdev->dev, 0, true)) > + afu_port_err_mask(&pdev->dev, false); > > return 0; > } > -- > 2.25.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init 2021-10-19 23:15 ` [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init Russ Weight 2021-10-20 4:15 ` Xu Yilun @ 2021-10-20 12:18 ` Tom Rix 2021-10-20 16:02 ` Russ Weight 1 sibling, 1 reply; 10+ messages in thread From: Tom Rix @ 2021-10-20 12:18 UTC (permalink / raw) To: Russ Weight, mdf, linux-fpga, linux-kernel Cc: lgoncalv, yilun.xu, hao.wu, matthew.gerlach On 10/19/21 4:15 PM, Russ Weight wrote: > When the AFU driver initializes, log any pre-existing errors and clear them. > > Signed-off-by: Russ Weight <russell.h.weight@intel.com> > --- > drivers/fpga/dfl-afu-error.c | 26 ++++++++++++++++---------- > 1 file changed, 16 insertions(+), 10 deletions(-) > > diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c > index ab7be6217368..0dc60bf49902 100644 > --- a/drivers/fpga/dfl-afu-error.c > +++ b/drivers/fpga/dfl-afu-error.c > @@ -47,13 +47,13 @@ static void afu_port_err_mask(struct device *dev, bool mask) > } > > /* clear port errors. */ > -static int afu_port_err_clear(struct device *dev, u64 err) > +static int afu_port_err_clear(struct device *dev, u64 err, bool clear_all) > { > struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); > struct platform_device *pdev = to_platform_device(dev); > + u64 v, port_error, port_first_error; v is only used now by the read of PORT_HDR_STS, could v be changed to a more descriptive variable like hdr_sts ? > void __iomem *base_err, *base_hdr; > int enable_ret = 0, ret = -EBUSY; > - u64 v; > > base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR); > base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); > @@ -88,16 +88,21 @@ static int afu_port_err_clear(struct device *dev, u64 err) > __afu_port_err_mask(dev, true); > > /* Clear errors if err input matches with current port errors.*/ > - v = readq(base_err + PORT_ERROR); > + port_error = readq(base_err + PORT_ERROR); > > - if (v == err) { > - writeq(v, base_err + PORT_ERROR); > + if (clear_all || port_error == err) { > + port_first_error = readq(base_err + PORT_FIRST_ERROR); > > - v = readq(base_err + PORT_FIRST_ERROR); > - writeq(v, base_err + PORT_FIRST_ERROR); > + if (clear_all && (port_error || port_first_error)) likely with clear_all that this dev_warn will over report. how about removing clear_all && from if-check ? > + dev_warn(dev, > + "Port Error: 0x%llx, First Error 0x%llx\n", > + port_error, port_first_error); > + > + writeq(port_error, base_err + PORT_ERROR); > + writeq(port_first_error, base_err + PORT_FIRST_ERROR); > } else { > dev_warn(dev, "%s: received 0x%llx, expected 0x%llx\n", > - __func__, v, err); > + __func__, port_error, err); > ret = -EINVAL; > } > > @@ -137,7 +142,7 @@ static ssize_t errors_store(struct device *dev, struct device_attribute *attr, > if (kstrtou64(buff, 0, &value)) > return -EINVAL; > > - ret = afu_port_err_clear(dev, value); > + ret = afu_port_err_clear(dev, value, false); > > return ret ? ret : count; > } > @@ -211,7 +216,8 @@ const struct attribute_group port_err_group = { > static int port_err_init(struct platform_device *pdev, > struct dfl_feature *feature) > { > - afu_port_err_mask(&pdev->dev, false); > + if (afu_port_err_clear(&pdev->dev, 0, true)) > + afu_port_err_mask(&pdev->dev, false); There is a __afu_port_err_mask at the end of afu_port_err_clear so this call isn't needed. Tom > > return 0; > } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init 2021-10-20 12:18 ` Tom Rix @ 2021-10-20 16:02 ` Russ Weight 0 siblings, 0 replies; 10+ messages in thread From: Russ Weight @ 2021-10-20 16:02 UTC (permalink / raw) To: Tom Rix, mdf, linux-fpga, linux-kernel Cc: lgoncalv, yilun.xu, hao.wu, matthew.gerlach On 10/20/21 5:18 AM, Tom Rix wrote: > > On 10/19/21 4:15 PM, Russ Weight wrote: >> When the AFU driver initializes, log any pre-existing errors and clear them. >> >> Signed-off-by: Russ Weight <russell.h.weight@intel.com> >> --- >> drivers/fpga/dfl-afu-error.c | 26 ++++++++++++++++---------- >> 1 file changed, 16 insertions(+), 10 deletions(-) >> >> diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c >> index ab7be6217368..0dc60bf49902 100644 >> --- a/drivers/fpga/dfl-afu-error.c >> +++ b/drivers/fpga/dfl-afu-error.c >> @@ -47,13 +47,13 @@ static void afu_port_err_mask(struct device *dev, bool mask) >> } >> /* clear port errors. */ >> -static int afu_port_err_clear(struct device *dev, u64 err) >> +static int afu_port_err_clear(struct device *dev, u64 err, bool clear_all) >> { >> struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); >> struct platform_device *pdev = to_platform_device(dev); >> + u64 v, port_error, port_first_error; > v is only used now by the read of PORT_HDR_STS, could v be changed to a more descriptive variable like hdr_sts ? >> void __iomem *base_err, *base_hdr; >> int enable_ret = 0, ret = -EBUSY; >> - u64 v; >> base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR); >> base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER); >> @@ -88,16 +88,21 @@ static int afu_port_err_clear(struct device *dev, u64 err) >> __afu_port_err_mask(dev, true); >> /* Clear errors if err input matches with current port errors.*/ >> - v = readq(base_err + PORT_ERROR); >> + port_error = readq(base_err + PORT_ERROR); >> - if (v == err) { >> - writeq(v, base_err + PORT_ERROR); >> + if (clear_all || port_error == err) { >> + port_first_error = readq(base_err + PORT_FIRST_ERROR); >> - v = readq(base_err + PORT_FIRST_ERROR); >> - writeq(v, base_err + PORT_FIRST_ERROR); >> + if (clear_all && (port_error || port_first_error)) > > likely with clear_all that this dev_warn will over report. > > how about removing clear_all && from if-check ? I think that would make it report more often? clear_all is only set at the time that the driver initializes. With the current condition, errors will only be logged when the driver loads, if and only if there are actually errors to report. > >> + dev_warn(dev, >> + "Port Error: 0x%llx, First Error 0x%llx\n", >> + port_error, port_first_error); >> + >> + writeq(port_error, base_err + PORT_ERROR); >> + writeq(port_first_error, base_err + PORT_FIRST_ERROR); >> } else { >> dev_warn(dev, "%s: received 0x%llx, expected 0x%llx\n", >> - __func__, v, err); >> + __func__, port_error, err); >> ret = -EINVAL; >> } >> @@ -137,7 +142,7 @@ static ssize_t errors_store(struct device *dev, struct device_attribute *attr, >> if (kstrtou64(buff, 0, &value)) >> return -EINVAL; >> - ret = afu_port_err_clear(dev, value); >> + ret = afu_port_err_clear(dev, value, false); >> return ret ? ret : count; >> } >> @@ -211,7 +216,8 @@ const struct attribute_group port_err_group = { >> static int port_err_init(struct platform_device *pdev, >> struct dfl_feature *feature) >> { >> - afu_port_err_mask(&pdev->dev, false); >> + if (afu_port_err_clear(&pdev->dev, 0, true)) >> + afu_port_err_mask(&pdev->dev, false); > > There is a __afu_port_err_mask at the end of afu_port_err_clear so this call isn't needed. The condition statements says to only clear the mask explicitly if afu_port_err_clear() fails. If it succeeds, then the explicit call will not happen, because as you have pointed out, afu_port_err_clear() clear the mask. Thanks, - Russ > > Tom > >> return 0; >> } > ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v1 2/2] fpga: dfl: fme: Clear fme global errors at driver init 2021-10-19 23:15 [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init Russ Weight 2021-10-19 23:15 ` [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init Russ Weight @ 2021-10-19 23:15 ` Russ Weight 2021-10-20 12:59 ` Tom Rix 2021-10-20 4:44 ` [PATCH v1 0/2] fpga: dfl: Log and clear errors on " Wu, Hao 2 siblings, 1 reply; 10+ messages in thread From: Russ Weight @ 2021-10-19 23:15 UTC (permalink / raw) To: mdf, linux-fpga, linux-kernel Cc: trix, lgoncalv, yilun.xu, hao.wu, matthew.gerlach, Russ Weight When the fme-error driver initializes, log any pre-existing errors and clear them. To avoid code replication, common code is gathered into a common fme_err_clear() function and a structure (err_reg) is created to describe each of the error registers, the corresponding mask registers, and the default mask for each register. Signed-off-by: Russ Weight <russell.h.weight@intel.com> --- drivers/fpga/dfl-fme-error.c | 128 +++++++++++++++++++++++------------ 1 file changed, 84 insertions(+), 44 deletions(-) diff --git a/drivers/fpga/dfl-fme-error.c b/drivers/fpga/dfl-fme-error.c index 51c2892ec06d..44da7b30c469 100644 --- a/drivers/fpga/dfl-fme-error.c +++ b/drivers/fpga/dfl-fme-error.c @@ -39,6 +39,27 @@ #define ERROR_MASK GENMASK_ULL(63, 0) +struct err_reg { + char *name; + u32 err_offset; + u32 mask_offset; + u32 mask_value; +}; + +static struct err_reg pcie0_reg = { + .name = "PCIE0", + .err_offset = PCIE0_ERROR, + .mask_offset = PCIE0_ERROR_MASK, + .mask_value = 0ULL +}; + +static struct err_reg pcie1_reg = { + .name = "PCIE1", + .err_offset = PCIE1_ERROR, + .mask_offset = PCIE1_ERROR_MASK, + .mask_value = 0ULL +}; + static ssize_t pcie0_errors_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -55,31 +76,48 @@ static ssize_t pcie0_errors_show(struct device *dev, return sprintf(buf, "0x%llx\n", (unsigned long long)value); } -static ssize_t pcie0_errors_store(struct device *dev, - struct device_attribute *attr, - const char *buf, size_t count) +static int fme_err_clear(struct device *dev, struct err_reg *reg, + u64 err, bool clear_all) { struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); void __iomem *base; int ret = 0; - u64 v, val; - - if (kstrtou64(buf, 0, &val)) - return -EINVAL; + u64 v; base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR); mutex_lock(&pdata->lock); - writeq(GENMASK_ULL(63, 0), base + PCIE0_ERROR_MASK); + writeq(GENMASK_ULL(63, 0), base + reg->mask_offset); - v = readq(base + PCIE0_ERROR); - if (val == v) - writeq(v, base + PCIE0_ERROR); - else + v = readq(base + reg->err_offset); + if (clear_all || err == v) { + if (clear_all && v) + dev_warn(dev, "%s: %s Errors: 0x%llx\n", + __func__, reg->name, v); + + writeq(v, base + reg->err_offset); + } else { ret = -EINVAL; + } - writeq(0ULL, base + PCIE0_ERROR_MASK); + writeq(reg->mask_value, base + reg->mask_offset); mutex_unlock(&pdata->lock); + + return ret; +} + +static ssize_t pcie0_errors_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + u64 val; + int ret; + + if (kstrtou64(buf, 0, &val)) + return -EINVAL; + + ret = fme_err_clear(dev, &pcie0_reg, val, false); + return ret ? ret : count; } static DEVICE_ATTR_RW(pcie0_errors); @@ -104,27 +142,14 @@ static ssize_t pcie1_errors_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { - struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); - void __iomem *base; - int ret = 0; - u64 v, val; + u64 val; + int ret; if (kstrtou64(buf, 0, &val)) return -EINVAL; - base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR); + ret = fme_err_clear(dev, &pcie1_reg, val, false); - mutex_lock(&pdata->lock); - writeq(GENMASK_ULL(63, 0), base + PCIE1_ERROR_MASK); - - v = readq(base + PCIE1_ERROR); - if (val == v) - writeq(v, base + PCIE1_ERROR); - else - ret = -EINVAL; - - writeq(0ULL, base + PCIE1_ERROR_MASK); - mutex_unlock(&pdata->lock); return ret ? ret : count; } static DEVICE_ATTR_RW(pcie1_errors); @@ -218,29 +243,26 @@ static ssize_t fme_errors_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { - struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); + static struct err_reg fme_reg = { + .name = "FME", + .err_offset = FME_ERROR, + .mask_offset = FME_ERROR_MASK, + .mask_value = 0ULL + }; void __iomem *base; - u64 v, val; - int ret = 0; + u64 val; + int ret; if (kstrtou64(buf, 0, &val)) return -EINVAL; + /* Workaround: disable MBP_ERROR if feature revision is 0 */ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR); + if (!dfl_feature_revision(base)) + fme_reg.mask_value = MBP_ERROR; - mutex_lock(&pdata->lock); - writeq(GENMASK_ULL(63, 0), base + FME_ERROR_MASK); + ret = fme_err_clear(dev, &fme_reg, val, false); - v = readq(base + FME_ERROR); - if (val == v) - writeq(v, base + FME_ERROR); - else - ret = -EINVAL; - - /* Workaround: disable MBP_ERROR if feature revision is 0 */ - writeq(dfl_feature_revision(base) ? 0ULL : MBP_ERROR, - base + FME_ERROR_MASK); - mutex_unlock(&pdata->lock); return ret ? ret : count; } static DEVICE_ATTR_RW(fme_errors); @@ -338,6 +360,24 @@ static void fme_err_mask(struct device *dev, bool mask) static int fme_global_err_init(struct platform_device *pdev, struct dfl_feature *feature) { + static struct err_reg fme_reg = { + .name = "FME", + .err_offset = FME_ERROR, + .mask_offset = FME_ERROR_MASK, + .mask_value = 0ULL + }; + void __iomem *base; + + /* Workaround: disable MBP_ERROR if feature revision is 0 */ + base = dfl_get_feature_ioaddr_by_id(&pdev->dev, + FME_FEATURE_ID_GLOBAL_ERR); + if (!dfl_feature_revision(base)) + fme_reg.mask_value = MBP_ERROR; + + (void)fme_err_clear(&pdev->dev, &pcie0_reg, 0ULL, true); + (void)fme_err_clear(&pdev->dev, &pcie1_reg, 0ULL, true); + (void)fme_err_clear(&pdev->dev, &fme_reg, 0ULL, true); + fme_err_mask(&pdev->dev, false); return 0; -- 2.25.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v1 2/2] fpga: dfl: fme: Clear fme global errors at driver init 2021-10-19 23:15 ` [PATCH v1 2/2] fpga: dfl: fme: Clear fme global errors at driver init Russ Weight @ 2021-10-20 12:59 ` Tom Rix 0 siblings, 0 replies; 10+ messages in thread From: Tom Rix @ 2021-10-20 12:59 UTC (permalink / raw) To: Russ Weight, mdf, linux-fpga, linux-kernel Cc: lgoncalv, yilun.xu, hao.wu, matthew.gerlach On 10/19/21 4:15 PM, Russ Weight wrote: > When the fme-error driver initializes, log any pre-existing errors and > clear them. To avoid code replication, common code is gathered into > a common fme_err_clear() function and a structure (err_reg) is created > to describe each of the error registers, the corresponding mask > registers, and the default mask for each register. > > Signed-off-by: Russ Weight <russell.h.weight@intel.com> > --- > drivers/fpga/dfl-fme-error.c | 128 +++++++++++++++++++++++------------ > 1 file changed, 84 insertions(+), 44 deletions(-) > > diff --git a/drivers/fpga/dfl-fme-error.c b/drivers/fpga/dfl-fme-error.c > index 51c2892ec06d..44da7b30c469 100644 > --- a/drivers/fpga/dfl-fme-error.c > +++ b/drivers/fpga/dfl-fme-error.c > @@ -39,6 +39,27 @@ > > #define ERROR_MASK GENMASK_ULL(63, 0) > > +struct err_reg { > + char *name; > + u32 err_offset; > + u32 mask_offset; > + u32 mask_value; > +}; > + > +static struct err_reg pcie0_reg = { > + .name = "PCIE0", > + .err_offset = PCIE0_ERROR, > + .mask_offset = PCIE0_ERROR_MASK, > + .mask_value = 0ULL > +}; > + > +static struct err_reg pcie1_reg = { > + .name = "PCIE1", > + .err_offset = PCIE1_ERROR, > + .mask_offset = PCIE1_ERROR_MASK, > + .mask_value = 0ULL > +}; > + Can these be stored in an array ? > static ssize_t pcie0_errors_show(struct device *dev, > struct device_attribute *attr, char *buf) > { > @@ -55,31 +76,48 @@ static ssize_t pcie0_errors_show(struct device *dev, > return sprintf(buf, "0x%llx\n", (unsigned long long)value); > } > > -static ssize_t pcie0_errors_store(struct device *dev, > - struct device_attribute *attr, > - const char *buf, size_t count) > +static int fme_err_clear(struct device *dev, struct err_reg *reg, > + u64 err, bool clear_all) > { > struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); > void __iomem *base; > int ret = 0; > - u64 v, val; > - > - if (kstrtou64(buf, 0, &val)) > - return -EINVAL; > + u64 v; > > base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR); > > mutex_lock(&pdata->lock); > - writeq(GENMASK_ULL(63, 0), base + PCIE0_ERROR_MASK); > + writeq(GENMASK_ULL(63, 0), base + reg->mask_offset); > > - v = readq(base + PCIE0_ERROR); > - if (val == v) > - writeq(v, base + PCIE0_ERROR); > - else > + v = readq(base + reg->err_offset); > + if (clear_all || err == v) { > + if (clear_all && v) clear_all will over report. > + dev_warn(dev, "%s: %s Errors: 0x%llx\n", > + __func__, reg->name, v); > + > + writeq(v, base + reg->err_offset); > + } else { > ret = -EINVAL; > + } > > - writeq(0ULL, base + PCIE0_ERROR_MASK); > + writeq(reg->mask_value, base + reg->mask_offset); > mutex_unlock(&pdata->lock); > + > + return ret; > +} > + > +static ssize_t pcie0_errors_store(struct device *dev, > + struct device_attribute *attr, > + const char *buf, size_t count) > +{ > + u64 val; > + int ret; > + > + if (kstrtou64(buf, 0, &val)) > + return -EINVAL; > + > + ret = fme_err_clear(dev, &pcie0_reg, val, false); > + > return ret ? ret : count; > } > static DEVICE_ATTR_RW(pcie0_errors); > @@ -104,27 +142,14 @@ static ssize_t pcie1_errors_store(struct device *dev, > struct device_attribute *attr, > const char *buf, size_t count) > { This looks like a copy of pcie0_errors_store can these functions be consolidated ? > - struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); > - void __iomem *base; > - int ret = 0; > - u64 v, val; > + u64 val; > + int ret; > > if (kstrtou64(buf, 0, &val)) > return -EINVAL; > > - base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR); > + ret = fme_err_clear(dev, &pcie1_reg, val, false); > > - mutex_lock(&pdata->lock); > - writeq(GENMASK_ULL(63, 0), base + PCIE1_ERROR_MASK); > - > - v = readq(base + PCIE1_ERROR); > - if (val == v) > - writeq(v, base + PCIE1_ERROR); > - else > - ret = -EINVAL; > - > - writeq(0ULL, base + PCIE1_ERROR_MASK); > - mutex_unlock(&pdata->lock); > return ret ? ret : count; > } > static DEVICE_ATTR_RW(pcie1_errors); > @@ -218,29 +243,26 @@ static ssize_t fme_errors_store(struct device *dev, > struct device_attribute *attr, > const char *buf, size_t count) > { > - struct dfl_feature_platform_data *pdata = dev_get_platdata(dev); > + static struct err_reg fme_reg = { > + .name = "FME", > + .err_offset = FME_ERROR, > + .mask_offset = FME_ERROR_MASK, > + .mask_value = 0ULL > + }; > void __iomem *base; > - u64 v, val; > - int ret = 0; > + u64 val; > + int ret; > > if (kstrtou64(buf, 0, &val)) > return -EINVAL; > > + /* Workaround: disable MBP_ERROR if feature revision is 0 */ > base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_GLOBAL_ERR); > + if (!dfl_feature_revision(base)) > + fme_reg.mask_value = MBP_ERROR; > > - mutex_lock(&pdata->lock); > - writeq(GENMASK_ULL(63, 0), base + FME_ERROR_MASK); > + ret = fme_err_clear(dev, &fme_reg, val, false); > > - v = readq(base + FME_ERROR); > - if (val == v) > - writeq(v, base + FME_ERROR); > - else > - ret = -EINVAL; > - > - /* Workaround: disable MBP_ERROR if feature revision is 0 */ > - writeq(dfl_feature_revision(base) ? 0ULL : MBP_ERROR, > - base + FME_ERROR_MASK); > - mutex_unlock(&pdata->lock); > return ret ? ret : count; > } > static DEVICE_ATTR_RW(fme_errors); > @@ -338,6 +360,24 @@ static void fme_err_mask(struct device *dev, bool mask) > static int fme_global_err_init(struct platform_device *pdev, > struct dfl_feature *feature) > { > + static struct err_reg fme_reg = { > + .name = "FME", > + .err_offset = FME_ERROR, > + .mask_offset = FME_ERROR_MASK, > + .mask_value = 0ULL > + }; > + void __iomem *base; > + > + /* Workaround: disable MBP_ERROR if feature revision is 0 */ > + base = dfl_get_feature_ioaddr_by_id(&pdev->dev, > + FME_FEATURE_ID_GLOBAL_ERR); > + if (!dfl_feature_revision(base)) > + fme_reg.mask_value = MBP_ERROR; A similar block above to set fme_reg.mask_value. These should be consolidated. > + > + (void)fme_err_clear(&pdev->dev, &pcie0_reg, 0ULL, true); remove casting return to (void) Tom > + (void)fme_err_clear(&pdev->dev, &pcie1_reg, 0ULL, true); > + (void)fme_err_clear(&pdev->dev, &fme_reg, 0ULL, true); > + > fme_err_mask(&pdev->dev, false); > > return 0; ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init 2021-10-19 23:15 [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init Russ Weight 2021-10-19 23:15 ` [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init Russ Weight 2021-10-19 23:15 ` [PATCH v1 2/2] fpga: dfl: fme: Clear fme global errors at driver init Russ Weight @ 2021-10-20 4:44 ` Wu, Hao 2021-10-21 0:05 ` Russ Weight 2 siblings, 1 reply; 10+ messages in thread From: Wu, Hao @ 2021-10-20 4:44 UTC (permalink / raw) To: Weight, Russell H, mdf, linux-fpga, linux-kernel Cc: trix, lgoncalv, Xu, Yilun, Gerlach, Matthew > Subject: [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init > > These patches address a request to log and clear any prexisting errors on > FPGA cards when the drivers load. Any existing errors will result in print > statements to the kernel error log before the errors are cleared. These > changes specifically affect the fme and port error registers. Could you please explain more about why we need this change? As we have user interface to log and clear errors already, is it a better choice to let userspace log and clear them during AFU initialization? Hao > > Russ Weight (2): > fpga: dfl: afu: Clear port errors in afu init > fpga: dfl: fme: Clear fme global errors at driver init > > drivers/fpga/dfl-afu-error.c | 26 ++++--- > drivers/fpga/dfl-fme-error.c | 128 +++++++++++++++++++++++------------ > 2 files changed, 100 insertions(+), 54 deletions(-) > > -- > 2.25.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init 2021-10-20 4:44 ` [PATCH v1 0/2] fpga: dfl: Log and clear errors on " Wu, Hao @ 2021-10-21 0:05 ` Russ Weight 2021-10-21 0:43 ` Wu, Hao 0 siblings, 1 reply; 10+ messages in thread From: Russ Weight @ 2021-10-21 0:05 UTC (permalink / raw) To: Wu, Hao, mdf, linux-fpga, linux-kernel Cc: trix, lgoncalv, Xu, Yilun, Gerlach, Matthew On 10/19/21 9:44 PM, Wu, Hao wrote: >> Subject: [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init >> >> These patches address a request to log and clear any prexisting errors on >> FPGA cards when the drivers load. Any existing errors will result in print >> statements to the kernel error log before the errors are cleared. These >> changes specifically affect the fme and port error registers. > Could you please explain more about why we need this change? > As we have user interface to log and clear errors already, is it a better choice > to let userspace log and clear them during AFU initialization? In the new architecture we are offering more flexibility to customers for adding functions. With these designs it becomes nearly impossible to design the AFU interface handler to recover from errors and resume operation afterwards. The proposed solution is to flag the source of the error and then capture it in sticky registers so that they can be read out from SW in the event of a crash/warm boot. To ensure that we capture these errors, the proposal is to log them in the kernel log and clear them at driver initialization. - Russ > Hao > >> Russ Weight (2): >> fpga: dfl: afu: Clear port errors in afu init >> fpga: dfl: fme: Clear fme global errors at driver init >> >> drivers/fpga/dfl-afu-error.c | 26 ++++--- >> drivers/fpga/dfl-fme-error.c | 128 +++++++++++++++++++++++------------ >> 2 files changed, 100 insertions(+), 54 deletions(-) >> >> -- >> 2.25.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init 2021-10-21 0:05 ` Russ Weight @ 2021-10-21 0:43 ` Wu, Hao 0 siblings, 0 replies; 10+ messages in thread From: Wu, Hao @ 2021-10-21 0:43 UTC (permalink / raw) To: Weight, Russell H, mdf, linux-fpga, linux-kernel Cc: trix, lgoncalv, Xu, Yilun, Gerlach, Matthew > On 10/19/21 9:44 PM, Wu, Hao wrote: > >> Subject: [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init > >> > >> These patches address a request to log and clear any prexisting errors on > >> FPGA cards when the drivers load. Any existing errors will result in print > >> statements to the kernel error log before the errors are cleared. These > >> changes specifically affect the fme and port error registers. > > Could you please explain more about why we need this change? > > As we have user interface to log and clear errors already, is it a better choice > > to let userspace log and clear them during AFU initialization? > In the new architecture we are offering more flexibility to customers > for adding functions. With these designs it becomes nearly impossible > to design the AFU interface handler to recover from errors and resume > operation afterwards. The proposed solution is to flag the source of > the error and then capture it in sticky registers so that they can be > read out from SW in the event of a crash/warm boot. To ensure that we > capture these errors, the proposal is to log them in the kernel log and > clear them at driver initialization. The error can be logged and cleared at user space driver initialization, as current usage model is let userspace handle everything including error. Hao > > - Russ > > > Hao > > > >> Russ Weight (2): > >> fpga: dfl: afu: Clear port errors in afu init > >> fpga: dfl: fme: Clear fme global errors at driver init > >> > >> drivers/fpga/dfl-afu-error.c | 26 ++++--- > >> drivers/fpga/dfl-fme-error.c | 128 +++++++++++++++++++++++------------ > >> 2 files changed, 100 insertions(+), 54 deletions(-) > >> > >> -- > >> 2.25.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2021-10-21 0:43 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-10-19 23:15 [PATCH v1 0/2] fpga: dfl: Log and clear errors on driver init Russ Weight 2021-10-19 23:15 ` [PATCH v1 1/2] fpga: dfl: afu: Clear port errors in afu init Russ Weight 2021-10-20 4:15 ` Xu Yilun 2021-10-20 12:18 ` Tom Rix 2021-10-20 16:02 ` Russ Weight 2021-10-19 23:15 ` [PATCH v1 2/2] fpga: dfl: fme: Clear fme global errors at driver init Russ Weight 2021-10-20 12:59 ` Tom Rix 2021-10-20 4:44 ` [PATCH v1 0/2] fpga: dfl: Log and clear errors on " Wu, Hao 2021-10-21 0:05 ` Russ Weight 2021-10-21 0:43 ` Wu, Hao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).