From: John Garry <john.garry@huawei.com>
To: James Morse <james.morse@arm.com>
Cc: Zhengqiang <zhengqiang10@huawei.com>,
	Fan Wu <wufan@codeaurora.org>, <mchehab@kernel.org>,
	<bp@alien8.de>, <baicar.tyler@gmail.com>,
	<linux-edac@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	Linuxarm <linuxarm@huawei.com>,
	Xiaofei Tan <tanxiaofei@huawei.com>,
	wanghuiqiang <wanghuiqiang@huawei.com>,
	Shiju Jose <shiju.jose@huawei.com>
Subject: Re: [PATCH] EDAC, ghes: use CPER module handles to locate DIMMs
Date: Thu, 30 Aug 2018 17:50:53 +0100
Message-ID: <6cc3c5e2-3827-a89c-e37b-09728a34f21f@huawei.com>
In-Reply-To: <5eab89c6-c063-cbc2-4d02-459faf87698a@arm.com>

On 30/08/2018 17:34, James Morse wrote:

Hi James,

Zhengqiang no longer works on this topic, so I have cc'ed some more guys 
who should be able to help.

John

> Hi Zhengqiang,
>
> On 29/08/18 19:33, Fan Wu wrote:
>> The current ghes_edac driver does not update per-dimm error
>> counters when reporting memory errors, because there is no
>> platform-independent way to find DIMMs based on the error
>> information provided by firmware. This patch offers a solution
>> for platforms whose firmwares provide valid module handles
>> (SMBIOS type 17) in error records. In this case ghes_edac will
>> use the module handles to locate DIMMs and thus makes per-dimm
>> error reporting possible.
>
> Does your platform set CPER_MEM_VALID_MODULE_HANDLE in GHES Memory errors? If
> so, any chance you could test this patch on your platform? [0]
> (original patch: https://lore.kernel.org/patchwork/patch/978928/)
>
> Thanks,
>
> James
>
> [0] https://marc.info/?l=linux-edac&m=152603960002324
>
>
>> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
>> index 473aeec..db527f0 100644
>> --- a/drivers/edac/ghes_edac.c
>> +++ b/drivers/edac/ghes_edac.c
>> @@ -81,6 +81,26 @@ static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
>>  		(*num_dimm)++;
>>  }
>>
>> +static int ghes_edac_dimm_index(u16 handle)
>> +{
>> +	struct mem_ctl_info *mci;
>> +	int i;
>> +
>> +	if (!ghes_pvt)
>> +		return -1;
>> +
>> +	mci = ghes_pvt->mci;
>> +
>> +	if (!mci)
>> +		return -1;
>> +
>> +	for (i = 0; i < mci->tot_dimms; i++) {
>> +		if (mci->dimms[i]->smbios_handle == handle)
>> +			return i;
>> +	}
>> +	return -1;
>> +}
>> +
>>  static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
>>  {
>>  	struct ghes_edac_dimm_fill *dimm_fill = arg;
>> @@ -177,6 +197,8 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
>>  				entry->total_width, entry->data_width);
>>  		}
>>
>> +		dimm->smbios_handle = entry->handle;
>> +
>>  		dimm_fill->count++;
>>  	}
>>  }
>> @@ -327,12 +349,20 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>>  		p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos);
>>  	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) {
>>  		const char *bank = NULL, *device = NULL;
>> +		int index = -1;
>> +
>>  		dmi_memdev_name(mem_err->mem_dev_handle, &bank, &device);
>> +		p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
>> +			     mem_err->mem_dev_handle);
>>  		if (bank != NULL && device != NULL)
>>  			p += sprintf(p, "DIMM location:%s %s ", bank, device);
>> -		else
>> -			p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
>> -				     mem_err->mem_dev_handle);
>> +
>> +		index = ghes_edac_dimm_index(mem_err->mem_dev_handle);
>> +		if (index >= 0) {
>> +			e->top_layer = index;
>> +			e->enable_per_layer_report = true;
>> +		}
>> +
>>  	}
>>  	if (p > e->location)
>>  		*(p - 1) = '\0';
>> diff --git a/include/linux/edac.h b/include/linux/edac.h
>> index bffb978..a45ce1f 100644
>> --- a/include/linux/edac.h
>> +++ b/include/linux/edac.h
>> @@ -451,6 +451,8 @@ struct dimm_info {
>>  	u32 nr_pages;			/* number of pages on this dimm */
>>
>>  	unsigned csrow, cschannel;	/* Points to the old API data */
>> +
>> +	u16 smbios_handle;              /* Handle for SMBIOS type 17 */
>>  };
>>
>>  /**
>>
>
>
> .
>
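
For readers following the quoted patch outside a kernel tree, the lookup it
describes can be sketched in plain userspace C. This is only an illustration
under stated assumptions: struct demo_dimm, demo_dimm_index() and
demo_report_error() are hypothetical stand-ins and not kernel APIs, and the
ce_count field is invented here to show what "per-DIMM accounting" would mean.
The real patch walks mci->dimms[] and sets e->top_layer instead.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's struct dimm_info. Only the
 * smbios_handle field mirrors the patch; ce_count is an assumption made
 * for this sketch.
 */
struct demo_dimm {
	uint16_t smbios_handle;	/* handle of the SMBIOS type 17 entry */
	unsigned int ce_count;	/* errors attributed to this DIMM */
};

/*
 * Linear search for the DIMM whose SMBIOS handle matches the module
 * handle taken from the error record. Returns the index, or -1 if no
 * DIMM matches.
 */
static int demo_dimm_index(const struct demo_dimm *dimms, size_t n,
			   uint16_t handle)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (dimms[i].smbios_handle == handle)
			return (int)i;
	}
	return -1;
}

/*
 * Attribute one error to the matching DIMM. Returns 1 when the error
 * could be counted per DIMM and 0 when the handle is unknown, in which
 * case a caller would fall back to reporting without a DIMM location.
 */
static int demo_report_error(struct demo_dimm *dimms, size_t n,
			     uint16_t handle)
{
	int idx = demo_dimm_index(dimms, n, handle);

	if (idx < 0)
		return 0;

	dimms[idx].ce_count++;
	return 1;
}
```

As in the patch, an unknown handle is not treated as a failure: the lookup
returns -1 and the per-DIMM step is skipped, so platforms whose firmware
does not supply a usable module handle behave as before.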


