linux-edac.vger.kernel.org archive mirror
From: Robert Richter <rrichter@marvell.com>
To: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 02/24] EDAC, ghes: Fix grain calculation
Date: Mon, 12 Aug 2019 06:42:00 +0000
Message-ID: <20190812064147.5czmkj7e6hxgvje3@rric.localdomain>
In-Reply-To: <20190809131559.GF2152@zn.tnic>

On 09.08.19 15:15:59, Borislav Petkov wrote:
> On Mon, Jun 24, 2019 at 03:08:57PM +0000, Robert Richter wrote:
> > The conversion from the physical address mask to a grain (defined as
> > granularity in bytes) is broken:
> > 
> > 	e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
> > 
> > E.g., a physical address mask of ~0xfff should give a grain of 0x1000,
> > instead the grain is wrong with the upper bits always set. We also
> > remove the limitation to the page size as the granularity is unrelated
> > to the page size used in the system. We fix this with:
> > 
> > 	e->grain = ~mem_err->physical_addr_mask + 1;
> > 
> > Note: We need to adapt the grain_bits calculation as e->grain is now a
> > power of 2 and no longer a bit mask. The formula is now the same as in
> > edac_mc and can later be unified.
> 
> Please refrain from using "We", "I", or other personal pronouns in a
> commit message and in the code comments below.
> 
> From Documentation/process/submitting-patches.rst:
> 
>  "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
>   instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
>   to do frotz", as if you are giving orders to the codebase to change
>   its behaviour."
> 
> Please fix all your other commit messages for the next submission.

Sure, will reword.

I have seen that you actively promote this style guideline. I was not
even aware of it, thanks for the pointer.

> 
> > Signed-off-by: Robert Richter <rrichter@marvell.com>
> > ---
> >  drivers/edac/ghes_edac.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> > index 7f19f1c672c3..d095d98d6a8d 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -222,6 +222,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
> >  	/* Cleans the error report buffer */
> >  	memset(e, 0, sizeof (*e));
> >  	e->error_count = 1;
> > +	e->grain = 1;
> >  	strcpy(e->label, "unknown label");
> >  	e->msg = pvt->msg;
> >  	e->other_detail = pvt->other_detail;
> > @@ -317,7 +318,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
> >  
> >  	/* Error grain */
> >  	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
> > -		e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
> > +		e->grain = ~mem_err->physical_addr_mask + 1;
> 
> This is assuming that ->physical_addr_mask is contiguous but I
> don't trust any firmware. I guess we can leave it like that for now
> until some "inventive" firmware actually does it.

With the grain_bits calculation the mask is rounded up to the next
power of 2 value. I therefore don't see any issues for non-contiguous
bit masks. I have updated the patch description.
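
The arithmetic discussed above can be checked with a small user-space
sketch. It is not the kernel code: the helper names (sketch_grain_old,
sketch_grain_new, sketch_grain_bits, sketch_bit_length) are hypothetical,
a 4 KiB page size is assumed for the old formula, and the grain_bits step
is assumed to be the bit length of (grain - 1), which is what rounds a
non-power-of-2 grain up to the next power of 2.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so the page mask clears the low 12 bits. */
#define SKETCH_PAGE_MASK (~UINT64_C(0xfff))

/* Position of the highest set bit, counted from 1; 0 for v == 0. */
static unsigned int sketch_bit_length(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}

/* Old formula: inverts after masking, so the upper bits are always set. */
static uint64_t sketch_grain_old(uint64_t physical_addr_mask)
{
	return ~(physical_addr_mask & ~SKETCH_PAGE_MASK);
}

/* New formula: two's complement of the mask, e.g. ~0xfff -> 0x1000. */
static uint64_t sketch_grain_new(uint64_t physical_addr_mask)
{
	return ~physical_addr_mask + 1;
}

/* Assumed grain_bits step: smallest n such that (1 << n) >= grain. */
static unsigned int sketch_grain_bits(uint64_t grain)
{
	return sketch_bit_length(grain - 1);
}
```

For a mask of ~0xfff the old helper returns all ones, while the new one
returns 0x1000, which maps to 12 grain bits. A non-contiguous mask such
as ~0xf0f yields a grain of 0xf10, which also maps to 12 grain bits,
i.e. it is treated as 0x1000.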

> 
> >  
> >  	/* Memory error location, mapped on e->location */
> >  	p = e->location;
> > @@ -433,8 +434,15 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
> >  	if (p > pvt->other_detail)
> >  		*(p - 1) = '\0';
> >  
> > +	/*
> > +	 * We expect the hw to report a reasonable grain, fallback to
> > +	 * 1 byte granularity otherwise.
> > +	 */
> > +	if (WARN_ON_ONCE(!e->grain))
> 
> Please move that WARN_ON_ONCE in the
> 
> 	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
> 
> branch above because you're presetting grain to 1 so the warn should be
> close to where it could happen, i.e., when coming from the firmware.

The check is here because it will be moved to
edac_raw_mc_handle_error() to unify the edac_mc and ghes code (see patch
#4). I understand the warn should be close to its source; on the other
hand, we need the check for all the drivers that set up the grain. Thus,
it cannot be in the driver that is setting up the grain.
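
As a rough user-space illustration of that design, the sketch below
places the zero-grain check in one shared handler so that every caller
gets the same fallback. The struct and function names are hypothetical
stand-ins, not the EDAC API, and the one-time warning only imitates what
WARN_ON_ONCE() does in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the error descriptor a driver fills in. */
struct sketch_error_desc {
	uint64_t grain;	/* granularity in bytes, expected to be non-zero */
};

/*
 * Shared handler: warn once if any driver reports a zero grain and
 * fall back to 1 byte granularity, so drivers need no check of their own.
 */
static void sketch_handle_error(struct sketch_error_desc *e)
{
	static bool warned;

	if (!e->grain) {
		if (!warned) {
			fprintf(stderr, "sketch: zero grain, using 1 byte\n");
			warned = true;
		}
		e->grain = 1;
	}
}
```

A driver that leaves grain at 0 ends up with a grain of 1 after the
call, while a valid grain such as 0x1000 passes through unchanged.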

Thanks,

-Robert

Thread overview: 52+ messages
2019-06-24 15:08 [PATCH v2 00/24] EDAC, mc, ghes: Fixes and updates to improve memory error reporting Robert Richter
2019-06-24 15:08 ` [PATCH v2 01/24] EDAC, mc: Fix grain_bits calculation Robert Richter
2019-08-03 10:08   ` Borislav Petkov
2019-06-24 15:08 ` [PATCH v2 02/24] EDAC, ghes: Fix grain calculation Robert Richter
2019-08-09 13:15   ` Borislav Petkov
2019-08-12  6:42     ` Robert Richter [this message]
2019-08-12  7:32       ` Borislav Petkov
2019-08-12 12:05         ` Robert Richter
2019-08-12 12:38           ` Borislav Petkov
2019-06-24 15:08 ` [PATCH v2 03/24] EDAC, ghes: Remove pvt->detail_location string Robert Richter
2019-08-02 17:04   ` James Morse
2019-08-07  9:00     ` Robert Richter
2019-08-13  8:09   ` Borislav Petkov
2019-06-24 15:09 ` [PATCH v2 04/24] EDAC, ghes: Unify trace_mc_event() code with edac_mc driver Robert Richter
2019-06-24 15:09 ` [PATCH v2 05/24] EDAC, mc: Fix and improve sysfs init functions Robert Richter
2019-08-13  8:26   ` Borislav Petkov
2019-06-24 15:09 ` [PATCH v2 06/24] EDAC: Kill EDAC_DIMM_PTR() macro Robert Richter
2019-08-13 14:59   ` Borislav Petkov
2019-08-27 12:20     ` Robert Richter
2019-06-24 15:09 ` [PATCH v2 07/24] EDAC: Kill EDAC_DIMM_OFF() macro Robert Richter
2019-08-14 14:52   ` Borislav Petkov
2019-06-24 15:09 ` [PATCH v2 08/24] EDAC: Introduce mci_for_each_dimm() iterator Robert Richter
2019-08-14 15:18   ` Borislav Petkov
2019-08-28  8:18     ` Robert Richter
2019-06-24 15:09 ` [PATCH v2 09/24] EDAC, mc: Cleanup _edac_mc_free() code Robert Richter
2019-08-14 16:31   ` Borislav Petkov
2019-06-24 15:09 ` [PATCH v2 10/24] EDAC, mc: Remove per layer counters Robert Richter
2019-08-16  9:24   ` Borislav Petkov
2019-06-24 15:09 ` [PATCH v2 11/24] EDAC, mc: Rework edac_raw_mc_handle_error() to use struct dimm_info Robert Richter
2019-06-24 15:09 ` [PATCH v2 12/24] EDAC, ghes: Use standard kernel macros for page calculations Robert Richter
2019-08-02 17:04   ` James Morse
2019-08-07  9:52     ` Robert Richter
2019-06-24 15:09 ` [PATCH v2 13/24] EDAC, ghes: Add support for legacy API counters Robert Richter
2019-08-16  9:55   ` Borislav Petkov
2019-08-30  9:35     ` Robert Richter
2019-06-24 15:09 ` [PATCH v2 14/24] EDAC, ghes: Rework memory hierarchy detection Robert Richter
2019-08-20  8:56   ` Borislav Petkov
2019-06-24 15:09 ` [PATCH v2 15/24] EDAC, ghes: Extract numa node information for each dimm Robert Richter
2019-08-02 17:05   ` James Morse
2019-08-09 13:09     ` Robert Richter
2019-06-24 15:09 ` [PATCH v2 16/24] EDAC, ghes: Moving code around ghes_edac_register() Robert Richter
2019-06-24 15:09 ` [PATCH v2 17/24] EDAC, ghes: Create one memory controller device per node Robert Richter
2019-06-24 15:09 ` [PATCH v2 18/24] EDAC, ghes: Fill sysfs with the DMI DIMM label information Robert Richter
2019-06-24 15:09 ` [PATCH v2 19/24] EDAC, mc: Introduce edac_mc_alloc_by_dimm() for per dimm allocation Robert Richter
2019-06-24 15:09 ` [PATCH v2 20/24] EDAC, ghes: Identify dimm by node, card, module and handle Robert Richter
2019-06-24 15:09 ` [PATCH v2 21/24] EDAC, ghes: Enable per-layer reporting based on card/module Robert Richter
2019-06-24 15:09 ` [PATCH v2 22/24] EDAC, ghes: Move struct member smbios_handle to struct ghes_dimm_info Robert Richter
2019-06-24 15:09 ` [PATCH v2 23/24] EDAC, Documentation: Describe CPER module definition and DIMM ranks Robert Richter
2019-06-24 15:09 ` [PATCH v2 24/24] EDAC, ghes: Disable legacy API for ARM64 Robert Richter
2019-06-26  9:33   ` James Morse
2019-06-26 10:11     ` Robert Richter
2019-08-02  7:58 ` [PATCH v2 00/24] EDAC, mc, ghes: Fixes and updates to improve memory error reporting Robert Richter
