linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Robert Richter <rrichter@marvell.com>
To: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	Toshi Kani <toshi.kani@hpe.com>
Subject: Re: [PATCH] EDAC, ghes: Fix grain calculation
Date: Fri, 14 Jun 2019 14:40:01 +0000	[thread overview]
Message-ID: <20190614143932.cacobehkuw62frzx@rric.localdomain> (raw)
In-Reply-To: <20190614100918.GA2586@zn.tnic>

On 14.06.19 12:09:18, Borislav Petkov wrote:
> On Fri, Jun 14, 2019 at 07:21:47AM +0000, Robert Richter wrote:
> > On 14.06.19 00:41:30, Borislav Petkov wrote:
> > > On Thu, Jun 13, 2019 at 09:07:38PM +0000, Robert Richter wrote:
> > > >  grain_bits = fls_long(e->grain) + 1;
> > > 
> > > For 4K grain this makes grain_bits == 12.
> > 
> > fls(0) == 0
> > fls(1) == 1
> > fls(2) == 2
> > fls(4) == 3
> > ...
> > fls(0x1000) == 13
> 
> I believe the intent of the "+ 1" above is that e->grain was a mask of
> the type 0xfff so you have to + 1 since we count from bit 0.

No, as you grepped below, the granularity is the number of bytes, and
mostly power of 2. Also note that fls(0xfff) is 12 and fls(0x1000) is
13 ...

> 
> > Correct for edac_mc is:
> > 
> >  grain_bits = fls_long(e->grain ? e->grain - 1 : 0);

... thus the -1 here.

> 
> First of all, you don't wanna do fls(0).

You need to deal with fls(0) as you don't know what drivers fill in
for the grain. fls(0) calculates fine here to 1 grain bit which is a
one byte granularity.

> 
> Then, we don't want to use PAGE_MASK for grain computation because as
> James showed, it causes problems.
> 
> > James' calculation for ghes is correct in this particular case
> > (assuming a bit mask with (power of 2 - 1)).
> 
> And this is the question that needs to be clarified first: from what do
> we compute the grain.
> 
> ghes_edac uses ~physical_addr_mask which can basically be:
> 
> 	grain_bits = hweight_long(physical_addr_mask);
> 
> which, in the 0xfff case gives you the correct 12 bits. It is assuming
> it is a contiguous mask though. It better be...
> 
> Now, if you look at how the grain gets read out in edac_mc_handle_error(), it
> comes from dimm->grain. And that is defined as:
> 
>         u32 grain;              /* granularity of reported error in bytes */
> 
> in bytes!
> 
> Now, if you grep for "grain" in drivers/edac/ you see most (if not all)
> initialized as bytes:
> 
> i82975x_edac.c:415:                     dimm->grain = 1 << 7;   /* 128Byte cache-line resolution */
> pnd2_edac.c:1260:               dimm->grain = 32;
> skx_common.c:313:       dimm->grain = 32;
> highbank_mc_edac.c:223: dimm->grain = 8;
> ...
> 
> and so on.
> 
> So actually I think that
> 
> 	grain_bits = fls_long(e->grain) + 1;
> 
> is wrong in that case.

This calculates correctly:

	grain_bits = fls_long(e->grain ? e->grain - 1 : 0);

and also handles the case if the number of bytes is not power of 2.

> It should simply hand down e->grain as it is
> already the number of bytes.
> Which we should simply dump in the tracepoint too - the number of bytes
> and not that silly shifting there:
> 
> 	1 << __entry->grain_bits,

The tracepoint format for mc_event changes from byte to long for the
grain then. This is not backward compatible and breaks userland.

> 
> And once we've said that the grain is going to *everywhere* be the
> granularity of the error in number of bytes, then we should stick to
> that and fix all those drivers which don't do that.
> 
> Ok?

See my other mail in this thread with a diff. I would go with that
one.

-Robert

  reply	other threads:[~2019-06-14 14:40 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-29 15:22 [PATCH] EDAC, ghes: Fix grain calculation James Morse
2019-05-30 16:50 ` Kani, Toshi
2019-06-12  4:34 ` Borislav Petkov
2019-06-13 17:06   ` James Morse
2019-06-13 19:18     ` Borislav Petkov
2019-06-13 21:07       ` Robert Richter
2019-06-13 22:41         ` Borislav Petkov
2019-06-14  7:21           ` Robert Richter
2019-06-14  8:39             ` Robert Richter
2019-06-14 10:09             ` Borislav Petkov
2019-06-14 14:40               ` Robert Richter [this message]
2019-06-14 15:06                 ` Borislav Petkov
2019-06-14  6:32         ` Robert Richter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190614143932.cacobehkuw62frzx@rric.localdomain \
    --to=rrichter@marvell.com \
    --cc=bp@alien8.de \
    --cc=james.morse@arm.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=mchehab@kernel.org \
    --cc=toshi.kani@hpe.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).