linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Robert Richter <rrichter@marvell.com>
To: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>, Tony Luck <tony.luck@intel.com>,
	"James Morse" <james.morse@arm.com>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 14/20] EDAC, mc: Remove per layer counters
Date: Mon, 18 Nov 2019 20:30:29 +0000	[thread overview]
Message-ID: <20191118203021.jvlerg4k645g43yn@rric.localdomain> (raw)
In-Reply-To: <20191109084056.35b4b8ab@kernel.org>

Hi Mauro,

On 09.11.19 08:40:56, Mauro Carvalho Chehab wrote:
> Em Wed, 6 Nov 2019 09:33:32 +0000
> Robert Richter <rrichter@marvell.com> escreveu:
> 
> > Looking at how mci->{ue,ce}_per_layer[EDAC_MAX_LAYERS] is used, it
> > turns out that only the leaves in the memory hierarchy are consumed
> > (in sysfs), but not the intermediate layers, e.g.:
> > 
> >  count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx];

first of all, this is the only user, where ce_per_layer[][] is
accessed *readable*. Note that n_layers is a constant value per
mci. Thus we could also convert this without any change of
functionality to:

 count = dimm->mci->ce_counts[dimm->idx];

We can also remove the code that writes counter values to inner
layers, those values are never read.

The above is nothing else than storing the count per DIMM, which can
be converted to the following by just adding the count to struct
dimm_info:

 count = dimm->ce_count;

Same applies to ue_count.

As we have the counts in struct dimm_info now, we no longer need to
allocate {ue,ce}_counts arrays and can remove its allocation and
release code including everything around.

> > 
> > These unused counters only add complexity, remove them. The error
> > counter values are directly stored in struct dimm_info now.
> 
> I guess this patch will cause troubles with some memory controllers.
> 
> The problem is that, depending on the memory type and how many bits
> are wrong, it may not be technically possible to pinpoint an error
> to a single DIMM.

If a DIMM can not be identified, the MC has one or more of the pos
values (pos[0] to pos[mci->n_layers-1]) unset (negative values). The
count of the outer layer (mci->ce_per_layer[mci->n_layers][index]) is
not written then. See below in function edac_inc_ce_error().

> 
> I mean, the memory controller can be, for instance, grouping
> DIMM1 and DIMM2. If there's just one bit errored, it is possible to
> assign it to either DIMM1 or DIMM2, but if there are multiple bits
> wrong, most ECC codes won't allow to pinpoint if the error ocurred
> at DIMM1 or at DIMM2.

An error would not be counted for any DIMM then.

> 
> All we know is that the layer has an error.

Right, but this hasn't any effect on DIMM error counters.

This has only effect to csrow/channel counters. The code for this did
not change, see edac_mc_handle_error().

> 
> So, assigning the error to the dimm struct seems plain wrong to me.

I think this is the code in question for you:

> >  static void edac_inc_ce_error(struct mem_ctl_info *mci,
> > -			      bool enable_per_layer_report,
> >  			      const int pos[EDAC_MAX_LAYERS],
> >  			      const u16 count)
> >  {
> > -	int i, index = 0;
> > +	struct dimm_info *dimm = edac_get_dimm(mci, pos[0], pos[1], pos[2]);
> >  
> >  	mci->ce_mc += count;
> >  
> > -	if (!enable_per_layer_report) {
> > +	if (dimm)
> > +		dimm->ce_count += count;
> > +	else
> >  		mci->ce_noinfo_count += count;
> > -		return;
> > -	}
> > -
> > -	for (i = 0; i < mci->n_layers; i++) {
> > -		if (pos[i] < 0)
> > -			break;
> > -		index += pos[i];
> > -		mci->ce_per_layer[i][index] += count;

No value written here if pos[] < 0.

> > -
> > -		if (i < mci->n_layers - 1)
> > -			index *= mci->layers[i + 1].size;
> > -	}

So in an intermediate step the for loop could be converted to:

	dimm = edac_get_dimm(mci, pos[0], pos[1], pos[2]);

	if (dimm)
		mci->ce_per_layer[mci->n_layers - 1][dimm->idx] += count;

No change in functionality, right?

> >  }

I hope this explains what this patch does.

It looks sane to me, please review again. If you still think it is
broken, give me an example.

Thanks,

-Robert

  reply	other threads:[~2019-11-18 20:30 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-06  9:32 [PATCH v2 00/20] EDAC: Rework edac_mc and ghes drivers Robert Richter
2019-11-06  9:33 ` [PATCH v2 01/20] EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function Robert Richter
2019-11-09  7:23   ` Mauro Carvalho Chehab
2019-11-06  9:33 ` [PATCH v2 02/20] EDAC: Remove EDAC_DIMM_OFF() macro Robert Richter
2019-11-09  7:27   ` Mauro Carvalho Chehab
2019-11-06  9:33 ` [PATCH v2 03/20] EDAC: Introduce mci_for_each_dimm() iterator Robert Richter
2019-11-10 11:37   ` Borislav Petkov
2019-11-06  9:33 ` [PATCH v2 04/20] EDAC, mc: Do not BUG_ON() in edac_mc_alloc() Robert Richter
2019-11-06  9:33 ` [PATCH v2 05/20] EDAC, mc: Remove needless zero string termination Robert Richter
2019-11-09  7:28   ` Mauro Carvalho Chehab
2019-11-09 10:11   ` Borislav Petkov
2019-11-06  9:33 ` [PATCH v2 06/20] EDAC, mc: Reduce indentation level in edac_mc_handle_error() Robert Richter
2019-11-06  9:33 ` [PATCH v2 07/20] EDAC, mc: Rename iterator variable to idx Robert Richter
2019-11-09 10:21   ` Borislav Petkov
2019-11-06  9:33 ` [PATCH v2 08/20] EDAC: Remove misleading comment in struct edac_raw_error_desc Robert Richter
2019-11-06  9:33 ` [PATCH v2 09/20] EDAC, ghes: Use standard kernel macros for page calculations Robert Richter
2019-11-06  9:33 ` [PATCH v2 10/20] EDAC, ghes: Fix grain calculation Robert Richter
2019-11-06  9:33 ` [PATCH v2 11/20] EDAC, ghes: Remove intermediate buffer pvt->detail_location Robert Richter
2019-11-06  9:33 ` [PATCH v2 12/20] EDAC, ghes: Unify trace_mc_event() code with edac_mc driver Robert Richter
2019-11-06  9:33 ` [PATCH v2 13/20] EDAC, Documentation: Describe CPER module definition and DIMM ranks Robert Richter
2019-11-06  9:33 ` [PATCH v2 14/20] EDAC, mc: Remove per layer counters Robert Richter
2019-11-09  7:40   ` Mauro Carvalho Chehab
2019-11-18 20:30     ` Robert Richter [this message]
2019-11-06  9:33 ` [PATCH v2 15/20] EDAC, mc: Split edac_mc_alloc() into smaller functions Robert Richter
2019-11-06  9:33 ` [PATCH v2 16/20] EDAC, mc: Reorder functions edac_mc_alloc*() Robert Richter
2019-11-06  9:33 ` [PATCH v2 17/20] EDAC, mc: Rework edac_raw_mc_handle_error() to use struct dimm_info Robert Richter
2019-11-06  9:33 ` [PATCH v2 18/20] EDAC: Store error type in struct edac_raw_error_desc Robert Richter
2019-11-09  7:30   ` Mauro Carvalho Chehab
2019-11-06  9:33 ` [PATCH v2 19/20] EDAC, mc: Determine mci pointer from the error descriptor Robert Richter
2019-11-06  9:33 ` [PATCH v2 20/20] EDAC, mc: Create new function edac_inc_csrow() Robert Richter
2019-11-09  7:32   ` Mauro Carvalho Chehab
2019-11-10 14:51 ` [PATCH v2 00/20] EDAC: Rework edac_mc and ghes drivers Borislav Petkov
2019-11-11 15:49   ` Robert Richter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191118203021.jvlerg4k645g43yn@rric.localdomain \
    --to=rrichter@marvell.com \
    --cc=bp@alien8.de \
    --cc=james.morse@arm.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab@kernel.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).