linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mauro Carvalho Chehab <mchehab@kernel.org>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>, James Morse <james.morse@arm.com>,
	"Hawa, Hanna" <hhhawa@amazon.com>,
	"robh+dt@kernel.org" <robh+dt@kernel.org>,
	"Woodhouse, David" <dwmw@amazon.co.uk>,
	"paulmck@linux.ibm.com" <paulmck@linux.ibm.com>,
	"mark.rutland@arm.com" <mark.rutland@arm.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"nicolas.ferre@microchip.com" <nicolas.ferre@microchip.com>,
	"devicetree@vger.kernel.org" <devicetree@vger.kernel.org>,
	"Shenhar, Talel" <talel@amazon.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Chocron, Jonathan" <jonnyc@amazon.com>,
	"Krupnik, Ronen" <ronenk@amazon.com>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"Hanoch, Uri" <hanochu@amazon.com>
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC
Date: Wed, 12 Jun 2019 07:42:42 -0300	[thread overview]
Message-ID: <20190612074242.53a4cf56@coco.lan> (raw)
In-Reply-To: <08bd58dc0045670223f8d3bbc8be774505bd3ddf.camel@kernel.crashing.org>

Em Wed, 12 Jun 2019 18:29:26 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> escreveu:

> On Wed, 2019-06-12 at 05:48 +0200, Borislav Petkov wrote:
> > On Wed, Jun 12, 2019 at 08:25:52AM +1000, Benjamin Herrenschmidt wrote:  
> > > Yes, we would be in a world of pain already if tracepoints couldn't
> > > handle concurrency :-)  
> > 
> > Right, lockless buffer and the whole shebang :)  
> 
> Yup.
> 
> > > Sort-of... I still don't see a race in what we propose but I might be
> > > missing something subtle. We are talking about two drivers for two
> > > different IP blocks updating different counters etc...  
> > 
> > If you do only *that* you should be fine. That should technically be ok.  
> 
> Yes, that' the point.

I don't think the EDAC core has any troubles with concurrency. 

As far as I remember, the hacks we had to do at x86 world are due to 
concurrency at the hardware side: having two RAS codes (either between
a driver and BIOS APEI/GHES or between two drivers) accessing the same 
set of MCA registers doesn't work at the Intel chipsets I'm aware of.

> 
> > I still think, though, that the sensible thing to do is have one
> > platform driver which concentrates all RAS functionality.   
>

> I tend to disagree here. We've been down that rabbit hole in the past
> and we (Linux in general) are trying to move away from that sort of
> "platform" overarching driver as much as possible.

Without entering on ARM-specific architecture design, I tend to agree
with the principle that different hardware components would use different
RAS drivers, provided that they use the already existing RAS cores.

Up to some extend, we have already multiple RAS drivers right now.

Besides EDAC/MCE, there are for example, PCIe errors that can come via 
AER. Network drivers also report errors using different mechanisms.
I know a person that it is interested on working on an implementation
to collect disk errors using a trace-based error report mechanism.

So, at the end of the day, different errors may come from different
drivers using different non-overlapping error mechanisms.

That's said, from the admin PoV, it makes sense to have a single
daemon that collect errors from all error sources and take the
needed actions.

> 
> > It is the
> > more sensible design and takes care of potential EDAC shortcomings and
> > the need to communicate between the different logging functionality,
> > as in, for example, "I had so many errors, lemme go and increase DRAM
> > scrubber frequency." For example. And all the other advantages of having
> > everything in a single driver.  
> 
> This is a policy. It should either belong to userspace, or be in some
> generic RAS code in the kernel, there's no reason why these can't be
> abstracted. Also in your specific example, it could be entirely local
> to the MC EDAC / DRAM controller path, we could have a generic way for
> EDAC to advertise that a given memory channel is giving lots of errors
> and have memory controller drivers listen to it but usually the EDAC MC
> driver *is* the only thing that looks like a MC driver to begin with,
> so again, pretty much no overlap with L1/L2 caches RAS or PCIe RAS
> etc...

On userspace, I guess there's just the rasdaemon and the old legacy
edac-utils. 

Right now, the rasdaemon doesn't have anything like that, but we keep
integrating new RAS report mechanisms to it (we just added support
this week for network devlink error reports). 

If I had to put a policy like that on userspace, rasdaemon should probably 
be the right place to add it.

Thanks,
Mauro

  parent reply	other threads:[~2019-06-12 10:43 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-30 10:15 [PATCH 0/2] Add support for Amazon's Annapurna Labs EDAC for L1/L2 Hanna Hawa
2019-05-30 10:15 ` [PATCH 1/2] dt-bindings: EDAC: add Amazon Annapurna Labs EDAC binding Hanna Hawa
2019-05-30 11:54   ` Greg KH
2019-05-31  0:35     ` Borislav Petkov
2019-05-30 10:15 ` [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC Hanna Hawa
2019-05-30 11:57   ` Greg KH
2019-05-30 12:52     ` hhhawa
2019-05-30 13:04       ` Joe Perches
2019-05-30 18:19   ` Boris Petkov
2019-05-31  1:15     ` Herrenschmidt, Benjamin
2019-05-31  5:14       ` Borislav Petkov
2019-06-05 15:13         ` James Morse
2019-06-06  7:53         ` Hawa, Hanna
2019-06-06 10:03           ` Borislav Petkov
2019-06-06 10:33           ` James Morse
2019-06-06 11:22             ` Borislav Petkov
2019-06-06 11:37             ` Shenhar, Talel
2019-06-07 15:11               ` James Morse
2019-06-08  0:22                 ` Benjamin Herrenschmidt
2019-06-08  0:16             ` Benjamin Herrenschmidt
2019-06-08  9:05               ` Borislav Petkov
2019-06-11  5:50                 ` Benjamin Herrenschmidt
2019-06-11  7:21                   ` Benjamin Herrenschmidt
2019-06-11 11:56                     ` Borislav Petkov
2019-06-11 22:25                       ` Benjamin Herrenschmidt
2019-06-12  3:48                         ` Borislav Petkov
2019-06-12  8:29                           ` Benjamin Herrenschmidt
2019-06-12 10:42                             ` Borislav Petkov
2019-06-12 23:54                               ` Benjamin Herrenschmidt
2019-06-13  7:44                                 ` Borislav Petkov
2019-06-14 10:53                                 ` Borislav Petkov
2019-06-12 10:42                             ` Mauro Carvalho Chehab [this message]
2019-06-12 11:00                               ` Borislav Petkov
2019-06-12 11:42                                 ` Mauro Carvalho Chehab
2019-06-12 11:57                                   ` Benjamin Herrenschmidt
2019-06-12 12:25                                     ` Borislav Petkov
2019-06-12 12:35                                       ` Hawa, Hanna
2019-06-12 15:34                                         ` Borislav Petkov
2019-06-12 23:57                                       ` Benjamin Herrenschmidt
2019-06-12 23:56                                 ` Benjamin Herrenschmidt
2019-06-11  7:29                   ` Hawa, Hanna
2019-06-11 11:59                     ` Borislav Petkov
2019-06-11 11:47                   ` Borislav Petkov
2019-06-03  6:56       ` Hawa, Hanna
2019-06-05 15:16   ` James Morse
2019-06-11 19:56     ` Hawa, Hanna
2019-06-13 17:05       ` James Morse
2019-06-14 10:49         ` James Morse
2019-06-17 13:00         ` Hawa, Hanna
2019-06-19 17:22           ` James Morse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190612074242.53a4cf56@coco.lan \
    --to=mchehab@kernel.org \
    --cc=benh@kernel.crashing.org \
    --cc=bp@alien8.de \
    --cc=davem@davemloft.net \
    --cc=devicetree@vger.kernel.org \
    --cc=dwmw@amazon.co.uk \
    --cc=gregkh@linuxfoundation.org \
    --cc=hanochu@amazon.com \
    --cc=hhhawa@amazon.com \
    --cc=james.morse@arm.com \
    --cc=jonnyc@amazon.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=nicolas.ferre@microchip.com \
    --cc=paulmck@linux.ibm.com \
    --cc=robh+dt@kernel.org \
    --cc=ronenk@amazon.com \
    --cc=talel@amazon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).