From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: James Morse <james.morse@arm.com>
Cc: <linux-edac@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
<linux-efi@vger.kernel.org>, <linuxarm@huawei.com>,
<rjw@rjwysocki.net>, <tony.luck@intel.com>, <bp@alien8.de>,
<ard.beisheuvel@arm.com>, <nariman.poushin@linaro.org>,
<jcm@redhat.com>,
Charles Garcia-Tobin <Charles.Garcia-Tobin@arm.com>
Subject: Re: [RFC PATCH 0/6] CCIX Protocol Error reporting
Date: Tue, 6 Aug 2019 12:14:43 +0100 [thread overview]
Message-ID: <20190806121443.00006211@huawei.com> (raw)
In-Reply-To: <20190703140849.000018d6@huawei.com>
On Wed, 3 Jul 2019 14:08:49 +0100
Jonathan Cameron <jonathan.cameron@huawei.com> wrote:
> On Wed, 3 Jul 2019 10:28:08 +0100
> James Morse <james.morse@arm.com> wrote:
>
> > Hi Jonathan,
> >
> > (Beware: this CCIX thing is new to me, some of this will be questions on the scope of CCIX)
>
> Sure and welcome to our crazy world.
>
> >
> > On 06/06/2019 13:36, Jonathan Cameron wrote:
> > > UEFI 2.8 defines a new CPER record Appendix N for CCIX Protocol Error Records
> > > (PER). www.uefi.org
> >
> > > These include Protocol Error Record logs which are defined in the
> > > CCIX 1.0 Base Specification www.ccixconsortium.com.
> >
> > I can't find a public version of this spec, and I don't have an account to look at the
> > private one. (It looks like I could get one but I'm guessing there is an NDA between me
> > and the document text!)
> >
> > Will it ever be public?
>
> +cc A few people who might be able to help.
Evaluation version of the CCIX 1.0a base specification now available,
(though there is a form to complete and license agreement)..
https://www.ccixconsortium.com/ccix-library/download-form/
>
> >
> >
> > > Handling of coherency protocol errors is complex and how Linux does this
> > > will take some time to evolve. For now, fatal errors are handled via the
> > > usual means and everything else is reported.
> >
> > > There are 6 types of error defined, covering:
> > > * Memory errors
> >
> > How does this interact with the vanilla Memory-Error CPER records? Do we always get them
> > as a pair? (that struct definition looks familiar...)
>
> The snag is that the standard memory error doesn't provide quite enough information to
> actually identify a component once you have buses involved.
> The actual memory specific bit is nearly identical, but the header section isn't
> as it gives the CCIX agent ID (see below given you asked a more specific question
> on that).
>
> We did discuss this in a lot of detail and currently the answer is that the
> UEFI spec + ccix allows you to report only the ccix error if you want to.
>
> Note that I believe there is an OSC bit that I'm not currently setting so
> right now no firmware compliant with the CCIX SW Guide (information available
> from your local Charles) should actually be generating these anyway because
> the OS hasn't said it can handle them. I took the view I wanted to get
> round 2 which actually hooks up handling rather than just reporting before
> setting that bit.
>
> > Is memory described like this never present in the UEFI memory map? (so we never
> > accidentally use it as host memory, for processes etc)
> Yes it definitely can. If you want to you can have all your RAM attached by CCIX.
> (probably not a good idea performance wise :)
>
> More likely in the short term is SCM using something like the hmem patches
> to hotplug it as if it were RAM.
> https://patchwork.kernel.org/cover/10982677/
>
> In order to support legacy OS, it wouldn't be unusual to have a bios switch
> to make SCM on CCIX just appear as normal memory in the first place
> (if that was true though it is highly unlikely you'd use the CCIX
> PER messages.)
>
> >
> > ~
> >
> > include/linux/memremap.h has:
> > | * Device memory that is cache coherent from device and CPU point of view. This
> > | * is use on platform that have an advance system bus (like CAPI or CCIX). A
> > | * driver can hotplug the device memory using ZONE_DEVICE and with that memory
> > | * type. Any page of a process can be migrated to such memory. However no one
> > | * should be allow to pin such memory so that it can always be evicted.
> >
> > Is this the same CCIX?
>
> Not always, but sometimes. Typically if it's on an accelerator (GPU etc) then
> one software model is to do that. There are others though. Until we have
> more devices out there, its not even clear that model with dominate.
>
> It's a much argued about topic. My personal thoughts are you either have:
> 1. An accelerator that was designed from the ground up assuming that it had
> tight control of its own memory layout etc - someone bolted on coherency
> later. This is the current GPU model and fits with ZONE_DEVICE etc. GPUs
> can and do require shuffling memory.
>
> 2. An accelerator designed to run without it's own memory. We just decided
> to move some of the host memory to it's end of the bus for size or
> efficiency reasons. As it's reasonably happy with fragmentation and
> the assumption that someone else is doing it's memory management for it
> (the OS) you can run in an SVM type mode, with a bit of hinting by
> a userspace app on where it would prefer the accelerator accessed memory
> comes from. Those have no more constraints than any other memory. If
> you want to pin for RDMA for example, its fine.
>
> More importantly, it can just be normal DDR on a card with no extra functions.
> In those cases it's just a memory only NUMA node.
>
> >
> > If process memory can be migrated onto this stuff we need to kick off memory_failure() in
> > response to memory errors, otherwise we don't mark the pages as poison, signal user-space etc.
>
> Agreed. That was in the follow up patch set (which I haven't written yet)
> that actually does some error handling. I need to do some work on the
> emulation side to actually be able to generate sensible errors of that type.
>
> I'll stick some mention of that in the v2 cover letter.
> >
> >
> > > * Cache errors
> > > * Address translation unit errors
> >
> > > * CCIX port errors
> > > * CCIX link errors
> >
> > It looks like this stuff operates 'over' PCIe[0]. How does this interact with AER?
> > Will we always get a vanilla PCIE-AER CPER in addition to these two?
>
> Nope. You'll get an AER for PCIe protocol, transport and data link layers and
> PER error for CCIX protocol layer. We'll ignore the horrible abuse of
> Internal Error in AER that people do (pointing no fingers but the spec makes
> it clear what this is for and it's not what most people use it for).
>
> So depends on what went wrong. Some types of error (ripping out a board perhaps)
> might cause a cascade but in general they are independent systems. For extra
> fun they aren't even routed over the same physical links in some topologies.
> CCIX per is CCIX ID routed (and we have meshes), AER has to follow the PCIe
> tree. In theory the CCIX Error Agent doesn't actually to be in the host,
> and it certainly doesn't have to be via the same root port.
>
> >
> > (I see 'CHECK THE AER EQUIVALENT' comments in your series, are these TODO:?)
> Oops. That was a formatting question that I'd forgotten about.
>
> Both PCIe AER and CCIX PER have error cases in which part of the TLP
> (at different layers) is captured in the error record. I just wanted
> to sanity check if I'd used similar formatting.
>
> In some CCIX errors you get 32 bytes of the message that was in flight
> when the problem occurred.
>
> I'll actually check the formatting and scrub the comment or adjust it for v2.
> Sorry about that - these were kicking around internally for far too long
> so they have gone in out of my head several times.
>
> >
> >
> > > * Agent internal errors.
> > >
> > > The set includes tracepoints to report the errors to RAS Daemon and a patch
> > > set for RAS Daemon will follow shortly.
> > >
> > > There are several open questions for this RFC.
> > > 1. Reporting of vendor data. We have little choice but to do this via a
> > > dynamic array as these blocks can take arbitrary size. I had hoped
> > > no one would actually use these given the odd mismatch between a
> > > standard error structure and non standard element, but there are
> > > already designs out there that do use it.
> >
> > I think its okay to spit these blobs out of the trace points, but could we avoid printing
> > them in the kernel log?
>
> Seems like a reasonable compromise. Leave them in the tp_printk
> and if anyone really wants to fill their kernel logs then they can.
>
> >
> >
> > > 2. The trade off between explicit tracepoint fields, on which we might
> > > want to filter, and the simplicity of a blob. I have gone for having
> > > the whole of the block specific to the PER error type in an opaque blob.
> > > Perhaps this is not the right balance?
> >
> > (I suspect I don't understand this).
> >
> > The filtering can be done by user-space. Isn't that enough?
>
> It can, but then you have already logged them. As I understand it you can
> apply the filters at the logging stage. Will take another look at this.
> Still I'm happy with the current balance, and there it'll be done in
> userspace anyway.
>
> >
> > I see the memory-error event format file has 'pa', which is all anyone is likely to care
> > about.
> > Do 'source/component' indicate the device? (what do I pull out of the machine to stop
> > these happening?)
>
> The answer to that is a qualified yes. If you know the source ID and
> the configuration then it gets you to the actual device.
> For the various 'agents' Request Agent, Home Agent and Slave Agent,
> the sourceID is the globally unique (per type of agent) ID assigned during
> the topology configuration bit of firmware enumeration (it ends up as a field
> you can read from PCI config space). For other things sourceID is the device ID
> which is assigned at an earlier phase of enumeration. You may then need the
> portID to figure out exactly where it's broken (those are fixed for a given
> device).
>
> Ultimately you use these to find out what the BDF is and hopefully figure out
> which board to pull and throw out of the window.
>
> >
> >
> > > 3. Whether defining 6 new tracepoints is sensible. I think it is:
> > > * They are all defined by the CCIX specification as independant error
> > > classes.
> > > * Many of them can only be generated by particular types of agent.
> > > * The handling required will vary widely depending on types.
> > > In the kernel some map cleanly onto existing handling. Keeping the
> > > whole flow separate will aide this. They vary by a similar amount
> > > in scope to the RAS errors found on an existing system which have
> > > independent tracepoints.
> > > * Separating them out allows for filtering on the tracepoints by
> > > elements that are not shared between them.
> > > * Muxing the lot into one record type can lead to ugly code both in
> > > kernel and in userspace.
> >
> > Sold!
> >
> >
> > > This patch is being distributed by the CCIX Consortium, Inc. (CCIX) to
> > > you and other parties that are paticipating (the "participants") in the
> > > Linux kernel with the understanding that the participants will use CCIX's
> > > name and trademark only when this patch is used in association with the
> > > Linux kernel and associated user space.
> > >
> > > CCIX is also distributing this patch to these participants with the
> > > understanding that if any portion of the CCIX specification will be
> > > used or referenced in the Linux kernel, the participants will not modify
> > > the cited portion of the CCIX specification and will give CCIX propery
> > > copyright attribution by including the following copyright notice with
> > > the cited part of the CCIX specification:
> > > "© 2019 CCIX CONSORTIUM, INC. ALL RIGHTS RESERVED."
> >
> > Is this text permission from the CCIX-Consortium to reproduce bits of their spec in the
> > kernel?
> <standard I'm not a lawyer disclaimer>
>
> Hmm. The answer to that is 'sort of'. It technically says anything in this
> patch is fine if it used that statement. I believe it is kind of a catch all
> that just says quote under fair use terms, but as normal give clear copyright
> acknowledgment. Some of the answers that others are hopefully going to give
> to earlier questions should clarify why this matters.
>
> The main point of the whole statement being there at all is to give an explicit
> trademark grant to the kernel so that we can use CCIX(tm) without worrying about it.
> If you want to get scared, google the PCI-SIG(tm) trademark policy and think where
> we might be pushing the limits.
>
> We asked the CCIX legal advisors to come up with a clear statement
> that meant open source projects could get in trouble for sensible trademark use.
> This grant is apparently the best way to do it.
>
> >
> > I'm guessing any one who can hold of the spec is also prevented from publishing it.
> >
> > (Nits: participating, proper)
> Yup. I copy typed this from my windows laptop to the linux machine because
> I was too lazy to navigate Huawei security zones and did a really bad job.
> Sorry Frank! (hope Frank never see this)
>
> >
> >
> > Thanks,
> >
> > James
> >
> > [0] https://en.wikichip.org/wiki/ccix
>
>
prev parent reply other threads:[~2019-08-06 11:15 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-06 12:36 [RFC PATCH 0/6] CCIX Protocol Error reporting Jonathan Cameron
2019-06-06 12:36 ` [RFC PATCH 1/6] efi / ras: CCIX Memory error reporting Jonathan Cameron
2019-06-21 17:40 ` Jonathan Cameron
2019-06-06 12:36 ` [RFC PATCH 2/6] efi / ras: CCIX Cache " Jonathan Cameron
2019-06-06 12:36 ` [RFC PATCH 3/6] efi / ras: CCIX Address Translation " Jonathan Cameron
2019-06-06 12:36 ` [RFC PATCH 4/6] efi / ras: CCIX Port " Jonathan Cameron
2019-06-06 12:36 ` [RFC PATCH 5/6] efi / ras: CCIX Link " Jonathan Cameron
2019-06-06 12:36 ` [RFC PATCH 6/6] efi / ras: CCIX Agent internal " Jonathan Cameron
2019-06-25 11:34 ` [RFC PATCH 0/6] CCIX Protocol Error reporting Jonathan Cameron
2019-07-03 9:28 ` James Morse
2019-07-03 13:08 ` Jonathan Cameron
2019-08-06 11:14 ` Jonathan Cameron [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190806121443.00006211@huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=Charles.Garcia-Tobin@arm.com \
--cc=ard.beisheuvel@arm.com \
--cc=bp@alien8.de \
--cc=james.morse@arm.com \
--cc=jcm@redhat.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-edac@vger.kernel.org \
--cc=linux-efi@vger.kernel.org \
--cc=linuxarm@huawei.com \
--cc=nariman.poushin@linaro.org \
--cc=rjw@rjwysocki.net \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).