linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Punit Agrawal <punit1.agrawal@toshiba.co.jp>
To: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Borislav Petkov <bp@alien8.de>,
	Smita Koralahalli Channabasappa <skoralah@amd.com>,
	Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>,
	<x86@kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-pm@vger.kernel.org>, <linux-edac@vger.kernel.org>,
	<linux-efi@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
	<devel@acpica.org>, Tony Luck <tony.luck@intel.com>,
	"Rafael J . Wysocki" <rjw@rjwysocki.net>,
	Len Brown <len.brown@intel.com>, Ard Biesheuvel <ardb@kernel.org>
Subject: Re: [PATCH v4] cper, apei, mce: Pass x86 CPER through the MCA handling chain
Date: Mon, 28 Sep 2020 17:06:36 +0900	[thread overview]
Message-ID: <87lfgugwab.fsf@kokedama.swc.toshiba.co.jp> (raw)
In-Reply-To: 20200925161940.GA21194@yaz-nikka.amd.com

Yazen Ghannam <yazen.ghannam@amd.com> writes:

> On Fri, Sep 25, 2020 at 09:54:06AM +0900, Punit Agrawal wrote:
>> Borislav Petkov <bp@alien8.de> writes:
>> 
>> > On Thu, Sep 24, 2020 at 12:23:27PM -0500, Smita Koralahalli Channabasappa wrote:
>> >> > Even though it's not defined in the UEFI spec, it doesn't mean a
>> >> > structure definition cannot be created.
>> >
>> > Created for what? That structure better have a big fat comment above it, what
>> > firmware generates its layout.
>> 
>> Maybe I could've used a better choice of words - I meant to define a
>> structure with meaningful member names to replace the *(ptr + i)
>> accesses in the patch.
>> 
>> The requirement for documenting the record layout doesn't change -
>> whether using raw pointer arithmetic vs a structure definition.
>> 
>> >> > After all, the patch is relying on some guarantee of the meaning of
>> >> > the values and their ordering.
>> >
>> > AFAICT, this looks like an ad-hoc definition and the moment they change
>> > it in some future revision, that struct of yours becomes invalid so we'd
>> > need to add another one.
>> 
>> If there's no spec backing the current layout, then it'll indeed be an
>> ad-hoc definition of a structure in the kernel. But considering that
>> it's part of firmware / OS interface for an important part of the RAS
>> story I would hope that the code is based on a spec - having that
>> reference included would help maintainability.
>> 
>> Incompatible changes will indeed break the assumptions in the kernel and
>> code will need to be updated - regardless of the choice of kernel
>> implementation; pointer arithmetic, structure definition - ad-hoc or
>> spec provided.
>> 
>> Having versioning will allow running older kernels on newer hardware and
>> vice versa - but I don't see why that is important only when using a
>> structure based access.
>>
>
> There is no versioning option for the x86 context info structure in the
> UEFI spec, so I don't think there'd be a clean way to include version
> information.
>
> The format of the data in the context info is not totally ad-hoc, and it
> does follow the UEFI spec. The "Register Array" field is raw data. This
> may follow one of the predefined formats in the UEFI spec like the "X64
> Register State", etc. Or, in the case of MSR and Memory Mapped
> Registers, this is a raw dump of the registers starting from the address
> shown in the structure. The two values that can be changed are the
> starting address and the array size. These two together provide a window
> to the registers. The registers are fixed, so a single context info
> struture should include a single contiguous range of registers. Multiple
> context info structures can be provided to include registers from
> different, non-contiguous ranges.
>
> This patch is checking if an MSR context info structure lines up with
> the MCAX register space used on Scalable MCA systems. This register
> space is defined in the AMD Processor Programming Reference for various
> products. This is considered a hardware feature extension, so the
> existing register layout won't change though new registers may be added.
> A layout change would require moving to another register space which is
> what happened going from legacy MCA (starting at address 0x400) to MCAX
> (starting at address 0xC0002000) registers.

Thanks for the SMCA related background.
>
> The only two things firmware can change are from what address does the
> info start and where does the info end. So the implementation-specific
> details here are that currently the starting address is MCA_STATUS (in
> MCAX space) for a bank and the remaining info includes the other MCA
> registers for this bank.
>
> So I think the kernel can be strict with this format, i.e. the two
> variables match what we're looking for. This patch already has a check
> on the starting address. It should also include a check that "Register
> Array Size" is large enough to include all the registers we want to
> extract. If the format doesn't match, then we fall back to a raw dump
> of the data like we have today.
>
> Or the kernel can be more flexible and try to find the window of
> registers based on the starting address. I think this is really
> open-ended though.

I think I understand the hesitancy here if the firmware can arbitrarily
move the starting address. Though I hope that doesn't happen as it would
break the feature introduced in $SUBJECT.

The way I read the code / spec led me to believe that the MSR context
info records in the SMCA space are just encoding the layout of MC Bank
registers[0] and making it explicit can only help.

But Boris seems to think the current approach is good enough. So no
objections from me.

Thanks,
Punit

[0] AMD Processor Programming Reference for Family 17H, Sec 3.1.5

      parent reply	other threads:[~2020-09-28  8:07 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-04 14:04 [PATCH v4] cper, apei, mce: Pass x86 CPER through the MCA handling chain Smita Koralahalli
2020-09-23 10:07 ` Punit Agrawal
2020-09-23 14:05   ` Borislav Petkov
2020-09-23 14:52     ` Ard Biesheuvel
2020-09-23 15:39       ` Borislav Petkov
2020-09-23 18:24         ` Ard Biesheuvel
2020-09-24  0:02     ` Punit Agrawal
2020-09-24 17:23       ` Smita Koralahalli Channabasappa
2020-09-24 17:50         ` Borislav Petkov
2020-09-25  0:54           ` Punit Agrawal
2020-09-25  7:07             ` Borislav Petkov
2020-09-25 16:19             ` Yazen Ghannam
2020-09-25 16:27               ` Borislav Petkov
2020-09-28  8:06               ` Punit Agrawal [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lfgugwab.fsf@kokedama.swc.toshiba.co.jp \
    --to=punit1.agrawal@toshiba.co.jp \
    --cc=Smita.KoralahalliChannabasappa@amd.com \
    --cc=ardb@kernel.org \
    --cc=bp@alien8.de \
    --cc=devel@acpica.org \
    --cc=len.brown@intel.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-efi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=skoralah@amd.com \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --cc=yazen.ghannam@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).