linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: James Morse <james.morse@arm.com>
To: Tyler Baicar <baicar.tyler@gmail.com>, Borislav Petkov <bp@alien8.de>
Cc: Rafael Wysocki <rjw@rjwysocki.net>,
	Tony Luck <tony.luck@intel.com>, Fan Wu <wufan@codeaurora.org>,
	linux-mm@kvack.org, Marc Zyngier <marc.zyngier@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Xie XiuQi <xiexiuqi@huawei.com>,
	Will Deacon <will.deacon@arm.com>,
	Christoffer Dall <christoffer.dall@arm.com>,
	Dongjiu Geng <gengdongjiu@huawei.com>,
	Linux ACPI <linux-acpi@vger.kernel.org>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	kvmarm@lists.cs.columbia.edu,
	arm-mail-list <linux-arm-kernel@lists.infradead.org>,
	Len Brown <lenb@kernel.org>
Subject: Re: [PATCH v7 10/25] ACPI / APEI: Tell firmware the estatus queue consumed the records
Date: Fri, 11 Jan 2019 18:09:28 +0000	[thread overview]
Message-ID: <0939c14d-de58-f21d-57a6-89bdce3bcb44@arm.com> (raw)
In-Reply-To: <CABo9ajAk5XNBmNHRRfUb-dQzW7-UOs5826jPkrVz-8zrtMUYkg@mail.gmail.com>

Hi guys,

On 11/01/2019 15:32, Tyler Baicar wrote:
> On Fri, Jan 11, 2019 at 7:03 AM Borislav Petkov <bp@alien8.de> wrote:
>> On Thu, Jan 10, 2019 at 04:01:27PM -0500, Tyler Baicar wrote:
>>> On Thu, Jan 10, 2019 at 1:23 PM James Morse <james.morse@arm.com> wrote:
>>>>>>
>>>>>> +    if (is_hest_type_generic_v2(ghes) && ghes_ack_error(ghes->generic_v2))
>>>>>
>>>>> Since ghes_ack_error() is always prepended with this check, you could
>>>>> push it down into the function:
>>>>>
>>>>> ghes_ack_error(ghes)
>>>>> ...
>>>>>
>>>>>       if (!is_hest_type_generic_v2(ghes))
>>>>>               return 0;
>>>>>
>>>>> and simplify the two callsites :)
>>>>
>>>> Great idea! ...
>>>>
>>>> .. huh. Turns out for ghes_proc() we discard any errors other than ENOENT from
>>>> ghes_read_estatus() if is_hest_type_generic_v2(). This masks EIO.
>>>>
>>>> Most of the error sources discard the result, the worst thing I can find is
>>>> ghes_irq_func() will return IRQ_HANDLED, instead of IRQ_NONE when we didn't
>>>> really handle the IRQ. They're registered as SHARED, but I don't have an example
>>>> of what goes wrong next.
>>>>
>>>> I think this will also stop the spurious handling code kicking in to shut it up
>>>> if its broken and screaming. Unlikely, but not impossible.

[....]

>>> Looks good to me, I guess there's no harm in acking invalid error status blocks.

Great, I didn't miss something nasty...


>> Err, why?
> 
> If ghes_read_estatus() fails, then either there was no error populated or the
> error status block was invalid.
> If the error status block is invalid, then the kernel doesn't know what happened
> in hardware.

What do we mean by 'hardware' here? We're receiving a corrupt report of
something via memory.
The GHESv2 ack just means we're done with the memory. I think it exists because
the external-agent can't peek into the CPU to see if its returned from the
notification.


> I originally thought this was changing what's acked, but it's just changing the
> return value of ghes_proc() when ghes_read_estatus() returns -EIO.

Sorry, that will be due to my bad description.


>> I don't know what the firmware glue does on ARM but if I'd have to
>> remain logical - which is hard to do with firmware - the proper thing to
>> do would be this:
>>
>>         rc = ghes_read_estatus(ghes, &buf_paddr);
>>         if (rc) {
>>                 ghes_reset_hardware();
> 
> The kernel would have no way of knowing what to do here.

Is there anything wrong with what we do today? We stamp on the records so that
we don't processes them again. (especially if is polled), and we tell firmware
it can re-use this memory.

(I think we should return an error, or print a ratelimited warning for corrupt
records)


>>         }
>>
>>         /* clear estatus and bla bla */
>>
>>         /* Now, I'm in the success case: */
>>          ghes_ack_error();
>>
>>
>> This way, you have the error path clear of something unexpected happened
>> when reading the hardware, obvious and separated. ghes_reset_hardware()
>> clears the registers and does the necessary steps to put the hardware in
>> good state again so that it can report the next error.
>>
>> And the success path simply acks the error and does possibly the same
>> thing. The naming of the functions is important though, to denote what
>> gets called when.

I think this duplicates the record-stamping/acking. If there is anything in that
memory region, the action for processed/copied/ignored-because-its-corrupt is
the same.

We can return on ENOENT out earlier, as nothing needs doing in that case. Its
what the GHES_TO_CLEAR spaghetti is for, we can probably move the ack thing into
ghes_clear_estatus(), that way that thing means 'I'm done with this memory'.

Something like:
-------------------------
rc = ghes_read_estatus();
if (rc == -ENOENT)
	return 0;

if (!rc) {
	ghes_do_proc() and friends;
}

ghes_clear_estatus();

return rc;
-------------------------

We would no longer return errors from the ack code, I suspect that can only
happen for a corrupt gas, which we would have caught earlier as we rely on the
mapping being cached.



Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2019-01-11 18:09 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-03 18:05 [PATCH v7 00/25] APEI in_nmi() rework and SDEI wire-up James Morse
2018-12-03 18:05 ` [PATCH v7 01/25] ACPI / APEI: Don't wait to serialise with oops messages when panic()ing James Morse
2018-12-03 18:05 ` [PATCH v7 02/25] ACPI / APEI: Remove silent flag from ghes_read_estatus() James Morse
2018-12-04 11:36   ` Borislav Petkov
2018-12-03 18:05 ` [PATCH v7 03/25] ACPI / APEI: Switch estatus pool to use vmalloc memory James Morse
2018-12-04 13:01   ` Borislav Petkov
2018-12-03 18:05 ` [PATCH v7 04/25] ACPI / APEI: Make hest.c manage the estatus memory pool James Morse
2018-12-11 16:48   ` Borislav Petkov
2018-12-14 13:56     ` James Morse
2018-12-19 14:42       ` Borislav Petkov
2019-01-10 18:20         ` James Morse
2018-12-03 18:05 ` [PATCH v7 05/25] ACPI / APEI: Make estatus pool allocation a static size James Morse
2018-12-11 16:54   ` Borislav Petkov
2018-12-03 18:05 ` [PATCH v7 06/25] ACPI / APEI: Don't store CPER records physical address in struct ghes James Morse
2018-12-11 17:04   ` Borislav Petkov
2018-12-03 18:05 ` [PATCH v7 07/25] ACPI / APEI: Remove spurious GHES_TO_CLEAR check James Morse
2018-12-11 17:18   ` Borislav Petkov
2018-12-03 18:05 ` [PATCH v7 08/25] ACPI / APEI: Don't update struct ghes' flags in read/clear estatus James Morse
2018-12-03 18:05 ` [PATCH v7 09/25] ACPI / APEI: Generalise the estatus queue's notify code James Morse
2018-12-11 17:44   ` Borislav Petkov
2019-01-10 18:21     ` James Morse
2019-01-11 11:46       ` Borislav Petkov
2018-12-03 18:05 ` [PATCH v7 10/25] ACPI / APEI: Tell firmware the estatus queue consumed the records James Morse
2018-12-11 18:36   ` Borislav Petkov
2019-01-10 18:22     ` James Morse
2019-01-10 21:01       ` Tyler Baicar
2019-01-11 12:03         ` Borislav Petkov
2019-01-11 15:32           ` Tyler Baicar
2019-01-11 17:45             ` Borislav Petkov
2019-01-11 18:25               ` James Morse
2019-01-11 19:58                 ` Borislav Petkov
2019-01-23 18:36                   ` James Morse
2019-01-29 11:49                     ` Borislav Petkov
2019-01-29 18:48                       ` James Morse
2019-01-31 13:29                         ` Borislav Petkov
2019-01-11 18:09             ` James Morse [this message]
2019-01-11 20:01               ` Borislav Petkov
2019-01-11 20:53               ` Tyler Baicar
2019-01-29 18:48                 ` James Morse
2018-12-03 18:05 ` [PATCH v7 11/25] ACPI / APEI: Move NOTIFY_SEA between the estatus-queue and NOTIFY_NMI James Morse
2019-01-21 13:01   ` Borislav Petkov
2018-12-03 18:06 ` [PATCH v7 12/25] ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue James Morse
2018-12-03 18:06 ` [PATCH v7 13/25] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing James Morse
2018-12-06 16:17   ` Catalin Marinas
2018-12-03 18:06 ` [PATCH v7 14/25] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface James Morse
2018-12-06 16:17   ` Catalin Marinas
2018-12-03 18:06 ` [PATCH v7 15/25] ACPI / APEI: Move locking to the notification helper James Morse
2018-12-03 18:06 ` [PATCH v7 16/25] ACPI / APEI: Let the notification helper specify the fixmap slot James Morse
2018-12-03 18:06 ` [PATCH v7 17/25] ACPI / APEI: Pass ghes and estatus separately to avoid a later copy James Morse
2019-01-21 13:35   ` Borislav Petkov
2018-12-03 18:06 ` [PATCH v7 18/25] ACPI / APEI: Split ghes_read_estatus() to allow a peek at the CPER length James Morse
2019-01-21 13:53   ` Borislav Petkov
2018-12-03 18:06 ` [PATCH v7 19/25] ACPI / APEI: Only use queued estatus entry during _in_nmi_notify_one() James Morse
2019-01-21 17:19   ` Borislav Petkov
2018-12-03 18:06 ` [PATCH v7 20/25] ACPI / APEI: Use separate fixmap pages for arm64 NMI-like notifications James Morse
2019-01-21 17:27   ` Borislav Petkov
2019-01-23 18:33     ` James Morse
2019-01-31 13:38       ` Borislav Petkov
2018-12-03 18:06 ` [PATCH v7 21/25] mm/memory-failure: Add memory_failure_queue_kick() James Morse
2018-12-03 18:06 ` [PATCH v7 22/25] ACPI / APEI: Kick the memory_failure() queue for synchronous errors James Morse
2018-12-05  2:02   ` Xie XiuQi
2018-12-10 19:15     ` James Morse
2019-01-22 10:51       ` Borislav Petkov
2019-01-23 18:37         ` James Morse
2019-01-21 17:58   ` Borislav Petkov
2019-01-23 18:40     ` James Morse
2019-01-31 14:04       ` Borislav Petkov
2018-12-03 18:06 ` [PATCH v7 23/25] arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work James Morse
2018-12-06 16:18   ` Catalin Marinas
2018-12-03 18:06 ` [PATCH v7 24/25] firmware: arm_sdei: Add ACPI GHES registration helper James Morse
2018-12-06 16:18   ` Catalin Marinas
2018-12-03 18:06 ` [PATCH v7 25/25] ACPI / APEI: Add support for the SDEI GHES Notification type James Morse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0939c14d-de58-f21d-57a6-89bdce3bcb44@arm.com \
    --to=james.morse@arm.com \
    --cc=baicar.tyler@gmail.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=christoffer.dall@arm.com \
    --cc=gengdongjiu@huawei.com \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=marc.zyngier@arm.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=rjw@rjwysocki.net \
    --cc=tony.luck@intel.com \
    --cc=will.deacon@arm.com \
    --cc=wufan@codeaurora.org \
    --cc=xiexiuqi@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).