From: James Morse <james.morse@arm.com>
To: Borislav Petkov <bp@alien8.de>
Cc: jonathan.zhang@cavium.com, Rafael Wysocki <rjw@rjwysocki.net>,
	Tony Luck <tony.luck@intel.com>,
	linux-mm@kvack.org, Marc Zyngier <marc.zyngier@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Tyler Baicar <tbaicar@codeaurora.org>,
	Will Deacon <will.deacon@arm.com>,
	Dongjiu Geng <gengdongjiu@huawei.com>,
	linux-acpi@vger.kernel.org, Punit Agrawal <punit.agrawal@arm.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org, Len Brown <lenb@kernel.org>
Subject: Re: [PATCH v6 00/18] APEI in_nmi() rework
Date: Wed, 3 Oct 2018 18:50:38 +0100
Message-ID: <c04d1b78-122b-d7f2-5a75-3d9c56386b11@arm.com>
In-Reply-To: <20180925124526.GD23986@zn.tnic>

Hi Boris,

On 25/09/18 13:45, Borislav Petkov wrote:
> On Fri, Sep 21, 2018 at 11:16:47PM +0100, James Morse wrote:
>> Hello,
>>
>> The GHES driver has collected quite a few bugs:
>>
>> ghes_proc() at ghes_probe() time can be interrupted by an NMI that
>> will clobber the ghes->estatus fields, flags, and the buffer_paddr.
>>
>> ghes_copy_tofrom_phys() uses in_nmi() to decide which path to take. arm64's
>> SEA taking both paths, depending on what it interrupted.
>>
>> There is no guarantee that queued memory_failure() errors will be processed
>> before this CPU returns to user-space.
>>
>> x86 can't TLBI from interrupt-masked code which this driver does all the
>> time.
>>
>>
>> This series aims to fix the first three, with an eye to fixing the
>> last one with a follow-up series.
>>
>> Previous postings included the SDEI notification calls, which I haven't
>> finished re-testing. This series is big enough as it is.

> Yeah, and everywhere I look, this thing looks overengineered. Like,
> for example, what's the purpose of this ghes_esource_prealloc_size()
> computing a size each time the pool changes size?

The size to grow the pool by, because each error-source described by a GHES
entry has its own worst-case size.

Today ghes_nmi_add() does this each time it's called. You could have multiple
GHES entries in the HEST that describe NMI as the notification. The worst-case
size for the records is described in the GHES entry, and could be different for
each one (error_block_length and records_to_preallocate; see Table 18-379 of
the ACPI v6.2 specification).

These different error-sources could be delivered on different CPUs at the same
time, so they need their own pre-allocated reserved memory. ghes_notify_nmi()'s
atomic_add_unless() suggests this can happen on x86, but I don't know the
arch-specifics. It definitely can happen on arm64.
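
To make the per-source sizing concrete, here is a rough userspace sketch of the
calculation. The struct, the macro and the function names are stand-ins made up
for illustration, not the driver's code. Treating a records_to_preallocate of
zero as one is also an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two GHES entry fields mentioned above. */
struct ghes_entry_sketch {
	uint32_t error_block_length;     /* worst-case size of one record */
	uint32_t records_to_preallocate; /* records firmware asks us to reserve */
};

/* The 64K cap on the per-source size; the macro name is made up. */
#define SKETCH_PREALLOC_CAP (64u * 1024u)

/* Bytes to grow the pool by when this error source is registered. */
static uint32_t sketch_prealloc_size(const struct ghes_entry_sketch *g)
{
	/* Assumption: reserve room for at least one record. */
	uint64_t records = g->records_to_preallocate ?
			   g->records_to_preallocate : 1;
	/* Widen before multiplying so the product cannot wrap. */
	uint64_t size = (uint64_t)g->error_block_length * records;

	return size > SKETCH_PREALLOC_CAP ? SKETCH_PREALLOC_CAP : (uint32_t)size;
}
```

Two entries with different error_block_length values give different answers,
which is why the value is worked out per error source when it is added rather
than once for the whole driver.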


> AFAICT, this size can be computed exactly *once* at driver init and be
> done with it. Right?

We could do two passes of the HEST to pre-compute the total size of this
estatus-queue memory, allocate it, then do the notification registration stuff.
But this doesn't really work with the way this driver acts as a platform driver
for a ghes device...
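
For reference, the first of those two passes would look something like the
sketch below. This is only an illustration of the idea: the table layout, the
enum and every name in it are hypothetical, and it ignores how the real HEST is
walked. It sums the capped worst-case size of the NMI-notified entries only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical notification types; only the NMI/non-NMI split matters here. */
enum sketch_notify { SKETCH_NOTIFY_IRQ, SKETCH_NOTIFY_NMI };

/* Hypothetical, flattened view of one GHES entry in the HEST. */
struct sketch_hest_entry {
	enum sketch_notify notify;
	uint32_t error_block_length;
	uint32_t records_to_preallocate;
};

#define SKETCH_CAP (64u * 1024u) /* per-source cap; the name is made up */

/* Capped worst-case size for one entry (assumes at least one record). */
static uint32_t sketch_entry_size(const struct sketch_hest_entry *e)
{
	uint64_t records = e->records_to_preallocate ?
			   e->records_to_preallocate : 1;
	uint64_t size = (uint64_t)e->error_block_length * records;

	return size > SKETCH_CAP ? SKETCH_CAP : (uint32_t)size;
}

/* Pass one: total NMI-safe memory needed, skipping irq-notified entries. */
static size_t sketch_total_nmi_pool(const struct sketch_hest_entry *tbl,
				    size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		if (tbl[i].notify == SKETCH_NOTIFY_NMI)
			total += sketch_entry_size(&tbl[i]);
	return total;
}
```

The second pass would then allocate that total and register the notifications,
which is the part that sits awkwardly with one probe call per ghes device.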

The non-ghes HEST entries have a "number of records to pre-allocate" too. We
could make this memory pool something hest.c looks after, but I can't see
whether the other error sources use those values.

Hmmm,
The size is capped at 64K, so we could ignore the firmware description of the
memory requirements and allocate SZ_64K each time. Doing it per-GHES is still
the only way to avoid allocating nmi-safe memory for irqs.


Thanks,

James


