From: James Morse <james.morse@arm.com>
To: gengdongjiu <gengdj.1984@gmail.com>
Cc: wuquanming <wuquanming@huawei.com>,
	linux-acpi@vger.kernel.org, kvm@vger.kernel.org,
	linux-doc@vger.kernel.org, Marc Zyngier <marc.zyngier@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Jonathan Corbet <corbet@lwn.net>,
	rjw@rjwysocki.net, linux@armlinux.org.uk,
	gengdongjiu <gengdongjiu@huawei.com>,
	linuxarm@huawei.com, bp@alien8.de,
	arm-mail-list <linux-arm-kernel@lists.infradead.org>,
	pbonzini@redhat.com, Huangshaoyu <huangshaoyu@huawei.com>,
	kvmarm@lists.cs.columbia.edu,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	devel@acpica.org
Subject: Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
Date: Mon, 22 Jan 2018 19:32:12 +0000
Message-ID: <5A663C3C.7040904@arm.com>
In-Reply-To: <CAMj-D2C5Bz3r-w9v3QasZKK-W2JBY7Mx9W=yCDBnPYyk99gGzA@mail.gmail.com>

Hi gengdongjiu,

On 21/01/18 02:45, gengdongjiu wrote:
> For the ESR_ELx_AET_UER, this exception is precise, closing the VM may
> be better[1].
> But if you think panic is better until we support kernel-first, it is
> also OK to me.

I'm not convinced that an SError taken while a guest was running means only guest
memory could be affected. Mechanisms like KSM mean the error could affect
multiple guests.

Both firmware-first and kernel-first will give us the address, with which we can
work out which processes are affected, isolate the memory and signal the affected
processes.

Until we have one of these, panic() is the only way we have to contain an error,
but it's an interim fix.
Not panic()ing the host for an error that should be contained to the guest is a
fudge: we don't actually know it's safe (KSM, page-tables etc). I want to improve
on this with {firmware, kernel}-first support (or both!). I don't want to expose
that this is happening to user-space, as once we have one of {firmware,
kernel}-first, it shouldn't happen.


>> This is inventing something new for RAS errors not claimed by firmware-first.
>> If we have kernel-first too, this will never happen. (unless your system is
>> losing the error description).

> In fact, if we have kernel-first, I think we still need to judge the
> error type by ESR, right?

The kernel-first mechanism should consider the ESR/FAR, yes, but once the error
has been claimed and handled, KVM shouldn't care about any of these values.
(maybe we'll sanity check for uncontained errors, just in case the error escaped
to the RAS code...)

My point here was that exposing 'unhandled' (ignored) RAS errors to user-space
creates an ABI: someone will complain once we start handling the error and they
no longer get a notification via this 'unhandled' interface. Code written to use
this interface becomes useless/untested.


> If the handle_guest_sei() , may be the system does not support firmware-first,
> so we judge the ESR value,

...and panic()/ignore as appropriate.

I agree not all systems will support firmware-first (big-endian is the obvious
example), but if we get kernel-first support this ESR guessing can disappear.
I'm against exposing it to user-space in the meantime.
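
To illustrate the "judge the ESR value, then panic()/ignore" fallback discussed
above, here is a minimal user-space sketch in C. It is not the code from this
patch series. The ESR_ELx_* names and bit positions follow my reading of
arch/arm64/include/asm/esr.h, while classify_guest_serror() and the
serror_action enum are hypothetical names made up for this example. It treats
UER as fatal, matching the "panic until we have {firmware, kernel}-first"
position rather than the "close the VM" alternative.

```c
#include <stdint.h>

/* SError syndrome fields; values assumed to match asm/esr.h. */
#define ESR_ELx_IDS         (UINT64_C(1) << 24)  /* implementation-defined syndrome */
#define ESR_ELx_AET_SHIFT   10
#define ESR_ELx_AET         (UINT64_C(0x7) << ESR_ELx_AET_SHIFT)
#define ESR_ELx_AET_UC      (UINT64_C(0) << ESR_ELx_AET_SHIFT)  /* uncontainable */
#define ESR_ELx_AET_UEU     (UINT64_C(1) << ESR_ELx_AET_SHIFT)  /* unrecoverable */
#define ESR_ELx_AET_UEO     (UINT64_C(2) << ESR_ELx_AET_SHIFT)  /* restartable */
#define ESR_ELx_AET_UER     (UINT64_C(3) << ESR_ELx_AET_SHIFT)  /* recoverable */
#define ESR_ELx_AET_CE      (UINT64_C(6) << ESR_ELx_AET_SHIFT)  /* corrected */
#define ESR_ELx_FSC         UINT64_C(0x3F)
#define ESR_ELx_FSC_SERROR  UINT64_C(0x11)

/* Hypothetical result type: the only two outcomes available without an address. */
enum serror_action { SERROR_IGNORE, SERROR_PANIC };

/* Hypothetical helper: decide what to do with an unclaimed guest SError. */
static enum serror_action classify_guest_serror(uint64_t esr)
{
	/* Implementation-defined syndrome: the AET field cannot be trusted. */
	if (esr & ESR_ELx_IDS)
		return SERROR_PANIC;

	/* AET is only meaningful when the fault status code says "SError". */
	if ((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)
		return SERROR_PANIC;

	switch (esr & ESR_ELx_AET) {
	case ESR_ELx_AET_CE:	/* already corrected */
	case ESR_ELx_AET_UEO:	/* not yet consumed, the CPU can make progress */
		return SERROR_IGNORE;
	case ESR_ELx_AET_UEU:
	case ESR_ELx_AET_UER:	/* no address to isolate, so cannot be contained */
	case ESR_ELx_AET_UC:
	default:
		return SERROR_PANIC;
	}
}
```

For example, classify_guest_serror(ESR_ELx_FSC_SERROR | ESR_ELx_AET_UER) yields
SERROR_PANIC in this sketch. With {firmware, kernel}-first support the error
would be claimed with an address before reaching this point, which is why this
kind of guessing is only an interim measure.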


Thanks,

James
