From: Marc Zyngier <maz@kernel.org>
To: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: kvm@vger.kernel.org, kernel-team@android.com,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 2/5] KVM: arm64: Work around GICv3 locally generated SErrors
Date: Mon, 04 Oct 2021 14:25:12 +0100
Message-ID: <8735pgsz0n.wl-maz@kernel.org>
In-Reply-To: <6e50193e-95c4-e1fa-8287-1b909a714ebd@arm.com>

Hi Alex,

On Mon, 04 Oct 2021 12:23:41 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> On 9/24/21 09:25, Marc Zyngier wrote:
> > The infamous M1 has a feature nobody else ever implemented,
> > in the form of the "GIC locally generated SError interrupts",
> > also known as SEIS for short.
> >
> > These SErrors are generated when a guest does something that violates
> > the GIC state machine. It would have been simpler to just *ignore*
> > the damned thing, but that's not what this HW does. Oh well.
> >
> > This part of the architecture is also amazingly under-specified.
> > There is a whole 10 lines that describe the feature in a spec that
> > is 930 pages long, and some of these lines are factually wrong.
> > Oh, and it is deprecated, so the incentive to clarify it is low.
> >
> > Now, the spec says that this should be a *virtual* SError when
> > HCR_EL2.AMO is set. As it turns out, that's not always the case
> > on this CPU, and the SError sometimes fires on the host as a
> > physical SError. Goodbye, cruel world. This clearly is a HW bug,
> > and it means that a guest can easily take the host down, on demand.
> >
> > Thankfully, we have seen systems that were just as broken in the
> > past, and we have the perfect vaccine for it.
> >
> > Apple M1, please meet the Cavium ThunderX workaround. All your
> > GIC accesses will be trapped, sanitised, and emulated. Only the
> > signalling aspect of the HW will be used. It won't be super speedy,
> > but it will at least be safe. You're most welcome.
> >
> > Given that this has only ever been seen on this single implementation,
> > that the spec is unclear at best and that we cannot trust it to ever
> > be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS
> > being set.
> 
> I grepped for "system error" in Arm IHI 0069F, and it turns out there are
> a number of ways to make the GIC generate one:
> 
> - When programming the ITS
> 
> - On a write to ICC_DIR_EL1 (or the corresponding virtual CPU interface register)
> when split priority drop/interrupt deactivation is not enabled.
> 
> - On a write to GICV_AEOIR or GICC_DIR.
> 
> The ITS and the legacy GICv2 interface are memory mapped, so I am going
> to trust that KVM emulates that correctly and avoids putting the GIC
> into a state that triggers the SErrors.

And to be clear, if the host kernel was doing the wrong thing, it
would take a *physical* SError. And on the M1, it really doesn't
matter as there is no physical GIC.

> The CPU interface registers are accessed directly by the guest, so
> changing that to trap-and-emulate looks like the only way to prevent
> the guest from crashing the host with an SError.
> 
> As for making the trap-and-emulate depend on the ICH_VTR_EL2.SEIS
> being set, that sounds reasonable to me, considering that there were
> no reports so far of this being implemented. And if it turns out
> that there are devices which implement GIC generated SErrors
> *correctly* and the trap-and-emulate cost is too much, then we can
> always get an errata number from Apple and have the trapping depend
> on that, right?

I have very little hope that we can get Apple to give us anything
here. The CPU doesn't even advertise that it has a vGIC, so we're in
uncharted territory. But we could definitely key that on the MIDR.
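
For illustration only, the two gating options discussed here (ICH_VTR_EL2.SEIS
alone, or SEIS plus a MIDR match) could be sketched roughly as below. This is
not the code from the patch: the helper names are hypothetical, the register
values are passed in as plain integers instead of being read at EL2, and the
Apple implementer and M1 part numbers are assumptions based on the values
commonly used for these cores.

```c
#include <stdbool.h>
#include <stdint.h>

/* ICH_VTR_EL2.SEIS is bit 22 in the GICv3 architecture spec (Arm IHI 0069). */
#define ICH_VTR_SEIS_SHIFT	22
#define ICH_VTR_SEIS_MASK	(UINT64_C(1) << ICH_VTR_SEIS_SHIFT)

/* MIDR_EL1 layout: Implementer[31:24], PartNum[15:4]. */
#define MIDR_IMPLEMENTER(midr)	(((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfffu)

/* Assumed values for the Apple M1 cores (not taken from this thread). */
#define IMPLEMENTER_APPLE	0x61u
#define PART_M1_ICESTORM	0x022u
#define PART_M1_FIRESTORM	0x023u

/* True if the CPU interface advertises locally generated SErrors. */
static bool vtr_has_seis(uint64_t ich_vtr)
{
	return (ich_vtr & ICH_VTR_SEIS_MASK) != 0;
}

/* True if the MIDR value identifies one of the (assumed) M1 core types. */
static bool midr_is_apple_m1(uint32_t midr)
{
	uint32_t part = MIDR_PARTNUM(midr);

	return MIDR_IMPLEMENTER(midr) == IMPLEMENTER_APPLE &&
	       (part == PART_M1_ICESTORM || part == PART_M1_FIRESTORM);
}

/* Option 1: gate trap-and-emulate solely on SEIS being set. */
static bool trap_cpuif_on_seis(uint64_t ich_vtr)
{
	return vtr_has_seis(ich_vtr);
}

/* Option 2: additionally restrict the workaround to known-broken CPUs. */
static bool trap_cpuif_on_seis_and_midr(uint64_t ich_vtr, uint32_t midr)
{
	return vtr_has_seis(ich_vtr) && midr_is_apple_m1(midr);
}
```

With option 1, any implementation that sets SEIS pays the trap-and-emulate
cost; option 2 would only matter if a part ever implemented the feature
correctly.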

> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks!

	M.

-- 
Without deviation from the norm, progress is not possible.