From: Oliver Upton <oupton@google.com>
To: Gavin Shan <gshan@redhat.com>
Cc: kvmarm@lists.cs.columbia.edu, maz@kernel.org,
	linux-kernel@vger.kernel.org, eauger@redhat.com,
	shan.gavin@gmail.com, Jonathan.Cameron@huawei.com,
	pbonzini@redhat.com, vkuznets@redhat.com, will@kernel.org
Subject: Re: [PATCH v5 02/22] KVM: arm64: Add SDEI virtualization infrastructure
Date: Thu, 24 Mar 2022 09:04:40 +0000	[thread overview]
Message-ID: <Yjw0KFZ+DG0A1KxK@google.com> (raw)
In-Reply-To: <4d4e5645-4443-c233-6d25-97e68d804512@redhat.com>

On Thu, Mar 24, 2022 at 02:54:00PM +0800, Gavin Shan wrote:
> > > +#define KVM_SDEI_DEFAULT_EVENT	0
> > > +
> > > +#define KVM_SDEI_MAX_VCPUS	512	/* Aligned to 64 */
> > > +#define KVM_SDEI_MAX_EVENTS	128
> > 
> > I would *strongly* recommend against having these limits. I find the
> > vCPU limit especially concerning, because we're making KVM_MAX_VCPUS
> > ABI, which it definitely is not. Anything that deals with a vCPU should
> > be accessed through a vCPU FD (and thus agnostic to the maximum number
> > of vCPUs) to avoid such a complication.
> > 
> 
> KVM_SDEI_DEFAULT_EVENT corresponds to the software-signaled event.
> As you suggested on PATCH[15/22], we can't assume its usage. I will
> define it as SDEI_SW_SIGNALED_EVENT in uapi/linux/arm_sdei.h.
> 
> KVM_SDEI_MAX_EVENTS will be moved from this header file to
> kvm_sdei.h once static arrays are used to hold the data structures or
> their pointers, as you suggested earlier for this patch (PATCH[02/22]).
> 
> There are two types of SDEI events: shared and private. A private
> event can be registered independently on each vcpu, which means
> the entry point address and argument, corresponding to @ep_address
> and @ep_arg in struct kvm_sdei_registered_event, can differ between
> individual vcpus. However, for a shared event, the registered/enabled
> states and the entry point address and argument are the same on all
> vcpus. KVM_SDEI_MAX_VCPUS was introduced so that the same data
> structure can represent both shared and private events.

You're providing a great deal of abstraction around the SDEI
specification, but I really am unconvinced that KVM needs that. This
series needs to add support for a single SDEI event (event 0) and async
PF to follow. Since we are going to support a static set of events under
KVM I believe a lot of the complexity in this design should fade away.

> > > +struct kvm_sdei_exposed_event_state {
> > > +	__u64	num;
> > > +
> > > +	__u8	type;
> > > +	__u8	signaled;
> > > +	__u8	priority;
> > > +	__u8	padding[5];
> > > +	__u64	notifier;
> > 
> > Wait, isn't this a kernel function pointer!?
> > 
> 
> Yeah, it is a kernel function pointer, used by Async PF to know
> whether the corresponding event has been handled. Async PF can cancel
> a previously injected event for performance reasons. Either Async PF
> or SDEI needs to migrate it. To keep SDEI transparent to Async PF,
> SDEI is responsible for its migration.

But this is a UAPI header, why would we even consider giving userspace a
window into the kernel like this? Couldn't userspace crash the kernel by
writing whatever it wants to this field, knowing that it will call it as
a function pointer?

Security aside, there's no guarantee that a function winds up at the
same address between compiler versions or kernel versions.

Overall, I don't think that userspace should have the ability to add
arbitrary SDEI events. KVM takes ownership of its own vendor-specific
ABI in patch 3/22 by declaring its vendor identifier, so any new events
we support must remain within KVM. There is going to be some state that
will need to be migrated for KVM's SDEI events, that ought to be
surfaced to userspace through the KVM_{GET,SET}_ONE_REG ioctls.

Given that there isn't much userspace involvement to make SDEI
work, do you think it would be possible to drop the proposed UAPI from
your series and work on enabling software-signaled SDEI events within
KVM first? By this I mean a VM running under KVM shouldn't require any
ioctls to make it work.

In so doing, we can discover exactly what the mechanics look like in KVM
and only then talk about the necessary UAPI to migrate state. One piece
of the mechanics that is top of mind which I'd like to see addressed is
the use of linked lists and the preallocations that have been made in
structures. KVM will know how many events exist at compile time, so we
can represent these statically.

> > > +};
> > > +
> > > +struct kvm_sdei_registered_event_state {
> > 
> > You should fold these fields together with kvm_sdei_exposed_event_state
> > into a single 'kvm_sdei_event' structure:
> > 
> 
> @route_mode and @route_affinity can't be configured or modified until
> the event is registered. Besides, they're only valid for shared
> events; private events have no routing requirement. That means those
> two fields would be part of struct kvm_sdei_registered_event instead
> of kvm_sdei_exposed_event.
> 
> 
> > > +	__u64	num;
> > > +
> > > +	__u8	route_mode;
> > > +	__u8	padding[3];
> > > +	__u64	route_affinity;
> > 
> > And these shouldn't be UAPI at the VM scope. Each of these properties
> > could be accessed via a synthetic/'pseudo-firmware' register on a vCPU FD:
> > 
> 
> They're accessed through the vcpu or kvm fd depending on the event
> type. The VM-owned shared events are accessed through the KVM fd, and
> the vcpu-owned private events through the vcpu fd.

Some of the state that you represent in struct kvm_sdei_registered_event_state
is really per-vCPU state. Any time that there's data available at a vCPU
granularity userspace should access it with a vCPU FD.

> I'm not sure I catch the idea of a synthetic register, so let me
> confirm. If I'm correct, you're talking about the "IMPLEMENTATION DEFINED"
> system registers, whose OP0 and CRn are 0b11 and 0b1x11. If two
> implementation-defined registers can be adopted, I don't think we need
> to expose anything through the ABI. All the operations and the needed
> data can be passed through those system registers.

No, I'm not talking about the guest-facing interface. I'm talking about
what gets exposed to userspace to migrate the VM's state. For parts of
the guest that do not map to an architectural construct, we've defined
the concept of a firmware pseudo-register. What that really means is we
expose a register to userspace that is inaccessible from the guest which
migrates KVM's nonarchitected state. We are abusing the fact that VMMs
will save/restore whatever registers are reported on KVM_GET_REG_LIST to
trick them into migrating KVM state, and it has worked quite nicely to
avoid adding new ioctls for new features. It also means that older VMMs
are capable of utilizing new KVM features, so long as they obey the
prescribed rules.

--
Thanks,
Oliver
