kvm.vger.kernel.org archive mirror
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Elena Afanasova <eafanasova@gmail.com>
Cc: kvm@vger.kernel.org, jag.raman@oracle.com, elena.ufimtseva@oracle.com
Subject: Re: [RFC v2 2/4] KVM: x86: add support for ioregionfd signal handling
Date: Sat, 30 Jan 2021 16:58:59 +0000
Message-ID: <20210130165859.GC98016@stefanha-x1.localdomain>
In-Reply-To: <aa049c6e5bade3565c5ffa820bbbb67bd5d1bf4b.1611850291.git.eafanasova@gmail.com>

On Thu, Jan 28, 2021 at 09:32:21PM +0300, Elena Afanasova wrote:
> The vCPU thread may receive a signal during ioregionfd communication,
> ioctl(KVM_RUN) needs to return to userspace and then ioctl(KVM_RUN)
> must resume ioregionfd.
> 
> Signed-off-by: Elena Afanasova <eafanasova@gmail.com>
> ---
> Changes in v2:
>   - add support for x86 signal handling
>   - changes after code review
> 
>  arch/x86/kvm/x86.c            | 196 +++++++++++++++++++++++++++++++---
>  include/linux/kvm_host.h      |  13 +++
>  include/uapi/linux/ioregion.h |  32 ++++++
>  virt/kvm/ioregion.c           | 177 +++++++++++++++++++++++++++++-
>  virt/kvm/kvm_main.c           |  16 ++-
>  5 files changed, 415 insertions(+), 19 deletions(-)
>  create mode 100644 include/uapi/linux/ioregion.h
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ddb28f5ca252..a04516b531da 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5799,19 +5799,33 @@ static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
>  {
>  	int handled = 0;
>  	int n;
> +	int ret = 0;
> +	bool is_apic;
>  
>  	do {
>  		n = min(len, 8);
> -		if (!(lapic_in_kernel(vcpu) &&
> -		      !kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, addr, n, v))
> -		    && kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, n, v))
> -			break;
> +		is_apic = lapic_in_kernel(vcpu) &&
> +			  !kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev,
> +					      addr, n, v);
> +		if (!is_apic) {
> +			ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS,
> +					       addr, n, v);
> +			if (ret)
> +				break;
> +		}
>  		handled += n;
>  		addr += n;
>  		len -= n;
>  		v += n;
>  	} while (len);
>  
> +#ifdef CONFIG_KVM_IOREGION
> +	if (ret == -EINTR) {
> +		vcpu->run->exit_reason = KVM_EXIT_INTR;
> +		++vcpu->stat.signal_exits;
> +	}
> +#endif
> +
>  	return handled;
>  }

Accesses that cross a page boundary need special care. For an 8-byte
access split into two 4-byte halves, the cases are:
1. ioregion in the first 4 bytes (page 1) but not the second 4 bytes (page 2).
2. ioregion in the second 4 bytes (page 2) but not the first 4 bytes (page 1).
3. The first 4 bytes (page 1) in one ioregion and the second 4 bytes (page 2) in another ioregion.
4. The first 4 bytes (page 1) in one ioregion and the second 4 bytes (page 2) in the same ioregion.

Cases 3 and 4 are tricky. If I'm reading the code correctly, we try the
ioregion access twice, even if the first attempt returns -EINTR?
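
To make the concern concrete, here is a minimal userspace sketch (not the
x86.c emulator code) of a write that is split at a page boundary and stops
dispatching as soon as one fragment fails. The names write_across_pages(),
frag_write_ok() and frag_write_intr() are hypothetical and exist only for
illustration; the mocks stand in for the ioregion write path.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical per-fragment dispatch; stands in for the ioregion write path. */
typedef int (*frag_write_fn)(uint64_t addr, const uint8_t *val, size_t len);

static int frag_calls; /* number of fragments dispatched so far */

/* Mock device that always completes the access. */
static int frag_write_ok(uint64_t addr, const uint8_t *val, size_t len)
{
	(void)addr; (void)val; (void)len;
	frag_calls++;
	return 0;
}

/* Mock device whose access is interrupted by a signal. */
static int frag_write_intr(uint64_t addr, const uint8_t *val, size_t len)
{
	(void)addr; (void)val; (void)len;
	frag_calls++;
	return -EINTR;
}

/*
 * Write len bytes at addr, one fragment per page.
 * Returns the number of bytes handled; *err receives the first error.
 * No further fragment is dispatched once a fragment has failed.
 */
static size_t write_across_pages(uint64_t addr, const uint8_t *val, size_t len,
				 frag_write_fn fn, int *err)
{
	size_t handled = 0;

	*err = 0;
	while (len) {
		size_t room = SKETCH_PAGE_SIZE -
			      (size_t)(addr & (SKETCH_PAGE_SIZE - 1));
		size_t n = len < room ? len : room;
		int ret = fn(addr, val, n);

		if (ret) {
			*err = ret;
			break;
		}
		handled += n;
		addr += n;
		val += n;
		len -= n;
	}
	return handled;
}
```

With an 8-byte write at 0xffc, the successful mock sees two fragments of 4
bytes each, while the interrupted mock sees exactly one: the second half is
not attempted after -EINTR.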

> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7cd667dddba9..5cfdecfca6db 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -318,6 +318,19 @@ struct kvm_vcpu {
>  #endif
>  	bool preempted;
>  	bool ready;
> +#ifdef CONFIG_KVM_IOREGION
> +	bool ioregion_interrupted;

Can this field move into ioregion_ctx?

> +	struct {
> +		struct kvm_io_device *dev;
> +		int pio;
> +		void *val;
> +		u8 state;
> +		u64 addr;
> +		int len;
> +		u64 data;
> +		bool in;
> +	} ioregion_ctx;

This struct can be reordered to remove holes between fields.
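
A rough sketch of what I mean, written as standalone C with stdint types in
place of the kernel's u8/u64. The struct names and the "interrupted" member
are hypothetical; the second layout is only one possible ordering (widest
fields first), and it also folds the flag from the previous comment into the
context struct.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct kvm_io_device; /* opaque here; only pointers are stored */

/* Field order as posted in the patch. */
struct ioregion_ctx_posted {
	struct kvm_io_device *dev;
	int pio;
	void *val;
	uint8_t state;
	uint64_t addr;
	int len;
	uint64_t data;
	bool in;
};

/* One possible reordering: widest fields first, small fields grouped last. */
struct ioregion_ctx_packed {
	struct kvm_io_device *dev;
	void *val;
	uint64_t addr;
	uint64_t data;
	int pio;
	int len;
	uint8_t state;
	bool in;
	bool interrupted; /* hypothetical home for ioregion_interrupted */
};

/* Bytes saved by the reordering on the target this is compiled for. */
static size_t ioregion_ctx_bytes_saved(void)
{
	return sizeof(struct ioregion_ctx_posted) -
	       sizeof(struct ioregion_ctx_packed);
}
```

On a typical LP64 target with 8-byte alignment for 64-bit fields, the posted
layout should occupy 64 bytes and the reordered one 48 bytes, even with the
extra flag. Running pahole on the built object is the usual way to confirm
the real kernel layout.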

> +#endif
>  	struct kvm_vcpu_arch arch;
>  };
>  
> diff --git a/include/uapi/linux/ioregion.h b/include/uapi/linux/ioregion.h
> new file mode 100644
> index 000000000000..7898c01f84a1
> --- /dev/null
> +++ b/include/uapi/linux/ioregion.h
> @@ -0,0 +1,32 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */

To encourage people to implement the wire protocol even beyond the Linux
syscall environment (e.g. in other hypervisors and VMMs) you could make
the license more permissive:

  /* SPDX-License-Identifier: ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-3-Clause) */

Several other <linux/*.h> files do this so that the header can be used
outside Linux without license concerns.

Here is the BSD 3-Clause license:
https://opensource.org/licenses/BSD-3-Clause

> +#ifndef _UAPI_LINUX_IOREGION_H
> +#define _UAPI_LINUX_IOREGION_H

Please add the wire protocol specification/documentation into this file.
That way this header file will serve as a comprehensive reference for
the protocol and changes to the header will also update the
documentation.

(The KVM_SET_IOREGION ioctl parts belong in
Documentation/virt/kvm/api.rst but the wire protocol should be in this
header file instead.)

> +
> +/* Wire protocol */
> +struct ioregionfd_cmd {
> +	__u32 info;
> +	__u32 padding;
> +	__u64 user_data;
> +	__u64 offset;
> +	__u64 data;
> +};
> +
> +struct ioregionfd_resp {
> +	__u64 data;
> +	__u8 pad[24];
> +};
> +
> +#define IOREGIONFD_CMD_READ    0
> +#define IOREGIONFD_CMD_WRITE   1
> +
> +#define IOREGIONFD_SIZE_8BIT   0
> +#define IOREGIONFD_SIZE_16BIT  1
> +#define IOREGIONFD_SIZE_32BIT  2
> +#define IOREGIONFD_SIZE_64BIT  3

It's possible that larger read/write operations will be needed in the
future. For example, the PCI Express bus supports much larger
transactions than just 64 bits.

You don't need to address this right now but I wanted to mention it.
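
For illustration only, a small sketch of how the four size codes map to
access widths and why the 64-bit data field is the current ceiling. The
helpers ioregionfd_size_bytes() and ioregionfd_size_fits() are hypothetical
and not part of the patch; the constants are copied from the header above.

```c
#include <stdbool.h>
#include <stdint.h>

#define IOREGIONFD_SIZE_8BIT   0
#define IOREGIONFD_SIZE_16BIT  1
#define IOREGIONFD_SIZE_32BIT  2
#define IOREGIONFD_SIZE_64BIT  3

/* Access width in bytes for a size code: code n means (1 << n) bytes. */
static inline unsigned int ioregionfd_size_bytes(unsigned int size_code)
{
	return 1u << size_code;
}

/* True if an access of this size code fits in the 64-bit data field. */
static inline bool ioregionfd_size_fits(unsigned int size_code)
{
	return ioregionfd_size_bytes(size_code) <= sizeof(uint64_t);
}
```

A hypothetical code 4 (128-bit) would already fail ioregionfd_size_fits(),
so larger transactions would need a different way to carry the payload.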

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 88b92fc3da51..df387857f51f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4193,6 +4193,7 @@ static int __kvm_io_bus_write(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
>  			      struct kvm_io_range *range, const void *val)
>  {
>  	int idx;
> +	int ret = 0;
>  
>  	idx = kvm_io_bus_get_first_dev(bus, range->addr, range->len);
>  	if (idx < 0)
> @@ -4200,9 +4201,12 @@ static int __kvm_io_bus_write(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
>  
>  	while (idx < bus->dev_count &&
>  		kvm_io_bus_cmp(range, &bus->range[idx]) == 0) {
> -		if (!kvm_iodevice_write(vcpu, bus->range[idx].dev, range->addr,
> -					range->len, val))
> +		ret = kvm_iodevice_write(vcpu, bus->range[idx].dev, range->addr,
> +					 range->len, val);
> +		if (!ret)
>  			return idx;
> +		if (ret < 0 && ret != -EOPNOTSUPP)
> +			return ret;

I audited all kvm_io_bus_read/write() callers to check that it's safe to
add error return values besides -EOPNOTSUPP. Extending the meaning of
the return value is fine but any arches that want to support ioregionfd
need to explicitly handle -EINTR return values now. Only x86 does after
this patch.
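
As a sketch of what "explicitly handle" could look like in a caller, assuming
the return convention discussed here (0 when an in-kernel device handled the
access, -EINTR when a signal interrupted ioregionfd, anything else treated as
unhandled). The enum and classify_io_bus_ret() are hypothetical names used
only for illustration.

```c
#include <errno.h>

enum io_bus_outcome {
	IO_BUS_HANDLED,     /* completed in the kernel */
	IO_BUS_INTERRUPTED, /* return to userspace, resume on next KVM_RUN */
	IO_BUS_UNHANDLED,   /* fall back to the normal userspace exit */
};

/* Map a kvm_io_bus_read/write() style return value to a caller action. */
static enum io_bus_outcome classify_io_bus_ret(int ret)
{
	if (ret == 0)
		return IO_BUS_HANDLED;
	if (ret == -EINTR)
		return IO_BUS_INTERRUPTED;
	/* -EOPNOTSUPP and any other error: no in-kernel completion. */
	return IO_BUS_UNHANDLED;
}
```

A caller that only tests for non-zero would lump -EINTR together with
-EOPNOTSUPP and take the unhandled path, which is why each architecture
needs its own explicit check.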
