From: Andrew Jones <drjones@redhat.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Peter Xu <peterx@redhat.com>,
	Michael Tsirkin <mst@redhat.com>,
	Julia Suvorova <jsuvorov@redhat.com>,
	Andy Lutomirski <luto@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] KVM: x86: introduce KVM_MEM_PCI_HOLE memory
Date: Wed, 5 Aug 2020 17:18:43 +0200
Message-ID: <20200805151843.yii4ufv7ubc7hqb5@kamzik.brq.redhat.com>
In-Reply-To: <20200728143741.2718593-3-vkuznets@redhat.com>

On Tue, Jul 28, 2020 at 04:37:40PM +0200, Vitaly Kuznetsov wrote:
> PCIe config space can (depending on the configuration) be quite big but
> usually is sparsely populated. Guest may scan it by accessing individual
> device's page which, when device is missing, is supposed to have 'pci
> hole' semantics: reads return '0xff' and writes get discarded. Compared
> to the already existing KVM_MEM_READONLY, VMM doesn't need to allocate
> real memory and stuff it with '0xff'.
> 
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  Documentation/virt/kvm/api.rst  | 19 +++++++++++-----
>  arch/x86/include/uapi/asm/kvm.h |  1 +
>  arch/x86/kvm/mmu/mmu.c          |  5 ++++-
>  arch/x86/kvm/mmu/paging_tmpl.h  |  3 +++
>  arch/x86/kvm/x86.c              | 10 ++++++---
>  include/linux/kvm_host.h        |  7 +++++-
>  include/uapi/linux/kvm.h        |  3 ++-
>  virt/kvm/kvm_main.c             | 39 +++++++++++++++++++++++++++------
>  8 files changed, 68 insertions(+), 19 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 644e5326aa50..fbbf533a331b 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1241,6 +1241,7 @@ yet and must be cleared on entry.
>    /* for kvm_memory_region::flags */
>    #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
>    #define KVM_MEM_READONLY	(1UL << 1)
> +  #define KVM_MEM_PCI_HOLE		(1UL << 2)
>  
>  This ioctl allows the user to create, modify or delete a guest physical
>  memory slot.  Bits 0-15 of "slot" specify the slot id and this value
> @@ -1268,12 +1269,18 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
>  be identical.  This allows large pages in the guest to be backed by large
>  pages in the host.
>  
> -The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
> -KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
> -writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
> -use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
> -to make a new slot read-only.  In this case, writes to this memory will be
> -posted to userspace as KVM_EXIT_MMIO exits.
> +The flags field supports the following flags: KVM_MEM_LOG_DIRTY_PAGES,
> +KVM_MEM_READONLY, KVM_MEM_READONLY:

The second KVM_MEM_READONLY should be KVM_MEM_PCI_HOLE. Or just drop the
list here, as the flags are listed below anyway.

> +- KVM_MEM_LOG_DIRTY_PAGES can be set to instruct KVM to keep track of writes to
> +  memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to use it.
> +- KVM_MEM_READONLY can be set, if KVM_CAP_READONLY_MEM capability allows it,
> +  to make a new slot read-only.  In this case, writes to this memory will be
> +  posted to userspace as KVM_EXIT_MMIO exits.
> +- KVM_MEM_PCI_HOLE can be set, if KVM_CAP_PCI_HOLE_MEM capability allows it,
> +  to create a new virtual read-only slot which will always return '0xff' when
> +  guest reads from it. 'userspace_addr' has to be set to NULL. This flag is
> +  mutually exclusive with KVM_MEM_LOG_DIRTY_PAGES/KVM_MEM_READONLY. All writes
> +  to this memory will be posted to userspace as KVM_EXIT_MMIO exits.

I see two thirds of this text is copy+pasted from above, but how about this:

 - KVM_MEM_LOG_DIRTY_PAGES: log writes.  Use KVM_GET_DIRTY_LOG to retrieve
   the log.
 - KVM_MEM_READONLY: exit to userspace with KVM_EXIT_MMIO on writes.  Only
   available when KVM_CAP_READONLY_MEM is present.
 - KVM_MEM_PCI_HOLE: always return 0xff on reads, exit to userspace with
   KVM_EXIT_MMIO on writes.  Only available when KVM_CAP_PCI_HOLE_MEM is
   present.  When setting the memory region 'userspace_addr' must be NULL.
   This flag is mutually exclusive with KVM_MEM_LOG_DIRTY_PAGES and with
   KVM_MEM_READONLY.
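
To make the constraints in that wording concrete, here is a minimal
userspace sketch of the two rules for the proposed flag (mutual exclusion
and a NULL 'userspace_addr'). It is only an illustration, not code from the
patch: the flag values are copied from the quoted diff, while
'struct region_sketch', check_pci_hole_rules() and
pci_hole_rules_example() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values as they appear in the quoted patch; KVM_MEM_PCI_HOLE is the
 * flag being proposed there, redefined locally for this sketch. */
#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
#define KVM_MEM_READONLY	(1UL << 1)
#define KVM_MEM_PCI_HOLE	(1UL << 2)

/* Hypothetical stand-in holding only the fields the rules look at. */
struct region_sketch {
	uint32_t flags;
	uint64_t userspace_addr;
};

/* Returns 0 when the PCI hole rules hold, -EINVAL otherwise. */
int check_pci_hole_rules(const struct region_sketch *mem)
{
	/* The rules below only constrain PCI hole slots. */
	if (!(mem->flags & KVM_MEM_PCI_HOLE))
		return 0;

	/* Mutually exclusive with READONLY and LOG_DIRTY_PAGES. */
	if (mem->flags & (KVM_MEM_READONLY | KVM_MEM_LOG_DIRTY_PAGES))
		return -EINVAL;

	/* No backing memory: 'userspace_addr' must be NULL. */
	if (mem->userspace_addr)
		return -EINVAL;

	return 0;
}

/* Usage example: one accepted and one rejected combination. */
void pci_hole_rules_example(void)
{
	struct region_sketch hole = { .flags = KVM_MEM_PCI_HOLE };

	assert(check_pci_hole_rules(&hole) == 0);
	hole.flags |= KVM_MEM_READONLY;
	assert(check_pci_hole_rules(&hole) == -EINVAL);
}
```

This mirrors the checks the patch adds to __kvm_set_memory_region() further
down, reduced to plain C so it can be compiled on its own.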

>  
>  When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
>  the memory region are automatically reflected into the guest.  For example, an
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 17c5a038f42d..cf80a26d74f5 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -48,6 +48,7 @@
>  #define __KVM_HAVE_XSAVE
>  #define __KVM_HAVE_XCRS
>  #define __KVM_HAVE_READONLY_MEM
> +#define __KVM_HAVE_PCI_HOLE_MEM
>  
>  /* Architectural interrupt line count. */
>  #define KVM_NR_INTERRUPTS 256
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 8597e8102636..c2e3a1deafdd 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3253,7 +3253,7 @@ static int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
>  		return PG_LEVEL_4K;
>  
>  	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, true);
> -	if (!slot)
> +	if (!slot || (slot->flags & KVM_MEM_PCI_HOLE))
>  		return PG_LEVEL_4K;
>  
>  	max_level = min(max_level, max_page_level);
> @@ -4104,6 +4104,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
>  
>  	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
>  
> +	if (!write && slot && (slot->flags & KVM_MEM_PCI_HOLE))
> +		return RET_PF_EMULATE;
> +
>  	if (try_async_pf(vcpu, slot, prefault, gfn, gpa, &pfn, write,
>  			 &map_writable))
>  		return RET_PF_RETRY;
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 5c6a895f67c3..27abd69e69f6 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -836,6 +836,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
>  
>  	slot = kvm_vcpu_gfn_to_memslot(vcpu, walker.gfn);
>  
> +	if (!write_fault && slot && (slot->flags & KVM_MEM_PCI_HOLE))
> +		return RET_PF_EMULATE;
> +
>  	if (try_async_pf(vcpu, slot, prefault, walker.gfn, addr, &pfn,
>  			 write_fault, &map_writable))
>  		return RET_PF_RETRY;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 95ef62922869..dc312b8bfa05 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3515,6 +3515,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_EXCEPTION_PAYLOAD:
>  	case KVM_CAP_SET_GUEST_DEBUG:
>  	case KVM_CAP_LAST_CPU:
> +	case KVM_CAP_PCI_HOLE_MEM:
>  		r = 1;
>  		break;
>  	case KVM_CAP_SYNC_REGS:
> @@ -10115,9 +10116,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
>  		ugfn = slot->userspace_addr >> PAGE_SHIFT;
>  		/*
>  		 * If the gfn and userspace address are not aligned wrt each
> -		 * other, disable large page support for this slot.
> +		 * other, disable large page support for this slot. Also,
> +		 * disable large page support for KVM_MEM_PCI_HOLE slots.
>  		 */
> -		if ((slot->base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE(level) - 1)) {
> +		if (slot->flags & KVM_MEM_PCI_HOLE || ((slot->base_gfn ^ ugfn) &

Please add () around the first expression, i.e. (slot->flags & KVM_MEM_PCI_HOLE).

> +				      (KVM_PAGES_PER_HPAGE(level) - 1))) {
>  			unsigned long j;
>  
>  			for (j = 0; j < lpages; ++j)
> @@ -10179,7 +10182,8 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
>  	 * Nothing to do for RO slots or CREATE/MOVE/DELETE of a slot.
>  	 * See comments below.
>  	 */
> -	if ((change != KVM_MR_FLAGS_ONLY) || (new->flags & KVM_MEM_READONLY))
> +	if ((change != KVM_MR_FLAGS_ONLY) || (new->flags & KVM_MEM_READONLY) ||
> +	    (new->flags & KVM_MEM_PCI_HOLE))

How about

 if ((change != KVM_MR_FLAGS_ONLY) ||
     (new->flags & (KVM_MEM_READONLY|KVM_MEM_PCI_HOLE)))
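
The single-mask form should behave the same as the two separate tests. A
small standalone sketch of just the flags part of the condition (the
'change' test is left out) can confirm that. The flag values are taken from
the quoted diff; skip_apply_two_tests(), skip_apply_one_mask() and
skip_apply_example() are hypothetical helper names used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the quoted patch, redefined locally for this sketch. */
#define KVM_MEM_READONLY	(1UL << 1)
#define KVM_MEM_PCI_HOLE	(1UL << 2)

/* Form used in the patch: one test per flag. */
bool skip_apply_two_tests(uint32_t flags)
{
	return (flags & KVM_MEM_READONLY) || (flags & KVM_MEM_PCI_HOLE);
}

/* Suggested form: a single test against the combined mask. */
bool skip_apply_one_mask(uint32_t flags)
{
	return (flags & (KVM_MEM_READONLY | KVM_MEM_PCI_HOLE)) != 0;
}

/* Usage example: both forms agree for every combination of bits 0-2. */
void skip_apply_example(void)
{
	for (uint32_t flags = 0; flags < 8; flags++)
		assert(skip_apply_two_tests(flags) == skip_apply_one_mask(flags));
}
```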

>  		return;
>  
>  	/*
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 989afcbe642f..63c2d93ef172 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1081,7 +1081,12 @@ __gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn)
>  static inline unsigned long
>  __gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
>  {
> -	return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE;
> +	if (likely(!(slot->flags & KVM_MEM_PCI_HOLE))) {
> +		return slot->userspace_addr +
> +			(gfn - slot->base_gfn) * PAGE_SIZE;
> +	} else {
> +		BUG();

Debug code you forgot to remove? I see below you've modified
__gfn_to_hva_many() to return KVM_HVA_ERR_BAD already when
given a PCI hole slot. I think that's the only check we should add.
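
Roughly what I mean, as a simplified userspace sketch rather than the real
kernel code: if the lookup rejects PCI hole slots up front, the address
arithmetic is never reached for them, so no BUG() is needed there. The
struct, the function names, the sentinel values and the page size below are
placeholders chosen for this example; only the flag bit positions follow
the kernel headers and the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096UL		/* placeholder page size */
#define SKETCH_HVA_ERR_BAD	(~0UL)		/* placeholder sentinel */
#define SKETCH_HVA_ERR_RO_BAD	(~0UL - 1)	/* placeholder sentinel */

#define KVM_MEM_READONLY	(1UL << 1)
#define KVM_MEM_PCI_HOLE	(1UL << 2)	/* proposed in the patch */
#define KVM_MEMSLOT_INVALID	(1UL << 16)

/* Hypothetical reduced memslot with only the fields used here. */
struct slot_sketch {
	unsigned long base_gfn;
	unsigned long userspace_addr;
	unsigned long flags;
};

/* Simplified model of the lookup: bad slots are filtered before any math. */
unsigned long gfn_to_hva_sketch(const struct slot_sketch *slot,
				unsigned long gfn, bool write)
{
	if (!slot || (slot->flags & (KVM_MEMSLOT_INVALID | KVM_MEM_PCI_HOLE)))
		return SKETCH_HVA_ERR_BAD;

	if ((slot->flags & KVM_MEM_READONLY) && write)
		return SKETCH_HVA_ERR_RO_BAD;

	/* Only slots with real backing memory get this far. */
	return slot->userspace_addr +
	       (gfn - slot->base_gfn) * SKETCH_PAGE_SIZE;
}

/* Usage example: a PCI hole slot never yields a host virtual address. */
void gfn_to_hva_example(void)
{
	struct slot_sketch hole = { .base_gfn = 0x100, .flags = KVM_MEM_PCI_HOLE };

	assert(gfn_to_hva_sketch(&hole, 0x100, false) == SKETCH_HVA_ERR_BAD);
}
```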

> +	}
>  }
>  
>  static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 2c73dcfb3dbb..59d631cbb71d 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -109,6 +109,7 @@ struct kvm_userspace_memory_region {
>   */
>  #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
>  #define KVM_MEM_READONLY	(1UL << 1)
> +#define KVM_MEM_PCI_HOLE		(1UL << 2)
>  
>  /* for KVM_IRQ_LINE */
>  struct kvm_irq_level {
> @@ -1034,7 +1035,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_ASYNC_PF_INT 183
>  #define KVM_CAP_LAST_CPU 184
>  #define KVM_CAP_SMALLER_MAXPHYADDR 185
> -
> +#define KVM_CAP_PCI_HOLE_MEM 186
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2c2c0254c2d8..3f69ae711021 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1107,6 +1107,10 @@ static int check_memory_region_flags(const struct kvm_userspace_memory_region *m
>  	valid_flags |= KVM_MEM_READONLY;
>  #endif
>  
> +#ifdef __KVM_HAVE_PCI_HOLE_MEM
> +	valid_flags |= KVM_MEM_PCI_HOLE;
> +#endif
> +
>  	if (mem->flags & ~valid_flags)
>  		return -EINVAL;
>  
> @@ -1284,11 +1288,26 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  		return -EINVAL;
>  	if (mem->guest_phys_addr & (PAGE_SIZE - 1))
>  		return -EINVAL;
> -	/* We can read the guest memory with __xxx_user() later on. */
> -	if ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
> -	     !access_ok((void __user *)(unsigned long)mem->userspace_addr,
> -			mem->memory_size))
> +
> +	/*
> +	 * KVM_MEM_PCI_HOLE is mutually exclusive with KVM_MEM_READONLY/
> +	 * KVM_MEM_LOG_DIRTY_PAGES.
> +	 */
> +	if ((mem->flags & KVM_MEM_PCI_HOLE) &&
> +	    (mem->flags & (KVM_MEM_READONLY | KVM_MEM_LOG_DIRTY_PAGES)))
>  		return -EINVAL;
> +
> +	if (!(mem->flags & KVM_MEM_PCI_HOLE)) {
> +		/* We can read the guest memory with __xxx_user() later on. */
> +		if ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
> +		    !access_ok((void __user *)(unsigned long)mem->userspace_addr,
> +			       mem->memory_size))
> +			return -EINVAL;
> +	} else {
> +		if (mem->userspace_addr)
> +			return -EINVAL;
> +	}
> +
>  	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
>  		return -EINVAL;
>  	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
> @@ -1328,7 +1347,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  	} else { /* Modify an existing slot. */
>  		if ((new.userspace_addr != old.userspace_addr) ||
>  		    (new.npages != old.npages) ||
> -		    ((new.flags ^ old.flags) & KVM_MEM_READONLY))
> +		    ((new.flags ^ old.flags) & KVM_MEM_READONLY) ||
> +		    ((new.flags ^ old.flags) & KVM_MEM_PCI_HOLE))
>  			return -EINVAL;
>  
>  		if (new.base_gfn != old.base_gfn)
> @@ -1715,13 +1735,13 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
>  
>  static bool memslot_is_readonly(struct kvm_memory_slot *slot)
>  {
> -	return slot->flags & KVM_MEM_READONLY;
> +	return slot->flags & (KVM_MEM_READONLY | KVM_MEM_PCI_HOLE);
>  }
>  
>  static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
>  				       gfn_t *nr_pages, bool write)
>  {
> -	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
> +	if (!slot || (slot->flags & (KVM_MEMSLOT_INVALID | KVM_MEM_PCI_HOLE)))
>  		return KVM_HVA_ERR_BAD;
>  
>  	if (memslot_is_readonly(slot) && write)
> @@ -2318,6 +2338,11 @@ static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
>  	int r;
>  	unsigned long addr;
>  
> +	if (unlikely(slot && (slot->flags & KVM_MEM_PCI_HOLE))) {
> +		memset(data, 0xff, len);
> +		return 0;
> +	}
> +
>  	addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
>  	if (kvm_is_error_hva(addr))
>  		return -EFAULT;
> -- 
> 2.25.4
>

I didn't really review this patch, as it's touching lots of x86 mm
functions that I didn't want to delve into, but I took a quick look
since I was curious about the feature.

Thanks,
drew

