From: Chao Peng <chao.p.peng@linux.intel.com>
To: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: Yu Zhang <yu.c.zhang@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	Sean Christopherson <seanjc@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	kvm@vger.kernel.org, Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
	Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>,
	"J . Bruce Fields" <bfields@fieldses.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com,
	ak@linux.intel.com, david@redhat.com, qemu-devel@nongnu.org
Subject: Re: [PATCH v4 12/12] KVM: Expose KVM_MEM_PRIVATE
Date: Wed, 23 Feb 2022 20:00:47 +0800	[thread overview]
Message-ID: <20220223120047.GB53733@chaop.bj.intel.com> (raw)
In-Reply-To: <45148f5f-fe79-b452-f3b2-482c5c3291c4@maciej.szmigiero.name>

On Tue, Feb 22, 2022 at 02:16:46AM +0100, Maciej S. Szmigiero wrote:
> On 17.02.2022 14:45, Chao Peng wrote:
> > On Tue, Jan 25, 2022 at 09:20:39PM +0100, Maciej S. Szmigiero wrote:
> > > On 18.01.2022 14:21, Chao Peng wrote:
> > > > KVM_MEM_PRIVATE is not exposed by default, but architecture code can turn
> > > > it on by implementing kvm_arch_private_memory_supported().
> > > > 
> > > > Also, a private memslot cannot be moved, and the same file+offset cannot
> > > > be mapped into different GFNs.
> > > > 
> > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> > > > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> > > > ---
> > > (..)
> > > >    static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
> > > > -				      gfn_t start, gfn_t end)
> > > > +				      struct file *file,
> > > > +				      gfn_t start, gfn_t end,
> > > > +				      loff_t start_off, loff_t end_off)
> > > >    {
> > > >    	struct kvm_memslot_iter iter;
> > > > +	struct kvm_memory_slot *slot;
> > > > +	struct inode *inode;
> > > > +	int bkt;
> > > >    	kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
> > > >    		if (iter.slot->id != id)
> > > >    			return true;
> > > >    	}
> > > > +	/* Disallow mapping the same file+offset into multiple gfns. */
> > > > +	if (file) {
> > > > +		inode = file_inode(file);
> > > > +		kvm_for_each_memslot(slot, bkt, slots) {
> > > > +			if (slot->private_file &&
> > > > +			     file_inode(slot->private_file) == inode &&
> > > > +			     !(end_off <= slot->private_offset ||
> > > > +			       start_off >= slot->private_offset
> > > > +					     + (slot->npages >> PAGE_SHIFT)))
> > > > +				return true;
> > > > +		}
> > > > +	}
> > > 
> > > That's a linear scan of all memslots on each CREATE (and MOVE) operation
> > > with an fd - we just spent more than a year rewriting similar linear scans
> > > into more efficient operations in KVM.
> > 
> > In the last version I tried to solve this problem by using an interval tree
> > (just like the existing hva_tree), but we eventually realized that one VM
> > can have multiple fds with overlapping offsets, so that approach is
> > incorrect. See https://lkml.org/lkml/2021/12/28/480 for the discussion.
> 
> That's right, in this case a two-level structure would be necessary:
> the first level matching a file, then the second level matching that
> file's ranges.
> However, if such data is going to be used just for checking possible
> overlap at memslot add or move time, it is almost certainly overkill.

Yes, that is also what I'm seeing.
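
To make the two-level idea concrete, here is a rough userspace sketch, not
code from the patch series. The first level is keyed by file identity and the
second level holds that file's ranges, sorted and non-overlapping. All names
(range_index, file_ranges, range_index_add) and the fixed capacities are
hypothetical; a kernel version would more likely use a hash table or xarray
for the first level and an interval tree for the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FILES  8	/* hypothetical fixed capacities, to keep the sketch short */
#define MAX_RANGES 16

struct range { uint64_t start, end; };	/* half-open [start, end) */

struct file_ranges {			/* second level: one file's ranges */
	const void *file;		/* first-level key, e.g. the inode pointer */
	size_t n;
	struct range r[MAX_RANGES];	/* sorted by start, non-overlapping */
};

struct range_index {			/* first level: one entry per file */
	size_t n;
	struct file_ranges f[MAX_FILES];
};

/* First-level lookup; a plain array scan in this sketch. */
static struct file_ranges *find_file(struct range_index *idx, const void *file)
{
	for (size_t i = 0; i < idx->n; i++)
		if (idx->f[i].file == file)
			return &idx->f[i];
	return NULL;
}

/* Binary search: index of the first range whose end is greater than start. */
static size_t lower_bound(const struct file_ranges *fr, uint64_t start)
{
	size_t lo = 0, hi = fr->n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (fr->r[mid].end <= start)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Record [start, end) for file; returns false on overlap or when full. */
static bool range_index_add(struct range_index *idx, const void *file,
			    uint64_t start, uint64_t end)
{
	struct file_ranges *fr = find_file(idx, file);
	size_t pos;

	assert(start < end);
	if (!fr) {
		if (idx->n == MAX_FILES)
			return false;
		fr = &idx->f[idx->n++];
		fr->file = file;
		fr->n = 0;
	}
	pos = lower_bound(fr, start);
	if (pos < fr->n && fr->r[pos].start < end)
		return false;		/* overlaps an existing range */
	if (fr->n == MAX_RANGES)
		return false;
	/* Shift the tail up by one slot and insert in sorted position. */
	memmove(&fr->r[pos + 1], &fr->r[pos], (fr->n - pos) * sizeof(fr->r[0]));
	fr->r[pos] = (struct range){ start, end };
	fr->n++;
	return true;
}
```

With this layout, two different files may both register offsets
[0x0000, 0x4000), while a second overlapping range on the same file is
rejected. The extra bookkeeping is the cost being weighed against a plain
linear scan.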

> 
> > So a linear scan is used until I can find a better way.
> 
> Another option would be to simply not check for overlap at add or move
> time, declare such a configuration undefined behavior under the KVM API and
> make sure in MMU notifiers that nothing bad happens to the host kernel
> if it turns out somebody actually set up a VM this way (it could be
> inefficient in this case, since it's not supposed to ever happen
> unless there is a bug somewhere in the userspace part).

In the TDX case specifically, SEAMMODULE will fail the overlapping mapping
and KVM will then print a message to the kernel log. It causes no other side
effects, though it does look odd. Yes, a warning in the API documentation
would help to some extent.

Thanks,
Chao
> 
> > Chao
> 
> Thanks,
> Maciej
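
For reference, the check under discussion is a half-open interval
intersection test applied to every private memslot that shares the same
backing inode. Below is a minimal userspace sketch of that linear scan. The
struct and function names (sketch_slot, ranges_overlap, file_range_in_use)
are hypothetical stand-ins, not KVM's types. It also assumes that file
offsets are in bytes and that pages are 4 KiB, so the slot length is
converted with a left shift; that conversion is an assumption of this sketch
and differs from the shift written in the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the private-memory fields of a memslot. */
struct sketch_slot {
	const void *inode;	/* backing file identity; NULL if not private */
	uint64_t offset;	/* byte offset into the file (assumed unit) */
	uint64_t npages;	/* slot length in pages */
};

/* True if half-open ranges [a_start, a_end) and [b_start, b_end) intersect. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_end,
			   uint64_t b_start, uint64_t b_end)
{
	return !(a_end <= b_start || a_start >= b_end);
}

/* Linear scan: true if [start_off, end_off) of inode is already mapped. */
static bool file_range_in_use(const struct sketch_slot *slots, size_t n,
			      const void *inode,
			      uint64_t start_off, uint64_t end_off)
{
	assert(start_off < end_off);
	for (size_t i = 0; i < n; i++) {
		const struct sketch_slot *s = &slots[i];
		uint64_t s_end;

		/* Skip non-private slots and slots backed by other files. */
		if (!s->inode || s->inode != inode)
			continue;
		s_end = s->offset + (s->npages << SKETCH_PAGE_SHIFT);
		if (ranges_overlap(start_off, end_off, s->offset, s_end))
			return true;
	}
	return false;
}
```

The scan costs O(number of memslots) per add or move, which is the overhead
the review comment objects to. Ranges that merely touch, such as
[0, 0x4000) and [0x4000, 0x5000), do not count as overlapping.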
