All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Dan Williams" <dan.j.williams@intel.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Vitaly Kuznetsov" <vkuznets@redhat.com>,
	"Wanpeng Li" <wanpengli@tencent.com>,
	"Jim Mattson" <jmattson@google.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	"KVM list" <kvm@vger.kernel.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Adam Borowski" <kilobyte@angband.pl>,
	"David Hildenbrand" <david@redhat.com>
Subject: Re: [PATCH 1/2] KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved
Date: Tue, 12 Nov 2019 08:57:17 -0800	[thread overview]
Message-ID: <20191112165717.GA18089@linux.intel.com> (raw)
In-Reply-To: <e6637be8-7890-579b-8131-6fdbbd791fa0@redhat.com>

On Tue, Nov 12, 2019 at 11:19:44AM +0100, Paolo Bonzini wrote:
> On 12/11/19 01:51, Dan Williams wrote:
> > An elevated page reference count for file mapped pages causes the
> > filesystem (for a dax mode file) to wait for that reference count to
> > drop to 1 before allowing the truncate to proceed. For a page cache
> > backed file mapping (non-dax) the reference count is not considered in
> > the truncate path. It does prevent the page from getting freed in the
> > page cache case, but the association to the file is lost for truncate.
> 
> KVM support for file-backed guest memory is limited.  It is not
> completely broken, in fact cases such as hugetlbfs are in use routinely,
> but corner cases such as truncate aren't covered well indeed.

KVM's actual MMU should be ok since it coordinates with the mmu_notifier.

kvm_vcpu_map() is where KVM could run afoul of page cache truncation.
This is the other main use of hva_to_pfn*(), where KVM directly accesses
guest memory (which could be file-backed) without coordinating with the
mmu_notifier.  IIUC, an ill-timed page cache truncation could result in a
write from KVM effectively being dropped due to writeback racing with
KVM's write to the page.  If that's true, then I think KVM would need to
to move to the proposed pin_user_pages() to ensure its "DMA" isn't lost.

> > As long as any memory the guest expects to be persistent is backed by
> > mmu-notifier coordination we're all good, otherwise an elevated
> > reference count does not coordinate with truncate in a reliable way.

KVM itself is (mostly) blissfully unaware of any such expectations.  The
userspace VMM, e.g. Qemu, is ultimately responsible for ensuring the guest
sees a valid model, e.g. that persistent memory (as presented to the guest)
is actually persistent (from the guest's perspective).

The big caveat is the truncation issue above.

  reply	other threads:[~2019-11-12 16:57 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-06 17:07 [PATCH 0/2] KVM: MMU: Fix a refcount bug with ZONE_DEVICE pages Sean Christopherson
2019-11-06 17:07 ` [PATCH 1/2] KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved Sean Christopherson
2019-11-06 17:14   ` Paolo Bonzini
2019-11-06 17:46     ` Sean Christopherson
2019-11-06 18:04   ` Dan Williams
2019-11-06 20:26     ` Sean Christopherson
2019-11-06 20:34       ` Dan Williams
2019-11-06 21:09     ` Paolo Bonzini
2019-11-06 21:30       ` Sean Christopherson
2019-11-06 23:20       ` Dan Williams
2019-11-06 23:39         ` Sean Christopherson
2019-11-07  0:01           ` Dan Williams
2019-11-07  5:48             ` Dan Williams
2019-11-07 11:12               ` Paolo Bonzini
2019-11-07 15:36                 ` Dan Williams
2019-11-07 15:58                   ` Sean Christopherson
2019-11-09  1:43                     ` Sean Christopherson
2019-11-09  2:00                       ` Dan Williams
2019-11-11 18:27                         ` Sean Christopherson
2019-11-11 19:15                           ` Paolo Bonzini
2019-11-12  0:51                           ` Dan Williams
2019-11-12 10:19                             ` Paolo Bonzini
2019-11-12 16:57                               ` Sean Christopherson [this message]
2019-11-06 17:07 ` [PATCH 2/2] KVM: x86/mmu: Add helper to consolidate huge page promotion Sean Christopherson
2019-11-06 17:22   ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191112165717.GA18089@linux.intel.com \
    --to=sean.j.christopherson@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kilobyte@angband.pl \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.