From: Barret Rhoden <brho-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
To: Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: X86 ML <x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	"Zhang,
	Yu C" <yu.c.zhang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	KVM list <kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	rkrcmar-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	linux-nvdimm
	<linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org>,
	Linux Kernel Mailing List
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Borislav Petkov <bp-Gina5bIWoIWzQB+pC5nmwQ@public.gmane.org>,
	zwisler-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	Paolo Bonzini <pbonzini-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>,
	"H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>,
	"Zhang,
	Yi Z" <yi.z.zhang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Subject: Re: [RFC PATCH] kvm: Use huge pages for DAX-backed files
Date: Tue, 30 Oct 2018 15:45:24 -0400
Message-ID: <20181030154524.181b8236@gnomeregan.cam.corp.google.com>
In-Reply-To: <CAPcyv4gQztHrJ3--rhU4ZpaZyyqdqE0=gx50CRArHKiXwfYC+A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 2018-10-29 at 20:10 Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> > > >  static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
> > > >                                         gfn_t *gfnp, kvm_pfn_t *pfnp,
> > > >                                         int *levelp)
> > > > @@ -3168,7 +3237,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
> > > >          */
> > > >         if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
> > > >             level == PT_PAGE_TABLE_LEVEL &&
> > > > -           PageTransCompoundMap(pfn_to_page(pfn)) &&
> > > > +           pfn_is_pmd_mapped(vcpu->kvm, gfn, pfn) &&  
> > >
> > > I'm wondering if we're adding an explicit is_zone_device_page() check
> > > in this path to determine the page mapping size if that can be a
> > > replacement for the kvm_is_reserved_pfn() check. In other words, the
> > > goal of fixing up PageReserved() was to preclude the need for DAX-page
> > > > special casing in KVM, but if we already need to add some special casing
> > > for page size determination, might as well bypass the
> > > kvm_is_reserved_pfn() dependency as well.  
> >
> > kvm_is_reserved_pfn() is used in some other places, like
> > > kvm_set_pfn_dirty() and kvm_set_pfn_accessed().  Maybe the way those
> > treat DAX pages matters on a case-by-case basis?
> >
> > There are other callers of kvm_is_reserved_pfn() such as
> > kvm_pfn_to_page() and gfn_to_page().  I'm not familiar (yet) with how
> > struct pages and DAX work together, and whether or not the callers of
> > those pfn_to_page() functions have expectations about the 'type' of
> > struct page they get back.
> >  
> 
> The property of DAX pages that requires special coordination is the
> fact that the device hosting the pages can be disabled at will. The
> get_dev_pagemap() api is the interface to pin a device-pfn so that you
> can safely perform a pfn_to_page() operation.
> 
> Have the pages that kvm uses in this path already been pinned by vfio?

I'm not aware of any explicit pinning, but it might be happening under
the hood.  These pages are just generic guest RAM, but they are present
in a host-side mapping.  I ran into this when looking at EPT fault
handling.  In the code I changed, a physical page is faulted into the
task's page table; then, while kvm->mmu_lock is held, KVM creates an
EPT mapping to the same physical page.  That mmu_lock seems to prevent
any concurrent host-side unmappings, though I'm not familiar with the
MMU notifier stuff.

One usage of kvm_is_reserved_pfn() in KVM code is like this:

static struct page *kvm_pfn_to_page(kvm_pfn_t pfn)
{
        if (is_error_noslot_pfn(pfn))
                return KVM_ERR_PTR_BAD_PAGE;

        if (kvm_is_reserved_pfn(pfn)) {
                WARN_ON(1);
                return KVM_ERR_PTR_BAD_PAGE;
        }

        return pfn_to_page(pfn);
}
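
To make the consequence concrete, here is a minimal userspace model of
that check (not kernel code).  Every model_* name, the fixed-size
model_mem_map array, and the single "reserved" flag are assumptions made
up for illustration; the real predicates do more than this.  It only
shows that any pfn the reserved check flags can never come back as a
usable struct page from this helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: none of these are the kernel's definitions. */
typedef uint64_t model_pfn_t;

struct model_page {
        bool reserved;          /* stands in for PageReserved() */
};

#define MODEL_NR_PAGES 8
#define MODEL_ERR_PFN  UINT64_MAX       /* stands in for an error/noslot pfn */

static struct model_page model_mem_map[MODEL_NR_PAGES];
static struct model_page model_bad_page; /* stands in for KVM_ERR_PTR_BAD_PAGE */

/* Assumption: any pfn outside the modeled memory map is an error pfn. */
static bool model_is_error_noslot_pfn(model_pfn_t pfn)
{
        return pfn >= MODEL_NR_PAGES;
}

/* Assumption: "reserved" is just a per-page flag in this model. */
static bool model_is_reserved_pfn(model_pfn_t pfn)
{
        assert(pfn < MODEL_NR_PAGES);
        return model_mem_map[pfn].reserved;
}

/* Same control flow as the helper quoted above, minus the WARN_ON(). */
static struct model_page *model_pfn_to_page(model_pfn_t pfn)
{
        if (model_is_error_noslot_pfn(pfn))
                return &model_bad_page;

        if (model_is_reserved_pfn(pfn))
                return &model_bad_page;

        return &model_mem_map[pfn];
}
```

In this model, marking model_mem_map[3].reserved makes
model_pfn_to_page(3) return the bad-page sentinel, while an unmarked pfn
returns its own entry.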

I don't think there's any guarantee that kvm->mmu_lock is held in the
generic case.  Here's one call path where it isn't (from walking through
the code):

handle_exception
-handle_ud
--kvm_emulate_instruction
---x86_emulate_instruction
----x86_emulate_insn
-----writeback
------segmented_cmpxchg
-------emulator_cmpxchg_emulated
--------kvm_vcpu_gfn_to_page
---------kvm_pfn_to_page

There are probably other rules related to gfn_to_page that keep the
page alive, maybe just during interrupt/vmexit context?  Whatever keeps
those pages alive for normal memory might grab that devmap reference
under the hood for DAX mappings.
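
As a rough illustration of the "pin before pfn_to_page()" idea Dan
describes, here is a single-threaded userspace sketch (not kernel code).
The model_* names and the plain int reference count are assumptions for
illustration only; they are not the get_dev_pagemap() implementation,
which has to handle concurrency.  The point is just the ordering:
teardown blocks new pins first and can only complete once every
existing pin has been dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; the names mimic, but are not, the kernel API. */
struct model_pagemap {
        int  refs;      /* outstanding pins */
        bool alive;     /* false once device teardown has started */
};

/* Take a pin; returns NULL once the device is going away. */
static struct model_pagemap *model_get_pagemap(struct model_pagemap *pgmap)
{
        if (!pgmap->alive)
                return NULL;
        pgmap->refs++;
        return pgmap;
}

/* Drop a pin previously taken with model_get_pagemap(). */
static void model_put_pagemap(struct model_pagemap *pgmap)
{
        assert(pgmap->refs > 0);
        pgmap->refs--;
}

/*
 * Start (or retry) teardown: block new pins, then report whether it is
 * safe to release the backing pages, i.e. nobody still holds a pin.
 */
static bool model_try_disable(struct model_pagemap *pgmap)
{
        pgmap->alive = false;
        return pgmap->refs == 0;
}
```

A caller in this model would take the pin, do its page lookup and
access, and then drop the pin; model_try_disable() keeps returning false
for as long as that pin is held.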

Thanks,
Barret
