From: Yu Zhang <yu.c.zhang@linux.intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Barret Rhoden <brho@google.com>,
	rkrcmar@redhat.com, "Zhang, Yu C" <yu.c.zhang@intel.com>,
	KVM list <kvm@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	X86 ML <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	zwisler@kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Zhang, Yi Z" <yi.z.zhang@intel.com>
Subject: Re: [RFC PATCH] kvm: Use huge pages for DAX-backed files
Date: Wed, 31 Oct 2018 11:05:35 +0800	[thread overview]
Message-ID: <20181031030535.slgmorcrhacdrdml@linux.intel.com> (raw)
In-Reply-To: <CAPcyv4gQztHrJ3--rhU4ZpaZyyqdqE0=gx50CRArHKiXwfYC+A@mail.gmail.com>

On Mon, Oct 29, 2018 at 08:10:52PM -0700, Dan Williams wrote:
> On Mon, Oct 29, 2018 at 5:29 PM Barret Rhoden <brho@google.com> wrote:
> >
> > On 2018-10-29 at 15:25 Dan Williams <dan.j.williams@intel.com> wrote:
> > > > +       /*
> > > > +        * Our caller grabbed the KVM mmu_lock with a successful
> > > > +        * mmu_notifier_retry, so we're safe to walk the page table.
> > > > +        */
> > > > +       map_sz = pgd_mapping_size(current->mm, hva);
> > > > +       switch (map_sz) {
> > > > +       case PMD_SIZE:
> > > > +               return true;
> > > > +       case P4D_SIZE:
> > > > +       case PUD_SIZE:
> > > > +               printk_once(KERN_INFO "KVM THP promo found a very large page");
> > >
> > > Why not allow PUD_SIZE? The device-dax interface supports PUD mappings.
> >
> > The place where I use that helper seemed to care about PMDs (compared
> > to huge pages larger than PUDs), I think due to THP.  Though it also
> > checks "level == PT_PAGE_TABLE_LEVEL", so it's probably a moot point.
> >
> > I can change it from pfn_is_pmd_mapped -> pfn_is_huge_mapped and allow
> > any huge mapping that is appropriate: so PUD or PMD for DAX, PMD for
> > non-DAX, IIUC.
> 
> Yes, THP stops at PMDs, but DAX and hugetlbfs support PUD level mappings.
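[Editor's note: the rename discussed above could be sketched as the following user-space model. The function name, the `is_dax` parameter, and the size constants are illustrative assumptions for x86-64 with 4K base pages, not the actual patch.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sizes for x86-64 with 4K base pages. */
#define PMD_SIZE (1UL << 21)	/* 2 MiB */
#define PUD_SIZE (1UL << 30)	/* 1 GiB */

/* Hypothetical pfn_is_huge_mapped() decision: PMD mappings are fine
 * for both THP and DAX; PUD mappings only occur for device-dax (and
 * hugetlbfs), since THP stops at PMDs. */
static bool is_huge_mapped(uint64_t map_sz, bool is_dax)
{
	switch (map_sz) {
	case PMD_SIZE:
		return true;
	case PUD_SIZE:
		return is_dax;
	default:
		return false;	/* base pages: no adjustment */
	}
}
```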
> 
> > > > +               return false;
> > > > +       }
> > > > +       return false;
> > > > +}
> > >
> > > The above two functions are similar to what we need to do for
> > > determining the blast radius of a memory error, see
> > > dev_pagemap_mapping_shift() and its usage in add_to_kill().
> >
> > Great.  I don't know if I have access in the KVM code to the VMA to use
> > those functions directly, but I can extract the guts of
> > dev_pagemap_mapping_shift() or something and put it in mm/util.c.
> 
> Sounds good.
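[Editor's note: a rough model of what `dev_pagemap_mapping_shift()` reports. The shift of the page-table entry mapping a pfn determines how many base pages share that mapping, i.e. the blast radius of a memory error. The constants assume x86-64 with 4K base pages and are illustrative only.]

```c
/* Illustrative shifts for x86-64 with 4K base pages. */
#define PAGE_SHIFT 12	/* 4 KiB base page */
#define PMD_SHIFT  21	/* 2 MiB */
#define PUD_SHIFT  30	/* 1 GiB */

/* Number of base pages covered by a mapping of the given shift:
 * the "blast radius" when that mapping hits a memory error. */
static unsigned long pages_in_mapping(unsigned int mapping_shift)
{
	return 1UL << (mapping_shift - PAGE_SHIFT);
}
```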
> 
> > > >  static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
> > > >                                         gfn_t *gfnp, kvm_pfn_t *pfnp,
> > > >                                         int *levelp)
> > > > @@ -3168,7 +3237,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
> > > >          */
> > > >         if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
> > > >             level == PT_PAGE_TABLE_LEVEL &&
> > > > -           PageTransCompoundMap(pfn_to_page(pfn)) &&
> > > > +           pfn_is_pmd_mapped(vcpu->kvm, gfn, pfn) &&
> > >
> > > I'm wondering if we're adding an explicit is_zone_device_page() check
> > > in this path to determine the page mapping size if that can be a
> > > replacement for the kvm_is_reserved_pfn() check. In other words, the
> > > goal of fixing up PageReserved() was to preclude the need for DAX-page
> > > special casing in KVM, but if we already need to add some special casing
> > > for page size determination, might as well bypass the
> > > kvm_is_reserved_pfn() dependency as well.
> >
> > kvm_is_reserved_pfn() is used in some other places, like
> > kvm_set_pfn_dirty() and kvm_set_pfn_accessed().  Maybe the way those
> > treat DAX pages matters on a case-by-case basis?
> >
> > There are other callers of kvm_is_reserved_pfn() such as
> > kvm_pfn_to_page() and gfn_to_page().  I'm not familiar (yet) with how
> > struct pages and DAX work together, and whether or not the callers of
> > those pfn_to_page() functions have expectations about the 'type' of
> > struct page they get back.
> >
> 
> The property of DAX pages that requires special coordination is the
> fact that the device hosting the pages can be disabled at will. The
> get_dev_pagemap() api is the interface to pin a device-pfn so that you
> can safely perform a pfn_to_page() operation.
> 
> Have the pages that kvm uses in this path already been pinned by vfio?

My understanding is that they could be, if a device is assigned to the VM.
Otherwise, they will not be.
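[Editor's note: a user-space model of the coordination `get_dev_pagemap()` provides. A device-backed pfn range can be torn down at will, so a pfn must be pinned before `pfn_to_page()` is safe, and device removal must wait for pins to drain. All names below are illustrative, not the kernel API.]

```c
#include <stdbool.h>

struct pagemap_model {
	int  refs;	/* outstanding pins */
	bool dying;	/* device teardown has begun */
};

/* Pin the pagemap so pfn_to_page() stays safe; fails once teardown
 * has started, mirroring get_dev_pagemap() returning NULL. */
static bool pagemap_get(struct pagemap_model *pgmap)
{
	if (pgmap->dying)
		return false;
	pgmap->refs++;
	return true;
}

/* Drop a pin taken by pagemap_get(). */
static void pagemap_put(struct pagemap_model *pgmap)
{
	pgmap->refs--;
}

/* Begin teardown; removal may only proceed once all pins are gone. */
static bool pagemap_can_remove(struct pagemap_model *pgmap)
{
	pgmap->dying = true;
	return pgmap->refs == 0;
}
```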

B.R.
Yu

> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm
> 

  parent reply	other threads:[~2018-10-31  3:07 UTC|newest]

Thread overview: 32+ messages
2018-10-29 21:07 [RFC PATCH] kvm: Use huge pages for DAX-backed files Barret Rhoden
2018-10-29 22:25 ` Dan Williams
2018-10-30  0:28   ` Barret Rhoden
2018-10-30  3:10     ` Dan Williams
2018-10-30 19:45       ` Barret Rhoden
2018-10-31  8:49         ` Paolo Bonzini
2018-11-02 20:32           ` Barret Rhoden
2018-11-06 10:19             ` Paolo Bonzini
2018-11-06 16:22               ` Barret Rhoden
2018-10-31  3:05       ` Yu Zhang [this message]
2018-10-31  8:52   ` Paolo Bonzini
2018-10-31 21:16     ` Dan Williams
2018-11-06 10:22       ` Paolo Bonzini
2018-11-06 21:05 ` Barret Rhoden
2018-11-06 21:16   ` Paolo Bonzini
2018-11-06 21:17     ` Barret Rhoden
