All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pingfan Liu <kernelfans@gmail.com>
To: Ira Weiny <ira.weiny@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Mike Rapoport <rppt@linux.ibm.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Keith Busch <keith.busch@intel.com>,
	Christoph Hellwig <hch@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCHv3 1/2] mm/gup: fix omission of check on FOLL_LONGTERM in get_user_pages_fast()
Date: Wed, 12 Jun 2019 22:10:12 +0800	[thread overview]
Message-ID: <CAFgQCTvaoOgzkei6vSNUAfs2D0un3ypuoEM02C9gWB7SnNy5Gw@mail.gmail.com> (raw)
In-Reply-To: <20190611164733.GA14336@iweiny-DESK2.sc.intel.com>

On Wed, Jun 12, 2019 at 12:46 AM Ira Weiny <ira.weiny@intel.com> wrote:
>
> On Tue, Jun 11, 2019 at 08:29:35PM +0800, Pingfan Liu wrote:
> > On Fri, Jun 07, 2019 at 02:10:15PM +0800, Pingfan Liu wrote:
> > > On Fri, Jun 7, 2019 at 5:17 AM John Hubbard <jhubbard@nvidia.com> wrote:
> > > >
> > > > On 6/5/19 7:19 PM, Pingfan Liu wrote:
> > > > > On Thu, Jun 6, 2019 at 5:49 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > ...
> > > > >>> --- a/mm/gup.c
> > > > >>> +++ b/mm/gup.c
> > > > >>> @@ -2196,6 +2196,26 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
> > > > >>>       return ret;
> > > > >>>  }
> > > > >>>
> > > > >>> +#ifdef CONFIG_CMA
> > > > >>> +static inline int reject_cma_pages(int nr_pinned, struct page **pages)
> > > > >>> +{
> > > > >>> +     int i;
> > > > >>> +
> > > > >>> +     for (i = 0; i < nr_pinned; i++)
> > > > >>> +             if (is_migrate_cma_page(pages[i])) {
> > > > >>> +                     put_user_pages(pages + i, nr_pinned - i);
> > > > >>> +                     return i;
> > > > >>> +             }
> > > > >>> +
> > > > >>> +     return nr_pinned;
> > > > >>> +}
> > > > >>
> > > > >> There's no point in inlining this.
> > > > > OK, will drop it in V4.
> > > > >
> > > > >>
> > > > >> The code seems inefficient.  If it encounters a single CMA page it can
> > > > >> end up discarding a possibly significant number of non-CMA pages.  I
> > > > > The trick is the page is not be discarded, in fact, they are still be
> > > > > referrenced by pte. We just leave the slow path to pick up the non-CMA
> > > > > pages again.
> > > > >
> > > > >> guess that doesn't matter much, as get_user_pages(FOLL_LONGTERM) is
> > > > >> rare.  But could we avoid this (and the second pass across pages[]) by
> > > > >> checking for a CMA page within gup_pte_range()?
> > > > > It will spread the same logic to hugetlb pte and normal pte. And no
> > > > > improvement in performance due to slow path. So I think maybe it is
> > > > > not worth.
> > > > >
> > > > >>
> > > >
> > > > I think the concern is: for the successful gup_fast case with no CMA
> > > > pages, this patch is adding another complete loop through all the
> > > > pages. In the fast case.
> > > >
> > > > If the check were instead done as part of the gup_pte_range(), then
> > > > it would be a little more efficient for that case.
> > > >
> > > > As for whether it's worth it, *probably* this is too small an effect to measure.
> > > > But in order to attempt a measurement: running fio (https://github.com/axboe/fio)
> > > > with O_DIRECT on an NVMe drive, might shed some light. Here's an fio.conf file
> > > > that Jan Kara and Tom Talpey helped me come up with, for related testing:
> > > >
> > > > [reader]
> > > > direct=1
> > > > ioengine=libaio
> > > > blocksize=4096
> > > > size=1g
> > > > numjobs=1
> > > > rw=read
> > > > iodepth=64
> > > >
> > Unable to get a NVME device to have a test. And when testing fio on the
> > tranditional disk, I got the error "fio: engine libaio not loadable
> > fio: failed to load engine
> > fio: file:ioengines.c:89, func=dlopen, error=libaio: cannot open shared object file: No such file or directory"
> >
> > But I found a test case which can be slightly adjusted to met the aim.
> > It is tools/testing/selftests/vm/gup_benchmark.c
> >
> > Test enviroment:
> >   MemTotal:       264079324 kB
> >   MemFree:        262306788 kB
> >   CmaTotal:              0 kB
> >   CmaFree:               0 kB
> >   on AMD EPYC 7601
> >
> > Test command:
> >   gup_benchmark -r 100 -n 64
> >   gup_benchmark -r 100 -n 64 -l
> > where -r stands for repeat times, -n is nr_pages param for
> > get_user_pages_fast(), -l is a new option to test FOLL_LONGTERM in fast
> > path, see a patch at the tail.
>
> Thanks!  That is a good test to add.  You should add the patch to the series.
OK.
>
> >
> > Test result:
> > w/o     477.800000
> > w/o-l   481.070000
> > a       481.800000
> > a-l     640.410000
> > b       466.240000  (question a: b outperforms w/o ?)
> > b-l     529.740000
> >
> > Where w/o is baseline without any patch using v5.2-rc2, a is this series, b
> > does the check in gup_pte_range(). '-l' means FOLL_LONGTERM.
> >
> > I am suprised that b-l has about 17% improvement than a. (640.41 -529.74)/640.41
>
> Wow that is bigger than I would have thought.  I suspect it gets worse as -n
> increases?
Yes. I test with -n 64/128/256/512. It has this trend. See the data below.

>
> >
> > As for "question a: b outperforms w/o ?", I can not figure out why, maybe it can be
> > considered as variance.
>
> :-/
>
> Does this change with larger -r or -n values?
-r should have no effect on this. And I change -n 64/128/256/512. The
data always shows b outperforms w/o a bit.

      64        128         256        512
a-l  633.23   676.83  747.14  683.19    (n=256 should be disturbed by
something, but the overall trend keeps going up)
b-l  528.32   529.10  523.95  512.88
w/o  479.73   473.87  477.67  488.70
b    470.13   467.11  463.06  469.62

Thanks,
  Pingfan
>
> >
> > Based on the above result, I think it is better to do the check inside
> > gup_pte_range().
> >
> > Any comment?
>
> I agree.
>
> Ira
>
> >
> > Thanks,
> >
> >
> > > Yeah, agreed. Data is more persuasive. Thanks for your suggestion. I
> > > will try to bring out the result.
> > >
> > > Thanks,
> > >   Pingfan
> > >
> >
>
> > ---
> > Patch to do check inside gup_pte_range()
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 2ce3091..ba213a0 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1757,6 +1757,10 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> >               VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
> >               page = pte_page(pte);
> >
> > +             if (unlikely(flags & FOLL_LONGTERM) &&
> > +                     is_migrate_cma_page(page))
> > +                             goto pte_unmap;
> > +
> >               head = try_get_compound_head(page, 1);
> >               if (!head)
> >                       goto pte_unmap;
> > @@ -1900,6 +1904,12 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> >               refs++;
> >       } while (addr += PAGE_SIZE, addr != end);
> >
> > +     if (unlikely(flags & FOLL_LONGTERM) &&
> > +             is_migrate_cma_page(page)) {
> > +             *nr -= refs;
> > +             return 0;
> > +     }
> > +
> >       head = try_get_compound_head(pmd_page(orig), refs);
> >       if (!head) {
> >               *nr -= refs;
> > @@ -1941,6 +1951,12 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> >               refs++;
> >       } while (addr += PAGE_SIZE, addr != end);
> >
> > +     if (unlikely(flags & FOLL_LONGTERM) &&
> > +             is_migrate_cma_page(page)) {
> > +             *nr -= refs;
> > +             return 0;
> > +     }
> > +
> >       head = try_get_compound_head(pud_page(orig), refs);
> >       if (!head) {
> >               *nr -= refs;
> > @@ -1978,6 +1994,12 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
> >               refs++;
> >       } while (addr += PAGE_SIZE, addr != end);
> >
> > +     if (unlikely(flags & FOLL_LONGTERM) &&
> > +             is_migrate_cma_page(page)) {
> > +             *nr -= refs;
> > +             return 0;
> > +     }
> > +
> >       head = try_get_compound_head(pgd_page(orig), refs);
> >       if (!head) {
> >               *nr -= refs;
>
> > ---
> > Patch for testing
> >
> > diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
> > index 7dd602d..61dec5f 100644
> > --- a/mm/gup_benchmark.c
> > +++ b/mm/gup_benchmark.c
> > @@ -6,8 +6,9 @@
> >  #include <linux/debugfs.h>
> >
> >  #define GUP_FAST_BENCHMARK   _IOWR('g', 1, struct gup_benchmark)
> > -#define GUP_LONGTERM_BENCHMARK       _IOWR('g', 2, struct gup_benchmark)
> > -#define GUP_BENCHMARK                _IOWR('g', 3, struct gup_benchmark)
> > +#define GUP_FAST_LONGTERM_BENCHMARK  _IOWR('g', 2, struct gup_benchmark)
> > +#define GUP_LONGTERM_BENCHMARK       _IOWR('g', 3, struct gup_benchmark)
> > +#define GUP_BENCHMARK                _IOWR('g', 4, struct gup_benchmark)
> >
> >  struct gup_benchmark {
> >       __u64 get_delta_usec;
> > @@ -53,6 +54,11 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
> >                       nr = get_user_pages_fast(addr, nr, gup->flags & 1,
> >                                                pages + i);
> >                       break;
> > +             case GUP_FAST_LONGTERM_BENCHMARK:
> > +                     nr = get_user_pages_fast(addr, nr,
> > +                                              (gup->flags & 1) | FOLL_LONGTERM,
> > +                                              pages + i);
> > +                     break;
> >               case GUP_LONGTERM_BENCHMARK:
> >                       nr = get_user_pages(addr, nr,
> >                                           (gup->flags & 1) | FOLL_LONGTERM,
> > @@ -96,6 +102,7 @@ static long gup_benchmark_ioctl(struct file *filep, unsigned int cmd,
> >
> >       switch (cmd) {
> >       case GUP_FAST_BENCHMARK:
> > +     case GUP_FAST_LONGTERM_BENCHMARK:
> >       case GUP_LONGTERM_BENCHMARK:
> >       case GUP_BENCHMARK:
> >               break;
> > diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
> > index c0534e2..ade8acb 100644
> > --- a/tools/testing/selftests/vm/gup_benchmark.c
> > +++ b/tools/testing/selftests/vm/gup_benchmark.c
> > @@ -15,8 +15,9 @@
> >  #define PAGE_SIZE sysconf(_SC_PAGESIZE)
> >
> >  #define GUP_FAST_BENCHMARK   _IOWR('g', 1, struct gup_benchmark)
> > -#define GUP_LONGTERM_BENCHMARK       _IOWR('g', 2, struct gup_benchmark)
> > -#define GUP_BENCHMARK                _IOWR('g', 3, struct gup_benchmark)
> > +#define GUP_FAST_LONGTERM_BENCHMARK  _IOWR('g', 2, struct gup_benchmark)
> > +#define GUP_LONGTERM_BENCHMARK       _IOWR('g', 3, struct gup_benchmark)
> > +#define GUP_BENCHMARK                _IOWR('g', 4, struct gup_benchmark)
> >
> >  struct gup_benchmark {
> >       __u64 get_delta_usec;
> > @@ -37,7 +38,7 @@ int main(int argc, char **argv)
> >       char *file = "/dev/zero";
> >       char *p;
> >
> > -     while ((opt = getopt(argc, argv, "m:r:n:f:tTLUSH")) != -1) {
> > +     while ((opt = getopt(argc, argv, "m:r:n:f:tTlLUSH")) != -1) {
> >               switch (opt) {
> >               case 'm':
> >                       size = atoi(optarg) * MB;
> > @@ -54,6 +55,9 @@ int main(int argc, char **argv)
> >               case 'T':
> >                       thp = 0;
> >                       break;
> > +             case 'l':
> > +                     cmd = GUP_FAST_LONGTERM_BENCHMARK;
> > +                     break;
> >               case 'L':
> >                       cmd = GUP_LONGTERM_BENCHMARK;
> >                       break;
> > --
> > 2.7.5
> >
>

  reply	other threads:[~2019-06-12 14:10 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-05  9:10 [PATCHv3 1/2] mm/gup: fix omission of check on FOLL_LONGTERM in get_user_pages_fast() Pingfan Liu
2019-06-05  9:10 ` [PATCHv3 2/2] mm/gup: rename nr as nr_pinned " Pingfan Liu
2019-06-05 21:49 ` [PATCHv3 1/2] mm/gup: fix omission of check on FOLL_LONGTERM " Andrew Morton
2019-06-06  2:19   ` Pingfan Liu
2019-06-06  2:19     ` Pingfan Liu
2019-06-06 21:17     ` John Hubbard
2019-06-07  6:10       ` Pingfan Liu
2019-06-07  6:10         ` Pingfan Liu
2019-06-11 12:29         ` Pingfan Liu
2019-06-11 13:52           ` Christoph Hellwig
2019-06-11 19:49             ` John Hubbard
2019-06-11 16:47           ` Ira Weiny
2019-06-12 14:10             ` Pingfan Liu [this message]
2019-06-12 14:10               ` Pingfan Liu
2019-06-11 16:15 ` Aneesh Kumar K.V
2019-06-11 16:29   ` Weiny, Ira
2019-06-12 13:54     ` Pingfan Liu
2019-06-12 23:50       ` Ira Weiny
2019-06-13 10:48         ` Pingfan Liu
2019-06-13 10:48           ` Pingfan Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFgQCTvaoOgzkei6vSNUAfs2D0un3ypuoEM02C9gWB7SnNy5Gw@mail.gmail.com \
    --to=kernelfans@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=dan.j.williams@intel.com \
    --cc=hch@infradead.org \
    --cc=ira.weiny@intel.com \
    --cc=jhubbard@nvidia.com \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rppt@linux.ibm.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.