All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>, Leon Romanovsky <leonro@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>, Linux-MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.cz>, Michal Hocko <mhocko@suse.com>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	Kirill Shutemov <kirill@shutemov.name>,
	Hugh Dickins <hughd@google.com>, Christoph Hellwig <hch@lst.de>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Oleg Nesterov <oleg@redhat.com>, Jann Horn <jannh@google.com>
Subject: Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
Date: Mon, 28 Sep 2020 20:57:39 -0300	[thread overview]
Message-ID: <20200928235739.GU9916@ziepe.ca> (raw)
In-Reply-To: <CAHk-=wiRHLVK7Usrfdbh0MF7b86NLZ7gBWHV-Y89K_p3GU4KQg@mail.gmail.com>

On Mon, Sep 28, 2020 at 12:29:55PM -0700, Linus Torvalds wrote:
> So a read pin action would basically never work for the fast-path for
> a few cases, notably a shared read-only mapping - because we could
> never mark it in the page tables as "fast pin accessible"

Agree, I was assuming we'd loose more of the fast path to create this
thing. It would only still be fast if the pages are already writable.

I strongly suspect the case of DMA'ing actual read-only data is the
minority here, the usual case is probably filling a writable buffer
with something interesting and then triggering the DMA. The DMA just
happens to be read from the driver view so the driver doesn't set
FOLL_WRITE.

Looking at the FOLL_LONGTERM users, which should be the banner usecase
for this, there are very few that do a read pin and use fast.

> And it would basically have no advantages over a writable FOLL_PIN. It
> would break the association with any backing store for private pages,
> because otherwise it can't follow future writes.

Yes, I wasn't clear enough, I'm looking at this from a driver API
perspective. We have this API

  pin_user_pages(FOLL_LONGTERM | FOLL_WRITE)

Which now has no decoherence issues with the MM. If the driver
naturally wants to do read-only access it might be tempted to do:

  pin_user_pages(FOLL_LONGTERM)

Which is now NOT the same thing and brings all these really surprising
mm coherence issues back.

The driver author might discover this in testing, then be tempted to
hardwire 'FOLL_LONGTERM | FOLL_WRITE'. Now their uAPI is broken for
things that are actually read-only like .rodata.

If they discover this then they add a FOLL_FORCE to the mix.

When someone comes along to read this later it is a big leap to see
  pin_user_pages(FOLL_LONGTERM | FOLL_FORCE | FOLL_WRITE)

and realize this is code for "read only mapping". At least it took me
a while to decipher it the first time I saw it.

I think this is really hard to use and ugly. My thinking has been to
just stick:

   if (flags & FOLL_LONGTERM)
       flags |= FOLL_FORCE | FOLL_WRITE

In pin_user_pages(). It would make the driver API cleaner. If we can
do a bit better somehow by not COW'ing for certain VMA's as you
explained then all the better, but not my primary goal..

Basically, I think if a driver is using FOLL_LONGTERM | FOLL_PIN we
should guarentee that driver a consistent MM and take the gup_fast
performance hit to do it.

AFAICT the giant wack of other cases not using FOLL_LONGTERM really
shouldn't care about read-decoherence. For those cases the user should
really not be racing write's with data under read-only pin, and the
new COW logic looks like it solves the other issues with this.

I know Jann/John have been careful to not have special behaviors for
the DMA case, but I think it makes sense here. It is actually different.

Jason

  reply	other threads:[~2020-09-28 23:57 UTC|newest]

Thread overview: 133+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-21 21:17 [PATCH 0/5] mm: Break COW for pinned pages during fork() Peter Xu
2020-09-21 21:17 ` [PATCH 1/5] mm: Introduce mm_struct.has_pinned Peter Xu
2020-09-21 21:43   ` Jann Horn
2020-09-21 21:43     ` Jann Horn
2020-09-21 22:30     ` Peter Xu
2020-09-21 22:47       ` Jann Horn
2020-09-21 22:47         ` Jann Horn
2020-09-22 11:54         ` Jason Gunthorpe
2020-09-22 14:28           ` Peter Xu
2020-09-22 15:56             ` Jason Gunthorpe
2020-09-22 16:25               ` Linus Torvalds
2020-09-22 16:25                 ` Linus Torvalds
2020-09-21 23:53   ` John Hubbard
2020-09-22  0:01     ` John Hubbard
2020-09-22 15:17     ` Peter Xu
2020-09-22 16:10       ` Jason Gunthorpe
2020-09-22 17:54         ` Peter Xu
2020-09-22 19:11           ` Jason Gunthorpe
2020-09-23  0:27             ` Peter Xu
2020-09-23 13:10               ` Peter Xu
2020-09-23 14:20                 ` Jan Kara
2020-09-23 17:12                   ` Jason Gunthorpe
2020-09-24  7:44                     ` Jan Kara
2020-09-24 14:02                       ` Jason Gunthorpe
2020-09-24 14:45                         ` Jan Kara
2020-09-23 17:07               ` Jason Gunthorpe
2020-09-24 14:35                 ` Peter Xu
2020-09-24 16:51                   ` Jason Gunthorpe
2020-09-24 17:55                     ` Peter Xu
2020-09-24 18:15                       ` Jason Gunthorpe
2020-09-24 18:34                         ` Peter Xu
2020-09-24 18:39                           ` Jason Gunthorpe
2020-09-24 21:30                             ` Peter Xu
2020-09-25 19:56                               ` Linus Torvalds
2020-09-25 19:56                                 ` Linus Torvalds
2020-09-25 21:06                                 ` Linus Torvalds
2020-09-25 21:06                                   ` Linus Torvalds
2020-09-26  0:41                                   ` Jason Gunthorpe
2020-09-26  1:15                                     ` Linus Torvalds
2020-09-26  1:15                                       ` Linus Torvalds
2020-09-26 22:28                                       ` Linus Torvalds
2020-09-26 22:28                                         ` Linus Torvalds
2020-09-27  6:23                                         ` Leon Romanovsky
2020-09-27 18:16                                           ` Linus Torvalds
2020-09-27 18:16                                             ` Linus Torvalds
2020-09-27 18:45                                             ` Linus Torvalds
2020-09-27 18:45                                               ` Linus Torvalds
2020-09-28 12:49                                               ` Jason Gunthorpe
2020-09-28 16:17                                                 ` Linus Torvalds
2020-09-28 16:17                                                   ` Linus Torvalds
2020-09-28 17:22                                                   ` Peter Xu
2020-09-28 17:54                                                     ` Linus Torvalds
2020-09-28 17:54                                                       ` Linus Torvalds
2020-09-28 18:39                                                       ` Jason Gunthorpe
2020-09-28 19:29                                                         ` Linus Torvalds
2020-09-28 19:29                                                           ` Linus Torvalds
2020-09-28 23:57                                                           ` Jason Gunthorpe [this message]
2020-09-29  0:18                                                             ` John Hubbard
2020-09-28 19:36                                                         ` Linus Torvalds
2020-09-28 19:36                                                           ` Linus Torvalds
2020-09-28 19:50                                                           ` Linus Torvalds
2020-09-28 19:50                                                             ` Linus Torvalds
2020-09-28 22:51                                                             ` Jason Gunthorpe
2020-09-29  0:30                                                               ` Peter Xu
2020-10-08  5:49                                                             ` Leon Romanovsky
2020-09-28 17:13                                             ` Peter Xu
2020-09-25 21:13                                 ` Peter Xu
2020-09-25 22:08                                   ` Linus Torvalds
2020-09-25 22:08                                     ` Linus Torvalds
2020-09-22 18:02       ` John Hubbard
2020-09-22 18:15         ` Peter Xu
2020-09-22 19:11       ` John Hubbard
2020-09-27  0:41   ` [mm] 698ac7610f: will-it-scale.per_thread_ops 8.2% improvement kernel test robot
2020-09-27  0:41     ` kernel test robot
2020-09-21 21:17 ` [PATCH 2/5] mm/fork: Pass new vma pointer into copy_page_range() Peter Xu
2020-09-21 21:17 ` [PATCH 3/5] mm: Rework return value for copy_one_pte() Peter Xu
2020-09-22  7:11   ` John Hubbard
2020-09-22 15:29     ` Peter Xu
2020-09-22 10:08   ` Oleg Nesterov
2020-09-22 10:18     ` Oleg Nesterov
2020-09-22 15:36       ` Peter Xu
2020-09-22 15:48         ` Oleg Nesterov
2020-09-22 16:03           ` Peter Xu
2020-09-22 16:53             ` Oleg Nesterov
2020-09-22 18:13               ` Peter Xu
2020-09-22 18:23                 ` Oleg Nesterov
2020-09-22 18:49                   ` Peter Xu
2020-09-23  6:52                     ` Oleg Nesterov
2020-09-23 17:16   ` Linus Torvalds
2020-09-23 17:16     ` Linus Torvalds
2020-09-23 21:24     ` Linus Torvalds
2020-09-23 21:24       ` Linus Torvalds
2020-09-21 21:20 ` [PATCH 4/5] mm: Do early cow for pinned pages during fork() for ptes Peter Xu
2020-09-21 21:55   ` Jann Horn
2020-09-21 21:55     ` Jann Horn
2020-09-21 22:18     ` John Hubbard
2020-09-21 22:27       ` Jann Horn
2020-09-21 22:27         ` Jann Horn
2020-09-22  0:08         ` John Hubbard
2020-09-21 22:27     ` Peter Xu
2020-09-22 11:48   ` Oleg Nesterov
2020-09-22 12:40     ` Oleg Nesterov
2020-09-22 15:58       ` Peter Xu
2020-09-22 16:52         ` Oleg Nesterov
2020-09-22 18:34           ` Peter Xu
2020-09-22 18:44             ` Oleg Nesterov
2020-09-23  1:03               ` Peter Xu
2020-09-23 20:25                 ` Linus Torvalds
2020-09-23 20:25                   ` Linus Torvalds
2020-09-24 15:08                   ` Peter Xu
2020-09-24 11:48   ` Kirill Tkhai
2020-09-24 15:16     ` Peter Xu
2020-09-21 21:20 ` [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork() Peter Xu
2020-09-22  6:41   ` John Hubbard
2020-09-22 10:33     ` Jan Kara
2020-09-22 20:01       ` John Hubbard
2020-09-23  9:22         ` Jan Kara
2020-09-23 13:50           ` Peter Xu
2020-09-23 14:01             ` Jan Kara
2020-09-23 15:44               ` Peter Xu
2020-09-23 20:19                 ` John Hubbard
2020-09-24 18:49                   ` Peter Xu
2020-09-23 16:06     ` Peter Xu
2020-09-22 12:05   ` Jason Gunthorpe
2020-09-23 15:24     ` Peter Xu
2020-09-23 16:07       ` Yang Shi
2020-09-23 16:07         ` Yang Shi
2020-09-24 15:47         ` Peter Xu
2020-09-24 17:29           ` Yang Shi
2020-09-24 17:29             ` Yang Shi
2020-09-23 17:17       ` Jason Gunthorpe
2020-09-23 10:21 ` [PATCH 0/5] mm: Break COW for pinned pages during fork() Leon Romanovsky
2020-09-23 15:37   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200928235739.GU9916@ziepe.ca \
    --to=jgg@ziepe.ca \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hch@lst.de \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jhubbard@nvidia.com \
    --cc=kirill@shutemov.name \
    --cc=ktkhai@virtuozzo.com \
    --cc=leonro@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=oleg@redhat.com \
    --cc=peterx@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.