linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>, John Hubbard <jhubbard@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Linux-MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Maya B . Gokhale" <gokhale2@llnl.gov>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	Marty Mcfadden <mcfadden8@llnl.gov>,
	Kirill Shutemov <kirill@shutemov.name>,
	Oleg Nesterov <oleg@redhat.com>, Jann Horn <jannh@google.com>,
	Jan Kara <jack@suse.cz>, Kirill Tkhai <ktkhai@virtuozzo.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
Date: Fri, 18 Sep 2020 21:28:50 -0300	[thread overview]
Message-ID: <20200919002850.GA8409@ziepe.ca> (raw)
In-Reply-To: <CAHk-=wiF03A3X0FoE68yXWDqO19ofgXq=21kWcRAe5+uN-WHXw@mail.gmail.com>

On Fri, Sep 18, 2020 at 01:59:41PM -0700, Linus Torvalds wrote:

> Honestly, if we had a completely *reliable* sign of "this page is
> pinned", then I think the much nicer option would be to just say
> "pinned pages will not be copied at all". Kind of an implicit
> VM_DONTCOPY.

It would be simpler to implement, but it makes the programming model
really sketchy. For instance O_DIRECT is using FOLL_PIN, so imagine
this program:

        CPU0                                      CPU1

 a = malloc(1024);
                                                b = malloc(1024);
 read(fd, a, 1024); // FD is O_DIRECT
 ...                                            fork()
                                                  *b = ...
 read completes

Here a and b got lucky and both come from the same page due to the
allocator.

In this case the fork() child in CPU1, would be very surprised that
'b' was not mapped into the fork.

Similiarly, CPU0 would have silent data corruption if the read didn't
deposit data into 'a' - which is a bug we have today. In this race the
COW break of *b might steal the physical page to the child, and *a
won't see the data. For this reason, John is right, fork needs to
eventually do this for O_DIRECT as well.

The copy on fork nicely fixes all of this weird oddball stuff.

Jason


  reply	other threads:[~2020-09-19  0:28 UTC|newest]

Thread overview: 98+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-21 23:49 [PATCH 0/4] mm: Simplfy cow handling Peter Xu
2020-08-21 23:49 ` [PATCH 1/4] mm: Trial do_wp_page() simplification Peter Xu
2020-08-24  8:36   ` Kirill Tkhai
2020-08-24 14:30     ` Jan Kara
2020-08-24 15:37       ` Kirill Tkhai
2020-08-24 18:22         ` Linus Torvalds
2020-09-01  7:01           ` Hugh Dickins
2020-09-14 14:38   ` Jason Gunthorpe
2020-09-14 17:32     ` Linus Torvalds
2020-09-14 18:34       ` Peter Xu
2020-09-14 21:15         ` Peter Xu
2020-09-14 22:55           ` Jason Gunthorpe
2020-09-14 22:59             ` Linus Torvalds
2020-09-14 23:28               ` Jason Gunthorpe
2020-09-15  0:19                 ` Linus Torvalds
2020-09-15 14:50                 ` Peter Xu
2020-09-15 15:17                   ` Peter Xu
2020-09-15 16:05                   ` Jason Gunthorpe
2020-09-15 18:29                     ` Jason Gunthorpe
2020-09-15 19:13                       ` Peter Xu
2020-09-15 19:38                         ` Jason Gunthorpe
2020-09-15 21:33                           ` Peter Xu
2020-09-15 23:22                             ` Jason Gunthorpe
2020-09-16  1:50                               ` John Hubbard
2020-09-16 17:48                                 ` Jason Gunthorpe
2020-09-16 18:46                                   ` Peter Xu
2020-09-17 11:25                                     ` Jason Gunthorpe
2020-09-17 18:11                                       ` Linus Torvalds
2020-09-17 19:38                                         ` Jason Gunthorpe
2020-09-17 19:51                                           ` Linus Torvalds
2020-09-18 16:40                                             ` Peter Xu
2020-09-18 17:16                                               ` Linus Torvalds
2020-09-18 19:57                                                 ` Peter Xu
2020-09-18 17:32                                               ` Jason Gunthorpe
2020-09-18 20:40                                                 ` Peter Xu
2020-09-18 20:59                                                   ` Linus Torvalds
2020-09-19  0:28                                                     ` Jason Gunthorpe [this message]
2020-09-18 21:06                                                   ` John Hubbard
2020-09-19  0:01                                                     ` Jason Gunthorpe
2020-09-21  8:35                                                       ` Jan Kara
2020-09-21 12:03                                                         ` Jason Gunthorpe
2022-02-16 16:59                                                           ` Oded Gabbay
2022-02-16 17:24                                                             ` Oded Gabbay
2022-02-16 19:04                                                             ` Linus Torvalds
2022-02-16 19:20                                                               ` Oded Gabbay
2022-02-16 19:24                                                               ` David Hildenbrand
2020-09-21 13:42                                               ` Michal Hocko
2020-09-21 14:18                                                 ` Peter Xu
2020-09-21 14:28                                                   ` Michal Hocko
2020-09-21 14:38                                                     ` Tejun Heo
2020-09-21 14:43                                                       ` Christian Brauner
2020-09-21 14:55                                                         ` Michal Hocko
2020-09-21 15:04                                                           ` Christian Brauner
2020-09-21 16:06                                                             ` Michal Hocko
2020-09-23  7:53                                                               ` Michal Hocko
2020-09-21 14:41                                                 ` Christian Brauner
2020-09-21 14:57                                                   ` Michal Hocko
2020-09-21 16:31                                                     ` Peter Xu
2020-09-17 18:14                                       ` Peter Xu
2020-09-17 18:26                                         ` Linus Torvalds
2020-09-17 19:03                                           ` Peter Xu
2020-09-17 19:42                                             ` Linus Torvalds
2020-09-17 19:55                                               ` John Hubbard
2020-09-17 20:06                                               ` Jason Gunthorpe
2020-09-17 20:19                                                 ` John Hubbard
2020-09-17 20:25                                                   ` Jason Gunthorpe
2020-09-17 20:35                                                 ` Linus Torvalds
2020-09-17 21:40                                                   ` Peter Xu
2020-09-17 22:09                                                     ` Jason Gunthorpe
2020-09-17 22:25                                                       ` Linus Torvalds
2020-09-17 22:48                                                       ` Ira Weiny
2020-09-18  9:36                                                         ` Jan Kara
2020-09-18  9:44                                                       ` Jan Kara
2020-09-18 16:19                                             ` Jason Gunthorpe
2020-09-15 10:23           ` Leon Romanovsky
2020-09-15 15:56           ` Jason Gunthorpe
2020-09-15 15:03   ` Oleg Nesterov
2020-09-15 16:18     ` Peter Xu
2020-08-21 23:49 ` [PATCH 2/4] mm/ksm: Remove reuse_ksm_page() Peter Xu
2020-08-21 23:49 ` [PATCH 3/4] mm/gup: Remove enfornced COW mechanism Peter Xu
2020-09-14 14:27   ` Oleg Nesterov
2020-09-14 17:59     ` Peter Xu
2020-09-14 19:03       ` Linus Torvalds
2020-08-21 23:49 ` [PATCH 4/4] mm: Add PGREUSE counter Peter Xu
2020-08-22 16:14   ` Linus Torvalds
2020-08-24  0:24     ` Peter Xu
2020-08-22 16:05 ` [PATCH 0/4] mm: Simplfy cow handling Linus Torvalds
2020-08-23 23:58   ` Peter Xu
2020-08-24  8:38 ` Kirill Tkhai
2020-08-27 14:15 ` Peter Xu
2021-02-02 14:40 [PATCH 1/4] mm: Trial do_wp_page() simplification Gal Pressman
2021-02-02 16:31 ` Peter Xu
2021-02-02 16:44   ` Jason Gunthorpe
2021-02-02 17:05     ` Peter Xu
2021-02-02 17:13       ` Jason Gunthorpe
2021-02-03 12:43         ` Gal Pressman
2021-02-03 14:00           ` Jason Gunthorpe
2021-02-03 14:47             ` Gal Pressman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200919002850.GA8409@ziepe.ca \
    --to=jgg@ziepe.ca \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=gokhale2@llnl.gov \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jhubbard@nvidia.com \
    --cc=kirill@shutemov.name \
    --cc=ktkhai@virtuozzo.com \
    --cc=leonro@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mcfadden8@llnl.gov \
    --cc=oleg@redhat.com \
    --cc=peterx@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=yang.shi@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).