Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Peter Xu <peterx@redhat.com>
To: "Ahmed S. Darwish" <a.darwish@linutronix.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Christoph Hellwig <hch@lst.de>, Hugh Dickins <hughd@google.com>,
	Jan Kara <jack@suse.cz>, Jann Horn <jannh@google.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Kirill Shutemov <kirill@shutemov.name>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Linux-MM <linux-mm@kvack.org>, Michal Hocko <mhocko@suse.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Sebastian Siewior <bigeasy@linutronix.de>
Subject: Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
Date: Tue, 3 Nov 2020 12:03:27 -0500
Message-ID: <20201103170327.GJ20600@xz-x1> (raw)
In-Reply-To: <20201103001712.GB52235@lx-t490>

On Tue, Nov 03, 2020 at 01:17:12AM +0100, Ahmed S. Darwish wrote:
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index c48f8df6e50268..294c2c3c4fe00d 100644
> > > > +++ b/mm/memory.c
> > > > @@ -1171,6 +1171,12 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
> > > >  		mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
> > > >  					0, src_vma, src_mm, addr, end);
> > > >  		mmu_notifier_invalidate_range_start(&range);
> > > > +		/*
> > > > +		 * The read side doesn't spin, it goes to the mmap_lock, so the
> > > > +		 * raw version is used to avoid disabling preemption here
> > > > +		 */
> > > > +		mmap_assert_write_locked(src_mm);
> > > > +		raw_write_seqcount_t_begin(&src_mm->write_protect_seq);
> > >
> > > Would raw_write_seqcount_begin() be better here?
> >
> > Hum..
> >
> > I felt no because it had the preempt stuff added into it, however it
> > would work - __seqcount_lock_preemptible() == false for the seqcount_t
> > case (see below)
> >
> > Looking more closely, maybe the right API to pick is
> > write_seqcount_t_begin() and write_seqcount_t_end() ??
> >
> 
> No, that's not the right API: it is also internal to seqlock.h.
> 
> Please stick with the official exported API: raw_write_seqcount_begin().
> 
> It should satisfy your needs, and the raw_*() variant is created exactly
> for contexts wishing to avoid the lockdep checks (e.g. NMI handlers
> cannot invoke lockdep, etc.)

Ahmed, Jason, feel free to correct me - but I feel like what Jason wanted here
is indeed the version that does not require disabling of preemption, a.k.a.,
write_seqcount_t_begin() and write_seqcount_t_end(), since it's preempt-safe if
the read side does not retry.  Not sure whether there's no "*_t_*" version of it.

Another idea is that maybe we can use the raw_write_seqcount_begin() version,
instead of in copy_page_range() but move it to copy_pte_range().  That would
not affect normal Linux on preemption I think, since when reach pte level we
should have disabled preemption already after all (by taking the pgtable spin
lock).  But again there could be extra overhead since we'll need to take the
write seqcount very often (rather than once per fork(), so maybe there's some
perf influence), also that means it'll be an extra/real disable_preempt() for
the future RT code if it'll land some day (since again rt_spin_lock should not
need to disable preemption, iiuc, which is used here).  So seems it's still
better to do in copy_page_range() as Jason proposed.

Thanks,

-- 
Peter Xu



  parent reply index

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-30 14:46 [PATCH v2 0/2] Add a seqcount between gup_fast and copy_page_range() Jason Gunthorpe
2020-10-30 14:46 ` [PATCH v2 1/2] mm: reorganize internal_get_user_pages_fast() Jason Gunthorpe
2020-10-30 16:29   ` Jan Kara
2020-10-30 21:31   ` John Hubbard
2020-10-30 22:36   ` Peter Xu
2020-10-30 14:46 ` [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork Jason Gunthorpe
2020-10-30 16:51   ` Jan Kara
2020-10-30 17:02     ` Jason Gunthorpe
2020-11-02  8:31       ` Jan Kara
2020-10-30 21:20   ` John Hubbard
2020-10-30 22:52   ` Peter Xu
2020-10-30 23:51     ` Jason Gunthorpe
2020-10-31 15:26       ` Peter Xu
2020-11-03  0:33         ` Ahmed S. Darwish
2020-11-03  0:17       ` Ahmed S. Darwish
2020-11-03  0:25         ` Jason Gunthorpe
2020-11-03  0:41           ` Ahmed S. Darwish
2020-11-03  2:20             ` John Hubbard
2020-11-03  6:52               ` Ahmed S. Darwish
2020-11-03  7:05                 ` John Hubbard
2020-11-03 17:40                 ` Linus Torvalds
2020-11-04  1:32                   ` Ahmed S. Darwish
2020-11-04  2:01                     ` John Hubbard
2020-11-04  3:17                       ` Ahmed S. Darwish
2020-11-04 18:38                         ` Linus Torvalds
2020-11-10 11:53                         ` Peter Zijlstra
2020-11-03 17:03         ` Peter Xu [this message]
2020-11-02 23:58   ` Ahmed S. Darwish
2020-11-02 22:19 ` [PATCH v2 0/2] Add a seqcount between gup_fast and copy_page_range() Ahmed S. Darwish
2020-11-02 22:39   ` Linus Torvalds
2020-11-02 23:18     ` Ahmed S. Darwish

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201103170327.GJ20600@xz-x1 \
    --to=peterx@redhat.com \
    --cc=a.darwish@linutronix.de \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=bigeasy@linutronix.de \
    --cc=hch@lst.de \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=kirill@shutemov.name \
    --cc=ktkhai@virtuozzo.com \
    --cc=leonro@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git