Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork

From: "Ahmed S. Darwish" <a.darwish@linutronix.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>, Peter Xu <peterx@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Christoph Hellwig <hch@lst.de>, Hugh Dickins <hughd@google.com>,
	Jan Kara <jack@suse.cz>, Jann Horn <jannh@google.com>,
	Kirill Shutemov <kirill@shutemov.name>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Linux-MM <linux-mm@kvack.org>, Michal Hocko <mhocko@suse.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Sebastian Siewior <bigeasy@linutronix.de>
Subject: Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
Date: Wed, 4 Nov 2020 02:32:12 +0100	[thread overview]
Message-ID: <20201104013212.GA82153@lx-t490> (raw)
In-Reply-To: <CAHk-=wgB8nyOQufpn0o6a5BpJCJPnXvH+kRxApujhsgG+7qAwQ@mail.gmail.com>

On Tue, Nov 03, 2020 at 09:40:22AM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2020 at 10:52 PM Ahmed S. Darwish
> <a.darwish@linutronix.de> wrote:
> >
> > The problem is, I've already documented seqlock.h to death.... There are
> > more comments than code in there, and there is "seqlock.rst" under
> > Documentation/ to further describe the big picture.
>
> Well, honestly, I think the correct thing to do is to get rid of the
> *_seqcount_t_*() functions entirely.
>
> They add nothing but confusion, and they are entirely misnamed. That's
> not the pattern we use for "internal use only" functions, and they are
> *very* confusing.
>

I see. Would the enclosed patch #1 be OK? It basically uses the "__do_"
prefix instead, with some rationale.

>
> They have other issues too: like raw_write_seqcount_end() not being
> usable on its own when preemptibility isn't an issue like here. You
> basically _have_ to use raw_write_seqcount_t_end(), because otherwise
> it tries to re-enable preemption that was never there.
>

Hmmm, raw_write_seqcount_{begin,end}() *never* disable/enable preemption
for plain seqcount_t. This is why I kept recommending those for this
patch series instead of internal raw_write_seqcount_*t*_{begin,end}().

But..... given that multiple people made the exact same remark by now, I
guess that's due to:

#define raw_write_seqcount_begin(s)		\
do {						\
	if (__seqcount_lock_preemptible(s))	\
		preempt_disable();		\
						\
	...					\
} while (0);

#define raw_write_seqcount_end(s)		\
do {						\
	...					\
						\
	if (__seqcount_lock_preemptible(s))	\
		preempt_enable();		\
} while (0);

The tricky part is that __seqcount_lock_preemptible() is always false
for plain "seqcount_t".  With that data type, the _Generic() selection
makes it resolve to __seqprop_preemptible(), which just returns false.

Originally, __seqcount_lock_preemptible() was called:

  __seqcount_associated_lock_exists_and_is_preemptible()

but it got transformed to its current short form in the process of some
pre-mainline refactorings. Looking at it now after all the dust has
settled, maybe the verbose form was much more clear.

Please see the enclosed patch #2... Would that be OK too?

(I will submit these two patches in their own thread after some common
 ground is reached.)

Patches
-------

====>
====> patch #1:
====>

Subject: [RFC][PATCH 1/2] seqlock: Use __do_ prefix instead of non-standed
 _seqcount_t_ marker

The use of "*_seqcount_t_*" as a marker to denote internal seqlock.h
functions taking only plain seqcount_t instead of the whole
seqcount_LOCKNAME_t family is confusing users, as it's also not the
standard kernel pattern for denoting header file internal functions.

Use the __do_ prefix instead.

Note, a plain "__" prefix is not used since seqlock.h already uses it
for some of its exported functions; e.g. __read_seqcount_begin() and
__read_seqcount_retry().

Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Reported-by: John Hubbard <jhubbard@nvidia.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/CAHk-=wgB8nyOQufpn0o6a5BpJCJPnXvH+kRxApujhsgG+7qAwQ@mail.gmail.com
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 62 ++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index cbfc78b92b65..5de043841d33 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -425,9 +425,9 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
  * Return: true if a read section retry is required, else false
  */
 #define __read_seqcount_retry(s, start)					\
-	__read_seqcount_t_retry(__seqcount_ptr(s), start)
+	__do___read_seqcount_retry(__seqcount_ptr(s), start)

-static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
+static inline int __do___read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
 	kcsan_atomic_next(0);
 	return unlikely(READ_ONCE(s->sequence) != start);
@@ -445,12 +445,12 @@ static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
  * Return: true if a read section retry is required, else false
  */
 #define read_seqcount_retry(s, start)					\
-	read_seqcount_t_retry(__seqcount_ptr(s), start)
+	__do_read_seqcount_retry(__seqcount_ptr(s), start)

-static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
+static inline int __do_read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
 	smp_rmb();
-	return __read_seqcount_t_retry(s, start);
+	return __do___read_seqcount_retry(s, start);
 }

 /**
@@ -462,10 +462,10 @@ do {									\
 	if (__seqcount_lock_preemptible(s))				\
 		preempt_disable();					\
 									\
-	raw_write_seqcount_t_begin(__seqcount_ptr(s));			\
+	__do_raw_write_seqcount_begin(__seqcount_ptr(s));		\
 } while (0)

-static inline void raw_write_seqcount_t_begin(seqcount_t *s)
+static inline void __do_raw_write_seqcount_begin(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
@@ -478,13 +478,13 @@ static inline void raw_write_seqcount_t_begin(seqcount_t *s)
  */
 #define raw_write_seqcount_end(s)					\
 do {									\
-	raw_write_seqcount_t_end(__seqcount_ptr(s));			\
+	__do_raw_write_seqcount_end(__seqcount_ptr(s));			\
 									\
 	if (__seqcount_lock_preemptible(s))				\
 		preempt_enable();					\
 } while (0)

-static inline void raw_write_seqcount_t_end(seqcount_t *s)
+static inline void __do_raw_write_seqcount_end(seqcount_t *s)
 {
 	smp_wmb();
 	s->sequence++;
@@ -506,12 +506,12 @@ do {									\
 	if (__seqcount_lock_preemptible(s))				\
 		preempt_disable();					\
 									\
-	write_seqcount_t_begin_nested(__seqcount_ptr(s), subclass);	\
+	__do_write_seqcount_begin_nested(__seqcount_ptr(s), subclass);	\
 } while (0)

-static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
+static inline void __do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
-	raw_write_seqcount_t_begin(s);
+	__do_raw_write_seqcount_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }

@@ -533,12 +533,12 @@ do {									\
 	if (__seqcount_lock_preemptible(s))				\
 		preempt_disable();					\
 									\
-	write_seqcount_t_begin(__seqcount_ptr(s));			\
+	__do_write_seqcount_begin(__seqcount_ptr(s));			\
 } while (0)

-static inline void write_seqcount_t_begin(seqcount_t *s)
+static inline void __do_write_seqcount_begin(seqcount_t *s)
 {
-	write_seqcount_t_begin_nested(s, 0);
+	__do_write_seqcount_begin_nested(s, 0);
 }

 /**
@@ -549,16 +549,16 @@ static inline void write_seqcount_t_begin(seqcount_t *s)
  */
 #define write_seqcount_end(s)						\
 do {									\
-	write_seqcount_t_end(__seqcount_ptr(s));			\
+	__do_write_seqcount_end(__seqcount_ptr(s));			\
 									\
 	if (__seqcount_lock_preemptible(s))				\
 		preempt_enable();					\
 } while (0)

-static inline void write_seqcount_t_end(seqcount_t *s)
+static inline void __do_write_seqcount_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
-	raw_write_seqcount_t_end(s);
+	__do_raw_write_seqcount_end(s);
 }

 /**
@@ -603,9 +603,9 @@ static inline void write_seqcount_t_end(seqcount_t *s)
  *      }
  */
 #define raw_write_seqcount_barrier(s)					\
-	raw_write_seqcount_t_barrier(__seqcount_ptr(s))
+	__do_raw_write_seqcount_barrier(__seqcount_ptr(s))

-static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
+static inline void __do_raw_write_seqcount_barrier(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
@@ -623,9 +623,9 @@ static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
  * will complete successfully and see data older than this.
  */
 #define write_seqcount_invalidate(s)					\
-	write_seqcount_t_invalidate(__seqcount_ptr(s))
+	__do_write_seqcount_invalidate(__seqcount_ptr(s))

-static inline void write_seqcount_t_invalidate(seqcount_t *s)
+static inline void __do_write_seqcount_invalidate(seqcount_t *s)
 {
 	smp_wmb();
 	kcsan_nestable_atomic_begin();
@@ -865,7 +865,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 }

 /*
- * For all seqlock_t write side functions, use write_seqcount_*t*_begin()
+ * For all seqlock_t write side functions, use __do_write_seqcount_begin()
  * instead of the generic write_seqcount_begin(). This way, no redundant
  * lockdep_assert_held() checks are added.
  */
@@ -886,7 +886,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	write_seqcount_t_begin(&sl->seqcount.seqcount);
+	__do_write_seqcount_begin(&sl->seqcount.seqcount);
 }

 /**
@@ -898,7 +898,7 @@ static inline void write_seqlock(seqlock_t *sl)
  */
 static inline void write_sequnlock(seqlock_t *sl)
 {
-	write_seqcount_t_end(&sl->seqcount.seqcount);
+	__do_write_seqcount_end(&sl->seqcount.seqcount);
 	spin_unlock(&sl->lock);
 }

@@ -912,7 +912,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	write_seqcount_t_begin(&sl->seqcount.seqcount);
+	__do_write_seqcount_begin(&sl->seqcount.seqcount);
 }

 /**
@@ -925,7 +925,7 @@ static inline void write_seqlock_bh(seqlock_t *sl)
  */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
-	write_seqcount_t_end(&sl->seqcount.seqcount);
+	__do_write_seqcount_end(&sl->seqcount.seqcount);
 	spin_unlock_bh(&sl->lock);
 }

@@ -939,7 +939,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	write_seqcount_t_begin(&sl->seqcount.seqcount);
+	__do_write_seqcount_begin(&sl->seqcount.seqcount);
 }

 /**
@@ -951,7 +951,7 @@ static inline void write_seqlock_irq(seqlock_t *sl)
  */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
-	write_seqcount_t_end(&sl->seqcount.seqcount);
+	__do_write_seqcount_end(&sl->seqcount.seqcount);
 	spin_unlock_irq(&sl->lock);
 }

@@ -960,7 +960,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;

 	spin_lock_irqsave(&sl->lock, flags);
-	write_seqcount_t_begin(&sl->seqcount.seqcount);
+	__do_write_seqcount_begin(&sl->seqcount.seqcount);
 	return flags;
 }

@@ -989,7 +989,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
-	write_seqcount_t_end(&sl->seqcount.seqcount);
+	__do_write_seqcount_end(&sl->seqcount.seqcount);
 	spin_unlock_irqrestore(&sl->lock, flags);
 }

====>
====> patch #2:
====>

Subject: [PATCH 2/2] seqlock: seqcount_LOCKAME_t: Use more verbose macro names

As evidenced by multiple discussions over LKML so far, it's not clear
that __seqcount_lock_preemptible() is always false for plain seqcount_t.
For that data type, the _Generic() selection resolves to
__seqprop_preemptible(), which just returns false.

Use __seqcount_associated_lock_exists_and_is_preemptible() instead,
which hints that "preemptibility" is for the associated write
serialization lock (if any), not for the seqcount itself.

Similarly, rename __seqcount_assert_lock_held() to
__seqcount_assert_associated_lock_held().

Link: https://lkml.kernel.org/r/CAHk-=wgB8nyOQufpn0o6a5BpJCJPnXvH+kRxApujhsgG+7qAwQ@mail.gmail.com
Link: https://lkml.kernel.org/r/20201030235121.GQ2620339@nvidia.com
Link: https://lkml.kernel.org/r/20201103170327.GJ20600@xz-x1
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 5de043841d33..eb1e5a822e44 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -307,10 +307,10 @@ SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mu
 	__seqprop_case((s),	mutex,		prop),			\
 	__seqprop_case((s),	ww_mutex,	prop))

-#define __seqcount_ptr(s)		__seqprop(s, ptr)
-#define __seqcount_sequence(s)		__seqprop(s, sequence)
-#define __seqcount_lock_preemptible(s)	__seqprop(s, preemptible)
-#define __seqcount_assert_lock_held(s)	__seqprop(s, assert)
+#define __seqcount_ptr(s)					__seqprop(s, ptr)
+#define __seqcount_sequence(s)					__seqprop(s, sequence)
+#define __seqcount_associated_lock_exists_and_is_preemptible(s)	__seqprop(s, preemptible)
+#define __seqcount_assert_associated_lock_held(s)		__seqprop(s, assert)

 /**
  * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
@@ -459,7 +459,7 @@ static inline int __do_read_seqcount_retry(const seqcount_t *s, unsigned start)
  */
 #define raw_write_seqcount_begin(s)					\
 do {									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
 		preempt_disable();					\
 									\
 	__do_raw_write_seqcount_begin(__seqcount_ptr(s));		\
@@ -480,7 +480,7 @@ static inline void __do_raw_write_seqcount_begin(seqcount_t *s)
 do {									\
 	__do_raw_write_seqcount_end(__seqcount_ptr(s));			\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
 		preempt_enable();					\
 } while (0)

@@ -501,9 +501,9 @@ static inline void __do_raw_write_seqcount_end(seqcount_t *s)
  */
 #define write_seqcount_begin_nested(s, subclass)			\
 do {									\
-	__seqcount_assert_lock_held(s);					\
+	__seqcount_assert_associated_lock_held(s);			\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
 		preempt_disable();					\
 									\
 	__do_write_seqcount_begin_nested(__seqcount_ptr(s), subclass);	\
@@ -528,9 +528,9 @@ static inline void __do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
  */
 #define write_seqcount_begin(s)						\
 do {									\
-	__seqcount_assert_lock_held(s);					\
+	__seqcount_assert_associated_lock_held(s);			\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
 		preempt_disable();					\
 									\
 	__do_write_seqcount_begin(__seqcount_ptr(s));			\
@@ -551,7 +551,7 @@ static inline void __do_write_seqcount_begin(seqcount_t *s)
 do {									\
 	__do_write_seqcount_end(__seqcount_ptr(s));			\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seqcount_associated_lock_exists_and_is_preemptible(s))	\
 		preempt_enable();					\
 } while (0)

>                    Linus

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH