From: "Ahmed S. Darwish" <a.darwish@linutronix.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>,
Jason Gunthorpe <jgg@nvidia.com>, Peter Xu <peterx@redhat.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
Christoph Hellwig <hch@lst.de>, Hugh Dickins <hughd@google.com>,
Jan Kara <jack@suse.cz>, Jann Horn <jannh@google.com>,
Kirill Shutemov <kirill@shutemov.name>,
Kirill Tkhai <ktkhai@virtuozzo.com>,
Leon Romanovsky <leonro@nvidia.com>,
Linux-MM <linux-mm@kvack.org>, Michal Hocko <mhocko@suse.com>,
Oleg Nesterov <oleg@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Sebastian Siewior <bigeasy@linutronix.de>
Subject: Re: [PATCH v2 2/2] mm: prevent gup_fast from racing with COW during fork
Date: Wed, 4 Nov 2020 02:32:12 +0100
Message-ID: <20201104013212.GA82153@lx-t490>
In-Reply-To: <CAHk-=wgB8nyOQufpn0o6a5BpJCJPnXvH+kRxApujhsgG+7qAwQ@mail.gmail.com>
On Tue, Nov 03, 2020 at 09:40:22AM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2020 at 10:52 PM Ahmed S. Darwish
> <a.darwish@linutronix.de> wrote:
> >
> > The problem is, I've already documented seqlock.h to death.... There are
> > more comments than code in there, and there is "seqlock.rst" under
> > Documentation/ to further describe the big picture.
>
> Well, honestly, I think the correct thing to do is to get rid of the
> *_seqcount_t_*() functions entirely.
>
> They add nothing but confusion, and they are entirely misnamed. That's
> not the pattern we use for "internal use only" functions, and they are
> *very* confusing.
>
I see. Would the enclosed patch #1 be OK? It basically uses the "__do_"
prefix instead, with some rationale.
>
> They have other issues too: like raw_write_seqcount_end() not being
> usable on its own when preemptibility isn't an issue like here. You
> basically _have_ to use raw_write_seqcount_t_end(), because otherwise
> it tries to re-enable preemption that was never there.
>
Hmmm, raw_write_seqcount_{begin,end}() *never* disable/enable preemption
for plain seqcount_t. This is why I kept recommending those for this
patch series instead of internal raw_write_seqcount_*t*_{begin,end}().
But..... given that multiple people made the exact same remark by now, I
guess that's due to:
#define raw_write_seqcount_begin(s)				\
do {								\
	if (__seqcount_lock_preemptible(s))			\
		preempt_disable();				\
								\
	...							\
} while (0)

#define raw_write_seqcount_end(s)				\
do {								\
	...							\
								\
	if (__seqcount_lock_preemptible(s))			\
		preempt_enable();				\
} while (0)
The tricky part is that __seqcount_lock_preemptible() is always false
for plain "seqcount_t". With that data type, the _Generic() selection
makes it resolve to __seqprop_preemptible(), which just returns false.
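To make that concrete, here is a heavily simplified, self-contained sketch. It is *not* the actual seqlock.h code: the real header generates the per-lock helpers via SEQCOUNT_LOCKNAME() and routes everything through __seqprop(); the types and helpers below are trimmed-down stand-ins for illustration only:

  #include <stdbool.h>

  /* Stripped-down stand-ins for the real seqlock.h types */
  typedef struct { unsigned sequence; } seqcount_t;
  typedef struct { seqcount_t seqcount; /* struct mutex *lock; */ } seqcount_mutex_t;

  /* Plain seqcount_t: no associated lock, so nothing is preemptible */
  static inline bool __seqprop_preemptible(const seqcount_t *s)
  {
          return false;
  }

  /* seqcount_mutex_t: the associated mutex is a sleeping (preemptible) lock */
  static inline bool __seqprop_mutex_preemptible(const seqcount_mutex_t *s)
  {
          return true;
  }

  /* _Generic() picks the property helper that matches the seqcount type */
  #define __seqcount_lock_preemptible(s)                          \
          _Generic(*(s),                                          \
                  seqcount_t:        __seqprop_preemptible,       \
                  seqcount_mutex_t:  __seqprop_mutex_preemptible  \
          )(s)

With a plain "seqcount_t sc;", __seqcount_lock_preemptible(&sc) resolves to __seqprop_preemptible(), the compiler folds it to a constant false, and the preempt_disable()/preempt_enable() branches in the macros above compile away entirely.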
Originally, __seqcount_lock_preemptible() was called:
__seqcount_associated_lock_exists_and_is_preemptible()
but it was shortened to its current form during some pre-mainline
refactoring. Looking at it now, after all the dust has settled, the
verbose form may well have been much clearer.
Please see the enclosed patch #2... Would that be OK too?
(I will submit these two patches in their own thread after some common
ground is reached.)
Patches
-------
====>
====> patch #1:
====>
Subject: [RFC][PATCH 1/2] seqlock: Use __do_ prefix instead of non-standard
_seqcount_t_ marker
Using "*_seqcount_t_*" as a marker to denote internal seqlock.h
functions that take only a plain seqcount_t, rather than any member of
the seqcount_LOCKNAME_t family, is confusing users. It is also not the
standard kernel pattern for denoting header-file-internal functions.

Use the __do_ prefix instead.
Note, a plain "__" prefix is not used since seqlock.h already uses it
for some of its exported functions; e.g. __read_seqcount_begin() and
__read_seqcount_retry().
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Reported-by: John Hubbard <jhubbard@nvidia.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/CAHk-=wgB8nyOQufpn0o6a5BpJCJPnXvH+kRxApujhsgG+7qAwQ@mail.gmail.com
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
include/linux/seqlock.h | 62 ++++++++++++++++++++---------------------
1 file changed, 31 insertions(+), 31 deletions(-)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index cbfc78b92b65..5de043841d33 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -425,9 +425,9 @@ SEQCOUNT_LOCKNAME(ww_mutex, struct ww_mutex, true, &s->lock->base, ww_mu
* Return: true if a read section retry is required, else false
*/
#define __read_seqcount_retry(s, start) \
- __read_seqcount_t_retry(__seqcount_ptr(s), start)
+ __do___read_seqcount_retry(__seqcount_ptr(s), start)
-static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
+static inline int __do___read_seqcount_retry(const seqcount_t *s, unsigned start)
{
kcsan_atomic_next(0);
return unlikely(READ_ONCE(s->sequence) != start);
@@ -445,12 +445,12 @@ static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
* Return: true if a read section retry is required, else false
*/
#define read_seqcount_retry(s, start) \
- read_seqcount_t_retry(__seqcount_ptr(s), start)
+ __do_read_seqcount_retry(__seqcount_ptr(s), start)
-static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
+static inline int __do_read_seqcount_retry(const seqcount_t *s, unsigned start)
{
smp_rmb();
- return __read_seqcount_t_retry(s, start);
+ return __do___read_seqcount_retry(s, start);
}
/**
@@ -462,10 +462,10 @@ do { \
if (__seqcount_lock_preemptible(s)) \
preempt_disable(); \
\
- raw_write_seqcount_t_begin(__seqcount_ptr(s)); \
+ __do_raw_write_seqcount_begin(__seqcount_ptr(s)); \
} while (0)
-static inline void raw_write_seqcount_t_begin(seqcount_t *s)
+static inline void __do_raw_write_seqcount_begin(seqcount_t *s)
{
kcsan_nestable_atomic_begin();
s->sequence++;
@@ -478,13 +478,13 @@ static inline void raw_write_seqcount_t_begin(seqcount_t *s)
*/
#define raw_write_seqcount_end(s) \
do { \
- raw_write_seqcount_t_end(__seqcount_ptr(s)); \
+ __do_raw_write_seqcount_end(__seqcount_ptr(s)); \
\
if (__seqcount_lock_preemptible(s)) \
preempt_enable(); \
} while (0)
-static inline void raw_write_seqcount_t_end(seqcount_t *s)
+static inline void __do_raw_write_seqcount_end(seqcount_t *s)
{
smp_wmb();
s->sequence++;
@@ -506,12 +506,12 @@ do { \
if (__seqcount_lock_preemptible(s)) \
preempt_disable(); \
\
- write_seqcount_t_begin_nested(__seqcount_ptr(s), subclass); \
+ __do_write_seqcount_begin_nested(__seqcount_ptr(s), subclass); \
} while (0)
-static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
+static inline void __do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
{
- raw_write_seqcount_t_begin(s);
+ __do_raw_write_seqcount_begin(s);
seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
}
@@ -533,12 +533,12 @@ do { \
if (__seqcount_lock_preemptible(s)) \
preempt_disable(); \
\
- write_seqcount_t_begin(__seqcount_ptr(s)); \
+ __do_write_seqcount_begin(__seqcount_ptr(s)); \
} while (0)
-static inline void write_seqcount_t_begin(seqcount_t *s)
+static inline void __do_write_seqcount_begin(seqcount_t *s)
{
- write_seqcount_t_begin_nested(s, 0);
+ __do_write_seqcount_begin_nested(s, 0);
}
/**
@@ -549,16 +549,16 @@ static inline void write_seqcount_t_begin(seqcount_t *s)
*/
#define write_seqcount_end(s) \
do { \
- write_seqcount_t_end(__seqcount_ptr(s)); \
+ __do_write_seqcount_end(__seqcount_ptr(s)); \
\
if (__seqcount_lock_preemptible(s)) \
preempt_enable(); \
} while (0)
-static inline void write_seqcount_t_end(seqcount_t *s)
+static inline void __do_write_seqcount_end(seqcount_t *s)
{
seqcount_release(&s->dep_map, _RET_IP_);
- raw_write_seqcount_t_end(s);
+ __do_raw_write_seqcount_end(s);
}
/**
@@ -603,9 +603,9 @@ static inline void write_seqcount_t_end(seqcount_t *s)
* }
*/
#define raw_write_seqcount_barrier(s) \
- raw_write_seqcount_t_barrier(__seqcount_ptr(s))
+ __do_raw_write_seqcount_barrier(__seqcount_ptr(s))
-static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
+static inline void __do_raw_write_seqcount_barrier(seqcount_t *s)
{
kcsan_nestable_atomic_begin();
s->sequence++;
@@ -623,9 +623,9 @@ static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
* will complete successfully and see data older than this.
*/
#define write_seqcount_invalidate(s) \
- write_seqcount_t_invalidate(__seqcount_ptr(s))
+ __do_write_seqcount_invalidate(__seqcount_ptr(s))
-static inline void write_seqcount_t_invalidate(seqcount_t *s)
+static inline void __do_write_seqcount_invalidate(seqcount_t *s)
{
smp_wmb();
kcsan_nestable_atomic_begin();
@@ -865,7 +865,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
}
/*
- * For all seqlock_t write side functions, use write_seqcount_*t*_begin()
+ * For all seqlock_t write side functions, use __do_write_seqcount_begin()
* instead of the generic write_seqcount_begin(). This way, no redundant
* lockdep_assert_held() checks are added.
*/
@@ -886,7 +886,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
static inline void write_seqlock(seqlock_t *sl)
{
spin_lock(&sl->lock);
- write_seqcount_t_begin(&sl->seqcount.seqcount);
+ __do_write_seqcount_begin(&sl->seqcount.seqcount);
}
/**
@@ -898,7 +898,7 @@ static inline void write_seqlock(seqlock_t *sl)
*/
static inline void write_sequnlock(seqlock_t *sl)
{
- write_seqcount_t_end(&sl->seqcount.seqcount);
+ __do_write_seqcount_end(&sl->seqcount.seqcount);
spin_unlock(&sl->lock);
}
@@ -912,7 +912,7 @@ static inline void write_sequnlock(seqlock_t *sl)
static inline void write_seqlock_bh(seqlock_t *sl)
{
spin_lock_bh(&sl->lock);
- write_seqcount_t_begin(&sl->seqcount.seqcount);
+ __do_write_seqcount_begin(&sl->seqcount.seqcount);
}
/**
@@ -925,7 +925,7 @@ static inline void write_seqlock_bh(seqlock_t *sl)
*/
static inline void write_sequnlock_bh(seqlock_t *sl)
{
- write_seqcount_t_end(&sl->seqcount.seqcount);
+ __do_write_seqcount_end(&sl->seqcount.seqcount);
spin_unlock_bh(&sl->lock);
}
@@ -939,7 +939,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
static inline void write_seqlock_irq(seqlock_t *sl)
{
spin_lock_irq(&sl->lock);
- write_seqcount_t_begin(&sl->seqcount.seqcount);
+ __do_write_seqcount_begin(&sl->seqcount.seqcount);
}
/**
@@ -951,7 +951,7 @@ static inline void write_seqlock_irq(seqlock_t *sl)
*/
static inline void write_sequnlock_irq(seqlock_t *sl)
{
- write_seqcount_t_end(&sl->seqcount.seqcount);
+ __do_write_seqcount_end(&sl->seqcount.seqcount);
spin_unlock_irq(&sl->lock);
}
@@ -960,7 +960,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
unsigned long flags;
spin_lock_irqsave(&sl->lock, flags);
- write_seqcount_t_begin(&sl->seqcount.seqcount);
+ __do_write_seqcount_begin(&sl->seqcount.seqcount);
return flags;
}
@@ -989,7 +989,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
static inline void
write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
{
- write_seqcount_t_end(&sl->seqcount.seqcount);
+ __do_write_seqcount_end(&sl->seqcount.seqcount);
spin_unlock_irqrestore(&sl->lock, flags);
}
====>
====> patch #2:
====>
Subject: [PATCH 2/2] seqlock: seqcount_LOCKNAME_t: Use more verbose macro names
As evidenced by multiple discussions on LKML so far, it is not obvious
to readers that __seqcount_lock_preemptible() is always false for plain
seqcount_t.
For that data type, the _Generic() selection resolves to
__seqprop_preemptible(), which just returns false.
Use __seqcount_associated_lock_exists_and_is_preemptible() instead,
which hints that "preemptibility" is for the associated write
serialization lock (if any), not for the seqcount itself.
Similarly, rename __seqcount_assert_lock_held() to
__seqcount_assert_associated_lock_held().
Link: https://lkml.kernel.org/r/CAHk-=wgB8nyOQufpn0o6a5BpJCJPnXvH+kRxApujhsgG+7qAwQ@mail.gmail.com
Link: https://lkml.kernel.org/r/20201030235121.GQ2620339@nvidia.com
Link: https://lkml.kernel.org/r/20201103170327.GJ20600@xz-x1
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
include/linux/seqlock.h | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 5de043841d33..eb1e5a822e44 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -307,10 +307,10 @@ SEQCOUNT_LOCKNAME(ww_mutex, struct ww_mutex, true, &s->lock->base, ww_mu
__seqprop_case((s), mutex, prop), \
__seqprop_case((s), ww_mutex, prop))
-#define __seqcount_ptr(s) __seqprop(s, ptr)
-#define __seqcount_sequence(s) __seqprop(s, sequence)
-#define __seqcount_lock_preemptible(s) __seqprop(s, preemptible)
-#define __seqcount_assert_lock_held(s) __seqprop(s, assert)
+#define __seqcount_ptr(s) __seqprop(s, ptr)
+#define __seqcount_sequence(s) __seqprop(s, sequence)
+#define __seqcount_associated_lock_exists_and_is_preemptible(s) __seqprop(s, preemptible)
+#define __seqcount_assert_associated_lock_held(s) __seqprop(s, assert)
/**
* __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
@@ -459,7 +459,7 @@ static inline int __do_read_seqcount_retry(const seqcount_t *s, unsigned start)
*/
#define raw_write_seqcount_begin(s) \
do { \
- if (__seqcount_lock_preemptible(s)) \
+ if (__seqcount_associated_lock_exists_and_is_preemptible(s)) \
preempt_disable(); \
\
__do_raw_write_seqcount_begin(__seqcount_ptr(s)); \
@@ -480,7 +480,7 @@ static inline void __do_raw_write_seqcount_begin(seqcount_t *s)
do { \
__do_raw_write_seqcount_end(__seqcount_ptr(s)); \
\
- if (__seqcount_lock_preemptible(s)) \
+ if (__seqcount_associated_lock_exists_and_is_preemptible(s)) \
preempt_enable(); \
} while (0)
@@ -501,9 +501,9 @@ static inline void __do_raw_write_seqcount_end(seqcount_t *s)
*/
#define write_seqcount_begin_nested(s, subclass) \
do { \
- __seqcount_assert_lock_held(s); \
+ __seqcount_assert_associated_lock_held(s); \
\
- if (__seqcount_lock_preemptible(s)) \
+ if (__seqcount_associated_lock_exists_and_is_preemptible(s)) \
preempt_disable(); \
\
__do_write_seqcount_begin_nested(__seqcount_ptr(s), subclass); \
@@ -528,9 +528,9 @@ static inline void __do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
*/
#define write_seqcount_begin(s) \
do { \
- __seqcount_assert_lock_held(s); \
+ __seqcount_assert_associated_lock_held(s); \
\
- if (__seqcount_lock_preemptible(s)) \
+ if (__seqcount_associated_lock_exists_and_is_preemptible(s)) \
preempt_disable(); \
\
__do_write_seqcount_begin(__seqcount_ptr(s)); \
@@ -551,7 +551,7 @@ static inline void __do_write_seqcount_begin(seqcount_t *s)
do { \
__do_write_seqcount_end(__seqcount_ptr(s)); \
\
- if (__seqcount_lock_preemptible(s)) \
+ if (__seqcount_associated_lock_exists_and_is_preemptible(s)) \
preempt_enable(); \
} while (0)
> Linus
Thanks,
--
Ahmed S. Darwish
Linutronix GmbH