* [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks
@ 2020-05-19 21:45 Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount Ahmed S. Darwish
                   ` (30 more replies)
  0 siblings, 31 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, David S. Miller,
	Andrew Morton, Jens Axboe, Jonathan Corbet, Alexander Viro,
	David Airlie, Daniel Vetter, netdev, linux-mm, linux-block,
	dri-devel, linux-fsdevel, linux-doc

Hi,

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive
does not implicitly disable preemption, preemption has to be explicitly
disabled before entering the write side critical section.
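
For example, if the writer serialization is a mutex, which does not
disable preemption, the current API requires the caller to do all of
this manually (a minimal sketch):

	static DEFINE_MUTEX(foo_lock);
	static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);

	mutex_lock(&foo_lock);
	preempt_disable();	/* explicit, and easy to forget */
	write_seqcount_begin(&foo_seqcount);

	/* ... [[write side critical section]] ... */

	write_seqcount_end(&foo_seqcount);
	preempt_enable();
	mutex_unlock(&foo_lock);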

There is no built-in debugging mechanism to verify that the lock used
for writer serialization is held and preemption is disabled. Some usage
sites like dma-buf have explicit lockdep checks for the writer-side
lock, but this covers only a small portion of the sequence counter usage
in the kernel.

Add new sequence counter types which allow associating a lock with the
sequence counter at initialization time. The seqcount API functions are
extended to provide appropriate lockdep assertions depending on the
seqcount/lock type.

For sequence counters with associated locks that do not implicitly
disable preemption, preemption protection is enforced in the sequence
counter write side functions. This removes the need to explicitly add
preempt_disable/enable() around the write side critical sections: the
write_begin/end() functions for these new sequence counter types
automatically do this.

Extend the lockdep API with a macro asserting that preemption is
disabled.  Use it to verify that preemption is disabled for all sequence
counters write side critical sections.

If lockdep is disabled, these lock associations and non-preemptibility
checks are compiled out and have neither storage size nor runtime
overhead. If lockdep is enabled, a pointer to the lock is stored in the
seqcount and the write side API functions enable lockdep assertions.

The following seqcount types with associated locks are introduced:

     seqcount_spinlock_t
     seqcount_raw_spinlock_t
     seqcount_rwlock_t
     seqcount_mutex_t
     seqcount_ww_mutex_t
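
As an illustration of the intended usage, here is a sketch with a
spinlock-associated seqcount (the initializer naming below follows this
series' seqcount_LOCKTYPE_t pattern):

	static spinlock_t foo_lock;
	static seqcount_spinlock_t foo_seqcount;

	spin_lock_init(&foo_lock);
	seqcount_spinlock_init(&foo_seqcount, &foo_lock);

	/* Writer: lockdep asserts that foo_lock is held */
	spin_lock(&foo_lock);
	write_seqcount_begin(&foo_seqcount);

	/* ... [[write side critical section]] ... */

	write_seqcount_end(&foo_seqcount);
	spin_unlock(&foo_lock);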

This lock association is not only useful for debugging purposes, it also
provides a mechanism for PREEMPT_RT to prevent writer starvation. On RT
kernels spinlocks and rwlocks are substituted with sleeping locks, and
the code sections protected by these locks become preemptible. This has
the same problem as a write side critical section with preemption
enabled on a non-RT kernel. RT utilizes this association by storing the
provided lock pointer: if a reader sees an active writer (the seqcount
is odd), it does not spin, but blocks on the associated lock, similar to
read_seqbegin_or_lock().
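
Conceptually, the RT read path then behaves like the following
pseudo-code sketch (not the actual implementation):

	seq = READ_ONCE(s->seqcount.sequence);
	if (unlikely(seq & 1)) {
		/*
		 * Active writer. Instead of spinning, block on the
		 * associated lock: on RT this sleeps and priority
		 * boosts the preempted writer until it finishes.
		 */
		spin_lock(s->lock);
		spin_unlock(s->lock);
		seq = READ_ONCE(s->seqcount.sequence);
	}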

By using the lockdep debugging mechanisms added in this patch series, a
number of erroneous seqcount call-sites were discovered across the
kernel. The fixes are included at the beginning of the series.

Thanks,

8<--------------

Ahmed S. Darwish (25):
  net: core: device_rename: Use rwsem instead of a seqcount
  mm/swap: Don't abuse the seqcount latching API
  net: phy: fixed_phy: Remove unused seqcount
  block: nr_sects_write(): Disable preemption on seqcount write
  u64_stats: Document writer non-preemptibility requirement
  dma-buf: Remove custom seqcount lockdep class key
  lockdep: Add preemption disabled assertion API
  seqlock: lockdep assert non-preemptibility on seqcount_t write
  Documentation: locking: Describe seqlock design and usage
  seqlock: Add RST directives to kernel-doc code samples and notes
  seqlock: Add missing kernel-doc annotations
  seqlock: Extend seqcount API with associated locks
  dma-buf: Use sequence counter with associated wound/wait mutex
  sched: tasks: Use sequence counter with associated spinlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  xfrm: policy: Use sequence counters with associated lock
  timekeeping: Use sequence counter with associated raw spinlock
  vfs: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  hrtimer: Use sequence counter with associated raw spinlock

 Documentation/locking/index.rst               |   1 +
 Documentation/locking/seqlock.rst             | 239 +++++
 MAINTAINERS                                   |   2 +-
 block/blk-iocost.c                            |   5 +-
 block/blk.h                                   |   2 +
 drivers/dma-buf/dma-resv.c                    |  15 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   2 -
 drivers/md/raid5.c                            |   2 +-
 drivers/md/raid5.h                            |   2 +-
 drivers/net/phy/fixed_phy.c                   |  25 +-
 fs/dcache.c                                   |   2 +-
 fs/fs_struct.c                                |   4 +-
 fs/nfs/nfs4_fs.h                              |   2 +-
 fs/nfs/nfs4state.c                            |   2 +-
 fs/userfaultfd.c                              |   4 +-
 include/linux/dcache.h                        |   2 +-
 include/linux/dma-resv.h                      |   4 +-
 include/linux/fs_struct.h                     |   2 +-
 include/linux/hrtimer.h                       |   2 +-
 include/linux/kvm_irqfd.h                     |   2 +-
 include/linux/lockdep.h                       |   9 +
 include/linux/sched.h                         |   2 +-
 include/linux/seqlock.h                       | 882 +++++++++++++++---
 include/linux/seqlock_types_internal.h        | 187 ++++
 include/linux/u64_stats_sync.h                |  38 +-
 include/net/netfilter/nf_conntrack.h          |   2 +-
 init/init_task.c                              |   3 +-
 kernel/fork.c                                 |   2 +-
 kernel/locking/lockdep.c                      |  15 +
 kernel/time/hrtimer.c                         |  13 +-
 kernel/time/timekeeping.c                     |  19 +-
 lib/Kconfig.debug                             |   1 +
 mm/swap.c                                     |  57 +-
 net/core/dev.c                                |  30 +-
 net/netfilter/nf_conntrack_core.c             |   5 +-
 net/netfilter/nft_set_rbtree.c                |   4 +-
 net/xfrm/xfrm_policy.c                        |  10 +-
 virt/kvm/eventfd.c                            |   2 +-
 38 files changed, 1325 insertions(+), 277 deletions(-)
 create mode 100644 Documentation/locking/seqlock.rst
 create mode 100644 include/linux/seqlock_types_internal.h

base-commit: 2ef96a5bb12be62ef75b5828c0aab838ebb29cb8
--
2.20.1


* [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 22:01   ` Stephen Hemminger
                     ` (2 more replies)
  2020-05-19 21:45 ` [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API Ahmed S. Darwish
                   ` (29 subsequent siblings)
  30 siblings, 3 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, David S. Miller,
	Jakub Kicinski, netdev

Sequence counter write paths are critical sections that must never be
preempted; blocking, even with CONFIG_PREEMPTION=n, is not allowed.

Commit 5dbe7c178d3f ("net: fix kernel deadlock with interface rename and
netdev name retrieval.") handled a deadlock, observed with
CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was
infinitely spinning: it got scheduled after the seqcount write side
blocked inside its own critical section.

To fix that deadlock, among other issues, the commit added a
cond_resched() inside the read side section. While this eventually gets
the non-preemptible kernel unstuck, the seqcount reader still fully
exhausts its time slice just spinning -- until TIF_NEED_RESCHED is set.

The fix is also still broken: if the seqcount reader belongs to a
real-time scheduling policy, it can spin forever and the kernel will
livelock.

Disabling preemption over the seqcount write side critical section will
not work: inside it are a number of GFP_KERNEL allocations and mutex
locking through the drivers/base/ :: device_rename() call chain.

Given all the above, replace the seqcount with an rwsem.

Fixes: 5dbe7c178d3f (net: fix kernel deadlock with interface rename and netdev name retrieval.)
Fixes: 30e6c9fa93cf (net: devnet_rename_seq should be a seqcount)
Fixes: c91f6df2db49 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name)
Cc: <stable@vger.kernel.org>
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/dev.c | 30 ++++++++++++------------------
 1 file changed, 12 insertions(+), 18 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 522288177bbd..e18a4c23df0e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -79,6 +79,7 @@
 #include <linux/sched.h>
 #include <linux/sched/mm.h>
 #include <linux/mutex.h>
+#include <linux/rwsem.h>
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/socket.h>
@@ -194,7 +195,7 @@ static DEFINE_SPINLOCK(napi_hash_lock);
 static unsigned int napi_gen_id = NR_CPUS;
 static DEFINE_READ_MOSTLY_HASHTABLE(napi_hash, 8);
 
-static seqcount_t devnet_rename_seq;
+static DECLARE_RWSEM(devnet_rename_sem);
 
 static inline void dev_base_seq_inc(struct net *net)
 {
@@ -930,18 +931,13 @@ EXPORT_SYMBOL(dev_get_by_napi_id);
  *	@net: network namespace
  *	@name: a pointer to the buffer where the name will be stored.
  *	@ifindex: the ifindex of the interface to get the name from.
- *
- *	The use of raw_seqcount_begin() and cond_resched() before
- *	retrying is required as we want to give the writers a chance
- *	to complete when CONFIG_PREEMPTION is not set.
  */
 int netdev_get_name(struct net *net, char *name, int ifindex)
 {
 	struct net_device *dev;
-	unsigned int seq;
 
-retry:
-	seq = raw_seqcount_begin(&devnet_rename_seq);
+	down_read(&devnet_rename_sem);
+
 	rcu_read_lock();
 	dev = dev_get_by_index_rcu(net, ifindex);
 	if (!dev) {
@@ -951,10 +947,8 @@ int netdev_get_name(struct net *net, char *name, int ifindex)
 
 	strcpy(name, dev->name);
 	rcu_read_unlock();
-	if (read_seqcount_retry(&devnet_rename_seq, seq)) {
-		cond_resched();
-		goto retry;
-	}
+
+	up_read(&devnet_rename_sem);
 
 	return 0;
 }
@@ -1228,10 +1222,10 @@ int dev_change_name(struct net_device *dev, const char *newname)
 	    likely(!(dev->priv_flags & IFF_LIVE_RENAME_OK)))
 		return -EBUSY;
 
-	write_seqcount_begin(&devnet_rename_seq);
+	down_write(&devnet_rename_sem);
 
 	if (strncmp(newname, dev->name, IFNAMSIZ) == 0) {
-		write_seqcount_end(&devnet_rename_seq);
+		up_write(&devnet_rename_sem);
 		return 0;
 	}
 
@@ -1239,7 +1233,7 @@ int dev_change_name(struct net_device *dev, const char *newname)
 
 	err = dev_get_valid_name(net, dev, newname);
 	if (err < 0) {
-		write_seqcount_end(&devnet_rename_seq);
+		up_write(&devnet_rename_sem);
 		return err;
 	}
 
@@ -1254,11 +1248,11 @@ int dev_change_name(struct net_device *dev, const char *newname)
 	if (ret) {
 		memcpy(dev->name, oldname, IFNAMSIZ);
 		dev->name_assign_type = old_assign_type;
-		write_seqcount_end(&devnet_rename_seq);
+		up_write(&devnet_rename_sem);
 		return ret;
 	}
 
-	write_seqcount_end(&devnet_rename_seq);
+	up_write(&devnet_rename_sem);
 
 	netdev_adjacent_rename_links(dev, oldname);
 
@@ -1279,7 +1273,7 @@ int dev_change_name(struct net_device *dev, const char *newname)
 		/* err >= 0 after dev_alloc_name() or stores the first errno */
 		if (err >= 0) {
 			err = ret;
-			write_seqcount_begin(&devnet_rename_seq);
+			down_write(&devnet_rename_sem);
 			memcpy(dev->name, oldname, IFNAMSIZ);
 			memcpy(oldname, newname, IFNAMSIZ);
 			dev->name_assign_type = old_assign_type;
-- 
2.20.1



* [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-20 12:22   ` Konstantin Khlebnikov
  2020-05-22 14:57   ` Peter Zijlstra
  2020-05-19 21:45 ` [PATCH v1 03/25] net: phy: fixed_phy: Remove unused seqcount Ahmed S. Darwish
                   ` (28 subsequent siblings)
  30 siblings, 2 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Andrew Morton,
	Konstantin Khlebnikov, linux-mm

Commit eef1a429f234 ("mm/swap.c: piggyback lru_add_drain_all() calls")
implemented an optimization mechanism to exit the to-be-started LRU
drain operation (call it A) if another drain operation *started and
finished* while (A) was blocked on the LRU draining mutex.

This was done through a seqcount latch, which is an abuse of its
semantics:

  1. Seqcount latching should be used for the purpose of switching
     between two storage places with sequence protection to allow
     interruptible, preemptible writer sections. The optimization
     mechanism has absolutely nothing to do with that.

  2. The used raw_write_seqcount_latch() has two smp write memory
     barriers to always ensure one consistent storage place out of the
     two storage places available. This extra smp_wmb() is redundant for
     the optimization use case.

Besides the API abuse, the semantics of a latch sequence counter were
force-fitted into the optimization. What was actually meant is to track
generations of LRU draining operations, where "current lru draining
generation = x" implies that all generations 0 < n <= x are already
*scheduled* for draining.

Remove the conceptually-inappropriate seqcount latch usage and manually
implement the optimization using a counter and SMP memory barriers.

Link: https://lkml.kernel.org/r/CALYGNiPSr-cxV9MX9czaVh6Wz_gzSv3H_8KPvgjBTGbJywUJpA@mail.gmail.com
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 mm/swap.c | 57 +++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 47 insertions(+), 10 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index bf9a79fed62d..d6910eeed43d 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -713,10 +713,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
  */
 void lru_add_drain_all(void)
 {
-	static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
-	static DEFINE_MUTEX(lock);
+	/*
+	 * lru_drain_gen - Current generation of pages that could be in vectors
+	 *
+	 * (A) Definition: lru_drain_gen = x implies that all generations
+	 *     0 < n <= x are already scheduled for draining.
+	 *
+	 * This is an optimization for the highly-contended use case where a
+	 * user space workload keeps constantly generating a flow of pages
+	 * for each CPU.
+	 */
+	static unsigned int lru_drain_gen;
 	static struct cpumask has_work;
-	int cpu, seq;
+	static DEFINE_MUTEX(lock);
+	int cpu, this_gen;
 
 	/*
 	 * Make sure nobody triggers this path before mm_percpu_wq is fully
@@ -725,21 +735,48 @@ void lru_add_drain_all(void)
 	if (WARN_ON(!mm_percpu_wq))
 		return;
 
-	seq = raw_read_seqcount_latch(&seqcount);
+	/*
+	 * (B) Cache the LRU draining generation number
+	 *
+	 * smp_rmb() ensures that the counter is loaded before the mutex is
+	 * taken. It pairs with the smp_wmb() inside the mutex critical section
+	 * at (D).
+	 */
+	this_gen = READ_ONCE(lru_drain_gen);
+	smp_rmb();
 
 	mutex_lock(&lock);
 
 	/*
-	 * Piggyback on drain started and finished while we waited for lock:
-	 * all pages pended at the time of our enter were drained from vectors.
+	 * (C) Exit the draining operation if a newer generation, from another
+	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
 	 */
-	if (__read_seqcount_retry(&seqcount, seq))
+	if (unlikely(this_gen != lru_drain_gen))
 		goto done;
 
-	raw_write_seqcount_latch(&seqcount);
+	/*
+	 * (D) Increment generation number
+	 *
+	 * Pairs with READ_ONCE() and smp_rmb() at (B), outside of the critical
+	 * section.
+	 *
+	 * This pairing must be done here, before the for_each_online_cpu loop
+	 * below which drains the page vectors.
+	 *
+	 * Let x, y, and z represent some system CPU numbers, where x < y < z.
+	 * Assume CPU #z is in the middle of the for_each_online_cpu loop
+	 * below and has already reached CPU #y's per-cpu data. CPU #x comes
+	 * along, adds some pages to its per-cpu vectors, then calls
+	 * lru_add_drain_all().
+	 *
+	 * If the paired smp_wmb() below is done at any later step, e.g. after
+	 * the loop, CPU #x will just exit at (C) and miss flushing out all of
+	 * its added pages.
+	 */
+	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
+	smp_wmb();
 
 	cpumask_clear(&has_work);
-
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
 
@@ -766,7 +803,7 @@ void lru_add_drain_all(void)
 {
 	lru_add_drain();
 }
-#endif
+#endif /* CONFIG_SMP */
 
 /**
  * release_pages - batched put_page()
-- 
2.20.1



* [PATCH v1 03/25] net: phy: fixed_phy: Remove unused seqcount
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 04/25] block: nr_sects_write(): Disable preemption on seqcount write Ahmed S. Darwish
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, Russell King, David S. Miller,
	netdev

Commit bf7afb29d545 ("phy: improve safety of fixed-phy MII register
reading") protected the fixed PHY status with a sequence counter.

Two years later, commit d2b977939b18 ("net: phy: fixed-phy: remove
fixed_phy_update_state()") removed the sequence counter's write side
critical section -- neutralizing its read side retry loop.

Remove the unused seqcount.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/phy/fixed_phy.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
index 4a3d34f40cb9..f55365c9d1f7 100644
--- a/drivers/net/phy/fixed_phy.c
+++ b/drivers/net/phy/fixed_phy.c
@@ -34,7 +34,6 @@ struct fixed_mdio_bus {
 struct fixed_phy {
 	int addr;
 	struct phy_device *phydev;
-	seqcount_t seqcount;
 	struct fixed_phy_status status;
 	bool no_carrier;
 	int (*link_update)(struct net_device *, struct fixed_phy_status *);
@@ -80,19 +79,17 @@ static int fixed_mdio_read(struct mii_bus *bus, int phy_addr, int reg_num)
 	list_for_each_entry(fp, &fmb->phys, node) {
 		if (fp->addr == phy_addr) {
 			struct fixed_phy_status state;
-			int s;
 
-			do {
-				s = read_seqcount_begin(&fp->seqcount);
-				fp->status.link = !fp->no_carrier;
-				/* Issue callback if user registered it. */
-				if (fp->link_update)
-					fp->link_update(fp->phydev->attached_dev,
-							&fp->status);
-				/* Check the GPIO for change in status */
-				fixed_phy_update(fp);
-				state = fp->status;
-			} while (read_seqcount_retry(&fp->seqcount, s));
+			fp->status.link = !fp->no_carrier;
+
+			/* Issue callback if user registered it. */
+			if (fp->link_update)
+				fp->link_update(fp->phydev->attached_dev,
+						&fp->status);
+
+			/* Check the GPIO for change in status */
+			fixed_phy_update(fp);
+			state = fp->status;
 
 			return swphy_read_reg(reg_num, &state);
 		}
@@ -150,8 +147,6 @@ static int fixed_phy_add_gpiod(unsigned int irq, int phy_addr,
 	if (!fp)
 		return -ENOMEM;
 
-	seqcount_init(&fp->seqcount);
-
 	if (irq != PHY_POLL)
 		fmb->mii_bus->irq[phy_addr] = irq;
 
-- 
2.20.1



* [PATCH v1 04/25] block: nr_sects_write(): Disable preemption on seqcount write
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (2 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 03/25] net: phy: fixed_phy: Remove unused seqcount Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-22 16:39   ` Peter Zijlstra
       [not found]   ` <20200522001237.A00E8206BE@mail.kernel.org>
  2020-05-19 21:45 ` [PATCH v1 05/25] u64_stats: Document writer non-preemptibility requirement Ahmed S. Darwish
                   ` (26 subsequent siblings)
  30 siblings, 2 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jens Axboe, Phillip Susi,
	Vivek Goyal, linux-block

For optimized block readers not holding a mutex, the "number of sectors"
64-bit value is protected from tearing on 32-bit architectures by a
sequence counter.

Disable preemption before entering that sequence counter's write side
critical section. Otherwise, the read side can preempt the write side
section and spin for the entire scheduler tick. If the reader belongs to
a real-time scheduling class, it can spin forever and the kernel will
livelock.

Fixes: c83f6bf98dc1 ("block: add partition resize function to blkpg ioctl")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 block/blk.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk.h b/block/blk.h
index 0a94ec68af32..151f86932547 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -470,9 +470,11 @@ static inline sector_t part_nr_sects_read(struct hd_struct *part)
 static inline void part_nr_sects_write(struct hd_struct *part, sector_t size)
 {
 #if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+	preempt_disable();
 	write_seqcount_begin(&part->nr_sects_seq);
 	part->nr_sects = size;
 	write_seqcount_end(&part->nr_sects_seq);
+	preempt_enable();
 #elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
 	preempt_disable();
 	part->nr_sects = size;
-- 
2.20.1



* [PATCH v1 05/25] u64_stats: Document writer non-preemptibility requirement
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (3 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 04/25] block: nr_sects_write(): Disable preemption on seqcount write Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 06/25] dma-buf: Remove custom seqcount lockdep class key Ahmed S. Darwish
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, David S. Miller,
	Jakub Kicinski, netdev

The u64_stats mechanism uses sequence counters to protect against 64-bit
values tearing on 32-bit architectures. Updating such statistics is a
sequence counter write side critical section.

Preemption must be disabled before entering this seqcount write critical
section.  Otherwise, the seqcount read side can preempt the write side
section and spin for the entire scheduler tick.  If that reader belongs
to a real-time scheduling class, it can spin forever and the kernel will
livelock.

Document this statistics update side non-preemptibility requirement.

Reword the u64_stats header file top comment to always mention "Reader"
or "Writer" at the start of each bullet point, making it easier to
follow which side each point is actually for.

Fix the statement "whole thing is a NOOP on 64bit arches or UP kernels":
it is not a no-op on 32-bit UP kernels, where preemption is always
disabled for the statistics read side section.
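
For reference, a typical update/fetch pair looks as follows (a sketch;
the writer is assumed to be additionally serialized by a lock that
disables preemption):

	struct u64_stats_sync syncp;
	u64 bytes;

	/* Writer: preemption disabled through the serializing lock */
	u64_stats_update_begin(&syncp);
	bytes += len;	/* 'len' being the amount to account */
	u64_stats_update_end(&syncp);

	/* Reader: lockless; retries if it raced with an update */
	unsigned int start;
	u64 snapshot;

	do {
		start = u64_stats_fetch_begin(&syncp);
		snapshot = bytes;
	} while (u64_stats_fetch_retry(&syncp, start));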

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/u64_stats_sync.h | 38 ++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index 9de5c10293f5..30358ce3d8fe 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -7,29 +7,31 @@
  * we provide a synchronization point, that is a noop on 64bit or UP kernels.
  *
  * Key points :
- * 1) Use a seqcount on SMP 32bits, with low overhead.
- * 2) Whole thing is a noop on 64bit arches or UP kernels.
- * 3) Write side must ensure mutual exclusion or one seqcount update could
+ *
+ * 1) Use a seqcount on 32-bit SMP, only disable preemption for 32-bit UP.
+ *
+ * 2) The whole thing is a no-op on 64-bit architectures.
+ *
+ * 3) Write side must ensure mutual exclusion, or one seqcount update could
  *    be lost, thus blocking readers forever.
- *    If this synchronization point is not a mutex, but a spinlock or
- *    spinlock_bh() or disable_bh() :
- * 3.1) Write side should not sleep.
- * 3.2) Write side should not allow preemption.
- * 3.3) If applicable, interrupts should be disabled.
  *
- * 4) If reader fetches several counters, there is no guarantee the whole values
- *    are consistent (remember point 1) : this is a noop on 64bit arches anyway)
+ * 4) Write side must disable preemption, or a seqcount reader can preempt the
+ *    writer and also spin forever.
  *
- * 5) readers are allowed to sleep or be preempted/interrupted : They perform
- *    pure reads. But if they have to fetch many values, it's better to not allow
- *    preemptions/interruptions to avoid many retries.
+ * 5) Write side must use the _irqsave() variant if other writers, or a reader,
+ *    can be invoked from an IRQ context.
  *
- * 6) If counter might be written by an interrupt, readers should block interrupts.
- *    (On UP, there is no seqcount_t protection, a reader allowing interrupts could
- *     read partial values)
+ * 6) If reader fetches several counters, there is no guarantee the whole values
+ *    are consistent w.r.t. each other (remember point #2: seqcounts are not
+ *    used for 64bit architectures).
  *
- * 7) For irq and softirq uses, readers can use u64_stats_fetch_begin_irq() and
- *    u64_stats_fetch_retry_irq() helpers
+ * 7) Readers are allowed to sleep or be preempted/interrupted: they perform
+ *    pure reads.
+ *
+ * 8) Readers must use both u64_stats_fetch_{begin,retry}_irq() if the stats
+ *    might be updated from a hardirq or softirq context (remember point #1:
+ *    seqcounts are not used for UP kernels). 32-bit UP stat readers could read
+ *    corrupted 64-bit values otherwise.
  *
  * Usage :
  *
-- 
2.20.1



* [PATCH v1 06/25] dma-buf: Remove custom seqcount lockdep class key
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (4 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 05/25] u64_stats: Document writer non-preemptibility requirement Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 07/25] lockdep: Add preemption disabled assertion API Ahmed S. Darwish
                   ` (24 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Sumit Semwal,
	David Airlie, Daniel Vetter, linux-media, dri-devel

Commit 3c3b177a9369 ("reservation: add support for read-only access
using rcu") introduced a sequence counter to manage updates to
reservations. Back then, the reservation object initializer
reservation_object_init() was always inlined.

Having the sequence counter initialization inlined meant that each of
the call sites would have a different lockdep class key, which would've
broken lockdep's deadlock detection. The aforementioned commit thus
introduced, and exported, a custom seqcount lockdep class key and name.

Commit 8735f16803f00 ("dma-buf: cleanup reservation_object_init...")
transformed the reservation object initializer into a normal,
non-inlined C function. seqcount_init(), which automatically defines the
seqcount lockdep class key and must therefore be called non-inlined, can
now be safely used.

Remove the seqcount custom lockdep class key, name, and export. Use
seqcount_init() inside the dma reservation object initializer.
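
The key point is visible in the seqcount_init() definition itself: it
declares one static lockdep class key per call site, so it must not be
expanded in an inlined initializer (quoted from seqlock.h, lockdep
enabled variant):

	# define seqcount_init(s)				\
		do {						\
			static struct lock_class_key __key;	\
			__seqcount_init((s), #s, &__key);	\
		} while (0)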

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/dma-buf/dma-resv.c | 9 +--------
 include/linux/dma-resv.h   | 2 --
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 4264e64788c4..590ce7ad60a0 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -50,12 +50,6 @@
 DEFINE_WD_CLASS(reservation_ww_class);
 EXPORT_SYMBOL(reservation_ww_class);
 
-struct lock_class_key reservation_seqcount_class;
-EXPORT_SYMBOL(reservation_seqcount_class);
-
-const char reservation_seqcount_string[] = "reservation_seqcount";
-EXPORT_SYMBOL(reservation_seqcount_string);
-
 /**
  * dma_resv_list_alloc - allocate fence list
  * @shared_max: number of fences we need space for
@@ -134,9 +128,8 @@ subsys_initcall(dma_resv_lockdep);
 void dma_resv_init(struct dma_resv *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
+	seqcount_init(&obj->seq);
 
-	__seqcount_init(&obj->seq, reservation_seqcount_string,
-			&reservation_seqcount_class);
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
 }
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index ee50d10f052b..a6538ae7d93f 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -46,8 +46,6 @@
 #include <linux/rcupdate.h>
 
 extern struct ww_class reservation_ww_class;
-extern struct lock_class_key reservation_seqcount_class;
-extern const char reservation_seqcount_string[];
 
 /**
  * struct dma_resv_list - a list of shared fences
-- 
2.20.1



* [PATCH v1 07/25] lockdep: Add preemption disabled assertion API
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (5 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 06/25] dma-buf: Remove custom seqcount lockdep class key Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-22 17:55   ` Peter Zijlstra
  2020-05-19 21:45 ` [PATCH v1 08/25] seqlock: lockdep assert non-preemptibility on seqcount_t write Ahmed S. Darwish
                   ` (23 subsequent siblings)
  30 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

Asserting that preemption is disabled is a critical sanity check.
Developers are usually reluctant to add such a check in a fastpath, as
reading the preemption count can be costly.

Extend the lockdep API with a preemption disabled assertion. If lockdep
is disabled, or if the underlying architecture does not support kernel
preemption, this assert has no runtime overhead.
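
A typical use would be guarding per-CPU data manipulation in a shared
helper (a hypothetical example; 'nr_updates' is made up):

	static DEFINE_PER_CPU(unsigned long, nr_updates);

	static void stats_inc(void)
	{
		/* Callers must have preemption disabled */
		lockdep_assert_preemption_disabled();

		__this_cpu_inc(nr_updates);
	}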

Since the lockdep assertion references sched.h's task_struct "current",
define it in lockdep.c instead of lockdep.h. This breaks a potential
circular header dependency for call sites defined inline in other
header files that sched.h itself already includes.

Mark the exported assertion symbol with NOKPROBE_SYMBOL: lockdep
functions can be involved in breakpoint handling, and probing them can
cause a breakpoint recursion.

References: f54bb2ec02c8 ("locking/lockdep: Add IRQs disabled/enabled assertion APIs: ...")
References: 2f43c6022d84 ("kprobes: Prohibit probing on lockdep functions")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/lockdep.h  |  9 +++++++++
 kernel/locking/lockdep.c | 15 +++++++++++++++
 lib/Kconfig.debug        |  1 +
 3 files changed, 25 insertions(+)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 206774ac6946..54c929ea5b98 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -702,6 +702,14 @@ do {									\
 			  "Not in hardirq as expected\n");		\
 	} while (0)
 
+/*
+ * Don't define this assertion here to avoid a call-site's header file
+ * dependency on sched.h task_struct current. This is needed by call
+ * sites that are inline defined at header files already included by
+ * sched.h.
+ */
+void lockdep_assert_preemption_disabled(void);
+
 #else
 # define might_lock(lock) do { } while (0)
 # define might_lock_read(lock) do { } while (0)
@@ -709,6 +717,7 @@ do {									\
 # define lockdep_assert_irqs_enabled() do { } while (0)
 # define lockdep_assert_irqs_disabled() do { } while (0)
 # define lockdep_assert_in_irq() do { } while (0)
+# define lockdep_assert_preemption_disabled() do { } while (0)
 #endif
 
 #ifdef CONFIG_PROVE_RAW_LOCK_NESTING
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index ac10db66cc63..4dae65bc65c2 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -5857,3 +5857,18 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
 	dump_stack();
 }
 EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
+
+#ifdef CONFIG_PROVE_LOCKING
+
+void lockdep_assert_preemption_disabled(void)
+{
+	WARN_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)	&&
+		  debug_locks				&&
+		  !current->lockdep_recursion		&&
+		  (preempt_count() == 0 && current->hardirqs_enabled),
+		  "preemption not disabled as expected\n");
+}
+EXPORT_SYMBOL_GPL(lockdep_assert_preemption_disabled);
+NOKPROBE_SYMBOL(lockdep_assert_preemption_disabled);
+
+#endif
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 21d9c5f6e7ec..34d9d8896003 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1062,6 +1062,7 @@ config PROVE_LOCKING
 	select DEBUG_RWSEMS
 	select DEBUG_WW_MUTEX_SLOWPATH
 	select DEBUG_LOCK_ALLOC
+	select PREEMPT_COUNT if !ARCH_NO_PREEMPT
 	select TRACE_IRQFLAGS
 	default n
 	help
-- 
2.20.1



* [PATCH v1 08/25] seqlock: lockdep assert non-preemptibility on seqcount_t write
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (6 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 07/25] lockdep: Add preemption disabled assertion API Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 09/25] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

Preemption must be disabled before entering a sequence count write side
critical section.  Otherwise, the seqcount read side can preempt the
write side section and spin for the entire scheduler tick.  If that
reader belongs to a real-time scheduling class, it can spin forever and
the kernel will livelock.

Assert through lockdep that preemption is disabled for seqcount writers.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 0491d963d47e..d35be7709403 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -369,14 +369,32 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
 
 /*
  * Sequence counter only version assumes that callers are using their
- * own mutexing.
+ * own locking and preemption is disabled.
  */
-static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
+
+static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
 	raw_write_seqcount_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
+static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
+{
+	lockdep_assert_preemption_disabled();
+	__write_seqcount_begin_nested(s, subclass);
+}
+
+/*
+ * write_seqcount_begin() without lockdep non-preemptibility checks.
+ *
+ * Use for internal seqlock.h code where it's known that preemption
+ * is already disabled. For example, seqlock_t write functions.
+ */
+static inline void __write_seqcount_begin(seqcount_t *s)
+{
+	__write_seqcount_begin_nested(s, 0);
+}
+
 static inline void write_seqcount_begin(seqcount_t *s)
 {
 	write_seqcount_begin_nested(s, 0);
@@ -446,7 +464,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 static inline void write_sequnlock(seqlock_t *sl)
@@ -458,7 +476,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 static inline void write_sequnlock_bh(seqlock_t *sl)
@@ -470,7 +488,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 static inline void write_sequnlock_irq(seqlock_t *sl)
@@ -484,7 +502,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 	return flags;
 }
 
-- 
2.20.1



* [PATCH v1 09/25] Documentation: locking: Describe seqlock design and usage
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (7 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 08/25] seqlock: lockdep assert non-preemptibility on seqcount_t write Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-22 18:01   ` Peter Zijlstra
  2020-05-19 21:45 ` [PATCH v1 10/25] seqlock: Add RST directives to kernel-doc code samples and notes Ahmed S. Darwish
                   ` (21 subsequent siblings)
  30 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc

Proper documentation for the design and usage of sequence counters and
sequential locks does not exist. Complete the seqlock.h documentation as
follows:

  - Divide all documentation on a seqcount_t vs. seqlock_t basis. The
    description for both mechanisms was intermingled, which is incorrect
    since the usage constraints for each type are vastly different.

  - Add an introductory paragraph describing the internal design of, and
    rationale for, sequence counters.

  - Document seqcount_t writer non-preemptibility requirement, which was
    not previously documented anywhere, and provide a clear rationale.

  - Provide template code for seqcount_t and seqlock_t initialization
    and reader/writer critical sections.

  - Recommend using seqlock_t by default. It implicitly handles the
    serialization and non-preemptibility requirements of writers.

At seqlock.h:

  - Remove references to brlocks as they've long been removed from the
    kernel.

  - Remove references to gcc-3.x since the kernel's minimum supported
    gcc version is 4.6.

  - Remove the severely lacking top comment and reference the newly
    introduced Documentation/locking/seqlock.rst file instead.

References: 0f6ed63b1707 ("no need to keep brlock macros anymore...")
References: cafa0010cd51 ("Raise the minimum required gcc version to 4.6")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 Documentation/locking/index.rst   |   1 +
 Documentation/locking/seqlock.rst | 181 ++++++++++++++++++++++++++++++
 include/linux/seqlock.h           |  73 +++++-------
 3 files changed, 213 insertions(+), 42 deletions(-)
 create mode 100644 Documentation/locking/seqlock.rst

diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
index 5d6800a723dc..aad15fc81ccd 100644
--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -14,6 +14,7 @@ locking
     mutex-design
     rt-mutex-design
     rt-mutex
+    seqlock
     spinlocks
     ww-mutex-design
 
diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
new file mode 100644
index 000000000000..2242ae00e7bf
--- /dev/null
+++ b/Documentation/locking/seqlock.rst
@@ -0,0 +1,181 @@
+======================================
+Sequence counters and sequential locks
+======================================
+
+Introduction
+============
+
+Sequence counters are a reader-writer consistency mechanism with
+lockless readers (read-only retry loops), and no writer starvation. They
+are used for data that's rarely written to (e.g. system time), where the
+reader wants a consistent set of information and is willing to retry if
+that information changes.
+
+A data set is consistent when the sequence count at the beginning of the
+read side critical section is even and the same sequence count value is
+read again at the end of the critical section. The data in the set must
+be copied out inside the read side critical section. If the sequence
+count has changed between the start and the end of the critical section,
+the reader must retry.
+
+Writers increment the sequence count at the start and the end of their
+critical section. After starting the critical section the sequence count
+is odd and indicates to the readers that an update is in progress. At
+the end of the write side critical section the sequence count becomes
+even again which lets readers make progress.
+
+A sequence counter write side critical section must never be preempted
+or interrupted by read side sections. Otherwise the reader will spin for
+the entire scheduler tick due to the odd sequence count value and the
+interrupted writer. If that reader belongs to a real-time scheduling
+class, it can spin forever and the kernel will livelock.
+
+.. _seqcount_t:
+
+Sequence counters (:c:type:`seqcount_t`)
+========================================
+
+This is the raw counting mechanism, which does not protect against
+multiple writers.  Write side critical sections must thus be serialized
+by an external lock.
+
+If the write serialization primitive is not implicitly disabling
+preemption, preemption must be explicitly disabled before entering the
+write side section. If the sequence counter read section can be invoked
+from hardirq or softirq contexts, interrupts or bottom halves must be
+respectively disabled before entering the write side section.
+
+If it's desired to automatically handle the sequence counter
+requirements of writer serialization and non-preemptibility, use a
+:ref:`sequential lock <seqlock_t>` instead.
+
+Initialization:
+
+.. code-block:: c
+
+	/* dynamic */
+	seqcount_t foo_seqcount;
+	seqcount_init(&foo_seqcount);
+
+	/* static */
+	static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
+
+	/* C99 struct init */
+	struct {
+		.seq   = SEQCNT_ZERO(foo.seq),
+	} foo;
+
+Write path:
+
+.. code-block:: c
+
+	/* Serialized context with disabled preemption */
+
+	write_seqcount_begin(&foo_seqcount);
+
+	/* ... [[write-side critical section]] ... */
+
+	write_seqcount_end(&foo_seqcount);
+
+Read path:
+
+.. code-block:: c
+
+	do {
+		seq = read_seqcount_begin(&foo_seqcount);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (read_seqcount_retry(&foo_seqcount, seq));
+
+.. _seqlock_t:
+
+Sequential locks (:c:type:`seqlock_t`)
+======================================
+
+This contains the :ref:`sequence counting mechanism <seqcount_t>`
+earlier discussed, plus an embedded spinlock for writer serialization
+and non-preemptibility.
+
+If the read side section can be invoked from hardirq or softirq context,
+use the write side function variants which respectively disable
+interrupts or bottom halves.
+
+Initialization:
+
+.. code-block:: c
+
+	/* dynamic */
+	seqlock_t foo_seqlock;
+	seqlock_init(&foo_seqlock);
+
+	/* static */
+	static DEFINE_SEQLOCK(foo_seqlock);
+
+	/* C99 struct init */
+	struct {
+		.seql   = __SEQLOCK_UNLOCKED(foo.seql)
+	} foo;
+
+Write path:
+
+.. code-block:: c
+
+	write_seqlock(&foo_seqlock);
+
+	/* ... [[write-side critical section]] ... */
+
+	write_sequnlock(&foo_seqlock);
+
+Read path, three categories:
+
+1. Normal Sequence readers which never block a writer but they must
+   retry if a writer is in progress by detecting change in the sequence
+   number.  Writers do not wait for a sequence reader.
+
+   .. code-block:: c
+
+	do {
+		seq = read_seqbegin(&foo_seqlock);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (read_seqretry(&foo_seqlock, seq));
+
+2. Locking readers which will wait if a writer or another locking reader
+   is in progress. A locking reader in progress will also block a writer
+   from entering its critical section. This read lock is
+   exclusive. Unlike rwlock_t, only one locking reader can acquire it.
+
+   .. code-block:: c
+
+	read_seqlock_excl(&foo_seqlock);
+
+	/* ... [[read-side critical section]] ... */
+
+	read_sequnlock_excl(&foo_seqlock);
+
+3. Conditional lockless reader (as in 1), or locking reader (as in 2),
+   according to a passed marker. This is used to avoid lockless readers
+   starvation (too many retry loops) in case of a sharp spike in write
+   activity. First, a lockless read is tried (even marker passed). If
+   that trial fails (odd sequence counter is returned, which is used as
+   the next iteration marker), the lockless read is transformed to a
+   full locking read and no retry loop is necessary.
+
+   .. code-block:: c
+
+	/* marker; even initialization */
+	int seq = 0;
+	do {
+		read_seqbegin_or_lock(&foo_seqlock, &seq);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (need_seqretry(&foo_seqlock, seq));
+	done_seqretry(&foo_seqlock, seq);
+
+API documentation
+=================
+
+.. kernel-doc:: include/linux/seqlock.h
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index d35be7709403..2a4af746b1da 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -1,36 +1,15 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef __LINUX_SEQLOCK_H
 #define __LINUX_SEQLOCK_H
+
 /*
- * Reader/writer consistent mechanism without starving writers. This type of
- * lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes. There are two types
- * of readers:
- * 1. Sequence readers which never block a writer but they may have to retry
- *    if a writer is in progress by detecting change in sequence number.
- *    Writers do not wait for a sequence reader.
- * 2. Locking readers which will wait if a writer or another locking reader
- *    is in progress. A locking reader in progress will also block a writer
- *    from going forward. Unlike the regular rwlock, the read lock here is
- *    exclusive so that only one locking reader can get it.
+ * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
+ * lockless readers (read-only retry loops), and no writer starvation.
  *
- * This is not as cache friendly as brlock. Also, this may not work well
- * for data that contains pointers, because any writer could
- * invalidate a pointer that a reader was following.
+ * See Documentation/locking/seqlock.rst for full description.
  *
- * Expected non-blocking reader usage:
- * 	do {
- *	    seq = read_seqbegin(&foo);
- * 	...
- *      } while (read_seqretry(&foo, seq));
- *
- *
- * On non-SMP the spin locks disappear but the writer still needs
- * to increment the sequence variables because an interrupt routine could
- * change the state of the data.
- *
- * Based on x86_64 vsyscall gettimeofday 
- * by Keith Owens and Andrea Arcangeli
+ * Copyrights:
+ * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
  */
 
 #include <linux/spinlock.h>
@@ -40,11 +19,23 @@
 #include <asm/processor.h>
 
 /*
- * Version using sequence counter only.
- * This can be used when code has its own mutex protecting the
- * updating starting before the write_seqcountbeqin() and ending
- * after the write_seqcount_end().
+ * Sequence counters (seqcount_t)
+ *
+ * The raw counting mechanism without any writer protection. Write side
+ * critical sections must be serialized and readers on the same CPU
+ * (e.g. through preemption or interrupts) must be excluded.
+ *
+ * If the write serialization mechanism is one of the common kernel
+ * locking primitives, use a sequence counter with associated lock
+ * (seqcount_LOCKTYPE_t) instead.
+ *
+ * If it's desired to automatically handle the sequence counter writer
+ * serialization and non-preemptibility requirements, use a sequential
+ * lock (seqlock_t) instead.
+ *
+ * See Documentation/locking/seqlock.rst
  */
+
 typedef struct seqcount {
 	unsigned sequence;
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -221,8 +212,6 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 	return __read_seqcount_retry(s, start);
 }
 
-
-
 static inline void raw_write_seqcount_begin(seqcount_t *s)
 {
 	s->sequence++;
@@ -367,11 +356,6 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
        smp_wmb();      /* increment "sequence" before following stores */
 }
 
-/*
- * Sequence counter only version assumes that callers are using their
- * own locking and preemption is disabled.
- */
-
 static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
 	raw_write_seqcount_begin(s);
@@ -419,15 +403,20 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
 	s->sequence+=2;
 }
 
+/*
+ * Sequential locks (seqlock_t)
+ *
+ * Sequence counters with an embedded spinlock for writer serialization
+ * and non-preemptibility.
+ *
+ * See Documentation/locking/seqlock.rst
+ */
+
 typedef struct {
 	struct seqcount seqcount;
 	spinlock_t lock;
 } seqlock_t;
 
-/*
- * These macros triggered gcc-3.x compile-time problems.  We think these are
- * OK now.  Be cautious.
- */
 #define __SEQLOCK_UNLOCKED(lockname)			\
 	{						\
 		.seqcount = SEQCNT_ZERO(lockname),	\
-- 
2.20.1



* [PATCH v1 10/25] seqlock: Add RST directives to kernel-doc code samples and notes
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (8 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 09/25] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-22 18:02   ` Peter Zijlstra
  2020-05-19 21:45 ` [PATCH v1 11/25] seqlock: Add missing kernel-doc annotations Ahmed S. Darwish
                   ` (20 subsequent siblings)
  30 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc

Mark all C code samples inside seqlock.h kernel-doc text with the RST
'code-block: c' directive. Sphinx won't properly format the example code
and will produce noisy text indentation warnings otherwise.

Mark all kernel-doc "NOTE" sections with the RST 'attention' directive.
Otherwise Sphinx produces "duplicate section name 'NOTE'" warnings.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/seqlock.h | 82 +++++++++++++++++++++++------------------
 1 file changed, 47 insertions(+), 35 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 2a4af746b1da..dfec0c9c19c4 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -232,6 +232,8 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  * usual consistency guarantee. It is one wmb cheaper, because we can
  * collapse the two back-to-back wmb()s.
  *
+ * .. code-block:: c
+ *
  *      seqcount_t seq;
  *      bool X = true, Y = false;
  *
@@ -292,62 +294,72 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  *
  * The basic form is a data structure like:
  *
- * struct latch_struct {
- *	seqcount_t		seq;
- *	struct data_struct	data[2];
- * };
+ * .. code-block:: c
+ *
+ *	struct latch_struct {
+ *		seqcount_t		seq;
+ *		struct data_struct	data[2];
+ *	};
  *
  * Where a modification, which is assumed to be externally serialized, does the
  * following:
  *
- * void latch_modify(struct latch_struct *latch, ...)
- * {
- *	smp_wmb();	<- Ensure that the last data[1] update is visible
- *	latch->seq++;
- *	smp_wmb();	<- Ensure that the seqcount update is visible
+ * .. code-block:: c
  *
- *	modify(latch->data[0], ...);
+ *	void latch_modify(struct latch_struct *latch, ...)
+ *	{
+ *		smp_wmb();	// Ensure that the last data[1] update is visible
+ *		latch->seq++;
+ *		smp_wmb();	// Ensure that the seqcount update is visible
  *
- *	smp_wmb();	<- Ensure that the data[0] update is visible
- *	latch->seq++;
- *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *		modify(latch->data[0], ...);
  *
- *	modify(latch->data[1], ...);
- * }
+ *		smp_wmb();	// Ensure that the data[0] update is visible
+ *		latch->seq++;
+ *		smp_wmb();	// Ensure that the seqcount update is visible
+ *
+ *		modify(latch->data[1], ...);
+ *	}
  *
  * The query will have a form like:
  *
- * struct entry *latch_query(struct latch_struct *latch, ...)
- * {
- *	struct entry *entry;
- *	unsigned seq, idx;
+ * .. code-block:: c
  *
- *	do {
- *		seq = raw_read_seqcount_latch(&latch->seq);
+ *	struct entry *latch_query(struct latch_struct *latch, ...)
+ *	{
+ *		struct entry *entry;
+ *		unsigned seq, idx;
  *
- *		idx = seq & 0x01;
- *		entry = data_query(latch->data[idx], ...);
+ *		do {
+ *			seq = raw_read_seqcount_latch(&latch->seq);
  *
- *		smp_rmb();
- *	} while (seq != latch->seq);
+ *			idx = seq & 0x01;
+ *			entry = data_query(latch->data[idx], ...);
  *
- *	return entry;
- * }
+ *			smp_rmb();
+ *		} while (seq != latch->seq);
+ *
+ *		return entry;
+ *	}
  *
  * So during the modification, queries are first redirected to data[1]. Then we
  * modify data[0]. When that is complete, we redirect queries back to data[0]
  * and we can modify data[1].
  *
- * NOTE: The non-requirement for atomic modifications does _NOT_ include
- *       the publishing of new entries in the case where data is a dynamic
- *       data structure.
+ * .. attention::
  *
- *       An iteration might start in data[0] and get suspended long enough
- *       to miss an entire modification sequence, once it resumes it might
- *       observe the new entry.
+ *	The non-requirement for atomic modifications does _NOT_ include
+ *	the publishing of new entries in the case where data is a dynamic
+ *	data structure.
  *
- * NOTE: When data is a dynamic data structure; one should use regular RCU
- *       patterns to manage the lifetimes of the objects within.
+ *	An iteration might start in data[0] and get suspended long enough
+ *	to miss an entire modification sequence; once it resumes, it might
+ *	observe the new entry.
+ *
+ * .. attention::
+ *
+ *	When data is a dynamic data structure, one should use regular RCU
+ *	patterns to manage the lifetimes of the objects within.
  */
 static inline void raw_write_seqcount_latch(seqcount_t *s)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v1 11/25] seqlock: Add missing kernel-doc annotations
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (9 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 10/25] seqlock: Add RST directives to kernel-doc code samples and notes Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 12/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc

A small number of the exported seqlock.h functions are kernel-doc
annotated.

Since seqlock.h is now included by the kernel's RST documentation, add
kernel-doc annotations for all of the remaining functions.
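
For reference, the canonical kernel-doc form used throughout this patch
is the following (an illustrative sketch, not a function from seqlock.h):

	/**
	 * function_name() - Brief description
	 * @arg: Description of the argument
	 *
	 * Longer description, including any locking context requirements.
	 *
	 * Return: Description of the return value
	 */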

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 414 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 361 insertions(+), 53 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index dfec0c9c19c4..dd55555ff607 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -57,6 +57,10 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
 # define SEQCOUNT_DEP_MAP_INIT(lockname) \
 		.dep_map = { .name = #lockname } \
 
+/**
+ * seqcount_init() - runtime initializer for seqcount_t
+ * @s: Pointer to the &typedef seqcount_t instance
+ */
 # define seqcount_init(s)				\
 	do {						\
 		static struct lock_class_key __key;	\
@@ -80,13 +84,17 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
 # define seqcount_lockdep_reader_access(x)
 #endif
 
-#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)}
+/**
+ * SEQCNT_ZERO() - static initializer for seqcount_t
+ * @name: Name of the &typedef seqcount_t instance
+ */
+#define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
 
 
 /**
- * __read_seqcount_begin - begin a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * __read_seqcount_begin() - begin a seq-read critical section (without barrier)
+ * @s: Pointer to &typedef seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
  * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
  * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@@ -110,9 +118,9 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount - Read the raw seqcount
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * raw_read_seqcount() - Read the raw seqcount
+ * @s: Pointer to &typedef seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
  * raw_read_seqcount opens a read critical section of the given
  * seqcount without any lockdep checking and without checking or
@@ -126,13 +134,13 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount_begin - start seq-read critical section w/o lockdep
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * raw_read_seqcount_begin() - start seq-read critical section w/o lockdep
+ * @s: Pointer to &typedef seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
  * raw_read_seqcount_begin opens a read critical section of the given
  * seqcount, but without any lockdep checking. Validity of the critical
- * section is tested by checking read_seqcount_retry function.
+ * section is tested by calling read_seqcount_retry().
  */
 static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 {
@@ -142,13 +150,13 @@ static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * read_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * read_seqcount_begin() - begin a seq-read critical section
+ * @s: Pointer to &typedef seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
- * read_seqcount_begin opens a read critical section of the given seqcount.
- * Validity of the critical section is tested by checking read_seqcount_retry
- * function.
+ * read_seqcount_begin opens a read critical section of the given
+ * seqcount_t.  Validity of the critical section is tested by calling
+ * read_seqcount_retry().
  */
 static inline unsigned read_seqcount_begin(const seqcount_t *s)
 {
@@ -157,8 +165,8 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * raw_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
+ * raw_seqcount_begin() - begin a seq-read critical section
+ * @s: Pointer to &typedef seqcount_t
  * Returns: count to be passed to read_seqcount_retry
  *
  * raw_seqcount_begin opens a read critical section of the given seqcount.
@@ -178,8 +186,8 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * __read_seqcount_retry - end a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
+ * __read_seqcount_retry() - end a seq-read critical section (without barrier)
+ * @s: Pointer to &typedef seqcount_t
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
@@ -197,8 +205,8 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
 }
 
 /**
- * read_seqcount_retry - end a seq-read critical section
- * @s: pointer to seqcount_t
+ * read_seqcount_retry() - end a seq-read critical section
+ * @s: Pointer to &typedef seqcount_t
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
@@ -225,8 +233,8 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 }
 
 /**
- * raw_write_seqcount_barrier - do a seq write barrier
- * @s: pointer to seqcount_t
+ * raw_write_seqcount_barrier() - do a seq write barrier
+ * @s: Pointer to &typedef seqcount_t
  *
  * This can be used to provide an ordering guarantee instead of the
  * usual consistency guarantee. It is one wmb cheaper, because we can
@@ -267,6 +275,21 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
 	s->sequence++;
 }
 
+/**
+ * raw_read_seqcount_latch() - pick even or odd seqcount latch data copy
+ * @s: Pointer to &typedef seqcount_t
+ *
+ * Use seqcount latching to switch between two storage places with
+ * sequence protection to allow interruptible, preemptible, writer
+ * sections.
+ *
+ * Check raw_write_seqcount_latch() for more details and a full reader
+ * and writer usage example.
+ *
+ * Return: sequence counter. Use the lowest bit as index for picking
+ * which data copy to read. Full counter must then be passed to
+ * read_seqcount_retry().
+ */
 static inline int raw_read_seqcount_latch(seqcount_t *s)
 {
 	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
@@ -275,8 +298,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
 }
 
 /**
- * raw_write_seqcount_latch - redirect readers to even/odd copy
- * @s: pointer to seqcount_t
+ * raw_write_seqcount_latch() - redirect readers to even/odd copy
+ * @s: Pointer to &typedef seqcount_t
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -336,8 +359,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  *			idx = seq & 0x01;
  *			entry = data_query(latch->data[idx], ...);
  *
- *			smp_rmb();
- *		} while (seq != latch->seq);
+ *			// read_seqcount_retry() includes necessary smp_rmb()
+ *		} while (read_seqcount_retry(&latch->seq, seq));
  *
  *		return entry;
  *	}
@@ -391,11 +414,26 @@ static inline void __write_seqcount_begin(seqcount_t *s)
 	__write_seqcount_begin_nested(s, 0);
 }
 
+/**
+ * write_seqcount_begin() - start a seqcount write-side critical section
+ * @s: Pointer to &typedef seqcount_t
+ *
+ * write_seqcount_begin opens a write-side critical section of the given
+ * seqcount. Seqcount write-side critical sections must be externally
+ * serialized and non-preemptible.
+ */
 static inline void write_seqcount_begin(seqcount_t *s)
 {
 	write_seqcount_begin_nested(s, 0);
 }
 
+/**
+ * write_seqcount_end() - end a seqcount write-side critical section
+ * @s: Pointer to &typedef seqcount_t
+ *
+ * write_seqcount_end closes a write-side critical section of the given
+ * seqcount.
+ */
 static inline void write_seqcount_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
@@ -403,8 +441,8 @@ static inline void write_seqcount_end(seqcount_t *s)
 }
 
 /**
- * write_seqcount_invalidate - invalidate in-progress read-side seq operations
- * @s: pointer to seqcount_t
+ * write_seqcount_invalidate() - invalidate in-progress read-side seq operations
+ * @s: Pointer to &typedef seqcount_t
  *
  * After write_seqcount_invalidate, no read-side seq operations will complete
  * successfully and see data older than this.
@@ -435,32 +473,68 @@ typedef struct {
 		.lock =	__SPIN_LOCK_UNLOCKED(lockname)	\
 	}
 
-#define seqlock_init(x)					\
+/**
+ * seqlock_init() - dynamic initializer for seqlock_t
+ * @sl: Pointer to the seqlock_t instance
+ */
+#define seqlock_init(sl)				\
 	do {						\
-		seqcount_init(&(x)->seqcount);		\
-		spin_lock_init(&(x)->lock);		\
+		seqcount_init(&(sl)->seqcount);		\
+		spin_lock_init(&(sl)->lock);		\
 	} while (0)
 
-#define DEFINE_SEQLOCK(x) \
-		seqlock_t x = __SEQLOCK_UNLOCKED(x)
+/**
+ * DEFINE_SEQLOCK() - Define a statically-allocated seqlock_t
+ * @sl: Name of the &typedef seqlock_t instance
+ */
+#define DEFINE_SEQLOCK(sl) \
+		seqlock_t sl = __SEQLOCK_UNLOCKED(sl)
 
-/*
- * Read side functions for starting and finalizing a read side section.
+/**
+ * read_seqbegin() - start a seqlock_t read-side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_seqbegin opens a read side critical section of the given
+ * seqlock_t. Validity of the critical section is tested by checking
+ * read_seqretry().
+ *
+ * Return: count to be passed to read_seqretry()
  */
 static inline unsigned read_seqbegin(const seqlock_t *sl)
 {
 	return read_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * read_seqretry() - end a seqlock_t read side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ * @start: count, from read_seqbegin()
+ *
+ * read_seqretry closes a read side critical section of the given
+ * seqlock_t. If the read side critical section was invalid, it must be
+ * ignored and retried.
+ *
+ * Return: 1 if a retry is required, 0 otherwise
+ */
 static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 {
 	return read_seqcount_retry(&sl->seqcount, start);
 }
 
-/*
- * Lock out other writers and update the count.
- * Acts like a normal spin_lock/unlock.
- * Don't need preempt_disable() because that is in the spin_lock already.
+/**
+ * write_seqlock() - start a seqlock_t write side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_seqlock opens a write side critical section of the given
+ * seqlock_t.  It also acquires the spinlock embedded inside the
+ * sequential lock. All seqlock_t write side critical sections are thus
+ * automatically serialized and non-preemptible.
+ *
+ * If the seqlock_t read side section can be invoked from a hardirq or
+ * softirq context, the ``_irqsave`` and ``_bh`` variants of this
+ * function must be used instead, respectively.
+ *
+ * The opened write side section must be closed with write_sequnlock().
  */
 static inline void write_seqlock(seqlock_t *sl)
 {
@@ -468,30 +542,74 @@ static inline void write_seqlock(seqlock_t *sl)
 	__write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock() - end a seqlock_t write side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_sequnlock closes the (serialized and non-preemptible) write
+ * side critical section of the given seqlock_t.
+ */
 static inline void write_sequnlock(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
 	spin_unlock(&sl->lock);
 }
 
+/**
+ * write_seqlock_bh() - start a softirqs-disabled seqlock_t write section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_seqlock_bh is a write_seqlock() variant that disables softirqs
+ * before opening the serialized seqlock_t write side critical section.
+ * Use it only if the read side section, or other writers, can be
+ * invoked from a softirq context.
+ *
+ * The opened write section must be closed with write_sequnlock_bh().
+ */
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
 	__write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock_bh() - end a softirqs-disabled seqlock_t write section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_sequnlock_bh closes the serialized, softirqs-disabled,
+ * seqlock_t write side critical section. It re-enables softirqs if
+ * they were enabled before the paired write_seqlock_bh() call.
+ */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
 	spin_unlock_bh(&sl->lock);
 }
 
+/**
+ * write_seqlock_irq() - start a non-interruptible seqlock_t write side section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_seqlock_irq is a write_seqlock() variant where hardirqs are
+ * disabled before opening the serialized and non-preemptible seqlock_t
+ * write side critical section.
+ */
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
 	__write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock_irq() - end a non-interruptible seqlock_t write side section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_sequnlock_irq closes the serialized and non-interruptible write
+ * side critical section of the given seqlock_t. It enables local
+ * interrupts afterwards.
+ *
+ * The write critical section must've been opened with write_seqlock_irq().
+ */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
@@ -507,9 +625,36 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	return flags;
 }
 
+/**
+ * write_seqlock_irqsave() - start a non-interruptible seqlock_t write section
+ * @lock:  Pointer to &typedef seqlock_t
+ * @flags: Stack-allocated storage for saving caller's local interrupt
+ *         state, to be passed to write_sequnlock_irqrestore().
+ *
+ * write_seqlock_irqsave is a write_seqlock() variant where the caller's
+ * local interrupts state is saved, then local interrupts are disabled,
+ * before opening the serialized and non-preemptible seqlock_t write
+ * side critical section.
+ *
+ * Use this only if the read side section can be invoked from a hardirq
+ * context.
+ *
+ * The opened write section must be closed with write_sequnlock_irqrestore().
+ */
 #define write_seqlock_irqsave(lock, flags)				\
 	do { flags = __write_seqlock_irqsave(lock); } while (0)
 
+/**
+ * write_sequnlock_irqrestore() - end non-interruptible seqlock_t write section
+ * @sl:    Pointer to &typedef seqlock_t
+ * @flags: Caller's saved interrupt state, from write_seqlock_irqsave()
+ *
+ * write_sequnlock_irqrestore closes the serialized and non-interruptible
+ * write side critical section of the given seqlock_t. It then restores
+ * the caller's saved local interrupts state.
+ *
+ * The write section must've been opened with write_seqlock_irqsave().
+ */
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
@@ -517,30 +662,61 @@ write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
-/*
- * A locking reader exclusively locks out other writers and locking readers,
- * but doesn't update the sequence number. Acts like a normal spin_lock/unlock.
- * Don't need preempt_disable() because that is in the spin_lock already.
+/**
+ * read_seqlock_excl() - begin a seqlock_t locking reader critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_seqlock_excl opens a locking reader critical section for the
+ * given seqlock_t. A locking reader exclusively locks out other writers
+ * and other locking readers, but doesn't update the sequence number.
+ *
+ * Locking readers act like a normal spin_lock()/spin_unlock().
+ *
+ * The opened read side section must be closed with read_sequnlock_excl().
  */
 static inline void read_seqlock_excl(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl() - end a seqlock_t locking reader critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_sequnlock_excl closes a locking reader critical section.  The
+ * read section must've been opened with read_seqlock_excl().
+ */
 static inline void read_sequnlock_excl(seqlock_t *sl)
 {
 	spin_unlock(&sl->lock);
 }
 
 /**
- * read_seqbegin_or_lock - begin a sequence number check or locking block
- * @lock: sequence lock
- * @seq : sequence number to be checked
+ * read_seqbegin_or_lock() - begin a seqlock_t lockless or locking reader
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: Marker and return parameter. If the passed value is even, the
+ * reader will become a *lockless* seqlock_t sequence counter reader as
+ * in read_seqbegin(). If the passed value is odd, the reader will
+ * become a fully locking reader, as in read_seqlock_excl().  In the
+ * first call to read_seqbegin_or_lock(), the caller **must** initialize
+ * and pass an even value in @seq so a lockless read is optimistically
+ * tried first.
  *
- * First try it once optimistically without taking the lock. If that fails,
- * take the lock. The sequence number is also used as a marker for deciding
- * whether to be a reader (even) or writer (odd).
- * N.B. seq must be initialized to an even number to begin with.
+ * read_seqbegin_or_lock optimistically tries a lockless seqlock_t
+ * sequence counter read first. If an odd counter is found, the lockless
+ * read trial has failed, and the reader transforms to a full seqlock_t
+ * locking reader as in read_seqlock_excl().  This is typically used to
+ * avoid starvation of lockless seqlock_t readers (too many retry
+ * loops) in the case of a sharp spike in write activity.
+ *
+ * The opened read section must be closed with done_seqretry().  Check
+ * Documentation/locking/seqlock.rst for template example code.
+ *
+ * Return: The read critical section status is returned through @seq,
+ * which is overloaded as a return parameter. This value must be passed
+ * to need_seqretry() to check the validity of the tried seqlock_t read
+ * section. If the read section must be retried, the returned value must
+ * also be passed to the next iteration of read_seqbegin_or_lock().
  */
 static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 {
@@ -550,32 +726,98 @@ static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 		read_seqlock_excl(lock);
 }
 
+/**
+ * need_seqretry() - validate seqlock_t "locking or lockless" reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: count, from read_seqbegin_or_lock()
+ *
+ * need_seqretry checks if the seqlock_t read-side critical section
+ * started with read_seqbegin_or_lock() is valid. If it was not, the
+ * caller must retry the read-side section.
+ *
+ * Return: 1 if a retry is required, 0 otherwise
+ */
 static inline int need_seqretry(seqlock_t *lock, int seq)
 {
 	return !(seq & 1) && read_seqretry(lock, seq);
 }
 
+/**
+ * done_seqretry() - end seqlock_t "locking or lockless" reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: count, from read_seqbegin_or_lock()
+ *
+ * done_seqretry finishes the seqlock_t read side critical section
+ * started by read_seqbegin_or_lock(). Before finishing the critical
+ * section, the validity of the read side section must already have been
+ * verified with need_seqretry().
+ */
 static inline void done_seqretry(seqlock_t *lock, int seq)
 {
 	if (seq & 1)
 		read_sequnlock_excl(lock);
 }
 
+/**
+ * read_seqlock_excl_bh() - start a locking reader seqlock_t section
+ *			    with softirqs disabled
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_seqlock_excl_bh is a variant of read_seqlock_excl() that saves
+ * softirqs state, then disables softirqs, before starting the locking
+ * reader read side section. Only use this variant if the seqlock_t
+ * write side section, *or other read sections*, can be invoked from a
+ * softirq context.
+ *
+ * The opened section must be closed with read_sequnlock_excl_bh().
+ */
 static inline void read_seqlock_excl_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl_bh() - stop a seqlock_t softirq-disabled locking
+ *			      reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_sequnlock_excl_bh ends the softirq-disabled seqlock_t locking
+ * reader read side section. It restores the softirqs state saved by
+ * read_seqlock_excl_bh() afterwards.
+ */
 static inline void read_sequnlock_excl_bh(seqlock_t *sl)
 {
 	spin_unlock_bh(&sl->lock);
 }
 
+/**
+ * read_seqlock_excl_irq() - start a non-interruptible seqlock_t locking
+ *			     reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_seqlock_excl_irq is a variant of read_seqlock_excl() that
+ * disables interrupts before starting the locking reader read side
+ * section. Only use this variant if the seqlock_t write side section,
+ * *or other read sections*, can be invoked from a hardirq context.
+ *
+ * The opened read section must be closed with read_sequnlock_excl_irq().
+ */
 static inline void read_seqlock_excl_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl_irq() - end an interrupts-disabled seqlock_t
+ *                             locking reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_sequnlock_excl_irq ends the interrupts-disabled seqlock_t
+ * locking reader read side critical section. It enables local
+ * interrupts afterwards.
+ *
+ * The read section must've been started with read_seqlock_excl_irq().
+ */
 static inline void read_sequnlock_excl_irq(seqlock_t *sl)
 {
 	spin_unlock_irq(&sl->lock);
@@ -589,15 +831,68 @@ static inline unsigned long __read_seqlock_excl_irqsave(seqlock_t *sl)
 	return flags;
 }
 
+/**
+ * read_seqlock_excl_irqsave() - start a non-interruptible seqlock_t
+ *				 locking reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @flags: Stack-allocated storage for saving caller's local interrupt
+ *         state, to be passed to read_sequnlock_excl_irqrestore().
+ *
+ * read_seqlock_excl_irqsave is a read_seqlock_excl() variant which
+ * saves the caller's local interrupts state, then disables local
+ * interrupts, before opening the seqlock_t locking reader critical
+ * section.
+ *
+ * Use this only if the seqlock_t write side critical section, or other
+ * read side sections, can be invoked from a hardirq context.
+ *
+ * The opened locking reader critical section must be closed with
+ * read_sequnlock_excl_irqrestore().
+ */
 #define read_seqlock_excl_irqsave(lock, flags)				\
 	do { flags = __read_seqlock_excl_irqsave(lock); } while (0)
 
+/**
+ * read_sequnlock_excl_irqrestore() - end non-interruptible seqlock_t
+ *				      locking reader section
+ * @sl: Pointer to &typedef seqlock_t
+ * @flags: Caller's saved interrupt state, from
+ *	   read_seqlock_excl_irqsave()
+ *
+ * read_sequnlock_excl_irqrestore closes the non-interruptible seqlock_t
+ * locking reader section. It then restores the caller's saved local
+ * interrupts state.
+ *
+ * The read section must've been opened with read_seqlock_excl_irqsave().
+ */
 static inline void
 read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
 {
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
+/**
+ * read_seqbegin_or_lock_irqsave() - begin a seqlock_t lockless reader, or
+ *                                   a non-interruptible locking reader
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: Marker and return parameter. Check read_seqbegin_or_lock().
+ *
+ * read_seqbegin_or_lock_irqsave is a variant of read_seqbegin_or_lock()
+ * which saves the local interrupts state, then disables local
+ * interrupts, before opening a seqlock_t *locking reader* critical
+ * section.
+ *
+ * The opened section must be closed with done_seqretry_irqrestore().
+ *
+ * Return:
+ *
+ *   1. The saved local interrupts state in case of a locking reader, to
+ *      be passed to done_seqretry_irqrestore().
+ *
+ *   2. The read critical section status, returned through @seq which is
+ *      overloaded as a return parameter. Check read_seqbegin_or_lock()
+ *      for more info.
+ */
 static inline unsigned long
 read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 {
@@ -611,6 +906,19 @@ read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 	return flags;
 }
 
+/**
+ * done_seqretry_irqrestore() - end a seqlock_t lockless reader, or a
+ *				non-interruptible locking reader section
+ * @lock:  Pointer to &typedef seqlock_t
+ * @seq:   Count, from read_seqbegin_or_lock_irqsave()
+ * @flags: Caller's saved local interrupt state in case of a locking
+ *	   reader, also from read_seqbegin_or_lock_irqsave()
+ *
+ * done_seqretry_irqrestore is a variant of done_seqretry() which
+ * restores the caller's saved local interrupts state in case of a
+ * locking reader. Check done_seqretry() for more information. The read
+ * section must've been opened with read_seqbegin_or_lock_irqsave().
+ */
 static inline void
 done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v1 12/25] seqlock: Extend seqcount API with associated locks
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (10 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 11/25] seqlock: Add missing kernel-doc annotations Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 13/25] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive is
not disabling preemption implicitly, preemption has to be explicitly
disabled before entering the write side critical section.
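
For example, with a plain seqcount_t whose writers are serialized by a
mutex, the write path must currently open-code both steps (a sketch;
the foo_* names are illustrative):

	mutex_lock(&foo_mutex);
	preempt_disable();
	write_seqcount_begin(&foo_seqcount);

	/* ... write side critical section ... */

	write_seqcount_end(&foo_seqcount);
	preempt_enable();
	mutex_unlock(&foo_mutex);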

There is no built-in debugging mechanism to verify that the lock used
for writer serialization is held and preemption is disabled. Some usage
sites like dma-buf have explicit lockdep checks for the writer-side
lock, but this covers only a small portion of the sequence counter usage
in the kernel.

Add new sequence counter types which allow associating a lock with the
sequence counter at initialization time. The seqcount API functions are
extended to provide appropriate lockdep assertions depending on the
seqcount/lock type.

For sequence counters with associated locks that do not implicitly
disable preemption, preemption protection is enforced in the sequence
counter write side functions. This removes the need to explicitly add
preempt_disable/enable() around the write side critical sections: the
write_begin/end() functions for these new sequence counter types
automatically do this.

Introduce the following seqcount types with associated locks:

     seqcount_spinlock_t
     seqcount_raw_spinlock_t
     seqcount_rwlock_t
     seqcount_mutex_t
     seqcount_ww_mutex_t

Extend the seqcount read and write functions to branch out to the
specific seqcount_LOCKTYPE_t implementation at compile-time. This avoids
a kernel API explosion for each new seqcount_LOCKTYPE_t added. Add such
compile-time type detection logic into a new, internal, seqlock header.

Document the proper seqcount_LOCKTYPE_t usage, and rationale, at
Documentation/locking/seqlock.rst.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
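
A minimal usage sketch with the spinlock variant (the foo_* names are
illustrative):

	static spinlock_t foo_lock = __SPIN_LOCK_UNLOCKED(foo_lock);
	static seqcount_spinlock_t foo_seqcount =
		SEQCNT_SPINLOCK_ZERO(foo_seqcount, &foo_lock);

	/* Writer: lockdep verifies that foo_lock is held */
	spin_lock(&foo_lock);
	write_seqcount_begin(&foo_seqcount);
	/* ... write side critical section ... */
	write_seqcount_end(&foo_seqcount);
	spin_unlock(&foo_lock);

	/* Reader: same API as with a plain seqcount_t */
	unsigned int seq;
	do {
		seq = read_seqcount_begin(&foo_seqcount);
		/* ... read side critical section ... */
	} while (read_seqcount_retry(&foo_seqcount, seq));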

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 Documentation/locking/seqlock.rst      |  64 ++++-
 MAINTAINERS                            |   2 +-
 include/linux/seqlock.h                | 355 +++++++++++++++++++++----
 include/linux/seqlock_types_internal.h | 187 +++++++++++++
 4 files changed, 549 insertions(+), 59 deletions(-)
 create mode 100644 include/linux/seqlock_types_internal.h

diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
index 2242ae00e7bf..e6f8e4be7db8 100644
--- a/Documentation/locking/seqlock.rst
+++ b/Documentation/locking/seqlock.rst
@@ -45,9 +45,11 @@ write side section. If the sequence counter read section can be invoked
 from hardirq or softirq contexts, interrupts or bottom halves must be
 respectively disabled before entering the write side section.
 
-If it's desired to automatically handle the sequence counter
-requirements of writer serialization and non-preemptibility, use a
-:ref:`sequential lock <seqlock_t>` instead.
+If the write serialization mechanism is one of the common kernel locking
+primitives, use :ref:`sequence counters with associated locks
+<seqcount_locktype_t>` instead. If it's desired to automatically handle
+the sequence counter writer serialization and non-preemptibility
+requirements, use a :ref:`sequential lock <seqlock_t>`.
 
 Initialization:
 
@@ -67,6 +69,7 @@ Initialization:
 
 Write path:
 
+.. _seqcount_write_ops:
 .. code-block:: c
 
 	/* Serialized context with disabled preemption */
@@ -79,6 +82,7 @@ Write path:
 
 Read path:
 
+.. _seqcount_read_ops:
 .. code-block:: c
 
 	do {
@@ -88,6 +92,60 @@ Read path:
 
 	} while (read_seqcount_retry(&foo_seqcount, seq));
 
+.. _seqcount_locktype_t:
+
+Sequence counters with associated locks (:c:type:`seqcount_LOCKTYPE_t`)
+-----------------------------------------------------------------------
+
+As :ref:`discussed earlier <seqcount_t>`, seqcount write side critical
+sections must be serialized and non-preemptible. This variant of
+sequence counters associates the lock used for writer serialization at
+seqcount initialization time. This enables lockdep to validate that
+the write side critical section is properly serialized.
+
+This lock association is a NOOP if lockdep is disabled and has neither
+storage nor runtime overhead. If lockdep is enabled, the lock pointer is
+stored in struct seqcount and lockdep's "lock is held" assertions are
+injected at the beginning of the write side critical section to validate
+that it is properly protected.
+
+For lock types which do not implicitly disable preemption, preemption
+protection is enforced in the write side function.
+
+The following seqcounts with associated locks are defined:
+
+  - :c:type:`seqcount_spinlock_t`
+  - :c:type:`seqcount_raw_spinlock_t`
+  - :c:type:`seqcount_rwlock_t`
+  - :c:type:`seqcount_mutex_t`
+  - :c:type:`seqcount_ww_mutex_t`
+
+The plain seqcount read and write APIs branch out to the specific
+seqcount_LOCKTYPE_t implementation at compile-time. This avoids a
+kernel API explosion for each new seqcount LOCKTYPE.
+
+Initialization (replace "LOCKTYPE" with one of the supported locks):
+
+.. code-block:: c
+
+	/* dynamic */
+	seqcount_LOCKTYPE_t foo_seqcount;
+	seqcount_LOCKTYPE_init(&foo_seqcount, &lock);
+
+	/* static */
+	static seqcount_LOCKTYPE_t foo_seqcount =
+		SEQCNT_LOCKTYPE_ZERO(foo_seqcount, &lock);
+
+	/* C99 struct init */
+	struct {
+		seqcount_LOCKTYPE_t	seq;
+	} foo = {
+		.seq = SEQCNT_LOCKTYPE_ZERO(foo.seq, &lock),
+	};
+
+Write path: same as in :ref:`plain seqcount_t <seqcount_write_ops>`,
+while running from a context with the associated LOCKTYPE lock acquired.
+
+Read path: same as in :ref:`plain seqcount_t <seqcount_read_ops>`.
+
 .. _seqlock_t:
 
 Sequential locks (:c:type:`seqlock_t`)
diff --git a/MAINTAINERS b/MAINTAINERS
index 091ec22c1a23..f3ae546009ee 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9925,7 +9925,7 @@ F:	include/linux/lockdep.h
 F:	include/linux/mutex*.h
 F:	include/linux/rwlock*.h
 F:	include/linux/rwsem*.h
-F:	include/linux/seqlock.h
+F:	include/linux/seqlock*.h
 F:	include/linux/spinlock*.h
 F:	kernel/locking/
 F:	lib/locking*.[ch]
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index dd55555ff607..eca464ecf012 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -90,11 +90,10 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  */
 #define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
 
-
 /**
  * __read_seqcount_begin() - begin a seq-read critical section (without barrier)
- * @s: Pointer to &typedef seqcount_t
- * Returns: count to be passed to read_seqcount_retry()
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ * Returns: count to be passed to read_seqcount_retry
  *
  * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
  * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@@ -104,7 +103,9 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  * Use carefully, only in critical code, and comment how the barrier is
  * provided.
  */
-static inline unsigned __read_seqcount_begin(const seqcount_t *s)
+#define __read_seqcount_begin(s)	do___read_seqcount_begin(s)
+
+static inline unsigned __read_seqcount_t_begin(const seqcount_t *s)
 {
 	unsigned ret;
 
@@ -119,14 +120,16 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 
 /**
  * raw_read_seqcount() - Read the raw seqcount
- * @s: Pointer to &typedef seqcount_t
- * Returns: count to be passed to read_seqcount_retry()
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ * Returns: count to be passed to read_seqcount_retry
  *
  * raw_read_seqcount opens a read critical section of the given
  * seqcount without any lockdep checking and without checking or
  * masking the LSB. Calling code is responsible for handling that.
  */
-static inline unsigned raw_read_seqcount(const seqcount_t *s)
+#define raw_read_seqcount(s)	do_raw_read_seqcount(s)
+
+static inline unsigned raw_read_seqcount_t(const seqcount_t *s)
 {
 	unsigned ret = READ_ONCE(s->sequence);
 	smp_rmb();
@@ -135,38 +138,42 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
 
 /**
  * raw_read_seqcount_begin() - start seq-read critical section w/o lockdep
- * @s: Pointer to &typedef seqcount_t
- * Returns: count to be passed to read_seqcount_retry()
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ * Returns: count to be passed to read_seqcount_retry
  *
  * raw_read_seqcount_begin opens a read critical section of the given
  * seqcount, but without any lockdep checking. Validity of the critical
  * section is tested by calling read_seqcount_retry().
  */
-static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
+#define raw_read_seqcount_begin(s)	do_raw_read_seqcount_begin(s)
+
+static inline unsigned raw_read_seqcount_t_begin(const seqcount_t *s)
 {
-	unsigned ret = __read_seqcount_begin(s);
+	unsigned ret = __read_seqcount_t_begin(s);
 	smp_rmb();
 	return ret;
 }
 
 /**
  * read_seqcount_begin() - begin a seq-read critical section
- * @s: Pointer to &typedef seqcount_t
- * Returns: count to be passed to read_seqcount_retry()
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ * Returns: count to be passed to read_seqcount_retry
  *
  * read_seqcount_begin opens a read critical section of the given
  * seqcount_t.  Validity of the critical section is tested by calling
  * read_seqcount_retry().
  */
-static inline unsigned read_seqcount_begin(const seqcount_t *s)
+#define read_seqcount_begin(s)	do_read_seqcount_begin(s)
+
+static inline unsigned read_seqcount_t_begin(const seqcount_t *s)
 {
 	seqcount_lockdep_reader_access(s);
-	return raw_read_seqcount_begin(s);
+	return raw_read_seqcount_t_begin(s);
 }
 
 /**
  * raw_seqcount_begin() - begin a seq-read critical section
- * @s: Pointer to &typedef seqcount_t
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * Returns: count to be passed to read_seqcount_retry
  *
  * raw_seqcount_begin opens a read critical section of the given seqcount.
@@ -178,7 +185,9 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
  * read_seqcount_retry() instead of stabilizing at the beginning of the
  * critical section.
  */
-static inline unsigned raw_seqcount_begin(const seqcount_t *s)
+#define raw_seqcount_begin(s)	do_raw_seqcount_begin(s)
+
+static inline unsigned raw_seqcount_t_begin(const seqcount_t *s)
 {
 	unsigned ret = READ_ONCE(s->sequence);
 	smp_rmb();
@@ -187,7 +196,7 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 
 /**
  * __read_seqcount_retry() - end a seq-read critical section (without barrier)
- * @s: Pointer to &typedef seqcount_t
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
@@ -199,14 +208,16 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
  * Use carefully, only in critical code, and comment how the barrier is
  * provided.
  */
-static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
+#define __read_seqcount_retry(s, start)	do___read_seqcount_retry(s, start)
+
+static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 {
 	return unlikely(s->sequence != start);
 }
 
 /**
  * read_seqcount_retry() - end a seq-read critical section
- * @s: Pointer to &typedef seqcount_t
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
@@ -214,19 +225,25 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
  * If the critical section was invalid, it must be ignored (and typically
  * retried).
  */
-static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
+#define read_seqcount_retry(s, start)	do_read_seqcount_retry(s, start)
+
+static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 {
 	smp_rmb();
-	return __read_seqcount_retry(s, start);
+	return __read_seqcount_t_retry(s, start);
 }
 
-static inline void raw_write_seqcount_begin(seqcount_t *s)
+#define raw_write_seqcount_begin(s)	do_raw_write_seqcount_begin(s)
+
+static inline void raw_write_seqcount_t_begin(seqcount_t *s)
 {
 	s->sequence++;
 	smp_wmb();
 }
 
-static inline void raw_write_seqcount_end(seqcount_t *s)
+#define raw_write_seqcount_end(s)	do_raw_write_seqcount_end(s)
+
+static inline void raw_write_seqcount_t_end(seqcount_t *s)
 {
 	smp_wmb();
 	s->sequence++;
@@ -234,7 +251,7 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 
 /**
  * raw_write_seqcount_barrier() - do a seq write barrier
- * @s: Pointer to &typedef seqcount_t
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * This can be used to provide an ordering guarantee instead of the
  * usual consistency guarantee. It is one wmb cheaper, because we can
@@ -268,7 +285,9 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  *              X = false;
  *      }
  */
-static inline void raw_write_seqcount_barrier(seqcount_t *s)
+#define raw_write_seqcount_barrier(s)	do_raw_write_seqcount_barrier(s)
+
+static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
 {
 	s->sequence++;
 	smp_wmb();
@@ -277,7 +296,7 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
 
 /**
  * raw_read_seqcount_latch() - pick even or odd seqcount latch data copy
- * @s: Pointer to &typedef seqcount_t
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * Use seqcount latching to switch between two storage places with
  * sequence protection to allow interruptible, preemptible, writer
@@ -290,7 +309,9 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
  * which data copy to read. Full counter must then be passed to
  * read_seqcount_retry().
  */
-static inline int raw_read_seqcount_latch(seqcount_t *s)
+#define raw_read_seqcount_latch(s)	do_raw_read_seqcount_latch(s)
+
+static inline int raw_read_seqcount_t_latch(seqcount_t *s)
 {
 	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
 	int seq = READ_ONCE(s->sequence); /* ^^^ */
@@ -299,7 +320,7 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
 
 /**
  * raw_write_seqcount_latch() - redirect readers to even/odd copy
- * @s: Pointer to &typedef seqcount_t
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -384,34 +405,39 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
 *	When data is a dynamic data structure, one should use regular RCU
  *	patterns to manage the lifetimes of the objects within.
  */
-static inline void raw_write_seqcount_latch(seqcount_t *s)
+#define raw_write_seqcount_latch(s)	do_raw_write_seqcount_latch(s)
+
+static inline void raw_write_seqcount_t_latch(seqcount_t *s)
 {
        smp_wmb();      /* prior stores before incrementing "sequence" */
        s->sequence++;
        smp_wmb();      /* increment "sequence" before following stores */
 }
 
-static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
+#define write_seqcount_begin_nested(s, subclass)		\
+	do_write_seqcount_begin_nested(s, subclass)
+
+static inline void __write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
 {
-	raw_write_seqcount_begin(s);
+	raw_write_seqcount_t_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
-static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
+static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
 {
 	lockdep_assert_preemption_disabled();
-	__write_seqcount_begin_nested(s, subclass);
+	__write_seqcount_t_begin_nested(s, subclass);
 }
 
 /*
- * write_seqcount_begin() without lockdep non-preemptibility checks.
+ * write_seqcount_t_begin() without lockdep non-preemptibility check.
  *
  * Use for internal seqlock.h code where it's known that preemption
- * is already disabled. For example, seqlock_t write functions.
+ * is already disabled. For example, seqlock_t write side functions.
  */
-static inline void __write_seqcount_begin(seqcount_t *s)
+static inline void __write_seqcount_t_begin(seqcount_t *s)
 {
-	__write_seqcount_begin_nested(s, 0);
+	__write_seqcount_t_begin_nested(s, 0);
 }
 
 /**
@@ -422,9 +448,11 @@ static inline void __write_seqcount_begin(seqcount_t *s)
  * seqcount. Seqcount write-side critical sections must be externally
  * serialized and non-preemptible.
  */
-static inline void write_seqcount_begin(seqcount_t *s)
+#define write_seqcount_begin(s)		do_write_seqcount_begin(s)
+
+static inline void write_seqcount_t_begin(seqcount_t *s)
 {
-	write_seqcount_begin_nested(s, 0);
+	write_seqcount_t_begin_nested(s, 0);
 }
 
 /**
@@ -434,25 +462,242 @@ static inline void write_seqcount_begin(seqcount_t *s)
  * write_seqcount_end closes a write-side critical section of the given
  * seqcount.
  */
-static inline void write_seqcount_end(seqcount_t *s)
+#define write_seqcount_end(s)		do_write_seqcount_end(s)
+
+static inline void write_seqcount_t_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
-	raw_write_seqcount_end(s);
+	raw_write_seqcount_t_end(s);
 }
 
 /**
  * write_seqcount_invalidate() - invalidate in-progress read-side seq operations
- * @s: Pointer to &typedef seqcount_t
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * After write_seqcount_invalidate, no read-side seq operations will complete
  * successfully and see data older than this.
  */
-static inline void write_seqcount_invalidate(seqcount_t *s)
+#define write_seqcount_invalidate(s)	do_write_seqcount_invalidate(s)
+
+static inline void write_seqcount_t_invalidate(seqcount_t *s)
 {
 	smp_wmb();
 	s->sequence+=2;
 }
 
+/*
+ * Sequence counters with associated locks (seqcount_LOCKTYPE_t)
+ *
+ * A sequence counter which associates the lock used for writer
+ * serialization at initialization time. This enables lockdep to validate
+ * that the write side critical section is properly serialized.
+ *
+ * For associated locks which do not implicitly disable preemption,
+ * preemption protection is enforced in the write side function.
+ *
+ * See Documentation/locking/seqlock.rst
+ */
+
+/**
+ * typedef seqcount_spinlock_t - sequence count with spinlock associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated spinlock
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * spinlock. The spinlock is associated with the sequence count in the
+ * static initializer or init function. This enables lockdep to validate
+ * that the write side critical section is properly serialized.
+ */
+typedef struct seqcount_spinlock {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	spinlock_t	*lock;
+#endif
+} seqcount_spinlock_t;
+
+#ifdef CONFIG_LOCKDEP
+
+#define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) {		\
+	.seqcount	= SEQCNT_ZERO(seq_name.seqcount),	\
+	.lock		= (assoc_lock),				\
+}
+
+/* Define as macro due to static lockdep key @ seqcount_init() */
+#define seqcount_locktype_init(s, assoc_lock)			\
+do {								\
+	seqcount_init(&(s)->seqcount);				\
+	(s)->lock = (assoc_lock);				\
+} while (0)
+
+#else /* !CONFIG_LOCKDEP */
+
+#define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) {		\
+	.seqcount	= SEQCNT_ZERO(seq_name.seqcount),	\
+}
+
+#define seqcount_locktype_init(s, assoc_lock)			\
+do {								\
+	seqcount_init(&(s)->seqcount);				\
+} while (0)
+
+#endif
+
+/**
+ * SEQCNT_SPINLOCK_ZERO - static initializer for seqcount_spinlock_t
+ * @name:	Name of the &typedef seqcount_spinlock_t instance
+ * @lock:	Pointer to the associated spinlock
+ */
+#define SEQCNT_SPINLOCK_ZERO(name, lock)	\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_spinlock_init - runtime initializer for seqcount_spinlock_t
+ * @s:		Pointer to the &typedef seqcount_spinlock_t instance
+ * @lock:	Pointer to the associated spinlock
+ */
+#define seqcount_spinlock_init(s, lock)		\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_raw_spinlock_t - sequence count with raw spinlock associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated raw spinlock
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * raw spinlock. The raw spinlock is associated with the sequence count in
+ * the static initializer or init function. This enables lockdep to
+ * validate that the write side critical section is properly serialized.
+ */
+typedef struct seqcount_raw_spinlock {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	raw_spinlock_t	*lock;
+#endif
+} seqcount_raw_spinlock_t;
+
+/**
+ * SEQCNT_RAW_SPINLOCK_ZERO - static initializer for seqcount_raw_spinlock_t
+ * @name:	Name of the &typedef seqcount_raw_spinlock_t instance
+ * @lock:	Pointer to the associated raw_spinlock
+ */
+#define SEQCNT_RAW_SPINLOCK_ZERO(name, lock)	\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_raw_spinlock_init - runtime initializer for seqcount_raw_spinlock_t
+ * @s:		Pointer to the &typedef seqcount_raw_spinlock_t instance
+ * @lock:	Pointer to the associated raw_spinlock
+ */
+#define seqcount_raw_spinlock_init(s, lock)	\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_rwlock_t - sequence count with rwlock associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated rwlock
+ *
+ * A plain sequence counter with external writer synchronization by an
+ * rwlock. The rwlock is associated with the sequence count in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ */
+typedef struct seqcount_rwlock {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	rwlock_t	*lock;
+#endif
+} seqcount_rwlock_t;
+
+/**
+ * SEQCNT_RWLOCK_ZERO - static initializer for seqcount_rwlock_t
+ * @name:	Name of the &typedef seqcount_rwlock_t instance
+ * @lock:	Pointer to the associated rwlock
+ */
+#define SEQCNT_RWLOCK_ZERO(name, lock)		\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_rwlock_init - runtime initializer for seqcount_rwlock_t
+ * @s:		Pointer to the &typedef seqcount_rwlock_t instance
+ * @lock:	Pointer to the associated rwlock
+ */
+#define seqcount_rwlock_init(s, lock)		\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_mutex_t - sequence count with mutex associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated mutex
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * mutex. The mutex is associated with the sequence counter in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ *
+ * The write side API functions write_seqcount_begin()/end() automatically
+ * disable and enable preemption when used with seqcount_mutex_t.
+ */
+typedef struct seqcount_mutex {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	struct mutex	*lock;
+#endif
+} seqcount_mutex_t;
+
+/**
+ * SEQCNT_MUTEX_ZERO - static initializer for seqcount_mutex_t
+ * @name:	Name of the &typedef seqcount_mutex_t instance
+ * @lock:	Pointer to the associated mutex
+ */
+#define SEQCNT_MUTEX_ZERO(name, lock)		\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_mutex_init - runtime initializer for seqcount_mutex_t
+ * @s:		Pointer to the &typedef seqcount_mutex_t instance
+ * @lock:	Pointer to the associated mutex
+ */
+#define seqcount_mutex_init(s, lock)		\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_ww_mutex_t - sequence count with ww_mutex associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated ww_mutex
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * ww_mutex. The ww_mutex is associated with the sequence counter in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ *
+ * The write side API functions write_seqcount_begin()/end() automatically
+ * disable and enable preemption when used with seqcount_ww_mutex_t.
+ */
+typedef struct seqcount_ww_mutex {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	struct ww_mutex	*lock;
+#endif
+} seqcount_ww_mutex_t;
+
+/**
+ * SEQCNT_WW_MUTEX_ZERO - static initializer for seqcount_ww_mutex_t
+ * @name:	Name of the &typedef seqcount_ww_mutex_t instance
+ * @lock:	Pointer to the associated ww_mutex
+ */
+#define SEQCNT_WW_MUTEX_ZERO(name, lock)	\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_ww_mutex_init - runtime initializer for seqcount_ww_mutex_t
+ * @s:		Pointer to the &typedef seqcount_ww_mutex_t instance
+ * @lock:	Pointer to the associated ww_mutex
+ */
+#define seqcount_ww_mutex_init(s, lock)		\
+	seqcount_locktype_init(s, lock)
+
+#include <linux/seqlock_types_internal.h>
+
 /*
  * Sequential locks (seqlock_t)
  *
@@ -475,7 +720,7 @@ typedef struct {
 
 /**
  * seqlock_init() - dynamic initializer for seqlock_t
- * @sl: Pointer to the seqlock_t instance
+ * @sl: Pointer to the &typedef seqlock_t instance
  */
 #define seqlock_init(sl)				\
 	do {						\
@@ -502,7 +747,7 @@ typedef struct {
  */
 static inline unsigned read_seqbegin(const seqlock_t *sl)
 {
-	return read_seqcount_begin(&sl->seqcount);
+	return read_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -518,7 +763,7 @@ static inline unsigned read_seqbegin(const seqlock_t *sl)
  */
 static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 {
-	return read_seqcount_retry(&sl->seqcount, start);
+	return read_seqcount_t_retry(&sl->seqcount, start);
 }
 
 /**
@@ -539,7 +784,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -551,7 +796,7 @@ static inline void write_seqlock(seqlock_t *sl)
  */
 static inline void write_sequnlock(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock(&sl->lock);
 }
 
@@ -569,7 +814,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -582,7 +827,7 @@ static inline void write_seqlock_bh(seqlock_t *sl)
  */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_bh(&sl->lock);
 }
 
@@ -597,7 +842,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -612,7 +857,7 @@ static inline void write_seqlock_irq(seqlock_t *sl)
  */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_irq(&sl->lock);
 }
 
@@ -621,7 +866,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	__write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_t_begin(&sl->seqcount);
 	return flags;
 }
 
@@ -658,7 +903,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
diff --git a/include/linux/seqlock_types_internal.h b/include/linux/seqlock_types_internal.h
new file mode 100644
index 000000000000..de635f4c7297
--- /dev/null
+++ b/include/linux/seqlock_types_internal.h
@@ -0,0 +1,187 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_SEQLOCK_TYPES_INTERNAL_H
+#define __LINUX_SEQLOCK_TYPES_INTERNAL_H
+
+/*
+ * Sequence counters with associated locks
+ *
+ * Copyright (C) 2020 Linutronix GmbH
+ */
+
+#ifndef __LINUX_SEQLOCK_H
+#error This is an INTERNAL header; it must only be included by seqlock.h
+#endif
+
+#include <linux/mutex.h>
+#include <linux/rwlock.h>
+#include <linux/spinlock.h>
+#include <linux/ww_mutex.h>
+
+/*
+ * @s: pointer to seqcount_t or any of the seqcount_locktype_t variants
+ */
+#define __to_seqcount_t(s)						\
+({									\
+	seqcount_t *seq;						\
+									\
+	if (__same_type(*(s), seqcount_t))				\
+		seq = (seqcount_t *)(s);				\
+	else if (__same_type(*(s), seqcount_spinlock_t))		\
+		seq = &((seqcount_spinlock_t *)(s))->seqcount;		\
+	else if (__same_type(*(s), seqcount_raw_spinlock_t))		\
+		seq = &((seqcount_raw_spinlock_t *)(s))->seqcount;	\
+	else if (__same_type(*(s), seqcount_rwlock_t))			\
+		seq = &((seqcount_rwlock_t *)(s))->seqcount;		\
+	else if (__same_type(*(s), seqcount_mutex_t))			\
+		seq = &((seqcount_mutex_t *)(s))->seqcount;		\
+	else if (__same_type(*(s), seqcount_ww_mutex_t))		\
+		seq = &((seqcount_ww_mutex_t *)(s))->seqcount;		\
+	else								\
+		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");		\
+									\
+	seq;								\
+})
+
+/*
+ *	seqcount_LOCKTYPE_t -- write APIs
+ *
+ * For associated lock types which do not implicitly disable preemption,
+ * enforce preemption protection in the write side functions.
+ *
+ * Never use lockdep for the raw write variants.
+ */
+
+#define __associated_lock_is_preemptible(s)				\
+({									\
+	bool ret;							\
+									\
+	if (__same_type(*(s), seqcount_t) ||				\
+	    __same_type(*(s), seqcount_spinlock_t) ||			\
+	    __same_type(*(s), seqcount_raw_spinlock_t) ||		\
+	    __same_type(*(s), seqcount_rwlock_t)) {			\
+		ret = false;						\
+	} else if (__same_type(*(s), seqcount_mutex_t) ||		\
+		   __same_type(*(s), seqcount_ww_mutex_t)) {		\
+		ret = true;						\
+	} else								\
+		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");		\
+									\
+	ret;								\
+})
+
+#ifdef CONFIG_LOCKDEP
+
+#define __assert_associated_lock_held(s)				\
+do {									\
+	if (__same_type(*(s), seqcount_t))				\
+		break;							\
+									\
+	if (__same_type(*(s), seqcount_spinlock_t))			\
+		lockdep_assert_held(((seqcount_spinlock_t *)(s))->lock);\
+	else if (__same_type(*(s), seqcount_raw_spinlock_t))		\
+		lockdep_assert_held(((seqcount_raw_spinlock_t *)(s))->lock);	\
+	else if (__same_type(*(s), seqcount_rwlock_t))			\
+		lockdep_assert_held_write(((seqcount_rwlock_t *)(s))->lock);	\
+	else if (__same_type(*(s), seqcount_mutex_t))			\
+		lockdep_assert_held(((seqcount_mutex_t *)(s))->lock);	\
+	else if (__same_type(*(s), seqcount_ww_mutex_t))		\
+		lockdep_assert_held(&((seqcount_ww_mutex_t *)(s))->lock->base);	\
+	else								\
+		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");		\
+} while (0)
+
+#else
+
+#define __assert_associated_lock_held(s)				\
+do {									\
+	(void) __to_seqcount_t(s);					\
+} while (0)
+
+#endif /* CONFIG_LOCKDEP */
+
+#define do_raw_write_seqcount_begin(s)					\
+do {									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	raw_write_seqcount_t_begin(__to_seqcount_t(s));			\
+} while (0)
+
+#define do_raw_write_seqcount_end(s)					\
+do {									\
+	raw_write_seqcount_t_end(__to_seqcount_t(s));			\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_enable();					\
+} while (0)
+
+#define do_write_seqcount_begin_nested(s, subclass)			\
+do {									\
+	__assert_associated_lock_held(s);				\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	write_seqcount_t_begin_nested(__to_seqcount_t(s), subclass);	\
+} while (0)
+
+#define do_write_seqcount_begin(s)					\
+do {									\
+	__assert_associated_lock_held(s);				\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	write_seqcount_t_begin(__to_seqcount_t(s));			\
+} while (0)
+
+#define do_write_seqcount_end(s)					\
+do {									\
+	write_seqcount_t_end(__to_seqcount_t(s));			\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_enable();					\
+} while (0)
+
+#define do_write_seqcount_invalidate(s)					\
+	write_seqcount_t_invalidate(__to_seqcount_t(s))
+
+#define do_raw_write_seqcount_barrier(s)				\
+	raw_write_seqcount_t_barrier(__to_seqcount_t(s))
+
+/*
+ * Latch sequence counters write side critical sections don't need to
+ * run with preemption disabled. Check @raw_write_seqcount_latch().
+ */
+#define do_raw_write_seqcount_latch(s)					\
+	raw_write_seqcount_t_latch(__to_seqcount_t(s))
+
+/*
+ *	seqcount_LOCKTYPE_t -- read APIs
+ */
+
+#define do___read_seqcount_begin(s)					\
+	__read_seqcount_t_begin(__to_seqcount_t(s))
+
+#define do_raw_read_seqcount(s)						\
+	raw_read_seqcount_t(__to_seqcount_t(s))
+
+#define do_raw_seqcount_begin(s)					\
+	raw_seqcount_t_begin(__to_seqcount_t(s))
+
+#define do_raw_read_seqcount_begin(s)					\
+	raw_read_seqcount_t_begin(__to_seqcount_t(s))
+
+#define do_read_seqcount_begin(s)					\
+	read_seqcount_t_begin(__to_seqcount_t(s))
+
+#define do_raw_read_seqcount_latch(s)					\
+	raw_read_seqcount_t_latch(__to_seqcount_t(s))
+
+#define do___read_seqcount_retry(s, start)				\
+	__read_seqcount_t_retry(__to_seqcount_t(s), start)
+
+#define do_read_seqcount_retry(s, start)				\
+	read_seqcount_t_retry(__to_seqcount_t(s), start)
+
+#endif /* __LINUX_SEQLOCK_TYPES_INTERNAL_H */
-- 
2.20.1
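
For illustration, a minimal usage sketch of the new write side API with a
mutex-associated counter. The struct and function names below are made up;
only the seqcount_mutex_t calls are the ones introduced by this series:

#include <linux/mutex.h>
#include <linux/seqlock.h>
#include <linux/types.h>

struct cfg {
	struct mutex		lock;	/* writer serialization */
	seqcount_mutex_t	seq;	/* seqcount, associated with @lock */
	u64			a, b;
};

static void cfg_init(struct cfg *c)
{
	mutex_init(&c->lock);
	seqcount_mutex_init(&c->seq, &c->lock);
}

static void cfg_update(struct cfg *c, u64 a, u64 b)
{
	mutex_lock(&c->lock);
	/*
	 * Lockdep asserts that c->lock is held. Since a mutex does not
	 * disable preemption, write_seqcount_begin() now additionally
	 * disables preemption for this seqcount type.
	 */
	write_seqcount_begin(&c->seq);
	c->a = a;
	c->b = b;
	write_seqcount_end(&c->seq);	/* re-enables preemption */
	mutex_unlock(&c->lock);
}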



* [PATCH v1 13/25] dma-buf: Use sequence counter with associated wound/wait mutex
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (11 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 12/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-20 10:48   ` Christian König
  2020-05-19 21:45 ` [PATCH v1 14/25] sched: tasks: Use sequence counter with associated spinlock Ahmed S. Darwish
                   ` (17 subsequent siblings)
  30 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Sumit Semwal,
	Felix Kuehling, Alex Deucher, Christian König,
	David (ChunMing) Zhou, David Airlie, Daniel Vetter, linux-media,
	dri-devel, amd-gfx

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive is
not disabling preemption implicitly, preemption has to be explicitly
disabled before entering the sequence counter write side critical
section.

The dma-buf reservation subsystem uses plain sequence counters to manage
updates to reservations. Writer serialization is accomplished through a
wound/wait mutex.

Acquiring a wound/wait mutex does not disable preemption, so preemption
must be explicitly disabled before entering the write side critical
section and re-enabled after leaving it.

Use the newly-added seqcount_ww_mutex_t instead:

  - It associates the ww_mutex with the sequence counter, which enables
    lockdep to validate that the write side critical section is properly
    serialized.

  - It removes the need to explicitly add preempt_disable/enable()
    around the write side critical section because the write_begin/end()
    functions for this new data type automatically do this.

If lockdep is disabled this ww_mutex lock association is compiled out
and has neither storage size nor runtime overhead.
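
For illustration, a simplified sketch of the resulting write side (the
acquire-context protocol and error handling are omitted; see the dma-resv
hunks below for the real call sites):

	ww_mutex_lock(&obj->lock, NULL);

	/* Asserts obj->lock is held and disables preemption: */
	write_seqcount_begin(&obj->seq);
	/* ... publish the new fence pointers ... */
	write_seqcount_end(&obj->seq);

	ww_mutex_unlock(&obj->lock);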

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 drivers/dma-buf/dma-resv.c                       | 8 +-------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 --
 include/linux/dma-resv.h                         | 2 +-
 3 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 590ce7ad60a0..3aba2b2bfc48 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -128,7 +128,7 @@ subsys_initcall(dma_resv_lockdep);
 void dma_resv_init(struct dma_resv *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
-	seqcount_init(&obj->seq);
+	seqcount_ww_mutex_init(&obj->seq, &obj->lock);
 
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
@@ -259,7 +259,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 	fobj = dma_resv_get_list(obj);
 	count = fobj->shared_count;
 
-	preempt_disable();
 	write_seqcount_begin(&obj->seq);
 
 	for (i = 0; i < count; ++i) {
@@ -281,7 +280,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 	smp_store_mb(fobj->shared_count, count);
 
 	write_seqcount_end(&obj->seq);
-	preempt_enable();
 	dma_fence_put(old);
 }
 EXPORT_SYMBOL(dma_resv_add_shared_fence);
@@ -308,14 +306,12 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 	if (fence)
 		dma_fence_get(fence);
 
-	preempt_disable();
 	write_seqcount_begin(&obj->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(obj->fence_excl, fence);
 	if (old)
 		old->shared_count = 0;
 	write_seqcount_end(&obj->seq);
-	preempt_enable();
 
 	/* inplace update, no shared fences */
 	while (i--)
@@ -393,13 +389,11 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
 	src_list = dma_resv_get_list(dst);
 	old = dma_resv_get_excl(dst);
 
-	preempt_disable();
 	write_seqcount_begin(&dst->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(dst->fence_excl, new);
 	RCU_INIT_POINTER(dst->fence, dst_list);
 	write_seqcount_end(&dst->seq);
-	preempt_enable();
 
 	dma_resv_list_free(src_list);
 	dma_fence_put(old);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 9dff792c9290..87fd32aae8f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -258,11 +258,9 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
 	new->shared_count = k;
 
 	/* Install the new fence list, seqcount provides the barriers */
-	preempt_disable();
 	write_seqcount_begin(&resv->seq);
 	RCU_INIT_POINTER(resv->fence, new);
 	write_seqcount_end(&resv->seq);
-	preempt_enable();
 
 	/* Drop the references to the removed fences or move them to ef_list */
 	for (i = j, k = 0; i < old->shared_count; ++i) {
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index a6538ae7d93f..d44a77e8a7e3 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -69,7 +69,7 @@ struct dma_resv_list {
  */
 struct dma_resv {
 	struct ww_mutex lock;
-	seqcount_t seq;
+	seqcount_ww_mutex_t seq;
 
 	struct dma_fence __rcu *fence_excl;
 	struct dma_resv_list __rcu *fence;
-- 
2.20.1



* [PATCH v1 14/25] sched: tasks: Use sequence counter with associated spinlock
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (12 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 13/25] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 15/25] netfilter: conntrack: " Ahmed S. Darwish
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_spinlock_t data type, which allows a spinlock to be
associated with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
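
Readers are not affected by the type change; a retry loop against the
converted counter still looks roughly like this (a sketch, not the actual
cpuset read-side helpers):

	unsigned int seq;
	nodemask_t mask;

	do {
		seq = read_seqcount_begin(&p->mems_allowed_seq);
		mask = p->mems_allowed;
	} while (read_seqcount_retry(&p->mems_allowed_seq, seq));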

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/sched.h | 2 +-
 init/init_task.c      | 3 ++-
 kernel/fork.c         | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4418f5cb8324..a9ce6fbeb735 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1046,7 +1046,7 @@ struct task_struct {
 	/* Protected by ->alloc_lock: */
 	nodemask_t			mems_allowed;
 	/* Seqence number to catch updates: */
-	seqcount_t			mems_allowed_seq;
+	seqcount_spinlock_t		mems_allowed_seq;
 	int				cpuset_mem_spread_rotor;
 	int				cpuset_slab_spread_rotor;
 #endif
diff --git a/init/init_task.c b/init/init_task.c
index bd403ed3e418..94bf4aea8293 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -142,7 +142,8 @@ struct task_struct init_task
 	.rcu_tasks_idle_cpu = -1,
 #endif
 #ifdef CONFIG_CPUSETS
-	.mems_allowed_seq = SEQCNT_ZERO(init_task.mems_allowed_seq),
+	.mems_allowed_seq = SEQCNT_SPINLOCK_ZERO(init_task.mems_allowed_seq,
+						 &init_task.alloc_lock),
 #endif
 #ifdef CONFIG_RT_MUTEXES
 	.pi_waiters	= RB_ROOT_CACHED,
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c700f881d92..a0fde1f17e0a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2019,7 +2019,7 @@ static __latent_entropy struct task_struct *copy_process(
 #ifdef CONFIG_CPUSETS
 	p->cpuset_mem_spread_rotor = NUMA_NO_NODE;
 	p->cpuset_slab_spread_rotor = NUMA_NO_NODE;
-	seqcount_init(&p->mems_allowed_seq);
+	seqcount_spinlock_init(&p->mems_allowed_seq, &p->alloc_lock);
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;
-- 
2.20.1



* [PATCH v1 15/25] netfilter: conntrack: Use sequence counter with associated spinlock
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (13 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 14/25] sched: tasks: Use sequence counter with associated spinlock Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 16/25] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock Ahmed S. Darwish
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	Jakub Kicinski, netfilter-devel, coreteam, netdev

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_spinlock_t data type, which allows a spinlock to be
associated with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
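
With CONFIG_LOCKDEP=y, the association turns a serialization bug into an
immediate splat. A sketch of the write-side pattern this enforces
(simplified from the hash resize path):

	spin_lock(&nf_conntrack_locks_all_lock);
	/* Now asserts nf_conntrack_locks_all_lock is held: */
	write_seqcount_begin(&nf_conntrack_generation);
	/* ... resize/rehash the conntrack table ... */
	write_seqcount_end(&nf_conntrack_generation);
	spin_unlock(&nf_conntrack_locks_all_lock);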

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/net/netfilter/nf_conntrack.h | 2 +-
 net/netfilter/nf_conntrack_core.c    | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 9f551f3b69c6..333fd54aec30 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -286,7 +286,7 @@ int nf_conntrack_hash_resize(unsigned int hashsize);
 
 extern struct hlist_nulls_head *nf_conntrack_hash;
 extern unsigned int nf_conntrack_htable_size;
-extern seqcount_t nf_conntrack_generation;
+extern seqcount_spinlock_t nf_conntrack_generation;
 extern unsigned int nf_conntrack_max;
 
 /* must be called with rcu read lock held */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index c4582eb71766..48a839377da2 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -180,7 +180,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
 
 unsigned int nf_conntrack_max __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_max);
-seqcount_t nf_conntrack_generation __read_mostly;
+seqcount_spinlock_t nf_conntrack_generation __read_mostly;
 static unsigned int nf_conntrack_hash_rnd __read_mostly;
 
 static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple,
@@ -2512,7 +2512,8 @@ int nf_conntrack_init_start(void)
 	/* struct nf_ct_ext uses u8 to store offsets/size */
 	BUILD_BUG_ON(total_extension_size() > 255u);
 
-	seqcount_init(&nf_conntrack_generation);
+	seqcount_spinlock_init(&nf_conntrack_generation,
+			       &nf_conntrack_locks_all_lock);
 
 	for (i = 0; i < CONNTRACK_LOCKS; i++)
 		spin_lock_init(&nf_conntrack_locks[i]);
-- 
2.20.1



* [PATCH v1 16/25] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (14 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 15/25] netfilter: conntrack: " Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 17/25] xfrm: policy: Use sequence counters with associated lock Ahmed S. Darwish
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	Jakub Kicinski, netfilter-devel, coreteam, netdev

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_rwlock_t data type, which allows an rwlock to be
associated with the sequence counter. This enables lockdep to verify that
the rwlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
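
A sketch of the write-side pattern this enforces, assuming the _bh lock
variants used elsewhere in this set backend:

	write_lock_bh(&priv->lock);
	/* Asserts priv->lock is held for writing: */
	write_seqcount_begin(&priv->count);
	/* ... insert or remove rbtree elements ... */
	write_seqcount_end(&priv->count);
	write_unlock_bh(&priv->lock);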

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 net/netfilter/nft_set_rbtree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 3ffef454d469..f50d986d43c5 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -18,7 +18,7 @@
 struct nft_rbtree {
 	struct rb_root		root;
 	rwlock_t		lock;
-	seqcount_t		count;
+	seqcount_rwlock_t	count;
 	struct delayed_work	gc_work;
 };
 
@@ -505,7 +505,7 @@ static int nft_rbtree_init(const struct nft_set *set,
 	struct nft_rbtree *priv = nft_set_priv(set);
 
 	rwlock_init(&priv->lock);
-	seqcount_init(&priv->count);
+	seqcount_rwlock_init(&priv->count, &priv->lock);
 	priv->root = RB_ROOT;
 
 	INIT_DEFERRABLE_WORK(&priv->gc_work, nft_rbtree_gc);
-- 
2.20.1



* [PATCH v1 17/25] xfrm: policy: Use sequence counters with associated lock
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (15 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 16/25] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 18/25] timekeeping: Use sequence counter with associated raw spinlock Ahmed S. Darwish
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Steffen Klassert,
	Herbert Xu, David S. Miller, Jakub Kicinski, netdev

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive is
not disabling preemption implicitly, preemption has to be explicitly
disabled before entering the sequence counter write side critical
section.

A plain seqcount_t does not record which lock must be held when
entering a write side critical section.

Use the new seqcount_spinlock_t and seqcount_mutex_t data types instead,
which allow a lock to be associated with the sequence counter. This enables
lockdep to verify that the lock used for writer serialization is held
when the write side critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
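
For the mutex case, a simplified sketch of the write side this implies
(mirroring the hash resize path; a mutex does not disable preemption, so
write_seqcount_begin() now does that itself):

	mutex_lock(&hash_resize_mutex);

	/* Asserts hash_resize_mutex is held, disables preemption: */
	write_seqcount_begin(&xfrm_policy_hash_generation);
	/* ... swap the policy hash tables ... */
	write_seqcount_end(&xfrm_policy_hash_generation);

	mutex_unlock(&hash_resize_mutex);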

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 net/xfrm/xfrm_policy.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 297b2fdb3c29..aae78a7aecd7 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -122,7 +122,7 @@ struct xfrm_pol_inexact_bin {
 	/* list containing '*:*' policies */
 	struct hlist_head hhead;
 
-	seqcount_t count;
+	seqcount_spinlock_t count;
 	/* tree sorted by daddr/prefix */
 	struct rb_root root_d;
 
@@ -155,7 +155,7 @@ static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1]
 						__read_mostly;
 
 static struct kmem_cache *xfrm_dst_cache __ro_after_init;
-static __read_mostly seqcount_t xfrm_policy_hash_generation;
+static __read_mostly seqcount_mutex_t xfrm_policy_hash_generation;
 
 static struct rhashtable xfrm_policy_inexact_table;
 static const struct rhashtable_params xfrm_pol_inexact_params;
@@ -719,7 +719,7 @@ xfrm_policy_inexact_alloc_bin(const struct xfrm_policy *pol, u8 dir)
 	INIT_HLIST_HEAD(&bin->hhead);
 	bin->root_d = RB_ROOT;
 	bin->root_s = RB_ROOT;
-	seqcount_init(&bin->count);
+	seqcount_spinlock_init(&bin->count, &net->xfrm.xfrm_policy_lock);
 
 	prev = rhashtable_lookup_get_insert_key(&xfrm_policy_inexact_table,
 						&bin->k, &bin->head,
@@ -1911,7 +1911,7 @@ static int xfrm_policy_match(const struct xfrm_policy *pol,
 
 static struct xfrm_pol_inexact_node *
 xfrm_policy_lookup_inexact_addr(const struct rb_root *r,
-				seqcount_t *count,
+				seqcount_spinlock_t *count,
 				const xfrm_address_t *addr, u16 family)
 {
 	const struct rb_node *parent;
@@ -4158,7 +4158,7 @@ void __init xfrm_init(void)
 {
 	register_pernet_subsys(&xfrm_net_ops);
 	xfrm_dev_init();
-	seqcount_init(&xfrm_policy_hash_generation);
+	seqcount_mutex_init(&xfrm_policy_hash_generation, &hash_resize_mutex);
 	xfrm_input_init();
 
 #ifdef CONFIG_INET_ESPINTCP
-- 
2.20.1



* [PATCH v1 18/25] timekeeping: Use sequence counter with associated raw spinlock
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (16 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 17/25] xfrm: policy: Use sequence counters with associated lock Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 19/25] vfs: Use sequence counter with associated spinlock Ahmed S. Darwish
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, John Stultz,
	Stephen Boyd

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_raw_spinlock_t data type, which allows a raw
spinlock to be associated with the sequence counter. This enables
lockdep to verify
that the raw spinlock used for writer serialization is held when the
write side critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
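
The static-initializer form used below, reduced to a hypothetical
skeleton (the names are made up):

	static DEFINE_RAW_SPINLOCK(my_lock);

	static struct {
		seqcount_raw_spinlock_t	seq;
		u64			data;
	} my_core ____cacheline_aligned = {
		.seq = SEQCNT_RAW_SPINLOCK_ZERO(my_core.seq, &my_lock),
	};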

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 kernel/time/timekeeping.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 9ebaab13339d..24e91a1e2acd 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -39,18 +39,19 @@ enum timekeeping_adv_mode {
 	TK_ADV_FREQ
 };
 
+static DEFINE_RAW_SPINLOCK(timekeeper_lock);
+
 /*
  * The most important data for readout fits into a single 64 byte
  * cache line.
  */
 static struct {
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct timekeeper	timekeeper;
 } tk_core ____cacheline_aligned = {
-	.seq = SEQCNT_ZERO(tk_core.seq),
+	.seq = SEQCNT_RAW_SPINLOCK_ZERO(tk_core.seq, &timekeeper_lock),
 };
 
-static DEFINE_RAW_SPINLOCK(timekeeper_lock);
 static struct timekeeper shadow_timekeeper;
 
 /**
@@ -63,7 +64,7 @@ static struct timekeeper shadow_timekeeper;
  * See @update_fast_timekeeper() below.
  */
 struct tk_fast {
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct tk_read_base	base[2];
 };
 
@@ -80,11 +81,13 @@ static struct clocksource dummy_clock = {
 };
 
 static struct tk_fast tk_fast_mono ____cacheline_aligned = {
+	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_mono.seq, &timekeeper_lock),
 	.base[0] = { .clock = &dummy_clock, },
 	.base[1] = { .clock = &dummy_clock, },
 };
 
 static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
+	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_raw.seq, &timekeeper_lock),
 	.base[0] = { .clock = &dummy_clock, },
 	.base[1] = { .clock = &dummy_clock, },
 };
@@ -157,7 +160,7 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
  * tk_clock_read - atomic clocksource read() helper
  *
  * This helper is necessary to use in the read paths because, while the
- * seqlock ensures we don't return a bad value while structures are updated,
+ * seqcount ensures we don't return a bad value while structures are updated,
  * it doesn't protect from potential crashes. There is the possibility that
  * the tkr's clocksource may change between the read reference, and the
  * clock reference passed to the read function.  This can cause crashes if
@@ -222,10 +225,10 @@ static inline u64 timekeeping_get_delta(const struct tk_read_base *tkr)
 	unsigned int seq;
 
 	/*
-	 * Since we're called holding a seqlock, the data may shift
+	 * Since we're called holding a seqcount, the data may shift
 	 * under us while we're doing the calculation. This can cause
 	 * false positives, since we'd note a problem but throw the
-	 * results away. So nest another seqlock here to atomically
+	 * results away. So nest another seqcount here to atomically
 	 * grab the points we are checking with.
 	 */
 	do {
@@ -486,7 +489,7 @@ EXPORT_SYMBOL_GPL(ktime_get_raw_fast_ns);
  *
  * To keep it NMI safe since we're accessing from tracing, we're not using a
  * separate timekeeper with updates to monotonic clock and boot offset
- * protected with seqlocks. This has the following minor side effects:
+ * protected with seqcounts. This has the following minor side effects:
  *
  * (1) Its possible that a timestamp be taken after the boot offset is updated
  * but before the timekeeper is updated. If this happens, the new boot offset
-- 
2.20.1



* [PATCH v1 19/25] vfs: Use sequence counter with associated spinlock
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (17 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 18/25] timekeeping: Use sequence counter with associated raw spinlock Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 20/25] raid5: " Ahmed S. Darwish
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Alexander Viro,
	linux-fsdevel

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_spinlock_t data type, which allows a spinlock to be
associated with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 fs/dcache.c               | 2 +-
 fs/fs_struct.c            | 4 ++--
 include/linux/dcache.h    | 2 +-
 include/linux/fs_struct.h | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b280e07e162b..e5f365d8fd67 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1727,7 +1727,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	dentry->d_lockref.count = 1;
 	dentry->d_flags = 0;
 	spin_lock_init(&dentry->d_lock);
-	seqcount_init(&dentry->d_seq);
+	seqcount_spinlock_init(&dentry->d_seq, &dentry->d_lock);
 	dentry->d_inode = NULL;
 	dentry->d_parent = dentry;
 	dentry->d_sb = sb;
diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index ca639ed967b7..04b3f5b9c629 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -117,7 +117,7 @@ struct fs_struct *copy_fs_struct(struct fs_struct *old)
 		fs->users = 1;
 		fs->in_exec = 0;
 		spin_lock_init(&fs->lock);
-		seqcount_init(&fs->seq);
+		seqcount_spinlock_init(&fs->seq, &fs->lock);
 		fs->umask = old->umask;
 
 		spin_lock(&old->lock);
@@ -163,6 +163,6 @@ EXPORT_SYMBOL(current_umask);
 struct fs_struct init_fs = {
 	.users		= 1,
 	.lock		= __SPIN_LOCK_UNLOCKED(init_fs.lock),
-	.seq		= SEQCNT_ZERO(init_fs.seq),
+	.seq		= SEQCNT_SPINLOCK_ZERO(init_fs.seq, &init_fs.lock),
 	.umask		= 0022,
 };
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c1488cc84fd9..235563da356d 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -89,7 +89,7 @@ extern struct dentry_stat_t dentry_stat;
 struct dentry {
 	/* RCU lookup touched fields */
 	unsigned int d_flags;		/* protected by d_lock */
-	seqcount_t d_seq;		/* per dentry seqlock */
+	seqcount_spinlock_t d_seq;	/* per dentry seqlock */
 	struct hlist_bl_node d_hash;	/* lookup hash list */
 	struct dentry *d_parent;	/* parent directory */
 	struct qstr d_name;
diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
index cf1015abfbf2..783b48dedb72 100644
--- a/include/linux/fs_struct.h
+++ b/include/linux/fs_struct.h
@@ -9,7 +9,7 @@
 struct fs_struct {
 	int users;
 	spinlock_t lock;
-	seqcount_t seq;
+	seqcount_spinlock_t seq;
 	int umask;
 	int in_exec;
 	struct path root, pwd;
-- 
2.20.1



* [PATCH v1 20/25] raid5: Use sequence counter with associated spinlock
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (18 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 19/25] vfs: Use sequence counter with associated spinlock Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 21/25] iocost: " Ahmed S. Darwish
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Song Liu, linux-raid

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_spinlock_t data type, which allows a spinlock to be
associated with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 drivers/md/raid5.c | 2 +-
 drivers/md/raid5.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ba00e9877f02..69f31c675b58 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6929,7 +6929,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
 	} else
 		goto abort;
 	spin_lock_init(&conf->device_lock);
-	seqcount_init(&conf->gen_lock);
+	seqcount_spinlock_init(&conf->gen_lock, &conf->device_lock);
 	mutex_init(&conf->cache_size_mutex);
 	init_waitqueue_head(&conf->wait_for_quiescent);
 	init_waitqueue_head(&conf->wait_for_stripe);
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index f90e0704bed9..a2c9e9e9f5ac 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -589,7 +589,7 @@ struct r5conf {
 	int			prev_chunk_sectors;
 	int			prev_algo;
 	short			generation; /* increments with every reshape */
-	seqcount_t		gen_lock;	/* lock against generation changes */
+	seqcount_spinlock_t	gen_lock;	/* lock against generation changes */
 	unsigned long		reshape_checkpoint; /* Time we last updated
 						     * metadata */
 	long long		min_offset_diff; /* minimum difference between
-- 
2.20.1



* [PATCH v1 21/25] iocost: Use sequence counter with associated spinlock
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (19 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 20/25] raid5: " Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 22/25] NFSv4: " Ahmed S. Darwish
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jens Axboe, linux-block

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_spinlock_t data type, which allows a spinlock to be
associated with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
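
Note that the diff below also drops an explicit
lockdep_assert_held(&ioc->lock) from ioc_start_period(): with a
seqcount_spinlock_t, that assertion is performed by the write side API
itself:

	/* With CONFIG_LOCKDEP=y, implies lockdep_assert_held(&ioc->lock): */
	write_seqcount_begin(&ioc->period_seqcount);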

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 block/blk-iocost.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 7c1fe605d0d6..8029a9e8fa55 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -405,7 +405,7 @@ struct ioc {
 	enum ioc_running		running;
 	atomic64_t			vtime_rate;
 
-	seqcount_t			period_seqcount;
+	seqcount_spinlock_t		period_seqcount;
 	u32				period_at;	/* wallclock starttime */
 	u64				period_at_vtime; /* vtime starttime */
 
@@ -872,7 +872,6 @@ static void ioc_now(struct ioc *ioc, struct ioc_now *now)
 
 static void ioc_start_period(struct ioc *ioc, struct ioc_now *now)
 {
-	lockdep_assert_held(&ioc->lock);
 	WARN_ON_ONCE(ioc->running != IOC_RUNNING);
 
 	write_seqcount_begin(&ioc->period_seqcount);
@@ -1958,7 +1957,7 @@ static int blk_iocost_init(struct request_queue *q)
 
 	ioc->running = IOC_IDLE;
 	atomic64_set(&ioc->vtime_rate, VTIME_PER_USEC);
-	seqcount_init(&ioc->period_seqcount);
+	seqcount_spinlock_init(&ioc->period_seqcount, &ioc->lock);
 	ioc->period_at = ktime_to_us(ktime_get());
 	atomic64_set(&ioc->cur_period, 0);
 	atomic_set(&ioc->hweight_gen, 0);
-- 
2.20.1



* [PATCH v1 22/25] NFSv4: Use sequence counter with associated spinlock
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (20 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 21/25] iocost: " Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 23/25] userfaultfd: " Ahmed S. Darwish
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Trond Myklebust,
	Anna Schumaker, linux-nfs

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_spinlock_t data type, which allows a spinlock to be
associated with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 fs/nfs/nfs4_fs.h   | 2 +-
 fs/nfs/nfs4state.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 2b7f6dcd2eb8..210e590e1f71 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -117,7 +117,7 @@ struct nfs4_state_owner {
 	unsigned long	     so_flags;
 	struct list_head     so_states;
 	struct nfs_seqid_counter so_seqid;
-	seqcount_t	     so_reclaim_seqcount;
+	seqcount_spinlock_t  so_reclaim_seqcount;
 	struct mutex	     so_delegreturn_mutex;
 };
 
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index ac93715c05a4..9b2bad35ad24 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -509,7 +509,7 @@ nfs4_alloc_state_owner(struct nfs_server *server,
 	nfs4_init_seqid_counter(&sp->so_seqid);
 	atomic_set(&sp->so_count, 1);
 	INIT_LIST_HEAD(&sp->so_lru);
-	seqcount_init(&sp->so_reclaim_seqcount);
+	seqcount_spinlock_init(&sp->so_reclaim_seqcount, &sp->so_lock);
 	mutex_init(&sp->so_delegreturn_mutex);
 	return sp;
 }
-- 
2.20.1



* [PATCH v1 23/25] userfaultfd: Use sequence counter with associated spinlock
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (21 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 22/25] NFSv4: " Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 24/25] kvm/eventfd: " Ahmed S. Darwish
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Alexander Viro,
	linux-fsdevel

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_spinlock_t data type, which allows a spinlock to be
associated with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 fs/userfaultfd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e39fdec8a0b0..dd3aab31c50f 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -61,7 +61,7 @@ struct userfaultfd_ctx {
 	/* waitqueue head for events */
 	wait_queue_head_t event_wqh;
 	/* a refile sequence protected by fault_pending_wqh lock */
-	struct seqcount refile_seq;
+	seqcount_spinlock_t refile_seq;
 	/* pseudo fd refcounting */
 	refcount_t refcount;
 	/* userfaultfd syscall flags */
@@ -1998,7 +1998,7 @@ static void init_once_userfaultfd_ctx(void *mem)
 	init_waitqueue_head(&ctx->fault_wqh);
 	init_waitqueue_head(&ctx->event_wqh);
 	init_waitqueue_head(&ctx->fd_wqh);
-	seqcount_init(&ctx->refile_seq);
+	seqcount_spinlock_init(&ctx->refile_seq, &ctx->fault_pending_wqh.lock);
 }
 
 SYSCALL_DEFINE1(userfaultfd, int, flags)
-- 
2.20.1



* [PATCH v1 24/25] kvm/eventfd: Use sequence counter with associated spinlock
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (22 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 23/25] userfaultfd: " Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-05-19 21:45 ` [PATCH v1 25/25] hrtimer: Use sequence counter with associated raw spinlock Ahmed S. Darwish
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Paolo Bonzini, kvm

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_spinlock_t data type, which allows a spinlock to be
associated with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/kvm_irqfd.h | 2 +-
 virt/kvm/eventfd.c        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index dc1da020305b..dac047abdba7 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -42,7 +42,7 @@ struct kvm_kernel_irqfd {
 	wait_queue_entry_t wait;
 	/* Update side is protected by irqfds.lock */
 	struct kvm_kernel_irq_routing_entry irq_entry;
-	seqcount_t irq_entry_sc;
+	seqcount_spinlock_t irq_entry_sc;
 	/* Used for level IRQ fast-path */
 	int gsi;
 	struct work_struct inject;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 67b6fc153e9c..8694a2920ea9 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -303,7 +303,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	INIT_LIST_HEAD(&irqfd->list);
 	INIT_WORK(&irqfd->inject, irqfd_inject);
 	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
-	seqcount_init(&irqfd->irq_entry_sc);
+	seqcount_spinlock_init(&irqfd->irq_entry_sc, &kvm->irqfds.lock);
 
 	f = fdget(args->fd);
 	if (!f.file) {
-- 
2.20.1



* [PATCH v1 25/25] hrtimer: Use sequence counter with associated raw spinlock
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (23 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 24/25] kvm/eventfd: " Ahmed S. Darwish
@ 2020-05-19 21:45 ` Ahmed S. Darwish
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_raw_spinlock_t data type, which allows a raw
spinlock to be associated with the sequence counter. This enables
lockdep to verify
that the raw spinlock used for writer serialization is held when the
write side critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
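
A generic sketch of what the association enforces for this type (names
taken from the hunks below; this is not the actual hrtimer call site,
which uses other variants of the seqcount API):

	raw_spin_lock_irqsave(&cpu_base->lock, flags);
	/* Asserts cpu_base->lock is held; a raw spinlock already
	 * disables preemption, so nothing extra is done here. */
	write_seqcount_begin(&clock_b->seq);
	/* ... update the clock base ... */
	write_seqcount_end(&clock_b->seq);
	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);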

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/hrtimer.h |  2 +-
 kernel/time/hrtimer.c   | 13 ++++++++++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 15c8ac313678..25993b86ac5c 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -159,7 +159,7 @@ struct hrtimer_clock_base {
 	struct hrtimer_cpu_base	*cpu_base;
 	unsigned int		index;
 	clockid_t		clockid;
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct hrtimer		*running;
 	struct timerqueue_head	active;
 	ktime_t			(*get_time)(void);
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index d89da1c7e005..c4038511d5c9 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -135,7 +135,11 @@ static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
  * timer->base->cpu_base
  */
 static struct hrtimer_cpu_base migration_cpu_base = {
-	.clock_base = { { .cpu_base = &migration_cpu_base, }, },
+	.clock_base = { {
+		.cpu_base = &migration_cpu_base,
+		.seq      = SEQCNT_RAW_SPINLOCK_ZERO(migration_cpu_base.seq,
+						     &migration_cpu_base.lock),
+	}, },
 };
 
 #define migration_base	migration_cpu_base.clock_base[0]
@@ -1998,8 +2002,11 @@ int hrtimers_prepare_cpu(unsigned int cpu)
 	int i;
 
 	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
-		cpu_base->clock_base[i].cpu_base = cpu_base;
-		timerqueue_init_head(&cpu_base->clock_base[i].active);
+		struct hrtimer_clock_base *clock_b = &cpu_base->clock_base[i];
+
+		clock_b->cpu_base = cpu_base;
+		seqcount_raw_spinlock_init(&clock_b->seq, &cpu_base->lock);
+		timerqueue_init_head(&clock_b->active);
 	}
 
 	cpu_base->cpu = cpu;
-- 
2.20.1



* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-19 21:45 ` [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount Ahmed S. Darwish
@ 2020-05-19 22:01   ` Stephen Hemminger
  2020-05-19 22:23     ` Thomas Gleixner
  2020-05-20  2:01   ` Eric Dumazet
  2020-05-20 14:37   ` Dan Carpenter
  2 siblings, 1 reply; 258+ messages in thread
From: Stephen Hemminger @ 2020-05-19 22:01 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML,
	David S. Miller, Jakub Kicinski, netdev

On Tue, 19 May 2020 23:45:23 +0200
"Ahmed S. Darwish" <a.darwish@linutronix.de> wrote:

> Sequence counters write paths are critical sections that must never be
> preempted, and blocking, even for CONFIG_PREEMPTION=n, is not allowed.
> 
> Commit 5dbe7c178d3f ("net: fix kernel deadlock with interface rename and
> netdev name retrieval.") handled a deadlock, observed with
> CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was
> infinitely spinning: it got scheduled after the seqcount write side
> blocked inside its own critical section.
> 
> To fix that deadlock, among other issues, the commit added a
> cond_resched() inside the read side section. While this will get the
> non-preemptible kernel eventually unstuck, the seqcount reader is fully
> exhausting its slice just spinning -- until TIF_NEED_RESCHED is set.
> 
> The fix is also still broken: if the seqcount reader belongs to a
> real-time scheduling policy, it can spin forever and the kernel will
> livelock.
> 
> Disabling preemption over the seqcount write side critical section will
> not work: inside it are a number of GFP_KERNEL allocations and mutex
> locking through the drivers/base/ :: device_rename() call chain.
> 
> From all the above, replace the seqcount with a rwsem.
> 
> Fixes: 5dbe7c178d3f (net: fix kernel deadlock with interface rename and netdev name retrieval.)
> Fixes: 30e6c9fa93cf (net: devnet_rename_seq should be a seqcount)
> Fixes: c91f6df2db49 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name)
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Have you performance tested this with 1000s of network devices?

The reason the seqcount logic was done here was to achieve scalability;
a semaphore does not scale as well.


* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-19 22:01   ` Stephen Hemminger
@ 2020-05-19 22:23     ` Thomas Gleixner
  2020-05-19 23:11       ` Stephen Hemminger
  0 siblings, 1 reply; 258+ messages in thread
From: Thomas Gleixner @ 2020-05-19 22:23 UTC (permalink / raw)
  To: Stephen Hemminger, Ahmed S. Darwish
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, David S. Miller,
	Jakub Kicinski, netdev

Stephen Hemminger <stephen@networkplumber.org> writes:
> On Tue, 19 May 2020 23:45:23 +0200
> "Ahmed S. Darwish" <a.darwish@linutronix.de> wrote:
>
>> Sequence counters write paths are critical sections that must never be
>> preempted, and blocking, even for CONFIG_PREEMPTION=n, is not allowed.
>> 
>> Commit 5dbe7c178d3f ("net: fix kernel deadlock with interface rename and
>> netdev name retrieval.") handled a deadlock, observed with
>> CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was
>> infinitely spinning: it got scheduled after the seqcount write side
>> blocked inside its own critical section.
>> 
>> To fix that deadlock, among other issues, the commit added a
>> cond_resched() inside the read side section. While this will get the
>> non-preemptible kernel eventually unstuck, the seqcount reader is fully
>> exhausting its slice just spinning -- until TIF_NEED_RESCHED is set.
>> 
>> The fix is also still broken: if the seqcount reader belongs to a
>> real-time scheduling policy, it can spin forever and the kernel will
>> livelock.
>> 
>> Disabling preemption over the seqcount write side critical section will
>> not work: inside it are a number of GFP_KERNEL allocations and mutex
>> locking through the drivers/base/ :: device_rename() call chain.
>> 
>> From all the above, replace the seqcount with a rwsem.
>> 
>> Fixes: 5dbe7c178d3f (net: fix kernel deadlock with interface rename and netdev name retrieval.)
>> Fixes: 30e6c9fa93cf (net: devnet_rename_seq should be a seqcount)
>> Fixes: c91f6df2db49 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name)
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
>> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> Have you performance tested this with 1000s of network devices?

No. We did not. -ENOTESTCASE

> The reason the seqcount logic was done here was to achieve scalability;
> a semaphore does not scale as well.

That still does not make the livelock magically go away. Just make a
reader with real-time priority preempt the writer and the system stops
dead. The net result is performance <= 0.

This was observed on RT kernels without a special 1000's of network
devices test case.

Just for the record: this is not an RT-specific problem. You can
reproduce it without an RT kernel as well. Just run the reader with a
real-time scheduling policy.

As much as you hate it from a performance POV, the only sane rule of
programming is: correctness first.

And this code clearly violates that rule.

Thanks,

        tglx


* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-19 22:23     ` Thomas Gleixner
@ 2020-05-19 23:11       ` Stephen Hemminger
  2020-05-19 23:42         ` Thomas Gleixner
  0 siblings, 1 reply; 258+ messages in thread
From: Stephen Hemminger @ 2020-05-19 23:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ahmed S. Darwish, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML,
	David S. Miller, Jakub Kicinski, netdev

On Wed, 20 May 2020 00:23:48 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> Stephen Hemminger <stephen@networkplumber.org> writes:
> > On Tue, 19 May 2020 23:45:23 +0200
> > "Ahmed S. Darwish" <a.darwish@linutronix.de> wrote:
> >  
> >> Sequence counters write paths are critical sections that must never be
> >> preempted, and blocking, even for CONFIG_PREEMPTION=n, is not allowed.
> >> 
> >> Commit 5dbe7c178d3f ("net: fix kernel deadlock with interface rename and
> >> netdev name retrieval.") handled a deadlock, observed with
> >> CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was
> >> infinitely spinning: it got scheduled after the seqcount write side
> >> blocked inside its own critical section.
> >> 
> >> To fix that deadlock, among other issues, the commit added a
> >> cond_resched() inside the read side section. While this will get the
> >> non-preemptible kernel eventually unstuck, the seqcount reader is fully
> >> exhausting its slice just spinning -- until TIF_NEED_RESCHED is set.
> >> 
> >> The fix is also still broken: if the seqcount reader belongs to a
> >> real-time scheduling policy, it can spin forever and the kernel will
> >> livelock.
> >> 
> >> Disabling preemption over the seqcount write side critical section will
> >> not work: inside it are a number of GFP_KERNEL allocations and mutex
> >> locking through the drivers/base/ :: device_rename() call chain.
> >> 
> >> From all the above, replace the seqcount with a rwsem.
> >> 
> >> Fixes: 5dbe7c178d3f (net: fix kernel deadlock with interface rename and netdev name retrieval.)
> >> Fixes: 30e6c9fa93cf (net: devnet_rename_seq should be a seqcount)
> >> Fixes: c91f6df2db49 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name)
> >> Cc: <stable@vger.kernel.org>
> >> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> >> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>  
> >
> > Have you performance tested this with 1000s of network devices?
> 
> No. We did not. -ENOTESTCASE

Please try, it isn't that hard..

# time for ((i=0;i<1000;i++)); do ip li add dev dummy$i type dummy; done

real	0m17.002s
user	0m1.064s
sys	0m0.375s


* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-19 23:11       ` Stephen Hemminger
@ 2020-05-19 23:42         ` Thomas Gleixner
  2020-05-20  0:06           ` Stephen Hemminger
  2020-05-20  2:57           ` David Miller
  0 siblings, 2 replies; 258+ messages in thread
From: Thomas Gleixner @ 2020-05-19 23:42 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Ahmed S. Darwish, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML,
	David S. Miller, Jakub Kicinski, netdev

Stephen Hemminger <stephen@networkplumber.org> writes:
> On Wed, 20 May 2020 00:23:48 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> No. We did not. -ENOTESTCASE
>
> Please try, it isn't that hard..
>
> # time for ((i=0;i<1000;i++)); do ip li add dev dummy$i type dummy; done
>
> real	0m17.002s
> user	0m1.064s
> sys	0m0.375s

And that solves the incorrectness of the current code in which way?

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-19 23:42         ` Thomas Gleixner
@ 2020-05-20  0:06           ` Stephen Hemminger
  2020-05-20  1:55             ` Thomas Gleixner
  2020-05-20  2:57           ` David Miller
  1 sibling, 1 reply; 258+ messages in thread
From: Stephen Hemminger @ 2020-05-20  0:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ahmed S. Darwish, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML,
	David S. Miller, Jakub Kicinski, netdev

On Wed, 20 May 2020 01:42:30 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> Stephen Hemminger <stephen@networkplumber.org> writes:
> > On Wed, 20 May 2020 00:23:48 +0200
> > Thomas Gleixner <tglx@linutronix.de> wrote:  
> >> No. We did not. -ENOTESTCASE  
> >
> > Please try, it isn't that hard..
> >
> > # time for ((i=0;i<1000;i++)); do ip li add dev dummy$i type dummy; done
> >
> > real	0m17.002s
> > user	0m1.064s
> > sys	0m0.375s  
> 
> And that solves the incorrectness of the current code in which way?

Agree that the current code has evolved over time to a state where it is not
correct in the case of Preempt-RT. The motivation for the changes to seqcount
goes back many years, to when there were ISPs concerned about the scaling of tunnels, vlans etc.

Is it too much to ask for a simple before/after test of your patch as part 
of the submission? You probably measure latency changes to the nanosecond.

The goal is getting it correct without causing user complaints.



^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-20  0:06           ` Stephen Hemminger
@ 2020-05-20  1:55             ` Thomas Gleixner
  0 siblings, 0 replies; 258+ messages in thread
From: Thomas Gleixner @ 2020-05-20  1:55 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Ahmed S. Darwish, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML,
	David S. Miller, Jakub Kicinski, netdev

Stephen Hemminger <stephen@networkplumber.org> writes:
> On Wed, 20 May 2020 01:42:30 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
>
>> Stephen Hemminger <stephen@networkplumber.org> writes:
>> > On Wed, 20 May 2020 00:23:48 +0200
>> > Thomas Gleixner <tglx@linutronix.de> wrote:  
>> >> No. We did not. -ENOTESTCASE  
>> >
>> > Please try, it isn't that hard..
>> >
>> > # time for ((i=0;i<1000;i++)); do ip li add dev dummy$i type dummy; done
>> >
>> > real	0m17.002s
>> > user	0m1.064s
>> > sys	0m0.375s  
>> 
>> And that solves the incorrectness of the current code in which way?
>
> Agree that the current code has evolved over time to a state where it is not
> correct in the case of Preempt-RT.

That's not a RT problem as explained in great length in the changelog
and as I pointed out in my previous reply.

 Realtime scheduling classes are available on stock kernels and all
 those attempts to "fix" the livelock problem are ignoring that fact.

Just because you, or whoever is involved, are not using them or do not care
does not make the code more correct.

> The motivation for the changes to seqcount goes back many years when
> there were ISP's that were concerned about scaling of tunnels, vlans
> etc.

I completely understand where this comes from, but that is not a
justification for incorrect code at all.

> Is it too much to ask for a simple before/after test of your patch as part 
> of the submission? You probably measure latency changes to the
> nanosecond.

It's not too much to ask and I'm happy to provide the numbers.

But before I waste my time and produce them, can you please explain how
any numbers provided are going to change the fact that the code is
incorrect?

  A bug is a bug, no matter what the numbers are.

I don't have an insta reproducer at hand for the problem which made that
code go belly up, but the net result is simply:

      Before:			After:
	real	INFINITE        0mxx.yyys

And the 'Before' comes with the extra benefit of stall warnings (if
enabled in the config).

If you insist I surely can go the extra mile and write up the insta
reproducer and stick it into a bugzilla for you.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-19 21:45 ` [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount Ahmed S. Darwish
  2020-05-19 22:01   ` Stephen Hemminger
@ 2020-05-20  2:01   ` Eric Dumazet
  2020-05-20  6:42     ` Ahmed S. Darwish
  2020-05-20 14:37   ` Dan Carpenter
  2 siblings, 1 reply; 258+ messages in thread
From: Eric Dumazet @ 2020-05-20  2:01 UTC (permalink / raw)
  To: Ahmed S. Darwish, Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, David S. Miller, Jakub Kicinski, netdev



On 5/19/20 2:45 PM, Ahmed S. Darwish wrote:
> Sequence counters write paths are critical sections that must never be
> preempted, and blocking, even for CONFIG_PREEMPTION=n, is not allowed.
> 
> Commit 5dbe7c178d3f ("net: fix kernel deadlock with interface rename and
> netdev name retrieval.") handled a deadlock, observed with
> CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was
> infinitely spinning: it got scheduled after the seqcount write side
> blocked inside its own critical section.
> 
> To fix that deadlock, among other issues, the commit added a
> cond_resched() inside the read side section. While this will get the
> non-preemptible kernel eventually unstuck, the seqcount reader is fully
> exhausting its slice just spinning -- until TIF_NEED_RESCHED is set.
> 
> The fix is also still broken: if the seqcount reader belongs to a
> real-time scheduling policy, it can spin forever and the kernel will
> livelock.
> 
> Disabling preemption over the seqcount write side critical section will
> not work: inside it are a number of GFP_KERNEL allocations and mutex
> locking through the drivers/base/ :: device_rename() call chain.
> 
> From all the above, replace the seqcount with a rwsem.
> 
> Fixes: 5dbe7c178d3f (net: fix kernel deadlock with interface rename and netdev name retrieval.)
> Fixes: 30e6c9fa93cf (net: devnet_rename_seq should be a seqcount)
> Fixes: c91f6df2db49 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name)
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  net/core/dev.c | 30 ++++++++++++------------------
>  1 file changed, 12 insertions(+), 18 deletions(-)
>

Seems fine to me, assuming the rwsem prevents starvation of the writer.

(Presumably this could be a per-netdevice rwsem, or a per-netns one, to provide some isolation.)
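
Something like this, roughly sketched (the per-netns field name is
hypothetical, not part of the posted patch):

	struct net {
		/* ... */
		struct rw_semaphore	dev_rename_sem;	/* hypothetical */
	};

	/* readers needing a consistent dev->name within this netns: */
	down_read(&net->dev_rename_sem);
	/* ... copy dev->name ... */
	up_read(&net->dev_rename_sem);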

An alternative would be to convert ndev->name from a char array to an (RCU protected) pointer,
but that looks like quite an invasive change, certainly not for stable branches.

Reviewed-by: Eric Dumazet <edumazet@google.com>



^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-19 23:42         ` Thomas Gleixner
  2020-05-20  0:06           ` Stephen Hemminger
@ 2020-05-20  2:57           ` David Miller
  2020-05-20  3:18             ` Eric Dumazet
  2020-05-20 19:37             ` Thomas Gleixner
  1 sibling, 2 replies; 258+ messages in thread
From: David Miller @ 2020-05-20  2:57 UTC (permalink / raw)
  To: tglx
  Cc: stephen, a.darwish, peterz, mingo, will, paulmck, bigeasy,
	rostedt, linux-kernel, kuba, netdev

From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 20 May 2020 01:42:30 +0200

> Stephen Hemminger <stephen@networkplumber.org> writes:
>> On Wed, 20 May 2020 00:23:48 +0200
>> Thomas Gleixner <tglx@linutronix.de> wrote:
>>> No. We did not. -ENOTESTCASE
>>
>> Please try, it isn't that hard..
>>
>> # time for ((i=0;i<1000;i++)); do ip li add dev dummy$i type dummy; done
>>
>> real	0m17.002s
>> user	0m1.064s
>> sys	0m0.375s
> 
> And that solves the incorrectness of the current code in which way?

You mentioned that there wasn't a test case, he gave you one to try.


^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-20  2:57           ` David Miller
@ 2020-05-20  3:18             ` Eric Dumazet
  2020-05-20  4:36               ` Stephen Hemminger
  2020-05-20 19:37             ` Thomas Gleixner
  1 sibling, 1 reply; 258+ messages in thread
From: Eric Dumazet @ 2020-05-20  3:18 UTC (permalink / raw)
  To: David Miller, tglx
  Cc: stephen, a.darwish, peterz, mingo, will, paulmck, bigeasy,
	rostedt, linux-kernel, kuba, netdev



On 5/19/20 7:57 PM, David Miller wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed, 20 May 2020 01:42:30 +0200
> 
>> Stephen Hemminger <stephen@networkplumber.org> writes:
>>> On Wed, 20 May 2020 00:23:48 +0200
>>> Thomas Gleixner <tglx@linutronix.de> wrote:
>>>> No. We did not. -ENOTESTCASE
>>>
>>> Please try, it isn't that hard..
>>>
>>> # time for ((i=0;i<1000;i++)); do ip li add dev dummy$i type dummy; done
>>>
>>> real	0m17.002s
>>> user	0m1.064s
>>> sys	0m0.375s
>>
>> And that solves the incorrectness of the current code in which way?
> 
> You mentioned that there wasn't a test case, he gave you one to try.
> 

I do not think this would ever use device rename, nor netdev_get_name().

None of this stuff is fast path really.

# time for ((i=1;i<1000;i++)); do ip li add dev dummy$i type dummy; done

real	0m1.127s
user	0m0.270s
sys	0m1.039s

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-20  3:18             ` Eric Dumazet
@ 2020-05-20  4:36               ` Stephen Hemminger
  0 siblings, 0 replies; 258+ messages in thread
From: Stephen Hemminger @ 2020-05-20  4:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, tglx, a.darwish, peterz, mingo, will, paulmck,
	bigeasy, rostedt, linux-kernel, kuba, netdev

On Tue, 19 May 2020 20:18:19 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On 5/19/20 7:57 PM, David Miller wrote:
> > From: Thomas Gleixner <tglx@linutronix.de>
> > Date: Wed, 20 May 2020 01:42:30 +0200
> >   
> >> Stephen Hemminger <stephen@networkplumber.org> writes:  
> >>> On Wed, 20 May 2020 00:23:48 +0200
> >>> Thomas Gleixner <tglx@linutronix.de> wrote:  
> >>>> No. We did not. -ENOTESTCASE  
> >>>
> >>> Please try, it isn't that hard..
> >>>
> >>> # time for ((i=0;i<1000;i++)); do ip li add dev dummy$i type dummy; done
> >>>
> >>> real	0m17.002s
> >>> user	0m1.064s
> >>> sys	0m0.375s  
> >>
> >> And that solves the incorrectness of the current code in which way?  
> > 
> > You mentioned that there wasn't a test case, he gave you one to try.
> >   
> 
> I do not think this would ever use device rename, nor netdev_get_name()
> 
> None of this stuff is fast path really.
> 
> # time for ((i=1;i<1000;i++)); do ip li add dev dummy$i type dummy; done
> 
> real	0m1.127s
> user	0m0.270s
> sys	0m1.039s

You're right, it is a weak test, and most of the overhead is in the syscall
and all the netlink events that happen.

It does end up looking up the new name, so it would exercise that.
A better test is to use the %d syntax, or to create 1000 dummies and then rename every one.

This is more of a stress test:
# for ((i=0;i<1000;i++)); do echo link add dev dummy%d type dummy; done | time ip -batch -
0.00user 0.29system 0:02.11elapsed 13%CPU (0avgtext+0avgdata 2544maxresident)k
0inputs+0outputs (0major+148minor)pagefaults 0swaps

# for ((i=999;i>=0;i--)); do echo link set dummy$i name dummy$((i+1)); done | time ip -batch -
0.00user 0.26system 0:54.98elapsed 0%CPU (0avgtext+0avgdata 2508maxresident)k
0inputs+0outputs (0major+145minor)pagefaults 0swaps


^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-20  2:01   ` Eric Dumazet
@ 2020-05-20  6:42     ` Ahmed S. Darwish
  2020-05-20 12:51       ` Eric Dumazet
  0 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-20  6:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, David S. Miller, Jakub Kicinski, netdev

Hello Eric,

On Tue, May 19, 2020 at 07:01:38PM -0700, Eric Dumazet wrote:
>
> On 5/19/20 2:45 PM, Ahmed S. Darwish wrote:
> > Sequence counters write paths are critical sections that must never be
> > preempted, and blocking, even for CONFIG_PREEMPTION=n, is not allowed.
> >
> > Commit 5dbe7c178d3f ("net: fix kernel deadlock with interface rename and
> > netdev name retrieval.") handled a deadlock, observed with
> > CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was
> > infinitely spinning: it got scheduled after the seqcount write side
> > blocked inside its own critical section.
> >
> > To fix that deadlock, among other issues, the commit added a
> > cond_resched() inside the read side section. While this will get the
> > non-preemptible kernel eventually unstuck, the seqcount reader is fully
> > exhausting its slice just spinning -- until TIF_NEED_RESCHED is set.
> >
> > The fix is also still broken: if the seqcount reader belongs to a
> > real-time scheduling policy, it can spin forever and the kernel will
> > livelock.
> >
> > Disabling preemption over the seqcount write side critical section will
> > not work: inside it are a number of GFP_KERNEL allocations and mutex
> > locking through the drivers/base/ :: device_rename() call chain.
> >
> > From all the above, replace the seqcount with a rwsem.
> >
> > Fixes: 5dbe7c178d3f (net: fix kernel deadlock with interface rename and netdev name retrieval.)
> > Fixes: 30e6c9fa93cf (net: devnet_rename_seq should be a seqcount)
> > Fixes: c91f6df2db49 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name)
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> > Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > ---
> >  net/core/dev.c | 30 ++++++++++++------------------
> >  1 file changed, 12 insertions(+), 18 deletions(-)
> >
>
> Seems fine to me, assuming the rwsem prevents starvation of the writer.
>

Thanks for the review.

AFAIK, due to 5cfd92e12e13 ("locking/rwsem: Adaptive disabling of reader
optimistic spinning"), using a rwsem shouldn't lead to writer starvation
in the contended case.

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 13/25] dma-buf: Use sequence counter with associated wound/wait mutex
  2020-05-19 21:45 ` [PATCH v1 13/25] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
@ 2020-05-20 10:48   ` Christian König
  2020-05-21  0:09     ` Ahmed S. Darwish
  0 siblings, 1 reply; 258+ messages in thread
From: Christian König @ 2020-05-20 10:48 UTC (permalink / raw)
  To: Ahmed S. Darwish, Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: David (ChunMing) Zhou, amd-gfx, Paul E. McKenney, David Airlie,
	Sebastian A. Siewior, LKML, Steven Rostedt, Christian König,
	dri-devel, Daniel Vetter, Alex Deucher, Felix Kuehling,
	Thomas Gleixner, Sumit Semwal, linux-media

On 19.05.20 at 23:45, Ahmed S. Darwish wrote:
> A sequence counter write side critical section must be protected by some
> form of locking to serialize writers. If the serialization primitive is
> not disabling preemption implicitly, preemption has to be explicitly
> disabled before entering the sequence counter write side critical
> section.
>
> The dma-buf reservation subsystem uses plain sequence counters to manage
> updates to reservations. Writer serialization is accomplished through a
> wound/wait mutex.
>
> Acquiring a wound/wait mutex does not disable preemption, so this needs
> to be done manually before and after the write side critical section.
>
> Use the newly-added seqcount_ww_mutex_t instead:
>
>    - It associates the ww_mutex with the sequence count, which enables
>      lockdep to validate that the write side critical section is properly
>      serialized.
>
>    - It removes the need to explicitly add preempt_disable/enable()
>      around the write side critical section because the write_begin/end()
>      functions for this new data type automatically do this.
>
> If lockdep is disabled this ww_mutex lock association is compiled out
> and has neither storage size nor runtime overhead.

Mhm, is the dma_resv object the only user of this new seqcount_ww_mutex 
variant?

If yes: we have been trying to get rid of this sequence counter for quite some 
time, so I would rather invest the additional time to finish this.

Regards,
Christian.

>
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> ---
>   drivers/dma-buf/dma-resv.c                       | 8 +-------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 --
>   include/linux/dma-resv.h                         | 2 +-
>   3 files changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 590ce7ad60a0..3aba2b2bfc48 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -128,7 +128,7 @@ subsys_initcall(dma_resv_lockdep);
>   void dma_resv_init(struct dma_resv *obj)
>   {
>   	ww_mutex_init(&obj->lock, &reservation_ww_class);
> -	seqcount_init(&obj->seq);
> +	seqcount_ww_mutex_init(&obj->seq, &obj->lock);
>   
>   	RCU_INIT_POINTER(obj->fence, NULL);
>   	RCU_INIT_POINTER(obj->fence_excl, NULL);
> @@ -259,7 +259,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
>   	fobj = dma_resv_get_list(obj);
>   	count = fobj->shared_count;
>   
> -	preempt_disable();
>   	write_seqcount_begin(&obj->seq);
>   
>   	for (i = 0; i < count; ++i) {
> @@ -281,7 +280,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
>   	smp_store_mb(fobj->shared_count, count);
>   
>   	write_seqcount_end(&obj->seq);
> -	preempt_enable();
>   	dma_fence_put(old);
>   }
>   EXPORT_SYMBOL(dma_resv_add_shared_fence);
> @@ -308,14 +306,12 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
>   	if (fence)
>   		dma_fence_get(fence);
>   
> -	preempt_disable();
>   	write_seqcount_begin(&obj->seq);
>   	/* write_seqcount_begin provides the necessary memory barrier */
>   	RCU_INIT_POINTER(obj->fence_excl, fence);
>   	if (old)
>   		old->shared_count = 0;
>   	write_seqcount_end(&obj->seq);
> -	preempt_enable();
>   
>   	/* inplace update, no shared fences */
>   	while (i--)
> @@ -393,13 +389,11 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
>   	src_list = dma_resv_get_list(dst);
>   	old = dma_resv_get_excl(dst);
>   
> -	preempt_disable();
>   	write_seqcount_begin(&dst->seq);
>   	/* write_seqcount_begin provides the necessary memory barrier */
>   	RCU_INIT_POINTER(dst->fence_excl, new);
>   	RCU_INIT_POINTER(dst->fence, dst_list);
>   	write_seqcount_end(&dst->seq);
> -	preempt_enable();
>   
>   	dma_resv_list_free(src_list);
>   	dma_fence_put(old);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 9dff792c9290..87fd32aae8f9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -258,11 +258,9 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
>   	new->shared_count = k;
>   
>   	/* Install the new fence list, seqcount provides the barriers */
> -	preempt_disable();
>   	write_seqcount_begin(&resv->seq);
>   	RCU_INIT_POINTER(resv->fence, new);
>   	write_seqcount_end(&resv->seq);
> -	preempt_enable();
>   
>   	/* Drop the references to the removed fences or move them to ef_list */
>   	for (i = j, k = 0; i < old->shared_count; ++i) {
> diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
> index a6538ae7d93f..d44a77e8a7e3 100644
> --- a/include/linux/dma-resv.h
> +++ b/include/linux/dma-resv.h
> @@ -69,7 +69,7 @@ struct dma_resv_list {
>    */
>   struct dma_resv {
>   	struct ww_mutex lock;
> -	seqcount_t seq;
> +	seqcount_ww_mutex_t seq;
>   
>   	struct dma_fence __rcu *fence_excl;
>   	struct dma_resv_list __rcu *fence;


^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API
  2020-05-19 21:45 ` [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API Ahmed S. Darwish
@ 2020-05-20 12:22   ` Konstantin Khlebnikov
  2020-05-20 13:05     ` Peter Zijlstra
  2020-05-22 14:57   ` Peter Zijlstra
  1 sibling, 1 reply; 258+ messages in thread
From: Konstantin Khlebnikov @ 2020-05-20 12:22 UTC (permalink / raw)
  To: Ahmed S. Darwish, Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Andrew Morton, linux-mm

On 20/05/2020 00.45, Ahmed S. Darwish wrote:
> Commit eef1a429f234 ("mm/swap.c: piggyback lru_add_drain_all() calls")
> implemented an optimization mechanism to exit the to-be-started LRU
> drain operation (name it A) if another drain operation *started and
> finished* while (A) was blocked on the LRU draining mutex.
> 
> This was done through a seqcount latch, which is an abuse of its
> semantics:
> 
>    1. Seqcount latching should be used for the purpose of switching
>       between two storage places with sequence protection to allow
>       interruptible, preemptible writer sections. The optimization
>       mechanism has absolutely nothing to do with that.
> 
>    2. The used raw_write_seqcount_latch() has two smp write memory
>       barriers to always ensure one consistent storage place out of the
>       two storage places available. This extra smp_wmb() is redundant for
>       the optimization use case.
> 
> Beside the API abuse, the semantics of a latch sequence counter was
> force fitted into the optimization. What was actually meant is to track
> generations of LRU draining operations, where "current lru draining
> generation = x" implies that all generations 0 < n <= x are already
> *scheduled* for draining.
> 
> Remove the conceptually-inappropriate seqcount latch usage and manually
> implement the optimization using a counter and SMP memory barriers.

Well, I thought it fits perfectly =)

Maybe it's worth adding helpers with appropriate semantics?
This is a pretty common pattern.
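
Something like this, maybe (a rough sketch; the helper names are
hypothetical, and the snapshot side uses smp_load_acquire(), which is
at least as strong as the READ_ONCE()+smp_rmb() in the patch below):

	/* Snapshot the generation before taking the serializing lock. */
	static inline unsigned int gen_snapshot(unsigned int *gen)
	{
		return smp_load_acquire(gen);
	}

	/* Under the lock: a newer generation already covers our work. */
	static inline bool gen_is_stale(unsigned int *gen, unsigned int snap)
	{
		return READ_ONCE(*gen) != snap;
	}

	/* Under the lock: publish the new generation before the stores
	 * that do the actual work become visible. */
	static inline void gen_advance(unsigned int *gen)
	{
		WRITE_ONCE(*gen, *gen + 1);
		smp_wmb();
	}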

> 
> Link: https://lkml.kernel.org/r/CALYGNiPSr-cxV9MX9czaVh6Wz_gzSv3H_8KPvgjBTGbJywUJpA@mail.gmail.com
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> ---
>   mm/swap.c | 57 +++++++++++++++++++++++++++++++++++++++++++++----------
>   1 file changed, 47 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index bf9a79fed62d..d6910eeed43d 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -713,10 +713,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
>    */
>   void lru_add_drain_all(void)
>   {
> -	static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
> -	static DEFINE_MUTEX(lock);
> +	/*
> +	 * lru_drain_gen - Current generation of pages that could be in vectors
> +	 *
> +	 * (A) Definition: lru_drain_gen = x implies that all generations
> +	 *     0 < n <= x are already scheduled for draining.
> +	 *
> +	 * This is an optimization for the highly-contended use case where a
> +	 * user space workload keeps constantly generating a flow of pages
> +	 * for each CPU.
> +	 */
> +	static unsigned int lru_drain_gen;
>   	static struct cpumask has_work;
> -	int cpu, seq;
> +	static DEFINE_MUTEX(lock);
> +	int cpu, this_gen;
>   
>   	/*
>   	 * Make sure nobody triggers this path before mm_percpu_wq is fully
> @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
>   	if (WARN_ON(!mm_percpu_wq))
>   		return;
>   
> -	seq = raw_read_seqcount_latch(&seqcount);
> +	/*
> +	 * (B) Cache the LRU draining generation number
> +	 *
> +	 * smp_rmb() ensures that the counter is loaded before the mutex is
> +	 * taken. It pairs with the smp_wmb() inside the mutex critical section
> +	 * at (D).
> +	 */
> +	this_gen = READ_ONCE(lru_drain_gen);
> +	smp_rmb();
>   
>   	mutex_lock(&lock);
>   
>   	/*
> -	 * Piggyback on drain started and finished while we waited for lock:
> -	 * all pages pended at the time of our enter were drained from vectors.
> +	 * (C) Exit the draining operation if a newer generation, from another
> +	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
>   	 */
> -	if (__read_seqcount_retry(&seqcount, seq))
> +	if (unlikely(this_gen != lru_drain_gen))
>   		goto done;
>   
> -	raw_write_seqcount_latch(&seqcount);
> +	/*
> +	 * (D) Increment generation number
> +	 *
> +	 * Pairs with READ_ONCE() and smp_rmb() at (B), outside of the critical
> +	 * section.
> +	 *
> +	 * This pairing must be done here, before the for_each_online_cpu loop
> +	 * below which drains the page vectors.
> +	 *
> +	 * Let x, y, and z represent some system CPU numbers, where x < y < z.
> > +	 * Assume CPU #z is in the middle of the for_each_online_cpu loop
> +	 * below and has already reached CPU #y's per-cpu data. CPU #x comes
> +	 * along, adds some pages to its per-cpu vectors, then calls
> +	 * lru_add_drain_all().
> +	 *
> +	 * If the paired smp_wmb() below is done at any later step, e.g. after
> +	 * the loop, CPU #x will just exit at (C) and miss flushing out all of
> +	 * its added pages.
> +	 */
> +	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
> +	smp_wmb();
>   
>   	cpumask_clear(&has_work);
> -
>   	for_each_online_cpu(cpu) {
>   		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
>   
> @@ -766,7 +803,7 @@ void lru_add_drain_all(void)
>   {
>   	lru_add_drain();
>   }
> -#endif
> +#endif /* CONFIG_SMP */
>   
>   /**
>    * release_pages - batched put_page()
> 

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-20  6:42     ` Ahmed S. Darwish
@ 2020-05-20 12:51       ` Eric Dumazet
  2020-06-03 14:33         ` Ahmed S. Darwish
  0 siblings, 1 reply; 258+ messages in thread
From: Eric Dumazet @ 2020-05-20 12:51 UTC (permalink / raw)
  To: Ahmed S. Darwish, Eric Dumazet
  Cc: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, David S. Miller, Jakub Kicinski, netdev



On 5/19/20 11:42 PM, Ahmed S. Darwish wrote:
> Hello Eric,
> 
> On Tue, May 19, 2020 at 07:01:38PM -0700, Eric Dumazet wrote:
>>
>> On 5/19/20 2:45 PM, Ahmed S. Darwish wrote:
>>> Sequence counters write paths are critical sections that must never be
>>> preempted, and blocking, even for CONFIG_PREEMPTION=n, is not allowed.
>>>
>>> Commit 5dbe7c178d3f ("net: fix kernel deadlock with interface rename and
>>> netdev name retrieval.") handled a deadlock, observed with
>>> CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was
>>> infinitely spinning: it got scheduled after the seqcount write side
>>> blocked inside its own critical section.
>>>
>>> To fix that deadlock, among other issues, the commit added a
>>> cond_resched() inside the read side section. While this will get the
>>> non-preemptible kernel eventually unstuck, the seqcount reader is fully
>>> exhausting its slice just spinning -- until TIF_NEED_RESCHED is set.
>>>
>>> The fix is also still broken: if the seqcount reader belongs to a
>>> real-time scheduling policy, it can spin forever and the kernel will
>>> livelock.
>>>
>>> Disabling preemption over the seqcount write side critical section will
>>> not work: inside it are a number of GFP_KERNEL allocations and mutex
>>> locking through the drivers/base/ :: device_rename() call chain.
>>>
>>> From all the above, replace the seqcount with a rwsem.
>>>
>>> Fixes: 5dbe7c178d3f (net: fix kernel deadlock with interface rename and netdev name retrieval.)
>>> Fixes: 30e6c9fa93cf (net: devnet_rename_seq should be a seqcount)
>>> Fixes: c91f6df2db49 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name)
>>> Cc: <stable@vger.kernel.org>
>>> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
>>> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>>> ---
>>>  net/core/dev.c | 30 ++++++++++++------------------
>>>  1 file changed, 12 insertions(+), 18 deletions(-)
>>>
>>
>> Seems fine to me, assuming the rwsem prevents starvation of the writer.
>>
> 
> Thanks for the review.
> 
> AFAIK, due to 5cfd92e12e13 ("locking/rwsem: Adaptive disabling of reader
> optimistic spinning"), using a rwsem shouldn't lead to writer starvation
> in the contended case.

Hmm, this was in linux-5.3, so very recent stuff.

Has this patch been backported to stable releases?

With all the Fixes: tags you added, stable teams will backport this networking patch to
all stable versions.

Do we have a way to tune a dedicated rwsem to 'give preference to the (unique in this case) writer'
over a myriad of potential readers?

Thanks.


^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API
  2020-05-20 12:22   ` Konstantin Khlebnikov
@ 2020-05-20 13:05     ` Peter Zijlstra
  0 siblings, 0 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-20 13:05 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Ahmed S. Darwish, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML,
	Andrew Morton, linux-mm

On Wed, May 20, 2020 at 03:22:15PM +0300, Konstantin Khlebnikov wrote:
> On 20/05/2020 00.45, Ahmed S. Darwish wrote:
> > Commit eef1a429f234 ("mm/swap.c: piggyback lru_add_drain_all() calls")
> > implemented an optimization mechanism to exit the to-be-started LRU
> > drain operation (name it A) if another drain operation *started and
> > finished* while (A) was blocked on the LRU draining mutex.

That commit is horrible...

> Well, I thought it fits perfectly =)
> 
> Maybe it's worth adding helpers with appropriate semantics?
> This is a pretty common pattern.

Where's more sites?

> > @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
> >   	if (WARN_ON(!mm_percpu_wq))
> >   		return;
> > -	seq = raw_read_seqcount_latch(&seqcount);
> >   	mutex_lock(&lock);
> >   	/*
> > -	 * Piggyback on drain started and finished while we waited for lock:
> > -	 * all pages pended at the time of our enter were drained from vectors.
> >   	 */
> > -	if (__read_seqcount_retry(&seqcount, seq))
> >   		goto done;

Since there is no ordering in raw_read_seqcount_latch(), and
mutex_lock() is an ACQUIRE, there's no guarantee the read actually
happens before the mutex is acquired.
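
Sketching the problem (illustrative, not actual code):

	seq = raw_read_seqcount_latch(&seqcount);	/* plain load */
	mutex_lock(&lock);				/* ACQUIRE */

	/*
	 * The ACQUIRE only forbids later accesses from moving up before
	 * the lock; the earlier plain load can still be reordered down
	 * into the critical section, defeating the "drain started and
	 * finished while we waited" check.
	 */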

> > -	raw_write_seqcount_latch(&seqcount);
> >   	cpumask_clear(&has_work);

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-19 21:45 ` [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount Ahmed S. Darwish
  2020-05-19 22:01   ` Stephen Hemminger
  2020-05-20  2:01   ` Eric Dumazet
@ 2020-05-20 14:37   ` Dan Carpenter
  2020-05-25 16:22     ` Ahmed S. Darwish
  2 siblings, 1 reply; 258+ messages in thread
From: Dan Carpenter @ 2020-05-20 14:37 UTC (permalink / raw)
  To: kbuild, Ahmed S. Darwish, Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: lkp, kbuild-all, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Ahmed S. Darwish,
	Jakub Kicinski, netdev

[-- Attachment #1: Type: text/plain, Size: 2944 bytes --]

Hi "Ahmed,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/locking/core]
[also build test WARNING on nf-next/master nf/master tip/timers/core linus/master v5.7-rc6 next-20200519]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Ahmed-S-Darwish/seqlock-Extend-seqcount-API-with-associated-locks/20200520-055145
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 23b5ae2e8e1326c91b5dfdbb6ebcd5a6820074ae
config: x86_64-defconfig (attached as .config)

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
net/core/dev.c:953 netdev_get_name() warn: inconsistent returns 'devnet_rename_sem'.

# https://github.com/0day-ci/linux/commit/2354e271ada778bbb935d7b20113693710905cff
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 2354e271ada778bbb935d7b20113693710905cff
vim +/devnet_rename_sem +953 net/core/dev.c

5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  935  int netdev_get_name(struct net *net, char *name, int ifindex)
5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  936  {
5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  937  	struct net_device *dev;
5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  938  
2354e271ada778b Ahmed S. Darwish 2020-05-19  939  	down_read(&devnet_rename_sem);
                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

2354e271ada778b Ahmed S. Darwish 2020-05-19  940  
5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  941  	rcu_read_lock();
5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  942  	dev = dev_get_by_index_rcu(net, ifindex);
5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  943  	if (!dev) {
5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  944  		rcu_read_unlock();
5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  945  		return -ENODEV;
                                                                ^^^^^^^^^^^^^^
We need to drop the new semaphore on error.

5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  946  	}
5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  947  
5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  948  	strcpy(name, dev->name);
5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  949  	rcu_read_unlock();
2354e271ada778b Ahmed S. Darwish 2020-05-19  950  
2354e271ada778b Ahmed S. Darwish 2020-05-19  951  	up_read(&devnet_rename_sem);
5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  952  
5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26 @953  	return 0;
5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  954  }
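
A sketch of the missing release (the error path needs to drop the
rwsem before returning):

	if (!dev) {
		rcu_read_unlock();
		up_read(&devnet_rename_sem);
		return -ENODEV;
	}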

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29111 bytes --]

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-20  2:57           ` David Miller
  2020-05-20  3:18             ` Eric Dumazet
@ 2020-05-20 19:37             ` Thomas Gleixner
  2020-05-20 21:36               ` Stephen Hemminger
  1 sibling, 1 reply; 258+ messages in thread
From: Thomas Gleixner @ 2020-05-20 19:37 UTC (permalink / raw)
  To: David Miller
  Cc: stephen, a.darwish, peterz, mingo, will, paulmck, bigeasy,
	rostedt, linux-kernel, kuba, netdev

David Miller <davem@davemloft.net> writes:
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed, 20 May 2020 01:42:30 +0200
>>> Please try, it isn't that hard..
>>>
>>> # time for ((i=0;i<1000;i++)); do ip li add dev dummy$i type dummy; done
>>>
>>> real	0m17.002s
>>> user	0m1.064s
>>> sys	0m0.375s
>> 
>> And that solves the incorrectness of the current code in which way?
>
> You mentioned that there wasn't a test case, he gave you one to try.

If it makes you happy to compare incorrect code with correct code, here
you go:

5 runs of 1000 device add, 1000 device rename and 1000 device del

CONFIG_PREEMPT_NONE=y

         Base      rwsem
 add     0:05.01   0:05.28
	 0:05.93   0:06.11
	 0:06.52   0:06.26
	 0:06.06   0:05.74
	 0:05.71   0:06.07

 rename  0:32.57   0:33.04
	 0:32.91   0:32.45
	 0:32.72   0:32.53
	 0:39.65   0:34.18
	 0:34.52   0:32.50

 delete  3:48.65   3:48.91
	 3:49.66   3:49.13
	 3:45.29   3:48.26
	 3:47.56   3:46.60
	 3:50.01   3:48.06

 -------------------------

CONFIG_PREEMPT_VOLUNTARY=y

         Base      rwsem
 add     0:06.80   0:06.42
	 0:04.77   0:05.03
	 0:05.74   0:04.62
	 0:05.87   0:04.34
	 0:04.20   0:04.12

 rename  0:33.33   0:42.02
	 0:42.36   0:32.55
	 0:39.58   0:31.60
	 0:33.69   0:35.08
	 0:34.24   0:33.97

 delete  3:47.82   3:44.00
	 3:47.42   3:51.00
	 3:48.52   3:48.88
	 3:48.50   3:48.09
	 3:50.03   3:46.56

 -------------------------

CONFIG_PREEMPT=y

         Base      rwsem

 add     0:07.89   0:07.72
	 0:07.25   0:06.72
	 0:07.42   0:06.51
	 0:06.92   0:06.38
	 0:06.20   0:06.72

 rename  0:41.77   0:32.39
	 0:44.29   0:33.29
	 0:36.19   0:34.86
	 0:33.19   0:35.06
	 0:37.00   0:34.78

 delete  2:36.96   2:39.97
	 2:37.80   2:42.19
	 2:44.66   2:48.40
	 2:39.75   2:41.02
	 2:40.77   2:38.36

The runtime variation is rather large, and when running the same in a VM
I got completely random numbers for both base and rwsem. The most amazing
was delete, where the time varied from 30s to 6m20s.

Btw, Sebastian noticed that rename spams dmesg:

  netdev_info(dev, "renamed from %s\n", oldname);

which eats about 50% of the Rename run time.

         Base      netdev_info() removed

Rename   0:34.84   0:17.48

That number at least makes tons of sense

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-20 19:37             ` Thomas Gleixner
@ 2020-05-20 21:36               ` Stephen Hemminger
  0 siblings, 0 replies; 258+ messages in thread
From: Stephen Hemminger @ 2020-05-20 21:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Miller, a.darwish, peterz, mingo, will, paulmck, bigeasy,
	rostedt, linux-kernel, kuba, netdev

On Wed, 20 May 2020 21:37:11 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> David Miller <davem@davemloft.net> writes:
> > From: Thomas Gleixner <tglx@linutronix.de>
> > Date: Wed, 20 May 2020 01:42:30 +0200  
> >>> Please try, it isn't that hard..
> >>>
> >>> # time for ((i=0;i<1000;i++)); do ip li add dev dummy$i type dummy; done
> >>>
> >>> real	0m17.002s
> >>> user	0m1.064s
> >>> sys	0m0.375s  
> >> 
> >> And that solves the incorrectness of the current code in which way?  
> >
> > You mentioned that there wasn't a test case, he gave you one to try.  
> 
> If it makes you happy to compare incorrrect code with correct code, here
> you go:
> 
> 5 runs of 1000 device add, 1000 device rename and 1000 device del
> 
> CONFIG_PREEMPT_NONE=y
> 
>          Base      rwsem
>  add     0:05.01   0:05.28
> 	 0:05.93   0:06.11
> 	 0:06.52   0:06.26
> 	 0:06.06   0:05.74
> 	 0:05.71   0:06.07
> 
>  rename  0:32.57   0:33.04
> 	 0:32.91   0:32.45
> 	 0:32.72   0:32.53
> 	 0:39.65   0:34.18
> 	 0:34.52   0:32.50
> 
>  delete  3:48.65   3:48.91
> 	 3:49.66   3:49.13
> 	 3:45.29   3:48.26
> 	 3:47.56   3:46.60
> 	 3:50.01   3:48.06
> 
>  -------------------------
> 
> CONFIG_PREEMPT_VOLUNTARY=y
> 
>          Base      rwsem
>  add     0:06.80   0:06.42
> 	 0:04.77   0:05.03
> 	 0:05.74   0:04.62
> 	 0:05.87   0:04.34
> 	 0:04.20   0:04.12
> 
>  rename  0:33.33   0:42.02
> 	 0:42.36   0:32.55
> 	 0:39.58   0:31.60
> 	 0:33.69   0:35.08
> 	 0:34.24   0:33.97
> 
>  delete  3:47.82   3:44.00
> 	 3:47.42   3:51.00
> 	 3:48.52   3:48.88
> 	 3:48.50   3:48.09
> 	 3:50.03   3:46.56
> 
>  -------------------------
> 
> CONFIG_PREEMPT=y
> 
>          Base      rwsem
> 
>  add     0:07.89   0:07.72
> 	 0:07.25   0:06.72
> 	 0:07.42   0:06.51
> 	 0:06.92   0:06.38
> 	 0:06.20   0:06.72
> 
>  rename  0:41.77   0:32.39
> 	 0:44.29   0:33.29
> 	 0:36.19   0:34.86
> 	 0:33.19   0:35.06
> 	 0:37.00   0:34.78
> 
>  delete  2:36.96   2:39.97
> 	 2:37.80   2:42.19
> 	 2:44.66   2:48.40
> 	 2:39.75   2:41.02
> 	 2:40.77   2:38.36
> 
> The runtime variation is rather large, and when running the same in a VM
> I got completely random numbers for both base and rwsem. The most amazing
> was delete, where the time varied from 30s to 6m20s.
> 
> Btw, Sebastian noticed that rename spams dmesg:
> 
>   netdev_info(dev, "renamed from %s\n", oldname);
> 
> which eats about 50% of the Rename run time.
> 
>          Base      netdev_info() removed
> 
> Rename   0:34.84   0:17.48
> 
> That number at least makes tons of sense
> 
> Thanks,
> 
>         tglx

Looks good, thanks for following through.

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 13/25] dma-buf: Use sequence counter with associated wound/wait mutex
  2020-05-20 10:48   ` Christian König
@ 2020-05-21  0:09     ` Ahmed S. Darwish
  2020-05-21 13:20       ` Christian König
  0 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-21  0:09 UTC (permalink / raw)
  To: Christian König
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, David (ChunMing) Zhou,
	amd-gfx, Paul E. McKenney, David Airlie, Sebastian A. Siewior,
	LKML, Steven Rostedt, Christian König, dri-devel,
	Daniel Vetter, Alex Deucher, Felix Kuehling, Thomas Gleixner,
	Sumit Semwal, linux-media

On Wed, May 20, 2020, Christian König wrote:
> On 19.05.20 at 23:45, Ahmed S. Darwish wrote:
> > A sequence counter write side critical section must be protected by some
> > form of locking to serialize writers. If the serialization primitive is
> > not disabling preemption implicitly, preemption has to be explicitly
> > disabled before entering the sequence counter write side critical
> > section.
> >
> > The dma-buf reservation subsystem uses plain sequence counters to manage
> > updates to reservations. Writer serialization is accomplished through a
> > wound/wait mutex.
> >
> > Acquiring a wound/wait mutex does not disable preemption, so this needs
> > to be done manually before and after the write side critical section.
> >
> > Use the newly-added seqcount_ww_mutex_t instead:
> >
> >    - It associates the ww_mutex with the sequence count, which enables
> >      lockdep to validate that the write side critical section is properly
> >      serialized.
> >
> >    - It removes the need to explicitly add preempt_disable/enable()
> >      around the write side critical section because the write_begin/end()
> >      functions for this new data type automatically do this.
> >
> > If lockdep is disabled this ww_mutex lock association is compiled out
> > and has neither storage size nor runtime overhead.
>
> Mhm, is the dma_resv object the only user of this new seqcount_ww_mutex
> variant?
>
> If yes: we have been trying to get rid of this sequence counter for quite some
> time, so I would rather invest the additional time to finish this.
>

In this patch series, each extra "seqcount with associated lock" data
type costs us, exactly:

  - 1 typedef definition, seqcount_ww_mutex_t
  - 1 static initializer, SEQCNT_WW_MUTEX_ZERO()
  - 1 runtime initializer, seqcount_ww_mutex_init()

Definitions for the typedef and the 2 initializers above are
template-code one liners.

The logic which automatically disables preemption upon entering a
seqcount_ww_mutex_t write side critical section is also already shared
with seqcount_mutex_t and any future, preemptible, associated lock.
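
For illustration only, a minimal sketch of that shared logic (the
helper and field names here are assumptions, not the series' exact
macro-generated code):

	typedef struct {
		seqcount_t	seqcount;
	#ifdef CONFIG_LOCKDEP
		struct ww_mutex	*lock;	/* for the lockdep assertions */
	#endif
	} seqcount_ww_mutex_t;

	static inline void
	seqcount_ww_mutex_write_begin(seqcount_ww_mutex_t *s)
	{
		/* A ww_mutex is preemptible, so the write side must
		 * disable preemption itself before entering the
		 * critical section. */
		preempt_disable();
		raw_write_seqcount_begin(&s->seqcount);
	}

	static inline void
	seqcount_ww_mutex_write_end(seqcount_ww_mutex_t *s)
	{
		raw_write_seqcount_end(&s->seqcount);
		preempt_enable();
	}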

So, yes, dma-resv is the only user of seqcount_ww_mutex.

But even in that case, given the one-liner, template-code nature of the
seqcount_ww_mutex_t logic, it does not make sense to block the dma_resv
and amdgpu changes until the sequence counter is completely removed at
some point in the future.

**If and when** the sequence counter gets removed, please just remove
the seqcount_ww_mutex_t data type with it. It will be extremely simple.

> Regards,
> Christian.
>

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 13/25] dma-buf: Use sequence counter with associated wound/wait mutex
  2020-05-21  0:09     ` Ahmed S. Darwish
@ 2020-05-21 13:20       ` Christian König
  0 siblings, 0 replies; 258+ messages in thread
From: Christian König @ 2020-05-21 13:20 UTC (permalink / raw)
  To: Ahmed S. Darwish, Christian König
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, David (ChunMing) Zhou,
	amd-gfx, Paul E. McKenney, David Airlie, Sebastian A. Siewior,
	LKML, Steven Rostedt, dri-devel, Daniel Vetter, Alex Deucher,
	Felix Kuehling, Thomas Gleixner, Sumit Semwal, linux-media

On 21.05.20 at 02:09, Ahmed S. Darwish wrote:
> On Wed, May 20, 2020, Christian König wrote:
>> On 19.05.20 at 23:45, Ahmed S. Darwish wrote:
>>> A sequence counter write side critical section must be protected by some
>>> form of locking to serialize writers. If the serialization primitive is
>>> not disabling preemption implicitly, preemption has to be explicitly
>>> disabled before entering the sequence counter write side critical
>>> section.
>>>
>>> The dma-buf reservation subsystem uses plain sequence counters to manage
>>> updates to reservations. Writer serialization is accomplished through a
>>> wound/wait mutex.
>>>
>>> Acquiring a wound/wait mutex does not disable preemption, so this needs
>>> to be done manually before and after the write side critical section.
>>>
>>> Use the newly-added seqcount_ww_mutex_t instead:
>>>
>>>     - It associates the ww_mutex with the sequence count, which enables
>>>       lockdep to validate that the write side critical section is properly
>>>       serialized.
>>>
>>>     - It removes the need to explicitly add preempt_disable/enable()
>>>       around the write side critical section because the write_begin/end()
>>>       functions for this new data type automatically do this.
>>>
>>> If lockdep is disabled this ww_mutex lock association is compiled out
>>> and has neither storage size nor runtime overhead.
>> Mhm, is the dma_resv object the only user of this new seqcount_ww_mutex
>> variant?
>>
>> If yes: we have been trying to get rid of this sequence counter for quite some
>> time, so I would rather invest the additional time to finish this.
>>
> In this patch series, each extra "seqcount with associated lock" data
> type costs us, exactly:
>
>    - 1 typedef definition, seqcount_ww_mutex_t
>    - 1 static initializer, SEQCNT_WW_MUTEX_ZERO()
>    - 1 runtime initializer, seqcount_ww_mutex_init()
>
> Definitions for the typedef and the 2 initializers above are
> template-code one liners.

In this case I'm perfectly fine with this.

>
> The logic which automatically disables preemption upon entering a
> seqcount_ww_mutex_t write side critical section is also already shared
> with seqcount_mutex_t and any future, preemptible, associated lock.
>
> So, yes, dma-resv is the only user of seqcount_ww_mutex.
>
> But even in that case, given the one liner template code nature of
> seqcount_ww_mutex_t logic, it does not make sense to block the dma_resv
> and amdgpu change until at some point in the future the sequence counter
> is completely removed.
>
> **If and when** the sequence counter gets removed, please just remove
> the seqcount_ww_mutex_t data type with it. It will be extremely simple.

Completely agree, I just wanted to avoid adding a lot of code now 
which gets removed again ~3 months from now.

Regards,
Christian.

>
>> Regards,
>> Christian.
>>
> Thanks,
>
> --
> Ahmed S. Darwish
> Linutronix GmbH


^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API
  2020-05-19 21:45 ` [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API Ahmed S. Darwish
  2020-05-20 12:22   ` Konstantin Khlebnikov
@ 2020-05-22 14:57   ` Peter Zijlstra
  2020-05-22 15:17     ` Sebastian A. Siewior
                       ` (2 more replies)
  1 sibling, 3 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-22 14:57 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Andrew Morton,
	Konstantin Khlebnikov, linux-mm

On Tue, May 19, 2020 at 11:45:24PM +0200, Ahmed S. Darwish wrote:
> @@ -713,10 +713,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
>   */
>  void lru_add_drain_all(void)
>  {

> +	static unsigned int lru_drain_gen;
>  	static struct cpumask has_work;
> +	static DEFINE_MUTEX(lock);
> +	int cpu, this_gen;
>  
>  	/*
>  	 * Make sure nobody triggers this path before mm_percpu_wq is fully
> @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
>  	if (WARN_ON(!mm_percpu_wq))
>  		return;
>  

> +	this_gen = READ_ONCE(lru_drain_gen);
> +	smp_rmb();

	this_gen = smp_load_acquire(&lru_drain_gen);
>  
>  	mutex_lock(&lock);
>  
>  	/*
> +	 * (C) Exit the draining operation if a newer generation, from another
> +	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
>  	 */
> +	if (unlikely(this_gen != lru_drain_gen))
>  		goto done;
>  

> +	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
> +	smp_wmb();

You can leave this smp_wmb() out and rely on the smp_mb() implied by
queue_work_on()'s test_and_set_bit().

>  	cpumask_clear(&has_work);
> -
>  	for_each_online_cpu(cpu) {
>  		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
>  

While you're here, do:

	s/cpumask_set_cpu/__&/

> @@ -766,7 +803,7 @@ void lru_add_drain_all(void)
>  {
>  	lru_add_drain();
>  }
> -#endif
> +#endif /* CONFIG_SMP */
>  
>  /**
>   * release_pages - batched put_page()

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API
  2020-05-22 14:57   ` Peter Zijlstra
@ 2020-05-22 15:17     ` Sebastian A. Siewior
  2020-05-22 16:23       ` Peter Zijlstra
  2020-05-25 15:24     ` Ahmed S. Darwish
  2020-05-25 16:10     ` John Ogness
  2 siblings, 1 reply; 258+ messages in thread
From: Sebastian A. Siewior @ 2020-05-22 15:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ahmed S. Darwish, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Steven Rostedt, LKML, Andrew Morton,
	Konstantin Khlebnikov, linux-mm

On 2020-05-22 16:57:07 [+0200], Peter Zijlstra wrote:
> > @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
> >  	if (WARN_ON(!mm_percpu_wq))
> >  		return;
> >  
> 
> > +	this_gen = READ_ONCE(lru_drain_gen);
> > +	smp_rmb();
> 
> 	this_gen = smp_load_acquire(&lru_drain_gen);
> >  
> >  	mutex_lock(&lock);
> >  
> >  	/*
> > +	 * (C) Exit the draining operation if a newer generation, from another
> > +	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
> >  	 */
> > +	if (unlikely(this_gen != lru_drain_gen))
> >  		goto done;
> >  
> 
> > +	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
> > +	smp_wmb();
> 
> You can leave this smp_wmb() out and rely on the smp_mb() implied by
> queue_work_on()'s test_and_set_bit().

This is to avoid smp_store_release()?

Sebastian

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API
  2020-05-22 15:17     ` Sebastian A. Siewior
@ 2020-05-22 16:23       ` Peter Zijlstra
  0 siblings, 0 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-22 16:23 UTC (permalink / raw)
  To: Sebastian A. Siewior
  Cc: Ahmed S. Darwish, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Steven Rostedt, LKML, Andrew Morton,
	Konstantin Khlebnikov, linux-mm

On Fri, May 22, 2020 at 05:17:05PM +0200, Sebastian A. Siewior wrote:
> On 2020-05-22 16:57:07 [+0200], Peter Zijlstra wrote:
> > > @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
> > >  	if (WARN_ON(!mm_percpu_wq))
> > >  		return;
> > >  
> > 
> > > +	this_gen = READ_ONCE(lru_drain_gen);
> > > +	smp_rmb();
> > 
> > 	this_gen = smp_load_acquire(&lru_drain_gen);
> > >  
> > >  	mutex_lock(&lock);
> > >  
> > >  	/*
> > > +	 * (C) Exit the draining operation if a newer generation, from another
> > > +	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
> > >  	 */
> > > +	if (unlikely(this_gen != lru_drain_gen))
> > >  		goto done;
> > >  
> > 
> > > +	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
> > > +	smp_wmb();
> > 
> > You can leave this smp_wmb() out and rely on the smp_mb() implied by
> > queue_work_on()'s test_and_set_bit().
> 
> This is to avoid smp_store_release()?

store_release would have the barrier on the other end. If you read the
comments (which I so helpfully cut out) you'll see it wants to order
against later stores, not earlier ones.
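
Sketched out, the ordering in question (illustrative, not the patch
itself):

	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
	smp_wmb();	/* order the generation store before the stores
			 * below; the smp_mb() implied by queue_work_on()'s
			 * test_and_set_bit() would provide this too */

	for_each_online_cpu(cpu)
		queue_work_on(cpu, mm_percpu_wq, work);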

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 04/25] block: nr_sects_write(): Disable preemption on seqcount write
  2020-05-19 21:45 ` [PATCH v1 04/25] block: nr_sects_write(): Disable preemption on seqcount write Ahmed S. Darwish
@ 2020-05-22 16:39   ` Peter Zijlstra
  2020-05-25  9:56     ` Ahmed S. Darwish
       [not found]   ` <20200522001237.A00E8206BE@mail.kernel.org>
  1 sibling, 1 reply; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-22 16:39 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Jens Axboe,
	Phillip Susi, Vivek Goyal, linux-block

On Tue, May 19, 2020 at 11:45:26PM +0200, Ahmed S. Darwish wrote:
> For optimized block readers not holding a mutex, the "number of sectors"
> 64-bit value is protected from tearing on 32-bit architectures by a
> sequence counter.
> 
> Disable preemption before entering that sequence counter's write side
> critical section. Otherwise, the read side can preempt the write side
> section and spin for the entire scheduler tick. If the reader belongs to
> a real-time scheduling class, it can spin forever and the kernel will
> livelock.
> 
> Fixes: c83f6bf98dc1 ("block: add partition resize function to blkpg ioctl")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  block/blk.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/block/blk.h b/block/blk.h
> index 0a94ec68af32..151f86932547 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -470,9 +470,11 @@ static inline sector_t part_nr_sects_read(struct hd_struct *part)
>  static inline void part_nr_sects_write(struct hd_struct *part, sector_t size)
>  {
>  #if BITS_PER_LONG==32 && defined(CONFIG_SMP)
> +	preempt_disable();
>  	write_seqcount_begin(&part->nr_sects_seq);
>  	part->nr_sects = size;
>  	write_seqcount_end(&part->nr_sects_seq);
> +	preempt_enable();
>  #elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
>  	preempt_disable();
>  	part->nr_sects = size;

This does look like something that include/linux/u64_stats_sync.h could
help with.
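
For reference, a rough sketch of what that could look like, assuming
nr_sects_seq were converted to a struct u64_stats_sync (and, as
u64_stats requires, that the update side already runs non-preemptible):

	#include <linux/u64_stats_sync.h>

	static inline void part_nr_sects_write(struct hd_struct *part,
					       sector_t size)
	{
		u64_stats_update_begin(&part->nr_sects_seq);
		part->nr_sects = size;
		u64_stats_update_end(&part->nr_sects_seq);
	}

	static inline sector_t part_nr_sects_read(struct hd_struct *part)
	{
		unsigned int seq;
		sector_t nr_sects;

		do {
			seq = u64_stats_fetch_begin(&part->nr_sects_seq);
			nr_sects = part->nr_sects;
		} while (u64_stats_fetch_retry(&part->nr_sects_seq, seq));

		return nr_sects;
	}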

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 07/25] lockdep: Add preemption disabled assertion API
  2020-05-19 21:45 ` [PATCH v1 07/25] lockdep: Add preemption disabled assertion API Ahmed S. Darwish
@ 2020-05-22 17:55   ` Peter Zijlstra
  2020-05-23 14:59     ` Sebastian A. Siewior
  0 siblings, 1 reply; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-22 17:55 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Tue, May 19, 2020 at 11:45:29PM +0200, Ahmed S. Darwish wrote:
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 206774ac6946..54c929ea5b98 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -702,6 +702,14 @@ do {									\
>  			  "Not in hardirq as expected\n");		\
>  	} while (0)
>  
> +/*
> + * Don't define this assertion here to avoid a call-site's header file
> + * dependency on sched.h task_struct current. This is needed by call
> + * sites that are inline defined at header files already included by
> + * sched.h.
> + */
> +void lockdep_assert_preemption_disabled(void);

So how about:

#if defined(CONFIG_PREEMPT_COUNT) && defined(CONFIG_TRACE_IRQFLAGS)
#define lockdep_assert_preemption_disabled() do { \
		WARN_ON(debug_locks && !preempt_count() && \
			current->hardirqs_enabled); \
	} while (0)
#else
#define lockdep_assert_preemption_disabled() do { } while (0)
#endif

That is both more consistent with the things you claim it's modelled
after and also completely avoids that header dependency.
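
As a usage illustration, the assertion is meant to sit at the seqcount
write-side entry and trigger when a writer is still preemptible. A
sketch (checked_write_begin() is an invented name, not something from
the series):

#include <linux/lockdep.h>
#include <linux/seqlock.h>

/*
 * Under lockdep, warn if the write side is entered while preemptible:
 * a same-CPU reader could otherwise preempt the writer and spin on an
 * odd sequence count for a full tick, or forever if it is RT.
 */
static inline void checked_write_begin(seqcount_t *s)
{
	lockdep_assert_preemption_disabled();
	write_seqcount_begin(s);
}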

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 09/25] Documentation: locking: Describe seqlock design and usage
  2020-05-19 21:45 ` [PATCH v1 09/25] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
@ 2020-05-22 18:01   ` Peter Zijlstra
  2020-05-22 22:24     ` Steven Rostedt
  0 siblings, 1 reply; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-22 18:01 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Jonathan Corbet,
	linux-doc

On Tue, May 19, 2020 at 11:45:31PM +0200, Ahmed S. Darwish wrote:
> diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> index d35be7709403..2a4af746b1da 100644
> --- a/include/linux/seqlock.h
> +++ b/include/linux/seqlock.h
> @@ -1,36 +1,15 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  #ifndef __LINUX_SEQLOCK_H
>  #define __LINUX_SEQLOCK_H
> +
>  /*
> - * Reader/writer consistent mechanism without starving writers. This type of
> - * lock for data where the reader wants a consistent set of information
> - * and is willing to retry if the information changes. There are two types
> - * of readers:
> - * 1. Sequence readers which never block a writer but they may have to retry
> - *    if a writer is in progress by detecting change in sequence number.
> - *    Writers do not wait for a sequence reader.
> - * 2. Locking readers which will wait if a writer or another locking reader
> - *    is in progress. A locking reader in progress will also block a writer
> - *    from going forward. Unlike the regular rwlock, the read lock here is
> - *    exclusive so that only one locking reader can get it.
> + * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
> + * lockless readers (read-only retry loops), and no writer starvation.
>   *
> - * This is not as cache friendly as brlock. Also, this may not work well
> - * for data that contains pointers, because any writer could
> - * invalidate a pointer that a reader was following.
> + * See Documentation/locking/seqlock.rst for full description.

So I really really hate that... I _much_ prefer code comments to crappy
documents.

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 10/25] seqlock: Add RST directives to kernel-doc code samples and notes
  2020-05-19 21:45 ` [PATCH v1 10/25] seqlock: Add RST directives to kernel-doc code samples and notes Ahmed S. Darwish
@ 2020-05-22 18:02   ` Peter Zijlstra
  2020-05-22 18:03     ` Peter Zijlstra
  0 siblings, 1 reply; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-22 18:02 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Jonathan Corbet,
	linux-doc

On Tue, May 19, 2020 at 11:45:32PM +0200, Ahmed S. Darwish wrote:
> Mark all C code samples inside seqlock.h kernel-doc text with the RST
> 'code-block: c' directive. Sphinx won't properly format the example code
> and will produce noisy text indentation warnings otherwise.

I so bloody hate RST.. and now it's infecting perfectly sane comments
and turning them into unreadable junk :-(

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 10/25] seqlock: Add RST directives to kernel-doc code samples and notes
  2020-05-22 18:02   ` Peter Zijlstra
@ 2020-05-22 18:03     ` Peter Zijlstra
  2020-05-22 18:26       ` Thomas Gleixner
  0 siblings, 1 reply; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-22 18:03 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Jonathan Corbet,
	linux-doc

On Fri, May 22, 2020 at 08:02:54PM +0200, Peter Zijlstra wrote:
> On Tue, May 19, 2020 at 11:45:32PM +0200, Ahmed S. Darwish wrote:
> > Mark all C code samples inside seqlock.h kernel-doc text with the RST
> > 'code-block: c' directive. Sphinx won't properly format the example code
> > and will produce noisy text indentation warnings otherwise.
> 
> I so bloody hate RST.. and now it's infecting perfectly sane comments
> and turning them into unreadable junk :-(

The correct fix is, as always, to remove the kernel-doc marker.

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 10/25] seqlock: Add RST directives to kernel-doc code samples and notes
  2020-05-22 18:03     ` Peter Zijlstra
@ 2020-05-22 18:26       ` Thomas Gleixner
  2020-05-22 18:32         ` Peter Zijlstra
  0 siblings, 1 reply; 258+ messages in thread
From: Thomas Gleixner @ 2020-05-22 18:26 UTC (permalink / raw)
  To: Peter Zijlstra, Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Jonathan Corbet, linux-doc

Peter Zijlstra <peterz@infradead.org> writes:
> On Fri, May 22, 2020 at 08:02:54PM +0200, Peter Zijlstra wrote:
>> On Tue, May 19, 2020 at 11:45:32PM +0200, Ahmed S. Darwish wrote:
>> > Mark all C code samples inside seqlock.h kernel-doc text with the RST
>> > 'code-block: c' directive. Sphinx won't properly format the example code
>> > and will produce noisy text indentation warnings otherwise.
>> 
>> I so bloody hate RST.. and now it's infecting perfectly sane comments
>> and turning them into unreadable junk :-(
>
> The correct fix is, as always, to remove the kernel-doc marker.

Get over it already.


^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 10/25] seqlock: Add RST directives to kernel-doc code samples and notes
  2020-05-22 18:26       ` Thomas Gleixner
@ 2020-05-22 18:32         ` Peter Zijlstra
  2020-05-25  9:36           ` Ahmed S. Darwish
  0 siblings, 1 reply; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-22 18:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ahmed S. Darwish, Ingo Molnar, Will Deacon, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Jonathan Corbet,
	linux-doc

On Fri, May 22, 2020 at 08:26:44PM +0200, Thomas Gleixner wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> > On Fri, May 22, 2020 at 08:02:54PM +0200, Peter Zijlstra wrote:
> >> On Tue, May 19, 2020 at 11:45:32PM +0200, Ahmed S. Darwish wrote:
> >> > Mark all C code samples inside seqlock.h kernel-doc text with the RST
> >> > 'code-block: c' directive. Sphinx won't properly format the example code
> >> > and will produce noisy text indentation warnings otherwise.
> >> 
> >> I so bloody hate RST.. and now it's infecting perfectly sane comments
> >> and turning them into unreadable junk :-(
> >
> > The correct fix is, as always, to remove the kernel-doc marker.
> 
> Get over it already.

I will not let sensible code comments deteriorate to the benefit of some
external piece of crap.

As a programmer the primary interface to all this is a text editor, not
a web browser or a PDF file or whatever other bullshit.

If comments are unreadable in your text editor, they're useless.

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 09/25] Documentation: locking: Describe seqlock design and usage
  2020-05-22 18:01   ` Peter Zijlstra
@ 2020-05-22 22:24     ` Steven Rostedt
  2020-05-25 10:50       ` Ahmed S. Darwish
  0 siblings, 1 reply; 258+ messages in thread
From: Steven Rostedt @ 2020-05-22 22:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ahmed S. Darwish, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, LKML, Jonathan Corbet,
	linux-doc

On Fri, 22 May 2020 20:01:45 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, May 19, 2020 at 11:45:31PM +0200, Ahmed S. Darwish wrote:
> > diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> > index d35be7709403..2a4af746b1da 100644
> > --- a/include/linux/seqlock.h
> > +++ b/include/linux/seqlock.h
> > @@ -1,36 +1,15 @@
> >  /* SPDX-License-Identifier: GPL-2.0 */
> >  #ifndef __LINUX_SEQLOCK_H
> >  #define __LINUX_SEQLOCK_H
> > +
> >  /*
> > - * Reader/writer consistent mechanism without starving writers. This type of
> > - * lock for data where the reader wants a consistent set of information
> > - * and is willing to retry if the information changes. There are two types
> > - * of readers:
> > - * 1. Sequence readers which never block a writer but they may have to retry
> > - *    if a writer is in progress by detecting change in sequence number.
> > - *    Writers do not wait for a sequence reader.
> > - * 2. Locking readers which will wait if a writer or another locking reader
> > - *    is in progress. A locking reader in progress will also block a writer
> > - *    from going forward. Unlike the regular rwlock, the read lock here is
> > - *    exclusive so that only one locking reader can get it.
> > + * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
> > + * lockless readers (read-only retry loops), and no writer starvation.
> >   *
> > - * This is not as cache friendly as brlock. Also, this may not work well
> > - * for data that contains pointers, because any writer could
> > - * invalidate a pointer that a reader was following.
> > + * See Documentation/locking/seqlock.rst for full description.  
> 
> So I really really hate that... I _much_ prefer code comments to crappy
> documents.

Agreed. Comments are much less likely to bitrot than documents. The
farther away the documentation is from the code, the quicker it becomes
stale.

It's fine to add "See Documentation/..." but please don't *ever* remove
comments that are next to the actual code.

-- Steve

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 07/25] lockdep: Add preemption disabled assertion API
  2020-05-22 17:55   ` Peter Zijlstra
@ 2020-05-23 14:59     ` Sebastian A. Siewior
  2020-05-23 22:41       ` Peter Zijlstra
  0 siblings, 1 reply; 258+ messages in thread
From: Sebastian A. Siewior @ 2020-05-23 14:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ahmed S. Darwish, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Steven Rostedt, LKML

On 2020-05-22 19:55:03 [+0200], Peter Zijlstra wrote:
> On Tue, May 19, 2020 at 11:45:29PM +0200, Ahmed S. Darwish wrote:
> > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> > index 206774ac6946..54c929ea5b98 100644
> > --- a/include/linux/lockdep.h
> > +++ b/include/linux/lockdep.h
> > @@ -702,6 +702,14 @@ do {									\
> >  			  "Not in hardirq as expected\n");		\
> >  	} while (0)
> >  
> > +/*
> > + * Don't define this assertion here to avoid a call-site's header file
> > + * dependency on sched.h task_struct current. This is needed by call
> > + * sites that are inline defined at header files already included by
> > + * sched.h.
> > + */
> > +void lockdep_assert_preemption_disabled(void);
> 
> So how about:
> 
> #if defined(CONFIG_PREEMPT_COUNT) && defined(CONFIG_TRACE_IRQFLAGS)
> #define lockdep_assert_preemption_disabled() do { \
> 		WARN_ON(debug_locks && !preempt_count() && \
> 			current->hardirqs_enabled); \
> 	} while (0)
> #else
> #define lockdep_assert_preemption_disabled() do { } while (0)
> #endif
> 
> That is both more consistent with the things you claim it's modelled
> after and also completely avoids that header dependency.

So we need additionally: 

- #include <linux/sched.h> in include/linux/flex_proportions.h
  and I think in another file as well.

- write_seqcount_t_begin_nested() as a define

- write_seqcount_t_begin() as a define

Any "static inline" in the header file using
lockdep_assert_preemption_disabled() will try to complain about missing
current-> define. But yes, it will work otherwise.

Sebastian

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 07/25] lockdep: Add preemption disabled assertion API
  2020-05-23 14:59     ` Sebastian A. Siewior
@ 2020-05-23 22:41       ` Peter Zijlstra
  2020-05-24 10:50         ` Sebastian A. Siewior
  2020-05-25 10:22         ` Peter Zijlstra
  0 siblings, 2 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-23 22:41 UTC (permalink / raw)
  To: Sebastian A. Siewior
  Cc: Ahmed S. Darwish, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Steven Rostedt, LKML

On Sat, May 23, 2020 at 04:59:42PM +0200, Sebastian A. Siewior wrote:
> On 2020-05-22 19:55:03 [+0200], Peter Zijlstra wrote:

> > That is both more consistent with the things you claim it's modelled
> > after and also completely avoids that header dependency.
> 
> So we need additionally: 
> 
> - #include <linux/sched.h> in include/linux/flex_proportions.h
>   and I think in another file as well.
> 
> - write_seqcount_t_begin_nested() as a define
> 
> - write_seqcount_t_begin() as a define
> 
> Any "static inline" in the header file using
> lockdep_assert_preemption_disabled() will try to complain about missing
> current-> define. But yes, it will work otherwise.

Because...? /me rummages around.. Ah you're proposing sticking this in
seqcount itself and then header hell.

Moo.. ok I'll go have another look on Monday.

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 07/25] lockdep: Add preemption disabled assertion API
  2020-05-23 22:41       ` Peter Zijlstra
@ 2020-05-24 10:50         ` Sebastian A. Siewior
  2020-05-25 10:22         ` Peter Zijlstra
  1 sibling, 0 replies; 258+ messages in thread
From: Sebastian A. Siewior @ 2020-05-24 10:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ahmed S. Darwish, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Steven Rostedt, LKML

On 2020-05-24 00:41:32 [+0200], Peter Zijlstra wrote:
> Because...? /me rummages around.. Ah you're proposing sticking this in
> seqcount itself and then header hell.
> 
> Moo.. ok I'll go have another look on Monday.

if you have a look on Monday, you might want to start with the patch at
the bottom, applied on top of the series. Both sched.h includes are
needed; you also need to avoid write_seqcount_t_begin_nested() as a
static inline in header files.

diff --git a/include/linux/flex_proportions.h b/include/linux/flex_proportions.h
index c12df59d3f5fc..c0f88f08371d7 100644
--- a/include/linux/flex_proportions.h
+++ b/include/linux/flex_proportions.h
@@ -12,6 +12,7 @@
 #include <linux/spinlock.h>
 #include <linux/seqlock.h>
 #include <linux/gfp.h>
+#include <linux/sched.h>
 
 /*
  * When maximum proportion of some event type is specified, this is the
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 54c929ea5b982..76385e599a9cb 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -702,13 +702,13 @@ do {									\
 			  "Not in hardirq as expected\n");		\
 	} while (0)
 
-/*
- * Don't define this assertion here to avoid a call-site's header file
- * dependency on sched.h task_struct current. This is needed by call
- * sites that are inline defined at header files already included by
- * sched.h.
- */
-void lockdep_assert_preemption_disabled(void);
+#define lockdep_assert_preemption_disabled() do {			\
+	WARN_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)      &&		\
+		  debug_locks                           &&		\
+		  !current->lockdep_recursion           &&		\
+		  (preempt_count() == 0 && current->hardirqs_enabled),	\
+		  "preemption not disabled as expected\n");		\
+       } while (0)
 
 #else
 # define might_lock(lock) do { } while (0)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index eca464ecf012f..7f1261376110a 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -423,11 +423,11 @@ static inline void __write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
-static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
-{
-	lockdep_assert_preemption_disabled();
-	__write_seqcount_t_begin_nested(s, subclass);
-}
+#define write_seqcount_t_begin_nested(__s, __subclass)		\
+	do {							\
+		lockdep_assert_preemption_disabled();		\
+		__write_seqcount_t_begin_nested(__s, __subclass);\
+	} while (0)
 
 /*
  * write_seqcount_t_begin() without lockdep non-preemptibility check.
@@ -450,10 +450,7 @@ static inline void __write_seqcount_t_begin(seqcount_t *s)
  */
 #define write_seqcount_begin(s)		do_write_seqcount_begin(s)
 
-static inline void write_seqcount_t_begin(seqcount_t *s)
-{
-	write_seqcount_t_begin_nested(s, 0);
-}
+#define write_seqcount_t_begin(_s)	write_seqcount_t_begin_nested(_s, 0)
 
 /**
  * write_seqcount_end() - end a seqcount write-side critical section
diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index 30358ce3d8fe1..d0fd3edcdc50b 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -62,6 +62,7 @@
  * Example of use in drivers/net/loopback.c, using per_cpu containers,
  * in BH disabled context.
  */
+#include <linux/sched.h>
 #include <linux/seqlock.h>
 
 struct u64_stats_sync {
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 4dae65bc65c24..ac10db66cc63f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -5857,18 +5857,3 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
 	dump_stack();
 }
 EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
-
-#ifdef CONFIG_PROVE_LOCKING
-
-void lockdep_assert_preemption_disabled(void)
-{
-	WARN_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)	&&
-		  debug_locks				&&
-		  !current->lockdep_recursion		&&
-		  (preempt_count() == 0 && current->hardirqs_enabled),
-		  "preemption not disabled as expected\n");
-}
-EXPORT_SYMBOL_GPL(lockdep_assert_preemption_disabled);
-NOKPROBE_SYMBOL(lockdep_assert_preemption_disabled);
-
-#endif

Sebastian

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 10/25] seqlock: Add RST directives to kernel-doc code samples and notes
  2020-05-22 18:32         ` Peter Zijlstra
@ 2020-05-25  9:36           ` Ahmed S. Darwish
  2020-05-25 13:44             ` Peter Zijlstra
  0 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-25  9:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Will Deacon, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Jonathan Corbet,
	linux-doc

Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, May 22, 2020 at 08:26:44PM +0200, Thomas Gleixner wrote:
> > Peter Zijlstra <peterz@infradead.org> writes:
> > > On Fri, May 22, 2020 at 08:02:54PM +0200, Peter Zijlstra wrote:
> > >> On Tue, May 19, 2020 at 11:45:32PM +0200, Ahmed S. Darwish wrote:
> > >> > Mark all C code samples inside seqlock.h kernel-doc text with the RST
> > >> > 'code-block: c' directive. Sphinx won't properly format the example code
> > >> > and will produce noisy text indentation warnings otherwise.
> > >>
> > >> I so bloody hate RST.. and now it's infecting perfectly sane comments
> > >> and turning them into unreadable junk :-(
> > >
> > > The correct fix is, as always, to remove the kernel-doc marker.
> >
> > Get over it already.
>
> I will not let sensible code comments deteriorate to the benefit of some
> external piece of crap.
>
> As a programmer the primary interface to all this is a text editor, not
> a web broswer or a pdf file or whatever other bullshit.
>
> If comments are unreadable in your text editor, they're useless.

Wait.

Most of the patch in question is just substituting the code snippet's
leading white spaces to tabs. For illustration purposes, if we remove
these white space hunks from the diff, it becomes:

  --- a/include/linux/seqlock.h
  +++ b/include/linux/seqlock.h
  @@ -232,6 +232,8 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  + * .. code-block:: c
  ...
  + * .. code-block:: c
  ...
  - * NOTE: The non-requirement for atomic modifications does _NOT_ include
  - *       the publishing of new entries in the case where data is a dynamic
  - *       data structure.
  + * .. attention::
  + *
  + *     The non-requirement for atomic modifications does _NOT_ include
  + *     the publishing of new entries in the case where data is a dynamic
  + *     data structure.
  ...

Are you trying to tell me that, good heavens, these directives are
really hurting your eyes so much?

Putting kernel-doc aside... That huge raw_write_seqcount_latch() comment
is actually *way more readable from any text editor* after applying this
patch. Go figure.

>>> The correct fix is, as always, to remove the kernel-doc marker.

Sorry, that's not the correct fix.

In the following patches, kernel-doc for the entire seqlock.h API is
added. Singling out raw_write_seqcount_latch() doesn't make any sense.

If you look at the top of this patch series, a lot of seqlock.h
seqcount_t call sites were badly broken. The 0day kernel test bot sent
me even more erroneous call sites due to the added lockdep checks. This
is an extra argument for the added documentation: the existing one is
horrible.

So, please, don't claim that the current situation is fine. It is not.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 04/25] block: nr_sects_write(): Disable preemption on seqcount write
  2020-05-22 16:39   ` Peter Zijlstra
@ 2020-05-25  9:56     ` Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-25  9:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Jens Axboe,
	Phillip Susi, Vivek Goyal, linux-block

Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, May 19, 2020 at 11:45:26PM +0200, Ahmed S. Darwish wrote:
> > For optimized block readers not holding a mutex, the "number of sectors"
> > 64-bit value is protected from tearing on 32-bit architectures by a
> > sequence counter.
> >
> > Disable preemption before entering that sequence counter's write side
> > critical section. Otherwise, the read side can preempt the write side
> > section and spin for the entire scheduler tick. If the reader belongs to
> > a real-time scheduling class, it can spin forever and the kernel will
> > livelock.
> >
> > Fixes: c83f6bf98dc1 ("block: add partition resize function to blkpg ioctl")
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> > Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > ---
> >  block/blk.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/block/blk.h b/block/blk.h
> > index 0a94ec68af32..151f86932547 100644
> > --- a/block/blk.h
> > +++ b/block/blk.h
> > @@ -470,9 +470,11 @@ static inline sector_t part_nr_sects_read(struct hd_struct *part)
> >  static inline void part_nr_sects_write(struct hd_struct *part, sector_t size)
> >  {
> >  #if BITS_PER_LONG==32 && defined(CONFIG_SMP)
> > +	preempt_disable();
> >  	write_seqcount_begin(&part->nr_sects_seq);
> >  	part->nr_sects = size;
> >  	write_seqcount_end(&part->nr_sects_seq);
> > +	preempt_enable();
> >  #elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
> >  	preempt_disable();
> >  	part->nr_sects = size;
>
> This does look like something that include/linux/u64_stats_sync.h could
> help with.

Correct.

I just felt though that this would be too much for a 'Cc: stable' patch.

In another (in-progress) seqlock.h patch series, all of the seqcount_t
call sites that are used for 64-bit values tearing protection on 32-bit
kernels are transformed to the u64_stats_sync.h API.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 04/25] block: nr_sects_write(): Disable preemption on seqcount write
       [not found]   ` <20200522001237.A00E8206BE@mail.kernel.org>
@ 2020-05-25 10:12     ` Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-25 10:12 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Peter Zijlstra, Thomas Gleixner, Sebastian A. Siewior, stable,
	Jens Axboe, Christoph Hellwig, linux-block, LKML

Sasha Levin <sashal@kernel.org> wrote:
> Hi
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag
> fixing commit: c83f6bf98dc1 ("block: add partition resize function to blkpg ioctl").
>
> The bot has tested the following trees: v5.6.13, v5.4.41, v4.19.123, v4.14.180, v4.9.223, v4.4.223.
>
> v5.6.13: Failed to apply! Possible dependencies:
...
> v5.4.41: Failed to apply! Possible dependencies:
...
> v4.19.123: Failed to apply! Possible dependencies:
...
> v4.14.180: Failed to apply! Possible dependencies:
...
> v4.9.223: Failed to apply! Possible dependencies:
...
> v4.4.223: Failed to apply! Possible dependencies:
...
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
>

The v5.7-rc1 commit 581e26004a09 ("block: move block layer internals out
of include/linux/genhd.h") moved the part_nr_sects_write() static inline
function from include/linux/genhd.h to block/blk.h.

After review, I'll send a rebased patch to stable.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 07/25] lockdep: Add preemption disabled assertion API
  2020-05-23 22:41       ` Peter Zijlstra
  2020-05-24 10:50         ` Sebastian A. Siewior
@ 2020-05-25 10:22         ` Peter Zijlstra
  2020-05-26  0:52           ` Ahmed S. Darwish
  1 sibling, 1 reply; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-25 10:22 UTC (permalink / raw)
  To: Sebastian A. Siewior
  Cc: Ahmed S. Darwish, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Steven Rostedt, LKML

On Sun, May 24, 2020 at 12:41:32AM +0200, Peter Zijlstra wrote:
> On Sat, May 23, 2020 at 04:59:42PM +0200, Sebastian A. Siewior wrote:
> > On 2020-05-22 19:55:03 [+0200], Peter Zijlstra wrote:
> 
> > > That is both more consistent with the things you claim it's modelled
> > > after and also completely avoids that header dependency.
> > 
> > So we need additionally: 
> > 
> > - #include <linux/sched.h> in include/linux/flex_proportions.h
> >   and I think in another file as well.
> > 
> > - write_seqcount_t_begin_nested() as a define
> > 
> > - write_seqcount_t_begin() as a define
> > 
> > Any "static inline" in the header file using
> > lockdep_assert_preemption_disabled() will try to complain about missing
> > current-> define. But yes, it will work otherwise.
> 
> Because...? /me rummages around.. Ah you're proposing sticking this in
> seqcount itself and then header hell.
> 
> Moo.. ok I'll go have another look on Monday.

How's this?

---

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index d7f7e436c3af..459ae7a6c207 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -14,6 +14,7 @@
 
 #include <linux/typecheck.h>
 #include <asm/irqflags.h>
+#include <asm/percpu.h>
 
 /* Currently lockdep_softirqs_on/off is used only by lockdep */
 #ifdef CONFIG_PROVE_LOCKING
@@ -31,18 +32,22 @@
 #endif
 
 #ifdef CONFIG_TRACE_IRQFLAGS
+
+DECLARE_PER_CPU(int, hardirqs_enabled);
+DECLARE_PER_CPU(int, hardirq_context);
+
   extern void trace_hardirqs_on_prepare(void);
   extern void trace_hardirqs_off_prepare(void);
   extern void trace_hardirqs_on(void);
   extern void trace_hardirqs_off(void);
-# define lockdep_hardirq_context(p)	((p)->hardirq_context)
+# define lockdep_hardirq_context(p)	(this_cpu_read(hardirq_context))
 # define lockdep_softirq_context(p)	((p)->softirq_context)
-# define lockdep_hardirqs_enabled(p)	((p)->hardirqs_enabled)
+# define lockdep_hardirqs_enabled(p)	(this_cpu_read(hardirqs_enabled))
 # define lockdep_softirqs_enabled(p)	((p)->softirqs_enabled)
-# define lockdep_hardirq_enter()		\
-do {						\
-	if (!current->hardirq_context++)	\
-		current->hardirq_threaded = 0;	\
+# define lockdep_hardirq_enter()			\
+do {							\
+	if (this_cpu_inc_return(hardirq_context) == 1)	\
+		current->hardirq_threaded = 0;		\
 } while (0)
 # define lockdep_hardirq_threaded()		\
 do {						\
@@ -50,7 +55,7 @@ do {						\
 } while (0)
 # define lockdep_hardirq_exit()			\
 do {						\
-	current->hardirq_context--;		\
+	this_cpu_dec(hardirq_context);		\
 } while (0)
 # define lockdep_softirq_enter()		\
 do {						\
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 8fce5c98a4b0..754c31e30a83 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -20,6 +20,7 @@ extern int lock_stat;
 #define MAX_LOCKDEP_SUBCLASSES		8UL
 
 #include <linux/types.h>
+#include <asm/percpu.h>
 
 enum lockdep_wait_type {
 	LD_WAIT_INV = 0,	/* not checked, catch all */
@@ -703,28 +704,29 @@ do {									\
 	lock_release(&(lock)->dep_map, _THIS_IP_);			\
 } while (0)
 
-#define lockdep_assert_irqs_enabled()	do {				\
-		WARN_ONCE(debug_locks && !current->lockdep_recursion &&	\
-			  !current->hardirqs_enabled,			\
-			  "IRQs not enabled as expected\n");		\
-	} while (0)
+DECLARE_PER_CPU(int, hardirqs_enabled);
+DECLARE_PER_CPU(int, hardirq_context);
 
-#define lockdep_assert_irqs_disabled()	do {				\
-		WARN_ONCE(debug_locks && !current->lockdep_recursion &&	\
-			  current->hardirqs_enabled,			\
-			  "IRQs not disabled as expected\n");		\
-	} while (0)
+#define lockdep_assert_irqs_enabled()					\
+do {									\
+	WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirqs_enabled));	\
+} while (0)
 
-#define lockdep_assert_in_irq() do {					\
-		WARN_ONCE(debug_locks && !current->lockdep_recursion &&	\
-			  !current->hardirq_context,			\
-			  "Not in hardirq as expected\n");		\
-	} while (0)
+#define lockdep_assert_irqs_disabled()					\
+do {									\
+	WARN_ON_ONCE(debug_locks && this_cpu_read(hardirqs_enabled));	\
+} while (0)
+
+#define lockdep_assert_in_irq()						\
+do {									\
+	WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirq_context));	\
+} while (0)
 
 #else
 # define might_lock(lock) do { } while (0)
 # define might_lock_read(lock) do { } while (0)
 # define might_lock_nested(lock, subclass) do { } while (0)
+
 # define lockdep_assert_irqs_enabled() do { } while (0)
 # define lockdep_assert_irqs_disabled() do { } while (0)
 # define lockdep_assert_in_irq() do { } while (0)
@@ -734,7 +736,7 @@ do {									\
 
 # define lockdep_assert_RT_in_threaded_ctx() do {			\
 		WARN_ONCE(debug_locks && !current->lockdep_recursion &&	\
-			  current->hardirq_context &&			\
+			  lockdep_hardirq_context(current) &&		\
 			  !(current->hardirq_threaded || current->irq_config),	\
 			  "Not in threaded context on PREEMPT_RT as expected\n");	\
 } while (0)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1d68ee36c583..3d48f80848db 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -990,8 +990,6 @@ struct task_struct {
 	unsigned long			hardirq_disable_ip;
 	unsigned int			hardirq_enable_event;
 	unsigned int			hardirq_disable_event;
-	int				hardirqs_enabled;
-	int				hardirq_context;
 	u64				hardirq_chain_key;
 	unsigned long			softirq_disable_ip;
 	unsigned long			softirq_enable_ip;
diff --git a/kernel/fork.c b/kernel/fork.c
index 96eb4b535ced..5e0e4918dc9f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1946,8 +1946,8 @@ static __latent_entropy struct task_struct *copy_process(
 
 	rt_mutex_init_task(p);
 
+	lockdep_assert_irqs_enabled();
 #ifdef CONFIG_PROVE_LOCKING
-	DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled);
 	DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
 #endif
 	retval = -EAGAIN;
@@ -2028,7 +2028,6 @@ static __latent_entropy struct task_struct *copy_process(
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;
-	p->hardirqs_enabled = 0;
 	p->hardirq_enable_ip = 0;
 	p->hardirq_enable_event = 0;
 	p->hardirq_disable_ip = _THIS_IP_;
@@ -2038,7 +2037,6 @@ static __latent_entropy struct task_struct *copy_process(
 	p->softirq_enable_event = 0;
 	p->softirq_disable_ip = 0;
 	p->softirq_disable_event = 0;
-	p->hardirq_context = 0;
 	p->softirq_context = 0;
 #endif
 
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index bdea09b365b6..b113941f579e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2062,9 +2062,9 @@ print_bad_irq_dependency(struct task_struct *curr,
 	pr_warn("-----------------------------------------------------\n");
 	pr_warn("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] is trying to acquire:\n",
 		curr->comm, task_pid_nr(curr),
-		curr->hardirq_context, hardirq_count() >> HARDIRQ_SHIFT,
+		lockdep_hardirq_context(curr), hardirq_count() >> HARDIRQ_SHIFT,
 		curr->softirq_context, softirq_count() >> SOFTIRQ_SHIFT,
-		curr->hardirqs_enabled,
+		lockdep_hardirqs_enabled(curr),
 		curr->softirqs_enabled);
 	print_lock(next);
 
@@ -3649,7 +3649,7 @@ void lockdep_hardirqs_on_prepare(unsigned long ip)
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
 
-	if (unlikely(current->hardirqs_enabled)) {
+	if (unlikely(lockdep_hardirqs_enabled(current))) {
 		/*
 		 * Neither irq nor preemption are disabled here
 		 * so this is racy by nature but losing one hit
@@ -3677,7 +3677,7 @@ void lockdep_hardirqs_on_prepare(unsigned long ip)
 	 * Can't allow enabling interrupts while in an interrupt handler,
 	 * that's general bad form and such. Recursion, limited stack etc..
 	 */
-	if (DEBUG_LOCKS_WARN_ON(current->hardirq_context))
+	if (DEBUG_LOCKS_WARN_ON(lockdep_hardirq_context(current)))
 		return;
 
 	current->hardirq_chain_key = current->curr_chain_key;
@@ -3695,7 +3695,7 @@ void noinstr lockdep_hardirqs_on(unsigned long ip)
 	if (unlikely(!debug_locks || curr->lockdep_recursion))
 		return;
 
-	if (curr->hardirqs_enabled) {
+	if (lockdep_hardirqs_enabled(curr)) {
 		/*
 		 * Neither irq nor preemption are disabled here
 		 * so this is racy by nature but losing one hit
@@ -3721,7 +3721,7 @@ void noinstr lockdep_hardirqs_on(unsigned long ip)
 			    current->curr_chain_key);
 
 	/* we'll do an OFF -> ON transition: */
-	curr->hardirqs_enabled = 1;
+	this_cpu_write(hardirqs_enabled, 1);
 	curr->hardirq_enable_ip = ip;
 	curr->hardirq_enable_event = ++curr->irq_events;
 	debug_atomic_inc(hardirqs_on_events);
@@ -3745,11 +3745,11 @@ void noinstr lockdep_hardirqs_off(unsigned long ip)
 	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
 		return;
 
-	if (curr->hardirqs_enabled) {
+	if (lockdep_hardirqs_enabled(curr)) {
 		/*
 		 * We have done an ON -> OFF transition:
 		 */
-		curr->hardirqs_enabled = 0;
+		this_cpu_write(hardirqs_enabled, 0);
 		curr->hardirq_disable_ip = ip;
 		curr->hardirq_disable_event = ++curr->irq_events;
 		debug_atomic_inc(hardirqs_off_events);
@@ -3794,7 +3794,7 @@ void lockdep_softirqs_on(unsigned long ip)
 	 * usage bit for all held locks, if hardirqs are
 	 * enabled too:
 	 */
-	if (curr->hardirqs_enabled)
+	if (lockdep_hardirqs_enabled(curr))
 		mark_held_locks(curr, LOCK_ENABLED_SOFTIRQ);
 	lockdep_recursion_finish();
 }
@@ -3843,7 +3843,7 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
 	 */
 	if (!hlock->trylock) {
 		if (hlock->read) {
-			if (curr->hardirq_context)
+			if (lockdep_hardirq_context(curr))
 				if (!mark_lock(curr, hlock,
 						LOCK_USED_IN_HARDIRQ_READ))
 					return 0;
@@ -3852,7 +3852,7 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
 						LOCK_USED_IN_SOFTIRQ_READ))
 					return 0;
 		} else {
-			if (curr->hardirq_context)
+			if (lockdep_hardirq_context(curr))
 				if (!mark_lock(curr, hlock, LOCK_USED_IN_HARDIRQ))
 					return 0;
 			if (curr->softirq_context)
@@ -3890,7 +3890,7 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
 
 static inline unsigned int task_irq_context(struct task_struct *task)
 {
-	return LOCK_CHAIN_HARDIRQ_CONTEXT * !!task->hardirq_context +
+	return LOCK_CHAIN_HARDIRQ_CONTEXT * !!lockdep_hardirq_context(task) +
 	       LOCK_CHAIN_SOFTIRQ_CONTEXT * !!task->softirq_context;
 }
 
@@ -3983,7 +3983,7 @@ static inline short task_wait_context(struct task_struct *curr)
 	 * Set appropriate wait type for the context; for IRQs we have to take
 	 * into account force_irqthread as that is implied by PREEMPT_RT.
 	 */
-	if (curr->hardirq_context) {
+	if (lockdep_hardirq_context(curr)) {
 		/*
 		 * Check if force_irqthreads will run us threaded.
 		 */
@@ -4826,11 +4826,11 @@ static void check_flags(unsigned long flags)
 		return;
 
 	if (irqs_disabled_flags(flags)) {
-		if (DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)) {
+		if (DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled(current))) {
 			printk("possible reason: unannotated irqs-off.\n");
 		}
 	} else {
-		if (DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled)) {
+		if (DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled(current))) {
 			printk("possible reason: unannotated irqs-on.\n");
 		}
 	}
diff --git a/kernel/softirq.c b/kernel/softirq.c
index beb8e3a66c7c..f45ebff906f7 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -107,6 +107,12 @@ static bool ksoftirqd_running(unsigned long pending)
  * where hardirqs are disabled legitimately:
  */
 #ifdef CONFIG_TRACE_IRQFLAGS
+
+DEFINE_PER_CPU(int, hardirqs_enabled);
+DEFINE_PER_CPU(int, hardirq_context);
+EXPORT_PER_CPU_SYMBOL_GPL(hardirqs_enabled);
+EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
+
 void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
 {
 	unsigned long flags;

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 09/25] Documentation: locking: Describe seqlock design and usage
  2020-05-22 22:24     ` Steven Rostedt
@ 2020-05-25 10:50       ` Ahmed S. Darwish
  2020-05-25 11:02         ` Ahmed S. Darwish
  0 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-25 10:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, LKML, Jonathan Corbet,
	linux-doc

Steven Rostedt <rostedt@goodmis.org> wrote:
> Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, May 19, 2020 at 11:45:31PM +0200, Ahmed S. Darwish wrote:
> > > diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> > > index d35be7709403..2a4af746b1da 100644
> > > --- a/include/linux/seqlock.h
> > > +++ b/include/linux/seqlock.h
> > > @@ -1,36 +1,15 @@
> > >  /* SPDX-License-Identifier: GPL-2.0 */
> > >  #ifndef __LINUX_SEQLOCK_H
> > >  #define __LINUX_SEQLOCK_H
> > > +
> > >  /*
> > > - * Reader/writer consistent mechanism without starving writers. This type of
> > > - * lock for data where the reader wants a consistent set of information
> > > - * and is willing to retry if the information changes. There are two types
> > > - * of readers:
> > > - * 1. Sequence readers which never block a writer but they may have to retry
> > > - *    if a writer is in progress by detecting change in sequence number.
> > > - *    Writers do not wait for a sequence reader.
> > > - * 2. Locking readers which will wait if a writer or another locking reader
> > > - *    is in progress. A locking reader in progress will also block a writer
> > > - *    from going forward. Unlike the regular rwlock, the read lock here is
> > > - *    exclusive so that only one locking reader can get it.
> > > + * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
> > > + * lockless readers (read-only retry loops), and no writer starvation.
> > >   *
> > > - * This is not as cache friendly as brlock. Also, this may not work well
> > > - * for data that contains pointers, because any writer could
> > > - * invalidate a pointer that a reader was following.
> > > + * See Documentation/locking/seqlock.rst for full description.
> >
> > So I really really hate that... I _much_ prefer code comments to crappy
> > documents.
>
> Agreed. Comments are much less likely to bitrot than documents. The
> farther away the documentation is from the code, the quicker it becomes
> stale.
>
> It's fine to add "See Documentation/..." but please don't *ever* remove
> comments that are next to the actual code.
>

This patch was unfairly cut at the hunk above :)

If you follow the rest of it, you see that the documentation has just
moved 3 lines below:

     /*
    - * Version using sequence counter only.
    - * This can be used when code has its own mutex protecting the
    - * updating starting before the write_seqcountbeqin() and ending
    - * after the write_seqcount_end().
    + * Sequence counters (seqcount_t)
    + *
    + * The raw counting mechanism without any writer protection. Write side
    + * critical sections must be serialized and readers on the same CPU
    + * (e.g. through preemption or interrupts) must be excluded.
    + *
    + * If it's desired to automatically handle the sequence counter writer
    + * serialization and non-preemptibility requirements, use a sequential
    + * lock (seqlock_t) instead.
    + *
    + * See Documentation/locking/seqlock.rst
      */
    +
     typedef struct seqcount {

and:

    +/*
    + * Sequential locks (seqlock_t)
    + *
    + * Sequence counters with an embedded spinlock for writer serialization
    + * and non-preemptibility.
    + *
    + * See Documentation/locking/seqlock.rst
    + */
    +
     typedef struct {
     	struct seqcount seqcount;
     	spinlock_t lock;
     } seqlock_t;

This was done because, as said in the commit log, documentation of
seqcount_t and seqlock_t was originally intermingled. This is incorrect
and confusing since the usage constrains for each type are vastly
different.

Then, the brlock comment:

    This is not as cache friendly as brlock. Also, this may not work
    well for data that contains pointers, because any writer could
    invalidate a pointer that a reader was following.

was removed not because it's moved to Documentation/locking/seqlock.rst,
but because it's obsolete: 0f6ed63b1707 ("no need to keep brlock macros
anymore...").
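
To make the difference in usage constraints concrete, here is a rough
sketch against the existing seqlock.h API (the foo/bar names are
invented):

#include <linux/types.h>
#include <linux/seqlock.h>
#include <linux/spinlock.h>

/* seqcount_t: serializing writers is the caller's job */
static seqcount_t foo_seq = SEQCNT_ZERO(foo_seq);
static DEFINE_SPINLOCK(foo_lock);
static u64 foo_data;

static void foo_update(u64 v)
{
	spin_lock(&foo_lock);	/* serializes writers, disables preemption */
	write_seqcount_begin(&foo_seq);
	foo_data = v;
	write_seqcount_end(&foo_seq);
	spin_unlock(&foo_lock);
}

/* seqlock_t: the embedded spinlock provides both guarantees */
static DEFINE_SEQLOCK(bar_seq);
static u64 bar_data;

static void bar_update(u64 v)
{
	write_seqlock(&bar_seq);
	bar_data = v;
	write_sequnlock(&bar_seq);
}

static u64 bar_read(void)
{
	unsigned int seq;
	u64 v;

	do {
		seq = read_seqbegin(&bar_seq);
		v = bar_data;
	} while (read_seqretry(&bar_seq, seq));

	return v;
}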

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 09/25] Documentation: locking: Describe seqlock design and usage
  2020-05-25 10:50       ` Ahmed S. Darwish
@ 2020-05-25 11:02         ` Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-25 11:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, LKML, Jonathan Corbet,
	linux-doc

Ahmed S. Darwish <a.darwish@linutronix.de> wrote:
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > Peter Zijlstra <peterz@infradead.org> wrote:
...
> > >
> > > So I really really hate that... I _much_ prefer code comments to crappy
> > > documents.
> >
> > Agreed. Comments are much less likely to bitrot than documents. The
> > farther away the documentation is from the code, the quicker it becomes
> > stale.
> >
> > It's fine to add "See Documentation/..." but please don't *ever* remove
> > comments that are next to the actual code.
...
>
> Then, the brlock comment:
>
>     This is not as cache friendly as brlock. Also, this may not work
>     well for data that contains pointers, because any writer could
>     invalidate a pointer that a reader was following.
>
> was removed not because it's moved to Documentation/locking/seqlock.rst,
> but because it's obsolete: 0f6ed63b1707 ("no need to keep brlock macros
> anymore...").
>

Hmm, the part about not including pointers is only mentioned in the RST
file though, and not in seqlock.h.

Anyway, ACK, I'll beef up the comments at seqlock.h and make sure they
are self-contained.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 10/25] seqlock: Add RST directives to kernel-doc code samples and notes
  2020-05-25  9:36           ` Ahmed S. Darwish
@ 2020-05-25 13:44             ` Peter Zijlstra
  2020-05-25 14:07               ` Peter Zijlstra
  0 siblings, 1 reply; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-25 13:44 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Thomas Gleixner, Ingo Molnar, Will Deacon, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Jonathan Corbet,
	linux-doc

On Mon, May 25, 2020 at 11:36:49AM +0200, Ahmed S. Darwish wrote:
> Peter Zijlstra <peterz@infradead.org> wrote:

> > I will not let sensible code comments deteriorate to the benefit of some
> > external piece of crap.
> >
> > As a programmer the primary interface to all this is a text editor, not
> a web browser or a PDF file or whatever other bullshit.
> >
> > If comments are unreadable in your text editor, they're useless.
> 
> Wait.
> 
> Most of the patch in question is just substituting the code snippet's
> leading white spaces to tabs. For illustration purposes, if we remove
> these white space hunks from the diff, it becomes:
> 
>   --- a/include/linux/seqlock.h
>   +++ b/include/linux/seqlock.h
>   @@ -232,6 +232,8 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
>   + * .. code-block:: c
>   ...
>   + * .. code-block:: c
>   ...
>   - * NOTE: The non-requirement for atomic modifications does _NOT_ include
>   - *       the publishing of new entries in the case where data is a dynamic
>   - *       data structure.
>   + * .. attention::
>   + *
>   + *     The non-requirement for atomic modifications does _NOT_ include
>   + *     the publishing of new entries in the case where data is a dynamic
>   + *     data structure.
>   ...
> 
> Are you trying to tell me that, good heavens, these directives are
> really hurting your eyes so much?

Yep, they're a distraction and serve absolutely no purpose. They're also
utterly moronic, of course it's code and of course it's bloody well C.

> Putting kernel-doc aside... That huge raw_write_seqcount_latch() comment
> is actually *way more readable from any text editor* after applying this
> patch. Go figure.

I don't mind the re-indent.

> >>> The correct fix is, as always, to remove the kernel-doc marker.
> 
> Sorry, that's not the correct fix.

Of course it is, if kerneldoc complains that a perfectly good comment
is no good, then the fault lies with kerneldoc.

It's like checkpatch; assume it is wrong :-)

> In the following patches, kernel-doc for the entire seqlock.h API is
> added. Singling out raw_write_seqcount_latch() doesn't make any sense.

% s/\/\*\*/\/\*/g -- tada!!

> If you look at the top of this patch series, a lot of seqlock.h
> seqcount_t call sites were badly broken. The 0day kernel test bot sent
> me even more erroneous call sites due to the added lockdep checks. This
> is an extra argument for the added documentation: the existing one is
> horrible.

I've nothing against improving comments, I'm just saying that RST is
absolute atrocious shite and has nothing to do with good comments.

If sphinx doesn't like "NOTE:", then go teach it.

> So, please, don't claim that the current situation is fine. It is not.

I've never claimed that. My claim is that RST is shite and has no added
value.

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 10/25] seqlock: Add RST directives to kernel-doc code samples and notes
  2020-05-25 13:44             ` Peter Zijlstra
@ 2020-05-25 14:07               ` Peter Zijlstra
  0 siblings, 0 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-25 14:07 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Thomas Gleixner, Ingo Molnar, Will Deacon, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Jonathan Corbet,
	linux-doc

On Mon, May 25, 2020 at 03:44:29PM +0200, Peter Zijlstra wrote:

> I've never claimed that. My claim is that RST is shite and has no added
> value.

Or rather, it has negative value, for it makes comments less readable.

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API
  2020-05-22 14:57   ` Peter Zijlstra
  2020-05-22 15:17     ` Sebastian A. Siewior
@ 2020-05-25 15:24     ` Ahmed S. Darwish
  2020-05-25 15:45       ` Peter Zijlstra
  2020-05-25 16:10     ` John Ogness
  2 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-25 15:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Andrew Morton,
	Konstantin Khlebnikov, linux-mm

Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, May 19, 2020 at 11:45:24PM +0200, Ahmed S. Darwish wrote:
> > @@ -713,10 +713,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
> >   */
> >  void lru_add_drain_all(void)
> >  {
>

Re-adding cut-out comment for context:

	/*
	 * lru_drain_gen - Current generation of pages that could be in vectors
	 *
	 * (A) Definition: lru_drain_gen = x implies that all generations
	 *     0 < n <= x are already scheduled for draining.
	 *
	 * This is an optimization for the highly-contended use case where a
	 * user space workload keeps constantly generating a flow of pages
	 * for each CPU.
	 */
> > +	static unsigned int lru_drain_gen;
> >  	static struct cpumask has_work;
> > +	static DEFINE_MUTEX(lock);
> > +	int cpu, this_gen;
> >
> >  	/*
> >  	 * Make sure nobody triggers this path before mm_percpu_wq is fully
> > @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
> >  	if (WARN_ON(!mm_percpu_wq))
> >  		return;
> >
>

Re-adding cut-out comment for context:

	/*
	 * (B) Cache the LRU draining generation number
	 *
	 * smp_rmb() ensures that the counter is loaded before the mutex is
	 * taken. It pairs with the smp_wmb() inside the mutex critical section
	 * at (D).
	 */
> > +	this_gen = READ_ONCE(lru_drain_gen);
> > +	smp_rmb();
>
> 	this_gen = smp_load_acquire(&lru_drain_gen);

ACK. will do.

> >
> >  	mutex_lock(&lock);
> >
> >  	/*
> > +	 * (C) Exit the draining operation if a newer generation, from another
> > +	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
> >  	 */
> > +	if (unlikely(this_gen != lru_drain_gen))
> >  		goto done;
> >
>

Re-adding cut-out comment for context:

	/*
	 * (D) Increment generation number
	 *
	 * Pairs with READ_ONCE() and smp_rmb() at (B), outside of the critical
	 * section.
	 *
	 * This pairing must be done here, before the for_each_online_cpu loop
	 * below which drains the page vectors.
	 *
	 * Let x, y, and z represent some system CPU numbers, where x < y < z.
	 * Assume CPU #z is in the middle of the for_each_online_cpu loop
	 * below and has already reached CPU #y's per-cpu data. CPU #x comes
	 * along, adds some pages to its per-cpu vectors, then calls
	 * lru_add_drain_all().
	 *
	 * If the paired smp_wmb() below is done at any later step, e.g. after
	 * the loop, CPU #x will just exit at (C) and miss flushing out all of
	 * its added pages.
	 */
> > +	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
> > +	smp_wmb();
>
> You can leave this smp_wmb() out and rely on the smp_mb() implied by
> queue_work_on()'s test_and_set_bit().
>

Won't this be too implicit?

Isn't it possible that, over the years, queue_work_on() implementation
changes and the test_and_set_bit()/smp_mb() gets removed?

If that happens, this commit will get *silently* broken and the local
CPU pages won't be drained.

> >  	cpumask_clear(&has_work);
> > -
> >  	for_each_online_cpu(cpu) {
> >  		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
> >
>
> While you're here, do:
>
> 	s/cpumask_set_cpu/__&/
>

ACK.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API
  2020-05-25 15:24     ` Ahmed S. Darwish
@ 2020-05-25 15:45       ` Peter Zijlstra
  0 siblings, 0 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-25 15:45 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Andrew Morton,
	Konstantin Khlebnikov, linux-mm

On Mon, May 25, 2020 at 05:24:01PM +0200, Ahmed S. Darwish wrote:
> Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, May 19, 2020 at 11:45:24PM +0200, Ahmed S. Darwish wrote:

> > > +	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
> > > +	smp_wmb();
> >
> > You can leave this smp_wmb() out and rely on the smp_mb() implied by
> > queue_work_on()'s test_and_set_bit().
> >
> 
> Won't this be too implicit?
> 
> Isn't it possible that, over the years, queue_work_on() implementation
> changes and the test_and_set_bit()/smp_mb() gets removed?
> 
> If that happens, this commit will get *silently* broken and the local
> CPU pages won't be drained.

Add a comment to queue_work_on() that points here? That way people are
aware.
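
A sketch of what such a comment could say, placed at the pending test
in kernel/workqueue.c (wording invented here, not an actual hunk):

/*
 * Memory-ordering note: callers such as lru_add_drain_all() rely on
 * the full barrier implied by the test_and_set_bit() on
 * WORK_STRUCT_PENDING_BIT in this path. If queue_work_on() ever stops
 * implying a full barrier, those callers need an explicit smp_mb()
 * again.
 */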


^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API
  2020-05-22 14:57   ` Peter Zijlstra
  2020-05-22 15:17     ` Sebastian A. Siewior
  2020-05-25 15:24     ` Ahmed S. Darwish
@ 2020-05-25 16:10     ` John Ogness
  2020-09-10 15:08       ` [tip: locking/core] mm/swap: Do not abuse the seqcount_t " tip-bot2 for Ahmed S. Darwish
  2 siblings, 1 reply; 258+ messages in thread
From: John Ogness @ 2020-05-25 16:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ahmed S. Darwish, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML,
	Andrew Morton, Konstantin Khlebnikov, linux-mm

[-- Attachment #1: Type: text/plain, Size: 2799 bytes --]

Hi,

This optimization is broken. The main concern here: Is it possible that
lru_add_drain_all() _would_ have drained pagevec X, but then aborted
because another lru_add_drain_all() is underway and that other task will
_not_ drain pagevec X? I claim the answer is yes!

My suggested changes are inline below.

I attached a litmus test to verify it.

On 2020-05-22, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, May 19, 2020 at 11:45:24PM +0200, Ahmed S. Darwish wrote:
>> @@ -713,10 +713,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
>>   */
>>  void lru_add_drain_all(void)
>>  {
>
>> +	static unsigned int lru_drain_gen;
>>  	static struct cpumask has_work;
>> +	static DEFINE_MUTEX(lock);
>> +	int cpu, this_gen;
>>  
>>  	/*
>>  	 * Make sure nobody triggers this path before mm_percpu_wq is fully
>> @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
>>  	if (WARN_ON(!mm_percpu_wq))
>>  		return;
>>  

An smp_mb() is needed here.

	/*
	 * Guarantee the pagevec counter stores visible by
	 * this CPU are visible to other CPUs before loading
	 * the current drain generation.
	 */
	smp_mb();

>> +	this_gen = READ_ONCE(lru_drain_gen);
>> +	smp_rmb();
>
> 	this_gen = smp_load_acquire(&lru_drain_gen);
>>  
>>  	mutex_lock(&lock);
>>  
>>  	/*
>> +	 * (C) Exit the draining operation if a newer generation, from another
>> +	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
>>  	 */
>> +	if (unlikely(this_gen != lru_drain_gen))
>>  		goto done;
>>  
>
>> +	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
>> +	smp_wmb();

Instead of smp_wmb(), this needs to be a full memory barrier.

	/*
	 * Guarantee the new drain generation is stored before
	 * loading the pagevec counters.
	 */
	smp_mb();

> You can leave this smp_wmb() out and rely on the smp_mb() implied by
> queue_work_on()'s test_and_set_bit().
>
>>  	cpumask_clear(&has_work);
>> -
>>  	for_each_online_cpu(cpu) {
>>  		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
>>  
>
> While you're here, do:
>
> 	s/cpumask_set_cpu/__&/
>
>> @@ -766,7 +803,7 @@ void lru_add_drain_all(void)
>>  {
>>  	lru_add_drain();
>>  }
>> -#endif
>> +#endif /* CONFIG_SMP */
>>  
>>  /**
>>   * release_pages - batched put_page()

For the litmus test:

1:rx=0             (P1 did not see the pagevec counter)
2:rx=1             (P2 _would_ have seen the pagevec counter)
2:ry1=0 /\ 2:ry2=1 (P2 aborted due to optimization)

Changing the smp_mb() back to smp_wmb() in P1 and removing the smp_mb()
in P2 represents this patch. And it shows that sometimes P2 will abort
even though it would have drained the pagevec and P1 did not drain the
pagevec.

This is ugly as hell. And there may be other memory barrier types to make
it pretty. But as is, memory barriers are missing.

John Ogness


[-- Attachment #2: lru_add_drain_all.litmus --]
[-- Type: text/plain, Size: 1069 bytes --]

C lru_add_drain_all

(*
 * x is a pagevec counter
 * y is @lru_drain_gen
 * z is @lock
 *)

{ }

P0(int *x)
{
	// mark pagevec for draining
	WRITE_ONCE(*x, 1);
}

P1(int *x, int *y, int *z)
{
	int rx;
	int rz;

	// mutex_lock(&lock);
	rz = cmpxchg_acquire(z, 0, 1);
	if (rz == 0) {

		// WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
		WRITE_ONCE(*y, 1);

		// guarantee lru_drain_gen store before loading pagevec
		smp_mb();

		// if (pagevec_count(...))
		rx = READ_ONCE(*x);

		// mutex_unlock(&lock);
		rz = cmpxchg_release(z, 1, 2);
	}
}

P2(int *x, int *y, int *z)
{
	int rx;
	int ry1;
	int ry2;
	int rz;

	// the pagevec counter as currently visible to this CPU
	rx = READ_ONCE(*x);

	// guarantee pagevec store before loading lru_drain_gen
	smp_mb();

	// this_gen = READ_ONCE(lru_drain_gen); smp_rmb();
	ry1 = smp_load_acquire(y);

	// mutex_lock(&lock) - acquired after P1
	rz = cmpxchg_acquire(z, 2, 3);
	if (rz == 2) {

		// if (unlikely(this_gen != lru_drain_gen))
		ry2 = READ_ONCE(*y);
	}
}

locations [x; y; z]
exists (1:rx=0 /\ 2:rx=1 /\ 2:ry1=0 /\ 2:ry2=1)
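
(For reference, the attached test can presumably be checked against the
Linux-kernel memory model shipped in tools/memory-model, e.g.:

	$ herd7 -conf linux-kernel.cfg lru_add_drain_all.litmus

assuming the file is saved under that name and the memory-model
configuration files are on the search path; the "exists" clause above
names the problematic outcome.)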

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-20 14:37   ` Dan Carpenter
@ 2020-05-25 16:22     ` Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-25 16:22 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: kbuild, Peter Zijlstra, Ingo Molnar, Will Deacon, lkp,
	kbuild-all, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Jakub Kicinski,
	netdev

On Wed, May 20, 2020 at 05:37:07PM +0300, Dan Carpenter wrote:
...
>
> smatch warnings:
> net/core/dev.c:953 netdev_get_name() warn: inconsistent returns 'devnet_rename_sem'.
>
...
>
> 5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  935  int netdev_get_name(struct net *net, char *name, int ifindex)
> 5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  936  {
> 5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  937  	struct net_device *dev;
> 5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  938
> 2354e271ada778b Ahmed S. Darwish 2020-05-19  939  	down_read(&devnet_rename_sem);
>                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> 2354e271ada778b Ahmed S. Darwish 2020-05-19  940
> 5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  941  	rcu_read_lock();
> 5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  942  	dev = dev_get_by_index_rcu(net, ifindex);
> 5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  943  	if (!dev) {
> 5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  944  		rcu_read_unlock();
> 5dbe7c178d3f0a4 Nicolas Schichan 2013-06-26  945  		return -ENODEV;
>                                                               ^^^^^^^^^^^^^^

Oh, shouldn't have missed that. Will fix in v2.
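
The missing up_read() on the -ENODEV path can be handled with a single
exit path; roughly (a sketch of the direction for v2, not the final
patch):

	int netdev_get_name(struct net *net, char *name, int ifindex)
	{
		struct net_device *dev;
		int ret;

		down_read(&devnet_rename_sem);
		rcu_read_lock();

		dev = dev_get_by_index_rcu(net, ifindex);
		if (!dev) {
			ret = -ENODEV;
			goto out;
		}

		strcpy(name, dev->name);
		ret = 0;
	out:
		rcu_read_unlock();
		up_read(&devnet_rename_sem);
		return ret;
	}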

Thanks,

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 07/25] lockdep: Add preemption disabled assertion API
  2020-05-25 10:22         ` Peter Zijlstra
@ 2020-05-26  0:52           ` Ahmed S. Darwish
  2020-05-26  8:13             ` Peter Zijlstra
  0 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-26  0:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sebastian A. Siewior, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Steven Rostedt, LKML

Peter Zijlstra <peterz@infradead.org> wrote:
> On Sun, May 24, 2020 at 12:41:32AM +0200, Peter Zijlstra wrote:
> > On Sat, May 23, 2020 at 04:59:42PM +0200, Sebastian A. Siewior wrote:
> > >
> > > Any "static inline" in the header file using
> > > lockdep_assert_preemption_disabled() will try to complain about missing
> > > current-> define. But yes, it will work otherwise.
> >
> > Because...? /me rummages around.. Ah you're proposing sticking this in
> > seqcount itself and then header hell.
> >
> > Moo.. ok I'll go have another look on Monday.
>
> How's this?
>

This will work for my case as current-> is no longer referenced by the
lockdep macros. Please continue below though.

...

> -#define lockdep_assert_irqs_enabled()	do {				\
> -		WARN_ONCE(debug_locks && !current->lockdep_recursion &&	\
> -			  !current->hardirqs_enabled,			\
> -			  "IRQs not enabled as expected\n");		\
> -	} while (0)
> +DECLARE_PER_CPU(int, hardirqs_enabled);
> +DECLARE_PER_CPU(int, hardirq_context);
>
> -#define lockdep_assert_irqs_disabled()	do {				\
> -		WARN_ONCE(debug_locks && !current->lockdep_recursion &&	\
> -			  current->hardirqs_enabled,			\
> -			  "IRQs not disabled as expected\n");		\
> -	} while (0)
> +#define lockdep_assert_irqs_enabled()					\
> +do {									\
> +	WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirqs_enabled));	\
> +} while (0)
>

Given that lockdep_off() is defined at lockdep.c as:

  void lockdep_off(void)
  {
        current->lockdep_recursion += LOCKDEP_OFF;
  }

This would imply that all of the macros:

  - lockdep_assert_irqs_enabled()
  - lockdep_assert_irqs_disabled()
  - lockdep_assert_in_irq()
  - lockdep_assert_preemption_disabled()
  - lockdep_assert_preemption_enabled()

will do the lockdep checks *even if* lockdep_off() was called.

This doesn't sound right. Even if all of the above macros' call sites
didn't care about lockdep_off()/on(), it is semantically incoherent.
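
As a concrete illustration of the incoherence (hypothetical call site,
for illustration only):

	lockdep_off();		/* current->lockdep_recursion += LOCKDEP_OFF */

	local_irq_disable();

	/*
	 * With the proposed per-CPU implementation, this checks only
	 * debug_locks && !this_cpu_read(hardirqs_enabled). Since
	 * current->lockdep_recursion is never consulted, the assertion
	 * would still WARN despite the lockdep_off() above.
	 */
	lockdep_assert_irqs_enabled();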

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 07/25] lockdep: Add preemption disabled assertion API
  2020-05-26  0:52           ` Ahmed S. Darwish
@ 2020-05-26  8:13             ` Peter Zijlstra
  2020-05-26  9:45               ` Ahmed S. Darwish
  2020-06-03 15:30               ` Ahmed S. Darwish
  0 siblings, 2 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-05-26  8:13 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Sebastian A. Siewior, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Steven Rostedt, LKML

On Tue, May 26, 2020 at 02:52:31AM +0200, Ahmed S. Darwish wrote:
> Peter Zijlstra <peterz@infradead.org> wrote:

> > +#define lockdep_assert_irqs_enabled()					\
> > +do {									\
> > +	WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirqs_enabled));	\
> > +} while (0)
> >
> 
> Given that lockdep_off() is defined at lockdep.c as:
> 
>   void lockdep_off(void)
>   {
>         current->lockdep_recursion += LOCKDEP_OFF;
>   }
> 
> This would imply that all of the macros:
> 
>   - lockdep_assert_irqs_enabled()
>   - lockdep_assert_irqs_disabled()
>   - lockdep_assert_in_irq()
>   - lockdep_assert_preemption_disabled()
>   - lockdep_assert_preemption_enabled()
> 
> will do the lockdep checks *even if* lockdep_off() was called.
> 
> > This doesn't sound right. Even if all of the above macros' call sites
> didn't care about lockdep_off()/on(), it is semantically incoherent.

lockdep_off() is an abomination and really should not exist.

That dm-cache-target.c thing, for example, is atrocious shite that will
explode on -rt. Whoever wrote that needs a 'medal'.

People using it deserve all the pain they get.

Also; IRQ state _should_ be tracked irrespective of tracking lock
dependencies -- I see that that currently isn't entirely the case, lemme
go fix that.


^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 07/25] lockdep: Add preemption disabled assertion API
  2020-05-26  8:13             ` Peter Zijlstra
@ 2020-05-26  9:45               ` Ahmed S. Darwish
  2020-06-03 15:30               ` Ahmed S. Darwish
  1 sibling, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-05-26  9:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sebastian A. Siewior, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Steven Rostedt, LKML

On Tue, May 26, 2020 at 10:13:50AM +0200, Peter Zijlstra wrote:
> On Tue, May 26, 2020 at 02:52:31AM +0200, Ahmed S. Darwish wrote:
> > Peter Zijlstra <peterz@infradead.org> wrote:
>
> > > +#define lockdep_assert_irqs_enabled()					\
> > > +do {									\
> > > +	WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirqs_enabled));	\
> > > +} while (0)
> > >
> >
> > Given that lockdep_off() is defined at lockdep.c as:
> >
> >   void lockdep_off(void)
> >   {
> >         current->lockdep_recursion += LOCKDEP_OFF;
> >   }
> >
> > This would imply that all of the macros:
> >
> >   - lockdep_assert_irqs_enabled()
> >   - lockdep_assert_irqs_disabled()
> >   - lockdep_assert_in_irq()
> >   - lockdep_assert_preemption_disabled()
> >   - lockdep_assert_preemption_enabled()
> >
> > will do the lockdep checks *even if* lockdep_off() was called.
> >
> > This doesn't sound right. Even if all of the above macros' call sites
> > didn't care about lockdep_off()/on(), it is semantically incoherent.
>
> lockdep_off() is an abomination and really should not exist.
>
> That dm-cache-target.c thing, for example, is atrocious shite that will
> explode on -rt. Whoever wrote that needs a 'medal'.
>
> People using it deserve all the pain they get.
>
> Also; IRQ state _should_ be tracked irrespective of tracking lock
> dependencies -- I see that that currently isn't entirely the case, lemme
> go fix that.
>

Exactly, currently all the lockdep IRQ checks get nullified if
lockdep_off() is called. That was the source of my confusion.

If you'll have any extra patches on this, I can also queue them in the
next iteration of this series, before this patch.

Thanks a lot,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
  2020-05-20 12:51       ` Eric Dumazet
@ 2020-06-03 14:33         ` Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-03 14:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, David S. Miller, Jakub Kicinski, netdev

On Wed, May 20, 2020 at 05:51:27AM -0700, Eric Dumazet wrote:
>
> On 5/19/20 11:42 PM, Ahmed S. Darwish wrote:
> > Hello Eric,
> >
> > On Tue, May 19, 2020 at 07:01:38PM -0700, Eric Dumazet wrote:
> >>
> >> On 5/19/20 2:45 PM, Ahmed S. Darwish wrote:
> >>> Sequence counters write paths are critical sections that must never be
> >>> preempted, and blocking, even for CONFIG_PREEMPTION=n, is not allowed.
> >>>
> >>> Commit 5dbe7c178d3f ("net: fix kernel deadlock with interface rename and
> >>> netdev name retrieval.") handled a deadlock, observed with
> >>> CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was
> >>> infinitely spinning: it got scheduled after the seqcount write side
> >>> blocked inside its own critical section.
> >>>
> >>> To fix that deadlock, among other issues, the commit added a
> >>> cond_resched() inside the read side section. While this will get the
> >>> non-preemptible kernel eventually unstuck, the seqcount reader is fully
> >>> exhausting its slice just spinning -- until TIF_NEED_RESCHED is set.
> >>>
> >>> The fix is also still broken: if the seqcount reader belongs to a
> >>> real-time scheduling policy, it can spin forever and the kernel will
> >>> livelock.
> >>>
> >>> Disabling preemption over the seqcount write side critical section will
> >>> not work: inside it are a number of GFP_KERNEL allocations and mutex
> >>> locking through the drivers/base/ :: device_rename() call chain.
> >>>
> >>> From all the above, replace the seqcount with a rwsem.
> >>>
> >>> Fixes: 5dbe7c178d3f (net: fix kernel deadlock with interface rename and netdev name retrieval.)
> >>> Fixes: 30e6c9fa93cf (net: devnet_rename_seq should be a seqcount)
> >>> Fixes: c91f6df2db49 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name)
> >>> Cc: <stable@vger.kernel.org>
> >>> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> >>> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> >>> ---
> >>>  net/core/dev.c | 30 ++++++++++++------------------
> >>>  1 file changed, 12 insertions(+), 18 deletions(-)
> >>>
> >>
> >> Seems fine to me, assuming rwsem prevent starvation of the writer.
> >>
> >
> > Thanks for the review.
> >
> > AFAIK, due to 5cfd92e12e13 ("locking/rwsem: Adaptive disabling of reader
> > optimistic spinning"), using a rwsem shouldn't lead to writer starvation
> > in the contended case.
>
> Hmm this was in linux-5.3, so very recent stuff.
>
> Has this patch been backported to stable releases ?
>
> With all the Fixes: tags you added, stable teams will backport this
> networking patch to all stable versions.
>
> Do we have a way to tune a dedicated rwsem to "give preference to the
> (unique in this case) writer" over a myriad of potential readers?
>

I was wrong in referencing the commit 5cfd92e12e13 above.

Before and after that commit, once a rwsem writer is blocked waiting, all
subsequent readers will block until that writer makes progress.

Given that behavior, and that the read section is already quite short, I
don't think writers are in any danger of starvation here.
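
And since a blocked rwsem writer also blocks all subsequent readers, the
write side can simply wrap the blocking rename; roughly, in
dev_change_name() (a sketch of the v1 patch shape, not the final code):

	down_write(&devnet_rename_sem);

	/* ... may block: GFP_KERNEL allocations and mutexes inside ... */
	err = device_rename(&dev->dev, dev->name);

	up_write(&devnet_rename_sem);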

(a v2 will be sent shortly, fixing the error found by Dan and the kbuild bot.)

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v1 07/25] lockdep: Add preemption disabled assertion API
  2020-05-26  8:13             ` Peter Zijlstra
  2020-05-26  9:45               ` Ahmed S. Darwish
@ 2020-06-03 15:30               ` Ahmed S. Darwish
  1 sibling, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-03 15:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sebastian A. Siewior, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Steven Rostedt, LKML

On Tue, May 26, 2020 at 10:13:50AM +0200, Peter Zijlstra wrote:
> On Tue, May 26, 2020 at 02:52:31AM +0200, Ahmed S. Darwish wrote:
> > Peter Zijlstra <peterz@infradead.org> wrote:
>
> > > +#define lockdep_assert_irqs_enabled()					\
> > > +do {									\
> > > +	WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirqs_enabled));	\
> > > +} while (0)
> > >
> >
> > Given that lockdep_off() is defined at lockdep.c as:
> >
> >   void lockdep_off(void)
> >   {
> >         current->lockdep_recursion += LOCKDEP_OFF;
> >   }
> >
> > This would imply that all of the macros:
> >
> >   - lockdep_assert_irqs_enabled()
> >   - lockdep_assert_irqs_disabled()
> >   - lockdep_assert_in_irq()
> >   - lockdep_assert_preemption_disabled()
> >   - lockdep_assert_preemption_enabled()
> >
> > will do the lockdep checks *even if* lockdep_off() was called.
> >
> > This doesn't sound right. Even if all of the above macros' call sites
> > didn't care about lockdep_off()/on(), it is semantically incoherent.
>
> lockdep_off() is an abomination and really should not exist.
>
> That dm-cache-target.c thing, for example, is atrocious shite that will
> explode on -rt. Whoever wrote that needs a 'medal'.
>
> People using it deserve all the pain they get.
>
> Also; IRQ state _should_ be tracked irrespective of tracking lock
> dependencies -- I see that that currently isn't entirely the case, lemme
> go fix that.
>

Since the lockdep/x86 series:

  https://lkml.kernel.org/r/20200529212728.795169701@infradead.org
  https://lkml.kernel.org/r/20200529213550.683440625@infradead.org

are pending and quite big, I'll drop patch #7 and patch #8 from this
series, and post a seqlock v2.

This way, this seqlock series can move forward.

Patches #7 and #8 are an "add-on" debugging feature anyway. They're
quite important of course, as evidenced by the number of buggy call sites
they've found, but they don't affect the rest of the seqlock series in
any way.

Once the lockdep/x86 series above get merged, I can easily rebase and
post patches #7 and #8 again.

Thanks a lot,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (24 preceding siblings ...)
  2020-05-19 21:45 ` [PATCH v1 25/25] hrtimer: Use sequence counter with associated raw spinlock Ahmed S. Darwish
@ 2020-06-08  0:57 ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 01/18] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
                     ` (17 more replies)
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (4 subsequent siblings)
  30 siblings, 18 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc, David Airlie, Daniel Vetter, dri-devel,
	David S. Miller, netdev, Jens Axboe, linux-block, Alexander Viro,
	linux-fsdevel

Hi,

This is v2 of the seqlock patch series:

   [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks
   https://lore.kernel.org/lkml/20200519214547.352050-1-a.darwish@linutronix.de

Patches 1=>3 of this v2 series add documentation for the existing
seqlock.h datatypes and APIs. Hopefully they can hit v5.8 -rc2 or -rc3.

Changelog-v2
============

1. Drop, for now, the seqlock v1 patches #7 and #8. These patches added
lockdep non-preemptibility checks to seqcount_t write paths, but they
now depend on on-going work by Peter:

   [PATCH v3 0/5] lockdep: Change IRQ state tracking to use per-cpu variables
   https://lkml.kernel.org/r/20200529213550.683440625@infradead.org

   [PATCH 00/14] x86/entry: disallow #DB more and x86/entry lockdep/nmi
   https://lkml.kernel.org/r/20200529212728.795169701@infradead.org

Once Peter's work gets merged, I'll send the non-preemptibility checks as
a separate series.

2. Drop the v1 seqcount_t call-sites bugfixes. I've already posted them
in an isolated series. They got merged into their respective trees, and
will hit v5.8-rc1 soon:

   [PATCH v2 0/6] seqlock: seqcount_t call sites bugfixes
   https://lore.kernel.org/lkml/20200603144949.1122421-1-a.darwish@linutronix.de

3. Patch #1: Add a small paragraph explaining that seqcount_t/seqlock_t
cannot be used if the protected data contains pointers. A similar
paragraph already existed in seqlock.h, but got mistakenly dropped.

4. Patch #2: Don't add RST directives inside kernel-doc comments. Peter
doesn't like them :) I've kept the indentation though, and found a
minimal way for Sphinx to properly render these code samples without too
much disruption.

5. Patch #3: Brush up the introduced kernel-doc comments. Make them more
consistent overall, and more concise.

Thanks,

8<--------------

Ahmed S. Darwish (18):
  Documentation: locking: Describe seqlock design and usage
  seqlock: Properly format kernel-doc code samples
  seqlock: Add missing kernel-doc annotations
  seqlock: Extend seqcount API with associated locks
  dma-buf: Remove custom seqcount lockdep class key
  dma-buf: Use sequence counter with associated wound/wait mutex
  sched: tasks: Use sequence counter with associated spinlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  xfrm: policy: Use sequence counters with associated lock
  timekeeping: Use sequence counter with associated raw spinlock
  vfs: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  hrtimer: Use sequence counter with associated raw spinlock

 Documentation/locking/index.rst               |   1 +
 Documentation/locking/seqlock.rst             | 242 +++++
 MAINTAINERS                                   |   2 +-
 block/blk-iocost.c                            |   5 +-
 drivers/dma-buf/dma-resv.c                    |  15 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   2 -
 drivers/md/raid5.c                            |   2 +-
 drivers/md/raid5.h                            |   2 +-
 fs/dcache.c                                   |   2 +-
 fs/fs_struct.c                                |   4 +-
 fs/nfs/nfs4_fs.h                              |   2 +-
 fs/nfs/nfs4state.c                            |   2 +-
 fs/userfaultfd.c                              |   4 +-
 include/linux/dcache.h                        |   2 +-
 include/linux/dma-resv.h                      |   4 +-
 include/linux/fs_struct.h                     |   2 +-
 include/linux/hrtimer.h                       |   2 +-
 include/linux/kvm_irqfd.h                     |   2 +-
 include/linux/sched.h                         |   2 +-
 include/linux/seqlock.h                       | 855 ++++++++++++++----
 include/linux/seqlock_types_internal.h        | 187 ++++
 include/net/netfilter/nf_conntrack.h          |   2 +-
 init/init_task.c                              |   3 +-
 kernel/fork.c                                 |   2 +-
 kernel/time/hrtimer.c                         |  13 +-
 kernel/time/timekeeping.c                     |  19 +-
 net/netfilter/nf_conntrack_core.c             |   5 +-
 net/netfilter/nft_set_rbtree.c                |   4 +-
 net/xfrm/xfrm_policy.c                        |  10 +-
 virt/kvm/eventfd.c                            |   2 +-
 30 files changed, 1175 insertions(+), 226 deletions(-)
 create mode 100644 Documentation/locking/seqlock.rst
 create mode 100644 include/linux/seqlock_types_internal.h

base-commit: 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162
--
2.20.1

^ permalink raw reply	[flat|nested] 258+ messages in thread

* [PATCH v2 01/18] Documentation: locking: Describe seqlock design and usage
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 02/18] seqlock: Properly format kernel-doc code samples Ahmed S. Darwish
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc

Proper documentation for the design and usage of sequence counters and
sequential locks does not exist. Complete the seqlock.h documentation as
follows:

  - Divide all documentation on a seqcount_t vs. seqlock_t basis. The
    description for both mechanisms was intermingled, which is incorrect
    since the usage constraints for each type are vastly different.

  - Add an introductory paragraph describing the internal design of, and
    rationale for, sequence counters.

  - Document seqcount_t writer non-preemptibility requirement, which was
    not previously documented anywhere, and provide a clear rationale.

  - Provide template code for seqcount_t and seqlock_t initialization
    and reader/writer critical sections.

  - Recommend using seqlock_t by default. It implicitly handles the
    serialization and non-preemptibility requirements of writers.

At seqlock.h:

  - Remove references to brlocks as they've long been removed from the
    kernel.

  - Remove references to gcc-3.x since the kernel's minimum supported
    gcc version is 4.6.

References: 0f6ed63b1707 ("no need to keep brlock macros anymore...")
References: cafa0010cd51 ("Raise the minimum required gcc version to 4.6")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 Documentation/locking/index.rst   |   1 +
 Documentation/locking/seqlock.rst | 184 ++++++++++++++++++++++++++++++
 include/linux/seqlock.h           |  75 ++++++------
 3 files changed, 219 insertions(+), 41 deletions(-)
 create mode 100644 Documentation/locking/seqlock.rst

diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
index 5d6800a723dc..aad15fc81ccd 100644
--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -14,6 +14,7 @@ locking
     mutex-design
     rt-mutex-design
     rt-mutex
+    seqlock
     spinlocks
     ww-mutex-design
 
diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
new file mode 100644
index 000000000000..c9916efe038e
--- /dev/null
+++ b/Documentation/locking/seqlock.rst
@@ -0,0 +1,184 @@
+======================================
+Sequence counters and sequential locks
+======================================
+
+Introduction
+============
+
+Sequence counters are a reader-writer consistency mechanism with
+lockless readers (read-only retry loops), and no writer starvation. They
+are used for data that's rarely written to (e.g. system time), where the
+reader wants a consistent set of information and is willing to retry if
+that information changes.
+
+A data set is consistent when the sequence count at the beginning of the
+read side critical section is even and the same sequence count value is
+read again at the end of the critical section. The data in the set must
+be copied out inside the read side critical section. If the sequence
+count has changed between the start and the end of the critical section,
+the reader must retry.
+
+Writers increment the sequence count at the start and the end of their
+critical section. After starting the critical section the sequence count
+is odd and indicates to the readers that an update is in progress. At
+the end of the write side critical section the sequence count becomes
+even again which lets readers make progress.
+
+A sequence counter write side critical section must never be preempted
+or interrupted by read side sections. Otherwise the reader will spin for
+the entire scheduler tick due to the odd sequence count value and the
+interrupted writer. If that reader belongs to a real-time scheduling
+class, it can spin forever and the kernel will livelock.
+
+This mechanism cannot be used if the protected data contains pointers,
+as the writer can invalidate a pointer that the reader is following.
+
+.. _seqcount_t:
+
+Sequence counters (:c:type:`seqcount_t`)
+========================================
+
+This is the raw counting mechanism, which does not protect against
+multiple writers.  Write side critical sections must thus be serialized
+by an external lock.
+
+If the write serialization primitive is not implicitly disabling
+preemption, preemption must be explicitly disabled before entering the
+write side section. If the read section can be invoked from hardirq or
+softirq contexts, interrupts or bottom halves must also be respectively
+disabled before entering the write section.
+
+If it's desired to automatically handle the sequence counter
+requirements of writer serialization and non-preemptibility, use a
+:ref:`sequential lock <seqlock_t>` instead.
+
+Initialization:
+
+.. code-block:: c
+
+	/* dynamic */
+	seqcount_t foo_seqcount;
+	seqcount_init(&foo_seqcount);
+
+	/* static */
+	static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
+
+	/* C99 struct init */
+	struct {
+		.seq   = SEQCNT_ZERO(foo.seq),
+	} foo;
+
+Write path:
+
+.. code-block:: c
+
+	/* Serialized context with disabled preemption */
+
+	write_seqcount_begin(&foo_seqcount);
+
+	/* ... [[write-side critical section]] ... */
+
+	write_seqcount_end(&foo_seqcount);
+
+Read path:
+
+.. code-block:: c
+
+	do {
+		seq = read_seqcount_begin(&foo_seqcount);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (read_seqcount_retry(&foo_seqcount, seq));
+
+.. _seqlock_t:
+
+Sequential locks (:c:type:`seqlock_t`)
+======================================
+
+This contains the :ref:`sequence counting mechanism <seqcount_t>`
+earlier discussed, plus an embedded spinlock for writer serialization
+and non-preemptibility.
+
+If the read side section can be invoked from hardirq or softirq context,
+use the write side function variants which disable interrupts or bottom
+halves respectively.
+
+Initialization:
+
+.. code-block:: c
+
+	/* dynamic */
+	seqlock_t foo_seqlock;
+	seqlock_init(&foo_seqlock);
+
+	/* static */
+	static DEFINE_SEQLOCK(foo_seqlock);
+
+	/* C99 struct init */
+	struct {
+		.seql   = __SEQLOCK_UNLOCKED(foo.seql)
+	} foo;
+
+Write path:
+
+.. code-block:: c
+
+	write_seqlock(&foo_seqlock);
+
+	/* ... [[write-side critical section]] ... */
+
+	write_sequnlock(&foo_seqlock);
+
+Read path, three categories:
+
+1. Normal sequence readers which never block a writer but must retry
+   if a writer is in progress, by detecting a change in the sequence
+   number.  Writers do not wait for a sequence reader.
+
+   .. code-block:: c
+
+	do {
+		seq = read_seqbegin(&foo_seqlock);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (read_seqretry(&foo_seqlock, seq));
+
+2. Locking readers which will wait if a writer or another locking reader
+   is in progress. A locking reader in progress will also block a writer
+   from entering its critical section. This read lock is
+   exclusive. Unlike rwlock_t, only one locking reader can acquire it.
+
+   .. code-block:: c
+
+	read_seqlock_excl(&foo_seqlock);
+
+	/* ... [[read-side critical section]] ... */
+
+	read_sequnlock_excl(&foo_seqlock);
+
+3. Conditional lockless reader (as in 1), or locking reader (as in 2),
+   according to a passed marker. This is used to avoid lockless reader
+   starvation (too many retry loops) in case of a sharp spike in write
+   activity. First, a lockless read is tried (even marker passed). If
+   that trial fails (odd sequence counter is returned, which is used as
+   the next iteration marker), the lockless read is transformed to a
+   full locking read and no retry loop is necessary.
+
+   .. code-block:: c
+
+	/* marker; even initialization */
+	int seq = 0;
+	do {
+		read_seqbegin_or_lock(&foo_seqlock, &seq);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (need_seqretry(&foo_seqlock, seq));
+	done_seqretry(&foo_seqlock, seq);
+
+API documentation
+=================
+
+.. kernel-doc:: include/linux/seqlock.h
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 0491d963d47e..aee894dc49aa 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -1,36 +1,15 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef __LINUX_SEQLOCK_H
 #define __LINUX_SEQLOCK_H
+
 /*
- * Reader/writer consistent mechanism without starving writers. This type of
- * lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes. There are two types
- * of readers:
- * 1. Sequence readers which never block a writer but they may have to retry
- *    if a writer is in progress by detecting change in sequence number.
- *    Writers do not wait for a sequence reader.
- * 2. Locking readers which will wait if a writer or another locking reader
- *    is in progress. A locking reader in progress will also block a writer
- *    from going forward. Unlike the regular rwlock, the read lock here is
- *    exclusive so that only one locking reader can get it.
+ * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
+ * lockless readers (read-only retry loops), and no writer starvation.
  *
- * This is not as cache friendly as brlock. Also, this may not work well
- * for data that contains pointers, because any writer could
- * invalidate a pointer that a reader was following.
+ * See Documentation/locking/seqlock.rst for full description.
  *
- * Expected non-blocking reader usage:
- * 	do {
- *	    seq = read_seqbegin(&foo);
- * 	...
- *      } while (read_seqretry(&foo, seq));
- *
- *
- * On non-SMP the spin locks disappear but the writer still needs
- * to increment the sequence variables because an interrupt routine could
- * change the state of the data.
- *
- * Based on x86_64 vsyscall gettimeofday 
- * by Keith Owens and Andrea Arcangeli
+ * Copyrights:
+ * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
  */
 
 #include <linux/spinlock.h>
@@ -40,10 +19,24 @@
 #include <asm/processor.h>
 
 /*
- * Version using sequence counter only.
- * This can be used when code has its own mutex protecting the
- * updating starting before the write_seqcountbeqin() and ending
- * after the write_seqcount_end().
+ * Sequence counters (seqcount_t)
+ *
+ * This is the raw counting mechanism, without any writer protection.
+ *
+ * Write side critical sections must be serialized and non-preemptible.
+ *
+ * If readers can be invoked from hardirq or softirq contexts,
+ * interrupts or bottom halves must also be respectively disabled before
+ * entering the write section.
+ *
+ * This mechanism can't be used if the protected data contains pointers,
+ * as the writer can invalidate a pointer that a reader is following.
+ *
+ * If it's desired to automatically handle the sequence counter writer
+ * serialization and non-preemptibility requirements, use a sequential
+ * lock (seqlock_t) instead.
+ *
+ * See Documentation/locking/seqlock.rst
  */
 typedef struct seqcount {
 	unsigned sequence;
@@ -221,8 +214,6 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 	return __read_seqcount_retry(s, start);
 }
 
-
-
 static inline void raw_write_seqcount_begin(seqcount_t *s)
 {
 	s->sequence++;
@@ -367,10 +358,6 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
        smp_wmb();      /* increment "sequence" before following stores */
 }
 
-/*
- * Sequence counter only version assumes that callers are using their
- * own mutexing.
- */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
 	raw_write_seqcount_begin(s);
@@ -401,15 +388,21 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
 	s->sequence+=2;
 }
 
+/*
+ * Sequential locks (seqlock_t)
+ *
+ * Sequence counters with an embedded spinlock for writer serialization
+ * and non-preemptibility.
+ *
+ * For more info, see:
+ *   - Comments on top of seqcount_t
+ *   - Documentation/locking/seqlock.rst
+ */
 typedef struct {
 	struct seqcount seqcount;
 	spinlock_t lock;
 } seqlock_t;
 
-/*
- * These macros triggered gcc-3.x compile-time problems.  We think these are
- * OK now.  Be cautious.
- */
 #define __SEQLOCK_UNLOCKED(lockname)			\
 	{						\
 		.seqcount = SEQCNT_ZERO(lockname),	\
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 02/18] seqlock: Properly format kernel-doc code samples
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 01/18] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 03/18] seqlock: Add missing kernel-doc annotations Ahmed S. Darwish
                     ` (15 subsequent siblings)
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc

Align the code samples and note sections inside kernel-doc comments with
tabs. This way they can be properly parsed and rendered by Sphinx. It
also makes the code samples easier to read from text editors.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 82 +++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 39 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index aee894dc49aa..7296af778301 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -232,7 +232,7 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  *
  * This can be used to provide an ordering guarantee instead of the
  * usual consistency guarantee. It is one wmb cheaper, because we can
- * collapse the two back-to-back wmb()s.
+ * collapse the two back-to-back wmb()s::
  *
  *      seqcount_t seq;
  *      bool X = true, Y = false;
@@ -292,64 +292,68 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  * Very simply put: we first modify one copy and then the other. This ensures
  * there is always one copy in a stable state, ready to give us an answer.
  *
- * The basic form is a data structure like:
+ * The basic form is a data structure like::
  *
- * struct latch_struct {
- *	seqcount_t		seq;
- *	struct data_struct	data[2];
- * };
+ *	struct latch_struct {
+ *		seqcount_t		seq;
+ *		struct data_struct	data[2];
+ *	};
  *
  * Where a modification, which is assumed to be externally serialized, does the
- * following:
+ * following::
  *
- * void latch_modify(struct latch_struct *latch, ...)
- * {
- *	smp_wmb();	<- Ensure that the last data[1] update is visible
- *	latch->seq++;
- *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *	void latch_modify(struct latch_struct *latch, ...)
+ *	{
+ *		smp_wmb();	// Ensure that the last data[1] update is visible
+ *		latch->seq++;
+ *		smp_wmb();	// Ensure that the seqcount update is visible
  *
- *	modify(latch->data[0], ...);
+ *		modify(latch->data[0], ...);
  *
- *	smp_wmb();	<- Ensure that the data[0] update is visible
- *	latch->seq++;
- *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *		smp_wmb();	// Ensure that the data[0] update is visible
+ *		latch->seq++;
+ *		smp_wmb();	// Ensure that the seqcount update is visible
  *
- *	modify(latch->data[1], ...);
- * }
+ *		modify(latch->data[1], ...);
+ *	}
  *
- * The query will have a form like:
+ * The query will have a form like::
  *
- * struct entry *latch_query(struct latch_struct *latch, ...)
- * {
- *	struct entry *entry;
- *	unsigned seq, idx;
+ *	struct entry *latch_query(struct latch_struct *latch, ...)
+ *	{
+ *		struct entry *entry;
+ *		unsigned seq, idx;
  *
- *	do {
- *		seq = raw_read_seqcount_latch(&latch->seq);
+ *		do {
+ *			seq = raw_read_seqcount_latch(&latch->seq);
  *
- *		idx = seq & 0x01;
- *		entry = data_query(latch->data[idx], ...);
+ *			idx = seq & 0x01;
+ *			entry = data_query(latch->data[idx], ...);
  *
- *		smp_rmb();
- *	} while (seq != latch->seq);
+ *			smp_rmb();
+ *		} while (seq != latch->seq);
  *
- *	return entry;
- * }
+ *		return entry;
+ *	}
  *
  * So during the modification, queries are first redirected to data[1]. Then we
  * modify data[0]. When that is complete, we redirect queries back to data[0]
  * and we can modify data[1].
  *
- * NOTE: The non-requirement for atomic modifications does _NOT_ include
- *       the publishing of new entries in the case where data is a dynamic
- *       data structure.
+ * NOTE:
  *
- *       An iteration might start in data[0] and get suspended long enough
- *       to miss an entire modification sequence, once it resumes it might
- *       observe the new entry.
+ *	The non-requirement for atomic modifications does _NOT_ include
+ *	the publishing of new entries in the case where data is a dynamic
+ *	data structure.
  *
- * NOTE: When data is a dynamic data structure; one should use regular RCU
- *       patterns to manage the lifetimes of the objects within.
+ *	An iteration might start in data[0] and get suspended long enough
+ *	to miss an entire modification sequence, once it resumes it might
+ *	observe the new entry.
+ *
+ * NOTE:
+ *
+ *	When data is a dynamic data structure; one should use regular RCU
+ *	patterns to manage the lifetimes of the objects within.
  */
 static inline void raw_write_seqcount_latch(seqcount_t *s)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 03/18] seqlock: Add missing kernel-doc annotations
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 01/18] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 02/18] seqlock: Properly format kernel-doc code samples Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 04/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (14 subsequent siblings)
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc

A small number of the exported seqlock.h functions are kernel-doc
annotated.

Since seqlock.h is now included by the kernel's RST documentation, add
kernel-doc annotations for all of the remaining functions.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 388 ++++++++++++++++++++++++++++++++++------
 1 file changed, 329 insertions(+), 59 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 7296af778301..a11b113ed396 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -59,6 +59,10 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
 # define SEQCOUNT_DEP_MAP_INIT(lockname) \
 		.dep_map = { .name = #lockname } \
 
+/**
+ * seqcount_init() - runtime initializer for seqcount_t
+ * @s: Pointer to the &typedef seqcount_t instance
+ */
 # define seqcount_init(s)				\
 	do {						\
 		static struct lock_class_key __key;	\
@@ -82,13 +86,17 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
 # define seqcount_lockdep_reader_access(x)
 #endif
 
-#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)}
+/**
+ * SEQCNT_ZERO() - static initializer for seqcount_t
+ * @name: Name of the &typedef seqcount_t instance
+ */
+#define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
 
 
 /**
- * __read_seqcount_begin - begin a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * __read_seqcount_begin() - begin a seq-read critical section (without barrier)
+ * @s: Pointer to &typedef seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
  * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
  * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@@ -112,13 +120,14 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount - Read the raw seqcount
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * raw_read_seqcount() - Read the raw seqcount
+ * @s: Pointer to &typedef seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
  * raw_read_seqcount opens a read critical section of the given
- * seqcount without any lockdep checking and without checking or
- * masking the LSB. Calling code is responsible for handling that.
+ * seqcount_t, without any lockdep checks and without checking or
+ * masking the sequence counter LSB. Calling code is responsible for
+ * handling that.
  */
 static inline unsigned raw_read_seqcount(const seqcount_t *s)
 {
@@ -128,13 +137,13 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount_begin - start seq-read critical section w/o lockdep
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * raw_read_seqcount_begin() - start seq-read critical section w/o lockdep
+ * @s: Pointer to &typedef seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
  * raw_read_seqcount_begin opens a read critical section of the given
- * seqcount, but without any lockdep checking. Validity of the critical
- * section is tested by checking read_seqcount_retry function.
+ * seqcount_t, but without any lockdep checking. Validity of the read
+ * section must be checked with read_seqcount_retry().
  */
 static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 {
@@ -144,13 +153,13 @@ static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * read_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * read_seqcount_begin() - begin a seq-read critical section
+ * @s: Pointer to &typedef seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
- * read_seqcount_begin opens a read critical section of the given seqcount.
- * Validity of the critical section is tested by checking read_seqcount_retry
- * function.
+ * read_seqcount_begin opens a read critical section of the given
+ * seqcount_t. Validity of the read section must be checked with
+ * read_seqcount_retry().
  */
 static inline unsigned read_seqcount_begin(const seqcount_t *s)
 {
@@ -159,11 +168,11 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * raw_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
+ * raw_seqcount_begin() - begin a seq-read critical section
+ * @s: Pointer to &typedef seqcount_t
  * Returns: count to be passed to read_seqcount_retry
  *
- * raw_seqcount_begin opens a read critical section of the given seqcount.
+ * raw_seqcount_begin opens a read critical section of the given seqcount_t.
  * Validity of the critical section is tested by checking read_seqcount_retry
  * function.
  *
@@ -180,8 +189,8 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * __read_seqcount_retry - end a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
+ * __read_seqcount_retry() - end a seq-read critical section (without barrier)
+ * @s: Pointer to &typedef seqcount_t
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
@@ -199,12 +208,12 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
 }
 
 /**
- * read_seqcount_retry - end a seq-read critical section
- * @s: pointer to seqcount_t
+ * read_seqcount_retry() - end a seq-read critical section
+ * @s: Pointer to &typedef seqcount_t
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
- * read_seqcount_retry closes a read critical section of the given seqcount.
+ * read_seqcount_retry closes a read critical section of given seqcount_t.
  * If the critical section was invalid, it must be ignored (and typically
  * retried).
  */
@@ -227,8 +236,8 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 }
 
 /**
- * raw_write_seqcount_barrier - do a seq write barrier
- * @s: pointer to seqcount_t
+ * raw_write_seqcount_barrier() - do a seq write barrier
+ * @s: Pointer to &typedef seqcount_t
  *
  * This can be used to provide an ordering guarantee instead of the
  * usual consistency guarantee. It is one wmb cheaper, because we can
@@ -267,6 +276,21 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
 	s->sequence++;
 }
 
+/**
+ * raw_read_seqcount_latch() - pick even or odd seqcount latch data copy
+ * @s: Pointer to &typedef seqcount_t
+ *
+ * Use seqcount latching to switch between two storage places with
+ * sequence protection to allow interruptible, preemptible, writer
+ * sections.
+ *
+ * Check raw_write_seqcount_latch() for more details and a full reader
+ * and writer usage example.
+ *
+ * Return: sequence counter. Use the lowest bit as index for picking
+ * which data copy to read. Full counter must then be checked with
+ * read_seqcount_retry().
+ */
 static inline int raw_read_seqcount_latch(seqcount_t *s)
 {
 	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
@@ -275,8 +299,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
 }
 
 /**
- * raw_write_seqcount_latch - redirect readers to even/odd copy
- * @s: pointer to seqcount_t
+ * raw_write_seqcount_latch() - redirect readers to even/odd copy
+ * @s: Pointer to &typedef seqcount_t
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -330,8 +354,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  *			idx = seq & 0x01;
  *			entry = data_query(latch->data[idx], ...);
  *
- *			smp_rmb();
- *		} while (seq != latch->seq);
+ *			// read_seqcount_retry() includes necessary smp_rmb()
+ *		} while (read_seqcount_retry(&latch->seq, seq));
  *
  *		return entry;
  *	}
@@ -373,6 +397,12 @@ static inline void write_seqcount_begin(seqcount_t *s)
 	write_seqcount_begin_nested(s, 0);
 }
 
+/**
+ * write_seqcount_end() - end a seqcount write-side critical section
+ * @s: Pointer to &typedef seqcount_t
+ *
+ * The write section must've been opened with write_seqcount_begin().
+ */
 static inline void write_seqcount_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
@@ -380,8 +410,8 @@ static inline void write_seqcount_end(seqcount_t *s)
 }
 
 /**
- * write_seqcount_invalidate - invalidate in-progress read-side seq operations
- * @s: pointer to seqcount_t
+ * write_seqcount_invalidate() - invalidate in-progress read-side seq operations
+ * @s: Pointer to &typedef seqcount_t
  *
  * After write_seqcount_invalidate, no read-side seq operations will complete
  * successfully and see data older than this.
@@ -413,32 +443,67 @@ typedef struct {
 		.lock =	__SPIN_LOCK_UNLOCKED(lockname)	\
 	}
 
-#define seqlock_init(x)					\
+/**
+ * seqlock_init() - dynamic initializer for seqlock_t
+ * @sl: Pointer to the &typedef seqlock_t instance
+ */
+#define seqlock_init(sl)				\
 	do {						\
-		seqcount_init(&(x)->seqcount);		\
-		spin_lock_init(&(x)->lock);		\
+		seqcount_init(&(sl)->seqcount);		\
+		spin_lock_init(&(sl)->lock);		\
 	} while (0)
 
-#define DEFINE_SEQLOCK(x) \
-		seqlock_t x = __SEQLOCK_UNLOCKED(x)
+/**
+ * DEFINE_SEQLOCK() - Define a statically-allocated seqlock_t
+ * @sl: Name of the &typedef seqlock_t instance
+ */
+#define DEFINE_SEQLOCK(sl) \
+		seqlock_t sl = __SEQLOCK_UNLOCKED(sl)
 
-/*
- * Read side functions for starting and finalizing a read side section.
+/**
+ * read_seqbegin() - start a seqlock_t read-side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_seqbegin opens a read side critical section of the given
+ * seqlock_t. Validity of the critical section is tested by checking
+ * read_seqretry().
+ *
+ * Return: count to be passed to read_seqretry()
  */
 static inline unsigned read_seqbegin(const seqlock_t *sl)
 {
 	return read_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * read_seqretry() - end and validate a seqlock_t read side section
+ * @sl: Pointer to &typedef seqlock_t
+ * @start: count, from read_seqbegin()
+ *
+ * read_seqretry closes the given seqlock_t read side critical section,
+ * and checks its validity. If the read section was invalid, it must be
+ * ignored and retried.
+ *
+ * Return: 1 if a retry is required, 0 otherwise
+ */
 static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 {
 	return read_seqcount_retry(&sl->seqcount, start);
 }
 
-/*
- * Lock out other writers and update the count.
- * Acts like a normal spin_lock/unlock.
- * Don't need preempt_disable() because that is in the spin_lock already.
+/**
+ * write_seqlock() - start a seqlock_t write side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_seqlock opens a write side critical section of the given
+ * seqlock_t.  It also acquires the spinlock_t embedded inside the
+ * sequential lock. All the seqlock_t write side critical sections are
+ * thus automatically serialized and non-preemptible.
+ *
+ * Use the ``_irqsave`` and ``_bh`` variants instead if the read side
+ * can be invoked from a hardirq or softirq context.
+ *
+ * The opened write side section must be closed with write_sequnlock().
  */
 static inline void write_seqlock(seqlock_t *sl)
 {
@@ -446,30 +511,68 @@ static inline void write_seqlock(seqlock_t *sl)
 	write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock() - end a seqlock_t write side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_sequnlock closes the (serialized and non-preemptible) write
+ * side critical section of given seqlock_t.
+ */
 static inline void write_sequnlock(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
 	spin_unlock(&sl->lock);
 }
 
+/**
+ * write_seqlock_bh() - start a softirqs-disabled seqlock_t write section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_bh`` variant of write_seqlock(). Use only if the read side section
+ * can be invoked from a softirq context.
+ *
+ * The opened write section must be closed with write_sequnlock_bh().
+ */
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
 	write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock_bh() - end a softirqs-disabled seqlock_t write section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_sequnlock_bh closes the serialized, non-preemptible,
+ * softirqs-disabled, seqlock_t write side critical section opened with
+ * write_seqlock_bh().
+ */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
 	spin_unlock_bh(&sl->lock);
 }
 
+/**
+ * write_seqlock_irq() - start a non-interruptible seqlock_t write side section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * This is the ``_irq`` variant of write_seqlock(). Use only if the read
+ * section of given seqlock_t can be invoked from a hardirq context.
+ */
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
 	write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock_irq() - end a non-interruptible seqlock_t write side section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_irq`` variant of write_sequnlock(). The write side section of
+ * given seqlock_t must've been opened with write_seqlock_irq().
+ */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
@@ -485,9 +588,28 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	return flags;
 }
 
+/**
+ * write_seqlock_irqsave() - start a non-interruptible seqlock_t write section
+ * @lock:  Pointer to &typedef seqlock_t
+ * @flags: Stack-allocated storage for saving caller's local interrupt
+ *         state, to be passed to write_sequnlock_irqrestore().
+ *
+ * ``_irqsave`` variant of write_seqlock(). Use if the read section of
+ * given seqlock_t can be invoked from a hardirq context.
+ *
+ * The opened write section must be closed with write_sequnlock_irqrestore().
+ */
 #define write_seqlock_irqsave(lock, flags)				\
 	do { flags = __write_seqlock_irqsave(lock); } while (0)
 
+/**
+ * write_sequnlock_irqrestore() - end non-interruptible seqlock_t write section
+ * @sl:    Pointer to &typedef seqlock_t
+ * @flags: Caller's saved interrupt state, from write_seqlock_irqsave()
+ *
+ * ``_irqrestore`` variant of write_sequnlock(). The write section of
+ * given seqlock_t must've been opened with write_seqlock_irqsave().
+ */
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
@@ -495,30 +617,64 @@ write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
-/*
- * A locking reader exclusively locks out other writers and locking readers,
- * but doesn't update the sequence number. Acts like a normal spin_lock/unlock.
- * Don't need preempt_disable() because that is in the spin_lock already.
+/**
+ * read_seqlock_excl() - begin a seqlock_t locking reader critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_seqlock_excl opens a locking reader critical section for the
+ * given seqlock_t. A locking reader exclusively locks out other writers
+ * and other *locking* readers, but doesn't update the sequence number.
+ *
+ * Locking readers act like a normal spin_lock()/spin_unlock().
+ *
+ * The opened read side section must be closed with read_sequnlock_excl().
  */
 static inline void read_seqlock_excl(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl() - end a seqlock_t locking reader critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_sequnlock_excl closes the locking reader critical section opened
+ * with read_seqlock_excl().
+ */
 static inline void read_sequnlock_excl(seqlock_t *sl)
 {
 	spin_unlock(&sl->lock);
 }
 
 /**
- * read_seqbegin_or_lock - begin a sequence number check or locking block
- * @lock: sequence lock
- * @seq : sequence number to be checked
- *
- * First try it once optimistically without taking the lock. If that fails,
- * take the lock. The sequence number is also used as a marker for deciding
- * whether to be a reader (even) or writer (odd).
- * N.B. seq must be initialized to an even number to begin with.
+ * read_seqbegin_or_lock() - begin a seqlock_t lockless or locking reader
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq : Marker and return parameter. If the passed value is even, the
+ * reader will become a *lockless* seqlock_t sequence counter reader as
+ * in read_seqbegin(). If the passed value is odd, the reader will
+ * become a fully locking reader, as in read_seqlock_excl().  In the
+ * first call to read_seqbegin_or_lock(), the caller **must** initialize
+ * and pass an even value to @seq so a lockless read is optimistically
+ * tried first.
+ *
+ * read_seqbegin_or_lock is an API designed to optimistically try a
+ * normal lockless seqlock_t read section first, as in read_seqbegin().
+ * If an odd counter is found, the normal lockless read trial has
+ * failed, and the next reader iteration transforms to a full seqlock_t
+ * locking reader as in read_seqlock_excl().
+ *
+ * This is typically used to avoid lockless seqlock_t reader starvation
+ * (too many retry loops) in the case of a sharp spike in write side
+ * activity.
+ *
+ * The opened read section must be closed with done_seqretry().  Check
+ * Documentation/locking/seqlock.rst for template example code.
+ *
+ * Return: The encountered sequence counter value, returned through the
+ * @seq parameter, which is overloaded as a return parameter. The
+ * returned value must be checked with need_seqretry(). If the read
+ * section must be retried, the returned value must also be passed to
+ * the @seq parameter of the next read_seqbegin_or_lock() iteration.
  */
 static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 {
@@ -528,32 +684,90 @@ static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 		read_seqlock_excl(lock);
 }
 
+/**
+ * need_seqretry() - validate seqlock_t "locking or lockless" reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: count, from read_seqbegin_or_lock()
+ *
+ * need_seqretry checks if the seqlock_t read-side critical section
+ * started with read_seqbegin_or_lock() is valid. If it was not, the
+ * caller must retry the read-side section.
+ *
+ * Return: 1 if a retry is required, 0 otherwise
+ */
 static inline int need_seqretry(seqlock_t *lock, int seq)
 {
 	return !(seq & 1) && read_seqretry(lock, seq);
 }
 
+/**
+ * done_seqretry() - end seqlock_t "locking or lockless" reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: count, from read_seqbegin_or_lock()
+ *
+ * done_seqretry finishes the seqlock_t read side critical section
+ * started by read_seqbegin_or_lock(). The read section must've already
+ * been validated with need_seqretry().
+ */
 static inline void done_seqretry(seqlock_t *lock, int seq)
 {
 	if (seq & 1)
 		read_sequnlock_excl(lock);
 }
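
/*
 * A minimal usage sketch of the read_seqbegin_or_lock() /
 * need_seqretry() / done_seqretry() triplet above, assuming a
 * hypothetical seqlock_t instance named foo_seqlock:
 *
 *	int seq = 0;	(even, so the first pass is a lockless reader)
 *
 *	do {
 *		read_seqbegin_or_lock(&foo_seqlock, &seq);
 *
 *		... (read side critical section) ...
 *
 *	} while (need_seqretry(&foo_seqlock, seq));
 *
 *	done_seqretry(&foo_seqlock, seq);
 */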
 
+/**
+ * read_seqlock_excl_bh() - start a locking reader seqlock_t section
+ *			    with softirqs disabled
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_bh`` variant of read_seqlock_excl(). Use this variant if the
+ * seqlock_t write side section, *or other read sections*, can be
+ * invoked from a softirq context.
+ *
+ * The opened section must be closed with read_sequnlock_excl_bh().
+ */
 static inline void read_seqlock_excl_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl_bh() - stop a seqlock_t softirq-disabled locking
+ *			      reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_bh`` variant of read_sequnlock_excl(). The closed section must've
+ * been opened with read_seqlock_excl_bh().
+ */
 static inline void read_sequnlock_excl_bh(seqlock_t *sl)
 {
 	spin_unlock_bh(&sl->lock);
 }
 
+/**
+ * read_seqlock_excl_irq() - start a non-interruptible seqlock_t locking
+ *			     reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_irq`` variant of read_seqlock_excl(). Use this only if the
+ * seqlock_t write side critical section, *or other read side sections*,
+ * can be invoked from a hardirq context.
+ *
+ * The opened read section must be closed with read_sequnlock_excl_irq().
+ */
 static inline void read_seqlock_excl_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl_irq() - end an interrupts-disabled seqlock_t
+ *                             locking reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_irq`` variant of read_sequnlock_excl(). The closed section must've
+ * been opened with read_seqlock_excl_irq().
+ */
 static inline void read_sequnlock_excl_irq(seqlock_t *sl)
 {
 	spin_unlock_irq(&sl->lock);
@@ -567,15 +781,59 @@ static inline unsigned long __read_seqlock_excl_irqsave(seqlock_t *sl)
 	return flags;
 }
 
+/**
+ * read_seqlock_excl_irqsave() - start a non-interruptible seqlock_t
+ *				 locking reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @flags: Stack-allocated storage for saving caller's local interrupt
+ *         state, to be passed to read_sequnlock_excl_irqrestore().
+ *
+ * ``_irqsave`` variant of read_seqlock_excl(). Use this only if the
+ * seqlock_t write side critical section, *or other read side sections*,
+ * can be invoked from a hardirq context.
+ *
+ * Opened section must be closed with read_sequnlock_excl_irqrestore().
+ */
 #define read_seqlock_excl_irqsave(lock, flags)				\
 	do { flags = __read_seqlock_excl_irqsave(lock); } while (0)
 
+/**
+ * read_sequnlock_excl_irqrestore() - end non-interruptible seqlock_t
+ *				      locking reader section
+ * @sl: Pointer to &typedef seqlock_t
+ * @flags: Caller's saved interrupt state, from
+ *	   read_seqlock_excl_irqsave()
+ *
+ * ``_irqrestore`` variant of read_sequnlock_excl(). The closed section
+ * must've been opened with read_seqlock_excl_irqsave().
+ */
 static inline void
 read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
 {
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
+/**
+ * read_seqbegin_or_lock_irqsave() - begin a seqlock_t lockless reader, or
+ *                                   a non-interruptible locking reader
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: Marker and return parameter. Check read_seqbegin_or_lock().
+ *
+ * This is the ``_irqsave`` variant of read_seqbegin_or_lock(). Use if
+ * the seqlock_t write side critical section, *or other read side sections*,
+ * can be invoked from hardirq context.
+ *
+ * The validity of the read section must be checked with need_seqretry().
+ * The opened section must be closed with done_seqretry_irqrestore().
+ *
+ * Return:
+ *
+ *   1. The saved local interrupts state in case of a locking reader, to be
+ *      passed to done_seqretry_irqrestore().
+ *
+ *   2. The encountered sequence counter value, returned through @seq which
+ *      is overloaded as a return parameter. Check read_seqbegin_or_lock().
+ */
 static inline unsigned long
 read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 {
@@ -589,6 +847,18 @@ read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 	return flags;
 }
 
+/**
+ * done_seqretry_irqrestore() - end a seqlock_t lockless reader, or a
+ *				non-interruptible locking reader section
+ * @lock:  Pointer to &typedef seqlock_t
+ * @seq:   Count, from read_seqbegin_or_lock_irqsave()
+ * @flags: Caller's saved local interrupt state in case of a locking
+ *	   reader, also from read_seqbegin_or_lock_irqsave()
+ *
+ * This is the ``_irqrestore`` variant of done_seqretry(). The read
+ * section must've been opened with read_seqbegin_or_lock_irqsave(), and
+ * validated with need_seqretry().
+ */
 static inline void
 done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 04/18] seqlock: Extend seqcount API with associated locks
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (2 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 03/18] seqlock: Add missing kernel-doc annotations Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 05/18] dma-buf: Remove custom seqcount lockdep class key Ahmed S. Darwish
                     ` (13 subsequent siblings)
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive
does not disable preemption implicitly, preemption has to be explicitly
disabled before entering the write side critical section.

There is no built-in debugging mechanism to verify that the lock used
for writer serialization is held and preemption is disabled. Some usage
sites like dma-buf have explicit lockdep checks for the writer-side
lock, but this covers only a small portion of the sequence counter usage
in the kernel.

Add new sequence counter types which allow associating a lock with the
sequence counter at initialization time. The seqcount API functions are
extended to provide appropriate lockdep assertions depending on the
seqcount/lock type.

For sequence counters with associated locks that do not implicitly
disable preemption, preemption protection is enforced in the sequence
counter write side functions. This removes the need to explicitly add
preempt_disable/enable() around the write side critical sections: the
write_begin/end() functions for these new sequence counter types
automatically do this.

Introduce the following seqcount types with associated locks:

     seqcount_spinlock_t
     seqcount_raw_spinlock_t
     seqcount_rwlock_t
     seqcount_mutex_t
     seqcount_ww_mutex_t

Extend the seqcount read and write functions to branch out to the
specific seqcount_LOCKTYPE_t implementation at compile-time. This avoids
kernel API explosion for each new seqcount_LOCKTYPE_t added. Add such
compile-time type detection logic into a new, internal, seqlock header.
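
The type detection boils down to a chain of __same_type() checks which
the compiler resolves and folds away at compile time. A condensed sketch
of the dispatch macro, shown in full in the new internal header below:

	#define __to_seqcount_t(s)					\
	({								\
		seqcount_t *seq;					\
									\
		if (__same_type(*(s), seqcount_t))			\
			seq = (seqcount_t *)(s);			\
		else if (__same_type(*(s), seqcount_spinlock_t))	\
			seq = &((seqcount_spinlock_t *)(s))->seqcount;	\
		/* ... one branch per seqcount_LOCKTYPE_t ... */	\
									\
		seq;							\
	})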

Document the proper seqcount_LOCKTYPE_t usage, and rationale, at
Documentation/locking/seqlock.rst.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
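
For illustration, a writer side sketch using the new seqcount_spinlock_t
type; the "foo" names are hypothetical placeholders. With lockdep
enabled, write_seqcount_begin() asserts that foo_lock is held:

	static DEFINE_SPINLOCK(foo_lock);
	static seqcount_spinlock_t foo_seqcount =
		SEQCNT_SPINLOCK_ZERO(foo_seqcount, &foo_lock);

	spin_lock(&foo_lock);
	write_seqcount_begin(&foo_seqcount);

	/* ... write side critical section ... */

	write_seqcount_end(&foo_seqcount);
	spin_unlock(&foo_lock);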

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 Documentation/locking/seqlock.rst      |  64 ++++-
 MAINTAINERS                            |   2 +-
 include/linux/seqlock.h                | 354 +++++++++++++++++++++----
 include/linux/seqlock_types_internal.h | 187 +++++++++++++
 4 files changed, 555 insertions(+), 52 deletions(-)
 create mode 100644 include/linux/seqlock_types_internal.h

diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
index c9916efe038e..2d526dc95408 100644
--- a/Documentation/locking/seqlock.rst
+++ b/Documentation/locking/seqlock.rst
@@ -48,9 +48,11 @@ write side section. If the read section can be invoked from hardirq or
 softirq contexts, interrupts or bottom halves must also be respectively
 disabled before entering the write section.
 
-If it's desired to automatically handle the sequence counter
-requirements of writer serialization and non-preemptibility, use a
-:ref:`sequential lock <seqlock_t>` instead.
+If the write serialization mechanism is one of the common kernel locking
+primitives, use :ref:`sequence counters with associated locks
+<seqcount_locktype_t>` instead. If it's desired to automatically handle
+the sequence counter writer serialization and non-preemptibility
+requirements, use a :ref:`sequential lock <seqlock_t>`.
 
 Initialization:
 
@@ -70,6 +72,7 @@ Initialization:
 
 Write path:
 
+.. _seqcount_write_ops:
 .. code-block:: c
 
 	/* Serialized context with disabled preemption */
@@ -82,6 +85,7 @@ Write path:
 
 Read path:
 
+.. _seqcount_read_ops:
 .. code-block:: c
 
 	do {
@@ -91,6 +95,60 @@ Read path:
 
 	} while (read_seqcount_retry(&foo_seqcount, seq));
 
+.. _seqcount_locktype_t:
+
+Sequence counters with associated locks (:c:type:`seqcount_LOCKTYPE_t`)
+-----------------------------------------------------------------------
+
+As :ref:`discussed earlier <seqcount_t>`, seqcount write side critical
+sections must be serialized and non-preemptible. This variant of
+sequence counters associates the lock used for writer serialization at
+seqcount initialization time. This enables lockdep to validate that
+the write side critical section is properly serialized.
+
+This lock association is a NOOP if lockdep is disabled and has neither
+storage nor runtime overhead. If lockdep is enabled, the lock pointer is
+stored in struct seqcount and lockdep's "lock is held" assertions are
+injected at the beginning of the write side critical section to validate
+that it is properly protected.
+
+For lock types which do not implicitly disable preemption, preemption
+protection is enforced in the write side function.
+
+The following seqcounts with associated locks are defined:
+
+  - :c:type:`seqcount_spinlock_t`
+  - :c:type:`seqcount_raw_spinlock_t`
+  - :c:type:`seqcount_rwlock_t`
+  - :c:type:`seqcount_mutex_t`
+  - :c:type:`seqcount_ww_mutex_t`
+
+The plain seqcount read and write APIs branch out to the specific
+seqcount_LOCKTYPE_t implementation at compile-time. This avoids kernel
+API explosion for each new seqcount LOCKTYPE.
+
+Initialization (replace "LOCKTYPE" with one of the supported locks):
+
+.. code-block:: c
+
+	/* dynamic */
+	seqcount_LOCKTYPE_t foo_seqcount;
+	seqcount_LOCKTYPE_init(&foo_seqcount, &lock);
+
+	/* static */
+	static seqcount_LOCKTYPE_t foo_seqcount =
+		SEQCNT_LOCKTYPE_ZERO(foo_seqcount, &lock);
+
+	/* C99 struct member init */
+	struct {
+		seqcount_LOCKTYPE_t seq;
+	} foo = {
+		.seq = SEQCNT_LOCKTYPE_ZERO(foo.seq, &lock),
+	};
+
+Write path: same as in :ref:`plain seqcount_t <seqcount_write_ops>`,
+while running from a context with the associated LOCKTYPE lock acquired.
+
+Read path: same as in :ref:`plain seqcount_t <seqcount_read_ops>`.
+
 .. _seqlock_t:
 
 Sequential locks (:c:type:`seqlock_t`)
diff --git a/MAINTAINERS b/MAINTAINERS
index 50659d76976b..d20909ab7d03 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9930,7 +9930,7 @@ F:	include/linux/lockdep.h
 F:	include/linux/mutex*.h
 F:	include/linux/rwlock*.h
 F:	include/linux/rwsem*.h
-F:	include/linux/seqlock.h
+F:	include/linux/seqlock*.h
 F:	include/linux/spinlock*.h
 F:	kernel/locking/
 F:	lib/locking*.[ch]
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index a11b113ed396..ce922bb81642 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -32,6 +32,10 @@
  * This mechanism can't be used if the protected data contains pointers,
  * as the writer can invalidate a pointer that a reader is following.
  *
+ * If the write serialization mechanism is one of the common kernel
+ * locking primitives, use a sequence counter with associated lock
+ * (seqcount_LOCKTYPE_t) instead.
+ *
  * If it's desired to automatically handle the sequence counter writer
  * serialization and non-preemptibility requirements, use a sequential
  * lock (seqlock_t) instead.
@@ -92,11 +96,10 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  */
 #define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
 
-
 /**
  * __read_seqcount_begin() - begin a seq-read critical section (without barrier)
- * @s: Pointer to &typedef seqcount_t
- * Returns: count to be passed to read_seqcount_retry()
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ * Returns: count to be passed to read_seqcount_retry
  *
  * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
  * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@@ -106,7 +109,9 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  * Use carefully, only in critical code, and comment how the barrier is
  * provided.
  */
-static inline unsigned __read_seqcount_begin(const seqcount_t *s)
+#define __read_seqcount_begin(s)	do___read_seqcount_begin(s)
+
+static inline unsigned __read_seqcount_t_begin(const seqcount_t *s)
 {
 	unsigned ret;
 
@@ -121,15 +126,17 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 
 /**
  * raw_read_seqcount() - Read the raw seqcount
- * @s: Pointer to &typedef seqcount_t
- * Returns: count to be passed to read_seqcount_retry()
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ * Returns: count to be passed to read_seqcount_retry
  *
  * raw_read_seqcount opens a read critical section of the given
  * seqcount_t, without any lockdep checks and without checking or
  * masking the sequence counter LSB. Calling code is responsible for
  * handling that.
  */
-static inline unsigned raw_read_seqcount(const seqcount_t *s)
+#define raw_read_seqcount(s)	do_raw_read_seqcount(s)
+
+static inline unsigned raw_read_seqcount_t(const seqcount_t *s)
 {
 	unsigned ret = READ_ONCE(s->sequence);
 	smp_rmb();
@@ -138,38 +145,42 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
 
 /**
  * raw_read_seqcount_begin() - start seq-read critical section w/o lockdep
- * @s: Pointer to &typedef seqcount_t
- * Returns: count to be passed to read_seqcount_retry()
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ * Returns: count to be passed to read_seqcount_retry
  *
  * raw_read_seqcount_begin opens a read critical section of the given
  * seqcount_t, but without any lockdep checking. Validity of the read
  * section must be checked with read_seqcount_retry().
  */
-static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
+#define raw_read_seqcount_begin(s)	do_raw_read_seqcount_begin(s)
+
+static inline unsigned raw_read_seqcount_t_begin(const seqcount_t *s)
 {
-	unsigned ret = __read_seqcount_begin(s);
+	unsigned ret = __read_seqcount_t_begin(s);
 	smp_rmb();
 	return ret;
 }
 
 /**
  * read_seqcount_begin() - begin a seq-read critical section
- * @s: Pointer to &typedef seqcount_t
- * Returns: count to be passed to read_seqcount_retry()
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ * Returns: count to be passed to read_seqcount_retry
  *
  * read_seqcount_begin opens a read critical section of the given
  * seqcount_t. Validity of the read section must be checked with
  * read_seqcount_retry().
  */
-static inline unsigned read_seqcount_begin(const seqcount_t *s)
+#define read_seqcount_begin(s)	do_read_seqcount_begin(s)
+
+static inline unsigned read_seqcount_t_begin(const seqcount_t *s)
 {
 	seqcount_lockdep_reader_access(s);
-	return raw_read_seqcount_begin(s);
+	return raw_read_seqcount_t_begin(s);
 }
 
 /**
  * raw_seqcount_begin() - begin a seq-read critical section
- * @s: Pointer to &typedef seqcount_t
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * Returns: count to be passed to read_seqcount_retry
  *
  * raw_seqcount_begin opens a read critical section of the given seqcount_t.
@@ -181,7 +192,9 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
  * read_seqcount_retry() instead of stabilizing at the beginning of the
  * critical section.
  */
-static inline unsigned raw_seqcount_begin(const seqcount_t *s)
+#define raw_seqcount_begin(s)	do_raw_seqcount_begin(s)
+
+static inline unsigned raw_seqcount_t_begin(const seqcount_t *s)
 {
 	unsigned ret = READ_ONCE(s->sequence);
 	smp_rmb();
@@ -190,7 +203,7 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 
 /**
  * __read_seqcount_retry() - end a seq-read critical section (without barrier)
- * @s: Pointer to &typedef seqcount_t
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
@@ -202,14 +215,16 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
  * Use carefully, only in critical code, and comment how the barrier is
  * provided.
  */
-static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
+#define __read_seqcount_retry(s, start)	do___read_seqcount_retry(s, start)
+
+static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 {
 	return unlikely(s->sequence != start);
 }
 
 /**
  * read_seqcount_retry() - end a seq-read critical section
- * @s: Pointer to &typedef seqcount_t
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
@@ -217,19 +232,25 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
  * If the critical section was invalid, it must be ignored (and typically
  * retried).
  */
-static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
+#define read_seqcount_retry(s, start)	do_read_seqcount_retry(s, start)
+
+static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 {
 	smp_rmb();
-	return __read_seqcount_retry(s, start);
+	return __read_seqcount_t_retry(s, start);
 }
 
-static inline void raw_write_seqcount_begin(seqcount_t *s)
+#define raw_write_seqcount_begin(s)	do_raw_write_seqcount_begin(s)
+
+static inline void raw_write_seqcount_t_begin(seqcount_t *s)
 {
 	s->sequence++;
 	smp_wmb();
 }
 
-static inline void raw_write_seqcount_end(seqcount_t *s)
+#define raw_write_seqcount_end(s)	do_raw_write_seqcount_end(s)
+
+static inline void raw_write_seqcount_t_end(seqcount_t *s)
 {
 	smp_wmb();
 	s->sequence++;
@@ -237,7 +258,7 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 
 /**
  * raw_write_seqcount_barrier() - do a seq write barrier
- * @s: Pointer to &typedef seqcount_t
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * This can be used to provide an ordering guarantee instead of the
  * usual consistency guarantee. It is one wmb cheaper, because we can
@@ -269,7 +290,9 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  *              X = false;
  *      }
  */
-static inline void raw_write_seqcount_barrier(seqcount_t *s)
+#define raw_write_seqcount_barrier(s)	do_raw_write_seqcount_barrier(s)
+
+static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
 {
 	s->sequence++;
 	smp_wmb();
@@ -278,7 +301,7 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
 
 /**
  * raw_read_seqcount_latch() - pick even or odd seqcount latch data copy
- * @s: Pointer to &typedef seqcount_t
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * Use seqcount latching to switch between two storage places with
  * sequence protection to allow interruptible, preemptible, writer
@@ -291,7 +314,9 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
  * which data copy to read. Full counter must then be checked with
  * read_seqcount_retry().
  */
-static inline int raw_read_seqcount_latch(seqcount_t *s)
+#define raw_read_seqcount_latch(s)	do_raw_read_seqcount_latch(s)
+
+static inline int raw_read_seqcount_t_latch(seqcount_t *s)
 {
 	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
 	int seq = READ_ONCE(s->sequence); /* ^^^ */
@@ -300,7 +325,7 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
 
 /**
  * raw_write_seqcount_latch() - redirect readers to even/odd copy
- * @s: Pointer to &typedef seqcount_t
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -379,22 +404,37 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  *	When data is a dynamic data structure; one should use regular RCU
  *	patterns to manage the lifetimes of the objects within.
  */
-static inline void raw_write_seqcount_latch(seqcount_t *s)
+#define raw_write_seqcount_latch(s)	do_raw_write_seqcount_latch(s)
+
+static inline void raw_write_seqcount_t_latch(seqcount_t *s)
 {
        smp_wmb();      /* prior stores before incrementing "sequence" */
        s->sequence++;
        smp_wmb();      /* increment "sequence" before following stores */
 }
 
-static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
+#define write_seqcount_begin_nested(s, subclass)		\
+	do_write_seqcount_begin_nested(s, subclass)
+
+static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
 {
-	raw_write_seqcount_begin(s);
+	raw_write_seqcount_t_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
-static inline void write_seqcount_begin(seqcount_t *s)
+/**
+ * write_seqcount_begin() - start a seqcount write-side critical section
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ *
+ * write_seqcount_begin opens a write-side critical section of the given
+ * seqcount. Seqcount write-side critical sections must be externally
+ * serialized and non-preemptible.
+ */
+#define write_seqcount_begin(s)		do_write_seqcount_begin(s)
+
+static inline void write_seqcount_t_begin(seqcount_t *s)
 {
-	write_seqcount_begin_nested(s, 0);
+	write_seqcount_t_begin_nested(s, 0);
 }
 
 /**
@@ -403,25 +443,242 @@ static inline void write_seqcount_begin(seqcount_t *s)
  *
  * The write section must've been opened with write_seqcount_begin().
  */
-static inline void write_seqcount_end(seqcount_t *s)
+#define write_seqcount_end(s)		do_write_seqcount_end(s)
+
+static inline void write_seqcount_t_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
-	raw_write_seqcount_end(s);
+	raw_write_seqcount_t_end(s);
 }
 
 /**
  * write_seqcount_invalidate() - invalidate in-progress read-side seq operations
- * @s: Pointer to &typedef seqcount_t
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * After write_seqcount_invalidate, no read-side seq operations will complete
  * successfully and see data older than this.
  */
-static inline void write_seqcount_invalidate(seqcount_t *s)
+#define write_seqcount_invalidate(s)	do_write_seqcount_invalidate(s)
+
+static inline void write_seqcount_t_invalidate(seqcount_t *s)
 {
 	smp_wmb();
 	s->sequence+=2;
 }
 
+/*
+ * Sequence counters with associated locks (seqcount_LOCKTYPE_t)
+ *
+ * A sequence counter which associates the lock used for writer
+ * serialization at initialization time. This enables lockdep to validate
+ * that the write side critical section is properly serialized.
+ *
+ * For associated locks which do not implicitly disable preemption,
+ * preemption protection is enforced in the write side function.
+ *
+ * See Documentation/locking/seqlock.rst
+ */
+
+/**
+ * typedef seqcount_spinlock_t - sequence count with spinlock associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated spinlock
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * spinlock. The spinlock is associated with the sequence count in the
+ * static initializer or init function. This enables lockdep to validate
+ * that the write side critical section is properly serialized.
+ */
+typedef struct seqcount_spinlock {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	spinlock_t	*lock;
+#endif
+} seqcount_spinlock_t;
+
+#ifdef CONFIG_LOCKDEP
+
+#define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) {		\
+	.seqcount	= SEQCNT_ZERO(seq_name.seqcount),	\
+	.lock		= (assoc_lock),				\
+}
+
+/* Define as macro due to static lockdep key @ seqcount_init() */
+#define seqcount_locktype_init(s, assoc_lock)			\
+do {								\
+	seqcount_init(&(s)->seqcount);				\
+	(s)->lock = (assoc_lock);				\
+} while (0)
+
+#else /* !CONFIG_LOCKDEP */
+
+#define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) {		\
+	.seqcount	= SEQCNT_ZERO(seq_name.seqcount),	\
+}
+
+#define seqcount_locktype_init(s, assoc_lock)			\
+do {								\
+	seqcount_init(&(s)->seqcount);				\
+} while (0)
+
+#endif
+
+/**
+ * SEQCNT_SPINLOCK_ZERO - static initializer for seqcount_spinlock_t
+ * @name:	Name of the &typedef seqcount_spinlock_t instance
+ * @lock:	Pointer to the associated spinlock
+ */
+#define SEQCNT_SPINLOCK_ZERO(name, lock)	\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_spinlock_init - runtime initializer for seqcount_spinlock_t
+ * @s:		Pointer to the &typedef seqcount_spinlock_t instance
+ * @lock:	Pointer to the associated spinlock
+ */
+#define seqcount_spinlock_init(s, lock)		\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_raw_spinlock_t - sequence count with raw spinlock associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated raw spinlock
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * raw spinlock. The raw spinlock is associated with the sequence count in
+ * the static initializer or init function. This enables lockdep to
+ * validate that the write side critical section is properly serialized.
+ */
+typedef struct seqcount_raw_spinlock {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	raw_spinlock_t	*lock;
+#endif
+} seqcount_raw_spinlock_t;
+
+/**
+ * SEQCNT_RAW_SPINLOCK_ZERO - static initializer for seqcount_raw_spinlock_t
+ * @name:	Name of the &typedef seqcount_raw_spinlock_t instance
+ * @lock:	Pointer to the associated raw_spinlock
+ */
+#define SEQCNT_RAW_SPINLOCK_ZERO(name, lock)	\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_raw_spinlock_init - runtime initializer for seqcount_raw_spinlock_t
+ * @s:		Pointer to the &typedef seqcount_raw_spinlock_t instance
+ * @lock:	Pointer to the associated raw_spinlock
+ */
+#define seqcount_raw_spinlock_init(s, lock)	\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_rwlock_t - sequence count with rwlock associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated rwlock
+ *
+ * A plain sequence counter with external writer synchronization by an
+ * rwlock. The rwlock is associated with the sequence count in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ */
+typedef struct seqcount_rwlock {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	rwlock_t	*lock;
+#endif
+} seqcount_rwlock_t;
+
+/**
+ * SEQCNT_RWLOCK_ZERO - static initializer for seqcount_rwlock_t
+ * @name:	Name of the &typedef seqcount_rwlock_t instance
+ * @lock:	Pointer to the associated rwlock
+ */
+#define SEQCNT_RWLOCK_ZERO(name, lock)		\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_rwlock_init - runtime initializer for seqcount_rwlock_t
+ * @s:		Pointer to the &typedef seqcount_rwlock_t instance
+ * @lock:	Pointer to the associated rwlock
+ */
+#define seqcount_rwlock_init(s, lock)		\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_mutex_t - sequence count with mutex associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated mutex
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * mutex. The mutex is associated with the sequence counter in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ *
+ * The write side API functions write_seqcount_begin()/end() automatically
+ * disable and enable preemption when used with seqcount_mutex_t.
+ */
+typedef struct seqcount_mutex {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	struct mutex	*lock;
+#endif
+} seqcount_mutex_t;
+
+/**
+ * SEQCNT_MUTEX_ZERO - static initializer for seqcount_mutex_t
+ * @name:	Name of the &typedef seqcount_mutex_t instance
+ * @lock:	Pointer to the associated mutex
+ */
+#define SEQCNT_MUTEX_ZERO(name, lock)		\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_mutex_init - runtime initializer for seqcount_mutex_t
+ * @s:		Pointer to the &typedef seqcount_mutex_t instance
+ * @lock:	Pointer to the associated mutex
+ */
+#define seqcount_mutex_init(s, lock)		\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_ww_mutex_t - sequence count with ww_mutex associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated ww_mutex
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * ww_mutex. The ww_mutex is associated with the sequence counter in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ *
+ * The write side API functions write_seqcount_begin()/end() automatically
+ * disable and enable preemption when used with seqcount_ww_mutex_t.
+ */
+typedef struct seqcount_ww_mutex {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	struct ww_mutex	*lock;
+#endif
+} seqcount_ww_mutex_t;
+
+/**
+ * SEQCNT_WW_MUTEX_ZERO - static initializer for seqcount_ww_mutex_t
+ * @name:	Name of the &typedef seqcount_ww_mutex_t instance
+ * @lock:	Pointer to the associated ww_mutex
+ */
+#define SEQCNT_WW_MUTEX_ZERO(name, lock)	\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_ww_mutex_init - runtime initializer for seqcount_ww_mutex_t
+ * @s:		Pointer to the &typedef seqcount_ww_mutex_t instance
+ * @lock:	Pointer to the associated ww_mutex
+ */
+#define seqcount_ww_mutex_init(s, lock)		\
+	seqcount_locktype_init(s, lock)
+
+#include <linux/seqlock_types_internal.h>
+
 /*
  * Sequential locks (seqlock_t)
  *
@@ -472,7 +729,7 @@ typedef struct {
  */
 static inline unsigned read_seqbegin(const seqlock_t *sl)
 {
-	return read_seqcount_begin(&sl->seqcount);
+	return read_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -488,7 +745,7 @@ static inline unsigned read_seqbegin(const seqlock_t *sl)
  */
 static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 {
-	return read_seqcount_retry(&sl->seqcount, start);
+	return read_seqcount_t_retry(&sl->seqcount, start);
 }
 
 /**
@@ -508,7 +765,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -520,7 +777,7 @@ static inline void write_seqlock(seqlock_t *sl)
  */
 static inline void write_sequnlock(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock(&sl->lock);
 }
 
@@ -536,7 +793,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -549,7 +806,7 @@ static inline void write_seqlock_bh(seqlock_t *sl)
  */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_bh(&sl->lock);
 }
 
@@ -563,7 +820,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -575,7 +832,7 @@ static inline void write_seqlock_irq(seqlock_t *sl)
  */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_irq(&sl->lock);
 }
 
@@ -584,7 +841,8 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	write_seqcount_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount);
+
 	return flags;
 }
 
@@ -613,7 +871,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
diff --git a/include/linux/seqlock_types_internal.h b/include/linux/seqlock_types_internal.h
new file mode 100644
index 000000000000..de635f4c7297
--- /dev/null
+++ b/include/linux/seqlock_types_internal.h
@@ -0,0 +1,187 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_SEQLOCK_TYPES_INTERNAL_H
+#define __LINUX_SEQLOCK_TYPES_INTERNAL_H
+
+/*
+ * Sequence counters with associated locks
+ *
+ * Copyright (C) 2020 Linutronix GmbH
+ */
+
+#ifndef __LINUX_SEQLOCK_H
+#error This is an INTERNAL header; it must only be included by seqlock.h
+#endif
+
+#include <linux/mutex.h>
+#include <linux/rwlock.h>
+#include <linux/spinlock.h>
+#include <linux/ww_mutex.h>
+
+/*
+ * @s: pointer to seqcount_t or any of the seqcount_locktype_t variants
+ */
+#define __to_seqcount_t(s)						\
+({									\
+	seqcount_t *seq;						\
+									\
+	if (__same_type(*(s), seqcount_t))				\
+		seq = (seqcount_t *)(s);				\
+	else if (__same_type(*(s), seqcount_spinlock_t))		\
+		seq = &((seqcount_spinlock_t *)(s))->seqcount;		\
+	else if (__same_type(*(s), seqcount_raw_spinlock_t))		\
+		seq = &((seqcount_raw_spinlock_t *)(s))->seqcount;	\
+	else if (__same_type(*(s), seqcount_rwlock_t))			\
+		seq = &((seqcount_rwlock_t *)(s))->seqcount;		\
+	else if (__same_type(*(s), seqcount_mutex_t))			\
+		seq = &((seqcount_mutex_t *)(s))->seqcount;		\
+	else if (__same_type(*(s), seqcount_ww_mutex_t))		\
+		seq = &((seqcount_ww_mutex_t *)(s))->seqcount;		\
+	else								\
+		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");		\
+									\
+	seq;								\
+})
+
+/*
+ *	seqcount_LOCKTYPE_t -- write APIs
+ *
+ * For associated lock types which do not implicitly disable preemption,
+ * enforce preemption protection in the write side functions.
+ *
+ * Never use lockdep for the raw write variants.
+ */
+
+#define __associated_lock_is_preemptible(s)				\
+({									\
+	bool ret;							\
+									\
+	if (__same_type(*(s), seqcount_t) ||				\
+	    __same_type(*(s), seqcount_spinlock_t) ||			\
+	    __same_type(*(s), seqcount_raw_spinlock_t) ||		\
+	    __same_type(*(s), seqcount_rwlock_t)) {			\
+		ret = false;						\
+	} else if (__same_type(*(s), seqcount_mutex_t) ||		\
+		   __same_type(*(s), seqcount_ww_mutex_t)) {		\
+		ret = true;						\
+	} else								\
+		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");		\
+									\
+	ret;								\
+})
+
+#ifdef CONFIG_LOCKDEP
+
+#define __assert_associated_lock_held(s)				\
+do {									\
+	if (__same_type(*(s), seqcount_t))				\
+		break;							\
+									\
+	if (__same_type(*(s), seqcount_spinlock_t))			\
+		lockdep_assert_held(((seqcount_spinlock_t *)(s))->lock);\
+	else if (__same_type(*(s), seqcount_raw_spinlock_t))		\
+		lockdep_assert_held(((seqcount_raw_spinlock_t *)(s))->lock);	\
+	else if (__same_type(*(s), seqcount_rwlock_t))			\
+		lockdep_assert_held_write(((seqcount_rwlock_t *)(s))->lock);	\
+	else if (__same_type(*(s), seqcount_mutex_t))			\
+		lockdep_assert_held(((seqcount_mutex_t *)(s))->lock);	\
+	else if (__same_type(*(s), seqcount_ww_mutex_t))		\
+		lockdep_assert_held(&((seqcount_ww_mutex_t *)(s))->lock->base);	\
+	else								\
+		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");		\
+} while (0)
+
+#else
+
+#define __assert_associated_lock_held(s)				\
+do {									\
+	(void) __to_seqcount_t(s);					\
+} while (0)
+
+#endif /* CONFIG_LOCKDEP */
+
+#define do_raw_write_seqcount_begin(s)					\
+do {									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	raw_write_seqcount_t_begin(__to_seqcount_t(s));			\
+} while (0)
+
+#define do_raw_write_seqcount_end(s)					\
+do {									\
+	raw_write_seqcount_t_end(__to_seqcount_t(s));			\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_enable();					\
+} while (0)
+
+#define do_write_seqcount_begin_nested(s, subclass)			\
+do {									\
+	__assert_associated_lock_held(s);				\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	write_seqcount_t_begin_nested(__to_seqcount_t(s), subclass);	\
+} while (0)
+
+#define do_write_seqcount_begin(s)					\
+do {									\
+	__assert_associated_lock_held(s);				\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	write_seqcount_t_begin(__to_seqcount_t(s));			\
+} while (0)
+
+#define do_write_seqcount_end(s)					\
+do {									\
+	write_seqcount_t_end(__to_seqcount_t(s));			\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_enable();					\
+} while (0)
+
+#define do_write_seqcount_invalidate(s)					\
+	write_seqcount_t_invalidate(__to_seqcount_t(s))
+
+#define do_raw_write_seqcount_barrier(s)				\
+	raw_write_seqcount_t_barrier(__to_seqcount_t(s))
+
+/*
+ * Latch sequence counter write side critical sections don't need to
+ * run with preemption disabled. Check raw_write_seqcount_latch().
+ */
+#define do_raw_write_seqcount_latch(s)					\
+	raw_write_seqcount_t_latch(__to_seqcount_t(s))
+
+/*
+ *	seqcount_LOCKTYPE_t -- read APIs
+ */
+
+#define do___read_seqcount_begin(s)					\
+	__read_seqcount_t_begin(__to_seqcount_t(s))
+
+#define do_raw_read_seqcount(s)						\
+	raw_read_seqcount_t(__to_seqcount_t(s))
+
+#define do_raw_seqcount_begin(s)					\
+	raw_seqcount_t_begin(__to_seqcount_t(s))
+
+#define do_raw_read_seqcount_begin(s)					\
+	raw_read_seqcount_t_begin(__to_seqcount_t(s))
+
+#define do_read_seqcount_begin(s)					\
+	read_seqcount_t_begin(__to_seqcount_t(s))
+
+#define do_raw_read_seqcount_latch(s)					\
+	raw_read_seqcount_t_latch(__to_seqcount_t(s))
+
+#define do___read_seqcount_retry(s, start)				\
+	__read_seqcount_t_retry(__to_seqcount_t(s), start)
+
+#define do_read_seqcount_retry(s, start)				\
+	read_seqcount_t_retry(__to_seqcount_t(s), start)
+
+#endif /* __LINUX_SEQLOCK_TYPES_INTERNAL_H */
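
Taking seqcount_mutex_t as an example, write_seqcount_begin() above thus
effectively expands, with lockdep enabled, to the following sketch:

	lockdep_assert_held(s->lock);		/* __assert_associated_lock_held() */
	preempt_disable();			/* a mutex doesn't disable preemption */
	write_seqcount_t_begin(&s->seqcount);	/* via __to_seqcount_t() */
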
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 05/18] dma-buf: Remove custom seqcount lockdep class key
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (3 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 04/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 06/18] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
                     ` (12 subsequent siblings)
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Daniel Vetter,
	David Airlie, dri-devel

Commit 3c3b177a9369 ("reservation: add support for read-only access
using rcu") introduced a sequence counter to manage updates to
reservations. Back then, the reservation object initializer
reservation_object_init() was always inlined.

Having the sequence counter initialization inlined meant that each of
the call sites would have a different lockdep class key, which would've
broken lockdep's deadlock detection. The aforementioned commit thus
introduced, and exported, a custom seqcount lockdep class key and name.

Commit 8735f16803f00 ("dma-buf: cleanup reservation_object_init...")
transformed the reservation object initializer into a normal,
non-inlined C function. seqcount_init(), which automatically defines the
seqcount lockdep class key and must therefore be called from non-inlined
code, can now be safely used.

Remove the seqcount custom lockdep class key, name, and export. Use
seqcount_init() inside the dma reservation object initializer.
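
For reference, seqcount_init() obtains a unique lockdep class key per
textual expansion site through a function-local static variable,
roughly:

	#define seqcount_init(s)				\
		do {						\
			static struct lock_class_key __key;	\
			__seqcount_init((s), #s, &__key);	\
		} while (0)

which is why each lock class needs exactly one expansion site: an
inlined initializer would mint a distinct key at every caller.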

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/dma-buf/dma-resv.c | 9 +--------
 include/linux/dma-resv.h   | 2 --
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 4264e64788c4..590ce7ad60a0 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -50,12 +50,6 @@
 DEFINE_WD_CLASS(reservation_ww_class);
 EXPORT_SYMBOL(reservation_ww_class);
 
-struct lock_class_key reservation_seqcount_class;
-EXPORT_SYMBOL(reservation_seqcount_class);
-
-const char reservation_seqcount_string[] = "reservation_seqcount";
-EXPORT_SYMBOL(reservation_seqcount_string);
-
 /**
  * dma_resv_list_alloc - allocate fence list
  * @shared_max: number of fences we need space for
@@ -134,9 +128,8 @@ subsys_initcall(dma_resv_lockdep);
 void dma_resv_init(struct dma_resv *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
+	seqcount_init(&obj->seq);
 
-	__seqcount_init(&obj->seq, reservation_seqcount_string,
-			&reservation_seqcount_class);
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
 }
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index ee50d10f052b..a6538ae7d93f 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -46,8 +46,6 @@
 #include <linux/rcupdate.h>
 
 extern struct ww_class reservation_ww_class;
-extern struct lock_class_key reservation_seqcount_class;
-extern const char reservation_seqcount_string[];
 
 /**
  * struct dma_resv_list - a list of shared fences
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 06/18] dma-buf: Use sequence counter with associated wound/wait mutex
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (4 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 05/18] dma-buf: Remove custom seqcount lockdep class key Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08 14:32     ` Daniel Vetter
  2020-06-08  0:57   ` [PATCH v2 07/18] sched: tasks: Use sequence counter with associated spinlock Ahmed S. Darwish
                     ` (11 subsequent siblings)
  17 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Daniel Vetter,
	David Airlie, Sumit Semwal, Felix Kuehling, Alex Deucher,
	Christian König, David (ChunMing) Zhou, dri-devel, amd-gfx

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive
does not disable preemption implicitly, preemption has to be explicitly
disabled before entering the sequence counter write side critical
section.

The dma-buf reservation subsystem uses plain sequence counters to manage
updates to reservations. Writer serialization is accomplished through a
wound/wait mutex.

Acquiring a wound/wait mutex does not disable preemption, so preemption
has to be disabled and re-enabled manually around the write side
critical section, as in the pattern sketched below.
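
A sketch of the pattern being removed (cf. the hunks below; obj is a
struct dma_resv, and ctx a ww_acquire_ctx that may be NULL):

	ww_mutex_lock(&obj->lock, ctx);

	preempt_disable();
	write_seqcount_begin(&obj->seq);

	/* ... update the reservation fences ... */

	write_seqcount_end(&obj->seq);
	preempt_enable();

	ww_mutex_unlock(&obj->lock);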

Use the newly-added seqcount_ww_mutex_t instead:

  - It associates the ww_mutex with the sequence count, which enables
    lockdep to validate that the write side critical section is properly
    serialized.

  - It removes the need to explicitly add preempt_disable/enable()
    around the write side critical section because the write_begin/end()
    functions for this new data type automatically do this.

If lockdep is disabled this ww_mutex lock association is compiled out
and has neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 drivers/dma-buf/dma-resv.c                       | 8 +-------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 --
 include/linux/dma-resv.h                         | 2 +-
 3 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 590ce7ad60a0..3aba2b2bfc48 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -128,7 +128,7 @@ subsys_initcall(dma_resv_lockdep);
 void dma_resv_init(struct dma_resv *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
-	seqcount_init(&obj->seq);
+	seqcount_ww_mutex_init(&obj->seq, &obj->lock);
 
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
@@ -259,7 +259,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 	fobj = dma_resv_get_list(obj);
 	count = fobj->shared_count;
 
-	preempt_disable();
 	write_seqcount_begin(&obj->seq);
 
 	for (i = 0; i < count; ++i) {
@@ -281,7 +280,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 	smp_store_mb(fobj->shared_count, count);
 
 	write_seqcount_end(&obj->seq);
-	preempt_enable();
 	dma_fence_put(old);
 }
 EXPORT_SYMBOL(dma_resv_add_shared_fence);
@@ -308,14 +306,12 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 	if (fence)
 		dma_fence_get(fence);
 
-	preempt_disable();
 	write_seqcount_begin(&obj->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(obj->fence_excl, fence);
 	if (old)
 		old->shared_count = 0;
 	write_seqcount_end(&obj->seq);
-	preempt_enable();
 
 	/* inplace update, no shared fences */
 	while (i--)
@@ -393,13 +389,11 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
 	src_list = dma_resv_get_list(dst);
 	old = dma_resv_get_excl(dst);
 
-	preempt_disable();
 	write_seqcount_begin(&dst->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(dst->fence_excl, new);
 	RCU_INIT_POINTER(dst->fence, dst_list);
 	write_seqcount_end(&dst->seq);
-	preempt_enable();
 
 	dma_resv_list_free(src_list);
 	dma_fence_put(old);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 6a5b91d23fd9..c71c0bb6ce26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -258,11 +258,9 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
 	new->shared_count = k;
 
 	/* Install the new fence list, seqcount provides the barriers */
-	preempt_disable();
 	write_seqcount_begin(&resv->seq);
 	RCU_INIT_POINTER(resv->fence, new);
 	write_seqcount_end(&resv->seq);
-	preempt_enable();
 
 	/* Drop the references to the removed fences or move them to ef_list */
 	for (i = j, k = 0; i < old->shared_count; ++i) {
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index a6538ae7d93f..d44a77e8a7e3 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -69,7 +69,7 @@ struct dma_resv_list {
  */
 struct dma_resv {
 	struct ww_mutex lock;
-	seqcount_t seq;
+	seqcount_ww_mutex_t seq;
 
 	struct dma_fence __rcu *fence_excl;
 	struct dma_resv_list __rcu *fence;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 07/18] sched: tasks: Use sequence counter with associated spinlock
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (5 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 06/18] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 08/18] netfilter: conntrack: " Ahmed S. Darwish
                     ` (10 subsequent siblings)
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Al Viro

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/sched.h | 2 +-
 init/init_task.c      | 3 ++-
 kernel/fork.c         | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4418f5cb8324..a9ce6fbeb735 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1046,7 +1046,7 @@ struct task_struct {
 	/* Protected by ->alloc_lock: */
 	nodemask_t			mems_allowed;
 	/* Sequence number to catch updates: */
-	seqcount_t			mems_allowed_seq;
+	seqcount_spinlock_t		mems_allowed_seq;
 	int				cpuset_mem_spread_rotor;
 	int				cpuset_slab_spread_rotor;
 #endif
diff --git a/init/init_task.c b/init/init_task.c
index bd403ed3e418..94bf4aea8293 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -142,7 +142,8 @@ struct task_struct init_task
 	.rcu_tasks_idle_cpu = -1,
 #endif
 #ifdef CONFIG_CPUSETS
-	.mems_allowed_seq = SEQCNT_ZERO(init_task.mems_allowed_seq),
+	.mems_allowed_seq = SEQCNT_SPINLOCK_ZERO(init_task.mems_allowed_seq,
+						 &init_task.alloc_lock),
 #endif
 #ifdef CONFIG_RT_MUTEXES
 	.pi_waiters	= RB_ROOT_CACHED,
diff --git a/kernel/fork.c b/kernel/fork.c
index 48ed22774efa..3b88bef92875 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2019,7 +2019,7 @@ static __latent_entropy struct task_struct *copy_process(
 #ifdef CONFIG_CPUSETS
 	p->cpuset_mem_spread_rotor = NUMA_NO_NODE;
 	p->cpuset_slab_spread_rotor = NUMA_NO_NODE;
-	seqcount_init(&p->mems_allowed_seq);
+	seqcount_spinlock_init(&p->mems_allowed_seq, &p->alloc_lock);
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 08/18] netfilter: conntrack: Use sequence counter with associated spinlock
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (6 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 07/18] sched: tasks: Use sequence counter with associated spinlock Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 09/18] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock Ahmed S. Darwish
                     ` (9 subsequent siblings)
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	Jakub Kicinski, netfilter-devel, coreteam, netdev

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/net/netfilter/nf_conntrack.h | 2 +-
 net/netfilter/nf_conntrack_core.c    | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 90690e37a56f..ea4e2010b246 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -286,7 +286,7 @@ int nf_conntrack_hash_resize(unsigned int hashsize);
 
 extern struct hlist_nulls_head *nf_conntrack_hash;
 extern unsigned int nf_conntrack_htable_size;
-extern seqcount_t nf_conntrack_generation;
+extern seqcount_spinlock_t nf_conntrack_generation;
 extern unsigned int nf_conntrack_max;
 
 /* must be called with rcu read lock held */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index bb72ca5f3999..1f9518569195 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -180,7 +180,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
 
 unsigned int nf_conntrack_max __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_max);
-seqcount_t nf_conntrack_generation __read_mostly;
+seqcount_spinlock_t nf_conntrack_generation __read_mostly;
 static unsigned int nf_conntrack_hash_rnd __read_mostly;
 
 static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple,
@@ -2589,7 +2589,8 @@ int nf_conntrack_init_start(void)
 	/* struct nf_ct_ext uses u8 to store offsets/size */
 	BUILD_BUG_ON(total_extension_size() > 255u);
 
-	seqcount_init(&nf_conntrack_generation);
+	seqcount_spinlock_init(&nf_conntrack_generation,
+			       &nf_conntrack_locks_all_lock);
 
 	for (i = 0; i < CONNTRACK_LOCKS; i++)
 		spin_lock_init(&nf_conntrack_locks[i]);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 09/18] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (7 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 08/18] netfilter: conntrack: " Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 10/18] xfrm: policy: Use sequence counters with associated lock Ahmed S. Darwish
                     ` (8 subsequent siblings)
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	Jakub Kicinski, netfilter-devel, coreteam, netdev

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_rwlock_t data type, which allows associating an
rwlock with the sequence counter. This enables lockdep to verify that
the rwlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
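
For illustration only, with placeholder names, the write side must now
be entered with the associated rwlock write-held:

	rwlock_t foo_lock;
	seqcount_rwlock_t foo_count;

	rwlock_init(&foo_lock);
	seqcount_rwlock_init(&foo_count, &foo_lock);

	write_lock(&foo_lock);
	write_seqcount_begin(&foo_count);	/* lockdep: foo_lock held */
	/* ... update the protected data ... */
	write_seqcount_end(&foo_count);
	write_unlock(&foo_lock);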

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 net/netfilter/nft_set_rbtree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 62f416bc0579..9f58261ee4c7 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -18,7 +18,7 @@
 struct nft_rbtree {
 	struct rb_root		root;
 	rwlock_t		lock;
-	seqcount_t		count;
+	seqcount_rwlock_t	count;
 	struct delayed_work	gc_work;
 };
 
@@ -516,7 +516,7 @@ static int nft_rbtree_init(const struct nft_set *set,
 	struct nft_rbtree *priv = nft_set_priv(set);
 
 	rwlock_init(&priv->lock);
-	seqcount_init(&priv->count);
+	seqcount_rwlock_init(&priv->count, &priv->lock);
 	priv->root = RB_ROOT;
 
 	INIT_DEFERRABLE_WORK(&priv->gc_work, nft_rbtree_gc);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 10/18] xfrm: policy: Use sequence counters with associated lock
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (8 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 09/18] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 11/18] timekeeping: Use sequence counter with associated raw spinlock Ahmed S. Darwish
                     ` (7 subsequent siblings)
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Steffen Klassert,
	Herbert Xu, David S. Miller, Jakub Kicinski, netdev

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive does
not implicitly disable preemption, preemption must be explicitly disabled
before entering the sequence counter write side critical section.

A plain seqcount_t does not record which lock must be held when entering
a write side critical section.

Use the new seqcount_spinlock_t and seqcount_mutex_t data types instead,
which allow associating a lock with the sequence counter. This enables
lockdep to verify that the lock used for writer serialization is held
when the write side critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
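
For illustration only, a sketch of the mutex-associated variant with
placeholder names. Since a mutex does not disable preemption, the
write_begin/end() functions for these sleeping-lock types also handle
the preemption disable/enable internally, so no explicit
preempt_disable()/preempt_enable() pair is needed:

	static DEFINE_MUTEX(foo_mutex);
	static seqcount_mutex_t foo_seq;

	seqcount_mutex_init(&foo_seq, &foo_mutex);

	mutex_lock(&foo_mutex);
	write_seqcount_begin(&foo_seq);	/* disables preemption, checks lock */
	/* ... update the protected data ... */
	write_seqcount_end(&foo_seq);	/* re-enables preemption */
	mutex_unlock(&foo_mutex);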

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 net/xfrm/xfrm_policy.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 564aa6492e7c..732a940468b0 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -122,7 +122,7 @@ struct xfrm_pol_inexact_bin {
 	/* list containing '*:*' policies */
 	struct hlist_head hhead;
 
-	seqcount_t count;
+	seqcount_spinlock_t count;
 	/* tree sorted by daddr/prefix */
 	struct rb_root root_d;
 
@@ -155,7 +155,7 @@ static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1]
 						__read_mostly;
 
 static struct kmem_cache *xfrm_dst_cache __ro_after_init;
-static __read_mostly seqcount_t xfrm_policy_hash_generation;
+static __read_mostly seqcount_mutex_t xfrm_policy_hash_generation;
 
 static struct rhashtable xfrm_policy_inexact_table;
 static const struct rhashtable_params xfrm_pol_inexact_params;
@@ -719,7 +719,7 @@ xfrm_policy_inexact_alloc_bin(const struct xfrm_policy *pol, u8 dir)
 	INIT_HLIST_HEAD(&bin->hhead);
 	bin->root_d = RB_ROOT;
 	bin->root_s = RB_ROOT;
-	seqcount_init(&bin->count);
+	seqcount_spinlock_init(&bin->count, &net->xfrm.xfrm_policy_lock);
 
 	prev = rhashtable_lookup_get_insert_key(&xfrm_policy_inexact_table,
 						&bin->k, &bin->head,
@@ -1906,7 +1906,7 @@ static int xfrm_policy_match(const struct xfrm_policy *pol,
 
 static struct xfrm_pol_inexact_node *
 xfrm_policy_lookup_inexact_addr(const struct rb_root *r,
-				seqcount_t *count,
+				seqcount_spinlock_t *count,
 				const xfrm_address_t *addr, u16 family)
 {
 	const struct rb_node *parent;
@@ -4153,7 +4153,7 @@ void __init xfrm_init(void)
 {
 	register_pernet_subsys(&xfrm_net_ops);
 	xfrm_dev_init();
-	seqcount_init(&xfrm_policy_hash_generation);
+	seqcount_mutex_init(&xfrm_policy_hash_generation, &hash_resize_mutex);
 	xfrm_input_init();
 
 #ifdef CONFIG_INET_ESPINTCP
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 11/18] timekeeping: Use sequence counter with associated raw spinlock
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (9 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 10/18] xfrm: policy: Use sequence counters with associated lock Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 12/18] vfs: Use sequence counter with associated spinlock Ahmed S. Darwish
                     ` (6 subsequent siblings)
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, John Stultz,
	Stephen Boyd

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_raw_spinlock_t data type, which allows associating
a raw spinlock with the sequence counter. This enables lockdep to verify
that the raw spinlock used for writer serialization is held when the
write side critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
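
For illustration only, with placeholder names, both the static
initializer and a write side that also keeps hard interrupts off look
roughly like this:

	static DEFINE_RAW_SPINLOCK(foo_lock);
	static seqcount_raw_spinlock_t foo_seq =
		SEQCNT_RAW_SPINLOCK_ZERO(foo_seq, &foo_lock);

	unsigned long flags;

	raw_spin_lock_irqsave(&foo_lock, flags);
	write_seqcount_begin(&foo_seq);	/* lockdep: foo_lock must be held */
	/* ... update the protected data ... */
	write_seqcount_end(&foo_seq);
	raw_spin_unlock_irqrestore(&foo_lock, flags);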

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 kernel/time/timekeeping.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 9ebaab13339d..24e91a1e2acd 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -39,18 +39,19 @@ enum timekeeping_adv_mode {
 	TK_ADV_FREQ
 };
 
+static DEFINE_RAW_SPINLOCK(timekeeper_lock);
+
 /*
  * The most important data for readout fits into a single 64 byte
  * cache line.
  */
 static struct {
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct timekeeper	timekeeper;
 } tk_core ____cacheline_aligned = {
-	.seq = SEQCNT_ZERO(tk_core.seq),
+	.seq = SEQCNT_RAW_SPINLOCK_ZERO(tk_core.seq, &timekeeper_lock),
 };
 
-static DEFINE_RAW_SPINLOCK(timekeeper_lock);
 static struct timekeeper shadow_timekeeper;
 
 /**
@@ -63,7 +64,7 @@ static struct timekeeper shadow_timekeeper;
  * See @update_fast_timekeeper() below.
  */
 struct tk_fast {
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct tk_read_base	base[2];
 };
 
@@ -80,11 +81,13 @@ static struct clocksource dummy_clock = {
 };
 
 static struct tk_fast tk_fast_mono ____cacheline_aligned = {
+	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_mono.seq, &timekeeper_lock),
 	.base[0] = { .clock = &dummy_clock, },
 	.base[1] = { .clock = &dummy_clock, },
 };
 
 static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
+	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_raw.seq, &timekeeper_lock),
 	.base[0] = { .clock = &dummy_clock, },
 	.base[1] = { .clock = &dummy_clock, },
 };
@@ -157,7 +160,7 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
  * tk_clock_read - atomic clocksource read() helper
  *
  * This helper is necessary to use in the read paths because, while the
- * seqlock ensures we don't return a bad value while structures are updated,
+ * seqcount ensures we don't return a bad value while structures are updated,
  * it doesn't protect from potential crashes. There is the possibility that
  * the tkr's clocksource may change between the read reference, and the
  * clock reference passed to the read function.  This can cause crashes if
@@ -222,10 +225,10 @@ static inline u64 timekeeping_get_delta(const struct tk_read_base *tkr)
 	unsigned int seq;
 
 	/*
-	 * Since we're called holding a seqlock, the data may shift
+	 * Since we're called holding a seqcount, the data may shift
 	 * under us while we're doing the calculation. This can cause
 	 * false positives, since we'd note a problem but throw the
-	 * results away. So nest another seqlock here to atomically
+	 * results away. So nest another seqcount here to atomically
 	 * grab the points we are checking with.
 	 */
 	do {
@@ -486,7 +489,7 @@ EXPORT_SYMBOL_GPL(ktime_get_raw_fast_ns);
  *
  * To keep it NMI safe since we're accessing from tracing, we're not using a
  * separate timekeeper with updates to monotonic clock and boot offset
- * protected with seqlocks. This has the following minor side effects:
+ * protected with seqcounts. This has the following minor side effects:
  *
  * (1) Its possible that a timestamp be taken after the boot offset is updated
  * but before the timekeeper is updated. If this happens, the new boot offset
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 12/18] vfs: Use sequence counter with associated spinlock
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (10 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 11/18] timekeeping: Use sequence counter with associated raw spinlock Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 13/18] raid5: " Ahmed S. Darwish
                     ` (5 subsequent siblings)
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Alexander Viro,
	Mauro Carvalho Chehab, Jonathan Corbet, linux-fsdevel

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
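
For illustration only, a sketch of the static initialization pattern
used below, plus the unchanged lockless read side (placeholder names):

	struct foo {
		spinlock_t		lock;
		seqcount_spinlock_t	seq;
	};

	static struct foo foo = {
		.lock	= __SPIN_LOCK_UNLOCKED(foo.lock),
		.seq	= SEQCNT_SPINLOCK_ZERO(foo.seq, &foo.lock),
	};

	/* readers need no changes for the new seqcount type */
	unsigned int seq;
	do {
		seq = read_seqcount_begin(&foo.seq);
		/* ... copy out the protected data ... */
	} while (read_seqcount_retry(&foo.seq, seq));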

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 fs/dcache.c               | 2 +-
 fs/fs_struct.c            | 4 ++--
 include/linux/dcache.h    | 2 +-
 include/linux/fs_struct.h | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b280e07e162b..e5f365d8fd67 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1727,7 +1727,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	dentry->d_lockref.count = 1;
 	dentry->d_flags = 0;
 	spin_lock_init(&dentry->d_lock);
-	seqcount_init(&dentry->d_seq);
+	seqcount_spinlock_init(&dentry->d_seq, &dentry->d_lock);
 	dentry->d_inode = NULL;
 	dentry->d_parent = dentry;
 	dentry->d_sb = sb;
diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index ca639ed967b7..04b3f5b9c629 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -117,7 +117,7 @@ struct fs_struct *copy_fs_struct(struct fs_struct *old)
 		fs->users = 1;
 		fs->in_exec = 0;
 		spin_lock_init(&fs->lock);
-		seqcount_init(&fs->seq);
+		seqcount_spinlock_init(&fs->seq, &fs->lock);
 		fs->umask = old->umask;
 
 		spin_lock(&old->lock);
@@ -163,6 +163,6 @@ EXPORT_SYMBOL(current_umask);
 struct fs_struct init_fs = {
 	.users		= 1,
 	.lock		= __SPIN_LOCK_UNLOCKED(init_fs.lock),
-	.seq		= SEQCNT_ZERO(init_fs.seq),
+	.seq		= SEQCNT_SPINLOCK_ZERO(init_fs.seq, &init_fs.lock),
 	.umask		= 0022,
 };
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c1488cc84fd9..235563da356d 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -89,7 +89,7 @@ extern struct dentry_stat_t dentry_stat;
 struct dentry {
 	/* RCU lookup touched fields */
 	unsigned int d_flags;		/* protected by d_lock */
-	seqcount_t d_seq;		/* per dentry seqlock */
+	seqcount_spinlock_t d_seq;	/* per dentry seqlock */
 	struct hlist_bl_node d_hash;	/* lookup hash list */
 	struct dentry *d_parent;	/* parent directory */
 	struct qstr d_name;
diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
index cf1015abfbf2..783b48dedb72 100644
--- a/include/linux/fs_struct.h
+++ b/include/linux/fs_struct.h
@@ -9,7 +9,7 @@
 struct fs_struct {
 	int users;
 	spinlock_t lock;
-	seqcount_t seq;
+	seqcount_spinlock_t seq;
 	int umask;
 	int in_exec;
 	struct path root, pwd;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 13/18] raid5: Use sequence counter with associated spinlock
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (11 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 12/18] vfs: Use sequence counter with associated spinlock Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 14/18] iocost: " Ahmed S. Darwish
                     ` (4 subsequent siblings)
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Song Liu, linux-raid

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 drivers/md/raid5.c | 2 +-
 drivers/md/raid5.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ba00e9877f02..69f31c675b58 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6929,7 +6929,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
 	} else
 		goto abort;
 	spin_lock_init(&conf->device_lock);
-	seqcount_init(&conf->gen_lock);
+	seqcount_spinlock_init(&conf->gen_lock, &conf->device_lock);
 	mutex_init(&conf->cache_size_mutex);
 	init_waitqueue_head(&conf->wait_for_quiescent);
 	init_waitqueue_head(&conf->wait_for_stripe);
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index f90e0704bed9..a2c9e9e9f5ac 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -589,7 +589,7 @@ struct r5conf {
 	int			prev_chunk_sectors;
 	int			prev_algo;
 	short			generation; /* increments with every reshape */
-	seqcount_t		gen_lock;	/* lock against generation changes */
+	seqcount_spinlock_t	gen_lock;	/* lock against generation changes */
 	unsigned long		reshape_checkpoint; /* Time we last updated
 						     * metadata */
 	long long		min_offset_diff; /* minimum difference between
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 14/18] iocost: Use sequence counter with associated spinlock
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (12 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 13/18] raid5: " Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 15/18] NFSv4: " Ahmed S. Darwish
                     ` (3 subsequent siblings)
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jens Axboe, linux-block

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 block/blk-iocost.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 7c1fe605d0d6..8029a9e8fa55 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -405,7 +405,7 @@ struct ioc {
 	enum ioc_running		running;
 	atomic64_t			vtime_rate;
 
-	seqcount_t			period_seqcount;
+	seqcount_spinlock_t		period_seqcount;
 	u32				period_at;	/* wallclock starttime */
 	u64				period_at_vtime; /* vtime starttime */
 
@@ -872,7 +872,6 @@ static void ioc_now(struct ioc *ioc, struct ioc_now *now)
 
 static void ioc_start_period(struct ioc *ioc, struct ioc_now *now)
 {
-	lockdep_assert_held(&ioc->lock);
 	WARN_ON_ONCE(ioc->running != IOC_RUNNING);
 
 	write_seqcount_begin(&ioc->period_seqcount);
@@ -1958,7 +1957,7 @@ static int blk_iocost_init(struct request_queue *q)
 
 	ioc->running = IOC_IDLE;
 	atomic64_set(&ioc->vtime_rate, VTIME_PER_USEC);
-	seqcount_init(&ioc->period_seqcount);
+	seqcount_spinlock_init(&ioc->period_seqcount, &ioc->lock);
 	ioc->period_at = ktime_to_us(ktime_get());
 	atomic64_set(&ioc->cur_period, 0);
 	atomic_set(&ioc->hweight_gen, 0);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 15/18] NFSv4: Use sequence counter with associated spinlock
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (13 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 14/18] iocost: " Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 16/18] userfaultfd: " Ahmed S. Darwish
                     ` (2 subsequent siblings)
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Trond Myklebust,
	Anna Schumaker, linux-nfs

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 fs/nfs/nfs4_fs.h   | 2 +-
 fs/nfs/nfs4state.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 2b7f6dcd2eb8..210e590e1f71 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -117,7 +117,7 @@ struct nfs4_state_owner {
 	unsigned long	     so_flags;
 	struct list_head     so_states;
 	struct nfs_seqid_counter so_seqid;
-	seqcount_t	     so_reclaim_seqcount;
+	seqcount_spinlock_t  so_reclaim_seqcount;
 	struct mutex	     so_delegreturn_mutex;
 };
 
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index a8dc25ce48bb..b1dba24918f8 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -509,7 +509,7 @@ nfs4_alloc_state_owner(struct nfs_server *server,
 	nfs4_init_seqid_counter(&sp->so_seqid);
 	atomic_set(&sp->so_count, 1);
 	INIT_LIST_HEAD(&sp->so_lru);
-	seqcount_init(&sp->so_reclaim_seqcount);
+	seqcount_spinlock_init(&sp->so_reclaim_seqcount, &sp->so_lock);
 	mutex_init(&sp->so_delegreturn_mutex);
 	return sp;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 16/18] userfaultfd: Use sequence counter with associated spinlock
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (14 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 15/18] NFSv4: " Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 17/18] kvm/eventfd: " Ahmed S. Darwish
  2020-06-08  0:57   ` [PATCH v2 18/18] hrtimer: Use sequence counter with associated raw spinlock Ahmed S. Darwish
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Alexander Viro,
	linux-fsdevel

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 fs/userfaultfd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e39fdec8a0b0..dd3aab31c50f 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -61,7 +61,7 @@ struct userfaultfd_ctx {
 	/* waitqueue head for events */
 	wait_queue_head_t event_wqh;
 	/* a refile sequence protected by fault_pending_wqh lock */
-	struct seqcount refile_seq;
+	seqcount_spinlock_t refile_seq;
 	/* pseudo fd refcounting */
 	refcount_t refcount;
 	/* userfaultfd syscall flags */
@@ -1998,7 +1998,7 @@ static void init_once_userfaultfd_ctx(void *mem)
 	init_waitqueue_head(&ctx->fault_wqh);
 	init_waitqueue_head(&ctx->event_wqh);
 	init_waitqueue_head(&ctx->fd_wqh);
-	seqcount_init(&ctx->refile_seq);
+	seqcount_spinlock_init(&ctx->refile_seq, &ctx->fault_pending_wqh.lock);
 }
 
 SYSCALL_DEFINE1(userfaultfd, int, flags)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 17/18] kvm/eventfd: Use sequence counter with associated spinlock
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (15 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 16/18] userfaultfd: " Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  2020-06-08 12:57     ` Paolo Bonzini
  2020-06-08  0:57   ` [PATCH v2 18/18] hrtimer: Use sequence counter with associated raw spinlock Ahmed S. Darwish
  17 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Paolo Bonzini, kvm

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/kvm_irqfd.h | 2 +-
 virt/kvm/eventfd.c        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index dc1da020305b..dac047abdba7 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -42,7 +42,7 @@ struct kvm_kernel_irqfd {
 	wait_queue_entry_t wait;
 	/* Update side is protected by irqfds.lock */
 	struct kvm_kernel_irq_routing_entry irq_entry;
-	seqcount_t irq_entry_sc;
+	seqcount_spinlock_t irq_entry_sc;
 	/* Used for level IRQ fast-path */
 	int gsi;
 	struct work_struct inject;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 67b6fc153e9c..8694a2920ea9 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -303,7 +303,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	INIT_LIST_HEAD(&irqfd->list);
 	INIT_WORK(&irqfd->inject, irqfd_inject);
 	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
-	seqcount_init(&irqfd->irq_entry_sc);
+	seqcount_spinlock_init(&irqfd->irq_entry_sc, &kvm->irqfds.lock);
 
 	f = fdget(args->fd);
 	if (!f.file) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v2 18/18] hrtimer: Use sequence counter with associated raw spinlock
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (16 preceding siblings ...)
  2020-06-08  0:57   ` [PATCH v2 17/18] kvm/eventfd: " Ahmed S. Darwish
@ 2020-06-08  0:57   ` Ahmed S. Darwish
  17 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-08  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_raw_spinlock_t data type, which allows associating
a raw spinlock with the sequence counter. This enables lockdep to verify
that the raw spinlock used for writer serialization is held when the
write side critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/hrtimer.h |  2 +-
 kernel/time/hrtimer.c   | 13 ++++++++++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 15c8ac313678..25993b86ac5c 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -159,7 +159,7 @@ struct hrtimer_clock_base {
 	struct hrtimer_cpu_base	*cpu_base;
 	unsigned int		index;
 	clockid_t		clockid;
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct hrtimer		*running;
 	struct timerqueue_head	active;
 	ktime_t			(*get_time)(void);
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index d89da1c7e005..c4038511d5c9 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -135,7 +135,11 @@ static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
  * timer->base->cpu_base
  */
 static struct hrtimer_cpu_base migration_cpu_base = {
-	.clock_base = { { .cpu_base = &migration_cpu_base, }, },
+	.clock_base = { {
+		.cpu_base = &migration_cpu_base,
+		.seq      = SEQCNT_RAW_SPINLOCK_ZERO(migration_cpu_base.seq,
+						     &migration_cpu_base.lock),
+	}, },
 };
 
 #define migration_base	migration_cpu_base.clock_base[0]
@@ -1998,8 +2002,11 @@ int hrtimers_prepare_cpu(unsigned int cpu)
 	int i;
 
 	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
-		cpu_base->clock_base[i].cpu_base = cpu_base;
-		timerqueue_init_head(&cpu_base->clock_base[i].active);
+		struct hrtimer_clock_base *clock_b = &cpu_base->clock_base[i];
+
+		clock_b->cpu_base = cpu_base;
+		seqcount_raw_spinlock_init(&clock_b->seq, &cpu_base->lock);
+		timerqueue_init_head(&clock_b->active);
 	}
 
 	cpu_base->cpu = cpu;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* Re: [PATCH v2 17/18] kvm/eventfd: Use sequence counter with associated spinlock
  2020-06-08  0:57   ` [PATCH v2 17/18] kvm/eventfd: " Ahmed S. Darwish
@ 2020-06-08 12:57     ` Paolo Bonzini
  0 siblings, 0 replies; 258+ messages in thread
From: Paolo Bonzini @ 2020-06-08 12:57 UTC (permalink / raw)
  To: Ahmed S. Darwish, Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, kvm

On 08/06/20 02:57, Ahmed S. Darwish wrote:
> A sequence counter write side critical section must be protected by some
> form of locking to serialize writers. A plain seqcount_t does not record
> which lock must be held when entering a write side critical section.
>
> Use the new seqcount_spinlock_t data type, which allows associating a
> spinlock with the sequence counter. This enables lockdep to verify that
> the spinlock used for writer serialization is held when the write side
> critical section is entered.
>
> If lockdep is disabled, this lock association is compiled out and has
> neither storage size nor runtime overhead.
> 
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> ---
>  include/linux/kvm_irqfd.h | 2 +-
>  virt/kvm/eventfd.c        | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
> index dc1da020305b..dac047abdba7 100644
> --- a/include/linux/kvm_irqfd.h
> +++ b/include/linux/kvm_irqfd.h
> @@ -42,7 +42,7 @@ struct kvm_kernel_irqfd {
>  	wait_queue_entry_t wait;
>  	/* Update side is protected by irqfds.lock */
>  	struct kvm_kernel_irq_routing_entry irq_entry;
> -	seqcount_t irq_entry_sc;
> +	seqcount_spinlock_t irq_entry_sc;
>  	/* Used for level IRQ fast-path */
>  	int gsi;
>  	struct work_struct inject;
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 67b6fc153e9c..8694a2920ea9 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -303,7 +303,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  	INIT_LIST_HEAD(&irqfd->list);
>  	INIT_WORK(&irqfd->inject, irqfd_inject);
>  	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
> -	seqcount_init(&irqfd->irq_entry_sc);
> +	seqcount_spinlock_init(&irqfd->irq_entry_sc, &kvm->irqfds.lock);
>  
>  	f = fdget(args->fd);
>  	if (!f.file) {
> 

Acked-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v2 06/18] dma-buf: Use sequence counter with associated wound/wait mutex
  2020-06-08  0:57   ` [PATCH v2 06/18] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
@ 2020-06-08 14:32     ` Daniel Vetter
  0 siblings, 0 replies; 258+ messages in thread
From: Daniel Vetter @ 2020-06-08 14:32 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML,
	Daniel Vetter, David Airlie, Sumit Semwal, Felix Kuehling,
	Alex Deucher, Christian König, David (ChunMing) Zhou,
	dri-devel, amd-gfx

On Mon, Jun 08, 2020 at 02:57:17AM +0200, Ahmed S. Darwish wrote:
> A sequence counter write side critical section must be protected by some
> form of locking to serialize writers. If the serialization primitive does
> not implicitly disable preemption, preemption must be explicitly disabled
> before entering the sequence counter write side critical section.
> 
> The dma-buf reservation subsystem uses plain sequence counters to manage
> updates to reservations. Writer serialization is accomplished through a
> wound/wait mutex.
> 
> Acquiring a wound/wait mutex does not disable preemption, so this needs
> to be done manually before and after the write side critical section.
> 
> Use the newly-added seqcount_ww_mutex_t instead:
> 
>   - It associates the ww_mutex with the sequence count, which enables
>     lockdep to validate that the write side critical section is properly
>     serialized.
> 
>   - It removes the need to explicitly add preempt_disable/enable()
>     around the write side critical section because the write_begin/end()
>     functions for this new data type automatically do this.
> 
> If lockdep is disabled, this ww_mutex lock association is compiled out
> and has neither storage size nor runtime overhead.
> 
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>

I'm not seeing the patch that adds the seqcount ww_mutex glue and not
quite motivated enough to grab it from lore, so someone else needs to
check the details. Just

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

for merging through whatever tree/branch makes sense from me.
-Daniel

> ---
>  drivers/dma-buf/dma-resv.c                       | 8 +-------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 --
>  include/linux/dma-resv.h                         | 2 +-
>  3 files changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 590ce7ad60a0..3aba2b2bfc48 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -128,7 +128,7 @@ subsys_initcall(dma_resv_lockdep);
>  void dma_resv_init(struct dma_resv *obj)
>  {
>  	ww_mutex_init(&obj->lock, &reservation_ww_class);
> -	seqcount_init(&obj->seq);
> +	seqcount_ww_mutex_init(&obj->seq, &obj->lock);
>  
>  	RCU_INIT_POINTER(obj->fence, NULL);
>  	RCU_INIT_POINTER(obj->fence_excl, NULL);
> @@ -259,7 +259,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
>  	fobj = dma_resv_get_list(obj);
>  	count = fobj->shared_count;
>  
> -	preempt_disable();
>  	write_seqcount_begin(&obj->seq);
>  
>  	for (i = 0; i < count; ++i) {
> @@ -281,7 +280,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
>  	smp_store_mb(fobj->shared_count, count);
>  
>  	write_seqcount_end(&obj->seq);
> -	preempt_enable();
>  	dma_fence_put(old);
>  }
>  EXPORT_SYMBOL(dma_resv_add_shared_fence);
> @@ -308,14 +306,12 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
>  	if (fence)
>  		dma_fence_get(fence);
>  
> -	preempt_disable();
>  	write_seqcount_begin(&obj->seq);
>  	/* write_seqcount_begin provides the necessary memory barrier */
>  	RCU_INIT_POINTER(obj->fence_excl, fence);
>  	if (old)
>  		old->shared_count = 0;
>  	write_seqcount_end(&obj->seq);
> -	preempt_enable();
>  
>  	/* inplace update, no shared fences */
>  	while (i--)
> @@ -393,13 +389,11 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
>  	src_list = dma_resv_get_list(dst);
>  	old = dma_resv_get_excl(dst);
>  
> -	preempt_disable();
>  	write_seqcount_begin(&dst->seq);
>  	/* write_seqcount_begin provides the necessary memory barrier */
>  	RCU_INIT_POINTER(dst->fence_excl, new);
>  	RCU_INIT_POINTER(dst->fence, dst_list);
>  	write_seqcount_end(&dst->seq);
> -	preempt_enable();
>  
>  	dma_resv_list_free(src_list);
>  	dma_fence_put(old);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 6a5b91d23fd9..c71c0bb6ce26 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -258,11 +258,9 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
>  	new->shared_count = k;
>  
>  	/* Install the new fence list, seqcount provides the barriers */
> -	preempt_disable();
>  	write_seqcount_begin(&resv->seq);
>  	RCU_INIT_POINTER(resv->fence, new);
>  	write_seqcount_end(&resv->seq);
> -	preempt_enable();
>  
>  	/* Drop the references to the removed fences or move them to ef_list */
>  	for (i = j, k = 0; i < old->shared_count; ++i) {
> diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
> index a6538ae7d93f..d44a77e8a7e3 100644
> --- a/include/linux/dma-resv.h
> +++ b/include/linux/dma-resv.h
> @@ -69,7 +69,7 @@ struct dma_resv_list {
>   */
>  struct dma_resv {
>  	struct ww_mutex lock;
> -	seqcount_t seq;
> +	seqcount_ww_mutex_t seq;
>  
>  	struct dma_fence __rcu *fence_excl;
>  	struct dma_resv_list __rcu *fence;
> -- 
> 2.20.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 258+ messages in thread

* [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (25 preceding siblings ...)
  2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
@ 2020-06-30  5:44 ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 01/20] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
                     ` (19 more replies)
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (3 subsequent siblings)
  30 siblings, 20 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc, David Airlie, Daniel Vetter, dri-devel,
	David S. Miller, netdev, Jens Axboe, linux-block, Alexander Viro,
	linux-fsdevel

Hi,

This is v3 of the seqlock patch series:

   [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks
   https://lore.kernel.org/lkml/20200519214547.352050-1-a.darwish@linutronix.de

   [PATCH v2 00/18]
   https://lore.kernel.org/lkml/20200608005729.1874024-1-a.darwish@linutronix.de

It's based on:

   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git locking/core

to get Peter's lockdep irqstate tracking series below, which untangles the
mainline seqlock.h <=> sched.h 'current->' task_struct circular dependency:

   https://lkml.kernel.org/r/linuxppc-dev/20200623083645.277342609@infradead.org

Changelog-v3:

 - Re-add lockdep non-preemptibility checks on seqcount_t write paths.
   They were removed from v2 due to the circular dependencies mentioned.

 - Slight rebase over the new v5.8-rc1 KCSAN seqlock.h changes

 - Collect seqcount_t call-site Acked-by tags

Thanks,

8<--------------

Ahmed S. Darwish (20):
  Documentation: locking: Describe seqlock design and usage
  seqlock: Properly format kernel-doc code samples
  seqlock: Add missing kernel-doc annotations
  lockdep: Add preemption enabled/disabled assertion APIs
  seqlock: lockdep assert non-preemptibility on seqcount_t write
  seqlock: Extend seqcount API with associated locks
  dma-buf: Remove custom seqcount lockdep class key
  dma-buf: Use sequence counter with associated wound/wait mutex
  sched: tasks: Use sequence counter with associated spinlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  xfrm: policy: Use sequence counters with associated lock
  timekeeping: Use sequence counter with associated raw spinlock
  vfs: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  hrtimer: Use sequence counter with associated raw spinlock

 Documentation/locking/index.rst               |   1 +
 Documentation/locking/seqlock.rst             | 242 +++++
 MAINTAINERS                                   |   2 +-
 block/blk-iocost.c                            |   5 +-
 drivers/dma-buf/dma-resv.c                    |  15 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   2 -
 drivers/md/raid5.c                            |   2 +-
 drivers/md/raid5.h                            |   2 +-
 fs/dcache.c                                   |   2 +-
 fs/fs_struct.c                                |   4 +-
 fs/nfs/nfs4_fs.h                              |   2 +-
 fs/nfs/nfs4state.c                            |   2 +-
 fs/userfaultfd.c                              |   4 +-
 include/linux/dcache.h                        |   2 +-
 include/linux/dma-resv.h                      |   4 +-
 include/linux/fs_struct.h                     |   2 +-
 include/linux/hrtimer.h                       |   2 +-
 include/linux/kvm_irqfd.h                     |   2 +-
 include/linux/lockdep.h                       |  18 +
 include/linux/sched.h                         |   2 +-
 include/linux/seqlock.h                       | 872 ++++++++++++++----
 include/linux/seqlock_types_internal.h        | 186 ++++
 include/net/netfilter/nf_conntrack.h          |   2 +-
 init/init_task.c                              |   3 +-
 kernel/fork.c                                 |   2 +-
 kernel/time/hrtimer.c                         |  13 +-
 kernel/time/timekeeping.c                     |  19 +-
 lib/Kconfig.debug                             |   1 +
 net/netfilter/nf_conntrack_core.c             |   5 +-
 net/netfilter/nft_set_rbtree.c                |   4 +-
 net/xfrm/xfrm_policy.c                        |  10 +-
 virt/kvm/eventfd.c                            |   2 +-
 32 files changed, 1211 insertions(+), 225 deletions(-)
 create mode 100644 Documentation/locking/seqlock.rst
 create mode 100644 include/linux/seqlock_types_internal.h

base-commit: 997e89fa345e9006f311cf9f9c8fd9f7d96c240f
--
2.20.1

^ permalink raw reply	[flat|nested] 258+ messages in thread

* [PATCH v3 01/20] Documentation: locking: Describe seqlock design and usage
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-07-06 21:04     ` Peter Zijlstra
  2020-06-30  5:44   ` [PATCH v3 02/20] seqlock: Properly format kernel-doc code samples Ahmed S. Darwish
                     ` (18 subsequent siblings)
  19 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc

Proper documentation for the design and usage of sequence counters and
sequential locks does not exist. Complete the seqlock.h documentation as
follows:

  - Divide all documentation on a seqcount_t vs. seqlock_t basis. The
    description for both mechanisms was intermingled, which is incorrect
    since the usage constrains for each type are vastly different.

  - Add an introductory paragraph describing the internal design of, and
    rationale for, sequence counters.

  - Document seqcount_t writer non-preemptibility requirement, which was
    not previously documented anywhere, and provide a clear rationale.

  - Provide template code for seqcount_t and seqlock_t initialization
    and reader/writer critical sections.

  - Recommend using seqlock_t by default. It implicitly handles the
    serialization and non-preemptibility requirements of writers.

At seqlock.h:

  - Remove references to brlocks as they've long been removed from the
    kernel.

  - Remove references to gcc-3.x since the kernel's minimum supported
    gcc version is 4.6.

References: 0f6ed63b1707 ("no need to keep brlock macros anymore...")
References: cafa0010cd51 ("Raise the minimum required gcc version to 4.6")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 Documentation/locking/index.rst   |   1 +
 Documentation/locking/seqlock.rst | 184 ++++++++++++++++++++++++++++++
 include/linux/seqlock.h           |  77 ++++++-------
 3 files changed, 221 insertions(+), 41 deletions(-)
 create mode 100644 Documentation/locking/seqlock.rst

diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
index d785878cad65..7003bd5aeff4 100644
--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -14,6 +14,7 @@ locking
     mutex-design
     rt-mutex-design
     rt-mutex
+    seqlock
     spinlocks
     ww-mutex-design
     preempt-locking
diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
new file mode 100644
index 000000000000..c9916efe038e
--- /dev/null
+++ b/Documentation/locking/seqlock.rst
@@ -0,0 +1,184 @@
+======================================
+Sequence counters and sequential locks
+======================================
+
+Introduction
+============
+
+Sequence counters are a reader-writer consistency mechanism with
+lockless readers (read-only retry loops), and no writer starvation. They
+are used for data that's rarely written to (e.g. system time), where the
+reader wants a consistent set of information and is willing to retry if
+that information changes.
+
+A data set is consistent when the sequence count at the beginning of the
+read side critical section is even and the same sequence count value is
+read again at the end of the critical section. The data in the set must
+be copied out inside the read side critical section. If the sequence
+count has changed between the start and the end of the critical section,
+the reader must retry.
+
+Writers increment the sequence count at the start and the end of their
+critical section. After starting the critical section the sequence count
+is odd and indicates to the readers that an update is in progress. At
+the end of the write side critical section the sequence count becomes
+even again which lets readers make progress.
+
+A sequence counter write side critical section must never be preempted
+or interrupted by read side sections. Otherwise the reader will spin for
+the entire scheduler tick due to the odd sequence count value and the
+interrupted writer. If that reader belongs to a real-time scheduling
+class, it can spin forever and the kernel will livelock.
+
+This mechanism cannot be used if the protected data contains pointers,
+as the writer can invalidate a pointer that the reader is following.
+
+.. _seqcount_t:
+
+Sequence counters (:c:type:`seqcount_t`)
+========================================
+
+This is the raw counting mechanism, which does not protect against
+multiple writers.  Write side critical sections must thus be serialized
+by an external lock.
+
+If the write serialization primitive does not implicitly disable
+preemption, preemption must be explicitly disabled before entering the
+write side section. If the read section can be invoked from hardirq or
+softirq contexts, interrupts or bottom halves must also be respectively
+disabled before entering the write section.
+
+If it's desired to automatically handle the sequence counter
+requirements of writer serialization and non-preemptibility, use a
+:ref:`sequential lock <seqlock_t>` instead.
+
+Initialization:
+
+.. code-block:: c
+
+	/* dynamic */
+	seqcount_t foo_seqcount;
+	seqcount_init(&foo_seqcount);
+
+	/* static */
+	static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
+
+	/* C99 struct init */
+	struct {
+		.seq   = SEQCNT_ZERO(foo.seq),
+	} foo;
+
+Write path:
+
+.. code-block:: c
+
+	/* Serialized context with disabled preemption */
+
+	write_seqcount_begin(&foo_seqcount);
+
+	/* ... [[write-side critical section]] ... */
+
+	write_seqcount_end(&foo_seqcount);
+
+Read path:
+
+.. code-block:: c
+
+	do {
+		seq = read_seqcount_begin(&foo_seqcount);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (read_seqcount_retry(&foo_seqcount, seq));
+
+.. _seqlock_t:
+
+Sequential locks (:c:type:`seqlock_t`)
+======================================
+
+This contains the :ref:`sequence counting mechanism <seqcount_t>`
+earlier discussed, plus an embedded spinlock for writer serialization
+and non-preemptibility.
+
+If the read side section can be invoked from hardirq or softirq context,
+use the write side function variants which disable interrupts or bottom
+halves respectively.
+
+Initialization:
+
+.. code-block:: c
+
+	/* dynamic */
+	seqlock_t foo_seqlock;
+	seqlock_init(&foo_seqlock);
+
+	/* static */
+	static DEFINE_SEQLOCK(foo_seqlock);
+
+	/* C99 struct init */
+	struct {
+		.seql   = __SEQLOCK_UNLOCKED(foo.seql)
+	} foo;
+
+Write path:
+
+.. code-block:: c
+
+	write_seqlock(&foo_seqlock);
+
+	/* ... [[write-side critical section]] ... */
+
+	write_sequnlock(&foo_seqlock);
+
+Read path, three categories:
+
+1. Normal sequence readers, which never block a writer but must retry
+   if a writer is in progress, as detected by a change in the sequence
+   number.  Writers do not wait for a sequence reader.
+
+   .. code-block:: c
+
+	do {
+		seq = read_seqbegin(&foo_seqlock);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (read_seqretry(&foo_seqlock, seq));
+
+2. Locking readers which will wait if a writer or another locking reader
+   is in progress. A locking reader in progress will also block a writer
+   from entering its critical section. This read lock is
+   exclusive. Unlike rwlock_t, only one locking reader can acquire it.
+
+   .. code-block:: c
+
+	read_seqlock_excl(&foo_seqlock);
+
+	/* ... [[read-side critical section]] ... */
+
+	read_sequnlock_excl(&foo_seqlock);
+
+3. Conditional lockless reader (as in 1), or locking reader (as in 2),
+   according to a passed marker. This is used to avoid lockless reader
+   starvation (too many retry loops) in case of a sharp spike in write
+   activity. First, a lockless read is tried (even marker passed). If
+   that trial fails (odd sequence counter is returned, which is used as
+   the next iteration marker), the lockless read is transformed to a
+   full locking read and no retry loop is necessary.
+
+   .. code-block:: c
+
+	/* marker; even initialization */
+	int seq = 0;
+	do {
+		read_seqbegin_or_lock(&foo_seqlock, &seq);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (need_seqretry(&foo_seqlock, seq));
+	done_seqretry(&foo_seqlock, seq);
+
+API documentation
+=================
+
+.. kernel-doc:: include/linux/seqlock.h
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 8b97204f35a7..e54ff48e87f8 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -1,36 +1,15 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef __LINUX_SEQLOCK_H
 #define __LINUX_SEQLOCK_H
+
 /*
- * Reader/writer consistent mechanism without starving writers. This type of
- * lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes. There are two types
- * of readers:
- * 1. Sequence readers which never block a writer but they may have to retry
- *    if a writer is in progress by detecting change in sequence number.
- *    Writers do not wait for a sequence reader.
- * 2. Locking readers which will wait if a writer or another locking reader
- *    is in progress. A locking reader in progress will also block a writer
- *    from going forward. Unlike the regular rwlock, the read lock here is
- *    exclusive so that only one locking reader can get it.
+ * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
+ * lockless readers (read-only retry loops), and no writer starvation.
  *
- * This is not as cache friendly as brlock. Also, this may not work well
- * for data that contains pointers, because any writer could
- * invalidate a pointer that a reader was following.
+ * See Documentation/locking/seqlock.rst for full description.
  *
- * Expected non-blocking reader usage:
- * 	do {
- *	    seq = read_seqbegin(&foo);
- * 	...
- *      } while (read_seqretry(&foo, seq));
- *
- *
- * On non-SMP the spin locks disappear but the writer still needs
- * to increment the sequence variables because an interrupt routine could
- * change the state of the data.
- *
- * Based on x86_64 vsyscall gettimeofday 
- * by Keith Owens and Andrea Arcangeli
+ * Copyrights:
+ * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
  */
 
 #include <linux/spinlock.h>
@@ -41,8 +20,8 @@
 #include <asm/processor.h>
 
 /*
- * The seqlock interface does not prescribe a precise sequence of read
- * begin/retry/end. For readers, typically there is a call to
+ * The seqlock seqcount_t interface does not prescribe a precise sequence of
+ * read begin/retry/end. For readers, typically there is a call to
  * read_seqcount_begin() and read_seqcount_retry(), however, there are more
  * esoteric cases which do not follow this pattern.
  *
@@ -56,10 +35,24 @@
 #define KCSAN_SEQLOCK_REGION_MAX 1000
 
 /*
- * Version using sequence counter only.
- * This can be used when code has its own mutex protecting the
- * updating starting before the write_seqcountbeqin() and ending
- * after the write_seqcount_end().
+ * Sequence counters (seqcount_t)
+ *
+ * This is the raw counting mechanism, without any writer protection.
+ *
+ * Write side critical sections must be serialized and non-preemptible.
+ *
+ * If readers can be invoked from hardirq or softirq contexts,
+ * interrupts or bottom halves must also be respectively disabled before
+ * entering the write section.
+ *
+ * This mechanism can't be used if the protected data contains pointers,
+ * as the writer can invalidate a pointer that a reader is following.
+ *
+ * If it's desired to automatically handle the sequence counter writer
+ * serialization and non-preemptibility requirements, use a sequential
+ * lock (seqlock_t) instead.
+ *
+ * See Documentation/locking/seqlock.rst
  */
 typedef struct seqcount {
 	unsigned sequence;
@@ -398,10 +391,6 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
        smp_wmb();      /* increment "sequence" before following stores */
 }
 
-/*
- * Sequence counter only version assumes that callers are using their
- * own mutexing.
- */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
 	raw_write_seqcount_begin(s);
@@ -434,15 +423,21 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+/*
+ * Sequential locks (seqlock_t)
+ *
+ * Sequence counters with an embedded spinlock for writer serialization
+ * and non-preemptibility.
+ *
+ * For more info, see:
+ *   - Comments on top of seqcount_t
+ *   - Documentation/locking/seqlock.rst
+ */
 typedef struct {
 	struct seqcount seqcount;
 	spinlock_t lock;
 } seqlock_t;
 
-/*
- * These macros triggered gcc-3.x compile-time problems.  We think these are
- * OK now.  Be cautious.
- */
 #define __SEQLOCK_UNLOCKED(lockname)			\
 	{						\
 		.seqcount = SEQCNT_ZERO(lockname),	\
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread
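
As a concrete illustration of the seqcount_t requirements documented in
the patch above, a minimal writer/reader pair could look as follows (a
sketch only; foo_lock, foo_seq and foo_value are hypothetical names, and
the spinlock both serializes writers and disables preemption):

	static DEFINE_SPINLOCK(foo_lock);
	static seqcount_t foo_seq = SEQCNT_ZERO(foo_seq);
	static u64 foo_value;

	static void foo_set(u64 v)
	{
		spin_lock(&foo_lock);
		write_seqcount_begin(&foo_seq);
		foo_value = v;
		write_seqcount_end(&foo_seq);
		spin_unlock(&foo_lock);
	}

	static u64 foo_get(void)
	{
		unsigned seq;
		u64 val;

		do {
			seq = read_seqcount_begin(&foo_seq);
			val = foo_value;
		} while (read_seqcount_retry(&foo_seq, seq));

		return val;
	}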

* [PATCH v3 02/20] seqlock: Properly format kernel-doc code samples
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 01/20] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 03/20] seqlock: Add missing kernel-doc annotations Ahmed S. Darwish
                     ` (17 subsequent siblings)
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc

Align the code samples and note sections inside kernel-doc comments with
tabs. This way they can be properly parsed and rendered by Sphinx. It
also makes the code samples easier to read from text editors.
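
For illustration, a kernel-doc comment carrying a Sphinx literal block
marks it with "::" and indents the sample with tabs (a minimal sketch;
foo_get() and its members are hypothetical, not functions from this
file):

	/**
	 * foo_get() - sample the protected value
	 * @foo: Pointer to the foo instance
	 *
	 * The read path is the usual retry loop::
	 *
	 *	do {
	 *		seq = read_seqcount_begin(&foo->seq);
	 *		val = foo->value;
	 *	} while (read_seqcount_retry(&foo->seq, seq));
	 */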

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 82 +++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 39 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index e54ff48e87f8..d3bba59eb4df 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -256,7 +256,7 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  *
  * This can be used to provide an ordering guarantee instead of the
  * usual consistency guarantee. It is one wmb cheaper, because we can
- * collapse the two back-to-back wmb()s.
+ * collapse the two back-to-back wmb()s::
  *
  * Note that writes surrounding the barrier should be declared atomic (e.g.
  * via WRITE_ONCE): a) to ensure the writes become visible to other threads
@@ -325,64 +325,68 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  * Very simply put: we first modify one copy and then the other. This ensures
  * there is always one copy in a stable state, ready to give us an answer.
  *
- * The basic form is a data structure like:
+ * The basic form is a data structure like::
  *
- * struct latch_struct {
- *	seqcount_t		seq;
- *	struct data_struct	data[2];
- * };
+ *	struct latch_struct {
+ *		seqcount_t		seq;
+ *		struct data_struct	data[2];
+ *	};
  *
  * Where a modification, which is assumed to be externally serialized, does the
- * following:
+ * following::
  *
- * void latch_modify(struct latch_struct *latch, ...)
- * {
- *	smp_wmb();	<- Ensure that the last data[1] update is visible
- *	latch->seq++;
- *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *	void latch_modify(struct latch_struct *latch, ...)
+ *	{
+ *		smp_wmb();	// Ensure that the last data[1] update is visible
+ *		latch->seq++;
+ *		smp_wmb();	// Ensure that the seqcount update is visible
  *
- *	modify(latch->data[0], ...);
+ *		modify(latch->data[0], ...);
  *
- *	smp_wmb();	<- Ensure that the data[0] update is visible
- *	latch->seq++;
- *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *		smp_wmb();	// Ensure that the data[0] update is visible
+ *		latch->seq++;
+ *		smp_wmb();	// Ensure that the seqcount update is visible
  *
- *	modify(latch->data[1], ...);
- * }
+ *		modify(latch->data[1], ...);
+ *	}
  *
- * The query will have a form like:
+ * The query will have a form like::
  *
- * struct entry *latch_query(struct latch_struct *latch, ...)
- * {
- *	struct entry *entry;
- *	unsigned seq, idx;
+ *	struct entry *latch_query(struct latch_struct *latch, ...)
+ *	{
+ *		struct entry *entry;
+ *		unsigned seq, idx;
  *
- *	do {
- *		seq = raw_read_seqcount_latch(&latch->seq);
+ *		do {
+ *			seq = raw_read_seqcount_latch(&latch->seq);
  *
- *		idx = seq & 0x01;
- *		entry = data_query(latch->data[idx], ...);
+ *			idx = seq & 0x01;
+ *			entry = data_query(latch->data[idx], ...);
  *
- *		smp_rmb();
- *	} while (seq != latch->seq);
+ *			smp_rmb();
+ *		} while (seq != latch->seq);
  *
- *	return entry;
- * }
+ *		return entry;
+ *	}
  *
  * So during the modification, queries are first redirected to data[1]. Then we
  * modify data[0]. When that is complete, we redirect queries back to data[0]
  * and we can modify data[1].
  *
- * NOTE: The non-requirement for atomic modifications does _NOT_ include
- *       the publishing of new entries in the case where data is a dynamic
- *       data structure.
+ * NOTE:
  *
- *       An iteration might start in data[0] and get suspended long enough
- *       to miss an entire modification sequence, once it resumes it might
- *       observe the new entry.
+ *	The non-requirement for atomic modifications does _NOT_ include
+ *	the publishing of new entries in the case where data is a dynamic
+ *	data structure.
  *
- * NOTE: When data is a dynamic data structure; one should use regular RCU
- *       patterns to manage the lifetimes of the objects within.
+ *	An iteration might start in data[0] and get suspended long enough
+ *	to miss an entire modification sequence; once it resumes it might
+ *	observe the new entry.
+ *
+ * NOTE:
+ *
+ *	When data is a dynamic data structure, one should use regular RCU
+ *	patterns to manage the lifetimes of the objects within.
  */
 static inline void raw_write_seqcount_latch(seqcount_t *s)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 03/20] seqlock: Add missing kernel-doc annotations
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 01/20] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 02/20] seqlock: Properly format kernel-doc code samples Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 04/20] lockdep: Add preemption enabled/disabled assertion APIs Ahmed S. Darwish
                     ` (16 subsequent siblings)
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc

A small number of the exported seqlock.h functions are kernel-doc
annotated.

Since seqlock.h is now included by the kernel's RST documentation, add
kernel-doc annotations for all of the remaining functions.
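
For reference, the annotations follow the standard kernel-doc anatomy
(a generic sketch, not one of the functions touched here):

	/**
	 * function_name() - Brief one-line description
	 * @arg1: Description of the first parameter
	 * @arg2: Description of the second parameter
	 *
	 * Free-form description of the function's behavior and its
	 * context requirements.
	 *
	 * Return: Description of the return value
	 */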

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 398 ++++++++++++++++++++++++++++++++++------
 1 file changed, 339 insertions(+), 59 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index d3bba59eb4df..057f7326a877 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -75,6 +75,10 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
 # define SEQCOUNT_DEP_MAP_INIT(lockname) \
 		.dep_map = { .name = #lockname } \
 
+/**
+ * seqcount_init() - runtime initializer for seqcount_t
+ * @s: Pointer to the &typedef seqcount_t instance
+ */
 # define seqcount_init(s)				\
 	do {						\
 		static struct lock_class_key __key;	\
@@ -98,13 +102,17 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
 # define seqcount_lockdep_reader_access(x)
 #endif
 
-#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)}
+/**
+ * SEQCNT_ZERO() - static initializer for seqcount_t
+ * @name: Name of the &typedef seqcount_t instance
+ */
+#define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
 
 
 /**
- * __read_seqcount_begin - begin a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * __read_seqcount_begin() - begin a seqcount_t read section (without barrier)
+ * @s: Pointer to &typedef seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
  * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
  * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@@ -129,13 +137,14 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount - Read the raw seqcount
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * raw_read_seqcount() - Read the seqcount_t raw counter value
+ * @s: Pointer to &typedef seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
  * raw_read_seqcount opens a read critical section of the given
- * seqcount without any lockdep checking and without checking or
- * masking the LSB. Calling code is responsible for handling that.
+ * seqcount_t, without any lockdep checks and without checking or
+ * masking the sequence counter LSB. Calling code is responsible for
+ * handling that.
  */
 static inline unsigned raw_read_seqcount(const seqcount_t *s)
 {
@@ -146,13 +155,13 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount_begin - start seq-read critical section w/o lockdep
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * raw_read_seqcount_begin() - start a seqcount_t read section w/o lockdep
+ * @s: Pointer to &typedef seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
  * raw_read_seqcount_begin opens a read critical section of the given
- * seqcount, but without any lockdep checking. Validity of the critical
- * section is tested by checking read_seqcount_retry function.
+ * seqcount_t, but without any lockdep checking. Validity of the read
+ * section must be checked with read_seqcount_retry().
  */
 static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 {
@@ -162,13 +171,13 @@ static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * read_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * read_seqcount_begin() - start a seqcount_t read critical section
+ * @s: Pointer to &typedef seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
- * read_seqcount_begin opens a read critical section of the given seqcount.
- * Validity of the critical section is tested by checking read_seqcount_retry
- * function.
+ * read_seqcount_begin opens a read critical section of the given
+ * seqcount_t. Validity of the read section must be checked with
+ * read_seqcount_retry().
  */
 static inline unsigned read_seqcount_begin(const seqcount_t *s)
 {
@@ -177,11 +186,11 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * raw_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
+ * raw_seqcount_begin() - begin a seq-read critical section
+ * @s: Pointer to &typedef seqcount_t
  * Returns: count to be passed to read_seqcount_retry
  *
- * raw_seqcount_begin opens a read critical section of the given seqcount.
+ * raw_seqcount_begin opens a read critical section of the given seqcount_t.
  * Validity of the critical section is tested by checking read_seqcount_retry
  * function.
  *
@@ -199,8 +208,8 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * __read_seqcount_retry - end a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
+ * __read_seqcount_retry() - end a seq-read critical section (without barrier)
+ * @s: Pointer to &typedef seqcount_t
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
@@ -219,12 +228,12 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
 }
 
 /**
- * read_seqcount_retry - end a seq-read critical section
- * @s: pointer to seqcount_t
+ * read_seqcount_retry() - end a seq-read critical section
+ * @s: Pointer to &typedef seqcount_t
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
- * read_seqcount_retry closes a read critical section of the given seqcount.
+ * read_seqcount_retry closes a read critical section of given seqcount_t.
  * If the critical section was invalid, it must be ignored (and typically
  * retried).
  */
@@ -251,8 +260,8 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 }
 
 /**
- * raw_write_seqcount_barrier - do a seq write barrier
- * @s: pointer to seqcount_t
+ * raw_write_seqcount_barrier() - do a seq write barrier
+ * @s: Pointer to &typedef seqcount_t
  *
  * This can be used to provide an ordering guarantee instead of the
  * usual consistency guarantee. It is one wmb cheaper, because we can
@@ -300,6 +309,21 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+/**
+ * raw_read_seqcount_latch() - pick even or odd seqcount_t latch data copy
+ * @s: Pointer to &typedef seqcount_t
+ *
+ * Use seqcount latching to switch between two storage places with
+ * sequence protection to allow interruptible, preemptible, writer
+ * sections.
+ *
+ * Check raw_write_seqcount_latch() for more details and a full reader
+ * and writer usage example.
+ *
+ * Return: sequence counter. Use the lowest bit as index for picking
+ * which data copy to read. Full counter must then be checked with
+ * read_seqcount_retry().
+ */
 static inline int raw_read_seqcount_latch(seqcount_t *s)
 {
 	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
@@ -308,8 +332,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
 }
 
 /**
- * raw_write_seqcount_latch - redirect readers to even/odd copy
- * @s: pointer to seqcount_t
+ * raw_write_seqcount_latch() - redirect readers to even/odd copy
+ * @s: Pointer to &typedef seqcount_t
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -363,8 +387,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  *			idx = seq & 0x01;
  *			entry = data_query(latch->data[idx], ...);
  *
- *			smp_rmb();
- *		} while (seq != latch->seq);
+ *			// read_seqcount_retry() includes necessary smp_rmb()
+ *		} while (read_seqcount_retry(&latch->seq, seq));
  *
  *		return entry;
  *	}
@@ -401,11 +425,27 @@ static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
+/**
+ * write_seqcount_begin() - start a seqcount_t write-side critical section
+ * @s: Pointer to &typedef seqcount_t
+ *
+ * write_seqcount_begin opens a write-side critical section of the given
+ * seqcount_t. Sequence counter write-side critical sections must be
+ * serialized and non-preemptible. If readers can be invoked from
+ * hardirq or softirq contexts, interrupts or bottom halves must be
+ * respectively disabled.
+ */
 static inline void write_seqcount_begin(seqcount_t *s)
 {
 	write_seqcount_begin_nested(s, 0);
 }
 
+/**
+ * write_seqcount_end() - end a seqcount_t write-side critical section
+ * @s: Pointer to &typedef seqcount_t
+ *
+ * The write section must've been opened with write_seqcount_begin().
+ */
 static inline void write_seqcount_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
@@ -413,8 +453,8 @@ static inline void write_seqcount_end(seqcount_t *s)
 }
 
 /**
- * write_seqcount_invalidate - invalidate in-progress read-side seq operations
- * @s: pointer to seqcount_t
+ * write_seqcount_invalidate() - invalidate in-progress read-side seq operations
+ * @s: Pointer to &typedef seqcount_t
  *
  * After write_seqcount_invalidate, no read-side seq operations will complete
  * successfully and see data older than this.
@@ -448,17 +488,32 @@ typedef struct {
 		.lock =	__SPIN_LOCK_UNLOCKED(lockname)	\
 	}
 
-#define seqlock_init(x)					\
+/**
+ * seqlock_init() - dynamic initializer for seqlock_t
+ * @sl: Pointer to the &typedef seqlock_t instance
+ */
+#define seqlock_init(sl)				\
 	do {						\
-		seqcount_init(&(x)->seqcount);		\
-		spin_lock_init(&(x)->lock);		\
+		seqcount_init(&(sl)->seqcount);		\
+		spin_lock_init(&(sl)->lock);		\
 	} while (0)
 
-#define DEFINE_SEQLOCK(x) \
-		seqlock_t x = __SEQLOCK_UNLOCKED(x)
+/**
+ * DEFINE_SEQLOCK() - Define a statically-allocated seqlock_t
+ * @sl: Name of the &typedef seqlock_t instance
+ */
+#define DEFINE_SEQLOCK(sl) \
+		seqlock_t sl = __SEQLOCK_UNLOCKED(sl)
 
-/*
- * Read side functions for starting and finalizing a read side section.
+/**
+ * read_seqbegin() - start a seqlock_t read-side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_seqbegin opens a read side critical section of the given
+ * seqlock_t. Validity of the critical section is tested by checking
+ * read_seqretry().
+ *
+ * Return: count to be passed to read_seqretry()
  */
 static inline unsigned read_seqbegin(const seqlock_t *sl)
 {
@@ -469,6 +524,17 @@ static inline unsigned read_seqbegin(const seqlock_t *sl)
 	return ret;
 }
 
+/**
+ * read_seqretry() - end and validate a seqlock_t read side section
+ * @sl: Pointer to &typedef seqlock_t
+ * @start: count, from read_seqbegin()
+ *
+ * read_seqretry closes the given seqlock_t read side critical section,
+ * and checks its validity. If the read section was invalid, it must be
+ * ignored and retried.
+ *
+ * Return: 1 if a retry is required, 0 otherwise
+ */
 static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 {
 	/*
@@ -480,10 +546,19 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 	return read_seqcount_retry(&sl->seqcount, start);
 }
 
-/*
- * Lock out other writers and update the count.
- * Acts like a normal spin_lock/unlock.
- * Don't need preempt_disable() because that is in the spin_lock already.
+/**
+ * write_seqlock() - start a seqlock_t write side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_seqlock opens a write side critical section of the given
+ * seqlock_t.  It also acquires the spinlock_t embedded inside the
+ * sequential lock. All the seqlock_t write side critical sections are
+ * thus automatically serialized and non-preemptible.
+ *
+ * Use the ``_irqsave`` and ``_bh`` variants instead if the read side
+ * can be invoked from a hardirq or softirq context.
+ *
+ * The opened write side section must be closed with write_sequnlock().
  */
 static inline void write_seqlock(seqlock_t *sl)
 {
@@ -491,30 +566,68 @@ static inline void write_seqlock(seqlock_t *sl)
 	write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock() - end a seqlock_t write side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_sequnlock closes the (serialized and non-preemptible) write
+ * side critical section of given seqlock_t.
+ */
 static inline void write_sequnlock(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
 	spin_unlock(&sl->lock);
 }
 
+/**
+ * write_seqlock_bh() - start a softirqs-disabled seqlock_t write section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_bh`` variant of write_seqlock(). Use only if the read side section
+ * can be invoked from a softirq context.
+ *
+ * The opened write section must be closed with write_sequnlock_bh().
+ */
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
 	write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock_bh() - end a softirqs-disabled seqlock_t write section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_sequnlock_bh closes the serialized, non-preemptible,
+ * softirqs-disabled, seqlock_t write side critical section opened with
+ * write_seqlock_bh().
+ */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
 	spin_unlock_bh(&sl->lock);
 }
 
+/**
+ * write_seqlock_irq() - start a non-interruptible seqlock_t write side section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * This is the ``_irq`` variant of write_seqlock(). Use only if the read
+ * section of given seqlock_t can be invoked from a hardirq context.
+ */
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
 	write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock_irq() - end a non-interruptible seqlock_t write side section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_irq`` variant of write_sequnlock(). The write side section of
+ * given seqlock_t must've been opened with write_seqlock_irq().
+ */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
@@ -530,9 +643,28 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	return flags;
 }
 
+/**
+ * write_seqlock_irqsave() - start a non-interruptible seqlock_t write section
+ * @lock:  Pointer to &typedef seqlock_t
+ * @flags: Stack-allocated storage for saving caller's local interrupt
+ *         state, to be passed to write_sequnlock_irqrestore().
+ *
+ * ``_irqsave`` variant of write_seqlock(). Use if the read section of
+ * given seqlock_t can be invoked from a hardirq context.
+ *
+ * The opened write section must be closed with write_sequnlock_irqrestore().
+ */
 #define write_seqlock_irqsave(lock, flags)				\
 	do { flags = __write_seqlock_irqsave(lock); } while (0)
 
+/**
+ * write_sequnlock_irqrestore() - end non-interruptible seqlock_t write section
+ * @sl:    Pointer to &typedef seqlock_t
+ * @flags: Caller's saved interrupt state, from write_seqlock_irqsave()
+ *
+ * ``_irqrestore`` variant of write_sequnlock(). The write section of
+ * given seqlock_t must've been opened with write_seqlock_irqsave().
+ */
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
@@ -540,30 +672,64 @@ write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
-/*
- * A locking reader exclusively locks out other writers and locking readers,
- * but doesn't update the sequence number. Acts like a normal spin_lock/unlock.
- * Don't need preempt_disable() because that is in the spin_lock already.
+/**
+ * read_seqlock_excl() - begin a seqlock_t locking reader critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_seqlock_excl opens a locking reader critical section for the
+ * given seqlock_t. A locking reader exclusively locks out other writers
+ * and other *locking* readers, but doesn't update the sequence number.
+ *
+ * Locking readers act like a normal spin_lock()/spin_unlock().
+ *
+ * The opened read side section must be closed with read_sequnlock_excl().
  */
 static inline void read_seqlock_excl(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl() - end a seqlock_t locking reader critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_sequnlock_excl closes the locking reader critical section opened
+ * with read_seqlock_excl().
+ */
 static inline void read_sequnlock_excl(seqlock_t *sl)
 {
 	spin_unlock(&sl->lock);
 }
 
 /**
- * read_seqbegin_or_lock - begin a sequence number check or locking block
- * @lock: sequence lock
- * @seq : sequence number to be checked
- *
- * First try it once optimistically without taking the lock. If that fails,
- * take the lock. The sequence number is also used as a marker for deciding
- * whether to be a reader (even) or writer (odd).
- * N.B. seq must be initialized to an even number to begin with.
+ * read_seqbegin_or_lock() - begin a seqlock_t lockless or locking reader
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq : Marker and return parameter. If the passed value is even, the
+ * reader will become a *lockless* seqlock_t sequence counter reader as
+ * in read_seqbegin(). If the passed value is odd, the reader will
+ * become a fully locking reader, as in read_seqlock_excl().  In the
+ * first call to read_seqbegin_or_lock(), the caller **must** initialize
+ * and pass an even value to @seq so a lockless read is optimistically
+ * tried first.
+ *
+ * read_seqbegin_or_lock is an API designed to optimistically try a
+ * normal lockless seqlock_t read section first, as in read_seqbegin().
+ * If an odd counter is found, the normal lockless read trial has
+ * failed, and the next reader iteration transforms to a full seqlock_t
+ * locking reader as in read_seqlock_excl().
+ *
+ * This is typically used to avoid starvation of lockless seqlock_t
+ * readers (too many retry loops) in the case of a sharp spike in
+ * write activity.
+ *
+ * The opened read section must be closed with done_seqretry().  Check
+ * Documentation/locking/seqlock.rst for template example code.
+ *
+ * Return: The encountered sequence counter value, returned through the
+ * @seq parameter, which is overloaded as a return parameter. The
+ * returned value must be checked with need_seqretry(). If the read
+ * section must be retried, the returned value must also be passed to
+ * the @seq parameter of the next read_seqbegin_or_lock() iteration.
  */
 static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 {
@@ -573,32 +739,90 @@ static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 		read_seqlock_excl(lock);
 }
 
+/**
+ * need_seqretry() - validate seqlock_t "locking or lockless" reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: count, from read_seqbegin_or_lock()
+ *
+ * need_seqretry checks if the seqlock_t read-side critical section
+ * started with read_seqbegin_or_lock() is valid. If it was not, the
+ * caller must retry the read-side section.
+ *
+ * Return: 1 if a retry is required, 0 otherwise
+ */
 static inline int need_seqretry(seqlock_t *lock, int seq)
 {
 	return !(seq & 1) && read_seqretry(lock, seq);
 }
 
+/**
+ * done_seqretry() - end seqlock_t "locking or lockless" reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: count, from read_seqbegin_or_lock()
+ *
+ * done_seqretry finishes the seqlock_t read side critical section
+ * started by read_seqbegin_or_lock(). The read section must've been
+ * already validated with need_seqretry().
+ */
 static inline void done_seqretry(seqlock_t *lock, int seq)
 {
 	if (seq & 1)
 		read_sequnlock_excl(lock);
 }
 
+/**
+ * read_seqlock_excl_bh() - start a locking reader seqlock_t section
+ *			    with softirqs disabled
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_bh`` variant of read_seqlock_excl(). Use this variant if the
+ * seqlock_t write side section, *or other read sections*, can be
+ * invoked from a softirq context.
+ *
+ * The opened section must be closed with read_sequnlock_excl_bh().
+ */
 static inline void read_seqlock_excl_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl_bh() - stop a seqlock_t softirq-disabled locking
+ *			      reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_bh`` variant of read_sequnlock_excl(). The closed section must've
+ * been opened with read_seqlock_excl_bh().
+ */
 static inline void read_sequnlock_excl_bh(seqlock_t *sl)
 {
 	spin_unlock_bh(&sl->lock);
 }
 
+/**
+ * read_seqlock_excl_irq() - start a non-interruptible seqlock_t locking
+ *			     reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_irq`` variant of read_seqlock_excl(). Use this only if the
+ * seqlock_t write side critical section, *or other read side sections*,
+ * can be invoked from a hardirq context.
+ *
+ * The opened read section must be closed with read_sequnlock_excl_irq().
+ */
 static inline void read_seqlock_excl_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl_irq() - end an interrupts-disabled seqlock_t
+ *                             locking reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_irq`` variant of read_sequnlock_excl(). The closed section must've
+ * been opened with read_seqlock_excl_irq().
+ */
 static inline void read_sequnlock_excl_irq(seqlock_t *sl)
 {
 	spin_unlock_irq(&sl->lock);
@@ -612,15 +836,59 @@ static inline unsigned long __read_seqlock_excl_irqsave(seqlock_t *sl)
 	return flags;
 }
 
+/**
+ * read_seqlock_excl_irqsave() - start a non-interruptible seqlock_t
+ *				 locking reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @flags: Stack-allocated storage for saving caller's local interrupt
+ *         state, to be passed to read_sequnlock_excl_irqrestore().
+ *
+ * ``_irqsave`` variant of read_seqlock_excl(). Use this only if the
+ * seqlock_t write side critical section, *or other read side sections*,
+ * can be invoked from a hardirq context.
+ *
+ * Opened section must be closed with read_sequnlock_excl_irqrestore().
+ */
 #define read_seqlock_excl_irqsave(lock, flags)				\
 	do { flags = __read_seqlock_excl_irqsave(lock); } while (0)
 
+/**
+ * read_sequnlock_excl_irqrestore() - end non-interruptible seqlock_t
+ *				      locking reader section
+ * @sl: Pointer to &typedef seqlock_t
+ * @flags: Caller's saved interrupt state, from
+ *	   read_seqlock_excl_irqsave()
+ *
+ * ``_irqrestore`` variant of read_sequnlock_excl(). The closed section
+ * must've been opened with read_seqlock_excl_irqsave().
+ */
 static inline void
 read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
 {
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
+/**
+ * read_seqbegin_or_lock_irqsave() - begin a seqlock_t lockless reader, or
+ *                                   a non-interruptible locking reader
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: Marker and return parameter. Check read_seqbegin_or_lock().
+ *
+ * This is the ``_irqsave`` variant of read_seqbegin_or_lock(). Use if
+ * the seqlock_t write side critical section, *or other read side sections*,
+ * can be invoked from hardirq context.
+ *
+ * The validity of the read section must be checked with need_seqretry().
+ * The opened section must be closed with done_seqretry_irqrestore().
+ *
+ * Return:
+ *
+ *   1. The saved local interrupts state in case of a locking reader, to be
+ *      passed to done_seqretry_irqrestore().
+ *
+ *   2. The encountered sequence counter value, returned through @seq which
+ *      is overloaded as a return parameter. Check read_seqbegin_or_lock().
+ */
 static inline unsigned long
 read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 {
@@ -634,6 +902,18 @@ read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 	return flags;
 }
 
+/**
+ * done_seqretry_irqrestore() - end a seqlock_t lockless reader, or a
+ *				non-interruptible locking reader section
+ * @lock:  Pointer to &typedef seqlock_t
+ * @seq:   Count, from read_seqbegin_or_lock_irqsave()
+ * @flags: Caller's saved local interrupt state in case of a locking
+ *	   reader, also from read_seqbegin_or_lock_irqsave()
+ *
+ * This is the ``_irqrestore`` variant of done_seqretry(). The read
+ * section must've been opened with read_seqbegin_or_lock_irqsave(), and
+ * validated with need_seqretry().
+ */
 static inline void
 done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread
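
As a usage illustration of the read_seqbegin_or_lock() family annotated
in the patch above, the canonical reader loop looks roughly like this
(a sketch; foo_lock is a hypothetical seqlock_t):

	int seq = 0;	/* must start even: lockless path is tried first */

	do {
		read_seqbegin_or_lock(&foo_lock, &seq);
		/* ... read the data protected by foo_lock ... */
	} while (need_seqretry(&foo_lock, seq));
	done_seqretry(&foo_lock, seq);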

* [PATCH v3 04/20] lockdep: Add preemption enabled/disabled assertion APIs
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (2 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 03/20] seqlock: Add missing kernel-doc annotations Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-07-06 20:50     ` Peter Zijlstra
  2020-06-30  5:44   ` [PATCH v3 05/20] seqlock: lockdep assert non-preemptibility on seqcount_t write Ahmed S. Darwish
                     ` (15 subsequent siblings)
  19 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

Asserting that preemption is enabled or disabled is a critical sanity
check.  Developers are usually reluctant to add such a check in a
fastpath as reading the preemption count can be costly.

Extend the lockdep API with macros asserting that preemption is disabled
or enabled. If lockdep is disabled, or if the underlying architecture
does not support kernel preemption, these assertions have no runtime
overhead.
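
For instance, per-CPU fastpath code that relies on not being migrated
could document and verify its context with the new macro (a sketch;
foo_events and foo_account_event() are hypothetical):

	static DEFINE_PER_CPU(u64, foo_events);

	static void foo_account_event(void)
	{
		/* Free of runtime cost when lockdep is disabled */
		lockdep_assert_preemption_disabled();
		__this_cpu_inc(foo_events);
	}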

References: f54bb2ec02c8 ("locking/lockdep: Add IRQs disabled/enabled assertion APIs: ...")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/lockdep.h | 18 ++++++++++++++++++
 lib/Kconfig.debug       |  1 +
 2 files changed, 19 insertions(+)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index fd04b9e96091..53eff6b26fac 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -548,6 +548,22 @@ do {									\
 	WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirq_context));	\
 } while (0)
 
+#define lockdep_assert_preemption_enabled()				\
+do {									\
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)	&&		\
+		     debug_locks			&&		\
+		     (preempt_count() != 0		||		\
+		      !this_cpu_read(hardirqs_enabled)));		\
+} while (0)
+
+#define lockdep_assert_preemption_disabled()				\
+do {									\
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)	&&		\
+		     debug_locks			&&		\
+		     (preempt_count() == 0		&&		\
+		      this_cpu_read(hardirqs_enabled)));		\
+} while (0)
+
 #else
 # define might_lock(lock) do { } while (0)
 # define might_lock_read(lock) do { } while (0)
@@ -556,6 +572,8 @@ do {									\
 # define lockdep_assert_irqs_enabled() do { } while (0)
 # define lockdep_assert_irqs_disabled() do { } while (0)
 # define lockdep_assert_in_irq() do { } while (0)
+# define lockdep_assert_preemption_enabled() do { } while (0)
+# define lockdep_assert_preemption_disabled() do { } while (0)
 #endif
 
 #ifdef CONFIG_PROVE_RAW_LOCK_NESTING
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d74ac0fd6b2d..e5e2e632b749 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1118,6 +1118,7 @@ config PROVE_LOCKING
 	select DEBUG_RWSEMS
 	select DEBUG_WW_MUTEX_SLOWPATH
 	select DEBUG_LOCK_ALLOC
+	select PREEMPT_COUNT if !ARCH_NO_PREEMPT
 	select TRACE_IRQFLAGS
 	default n
 	help
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 05/20] seqlock: lockdep assert non-preemptibility on seqcount_t write
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (3 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 04/20] lockdep: Add preemption enabled/disabled assertion APIs Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (14 subsequent siblings)
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

Preemption must be disabled before entering a sequence count write side
critical section.  Otherwise the seqcount read side can preempt the
write side section and spin for the entire scheduler tick.  If that
reader belongs to a real-time scheduling class, it can spin forever and
the kernel will livelock.

Assert through lockdep that preemption is disabled for seqcount writers.
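
For example, a writer serialized by a mutex, which keeps preemption
enabled, has to disable preemption explicitly or the new assertion will
trigger (a sketch; foo_mutex, foo_seq and foo_value are hypothetical):

	mutex_lock(&foo_mutex);

	preempt_disable();
	write_seqcount_begin(&foo_seq);	/* asserts non-preemptibility */
	foo_value++;
	write_seqcount_end(&foo_seq);
	preempt_enable();

	mutex_unlock(&foo_mutex);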

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 057f7326a877..679c440b17fe 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -419,12 +419,29 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
        smp_wmb();      /* increment "sequence" before following stores */
 }
 
-static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
+static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
 	raw_write_seqcount_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
+static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
+{
+	lockdep_assert_preemption_disabled();
+	__write_seqcount_begin_nested(s, subclass);
+}
+
+/*
+ * write_seqcount_begin() without lockdep non-preemptibility checks.
+ *
+ * Use for internal seqlock.h code where it's known that preemption is
+ * already disabled. For example, seqlock_t write side functions.
+ */
+static inline void __write_seqcount_begin(seqcount_t *s)
+{
+	__write_seqcount_begin_nested(s, 0);
+}
+
 /**
  * write_seqcount_begin() - start a seqcount_t write-side critical section
  * @s: Pointer to &typedef seqcount_t
@@ -563,7 +580,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -591,7 +608,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -618,7 +635,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -639,7 +656,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 	return flags;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (4 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 05/20] seqlock: lockdep assert non-preemptibility on seqcount_t write Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-07-06 21:21     ` Peter Zijlstra
  2020-06-30  5:44   ` [PATCH v3 07/20] dma-buf: Remove custom seqcount lockdep class key Ahmed S. Darwish
                     ` (13 subsequent siblings)
  19 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive is
not disabling preemption implicitly, preemption has to be explicitly
disabled before entering the write side critical section.

There is no built-in debugging mechanism to verify that the lock used
for writer serialization is held and preemption is disabled. Some usage
sites like dma-buf have explicit lockdep checks for the writer-side
lock, but this covers only a small portion of the sequence counter usage
in the kernel.

Add new sequence counter types which allow associating a lock with the
sequence counter at initialization time. The seqcount API functions are
extended to provide appropriate lockdep assertions depending on the
seqcount/lock type.

For sequence counters with associated locks that do not implicitly
disable preemption, preemption protection is enforced in the sequence
counter write side functions. This removes the need to explicitly add
preempt_disable/enable() around the write side critical sections: the
write_begin/end() functions for these new sequence counter types
automatically do this.

Introduce the following seqcount types with associated locks:

     seqcount_spinlock_t
     seqcount_raw_spinlock_t
     seqcount_rwlock_t
     seqcount_mutex_t
     seqcount_ww_mutex_t

Extend the seqcount read and write functions to branch out to the
specific seqcount_LOCKTYPE_t implementation at compile-time. This avoids
kernel API explosion with each new seqcount_LOCKTYPE_t added. Add such
compile-time type detection logic into a new, internal, seqlock header.

Document the proper seqcount_LOCKTYPE_t usage, and rationale, at
Documentation/locking/seqlock.rst.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
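
As a usage sketch of the new types, assuming a hypothetical struct foo
(the associated spinlock already serializes writers and disables
preemption, so no explicit preempt_disable() is needed):

	struct foo {
		spinlock_t		lock;
		seqcount_spinlock_t	seq;
		u64			value;
	};

	static void foo_init(struct foo *f)
	{
		spin_lock_init(&f->lock);
		seqcount_spinlock_init(&f->seq, &f->lock);
	}

	static void foo_set(struct foo *f, u64 v)
	{
		spin_lock(&f->lock);
		write_seqcount_begin(&f->seq);	/* lockdep: f->lock held? */
		f->value = v;
		write_seqcount_end(&f->seq);
		spin_unlock(&f->lock);
	}

The read side is unchanged: read_seqcount_begin() and
read_seqcount_retry() resolve to the seqcount_spinlock_t implementation
at compile time.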

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 Documentation/locking/seqlock.rst      |  64 ++++-
 MAINTAINERS                            |   2 +-
 include/linux/seqlock.h                | 372 ++++++++++++++++++++-----
 include/linux/seqlock_types_internal.h | 186 +++++++++++++
 4 files changed, 558 insertions(+), 66 deletions(-)
 create mode 100644 include/linux/seqlock_types_internal.h

diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
index c9916efe038e..2d526dc95408 100644
--- a/Documentation/locking/seqlock.rst
+++ b/Documentation/locking/seqlock.rst
@@ -48,9 +48,11 @@ write side section. If the read section can be invoked from hardirq or
 softirq contexts, interrupts or bottom halves must also be respectively
 disabled before entering the write section.
 
-If it's desired to automatically handle the sequence counter
-requirements of writer serialization and non-preemptibility, use a
-:ref:`sequential lock <seqlock_t>` instead.
+If the write serialization mechanism is one of the common kernel locking
+primitives, use :ref:`sequence counters with associated locks
+<seqcount_locktype_t>` instead. If it's desired to automatically handle
+the sequence counter writer serialization and non-preemptibility
+requirements, use a :ref:`sequential lock <seqlock_t>`.
 
 Initialization:
 
@@ -70,6 +72,7 @@ Initialization:
 
 Write path:
 
+.. _seqcount_write_ops:
 .. code-block:: c
 
 	/* Serialized context with disabled preemption */
@@ -82,6 +85,7 @@ Write path:
 
 Read path:
 
+.. _seqcount_read_ops:
 .. code-block:: c
 
 	do {
@@ -91,6 +95,60 @@ Read path:
 
 	} while (read_seqcount_retry(&foo_seqcount, seq));
 
+.. _seqcount_locktype_t:
+
+Sequence counters with associated locks (:c:type:`seqcount_LOCKTYPE_t`)
+-----------------------------------------------------------------------
+
+As :ref:`earlier discussed <seqcount_t>`, seqcount write side critical
+sections must be serialized and non-preemptible. This variant of
+sequence counters associates the lock used for writer serialization at
+the seqcount initialization time. This enables lockdep to validate that
+the write side critical section is properly serialized.
+
+This lock association is a NOOP if lockdep is disabled and has neither
+storage nor runtime overhead. If lockdep is enabled, the lock pointer is
+stored in struct seqcount and lockdep's "lock is held" assertions are
+injected at the beginning of the write side critical section to validate
+that it is properly protected.
+
+For lock types which do not implicitly disable preemption, preemption
+protection is enforced in the write side function.
+
+The following seqcounts with associated locks are defined:
+
+  - :c:type:`seqcount_spinlock_t`
+  - :c:type:`seqcount_raw_spinlock_t`
+  - :c:type:`seqcount_rwlock_t`
+  - :c:type:`seqcount_mutex_t`
+  - :c:type:`seqcount_ww_mutex_t`
+
+The plain seqcount read and write APIs branch out to the specific
+seqcount_LOCKTYPE_t implementation at compile-time. This avoids kernel
+API explosion with each new seqcount LOCKTYPE.
+
+Initialization (replace "LOCKTYPE" with one of the supported locks):
+
+.. code-block:: c
+
+	/* dynamic */
+	seqcount_LOCKTYPE_t foo_seqcount;
+	seqcount_LOCKTYPE_init(&foo_seqcount, &lock);
+
+	/* static */
+	static seqcount_LOCKTYPE_t foo_seqcount =
+		SEQCNT_LOCKTYPE_ZERO(foo_seqcount, &lock);
+
+	/* C99 struct init */
+	struct {
+		seqcount_LOCKTYPE_t seq;
+	} foo = { .seq = SEQCNT_LOCKTYPE_ZERO(foo.seq, &lock) };
+
+Write path: same as in :ref:`plain seqcount_t <seqcount_write_ops>`,
+while running from a context with the associated LOCKTYPE lock acquired.
+
+Read path: same as in :ref:`plain seqcount_t <seqcount_read_ops>`.
+
 .. _seqlock_t:
 
 Sequential locks (:c:type:`seqlock_t`)
diff --git a/MAINTAINERS b/MAINTAINERS
index 68f21d46614c..84a859bc8b4c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10077,7 +10077,7 @@ F:	include/linux/lockdep.h
 F:	include/linux/mutex*.h
 F:	include/linux/rwlock*.h
 F:	include/linux/rwsem*.h
-F:	include/linux/seqlock.h
+F:	include/linux/seqlock*.h
 F:	include/linux/spinlock*.h
 F:	kernel/locking/
 F:	lib/locking*.[ch]
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 679c440b17fe..fb0fe5b7c4f4 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -48,6 +48,10 @@
  * This mechanism can't be used if the protected data contains pointers,
  * as the writer can invalidate a pointer that a reader is following.
  *
+ * If the write serialization mechanism is one of the common kernel
+ * locking primitives, use a sequence counter with associated lock
+ * (seqcount_LOCKTYPE_t) instead.
+ *
  * If it's desired to automatically handle the sequence counter writer
  * serialization and non-preemptibility requirements, use a sequential
  * lock (seqlock_t) instead.
@@ -108,11 +112,223 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  */
 #define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
 
+/*
+ * Sequence counters with associated locks (seqcount_LOCKTYPE_t)
+ *
+ * A sequence counter which associates the lock used for writer
+ * serialization at initialization time. This enables lockdep to validate
+ * that the write side critical section is properly serialized.
+ *
+ * For associated locks which do not implicitly disable preemption,
+ * preemption protection is enforced in the write side function.
+ *
+ * See Documentation/locking/seqlock.rst
+ */
 
 /**
- * __read_seqcount_begin() - begin a seqcount_t read section (without barrier)
- * @s: Pointer to &typedef seqcount_t
- * Returns: count to be passed to read_seqcount_retry()
+ * typedef seqcount_spinlock_t - sequence count with spinlock associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated spinlock
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * spinlock. The spinlock is associated to the sequence count in the
+ * static initializer or init function. This enables lockdep to validate
+ * that the write side critical section is properly serialized.
+ */
+typedef struct seqcount_spinlock {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	spinlock_t	*lock;
+#endif
+} seqcount_spinlock_t;
+
+#ifdef CONFIG_LOCKDEP
+
+#define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) {		\
+	.seqcount	= SEQCNT_ZERO(seq_name.seqcount),	\
+	.lock		= (assoc_lock),				\
+}
+
+/* Define as macro due to static lockdep key @ seqcount_init() */
+#define seqcount_locktype_init(s, assoc_lock)			\
+do {								\
+	seqcount_init(&(s)->seqcount);				\
+	(s)->lock = (assoc_lock);				\
+} while (0)
+
+#else /* !CONFIG_LOCKDEP */
+
+#define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) {		\
+	.seqcount	= SEQCNT_ZERO(seq_name.seqcount),	\
+}
+
+#define seqcount_locktype_init(s, assoc_lock)			\
+do {								\
+	seqcount_init(&(s)->seqcount);				\
+} while (0)
+
+#endif
+
+/**
+ * SEQCNT_SPINLOCK_ZERO - static initializer for seqcount_spinlock_t
+ * @name:	Name of the &typedef seqcount_spinlock_t instance
+ * @lock:	Pointer to the associated spinlock
+ */
+#define SEQCNT_SPINLOCK_ZERO(name, lock)	\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_spinlock_init - runtime initializer for seqcount_spinlock_t
+ * @s:		Pointer to the &typedef seqcount_spinlock_t instance
+ * @lock:	Pointer to the associated spinlock
+ */
+#define seqcount_spinlock_init(s, lock)		\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_raw_spinlock_t - sequence count with raw spinlock associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated raw spinlock
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * raw spinlock. The raw spinlock is associated to the sequence count in
+ * the static initializer or init function. This enables lockdep to
+ * validate that the write side critical section is properly serialized.
+ */
+typedef struct seqcount_raw_spinlock {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	raw_spinlock_t	*lock;
+#endif
+} seqcount_raw_spinlock_t;
+
+/**
+ * SEQCNT_RAW_SPINLOCK_ZERO - static initializer for seqcount_raw_spinlock_t
+ * @name:	Name of the &typedef seqcount_raw_spinlock_t instance
+ * @lock:	Pointer to the associated raw_spinlock
+ */
+#define SEQCNT_RAW_SPINLOCK_ZERO(name, lock)	\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_raw_spinlock_init - runtime initializer for seqcount_raw_spinlock_t
+ * @s:		Pointer to the &typedef seqcount_raw_spinlock_t instance
+ * @lock:	Pointer to the associated raw_spinlock
+ */
+#define seqcount_raw_spinlock_init(s, lock)	\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_rwlock_t - sequence count with rwlock associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated rwlock
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * rwlock. The rwlock is associated to the sequence count in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ */
+typedef struct seqcount_rwlock {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	rwlock_t	*lock;
+#endif
+} seqcount_rwlock_t;
+
+/**
+ * SEQCNT_RWLOCK_ZERO - static initializer for seqcount_rwlock_t
+ * @name:	Name of the &typedef seqcount_rwlock_t instance
+ * @lock:	Pointer to the associated rwlock
+ */
+#define SEQCNT_RWLOCK_ZERO(name, lock)		\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_rwlock_init - runtime initializer for seqcount_rwlock_t
+ * @s:		Pointer to the &typedef seqcount_rwlock_t instance
+ * @lock:	Pointer to the associated rwlock
+ */
+#define seqcount_rwlock_init(s, lock)		\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_mutex_t - sequence count with mutex associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated mutex
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * mutex. The mutex is associated to the sequence counter in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ *
+ * The write side API functions write_seqcount_begin()/end() automatically
+ * disable and enable preemption when used with seqcount_mutex_t.
+ */
+typedef struct seqcount_mutex {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	struct mutex	*lock;
+#endif
+} seqcount_mutex_t;
+
+/**
+ * SEQCNT_MUTEX_ZERO - static initializer for seqcount_mutex_t
+ * @name:	Name of the &typedef seqcount_mutex_t instance
+ * @lock:	Pointer to the associated mutex
+ */
+#define SEQCNT_MUTEX_ZERO(name, lock)		\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_mutex_init - runtime initializer for seqcount_mutex_t
+ * @s:		Pointer to the &typedef seqcount_mutex_t instance
+ * @lock:	Pointer to the associated mutex
+ */
+#define seqcount_mutex_init(s, lock)		\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_ww_mutex_t - sequence count with ww_mutex associated
+ * @seqcount:		The real sequence counter
+ * @lock:		Pointer to the associated ww_mutex
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * ww_mutex. The ww_mutex is associated to the sequence counter in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ *
+ * The write side API functions write_seqcount_begin()/end() automatically
+ * disable and enable preemption when used with seqcount_ww_mutex_t.
+ */
+typedef struct seqcount_ww_mutex {
+	seqcount_t      seqcount;
+#ifdef CONFIG_LOCKDEP
+	struct ww_mutex	*lock;
+#endif
+} seqcount_ww_mutex_t;
+
+/**
+ * SEQCNT_WW_MUTEX_ZERO - static initializer for seqcount_ww_mutex_t
+ * @name:	Name of the &typedef seqcount_ww_mutex_t instance
+ * @lock:	Pointer to the associated ww_mutex
+ */
+#define SEQCNT_WW_MUTEX_ZERO(name, lock)	\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_ww_mutex_init - runtime initializer for seqcount_ww_mutex_t
+ * @s:		Pointer to the &typedef seqcount_ww_mutex_t instance
+ * @lock:	Pointer to the associated ww_mutex
+ */
+#define seqcount_ww_mutex_init(s, lock)		\
+	seqcount_locktype_init(s, lock)
+
+#include <linux/seqlock_types_internal.h>
+
+/**
+ * __read_seqcount_begin() - begin a seqcount read section (without barrier)
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ * Returns: count to be passed to read_seqcount_retry
  *
  * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
  * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@@ -122,7 +338,9 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  * Use carefully, only in critical code, and comment how the barrier is
  * provided.
  */
-static inline unsigned __read_seqcount_begin(const seqcount_t *s)
+#define __read_seqcount_begin(s)	do___read_seqcount_begin(s)
+
+static inline unsigned __read_seqcount_t_begin(const seqcount_t *s)
 {
 	unsigned ret;
 
@@ -137,16 +355,18 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount() - Read the seqcount_t raw counter value
- * @s: Pointer to &typedef seqcount_t
- * Returns: count to be passed to read_seqcount_retry()
+ * raw_read_seqcount() - Read the seqcount raw counter value
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ * Returns: count to be passed to read_seqcount_retry
  *
  * raw_read_seqcount opens a read critical section of the given
  * seqcount_t, without any lockdep checks and without checking or
  * masking the sequence counter LSB. Calling code is responsible for
  * handling that.
  */
-static inline unsigned raw_read_seqcount(const seqcount_t *s)
+#define raw_read_seqcount(s)	do_raw_read_seqcount(s)
+
+static inline unsigned raw_read_seqcount_t(const seqcount_t *s)
 {
 	unsigned ret = READ_ONCE(s->sequence);
 	smp_rmb();
@@ -155,39 +375,43 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount_begin() - start a seqcount_t read section w/o lockdep
- * @s: Pointer to &typedef seqcount_t
- * Returns: count to be passed to read_seqcount_retry()
+ * raw_read_seqcount_begin() - start a seqcount read section w/o lockdep
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ * Returns: count to be passed to read_seqcount_retry
  *
  * raw_read_seqcount_begin opens a read critical section of the given
  * seqcount_t, but without any lockdep checking. Validity of the read
  * section must be checked with read_seqcount_retry().
  */
-static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
+#define raw_read_seqcount_begin(s)	do_raw_read_seqcount_begin(s)
+
+static inline unsigned raw_read_seqcount_t_begin(const seqcount_t *s)
 {
-	unsigned ret = __read_seqcount_begin(s);
+	unsigned ret = __read_seqcount_t_begin(s);
 	smp_rmb();
 	return ret;
 }
 
 /**
- * read_seqcount_begin() - start a seqcount_t read critical section
- * @s: Pointer to &typedef seqcount_t
- * Returns: count to be passed to read_seqcount_retry()
+ * read_seqcount_begin() - start a seqcount read critical section
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ * Returns: count to be passed to read_seqcount_retry
  *
  * read_seqcount_begin opens a read critical section of the given
  * seqcount_t. Validity of the read section must be checked with
  * read_seqcount_retry().
  */
-static inline unsigned read_seqcount_begin(const seqcount_t *s)
+#define read_seqcount_begin(s)	do_read_seqcount_begin(s)
+
+static inline unsigned read_seqcount_t_begin(const seqcount_t *s)
 {
 	seqcount_lockdep_reader_access(s);
-	return raw_read_seqcount_begin(s);
+	return raw_read_seqcount_t_begin(s);
 }
 
 /**
  * raw_seqcount_begin() - begin a seq-read critical section
- * @s: Pointer to &typedef seqcount_t
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * Returns: count to be passed to read_seqcount_retry
  *
  * raw_seqcount_begin opens a read critical section of the given seqcount_t.
@@ -199,7 +423,9 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
  * read_seqcount_retry() instead of stabilizing at the beginning of the
  * critical section.
  */
-static inline unsigned raw_seqcount_begin(const seqcount_t *s)
+#define raw_seqcount_begin(s)	do_raw_seqcount_begin(s)
+
+static inline unsigned raw_seqcount_t_begin(const seqcount_t *s)
 {
 	unsigned ret = READ_ONCE(s->sequence);
 	smp_rmb();
@@ -209,7 +435,7 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 
 /**
  * __read_seqcount_retry() - end a seq-read critical section (without barrier)
- * @s: Pointer to &typedef seqcount_t
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
@@ -221,7 +447,9 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
  * Use carefully, only in critical code, and comment how the barrier is
  * provided.
  */
-static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
+#define __read_seqcount_retry(s, start)		do___read_seqcount_retry(s, start)
+
+static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 {
 	kcsan_atomic_next(0);
 	return unlikely(READ_ONCE(s->sequence) != start);
@@ -229,7 +457,7 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
 
 /**
  * read_seqcount_retry() - end a seq-read critical section
- * @s: Pointer to &typedef seqcount_t
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
@@ -237,22 +465,28 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
  * If the critical section was invalid, it must be ignored (and typically
  * retried).
  */
-static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
+#define read_seqcount_retry(s, start)	do_read_seqcount_retry(s, start)
+
+static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 {
 	smp_rmb();
-	return __read_seqcount_retry(s, start);
+	return __read_seqcount_t_retry(s, start);
 }
 
 
 
-static inline void raw_write_seqcount_begin(seqcount_t *s)
+#define raw_write_seqcount_begin(s)	do_raw_write_seqcount_begin(s)
+
+static inline void raw_write_seqcount_t_begin(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
 	smp_wmb();
 }
 
-static inline void raw_write_seqcount_end(seqcount_t *s)
+#define raw_write_seqcount_end(s)	do_raw_write_seqcount_end(s)
+
+static inline void raw_write_seqcount_t_end(seqcount_t *s)
 {
 	smp_wmb();
 	s->sequence++;
@@ -261,7 +495,7 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 
 /**
  * raw_write_seqcount_barrier() - do a seq write barrier
- * @s: Pointer to &typedef seqcount_t
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * This can be used to provide an ordering guarantee instead of the
  * usual consistency guarantee. It is one wmb cheaper, because we can
@@ -300,7 +534,9 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  *              WRITE_ONCE(X, false);
  *      }
  */
-static inline void raw_write_seqcount_barrier(seqcount_t *s)
+#define raw_write_seqcount_barrier(s)	do_raw_write_seqcount_barrier(s)
+
+static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
@@ -310,8 +546,8 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount_latch() - pick even or odd seqcount_t latch data copy
- * @s: Pointer to &typedef seqcount_t
+ * raw_read_seqcount_latch() - pick even or odd seqcount latch data copy
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * Use seqcount latching to switch between two storage places with
  * sequence protection to allow interruptible, preemptible, writer
@@ -324,7 +560,9 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
  * which data copy to read. Full counter must then be checked with
  * read_seqcount_retry().
  */
-static inline int raw_read_seqcount_latch(seqcount_t *s)
+#define raw_read_seqcount_latch(s)	do_raw_read_seqcount_latch(s)
+
+static inline int raw_read_seqcount_t_latch(seqcount_t *s)
 {
 	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
 	int seq = READ_ONCE(s->sequence); /* ^^^ */
@@ -333,7 +571,7 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
 
 /**
  * raw_write_seqcount_latch() - redirect readers to even/odd copy
- * @s: Pointer to &typedef seqcount_t
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -412,49 +650,54 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  *	When data is a dynamic data structure; one should use regular RCU
  *	patterns to manage the lifetimes of the objects within.
  */
-static inline void raw_write_seqcount_latch(seqcount_t *s)
+#define raw_write_seqcount_latch(s)	do_raw_write_seqcount_latch(s)
+
+static inline void raw_write_seqcount_t_latch(seqcount_t *s)
 {
        smp_wmb();      /* prior stores before incrementing "sequence" */
        s->sequence++;
        smp_wmb();      /* increment "sequence" before following stores */
 }
 
-static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
+static inline void __write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
 {
-	raw_write_seqcount_begin(s);
+	raw_write_seqcount_t_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
-static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
+#define write_seqcount_begin_nested(s, subclass)		\
+	do_write_seqcount_begin_nested(s, subclass)
+
+static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
 {
 	lockdep_assert_preemption_disabled();
-	__write_seqcount_begin_nested(s, subclass);
+	__write_seqcount_t_begin_nested(s, subclass);
 }
 
 /*
- * write_seqcount_begin() without lockdep non-preemptibility checks.
+ * write_seqcount_t_begin() without lockdep non-preemptibility checks.
  *
  * Use for internal seqlock.h code where it's known that preemption is
  * already disabled. For example, seqlock_t write side functions.
  */
-static inline void __write_seqcount_begin(seqcount_t *s)
+static inline void __write_seqcount_t_begin(seqcount_t *s)
 {
-	__write_seqcount_begin_nested(s, 0);
+	__write_seqcount_t_begin_nested(s, 0);
 }
 
 /**
- * write_seqcount_begin() - start a seqcount_t write-side critical section
- * @s: Pointer to &typedef seqcount_t
+ * write_seqcount_begin() - start a seqcount write-side critical section
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * write_seqcount_begin opens a write-side critical section of the given
- * seqcount_t. Sequence counter write-side critical sections must be
- * serialized and non-preemptible. If readers can be invoked from
- * hardirq or softirq contexts, interrupts or bottom halves must be
- * respectively disabled.
+ * seqcount. Seqcount write-side critical sections must be externally
+ * serialized and non-preemptible.
  */
-static inline void write_seqcount_begin(seqcount_t *s)
+#define write_seqcount_begin(s)		do_write_seqcount_begin(s)
+
+static inline void write_seqcount_t_begin(seqcount_t *s)
 {
-	write_seqcount_begin_nested(s, 0);
+	write_seqcount_t_begin_nested(s, 0);
 }
 
 /**
@@ -463,7 +706,9 @@ static inline void write_seqcount_begin(seqcount_t *s)
  *
  * The write section must've been opened with write_seqcount_begin().
  */
-static inline void write_seqcount_end(seqcount_t *s)
+#define write_seqcount_end(s)		do_write_seqcount_end(s)
+
+static inline void write_seqcount_t_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
 	raw_write_seqcount_end(s);
@@ -471,12 +716,14 @@ static inline void write_seqcount_end(seqcount_t *s)
 
 /**
  * write_seqcount_invalidate() - invalidate in-progress read-side seq operations
- * @s: Pointer to &typedef seqcount_t
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * After write_seqcount_invalidate, no read-side seq operations will complete
  * successfully and see data older than this.
  */
-static inline void write_seqcount_invalidate(seqcount_t *s)
+#define write_seqcount_invalidate(s)	do_write_seqcount_invalidate(s)
+
+static inline void write_seqcount_t_invalidate(seqcount_t *s)
 {
 	smp_wmb();
 	kcsan_nestable_atomic_begin();
@@ -534,7 +781,7 @@ typedef struct {
  */
 static inline unsigned read_seqbegin(const seqlock_t *sl)
 {
-	unsigned ret = read_seqcount_begin(&sl->seqcount);
+	unsigned ret = read_seqcount_t_begin(&sl->seqcount);
 
 	kcsan_atomic_next(0);  /* non-raw usage, assume closing read_seqretry() */
 	kcsan_flat_atomic_begin();
@@ -560,7 +807,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 	 */
 	kcsan_flat_atomic_end();
 
-	return read_seqcount_retry(&sl->seqcount, start);
+	return read_seqcount_t_retry(&sl->seqcount, start);
 }
 
 /**
@@ -580,7 +827,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -592,7 +839,7 @@ static inline void write_seqlock(seqlock_t *sl)
  */
 static inline void write_sequnlock(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock(&sl->lock);
 }
 
@@ -608,7 +855,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -621,7 +868,7 @@ static inline void write_seqlock_bh(seqlock_t *sl)
  */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_bh(&sl->lock);
 }
 
@@ -635,7 +882,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -647,7 +894,7 @@ static inline void write_seqlock_irq(seqlock_t *sl)
  */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_irq(&sl->lock);
 }
 
@@ -656,7 +903,8 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	__write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_t_begin(&sl->seqcount);
+
 	return flags;
 }
 
@@ -685,7 +933,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
diff --git a/include/linux/seqlock_types_internal.h b/include/linux/seqlock_types_internal.h
new file mode 100644
index 000000000000..a70bf38e6eb5
--- /dev/null
+++ b/include/linux/seqlock_types_internal.h
@@ -0,0 +1,186 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_SEQLOCK_TYPES_INTERNAL_H
+#define __LINUX_SEQLOCK_TYPES_INTERNAL_H
+
+/*
+ * Sequence counters with associated locks
+ *
+ * Copyright (C) 2020 Linutronix GmbH
+ */
+
+#ifndef __LINUX_SEQLOCK_H
+#error This is an INTERNAL header; it must only be included by seqlock.h
+#endif
+
+#include <linux/mutex.h>
+#include <linux/spinlock.h>
+#include <linux/ww_mutex.h>
+
+/*
+ * @s: pointer to seqcount_t or any of the seqcount_locktype_t variants
+ */
+#define __to_seqcount_t(s)						\
+({									\
+	seqcount_t *seq;						\
+									\
+	if (__same_type(*(s), seqcount_t))				\
+		seq = (seqcount_t *)(s);				\
+	else if (__same_type(*(s), seqcount_spinlock_t))		\
+		seq = &((seqcount_spinlock_t *)(s))->seqcount;		\
+	else if (__same_type(*(s), seqcount_raw_spinlock_t))		\
+		seq = &((seqcount_raw_spinlock_t *)(s))->seqcount;	\
+	else if (__same_type(*(s), seqcount_rwlock_t))			\
+		seq = &((seqcount_rwlock_t *)(s))->seqcount;		\
+	else if (__same_type(*(s), seqcount_mutex_t))			\
+		seq = &((seqcount_mutex_t *)(s))->seqcount;		\
+	else if (__same_type(*(s), seqcount_ww_mutex_t))		\
+		seq = &((seqcount_ww_mutex_t *)(s))->seqcount;		\
+	else								\
+		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");		\
+									\
+	seq;								\
+})
+
+/*
+ *	seqcount_LOCKTYPE_t -- write APIs
+ *
+ * For associated lock types which do not implicitly disable preemption,
+ * enforce preemption protection in the write side functions.
+ *
+ * Never use lockdep for the raw write variants.
+ */
+
+#define __associated_lock_is_preemptible(s)				\
+({									\
+	bool ret;							\
+									\
+	if (__same_type(*(s), seqcount_t) ||				\
+	    __same_type(*(s), seqcount_spinlock_t) ||			\
+	    __same_type(*(s), seqcount_raw_spinlock_t) ||		\
+	    __same_type(*(s), seqcount_rwlock_t)) {			\
+		ret = false;						\
+	} else if (__same_type(*(s), seqcount_mutex_t) ||		\
+		   __same_type(*(s), seqcount_ww_mutex_t)) {		\
+		ret = true;						\
+	} else								\
+		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");		\
+									\
+	ret;								\
+})
+
+#ifdef CONFIG_LOCKDEP
+
+#define __assert_associated_lock_held(s)				\
+do {									\
+	if (__same_type(*(s), seqcount_t))				\
+		break;							\
+									\
+	if (__same_type(*(s), seqcount_spinlock_t))			\
+		lockdep_assert_held(((seqcount_spinlock_t *)(s))->lock);\
+	else if (__same_type(*(s), seqcount_raw_spinlock_t))		\
+		lockdep_assert_held(((seqcount_raw_spinlock_t *)(s))->lock);	\
+	else if (__same_type(*(s), seqcount_rwlock_t))			\
+		lockdep_assert_held_write(((seqcount_rwlock_t *)(s))->lock);	\
+	else if (__same_type(*(s), seqcount_mutex_t))			\
+		lockdep_assert_held(((seqcount_mutex_t *)(s))->lock);	\
+	else if (__same_type(*(s), seqcount_ww_mutex_t))		\
+		lockdep_assert_held(&((seqcount_ww_mutex_t *)(s))->lock->base);	\
+	else								\
+		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");		\
+} while (0)
+
+#else
+
+#define __assert_associated_lock_held(s)				\
+do {									\
+	(void) __to_seqcount_t(s);					\
+} while (0)
+
+#endif /* CONFIG_LOCKDEP */
+
+#define do_raw_write_seqcount_begin(s)					\
+do {									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	raw_write_seqcount_t_begin(__to_seqcount_t(s));			\
+} while (0)
+
+#define do_raw_write_seqcount_end(s)					\
+do {									\
+	raw_write_seqcount_t_end(__to_seqcount_t(s));			\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_enable();					\
+} while (0)
+
+#define do_write_seqcount_begin_nested(s, subclass)			\
+do {									\
+	__assert_associated_lock_held(s);				\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	write_seqcount_t_begin_nested(__to_seqcount_t(s), subclass);	\
+} while (0)
+
+#define do_write_seqcount_begin(s)					\
+do {									\
+	__assert_associated_lock_held(s);				\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	write_seqcount_t_begin(__to_seqcount_t(s));			\
+} while (0)
+
+#define do_write_seqcount_end(s)					\
+do {									\
+	write_seqcount_t_end(__to_seqcount_t(s));			\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_enable();					\
+} while (0)
+
+#define do_write_seqcount_invalidate(s)					\
+	write_seqcount_t_invalidate(__to_seqcount_t(s))
+
+#define do_raw_write_seqcount_barrier(s)				\
+	raw_write_seqcount_t_barrier(__to_seqcount_t(s))
+
+/*
+ * Latch sequence counters write side critical sections don't need to
+ * run with preemption disabled. Check @raw_write_seqcount_latch().
+ */
+#define do_raw_write_seqcount_latch(s)					\
+	raw_write_seqcount_t_latch(__to_seqcount_t(s))
+
+/*
+ *	seqcount_LOCKTYPE_t -- read APIs
+ */
+
+#define do___read_seqcount_begin(s)					\
+	__read_seqcount_t_begin(__to_seqcount_t(s))
+
+#define do_raw_read_seqcount(s)						\
+	raw_read_seqcount_t(__to_seqcount_t(s))
+
+#define do_raw_seqcount_begin(s)					\
+	raw_seqcount_t_begin(__to_seqcount_t(s))
+
+#define do_raw_read_seqcount_begin(s)					\
+	raw_read_seqcount_t_begin(__to_seqcount_t(s))
+
+#define do_read_seqcount_begin(s)					\
+	read_seqcount_t_begin(__to_seqcount_t(s))
+
+#define do_raw_read_seqcount_latch(s)					\
+	raw_read_seqcount_t_latch(__to_seqcount_t(s))
+
+#define do___read_seqcount_retry(s, start)				\
+	__read_seqcount_t_retry(__to_seqcount_t(s), start)
+
+#define do_read_seqcount_retry(s, start)				\
+	read_seqcount_t_retry(__to_seqcount_t(s), start)
+
+#endif /* __LINUX_SEQLOCK_TYPES_INTERNAL_H */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 07/20] dma-buf: Remove custom seqcount lockdep class key
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (5 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 08/20] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
                     ` (12 subsequent siblings)
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Daniel Vetter,
	David Airlie, dri-devel

Commit 3c3b177a9369 ("reservation: add support for read-only access
using rcu") introduced a sequence counter to manage updates to
reservations. Back then, the reservation object initializer
reservation_object_init() was always inlined.

Having the sequence counter initialization inlined meant that each of
the call sites would have a different lockdep class key, which would've
broken lockdep's deadlock detection. The aforementioned commit thus
introduced, and exported, a custom seqcount lockdep class key and name.

Commit 8735f16803f00 ("dma-buf: cleanup reservation_object_init...")
transformed the reservation object initializer into a normal,
non-inlined C function. seqcount_init(), which automatically defines
the seqcount lockdep class key and therefore must not be called from
inlined code, can now be safely used.

Remove the seqcount custom lockdep class key, name, and export. Use
seqcount_init() inside the dma reservation object initializer.
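
For reference, a sketch of why this is now safe, based on the mainline
seqcount_init() implementation (not part of this patch): the macro
declares a static lock_class_key, so each textual call site gets its
own lockdep class. With dma_resv_init() being a single non-inlined
function, all reservation objects now share one class key:

	#define seqcount_init(s)				\
		do {						\
			static struct lock_class_key __key;	\
			__seqcount_init((s), #s, &__key);	\
		} while (0)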

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/dma-buf/dma-resv.c | 9 +--------
 include/linux/dma-resv.h   | 2 --
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index b45f8514dc82..15efa0c2dacb 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -51,12 +51,6 @@
 DEFINE_WD_CLASS(reservation_ww_class);
 EXPORT_SYMBOL(reservation_ww_class);
 
-struct lock_class_key reservation_seqcount_class;
-EXPORT_SYMBOL(reservation_seqcount_class);
-
-const char reservation_seqcount_string[] = "reservation_seqcount";
-EXPORT_SYMBOL(reservation_seqcount_string);
-
 /**
  * dma_resv_list_alloc - allocate fence list
  * @shared_max: number of fences we need space for
@@ -135,9 +129,8 @@ subsys_initcall(dma_resv_lockdep);
 void dma_resv_init(struct dma_resv *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
+	seqcount_init(&obj->seq);
 
-	__seqcount_init(&obj->seq, reservation_seqcount_string,
-			&reservation_seqcount_class);
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
 }
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index ee50d10f052b..a6538ae7d93f 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -46,8 +46,6 @@
 #include <linux/rcupdate.h>
 
 extern struct ww_class reservation_ww_class;
-extern struct lock_class_key reservation_seqcount_class;
-extern const char reservation_seqcount_string[];
 
 /**
  * struct dma_resv_list - a list of shared fences
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 08/20] dma-buf: Use sequence counter with associated wound/wait mutex
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (6 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 07/20] dma-buf: Remove custom seqcount lockdep class key Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 09/20] sched: tasks: Use sequence counter with associated spinlock Ahmed S. Darwish
                     ` (11 subsequent siblings)
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Daniel Vetter,
	David Airlie, dri-devel

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive
does not disable preemption implicitly, preemption has to be explicitly
disabled before entering the sequence counter write side critical
section.

The dma-buf reservation subsystem uses plain sequence counters to manage
updates to reservations. Writer serialization is accomplished through a
wound/wait mutex.

Acquiring a wound/wait mutex does not disable preemption, so preemption
has to be disabled manually before entering the write side critical
section and re-enabled after leaving it.

Use the newly-added seqcount_ww_mutex_t instead:

  - It associates the ww_mutex with the sequence count, which enables
    lockdep to validate that the write side critical section is properly
    serialized.

  - It removes the need to explicitly add preempt_disable/enable()
    around the write side critical section because the write_begin/end()
    functions for this new data type automatically do this.

If lockdep is disabled this ww_mutex lock association is compiled out
and has neither storage size nor runtime overhead.
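
A minimal writer sketch under the new type (set_excl_fence() is a
hypothetical caller; the dma_resv locking API below is real):

	static void set_excl_fence(struct dma_resv *obj,
				   struct dma_fence *fence)
	{
		dma_resv_lock(obj, NULL);	/* ww_mutex_lock(&obj->lock) */

		/* lockdep asserts obj->lock; preemption is disabled
		 * internally, as a held ww_mutex does not disable it */
		write_seqcount_begin(&obj->seq);
		RCU_INIT_POINTER(obj->fence_excl, fence);
		write_seqcount_end(&obj->seq);	/* re-enables preemption */

		dma_resv_unlock(obj);
	}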

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/dma-buf/dma-resv.c                       | 8 +-------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 --
 include/linux/dma-resv.h                         | 2 +-
 3 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 15efa0c2dacb..a7631352a486 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -129,7 +129,7 @@ subsys_initcall(dma_resv_lockdep);
 void dma_resv_init(struct dma_resv *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
-	seqcount_init(&obj->seq);
+	seqcount_ww_mutex_init(&obj->seq, &obj->lock);
 
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
@@ -260,7 +260,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 	fobj = dma_resv_get_list(obj);
 	count = fobj->shared_count;
 
-	preempt_disable();
 	write_seqcount_begin(&obj->seq);
 
 	for (i = 0; i < count; ++i) {
@@ -282,7 +281,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 	smp_store_mb(fobj->shared_count, count);
 
 	write_seqcount_end(&obj->seq);
-	preempt_enable();
 	dma_fence_put(old);
 }
 EXPORT_SYMBOL(dma_resv_add_shared_fence);
@@ -309,14 +307,12 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 	if (fence)
 		dma_fence_get(fence);
 
-	preempt_disable();
 	write_seqcount_begin(&obj->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(obj->fence_excl, fence);
 	if (old)
 		old->shared_count = 0;
 	write_seqcount_end(&obj->seq);
-	preempt_enable();
 
 	/* inplace update, no shared fences */
 	while (i--)
@@ -394,13 +390,11 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
 	src_list = dma_resv_get_list(dst);
 	old = dma_resv_get_excl(dst);
 
-	preempt_disable();
 	write_seqcount_begin(&dst->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(dst->fence_excl, new);
 	RCU_INIT_POINTER(dst->fence, dst_list);
 	write_seqcount_end(&dst->seq);
-	preempt_enable();
 
 	dma_resv_list_free(src_list);
 	dma_fence_put(old);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index b91b5171270f..ff4b583cb96a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -258,11 +258,9 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
 	new->shared_count = k;
 
 	/* Install the new fence list, seqcount provides the barriers */
-	preempt_disable();
 	write_seqcount_begin(&resv->seq);
 	RCU_INIT_POINTER(resv->fence, new);
 	write_seqcount_end(&resv->seq);
-	preempt_enable();
 
 	/* Drop the references to the removed fences or move them to ef_list */
 	for (i = j, k = 0; i < old->shared_count; ++i) {
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index a6538ae7d93f..d44a77e8a7e3 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -69,7 +69,7 @@ struct dma_resv_list {
  */
 struct dma_resv {
 	struct ww_mutex lock;
-	seqcount_t seq;
+	seqcount_ww_mutex_t seq;
 
 	struct dma_fence __rcu *fence_excl;
 	struct dma_resv_list __rcu *fence;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 09/20] sched: tasks: Use sequence counter with associated spinlock
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (7 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 08/20] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 10/20] netfilter: conntrack: " Ahmed S. Darwish
                     ` (10 subsequent siblings)
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Al Viro

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information about which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
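
A writer sketch (update_mems_allowed() is a hypothetical helper,
modeled on the cpuset update path):

	static void update_mems_allowed(struct task_struct *p,
					const nodemask_t *new_mask)
	{
		task_lock(p);		/* spin_lock(&p->alloc_lock) */

		/* lockdep can now assert that alloc_lock is held */
		write_seqcount_begin(&p->mems_allowed_seq);
		p->mems_allowed = *new_mask;
		write_seqcount_end(&p->mems_allowed_seq);

		task_unlock(p);
	}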

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/sched.h | 2 +-
 init/init_task.c      | 3 ++-
 kernel/fork.c         | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 03403ca6e44e..1b4e6b8dc523 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1055,7 +1055,7 @@ struct task_struct {
 	/* Protected by ->alloc_lock: */
 	nodemask_t			mems_allowed;
 	/* Seqence number to catch updates: */
-	seqcount_t			mems_allowed_seq;
+	seqcount_spinlock_t		mems_allowed_seq;
 	int				cpuset_mem_spread_rotor;
 	int				cpuset_slab_spread_rotor;
 #endif
diff --git a/init/init_task.c b/init/init_task.c
index 15089d15010a..94fe3ba1bb60 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -154,7 +154,8 @@ struct task_struct init_task
 	.trc_holdout_list = LIST_HEAD_INIT(init_task.trc_holdout_list),
 #endif
 #ifdef CONFIG_CPUSETS
-	.mems_allowed_seq = SEQCNT_ZERO(init_task.mems_allowed_seq),
+	.mems_allowed_seq = SEQCNT_SPINLOCK_ZERO(init_task.mems_allowed_seq,
+						 &init_task.alloc_lock),
 #endif
 #ifdef CONFIG_RT_MUTEXES
 	.pi_waiters	= RB_ROOT_CACHED,
diff --git a/kernel/fork.c b/kernel/fork.c
index f44d70307210..eb260c6bdb8b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2032,7 +2032,7 @@ static __latent_entropy struct task_struct *copy_process(
 #ifdef CONFIG_CPUSETS
 	p->cpuset_mem_spread_rotor = NUMA_NO_NODE;
 	p->cpuset_slab_spread_rotor = NUMA_NO_NODE;
-	seqcount_init(&p->mems_allowed_seq);
+	seqcount_spinlock_init(&p->mems_allowed_seq, &p->alloc_lock);
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 10/20] netfilter: conntrack: Use sequence counter with associated spinlock
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (8 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 09/20] sched: tasks: Use sequence counter with associated spinlock Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 11/20] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock Ahmed S. Darwish
                     ` (9 subsequent siblings)
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	Jakub Kicinski, netfilter-devel, coreteam, netdev

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information about which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
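
The read side is unchanged. For context, a retry-loop sketch modeled
on the conntrack lookup path (compute_hash() is a hypothetical
wrapper, not an exact copy of the kernel code):

	static u32 compute_hash(const struct nf_conntrack_tuple *tuple,
				const struct net *net)
	{
		unsigned int seq;
		u32 hash;

		do {
			seq  = read_seqcount_begin(&nf_conntrack_generation);
			hash = hash_conntrack_raw(tuple, net);
		} while (read_seqcount_retry(&nf_conntrack_generation, seq));

		return hash;
	}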

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/net/netfilter/nf_conntrack.h | 2 +-
 net/netfilter/nf_conntrack_core.c    | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 90690e37a56f..ea4e2010b246 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -286,7 +286,7 @@ int nf_conntrack_hash_resize(unsigned int hashsize);
 
 extern struct hlist_nulls_head *nf_conntrack_hash;
 extern unsigned int nf_conntrack_htable_size;
-extern seqcount_t nf_conntrack_generation;
+extern seqcount_spinlock_t nf_conntrack_generation;
 extern unsigned int nf_conntrack_max;
 
 /* must be called with rcu read lock held */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 79cd9dde457b..b8c54d390f93 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -180,7 +180,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
 
 unsigned int nf_conntrack_max __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_max);
-seqcount_t nf_conntrack_generation __read_mostly;
+seqcount_spinlock_t nf_conntrack_generation __read_mostly;
 static unsigned int nf_conntrack_hash_rnd __read_mostly;
 
 static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple,
@@ -2598,7 +2598,8 @@ int nf_conntrack_init_start(void)
 	/* struct nf_ct_ext uses u8 to store offsets/size */
 	BUILD_BUG_ON(total_extension_size() > 255u);
 
-	seqcount_init(&nf_conntrack_generation);
+	seqcount_spinlock_init(&nf_conntrack_generation,
+			       &nf_conntrack_locks_all_lock);
 
 	for (i = 0; i < CONNTRACK_LOCKS; i++)
 		spin_lock_init(&nf_conntrack_locks[i]);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 11/20] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (9 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 10/20] netfilter: conntrack: " Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 12/20] xfrm: policy: Use sequence counters with associated lock Ahmed S. Darwish
                     ` (8 subsequent siblings)
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	Jakub Kicinski, netfilter-devel, coreteam, netdev

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information about which lock must be held when entering a write
side critical section.

Use the new seqcount_rwlock_t data type, which allows associating a
rwlock with the sequence counter. This enables lockdep to verify that
the rwlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
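
A writer sketch for the rwlock-associated variant (rbtree_update() is
a hypothetical helper): the write side now asserts, via
lockdep_assert_held_write(), that the rwlock is held for writing:

	static void rbtree_update(struct nft_rbtree *priv)
	{
		write_lock_bh(&priv->lock);

		write_seqcount_begin(&priv->count);
		/* ... insert into / erase from priv->root ... */
		write_seqcount_end(&priv->count);

		write_unlock_bh(&priv->lock);
	}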

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 net/netfilter/nft_set_rbtree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 62f416bc0579..9f58261ee4c7 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -18,7 +18,7 @@
 struct nft_rbtree {
 	struct rb_root		root;
 	rwlock_t		lock;
-	seqcount_t		count;
+	seqcount_rwlock_t	count;
 	struct delayed_work	gc_work;
 };
 
@@ -516,7 +516,7 @@ static int nft_rbtree_init(const struct nft_set *set,
 	struct nft_rbtree *priv = nft_set_priv(set);
 
 	rwlock_init(&priv->lock);
-	seqcount_init(&priv->count);
+	seqcount_rwlock_init(&priv->count, &priv->lock);
 	priv->root = RB_ROOT;
 
 	INIT_DEFERRABLE_WORK(&priv->gc_work, nft_rbtree_gc);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 12/20] xfrm: policy: Use sequence counters with associated lock
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (10 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 11/20] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 13/20] timekeeping: Use sequence counter with associated raw spinlock Ahmed S. Darwish
                     ` (7 subsequent siblings)
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Steffen Klassert,
	Herbert Xu, David S. Miller, Jakub Kicinski, netdev

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive
does not disable preemption implicitly, preemption has to be explicitly
disabled before entering the sequence counter write side critical
section.

A plain seqcount_t does not contain the information about which lock must
be held when entering a write side critical section.

Use the new seqcount_spinlock_t and seqcount_mutex_t data types instead,
which allow associating a lock with the sequence counter. This enables
lockdep to verify that the lock used for writer serialization is held
when the write side critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
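
For the mutex-backed counter, the write_begin/end() pair also handles
preemption, since a held mutex leaves the section preemptible. A
sketch, assuming only the associated mutex is held:

	mutex_lock(&hash_resize_mutex);

	write_seqcount_begin(&xfrm_policy_hash_generation); /* preempt_disable() */
	/* ... swap the policy hash tables ... */
	write_seqcount_end(&xfrm_policy_hash_generation);   /* preempt_enable() */

	mutex_unlock(&hash_resize_mutex);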

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 net/xfrm/xfrm_policy.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 564aa6492e7c..732a940468b0 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -122,7 +122,7 @@ struct xfrm_pol_inexact_bin {
 	/* list containing '*:*' policies */
 	struct hlist_head hhead;
 
-	seqcount_t count;
+	seqcount_spinlock_t count;
 	/* tree sorted by daddr/prefix */
 	struct rb_root root_d;
 
@@ -155,7 +155,7 @@ static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1]
 						__read_mostly;
 
 static struct kmem_cache *xfrm_dst_cache __ro_after_init;
-static __read_mostly seqcount_t xfrm_policy_hash_generation;
+static __read_mostly seqcount_mutex_t xfrm_policy_hash_generation;
 
 static struct rhashtable xfrm_policy_inexact_table;
 static const struct rhashtable_params xfrm_pol_inexact_params;
@@ -719,7 +719,7 @@ xfrm_policy_inexact_alloc_bin(const struct xfrm_policy *pol, u8 dir)
 	INIT_HLIST_HEAD(&bin->hhead);
 	bin->root_d = RB_ROOT;
 	bin->root_s = RB_ROOT;
-	seqcount_init(&bin->count);
+	seqcount_spinlock_init(&bin->count, &net->xfrm.xfrm_policy_lock);
 
 	prev = rhashtable_lookup_get_insert_key(&xfrm_policy_inexact_table,
 						&bin->k, &bin->head,
@@ -1906,7 +1906,7 @@ static int xfrm_policy_match(const struct xfrm_policy *pol,
 
 static struct xfrm_pol_inexact_node *
 xfrm_policy_lookup_inexact_addr(const struct rb_root *r,
-				seqcount_t *count,
+				seqcount_spinlock_t *count,
 				const xfrm_address_t *addr, u16 family)
 {
 	const struct rb_node *parent;
@@ -4153,7 +4153,7 @@ void __init xfrm_init(void)
 {
 	register_pernet_subsys(&xfrm_net_ops);
 	xfrm_dev_init();
-	seqcount_init(&xfrm_policy_hash_generation);
+	seqcount_mutex_init(&xfrm_policy_hash_generation, &hash_resize_mutex);
 	xfrm_input_init();
 
 #ifdef CONFIG_INET_ESPINTCP
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 13/20] timekeeping: Use sequence counter with associated raw spinlock
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (11 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 12/20] xfrm: policy: Use sequence counters with associated lock Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 14/20] vfs: Use sequence counter with associated spinlock Ahmed S. Darwish
                     ` (6 subsequent siblings)
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, John Stultz,
	Stephen Boyd

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information about which lock must be held when entering a write
side critical section.

Use the new seqcount_raw_spinlock_t data type, which allows associating
a raw spinlock with the sequence counter. This enables lockdep to verify
that the raw spinlock used for writer serialization is held when the
write side critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
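
Conceptually, the new type is the plain counter plus a lockdep-only
back pointer; a sketch of the layout, with field names as used by the
series' internal header:

	typedef struct seqcount_raw_spinlock {
		seqcount_t	seqcount;
	#ifdef CONFIG_LOCKDEP
		raw_spinlock_t	*lock;
	#endif
	} seqcount_raw_spinlock_t;

Without CONFIG_LOCKDEP the lock pointer is compiled out, which is why
the association has no storage or runtime cost in that configuration.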

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 kernel/time/timekeeping.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index d20d489841c8..05ecfd8a3314 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -39,18 +39,19 @@ enum timekeeping_adv_mode {
 	TK_ADV_FREQ
 };
 
+static DEFINE_RAW_SPINLOCK(timekeeper_lock);
+
 /*
  * The most important data for readout fits into a single 64 byte
  * cache line.
  */
 static struct {
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct timekeeper	timekeeper;
 } tk_core ____cacheline_aligned = {
-	.seq = SEQCNT_ZERO(tk_core.seq),
+	.seq = SEQCNT_RAW_SPINLOCK_ZERO(tk_core.seq, &timekeeper_lock),
 };
 
-static DEFINE_RAW_SPINLOCK(timekeeper_lock);
 static struct timekeeper shadow_timekeeper;
 
 /**
@@ -63,7 +64,7 @@ static struct timekeeper shadow_timekeeper;
  * See @update_fast_timekeeper() below.
  */
 struct tk_fast {
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct tk_read_base	base[2];
 };
 
@@ -80,11 +81,13 @@ static struct clocksource dummy_clock = {
 };
 
 static struct tk_fast tk_fast_mono ____cacheline_aligned = {
+	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_mono.seq, &timekeeper_lock),
 	.base[0] = { .clock = &dummy_clock, },
 	.base[1] = { .clock = &dummy_clock, },
 };
 
 static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
+	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_raw.seq, &timekeeper_lock),
 	.base[0] = { .clock = &dummy_clock, },
 	.base[1] = { .clock = &dummy_clock, },
 };
@@ -157,7 +160,7 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
  * tk_clock_read - atomic clocksource read() helper
  *
  * This helper is necessary to use in the read paths because, while the
- * seqlock ensures we don't return a bad value while structures are updated,
+ * seqcount ensures we don't return a bad value while structures are updated,
  * it doesn't protect from potential crashes. There is the possibility that
  * the tkr's clocksource may change between the read reference, and the
  * clock reference passed to the read function.  This can cause crashes if
@@ -222,10 +225,10 @@ static inline u64 timekeeping_get_delta(const struct tk_read_base *tkr)
 	unsigned int seq;
 
 	/*
-	 * Since we're called holding a seqlock, the data may shift
+	 * Since we're called holding a seqcount, the data may shift
 	 * under us while we're doing the calculation. This can cause
 	 * false positives, since we'd note a problem but throw the
-	 * results away. So nest another seqlock here to atomically
+	 * results away. So nest another seqcount here to atomically
 	 * grab the points we are checking with.
 	 */
 	do {
@@ -486,7 +489,7 @@ EXPORT_SYMBOL_GPL(ktime_get_raw_fast_ns);
  *
  * To keep it NMI safe since we're accessing from tracing, we're not using a
  * separate timekeeper with updates to monotonic clock and boot offset
- * protected with seqlocks. This has the following minor side effects:
+ * protected with seqcounts. This has the following minor side effects:
  *
  * (1) Its possible that a timestamp be taken after the boot offset is updated
  * but before the timekeeper is updated. If this happens, the new boot offset
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 14/20] vfs: Use sequence counter with associated spinlock
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (12 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 13/20] timekeeping: Use sequence counter with associated raw spinlock Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 15/20] raid5: " Ahmed S. Darwish
                     ` (5 subsequent siblings)
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Alexander Viro,
	Mauro Carvalho Chehab, Jonathan Corbet, linux-fsdevel

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information about which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
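
As an illustration of what the association buys, a writer that forgets
the lock now trips lockdep (hypothetical sketch):

	/* BUGGY: fs->lock not held, lockdep_assert_held() fires */
	write_seqcount_begin(&fs->seq);
	fs->umask = 0077;
	write_seqcount_end(&fs->seq);

	/* correct */
	spin_lock(&fs->lock);
	write_seqcount_begin(&fs->seq);
	fs->umask = 0077;
	write_seqcount_end(&fs->seq);
	spin_unlock(&fs->lock);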

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 fs/dcache.c               | 2 +-
 fs/fs_struct.c            | 4 ++--
 include/linux/dcache.h    | 2 +-
 include/linux/fs_struct.h | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 361ea7ab30ea..ea0485861d93 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1746,7 +1746,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	dentry->d_lockref.count = 1;
 	dentry->d_flags = 0;
 	spin_lock_init(&dentry->d_lock);
-	seqcount_init(&dentry->d_seq);
+	seqcount_spinlock_init(&dentry->d_seq, &dentry->d_lock);
 	dentry->d_inode = NULL;
 	dentry->d_parent = dentry;
 	dentry->d_sb = sb;
diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index ca639ed967b7..04b3f5b9c629 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -117,7 +117,7 @@ struct fs_struct *copy_fs_struct(struct fs_struct *old)
 		fs->users = 1;
 		fs->in_exec = 0;
 		spin_lock_init(&fs->lock);
-		seqcount_init(&fs->seq);
+		seqcount_spinlock_init(&fs->seq, &fs->lock);
 		fs->umask = old->umask;
 
 		spin_lock(&old->lock);
@@ -163,6 +163,6 @@ EXPORT_SYMBOL(current_umask);
 struct fs_struct init_fs = {
 	.users		= 1,
 	.lock		= __SPIN_LOCK_UNLOCKED(init_fs.lock),
-	.seq		= SEQCNT_ZERO(init_fs.seq),
+	.seq		= SEQCNT_SPINLOCK_ZERO(init_fs.seq, &init_fs.lock),
 	.umask		= 0022,
 };
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index a81f0c3cf352..65d975bf9390 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -89,7 +89,7 @@ extern struct dentry_stat_t dentry_stat;
 struct dentry {
 	/* RCU lookup touched fields */
 	unsigned int d_flags;		/* protected by d_lock */
-	seqcount_t d_seq;		/* per dentry seqlock */
+	seqcount_spinlock_t d_seq;	/* per dentry seqlock */
 	struct hlist_bl_node d_hash;	/* lookup hash list */
 	struct dentry *d_parent;	/* parent directory */
 	struct qstr d_name;
diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
index cf1015abfbf2..783b48dedb72 100644
--- a/include/linux/fs_struct.h
+++ b/include/linux/fs_struct.h
@@ -9,7 +9,7 @@
 struct fs_struct {
 	int users;
 	spinlock_t lock;
-	seqcount_t seq;
+	seqcount_spinlock_t seq;
 	int umask;
 	int in_exec;
 	struct path root, pwd;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 15/20] raid5: Use sequence counter with associated spinlock
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (13 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 14/20] vfs: Use sequence counter with associated spinlock Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 16/20] iocost: " Ahmed S. Darwish
                     ` (4 subsequent siblings)
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Song Liu, linux-raid

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information about which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 drivers/md/raid5.c | 2 +-
 drivers/md/raid5.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ab8067f9ce8c..892aefe88fa7 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6935,7 +6935,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
 	} else
 		goto abort;
 	spin_lock_init(&conf->device_lock);
-	seqcount_init(&conf->gen_lock);
+	seqcount_spinlock_init(&conf->gen_lock, &conf->device_lock);
 	mutex_init(&conf->cache_size_mutex);
 	init_waitqueue_head(&conf->wait_for_quiescent);
 	init_waitqueue_head(&conf->wait_for_stripe);
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index f90e0704bed9..a2c9e9e9f5ac 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -589,7 +589,7 @@ struct r5conf {
 	int			prev_chunk_sectors;
 	int			prev_algo;
 	short			generation; /* increments with every reshape */
-	seqcount_t		gen_lock;	/* lock against generation changes */
+	seqcount_spinlock_t	gen_lock;	/* lock against generation changes */
 	unsigned long		reshape_checkpoint; /* Time we last updated
 						     * metadata */
 	long long		min_offset_diff; /* minimum difference between
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 16/20] iocost: Use sequence counter with associated spinlock
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (14 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 15/20] raid5: " Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  7:11     ` Daniel Wagner
  2020-06-30  5:44   ` [PATCH v3 17/20] NFSv4: " Ahmed S. Darwish
                     ` (3 subsequent siblings)
  19 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jens Axboe, linux-block

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information about which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

The explicit lockdep_assert_held(&ioc->lock) in ioc_start_period() is
dropped as now redundant: write_seqcount_begin() on a
seqcount_spinlock_t already asserts that the associated lock is held.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 block/blk-iocost.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 8ac4aad66ebc..8e940c27c27c 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -406,7 +406,7 @@ struct ioc {
 	enum ioc_running		running;
 	atomic64_t			vtime_rate;
 
-	seqcount_t			period_seqcount;
+	seqcount_spinlock_t		period_seqcount;
 	u32				period_at;	/* wallclock starttime */
 	u64				period_at_vtime; /* vtime starttime */
 
@@ -873,7 +873,6 @@ static void ioc_now(struct ioc *ioc, struct ioc_now *now)
 
 static void ioc_start_period(struct ioc *ioc, struct ioc_now *now)
 {
-	lockdep_assert_held(&ioc->lock);
 	WARN_ON_ONCE(ioc->running != IOC_RUNNING);
 
 	write_seqcount_begin(&ioc->period_seqcount);
@@ -2001,7 +2000,7 @@ static int blk_iocost_init(struct request_queue *q)
 
 	ioc->running = IOC_IDLE;
 	atomic64_set(&ioc->vtime_rate, VTIME_PER_USEC);
-	seqcount_init(&ioc->period_seqcount);
+	seqcount_spinlock_init(&ioc->period_seqcount, &ioc->lock);
 	ioc->period_at = ktime_to_us(ktime_get());
 	atomic64_set(&ioc->cur_period, 0);
 	atomic_set(&ioc->hweight_gen, 0);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 17/20] NFSv4: Use sequence counter with associated spinlock
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (15 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 16/20] iocost: " Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 18/20] userfaultfd: " Ahmed S. Darwish
                     ` (2 subsequent siblings)
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Trond Myklebust,
	Anna Schumaker, linux-nfs

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information about which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 fs/nfs/nfs4_fs.h   | 2 +-
 fs/nfs/nfs4state.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 2b7f6dcd2eb8..210e590e1f71 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -117,7 +117,7 @@ struct nfs4_state_owner {
 	unsigned long	     so_flags;
 	struct list_head     so_states;
 	struct nfs_seqid_counter so_seqid;
-	seqcount_t	     so_reclaim_seqcount;
+	seqcount_spinlock_t  so_reclaim_seqcount;
 	struct mutex	     so_delegreturn_mutex;
 };
 
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index a8dc25ce48bb..b1dba24918f8 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -509,7 +509,7 @@ nfs4_alloc_state_owner(struct nfs_server *server,
 	nfs4_init_seqid_counter(&sp->so_seqid);
 	atomic_set(&sp->so_count, 1);
 	INIT_LIST_HEAD(&sp->so_lru);
-	seqcount_init(&sp->so_reclaim_seqcount);
+	seqcount_spinlock_init(&sp->so_reclaim_seqcount, &sp->so_lock);
 	mutex_init(&sp->so_delegreturn_mutex);
 	return sp;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v3 18/20] userfaultfd: Use sequence counter with associated spinlock
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (16 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 17/20] NFSv4: " Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 19/20] kvm/eventfd: " Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 20/20] hrtimer: Use sequence counter with associated raw spinlock Ahmed S. Darwish
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Alexander Viro,
	linux-fsdevel

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information about which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 fs/userfaultfd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 52de29000c7e..26e8b23594fb 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -61,7 +61,7 @@ struct userfaultfd_ctx {
 	/* waitqueue head for events */
 	wait_queue_head_t event_wqh;
 	/* a refile sequence protected by fault_pending_wqh lock */
-	struct seqcount refile_seq;
+	seqcount_spinlock_t refile_seq;
 	/* pseudo fd refcounting */
 	refcount_t refcount;
 	/* userfaultfd syscall flags */
@@ -1998,7 +1998,7 @@ static void init_once_userfaultfd_ctx(void *mem)
 	init_waitqueue_head(&ctx->fault_wqh);
 	init_waitqueue_head(&ctx->event_wqh);
 	init_waitqueue_head(&ctx->fd_wqh);
-	seqcount_init(&ctx->refile_seq);
+	seqcount_spinlock_init(&ctx->refile_seq, &ctx->fault_pending_wqh.lock);
 }
 
 SYSCALL_DEFINE1(userfaultfd, int, flags)
-- 
2.20.1
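
The read side is unaffected by the type change; readers keep the usual
lockless retry loop. A sketch against the same hypothetical "foo" as in
the note above:

static u64 foo_read_sum(struct foo *f)
{
	unsigned int seq;
	u64 sum;

	do {
		seq = read_seqcount_begin(&f->seq);
		sum = f->a + f->b;	/* consistent snapshot */
	} while (read_seqcount_retry(&f->seq, seq));

	return sum;
}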



* [PATCH v3 19/20] kvm/eventfd: Use sequence counter with associated spinlock
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (17 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 18/20] userfaultfd: " Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  2020-06-30  5:44   ` [PATCH v3 20/20] hrtimer: Use sequence counter with associated raw spinlock Ahmed S. Darwish
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Paolo Bonzini, kvm

A sequence counter write-side critical section must be protected by
some form of locking to serialize writers. A plain seqcount_t carries
no information about which lock must be held when entering a
write-side critical section.

Use the new seqcount_spinlock_t data type, which allows a spinlock to
be associated with the sequence counter. This enables lockdep to
verify that the spinlock used for writer serialization is held when
the write-side critical section is entered.

If lockdep is disabled, this lock association is compiled out and
incurs neither storage nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_irqfd.h | 2 +-
 virt/kvm/eventfd.c        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index dc1da020305b..dac047abdba7 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -42,7 +42,7 @@ struct kvm_kernel_irqfd {
 	wait_queue_entry_t wait;
 	/* Update side is protected by irqfds.lock */
 	struct kvm_kernel_irq_routing_entry irq_entry;
-	seqcount_t irq_entry_sc;
+	seqcount_spinlock_t irq_entry_sc;
 	/* Used for level IRQ fast-path */
 	int gsi;
 	struct work_struct inject;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index ef7ed916ad4a..d6408bb497dc 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -303,7 +303,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	INIT_LIST_HEAD(&irqfd->list);
 	INIT_WORK(&irqfd->inject, irqfd_inject);
 	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
-	seqcount_init(&irqfd->irq_entry_sc);
+	seqcount_spinlock_init(&irqfd->irq_entry_sc, &kvm->irqfds.lock);
 
 	f = fdget(args->fd);
 	if (!f.file) {
-- 
2.20.1



* [PATCH v3 20/20] hrtimer: Use sequence counter with associated raw spinlock
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (18 preceding siblings ...)
  2020-06-30  5:44   ` [PATCH v3 19/20] kvm/eventfd: " Ahmed S. Darwish
@ 2020-06-30  5:44   ` Ahmed S. Darwish
  19 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-06-30  5:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

A sequence counter write-side critical section must be protected by
some form of locking to serialize writers. A plain seqcount_t carries
no information about which lock must be held when entering a
write-side critical section.

Use the new seqcount_raw_spinlock_t data type, which allows a raw
spinlock to be associated with the sequence counter. This enables
lockdep to verify that the raw spinlock used for writer serialization
is held when the write-side critical section is entered.

If lockdep is disabled, this lock association is compiled out and
incurs neither storage nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/hrtimer.h |  2 +-
 kernel/time/hrtimer.c   | 13 ++++++++++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 15c8ac313678..25993b86ac5c 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -159,7 +159,7 @@ struct hrtimer_clock_base {
 	struct hrtimer_cpu_base	*cpu_base;
 	unsigned int		index;
 	clockid_t		clockid;
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct hrtimer		*running;
 	struct timerqueue_head	active;
 	ktime_t			(*get_time)(void);
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index d89da1c7e005..c4038511d5c9 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -135,7 +135,11 @@ static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
  * timer->base->cpu_base
  */
 static struct hrtimer_cpu_base migration_cpu_base = {
-	.clock_base = { { .cpu_base = &migration_cpu_base, }, },
+	.clock_base = { {
+		.cpu_base = &migration_cpu_base,
+		.seq      = SEQCNT_RAW_SPINLOCK_ZERO(migration_cpu_base.seq,
+						     &migration_cpu_base.lock),
+	}, },
 };
 
 #define migration_base	migration_cpu_base.clock_base[0]
@@ -1998,8 +2002,11 @@ int hrtimers_prepare_cpu(unsigned int cpu)
 	int i;
 
 	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
-		cpu_base->clock_base[i].cpu_base = cpu_base;
-		timerqueue_init_head(&cpu_base->clock_base[i].active);
+		struct hrtimer_clock_base *clock_b = &cpu_base->clock_base[i];
+
+		clock_b->cpu_base = cpu_base;
+		seqcount_raw_spinlock_init(&clock_b->seq, &cpu_base->lock);
+		timerqueue_init_head(&clock_b->active);
 	}
 
 	cpu_base->cpu = cpu;
-- 
2.20.1
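
For statically allocated objects the associated lock is supplied at
build time, as in the migration_cpu_base hunk above. A minimal sketch
with a hypothetical "bar" object:

static struct bar {
	raw_spinlock_t		lock;
	seqcount_raw_spinlock_t	seq;
} bar = {
	.lock = __RAW_SPIN_LOCK_UNLOCKED(bar.lock),
	.seq  = SEQCNT_RAW_SPINLOCK_ZERO(bar.seq, &bar.lock),
};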



* Re: [PATCH v3 16/20] iocost: Use sequence counter with associated spinlock
  2020-06-30  5:44   ` [PATCH v3 16/20] iocost: " Ahmed S. Darwish
@ 2020-06-30  7:11     ` Daniel Wagner
  0 siblings, 0 replies; 258+ messages in thread
From: Daniel Wagner @ 2020-06-30  7:11 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML,
	Jens Axboe, linux-block

On Tue, Jun 30, 2020 at 07:44:48AM +0200, Ahmed S. Darwish wrote:
> A sequence counter write side critical section must be protected by some
> form of locking to serialize writers. A plain seqcount_t does not
> contain the information of which lock must be held when entering a write
> side critical section.
> 
> Use the new seqcount_spinlock_t data type, which allows to associate a
> spinlock with the sequence counter. This enables lockdep to verify that
> the spinlock used for writer serialization is held when the write side
> critical section is entered.
> 
> If lockdep is disabled this lock association is compiled out and has
> neither storage size nor runtime overhead.
> 
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>

Reviewed-by: Daniel Wagner <dwagner@suse.de>


* Re: [PATCH v3 04/20] lockdep: Add preemption enabled/disabled assertion APIs
  2020-06-30  5:44   ` [PATCH v3 04/20] lockdep: Add preemption enabled/disabled assertion APIs Ahmed S. Darwish
@ 2020-07-06 20:50     ` Peter Zijlstra
  2020-07-07  7:34       ` Sebastian A. Siewior
  0 siblings, 1 reply; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-06 20:50 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Tue, Jun 30, 2020 at 07:44:36AM +0200, Ahmed S. Darwish wrote:
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index d74ac0fd6b2d..e5e2e632b749 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1118,6 +1118,7 @@ config PROVE_LOCKING
>  	select DEBUG_RWSEMS
>  	select DEBUG_WW_MUTEX_SLOWPATH
>  	select DEBUG_LOCK_ALLOC
> +	select PREEMPT_COUNT if !ARCH_NO_PREEMPT
>  	select TRACE_IRQFLAGS
>  	default n
>  	help

I suspect this can be done unconditionally; the thing that requires
arch support is CONFIG_PREEMPTION.


* Re: [PATCH v3 01/20] Documentation: locking: Describe seqlock design and usage
  2020-06-30  5:44   ` [PATCH v3 01/20] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
@ 2020-07-06 21:04     ` Peter Zijlstra
  2020-07-06 21:12       ` Jonathan Corbet
                         ` (2 more replies)
  0 siblings, 3 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-06 21:04 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Jonathan Corbet,
	linux-doc

On Tue, Jun 30, 2020 at 07:44:33AM +0200, Ahmed S. Darwish wrote:
> +Sequence counters (:c:type:`seqcount_t`)
> +========================================

> +.. code-block:: c

I so hate RST, of course it's C. Also, ISTR Jon saying you can leave
that all out without issue.


* Re: [PATCH v3 01/20] Documentation: locking: Describe seqlock design and usage
  2020-07-06 21:04     ` Peter Zijlstra
@ 2020-07-06 21:12       ` Jonathan Corbet
  2020-07-06 21:16       ` Peter Zijlstra
  2020-07-07 10:12       ` Ahmed S. Darwish
  2 siblings, 0 replies; 258+ messages in thread
From: Jonathan Corbet @ 2020-07-06 21:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ahmed S. Darwish, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML,
	linux-doc

On Mon, 6 Jul 2020 23:04:39 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Jun 30, 2020 at 07:44:33AM +0200, Ahmed S. Darwish wrote:
> > +Sequence counters (:c:type:`seqcount_t`)
> > +========================================  
> 
> > +.. code-block:: c  
> 
> I so hate RST, of course it's C. Also, ISTR Jon saying you can leave
> that all out without issue.

The "c" enables keyword coloring and such - something I can happily do
without but others seem to care about it.  If you take out that line,
though, you'll need a "::" at least to start a literal block.

jon
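
For reference, the two equivalent forms (illustrative snippet, not
from the patch): with the directive,

	.. code-block:: c

		write_seqcount_begin(&foo_seqcount);

and, with the directive dropped, a literal block introduced by "::",

	Write path::

		write_seqcount_begin(&foo_seqcount);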


* Re: [PATCH v3 01/20] Documentation: locking: Describe seqlock design and usage
  2020-07-06 21:04     ` Peter Zijlstra
  2020-07-06 21:12       ` Jonathan Corbet
@ 2020-07-06 21:16       ` Peter Zijlstra
  2020-07-07 10:12       ` Ahmed S. Darwish
  2 siblings, 0 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-06 21:16 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Jonathan Corbet,
	linux-doc

On Mon, Jul 06, 2020 at 11:04:39PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 30, 2020 at 07:44:33AM +0200, Ahmed S. Darwish wrote:
> > +Sequence counters (:c:type:`seqcount_t`)
> > +========================================
> 
> > +.. code-block:: c
> 
> I so hate RST, of course it's C. Also, ISTR Jon saying you can leave
> that all out without issue.

Something like the below; and then there's all that :ref: nonsense in
there that's unreadable gibberish.


Index: linux-2.6/Documentation/locking/seqlock.rst
===================================================================
--- linux-2.6.orig/Documentation/locking/seqlock.rst
+++ linux-2.6/Documentation/locking/seqlock.rst
@@ -33,10 +33,8 @@ class, it can spin forever and the kerne
 This mechanism cannot be used if the protected data contains pointers,
 as the writer can invalidate a pointer that the reader is following.
 
-.. _seqcount_t:
-
-Sequence counters (:c:type:`seqcount_t`)
-========================================
+Sequence counters (`seqcount_t`)
+================================
 
 This is the the raw counting mechanism, which does not protect against
 multiple writers.  Write side critical sections must thus be serialized
@@ -56,8 +54,6 @@ requirements, use a :ref:`sequential loc
 
 Initialization:
 
-.. code-block:: c
-
 	/* dynamic */
 	seqcount_t foo_seqcount;
 	seqcount_init(&foo_seqcount);
@@ -72,9 +68,6 @@ Initialization:
 
 Write path:
 
-.. _seqcount_write_ops:
-.. code-block:: c
-
 	/* Serialized context with disabled preemption */
 
 	write_seqcount_begin(&foo_seqcount);
@@ -85,9 +78,6 @@ Write path:
 
 Read path:
 
-.. _seqcount_read_ops:
-.. code-block:: c
-
 	do {
 		seq = read_seqcount_begin(&foo_seqcount);
 
@@ -95,9 +85,7 @@ Read path:
 
 	} while (read_seqcount_retry(&foo_seqcount, seq));
 
-.. _seqcount_locktype_t:
-
-Sequence counters with associated locks (:c:type:`seqcount_LOCKTYPE_t`)
+Sequence counters with associated locks (`seqcount_LOCKTYPE_t`)
 -----------------------------------------------------------------------
 
 As :ref:`earlier discussed <seqcount_t>`, seqcount write side critical
@@ -117,11 +105,11 @@ protection is enforced in the write side
 
 The following seqcounts with associated locks are defined:
 
-  - :c:type:`seqcount_spinlock_t`
-  - :c:type:`seqcount_raw_spinlock_t`
-  - :c:type:`seqcount_rwlock_t`
-  - :c:type:`seqcount_mutex_t`
-  - :c:type:`seqcount_ww_mutex_t`
+  - `seqcount_spinlock_t`
+  - `seqcount_raw_spinlock_t`
+  - `seqcount_rwlock_t`
+  - `seqcount_mutex_t`
+  - `seqcount_ww_mutex_t`
 
 The plain seqcount read and write APIs branch out to the specific
 seqcount_LOCKTYPE_t implementation at compile-time. This avoids kernel
@@ -129,8 +117,6 @@ API explosion per each new seqcount LOCK
 
 Initialization (replace "LOCKTYPE" with one of the supported locks):
 
-.. code-block:: c
-
 	/* dynamic */
 	seqcount_LOCKTYPE_t foo_seqcount;
 	seqcount_LOCKTYPE_init(&foo_seqcount, &lock);
@@ -149,9 +135,7 @@ while running from a context with the as
 
 Read path: same as in :ref:`plain seqcount_t <seqcount_read_ops>`.
 
-.. _seqlock_t:
-
-Sequential locks (:c:type:`seqlock_t`)
+Sequential locks (`seqlock_t`)
 ======================================
 
 This contains the :ref:`sequence counting mechanism <seqcount_t>`
@@ -164,8 +148,6 @@ halves respectively.
 
 Initialization:
 
-.. code-block:: c
-
 	/* dynamic */
 	seqlock_t foo_seqlock;
 	seqlock_init(&foo_seqlock);
@@ -180,8 +162,6 @@ Initialization:
 
 Write path:
 
-.. code-block:: c
-
 	write_seqlock(&foo_seqlock);
 
 	/* ... [[write-side critical section]] ... */
@@ -194,8 +174,6 @@ Read path, three categories:
    retry if a writer is in progress by detecting change in the sequence
    number.  Writers do not wait for a sequence reader.
 
-   .. code-block:: c
-
 	do {
 		seq = read_seqbegin(&foo_seqlock);
 
@@ -208,8 +186,6 @@ Read path, three categories:
    from entering its critical section. This read lock is
    exclusive. Unlike rwlock_t, only one locking reader can acquire it.
 
-   .. code-block:: c
-
 	read_seqlock_excl(&foo_seqlock);
 
 	/* ... [[read-side critical section]] ... */
@@ -224,8 +200,6 @@ Read path, three categories:
    the next iteration marker), the lockless read is transformed to a
    full locking read and no retry loop is necessary.
 
-   .. code-block:: c
-
 	/* marker; even initialization */
 	int seq = 0;
 	do {



* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-06-30  5:44   ` [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
@ 2020-07-06 21:21     ` Peter Zijlstra
  2020-07-07  8:40       ` Ahmed S. Darwish
  0 siblings, 1 reply; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-06 21:21 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Tue, Jun 30, 2020 at 07:44:38AM +0200, Ahmed S. Darwish wrote:
> +#include <linux/seqlock_types_internal.h>

Why? why not include those lines directly here? Having it in a separate
file actually makes it harder to read.


* Re: [PATCH v3 04/20] lockdep: Add preemption enabled/disabled assertion APIs
  2020-07-06 20:50     ` Peter Zijlstra
@ 2020-07-07  7:34       ` Sebastian A. Siewior
  0 siblings, 0 replies; 258+ messages in thread
From: Sebastian A. Siewior @ 2020-07-07  7:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ahmed S. Darwish, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Steven Rostedt, LKML

On 2020-07-06 22:50:04 [+0200], Peter Zijlstra wrote:
> On Tue, Jun 30, 2020 at 07:44:36AM +0200, Ahmed S. Darwish wrote:
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index d74ac0fd6b2d..e5e2e632b749 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1118,6 +1118,7 @@ config PROVE_LOCKING
> >  	select DEBUG_RWSEMS
> >  	select DEBUG_WW_MUTEX_SLOWPATH
> >  	select DEBUG_LOCK_ALLOC
> > +	select PREEMPT_COUNT if !ARCH_NO_PREEMPT
> >  	select TRACE_IRQFLAGS
> >  	default n
> >  	help
> 
> I suspect this can be done unconditional, the thing that requires arch
> support is CONFIG_PREEMPTION.

|$ git grep -C 2 PREEMPT_COUNT lib/Kconfig.debug 
|lib/Kconfig.debug-config DEBUG_ATOMIC_SLEEP
|lib/Kconfig.debug-      bool "Sleep inside atomic section checking"
|lib/Kconfig.debug:      select PREEMPT_COUNT
|lib/Kconfig.debug-      depends on DEBUG_KERNEL
|lib/Kconfig.debug-      depends on !ARCH_NO_PREEMPT

There will be a build fault if you force preempt_count.

Sebastian


* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-06 21:21     ` Peter Zijlstra
@ 2020-07-07  8:40       ` Ahmed S. Darwish
  2020-07-07 13:04         ` Peter Zijlstra
  0 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-07  8:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Mon, Jul 06, 2020 at 11:21:48PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 30, 2020 at 07:44:38AM +0200, Ahmed S. Darwish wrote:
> > +#include <linux/seqlock_types_internal.h>
>
> Why? why not include those lines directly here? Having it in a separate
> file actually makes it harder to read.
>

The seqlock_types_internal.h file contains mostly a heavy sequence of
typeof() branching macros, which is not related to the core logic of
sequence counters or sequential locks:

#define __to_seqcount_t(s)
({
	seqcount_t *seq;

	if (__same_type(*(s), seqcount_t))
		seq = (seqcount_t *)(s);
	else if (__same_type(*(s), seqcount_spinlock_t))
		seq = &((seqcount_spinlock_t *)(s))->seqcount;
	else if (__same_type(*(s), seqcount_raw_spinlock_t))
		seq = &((seqcount_raw_spinlock_t *)(s))->seqcount;
	else if (__same_type(*(s), seqcount_rwlock_t))
		seq = &((seqcount_rwlock_t *)(s))->seqcount;
	else if (__same_type(*(s), seqcount_mutex_t))
		seq = &((seqcount_mutex_t *)(s))->seqcount;
	else if (__same_type(*(s), seqcount_ww_mutex_t))
		seq = &((seqcount_ww_mutex_t *)(s))->seqcount;
	else
		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");

	seq;
})

#define __associated_lock_exists_and_is_preemptible(s)
({
	bool ret;

	/* No associated lock for these */
	if (__same_type(*(s), seqcount_t) ||
	    __same_type(*(s), seqcount_irq_t)) {
		ret = false;
	} else if (__same_type(*(s), seqcount_spinlock_t)	||
		   __same_type(*(s), seqcount_raw_spinlock_t)	||
		   __same_type(*(s), seqcount_rwlock_t)) {
		ret = false;
	} else if (__same_type(*(s), seqcount_mutex_t)		||
		   __same_type(*(s), seqcount_ww_mutex_t)) {
		ret = true;
	} else
		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");

	ret;
})

#define __assert_seqcount_write_section_is_protected(s)
do {
	/* Can't assert anything with plain seqcount_t */
	if (__same_type(*(s), seqcount_t))
		break;

	if (__same_type(*(s), seqcount_spinlock_t))
		lockdep_assert_held(((seqcount_spinlock_t *)(s))->lock);
	else if (__same_type(*(s), seqcount_raw_spinlock_t))
		lockdep_assert_held(((seqcount_raw_spinlock_t *)(s))->lock);
	else if (__same_type(*(s), seqcount_rwlock_t))
		lockdep_assert_held_write(((seqcount_rwlock_t *)(s))->lock);
	else if (__same_type(*(s), seqcount_mutex_t))
		lockdep_assert_held(((seqcount_mutex_t *)(s))->lock);
	else if (__same_type(*(s), seqcount_ww_mutex_t))
		lockdep_assert_held(&((seqcount_ww_mutex_t *)(s))->lock->base);
	else if (__same_type(*(s), seqcount_irq_t))
		lockdep_assert_irqs_disabled();
	else
		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");
} while (0)

and so on and so forth. With the final patch that enables PREEMPT_RT
support (not yet submitted upstream), this file gets even more fugly:

#define __rt_lock_unlock_associated_sleeping_lock(s)
do {
	if (__same_type(*(s), seqcount_t)  ||
	    __same_type(*(s), seqcount_raw_spinlock_t))	{
		break;
	}

	if (__same_type(*(s), seqcount_spinlock_t)) {
		spin_lock(((seqcount_spinlock_t *) s)->lock);
		spin_unlock(((seqcount_spinlock_t *) s)->lock);
	} else if (__same_type(*(s), seqcount_rwlock_t)) {
		read_lock(((seqcount_rwlock_t *) s)->lock);
		read_unlock(((seqcount_rwlock_t *) s)->lock);
	} else if (__same_type(*(s), seqcount_mutex_t)) {
		mutex_lock(((seqcount_mutex_t *) s)->lock);
		mutex_unlock(((seqcount_mutex_t *) s)->lock);
	} else if (__same_type(*(s), seqcount_ww_mutex_t)) {
		ww_mutex_lock(((seqcount_ww_mutex_t *) s)->lock, NULL);
		ww_mutex_unlock(((seqcount_ww_mutex_t *) s)->lock);
	} else
		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");
} while (0)

and even more macros not included here for brevity.

IMHO it makes sense to isolate these "not core logic" parts. Adding all
of this to plain seqlock.h makes it almost unreadable.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: [PATCH v3 01/20] Documentation: locking: Describe seqlock design and usage
  2020-07-06 21:04     ` Peter Zijlstra
  2020-07-06 21:12       ` Jonathan Corbet
  2020-07-06 21:16       ` Peter Zijlstra
@ 2020-07-07 10:12       ` Ahmed S. Darwish
  2020-07-07 12:47         ` Peter Zijlstra
  2 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-07 10:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Jonathan Corbet,
	linux-doc

On Mon, Jul 06, 2020 at 11:04:39PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 30, 2020 at 07:44:33AM +0200, Ahmed S. Darwish wrote:
> > +Sequence counters (:c:type:`seqcount_t`)
> > +========================================
>
> > +.. code-block:: c
>
> I so hate RST, of course it's C. Also, ISTR Jon saying you can leave
> that all out without issue.

There you go again.

  linux$ git grep 'code-block:: c' | wc -l
  ==>  470

  linux$ git grep ':ref:' | wc -l
  ==>  3171

  linux$ git grep ':c:type:' | wc -l
  ==>  1533

In writing this, Documentation/doc-guide/ was followed, along with
literally the thousands of examples above.

But honestly, I don't want to block merging documentation because of
this, and the point of making the docs as readable as possible from text
editors also has merit.

So... I'll just make that file as "RST-lite" as possible.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: [PATCH v3 01/20] Documentation: locking: Describe seqlock design and usage
  2020-07-07 10:12       ` Ahmed S. Darwish
@ 2020-07-07 12:47         ` Peter Zijlstra
  0 siblings, 0 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-07 12:47 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML, Jonathan Corbet,
	linux-doc

On Tue, Jul 07, 2020 at 12:12:01PM +0200, Ahmed S. Darwish wrote:
> On Mon, Jul 06, 2020 at 11:04:39PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 30, 2020 at 07:44:33AM +0200, Ahmed S. Darwish wrote:
> > > +Sequence counters (:c:type:`seqcount_t`)
> > > +========================================
> >
> > > +.. code-block:: c
> >
> > I so hate RST, of course it's C. Also, ISTR Jon saying you can leave
> > that all out without issue.
> 
> There you go again.
> 
>   linux$ git grep 'code-block:: c' | wc -l
>   ==>  470
> 
>   linux$ git grep ':ref:' | wc -l
>   ==>  3171
> 
>   linux$ git grep ':c:type:' | wc -l
>   ==>  1533
> 
> In writing this, Documentation/doc-guide/ was followed, and literally
> the thousands of examples above.

Then this is a very bad guide. Jon, why is it recommending such horrible
crap? Should we add another field to MAINTAINERS to indicate RST/TXT
preference? Because I'm getting fed up trying to fix up all this
unreadable gunk.

I'm >< close to writing a script that will unconditionally and silently
convert everything to txt before committing.

> But honestly, I don't want to block merging documentation because of
> this, and the point of making the docs as readable as possible from text
> editors also has merit.
> 
> So... I'll just make that file as "RST-lite" as possible.

Thanks!



* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-07  8:40       ` Ahmed S. Darwish
@ 2020-07-07 13:04         ` Peter Zijlstra
  2020-07-07 14:37           ` Peter Zijlstra
  0 siblings, 1 reply; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-07 13:04 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Tue, Jul 07, 2020 at 10:40:24AM +0200, Ahmed S. Darwish wrote:
> On Mon, Jul 06, 2020 at 11:21:48PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 30, 2020 at 07:44:38AM +0200, Ahmed S. Darwish wrote:
> > > +#include <linux/seqlock_types_internal.h>
> >
> > Why? why not include those lines directly here? Having it in a separate
> > file actually makes it harder to read.
> >
> 
> The seqlock_types_internal.h file contains mostly a heavy sequence of
> typeof() branching macros, which is not related to the core logic of
> sequence counters or sequential locks:

> IMHO it makes sense to isolate these "not core logic" parts. Adding all
> of this to plain seqlock.h makes it almost unreadable.

So I applied it all yesterday and tried to make sense of the resulting
indirections and couldn't find stuff -- because it was hiding in
another file.

Specifically I disliked that *seqcount_t* naming and didn't see how it
all connected.

And that other file is less than 200 lines, which resulted in me
wondering why it needed to be hidden away like that.

Anyway, let me muck around with that a bit.


* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-07 13:04         ` Peter Zijlstra
@ 2020-07-07 14:37           ` Peter Zijlstra
  2020-07-08  9:12             ` Peter Zijlstra
  2020-07-08 10:33             ` Ahmed S. Darwish
  0 siblings, 2 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-07 14:37 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Tue, Jul 07, 2020 at 03:04:10PM +0200, Peter Zijlstra wrote:

> Anyway, let me muck around with that a bit.

How's this? It removes a level of indirection and a bunch of repetition.

It doesn't provide SEQCNT_LOCKTYPE_ZERO() for each LOCKTYPE, but you can
use this one macro for any LOCKTYPE.

It's also more than 200 lines shorter.

---
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 8b97204f35a7..2b599cf03db4 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -1,36 +1,15 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef __LINUX_SEQLOCK_H
 #define __LINUX_SEQLOCK_H
+
 /*
- * Reader/writer consistent mechanism without starving writers. This type of
- * lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes. There are two types
- * of readers:
- * 1. Sequence readers which never block a writer but they may have to retry
- *    if a writer is in progress by detecting change in sequence number.
- *    Writers do not wait for a sequence reader.
- * 2. Locking readers which will wait if a writer or another locking reader
- *    is in progress. A locking reader in progress will also block a writer
- *    from going forward. Unlike the regular rwlock, the read lock here is
- *    exclusive so that only one locking reader can get it.
- *
- * This is not as cache friendly as brlock. Also, this may not work well
- * for data that contains pointers, because any writer could
- * invalidate a pointer that a reader was following.
- *
- * Expected non-blocking reader usage:
- * 	do {
- *	    seq = read_seqbegin(&foo);
- * 	...
- *      } while (read_seqretry(&foo, seq));
+ * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
+ * lockless readers (read-only retry loops), and no writer starvation.
  *
+ * See Documentation/locking/seqlock.rst for full description.
  *
- * On non-SMP the spin locks disappear but the writer still needs
- * to increment the sequence variables because an interrupt routine could
- * change the state of the data.
- *
- * Based on x86_64 vsyscall gettimeofday 
- * by Keith Owens and Andrea Arcangeli
+ * Copyrights:
+ * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
  */
 
 #include <linux/spinlock.h>
@@ -41,8 +20,8 @@
 #include <asm/processor.h>
 
 /*
- * The seqlock interface does not prescribe a precise sequence of read
- * begin/retry/end. For readers, typically there is a call to
+ * The seqlock seqcount_t interface does not prescribe a precise sequence of
+ * read begin/retry/end. For readers, typically there is a call to
  * read_seqcount_begin() and read_seqcount_retry(), however, there are more
  * esoteric cases which do not follow this pattern.
  *
@@ -56,10 +35,28 @@
 #define KCSAN_SEQLOCK_REGION_MAX 1000
 
 /*
- * Version using sequence counter only.
- * This can be used when code has its own mutex protecting the
- * updating starting before the write_seqcountbeqin() and ending
- * after the write_seqcount_end().
+ * Sequence counters (seqcount_t)
+ *
+ * This is the raw counting mechanism, without any writer protection.
+ *
+ * Write side critical sections must be serialized and non-preemptible.
+ *
+ * If readers can be invoked from hardirq or softirq contexts,
+ * interrupts or bottom halves must also be respectively disabled before
+ * entering the write section.
+ *
+ * This mechanism can't be used if the protected data contains pointers,
+ * as the writer can invalidate a pointer that a reader is following.
+ *
+ * If the write serialization mechanism is one of the common kernel
+ * locking primitives, use a sequence counter with associated lock
+ * (seqcount_LOCKTYPE_t) instead.
+ *
+ * If it's desired to automatically handle the sequence counter writer
+ * serialization and non-preemptibility requirements, use a sequential
+ * lock (seqlock_t) instead.
+ *
+ * See Documentation/locking/seqlock.rst
  */
 typedef struct seqcount {
 	unsigned sequence;
@@ -82,6 +79,10 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
 # define SEQCOUNT_DEP_MAP_INIT(lockname) \
 		.dep_map = { .name = #lockname } \
 
+/**
+ * seqcount_init() - runtime initializer for seqcount_t
+ * @s: Pointer to the &typedef seqcount_t instance
+ */
 # define seqcount_init(s)				\
 	do {						\
 		static struct lock_class_key __key;	\
@@ -105,12 +106,122 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
 # define seqcount_lockdep_reader_access(x)
 #endif
 
-#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)}
+/**
+ * SEQCNT_ZERO() - static initializer for seqcount_t
+ * @name: Name of the &typedef seqcount_t instance
+ */
+#define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
+
+/*
+ * Sequence counters with associated locks (seqcount_LOCKTYPE_t)
+ *
+ * A sequence counter which associates the lock used for writer
+ * serialization at initialization time. This enables lockdep to validate
+ * that the write side critical section is properly serialized.
+ *
+ * For associated locks which do not implicitly disable preemption,
+ * preemption protection is enforced in the write side function.
+ *
+ * See Documentation/locking/seqlock.rst
+ */
+
+#ifdef CONFIG_LOCKDEP
+#define __SEQCOUNT_LOCKDEP(expr)	expr
+#else
+#define __SEQCOUNT_LOCKDEP(expr)
+#endif
+
+#define SEQCOUNT_LOCKTYPE(name, locktype)				\
+typedef struct seqcount_##name {					\
+	seqcount_t	seqcount;					\
+	__SEQCOUNT_LOCKDEP(locktype *lock);				\
+} seqcount_##name##_t;							\
+									\
+static __always_inline void						\
+seqcount_##name##_init(seqcount_##name##_t *s, locktype *l) 		\
+{									\
+	seqcount_init(&s->seqcount);					\
+	__SEQCOUNT_LOCKDEP(s->lock = l);				\
+}
+
+#define SEQCNT_LOCKTYPE_ZERO(_name, _lock) {				\
+	.seqcount = SEQCNT_ZERO(_name.seqcount), 			\
+	__SEQCOUNT_LOCKDEP(.lock = (_lock))				\
+}
+
+#include <linux/spinlock.h>
+#include <linux/ww_mutex.h>
+
+SEQCOUNT_LOCKTYPE(raw_spinlock, raw_spinlock_t)
+SEQCOUNT_LOCKTYPE(spinlock, spinlock_t)
+SEQCOUNT_LOCKTYPE(rwlock, rwlock_t)
+SEQCOUNT_LOCKTYPE(mutex, struct mutex)
+SEQCOUNT_LOCKTYPE(ww_mutex, struct ww_mutex)
+
+#define __to_seqcount_t(s)	(seqcount_t *)(s)
+
+/*
+ *	seqcount_LOCKTYPE_t -- write APIs
+ *
+ * For associated lock types which do not implicitly disable preemption,
+ * enforce preemption protection in the write side functions.
+ *
+ * Never use lockdep for the raw write variants.
+ */
+
+#define __associated_lock_is_preemptible(s)				\
+({									\
+	bool ret;							\
+									\
+	if (__same_type(*(s), seqcount_t) ||				\
+	    __same_type(*(s), seqcount_raw_spinlock_t)) {		\
+		ret = false;						\
+	} else if (__same_type(*(s), seqcount_spinlock_t) ||		\
+		   __same_type(*(s), seqcount_rwlock_t)) {		\
+		ret = IS_BUILTIN(CONFIG_PREEMPT_RT);			\
+	} else if (__same_type(*(s), seqcount_mutex_t) ||		\
+		   __same_type(*(s), seqcount_ww_mutex_t)) {		\
+		ret = true;						\
+	} else								\
+		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");		\
+									\
+	ret;								\
+})
+
+#ifdef CONFIG_LOCKDEP
+
+#define __assert_associated_lock_held(s)				\
+do {									\
+	if (__same_type(*(s), seqcount_t))				\
+		break;							\
+									\
+	if (__same_type(*(s), seqcount_spinlock_t))			\
+		lockdep_assert_held(((seqcount_spinlock_t *)(s))->lock);\
+	else if (__same_type(*(s), seqcount_raw_spinlock_t))		\
+		lockdep_assert_held(((seqcount_raw_spinlock_t *)(s))->lock);	\
+	else if (__same_type(*(s), seqcount_rwlock_t))			\
+		lockdep_assert_held_write(((seqcount_rwlock_t *)(s))->lock);	\
+	else if (__same_type(*(s), seqcount_mutex_t))			\
+		lockdep_assert_held(((seqcount_mutex_t *)(s))->lock);	\
+	else if (__same_type(*(s), seqcount_ww_mutex_t))		\
+		lockdep_assert_held(&((seqcount_ww_mutex_t *)(s))->lock->base);	\
+	else								\
+		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");		\
+} while (0)
+
+#else
+
+#define __assert_associated_lock_held(s)				\
+do {									\
+	(void) __to_seqcount_t(s);					\
+} while (0)
+
+#endif /* CONFIG_LOCKDEP */
 
 
 /**
- * __read_seqcount_begin - begin a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
+ * __read_seqcount_begin() - begin a seqcount read section (without barrier)
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * Returns: count to be passed to read_seqcount_retry
  *
  * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
@@ -121,7 +232,9 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  * Use carefully, only in critical code, and comment how the barrier is
  * provided.
  */
-static inline unsigned __read_seqcount_begin(const seqcount_t *s)
+#define __read_seqcount_begin(s)	do___read_seqcount_begin(__to_seqcount_t(s))
+
+static inline unsigned do___read_seqcount_begin(const seqcount_t *s)
 {
 	unsigned ret;
 
@@ -136,15 +249,18 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount - Read the raw seqcount
- * @s: pointer to seqcount_t
+ * raw_read_seqcount() - Read the seqcount raw counter value
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * Returns: count to be passed to read_seqcount_retry
  *
  * raw_read_seqcount opens a read critical section of the given
- * seqcount without any lockdep checking and without checking or
- * masking the LSB. Calling code is responsible for handling that.
+ * seqcount_t, without any lockdep checks and without checking or
+ * masking the sequence counter LSB. Calling code is responsible for
+ * handling that.
  */
-static inline unsigned raw_read_seqcount(const seqcount_t *s)
+#define raw_read_seqcount(s)	do_raw_read_seqcount(__to_seqcount_t(s))
+
+static inline unsigned do_raw_read_seqcount(const seqcount_t *s)
 {
 	unsigned ret = READ_ONCE(s->sequence);
 	smp_rmb();
@@ -153,42 +269,46 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount_begin - start seq-read critical section w/o lockdep
- * @s: pointer to seqcount_t
+ * raw_read_seqcount_begin() - start a seqcount read section w/o lockdep
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * Returns: count to be passed to read_seqcount_retry
  *
  * raw_read_seqcount_begin opens a read critical section of the given
- * seqcount, but without any lockdep checking. Validity of the critical
- * section is tested by checking read_seqcount_retry function.
+ * seqcount_t, but without any lockdep checking. Validity of the read
+ * section must be checked with read_seqcount_retry().
  */
-static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
+#define raw_read_seqcount_begin(s)	do_raw_read_seqcount_begin(__to_seqcount_t(s))
+
+static inline unsigned do_raw_read_seqcount_begin(const seqcount_t *s)
 {
-	unsigned ret = __read_seqcount_begin(s);
+	unsigned ret = do___read_seqcount_begin(s);
 	smp_rmb();
 	return ret;
 }
 
 /**
- * read_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
+ * read_seqcount_begin() - start a seqcount read critical section
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * Returns: count to be passed to read_seqcount_retry
  *
- * read_seqcount_begin opens a read critical section of the given seqcount.
- * Validity of the critical section is tested by checking read_seqcount_retry
- * function.
+ * read_seqcount_begin opens a read critical section of the given
+ * seqcount_t. Validity of the read section must be checked with
+ * read_seqcount_retry().
  */
-static inline unsigned read_seqcount_begin(const seqcount_t *s)
+#define read_seqcount_begin(s)	do_read_seqcount_begin(__to_seqcount_t(s))
+
+static inline unsigned do_read_seqcount_begin(const seqcount_t *s)
 {
 	seqcount_lockdep_reader_access(s);
-	return raw_read_seqcount_begin(s);
+	return do_raw_read_seqcount_begin(s);
 }
 
 /**
- * raw_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
+ * raw_seqcount_begin() - begin a seq-read critical section
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * Returns: count to be passed to read_seqcount_retry
  *
- * raw_seqcount_begin opens a read critical section of the given seqcount.
+ * raw_seqcount_begin opens a read critical section of the given seqcount_t.
  * Validity of the critical section is tested by checking read_seqcount_retry
  * function.
  *
@@ -197,7 +317,9 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
  * read_seqcount_retry() instead of stabilizing at the beginning of the
  * critical section.
  */
-static inline unsigned raw_seqcount_begin(const seqcount_t *s)
+#define raw_seqcount_begin(s)	do_raw_seqcount_begin(__to_seqcount_t(s))
+
+static inline unsigned do_raw_seqcount_begin(const seqcount_t *s)
 {
 	unsigned ret = READ_ONCE(s->sequence);
 	smp_rmb();
@@ -206,8 +328,8 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * __read_seqcount_retry - end a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
+ * __read_seqcount_retry() - end a seq-read critical section (without barrier)
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
@@ -219,38 +341,56 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
  * Use carefully, only in critical code, and comment how the barrier is
  * provided.
  */
-static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
+#define __read_seqcount_retry(s, start)	do___read_seqcount_retry(__to_seqcount_t(s), start)
+
+static inline int do___read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
 	kcsan_atomic_next(0);
 	return unlikely(READ_ONCE(s->sequence) != start);
 }
 
 /**
- * read_seqcount_retry - end a seq-read critical section
- * @s: pointer to seqcount_t
+ * read_seqcount_retry() - end a seq-read critical section
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
- * read_seqcount_retry closes a read critical section of the given seqcount.
+ * read_seqcount_retry closes a read critical section of given seqcount_t.
  * If the critical section was invalid, it must be ignored (and typically
  * retried).
  */
-static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
+#define read_seqcount_retry(s, start)	do_read_seqcount_retry(__to_seqcount_t(s), start)
+
+static inline int do_read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
 	smp_rmb();
-	return __read_seqcount_retry(s, start);
+	return do___read_seqcount_retry(s, start);
 }
 
+#define raw_write_seqcount_begin(s)					\
+do {									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	do_raw_write_seqcount_begin(__to_seqcount_t(s));		\
+} while (0)
 
-
-static inline void raw_write_seqcount_begin(seqcount_t *s)
+static inline void do_raw_write_seqcount_begin(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
 	smp_wmb();
 }
 
-static inline void raw_write_seqcount_end(seqcount_t *s)
+#define raw_write_seqcount_end(s)					\
+do {									\
+	do_raw_write_seqcount_end(__to_seqcount_t(s));			\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_enable();					\
+} while (0)
+
+static inline void do_raw_write_seqcount_end(seqcount_t *s)
 {
 	smp_wmb();
 	s->sequence++;
@@ -258,12 +398,12 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 }
 
 /**
- * raw_write_seqcount_barrier - do a seq write barrier
- * @s: pointer to seqcount_t
+ * raw_write_seqcount_barrier() - do a seq write barrier
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * This can be used to provide an ordering guarantee instead of the
  * usual consistency guarantee. It is one wmb cheaper, because we can
- * collapse the two back-to-back wmb()s.
+ * collapse the two back-to-back wmb()s::
  *
  * Note that writes surrounding the barrier should be declared atomic (e.g.
  * via WRITE_ONCE): a) to ensure the writes become visible to other threads
@@ -298,7 +438,9 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  *              WRITE_ONCE(X, false);
  *      }
  */
-static inline void raw_write_seqcount_barrier(seqcount_t *s)
+#define raw_write_seqcount_barrier(s)	do_raw_write_seqcount_barrier(__to_seqcount_t(s))
+
+static inline void do_raw_write_seqcount_barrier(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
@@ -307,7 +449,24 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
-static inline int raw_read_seqcount_latch(seqcount_t *s)
+/**
+ * raw_read_seqcount_latch() - pick even or odd seqcount latch data copy
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ *
+ * Use seqcount latching to switch between two storage places with
+ * sequence protection to allow interruptible, preemptible, writer
+ * sections.
+ *
+ * Check raw_write_seqcount_latch() for more details and a full reader
+ * and writer usage example.
+ *
+ * Return: sequence counter. Use the lowest bit as index for picking
+ * which data copy to read. Full counter must then be checked with
+ * read_seqcount_retry().
+ */
+#define raw_read_seqcount_latch(s)	do_raw_read_seqcount_latch(__to_seqcount_t(s))
+
+static inline int do_raw_read_seqcount_latch(seqcount_t *s)
 {
 	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
 	int seq = READ_ONCE(s->sequence); /* ^^^ */
@@ -315,8 +474,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
 }
 
 /**
- * raw_write_seqcount_latch - redirect readers to even/odd copy
- * @s: pointer to seqcount_t
+ * raw_write_seqcount_latch() - redirect readers to even/odd copy
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -332,101 +491,164 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  * Very simply put: we first modify one copy and then the other. This ensures
  * there is always one copy in a stable state, ready to give us an answer.
  *
- * The basic form is a data structure like:
+ * The basic form is a data structure like::
  *
- * struct latch_struct {
- *	seqcount_t		seq;
- *	struct data_struct	data[2];
- * };
+ *	struct latch_struct {
+ *		seqcount_t		seq;
+ *		struct data_struct	data[2];
+ *	};
  *
  * Where a modification, which is assumed to be externally serialized, does the
- * following:
+ * following::
  *
- * void latch_modify(struct latch_struct *latch, ...)
- * {
- *	smp_wmb();	<- Ensure that the last data[1] update is visible
- *	latch->seq++;
- *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *	void latch_modify(struct latch_struct *latch, ...)
+ *	{
+ *		smp_wmb();	// Ensure that the last data[1] update is visible
+ *		latch->seq++;
+ *		smp_wmb();	// Ensure that the seqcount update is visible
  *
- *	modify(latch->data[0], ...);
+ *		modify(latch->data[0], ...);
  *
- *	smp_wmb();	<- Ensure that the data[0] update is visible
- *	latch->seq++;
- *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *		smp_wmb();	// Ensure that the data[0] update is visible
+ *		latch->seq++;
+ *		smp_wmb();	// Ensure that the seqcount update is visible
  *
- *	modify(latch->data[1], ...);
- * }
+ *		modify(latch->data[1], ...);
+ *	}
  *
- * The query will have a form like:
+ * The query will have a form like::
  *
- * struct entry *latch_query(struct latch_struct *latch, ...)
- * {
- *	struct entry *entry;
- *	unsigned seq, idx;
+ *	struct entry *latch_query(struct latch_struct *latch, ...)
+ *	{
+ *		struct entry *entry;
+ *		unsigned seq, idx;
  *
- *	do {
- *		seq = raw_read_seqcount_latch(&latch->seq);
+ *		do {
+ *			seq = raw_read_seqcount_latch(&latch->seq);
  *
- *		idx = seq & 0x01;
- *		entry = data_query(latch->data[idx], ...);
+ *			idx = seq & 0x01;
+ *			entry = data_query(latch->data[idx], ...);
  *
- *		smp_rmb();
- *	} while (seq != latch->seq);
+ *			// read_seqcount_retry() includes necessary smp_rmb()
+ *		} while (read_seqcount_retry(&latch->seq, seq);
  *
- *	return entry;
- * }
+ *		return entry;
+ *	}
  *
  * So during the modification, queries are first redirected to data[1]. Then we
  * modify data[0]. When that is complete, we redirect queries back to data[0]
  * and we can modify data[1].
  *
- * NOTE: The non-requirement for atomic modifications does _NOT_ include
- *       the publishing of new entries in the case where data is a dynamic
- *       data structure.
+ * NOTE:
+ *
+ *	The non-requirement for atomic modifications does _NOT_ include
+ *	the publishing of new entries in the case where data is a dynamic
+ *	data structure.
  *
- *       An iteration might start in data[0] and get suspended long enough
- *       to miss an entire modification sequence, once it resumes it might
- *       observe the new entry.
+ *	An iteration might start in data[0] and get suspended long enough
+ *	to miss an entire modification sequence, once it resumes it might
+ *	observe the new entry.
  *
- * NOTE: When data is a dynamic data structure; one should use regular RCU
- *       patterns to manage the lifetimes of the objects within.
+ * NOTE:
+ *
+ *	When data is a dynamic data structure; one should use regular RCU
+ *	patterns to manage the lifetimes of the objects within.
  */
-static inline void raw_write_seqcount_latch(seqcount_t *s)
+#define raw_write_seqcount_latch(s)	do_raw_write_seqcount_latch(__to_seqcount_t(s))
+
+static inline void do_raw_write_seqcount_latch(seqcount_t *s)
 {
        smp_wmb();      /* prior stores before incrementing "sequence" */
        s->sequence++;
        smp_wmb();      /* increment "sequence" before following stores */
 }
 
+static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
+{
+	do_raw_write_seqcount_begin(s);
+	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
+}
+
+#define write_seqcount_begin_nested(s, subclass)		\
+do {									\
+	__assert_associated_lock_held(s);				\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	do_write_seqcount_begin_nested(__to_seqcount_t(s), subclass);	\
+} while (0)
+
+static inline void do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
+{
+//	lockdep_assert_preemption_disabled();
+	__write_seqcount_begin_nested(s, subclass);
+}
+
 /*
- * Sequence counter only version assumes that callers are using their
- * own mutexing.
+ * write_seqcount_t_begin() without lockdep non-preemptibility checks.
+ *
+ * Use for internal seqlock.h code where it's known that preemption is
+ * already disabled. For example, seqlock_t write side functions.
  */
-static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
+static inline void __write_seqcount_begin(seqcount_t *s)
 {
-	raw_write_seqcount_begin(s);
-	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
+	__write_seqcount_begin_nested(s, 0);
 }
 
-static inline void write_seqcount_begin(seqcount_t *s)
+/**
+ * write_seqcount_begin() - start a seqcount write-side critical section
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ *
+ * write_seqcount_begin opens a write-side critical section of the given
+ * seqcount. Seqcount write-side critical sections must be externally
+ * serialized and non-preemptible.
+ */
+#define write_seqcount_begin(s)						\
+do {									\
+	__assert_associated_lock_held(s);				\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	do_write_seqcount_begin(__to_seqcount_t(s));			\
+} while (0)
+
+static inline void do_write_seqcount_begin(seqcount_t *s)
 {
 	write_seqcount_begin_nested(s, 0);
 }
 
-static inline void write_seqcount_end(seqcount_t *s)
+/**
+ * write_seqcount_end() - end a seqcount_t write-side critical section
+ * @s: Pointer to &typedef seqcount_t
+ *
+ * The write section must've been opened with write_seqcount_begin().
+ */
+#define write_seqcount_end(s)						\
+do {									\
+	do_write_seqcount_end(__to_seqcount_t(s));			\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_enable();					\
+} while (0)
+
+static inline void do_write_seqcount_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
 	raw_write_seqcount_end(s);
 }
 
 /**
- * write_seqcount_invalidate - invalidate in-progress read-side seq operations
- * @s: pointer to seqcount_t
+ * write_seqcount_invalidate() - invalidate in-progress read-side seq operations
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * After write_seqcount_invalidate, no read-side seq operations will complete
  * successfully and see data older than this.
  */
-static inline void write_seqcount_invalidate(seqcount_t *s)
+#define write_seqcount_invalidate(s)	do_write_seqcount_invalidate(__to_seqcount_t(s))
+
+static inline void do_write_seqcount_invalidate(seqcount_t *s)
 {
 	smp_wmb();
 	kcsan_nestable_atomic_begin();
@@ -434,42 +656,74 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+/*
+ * Sequential locks (seqlock_t)
+ *
+ * Sequence counters with an embedded spinlock for writer serialization
+ * and non-preemptibility.
+ *
+ * For more info, see:
+ *   - Comments on top of seqcount_t
+ *   - Documentation/locking/seqlock.rst
+ */
 typedef struct {
 	struct seqcount seqcount;
 	spinlock_t lock;
 } seqlock_t;
 
-/*
- * These macros triggered gcc-3.x compile-time problems.  We think these are
- * OK now.  Be cautious.
- */
 #define __SEQLOCK_UNLOCKED(lockname)			\
 	{						\
 		.seqcount = SEQCNT_ZERO(lockname),	\
 		.lock =	__SPIN_LOCK_UNLOCKED(lockname)	\
 	}
 
-#define seqlock_init(x)					\
+/**
+ * seqlock_init() - dynamic initializer for seqlock_t
+ * @sl: Pointer to the &typedef seqlock_t instance
+ */
+#define seqlock_init(sl)				\
 	do {						\
-		seqcount_init(&(x)->seqcount);		\
-		spin_lock_init(&(x)->lock);		\
+		seqcount_init(&(sl)->seqcount);		\
+		spin_lock_init(&(sl)->lock);		\
 	} while (0)
 
-#define DEFINE_SEQLOCK(x) \
-		seqlock_t x = __SEQLOCK_UNLOCKED(x)
+/**
+ * DEFINE_SEQLOCK() - Define a statically-allocated seqlock_t
+ * @sl: Name of the &typedef seqlock_t instance
+ */
+#define DEFINE_SEQLOCK(sl) \
+		seqlock_t sl = __SEQLOCK_UNLOCKED(sl)
 
-/*
- * Read side functions for starting and finalizing a read side section.
+/**
+ * read_seqbegin() - start a seqlock_t read-side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_seqbegin opens a read side critical section of the given
+ * seqlock_t. Validity of the critical section is tested by checking
+ * read_seqretry().
+ *
+ * Return: count to be passed to read_seqretry()
  */
 static inline unsigned read_seqbegin(const seqlock_t *sl)
 {
-	unsigned ret = read_seqcount_begin(&sl->seqcount);
+	unsigned ret = do_read_seqcount_begin(&sl->seqcount);
 
 	kcsan_atomic_next(0);  /* non-raw usage, assume closing read_seqretry() */
 	kcsan_flat_atomic_begin();
 	return ret;
 }
 
+/**
+ * read_seqretry() - end and validate a seqlock_t read side section
+ * @sl: Pointer to &typedef seqlock_t
+ * @start: count, from read_seqbegin()
+ *
+ * read_seqretry closes the given seqlock_t read side critical section,
+ * and checks its validity. If the read section was invalid, it must be
+ * ignored and retried.
+ *
+ * Return: 1 if a retry is required, 0 otherwise
+ */
 static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 {
 	/*
@@ -478,47 +732,94 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 	 */
 	kcsan_flat_atomic_end();
 
-	return read_seqcount_retry(&sl->seqcount, start);
+	return do_read_seqcount_retry(&sl->seqcount, start);
 }
 
-/*
- * Lock out other writers and update the count.
- * Acts like a normal spin_lock/unlock.
- * Don't need preempt_disable() because that is in the spin_lock already.
+/**
+ * write_seqlock() - start a seqlock_t write side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_seqlock opens a write side critical section of the given
+ * seqlock_t.  It also acquires the spinlock_t embedded inside the
+ * sequential lock. All the seqlock_t write side critical sections are
+ * thus automatically serialized and non-preemptible.
+ *
+ * Use the ``_irqsave`` and ``_bh`` variants instead if the read side
+ * can be invoked from a hardirq or softirq context.
+ *
+ * The opened write side section must be closed with write_sequnlock().
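+ *
+ * A minimal write-side sketch ("foo_seqlock" is a placeholder name)::
+ *
+ *	write_seqlock(&foo_seqlock);
+ *	// ... update the protected data ...
+ *	write_sequnlock(&foo_seqlock);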
  */
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock() - end a seqlock_t write side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_sequnlock closes the (serialized and non-preemptible) write
+ * side critical section of given seqlock_t.
+ */
 static inline void write_sequnlock(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	do_write_seqcount_end(&sl->seqcount);
 	spin_unlock(&sl->lock);
 }
 
+/**
+ * write_seqlock_bh() - start a softirqs-disabled seqlock_t write section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_bh`` variant of write_seqlock(). Use only if the read side section
+ * can be invoked from a softirq context.
+ *
+ * The opened write section must be closed with write_sequnlock_bh().
+ */
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock_bh() - end a softirqs-disabled seqlock_t write section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_sequnlock_bh closes the serialized, non-preemptible,
+ * softirqs-disabled seqlock_t write side critical section opened with
+ * write_seqlock_bh().
+ */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	do_write_seqcount_end(&sl->seqcount);
 	spin_unlock_bh(&sl->lock);
 }
 
+/**
+ * write_seqlock_irq() - start a non-interruptible seqlock_t write side section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * This is the ``_irq`` variant of write_seqlock(). Use only if the read
+ * section of given seqlock_t can be invoked from a hardirq context.
+ */
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock_irq() - end a non-interruptible seqlock_t write side section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_irq`` variant of write_sequnlock(). The write side section of
+ * given seqlock_t must've been opened with write_seqlock_irq().
+ */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	do_write_seqcount_end(&sl->seqcount);
 	spin_unlock_irq(&sl->lock);
 }
 
@@ -527,44 +828,98 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
+
 	return flags;
 }
 
+/**
+ * write_seqlock_irqsave() - start a non-interruptible seqlock_t write section
+ * @lock:  Pointer to &typedef seqlock_t
+ * @flags: Stack-allocated storage for saving caller's local interrupt
+ *         state, to be passed to write_sequnlock_irqrestore().
+ *
+ * ``_irqsave`` variant of write_seqlock(). Use if the read section of
+ * given seqlock_t can be invoked from a hardirq context.
+ *
+ * The opened write section must be closed with write_sequnlock_irqrestore().
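+ *
+ * Usage sketch ("foo_seqlock" is a placeholder name)::
+ *
+ *	unsigned long flags;
+ *
+ *	write_seqlock_irqsave(&foo_seqlock, flags);
+ *	// ... update the protected data ...
+ *	write_sequnlock_irqrestore(&foo_seqlock, flags);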
+ */
 #define write_seqlock_irqsave(lock, flags)				\
 	do { flags = __write_seqlock_irqsave(lock); } while (0)
 
+/**
+ * write_sequnlock_irqrestore() - end non-interruptible seqlock_t write section
+ * @sl:    Pointer to &typedef seqlock_t
+ * @flags: Caller's saved interrupt state, from write_seqlock_irqsave()
+ *
+ * ``_irqrestore`` variant of write_sequnlock(). The write section of
+ * given seqlock_t must've been opened with write_seqlock_irqsave().
+ */
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
-	write_seqcount_end(&sl->seqcount);
+	do_write_seqcount_end(&sl->seqcount);
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
-/*
- * A locking reader exclusively locks out other writers and locking readers,
- * but doesn't update the sequence number. Acts like a normal spin_lock/unlock.
- * Don't need preempt_disable() because that is in the spin_lock already.
+/**
+ * read_seqlock_excl() - begin a seqlock_t locking reader critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_seqlock_excl opens a locking reader critical section for the
+ * given seqlock_t. A locking reader exclusively locks out other writers
+ * and other *locking* readers, but doesn't update the sequence number.
+ *
+ * Locking readers act like a normal spin_lock()/spin_unlock().
+ *
+ * The opened read side section must be closed with read_sequnlock_excl().
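+ *
+ * A minimal locking-reader sketch ("foo_seqlock" is a placeholder)::
+ *
+ *	read_seqlock_excl(&foo_seqlock);
+ *	// ... read the protected data; writers are locked out ...
+ *	read_sequnlock_excl(&foo_seqlock);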
  */
 static inline void read_seqlock_excl(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl() - end a seqlock_t locking reader critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_sequnlock_excl closes the locking reader critical section opened
+ * with read_seqlock_excl().
+ */
 static inline void read_sequnlock_excl(seqlock_t *sl)
 {
 	spin_unlock(&sl->lock);
 }
 
 /**
- * read_seqbegin_or_lock - begin a sequence number check or locking block
- * @lock: sequence lock
- * @seq : sequence number to be checked
+ * read_seqbegin_or_lock() - begin a seqlock_t lockless or locking reader
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq : Marker and return parameter. If the passed value is even, the
+ * reader will become a *lockless* seqlock_t sequence counter reader as
+ * in read_seqbegin(). If the passed value is odd, the reader will
+ * become a fully locking reader, as in read_seqlock_excl().  In the
+ * first call to read_seqbegin_or_lock(), the caller **must** initialize
+ * and pass an even value to @seq so a lockless read is optimistically
+ * tried first.
+ *
+ * read_seqbegin_or_lock is an API designed to optimistically try a
+ * normal lockless seqlock_t read section first, as in read_seqbegin().
+ * If an odd counter is found, the normal lockless read trial has
+ * failed, and the next reader iteration transforms to a full seqlock_t
+ * locking reader as in read_seqlock_excl().
  *
- * First try it once optimistically without taking the lock. If that fails,
- * take the lock. The sequence number is also used as a marker for deciding
- * whether to be a reader (even) or writer (odd).
- * N.B. seq must be initialized to an even number to begin with.
+ * This is typically used to avoid lockless seqlock_t reader starvation
+ * (too many retry loops) in the case of a sharp spike in write
+ * activity.
+ *
+ * The opened read section must be closed with done_seqretry().  Check
+ * Documentation/locking/seqlock.rst for template example code.
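+ *
+ * A sketch of that template ("foo_seqlock" and the data access are
+ * placeholders)::
+ *
+ *	int seq = 0;
+ *
+ *	do {
+ *		read_seqbegin_or_lock(&foo_seqlock, &seq);
+ *		// ... read the protected data ...
+ *	} while (need_seqretry(&foo_seqlock, seq));
+ *	done_seqretry(&foo_seqlock, seq);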
+ *
+ * Return: The encountered sequence counter value, returned through the
+ * @seq parameter, which is overloaded as a return parameter. The
+ * returned value must be checked with need_seqretry(). If the read
+ * section must be retried, the returned value must also be passed to
+ * the @seq parameter of the next read_seqbegin_or_lock() iteration.
  */
 static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 {
@@ -574,32 +929,90 @@ static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 		read_seqlock_excl(lock);
 }
 
+/**
+ * need_seqretry() - validate seqlock_t "locking or lockless" reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: count, from read_seqbegin_or_lock()
+ *
+ * need_seqretry checks if the seqlock_t read-side critical section
+ * started with read_seqbegin_or_lock() is valid. If it was not, the
+ * caller must retry the read-side section.
+ *
+ * Return: 1 if a retry is required, 0 otherwise
+ */
 static inline int need_seqretry(seqlock_t *lock, int seq)
 {
 	return !(seq & 1) && read_seqretry(lock, seq);
 }
 
+/**
+ * done_seqretry() - end seqlock_t "locking or lockless" reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: count, from read_seqbegin_or_lock()
+ *
+ * done_seqretry finishes the seqlock_t read side critical section
+ * started by read_seqbegin_or_lock(). The read section must've been
+ * already validated with need_seqretry().
+ */
 static inline void done_seqretry(seqlock_t *lock, int seq)
 {
 	if (seq & 1)
 		read_sequnlock_excl(lock);
 }
 
+/**
+ * read_seqlock_excl_bh() - start a locking reader seqlock_t section
+ *			    with softirqs disabled
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_bh`` variant of read_seqlock_excl(). Use this variant if the
+ * seqlock_t write side section, *or other read sections*, can be
+ * invoked from a softirq context.
+ *
+ * The opened section must be closed with read_sequnlock_excl_bh().
+ */
 static inline void read_seqlock_excl_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl_bh() - stop a seqlock_t softirq-disabled locking
+ *			      reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_bh`` variant of read_sequnlock_excl(). The closed section must've
+ * been opened with read_seqlock_excl_bh().
+ */
 static inline void read_sequnlock_excl_bh(seqlock_t *sl)
 {
 	spin_unlock_bh(&sl->lock);
 }
 
+/**
+ * read_seqlock_excl_irq() - start a non-interruptible seqlock_t locking
+ *			     reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_irq`` variant of read_seqlock_excl(). Use this only if the
+ * seqlock_t write side critical section, *or other read side sections*,
+ * can be invoked from a hardirq context.
+ *
+ * The opened read section must be closed with read_sequnlock_excl_irq().
+ */
 static inline void read_seqlock_excl_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl_irq() - end an interrupts-disabled seqlock_t
+ *                             locking reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_irq`` variant of read_sequnlock_excl(). The closed section must've
+ * been opened with read_seqlock_excl_irq().
+ */
 static inline void read_sequnlock_excl_irq(seqlock_t *sl)
 {
 	spin_unlock_irq(&sl->lock);
@@ -613,15 +1026,59 @@ static inline unsigned long __read_seqlock_excl_irqsave(seqlock_t *sl)
 	return flags;
 }
 
+/**
+ * read_seqlock_excl_irqsave() - start a non-interruptible seqlock_t
+ *				 locking reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @flags: Stack-allocated storage for saving caller's local interrupt
+ *         state, to be passed to read_sequnlock_excl_irqrestore().
+ *
+ * ``_irqsave`` variant of read_seqlock_excl(). Use this only if the
+ * seqlock_t write side critical section, *or other read side sections*,
+ * can be invoked from a hardirq context.
+ *
+ * Opened section must be closed with read_sequnlock_excl_irqrestore().
+ */
 #define read_seqlock_excl_irqsave(lock, flags)				\
 	do { flags = __read_seqlock_excl_irqsave(lock); } while (0)
 
+/**
+ * read_sequnlock_excl_irqrestore() - end non-interruptible seqlock_t
+ *				      locking reader section
+ * @sl: Pointer to &typedef seqlock_t
+ * @flags: Caller's saved interrupt state, from
+ *	   read_seqlock_excl_irqsave()
+ *
+ * ``_irqrestore`` variant of read_sequnlock_excl(). The closed section
+ * must've been opened with read_seqlock_excl_irqsave().
+ */
 static inline void
 read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
 {
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
+/**
+ * read_seqbegin_or_lock_irqsave() - begin a seqlock_t lockless reader, or
+ *                                   a non-interruptible locking reader
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: Marker and return parameter. Check read_seqbegin_or_lock().
+ *
+ * This is the ``_irqsave`` variant of read_seqbegin_or_lock(). Use if
+ * the seqlock_t write side critical section, *or other read side sections*,
+ * can be invoked from hardirq context.
+ *
+ * The validity of the read section must be checked with need_seqretry().
+ * The opened section must be closed with done_seqretry_irqrestore().
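+ *
+ * A sketch, mirroring the read_seqbegin_or_lock() template
+ * ("foo_seqlock" is a placeholder)::
+ *
+ *	unsigned long flags;
+ *	int seq = 0;
+ *
+ *	do {
+ *		flags = read_seqbegin_or_lock_irqsave(&foo_seqlock, &seq);
+ *		// ... read the protected data ...
+ *	} while (need_seqretry(&foo_seqlock, seq));
+ *	done_seqretry_irqrestore(&foo_seqlock, seq, flags);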
+ *
+ * Return:
+ *
+ *   1. The saved local interrupts state in case of a locking reader, to be
+ *      passed to done_seqretry_irqrestore().
+ *
+ *   2. The encountered sequence counter value, returned through @seq which
+ *      is overloaded as a return parameter. Check read_seqbegin_or_lock().
+ */
 static inline unsigned long
 read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 {
@@ -635,6 +1092,18 @@ read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 	return flags;
 }
 
+/**
+ * done_seqretry_irqrestore() - end a seqlock_t lockless reader, or a
+ *				non-interruptible locking reader section
+ * @lock:  Pointer to &typedef seqlock_t
+ * @seq:   Count, from read_seqbegin_or_lock_irqsave()
+ * @flags: Caller's saved local interrupt state in case of a locking
+ *	   reader, also from read_seqbegin_or_lock_irqsave()
+ *
+ * This is the ``_irqrestore`` variant of done_seqretry(). The read
+ * section must've been opened with read_seqbegin_or_lock_irqsave(), and
+ * validated with need_seqretry().
+ */
 static inline void
 done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags)
 {

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-07 14:37           ` Peter Zijlstra
@ 2020-07-08  9:12             ` Peter Zijlstra
  2020-07-08 10:43               ` Ahmed S. Darwish
  2020-07-08 10:33             ` Ahmed S. Darwish
  1 sibling, 1 reply; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-08  9:12 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Tue, Jul 07, 2020 at 04:37:26PM +0200, Peter Zijlstra wrote:
> + * Use the ``_irqsave`` and ``_bh`` variants instead if the read side
> + * ``_bh`` variant of write_seqlock(). Use only if the read side section
> + * ``_irq`` variant of write_sequnlock(). The write side section of
> + * ``_irqsave`` variant of write_seqlock(). Use if the read section of
> + * ``_irqrestore`` variant of write_sequnlock(). The write section of
> + * ``_bh`` variant of read_seqlock_excl(). Use this variant if the
> + * ``_bh`` variant of read_sequnlock_excl(). The closed section must've
> + * ``_irq`` variant of read_seqlock_excl(). Use this only if the
> + * ``_irq`` variant of read_sequnlock_excl(). The closed section must've
> + * ``_irqsave`` variant of read_seqlock_excl(). Use this only if the
> + * ``_irqrestore`` variant of read_sequnlock_excl(). The closed section
> + * This is the ``_irqsave`` variant of read_seqbegin_or_lock(). Use if
> + * This is the ``_irqrestore`` variant of done_seqretry(). The read

Can we also get rid of that '' nonsense, the gods of ASCII gifted us "
for this.

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-07 14:37           ` Peter Zijlstra
  2020-07-08  9:12             ` Peter Zijlstra
@ 2020-07-08 10:33             ` Ahmed S. Darwish
  2020-07-08 12:29               ` Peter Zijlstra
  1 sibling, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-08 10:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Tue, Jul 07, 2020 at 04:37:26PM +0200, Peter Zijlstra wrote:
>
> How's this? it removes a level of indirection and a bunch of repetition.

ACK, for the extra level of indirection removed.

> It's also more than 200 lines shorter.
...
> +#define __to_seqcount_t(s)	(seqcount_t *)(s)
...
> +#define read_seqcount_begin(s)	do_read_seqcount_begin(__to_seqcount_t(s))
> +
> +static inline unsigned do_read_seqcount_begin(const seqcount_t *s)
> +{
...

Hmm, the __to_seqcount_t(s) force cast is not good. It will break the
argument type-safety of seqcount macros that do not have either:

    __associated_lock_is_preemptible() or
    __assert_associated_lock_held()

in their path. This basically includes all the read path macros, and
even some others (e.g. write_seqcount_invalidate()).

With the suggested force cast above, I can literally *pass anything* to
read_seqcount_begin() and friends, and the compiler won't say a thing.
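
For example, with the force cast in place, something as bogus as this
compiles without a single warning (hypothetical misuse, not code from
the series):

	struct mutex m;

	read_seqcount_begin(&m);	/* &m is silently cast to (seqcount_t *) */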

So, I'll restore __to_seqcount_t(s) to its original implementation:

/*
 * @s: pointer to seqcount_t or any of the seqcount_locktype_t variants
 */
#define __to_seqcount_t(s)						\
({									\
	seqcount_t *seq;						\
									\
	if (__same_type(*(s), seqcount_t))				\
		seq = (seqcount_t *)(s);				\
	else if (__same_type(*(s), seqcount_spinlock_t))		\
		seq = &((seqcount_spinlock_t *)(s))->seqcount;		\
	else if (__same_type(*(s), seqcount_raw_spinlock_t))		\
		seq = &((seqcount_raw_spinlock_t *)(s))->seqcount;	\
	else if (__same_type(*(s), seqcount_rwlock_t))			\
		seq = &((seqcount_rwlock_t *)(s))->seqcount;		\
	else if (__same_type(*(s), seqcount_mutex_t))			\
		seq = &((seqcount_mutex_t *)(s))->seqcount;		\
	else if (__same_type(*(s), seqcount_ww_mutex_t))		\
		seq = &((seqcount_ww_mutex_t *)(s))->seqcount;		\
	else								\
		BUILD_BUG_ON_MSG(1, "Unknown seqcount type");		\
									\
	seq;								\
})

Yes, I know, it's not the prettiest thing in the world, but I'd take
ugly over breaking the compiler type checks any day.

>
> It doesn't provide SEQCNT_LOCKTYPE_ZERO() for each LOCKTYPE, but you can
> use this one macro for any LOCKTYPE.
>

From a call-site perspective, SEQCNT_SPINLOCK_ZERO() is much more
readable than SEQCNT_LOCKTYPE_ZERO() and so on. It only costs us 5 lines
*for all* the seqcount lock types. IMHO it's worth the trade-off.

>
> So I applied it all yesterday and tried to make sense of the resulting
> indirections and couldn't find stuff -- because it was hidding in
> another file.
>
> Specifically I disliked that *seqcount_t* naming and didn't see how it
> all connected.
>

Hmm, the idea was that write_seqcount_t_begin() and friends apply only
to plain old "seqcount_t". But, yes, I agree, it's confusing as hell.

The way you've organized the macros is much more readable, so thanks a
lot, I'll incorporate that in v4.

Kind regards,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-08  9:12             ` Peter Zijlstra
@ 2020-07-08 10:43               ` Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-08 10:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Wed, Jul 08, 2020 at 11:12:01AM +0200, Peter Zijlstra wrote:
> On Tue, Jul 07, 2020 at 04:37:26PM +0200, Peter Zijlstra wrote:
> > + * Use the ``_irqsave`` and ``_bh`` variants instead if the read side
> > + * ``_bh`` variant of write_seqlock(). Use only if the read side section
> > + * ``_irq`` variant of write_sequnlock(). The write side section of
> > + * ``_irqsave`` variant of write_seqlock(). Use if the read section of
> > + * ``_irqrestore`` variant of write_sequnlock(). The write section of
> > + * ``_bh`` variant of read_seqlock_excl(). Use this variant if the
> > + * ``_bh`` variant of read_sequnlock_excl(). The closed section must've
> > + * ``_irq`` variant of read_seqlock_excl(). Use this only if the
> > + * ``_irq`` variant of read_sequnlock_excl(). The closed section must've
> > + * ``_irqsave`` variant of read_seqlock_excl(). Use this only if the
> > + * ``_irqrestore`` variant of read_sequnlock_excl(). The closed section
> > + * This is the ``_irqsave`` variant of read_seqbegin_or_lock(). Use if
> > + * This is the ``_irqrestore`` variant of done_seqretry(). The read
>
> Can we also get rid of that '' nonsense, the gods of ASCII gifted us "
> for this.

Hehe, why not. I welcome back our ASCII overlords.

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-08 10:33             ` Ahmed S. Darwish
@ 2020-07-08 12:29               ` Peter Zijlstra
  2020-07-08 14:13                 ` Peter Zijlstra
  2020-07-08 15:09                 ` Ahmed S. Darwish
  0 siblings, 2 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-08 12:29 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Wed, Jul 08, 2020 at 12:33:14PM +0200, Ahmed S. Darwish wrote:

> > +#define read_seqcount_begin(s)	do_read_seqcount_begin(__to_seqcount_t(s))
> > +
> > +static inline unsigned do_read_seqcount_begin(const seqcount_t *s)
> > +{
> ...
> 
> Hmm, the __to_seqcount_t(s) force cast is not good. It will break the
> arguments type-safety of seqcount macros that do not have either:
> 
>     __associated_lock_is_preemptible() or
>     __assert_associated_lock_held()
> 
> in their path. This basically includes all the read path macros, and
> even some others (e.g. write_seqcount_invalidate()).
> 
> With the suggested force cast above, I can literally *pass anything* to
> read_seqcount_begin() and friends, and the compiler won't say a thing.
> 
> So, I'll restore __to_seqcount_t(s) that to its original implementation:

Right, I figured that the write side would be enough to catch glaring
abuse. But sure.

It's a bummer we didn't upgrade the minimum compiler version to 4.9,
that would've given us _Generic(), which allows one to write this
slightly less verbosely, I think.
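
For illustration (a C11 generic-selection toy, not part of the patch):

	#define lock_kind(x) _Generic((x),		\
		spinlock_t *:	"spinlock",		\
		struct mutex *:	"mutex",		\
		default:	"other")

The association matching the type of x gets selected at compile time,
which is what makes the per-locktype dispatch possible.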

How about something disgusting like this then?


#ifdef CONFIG_LOCKDEP
#define __SEQCOUNT_LOCKDEP(expr)	expr
#else
#define __SEQCOUNT_LOCKDEP(expr)
#endif

typedef seqcount_t * __seqprop_ptr_t;
typedef bool	     __seqprop_preempt_t;
typedef void	     __seqprop_assert_t;

static __always_inline __seqprop_ptr_t
__seqprop_seqcount_ptr(seqcount_t *s)
{
	return s;
}

static __always_inline __seqprop_preempt_t
__seqprop_seqcount_preempt(seqcount_t *s)
{
	return false;
}

static __always_inline __seqprop_assert_t
__seqprop_seqcount_assert(seqcount_t *s)
{
}

#define SEQCOUNT_LOCKTYPE(name, locktype, preempt, lockmember)		\
typedef struct seqcount_##name {					\
	seqcount_t	seqcount;					\
	__SEQCOUNT_LOCKDEP(locktype *lock);				\
} seqcount_##name##_t;							\
									\
static __always_inline void						\
seqcount_##name##_init(seqcount_##name##_t *s, locktype *l)		\
{									\
	seqcount_init(&s->seqcount);					\
	__SEQCOUNT_LOCKDEP(s->lock = l);				\
}									\
									\
static __always_inline __seqprop_ptr_t					\
__seqprop_##name##_ptr(seqcount_##name##_t *s)				\
{									\
	return &s->seqcount;						\
}									\
									\
static __always_inline __seqprop_preempt_t				\
__seqprop_##name##_preempt(seqcount_##name##_t *s)			\
{									\
	return preempt;							\
}									\
									\
static __always_inline __seqprop_assert_t				\
__seqprop_##name##_assert(seqcount_##name##_t *s)			\
{									\
	__SEQCOUNT_LOCKDEP(lockdep_assert_held(s->lockmember));		\
}


#define SEQCNT_LOCKTYPE_ZERO(_name, _lock) {				\
	.seqcount = SEQCNT_ZERO(_name.seqcount),			\
	__SEQCOUNT_LOCKDEP(.lock = (_lock))				\
}

#include <linux/spinlock.h>
#include <linux/ww_mutex.h>

#define __SEQ_RT	IS_BUILTIN(CONFIG_PREEMPT_RT)

SEQCOUNT_LOCKTYPE(raw_spinlock, raw_spinlock_t,	false,		lock)
SEQCOUNT_LOCKTYPE(spinlock,	spinlock_t,	__SEQ_RT,	lock)
SEQCOUNT_LOCKTYPE(rwlock,	rwlock_t,	__SEQ_RT,	lock)
SEQCOUNT_LOCKTYPE(mutex,	struct mutex,	true,		lock)
SEQCOUNT_LOCKTYPE(ww_mutex,	struct ww_mutex,true,		lock->base)

#if (defined(CONFIG_CC_IS_GCC) && CONFIG_GCC_VERSION < 40900) || defined(__CHECKER__)

#define __seqprop_pick(const_expr, s, locktype, prop, otherwise)	\
	__builtin_choose_expr(const_expr,				\
			      __seqprop_##locktype##_##prop((void *)(s)), \
			      otherwise)

extern void __seqprop_invalid(void);

#define __seqprop(s, prop)								\
	__seqprop_pick(__same_type(*(s), seqcount_t), (s), seqcount, prop,		\
	  __seqprop_pick(__same_type(*(s), seqcount_raw_spinlock_t), (s), raw_spinlock, prop, \
	    __seqprop_pick(__same_type(*(s), seqcount_spinlock_t), (s), spinlock, prop,	\
	      __seqprop_pick(__same_type(*(s), seqcount_rwlock_t), (s), rwlock, prop,	\
	        __seqprop_pick(__same_type(*(s), seqcount_mutex_t), (s), mutex, prop,	\
	          __seqprop_pick(__same_type(*(s), seqcount_ww_mutex_t), (s), ww_mutex, prop, \
		    __seqprop_invalid()))))))

#else

#define __seqprop_case(s, locktype, prop) \
	seqcount_##locktype##_t: __seqprop_##locktype##_##prop((void *)s)

#define __seqprop(s, prop)					\
	_Generic(*(s),						\
		 seqcount_t: __seqprop_seqcount_##prop((void*)s),\
		 __seqprop_case((s), raw_spinlock, prop),	\
		 __seqprop_case((s), spinlock, prop),		\
		 __seqprop_case((s), rwlock, prop),		\
		 __seqprop_case((s), mutex, prop),		\
		 __seqprop_case((s), ww_mutex, prop))

#endif

#define __to_seqcount_t(s)			__seqprop(s, ptr)
#define __associated_lock_is_preemptible(s)	__seqprop(s, preempt)
#define __assert_associated_lock_held(s)	__seqprop(s, assert)

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-08 12:29               ` Peter Zijlstra
@ 2020-07-08 14:13                 ` Peter Zijlstra
  2020-07-08 14:25                   ` Peter Zijlstra
  2020-07-08 15:09                 ` Ahmed S. Darwish
  1 sibling, 1 reply; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-08 14:13 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Wed, Jul 08, 2020 at 02:29:38PM +0200, Peter Zijlstra wrote:

> #define SEQCOUNT_LOCKTYPE(name, locktype, preempt, lockmember)		\
> typedef struct seqcount_##name {					\
> 	seqcount_t	seqcount;					\
> 	__SEQCOUNT_LOCKDEP(locktype *lock);				\
> } seqcount_##name##_t;							\
> 									\
> static __always_inline void						\
> seqcount_##name##_init(seqcount_##name##_t *s, locktype *l)		\
> {									\
> 	seqcount_init(&s->seqcount);					\
> 	__SEQCOUNT_LOCKDEP(s->lock = l);				\
> }									\
> 									\
> static __always_inline __seqprop_ptr_t					\
> __seqprop_##name##_ptr(seqcount_##name##_t *s)				\
> {									\
> 	return &s->seqcount;						\
> }									\
> 									\
> static __always_inline __seqprop_preempt_t				\
> __seqprop_##name##_preempt(seqcount_##name##_t *s)			\
> {									\
> 	return preempt;							\
> }									\
> 									\
> static __always_inline __seqprop_assert_t				\
> __seqprop_##name##_assert(seqcount_##name##_t *s)			\
> {									\
> 	__SEQCOUNT_LOCKDEP(lockdep_assert_held(s->lockmember));		\
> }

For PREEMPT_RT's magic thing, you can add:

static __always_inline void						\
__seqprop_##name##_lock(seqcount_##name##_t *s)				\
{									\
	if (!__SEQ_RT || !preempt)					\
		return;							\
									\
	lockbase##_lock(s->lock);					\
	lockbase##_unlock(s->lock);					\
}

and:

#define __rt_lock_unlock_associated_sleeping_lock(s) __seqprop(s, lock)


^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-08 14:13                 ` Peter Zijlstra
@ 2020-07-08 14:25                   ` Peter Zijlstra
  0 siblings, 0 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-08 14:25 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Wed, Jul 08, 2020 at 04:13:32PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 08, 2020 at 02:29:38PM +0200, Peter Zijlstra wrote:
> 
> > #define SEQCOUNT_LOCKTYPE(name, locktype, preempt, lockmember)		\
> > typedef struct seqcount_##name {					\
> > 	seqcount_t	seqcount;					\
> > 	__SEQCOUNT_LOCKDEP(locktype *lock);				\
> > } seqcount_##name##_t;							\
> > 									\
> > static __always_inline void						\
> > seqcount_##name##_init(seqcount_##name##_t *s, locktype *l)		\
> > {									\
> > 	seqcount_init(&s->seqcount);					\
> > 	__SEQCOUNT_LOCKDEP(s->lock = l);				\
> > }									\
> > 									\
> > static __always_inline __seqprop_ptr_t					\
> > __seqprop_##name##_ptr(seqcount_##name##_t *s)				\
> > {									\
> > 	return &s->seqcount;						\
> > }									\
> > 									\
> > static __always_inline __seqprop_preempt_t				\
> > __seqprop_##name##_preempt(seqcount_##name##_t *s)			\
> > {									\
> > 	return preempt;							\
> > }									\
> > 									\
> > static __always_inline __seqprop_assert_t				\
> > __seqprop_##name##_assert(seqcount_##name##_t *s)			\
> > {									\
> > 	__SEQCOUNT_LOCKDEP(lockdep_assert_held(s->lockmember));		\
> > }
> 
> For PREEMPT_RT's magic thing, you can add:
> 
> static __always_inline void						\
> __seqprop_##name##_lock(seqcount_##name##_t *s)				\
> {									\
> 	if (!__SEQ_RT || !preempt)					\
> 		return;							\
> 									\
> 	lockbase##_lock(s->lock);					\
> 	lockbase##_unlock(s->lock);					\
> }
> 
> and:
> 
> #define __rt_lock_unlock_associated_sleeping_lock(s) __seqprop(s, lock)

Or possibly have it like:

	if (!__SEQ_RT || !preempt)
		return smp_cond_load_relaxed(&s->seqcount.sequence, !(VAL & 1));

	lockbase##_lock(s->lock);
	lockbase##_unlock(s->lock);

	return READ_ONCE(s->seqcount.sequence);

and then replace most of __read_seqcount_begin() with it.

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-08 12:29               ` Peter Zijlstra
  2020-07-08 14:13                 ` Peter Zijlstra
@ 2020-07-08 15:09                 ` Ahmed S. Darwish
  2020-07-08 15:35                   ` Peter Zijlstra
  1 sibling, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-08 15:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Wed, Jul 08, 2020 at 02:29:38PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 08, 2020 at 12:33:14PM +0200, Ahmed S. Darwish wrote:
>
> > > +#define read_seqcount_begin(s)	do_read_seqcount_begin(__to_seqcount_t(s))
> > > +
> > > +static inline unsigned do_read_seqcount_begin(const seqcount_t *s)
> > > +{
> > ...
> >
> > Hmm, the __to_seqcount_t(s) force cast is not good. It will break the
> > arguments type-safety of seqcount macros that do not have either:
> >
> >     __associated_lock_is_preemptible() or
> >     __assert_associated_lock_held()
> >
> > in their path. This basically includes all the read path macros, and
> > even some others (e.g. write_seqcount_invalidate()).
> >
> > With the suggested force cast above, I can literally *pass anything* to
> > read_seqcount_begin() and friends, and the compiler won't say a thing.
> >
> > So, I'll restore __to_seqcount_t(s) that to its original implementation:
>
> Right, I figured that the write side would be enough to catch glaring
> abuse. But sure.
>
> It's a bummer we didn't upgrade the minimum compiler version to 4.9,
> that would've given us _Generic(), which allows one to write this
> slightly less verbose I think.
>

Looking at 5429ef62bcf3 ("compiler/gcc: Raise minimum GCC version for
kernel builds to 4.8"), it seems that the decision of picking gcc 4.8
vs. 4.9 was kinda arbitrary.

Anyway, please continue below.

> How about something disguisting like this then?
>
...
> #define __SEQ_RT	IS_BUILTIN(CONFIG_PREEMPT_RT)
>
> SEQCOUNT_LOCKTYPE(raw_spinlock, raw_spinlock_t,	false,		lock)
> SEQCOUNT_LOCKTYPE(spinlock,	spinlock_t,		__SEQ_RT,	lock)
> SEQCOUNT_LOCKTYPE(rwlock,	rwlock_t,		__SEQ_RT,	lock)
> SEQCOUNT_LOCKTYPE(mutex,	struct mutex,		true,		lock)
> SEQCOUNT_LOCKTYPE(ww_mutex,	struct ww_mutex,	true,		lock->base)
>
> #if (defined(CONFIG_CC_IS_GCC) && CONFIG_GCC_VERSION < 40900) || defined(__CHECKER__)
>
> #define __seqprop_pick(const_expr, s, locktype, prop, otherwise)	\
> 	__builtin_choose_expr(const_expr,				\
> 			      __seqprop_##locktype##_##prop((void *)(s)), \
> 			      otherwise)
>
> extern void __seqprop_invalid(void);
>
> #define __seqprop(s, prop)								\
> 	__seqprop_pick(__same_type(*(s), seqcount_t), (s), seqcount, prop,		\
> 	  __seqprop_pick(__same_type(*(s), seqcount_raw_spinlock_t), (s), raw_spinlock, prop, \
> 	    __seqprop_pick(__same_type(*(s), seqcount_spinlock_t), (s), spinlock, prop,	\
> 	      __seqprop_pick(__same_type(*(s), seqcount_rwlock_t), (s), rwlock, prop,	\
> 	        __seqprop_pick(__same_type(*(s), seqcount_mutex_t), (s), mutex, prop,	\
> 	          __seqprop_pick(__same_type(*(s), seqcount_ww_mutex_t), (s), ww_mutex, prop, \
> 		    __seqprop_invalid()))))))
>
> #else
>
> #define __seqprop_case(s, locktype, prop) \
> 	seqcount_##locktype##_t: __seqprop_##locktype##_##prop((void *)s)
>
> #define __seqprop(s, prop)					\
> 	_Generic(*(s),						\
> 		 seqcount_t: __seqprop_seqcount_##prop((void*)s),\
> 		 __seqprop_case((s), raw_spinlock, prop),	\
> 		 __seqprop_case((s), spinlock, prop),		\
> 		 __seqprop_case((s), rwlock, prop),		\
> 		 __seqprop_case((s), mutex, prop),		\
> 		 __seqprop_case((s), ww_mutex, prop))
>
> #endif
>
> #define __to_seqcount_t(s)			__seqprop(s, ptr)
> #define __associated_lock_is_preemptible(s)	__seqprop(s, preempt)
> #define __assert_associated_lock_held(s)	__seqprop(s, assert)

Hmm, I'll prototype the whole thing (along with PREEMPT_RT associated
lock()/unlock() as you've mentioned in the other e-mail), and come back.

Honestly, my first impression is that this is heading into too much
complexity and compaction, but let's finish the whole thing first.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-08 15:09                 ` Ahmed S. Darwish
@ 2020-07-08 15:35                   ` Peter Zijlstra
  2020-07-08 15:58                     ` Ahmed S. Darwish
  2020-07-08 16:01                     ` Peter Zijlstra
  0 siblings, 2 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-08 15:35 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Wed, Jul 08, 2020 at 05:09:30PM +0200, Ahmed S. Darwish wrote:
> On Wed, Jul 08, 2020 at 02:29:38PM +0200, Peter Zijlstra wrote:

> > How about something disguisting like this then?
> >
> ...
> > #define __SEQ_RT	IS_BUILTIN(CONFIG_PREEMPT_RT)
> >
> > SEQCOUNT_LOCKTYPE(raw_spinlock, raw_spinlock_t,	false,		lock)
> > SEQCOUNT_LOCKTYPE(spinlock,	spinlock_t,		__SEQ_RT,	lock)
> > SEQCOUNT_LOCKTYPE(rwlock,	rwlock_t,		__SEQ_RT,	lock)
> > SEQCOUNT_LOCKTYPE(mutex,	struct mutex,		true,		lock)
> > SEQCOUNT_LOCKTYPE(ww_mutex,	struct ww_mutex,	true,		lock->base)
> >
> > #if (defined(CONFIG_CC_IS_GCC) && CONFIG_GCC_VERSION < 40900) || defined(__CHECKER__)
> >
> > #define __seqprop_pick(const_expr, s, locktype, prop, otherwise)	\
> > 	__builtin_choose_expr(const_expr,				\
> > 			      __seqprop_##locktype##_##prop((void *)(s)), \
> > 			      otherwise)
> >
> > extern void __seqprop_invalid(void);
> >
> > #define __seqprop(s, prop)								\
> > 	__seqprop_pick(__same_type(*(s), seqcount_t), (s), seqcount, prop,		\
> > 	  __seqprop_pick(__same_type(*(s), seqcount_raw_spinlock_t), (s), raw_spinlock, prop, \
> > 	    __seqprop_pick(__same_type(*(s), seqcount_spinlock_t), (s), spinlock, prop,	\
> > 	      __seqprop_pick(__same_type(*(s), seqcount_rwlock_t), (s), rwlock, prop,	\
> > 	        __seqprop_pick(__same_type(*(s), seqcount_mutex_t), (s), mutex, prop,	\
> > 	          __seqprop_pick(__same_type(*(s), seqcount_ww_mutex_t), (s), ww_mutex, prop, \
> > 		    __seqprop_invalid()))))))
> >
> > #else
> >
> > #define __seqprop_case(s, locktype, prop) \
> > 	seqcount_##locktype##_t: __seqprop_##locktype##_##prop((void *)s)
> >
> > #define __seqprop(s, prop)					\
> > 	_Generic(*(s),						\
> > 		 seqcount_t: __seqprop_seqcount_##prop((void*)s),\
> > 		 __seqprop_case((s), raw_spinlock, prop),	\
> > 		 __seqprop_case((s), spinlock, prop),		\
> > 		 __seqprop_case((s), rwlock, prop),		\
> > 		 __seqprop_case((s), mutex, prop),		\
> > 		 __seqprop_case((s), ww_mutex, prop))
> >
> > #endif
> >
> > #define __to_seqcount_t(s)			__seqprop(s, ptr)
> > #define __associated_lock_is_preemptible(s)	__seqprop(s, preempt)
> > #define __assert_associated_lock_held(s)	__seqprop(s, assert)
> 
> Hmm, I'll prototype the whole thing (along with PREEMPT_RT associated
> lock()/unlock() as you've mentioned in the other e-mail), and come back.
> 
> Honestly, I have a first impression that this is heading into too much
> complexity and compaction, but let's finish the whole thing first.

So the thing I pasted compiles kernel/sched/cputime.o, but that only
uses the old seqcount_t thing, not any of the fancy new stuff; still,
the compiler groks it all.

And while the gcc-4.8 code is horrendous crap, the rest should be pretty
straightforward and concentrates on the pieces where there are
differences.

I even considered:

#define __SEQPROP(name, prop, expr) \
static __always_inline __seqprop_##prop##_t \
__seqprop##name##_##prop(seqcount##name##_t *s) \
{ \
	expr; \
}

Such that we could write:

__SEQPROP(, ptr, return s)
__SEQPROP(, preempt, return false)
__SEQPROP(, assert, )

__SEQPROP(_##locktype, ptr, return &s->seqcount) \
__SEQPROP(_##locktype, preempt, return preempt) \
__SEQPROP(_##locktype, assert, __SEQCOUNT_LOCKDEP(lockdep_assert_held(s->lockmember))) \

But I figured _that_ might've been one step too far ;-)

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-08 15:35                   ` Peter Zijlstra
@ 2020-07-08 15:58                     ` Ahmed S. Darwish
  2020-07-08 16:16                       ` Peter Zijlstra
  2020-07-08 16:18                       ` Peter Zijlstra
  2020-07-08 16:01                     ` Peter Zijlstra
  1 sibling, 2 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-08 15:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Wed, Jul 08, 2020 at 05:35:22PM +0200, Peter Zijlstra wrote:
...
>
> And while the gcc-4.8 code is horrendous crap, the rest should be pretty
> straightforward and concentrates on the pieces where there are
> differences.
>

Is there any possibility of upgrading the minimum gcc version to 4.9? Is
there any supported architecture that is still stuck on 4.8?

...
> I even considered:
>
> #define __SEQPROP(name, prop, expr) \
> static __always_inline __seqprop_##prop##_t \
> __seqprop##name##_##prop(seqcount##name##_t *s) \
> { \
> 	expr; \
> }
>
> Such that we could write:
>
> __SEQPROP(, ptr, return s)
> __SEQPROP(, preempt, return false)
> __SEQPROP(, assert, )
>
> __SEQPROP(_##locktype, ptr, return &s->seqcount) \
> __SEQPROP(_##locktype, preempt, return preempt) \
> __SEQPROP(_##locktype, assert, __SEQCOUNT_LOCKDEP(lockdep_assert_held(s->lockmember))) \
>
> But I figured _that_ might've been one step too far ;-)

Initially I implemented something like this during internal,
pre-upstream, versions of this patch series. We decided afterwards that
the macro compression level was so high that the whole thing was not
easily understandable.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-08 15:35                   ` Peter Zijlstra
  2020-07-08 15:58                     ` Ahmed S. Darwish
@ 2020-07-08 16:01                     ` Peter Zijlstra
  1 sibling, 0 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-08 16:01 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Wed, Jul 08, 2020 at 05:35:22PM +0200, Peter Zijlstra wrote:
> But I figured _that_ might've been one step too far ;-)

Damn, now you made me do it... and it's not too horrible. Included the
-rt part.

---
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 8b97204f35a7..cc15a6aaab00 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -1,36 +1,15 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef __LINUX_SEQLOCK_H
 #define __LINUX_SEQLOCK_H
+
 /*
- * Reader/writer consistent mechanism without starving writers. This type of
- * lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes. There are two types
- * of readers:
- * 1. Sequence readers which never block a writer but they may have to retry
- *    if a writer is in progress by detecting change in sequence number.
- *    Writers do not wait for a sequence reader.
- * 2. Locking readers which will wait if a writer or another locking reader
- *    is in progress. A locking reader in progress will also block a writer
- *    from going forward. Unlike the regular rwlock, the read lock here is
- *    exclusive so that only one locking reader can get it.
- *
- * This is not as cache friendly as brlock. Also, this may not work well
- * for data that contains pointers, because any writer could
- * invalidate a pointer that a reader was following.
- *
- * Expected non-blocking reader usage:
- * 	do {
- *	    seq = read_seqbegin(&foo);
- * 	...
- *      } while (read_seqretry(&foo, seq));
+ * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
+ * lockless readers (read-only retry loops), and no writer starvation.
  *
+ * See Documentation/locking/seqlock.rst for full description.
  *
- * On non-SMP the spin locks disappear but the writer still needs
- * to increment the sequence variables because an interrupt routine could
- * change the state of the data.
- *
- * Based on x86_64 vsyscall gettimeofday 
- * by Keith Owens and Andrea Arcangeli
+ * Copyrights:
+ * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
  */
 
 #include <linux/spinlock.h>
@@ -41,8 +20,8 @@
 #include <asm/processor.h>
 
 /*
- * The seqlock interface does not prescribe a precise sequence of read
- * begin/retry/end. For readers, typically there is a call to
+ * The seqlock seqcount_t interface does not prescribe a precise sequence of
+ * read begin/retry/end. For readers, typically there is a call to
  * read_seqcount_begin() and read_seqcount_retry(), however, there are more
  * esoteric cases which do not follow this pattern.
  *
@@ -56,10 +35,28 @@
 #define KCSAN_SEQLOCK_REGION_MAX 1000
 
 /*
- * Version using sequence counter only.
- * This can be used when code has its own mutex protecting the
- * updating starting before the write_seqcountbeqin() and ending
- * after the write_seqcount_end().
+ * Sequence counters (seqcount_t)
+ *
+ * This is the raw counting mechanism, without any writer protection.
+ *
+ * Write side critical sections must be serialized and non-preemptible.
+ *
+ * If readers can be invoked from hardirq or softirq contexts,
+ * interrupts or bottom halves must also be respectively disabled before
+ * entering the write section.
+ *
+ * This mechanism can't be used if the protected data contains pointers,
+ * as the writer can invalidate a pointer that a reader is following.
+ *
+ * If the write serialization mechanism is one of the common kernel
+ * locking primitives, use a sequence counter with associated lock
+ * (seqcount_LOCKTYPE_t) instead.
+ *
+ * If it's desired to automatically handle the sequence counter writer
+ * serialization and non-preemptibility requirements, use a sequential
+ * lock (seqlock_t) instead.
+ *
+ * See Documentation/locking/seqlock.rst
  */
 typedef struct seqcount {
 	unsigned sequence;
@@ -82,6 +79,10 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
 # define SEQCOUNT_DEP_MAP_INIT(lockname) \
 		.dep_map = { .name = #lockname } \
 
+/**
+ * seqcount_init() - runtime initializer for seqcount_t
+ * @s: Pointer to the &typedef seqcount_t instance
+ */
 # define seqcount_init(s)				\
 	do {						\
 		static struct lock_class_key __key;	\
@@ -105,12 +106,139 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
 # define seqcount_lockdep_reader_access(x)
 #endif
 
-#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)}
+/**
+ * SEQCNT_ZERO() - static initializer for seqcount_t
+ * @name: Name of the &typedef seqcount_t instance
+ */
+#define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
+
+/*
+ * Sequence counters with associated locks (seqcount_LOCKTYPE_t)
+ *
+ * A sequence counter which associates the lock used for writer
+ * serialization at initialization time. This enables lockdep to validate
+ * that the write side critical section is properly serialized.
+ *
+ * For associated locks which do not implicitly disable preemption,
+ * preemption protection is enforced in the write side function.
+ *
+ * See Documentation/locking/seqlock.rst
+ */
+
+#if defined(CONFIG_LOCKDEP) || defined(CONFIG_PREEMPT_RT)
+#define __SEQ_LOCK(expr)	expr
+#else
+#define __SEQ_LOCK(expr)
+#endif
+
+typedef seqcount_t * __seqprop_ptr_t;
+typedef bool	     __seqprop_preempt_t;
+typedef void	     __seqprop_assert_t;
+typedef unsigned     __seqprop_begin_t;
+
+#define __SEQPROP(name, prop)			\
+static __always_inline __seqprop_##prop##_t	\
+__seqprop##name##_##prop(seqcount##name##_t *s)
+
+__SEQPROP(, ptr) { return s; }
+__SEQPROP(, preempt) { return false; }
+__SEQPROP(, assert) { }
+__SEQPROP(, begin) { return smp_cond_load_relaxed(&s->sequence, !(VAL & 1)); }
+
+#define SEQCOUNT_LOCKTYPE(name, locktype, lockbase, blocking, lockmember)	\
+typedef struct seqcount_##name {					\
+	seqcount_t	seqcount;					\
+	__SEQ_LOCK(locktype *lock);					\
+} seqcount_##name##_t;							\
+									\
+static __always_inline void						\
+seqcount_##name##_init(seqcount_##name##_t *s, locktype *l)		\
+{									\
+	seqcount_init(&s->seqcount);					\
+	__SEQ_LOCK(s->lock = l);					\
+}									\
+									\
+__SEQPROP(_##name, ptr)	    { return &s->seqcount; }			\
+__SEQPROP(_##name, preempt) { return blocking; }			\
+__SEQPROP(_##name, assert)  {						\
+	__SEQ_LOCK(lockdep_assert_held(s->lockmember));			\
+}									\
+__SEQPROP(_##name, begin)   {						\
+	if (!__SEQ_RT || !blocking)					\
+		return __seqprop_begin(&s->seqcount);			\
+									\
+	__SEQ_LOCK(lockbase##_lock(s->lock);				\
+		   lockbase##_unlock(s->lock));				\
+									\
+	return READ_ONCE(s->seqcount.sequence);				\
+}
+
+#define SEQCNT_LOCKTYPE_ZERO(_name, _lock) {				\
+	.seqcount = SEQCNT_ZERO(_name.seqcount),			\
+	__SEQ_LOCK(.lock = (_lock))					\
+}
+
+#include <linux/spinlock.h>
+#include <linux/ww_mutex.h>
+
+#define __SEQ_RT	IS_BUILTIN(CONFIG_PREEMPT_RT)
+
+SEQCOUNT_LOCKTYPE(raw_spinlock, raw_spinlock_t,	raw_spin, false,	lock)
+SEQCOUNT_LOCKTYPE(spinlock,	spinlock_t,	spin,     __SEQ_RT,	lock)
+SEQCOUNT_LOCKTYPE(rwlock,	rwlock_t,	read,     __SEQ_RT,	lock)
+SEQCOUNT_LOCKTYPE(mutex,	struct mutex,	mutex,    true,		lock)
+SEQCOUNT_LOCKTYPE(ww_mutex,	struct ww_mutex,ww_mutex, true,		lock->base)
 
+/*
+ *	seqcount_LOCKTYPE_t -- write APIs
+ *
+ * For associated lock types which do not implicitly disable preemption,
+ * enforce preemption protection in the write side functions.
+ *
+ * Never use lockdep for the raw write variants.
+ */
+
+#define __seqprop_pick(s, name, prop, otherwise)			\
+	__builtin_choose_expr(__same_type(*(s), seqcount##name##_t),	\
+			      __seqprop##name##_##prop((void *)(s)),	\
+			      otherwise)
+
+extern void __seqprop_invalid(void);
+
+#if (defined(CONFIG_CC_IS_GCC) && CONFIG_GCC_VERSION < 40900) || defined(__CHECKER__)
+
+#define __seqprop(s, prop)				\
+	__seqprop_pick((s), , prop,			\
+	  __seqprop_pick((s), _raw_spinlock, prop,	\
+	    __seqprop_pick((s), _spinlock, prop,	\
+	      __seqprop_pick((s), _rwlock, prop,	\
+	        __seqprop_pick((s), _mutex, prop,	\
+	          __seqprop_pick((s), _ww_mutex, prop,	\
+		    __seqprop_invalid()))))))
+
+#else
+
+#define __seqprop_case(s, name, prop) \
+	seqcount##name##_t: __seqprop##name##_##prop((void *)s)
+
+#define __seqprop(s, prop)					\
+	_Generic(*(s),						\
+		 __seqprop_case((s), , prop),			\
+		 __seqprop_case((s), _raw_spinlock, prop),	\
+		 __seqprop_case((s), _spinlock, prop),		\
+		 __seqprop_case((s), _rwlock, prop),		\
+		 __seqprop_case((s), _mutex, prop),		\
+		 __seqprop_case((s), _ww_mutex, prop))
+
+#endif
+
+#define __to_seqcount_t(s)			__seqprop(s, ptr)
+#define __associated_lock_is_preemptible(s)	__seqprop(s, preempt)
+#define __assert_associated_lock_held(s)	__seqprop(s, assert)
 
 /**
- * __read_seqcount_begin - begin a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
+ * __read_seqcount_begin() - begin a seqcount read section (without barrier)
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * Returns: count to be passed to read_seqcount_retry
  *
  * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
@@ -121,7 +249,15 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  * Use carefully, only in critical code, and comment how the barrier is
  * provided.
  */
-static inline unsigned __read_seqcount_begin(const seqcount_t *s)
+#define __read_seqcount_begin(s)				\
+({								\
+	unsigned ret = __seqprop(s, begin);			\
+	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);		\
+	ret;							\
+})
+
+
+static inline unsigned do___read_seqcount_begin(const seqcount_t *s)
 {
 	unsigned ret;
 
@@ -136,15 +272,18 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount - Read the raw seqcount
- * @s: pointer to seqcount_t
+ * raw_read_seqcount() - Read the seqcount raw counter value
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * Returns: count to be passed to read_seqcount_retry
  *
  * raw_read_seqcount opens a read critical section of the given
- * seqcount without any lockdep checking and without checking or
- * masking the LSB. Calling code is responsible for handling that.
+ * seqcount_t, without any lockdep checks and without checking or
+ * masking the sequence counter LSB. Calling code is responsible for
+ * handling that.
  */
-static inline unsigned raw_read_seqcount(const seqcount_t *s)
+#define raw_read_seqcount(s)	do_raw_read_seqcount(__to_seqcount_t(s))
+
+static inline unsigned do_raw_read_seqcount(const seqcount_t *s)
 {
 	unsigned ret = READ_ONCE(s->sequence);
 	smp_rmb();
@@ -153,42 +292,44 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount_begin - start seq-read critical section w/o lockdep
- * @s: pointer to seqcount_t
+ * raw_read_seqcount_begin() - start a seqcount read section w/o lockdep
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * Returns: count to be passed to read_seqcount_retry
  *
  * raw_read_seqcount_begin opens a read critical section of the given
- * seqcount, but without any lockdep checking. Validity of the critical
- * section is tested by checking read_seqcount_retry function.
+ * seqcount_t, but without any lockdep checking. Validity of the read
+ * section must be checked with read_seqcount_retry().
  */
-static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
+#define raw_read_seqcount_begin(s)	do_raw_read_seqcount_begin(__to_seqcount_t(s))
+
+static inline unsigned do_raw_read_seqcount_begin(const seqcount_t *s)
 {
-	unsigned ret = __read_seqcount_begin(s);
+	unsigned ret = do___read_seqcount_begin(s);
 	smp_rmb();
 	return ret;
 }
 
 /**
- * read_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
+ * read_seqcount_begin() - start a seqcount read critical section
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * Returns: count to be passed to read_seqcount_retry
  *
- * read_seqcount_begin opens a read critical section of the given seqcount.
- * Validity of the critical section is tested by checking read_seqcount_retry
- * function.
+ * read_seqcount_begin opens a read critical section of the given
+ * seqcount_t. Validity of the read section must be checked with
+ * read_seqcount_retry().
  */
-static inline unsigned read_seqcount_begin(const seqcount_t *s)
-{
-	seqcount_lockdep_reader_access(s);
-	return raw_read_seqcount_begin(s);
-}
+#define read_seqcount_begin(s)				\
+({							\
+	seqcount_lockdep_reader_access(s);		\
+	__read_seqcount_begin(s);			\
+})
 
 /**
- * raw_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
+ * raw_seqcount_begin() - begin a seq-read critical section
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * Returns: count to be passed to read_seqcount_retry
  *
- * raw_seqcount_begin opens a read critical section of the given seqcount.
+ * raw_seqcount_begin opens a read critical section of the given seqcount_t.
  * Validity of the critical section is tested by checking read_seqcount_retry
  * function.
  *
@@ -197,7 +338,9 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
  * read_seqcount_retry() instead of stabilizing at the beginning of the
  * critical section.
  */
-static inline unsigned raw_seqcount_begin(const seqcount_t *s)
+#define raw_seqcount_begin(s)	do_raw_seqcount_begin(__to_seqcount_t(s))
+
+static inline unsigned do_raw_seqcount_begin(const seqcount_t *s)
 {
 	unsigned ret = READ_ONCE(s->sequence);
 	smp_rmb();
@@ -206,8 +349,8 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * __read_seqcount_retry - end a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
+ * __read_seqcount_retry() - end a seq-read critical section (without barrier)
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
@@ -219,38 +362,56 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
  * Use carefully, only in critical code, and comment how the barrier is
  * provided.
  */
-static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
+#define __read_seqcount_retry(s, start)	do___read_seqcount_retry(__to_seqcount_t(s), start)
+
+static inline int do___read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
 	kcsan_atomic_next(0);
 	return unlikely(READ_ONCE(s->sequence) != start);
 }
 
 /**
- * read_seqcount_retry - end a seq-read critical section
- * @s: pointer to seqcount_t
+ * read_seqcount_retry() - end a seq-read critical section
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  * @start: count, from read_seqcount_begin
  * Returns: 1 if retry is required, else 0
  *
- * read_seqcount_retry closes a read critical section of the given seqcount.
+ * read_seqcount_retry closes a read critical section of given seqcount_t.
  * If the critical section was invalid, it must be ignored (and typically
  * retried).
  */
-static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
+#define read_seqcount_retry(s, start)	do_read_seqcount_retry(__to_seqcount_t(s), start)
+
+static inline int do_read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
 	smp_rmb();
-	return __read_seqcount_retry(s, start);
+	return do___read_seqcount_retry(s, start);
 }
 
+#define raw_write_seqcount_begin(s)					\
+do {									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	do_raw_write_seqcount_begin(__to_seqcount_t(s));		\
+} while (0)
 
-
-static inline void raw_write_seqcount_begin(seqcount_t *s)
+static inline void do_raw_write_seqcount_begin(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
 	smp_wmb();
 }
 
-static inline void raw_write_seqcount_end(seqcount_t *s)
+#define raw_write_seqcount_end(s)					\
+do {									\
+	do_raw_write_seqcount_end(__to_seqcount_t(s));			\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_enable();					\
+} while (0)
+
+static inline void do_raw_write_seqcount_end(seqcount_t *s)
 {
 	smp_wmb();
 	s->sequence++;
@@ -258,12 +419,12 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 }
 
 /**
- * raw_write_seqcount_barrier - do a seq write barrier
- * @s: pointer to seqcount_t
+ * raw_write_seqcount_barrier() - do a seq write barrier
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * This can be used to provide an ordering guarantee instead of the
  * usual consistency guarantee. It is one wmb cheaper, because we can
- * collapse the two back-to-back wmb()s.
+ * collapse the two back-to-back wmb()s::
  *
  * Note that writes surrounding the barrier should be declared atomic (e.g.
  * via WRITE_ONCE): a) to ensure the writes become visible to other threads
@@ -298,7 +459,9 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  *              WRITE_ONCE(X, false);
  *      }
  */
-static inline void raw_write_seqcount_barrier(seqcount_t *s)
+#define raw_write_seqcount_barrier(s)	do_raw_write_seqcount_barrier(__to_seqcount_t(s))
+
+static inline void do_raw_write_seqcount_barrier(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
@@ -307,7 +470,24 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
-static inline int raw_read_seqcount_latch(seqcount_t *s)
+/**
+ * raw_read_seqcount_latch() - pick even or odd seqcount latch data copy
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ *
+ * Use seqcount latching to switch between two storage places with
+ * sequence protection to allow interruptible, preemptible, writer
+ * sections.
+ *
+ * Check raw_write_seqcount_latch() for more details and a full reader
+ * and writer usage example.
+ *
+ * Return: sequence counter. Use the lowest bit as index for picking
+ * which data copy to read. The full counter must then be checked with
+ * read_seqcount_retry().
+ */
+#define raw_read_seqcount_latch(s)	do_raw_read_seqcount_latch(__to_seqcount_t(s))
+
+static inline int do_raw_read_seqcount_latch(seqcount_t *s)
 {
 	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
 	int seq = READ_ONCE(s->sequence); /* ^^^ */
@@ -315,8 +495,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
 }
 
 /**
- * raw_write_seqcount_latch - redirect readers to even/odd copy
- * @s: pointer to seqcount_t
+ * raw_write_seqcount_latch() - redirect readers to even/odd copy
+ * @s: pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -332,101 +512,164 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  * Very simply put: we first modify one copy and then the other. This ensures
  * there is always one copy in a stable state, ready to give us an answer.
  *
- * The basic form is a data structure like:
+ * The basic form is a data structure like::
  *
- * struct latch_struct {
- *	seqcount_t		seq;
- *	struct data_struct	data[2];
- * };
+ *	struct latch_struct {
+ *		seqcount_t		seq;
+ *		struct data_struct	data[2];
+ *	};
  *
  * Where a modification, which is assumed to be externally serialized, does the
- * following:
+ * following::
  *
- * void latch_modify(struct latch_struct *latch, ...)
- * {
- *	smp_wmb();	<- Ensure that the last data[1] update is visible
- *	latch->seq++;
- *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *	void latch_modify(struct latch_struct *latch, ...)
+ *	{
+ *		smp_wmb();	// Ensure that the last data[1] update is visible
+ *		latch->seq++;
+ *		smp_wmb();	// Ensure that the seqcount update is visible
  *
- *	modify(latch->data[0], ...);
+ *		modify(latch->data[0], ...);
  *
- *	smp_wmb();	<- Ensure that the data[0] update is visible
- *	latch->seq++;
- *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *		smp_wmb();	// Ensure that the data[0] update is visible
+ *		latch->seq++;
+ *		smp_wmb();	// Ensure that the seqcount update is visible
  *
- *	modify(latch->data[1], ...);
- * }
+ *		modify(latch->data[1], ...);
+ *	}
  *
- * The query will have a form like:
+ * The query will have a form like::
  *
- * struct entry *latch_query(struct latch_struct *latch, ...)
- * {
- *	struct entry *entry;
- *	unsigned seq, idx;
+ *	struct entry *latch_query(struct latch_struct *latch, ...)
+ *	{
+ *		struct entry *entry;
+ *		unsigned seq, idx;
  *
- *	do {
- *		seq = raw_read_seqcount_latch(&latch->seq);
+ *		do {
+ *			seq = raw_read_seqcount_latch(&latch->seq);
  *
- *		idx = seq & 0x01;
- *		entry = data_query(latch->data[idx], ...);
+ *			idx = seq & 0x01;
+ *			entry = data_query(latch->data[idx], ...);
  *
- *		smp_rmb();
- *	} while (seq != latch->seq);
+ *			// read_seqcount_retry() includes necessary smp_rmb()
+ *		} while (read_seqcount_retry(&latch->seq, seq));
  *
- *	return entry;
- * }
+ *		return entry;
+ *	}
  *
  * So during the modification, queries are first redirected to data[1]. Then we
  * modify data[0]. When that is complete, we redirect queries back to data[0]
  * and we can modify data[1].
  *
- * NOTE: The non-requirement for atomic modifications does _NOT_ include
- *       the publishing of new entries in the case where data is a dynamic
- *       data structure.
+ * NOTE:
+ *
+ *	The non-requirement for atomic modifications does _NOT_ include
+ *	the publishing of new entries in the case where data is a dynamic
+ *	data structure.
+ *
+ *	An iteration might start in data[0] and get suspended long enough
+ *	to miss an entire modification sequence, once it resumes it might
+ *	observe the new entry.
  *
- *       An iteration might start in data[0] and get suspended long enough
- *       to miss an entire modification sequence, once it resumes it might
- *       observe the new entry.
+ * NOTE:
  *
- * NOTE: When data is a dynamic data structure; one should use regular RCU
- *       patterns to manage the lifetimes of the objects within.
+ *	When data is a dynamic data structure; one should use regular RCU
+ *	patterns to manage the lifetimes of the objects within.
  */
-static inline void raw_write_seqcount_latch(seqcount_t *s)
+#define raw_write_seqcount_latch(s)	do_raw_write_seqcount_latch(__to_seqcount_t(s))
+
+static inline void do_raw_write_seqcount_latch(seqcount_t *s)
 {
        smp_wmb();      /* prior stores before incrementing "sequence" */
        s->sequence++;
        smp_wmb();      /* increment "sequence" before following stores */
 }
 
+static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
+{
+	do_raw_write_seqcount_begin(s);
+	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
+}
+
+#define write_seqcount_begin_nested(s, subclass)			\
+do {									\
+	__assert_associated_lock_held(s);				\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	do_write_seqcount_begin_nested(__to_seqcount_t(s), subclass);	\
+} while (0)
+
+static inline void do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
+{
+	lockdep_assert_preemption_disabled();
+	__write_seqcount_begin_nested(s, subclass);
+}
+
 /*
- * Sequence counter only version assumes that callers are using their
- * own mutexing.
+ * write_seqcount_begin() without lockdep non-preemptibility checks.
+ *
+ * Use for internal seqlock.h code where it's known that preemption is
+ * already disabled. For example, seqlock_t write side functions.
  */
-static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
+static inline void __write_seqcount_begin(seqcount_t *s)
 {
-	raw_write_seqcount_begin(s);
-	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
+	__write_seqcount_begin_nested(s, 0);
 }
 
-static inline void write_seqcount_begin(seqcount_t *s)
+/**
+ * write_seqcount_begin() - start a seqcount write-side critical section
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
+ *
+ * write_seqcount_begin opens a write-side critical section of the given
+ * seqcount. Seqcount write-side critical sections must be externally
+ * serialized and non-preemptible.
+ */
+#define write_seqcount_begin(s)						\
+do {									\
+	__assert_associated_lock_held(s);				\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_disable();					\
+									\
+	do_write_seqcount_begin(__to_seqcount_t(s));			\
+} while (0)
+
+static inline void do_write_seqcount_begin(seqcount_t *s)
 {
 	write_seqcount_begin_nested(s, 0);
 }
 
-static inline void write_seqcount_end(seqcount_t *s)
+/**
+ * write_seqcount_end() - end a seqcount_t write-side critical section
+ * @s: Pointer to &typedef seqcount_t
+ *
+ * The write section must've been opened with write_seqcount_begin().
+ */
+#define write_seqcount_end(s)						\
+do {									\
+	do_write_seqcount_end(__to_seqcount_t(s));			\
+									\
+	if (__associated_lock_is_preemptible(s))			\
+		preempt_enable();					\
+} while (0)
+
+static inline void do_write_seqcount_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
 	raw_write_seqcount_end(s);
 }
 
 /**
- * write_seqcount_invalidate - invalidate in-progress read-side seq operations
- * @s: pointer to seqcount_t
+ * write_seqcount_invalidate() - invalidate in-progress read-side seq operations
+ * @s: Pointer to &typedef seqcount_t or any of the seqcount_locktype_t variants
  *
  * After write_seqcount_invalidate, no read-side seq operations will complete
  * successfully and see data older than this.
  */
-static inline void write_seqcount_invalidate(seqcount_t *s)
+#define write_seqcount_invalidate(s)	do_write_seqcount_invalidate(__to_seqcount_t(s))
+
+static inline void do_write_seqcount_invalidate(seqcount_t *s)
 {
 	smp_wmb();
 	kcsan_nestable_atomic_begin();
@@ -434,32 +677,53 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+/*
+ * Sequential locks (seqlock_t)
+ *
+ * Sequence counters with an embedded spinlock for writer serialization
+ * and non-preemptibility.
+ *
+ * For more info, see:
+ *   - Comments on top of seqcount_t
+ *   - Documentation/locking/seqlock.rst
+ */
 typedef struct {
 	struct seqcount seqcount;
 	spinlock_t lock;
 } seqlock_t;
 
-/*
- * These macros triggered gcc-3.x compile-time problems.  We think these are
- * OK now.  Be cautious.
- */
 #define __SEQLOCK_UNLOCKED(lockname)			\
 	{						\
 		.seqcount = SEQCNT_ZERO(lockname),	\
 		.lock =	__SPIN_LOCK_UNLOCKED(lockname)	\
 	}
 
-#define seqlock_init(x)					\
+/**
+ * seqlock_init() - dynamic initializer for seqlock_t
+ * @sl: Pointer to the &typedef seqlock_t instance
+ */
+#define seqlock_init(sl)				\
 	do {						\
-		seqcount_init(&(x)->seqcount);		\
-		spin_lock_init(&(x)->lock);		\
+		seqcount_init(&(sl)->seqcount);		\
+		spin_lock_init(&(sl)->lock);		\
 	} while (0)
 
-#define DEFINE_SEQLOCK(x) \
-		seqlock_t x = __SEQLOCK_UNLOCKED(x)
+/**
+ * DEFINE_SEQLOCK() - Define a statically-allocated seqlock_t
+ * @sl: Name of the &typedef seqlock_t instance
+ */
+#define DEFINE_SEQLOCK(sl) \
+		seqlock_t sl = __SEQLOCK_UNLOCKED(sl)
 
-/*
- * Read side functions for starting and finalizing a read side section.
+/**
+ * read_seqbegin() - start a seqlock_t read-side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_seqbegin opens a read side critical section of the given
+ * seqlock_t. Validity of the critical section is tested by checking
+ * read_seqretry().
+ *
+ * Return: count to be passed to read_seqretry()
  */
 static inline unsigned read_seqbegin(const seqlock_t *sl)
 {
@@ -470,6 +734,17 @@ static inline unsigned read_seqbegin(const seqlock_t *sl)
 	return ret;
 }
 
+/**
+ * read_seqretry() - end and validate a seqlock_t read side section
+ * @sl: Pointer to &typedef seqlock_t
+ * @start: count, from read_seqbegin()
+ *
+ * read_seqretry closes the given seqlock_t read side critical section,
+ * and checks its validity. If the read section was invalid, it must be
+ * ignored and retried.
+ *
+ * Return: 1 if a retry is required, 0 otherwise
+ */
 static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 {
 	/*
@@ -478,47 +753,94 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 	 */
 	kcsan_flat_atomic_end();
 
-	return read_seqcount_retry(&sl->seqcount, start);
+	return do_read_seqcount_retry(&sl->seqcount, start);
 }
 
-/*
- * Lock out other writers and update the count.
- * Acts like a normal spin_lock/unlock.
- * Don't need preempt_disable() because that is in the spin_lock already.
+/**
+ * write_seqlock() - start a seqlock_t write side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_seqlock opens a write side critical section of the given
+ * seqlock_t.  It also acquires the spinlock_t embedded inside the
+ * sequential lock. All the seqlock_t write side critical sections are
+ * thus automatically serialized and non-preemptible.
+ *
+ * Use the ``_irqsave`` and ``_bh`` variants instead if the read side
+ * can be invoked from hardirq or softirq contexts, respectively.
+ *
+ * The opened write side section must be closed with write_sequnlock().
  */
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock() - end a seqlock_t write side critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_sequnlock closes the (serialized and non-preemptible) write
+ * side critical section of the given seqlock_t.
+ */
 static inline void write_sequnlock(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	do_write_seqcount_end(&sl->seqcount);
 	spin_unlock(&sl->lock);
 }
 
+/**
+ * write_seqlock_bh() - start a softirqs-disabled seqlock_t write section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_bh`` variant of write_seqlock(). Use only if the read side section
+ * can be invoked from a softirq context.
+ *
+ * The opened write section must be closed with write_sequnlock_bh().
+ */
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock_bh() - end a softirqs-disabled seqlock_t write section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * write_sequnlock_bh closes the serialized, non-preemptible,
+ * softirqs-disabled, seqlock_t write side critical section opened with
+ * write_seqlock_bh().
+ */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	do_write_seqcount_end(&sl->seqcount);
 	spin_unlock_bh(&sl->lock);
 }
 
+/**
+ * write_seqlock_irq() - start a non-interruptible seqlock_t write side section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * This is the ``_irq`` variant of write_seqlock(). Use only if the read
+ * section of the given seqlock_t can be invoked from a hardirq context.
+ */
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock_irq() - end a non-interruptible seqlock_t write side section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_irq`` variant of write_sequnlock(). The write side section of
+ * the given seqlock_t must've been opened with write_seqlock_irq().
+ */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	do_write_seqcount_end(&sl->seqcount);
 	spin_unlock_irq(&sl->lock);
 }
 
@@ -527,44 +849,98 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
+
 	return flags;
 }
 
+/**
+ * write_seqlock_irqsave() - start a non-interruptible seqlock_t write section
+ * @lock:  Pointer to &typedef seqlock_t
+ * @flags: Stack-allocated storage for saving caller's local interrupt
+ *         state, to be passed to write_sequnlock_irqrestore().
+ *
+ * ``_irqsave`` variant of write_seqlock(). Use if the read section of
+ * given seqlock_t can be invoked from a hardirq context.
+ *
+ * The opened write section must be closed with write_sequnlock_irqrestore().
+ */
 #define write_seqlock_irqsave(lock, flags)				\
 	do { flags = __write_seqlock_irqsave(lock); } while (0)
 
+/**
+ * write_sequnlock_irqrestore() - end non-interruptible seqlock_t write section
+ * @sl:    Pointer to &typedef seqlock_t
+ * @flags: Caller's saved interrupt state, from write_seqlock_irqsave()
+ *
+ * ``_irqrestore`` variant of write_sequnlock(). The write section of
+ * the given seqlock_t must've been opened with write_seqlock_irqsave().
+ */
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
-	write_seqcount_end(&sl->seqcount);
+	do_write_seqcount_end(&sl->seqcount);
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
-/*
- * A locking reader exclusively locks out other writers and locking readers,
- * but doesn't update the sequence number. Acts like a normal spin_lock/unlock.
- * Don't need preempt_disable() because that is in the spin_lock already.
+/**
+ * read_seqlock_excl() - begin a seqlock_t locking reader critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_seqlock_excl opens a locking reader critical section for the
+ * given seqlock_t. A locking reader exclusively locks out other writers
+ * and other *locking* readers, but doesn't update the sequence number.
+ *
+ * Locking readers act like a normal spin_lock()/spin_unlock().
+ *
+ * The opened read side section must be closed with read_sequnlock_excl().
  */
 static inline void read_seqlock_excl(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl() - end a seqlock_t locking reader critical section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * read_sequnlock_excl closes the locking reader critical section opened
+ * with read_seqlock_excl().
+ */
 static inline void read_sequnlock_excl(seqlock_t *sl)
 {
 	spin_unlock(&sl->lock);
 }
 
 /**
- * read_seqbegin_or_lock - begin a sequence number check or locking block
- * @lock: sequence lock
- * @seq : sequence number to be checked
+ * read_seqbegin_or_lock() - begin a seqlock_t lockless or locking reader
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq : Marker and return parameter. If the passed value is even, the
+ * reader will become a *lockless* seqlock_t sequence counter reader as
+ * in read_seqbegin(). If the passed value is odd, the reader will
+ * become a fully locking reader, as in read_seqlock_excl().  In the
+ * first call to read_seqbegin_or_lock(), the caller **must** initialize
+ * and pass an even value to @seq so a lockless read is optimistically
+ * tried first.
+ *
+ * read_seqbegin_or_lock is an API designed to optimistically try a
+ * normal lockless seqlock_t read section first, as in read_seqbegin().
+ * If an odd counter is found, the normal lockless read trial has
+ * failed, and the next reader iteration transforms to a full seqlock_t
+ * locking reader as in read_seqlock_excl().
+ *
+ * This is typically used to avoid lockless seqlock_t reader starvation
+ * (too many retry loops) in the case of a sharp spike in write
+ * activity.
  *
- * First try it once optimistically without taking the lock. If that fails,
- * take the lock. The sequence number is also used as a marker for deciding
- * whether to be a reader (even) or writer (odd).
- * N.B. seq must be initialized to an even number to begin with.
+ * The opened read section must be closed with done_seqretry().  Check
+ * Documentation/locking/seqlock.rst for template example code.
+ *
+ * Return: The encountered sequence counter value, returned through the
+ * @seq parameter, which is overloaded as a return parameter. The
+ * returned value must be checked with need_seqretry(). If the read
+ * section must be retried, the returned value must also be passed to
+ * the @seq parameter of the next read_seqbegin_or_lock() iteration.
  */
 static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 {
@@ -574,32 +950,90 @@ static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 		read_seqlock_excl(lock);
 }
 
+/**
+ * need_seqretry() - validate seqlock_t "locking or lockless" reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: count, from read_seqbegin_or_lock()
+ *
+ * need_seqretry checks if the seqlock_t read-side critical section
+ * started with read_seqbegin_or_lock() is valid. If it was not, the
+ * caller must retry the read-side section.
+ *
+ * Return: 1 if a retry is required, 0 otherwise
+ */
 static inline int need_seqretry(seqlock_t *lock, int seq)
 {
 	return !(seq & 1) && read_seqretry(lock, seq);
 }
 
+/**
+ * done_seqretry() - end seqlock_t "locking or lockless" reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: count, from read_seqbegin_or_lock()
+ *
+ * done_seqretry finishes the seqlock_t read side critical section
+ * started by read_seqbegin_or_lock(). The read section must've been
+ * already validated with need_seqretry().
+ */
 static inline void done_seqretry(seqlock_t *lock, int seq)
 {
 	if (seq & 1)
 		read_sequnlock_excl(lock);
 }
 
+/**
+ * read_seqlock_excl_bh() - start a locking reader seqlock_t section
+ *			    with softirqs disabled
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_bh`` variant of read_seqlock_excl(). Use this variant if the
+ * seqlock_t write side section, *or other read sections*, can be
+ * invoked from a softirq context.
+ *
+ * The opened section must be closed with read_sequnlock_excl_bh().
+ */
 static inline void read_seqlock_excl_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl_bh() - stop a seqlock_t softirq-disabled locking
+ *			      reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_bh`` variant of read_sequnlock_excl(). The closed section must've
+ * been opened with read_seqlock_excl_bh().
+ */
 static inline void read_sequnlock_excl_bh(seqlock_t *sl)
 {
 	spin_unlock_bh(&sl->lock);
 }
 
+/**
+ * read_seqlock_excl_irq() - start a non-interruptible seqlock_t locking
+ *			     reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_irq`` variant of read_seqlock_excl(). Use this only if the
+ * seqlock_t write side critical section, *or other read side sections*,
+ * can be invoked from a hardirq context.
+ *
+ * The opened read section must be closed with read_sequnlock_excl_irq().
+ */
 static inline void read_seqlock_excl_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl_irq() - end an interrupts-disabled seqlock_t
+ *                             locking reader section
+ * @sl: Pointer to &typedef seqlock_t
+ *
+ * ``_irq`` variant of read_sequnlock_excl(). The closed section must've
+ * been opened with read_seqlock_excl_irq().
+ */
 static inline void read_sequnlock_excl_irq(seqlock_t *sl)
 {
 	spin_unlock_irq(&sl->lock);
@@ -613,15 +1047,59 @@ static inline unsigned long __read_seqlock_excl_irqsave(seqlock_t *sl)
 	return flags;
 }
 
+/**
+ * read_seqlock_excl_irqsave() - start a non-interruptible seqlock_t
+ *				 locking reader section
+ * @lock: Pointer to &typedef seqlock_t
+ * @flags: Stack-allocated storage for saving caller's local interrupt
+ *         state, to be passed to read_sequnlock_excl_irqrestore().
+ *
+ * ``_irqsave`` variant of read_seqlock_excl(). Use this only if the
+ * seqlock_t write side critical section, *or other read side sections*,
+ * can be invoked from a hardirq context.
+ *
+ * Opened section must be closed with read_sequnlock_excl_irqrestore().
+ */
 #define read_seqlock_excl_irqsave(lock, flags)				\
 	do { flags = __read_seqlock_excl_irqsave(lock); } while (0)
 
+/**
+ * read_sequnlock_excl_irqrestore() - end non-interruptible seqlock_t
+ *				      locking reader section
+ * @sl: Pointer to &typedef seqlock_t
+ * @flags: Caller's saved interrupt state, from
+ *	   read_seqlock_excl_irqsave()
+ *
+ * ``_irqrestore`` variant of read_sequnlock_excl(). The closed section
+ * must've been opened with read_seqlock_excl_irqsave().
+ */
 static inline void
 read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
 {
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
+/**
+ * read_seqbegin_or_lock_irqsave() - begin a seqlock_t lockless reader, or
+ *                                   a non-interruptible locking reader
+ * @lock: Pointer to &typedef seqlock_t
+ * @seq: Marker and return parameter. Check read_seqbegin_or_lock().
+ *
+ * This is the ``_irqsave`` variant of read_seqbegin_or_lock(). Use if
+ * the seqlock_t write side critical section, *or other read side sections*,
+ * can be invoked from hardirq context.
+ *
+ * The validity of the read section must be checked with need_seqretry().
+ * The opened section must be closed with done_seqretry_irqrestore().
+ *
+ * Return:
+ *
+ *   1. The saved local interrupts state in case of a locking reader, to be
+ *      passed to done_seqretry_irqrestore().
+ *
+ *   2. The encountered sequence counter value, returned through @seq which
+ *      is overloaded as a return parameter. Check read_seqbegin_or_lock().
+ */
 static inline unsigned long
 read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 {
@@ -635,6 +1113,18 @@ read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 	return flags;
 }
 
+/**
+ * done_seqretry_irqrestore() - end a seqlock_t lockless reader, or a
+ *				non-interruptible locking reader section
+ * @lock:  Pointer to &typedef seqlock_t
+ * @seq:   Count, from read_seqbegin_or_lock_irqsave()
+ * @flags: Caller's saved local interrupt state in case of a locking
+ *	   reader, also from read_seqbegin_or_lock_irqsave()
+ *
+ * This is the ``_irqrestore`` variant of done_seqretry(). The read
+ * section must've been opened with read_seqbegin_or_lock_irqsave(), and
+ * validated with need_seqretry().
+ */
 static inline void
 done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags)
 {

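As a quick illustration of the resulting API, here is a minimal
write/read pair using a sequence counter with an associated spinlock.
This is only a sketch: the seqcount_spinlock_t initializer name is
assumed here and may differ from the actual series code.

	static spinlock_t foo_lock;
	/* Initializer name assumed for illustration */
	static seqcount_spinlock_t foo_seq =
		SEQCNT_SPINLOCK_ZERO(foo_seq, &foo_lock);
	static int foo_data[2];

	static void foo_update(int a, int b)
	{
		spin_lock(&foo_lock);

		/*
		 * With lockdep enabled, asserts that foo_lock is held.
		 * For a preemptible associated lock type (e.g. a mutex),
		 * this would also disable preemption first.
		 */
		write_seqcount_begin(&foo_seq);

		foo_data[0] = a;
		foo_data[1] = b;

		write_seqcount_end(&foo_seq);
		spin_unlock(&foo_lock);
	}

	static void foo_get(int *a, int *b)
	{
		unsigned seq;

		do {
			seq = read_seqcount_begin(&foo_seq);

			*a = foo_data[0];
			*b = foo_data[1];

		} while (read_seqcount_retry(&foo_seq, seq));
	}
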
^ permalink raw reply related	[flat|nested] 258+ messages in thread

* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-08 15:58                     ` Ahmed S. Darwish
@ 2020-07-08 16:16                       ` Peter Zijlstra
  2020-07-08 16:18                       ` Peter Zijlstra
  1 sibling, 0 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-08 16:16 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Wed, Jul 08, 2020 at 05:58:13PM +0200, Ahmed S. Darwish wrote:
> On Wed, Jul 08, 2020 at 05:35:22PM +0200, Peter Zijlstra wrote:
> ...
> >
> > And while the gcc-4.8 code is horrendous crap, the rest should be pretty
> > straight forward and concentrates on the pieces where there are
> > differences.
> >
> 
> Is there any possibility of upgrading the minimum gcc version to 4.9? Is
> there any supported architecture that is still stuck on 4.8?

Upgrading also got mentioned here:

  https://lkml.kernel.org/r/CAHk-=wgD+q+oDdtukYC74_cDX5i0Ynf0GLhuNe2Faaokejj6fQ@mail.gmail.com

But it didn't get much response.

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks
  2020-07-08 15:58                     ` Ahmed S. Darwish
  2020-07-08 16:16                       ` Peter Zijlstra
@ 2020-07-08 16:18                       ` Peter Zijlstra
  1 sibling, 0 replies; 258+ messages in thread
From: Peter Zijlstra @ 2020-07-08 16:18 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, Steven Rostedt, LKML

On Wed, Jul 08, 2020 at 05:58:13PM +0200, Ahmed S. Darwish wrote:
> > I even considered:
> >
> > #define __SEQPROP(name, prop, expr) \
> > static __always_inline __seqprop_##prop##_t \
> > __seqprop##name##_##prop(seqcount##name##_t *s) \
> > { \
> > 	expr; \
> > }
> >
> > Such that we could write:
> >
> > __SEQPROP(, ptr, return s)
> > __SEQPROP(, preempt, return false)
> > __SEQPROP(, assert, )
> >
> > __SEQPROP(_##locktype, ptr, return &s->seqcount) \
> > __SEQPROP(_##locktype, preempt, return preempt) \
> > __SEQPROP(_##locktype, assert, __SEQCOUNT_LOCKDEP(lockdep_assert_held(s->lockmember))) \
> >
> > But I figured _that_ might've been one step too far ;-)
> 
> Initially I implemented something like this during internal,
> pre-upstream, versions of this patch series. We've decided afterwards
> that the macro compression level is so high that the whole thing is not
> so easily understandable.

I've been reading too much tracing code lately... :-) This is only 2
levels of expansion and fits on a single screen.
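
For reference, expanding one instance of the quoted generator by hand,

	__SEQPROP(_spinlock, ptr, return &s->seqcount)

would emit the following (assuming __seqprop_ptr_t is a typedef for
seqcount_t *):

	static __always_inline __seqprop_ptr_t
	__seqprop_spinlock_ptr(seqcount_spinlock_t *s)
	{
		return &s->seqcount;
	}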

^ permalink raw reply	[flat|nested] 258+ messages in thread

* [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (26 preceding siblings ...)
  2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
@ 2020-07-20 15:55 ` Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
                     ` (24 more replies)
  2020-08-27 11:40 ` [PATCH v1 0/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
                   ` (2 subsequent siblings)
  30 siblings, 25 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc, David S. Miller, netdev, Alexander Viro,
	linux-fsdevel

Hi,

This is v4 of the seqlock patch series:

   [PATCH v1 00/25]
   https://lore.kernel.org/lkml/20200519214547.352050-1-a.darwish@linutronix.de

   [PATCH v2 00/06] (bugfixes-only, merged)
   https://lore.kernel.org/lkml/20200603144949.1122421-1-a.darwish@linutronix.de

   [PATCH v2 00/18]
   https://lore.kernel.org/lkml/20200608005729.1874024-1-a.darwish@linutronix.de

   [PATCH v3 00/20]
   https://lore.kernel.org/lkml/20200630054452.3675847-1-a.darwish@linutronix.de

It is based over:

   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git :: locking/core

Changelog
=========

 - Unconditionally use C11 _Generic() expressions for seqcount_locktype_t
   switching; a rough sketch of this dispatch is included after the
   changelog below. Thanks to Peter for pushing 6ec4476ac825 ("Raise
   gcc version requirement to 4.9").

 - Compress the new seqcount_locktype_t code by using generative
   macros, as suggested by Peter here:

   https://lkml.kernel.org/r/20200708122938.GQ4800@hirez.programming.kicks-ass.net

   Keep *all* functions that are to be invoked by call-sites out of such
   generative macros though. This simplifies the generative macros code,
   and (more importantly) makes the newly exported seqlock.h API explicit.

 - Make all documentation "RST-lite", for better readability from text
   editors.

 - Add additional clean-ups at the start of the series for better
   overall readability of seqlock.h code, and for future extensibility.
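
A rough sketch of the _Generic() dispatch mentioned in the first
changelog entry above (stand-in types, simplified to two cases; not the
exact series code):

	/* Reduced stand-ins for the kernel types, illustration only */
	typedef struct { unsigned sequence; } seqcount_t;
	typedef struct { int dummy; } spinlock_t;
	typedef struct {
		seqcount_t	seqcount;
		spinlock_t	*lock;	/* kept for lockdep */
	} seqcount_spinlock_t;

	/*
	 * Compile-time dispatch: accept a plain seqcount_t or a
	 * seqcount_spinlock_t, and evaluate to the embedded seqcount_t.
	 */
	#define __to_seqcount_t(s)					\
		_Generic(*(s),						\
			seqcount_t:	     (seqcount_t *)(s),		\
			seqcount_spinlock_t: &((seqcount_spinlock_t *)(s))->seqcount)

With that, __to_seqcount_t(&foo_seq) evaluates at compile time to
&foo_seq.seqcount when foo_seq is a seqcount_spinlock_t, and to plain
&foo_seq for a seqcount_t.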

Thanks,

8<--------------

Ahmed S. Darwish (24):
  Documentation: locking: Describe seqlock design and usage
  seqlock: Properly format kernel-doc code samples
  seqlock: seqcount_t latch: End read sections with
    read_seqcount_retry()
  seqlock: Reorder seqcount_t and seqlock_t API definitions
  seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs
  seqlock: Implement raw_seqcount_begin() in terms of
    raw_read_seqcount()
  lockdep: Add preemption enabled/disabled assertion APIs
  seqlock: lockdep assert non-preemptibility on seqcount_t write
  seqlock: Extend seqcount API with associated locks
  seqlock: Align multi-line macros newline escapes at 72 columns
  dma-buf: Remove custom seqcount lockdep class key
  dma-buf: Use sequence counter with associated wound/wait mutex
  sched: tasks: Use sequence counter with associated spinlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  xfrm: policy: Use sequence counters with associated lock
  timekeeping: Use sequence counter with associated raw spinlock
  vfs: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  hrtimer: Use sequence counter with associated raw spinlock

 Documentation/locking/index.rst               |    1 +
 Documentation/locking/seqlock.rst             |  222 ++++
 block/blk-iocost.c                            |    5 +-
 drivers/dma-buf/dma-resv.c                    |   15 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |    2 -
 drivers/md/raid5.c                            |    2 +-
 drivers/md/raid5.h                            |    2 +-
 fs/dcache.c                                   |    2 +-
 fs/fs_struct.c                                |    4 +-
 fs/nfs/nfs4_fs.h                              |    2 +-
 fs/nfs/nfs4state.c                            |    2 +-
 fs/userfaultfd.c                              |    4 +-
 include/linux/dcache.h                        |    2 +-
 include/linux/dma-resv.h                      |    4 +-
 include/linux/fs_struct.h                     |    2 +-
 include/linux/hrtimer.h                       |    2 +-
 include/linux/kvm_irqfd.h                     |    2 +-
 include/linux/lockdep.h                       |   19 +
 include/linux/sched.h                         |    2 +-
 include/linux/seqlock.h                       | 1139 +++++++++++++----
 include/net/netfilter/nf_conntrack.h          |    2 +-
 init/init_task.c                              |    3 +-
 kernel/fork.c                                 |    2 +-
 kernel/time/hrtimer.c                         |   13 +-
 kernel/time/timekeeping.c                     |   19 +-
 lib/Kconfig.debug                             |    1 +
 net/netfilter/nf_conntrack_core.c             |    5 +-
 net/netfilter/nft_set_rbtree.c                |    4 +-
 net/xfrm/xfrm_policy.c                        |   10 +-
 virt/kvm/eventfd.c                            |    2 +-
 30 files changed, 1173 insertions(+), 323 deletions(-)
 create mode 100644 Documentation/locking/seqlock.rst

base-commit: a9232dc5607dbada801f2fe83ea307cda762969a
--
2.20.1

^ permalink raw reply	[flat|nested] 258+ messages in thread

* [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-21  1:35     ` Steven Rostedt
                       ` (2 more replies)
  2020-07-20 15:55   ` [PATCH v4 02/24] seqlock: Properly format kernel-doc code samples Ahmed S. Darwish
                     ` (23 subsequent siblings)
  24 siblings, 3 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc

Proper documentation for the design and usage of sequence counters and
sequential locks does not exist. Complete the seqlock.h documentation as
follows:

  - Divide all documentation on a seqcount_t vs. seqlock_t basis. The
    description for both mechanisms was intermingled, which is incorrect
    since the usage constraints for each type are vastly different.

  - Add an introductory paragraph describing the internal design of, and
    rationale for, sequence counters.

  - Document seqcount_t writer non-preemptibility requirement, which was
    not previously documented anywhere, and provide a clear rationale.

  - Provide template code for seqcount_t and seqlock_t initialization
    and reader/writer critical sections.

  - Recommend using seqlock_t by default. It implicitly handles the
    serialization and non-preemptibility requirements of writers.

In seqlock.h:

  - Remove references to brlocks as they've long been removed from the
    kernel.

  - Remove references to gcc-3.x since the kernel's minimum supported
    gcc version is 4.9.

References: 0f6ed63b1707 ("no need to keep brlock macros anymore...")
References: 6ec4476ac825 ("Raise gcc version requirement to 4.9")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 Documentation/locking/index.rst   |   1 +
 Documentation/locking/seqlock.rst | 170 ++++++++++++++++++++++++++++++
 include/linux/seqlock.h           |  81 +++++++-------
 3 files changed, 209 insertions(+), 43 deletions(-)
 create mode 100644 Documentation/locking/seqlock.rst

diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
index d785878cad65..7003bd5aeff4 100644
--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -14,6 +14,7 @@ locking
     mutex-design
     rt-mutex-design
     rt-mutex
+    seqlock
     spinlocks
     ww-mutex-design
     preempt-locking
diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
new file mode 100644
index 000000000000..366dd368d90a
--- /dev/null
+++ b/Documentation/locking/seqlock.rst
@@ -0,0 +1,170 @@
+======================================
+Sequence counters and sequential locks
+======================================
+
+Introduction
+============
+
+Sequence counters are a reader-writer consistency mechanism with
+lockless readers (read-only retry loops), and no writer starvation. They
+are used for data that's rarely written to (e.g. system time), where the
+reader wants a consistent set of information and is willing to retry if
+that information changes.
+
+A data set is consistent when the sequence count at the beginning of the
+read side critical section is even and the same sequence count value is
+read again at the end of the critical section. The data in the set must
+be copied out inside the read side critical section. If the sequence
+count has changed between the start and the end of the critical section,
+the reader must retry.
+
+Writers increment the sequence count at the start and the end of their
+critical section. After starting the critical section the sequence count
+is odd and indicates to the readers that an update is in progress. At
+the end of the write side critical section the sequence count becomes
+even again which lets readers make progress.
+
+A sequence counter write side critical section must never be preempted
+or interrupted by read side sections. Otherwise the reader will spin for
+the entire scheduler tick due to the odd sequence count value and the
+interrupted writer. If that reader belongs to a real-time scheduling
+class, it can spin forever and the kernel will livelock.
+
+This mechanism cannot be used if the protected data contains pointers,
+as the writer can invalidate a pointer that the reader is following.
+
+
+.. _seqcount_t:
+
+Sequence counters (``seqcount_t``)
+==================================
+
+This is the raw counting mechanism, which does not protect against
+multiple writers.  Write side critical sections must thus be serialized
+by an external lock.
+
+If the write serialization primitive is not implicitly disabling
+preemption, preemption must be explicitly disabled before entering the
+write side section. If the read section can be invoked from hardirq or
+softirq contexts, interrupts or bottom halves must also be respectively
+disabled before entering the write section.
+
+If it's desired to automatically handle the sequence counter
+requirements of writer serialization and non-preemptibility, use
+:ref:`seqlock_t` instead.
+
+Initialization::
+
+	/* dynamic */
+	seqcount_t foo_seqcount;
+	seqcount_init(&foo_seqcount);
+
+	/* static */
+	static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
+
+	/* C99 struct init */
+	struct {
+		.seq   = SEQCNT_ZERO(foo.seq),
+	} foo;
+
+Write path::
+
+	/* Serialized context with disabled preemption */
+
+	write_seqcount_begin(&foo_seqcount);
+
+	/* ... [[write-side critical section]] ... */
+
+	write_seqcount_end(&foo_seqcount);
+
+Read path::
+
+	do {
+		seq = read_seqcount_begin(&foo_seqcount);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (read_seqcount_retry(&foo_seqcount, seq));
+
+
+.. _seqlock_t:
+
+Sequential locks (``seqlock_t``)
+================================
+
+This contains the :ref:`seqcount_t` mechanism discussed earlier, plus an
+embedded spinlock for writer serialization and non-preemptibility.
+
+If the read side section can be invoked from hardirq or softirq context,
+use the write side function variants which disable interrupts or bottom
+halves respectively.
+
+Initialization::
+
+	/* dynamic */
+	seqlock_t foo_seqlock;
+	seqlock_init(&foo_seqlock);
+
+	/* static */
+	static DEFINE_SEQLOCK(foo_seqlock);
+
+	/* C99 struct init */
+	struct {
+		.seql   = __SEQLOCK_UNLOCKED(foo.seql)
+	} foo;
+
+Write path::
+
+	write_seqlock(&foo_seqlock);
+
+	/* ... [[write-side critical section]] ... */
+
+	write_sequnlock(&foo_seqlock);
+
+Read path, three categories:
+
+1. Normal sequence readers, which never block a writer but must retry
+   if a writer is in progress, as detected by a change in the sequence
+   number. Writers do not wait for sequence readers::
+
+	do {
+		seq = read_seqbegin(&foo_seqlock);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (read_seqretry(&foo_seqlock, seq));
+
+2. Locking readers which will wait if a writer or another locking reader
+   is in progress. A locking reader in progress will also block a writer
+   from entering its critical section. This read lock is
+   exclusive. Unlike rwlock_t, only one locking reader can acquire it::
+
+	read_seqlock_excl(&foo_seqlock);
+
+	/* ... [[read-side critical section]] ... */
+
+	read_sequnlock_excl(&foo_seqlock);
+
+3. Conditional lockless reader (as in 1), or locking reader (as in 2),
+   according to a passed marker. This is used to avoid lockless reader
+   starvation (too many retry loops) in case of a sharp spike in write
+   activity. First, a lockless read is tried (even marker passed). If
+   that trial fails (odd sequence counter is returned, which is used as
+   the next iteration marker), the lockless read is transformed to a
+   full locking read and no retry loop is necessary::
+
+	/* marker; even initialization */
+	int seq = 0;
+	do {
+		read_seqbegin_or_lock(&foo_seqlock, &seq);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (need_seqretry(&foo_seqlock, seq));
+	done_seqretry(&foo_seqlock, seq);
+
+
+API documentation
+=================
+
+.. kernel-doc:: include/linux/seqlock.h
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 8b97204f35a7..299d68f10325 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -1,36 +1,15 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef __LINUX_SEQLOCK_H
 #define __LINUX_SEQLOCK_H
+
 /*
- * Reader/writer consistent mechanism without starving writers. This type of
- * lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes. There are two types
- * of readers:
- * 1. Sequence readers which never block a writer but they may have to retry
- *    if a writer is in progress by detecting change in sequence number.
- *    Writers do not wait for a sequence reader.
- * 2. Locking readers which will wait if a writer or another locking reader
- *    is in progress. A locking reader in progress will also block a writer
- *    from going forward. Unlike the regular rwlock, the read lock here is
- *    exclusive so that only one locking reader can get it.
+ * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
+ * lockless readers (read-only retry loops), and no writer starvation.
  *
- * This is not as cache friendly as brlock. Also, this may not work well
- * for data that contains pointers, because any writer could
- * invalidate a pointer that a reader was following.
+ * See Documentation/locking/seqlock.rst
  *
- * Expected non-blocking reader usage:
- * 	do {
- *	    seq = read_seqbegin(&foo);
- * 	...
- *      } while (read_seqretry(&foo, seq));
- *
- *
- * On non-SMP the spin locks disappear but the writer still needs
- * to increment the sequence variables because an interrupt routine could
- * change the state of the data.
- *
- * Based on x86_64 vsyscall gettimeofday 
- * by Keith Owens and Andrea Arcangeli
+ * Copyrights:
+ * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
  */
 
 #include <linux/spinlock.h>
@@ -41,8 +20,8 @@
 #include <asm/processor.h>
 
 /*
- * The seqlock interface does not prescribe a precise sequence of read
- * begin/retry/end. For readers, typically there is a call to
+ * The seqlock seqcount_t interface does not prescribe a precise sequence of
+ * read begin/retry/end. For readers, typically there is a call to
  * read_seqcount_begin() and read_seqcount_retry(), however, there are more
  * esoteric cases which do not follow this pattern.
  *
@@ -50,16 +29,30 @@
  * via seqcount_t under KCSAN: upon beginning a seq-reader critical section,
  * pessimistically mark the next KCSAN_SEQLOCK_REGION_MAX memory accesses as
  * atomics; if there is a matching read_seqcount_retry() call, no following
- * memory operations are considered atomic. Usage of seqlocks via seqlock_t
- * interface is not affected.
+ * memory operations are considered atomic. Usage of the seqlock_t interface
+ * is not affected.
  */
 #define KCSAN_SEQLOCK_REGION_MAX 1000
 
 /*
- * Version using sequence counter only.
- * This can be used when code has its own mutex protecting the
- * updating starting before the write_seqcountbeqin() and ending
- * after the write_seqcount_end().
+ * Sequence counters (seqcount_t)
+ *
+ * This is the raw counting mechanism, without any writer protection.
+ *
+ * Write side critical sections must be serialized and non-preemptible.
+ *
+ * If readers can be invoked from hardirq or softirq contexts,
+ * interrupts or bottom halves must also be respectively disabled before
+ * entering the write section.
+ *
+ * This mechanism can't be used if the protected data contains pointers,
+ * as the writer can invalidate a pointer that a reader is following.
+ *
+ * If it's desired to automatically handle the sequence counter writer
+ * serialization and non-preemptibility requirements, use a sequential
+ * lock (seqlock_t) instead.
+ *
+ * See Documentation/locking/seqlock.rst
  */
 typedef struct seqcount {
 	unsigned sequence;
@@ -398,10 +391,6 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
        smp_wmb();      /* increment "sequence" before following stores */
 }
 
-/*
- * Sequence counter only version assumes that callers are using their
- * own mutexing.
- */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
 	raw_write_seqcount_begin(s);
@@ -434,15 +423,21 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+/*
+ * Sequential locks (seqlock_t)
+ *
+ * Sequence counters with an embedded spinlock for writer serialization
+ * and non-preemptibility.
+ *
+ * For more info, see:
+ *    - Comments on top of seqcount_t
+ *    - Documentation/locking/seqlock.rst
+ */
 typedef struct {
 	struct seqcount seqcount;
 	spinlock_t lock;
 } seqlock_t;
 
-/*
- * These macros triggered gcc-3.x compile-time problems.  We think these are
- * OK now.  Be cautious.
- */
 #define __SEQLOCK_UNLOCKED(lockname)			\
 	{						\
 		.seqcount = SEQCNT_ZERO(lockname),	\
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 02/24] seqlock: Properly format kernel-doc code samples
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 03/24] seqlock: seqcount_t latch: End read sections with read_seqcount_retry() Ahmed S. Darwish
                     ` (22 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc

Align the code samples and note sections inside kernel-doc comments with
tabs. This way they can be properly parsed and rendered by Sphinx. It
also makes the code samples easier to read from text editors.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 108 +++++++++++++++++++++-------------------
 1 file changed, 56 insertions(+), 52 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 299d68f10325..6c4f68ef1393 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -263,32 +263,32 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  * atomically, avoiding compiler optimizations; b) to document which writes are
  * meant to propagate to the reader critical section. This is necessary because
  * neither writes before and after the barrier are enclosed in a seq-writer
- * critical section that would ensure readers are aware of ongoing writes.
+ * critical section that would ensure readers are aware of ongoing writes::
  *
- *      seqcount_t seq;
- *      bool X = true, Y = false;
+ *	seqcount_t seq;
+ *	bool X = true, Y = false;
  *
- *      void read(void)
- *      {
- *              bool x, y;
+ *	void read(void)
+ *	{
+ *		bool x, y;
  *
- *              do {
- *                      int s = read_seqcount_begin(&seq);
+ *		do {
+ *			int s = read_seqcount_begin(&seq);
  *
- *                      x = X; y = Y;
+ *			x = X; y = Y;
  *
- *              } while (read_seqcount_retry(&seq, s));
+ *		} while (read_seqcount_retry(&seq, s));
  *
- *              BUG_ON(!x && !y);
+ *		BUG_ON(!x && !y);
  *      }
  *
  *      void write(void)
  *      {
- *              WRITE_ONCE(Y, true);
+ *		WRITE_ONCE(Y, true);
  *
- *              raw_write_seqcount_barrier(seq);
+ *		raw_write_seqcount_barrier(seq);
  *
- *              WRITE_ONCE(X, false);
+ *		WRITE_ONCE(X, false);
  *      }
  */
 static inline void raw_write_seqcount_barrier(seqcount_t *s)
@@ -325,64 +325,68 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  * Very simply put: we first modify one copy and then the other. This ensures
  * there is always one copy in a stable state, ready to give us an answer.
  *
- * The basic form is a data structure like:
+ * The basic form is a data structure like::
  *
- * struct latch_struct {
- *	seqcount_t		seq;
- *	struct data_struct	data[2];
- * };
+ *	struct latch_struct {
+ *		seqcount_t		seq;
+ *		struct data_struct	data[2];
+ *	};
  *
  * Where a modification, which is assumed to be externally serialized, does the
- * following:
+ * following::
  *
- * void latch_modify(struct latch_struct *latch, ...)
- * {
- *	smp_wmb();	<- Ensure that the last data[1] update is visible
- *	latch->seq++;
- *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *	void latch_modify(struct latch_struct *latch, ...)
+ *	{
+ *		smp_wmb();	// Ensure that the last data[1] update is visible
+ *		latch->seq++;
+ *		smp_wmb();	// Ensure that the seqcount update is visible
  *
- *	modify(latch->data[0], ...);
+ *		modify(latch->data[0], ...);
  *
- *	smp_wmb();	<- Ensure that the data[0] update is visible
- *	latch->seq++;
- *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *		smp_wmb();	// Ensure that the data[0] update is visible
+ *		latch->seq++;
+ *		smp_wmb();	// Ensure that the seqcount update is visible
  *
- *	modify(latch->data[1], ...);
- * }
+ *		modify(latch->data[1], ...);
+ *	}
  *
- * The query will have a form like:
+ * The query will have a form like::
  *
- * struct entry *latch_query(struct latch_struct *latch, ...)
- * {
- *	struct entry *entry;
- *	unsigned seq, idx;
+ *	struct entry *latch_query(struct latch_struct *latch, ...)
+ *	{
+ *		struct entry *entry;
+ *		unsigned seq, idx;
  *
- *	do {
- *		seq = raw_read_seqcount_latch(&latch->seq);
+ *		do {
+ *			seq = raw_read_seqcount_latch(&latch->seq);
  *
- *		idx = seq & 0x01;
- *		entry = data_query(latch->data[idx], ...);
+ *			idx = seq & 0x01;
+ *			entry = data_query(latch->data[idx], ...);
  *
- *		smp_rmb();
- *	} while (seq != latch->seq);
+ *			smp_rmb();
+ *		} while (seq != latch->seq);
  *
- *	return entry;
- * }
+ *		return entry;
+ *	}
  *
  * So during the modification, queries are first redirected to data[1]. Then we
  * modify data[0]. When that is complete, we redirect queries back to data[0]
  * and we can modify data[1].
  *
- * NOTE: The non-requirement for atomic modifications does _NOT_ include
- *       the publishing of new entries in the case where data is a dynamic
- *       data structure.
+ * NOTE:
  *
- *       An iteration might start in data[0] and get suspended long enough
- *       to miss an entire modification sequence, once it resumes it might
- *       observe the new entry.
+ *	The non-requirement for atomic modifications does _NOT_ include
+ *	the publishing of new entries in the case where data is a dynamic
+ *	data structure.
  *
- * NOTE: When data is a dynamic data structure; one should use regular RCU
- *       patterns to manage the lifetimes of the objects within.
+ *	An iteration might start in data[0] and get suspended long enough
+ *	to miss an entire modification sequence, once it resumes it might
+ *	observe the new entry.
+ *
+ * NOTE:
+ *
+ *	When data is a dynamic data structure; one should use regular RCU
+ *	patterns to manage the lifetimes of the objects within.
  */
 static inline void raw_write_seqcount_latch(seqcount_t *s)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 03/24] seqlock: seqcount_t latch: End read sections with read_seqcount_retry()
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 02/24] seqlock: Properly format kernel-doc code samples Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 04/24] seqlock: Reorder seqcount_t and seqlock_t API definitions Ahmed S. Darwish
                     ` (21 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc

The seqcount_t latch reader example at the raw_write_seqcount_latch()
kernel-doc comment ends the latch read section with a manual smp memory
barrier and sequence counter comparison.

This is technically correct, but it is suboptimal: read_seqcount_retry()
already contains the same logic of an smp memory barrier and sequence
counter comparison.

End the latch read critical section example with read_seqcount_retry().

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 6c4f68ef1393..d724b5e5408d 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -363,8 +363,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  *			idx = seq & 0x01;
  *			entry = data_query(latch->data[idx], ...);
  *
- *			smp_rmb();
- *		} while (seq != latch->seq);
+ *		// read_seqcount_retry() includes needed smp_rmb()
+ *		} while (read_seqcount_retry(&latch->seq, seq));
  *
  *		return entry;
  *	}
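
Substituting the seqlock.h definitions quoted earlier in this thread
(and ignoring the KCSAN instrumentation), the new loop condition
expands to roughly:

	/* read_seqcount_retry(&latch->seq, seq), expanded by hand: */
	smp_rmb();
	retry = unlikely(READ_ONCE(latch->seq.sequence) != seq);

i.e. the replacement keeps the required read barrier and additionally
marks the sequence re-load with READ_ONCE(), which the open-coded
"seq != latch->seq" comparison did not.
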
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 04/24] seqlock: Reorder seqcount_t and seqlock_t API definitions
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (2 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 03/24] seqlock: seqcount_t latch: End read sections with read_seqcount_retry() Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 05/24] seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs Ahmed S. Darwish
                     ` (20 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

The seqlock.h seqcount_t and seqlock_t API definitions are presented in
the chronological order of their development rather than the order that
makes most sense to readers. This makes it hard to follow and understand
the header file code.

Group and reorder all of the exported seqlock.h functions according to
their function.

First, group together the seqcount_t standard read path functions:

    - __read_seqcount_begin()
    - raw_read_seqcount_begin()
    - read_seqcount_begin()

since each function is implemented exactly in terms of the one above
it. Then, group the special-case seqcount_t readers on their own as:

    - raw_read_seqcount()
    - raw_seqcount_begin()

since the only difference between the two functions is that the second
one masks the sequence counter LSB while the first one does not. Note
that raw_seqcount_begin() can actually be implemented in terms of
raw_read_seqcount(), which will be done in a follow-up commit.

Then, group the seqcount_t write path functions, instead of injecting
unrelated seqcount_t latch functions between them, and order them as:

    - raw_write_seqcount_begin()
    - raw_write_seqcount_end()
    - write_seqcount_begin_nested()
    - write_seqcount_begin()
    - write_seqcount_end()
    - raw_write_seqcount_barrier()
    - write_seqcount_invalidate()

which is the expected natural order. This also isolates the seqcount_t
latch functions into their own area, at the end of the sequence counters
section, and before jumping to the next one: sequential locks
(seqlock_t).

Do a similar grouping and reordering for seqlock_t "locking" readers vs.
the "conditionally locking or lockless" ones.

No implementation code was changed in any of the reordering above.
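
To illustrate the layering the new order makes obvious, the standard
read path now reads top-down as (simplified; KCSAN annotations
omitted):

    static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
    {
            unsigned ret = __read_seqcount_begin(s);

            smp_rmb();
            return ret;
    }

    static inline unsigned read_seqcount_begin(const seqcount_t *s)
    {
            seqcount_lockdep_reader_access(s);
            return raw_read_seqcount_begin(s);
    }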

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 174 ++++++++++++++++++++--------------------
 1 file changed, 86 insertions(+), 88 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index d724b5e5408d..4c1456008d89 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -128,23 +128,6 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 	return ret;
 }
 
-/**
- * raw_read_seqcount - Read the raw seqcount
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
- *
- * raw_read_seqcount opens a read critical section of the given
- * seqcount without any lockdep checking and without checking or
- * masking the LSB. Calling code is responsible for handling that.
- */
-static inline unsigned raw_read_seqcount(const seqcount_t *s)
-{
-	unsigned ret = READ_ONCE(s->sequence);
-	smp_rmb();
-	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
-	return ret;
-}
-
 /**
  * raw_read_seqcount_begin - start seq-read critical section w/o lockdep
  * @s: pointer to seqcount_t
@@ -176,6 +159,23 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
 	return raw_read_seqcount_begin(s);
 }
 
+/**
+ * raw_read_seqcount - Read the raw seqcount
+ * @s: pointer to seqcount_t
+ * Returns: count to be passed to read_seqcount_retry
+ *
+ * raw_read_seqcount opens a read critical section of the given
+ * seqcount without any lockdep checking and without checking or
+ * masking the LSB. Calling code is responsible for handling that.
+ */
+static inline unsigned raw_read_seqcount(const seqcount_t *s)
+{
+	unsigned ret = READ_ONCE(s->sequence);
+	smp_rmb();
+	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
+	return ret;
+}
+
 /**
  * raw_seqcount_begin - begin a seq-read critical section
  * @s: pointer to seqcount_t
@@ -234,8 +234,6 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 	return __read_seqcount_retry(s, start);
 }
 
-
-
 static inline void raw_write_seqcount_begin(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
@@ -250,6 +248,23 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
+{
+	raw_write_seqcount_begin(s);
+	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
+}
+
+static inline void write_seqcount_begin(seqcount_t *s)
+{
+	write_seqcount_begin_nested(s, 0);
+}
+
+static inline void write_seqcount_end(seqcount_t *s)
+{
+	seqcount_release(&s->dep_map, _RET_IP_);
+	raw_write_seqcount_end(s);
+}
+
 /**
  * raw_write_seqcount_barrier - do a seq write barrier
  * @s: pointer to seqcount_t
@@ -300,6 +315,21 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+/**
+ * write_seqcount_invalidate - invalidate in-progress read-side seq operations
+ * @s: pointer to seqcount_t
+ *
+ * After write_seqcount_invalidate, no read-side seq operations will complete
+ * successfully and see data older than this.
+ */
+static inline void write_seqcount_invalidate(seqcount_t *s)
+{
+	smp_wmb();
+	kcsan_nestable_atomic_begin();
+	s->sequence+=2;
+	kcsan_nestable_atomic_end();
+}
+
 static inline int raw_read_seqcount_latch(seqcount_t *s)
 {
 	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
@@ -395,38 +425,6 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
        smp_wmb();      /* increment "sequence" before following stores */
 }
 
-static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
-{
-	raw_write_seqcount_begin(s);
-	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
-}
-
-static inline void write_seqcount_begin(seqcount_t *s)
-{
-	write_seqcount_begin_nested(s, 0);
-}
-
-static inline void write_seqcount_end(seqcount_t *s)
-{
-	seqcount_release(&s->dep_map, _RET_IP_);
-	raw_write_seqcount_end(s);
-}
-
-/**
- * write_seqcount_invalidate - invalidate in-progress read-side seq operations
- * @s: pointer to seqcount_t
- *
- * After write_seqcount_invalidate, no read-side seq operations will complete
- * successfully and see data older than this.
- */
-static inline void write_seqcount_invalidate(seqcount_t *s)
-{
-	smp_wmb();
-	kcsan_nestable_atomic_begin();
-	s->sequence+=2;
-	kcsan_nestable_atomic_end();
-}
-
 /*
  * Sequential locks (seqlock_t)
  *
@@ -555,6 +553,43 @@ static inline void read_sequnlock_excl(seqlock_t *sl)
 	spin_unlock(&sl->lock);
 }
 
+static inline void read_seqlock_excl_bh(seqlock_t *sl)
+{
+	spin_lock_bh(&sl->lock);
+}
+
+static inline void read_sequnlock_excl_bh(seqlock_t *sl)
+{
+	spin_unlock_bh(&sl->lock);
+}
+
+static inline void read_seqlock_excl_irq(seqlock_t *sl)
+{
+	spin_lock_irq(&sl->lock);
+}
+
+static inline void read_sequnlock_excl_irq(seqlock_t *sl)
+{
+	spin_unlock_irq(&sl->lock);
+}
+
+static inline unsigned long __read_seqlock_excl_irqsave(seqlock_t *sl)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sl->lock, flags);
+	return flags;
+}
+
+#define read_seqlock_excl_irqsave(lock, flags)				\
+	do { flags = __read_seqlock_excl_irqsave(lock); } while (0)
+
+static inline void
+read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
+{
+	spin_unlock_irqrestore(&sl->lock, flags);
+}
+
 /**
  * read_seqbegin_or_lock - begin a sequence number check or locking block
  * @lock: sequence lock
@@ -584,43 +619,6 @@ static inline void done_seqretry(seqlock_t *lock, int seq)
 		read_sequnlock_excl(lock);
 }
 
-static inline void read_seqlock_excl_bh(seqlock_t *sl)
-{
-	spin_lock_bh(&sl->lock);
-}
-
-static inline void read_sequnlock_excl_bh(seqlock_t *sl)
-{
-	spin_unlock_bh(&sl->lock);
-}
-
-static inline void read_seqlock_excl_irq(seqlock_t *sl)
-{
-	spin_lock_irq(&sl->lock);
-}
-
-static inline void read_sequnlock_excl_irq(seqlock_t *sl)
-{
-	spin_unlock_irq(&sl->lock);
-}
-
-static inline unsigned long __read_seqlock_excl_irqsave(seqlock_t *sl)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&sl->lock, flags);
-	return flags;
-}
-
-#define read_seqlock_excl_irqsave(lock, flags)				\
-	do { flags = __read_seqlock_excl_irqsave(lock); } while (0)
-
-static inline void
-read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
-{
-	spin_unlock_irqrestore(&sl->lock, flags);
-}
-
 static inline unsigned long
 read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 05/24] seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (3 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 04/24] seqlock: Reorder seqcount_t and seqlock_t API definitions Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 06/24] seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount() Ahmed S. Darwish
                     ` (19 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Jonathan Corbet,
	linux-doc

seqlock.h is now included by the kernel's RST documentation, but only a
small number of the exported seqlock.h functions are kernel-doc annotated.

Add kernel-doc for all seqlock.h exported APIs.
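
Each annotation follows the standard kernel-doc layout, for example (as
added further below):

    /**
     * read_seqcount_begin() - begin a seqcount_t read critical section
     * @s: Pointer to seqcount_t
     *
     * Return: count to be passed to read_seqcount_retry()
     */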

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 425 ++++++++++++++++++++++++++++++++--------
 1 file changed, 348 insertions(+), 77 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 4c1456008d89..85fb3ac93ffb 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -75,6 +75,10 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
 # define SEQCOUNT_DEP_MAP_INIT(lockname) \
 		.dep_map = { .name = #lockname } \
 
+/**
+ * seqcount_init() - runtime initializer for seqcount_t
+ * @s: Pointer to the seqcount_t instance
+ */
 # define seqcount_init(s)				\
 	do {						\
 		static struct lock_class_key __key;	\
@@ -98,13 +102,15 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
 # define seqcount_lockdep_reader_access(x)
 #endif
 
-#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)}
-
+/**
+ * SEQCNT_ZERO() - static initializer for seqcount_t
+ * @name: Name of the seqcount_t instance
+ */
+#define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
 
 /**
- * __read_seqcount_begin - begin a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
+ * @s: Pointer to seqcount_t
  *
  * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
  * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@@ -113,6 +119,8 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  *
  * Use carefully, only in critical code, and comment how the barrier is
  * provided.
+ *
+ * Return: count to be passed to read_seqcount_retry()
  */
 static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 {
@@ -129,13 +137,10 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount_begin - start seq-read critical section w/o lockdep
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * raw_read_seqcount_begin() - begin a seqcount_t read section w/o lockdep
+ * @s: Pointer to seqcount_t
  *
- * raw_read_seqcount_begin opens a read critical section of the given
- * seqcount, but without any lockdep checking. Validity of the critical
- * section is tested by checking read_seqcount_retry function.
+ * Return: count to be passed to read_seqcount_retry()
  */
 static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 {
@@ -145,13 +150,10 @@ static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * read_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * read_seqcount_begin() - begin a seqcount_t read critical section
+ * @s: Pointer to seqcount_t
  *
- * read_seqcount_begin opens a read critical section of the given seqcount.
- * Validity of the critical section is tested by checking read_seqcount_retry
- * function.
+ * Return: count to be passed to read_seqcount_retry()
  */
 static inline unsigned read_seqcount_begin(const seqcount_t *s)
 {
@@ -160,13 +162,15 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount - Read the raw seqcount
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * raw_read_seqcount() - read the raw seqcount_t counter value
+ * @s: Pointer to seqcount_t
  *
  * raw_read_seqcount opens a read critical section of the given
- * seqcount without any lockdep checking and without checking or
- * masking the LSB. Calling code is responsible for handling that.
+ * seqcount_t, without any lockdep checking, and without checking or
+ * masking the sequence counter LSB. Calling code is responsible for
+ * handling that.
+ *
+ * Return: count to be passed to read_seqcount_retry()
  */
 static inline unsigned raw_read_seqcount(const seqcount_t *s)
 {
@@ -177,18 +181,21 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
 }
 
 /**
- * raw_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * raw_seqcount_begin() - begin a seqcount_t read critical section w/o
+ *                        lockdep and w/o counter stabilization
+ * @s: Pointer to seqcount_t
  *
- * raw_seqcount_begin opens a read critical section of the given seqcount.
- * Validity of the critical section is tested by checking read_seqcount_retry
- * function.
+ * raw_seqcount_begin opens a read critical section of the given
+ * seqcount_t. Unlike read_seqcount_begin(), this function will not wait
+ * for the count to stabilize. If a writer is active when it begins, it
+ * will fail the read_seqcount_retry() at the end of the read critical
+ * section instead of stabilizing at the beginning of it.
  *
- * Unlike read_seqcount_begin(), this function will not wait for the count
- * to stabilize. If a writer is active when we begin, we will fail the
- * read_seqcount_retry() instead of stabilizing at the beginning of the
- * critical section.
+ * Use this only in special kernel hot paths where the read section is
+ * small and has a high probability of success through other external
+ * means. It will save a single branching instruction.
+ *
+ * Return: count to be passed to read_seqcount_retry()
  */
 static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 {
@@ -199,10 +206,9 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * __read_seqcount_retry - end a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
- * @start: count, from read_seqcount_begin
- * Returns: 1 if retry is required, else 0
+ * __read_seqcount_retry() - end a seqcount_t read section w/o barrier
+ * @s: Pointer to seqcount_t
+ * @start: count, from read_seqcount_begin()
  *
  * __read_seqcount_retry is like read_seqcount_retry, but has no smp_rmb()
  * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@@ -211,6 +217,8 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
  *
  * Use carefully, only in critical code, and comment how the barrier is
  * provided.
+ *
+ * Return: true if a read section retry is required, else false
  */
 static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
@@ -219,14 +227,15 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
 }
 
 /**
- * read_seqcount_retry - end a seq-read critical section
- * @s: pointer to seqcount_t
- * @start: count, from read_seqcount_begin
- * Returns: 1 if retry is required, else 0
+ * read_seqcount_retry() - end a seqcount_t read critical section
+ * @s: Pointer to seqcount_t
+ * @start: count, from read_seqcount_begin()
  *
- * read_seqcount_retry closes a read critical section of the given seqcount.
- * If the critical section was invalid, it must be ignored (and typically
- * retried).
+ * read_seqcount_retry closes the read critical section of given
+ * seqcount_t.  If the critical section was invalid, it must be ignored
+ * (and typically retried).
+ *
+ * Return: true if a read section retry is required, else false
  */
 static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
@@ -234,6 +243,10 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 	return __read_seqcount_retry(s, start);
 }
 
+/**
+ * raw_write_seqcount_begin() - start a seqcount_t write section w/o lockdep
+ * @s: Pointer to seqcount_t
+ */
 static inline void raw_write_seqcount_begin(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
@@ -241,6 +254,10 @@ static inline void raw_write_seqcount_begin(seqcount_t *s)
 	smp_wmb();
 }
 
+/**
+ * raw_write_seqcount_end() - end a seqcount_t write section w/o lockdep
+ * @s: Pointer to seqcount_t
+ */
 static inline void raw_write_seqcount_end(seqcount_t *s)
 {
 	smp_wmb();
@@ -248,17 +265,42 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+/**
+ * write_seqcount_begin_nested() - start a seqcount_t write section with
+ *                                 custom lockdep nesting level
+ * @s: Pointer to seqcount_t
+ * @subclass: lockdep nesting level
+ *
+ * See Documentation/locking/lockdep-design.rst
+ */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
 	raw_write_seqcount_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
+/**
+ * write_seqcount_begin() - start a seqcount_t write side critical section
+ * @s: Pointer to seqcount_t
+ *
+ * write_seqcount_begin opens a write side critical section of the given
+ * seqcount_t.
+ *
+ * Context: seqcount_t write side critical sections must be serialized and
+ * non-preemptible. If readers can be invoked from hardirq or softirq
+ * context, interrupts or bottom halves must be respectively disabled.
+ */
 static inline void write_seqcount_begin(seqcount_t *s)
 {
 	write_seqcount_begin_nested(s, 0);
 }
 
+/**
+ * write_seqcount_end() - end a seqcount_t write side critical section
+ * @s: Pointer to seqcount_t
+ *
+ * The write section must've been opened with write_seqcount_begin().
+ */
 static inline void write_seqcount_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
@@ -266,12 +308,12 @@ static inline void write_seqcount_end(seqcount_t *s)
 }
 
 /**
- * raw_write_seqcount_barrier - do a seq write barrier
- * @s: pointer to seqcount_t
+ * raw_write_seqcount_barrier() - do a seqcount_t write barrier
+ * @s: Pointer to seqcount_t
  *
- * This can be used to provide an ordering guarantee instead of the
- * usual consistency guarantee. It is one wmb cheaper, because we can
- * collapse the two back-to-back wmb()s.
+ * This can be used to provide an ordering guarantee instead of the usual
+ * consistency guarantee. It is one wmb cheaper, because it can collapse
+ * the two back-to-back wmb()s.
  *
  * Note that writes surrounding the barrier should be declared atomic (e.g.
  * via WRITE_ONCE): a) to ensure the writes become visible to other threads
@@ -316,11 +358,12 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
 }
 
 /**
- * write_seqcount_invalidate - invalidate in-progress read-side seq operations
- * @s: pointer to seqcount_t
+ * write_seqcount_invalidate() - invalidate in-progress seqcount_t read
+ *                               side operations
+ * @s: Pointer to seqcount_t
  *
- * After write_seqcount_invalidate, no read-side seq operations will complete
- * successfully and see data older than this.
+ * After write_seqcount_invalidate, no seqcount_t read side operations
+ * will complete successfully and see data older than this.
  */
 static inline void write_seqcount_invalidate(seqcount_t *s)
 {
@@ -330,6 +373,21 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+/**
+ * raw_read_seqcount_latch() - pick even/odd seqcount_t latch data copy
+ * @s: Pointer to seqcount_t
+ *
+ * Use seqcount_t latching to switch between two storage places protected
+ * by a sequence counter. Doing so allows having interruptible, preemptible,
+ * seqcount_t write side critical sections.
+ *
+ * Check raw_write_seqcount_latch() for more details and a full reader and
+ * writer usage example.
+ *
+ * Return: sequence counter raw value. Use the lowest bit as an index for
+ * picking which data copy to read. The full counter value must then be
+ * checked with read_seqcount_retry().
+ */
 static inline int raw_read_seqcount_latch(seqcount_t *s)
 {
 	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
@@ -338,8 +396,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
 }
 
 /**
- * raw_write_seqcount_latch - redirect readers to even/odd copy
- * @s: pointer to seqcount_t
+ * raw_write_seqcount_latch() - redirect readers to even/odd copy
+ * @s: Pointer to seqcount_t
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -446,17 +504,28 @@ typedef struct {
 		.lock =	__SPIN_LOCK_UNLOCKED(lockname)	\
 	}
 
-#define seqlock_init(x)					\
+/**
+ * seqlock_init() - dynamic initializer for seqlock_t
+ * @sl: Pointer to the seqlock_t instance
+ */
+#define seqlock_init(sl)				\
 	do {						\
-		seqcount_init(&(x)->seqcount);		\
-		spin_lock_init(&(x)->lock);		\
+		seqcount_init(&(sl)->seqcount);		\
+		spin_lock_init(&(sl)->lock);		\
 	} while (0)
 
-#define DEFINE_SEQLOCK(x) \
-		seqlock_t x = __SEQLOCK_UNLOCKED(x)
+/**
+ * DEFINE_SEQLOCK() - Define a statically allocated seqlock_t
+ * @sl: Name of the seqlock_t instance
+ */
+#define DEFINE_SEQLOCK(sl) \
+		seqlock_t sl = __SEQLOCK_UNLOCKED(sl)
 
-/*
- * Read side functions for starting and finalizing a read side section.
+/**
+ * read_seqbegin() - start a seqlock_t read side critical section
+ * @sl: Pointer to seqlock_t
+ *
+ * Return: count, to be passed to read_seqretry()
  */
 static inline unsigned read_seqbegin(const seqlock_t *sl)
 {
@@ -467,6 +536,17 @@ static inline unsigned read_seqbegin(const seqlock_t *sl)
 	return ret;
 }
 
+/**
+ * read_seqretry() - end a seqlock_t read side section
+ * @sl: Pointer to seqlock_t
+ * @start: count, from read_seqbegin()
+ *
+ * read_seqretry closes the read side critical section of given seqlock_t.
+ * If the critical section was invalid, it must be ignored (and typically
+ * retried).
+ *
+ * Return: true if a read section retry is required, else false
+ */
 static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 {
 	/*
@@ -478,10 +558,18 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 	return read_seqcount_retry(&sl->seqcount, start);
 }
 
-/*
- * Lock out other writers and update the count.
- * Acts like a normal spin_lock/unlock.
- * Don't need preempt_disable() because that is in the spin_lock already.
+/**
+ * write_seqlock() - start a seqlock_t write side critical section
+ * @sl: Pointer to seqlock_t
+ *
+ * write_seqlock opens a write side critical section for the given
+ * seqlock_t.  It also implicitly acquires the spinlock_t embedded inside
+ * that sequential lock. All seqlock_t write side sections are thus
+ * automatically serialized and non-preemptible.
+ *
+ * Context: if the seqlock_t read section, or other write side critical
+ * sections, can be invoked from hardirq or softirq contexts, use the
+ * _irqsave or _bh variants of this function instead.
  */
 static inline void write_seqlock(seqlock_t *sl)
 {
@@ -489,30 +577,66 @@ static inline void write_seqlock(seqlock_t *sl)
 	write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock() - end a seqlock_t write side critical section
+ * @sl: Pointer to seqlock_t
+ *
+ * write_sequnlock closes the (serialized and non-preemptible) write side
+ * critical section of given seqlock_t.
+ */
 static inline void write_sequnlock(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
 	spin_unlock(&sl->lock);
 }
 
+/**
+ * write_seqlock_bh() - start a softirqs-disabled seqlock_t write section
+ * @sl: Pointer to seqlock_t
+ *
+ * _bh variant of write_seqlock(). Use only if the read side section, or
+ * other write side sections, can be invoked from softirq contexts.
+ */
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
 	write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock_bh() - end a softirqs-disabled seqlock_t write section
+ * @sl: Pointer to seqlock_t
+ *
+ * write_sequnlock_bh closes the serialized, non-preemptible, and
+ * softirqs-disabled, seqlock_t write side critical section opened with
+ * write_seqlock_bh().
+ */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
 	spin_unlock_bh(&sl->lock);
 }
 
+/**
+ * write_seqlock_irq() - start a non-interruptible seqlock_t write section
+ * @sl: Pointer to seqlock_t
+ *
+ * _irq variant of write_seqlock(). Use only if the read side section, or
+ * other write sections, can be invoked from hardirq contexts.
+ */
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
 	write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock_irq() - end a non-interruptible seqlock_t write section
+ * @sl: Pointer to seqlock_t
+ *
+ * write_sequnlock_irq closes the serialized and non-interruptible
+ * seqlock_t write side section opened with write_seqlock_irq().
+ */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
@@ -528,9 +652,28 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	return flags;
 }
 
+/**
+ * write_seqlock_irqsave() - start a non-interruptible seqlock_t write
+ *                           section
+ * @lock:  Pointer to seqlock_t
+ * @flags: Stack-allocated storage for saving caller's local interrupt
+ *         state, to be passed to write_sequnlock_irqrestore().
+ *
+ * _irqsave variant of write_seqlock(). Use it only if the read side
+ * section, or other write sections, can be invoked from hardirq context.
+ */
 #define write_seqlock_irqsave(lock, flags)				\
 	do { flags = __write_seqlock_irqsave(lock); } while (0)
 
+/**
+ * write_sequnlock_irqrestore() - end non-interruptible seqlock_t write
+ *                                section
+ * @sl:    Pointer to seqlock_t
+ * @flags: Caller's saved interrupt state, from write_seqlock_irqsave()
+ *
+ * write_sequnlock_irqrestore closes the serialized and non-interruptible
+ * seqlock_t write section previously opened with write_seqlock_irqsave().
+ */
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
@@ -538,36 +681,79 @@ write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
-/*
- * A locking reader exclusively locks out other writers and locking readers,
- * but doesn't update the sequence number. Acts like a normal spin_lock/unlock.
- * Don't need preempt_disable() because that is in the spin_lock already.
+/**
+ * read_seqlock_excl() - begin a seqlock_t locking reader section
+ * @sl: Pointer to seqlock_t
+ *
+ * read_seqlock_excl opens a seqlock_t locking reader critical section.  A
+ * locking reader exclusively locks out *both* other writers *and* other
+ * locking readers, but it does not update the embedded sequence number.
+ *
+ * Locking readers act like a normal spin_lock()/spin_unlock().
+ *
+ * Context: if the seqlock_t write section, *or other read sections*, can
+ * be invoked from hardirq or softirq contexts, use the _irqsave or _bh
+ * variant of this function instead.
+ *
+ * The opened read section must be closed with read_sequnlock_excl().
  */
 static inline void read_seqlock_excl(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl() - end a seqlock_t locking reader critical section
+ * @sl: Pointer to seqlock_t
+ */
 static inline void read_sequnlock_excl(seqlock_t *sl)
 {
 	spin_unlock(&sl->lock);
 }
 
+/**
+ * read_seqlock_excl_bh() - start a seqlock_t locking reader section with
+ *			    softirqs disabled
+ * @sl: Pointer to seqlock_t
+ *
+ * _bh variant of read_seqlock_excl(). Use this variant only if the
+ * seqlock_t write side section, *or other read sections*, can be invoked
+ * from softirq contexts.
+ */
 static inline void read_seqlock_excl_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl_bh() - stop a seqlock_t softirq-disabled locking
+ *			      reader section
+ * @sl: Pointer to seqlock_t
+ */
 static inline void read_sequnlock_excl_bh(seqlock_t *sl)
 {
 	spin_unlock_bh(&sl->lock);
 }
 
+/**
+ * read_seqlock_excl_irq() - start a non-interruptible seqlock_t locking
+ *			     reader section
+ * @sl: Pointer to seqlock_t
+ *
+ * _irq variant of read_seqlock_excl(). Use this only if the seqlock_t
+ * write side section, *or other read sections*, can be invoked from a
+ * hardirq context.
+ */
 static inline void read_seqlock_excl_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl_irq() - end an interrupts-disabled seqlock_t
+ *                             locking reader section
+ * @sl: Pointer to seqlock_t
+ */
 static inline void read_sequnlock_excl_irq(seqlock_t *sl)
 {
 	spin_unlock_irq(&sl->lock);
@@ -581,9 +767,26 @@ static inline unsigned long __read_seqlock_excl_irqsave(seqlock_t *sl)
 	return flags;
 }
 
+/**
+ * read_seqlock_excl_irqsave() - start a non-interruptible seqlock_t
+ *				 locking reader section
+ * @lock:  Pointer to seqlock_t
+ * @flags: Stack-allocated storage for saving caller's local interrupt
+ *         state, to be passed to read_sequnlock_excl_irqrestore().
+ *
+ * _irqsave variant of read_seqlock_excl(). Use this only if the seqlock_t
+ * write side section, *or other read sections*, can be invoked from a
+ * hardirq context.
+ */
 #define read_seqlock_excl_irqsave(lock, flags)				\
 	do { flags = __read_seqlock_excl_irqsave(lock); } while (0)
 
+/**
+ * read_sequnlock_excl_irqrestore() - end non-interruptible seqlock_t
+ *				      locking reader section
+ * @sl:    Pointer to seqlock_t
+ * @flags: Caller saved interrupt state, from read_seqlock_excl_irqsave()
+ */
 static inline void
 read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
 {
@@ -591,14 +794,35 @@ read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
 }
 
 /**
- * read_seqbegin_or_lock - begin a sequence number check or locking block
- * @lock: sequence lock
- * @seq : sequence number to be checked
- *
- * First try it once optimistically without taking the lock. If that fails,
- * take the lock. The sequence number is also used as a marker for deciding
- * whether to be a reader (even) or writer (odd).
- * N.B. seq must be initialized to an even number to begin with.
+ * read_seqbegin_or_lock() - begin a seqlock_t lockless or locking reader
+ * @lock: Pointer to seqlock_t
+ * @seq : Marker and return parameter. If the passed value is even, the
+ * reader will become a *lockless* seqlock_t reader as in read_seqbegin().
+ * If the passed value is odd, the reader will become a *locking* reader
+ * as in read_seqlock_excl().  In the first call to this function, the
+ * caller *must* initialize and pass an even value to @seq; this way, a
+ * lockless read can be optimistically tried first.
+ *
+ * read_seqbegin_or_lock is an API designed to optimistically try a normal
+ * lockless seqlock_t read section first.  If an odd counter is found, the
+ * lockless read trial has failed, and the next read iteration transforms
+ * itself into a full seqlock_t locking reader.
+ *
+ * This is typically used to avoid starving seqlock_t lockless readers
+ * (too many retry loops) in the case of a sharp spike in write side
+ * activity.
+ *
+ * Context: if the seqlock_t write section, *or other read sections*, can
+ * be invoked from hardirq or softirq contexts, use the _irqsave or _bh
+ * variant of this function instead.
+ *
+ * Check Documentation/locking/seqlock.rst for template example code.
+ *
+ * Return: the encountered sequence counter value, through the @seq
+ * parameter, which is overloaded as a return parameter. This returned
+ * value must be checked with need_seqretry(). If the read section needs to
+ * be retried, this returned value must also be passed as the @seq
+ * parameter of the next read_seqbegin_or_lock() iteration.
  */
 static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 {
@@ -608,17 +832,52 @@ static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 		read_seqlock_excl(lock);
 }
 
+/**
+ * need_seqretry() - validate seqlock_t "locking or lockless" read section
+ * @lock: Pointer to seqlock_t
+ * @seq: sequence count, from read_seqbegin_or_lock()
+ *
+ * Return: true if a read section retry is required, false otherwise
+ */
 static inline int need_seqretry(seqlock_t *lock, int seq)
 {
 	return !(seq & 1) && read_seqretry(lock, seq);
 }
 
+/**
+ * done_seqretry() - end seqlock_t "locking or lockless" reader section
+ * @lock: Pointer to seqlock_t
+ * @seq: count, from read_seqbegin_or_lock()
+ *
+ * done_seqretry finishes the seqlock_t read side critical section started
+ * with read_seqbegin_or_lock() and validated by need_seqretry().
+ */
 static inline void done_seqretry(seqlock_t *lock, int seq)
 {
 	if (seq & 1)
 		read_sequnlock_excl(lock);
 }
 
+/**
+ * read_seqbegin_or_lock_irqsave() - begin a seqlock_t lockless reader, or
+ *                                   a non-interruptible locking reader
+ * @lock: Pointer to seqlock_t
+ * @seq:  Marker and return parameter. Check read_seqbegin_or_lock().
+ *
+ * This is the _irqsave variant of read_seqbegin_or_lock(). Use it only if
+ * the seqlock_t write section, *or other read sections*, can be invoked
+ * from hardirq context.
+ *
+ * Note: Interrupts will be disabled only for "locking reader" mode.
+ *
+ * Return:
+ *
+ *   1. The saved local interrupts state in case of a locking reader, to
+ *      be passed to done_seqretry_irqrestore().
+ *
+ *   2. The encountered sequence counter value, returned through @seq
+ *      overloaded as a return parameter. Check read_seqbegin_or_lock().
+ */
 static inline unsigned long
 read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 {
@@ -632,6 +891,18 @@ read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 	return flags;
 }
 
+/**
+ * done_seqretry_irqrestore() - end a seqlock_t lockless reader, or a
+ *				non-interruptible locking reader section
+ * @lock:  Pointer to seqlock_t
+ * @seq:   Count, from read_seqbegin_or_lock_irqsave()
+ * @flags: Caller's saved local interrupt state in case of a locking
+ *	   reader, also from read_seqbegin_or_lock_irqsave()
+ *
+ * This is the _irqrestore variant of done_seqretry(). The read section
+ * must've been opened with read_seqbegin_or_lock_irqsave(), and validated
+ * by need_seqretry().
+ */
 static inline void
 done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 06/24] seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount()
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (4 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 05/24] seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 07/24] lockdep: Add preemption enabled/disabled assertion APIs Ahmed S. Darwish
                     ` (18 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

raw_seqcount_begin() has the same code as raw_read_seqcount(), with the
exception of masking the sequence counter's LSB before returning it to
the caller.

Note, the masking exists so that, if a writer is active (the counter is
odd), the closing read_seqcount_retry() will fail and force a retry --
without the overhead of an extra branching instruction at the start of
the read section.
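
A worked illustration (counter values are hypothetical):

    /*
     * Writer active: s->sequence == 5 (odd)
     *
     * seq = raw_seqcount_begin(s);     => returns 5 & ~1 == 4
     * ... read the protected data ...
     * read_seqcount_retry(s, seq);     => live counter is >= 5, never
     *                                     equal to 4, so the read
     *                                     section is retried
     */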

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 85fb3ac93ffb..e885702d8b82 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -199,10 +199,11 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
  */
 static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 {
-	unsigned ret = READ_ONCE(s->sequence);
-	smp_rmb();
-	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
-	return ret & ~1;
+	/*
+	 * If the counter is odd, let read_seqcount_retry() fail
+	 * by decrementing the counter.
+	 */
+	return raw_read_seqcount(s) & ~1;
 }
 
 /**
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 07/24] lockdep: Add preemption enabled/disabled assertion APIs
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (5 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 06/24] seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount() Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write Ahmed S. Darwish
                     ` (17 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

Asserting that preemption is enabled or disabled is a critical sanity
check.  Developers are usually reluctant to add such a check in a
fastpath as reading the preemption count can be costly.

Extend the lockdep API with macros asserting that preemption is disabled
or enabled. If lockdep is disabled, or if the underlying architecture
does not support kernel preemption, these asserts have no runtime overhead.
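
A minimal usage sketch (the function and the per-CPU variable are
hypothetical):

    static DEFINE_PER_CPU(unsigned long, frob_count);

    static void frob_account(void)
    {
            /* No-op when lockdep is disabled */
            lockdep_assert_preemption_disabled();

            __this_cpu_inc(frob_count);
    }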

References: f54bb2ec02c8 ("locking/lockdep: Add IRQs disabled/enabled assertion APIs: ...")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/lockdep.h | 19 +++++++++++++++++++
 lib/Kconfig.debug       |  1 +
 2 files changed, 20 insertions(+)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 7aafba0ddcf9..39a35699d0d6 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -549,6 +549,22 @@ do {									\
 	WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirq_context));	\
 } while (0)
 
+#define lockdep_assert_preemption_enabled()				\
+do {									\
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)	&&		\
+		     debug_locks			&&		\
+		     (preempt_count() != 0		||		\
+		      !this_cpu_read(hardirqs_enabled)));		\
+} while (0)
+
+#define lockdep_assert_preemption_disabled()				\
+do {									\
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)	&&		\
+		     debug_locks			&&		\
+		     (preempt_count() == 0		&&		\
+		      this_cpu_read(hardirqs_enabled)));		\
+} while (0)
+
 #else
 # define might_lock(lock) do { } while (0)
 # define might_lock_read(lock) do { } while (0)
@@ -557,6 +573,9 @@ do {									\
 # define lockdep_assert_irqs_enabled() do { } while (0)
 # define lockdep_assert_irqs_disabled() do { } while (0)
 # define lockdep_assert_in_irq() do { } while (0)
+
+# define lockdep_assert_preemption_enabled() do { } while (0)
+# define lockdep_assert_preemption_disabled() do { } while (0)
 #endif
 
 #ifdef CONFIG_PROVE_RAW_LOCK_NESTING
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 9ad9210d70a1..5379931ba3b5 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1117,6 +1117,7 @@ config PROVE_LOCKING
 	select DEBUG_RWSEMS
 	select DEBUG_WW_MUTEX_SLOWPATH
 	select DEBUG_LOCK_ALLOC
+	select PREEMPT_COUNT if !ARCH_NO_PREEMPT
 	select TRACE_IRQFLAGS
 	default n
 	help
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (6 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 07/24] lockdep: Add preemption enabled/disabled assertion APIs Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-08-08 23:21     ` [PATCH v4 08/24] " Guenter Roeck
  2020-07-20 15:55   ` [PATCH v4 09/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (16 subsequent siblings)
  24 siblings, 2 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

Preemption must be disabled before entering a sequence count write side
critical section.  Otherwise the read side section can preempt the write
side section and spin for the entire scheduler tick.  If that reader
belongs to a real-time scheduling class, it can spin forever and the
kernel will livelock.

Assert through lockdep that preemption is disabled for seqcount writers.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index e885702d8b82..54bc20496392 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -266,6 +266,12 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
+{
+	raw_write_seqcount_begin(s);
+	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
+}
+
 /**
  * write_seqcount_begin_nested() - start a seqcount_t write section with
  *                                 custom lockdep nesting level
@@ -276,8 +282,19 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
-	raw_write_seqcount_begin(s);
-	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
+	lockdep_assert_preemption_disabled();
+	__write_seqcount_begin_nested(s, subclass);
+}
+
+/*
+ * A write_seqcount_begin() variant w/o lockdep non-preemptibility checks.
+ *
+ * Use for internal seqlock.h code where it's known that preemption is
+ * already disabled. For example, seqlock_t write side functions.
+ */
+static inline void __write_seqcount_begin(seqcount_t *s)
+{
+	__write_seqcount_begin_nested(s, 0);
 }
 
 /**
@@ -575,7 +592,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -601,7 +618,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -628,7 +645,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -649,7 +666,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 	return flags;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 09/24] seqlock: Extend seqcount API with associated locks
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (7 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 10/24] seqlock: Align multi-line macros newline escapes at 72 columns Ahmed S. Darwish
                     ` (15 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive is
not disabling preemption implicitly, preemption has to be explicitly
disabled before entering the write side critical section.

There is no built-in debugging mechanism to verify that the lock used
for writer serialization is held and preemption is disabled. Some usage
sites like dma-buf have explicit lockdep checks for the writer-side
lock, but this covers only a small portion of the sequence counter usage
in the kernel.

Add new sequence counter types which allow associating a lock with the
sequence counter at initialization time. The seqcount API functions are
extended to provide appropriate lockdep assertions depending on the
seqcount/lock type.

For sequence counters with associated locks that do not implicitly
disable preemption, preemption protection is enforced in the sequence
counter write side functions. This removes the need to explicitly add
preempt_disable/enable() around the write side critical sections: the
write_begin/end() functions for these new sequence counter types
automatically do this.

Introduce the following seqcount types with associated locks:

     seqcount_spinlock_t
     seqcount_raw_spinlock_t
     seqcount_rwlock_t
     seqcount_mutex_t
     seqcount_ww_mutex_t

Extend the seqcount read and write functions to branch out to the
specific seqcount_LOCKTYPE_t implementation at compile time. This avoids
kernel API explosion for each new seqcount_LOCKTYPE_t added. Add such
compile-time type detection logic into a new, internal seqlock header.
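
Conceptually, the dispatch is a _Generic selection keyed on the
seqcount type (a simplified two-type sketch of the macros added below):

    #define __seqprop(s, prop) _Generic(*(s),				\
	seqcount_t:	     __seqcount_##prop((void *)(s)),		\
	seqcount_spinlock_t: __seqcount_spinlock_##prop((void *)(s)))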

Document the proper seqcount_LOCKTYPE_t usage, and rationale, at
Documentation/locking/seqlock.rst.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
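
A hedged usage sketch (struct and field names hypothetical):

    struct foo {
            spinlock_t              lock;
            seqcount_spinlock_t     seq;    /* protects stamp */
            u64                     stamp;
    };

    /* Initialization: seqcount_spinlock_init(&f->seq, &f->lock); */

    /* Write side: lockdep now verifies that f->lock is held */
    spin_lock(&f->lock);
    write_seqcount_begin(&f->seq);
    f->stamp = new_stamp;
    write_seqcount_end(&f->seq);
    spin_unlock(&f->lock);

    /* Read side: unchanged from plain seqcount_t */
    do {
            seq = read_seqcount_begin(&f->seq);
            stamp = f->stamp;
    } while (read_seqcount_retry(&f->seq, seq));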

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 Documentation/locking/seqlock.rst |  52 ++++
 include/linux/seqlock.h           | 462 +++++++++++++++++++++++++-----
 2 files changed, 446 insertions(+), 68 deletions(-)

diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
index 366dd368d90a..62c5ad98c11c 100644
--- a/Documentation/locking/seqlock.rst
+++ b/Documentation/locking/seqlock.rst
@@ -87,6 +87,58 @@ Read path::
 	} while (read_seqcount_retry(&foo_seqcount, seq));
 
 
+.. _seqcount_locktype_t:
+
+Sequence counters with associated locks (``seqcount_LOCKTYPE_t``)
+-----------------------------------------------------------------
+
+As discussed at :ref:`seqcount_t`, sequence count write side critical
+sections must be serialized and non-preemptible. This variant of
+sequence counters associates the lock used for writer serialization at
+initialization time, which enables lockdep to validate that the write
+side critical sections are properly serialized.
+
+This lock association is a NOOP if lockdep is disabled and has neither
+storage nor runtime overhead. If lockdep is enabled, the lock pointer is
+stored in struct seqcount and lockdep's "lock is held" assertions are
+injected at the beginning of the write side critical section to validate
+that it is properly protected.
+
+For lock types which do not implicitly disable preemption, preemption
+protection is enforced in the write side function.
+
+The following sequence counters with associated locks are defined:
+
+  - ``seqcount_spinlock_t``
+  - ``seqcount_raw_spinlock_t``
+  - ``seqcount_rwlock_t``
+  - ``seqcount_mutex_t``
+  - ``seqcount_ww_mutex_t``
+
+The plain seqcount read and write APIs branch out to the specific
+seqcount_LOCKTYPE_t implementation at compile time. This avoids kernel
+API explosion for each new seqcount LOCKTYPE.
+
+Initialization (replace "LOCKTYPE" with one of the supported locks)::
+
+	/* dynamic */
+	seqcount_LOCKTYPE_t foo_seqcount;
+	seqcount_LOCKTYPE_init(&foo_seqcount, &lock);
+
+	/* static */
+	static seqcount_LOCKTYPE_t foo_seqcount =
+		SEQCNT_LOCKTYPE_ZERO(foo_seqcount, &lock);
+
+	/* C99 struct init */
+	struct {
+		.seq   = SEQCNT_LOCKTYPE_ZERO(foo.seq, &lock),
+	} foo;
+
+Write path: same as in :ref:`seqcount_t`, while running from a context
+with the associated LOCKTYPE lock acquired.
+
+Read path: same as in :ref:`seqcount_t`.
+
 .. _seqlock_t:
 
 Sequential locks (``seqlock_t``)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 54bc20496392..8c16a494c968 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -10,13 +10,17 @@
  *
  * Copyrights:
  * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
+ * - Sequence counters with associated locks, (C) 2020 Linutronix GmbH
  */
 
-#include <linux/spinlock.h>
-#include <linux/preempt.h>
-#include <linux/lockdep.h>
 #include <linux/compiler.h>
 #include <linux/kcsan-checks.h>
+#include <linux/lockdep.h>
+#include <linux/mutex.h>
+#include <linux/preempt.h>
+#include <linux/spinlock.h>
+#include <linux/ww_mutex.h>
+
 #include <asm/processor.h>
 
 /*
@@ -48,6 +52,10 @@
  * This mechanism can't be used if the protected data contains pointers,
  * as the writer can invalidate a pointer that a reader is following.
  *
+ * If the write serialization mechanism is one of the common kernel
+ * locking primitives, use a sequence counter with associated lock
+ * (seqcount_LOCKTYPE_t) instead.
+ *
  * If it's desired to automatically handle the sequence counter writer
  * serialization and non-preemptibility requirements, use a sequential
  * lock (seqlock_t) instead.
@@ -108,9 +116,267 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  */
 #define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
 
+/*
+ * Sequence counters with associated locks (seqcount_LOCKTYPE_t)
+ *
+ * A sequence counter which associates the lock used for writer
+ * serialization at initialization time. This enables lockdep to validate
+ * that the write side critical section is properly serialized.
+ *
+ * For associated locks which do not implicitly disable preemption,
+ * preemption protection is enforced in the write side function.
+ *
+ * Lockdep is never used in any of the raw write variants.
+ *
+ * See Documentation/locking/seqlock.rst
+ */
+
+#ifdef CONFIG_LOCKDEP
+#define __SEQ_LOCKDEP(expr)	expr
+#else
+#define __SEQ_LOCKDEP(expr)
+#endif
+
+#define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) {			\
+	.seqcount		= SEQCNT_ZERO(seq_name.seqcount),	\
+	__SEQ_LOCKDEP(.lock	= (assoc_lock))				\
+}
+
+#define seqcount_locktype_init(s, assoc_lock)				\
+do {									\
+	seqcount_init(&(s)->seqcount);					\
+	__SEQ_LOCKDEP((s)->lock = (assoc_lock));			\
+} while (0)
+
+/**
+ * typedef seqcount_spinlock_t - sequence counter with spinlock associated
+ * @seqcount:	The real sequence counter
+ * @lock:	Pointer to the associated spinlock
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * spinlock. The spinlock is associated to the sequence count in the
+ * static initializer or init function. This enables lockdep to validate
+ * that the write side critical section is properly serialized.
+ */
+typedef struct seqcount_spinlock {
+	seqcount_t	seqcount;
+	__SEQ_LOCKDEP(spinlock_t	*lock);
+} seqcount_spinlock_t;
+
+/**
+ * SEQCNT_SPINLOCK_ZERO - static initializer for seqcount_spinlock_t
+ * @name:	Name of the seqcount_spinlock_t instance
+ * @lock:	Pointer to the associated spinlock
+ */
+#define SEQCNT_SPINLOCK_ZERO(name, lock)				\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_spinlock_init - runtime initializer for seqcount_spinlock_t
+ * @s:		Pointer to the seqcount_spinlock_t instance
+ * @lock:	Pointer to the associated spinlock
+ */
+#define seqcount_spinlock_init(s, lock)					\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_raw_spinlock_t - sequence count with raw spinlock associated
+ * @seqcount:	The real sequence counter
+ * @lock:	Pointer to the associated raw spinlock
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * raw spinlock. The raw spinlock is associated to the sequence count in
+ * the static initializer or init function. This enables lockdep to
+ * validate that the write side critical section is properly serialized.
+ */
+typedef struct seqcount_raw_spinlock {
+	seqcount_t      seqcount;
+	__SEQ_LOCKDEP(raw_spinlock_t	*lock);
+} seqcount_raw_spinlock_t;
+
+/**
+ * SEQCNT_RAW_SPINLOCK_ZERO - static initializer for seqcount_raw_spinlock_t
+ * @name:	Name of the seqcount_raw_spinlock_t instance
+ * @lock:	Pointer to the associated raw_spinlock
+ */
+#define SEQCNT_RAW_SPINLOCK_ZERO(name, lock)				\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_raw_spinlock_init - runtime initializer for seqcount_raw_spinlock_t
+ * @s:		Pointer to the seqcount_raw_spinlock_t instance
+ * @lock:	Pointer to the associated raw_spinlock
+ */
+#define seqcount_raw_spinlock_init(s, lock)				\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_rwlock_t - sequence count with rwlock associated
+ * @seqcount:	The real sequence counter
+ * @lock:	Pointer to the associated rwlock
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * rwlock. The rwlock is associated to the sequence count in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ */
+typedef struct seqcount_rwlock {
+	seqcount_t      seqcount;
+	__SEQ_LOCKDEP(rwlock_t		*lock);
+} seqcount_rwlock_t;
+
+/**
+ * SEQCNT_RWLOCK_ZERO - static initializer for seqcount_rwlock_t
+ * @name:	Name of the seqcount_rwlock_t instance
+ * @lock:	Pointer to the associated rwlock
+ */
+#define SEQCNT_RWLOCK_ZERO(name, lock)					\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_rwlock_init - runtime initializer for seqcount_rwlock_t
+ * @s:		Pointer to the seqcount_rwlock_t instance
+ * @lock:	Pointer to the associated rwlock
+ */
+#define seqcount_rwlock_init(s, lock)					\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_mutex_t - sequence count with mutex associated
+ * @seqcount:	The real sequence counter
+ * @lock:	Pointer to the associated mutex
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * mutex. The mutex is associated to the sequence counter in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ *
+ * The write side API functions write_seqcount_begin()/end() automatically
+ * disable and enable preemption when used with seqcount_mutex_t.
+ */
+typedef struct seqcount_mutex {
+	seqcount_t      seqcount;
+	__SEQ_LOCKDEP(struct mutex	*lock);
+} seqcount_mutex_t;
+
+/**
+ * SEQCNT_MUTEX_ZERO - static initializer for seqcount_mutex_t
+ * @name:	Name of the seqcount_mutex_t instance
+ * @lock:	Pointer to the associated mutex
+ */
+#define SEQCNT_MUTEX_ZERO(name, lock)					\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_mutex_init - runtime initializer for seqcount_mutex_t
+ * @s:		Pointer to the seqcount_mutex_t instance
+ * @lock:	Pointer to the associated mutex
+ */
+#define seqcount_mutex_init(s, lock)					\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_ww_mutex_t - sequence count with ww_mutex associated
+ * @seqcount:	The real sequence counter
+ * @lock:	Pointer to the associated ww_mutex
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * ww_mutex. The ww_mutex is associated with the sequence counter in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ *
+ * The write side API functions write_seqcount_begin()/end() automatically
+ * disable and enable preemption when used with seqcount_ww_mutex_t.
+ */
+typedef struct seqcount_ww_mutex {
+	seqcount_t      seqcount;
+	__SEQ_LOCKDEP(struct ww_mutex	*lock);
+} seqcount_ww_mutex_t;
+
+/**
+ * SEQCNT_WW_MUTEX_ZERO - static initializer for seqcount_ww_mutex_t
+ * @name:	Name of the seqcount_ww_mutex_t instance
+ * @lock:	Pointer to the associated ww_mutex
+ */
+#define SEQCNT_WW_MUTEX_ZERO(name, lock)				\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_ww_mutex_init - runtime initializer for seqcount_ww_mutex_t
+ * @s:		Pointer to the seqcount_ww_mutex_t instance
+ * @lock:	Pointer to the associated ww_mutex
+ */
+#define seqcount_ww_mutex_init(s, lock)					\
+	seqcount_locktype_init(s, lock)
+
+/*
+ * @preempt: Is the associated write serialization lock preemptible?
+ */
+#define SEQCOUNT_LOCKTYPE(locktype, preempt, lockmember)		\
+static inline seqcount_t *						\
+__seqcount_##locktype##_ptr(seqcount_##locktype##_t *s)			\
+{									\
+	return &s->seqcount;						\
+}									\
+									\
+static inline bool							\
+__seqcount_##locktype##_preemptible(seqcount_##locktype##_t *s)		\
+{									\
+	return preempt;							\
+}									\
+									\
+static inline void							\
+__seqcount_##locktype##_assert(seqcount_##locktype##_t *s)		\
+{									\
+	__SEQ_LOCKDEP(lockdep_assert_held(lockmember));			\
+}
+
+/*
+ * Similar hooks, but for plain seqcount_t
+ */
+
+static inline seqcount_t *__seqcount_ptr(seqcount_t *s)
+{
+	return s;
+}
+
+static inline bool __seqcount_preemptible(seqcount_t *s)
+{
+	return false;
+}
+
+static inline void __seqcount_assert(seqcount_t *s)
+{
+	lockdep_assert_preemption_disabled();
+}
+
+/*
+ * @s: Pointer to seqcount_locktype_t, the generated hooks' first parameter.
+ */
+SEQCOUNT_LOCKTYPE(raw_spinlock,	false,	s->lock)
+SEQCOUNT_LOCKTYPE(spinlock,	false,	s->lock)
+SEQCOUNT_LOCKTYPE(rwlock,	false,	s->lock)
+SEQCOUNT_LOCKTYPE(mutex,	true,	s->lock)
+SEQCOUNT_LOCKTYPE(ww_mutex,	true,	&s->lock->base)
+
+#define __seqprop_case(s, locktype, prop)				\
+	seqcount_##locktype##_t: __seqcount_##locktype##_##prop((void *)(s))
+
+#define __seqprop(s, prop) _Generic(*(s),				\
+	seqcount_t:		__seqcount_##prop((void *)(s)),		\
+	__seqprop_case((s),	raw_spinlock,	prop),			\
+	__seqprop_case((s),	spinlock,	prop),			\
+	__seqprop_case((s),	rwlock,		prop),			\
+	__seqprop_case((s),	mutex,		prop),			\
+	__seqprop_case((s),	ww_mutex,	prop))
+
+#define __to_seqcount_t(s)				__seqprop(s, ptr)
+#define __associated_lock_exists_and_is_preemptible(s)	__seqprop(s, preemptible)
+#define __assert_write_section_is_protected(s)		__seqprop(s, assert)
+
 /**
  * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
  * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@@ -122,7 +388,10 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  *
  * Return: count to be passed to read_seqcount_retry()
  */
-static inline unsigned __read_seqcount_begin(const seqcount_t *s)
+#define __read_seqcount_begin(s)					\
+	__read_seqcount_t_begin(__to_seqcount_t(s))
+
+static inline unsigned __read_seqcount_t_begin(const seqcount_t *s)
 {
 	unsigned ret;
 
@@ -138,32 +407,38 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 
 /**
  * raw_read_seqcount_begin() - begin a seqcount_t read section w/o lockdep
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * Return: count to be passed to read_seqcount_retry()
  */
-static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
+#define raw_read_seqcount_begin(s)					\
+	raw_read_seqcount_t_begin(__to_seqcount_t(s))
+
+static inline unsigned raw_read_seqcount_t_begin(const seqcount_t *s)
 {
-	unsigned ret = __read_seqcount_begin(s);
+	unsigned ret = __read_seqcount_t_begin(s);
 	smp_rmb();
 	return ret;
 }
 
 /**
  * read_seqcount_begin() - begin a seqcount_t read critical section
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * Return: count to be passed to read_seqcount_retry()
  */
-static inline unsigned read_seqcount_begin(const seqcount_t *s)
+#define read_seqcount_begin(s)						\
+	read_seqcount_t_begin(__to_seqcount_t(s))
+
+static inline unsigned read_seqcount_t_begin(const seqcount_t *s)
 {
 	seqcount_lockdep_reader_access(s);
-	return raw_read_seqcount_begin(s);
+	return raw_read_seqcount_t_begin(s);
 }
 
 /**
  * raw_read_seqcount() - read the raw seqcount_t counter value
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * raw_read_seqcount opens a read critical section of the given
  * seqcount_t, without any lockdep checking, and without checking or
@@ -172,7 +447,10 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
  *
  * Return: count to be passed to read_seqcount_retry()
  */
-static inline unsigned raw_read_seqcount(const seqcount_t *s)
+#define raw_read_seqcount(s)						\
+	raw_read_seqcount_t(__to_seqcount_t(s))
+
+static inline unsigned raw_read_seqcount_t(const seqcount_t *s)
 {
 	unsigned ret = READ_ONCE(s->sequence);
 	smp_rmb();
@@ -183,7 +461,7 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
 /**
  * raw_seqcount_begin() - begin a seqcount_t read critical section w/o
  *                        lockdep and w/o counter stabilization
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * raw_seqcount_begin opens a read critical section of the given
  * seqcount_t. Unlike read_seqcount_begin(), this function will not wait
@@ -197,18 +475,21 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
  *
  * Return: count to be passed to read_seqcount_retry()
  */
-static inline unsigned raw_seqcount_begin(const seqcount_t *s)
+#define raw_seqcount_begin(s)						\
+	raw_seqcount_t_begin(__to_seqcount_t(s))
+
+static inline unsigned raw_seqcount_t_begin(const seqcount_t *s)
 {
 	/*
 	 * If the counter is odd, let read_seqcount_retry() fail
 	 * by decrementing the counter.
 	 */
-	return raw_read_seqcount(s) & ~1;
+	return raw_read_seqcount_t(s) & ~1;
 }
 
 /**
  * __read_seqcount_retry() - end a seqcount_t read section w/o barrier
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  * @start: count, from read_seqcount_begin()
  *
  * __read_seqcount_retry is like read_seqcount_retry, but has no smp_rmb()
@@ -221,7 +502,10 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
  *
  * Return: true if a read section retry is required, else false
  */
-static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
+#define __read_seqcount_retry(s, start)					\
+	__read_seqcount_t_retry(__to_seqcount_t(s), start)
+
+static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 {
 	kcsan_atomic_next(0);
 	return unlikely(READ_ONCE(s->sequence) != start);
@@ -229,7 +513,7 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
 
 /**
  * read_seqcount_retry() - end a seqcount_t read critical section
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  * @start: count, from read_seqcount_begin()
  *
  * read_seqcount_retry closes the read critical section of given
@@ -238,17 +522,28 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
  *
  * Return: true if a read section retry is required, else false
  */
-static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
+#define read_seqcount_retry(s, start)					\
+	read_seqcount_t_retry(__to_seqcount_t(s), start)
+
+static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 {
 	smp_rmb();
-	return __read_seqcount_retry(s, start);
+	return __read_seqcount_t_retry(s, start);
 }
 
 /**
  * raw_write_seqcount_begin() - start a seqcount_t write section w/o lockdep
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  */
-static inline void raw_write_seqcount_begin(seqcount_t *s)
+#define raw_write_seqcount_begin(s)					\
+do {									\
+	if (__associated_lock_exists_and_is_preemptible(s))		\
+		preempt_disable();					\
+									\
+	raw_write_seqcount_t_begin(__to_seqcount_t(s));			\
+} while (0)
+
+static inline void raw_write_seqcount_t_begin(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
@@ -257,49 +552,50 @@ static inline void raw_write_seqcount_begin(seqcount_t *s)
 
 /**
  * raw_write_seqcount_end() - end a seqcount_t write section w/o lockdep
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  */
-static inline void raw_write_seqcount_end(seqcount_t *s)
+#define raw_write_seqcount_end(s)					\
+do {									\
+	raw_write_seqcount_t_end(__to_seqcount_t(s));			\
+									\
+	if (__associated_lock_exists_and_is_preemptible(s))		\
+		preempt_enable();					\
+} while (0)
+
+static inline void raw_write_seqcount_t_end(seqcount_t *s)
 {
 	smp_wmb();
 	s->sequence++;
 	kcsan_nestable_atomic_end();
 }
 
-static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
-{
-	raw_write_seqcount_begin(s);
-	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
-}
-
 /**
  * write_seqcount_begin_nested() - start a seqcount_t write section with
  *                                 custom lockdep nesting level
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  * @subclass: lockdep nesting level
  *
  * See Documentation/locking/lockdep-design.rst
  */
-static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
-{
-	lockdep_assert_preemption_disabled();
-	__write_seqcount_begin_nested(s, subclass);
-}
+#define write_seqcount_begin_nested(s, subclass)			\
+do {									\
+	__assert_write_section_is_protected(s);				\
+									\
+	if (__associated_lock_exists_and_is_preemptible(s))		\
+		preempt_disable();					\
+									\
+	write_seqcount_t_begin_nested(__to_seqcount_t(s), subclass);	\
+} while (0)
 
-/*
- * A write_seqcount_begin() variant w/o lockdep non-preemptibility checks.
- *
- * Use for internal seqlock.h code where it's known that preemption is
- * already disabled. For example, seqlock_t write side functions.
- */
-static inline void __write_seqcount_begin(seqcount_t *s)
+static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
 {
-	__write_seqcount_begin_nested(s, 0);
+	raw_write_seqcount_t_begin(s);
+	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
 /**
  * write_seqcount_begin() - start a seqcount_t write side critical section
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * write_seqcount_begin opens a write side critical section of the given
  * seqcount_t.
@@ -308,26 +604,44 @@ static inline void __write_seqcount_begin(seqcount_t *s)
  * non-preemptible. If readers can be invoked from hardirq or softirq
  * context, interrupts or bottom halves must be respectively disabled.
  */
-static inline void write_seqcount_begin(seqcount_t *s)
+#define write_seqcount_begin(s)						\
+do {									\
+	__assert_write_section_is_protected(s);				\
+									\
+	if (__associated_lock_exists_and_is_preemptible(s))		\
+		preempt_disable();					\
+									\
+	write_seqcount_t_begin(__to_seqcount_t(s));			\
+} while (0)
+
+static inline void write_seqcount_t_begin(seqcount_t *s)
 {
-	write_seqcount_begin_nested(s, 0);
+	write_seqcount_t_begin_nested(s, 0);
 }
 
 /**
  * write_seqcount_end() - end a seqcount_t write side critical section
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * The write section must've been opened with write_seqcount_begin().
  */
-static inline void write_seqcount_end(seqcount_t *s)
+#define write_seqcount_end(s)						\
+do {									\
+	write_seqcount_t_end(__to_seqcount_t(s));			\
+									\
+	if (__associated_lock_exists_and_is_preemptible(s))		\
+		preempt_enable();					\
+} while (0)
+
+static inline void write_seqcount_t_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
-	raw_write_seqcount_end(s);
+	raw_write_seqcount_t_end(s);
 }
 
 /**
  * raw_write_seqcount_barrier() - do a seqcount_t write barrier
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * This can be used to provide an ordering guarantee instead of the usual
  * consistency guarantee. It is one wmb cheaper, because it can collapse
@@ -366,7 +680,10 @@ static inline void write_seqcount_end(seqcount_t *s)
  *		WRITE_ONCE(X, false);
  *      }
  */
-static inline void raw_write_seqcount_barrier(seqcount_t *s)
+#define raw_write_seqcount_barrier(s)					\
+	raw_write_seqcount_t_barrier(__to_seqcount_t(s))
+
+static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
@@ -378,12 +695,15 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
 /**
  * write_seqcount_invalidate() - invalidate in-progress seqcount_t read
  *                               side operations
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * After write_seqcount_invalidate, no seqcount_t read side operations
  * will complete successfully and see data older than this.
  */
-static inline void write_seqcount_invalidate(seqcount_t *s)
+#define write_seqcount_invalidate(s)					\
+	write_seqcount_t_invalidate(__to_seqcount_t(s))
+
+static inline void write_seqcount_t_invalidate(seqcount_t *s)
 {
 	smp_wmb();
 	kcsan_nestable_atomic_begin();
@@ -393,7 +713,7 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
 
 /**
  * raw_read_seqcount_latch() - pick even/odd seqcount_t latch data copy
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * Use seqcount_t latching to switch between two storage places protected
  * by a sequence counter. Doing so allows having interruptible, preemptible,
@@ -406,7 +726,10 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
  * picking which data copy to read. The full counter value must then be
  * checked with read_seqcount_retry().
  */
-static inline int raw_read_seqcount_latch(seqcount_t *s)
+#define raw_read_seqcount_latch(s)					\
+	raw_read_seqcount_t_latch(__to_seqcount_t(s))
+
+static inline int raw_read_seqcount_t_latch(seqcount_t *s)
 {
 	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
 	int seq = READ_ONCE(s->sequence); /* ^^^ */
@@ -415,7 +738,7 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
 
 /**
  * raw_write_seqcount_latch() - redirect readers to even/odd copy
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -494,7 +817,10 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  *	When data is a dynamic data structure; one should use regular RCU
  *	patterns to manage the lifetimes of the objects within.
  */
-static inline void raw_write_seqcount_latch(seqcount_t *s)
+#define raw_write_seqcount_latch(s)					\
+	raw_write_seqcount_t_latch(__to_seqcount_t(s))
+
+static inline void raw_write_seqcount_t_latch(seqcount_t *s)
 {
        smp_wmb();      /* prior stores before incrementing "sequence" */
        s->sequence++;
@@ -592,7 +918,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -604,7 +930,7 @@ static inline void write_seqlock(seqlock_t *sl)
  */
 static inline void write_sequnlock(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock(&sl->lock);
 }
 
@@ -618,7 +944,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -631,7 +957,7 @@ static inline void write_seqlock_bh(seqlock_t *sl)
  */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_bh(&sl->lock);
 }
 
@@ -645,7 +971,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -657,7 +983,7 @@ static inline void write_seqlock_irq(seqlock_t *sl)
  */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_irq(&sl->lock);
 }
 
@@ -666,7 +992,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	__write_seqcount_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount);
 	return flags;
 }
 
@@ -695,13 +1021,13 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
 /**
  * read_seqlock_excl() - begin a seqlock_t locking reader section
- * @sl: Pointer to seqlock_t
+ * @sl:	Pointer to seqlock_t
  *
  * read_seqlock_excl opens a seqlock_t locking reader critical section.  A
  * locking reader exclusively locks out *both* other writers *and* other
-- 
2.20.1
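
As a usage sketch of the API above (illustrative only, not part of the
patch; the structure and function names are hypothetical), a writer and
a lockless reader for the new seqcount_spinlock_t type look like this.
The _Generic-based read/write macros accept the new type directly, and
write_seqcount_begin() now asserts via lockdep that the associated
spinlock is held:

	struct example {			/* hypothetical */
		spinlock_t		lock;
		seqcount_spinlock_t	seq;
		u64			a, b;
	};

	static void example_init(struct example *e)
	{
		spin_lock_init(&e->lock);
		seqcount_spinlock_init(&e->seq, &e->lock);
	}

	static void example_update(struct example *e, u64 a, u64 b)
	{
		spin_lock(&e->lock);
		write_seqcount_begin(&e->seq);	/* lockdep: e->lock held? */
		e->a = a;
		e->b = b;
		write_seqcount_end(&e->seq);
		spin_unlock(&e->lock);
	}

	static u64 example_read(struct example *e)
	{
		unsigned int seq;
		u64 sum;

		do {
			seq = read_seqcount_begin(&e->seq);
			sum = e->a + e->b;
		} while (read_seqcount_retry(&e->seq, seq));

		return sum;
	}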


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 10/24] seqlock: Align multi-line macros newline escapes at 72 columns
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (8 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 09/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 11/24] dma-buf: Remove custom seqcount lockdep class key Ahmed S. Darwish
                     ` (14 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

Parent commit, "seqlock: Extend seqcount API with associated locks",
introduced a large number of multi-line macros that are newline-escaped
at 72 columns.

For overall cohesion, align the pre-existing macros similarly.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 8c16a494c968..b48729988325 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -80,17 +80,18 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
 }
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define SEQCOUNT_DEP_MAP_INIT(lockname) \
-		.dep_map = { .name = #lockname } \
+
+# define SEQCOUNT_DEP_MAP_INIT(lockname)				\
+		.dep_map = { .name = #lockname }
 
 /**
  * seqcount_init() - runtime initializer for seqcount_t
  * @s: Pointer to the seqcount_t instance
  */
-# define seqcount_init(s)				\
-	do {						\
-		static struct lock_class_key __key;	\
-		__seqcount_init((s), #s, &__key);	\
+# define seqcount_init(s)						\
+	do {								\
+		static struct lock_class_key __key;			\
+		__seqcount_init((s), #s, &__key);			\
 	} while (0)
 
 static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
@@ -842,20 +843,20 @@ typedef struct {
 	spinlock_t lock;
 } seqlock_t;
 
-#define __SEQLOCK_UNLOCKED(lockname)			\
-	{						\
-		.seqcount = SEQCNT_ZERO(lockname),	\
-		.lock =	__SPIN_LOCK_UNLOCKED(lockname)	\
+#define __SEQLOCK_UNLOCKED(lockname)					\
+	{								\
+		.seqcount = SEQCNT_ZERO(lockname),			\
+		.lock =	__SPIN_LOCK_UNLOCKED(lockname)			\
 	}
 
 /**
  * seqlock_init() - dynamic initializer for seqlock_t
  * @sl: Pointer to the seqlock_t instance
  */
-#define seqlock_init(sl)				\
-	do {						\
-		seqcount_init(&(sl)->seqcount);		\
-		spin_lock_init(&(sl)->lock);		\
+#define seqlock_init(sl)						\
+	do {								\
+		seqcount_init(&(sl)->seqcount);				\
+		spin_lock_init(&(sl)->lock);				\
 	} while (0)
 
 /**
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 11/24] dma-buf: Remove custom seqcount lockdep class key
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (9 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 10/24] seqlock: Align multi-line macros newline escapes at 72 columns Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 12/24] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
                     ` (13 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Daniel Vetter

Commit 3c3b177a9369 ("reservation: add support for read-only access
using rcu") introduced a sequence counter to manage updates to
reservations. Back then, the reservation object initializer
reservation_object_init() was always inlined.

Having the sequence counter initialization inlined meant that each of
the call sites would have a different lockdep class key, which would've
broken lockdep's deadlock detection. The aforementioned commit thus
introduced, and exported, a custom seqcount lockdep class key and name.

Commit 8735f16803f00 ("dma-buf: cleanup reservation_object_init...")
transformed the reservation object initializer into a normal,
non-inlined C function. seqcount_init(), which automatically defines
the seqcount lockdep class key and must therefore be invoked from
non-inlined code, can now be safely used.

Remove the seqcount custom lockdep class key, name, and export. Use
seqcount_init() inside the dma reservation object initializer.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/dma-buf/dma-resv.c | 9 +--------
 include/linux/dma-resv.h   | 2 --
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index b45f8514dc82..15efa0c2dacb 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -51,12 +51,6 @@
 DEFINE_WD_CLASS(reservation_ww_class);
 EXPORT_SYMBOL(reservation_ww_class);
 
-struct lock_class_key reservation_seqcount_class;
-EXPORT_SYMBOL(reservation_seqcount_class);
-
-const char reservation_seqcount_string[] = "reservation_seqcount";
-EXPORT_SYMBOL(reservation_seqcount_string);
-
 /**
  * dma_resv_list_alloc - allocate fence list
  * @shared_max: number of fences we need space for
@@ -135,9 +129,8 @@ subsys_initcall(dma_resv_lockdep);
 void dma_resv_init(struct dma_resv *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
+	seqcount_init(&obj->seq);
 
-	__seqcount_init(&obj->seq, reservation_seqcount_string,
-			&reservation_seqcount_class);
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
 }
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index ee50d10f052b..a6538ae7d93f 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -46,8 +46,6 @@
 #include <linux/rcupdate.h>
 
 extern struct ww_class reservation_ww_class;
-extern struct lock_class_key reservation_seqcount_class;
-extern const char reservation_seqcount_string[];
 
 /**
  * struct dma_resv_list - a list of shared fences
-- 
2.20.1
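
For context (a sketch of the mechanism, not part of the patch):
seqcount_init() declares a function-local static lockdep class key, as
quoted in the previous patch's diff:

	# define seqcount_init(s)					\
		do {							\
			static struct lock_class_key __key;		\
			__seqcount_init((s), #s, &__key);		\
		} while (0)

When the initializer was inlined, every call site expanded its own
static __key, giving each reservation object's seqcount a distinct
lockdep class. With dma_resv_init() as a single non-inlined function,
the macro expands exactly once, all objects share one class, and the
custom exported key above becomes unnecessary.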


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 12/24] dma-buf: Use sequence counter with associated wound/wait mutex
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (10 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 11/24] dma-buf: Remove custom seqcount lockdep class key Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 13/24] sched: tasks: Use sequence counter with associated spinlock Ahmed S. Darwish
                     ` (12 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Daniel Vetter

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive
does not disable preemption implicitly, preemption has to be explicitly
disabled before entering the sequence counter write side critical
section.

The dma-buf reservation subsystem uses plain sequence counters to manage
updates to reservations. Writer serialization is accomplished through a
wound/wait mutex.

Acquiring a wound/wait mutex does not disable preemption, so this needs
to be done manually before and after the write side critical section.

Use the newly-added seqcount_ww_mutex_t instead:

  - It associates the ww_mutex with the sequence count, which enables
    lockdep to validate that the write side critical section is properly
    serialized.

  - It removes the need to explicitly add preempt_disable/enable()
    around the write side critical section because the write_begin/end()
    functions for this new data type automatically do this.

If lockdep is disabled this ww_mutex lock association is compiled out
and has neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/dma-buf/dma-resv.c                       | 8 +-------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 --
 include/linux/dma-resv.h                         | 2 +-
 3 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 15efa0c2dacb..a7631352a486 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -129,7 +129,7 @@ subsys_initcall(dma_resv_lockdep);
 void dma_resv_init(struct dma_resv *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
-	seqcount_init(&obj->seq);
+	seqcount_ww_mutex_init(&obj->seq, &obj->lock);
 
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
@@ -260,7 +260,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 	fobj = dma_resv_get_list(obj);
 	count = fobj->shared_count;
 
-	preempt_disable();
 	write_seqcount_begin(&obj->seq);
 
 	for (i = 0; i < count; ++i) {
@@ -282,7 +281,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 	smp_store_mb(fobj->shared_count, count);
 
 	write_seqcount_end(&obj->seq);
-	preempt_enable();
 	dma_fence_put(old);
 }
 EXPORT_SYMBOL(dma_resv_add_shared_fence);
@@ -309,14 +307,12 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 	if (fence)
 		dma_fence_get(fence);
 
-	preempt_disable();
 	write_seqcount_begin(&obj->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(obj->fence_excl, fence);
 	if (old)
 		old->shared_count = 0;
 	write_seqcount_end(&obj->seq);
-	preempt_enable();
 
 	/* inplace update, no shared fences */
 	while (i--)
@@ -394,13 +390,11 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
 	src_list = dma_resv_get_list(dst);
 	old = dma_resv_get_excl(dst);
 
-	preempt_disable();
 	write_seqcount_begin(&dst->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(dst->fence_excl, new);
 	RCU_INIT_POINTER(dst->fence, dst_list);
 	write_seqcount_end(&dst->seq);
-	preempt_enable();
 
 	dma_resv_list_free(src_list);
 	dma_fence_put(old);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index b91b5171270f..ff4b583cb96a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -258,11 +258,9 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
 	new->shared_count = k;
 
 	/* Install the new fence list, seqcount provides the barriers */
-	preempt_disable();
 	write_seqcount_begin(&resv->seq);
 	RCU_INIT_POINTER(resv->fence, new);
 	write_seqcount_end(&resv->seq);
-	preempt_enable();
 
 	/* Drop the references to the removed fences or move them to ef_list */
 	for (i = j, k = 0; i < old->shared_count; ++i) {
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index a6538ae7d93f..d44a77e8a7e3 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -69,7 +69,7 @@ struct dma_resv_list {
  */
 struct dma_resv {
 	struct ww_mutex lock;
-	seqcount_t seq;
+	seqcount_ww_mutex_t seq;
 
 	struct dma_fence __rcu *fence_excl;
 	struct dma_resv_list __rcu *fence;
-- 
2.20.1
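
The resulting writer-side pattern (an illustrative sketch assembled
from the hunks above, not a verbatim quote) is:

	ww_mutex_lock(&obj->lock, NULL);	/* writer serialization */
	write_seqcount_begin(&obj->seq);	/* also preempt_disable() */
	RCU_INIT_POINTER(obj->fence_excl, fence);
	write_seqcount_end(&obj->seq);		/* also preempt_enable() */
	ww_mutex_unlock(&obj->lock);

Since a ww_mutex is preemptible, write_seqcount_begin()/end() disable
and re-enable preemption internally, which is why the explicit
preempt_disable()/enable() pairs can be dropped in the diff above.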


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 13/24] sched: tasks: Use sequence counter with associated spinlock
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (11 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 12/24] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 14/24] netfilter: conntrack: " Ahmed S. Darwish
                     ` (11 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Al Viro

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/sched.h | 2 +-
 init/init_task.c      | 3 ++-
 kernel/fork.c         | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3903a9500926..02b7fbd17bf6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1054,7 +1054,7 @@ struct task_struct {
 	/* Protected by ->alloc_lock: */
 	nodemask_t			mems_allowed;
 	/* Sequence number to catch updates: */
-	seqcount_t			mems_allowed_seq;
+	seqcount_spinlock_t		mems_allowed_seq;
 	int				cpuset_mem_spread_rotor;
 	int				cpuset_slab_spread_rotor;
 #endif
diff --git a/init/init_task.c b/init/init_task.c
index 15089d15010a..94fe3ba1bb60 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -154,7 +154,8 @@ struct task_struct init_task
 	.trc_holdout_list = LIST_HEAD_INIT(init_task.trc_holdout_list),
 #endif
 #ifdef CONFIG_CPUSETS
-	.mems_allowed_seq = SEQCNT_ZERO(init_task.mems_allowed_seq),
+	.mems_allowed_seq = SEQCNT_SPINLOCK_ZERO(init_task.mems_allowed_seq,
+						 &init_task.alloc_lock),
 #endif
 #ifdef CONFIG_RT_MUTEXES
 	.pi_waiters	= RB_ROOT_CACHED,
diff --git a/kernel/fork.c b/kernel/fork.c
index 70d9d0a4de2a..fc72f09a61b2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2032,7 +2032,7 @@ static __latent_entropy struct task_struct *copy_process(
 #ifdef CONFIG_CPUSETS
 	p->cpuset_mem_spread_rotor = NUMA_NO_NODE;
 	p->cpuset_slab_spread_rotor = NUMA_NO_NODE;
-	seqcount_init(&p->mems_allowed_seq);
+	seqcount_spinlock_init(&p->mems_allowed_seq, &p->alloc_lock);
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;
-- 
2.20.1
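
The same conversion pattern repeats in the following patches of this
series. As an illustrative sketch (not from the patch; new_mask is
hypothetical), the now-checked write side looks like:

	task_lock(p);				/* takes p->alloc_lock */
	write_seqcount_begin(&p->mems_allowed_seq);
	p->mems_allowed = new_mask;
	write_seqcount_end(&p->mems_allowed_seq);
	task_unlock(p);

Entering the write section without p->alloc_lock held now triggers a
lockdep splat instead of going unnoticed.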


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 14/24] netfilter: conntrack: Use sequence counter with associated spinlock
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (12 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 13/24] sched: tasks: Use sequence counter with associated spinlock Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 15/24] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock Ahmed S. Darwish
                     ` (10 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	Jakub Kicinski, netfilter-devel, coreteam, netdev

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/net/netfilter/nf_conntrack.h | 2 +-
 net/netfilter/nf_conntrack_core.c    | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 90690e37a56f..ea4e2010b246 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -286,7 +286,7 @@ int nf_conntrack_hash_resize(unsigned int hashsize);
 
 extern struct hlist_nulls_head *nf_conntrack_hash;
 extern unsigned int nf_conntrack_htable_size;
-extern seqcount_t nf_conntrack_generation;
+extern seqcount_spinlock_t nf_conntrack_generation;
 extern unsigned int nf_conntrack_max;
 
 /* must be called with rcu read lock held */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 79cd9dde457b..b8c54d390f93 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -180,7 +180,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
 
 unsigned int nf_conntrack_max __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_max);
-seqcount_t nf_conntrack_generation __read_mostly;
+seqcount_spinlock_t nf_conntrack_generation __read_mostly;
 static unsigned int nf_conntrack_hash_rnd __read_mostly;
 
 static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple,
@@ -2598,7 +2598,8 @@ int nf_conntrack_init_start(void)
 	/* struct nf_ct_ext uses u8 to store offsets/size */
 	BUILD_BUG_ON(total_extension_size() > 255u);
 
-	seqcount_init(&nf_conntrack_generation);
+	seqcount_spinlock_init(&nf_conntrack_generation,
+			       &nf_conntrack_locks_all_lock);
 
 	for (i = 0; i < CONNTRACK_LOCKS; i++)
 		spin_lock_init(&nf_conntrack_locks[i]);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 15/24] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (13 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 14/24] netfilter: conntrack: " Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 16/24] xfrm: policy: Use sequence counters with associated lock Ahmed S. Darwish
                     ` (9 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	Jakub Kicinski, netfilter-devel, coreteam, netdev

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_rwlock_t data type, which allows associating a
rwlock with the sequence counter. This enables lockdep to verify that
the rwlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 net/netfilter/nft_set_rbtree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index b6aad3fc46c3..4b2834fd17b2 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -18,7 +18,7 @@
 struct nft_rbtree {
 	struct rb_root		root;
 	rwlock_t		lock;
-	seqcount_t		count;
+	seqcount_rwlock_t	count;
 	struct delayed_work	gc_work;
 };
 
@@ -523,7 +523,7 @@ static int nft_rbtree_init(const struct nft_set *set,
 	struct nft_rbtree *priv = nft_set_priv(set);
 
 	rwlock_init(&priv->lock);
-	seqcount_init(&priv->count);
+	seqcount_rwlock_init(&priv->count, &priv->lock);
 	priv->root = RB_ROOT;
 
 	INIT_DEFERRABLE_WORK(&priv->gc_work, nft_rbtree_gc);
-- 
2.20.1
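
As an illustrative sketch (not from the patch), the rwlock-protected
write side that lockdep now validates follows this shape, assuming the
bottom-half-disabling locking used elsewhere in this backend:

	write_lock_bh(&priv->lock);
	write_seqcount_begin(&priv->count);	/* lockdep: write lock held? */
	/* ... modify priv->root ... */
	write_seqcount_end(&priv->count);
	write_unlock_bh(&priv->lock);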


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 16/24] xfrm: policy: Use sequence counters with associated lock
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (14 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 15/24] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 17/24] timekeeping: Use sequence counter with associated raw spinlock Ahmed S. Darwish
                     ` (8 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Steffen Klassert,
	Herbert Xu, David S. Miller, Jakub Kicinski, netdev

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive
does not disable preemption implicitly, preemption has to be explicitly
disabled before entering the sequence counter write side critical
section.

A plain seqcount_t does not record which lock must be held when
entering a write side critical section.

Use the new seqcount_spinlock_t and seqcount_mutex_t data types instead,
which allow associating a lock with the sequence counter. This enables
lockdep to verify that the lock used for writer serialization is held
when the write side critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 net/xfrm/xfrm_policy.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 564aa6492e7c..732a940468b0 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -122,7 +122,7 @@ struct xfrm_pol_inexact_bin {
 	/* list containing '*:*' policies */
 	struct hlist_head hhead;
 
-	seqcount_t count;
+	seqcount_spinlock_t count;
 	/* tree sorted by daddr/prefix */
 	struct rb_root root_d;
 
@@ -155,7 +155,7 @@ static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1]
 						__read_mostly;
 
 static struct kmem_cache *xfrm_dst_cache __ro_after_init;
-static __read_mostly seqcount_t xfrm_policy_hash_generation;
+static __read_mostly seqcount_mutex_t xfrm_policy_hash_generation;
 
 static struct rhashtable xfrm_policy_inexact_table;
 static const struct rhashtable_params xfrm_pol_inexact_params;
@@ -719,7 +719,7 @@ xfrm_policy_inexact_alloc_bin(const struct xfrm_policy *pol, u8 dir)
 	INIT_HLIST_HEAD(&bin->hhead);
 	bin->root_d = RB_ROOT;
 	bin->root_s = RB_ROOT;
-	seqcount_init(&bin->count);
+	seqcount_spinlock_init(&bin->count, &net->xfrm.xfrm_policy_lock);
 
 	prev = rhashtable_lookup_get_insert_key(&xfrm_policy_inexact_table,
 						&bin->k, &bin->head,
@@ -1906,7 +1906,7 @@ static int xfrm_policy_match(const struct xfrm_policy *pol,
 
 static struct xfrm_pol_inexact_node *
 xfrm_policy_lookup_inexact_addr(const struct rb_root *r,
-				seqcount_t *count,
+				seqcount_spinlock_t *count,
 				const xfrm_address_t *addr, u16 family)
 {
 	const struct rb_node *parent;
@@ -4153,7 +4153,7 @@ void __init xfrm_init(void)
 {
 	register_pernet_subsys(&xfrm_net_ops);
 	xfrm_dev_init();
-	seqcount_init(&xfrm_policy_hash_generation);
+	seqcount_mutex_init(&xfrm_policy_hash_generation, &hash_resize_mutex);
 	xfrm_input_init();
 
 #ifdef CONFIG_INET_ESPINTCP
-- 
2.20.1
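
For the seqcount_mutex_t case, an illustrative write-side sketch (not
from the patch) using the mutex associated in the hunk above:

	mutex_lock(&hash_resize_mutex);
	write_seqcount_begin(&xfrm_policy_hash_generation);
	/* ... rebuild the policy hash tables ... */
	write_seqcount_end(&xfrm_policy_hash_generation);
	mutex_unlock(&hash_resize_mutex);

Because a mutex is preemptible, write_seqcount_begin()/end() also
disable and re-enable preemption around the critical section.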


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 17/24] timekeeping: Use sequence counter with associated raw spinlock
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (15 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 16/24] xfrm: policy: Use sequence counters with associated lock Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 18/24] vfs: Use sequence counter with associated spinlock Ahmed S. Darwish
                     ` (7 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, John Stultz,
	Stephen Boyd

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_raw_spinlock_t data type, which allows associating
a raw spinlock with the sequence counter. This enables lockdep to verify
that the raw spinlock used for writer serialization is held when the
write side critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 kernel/time/timekeeping.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index d20d489841c8..05ecfd8a3314 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -39,18 +39,19 @@ enum timekeeping_adv_mode {
 	TK_ADV_FREQ
 };
 
+static DEFINE_RAW_SPINLOCK(timekeeper_lock);
+
 /*
  * The most important data for readout fits into a single 64 byte
  * cache line.
  */
 static struct {
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct timekeeper	timekeeper;
 } tk_core ____cacheline_aligned = {
-	.seq = SEQCNT_ZERO(tk_core.seq),
+	.seq = SEQCNT_RAW_SPINLOCK_ZERO(tk_core.seq, &timekeeper_lock),
 };
 
-static DEFINE_RAW_SPINLOCK(timekeeper_lock);
 static struct timekeeper shadow_timekeeper;
 
 /**
@@ -63,7 +64,7 @@ static struct timekeeper shadow_timekeeper;
  * See @update_fast_timekeeper() below.
  */
 struct tk_fast {
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct tk_read_base	base[2];
 };
 
@@ -80,11 +81,13 @@ static struct clocksource dummy_clock = {
 };
 
 static struct tk_fast tk_fast_mono ____cacheline_aligned = {
+	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_mono.seq, &timekeeper_lock),
 	.base[0] = { .clock = &dummy_clock, },
 	.base[1] = { .clock = &dummy_clock, },
 };
 
 static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
+	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_raw.seq, &timekeeper_lock),
 	.base[0] = { .clock = &dummy_clock, },
 	.base[1] = { .clock = &dummy_clock, },
 };
@@ -157,7 +160,7 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
  * tk_clock_read - atomic clocksource read() helper
  *
  * This helper is necessary to use in the read paths because, while the
- * seqlock ensures we don't return a bad value while structures are updated,
+ * seqcount ensures we don't return a bad value while structures are updated,
  * it doesn't protect from potential crashes. There is the possibility that
  * the tkr's clocksource may change between the read reference, and the
  * clock reference passed to the read function.  This can cause crashes if
@@ -222,10 +225,10 @@ static inline u64 timekeeping_get_delta(const struct tk_read_base *tkr)
 	unsigned int seq;
 
 	/*
-	 * Since we're called holding a seqlock, the data may shift
+	 * Since we're called holding a seqcount, the data may shift
 	 * under us while we're doing the calculation. This can cause
 	 * false positives, since we'd note a problem but throw the
-	 * results away. So nest another seqlock here to atomically
+	 * results away. So nest another seqcount here to atomically
 	 * grab the points we are checking with.
 	 */
 	do {
@@ -486,7 +489,7 @@ EXPORT_SYMBOL_GPL(ktime_get_raw_fast_ns);
  *
  * To keep it NMI safe since we're accessing from tracing, we're not using a
  * separate timekeeper with updates to monotonic clock and boot offset
- * protected with seqlocks. This has the following minor side effects:
+ * protected with seqcounts. This has the following minor side effects:
  *
 * (1) It's possible that a timestamp is taken after the boot offset is updated
  * but before the timekeeper is updated. If this happens, the new boot offset
-- 
2.20.1
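
Note that the hunk above moves DEFINE_RAW_SPINLOCK(timekeeper_lock)
before tk_core: with static initialization, the lock must be defined
before any seqcount that references it. A minimal sketch of the
pattern (illustrative names, not from the patch):

	static DEFINE_RAW_SPINLOCK(my_lock);
	static seqcount_raw_spinlock_t my_seq =
		SEQCNT_RAW_SPINLOCK_ZERO(my_seq, &my_lock);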


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 18/24] vfs: Use sequence counter with associated spinlock
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (16 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 17/24] timekeeping: Use sequence counter with associated raw spinlock Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 19/24] raid5: " Ahmed S. Darwish
                     ` (6 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Alexander Viro,
	Mauro Carvalho Chehab, Jonathan Corbet, linux-fsdevel

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 fs/dcache.c               | 2 +-
 fs/fs_struct.c            | 4 ++--
 include/linux/dcache.h    | 2 +-
 include/linux/fs_struct.h | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 361ea7ab30ea..ea0485861d93 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1746,7 +1746,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	dentry->d_lockref.count = 1;
 	dentry->d_flags = 0;
 	spin_lock_init(&dentry->d_lock);
-	seqcount_init(&dentry->d_seq);
+	seqcount_spinlock_init(&dentry->d_seq, &dentry->d_lock);
 	dentry->d_inode = NULL;
 	dentry->d_parent = dentry;
 	dentry->d_sb = sb;
diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index ca639ed967b7..04b3f5b9c629 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -117,7 +117,7 @@ struct fs_struct *copy_fs_struct(struct fs_struct *old)
 		fs->users = 1;
 		fs->in_exec = 0;
 		spin_lock_init(&fs->lock);
-		seqcount_init(&fs->seq);
+		seqcount_spinlock_init(&fs->seq, &fs->lock);
 		fs->umask = old->umask;
 
 		spin_lock(&old->lock);
@@ -163,6 +163,6 @@ EXPORT_SYMBOL(current_umask);
 struct fs_struct init_fs = {
 	.users		= 1,
 	.lock		= __SPIN_LOCK_UNLOCKED(init_fs.lock),
-	.seq		= SEQCNT_ZERO(init_fs.seq),
+	.seq		= SEQCNT_SPINLOCK_ZERO(init_fs.seq, &init_fs.lock),
 	.umask		= 0022,
 };
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index a81f0c3cf352..65d975bf9390 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -89,7 +89,7 @@ extern struct dentry_stat_t dentry_stat;
 struct dentry {
 	/* RCU lookup touched fields */
 	unsigned int d_flags;		/* protected by d_lock */
-	seqcount_t d_seq;		/* per dentry seqlock */
+	seqcount_spinlock_t d_seq;	/* per dentry seqlock */
 	struct hlist_bl_node d_hash;	/* lookup hash list */
 	struct dentry *d_parent;	/* parent directory */
 	struct qstr d_name;
diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
index cf1015abfbf2..783b48dedb72 100644
--- a/include/linux/fs_struct.h
+++ b/include/linux/fs_struct.h
@@ -9,7 +9,7 @@
 struct fs_struct {
 	int users;
 	spinlock_t lock;
-	seqcount_t seq;
+	seqcount_spinlock_t seq;
 	int umask;
 	int in_exec;
 	struct path root, pwd;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 19/24] raid5: Use sequence counter with associated spinlock
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (17 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 18/24] vfs: Use sequence counter with associated spinlock Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-22  6:40     ` Song Liu
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 20/24] iocost: " Ahmed S. Darwish
                     ` (5 subsequent siblings)
  24 siblings, 2 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Song Liu, linux-raid

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 drivers/md/raid5.c | 2 +-
 drivers/md/raid5.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ab8067f9ce8c..892aefe88fa7 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6935,7 +6935,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
 	} else
 		goto abort;
 	spin_lock_init(&conf->device_lock);
-	seqcount_init(&conf->gen_lock);
+	seqcount_spinlock_init(&conf->gen_lock, &conf->device_lock);
 	mutex_init(&conf->cache_size_mutex);
 	init_waitqueue_head(&conf->wait_for_quiescent);
 	init_waitqueue_head(&conf->wait_for_stripe);
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index f90e0704bed9..a2c9e9e9f5ac 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -589,7 +589,7 @@ struct r5conf {
 	int			prev_chunk_sectors;
 	int			prev_algo;
 	short			generation; /* increments with every reshape */
-	seqcount_t		gen_lock;	/* lock against generation changes */
+	seqcount_spinlock_t	gen_lock;	/* lock against generation changes */
 	unsigned long		reshape_checkpoint; /* Time we last updated
 						     * metadata */
 	long long		min_offset_diff; /* minimum difference between
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 20/24] iocost: Use sequence counter with associated spinlock
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (18 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 19/24] raid5: " Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 21/24] NFSv4: " Ahmed S. Darwish
                     ` (4 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Daniel Wagner

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
record which lock must be held when entering a write side critical
section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
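
A side effect visible in the diff below: the explicit
lockdep_assert_held() at the top of ioc_start_period() becomes
redundant and is removed, since write_seqcount_begin() on a
seqcount_spinlock_t now provides the equivalent lockdep assertion
itself. Roughly:

	/* Before: the writer had to assert the lock by hand. */
	lockdep_assert_held(&ioc->lock);
	write_seqcount_begin(&ioc->period_seqcount);

	/* After: the lock association makes the assertion implicit. */
	write_seqcount_begin(&ioc->period_seqcount);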

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
---
 block/blk-iocost.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 8ac4aad66ebc..8e940c27c27c 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -406,7 +406,7 @@ struct ioc {
 	enum ioc_running		running;
 	atomic64_t			vtime_rate;
 
-	seqcount_t			period_seqcount;
+	seqcount_spinlock_t		period_seqcount;
 	u32				period_at;	/* wallclock starttime */
 	u64				period_at_vtime; /* vtime starttime */
 
@@ -873,7 +873,6 @@ static void ioc_now(struct ioc *ioc, struct ioc_now *now)
 
 static void ioc_start_period(struct ioc *ioc, struct ioc_now *now)
 {
-	lockdep_assert_held(&ioc->lock);
 	WARN_ON_ONCE(ioc->running != IOC_RUNNING);
 
 	write_seqcount_begin(&ioc->period_seqcount);
@@ -2001,7 +2000,7 @@ static int blk_iocost_init(struct request_queue *q)
 
 	ioc->running = IOC_IDLE;
 	atomic64_set(&ioc->vtime_rate, VTIME_PER_USEC);
-	seqcount_init(&ioc->period_seqcount);
+	seqcount_spinlock_init(&ioc->period_seqcount, &ioc->lock);
 	ioc->period_at = ktime_to_us(ktime_get());
 	atomic64_set(&ioc->cur_period, 0);
 	atomic_set(&ioc->hweight_gen, 0);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 21/24] NFSv4: Use sequence counter with associated spinlock
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (19 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 20/24] iocost: " Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 22/24] userfaultfd: " Ahmed S. Darwish
                     ` (3 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Trond Myklebust,
	Anna Schumaker, linux-nfs

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 fs/nfs/nfs4_fs.h   | 2 +-
 fs/nfs/nfs4state.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 2b7f6dcd2eb8..210e590e1f71 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -117,7 +117,7 @@ struct nfs4_state_owner {
 	unsigned long	     so_flags;
 	struct list_head     so_states;
 	struct nfs_seqid_counter so_seqid;
-	seqcount_t	     so_reclaim_seqcount;
+	seqcount_spinlock_t  so_reclaim_seqcount;
 	struct mutex	     so_delegreturn_mutex;
 };
 
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index a8dc25ce48bb..b1dba24918f8 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -509,7 +509,7 @@ nfs4_alloc_state_owner(struct nfs_server *server,
 	nfs4_init_seqid_counter(&sp->so_seqid);
 	atomic_set(&sp->so_count, 1);
 	INIT_LIST_HEAD(&sp->so_lru);
-	seqcount_init(&sp->so_reclaim_seqcount);
+	seqcount_spinlock_init(&sp->so_reclaim_seqcount, &sp->so_lock);
 	mutex_init(&sp->so_delegreturn_mutex);
 	return sp;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 22/24] userfaultfd: Use sequence counter with associated spinlock
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (20 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 21/24] NFSv4: " Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 23/24] kvm/eventfd: " Ahmed S. Darwish
                     ` (2 subsequent siblings)
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Alexander Viro,
	linux-fsdevel

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
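
Note that the associated lock does not need to be a standalone
spinlock_t. As the hunk below shows, refile_seq is bound to the
spinlock embedded in the fault_pending_wqh wait queue head:

	seqcount_spinlock_init(&ctx->refile_seq,
			       &ctx->fault_pending_wqh.lock);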

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 fs/userfaultfd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 52de29000c7e..26e8b23594fb 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -61,7 +61,7 @@ struct userfaultfd_ctx {
 	/* waitqueue head for events */
 	wait_queue_head_t event_wqh;
 	/* a refile sequence protected by fault_pending_wqh lock */
-	struct seqcount refile_seq;
+	seqcount_spinlock_t refile_seq;
 	/* pseudo fd refcounting */
 	refcount_t refcount;
 	/* userfaultfd syscall flags */
@@ -1998,7 +1998,7 @@ static void init_once_userfaultfd_ctx(void *mem)
 	init_waitqueue_head(&ctx->fault_wqh);
 	init_waitqueue_head(&ctx->event_wqh);
 	init_waitqueue_head(&ctx->fd_wqh);
-	seqcount_init(&ctx->refile_seq);
+	seqcount_spinlock_init(&ctx->refile_seq, &ctx->fault_pending_wqh.lock);
 }
 
 SYSCALL_DEFINE1(userfaultfd, int, flags)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 23/24] kvm/eventfd: Use sequence counter with associated spinlock
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (21 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 22/24] userfaultfd: " Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 15:55   ` [PATCH v4 24/24] hrtimer: Use sequence counter with associated raw spinlock Ahmed S. Darwish
  2020-07-20 16:49   ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Eric Biggers
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish, Paolo Bonzini

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_irqfd.h | 2 +-
 virt/kvm/eventfd.c        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index dc1da020305b..dac047abdba7 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -42,7 +42,7 @@ struct kvm_kernel_irqfd {
 	wait_queue_entry_t wait;
 	/* Update side is protected by irqfds.lock */
 	struct kvm_kernel_irq_routing_entry irq_entry;
-	seqcount_t irq_entry_sc;
+	seqcount_spinlock_t irq_entry_sc;
 	/* Used for level IRQ fast-path */
 	int gsi;
 	struct work_struct inject;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index ef7ed916ad4a..d6408bb497dc 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -303,7 +303,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	INIT_LIST_HEAD(&irqfd->list);
 	INIT_WORK(&irqfd->inject, irqfd_inject);
 	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
-	seqcount_init(&irqfd->irq_entry_sc);
+	seqcount_spinlock_init(&irqfd->irq_entry_sc, &kvm->irqfds.lock);
 
 	f = fdget(args->fd);
 	if (!f.file) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v4 24/24] hrtimer: Use sequence counter with associated raw spinlock
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (22 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 23/24] kvm/eventfd: " Ahmed S. Darwish
@ 2020-07-20 15:55   ` Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-07-20 16:49   ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Eric Biggers
  24 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 15:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML, Ahmed S. Darwish

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_raw_spinlock_t data type, which allows associating
a raw spinlock with the sequence counter. This enables lockdep to verify
that the raw spinlock used for writer serialization is held when the
write side critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
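
For statically allocated objects the association can also be made at
build time with the SEQCNT_RAW_SPINLOCK_ZERO() initializer used below.
A short sketch; "demo_lock" and "demo_seq" are made-up names:

	static DEFINE_RAW_SPINLOCK(demo_lock);
	static seqcount_raw_spinlock_t demo_seq =
		SEQCNT_RAW_SPINLOCK_ZERO(demo_seq, &demo_lock);

The runtime equivalent, as used in hrtimers_prepare_cpu() below, is:

	seqcount_raw_spinlock_init(&demo_seq, &demo_lock);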

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/hrtimer.h |  2 +-
 kernel/time/hrtimer.c   | 13 ++++++++++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 15c8ac313678..25993b86ac5c 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -159,7 +159,7 @@ struct hrtimer_clock_base {
 	struct hrtimer_cpu_base	*cpu_base;
 	unsigned int		index;
 	clockid_t		clockid;
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct hrtimer		*running;
 	struct timerqueue_head	active;
 	ktime_t			(*get_time)(void);
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index d89da1c7e005..c4038511d5c9 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -135,7 +135,11 @@ static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
  * timer->base->cpu_base
  */
 static struct hrtimer_cpu_base migration_cpu_base = {
-	.clock_base = { { .cpu_base = &migration_cpu_base, }, },
+	.clock_base = { {
+		.cpu_base = &migration_cpu_base,
+		.seq      = SEQCNT_RAW_SPINLOCK_ZERO(migration_cpu_base.seq,
+						     &migration_cpu_base.lock),
+	}, },
 };
 
 #define migration_base	migration_cpu_base.clock_base[0]
@@ -1998,8 +2002,11 @@ int hrtimers_prepare_cpu(unsigned int cpu)
 	int i;
 
 	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
-		cpu_base->clock_base[i].cpu_base = cpu_base;
-		timerqueue_init_head(&cpu_base->clock_base[i].active);
+		struct hrtimer_clock_base *clock_b = &cpu_base->clock_base[i];
+
+		clock_b->cpu_base = cpu_base;
+		seqcount_raw_spinlock_init(&clock_b->seq, &cpu_base->lock);
+		timerqueue_init_head(&clock_b->active);
 	}
 
 	cpu_base->cpu = cpu;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                     ` (23 preceding siblings ...)
  2020-07-20 15:55   ` [PATCH v4 24/24] hrtimer: Use sequence counter with associated raw spinlock Ahmed S. Darwish
@ 2020-07-20 16:49   ` Eric Biggers
  2020-07-20 17:33     ` Ahmed S. Darwish
  24 siblings, 1 reply; 258+ messages in thread
From: Eric Biggers @ 2020-07-20 16:49 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML,
	Jonathan Corbet, linux-doc, David S. Miller, netdev,
	Alexander Viro, linux-fsdevel

On Mon, Jul 20, 2020 at 05:55:06PM +0200, Ahmed S. Darwish wrote:
> Hi,
> 
> This is v4 of the seqlock patch series:
> 
>    [PATCH v1 00/25]
>    https://lore.kernel.org/lkml/20200519214547.352050-1-a.darwish@linutronix.de
> 
>    [PATCH v2 00/06] (bugfixes-only, merged)
>    https://lore.kernel.org/lkml/20200603144949.1122421-1-a.darwish@linutronix.de
> 
>    [PATCH v2 00/18]
>    https://lore.kernel.org/lkml/20200608005729.1874024-1-a.darwish@linutronix.de
> 
>    [PATCH v3 00/20]
>    https://lore.kernel.org/lkml/20200630054452.3675847-1-a.darwish@linutronix.de
> 
> It is based over:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git :: locking/core
> 

Please include an explanation of the patch series in the cover letter.  It looks
like you sent it in v1 and then stopped including it.  That doesn't work;
reviewers shouldn't have to read all previous versions too and try to guess
which parts are still up-to-date.

- Eric

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks
  2020-07-20 16:49   ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Eric Biggers
@ 2020-07-20 17:33     ` Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-20 17:33 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML,
	Jonathan Corbet, linux-doc, David S. Miller, netdev,
	Alexander Viro, linux-fsdevel

On Mon, Jul 20, 2020 at 09:49:12AM -0700, Eric Biggers wrote:
> On Mon, Jul 20, 2020 at 05:55:06PM +0200, Ahmed S. Darwish wrote:
> > Hi,
> >
> > This is v4 of the seqlock patch series:
> >
> >    [PATCH v1 00/25]
> >    https://lore.kernel.org/lkml/20200519214547.352050-1-a.darwish@linutronix.de
> >
> >    [PATCH v2 00/06] (bugfixes-only, merged)
> >    https://lore.kernel.org/lkml/20200603144949.1122421-1-a.darwish@linutronix.de
> >
> >    [PATCH v2 00/18]
> >    https://lore.kernel.org/lkml/20200608005729.1874024-1-a.darwish@linutronix.de
> >
> >    [PATCH v3 00/20]
> >    https://lore.kernel.org/lkml/20200630054452.3675847-1-a.darwish@linutronix.de
> >
> > It is based over:
> >
> >    git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git :: locking/core
> >
>
> Please include an explanation of the patch series in the cover letter.  It looks
> like you sent it in v1 and then stopped including it.  That doesn't work;
> reviewers shouldn't have to read all previous versions too and try to guess
> which parts are still up-to-date.
>

Noted. Thanks for the heads-up.

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage
  2020-07-20 15:55   ` [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
@ 2020-07-21  1:35     ` Steven Rostedt
  2020-07-21  1:37       ` Steven Rostedt
  2020-07-21  5:34       ` Ahmed S. Darwish
  2020-07-21  1:44     ` Steven Rostedt
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2 siblings, 2 replies; 258+ messages in thread
From: Steven Rostedt @ 2020-07-21  1:35 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, LKML, Jonathan Corbet,
	linux-doc

On Mon, 20 Jul 2020 17:55:07 +0200
"Ahmed S. Darwish" <a.darwish@linutronix.de> wrote:
> +Read path, three categories:
> +
> +1. Normal Sequence readers which never block a writer but they must
> +   retry if a writer is in progress by detecting change in the sequence
> +   number.  Writers do not wait for a sequence reader::
> +
> +	do {
> +		seq = read_seqbegin(&foo_seqlock);
> +
> +		/* ... [[read-side critical section]] ... */
> +
> +	} while (read_seqretry(&foo_seqlock, seq));
> +
> +2. Locking readers which will wait if a writer or another locking reader
> +   is in progress. A locking reader in progress will also block a writer
> +   from entering its critical section. This read lock is
> +   exclusive. Unlike rwlock_t, only one locking reader can acquire it::

Nit, but I would mention that this acts similarly to a normal
spin_lock, and even disables preemption (in the non-RT case).
> +
> +	read_seqlock_excl(&foo_seqlock);
> +
> +	/* ... [[read-side critical section]] ... */
> +
> +	read_sequnlock_excl(&foo_seqlock);
> +
> +3. Conditional lockless reader (as in 1), or locking reader (as in 2),
> +   according to a passed marker. This is used to avoid lockless readers
> +   starvation (too much retry loops) in case of a sharp spike in write
> +   activity. First, a lockless read is tried (even marker passed). If
> +   that trial fails (odd sequence counter is returned, which is used as
> +   the next iteration marker), the lockless read is transformed to a
> +   full locking read and no retry loop is necessary::
> +
> +	/* marker; even initialization */
> +	int seq = 0;
> +	do {
> +		read_seqbegin_or_lock(&foo_seqlock, &seq);
> +
> +		/* ... [[read-side critical section]] ... */
> +
> +	} while (need_seqretry(&foo_seqlock, seq));
> +	done_seqretry(&foo_seqlock, seq);
> +
> +
> +API documentation
> +=================
> +
> +.. kernel-doc:: include/linux/seqlock.h
> diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> index 8b97204f35a7..299d68f10325 100644
> --- a/include/linux/seqlock.h
> +++ b/include/linux/seqlock.h
> @@ -1,36 +1,15 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  #ifndef __LINUX_SEQLOCK_H
>  #define __LINUX_SEQLOCK_H
> +
>  /*
> - * Reader/writer consistent mechanism without starving writers. This type of
> - * lock for data where the reader wants a consistent set of information
> - * and is willing to retry if the information changes. There are two types
> - * of readers:
> - * 1. Sequence readers which never block a writer but they may have to retry
> - *    if a writer is in progress by detecting change in sequence number.
> - *    Writers do not wait for a sequence reader.
> - * 2. Locking readers which will wait if a writer or another locking reader
> - *    is in progress. A locking reader in progress will also block a writer
> - *    from going forward. Unlike the regular rwlock, the read lock here is
> - *    exclusive so that only one locking reader can get it.
> + * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
> + * lockless readers (read-only retry loops), and no writer starvation.
>   *
> - * This is not as cache friendly as brlock. Also, this may not work well
> - * for data that contains pointers, because any writer could
> - * invalidate a pointer that a reader was following.
> + * See Documentation/locking/seqlock.rst
>   *
> - * Expected non-blocking reader usage:
> - * 	do {
> - *	    seq = read_seqbegin(&foo);
> - * 	...
> - *      } while (read_seqretry(&foo, seq));
> - *
> - *
> - * On non-SMP the spin locks disappear but the writer still needs
> - * to increment the sequence variables because an interrupt routine could
> - * change the state of the data.
> - *
> - * Based on x86_64 vsyscall gettimeofday 
> - * by Keith Owens and Andrea Arcangeli
> + * Copyrights:
> + * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
>   */
>  
>  #include <linux/spinlock.h>
> @@ -41,8 +20,8 @@
>  #include <asm/processor.h>
>  
>  /*
> - * The seqlock interface does not prescribe a precise sequence of read
> - * begin/retry/end. For readers, typically there is a call to
> + * The seqlock seqcount_t interface does not prescribe a precise sequence of
> + * read begin/retry/end. For readers, typically there is a call to
>   * read_seqcount_begin() and read_seqcount_retry(), however, there are more
>   * esoteric cases which do not follow this pattern.
>   *
> @@ -50,16 +29,30 @@
>   * via seqcount_t under KCSAN: upon beginning a seq-reader critical section,
>   * pessimistically mark the next KCSAN_SEQLOCK_REGION_MAX memory accesses as
>   * atomics; if there is a matching read_seqcount_retry() call, no following
> - * memory operations are considered atomic. Usage of seqlocks via seqlock_t
> - * interface is not affected.
> + * memory operations are considered atomic. Usage of the seqlock_t interface
> + * is not affected.
>   */
>  #define KCSAN_SEQLOCK_REGION_MAX 1000
>  
>  /*
> - * Version using sequence counter only.
> - * This can be used when code has its own mutex protecting the
> - * updating starting before the write_seqcountbeqin() and ending
> - * after the write_seqcount_end().
> + * Sequence counters (seqcount_t)
> + *
> + * This is the raw counting mechanism, without any writer protection.
> + *
> + * Write side critical sections must be serialized and non-preemptible.
> + *
> + * If readers can be invoked from hardirq or softirq contexts,
> + * interrupts or bottom halves must also be respectively disabled before
> + * entering the write section.
> + *
> + * This mechanism can't be used if the protected data contains pointers,
> + * as the writer can invalidate a pointer that a reader is following.
> + *
> + * If it's desired to automatically handle the sequence counter writer
> + * serialization and non-preemptibility requirements, use a sequential
> + * lock (seqlock_t) instead.
> + *
> + * See Documentation/locking/seqlock.rst
>   */
>  typedef struct seqcount {
>  	unsigned sequence;
> @@ -398,10 +391,6 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
>         smp_wmb();      /* increment "sequence" before following stores */
>  }
>  
> -/*
> - * Sequence counter only version assumes that callers are using their
> - * own mutexing.
> - */
>  static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
>  {
>  	raw_write_seqcount_begin(s);
> @@ -434,15 +423,21 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
>  	kcsan_nestable_atomic_end();
>  }
>  
> +/*
> + * Sequential locks (seqlock_t)
> + *
> + * Sequence counters with an embedded spinlock for writer serialization
> + * and non-preemptibility.
> + *
> + * For more info, see:
> + *    - Comments on top of seqcount_t
> + *    - Documentation/locking/seqlock.rst
> + */
>  typedef struct {
>  	struct seqcount seqcount;
>  	spinlock_t lock;
>  } seqlock_t;
>  
> -/*
> - * These macros triggered gcc-3.x compile-time problems.  We think these are
> - * OK now.  Be cautious.
> - */
>  #define __SEQLOCK_UNLOCKED(lockname)			\
>  	{						\
>  		.seqcount = SEQCNT_ZERO(lockname),	\


^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage
  2020-07-21  1:35     ` Steven Rostedt
@ 2020-07-21  1:37       ` Steven Rostedt
  2020-07-21  5:34       ` Ahmed S. Darwish
  1 sibling, 0 replies; 258+ messages in thread
From: Steven Rostedt @ 2020-07-21  1:37 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, LKML, Jonathan Corbet,
	linux-doc

On Mon, 20 Jul 2020 21:35:51 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 20 Jul 2020 17:55:07 +0200
> "Ahmed S. Darwish" <a.darwish@linutronix.de> wrote:
> > +Read path, three categories:
> > +
> > +1. Normal Sequence readers which never block a writer but they must
> > +   retry if a writer is in progress by detecting change in the sequence
> > +   number.  Writers do not wait for a sequence reader::
> > +
> > +	do {
> > +		seq = read_seqbegin(&foo_seqlock);
> > +
> > +		/* ... [[read-side critical section]] ... */
> > +
> > +	} while (read_seqretry(&foo_seqlock, seq));
> > +
> > +2. Locking readers which will wait if a writer or another locking reader
> > +   is in progress. A locking reader in progress will also block a writer
> > +   from entering its critical section. This read lock is
> > +   exclusive. Unlike rwlock_t, only one locking reader can acquire it::  
> 
> Nit, but I would mention that this acts similarly to a normal
> spin_lock, and even disables preemption (in the non-RT case).

What I learned today: Ctrl-return on claws mail sends the email :-p

I need to disable that so I don't send another email before I'm done with it!


I haven't finished reading the rest.

-- Steve

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage
  2020-07-20 15:55   ` [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
  2020-07-21  1:35     ` Steven Rostedt
@ 2020-07-21  1:44     ` Steven Rostedt
  2020-07-21  1:51       ` Steven Rostedt
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2 siblings, 1 reply; 258+ messages in thread
From: Steven Rostedt @ 2020-07-21  1:44 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, LKML, Jonathan Corbet,
	linux-doc

On Mon, 20 Jul 2020 17:55:07 +0200
"Ahmed S. Darwish" <a.darwish@linutronix.de> wrote:

> +++ b/include/linux/seqlock.h
> @@ -1,36 +1,15 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  #ifndef __LINUX_SEQLOCK_H
>  #define __LINUX_SEQLOCK_H
> +
>  /*
> - * Reader/writer consistent mechanism without starving writers. This type of
> - * lock for data where the reader wants a consistent set of information
> - * and is willing to retry if the information changes. There are two types
> - * of readers:
> - * 1. Sequence readers which never block a writer but they may have to retry
> - *    if a writer is in progress by detecting change in sequence number.
> - *    Writers do not wait for a sequence reader.
> - * 2. Locking readers which will wait if a writer or another locking reader
> - *    is in progress. A locking reader in progress will also block a writer
> - *    from going forward. Unlike the regular rwlock, the read lock here is
> - *    exclusive so that only one locking reader can get it.
> + * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
> + * lockless readers (read-only retry loops), and no writer starvation.
>   *
> - * This is not as cache friendly as brlock. Also, this may not work well
> - * for data that contains pointers, because any writer could
> - * invalidate a pointer that a reader was following.
> + * See Documentation/locking/seqlock.rst

I absolutely hate it when I see this.

I'd much rather have the documentation next to the code. Honestly, I
trust that comments next to the code are much more likely to get
updated when the code changes than comments buried in the
Documentation directory.

It's also more likely that I won't even bother looking at the doc
(because I won't trust it to be up to date) and will just read the code
and try to figure it out. Or look at how others have used it.

-- Steve



>   *
> - * Expected non-blocking reader usage:
> - * 	do {
> - *	    seq = read_seqbegin(&foo);
> - * 	...
> - *      } while (read_seqretry(&foo, seq));
> - *
> - *
> - * On non-SMP the spin locks disappear but the writer still needs
> - * to increment the sequence variables because an interrupt routine could
> - * change the state of the data.
> - *
> - * Based on x86_64 vsyscall gettimeofday 
> - * by Keith Owens and Andrea Arcangeli
> + * Copyrights:
> + * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
>   */
>  
>  #include <linux/spinlock.h>
> @@ -41,8 +20,8 @@
>  #include <asm/processor.h>
>  
>  /*
> - * The seqlock interface does not prescribe a precise sequence of read
> - * begin/retry/end. For readers, typically there is a call to
> + * The seqlock seqcount_t interface does not prescribe a precise sequence of
> + * read begin/retry/end. For readers, typically there is a call to
>   * read_seqcount_begin() and read_seqcount_retry(), however, there are more
>   * esoteric cases which do not follow this pattern.
>   *
> @@ -50,16 +29,30 @@
>   * via seqcount_t under KCSAN: upon beginning a seq-reader critical section,
>   * pessimistically mark the next KCSAN_SEQLOCK_REGION_MAX memory accesses as
>   * atomics; if there is a matching read_seqcount_retry() call, no following
> - * memory operations are considered atomic. Usage of seqlocks via seqlock_t
> - * interface is not affected.
> + * memory operations are considered atomic. Usage of the seqlock_t interface
> + * is not affected.
>   */
>  #define KCSAN_SEQLOCK_REGION_MAX 1000
>  
>  /*
> - * Version using sequence counter only.
> - * This can be used when code has its own mutex protecting the
> - * updating starting before the write_seqcountbeqin() and ending
> - * after the write_seqcount_end().
> + * Sequence counters (seqcount_t)
> + *
> + * This is the raw counting mechanism, without any writer protection.
> + *
> + * Write side critical sections must be serialized and non-preemptible.
> + *
> + * If readers can be invoked from hardirq or softirq contexts,
> + * interrupts or bottom halves must also be respectively disabled before
> + * entering the write section.
> + *
> + * This mechanism can't be used if the protected data contains pointers,
> + * as the writer can invalidate a pointer that a reader is following.
> + *
> + * If it's desired to automatically handle the sequence counter writer
> + * serialization and non-preemptibility requirements, use a sequential
> + * lock (seqlock_t) instead.
> + *
> + * See Documentation/locking/seqlock.rst
>   */
>  typedef struct seqcount {
>  	unsigned sequence;
> @@ -398,10 +391,6 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
>         smp_wmb();      /* increment "sequence" before following stores */
>  }
>  
> -/*
> - * Sequence counter only version assumes that callers are using their
> - * own mutexing.
> - */
>  static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
>  {
>  	raw_write_seqcount_begin(s);
> @@ -434,15 +423,21 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
>  	kcsan_nestable_atomic_end();
>  }
>  
> +/*
> + * Sequential locks (seqlock_t)
> + *
> + * Sequence counters with an embedded spinlock for writer serialization
> + * and non-preemptibility.
> + *
> + * For more info, see:
> + *    - Comments on top of seqcount_t
> + *    - Documentation/locking/seqlock.rst
> + */
>  typedef struct {
>  	struct seqcount seqcount;
>  	spinlock_t lock;
>  } seqlock_t;
>  
> -/*
> - * These macros triggered gcc-3.x compile-time problems.  We think these are
> - * OK now.  Be cautious.
> - */
>  #define __SEQLOCK_UNLOCKED(lockname)			\
>  	{						\
>  		.seqcount = SEQCNT_ZERO(lockname),	\


^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage
  2020-07-21  1:44     ` Steven Rostedt
@ 2020-07-21  1:51       ` Steven Rostedt
  2020-07-21  7:15         ` Ahmed S. Darwish
  0 siblings, 1 reply; 258+ messages in thread
From: Steven Rostedt @ 2020-07-21  1:51 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, LKML, Jonathan Corbet,
	linux-doc

On Mon, 20 Jul 2020 21:44:00 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > - * This is not as cache friendly as brlock. Also, this may not work well
> > - * for data that contains pointers, because any writer could
> > - * invalidate a pointer that a reader was following.
> > + * See Documentation/locking/seqlock.rst  
> 
> I absolutely hate it when I see this.
> 
> I'd much rather have the documentation next to the code. Honestly, I
> trust that comments next to the code are much more likely to get
> updated when the code changes than comments buried in the
> Documentation directory.
>
> It's also more likely that I won't even bother looking at the doc
> (because I won't trust it to be up to date) and will just read the
> code and try to figure it out. Or look at how others have used it.

Never mind,

I see that kerneldoc is added in patch 5, which helps.

-- Steve

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage
  2020-07-21  1:35     ` Steven Rostedt
  2020-07-21  1:37       ` Steven Rostedt
@ 2020-07-21  5:34       ` Ahmed S. Darwish
  1 sibling, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-21  5:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, LKML, Jonathan Corbet,
	linux-doc

On Mon, Jul 20, 2020 at 09:35:51PM -0400, Steven Rostedt wrote:
> On Mon, 20 Jul 2020 17:55:07 +0200
> "Ahmed S. Darwish" <a.darwish@linutronix.de> wrote:
> > +Read path, three categories:
> > +
> > +1. Normal Sequence readers which never block a writer but they must
> > +   retry if a writer is in progress by detecting change in the sequence
> > +   number.  Writers do not wait for a sequence reader::
> > +
> > +	do {
> > +		seq = read_seqbegin(&foo_seqlock);
> > +
> > +		/* ... [[read-side critical section]] ... */
> > +
> > +	} while (read_seqretry(&foo_seqlock, seq));
> > +
> > +2. Locking readers which will wait if a writer or another locking reader
> > +   is in progress. A locking reader in progress will also block a writer
> > +   from entering its critical section. This read lock is
> > +   exclusive. Unlike rwlock_t, only one locking reader can acquire it::
>
> Nit, but I would mention that this acts similar to a normal spin_lock,
> and even disables preeption (in the non-RT case).

will do.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage
  2020-07-21  1:51       ` Steven Rostedt
@ 2020-07-21  7:15         ` Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-07-21  7:15 UTC (permalink / raw)
  To: Steven Rostedt, Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Paul E. McKenney,
	Sebastian A. Siewior, LKML, Jonathan Corbet, linux-doc

On Mon, Jul 20, 2020 at 09:51:15PM -0400, Steven Rostedt wrote:
> On Mon, 20 Jul 2020 21:44:00 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > > - * This is not as cache friendly as brlock. Also, this may not work well
> > > - * for data that contains pointers, because any writer could
> > > - * invalidate a pointer that a reader was following.
> > > + * See Documentation/locking/seqlock.rst
> >
> > I absolutely hate it when I see this.
> >
> > I'd much rather have the documentation next to the code. Honestly, I
> > trust that comments next to the code are much more likely to get
> > updated when the code changes than comments buried in the
> > Documentation directory.
> >
> > It's also more likely that I won't even bother looking at the doc
> > (because I won't trust it to be up to date) and will just read the
> > code and try to figure it out. Or look at how others have used it.
>
> Never mind,
>
> I see that kerneldoc is added in patch 5, which helps.
>

Even looking at the current patch #1 *in isolation*, no relevant
comments were removed from seqlock.h at all. They were just moved closer
to the seqcount_t and seqlock_t structure definitions.

The original comments were horrible. They intermingled seqcount_t and
seqlock_t descriptions in a *very* ambiguous way, sometimes even within
the same sentence. It was misleading, as the usage constraints for each
data type (especially on the write side) are vastly different.

Even the new KCSAN comment, which was freshly merged this release cycle,
got tainted by this already-existing incoherence. It kept talking about
the "seqlock interface" while it actually meant the seqcount_t
interface. Then, at the end, it acknowledged the semantic ambiguity and
noted that the "seqlock interface" does *not* cover seqlock_t.

Moreover, because of this seqcount_t/seqlock_t documentation
intermingling, the critical usage constraint of disabling preemption
before entering a seqcount_t write side critical section was never
explicitly mentioned. This has (naturally) led to a considerable number
of buggy call sites, and the fixes are now merged.
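
(As a sketch, with "sc" standing in for any plain seqcount_t, the
constraint is:

	preempt_disable();	/* mandatory for a plain seqcount_t */
	write_seqcount_begin(&sc);
	/* ... update the protected data ... */
	write_seqcount_end(&sc);
	preempt_enable();

Without the preempt_disable/enable() pair, a writer preempted inside
the section leaves the sequence count odd, and spinning readers retry
until it runs again.)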

I've tried to fix this at its roots, and basically identified three
major problems:

  1. The seqcount_t/seqlock_t intermingling is confusing everyone. That
     is, both call-site developers *and* core kernel engineers.

  2. The big picture of seqlock.h was never expressed in sufficient
     detail.

  3. The usage constraints for the exported seqlock.h APIs were never
     explicitly mentioned, leading to buggy call sites.

To fix #1, I've changed all of the existing seqlock.h comments to
explicitly mention either "seqcount_t" or "seqlock_t". Then, I divided
the top seqlock.h comment into a seqcount_t part and a seqlock_t one.
Afterwards, the whole seqlock.h header file code was grouped into
"sections", as seen in patch #4.

To fix #2, Documentation/locking/seqlock.rst was introduced and boldly
referenced in the header-file top comment.

To fix #3, kernel-doc was added for all of the seqlock.h exported APIs.

@Peter, kindly note that all of the above *is exactly why* I've resisted
hiding the new seqcount_LOCKTYPE_t APIs introduced by this patch series
inside generative macros. All the parts that will be referenced by
call-site developers are kept both explicit and kernel-doc commented.

Thanks!

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 19/24] raid5: Use sequence counter with associated spinlock
  2020-07-20 15:55   ` [PATCH v4 19/24] raid5: " Ahmed S. Darwish
@ 2020-07-22  6:40     ` Song Liu
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  1 sibling, 0 replies; 258+ messages in thread
From: Song Liu @ 2020-07-22  6:40 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML,
	linux-raid

On Mon, Jul 20, 2020 at 8:57 AM Ahmed S. Darwish
<a.darwish@linutronix.de> wrote:
>
> A sequence counter write side critical section must be protected by some
> form of locking to serialize writers. A plain seqcount_t does not record
> which lock must be held when entering a write side critical section.
>
> Use the new seqcount_spinlock_t data type, which allows associating a
> spinlock with the sequence counter. This enables lockdep to verify that
> the spinlock used for writer serialization is held when the write side
> critical section is entered.
>
> If lockdep is disabled this lock association is compiled out and has
> neither storage size nor runtime overhead.
>
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>

Acked-by: Song Liu <song@kernel.org>

^ permalink raw reply	[flat|nested] 258+ messages in thread

* [tip: locking/core] kvm/eventfd: Use sequence counter with associated spinlock
  2020-07-20 15:55   ` [PATCH v4 23/24] kvm/eventfd: " Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), Paolo Bonzini, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     5c73b9a2b1b4ecc809a914aa64970157b3d8c936
Gitweb:        https://git.kernel.org/tip/5c73b9a2b1b4ecc809a914aa64970157b3d8c936
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:29 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:29 +02:00

kvm/eventfd: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20200720155530.1173732-24-a.darwish@linutronix.de
---
 include/linux/kvm_irqfd.h | 2 +-
 virt/kvm/eventfd.c        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index dc1da02..dac047a 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -42,7 +42,7 @@ struct kvm_kernel_irqfd {
 	wait_queue_entry_t wait;
 	/* Update side is protected by irqfds.lock */
 	struct kvm_kernel_irq_routing_entry irq_entry;
-	seqcount_t irq_entry_sc;
+	seqcount_spinlock_t irq_entry_sc;
 	/* Used for level IRQ fast-path */
 	int gsi;
 	struct work_struct inject;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index ef7ed91..d6408bb 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -303,7 +303,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	INIT_LIST_HEAD(&irqfd->list);
 	INIT_WORK(&irqfd->inject, irqfd_inject);
 	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
-	seqcount_init(&irqfd->irq_entry_sc);
+	seqcount_spinlock_init(&irqfd->irq_entry_sc, &kvm->irqfds.lock);
 
 	f = fdget(args->fd);
 	if (!f.file) {

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [tip: locking/core] hrtimer: Use sequence counter with associated raw spinlock
  2020-07-20 15:55   ` [PATCH v4 24/24] hrtimer: Use sequence counter with associated raw spinlock Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     af5a06b582ec3d7b0160b4faaa65f73d8dcf989f
Gitweb:        https://git.kernel.org/tip/af5a06b582ec3d7b0160b4faaa65f73d8dcf989f
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:30 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:29 +02:00

hrtimer: Use sequence counter with associated raw spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_raw_spinlock_t data type, which allows associating
a raw spinlock with the sequence counter. This enables lockdep to verify
that the raw spinlock used for writer serialization is held when the
write side critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-25-a.darwish@linutronix.de
---
 include/linux/hrtimer.h |  2 +-
 kernel/time/hrtimer.c   | 13 ++++++++++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 15c8ac3..25993b8 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -159,7 +159,7 @@ struct hrtimer_clock_base {
 	struct hrtimer_cpu_base	*cpu_base;
 	unsigned int		index;
 	clockid_t		clockid;
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct hrtimer		*running;
 	struct timerqueue_head	active;
 	ktime_t			(*get_time)(void);
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index d89da1c..c403851 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -135,7 +135,11 @@ static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
  * timer->base->cpu_base
  */
 static struct hrtimer_cpu_base migration_cpu_base = {
-	.clock_base = { { .cpu_base = &migration_cpu_base, }, },
+	.clock_base = { {
+		.cpu_base = &migration_cpu_base,
+		.seq      = SEQCNT_RAW_SPINLOCK_ZERO(migration_cpu_base.seq,
+						     &migration_cpu_base.lock),
+	}, },
 };
 
 #define migration_base	migration_cpu_base.clock_base[0]
@@ -1998,8 +2002,11 @@ int hrtimers_prepare_cpu(unsigned int cpu)
 	int i;
 
 	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
-		cpu_base->clock_base[i].cpu_base = cpu_base;
-		timerqueue_init_head(&cpu_base->clock_base[i].active);
+		struct hrtimer_clock_base *clock_b = &cpu_base->clock_base[i];
+
+		clock_b->cpu_base = cpu_base;
+		seqcount_raw_spinlock_init(&clock_b->seq, &cpu_base->lock);
+		timerqueue_init_head(&clock_b->active);
 	}
 
 	cpu_base->cpu = cpu;

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [tip: locking/core] userfaultfd: Use sequence counter with associated spinlock
  2020-07-20 15:55   ` [PATCH v4 22/24] userfaultfd: " Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     2ca97ac8bdcc31fdc868f40c73c017f0a648dc07
Gitweb:        https://git.kernel.org/tip/2ca97ac8bdcc31fdc868f40c73c017f0a648dc07
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:28 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:28 +02:00

userfaultfd: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-23-a.darwish@linutronix.de
---
 fs/userfaultfd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 52de290..26e8b23 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -61,7 +61,7 @@ struct userfaultfd_ctx {
 	/* waitqueue head for events */
 	wait_queue_head_t event_wqh;
 	/* a refile sequence protected by fault_pending_wqh lock */
-	struct seqcount refile_seq;
+	seqcount_spinlock_t refile_seq;
 	/* pseudo fd refcounting */
 	refcount_t refcount;
 	/* userfaultfd syscall flags */
@@ -1998,7 +1998,7 @@ static void init_once_userfaultfd_ctx(void *mem)
 	init_waitqueue_head(&ctx->fault_wqh);
 	init_waitqueue_head(&ctx->event_wqh);
 	init_waitqueue_head(&ctx->fd_wqh);
-	seqcount_init(&ctx->refile_seq);
+	seqcount_spinlock_init(&ctx->refile_seq, &ctx->fault_pending_wqh.lock);
 }
 
 SYSCALL_DEFINE1(userfaultfd, int, flags)

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [tip: locking/core] NFSv4: Use sequence counter with associated spinlock
  2020-07-20 15:55   ` [PATCH v4 21/24] NFSv4: " Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     76246c9219726c71e3f470344d8c6a566fa2535b
Gitweb:        https://git.kernel.org/tip/76246c9219726c71e3f470344d8c6a566fa2535b
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:27 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:28 +02:00

NFSv4: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-22-a.darwish@linutronix.de
---
 fs/nfs/nfs4_fs.h   | 2 +-
 fs/nfs/nfs4state.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 2b7f6dc..210e590 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -117,7 +117,7 @@ struct nfs4_state_owner {
 	unsigned long	     so_flags;
 	struct list_head     so_states;
 	struct nfs_seqid_counter so_seqid;
-	seqcount_t	     so_reclaim_seqcount;
+	seqcount_spinlock_t  so_reclaim_seqcount;
 	struct mutex	     so_delegreturn_mutex;
 };
 
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index a8dc25c..b1dba24 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -509,7 +509,7 @@ nfs4_alloc_state_owner(struct nfs_server *server,
 	nfs4_init_seqid_counter(&sp->so_seqid);
 	atomic_set(&sp->so_count, 1);
 	INIT_LIST_HEAD(&sp->so_lru);
-	seqcount_init(&sp->so_reclaim_seqcount);
+	seqcount_spinlock_init(&sp->so_reclaim_seqcount, &sp->so_lock);
 	mutex_init(&sp->so_delegreturn_mutex);
 	return sp;
 }

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [tip: locking/core] iocost: Use sequence counter with associated spinlock
  2020-07-20 15:55   ` [PATCH v4 20/24] iocost: " Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), Daniel Wagner, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     67b7b641ca69cafb467f7560316b09b8ff0fa5c9
Gitweb:        https://git.kernel.org/tip/67b7b641ca69cafb467f7560316b09b8ff0fa5c9
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:26 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:28 +02:00

iocost: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Link: https://lkml.kernel.org/r/20200720155530.1173732-21-a.darwish@linutronix.de
---
 block/blk-iocost.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 8ac4aad..8e940c2 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -406,7 +406,7 @@ struct ioc {
 	enum ioc_running		running;
 	atomic64_t			vtime_rate;
 
-	seqcount_t			period_seqcount;
+	seqcount_spinlock_t		period_seqcount;
 	u32				period_at;	/* wallclock starttime */
 	u64				period_at_vtime; /* vtime starttime */
 
@@ -873,7 +873,6 @@ static void ioc_now(struct ioc *ioc, struct ioc_now *now)
 
 static void ioc_start_period(struct ioc *ioc, struct ioc_now *now)
 {
-	lockdep_assert_held(&ioc->lock);
 	WARN_ON_ONCE(ioc->running != IOC_RUNNING);
 
 	write_seqcount_begin(&ioc->period_seqcount);
@@ -2001,7 +2000,7 @@ static int blk_iocost_init(struct request_queue *q)
 
 	ioc->running = IOC_IDLE;
 	atomic64_set(&ioc->vtime_rate, VTIME_PER_USEC);
-	seqcount_init(&ioc->period_seqcount);
+	seqcount_spinlock_init(&ioc->period_seqcount, &ioc->lock);
 	ioc->period_at = ktime_to_us(ktime_get());
 	atomic64_set(&ioc->cur_period, 0);
 	atomic_set(&ioc->hweight_gen, 0);

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [tip: locking/core] raid5: Use sequence counter with associated spinlock
  2020-07-20 15:55   ` [PATCH v4 19/24] raid5: " Ahmed S. Darwish
  2020-07-22  6:40     ` Song Liu
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  1 sibling, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), Song Liu, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     0a87b25ff2eb6169403c88b0d5f3c97bdaa3c930
Gitweb:        https://git.kernel.org/tip/0a87b25ff2eb6169403c88b0d5f3c97bdaa3c930
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:25 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:27 +02:00

raid5: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-20-a.darwish@linutronix.de
---
 drivers/md/raid5.c | 2 +-
 drivers/md/raid5.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ab8067f..892aefe 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6935,7 +6935,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
 	} else
 		goto abort;
 	spin_lock_init(&conf->device_lock);
-	seqcount_init(&conf->gen_lock);
+	seqcount_spinlock_init(&conf->gen_lock, &conf->device_lock);
 	mutex_init(&conf->cache_size_mutex);
 	init_waitqueue_head(&conf->wait_for_quiescent);
 	init_waitqueue_head(&conf->wait_for_stripe);
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index f90e070..a2c9e9e 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -589,7 +589,7 @@ struct r5conf {
 	int			prev_chunk_sectors;
 	int			prev_algo;
 	short			generation; /* increments with every reshape */
-	seqcount_t		gen_lock;	/* lock against generation changes */
+	seqcount_spinlock_t	gen_lock;	/* lock against generation changes */
 	unsigned long		reshape_checkpoint; /* Time we last updated
 						     * metadata */
 	long long		min_offset_diff; /* minimum difference between


* [tip: locking/core] timekeeping: Use sequence counter with associated raw spinlock
  2020-07-20 15:55   ` [PATCH v4 17/24] timekeeping: Use sequence counter with associated raw spinlock Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     025e82bcbc34cd071390e72fd0b31593282f9425
Gitweb:        https://git.kernel.org/tip/025e82bcbc34cd071390e72fd0b31593282f9425
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:23 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:27 +02:00

timekeeping: Use sequence counter with associated raw spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_raw_spinlock_t data type, which allows associating
a raw spinlock with the sequence counter. This enables lockdep to verify
that the raw spinlock used for writer serialization is held when the
write side critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
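
One detail worth noting in the diff: the static initializer takes the
lock's address, so the DEFINE_RAW_SPINLOCK() has to move above the
structures it serializes. A minimal standalone sketch of the same
pattern (hypothetical names, not from the patch):

	/* the lock must be defined first; the initializer stores &foo_lock */
	static DEFINE_RAW_SPINLOCK(foo_lock);

	static struct {
		seqcount_raw_spinlock_t	seq;
		u64			data;
	} foo = {
		.seq = SEQCNT_RAW_SPINLOCK_ZERO(foo.seq, &foo_lock),
	};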

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-18-a.darwish@linutronix.de
---
 kernel/time/timekeeping.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index d20d489..05ecfd8 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -39,18 +39,19 @@ enum timekeeping_adv_mode {
 	TK_ADV_FREQ
 };
 
+static DEFINE_RAW_SPINLOCK(timekeeper_lock);
+
 /*
  * The most important data for readout fits into a single 64 byte
  * cache line.
  */
 static struct {
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct timekeeper	timekeeper;
 } tk_core ____cacheline_aligned = {
-	.seq = SEQCNT_ZERO(tk_core.seq),
+	.seq = SEQCNT_RAW_SPINLOCK_ZERO(tk_core.seq, &timekeeper_lock),
 };
 
-static DEFINE_RAW_SPINLOCK(timekeeper_lock);
 static struct timekeeper shadow_timekeeper;
 
 /**
@@ -63,7 +64,7 @@ static struct timekeeper shadow_timekeeper;
  * See @update_fast_timekeeper() below.
  */
 struct tk_fast {
-	seqcount_t		seq;
+	seqcount_raw_spinlock_t	seq;
 	struct tk_read_base	base[2];
 };
 
@@ -80,11 +81,13 @@ static struct clocksource dummy_clock = {
 };
 
 static struct tk_fast tk_fast_mono ____cacheline_aligned = {
+	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_mono.seq, &timekeeper_lock),
 	.base[0] = { .clock = &dummy_clock, },
 	.base[1] = { .clock = &dummy_clock, },
 };
 
 static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
+	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_raw.seq, &timekeeper_lock),
 	.base[0] = { .clock = &dummy_clock, },
 	.base[1] = { .clock = &dummy_clock, },
 };
@@ -157,7 +160,7 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
  * tk_clock_read - atomic clocksource read() helper
  *
  * This helper is necessary to use in the read paths because, while the
- * seqlock ensures we don't return a bad value while structures are updated,
+ * seqcount ensures we don't return a bad value while structures are updated,
  * it doesn't protect from potential crashes. There is the possibility that
  * the tkr's clocksource may change between the read reference, and the
  * clock reference passed to the read function.  This can cause crashes if
@@ -222,10 +225,10 @@ static inline u64 timekeeping_get_delta(const struct tk_read_base *tkr)
 	unsigned int seq;
 
 	/*
-	 * Since we're called holding a seqlock, the data may shift
+	 * Since we're called holding a seqcount, the data may shift
 	 * under us while we're doing the calculation. This can cause
 	 * false positives, since we'd note a problem but throw the
-	 * results away. So nest another seqlock here to atomically
+	 * results away. So nest another seqcount here to atomically
 	 * grab the points we are checking with.
 	 */
 	do {
@@ -486,7 +489,7 @@ EXPORT_SYMBOL_GPL(ktime_get_raw_fast_ns);
  *
  * To keep it NMI safe since we're accessing from tracing, we're not using a
  * separate timekeeper with updates to monotonic clock and boot offset
- * protected with seqlocks. This has the following minor side effects:
+ * protected with seqcounts. This has the following minor side effects:
  *
 * (1) It's possible that a timestamp is taken after the boot offset is updated
  * but before the timekeeper is updated. If this happens, the new boot offset


* [tip: locking/core] vfs: Use sequence counter with associated spinlock
  2020-07-20 15:55   ` [PATCH v4 18/24] vfs: Use sequence counter with associated spinlock Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     26475371976c69489d3a8e6c8bbf35afbbc25055
Gitweb:        https://git.kernel.org/tip/26475371976c69489d3a8e6c8bbf35afbbc25055
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:24 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:27 +02:00

vfs: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
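
This patch exercises both initialization forms of the new type;
schematically (a sketch with a hypothetical struct mirroring
fs_struct, not taken from the patch):

	struct foo {
		spinlock_t		lock;
		seqcount_spinlock_t	seq;
	};

	/* dynamic init, as in copy_fs_struct(), given struct foo *f: */
	spin_lock_init(&f->lock);
	seqcount_spinlock_init(&f->seq, &f->lock);

	/* static init, as in init_fs: */
	struct foo foo = {
		.lock = __SPIN_LOCK_UNLOCKED(foo.lock),
		.seq  = SEQCNT_SPINLOCK_ZERO(foo.seq, &foo.lock),
	};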

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-19-a.darwish@linutronix.de
---
 fs/dcache.c               | 2 +-
 fs/fs_struct.c            | 4 ++--
 include/linux/dcache.h    | 2 +-
 include/linux/fs_struct.h | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 361ea7a..ea04858 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1746,7 +1746,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	dentry->d_lockref.count = 1;
 	dentry->d_flags = 0;
 	spin_lock_init(&dentry->d_lock);
-	seqcount_init(&dentry->d_seq);
+	seqcount_spinlock_init(&dentry->d_seq, &dentry->d_lock);
 	dentry->d_inode = NULL;
 	dentry->d_parent = dentry;
 	dentry->d_sb = sb;
diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index ca639ed..04b3f5b 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -117,7 +117,7 @@ struct fs_struct *copy_fs_struct(struct fs_struct *old)
 		fs->users = 1;
 		fs->in_exec = 0;
 		spin_lock_init(&fs->lock);
-		seqcount_init(&fs->seq);
+		seqcount_spinlock_init(&fs->seq, &fs->lock);
 		fs->umask = old->umask;
 
 		spin_lock(&old->lock);
@@ -163,6 +163,6 @@ EXPORT_SYMBOL(current_umask);
 struct fs_struct init_fs = {
 	.users		= 1,
 	.lock		= __SPIN_LOCK_UNLOCKED(init_fs.lock),
-	.seq		= SEQCNT_ZERO(init_fs.seq),
+	.seq		= SEQCNT_SPINLOCK_ZERO(init_fs.seq, &init_fs.lock),
 	.umask		= 0022,
 };
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index a81f0c3..65d975b 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -89,7 +89,7 @@ extern struct dentry_stat_t dentry_stat;
 struct dentry {
 	/* RCU lookup touched fields */
 	unsigned int d_flags;		/* protected by d_lock */
-	seqcount_t d_seq;		/* per dentry seqlock */
+	seqcount_spinlock_t d_seq;	/* per dentry seqlock */
 	struct hlist_bl_node d_hash;	/* lookup hash list */
 	struct dentry *d_parent;	/* parent directory */
 	struct qstr d_name;
diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
index cf1015a..783b48d 100644
--- a/include/linux/fs_struct.h
+++ b/include/linux/fs_struct.h
@@ -9,7 +9,7 @@
 struct fs_struct {
 	int users;
 	spinlock_t lock;
-	seqcount_t seq;
+	seqcount_spinlock_t seq;
 	int umask;
 	int in_exec;
 	struct path root, pwd;


* [tip: locking/core] xfrm: policy: Use sequence counters with associated lock
  2020-07-20 15:55   ` [PATCH v4 16/24] xfrm: policy: Use sequence counters with associated lock Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     77cc278f7b202e4f16f8596837219d02cb090b96
Gitweb:        https://git.kernel.org/tip/77cc278f7b202e4f16f8596837219d02cb090b96
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:22 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:27 +02:00

xfrm: policy: Use sequence counters with associated lock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive
does not disable preemption implicitly, preemption has to be explicitly
disabled before entering the sequence counter write side critical
section.

A plain seqcount_t does not record which lock must be held when entering
a write side critical section.

Use the new seqcount_spinlock_t and seqcount_mutex_t data types instead,
which allow associating a lock with the sequence counter. This enables
lockdep to verify that the lock used for writer serialization is held
when the write side critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
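
For the seqcount_mutex_t case, write_seqcount_begin()/end() also handle
the preemption requirement, since a mutex leaves preemption enabled. A
sketch of the resulting writer (illustrative only; the real writer is
the hash resize work serialized by hash_resize_mutex):

	mutex_lock(&hash_resize_mutex);
	/* asserts hash_resize_mutex is held, and disables preemption
	 * itself because the mutex does not: */
	write_seqcount_begin(&xfrm_policy_hash_generation);
	/* ... rebuild the policy hash tables ... */
	write_seqcount_end(&xfrm_policy_hash_generation);
	mutex_unlock(&hash_resize_mutex);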

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-17-a.darwish@linutronix.de
---
 net/xfrm/xfrm_policy.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 564aa64..732a940 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -122,7 +122,7 @@ struct xfrm_pol_inexact_bin {
 	/* list containing '*:*' policies */
 	struct hlist_head hhead;
 
-	seqcount_t count;
+	seqcount_spinlock_t count;
 	/* tree sorted by daddr/prefix */
 	struct rb_root root_d;
 
@@ -155,7 +155,7 @@ static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1]
 						__read_mostly;
 
 static struct kmem_cache *xfrm_dst_cache __ro_after_init;
-static __read_mostly seqcount_t xfrm_policy_hash_generation;
+static __read_mostly seqcount_mutex_t xfrm_policy_hash_generation;
 
 static struct rhashtable xfrm_policy_inexact_table;
 static const struct rhashtable_params xfrm_pol_inexact_params;
@@ -719,7 +719,7 @@ xfrm_policy_inexact_alloc_bin(const struct xfrm_policy *pol, u8 dir)
 	INIT_HLIST_HEAD(&bin->hhead);
 	bin->root_d = RB_ROOT;
 	bin->root_s = RB_ROOT;
-	seqcount_init(&bin->count);
+	seqcount_spinlock_init(&bin->count, &net->xfrm.xfrm_policy_lock);
 
 	prev = rhashtable_lookup_get_insert_key(&xfrm_policy_inexact_table,
 						&bin->k, &bin->head,
@@ -1906,7 +1906,7 @@ static int xfrm_policy_match(const struct xfrm_policy *pol,
 
 static struct xfrm_pol_inexact_node *
 xfrm_policy_lookup_inexact_addr(const struct rb_root *r,
-				seqcount_t *count,
+				seqcount_spinlock_t *count,
 				const xfrm_address_t *addr, u16 family)
 {
 	const struct rb_node *parent;
@@ -4153,7 +4153,7 @@ void __init xfrm_init(void)
 {
 	register_pernet_subsys(&xfrm_net_ops);
 	xfrm_dev_init();
-	seqcount_init(&xfrm_policy_hash_generation);
+	seqcount_mutex_init(&xfrm_policy_hash_generation, &hash_resize_mutex);
 	xfrm_input_init();
 
 #ifdef CONFIG_INET_ESPINTCP


* [tip: locking/core] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  2020-07-20 15:55   ` [PATCH v4 15/24] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     b901892b51317b321c7bc96e2ccd2f522d1380ee
Gitweb:        https://git.kernel.org/tip/b901892b51317b321c7bc96e2ccd2f522d1380ee
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:21 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:26 +02:00

netfilter: nft_set_rbtree: Use sequence counter with associated rwlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_rwlock_t data type, which allows associating an
rwlock with the sequence counter. This enables lockdep to verify that
the rwlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
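
Schematically, the writer pattern this validates (a sketch, not taken
from the patch):

	write_lock_bh(&priv->lock);
	/* lockdep: priv->lock must be write-held here */
	write_seqcount_begin(&priv->count);
	/* ... insert or remove rbtree elements ... */
	write_seqcount_end(&priv->count);
	write_unlock_bh(&priv->lock);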

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-16-a.darwish@linutronix.de
---
 net/netfilter/nft_set_rbtree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index b6aad3f..4b2834f 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -18,7 +18,7 @@
 struct nft_rbtree {
 	struct rb_root		root;
 	rwlock_t		lock;
-	seqcount_t		count;
+	seqcount_rwlock_t	count;
 	struct delayed_work	gc_work;
 };
 
@@ -523,7 +523,7 @@ static int nft_rbtree_init(const struct nft_set *set,
 	struct nft_rbtree *priv = nft_set_priv(set);
 
 	rwlock_init(&priv->lock);
-	seqcount_init(&priv->count);
+	seqcount_rwlock_init(&priv->count, &priv->lock);
 	priv->root = RB_ROOT;
 
 	INIT_DEFERRABLE_WORK(&priv->gc_work, nft_rbtree_gc);


* [tip: locking/core] netfilter: conntrack: Use sequence counter with associated spinlock
  2020-07-20 15:55   ` [PATCH v4 14/24] netfilter: conntrack: " Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     8201d923f492703a7d6c980cff3034759a452b86
Gitweb:        https://git.kernel.org/tip/8201d923f492703a7d6c980cff3034759a452b86
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:20 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:26 +02:00

netfilter: conntrack: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-15-a.darwish@linutronix.de
---
 include/net/netfilter/nf_conntrack.h | 2 +-
 net/netfilter/nf_conntrack_core.c    | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 90690e3..ea4e201 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -286,7 +286,7 @@ int nf_conntrack_hash_resize(unsigned int hashsize);
 
 extern struct hlist_nulls_head *nf_conntrack_hash;
 extern unsigned int nf_conntrack_htable_size;
-extern seqcount_t nf_conntrack_generation;
+extern seqcount_spinlock_t nf_conntrack_generation;
 extern unsigned int nf_conntrack_max;
 
 /* must be called with rcu read lock held */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index f33d72c..b597b5b 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -180,7 +180,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
 
 unsigned int nf_conntrack_max __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_max);
-seqcount_t nf_conntrack_generation __read_mostly;
+seqcount_spinlock_t nf_conntrack_generation __read_mostly;
 static unsigned int nf_conntrack_hash_rnd __read_mostly;
 
 static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple,
@@ -2600,7 +2600,8 @@ int nf_conntrack_init_start(void)
 	/* struct nf_ct_ext uses u8 to store offsets/size */
 	BUILD_BUG_ON(total_extension_size() > 255u);
 
-	seqcount_init(&nf_conntrack_generation);
+	seqcount_spinlock_init(&nf_conntrack_generation,
+			       &nf_conntrack_locks_all_lock);
 
 	for (i = 0; i < CONNTRACK_LOCKS; i++)
 		spin_lock_init(&nf_conntrack_locks[i]);


* [tip: locking/core] sched: tasks: Use sequence counter with associated spinlock
  2020-07-20 15:55   ` [PATCH v4 13/24] sched: tasks: Use sequence counter with associated spinlock Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     b75058614fdd3140074a640b514f6a0b4d485a2d
Gitweb:        https://git.kernel.org/tip/b75058614fdd3140074a640b514f6a0b4d485a2d
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:19 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:26 +02:00

sched: tasks: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not record
which lock must be held when entering a write side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-14-a.darwish@linutronix.de
---
 include/linux/sched.h | 2 +-
 init/init_task.c      | 3 ++-
 kernel/fork.c         | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8d1de02..9a9d826 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1050,7 +1050,7 @@ struct task_struct {
 	/* Protected by ->alloc_lock: */
 	nodemask_t			mems_allowed;
 	/* Sequence number to catch updates: */
-	seqcount_t			mems_allowed_seq;
+	seqcount_spinlock_t		mems_allowed_seq;
 	int				cpuset_mem_spread_rotor;
 	int				cpuset_slab_spread_rotor;
 #endif
diff --git a/init/init_task.c b/init/init_task.c
index 15089d1..94fe3ba 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -154,7 +154,8 @@ struct task_struct init_task
 	.trc_holdout_list = LIST_HEAD_INIT(init_task.trc_holdout_list),
 #endif
 #ifdef CONFIG_CPUSETS
-	.mems_allowed_seq = SEQCNT_ZERO(init_task.mems_allowed_seq),
+	.mems_allowed_seq = SEQCNT_SPINLOCK_ZERO(init_task.mems_allowed_seq,
+						 &init_task.alloc_lock),
 #endif
 #ifdef CONFIG_RT_MUTEXES
 	.pi_waiters	= RB_ROOT_CACHED,
diff --git a/kernel/fork.c b/kernel/fork.c
index 70d9d0a..fc72f09 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2032,7 +2032,7 @@ static __latent_entropy struct task_struct *copy_process(
 #ifdef CONFIG_CPUSETS
 	p->cpuset_mem_spread_rotor = NUMA_NO_NODE;
 	p->cpuset_slab_spread_rotor = NUMA_NO_NODE;
-	seqcount_init(&p->mems_allowed_seq);
+	seqcount_spinlock_init(&p->mems_allowed_seq, &p->alloc_lock);
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;


* [tip: locking/core] dma-buf: Remove custom seqcount lockdep class key
  2020-07-20 15:55   ` [PATCH v4 11/24] dma-buf: Remove custom seqcount lockdep class key Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ahmed S. Darwish, Peter Zijlstra (Intel),
	Sebastian Andrzej Siewior, Daniel Vetter, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     318ce71f3e3ae4108c1665f3860afa8a2a4c9f02
Gitweb:        https://git.kernel.org/tip/318ce71f3e3ae4108c1665f3860afa8a2a4c9f02
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:17 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:25 +02:00

dma-buf: Remove custom seqcount lockdep class key

Commit 3c3b177a9369 ("reservation: add support for read-only access
using rcu") introduced a sequence counter to manage updates to
reservations. Back then, the reservation object initializer
reservation_object_init() was always inlined.

Having the sequence counter initialization inlined meant that each of
the call sites would have a different lockdep class key, which would
have broken lockdep's deadlock detection. The aforementioned commit thus
introduced, and exported, a custom seqcount lockdep class key and name.

Commit 8735f16803f00 ("dma-buf: cleanup reservation_object_init...")
transformed the reservation object initializer into a normal,
non-inlined C function. seqcount_init(), which automatically defines the
seqcount lockdep class key and must be expanded at a single non-inlined
call site, can now be safely used.

Remove the seqcount custom lockdep class key, name, and export. Use
seqcount_init() inside the dma reservation object initializer.
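
For reference, this is safe because seqcount_init() expands a static
lock_class_key at its (now unique, non-inlined) expansion site, as
defined in include/linux/seqlock.h:

	# define seqcount_init(s)				\
		do {						\
			static struct lock_class_key __key;	\
			__seqcount_init((s), #s, &__key);	\
		} while (0)

Were dma_resv_init() still inlined, every expansion would get its own
__key, and thus its own lockdep class.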

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://lkml.kernel.org/r/20200720155530.1173732-12-a.darwish@linutronix.de
---
 drivers/dma-buf/dma-resv.c |  9 +--------
 include/linux/dma-resv.h   |  2 --
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index b45f851..15efa0c 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -51,12 +51,6 @@
 DEFINE_WD_CLASS(reservation_ww_class);
 EXPORT_SYMBOL(reservation_ww_class);
 
-struct lock_class_key reservation_seqcount_class;
-EXPORT_SYMBOL(reservation_seqcount_class);
-
-const char reservation_seqcount_string[] = "reservation_seqcount";
-EXPORT_SYMBOL(reservation_seqcount_string);
-
 /**
  * dma_resv_list_alloc - allocate fence list
  * @shared_max: number of fences we need space for
@@ -135,9 +129,8 @@ subsys_initcall(dma_resv_lockdep);
 void dma_resv_init(struct dma_resv *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
+	seqcount_init(&obj->seq);
 
-	__seqcount_init(&obj->seq, reservation_seqcount_string,
-			&reservation_seqcount_class);
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
 }
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index ee50d10..a6538ae 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -46,8 +46,6 @@
 #include <linux/rcupdate.h>
 
 extern struct ww_class reservation_ww_class;
-extern struct lock_class_key reservation_seqcount_class;
-extern const char reservation_seqcount_string[];
 
 /**
  * struct dma_resv_list - a list of shared fences


* [tip: locking/core] dma-buf: Use sequence counter with associated wound/wait mutex
  2020-07-20 15:55   ` [PATCH v4 12/24] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), Daniel Vetter, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     cd29f22019ec4ab998d2e1e8c831c7c42db4aa7d
Gitweb:        https://git.kernel.org/tip/cd29f22019ec4ab998d2e1e8c831c7c42db4aa7d
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:18 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:25 +02:00

dma-buf: Use sequence counter with associated wound/wait mutex

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive
does not disable preemption implicitly, preemption has to be explicitly
disabled before entering the sequence counter write side critical
section.

The dma-buf reservation subsystem uses plain sequence counters to manage
updates to reservations. Writer serialization is accomplished through a
wound/wait mutex.

Acquiring a wound/wait mutex does not disable preemption, so this needs
to be done manually before and after the write side critical section.

Use the newly-added seqcount_ww_mutex_t instead:

  - It associates the ww_mutex with the sequence count, which enables
    lockdep to validate that the write side critical section is properly
    serialized.

  - It removes the need to explicitly add preempt_disable/enable()
    around the write side critical section because the write_begin/end()
    functions for this new data type automatically do this.

If lockdep is disabled, this ww_mutex lock association is compiled out
and has neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://lkml.kernel.org/r/20200720155530.1173732-13-a.darwish@linutronix.de
---
 drivers/dma-buf/dma-resv.c                       | 8 +-------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 --
 include/linux/dma-resv.h                         | 2 +-
 3 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 15efa0c..a763135 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -129,7 +129,7 @@ subsys_initcall(dma_resv_lockdep);
 void dma_resv_init(struct dma_resv *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
-	seqcount_init(&obj->seq);
+	seqcount_ww_mutex_init(&obj->seq, &obj->lock);
 
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
@@ -260,7 +260,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 	fobj = dma_resv_get_list(obj);
 	count = fobj->shared_count;
 
-	preempt_disable();
 	write_seqcount_begin(&obj->seq);
 
 	for (i = 0; i < count; ++i) {
@@ -282,7 +281,6 @@ replace:
 	smp_store_mb(fobj->shared_count, count);
 
 	write_seqcount_end(&obj->seq);
-	preempt_enable();
 	dma_fence_put(old);
 }
 EXPORT_SYMBOL(dma_resv_add_shared_fence);
@@ -309,14 +307,12 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 	if (fence)
 		dma_fence_get(fence);
 
-	preempt_disable();
 	write_seqcount_begin(&obj->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(obj->fence_excl, fence);
 	if (old)
 		old->shared_count = 0;
 	write_seqcount_end(&obj->seq);
-	preempt_enable();
 
 	/* inplace update, no shared fences */
 	while (i--)
@@ -394,13 +390,11 @@ retry:
 	src_list = dma_resv_get_list(dst);
 	old = dma_resv_get_excl(dst);
 
-	preempt_disable();
 	write_seqcount_begin(&dst->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(dst->fence_excl, new);
 	RCU_INIT_POINTER(dst->fence, dst_list);
 	write_seqcount_end(&dst->seq);
-	preempt_enable();
 
 	dma_resv_list_free(src_list);
 	dma_fence_put(old);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index b91b517..ff4b583 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -258,11 +258,9 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
 	new->shared_count = k;
 
 	/* Install the new fence list, seqcount provides the barriers */
-	preempt_disable();
 	write_seqcount_begin(&resv->seq);
 	RCU_INIT_POINTER(resv->fence, new);
 	write_seqcount_end(&resv->seq);
-	preempt_enable();
 
 	/* Drop the references to the removed fences or move them to ef_list */
 	for (i = j, k = 0; i < old->shared_count; ++i) {
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index a6538ae..d44a77e 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -69,7 +69,7 @@ struct dma_resv_list {
  */
 struct dma_resv {
 	struct ww_mutex lock;
-	seqcount_t seq;
+	seqcount_ww_mutex_t seq;
 
 	struct dma_fence __rcu *fence_excl;
 	struct dma_resv_list __rcu *fence;


* [tip: locking/core] seqlock: Align multi-line macros newline escapes at 72 columns
  2020-07-20 15:55   ` [PATCH v4 10/24] seqlock: Align multi-line macros newline escapes at 72 columns Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     ec8702da570ebb59f38471007bf71359c51b027b
Gitweb:        https://git.kernel.org/tip/ec8702da570ebb59f38471007bf71359c51b027b
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:16 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:25 +02:00

seqlock: Align multi-line macros newline escapes at 72 columns

The parent commit, "seqlock: Extend seqcount API with associated
locks", introduced a large number of multi-line macros whose newline
escapes are aligned at 72 columns.

For overall cohesion, align the pre-existing macros the same way.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-11-a.darwish@linutronix.de
---
 include/linux/seqlock.h | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 8c16a49..b487299 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -80,17 +80,18 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
 }
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define SEQCOUNT_DEP_MAP_INIT(lockname) \
-		.dep_map = { .name = #lockname } \
+
+# define SEQCOUNT_DEP_MAP_INIT(lockname)				\
+		.dep_map = { .name = #lockname }
 
 /**
  * seqcount_init() - runtime initializer for seqcount_t
  * @s: Pointer to the seqcount_t instance
  */
-# define seqcount_init(s)				\
-	do {						\
-		static struct lock_class_key __key;	\
-		__seqcount_init((s), #s, &__key);	\
+# define seqcount_init(s)						\
+	do {								\
+		static struct lock_class_key __key;			\
+		__seqcount_init((s), #s, &__key);			\
 	} while (0)
 
 static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
@@ -842,20 +843,20 @@ typedef struct {
 	spinlock_t lock;
 } seqlock_t;
 
-#define __SEQLOCK_UNLOCKED(lockname)			\
-	{						\
-		.seqcount = SEQCNT_ZERO(lockname),	\
-		.lock =	__SPIN_LOCK_UNLOCKED(lockname)	\
+#define __SEQLOCK_UNLOCKED(lockname)					\
+	{								\
+		.seqcount = SEQCNT_ZERO(lockname),			\
+		.lock =	__SPIN_LOCK_UNLOCKED(lockname)			\
 	}
 
 /**
  * seqlock_init() - dynamic initializer for seqlock_t
  * @sl: Pointer to the seqlock_t instance
  */
-#define seqlock_init(sl)				\
-	do {						\
-		seqcount_init(&(sl)->seqcount);		\
-		spin_lock_init(&(sl)->lock);		\
+#define seqlock_init(sl)						\
+	do {								\
+		seqcount_init(&(sl)->seqcount);				\
+		spin_lock_init(&(sl)->lock);				\
 	} while (0)
 
 /**


* [tip: locking/core] seqlock: lockdep assert non-preemptibility on seqcount_t write
  2020-07-20 15:55   ` [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  2020-08-08 23:21     ` [PATCH v4 08/24] " Guenter Roeck
  1 sibling, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     859247d39fb008ea812e8f0c398a58a20c12899e
Gitweb:        https://git.kernel.org/tip/859247d39fb008ea812e8f0c398a58a20c12899e
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:14 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:24 +02:00

seqlock: lockdep assert non-preemptibility on seqcount_t write

Preemption must be disabled before entering a sequence count write side
critical section.  Otherwise, a seqcount reader can preempt the write
side section and spin for the entire scheduler tick.  If that reader
belongs to a real-time scheduling class, it can spin forever and the
kernel will livelock.

Assert through lockdep that preemption is disabled for seqcount writers.
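
A plain seqcount_t writer that satisfies the new assertion looks like
this (a sketch with hypothetical names, not from the patch):

	spin_lock(&data_lock);		/* disables preemption */
	write_seqcount_begin(&data_seq);
	/* ... update the protected data ... */
	write_seqcount_end(&data_seq);
	spin_unlock(&data_lock);

A writer serialized by, e.g., a mutex instead needs an explicit
preempt_disable()/preempt_enable() pair around the critical section.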

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-9-a.darwish@linutronix.de
---
 include/linux/seqlock.h | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index e885702..54bc204 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -266,6 +266,12 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
+{
+	raw_write_seqcount_begin(s);
+	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
+}
+
 /**
  * write_seqcount_begin_nested() - start a seqcount_t write section with
  *                                 custom lockdep nesting level
@@ -276,8 +282,19 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
-	raw_write_seqcount_begin(s);
-	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
+	lockdep_assert_preemption_disabled();
+	__write_seqcount_begin_nested(s, subclass);
+}
+
+/*
+ * A write_seqcount_begin() variant w/o lockdep non-preemptibility checks.
+ *
+ * Use for internal seqlock.h code where it's known that preemption is
+ * already disabled. For example, seqlock_t write side functions.
+ */
+static inline void __write_seqcount_begin(seqcount_t *s)
+{
+	__write_seqcount_begin_nested(s, 0);
 }
 
 /**
@@ -575,7 +592,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -601,7 +618,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -628,7 +645,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -649,7 +666,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 	return flags;
 }
 


* [tip: locking/core] seqlock: Extend seqcount API with associated locks
  2020-07-20 15:55   ` [PATCH v4 09/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     55f3560df975f557c48aa6afc636808f31ecb87a
Gitweb:        https://git.kernel.org/tip/55f3560df975f557c48aa6afc636808f31ecb87a
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:15 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:25 +02:00

seqlock: Extend seqcount API with associated locks

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive
does not disable preemption implicitly, preemption has to be explicitly
disabled before entering the write side critical section.

There is no built-in debugging mechanism to verify that the lock used
for writer serialization is held and preemption is disabled. Some usage
sites like dma-buf have explicit lockdep checks for the writer-side
lock, but this covers only a small portion of the sequence counter usage
in the kernel.

Add new sequence counter types which allow associating a lock with the
sequence counter at initialization time. The seqcount API functions are
extended to provide appropriate lockdep assertions depending on the
seqcount/lock type.

For sequence counters with associated locks that do not implicitly
disable preemption, preemption protection is enforced in the sequence
counter write side functions. This removes the need to explicitly add
preempt_disable/enable() around the write side critical sections: the
write_begin/end() functions for these new sequence counter types
automatically do this.

Introduce the following seqcount types with associated locks:

     seqcount_spinlock_t
     seqcount_raw_spinlock_t
     seqcount_rwlock_t
     seqcount_mutex_t
     seqcount_ww_mutex_t

Extend the seqcount read and write functions to branch out to the
specific seqcount_LOCKTYPE_t implementation at compile time. This avoids
kernel API explosion for each new seqcount_LOCKTYPE_t added. Add such
compile-time type detection logic into a new, internal seqlock header.

Document the proper seqcount_LOCKTYPE_t usage, and rationale, at
Documentation/locking/seqlock.rst.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
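
For callers, the effect is that the same generic macros accept any of
the variants, with _Generic() selecting the matching hooks at compile
time. A sketch with hypothetical names:

	static DEFINE_SPINLOCK(data_lock);
	static seqcount_spinlock_t data_seq =
		SEQCNT_SPINLOCK_ZERO(data_seq, &data_lock);
	unsigned int start;

	do {
		/* same reader API as for a plain seqcount_t: */
		start = read_seqcount_begin(&data_seq);
		/* ... read the protected data ... */
	} while (read_seqcount_retry(&data_seq, start));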

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-10-a.darwish@linutronix.de
---
 Documentation/locking/seqlock.rst |  52 +++-
 include/linux/seqlock.h           | 464 ++++++++++++++++++++++++-----
 2 files changed, 447 insertions(+), 69 deletions(-)

diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
index 366dd36..62c5ad9 100644
--- a/Documentation/locking/seqlock.rst
+++ b/Documentation/locking/seqlock.rst
@@ -87,6 +87,58 @@ Read path::
 	} while (read_seqcount_retry(&foo_seqcount, seq));
 
 
+.. _seqcount_locktype_t:
+
+Sequence counters with associated locks (``seqcount_LOCKTYPE_t``)
+-----------------------------------------------------------------
+
+As discussed at :ref:`seqcount_t`, sequence count write side critical
+sections must be serialized and non-preemptible. This variant of
+sequence counters associate the lock used for writer serialization at
+initialization time, which enables lockdep to validate that the write
+side critical sections are properly serialized.
+
+This lock association is a NOOP if lockdep is disabled and has neither
+storage nor runtime overhead. If lockdep is enabled, the lock pointer is
+stored in struct seqcount and lockdep's "lock is held" assertions are
+injected at the beginning of the write side critical section to validate
+that it is properly protected.
+
+For lock types which do not implicitly disable preemption, preemption
+protection is enforced in the write side function.
+
+The following sequence counters with associated locks are defined:
+
+  - ``seqcount_spinlock_t``
+  - ``seqcount_raw_spinlock_t``
+  - ``seqcount_rwlock_t``
+  - ``seqcount_mutex_t``
+  - ``seqcount_ww_mutex_t``
+
+The plain seqcount read and write APIs branch out to the specific
+seqcount_LOCKTYPE_t implementation at compile time. This avoids kernel
+API explosion for each new seqcount LOCKTYPE.
+
+Initialization (replace "LOCKTYPE" with one of the supported locks)::
+
+	/* dynamic */
+	seqcount_LOCKTYPE_t foo_seqcount;
+	seqcount_LOCKTYPE_init(&foo_seqcount, &lock);
+
+	/* static */
+	static seqcount_LOCKTYPE_t foo_seqcount =
+		SEQCNT_LOCKTYPE_ZERO(foo_seqcount, &lock);
+
+	/* C99 struct init */
+	struct {
+		seqcount_LOCKTYPE_t seq;
+	} foo = {
+		.seq = SEQCNT_LOCKTYPE_ZERO(foo.seq, &lock),
+	};
+
+Write path: same as in :ref:`seqcount_t`, while running from a context
+with the associated LOCKTYPE lock acquired.
+
+Read path: same as in :ref:`seqcount_t`.
+
 .. _seqlock_t:
 
 Sequential locks (``seqlock_t``)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 54bc204..8c16a49 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -10,13 +10,17 @@
  *
  * Copyrights:
  * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
+ * - Sequence counters with associated locks, (C) 2020 Linutronix GmbH
  */
 
-#include <linux/spinlock.h>
-#include <linux/preempt.h>
-#include <linux/lockdep.h>
 #include <linux/compiler.h>
 #include <linux/kcsan-checks.h>
+#include <linux/lockdep.h>
+#include <linux/mutex.h>
+#include <linux/preempt.h>
+#include <linux/spinlock.h>
+#include <linux/ww_mutex.h>
+
 #include <asm/processor.h>
 
 /*
@@ -48,6 +52,10 @@
  * This mechanism can't be used if the protected data contains pointers,
  * as the writer can invalidate a pointer that a reader is following.
  *
+ * If the write serialization mechanism is one of the common kernel
+ * locking primitives, use a sequence counter with associated lock
+ * (seqcount_LOCKTYPE_t) instead.
+ *
  * If it's desired to automatically handle the sequence counter writer
  * serialization and non-preemptibility requirements, use a sequential
  * lock (seqlock_t) instead.
@@ -108,9 +116,267 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  */
 #define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
 
+/*
+ * Sequence counters with associated locks (seqcount_LOCKTYPE_t)
+ *
+ * A sequence counter which associates the lock used for writer
+ * serialization at initialization time. This enables lockdep to validate
+ * that the write side critical section is properly serialized.
+ *
+ * For associated locks which do not implicitly disable preemption,
+ * preemption protection is enforced in the write side function.
+ *
+ * Lockdep is never used in any of the raw write variants.
+ *
+ * See Documentation/locking/seqlock.rst
+ */
+
+#ifdef CONFIG_LOCKDEP
+#define __SEQ_LOCKDEP(expr)	expr
+#else
+#define __SEQ_LOCKDEP(expr)
+#endif
+
+#define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) {			\
+	.seqcount		= SEQCNT_ZERO(seq_name.seqcount),	\
+	__SEQ_LOCKDEP(.lock	= (assoc_lock))				\
+}
+
+#define seqcount_locktype_init(s, assoc_lock)				\
+do {									\
+	seqcount_init(&(s)->seqcount);					\
+	__SEQ_LOCKDEP((s)->lock = (assoc_lock));			\
+} while (0)
+
+/**
+ * typedef seqcount_spinlock_t - sequence counter with spinlock associated
+ * @seqcount:	The real sequence counter
+ * @lock:	Pointer to the associated spinlock
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * spinlock. The spinlock is associated to the sequence count in the
+ * static initializer or init function. This enables lockdep to validate
+ * that the write side critical section is properly serialized.
+ */
+typedef struct seqcount_spinlock {
+	seqcount_t	seqcount;
+	__SEQ_LOCKDEP(spinlock_t	*lock);
+} seqcount_spinlock_t;
+
+/**
+ * SEQCNT_SPINLOCK_ZERO - static initializer for seqcount_spinlock_t
+ * @name:	Name of the seqcount_spinlock_t instance
+ * @lock:	Pointer to the associated spinlock
+ */
+#define SEQCNT_SPINLOCK_ZERO(name, lock)				\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_spinlock_init - runtime initializer for seqcount_spinlock_t
+ * @s:		Pointer to the seqcount_spinlock_t instance
+ * @lock:	Pointer to the associated spinlock
+ */
+#define seqcount_spinlock_init(s, lock)					\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_raw_spinlock_t - sequence count with raw spinlock associated
+ * @seqcount:	The real sequence counter
+ * @lock:	Pointer to the associated raw spinlock
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * raw spinlock. The raw spinlock is associated to the sequence count in
+ * the static initializer or init function. This enables lockdep to
+ * validate that the write side critical section is properly serialized.
+ */
+typedef struct seqcount_raw_spinlock {
+	seqcount_t      seqcount;
+	__SEQ_LOCKDEP(raw_spinlock_t	*lock);
+} seqcount_raw_spinlock_t;
+
+/**
+ * SEQCNT_RAW_SPINLOCK_ZERO - static initializer for seqcount_raw_spinlock_t
+ * @name:	Name of the seqcount_raw_spinlock_t instance
+ * @lock:	Pointer to the associated raw_spinlock
+ */
+#define SEQCNT_RAW_SPINLOCK_ZERO(name, lock)				\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_raw_spinlock_init - runtime initializer for seqcount_raw_spinlock_t
+ * @s:		Pointer to the seqcount_raw_spinlock_t instance
+ * @lock:	Pointer to the associated raw_spinlock
+ */
+#define seqcount_raw_spinlock_init(s, lock)				\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_rwlock_t - sequence count with rwlock associated
+ * @seqcount:	The real sequence counter
+ * @lock:	Pointer to the associated rwlock
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * rwlock. The rwlock is associated to the sequence count in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ */
+typedef struct seqcount_rwlock {
+	seqcount_t      seqcount;
+	__SEQ_LOCKDEP(rwlock_t		*lock);
+} seqcount_rwlock_t;
+
+/**
+ * SEQCNT_RWLOCK_ZERO - static initializer for seqcount_rwlock_t
+ * @name:	Name of the seqcount_rwlock_t instance
+ * @lock:	Pointer to the associated rwlock
+ */
+#define SEQCNT_RWLOCK_ZERO(name, lock)					\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_rwlock_init - runtime initializer for seqcount_rwlock_t
+ * @s:		Pointer to the seqcount_rwlock_t instance
+ * @lock:	Pointer to the associated rwlock
+ */
+#define seqcount_rwlock_init(s, lock)					\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_mutex_t - sequence count with mutex associated
+ * @seqcount:	The real sequence counter
+ * @lock:	Pointer to the associated mutex
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * mutex. The mutex is associated to the sequence counter in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ *
+ * The write side API functions write_seqcount_begin()/end() automatically
+ * disable and enable preemption when used with seqcount_mutex_t.
+ */
+typedef struct seqcount_mutex {
+	seqcount_t      seqcount;
+	__SEQ_LOCKDEP(struct mutex	*lock);
+} seqcount_mutex_t;
+
+/**
+ * SEQCNT_MUTEX_ZERO - static initializer for seqcount_mutex_t
+ * @name:	Name of the seqcount_mutex_t instance
+ * @lock:	Pointer to the associated mutex
+ */
+#define SEQCNT_MUTEX_ZERO(name, lock)					\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_mutex_init - runtime initializer for seqcount_mutex_t
+ * @s:		Pointer to the seqcount_mutex_t instance
+ * @lock:	Pointer to the associated mutex
+ */
+#define seqcount_mutex_init(s, lock)					\
+	seqcount_locktype_init(s, lock)
+
+/**
+ * typedef seqcount_ww_mutex_t - sequence count with ww_mutex associated
+ * @seqcount:	The real sequence counter
+ * @lock:	Pointer to the associated ww_mutex
+ *
+ * A plain sequence counter with external writer synchronization by a
+ * ww_mutex. The ww_mutex is associated to the sequence counter in the static
+ * initializer or init function. This enables lockdep to validate that
+ * the write side critical section is properly serialized.
+ *
+ * The write side API functions write_seqcount_begin()/end() automatically
+ * disable and enable preemption when used with seqcount_ww_mutex_t.
+ */
+typedef struct seqcount_ww_mutex {
+	seqcount_t      seqcount;
+	__SEQ_LOCKDEP(struct ww_mutex	*lock);
+} seqcount_ww_mutex_t;
+
+/**
+ * SEQCNT_WW_MUTEX_ZERO - static initializer for seqcount_ww_mutex_t
+ * @name:	Name of the seqcount_ww_mutex_t instance
+ * @lock:	Pointer to the associated ww_mutex
+ */
+#define SEQCNT_WW_MUTEX_ZERO(name, lock)				\
+	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
+
+/**
+ * seqcount_ww_mutex_init - runtime initializer for seqcount_ww_mutex_t
+ * @s:		Pointer to the seqcount_ww_mutex_t instance
+ * @lock:	Pointer to the associated ww_mutex
+ */
+#define seqcount_ww_mutex_init(s, lock)					\
+	seqcount_locktype_init(s, lock)
+
+/*
+ * @preempt: Is the associated write serialization lock preemptible?
+ */
+#define SEQCOUNT_LOCKTYPE(locktype, preempt, lockmember)		\
+static inline seqcount_t *						\
+__seqcount_##locktype##_ptr(seqcount_##locktype##_t *s)			\
+{									\
+	return &s->seqcount;						\
+}									\
+									\
+static inline bool							\
+__seqcount_##locktype##_preemptible(seqcount_##locktype##_t *s)		\
+{									\
+	return preempt;							\
+}									\
+									\
+static inline void							\
+__seqcount_##locktype##_assert(seqcount_##locktype##_t *s)		\
+{									\
+	__SEQ_LOCKDEP(lockdep_assert_held(lockmember));			\
+}
+
+/*
+ * Similar hooks, but for plain seqcount_t
+ */
+
+static inline seqcount_t *__seqcount_ptr(seqcount_t *s)
+{
+	return s;
+}
+
+static inline bool __seqcount_preemptible(seqcount_t *s)
+{
+	return false;
+}
+
+static inline void __seqcount_assert(seqcount_t *s)
+{
+	lockdep_assert_preemption_disabled();
+}
+
+/*
+ * @s: Pointer to seqcount_locktype_t, the generated hooks' first parameter.
+ */
+SEQCOUNT_LOCKTYPE(raw_spinlock,	false,	s->lock)
+SEQCOUNT_LOCKTYPE(spinlock,	false,	s->lock)
+SEQCOUNT_LOCKTYPE(rwlock,	false,	s->lock)
+SEQCOUNT_LOCKTYPE(mutex,	true,	s->lock)
+SEQCOUNT_LOCKTYPE(ww_mutex,	true,	&s->lock->base)
+
+#define __seqprop_case(s, locktype, prop)				\
+	seqcount_##locktype##_t: __seqcount_##locktype##_##prop((void *)(s))
+
+#define __seqprop(s, prop) _Generic(*(s),				\
+	seqcount_t:		__seqcount_##prop((void *)(s)),		\
+	__seqprop_case((s),	raw_spinlock,	prop),			\
+	__seqprop_case((s),	spinlock,	prop),			\
+	__seqprop_case((s),	rwlock,		prop),			\
+	__seqprop_case((s),	mutex,		prop),			\
+	__seqprop_case((s),	ww_mutex,	prop))
+
+#define __to_seqcount_t(s)				__seqprop(s, ptr)
+#define __associated_lock_exists_and_is_preemptible(s)	__seqprop(s, preemptible)
+#define __assert_write_section_is_protected(s)		__seqprop(s, assert)
+
 /**
  * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
  * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@@ -122,7 +388,10 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  *
  * Return: count to be passed to read_seqcount_retry()
  */
-static inline unsigned __read_seqcount_begin(const seqcount_t *s)
+#define __read_seqcount_begin(s)					\
+	__read_seqcount_t_begin(__to_seqcount_t(s))
+
+static inline unsigned __read_seqcount_t_begin(const seqcount_t *s)
 {
 	unsigned ret;
 
@@ -138,32 +407,38 @@ repeat:
 
 /**
  * raw_read_seqcount_begin() - begin a seqcount_t read section w/o lockdep
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * Return: count to be passed to read_seqcount_retry()
  */
-static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
+#define raw_read_seqcount_begin(s)					\
+	raw_read_seqcount_t_begin(__to_seqcount_t(s))
+
+static inline unsigned raw_read_seqcount_t_begin(const seqcount_t *s)
 {
-	unsigned ret = __read_seqcount_begin(s);
+	unsigned ret = __read_seqcount_t_begin(s);
 	smp_rmb();
 	return ret;
 }
 
 /**
  * read_seqcount_begin() - begin a seqcount_t read critical section
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * Return: count to be passed to read_seqcount_retry()
  */
-static inline unsigned read_seqcount_begin(const seqcount_t *s)
+#define read_seqcount_begin(s)						\
+	read_seqcount_t_begin(__to_seqcount_t(s))
+
+static inline unsigned read_seqcount_t_begin(const seqcount_t *s)
 {
 	seqcount_lockdep_reader_access(s);
-	return raw_read_seqcount_begin(s);
+	return raw_read_seqcount_t_begin(s);
 }
 
 /**
  * raw_read_seqcount() - read the raw seqcount_t counter value
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * raw_read_seqcount opens a read critical section of the given
  * seqcount_t, without any lockdep checking, and without checking or
@@ -172,7 +447,10 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
  *
  * Return: count to be passed to read_seqcount_retry()
  */
-static inline unsigned raw_read_seqcount(const seqcount_t *s)
+#define raw_read_seqcount(s)						\
+	raw_read_seqcount_t(__to_seqcount_t(s))
+
+static inline unsigned raw_read_seqcount_t(const seqcount_t *s)
 {
 	unsigned ret = READ_ONCE(s->sequence);
 	smp_rmb();
@@ -183,7 +461,7 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
 /**
  * raw_seqcount_begin() - begin a seqcount_t read critical section w/o
  *                        lockdep and w/o counter stabilization
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * raw_seqcount_begin opens a read critical section of the given
  * seqcount_t. Unlike read_seqcount_begin(), this function will not wait
@@ -197,18 +475,21 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
  *
  * Return: count to be passed to read_seqcount_retry()
  */
-static inline unsigned raw_seqcount_begin(const seqcount_t *s)
+#define raw_seqcount_begin(s)						\
+	raw_seqcount_t_begin(__to_seqcount_t(s))
+
+static inline unsigned raw_seqcount_t_begin(const seqcount_t *s)
 {
 	/*
 	 * If the counter is odd, let read_seqcount_retry() fail
 	 * by decrementing the counter.
 	 */
-	return raw_read_seqcount(s) & ~1;
+	return raw_read_seqcount_t(s) & ~1;
 }
 
 /**
  * __read_seqcount_retry() - end a seqcount_t read section w/o barrier
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  * @start: count, from read_seqcount_begin()
  *
  * __read_seqcount_retry is like read_seqcount_retry, but has no smp_rmb()
@@ -221,7 +502,10 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
  *
  * Return: true if a read section retry is required, else false
  */
-static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
+#define __read_seqcount_retry(s, start)					\
+	__read_seqcount_t_retry(__to_seqcount_t(s), start)
+
+static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 {
 	kcsan_atomic_next(0);
 	return unlikely(READ_ONCE(s->sequence) != start);
@@ -229,7 +513,7 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
 
 /**
  * read_seqcount_retry() - end a seqcount_t read critical section
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  * @start: count, from read_seqcount_begin()
  *
  * read_seqcount_retry closes the read critical section of given
@@ -238,17 +522,28 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
  *
  * Return: true if a read section retry is required, else false
  */
-static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
+#define read_seqcount_retry(s, start)					\
+	read_seqcount_t_retry(__to_seqcount_t(s), start)
+
+static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 {
 	smp_rmb();
-	return __read_seqcount_retry(s, start);
+	return __read_seqcount_t_retry(s, start);
 }
 
 /**
  * raw_write_seqcount_begin() - start a seqcount_t write section w/o lockdep
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  */
-static inline void raw_write_seqcount_begin(seqcount_t *s)
+#define raw_write_seqcount_begin(s)					\
+do {									\
+	if (__associated_lock_exists_and_is_preemptible(s))		\
+		preempt_disable();					\
+									\
+	raw_write_seqcount_t_begin(__to_seqcount_t(s));			\
+} while (0)
+
+static inline void raw_write_seqcount_t_begin(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
@@ -257,49 +552,50 @@ static inline void raw_write_seqcount_begin(seqcount_t *s)
 
 /**
  * raw_write_seqcount_end() - end a seqcount_t write section w/o lockdep
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  */
-static inline void raw_write_seqcount_end(seqcount_t *s)
+#define raw_write_seqcount_end(s)					\
+do {									\
+	raw_write_seqcount_t_end(__to_seqcount_t(s));			\
+									\
+	if (__associated_lock_exists_and_is_preemptible(s))		\
+		preempt_enable();					\
+} while (0)
+
+static inline void raw_write_seqcount_t_end(seqcount_t *s)
 {
 	smp_wmb();
 	s->sequence++;
 	kcsan_nestable_atomic_end();
 }
 
-static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
-{
-	raw_write_seqcount_begin(s);
-	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
-}
-
 /**
  * write_seqcount_begin_nested() - start a seqcount_t write section with
  *                                 custom lockdep nesting level
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  * @subclass: lockdep nesting level
  *
  * See Documentation/locking/lockdep-design.rst
  */
-static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
-{
-	lockdep_assert_preemption_disabled();
-	__write_seqcount_begin_nested(s, subclass);
-}
-
-/*
- * A write_seqcount_begin() variant w/o lockdep non-preemptibility checks.
- *
- * Use for internal seqlock.h code where it's known that preemption is
- * already disabled. For example, seqlock_t write side functions.
- */
-static inline void __write_seqcount_begin(seqcount_t *s)
+#define write_seqcount_begin_nested(s, subclass)			\
+do {									\
+	__assert_write_section_is_protected(s);				\
+									\
+	if (__associated_lock_exists_and_is_preemptible(s))		\
+		preempt_disable();					\
+									\
+	write_seqcount_t_begin_nested(__to_seqcount_t(s), subclass);	\
+} while (0)
+
+static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
 {
-	__write_seqcount_begin_nested(s, 0);
+	raw_write_seqcount_t_begin(s);
+	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
 /**
  * write_seqcount_begin() - start a seqcount_t write side critical section
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * write_seqcount_begin opens a write side critical section of the given
  * seqcount_t.
@@ -308,26 +604,44 @@ static inline void __write_seqcount_begin(seqcount_t *s)
  * non-preemptible. If readers can be invoked from hardirq or softirq
  * context, interrupts or bottom halves must be respectively disabled.
  */
-static inline void write_seqcount_begin(seqcount_t *s)
+#define write_seqcount_begin(s)						\
+do {									\
+	__assert_write_section_is_protected(s);				\
+									\
+	if (__associated_lock_exists_and_is_preemptible(s))		\
+		preempt_disable();					\
+									\
+	write_seqcount_t_begin(__to_seqcount_t(s));			\
+} while (0)
+
+static inline void write_seqcount_t_begin(seqcount_t *s)
 {
-	write_seqcount_begin_nested(s, 0);
+	write_seqcount_t_begin_nested(s, 0);
 }
 
 /**
  * write_seqcount_end() - end a seqcount_t write side critical section
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * The write section must've been opened with write_seqcount_begin().
  */
-static inline void write_seqcount_end(seqcount_t *s)
+#define write_seqcount_end(s)						\
+do {									\
+	write_seqcount_t_end(__to_seqcount_t(s));			\
+									\
+	if (__associated_lock_exists_and_is_preemptible(s))		\
+		preempt_enable();					\
+} while (0)
+
+static inline void write_seqcount_t_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
-	raw_write_seqcount_end(s);
+	raw_write_seqcount_t_end(s);
 }
 
 /**
  * raw_write_seqcount_barrier() - do a seqcount_t write barrier
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * This can be used to provide an ordering guarantee instead of the usual
  * consistency guarantee. It is one wmb cheaper, because it can collapse
@@ -366,7 +680,10 @@ static inline void write_seqcount_end(seqcount_t *s)
  *		WRITE_ONCE(X, false);
  *      }
  */
-static inline void raw_write_seqcount_barrier(seqcount_t *s)
+#define raw_write_seqcount_barrier(s)					\
+	raw_write_seqcount_t_barrier(__to_seqcount_t(s))
+
+static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
 	s->sequence++;
@@ -378,12 +695,15 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
 /**
  * write_seqcount_invalidate() - invalidate in-progress seqcount_t read
  *                               side operations
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * After write_seqcount_invalidate, no seqcount_t read side operations
  * will complete successfully and see data older than this.
  */
-static inline void write_seqcount_invalidate(seqcount_t *s)
+#define write_seqcount_invalidate(s)					\
+	write_seqcount_t_invalidate(__to_seqcount_t(s))
+
+static inline void write_seqcount_t_invalidate(seqcount_t *s)
 {
 	smp_wmb();
 	kcsan_nestable_atomic_begin();
@@ -393,7 +713,7 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
 
 /**
  * raw_read_seqcount_latch() - pick even/odd seqcount_t latch data copy
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * Use seqcount_t latching to switch between two storage places protected
  * by a sequence counter. Doing so allows having interruptible, preemptible,
@@ -406,7 +726,10 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
  * picking which data copy to read. The full counter value must then be
  * checked with read_seqcount_retry().
  */
-static inline int raw_read_seqcount_latch(seqcount_t *s)
+#define raw_read_seqcount_latch(s)					\
+	raw_read_seqcount_t_latch(__to_seqcount_t(s))
+
+static inline int raw_read_seqcount_t_latch(seqcount_t *s)
 {
 	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
 	int seq = READ_ONCE(s->sequence); /* ^^^ */
@@ -415,7 +738,7 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
 
 /**
  * raw_write_seqcount_latch() - redirect readers to even/odd copy
- * @s: Pointer to seqcount_t
+ * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -494,7 +817,10 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  *	When data is a dynamic data structure; one should use regular RCU
  *	patterns to manage the lifetimes of the objects within.
  */
-static inline void raw_write_seqcount_latch(seqcount_t *s)
+#define raw_write_seqcount_latch(s)					\
+	raw_write_seqcount_t_latch(__to_seqcount_t(s))
+
+static inline void raw_write_seqcount_t_latch(seqcount_t *s)
 {
        smp_wmb();      /* prior stores before incrementing "sequence" */
        s->sequence++;
@@ -592,7 +918,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -604,7 +930,7 @@ static inline void write_seqlock(seqlock_t *sl)
  */
 static inline void write_sequnlock(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock(&sl->lock);
 }
 
@@ -618,7 +944,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -631,7 +957,7 @@ static inline void write_seqlock_bh(seqlock_t *sl)
  */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_bh(&sl->lock);
 }
 
@@ -645,7 +971,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount);
 }
 
 /**
@@ -657,7 +983,7 @@ static inline void write_seqlock_irq(seqlock_t *sl)
  */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_irq(&sl->lock);
 }
 
@@ -666,7 +992,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	__write_seqcount_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount);
 	return flags;
 }
 
@@ -695,13 +1021,13 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
-	write_seqcount_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount);
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
 /**
  * read_seqlock_excl() - begin a seqlock_t locking reader section
- * @sl: Pointer to seqlock_t
+ * @sl:	Pointer to seqlock_t
  *
  * read_seqlock_excl opens a seqlock_t locking reader critical section.  A
  * locking reader exclusively locks out *both* other writers *and* other
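
A minimal sketch of the intended usage of the new API (the my_stats
structure and its fields are hypothetical; only the seqcount_spinlock_t
calls come from this patch). Note how write_seqcount_begin() resolves
through the _Generic dispatch above to the seqcount_spinlock_t hooks,
so lockdep can assert that ->lock is actually held:

	struct my_stats {
		spinlock_t		lock;	/* writer serialization */
		seqcount_spinlock_t	seq;	/* associated with ->lock */
		u64			a, b;
	};

	static void my_stats_init(struct my_stats *s)
	{
		spin_lock_init(&s->lock);
		seqcount_spinlock_init(&s->seq, &s->lock);
	}

	static void my_stats_update(struct my_stats *s, u64 a, u64 b)
	{
		spin_lock(&s->lock);
		write_seqcount_begin(&s->seq);	/* asserts ->lock is held */
		s->a = a;
		s->b = b;
		write_seqcount_end(&s->seq);
		spin_unlock(&s->lock);
	}

	static u64 my_stats_sum(struct my_stats *s)
	{
		unsigned int start;
		u64 sum;

		do {
			start = read_seqcount_begin(&s->seq);
			sum = s->a + s->b;
		} while (read_seqcount_retry(&s->seq, start));

		return sum;
	}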

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [tip: locking/core] lockdep: Add preemption enabled/disabled assertion APIs
  2020-07-20 15:55   ` [PATCH v4 07/24] lockdep: Add preemption enabled/disabled assertion APIs Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     8fd8ad5c5dfcb09cf62abadd4043eaf1afbbd0ce
Gitweb:        https://git.kernel.org/tip/8fd8ad5c5dfcb09cf62abadd4043eaf1afbbd0ce
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:13 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:24 +02:00

lockdep: Add preemption enabled/disabled assertion APIs

Asserting that preemption is enabled or disabled is a critical sanity
check.  Developers are usually reluctant to add such a check in a
fastpath as reading the preemption count can be costly.

Extend the lockdep API with macros asserting that preemption is disabled
or enabled. If lockdep is disabled, or if the underlying architecture
does not support kernel preemption, this assert has no runtime overhead.

References: f54bb2ec02c8 ("locking/lockdep: Add IRQs disabled/enabled assertion APIs: ...")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-8-a.darwish@linutronix.de
---
 include/linux/lockdep.h | 19 +++++++++++++++++++
 lib/Kconfig.debug       |  1 +
 2 files changed, 20 insertions(+)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 7aafba0..39a3569 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -549,6 +549,22 @@ do {									\
 	WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirq_context));	\
 } while (0)
 
+#define lockdep_assert_preemption_enabled()				\
+do {									\
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)	&&		\
+		     debug_locks			&&		\
+		     (preempt_count() != 0		||		\
+		      !this_cpu_read(hardirqs_enabled)));		\
+} while (0)
+
+#define lockdep_assert_preemption_disabled()				\
+do {									\
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)	&&		\
+		     debug_locks			&&		\
+		     (preempt_count() == 0		&&		\
+		      this_cpu_read(hardirqs_enabled)));		\
+} while (0)
+
 #else
 # define might_lock(lock) do { } while (0)
 # define might_lock_read(lock) do { } while (0)
@@ -557,6 +573,9 @@ do {									\
 # define lockdep_assert_irqs_enabled() do { } while (0)
 # define lockdep_assert_irqs_disabled() do { } while (0)
 # define lockdep_assert_in_irq() do { } while (0)
+
+# define lockdep_assert_preemption_enabled() do { } while (0)
+# define lockdep_assert_preemption_disabled() do { } while (0)
 #endif
 
 #ifdef CONFIG_PROVE_RAW_LOCK_NESTING
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 9ad9210..5379931 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1117,6 +1117,7 @@ config PROVE_LOCKING
 	select DEBUG_RWSEMS
 	select DEBUG_WW_MUTEX_SLOWPATH
 	select DEBUG_LOCK_ALLOC
+	select PREEMPT_COUNT if !ARCH_NO_PREEMPT
 	select TRACE_IRQFLAGS
 	default n
 	help
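
As a sketch of the intended usage (the per-CPU counter below is
hypothetical): a fastpath that requires a non-preemptible caller can
now document and verify that requirement in one line, and the check
compiles away when lockdep is off or the architecture does not support
kernel preemption:

	static DEFINE_PER_CPU(u64, my_event_count);

	/* callers must have preemption disabled, e.g. via a spinlock */
	static void my_event_account(u64 n)
	{
		lockdep_assert_preemption_disabled();
		__this_cpu_add(my_event_count, n);
	}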

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [tip: locking/core] seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount()
  2020-07-20 15:55   ` [PATCH v4 06/24] seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount() Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     932e46365226324d2cf26d8bdec8b51ceb296948
Gitweb:        https://git.kernel.org/tip/932e46365226324d2cf26d8bdec8b51ceb296948
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:12 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:24 +02:00

seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount()

raw_seqcount_begin() has the same code as raw_read_seqcount(), with the
exception of masking the sequence counter's LSB before returning it to
the caller.

That masking exists so that read_seqcount_retry() can fail if the
counter is odd -- without the overhead of an extra branching
instruction.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-7-a.darwish@linutronix.de
---
 include/linux/seqlock.h |  9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 85fb3ac..e885702 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -199,10 +199,11 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
  */
 static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 {
-	unsigned ret = READ_ONCE(s->sequence);
-	smp_rmb();
-	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
-	return ret & ~1;
+	/*
+	 * If the counter is odd, let read_seqcount_retry() fail
+	 * by decrementing the counter.
+	 */
+	return raw_read_seqcount(s) & ~1;
 }
 
 /**
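
A sketch of the resulting reader behavior (the protected value is
hypothetical): if a writer is active at the start, raw_seqcount_begin()
returns the odd count with its LSB cleared, so read_seqcount_retry() is
guaranteed to fail and the section retries -- no stabilization loop up
front:

	static u64 my_read(const seqcount_t *sc, const u64 *val)
	{
		unsigned int seq;
		u64 v;

		do {
			seq = raw_seqcount_begin(sc);
			v = READ_ONCE(*val);
		} while (read_seqcount_retry(sc, seq));

		return v;
	}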

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [tip: locking/core] seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs
  2020-07-20 15:55   ` [PATCH v4 05/24] seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     89b88845e05752b3d684eaf147f457c8dfa99c5f
Gitweb:        https://git.kernel.org/tip/89b88845e05752b3d684eaf147f457c8dfa99c5f
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:11 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:23 +02:00

seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs

seqlock.h is now included in the kernel's RST documentation, but only a
small number of its exported functions are kernel-doc annotated.

Add kernel-doc for all seqlock.h exported APIs.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-6-a.darwish@linutronix.de
---
 include/linux/seqlock.h | 425 +++++++++++++++++++++++++++++++--------
 1 file changed, 348 insertions(+), 77 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 4c14560..85fb3ac 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -75,6 +75,10 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
 # define SEQCOUNT_DEP_MAP_INIT(lockname) \
 		.dep_map = { .name = #lockname } \
 
+/**
+ * seqcount_init() - runtime initializer for seqcount_t
+ * @s: Pointer to the seqcount_t instance
+ */
 # define seqcount_init(s)				\
 	do {						\
 		static struct lock_class_key __key;	\
@@ -98,13 +102,15 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
 # define seqcount_lockdep_reader_access(x)
 #endif
 
-#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)}
-
+/**
+ * SEQCNT_ZERO() - static initializer for seqcount_t
+ * @name: Name of the seqcount_t instance
+ */
+#define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
 
 /**
- * __read_seqcount_begin - begin a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
+ * @s: Pointer to seqcount_t
  *
  * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
  * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@@ -113,6 +119,8 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  *
  * Use carefully, only in critical code, and comment how the barrier is
  * provided.
+ *
+ * Return: count to be passed to read_seqcount_retry()
  */
 static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 {
@@ -129,13 +137,10 @@ repeat:
 }
 
 /**
- * raw_read_seqcount_begin - start seq-read critical section w/o lockdep
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * raw_read_seqcount_begin() - begin a seqcount_t read section w/o lockdep
+ * @s: Pointer to seqcount_t
  *
- * raw_read_seqcount_begin opens a read critical section of the given
- * seqcount, but without any lockdep checking. Validity of the critical
- * section is tested by checking read_seqcount_retry function.
+ * Return: count to be passed to read_seqcount_retry()
  */
 static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 {
@@ -145,13 +150,10 @@ static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * read_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * read_seqcount_begin() - begin a seqcount_t read critical section
+ * @s: Pointer to seqcount_t
  *
- * read_seqcount_begin opens a read critical section of the given seqcount.
- * Validity of the critical section is tested by checking read_seqcount_retry
- * function.
+ * Return: count to be passed to read_seqcount_retry()
  */
 static inline unsigned read_seqcount_begin(const seqcount_t *s)
 {
@@ -160,13 +162,15 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount - Read the raw seqcount
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * raw_read_seqcount() - read the raw seqcount_t counter value
+ * @s: Pointer to seqcount_t
  *
  * raw_read_seqcount opens a read critical section of the given
- * seqcount without any lockdep checking and without checking or
- * masking the LSB. Calling code is responsible for handling that.
+ * seqcount_t, without any lockdep checking, and without checking or
+ * masking the sequence counter LSB. Calling code is responsible for
+ * handling that.
+ *
+ * Return: count to be passed to read_seqcount_retry()
  */
 static inline unsigned raw_read_seqcount(const seqcount_t *s)
 {
@@ -177,18 +181,21 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
 }
 
 /**
- * raw_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * raw_seqcount_begin() - begin a seqcount_t read critical section w/o
+ *                        lockdep and w/o counter stabilization
+ * @s: Pointer to seqcount_t
  *
- * raw_seqcount_begin opens a read critical section of the given seqcount.
- * Validity of the critical section is tested by checking read_seqcount_retry
- * function.
+ * raw_seqcount_begin opens a read critical section of the given
+ * seqcount_t. Unlike read_seqcount_begin(), this function will not wait
+ * for the count to stabilize. If a writer is active when it begins, it
+ * will fail the read_seqcount_retry() at the end of the read critical
+ * section instead of stabilizing at the beginning of it.
  *
- * Unlike read_seqcount_begin(), this function will not wait for the count
- * to stabilize. If a writer is active when we begin, we will fail the
- * read_seqcount_retry() instead of stabilizing at the beginning of the
- * critical section.
+ * Use this only in special kernel hot paths where the read section is
+ * small and has a high probability of success through other external
+ * means. It will save a single branching instruction.
+ *
+ * Return: count to be passed to read_seqcount_retry()
  */
 static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 {
@@ -199,10 +206,9 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * __read_seqcount_retry - end a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
- * @start: count, from read_seqcount_begin
- * Returns: 1 if retry is required, else 0
+ * __read_seqcount_retry() - end a seqcount_t read section w/o barrier
+ * @s: Pointer to seqcount_t
+ * @start: count, from read_seqcount_begin()
  *
  * __read_seqcount_retry is like read_seqcount_retry, but has no smp_rmb()
  * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@@ -211,6 +217,8 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
  *
  * Use carefully, only in critical code, and comment how the barrier is
  * provided.
+ *
+ * Return: true if a read section retry is required, else false
  */
 static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
@@ -219,14 +227,15 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
 }
 
 /**
- * read_seqcount_retry - end a seq-read critical section
- * @s: pointer to seqcount_t
- * @start: count, from read_seqcount_begin
- * Returns: 1 if retry is required, else 0
+ * read_seqcount_retry() - end a seqcount_t read critical section
+ * @s: Pointer to seqcount_t
+ * @start: count, from read_seqcount_begin()
  *
- * read_seqcount_retry closes a read critical section of the given seqcount.
- * If the critical section was invalid, it must be ignored (and typically
- * retried).
+ * read_seqcount_retry closes the read critical section of given
+ * seqcount_t.  If the critical section was invalid, it must be ignored
+ * (and typically retried).
+ *
+ * Return: true if a read section retry is required, else false
  */
 static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
@@ -234,6 +243,10 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 	return __read_seqcount_retry(s, start);
 }
 
+/**
+ * raw_write_seqcount_begin() - start a seqcount_t write section w/o lockdep
+ * @s: Pointer to seqcount_t
+ */
 static inline void raw_write_seqcount_begin(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
@@ -241,6 +254,10 @@ static inline void raw_write_seqcount_begin(seqcount_t *s)
 	smp_wmb();
 }
 
+/**
+ * raw_write_seqcount_end() - end a seqcount_t write section w/o lockdep
+ * @s: Pointer to seqcount_t
+ */
 static inline void raw_write_seqcount_end(seqcount_t *s)
 {
 	smp_wmb();
@@ -248,17 +265,42 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+/**
+ * write_seqcount_begin_nested() - start a seqcount_t write section with
+ *                                 custom lockdep nesting level
+ * @s: Pointer to seqcount_t
+ * @subclass: lockdep nesting level
+ *
+ * See Documentation/locking/lockdep-design.rst
+ */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
 	raw_write_seqcount_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
+/**
+ * write_seqcount_begin() - start a seqcount_t write side critical section
+ * @s: Pointer to seqcount_t
+ *
+ * write_seqcount_begin opens a write side critical section of the given
+ * seqcount_t.
+ *
+ * Context: seqcount_t write side critical sections must be serialized and
+ * non-preemptible. If readers can be invoked from hardirq or softirq
+ * context, interrupts or bottom halves must be respectively disabled.
+ */
 static inline void write_seqcount_begin(seqcount_t *s)
 {
 	write_seqcount_begin_nested(s, 0);
 }
 
+/**
+ * write_seqcount_end() - end a seqcount_t write side critical section
+ * @s: Pointer to seqcount_t
+ *
+ * The write section must've been opened with write_seqcount_begin().
+ */
 static inline void write_seqcount_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, _RET_IP_);
@@ -266,12 +308,12 @@ static inline void write_seqcount_end(seqcount_t *s)
 }
 
 /**
- * raw_write_seqcount_barrier - do a seq write barrier
- * @s: pointer to seqcount_t
+ * raw_write_seqcount_barrier() - do a seqcount_t write barrier
+ * @s: Pointer to seqcount_t
  *
- * This can be used to provide an ordering guarantee instead of the
- * usual consistency guarantee. It is one wmb cheaper, because we can
- * collapse the two back-to-back wmb()s.
+ * This can be used to provide an ordering guarantee instead of the usual
+ * consistency guarantee. It is one wmb cheaper, because it can collapse
+ * the two back-to-back wmb()s.
  *
  * Note that writes surrounding the barrier should be declared atomic (e.g.
  * via WRITE_ONCE): a) to ensure the writes become visible to other threads
@@ -316,11 +358,12 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
 }
 
 /**
- * write_seqcount_invalidate - invalidate in-progress read-side seq operations
- * @s: pointer to seqcount_t
+ * write_seqcount_invalidate() - invalidate in-progress seqcount_t read
+ *                               side operations
+ * @s: Pointer to seqcount_t
  *
- * After write_seqcount_invalidate, no read-side seq operations will complete
- * successfully and see data older than this.
+ * After write_seqcount_invalidate, no seqcount_t read side operations
+ * will complete successfully and see data older than this.
  */
 static inline void write_seqcount_invalidate(seqcount_t *s)
 {
@@ -330,6 +373,21 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+/**
+ * raw_read_seqcount_latch() - pick even/odd seqcount_t latch data copy
+ * @s: Pointer to seqcount_t
+ *
+ * Use seqcount_t latching to switch between two storage places protected
+ * by a sequence counter. Doing so allows having interruptible, preemptible,
+ * seqcount_t write side critical sections.
+ *
+ * Check raw_write_seqcount_latch() for more details and a full reader and
+ * writer usage example.
+ *
+ * Return: sequence counter raw value. Use the lowest bit as an index for
+ * picking which data copy to read. The full counter value must then be
+ * checked with read_seqcount_retry().
+ */
 static inline int raw_read_seqcount_latch(seqcount_t *s)
 {
 	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
@@ -338,8 +396,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
 }
 
 /**
- * raw_write_seqcount_latch - redirect readers to even/odd copy
- * @s: pointer to seqcount_t
+ * raw_write_seqcount_latch() - redirect readers to even/odd copy
+ * @s: Pointer to seqcount_t
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -446,17 +504,28 @@ typedef struct {
 		.lock =	__SPIN_LOCK_UNLOCKED(lockname)	\
 	}
 
-#define seqlock_init(x)					\
+/**
+ * seqlock_init() - dynamic initializer for seqlock_t
+ * @sl: Pointer to the seqlock_t instance
+ */
+#define seqlock_init(sl)				\
 	do {						\
-		seqcount_init(&(x)->seqcount);		\
-		spin_lock_init(&(x)->lock);		\
+		seqcount_init(&(sl)->seqcount);		\
+		spin_lock_init(&(sl)->lock);		\
 	} while (0)
 
-#define DEFINE_SEQLOCK(x) \
-		seqlock_t x = __SEQLOCK_UNLOCKED(x)
+/**
+ * DEFINE_SEQLOCK() - Define a statically allocated seqlock_t
+ * @sl: Name of the seqlock_t instance
+ */
+#define DEFINE_SEQLOCK(sl) \
+		seqlock_t sl = __SEQLOCK_UNLOCKED(sl)
 
-/*
- * Read side functions for starting and finalizing a read side section.
+/**
+ * read_seqbegin() - start a seqlock_t read side critical section
+ * @sl: Pointer to seqlock_t
+ *
+ * Return: count, to be passed to read_seqretry()
  */
 static inline unsigned read_seqbegin(const seqlock_t *sl)
 {
@@ -467,6 +536,17 @@ static inline unsigned read_seqbegin(const seqlock_t *sl)
 	return ret;
 }
 
+/**
+ * read_seqretry() - end a seqlock_t read side section
+ * @sl: Pointer to seqlock_t
+ * @start: count, from read_seqbegin()
+ *
+ * read_seqretry closes the read side critical section of given seqlock_t.
+ * If the critical section was invalid, it must be ignored (and typically
+ * retried).
+ *
+ * Return: true if a read section retry is required, else false
+ */
 static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 {
 	/*
@@ -478,10 +558,18 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 	return read_seqcount_retry(&sl->seqcount, start);
 }
 
-/*
- * Lock out other writers and update the count.
- * Acts like a normal spin_lock/unlock.
- * Don't need preempt_disable() because that is in the spin_lock already.
+/**
+ * write_seqlock() - start a seqlock_t write side critical section
+ * @sl: Pointer to seqlock_t
+ *
+ * write_seqlock opens a write side critical section for the given
+ * seqlock_t.  It also implicitly acquires the spinlock_t embedded inside
+ * that sequential lock. All seqlock_t write side sections are thus
+ * automatically serialized and non-preemptible.
+ *
+ * Context: if the seqlock_t read section, or other write side critical
+ * sections, can be invoked from hardirq or softirq contexts, use the
+ * _irqsave or _bh variants of this function instead.
  */
 static inline void write_seqlock(seqlock_t *sl)
 {
@@ -489,30 +577,66 @@ static inline void write_seqlock(seqlock_t *sl)
 	write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock() - end a seqlock_t write side critical section
+ * @sl: Pointer to seqlock_t
+ *
+ * write_sequnlock closes the (serialized and non-preemptible) write side
+ * critical section of given seqlock_t.
+ */
 static inline void write_sequnlock(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
 	spin_unlock(&sl->lock);
 }
 
+/**
+ * write_seqlock_bh() - start a softirqs-disabled seqlock_t write section
+ * @sl: Pointer to seqlock_t
+ *
+ * _bh variant of write_seqlock(). Use only if the read side section, or
+ * other write side sections, can be invoked from softirq contexts.
+ */
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
 	write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock_bh() - end a softirqs-disabled seqlock_t write section
+ * @sl: Pointer to seqlock_t
+ *
+ * write_sequnlock_bh closes the serialized, non-preemptible, and
+ * softirqs-disabled, seqlock_t write side critical section opened with
+ * write_seqlock_bh().
+ */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
 	spin_unlock_bh(&sl->lock);
 }
 
+/**
+ * write_seqlock_irq() - start a non-interruptible seqlock_t write section
+ * @sl: Pointer to seqlock_t
+ *
+ * _irq variant of write_seqlock(). Use only if the read side section, or
+ * other write sections, can be invoked from hardirq contexts.
+ */
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
 	write_seqcount_begin(&sl->seqcount);
 }
 
+/**
+ * write_sequnlock_irq() - end a non-interruptible seqlock_t write section
+ * @sl: Pointer to seqlock_t
+ *
+ * write_sequnlock_irq closes the serialized and non-interruptible
+ * seqlock_t write side section opened with write_seqlock_irq().
+ */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
 	write_seqcount_end(&sl->seqcount);
@@ -528,9 +652,28 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	return flags;
 }
 
+/**
+ * write_seqlock_irqsave() - start a non-interruptible seqlock_t write
+ *                           section
+ * @lock:  Pointer to seqlock_t
+ * @flags: Stack-allocated storage for saving caller's local interrupt
+ *         state, to be passed to write_sequnlock_irqrestore().
+ *
+ * _irqsave variant of write_seqlock(). Use it only if the read side
+ * section, or other write sections, can be invoked from hardirq context.
+ */
 #define write_seqlock_irqsave(lock, flags)				\
 	do { flags = __write_seqlock_irqsave(lock); } while (0)
 
+/**
+ * write_sequnlock_irqrestore() - end non-interruptible seqlock_t write
+ *                                section
+ * @sl:    Pointer to seqlock_t
+ * @flags: Caller's saved interrupt state, from write_seqlock_irqsave()
+ *
+ * write_sequnlock_irqrestore closes the serialized and non-interruptible
+ * seqlock_t write section previously opened with write_seqlock_irqsave().
+ */
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
@@ -538,36 +681,79 @@ write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
-/*
- * A locking reader exclusively locks out other writers and locking readers,
- * but doesn't update the sequence number. Acts like a normal spin_lock/unlock.
- * Don't need preempt_disable() because that is in the spin_lock already.
+/**
+ * read_seqlock_excl() - begin a seqlock_t locking reader section
+ * @sl: Pointer to seqlock_t
+ *
+ * read_seqlock_excl opens a seqlock_t locking reader critical section.  A
+ * locking reader exclusively locks out *both* other writers *and* other
+ * locking readers, but it does not update the embedded sequence number.
+ *
+ * Locking readers act like a normal spin_lock()/spin_unlock().
+ *
+ * Context: if the seqlock_t write section, *or other read sections*, can
+ * be invoked from hardirq or softirq contexts, use the _irqsave or _bh
+ * variant of this function instead.
+ *
+ * The opened read section must be closed with read_sequnlock_excl().
  */
 static inline void read_seqlock_excl(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl() - end a seqlock_t locking reader critical section
+ * @sl: Pointer to seqlock_t
+ */
 static inline void read_sequnlock_excl(seqlock_t *sl)
 {
 	spin_unlock(&sl->lock);
 }
 
+/**
+ * read_seqlock_excl_bh() - start a seqlock_t locking reader section with
+ *			    softirqs disabled
+ * @sl: Pointer to seqlock_t
+ *
+ * _bh variant of read_seqlock_excl(). Use this variant only if the
+ * seqlock_t write side section, *or other read sections*, can be invoked
+ * from softirq contexts.
+ */
 static inline void read_seqlock_excl_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl_bh() - stop a seqlock_t softirq-disabled locking
+ *			      reader section
+ * @sl: Pointer to seqlock_t
+ */
 static inline void read_sequnlock_excl_bh(seqlock_t *sl)
 {
 	spin_unlock_bh(&sl->lock);
 }
 
+/**
+ * read_seqlock_excl_irq() - start a non-interruptible seqlock_t locking
+ *			     reader section
+ * @sl: Pointer to seqlock_t
+ *
+ * _irq variant of read_seqlock_excl(). Use this only if the seqlock_t
+ * write side section, *or other read sections*, can be invoked from a
+ * hardirq context.
+ */
 static inline void read_seqlock_excl_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
 }
 
+/**
+ * read_sequnlock_excl_irq() - end an interrupts-disabled seqlock_t
+ *                             locking reader section
+ * @sl: Pointer to seqlock_t
+ */
 static inline void read_sequnlock_excl_irq(seqlock_t *sl)
 {
 	spin_unlock_irq(&sl->lock);
@@ -581,9 +767,26 @@ static inline unsigned long __read_seqlock_excl_irqsave(seqlock_t *sl)
 	return flags;
 }
 
+/**
+ * read_seqlock_excl_irqsave() - start a non-interruptible seqlock_t
+ *				 locking reader section
+ * @lock:  Pointer to seqlock_t
+ * @flags: Stack-allocated storage for saving caller's local interrupt
+ *         state, to be passed to read_sequnlock_excl_irqrestore().
+ *
+ * _irqsave variant of read_seqlock_excl(). Use this only if the seqlock_t
+ * write side section, *or other read sections*, can be invoked from a
+ * hardirq context.
+ */
 #define read_seqlock_excl_irqsave(lock, flags)				\
 	do { flags = __read_seqlock_excl_irqsave(lock); } while (0)
 
+/**
+ * read_sequnlock_excl_irqrestore() - end non-interruptible seqlock_t
+ *				      locking reader section
+ * @sl:    Pointer to seqlock_t
+ * @flags: Caller saved interrupt state, from read_seqlock_excl_irqsave()
+ */
 static inline void
 read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
 {
@@ -591,14 +794,35 @@ read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
 }
 
 /**
- * read_seqbegin_or_lock - begin a sequence number check or locking block
- * @lock: sequence lock
- * @seq : sequence number to be checked
- *
- * First try it once optimistically without taking the lock. If that fails,
- * take the lock. The sequence number is also used as a marker for deciding
- * whether to be a reader (even) or writer (odd).
- * N.B. seq must be initialized to an even number to begin with.
+ * read_seqbegin_or_lock() - begin a seqlock_t lockless or locking reader
+ * @lock: Pointer to seqlock_t
+ * @seq : Marker and return parameter. If the passed value is even, the
+ * reader will become a *lockless* seqlock_t reader as in read_seqbegin().
+ * If the passed value is odd, the reader will become a *locking* reader
+ * as in read_seqlock_excl().  In the first call to this function, the
+ * caller *must* initialize and pass an even value to @seq; this way, a
+ * lockless read can be optimistically tried first.
+ *
+ * read_seqbegin_or_lock is an API designed to optimistically try a normal
+ * lockless seqlock_t read section first.  If an odd counter is found, the
+ * lockless read trial has failed, and the next read iteration transforms
+ * itself into a full seqlock_t locking reader.
+ *
+ * This is typically used to avoid seqlock_t lockless readers starvation
+ * (too much retry loops) in the case of a sharp spike in write side
+ * activity.
+ *
+ * Context: if the seqlock_t write section, *or other read sections*, can
+ * be invoked from hardirq or softirq contexts, use the _irqsave or _bh
+ * variant of this function instead.
+ *
+ * Check Documentation/locking/seqlock.rst for template example code.
+ *
+ * Return: the encountered sequence counter value, through the @seq
+ * parameter, which is overloaded as a return parameter. This returned
+ * value must be checked with need_seqretry(). If the read section needs to
+ * be retried, this returned value must also be passed as the @seq
+ * parameter of the next read_seqbegin_or_lock() iteration.
  */
 static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 {
@@ -608,17 +832,52 @@ static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 		read_seqlock_excl(lock);
 }
 
+/**
+ * need_seqretry() - validate seqlock_t "locking or lockless" read section
+ * @lock: Pointer to seqlock_t
+ * @seq: sequence count, from read_seqbegin_or_lock()
+ *
+ * Return: true if a read section retry is required, false otherwise
+ */
 static inline int need_seqretry(seqlock_t *lock, int seq)
 {
 	return !(seq & 1) && read_seqretry(lock, seq);
 }
 
+/**
+ * done_seqretry() - end seqlock_t "locking or lockless" reader section
+ * @lock: Pointer to seqlock_t
+ * @seq: count, from read_seqbegin_or_lock()
+ *
+ * done_seqretry finishes the seqlock_t read side critical section started
+ * with read_seqbegin_or_lock() and validated by need_seqretry().
+ */
 static inline void done_seqretry(seqlock_t *lock, int seq)
 {
 	if (seq & 1)
 		read_sequnlock_excl(lock);
 }
 
+/**
+ * read_seqbegin_or_lock_irqsave() - begin a seqlock_t lockless reader, or
+ *                                   a non-interruptible locking reader
+ * @lock: Pointer to seqlock_t
+ * @seq:  Marker and return parameter. Check read_seqbegin_or_lock().
+ *
+ * This is the _irqsave variant of read_seqbegin_or_lock(). Use it only if
+ * the seqlock_t write section, *or other read sections*, can be invoked
+ * from hardirq context.
+ *
+ * Note: Interrupts will be disabled only for "locking reader" mode.
+ *
+ * Return:
+ *
+ *   1. The saved local interrupts state in case of a locking reader, to
+ *      be passed to done_seqretry_irqrestore().
+ *
+ *   2. The encountered sequence counter value, returned through @seq
+ *      overloaded as a return parameter. Check read_seqbegin_or_lock().
+ */
 static inline unsigned long
 read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 {
@@ -632,6 +891,18 @@ read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 	return flags;
 }
 
+/**
+ * done_seqretry_irqrestore() - end a seqlock_t lockless reader, or a
+ *				non-interruptible locking reader section
+ * @lock:  Pointer to seqlock_t
+ * @seq:   Count, from read_seqbegin_or_lock_irqsave()
+ * @flags: Caller's saved local interrupt state in case of a locking
+ *	   reader, also from read_seqbegin_or_lock_irqsave()
+ *
+ * This is the _irqrestore variant of done_seqretry(). The read section
+ * must've been opened with read_seqbegin_or_lock_irqsave(), and validated
+ * by need_seqretry().
+ */
 static inline void
 done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags)
 {
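
The read_seqbegin_or_lock() kernel-doc above describes the "lockless
first, locking second" pattern; a hedged sketch follows (the data walk
is hypothetical). The retry pass must switch @seq to an odd value so
that the next iteration becomes a locking reader:

	static void my_walk(seqlock_t *sl)
	{
		int seq = 0;	/* even: first pass is lockless */
		bool retry;

		do {
			read_seqbegin_or_lock(sl, &seq);

			/* ... walk the protected data ... */

			retry = need_seqretry(sl, seq);
			if (retry)
				seq = 1;  /* odd: next pass takes the lock */
		} while (retry);

		done_seqretry(sl, seq);
	}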

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [tip: locking/core] seqlock: Reorder seqcount_t and seqlock_t API definitions
  2020-07-20 15:55   ` [PATCH v4 04/24] seqlock: Reorder seqcount_t and seqlock_t API definitions Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     f4a27cbcec90ac04ee60e04b222e1449dcdba0bd
Gitweb:        https://git.kernel.org/tip/f4a27cbcec90ac04ee60e04b222e1449dcdba0bd
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:10 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:23 +02:00

seqlock: Reorder seqcount_t and seqlock_t API definitions

The seqlock.h seqcount_t and seqlock_t API definitions are presented in
the chronological order of their development rather than the order that
makes most sense to readers. This makes it hard to follow and understand
the header file code.

Group and reorder all of the exported seqlock.h functions according to
their function.

First, group together the seqcount_t standard read path functions:

    - __read_seqcount_begin()
    - raw_read_seqcount_begin()
    - read_seqcount_begin()

since each function is implemented exactly in terms of the one above
it. Then, group the special-case seqcount_t readers on their own as:

    - raw_read_seqcount()
    - raw_seqcount_begin()

since the only difference between the two functions is that the second
one masks the sequence counter LSB while the first one does not. Note
that raw_seqcount_begin() can actually be implemented in terms of
raw_read_seqcount(), which will be done in a follow-up commit.

Then, group the seqcount_t write path functions, instead of injecting
unrelated seqcount_t latch functions between them, and order them as:

    - raw_write_seqcount_begin()
    - raw_write_seqcount_end()
    - write_seqcount_begin_nested()
    - write_seqcount_begin()
    - write_seqcount_end()
    - raw_write_seqcount_barrier()
    - write_seqcount_invalidate()

which is the expected natural order. This also isolates the seqcount_t
latch functions into their own area, at the end of the sequence counters
section, and before jumping to the next one: sequential locks
(seqlock_t).

Do a similar grouping and reordering for seqlock_t "locking" readers vs.
the "conditionally locking or lockless" ones.

No implementation code was changed in any of the reordering above.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-5-a.darwish@linutronix.de
---
 include/linux/seqlock.h | 158 +++++++++++++++++++--------------------
 1 file changed, 78 insertions(+), 80 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index d724b5e..4c14560 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -129,23 +129,6 @@ repeat:
 }
 
 /**
- * raw_read_seqcount - Read the raw seqcount
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
- *
- * raw_read_seqcount opens a read critical section of the given
- * seqcount without any lockdep checking and without checking or
- * masking the LSB. Calling code is responsible for handling that.
- */
-static inline unsigned raw_read_seqcount(const seqcount_t *s)
-{
-	unsigned ret = READ_ONCE(s->sequence);
-	smp_rmb();
-	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
-	return ret;
-}
-
-/**
  * raw_read_seqcount_begin - start seq-read critical section w/o lockdep
  * @s: pointer to seqcount_t
  * Returns: count to be passed to read_seqcount_retry
@@ -177,6 +160,23 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
+ * raw_read_seqcount - Read the raw seqcount
+ * @s: pointer to seqcount_t
+ * Returns: count to be passed to read_seqcount_retry
+ *
+ * raw_read_seqcount opens a read critical section of the given
+ * seqcount without any lockdep checking and without checking or
+ * masking the LSB. Calling code is responsible for handling that.
+ */
+static inline unsigned raw_read_seqcount(const seqcount_t *s)
+{
+	unsigned ret = READ_ONCE(s->sequence);
+	smp_rmb();
+	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
+	return ret;
+}
+
+/**
  * raw_seqcount_begin - begin a seq-read critical section
  * @s: pointer to seqcount_t
  * Returns: count to be passed to read_seqcount_retry
@@ -234,8 +234,6 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 	return __read_seqcount_retry(s, start);
 }
 
-
-
 static inline void raw_write_seqcount_begin(seqcount_t *s)
 {
 	kcsan_nestable_atomic_begin();
@@ -250,6 +248,23 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
+{
+	raw_write_seqcount_begin(s);
+	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
+}
+
+static inline void write_seqcount_begin(seqcount_t *s)
+{
+	write_seqcount_begin_nested(s, 0);
+}
+
+static inline void write_seqcount_end(seqcount_t *s)
+{
+	seqcount_release(&s->dep_map, _RET_IP_);
+	raw_write_seqcount_end(s);
+}
+
 /**
  * raw_write_seqcount_barrier - do a seq write barrier
  * @s: pointer to seqcount_t
@@ -300,6 +315,21 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+/**
+ * write_seqcount_invalidate - invalidate in-progress read-side seq operations
+ * @s: pointer to seqcount_t
+ *
+ * After write_seqcount_invalidate, no read-side seq operations will complete
+ * successfully and see data older than this.
+ */
+static inline void write_seqcount_invalidate(seqcount_t *s)
+{
+	smp_wmb();
+	kcsan_nestable_atomic_begin();
+	s->sequence+=2;
+	kcsan_nestable_atomic_end();
+}
+
 static inline int raw_read_seqcount_latch(seqcount_t *s)
 {
 	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
@@ -395,38 +425,6 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
        smp_wmb();      /* increment "sequence" before following stores */
 }
 
-static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
-{
-	raw_write_seqcount_begin(s);
-	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
-}
-
-static inline void write_seqcount_begin(seqcount_t *s)
-{
-	write_seqcount_begin_nested(s, 0);
-}
-
-static inline void write_seqcount_end(seqcount_t *s)
-{
-	seqcount_release(&s->dep_map, _RET_IP_);
-	raw_write_seqcount_end(s);
-}
-
-/**
- * write_seqcount_invalidate - invalidate in-progress read-side seq operations
- * @s: pointer to seqcount_t
- *
- * After write_seqcount_invalidate, no read-side seq operations will complete
- * successfully and see data older than this.
- */
-static inline void write_seqcount_invalidate(seqcount_t *s)
-{
-	smp_wmb();
-	kcsan_nestable_atomic_begin();
-	s->sequence+=2;
-	kcsan_nestable_atomic_end();
-}
-
 /*
  * Sequential locks (seqlock_t)
  *
@@ -555,35 +553,6 @@ static inline void read_sequnlock_excl(seqlock_t *sl)
 	spin_unlock(&sl->lock);
 }
 
-/**
- * read_seqbegin_or_lock - begin a sequence number check or locking block
- * @lock: sequence lock
- * @seq : sequence number to be checked
- *
- * First try it once optimistically without taking the lock. If that fails,
- * take the lock. The sequence number is also used as a marker for deciding
- * whether to be a reader (even) or writer (odd).
- * N.B. seq must be initialized to an even number to begin with.
- */
-static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
-{
-	if (!(*seq & 1))	/* Even */
-		*seq = read_seqbegin(lock);
-	else			/* Odd */
-		read_seqlock_excl(lock);
-}
-
-static inline int need_seqretry(seqlock_t *lock, int seq)
-{
-	return !(seq & 1) && read_seqretry(lock, seq);
-}
-
-static inline void done_seqretry(seqlock_t *lock, int seq)
-{
-	if (seq & 1)
-		read_sequnlock_excl(lock);
-}
-
 static inline void read_seqlock_excl_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
@@ -621,6 +590,35 @@ read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
+/**
+ * read_seqbegin_or_lock - begin a sequence number check or locking block
+ * @lock: sequence lock
+ * @seq : sequence number to be checked
+ *
+ * First try it once optimistically without taking the lock. If that fails,
+ * take the lock. The sequence number is also used as a marker for deciding
+ * whether to be a reader (even) or writer (odd).
+ * N.B. seq must be initialized to an even number to begin with.
+ */
+static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
+{
+	if (!(*seq & 1))	/* Even */
+		*seq = read_seqbegin(lock);
+	else			/* Odd */
+		read_seqlock_excl(lock);
+}
+
+static inline int need_seqretry(seqlock_t *lock, int seq)
+{
+	return !(seq & 1) && read_seqretry(lock, seq);
+}
+
+static inline void done_seqretry(seqlock_t *lock, int seq)
+{
+	if (seq & 1)
+		read_sequnlock_excl(lock);
+}
+
 static inline unsigned long
 read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
 {

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [tip: locking/core] seqlock: seqcount_t latch: End read sections with read_seqcount_retry()
  2020-07-20 15:55   ` [PATCH v4 03/24] seqlock: seqcount_t latch: End read sections with read_seqcount_retry() Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     d3b35b87f436c1b226a8061bee9c8875ba6658bd
Gitweb:        https://git.kernel.org/tip/d3b35b87f436c1b226a8061bee9c8875ba6658bd
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:09 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:23 +02:00

seqlock: seqcount_t latch: End read sections with read_seqcount_retry()

The seqcount_t latch reader example at the raw_write_seqcount_latch()
kernel-doc comment ends the latch read section with a manual smp memory
barrier and sequence counter comparison.

This is technically correct, but suboptimal: read_seqcount_retry()
already performs the same smp memory barrier and sequence counter
comparison.

End the latch read critical section example with read_seqcount_retry().

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-4-a.darwish@linutronix.de
---
 include/linux/seqlock.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 6c4f68e..d724b5e 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -363,8 +363,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  *			idx = seq & 0x01;
  *			entry = data_query(latch->data[idx], ...);
  *
- *			smp_rmb();
- *		} while (seq != latch->seq);
+ *		// read_seqcount_retry() includes needed smp_rmb()
+ *		} while (read_seqcount_retry(&latch->seq, seq));
  *
  *		return entry;
  *	}
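
Assembled, the complete latch query from that kernel-doc example reads
as follows. This is only a sketch: latch_struct, data_struct, entry and
data_query() are the placeholders from the comment, not real APIs.

	struct latch_struct {
		seqcount_t		seq;
		struct data_struct	data[2];
	};

	struct entry *latch_query(struct latch_struct *latch, ...)
	{
		struct entry *entry;
		unsigned seq, idx;

		do {
			seq = raw_read_seqcount_latch(&latch->seq);

			idx = seq & 0x01;
			entry = data_query(latch->data[idx], ...);

			/* read_seqcount_retry() includes the needed smp_rmb() */
		} while (read_seqcount_retry(&latch->seq, seq));

		return entry;
	}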

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [tip: locking/core] seqlock: Properly format kernel-doc code samples
  2020-07-20 15:55   ` [PATCH v4 02/24] seqlock: Properly format kernel-doc code samples Ahmed S. Darwish
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     15cbe67bbd3adeb4854c42713dbeaf2ff876beee
Gitweb:        https://git.kernel.org/tip/15cbe67bbd3adeb4854c42713dbeaf2ff876beee
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:08 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:23 +02:00

seqlock: Properly format kernel-doc code samples

Align the code samples and note sections inside kernel-doc comments with
tabs. This way they can be properly parsed and rendered by Sphinx. It
also makes the code samples easier to read from text editors.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-3-a.darwish@linutronix.de
---
 include/linux/seqlock.h | 108 ++++++++++++++++++++-------------------
 1 file changed, 56 insertions(+), 52 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 299d68f..6c4f68e 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -263,32 +263,32 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  * atomically, avoiding compiler optimizations; b) to document which writes are
  * meant to propagate to the reader critical section. This is necessary because
  * neither writes before and after the barrier are enclosed in a seq-writer
- * critical section that would ensure readers are aware of ongoing writes.
+ * critical section that would ensure readers are aware of ongoing writes::
  *
- *      seqcount_t seq;
- *      bool X = true, Y = false;
+ *	seqcount_t seq;
+ *	bool X = true, Y = false;
  *
- *      void read(void)
- *      {
- *              bool x, y;
+ *	void read(void)
+ *	{
+ *		bool x, y;
  *
- *              do {
- *                      int s = read_seqcount_begin(&seq);
+ *		do {
+ *			int s = read_seqcount_begin(&seq);
  *
- *                      x = X; y = Y;
+ *			x = X; y = Y;
  *
- *              } while (read_seqcount_retry(&seq, s));
+ *		} while (read_seqcount_retry(&seq, s));
  *
- *              BUG_ON(!x && !y);
+ *		BUG_ON(!x && !y);
  *      }
  *
  *      void write(void)
  *      {
- *              WRITE_ONCE(Y, true);
+ *		WRITE_ONCE(Y, true);
  *
- *              raw_write_seqcount_barrier(seq);
+ *		raw_write_seqcount_barrier(seq);
  *
- *              WRITE_ONCE(X, false);
+ *		WRITE_ONCE(X, false);
  *      }
  */
 static inline void raw_write_seqcount_barrier(seqcount_t *s)
@@ -325,64 +325,68 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  * Very simply put: we first modify one copy and then the other. This ensures
  * there is always one copy in a stable state, ready to give us an answer.
  *
- * The basic form is a data structure like:
+ * The basic form is a data structure like::
  *
- * struct latch_struct {
- *	seqcount_t		seq;
- *	struct data_struct	data[2];
- * };
+ *	struct latch_struct {
+ *		seqcount_t		seq;
+ *		struct data_struct	data[2];
+ *	};
  *
  * Where a modification, which is assumed to be externally serialized, does the
- * following:
+ * following::
  *
- * void latch_modify(struct latch_struct *latch, ...)
- * {
- *	smp_wmb();	<- Ensure that the last data[1] update is visible
- *	latch->seq++;
- *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *	void latch_modify(struct latch_struct *latch, ...)
+ *	{
+ *		smp_wmb();	// Ensure that the last data[1] update is visible
+ *		latch->seq++;
+ *		smp_wmb();	// Ensure that the seqcount update is visible
  *
- *	modify(latch->data[0], ...);
+ *		modify(latch->data[0], ...);
  *
- *	smp_wmb();	<- Ensure that the data[0] update is visible
- *	latch->seq++;
- *	smp_wmb();	<- Ensure that the seqcount update is visible
+ *		smp_wmb();	// Ensure that the data[0] update is visible
+ *		latch->seq++;
+ *		smp_wmb();	// Ensure that the seqcount update is visible
  *
- *	modify(latch->data[1], ...);
- * }
+ *		modify(latch->data[1], ...);
+ *	}
  *
- * The query will have a form like:
+ * The query will have a form like::
  *
- * struct entry *latch_query(struct latch_struct *latch, ...)
- * {
- *	struct entry *entry;
- *	unsigned seq, idx;
+ *	struct entry *latch_query(struct latch_struct *latch, ...)
+ *	{
+ *		struct entry *entry;
+ *		unsigned seq, idx;
  *
- *	do {
- *		seq = raw_read_seqcount_latch(&latch->seq);
+ *		do {
+ *			seq = raw_read_seqcount_latch(&latch->seq);
  *
- *		idx = seq & 0x01;
- *		entry = data_query(latch->data[idx], ...);
+ *			idx = seq & 0x01;
+ *			entry = data_query(latch->data[idx], ...);
  *
- *		smp_rmb();
- *	} while (seq != latch->seq);
+ *			smp_rmb();
+ *		} while (seq != latch->seq);
  *
- *	return entry;
- * }
+ *		return entry;
+ *	}
  *
  * So during the modification, queries are first redirected to data[1]. Then we
  * modify data[0]. When that is complete, we redirect queries back to data[0]
  * and we can modify data[1].
  *
- * NOTE: The non-requirement for atomic modifications does _NOT_ include
- *       the publishing of new entries in the case where data is a dynamic
- *       data structure.
+ * NOTE:
+ *
+ *	The non-requirement for atomic modifications does _NOT_ include
+ *	the publishing of new entries in the case where data is a dynamic
+ *	data structure.
+ *
+ *	An iteration might start in data[0] and get suspended long enough
+ *	to miss an entire modification sequence, once it resumes it might
+ *	observe the new entry.
  *
- *       An iteration might start in data[0] and get suspended long enough
- *       to miss an entire modification sequence, once it resumes it might
- *       observe the new entry.
+ * NOTE:
  *
- * NOTE: When data is a dynamic data structure; one should use regular RCU
- *       patterns to manage the lifetimes of the objects within.
+ *	When data is a dynamic data structure; one should use regular RCU
+ *	patterns to manage the lifetimes of the objects within.
  */
 static inline void raw_write_seqcount_latch(seqcount_t *s)
 {

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [tip: locking/core] Documentation: locking: Describe seqlock design and usage
  2020-07-20 15:55   ` [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
  2020-07-21  1:35     ` Steven Rostedt
  2020-07-21  1:44     ` Steven Rostedt
@ 2020-07-29 14:33     ` tip-bot2 for Ahmed S. Darwish
  2 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-07-29 14:33 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     0d24f65e933ca89d55d17f6dbdb2a72ca88f0992
Gitweb:        https://git.kernel.org/tip/0d24f65e933ca89d55d17f6dbdb2a72ca88f0992
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Mon, 20 Jul 2020 17:55:07 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Jul 2020 16:14:22 +02:00

Documentation: locking: Describe seqlock design and usage

Proper documentation for the design and usage of sequence counters and
sequential locks does not exist. Complete the seqlock.h documentation as
follows:

  - Divide all documentation on a seqcount_t vs. seqlock_t basis. The
    description for both mechanisms was intermingled, which is incorrect
    since the usage constraints for each type are vastly different.

  - Add an introductory paragraph describing the internal design of, and
    rationale for, sequence counters.

  - Document seqcount_t writer non-preemptibility requirement, which was
    not previously documented anywhere, and provide a clear rationale.

  - Provide template code for seqcount_t and seqlock_t initialization
    and reader/writer critical sections.

  - Recommend using seqlock_t by default. It implicitly handles the
    serialization and non-preemptibility requirements of writers.

At seqlock.h:

  - Remove references to brlocks as they've long been removed from the
    kernel.

  - Remove references to gcc-3.x since the kernel's minimum supported
    gcc version is 4.9.

References: 0f6ed63b1707 ("no need to keep brlock macros anymore...")
References: 6ec4476ac825 ("Raise gcc version requirement to 4.9")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-2-a.darwish@linutronix.de
---
 Documentation/locking/index.rst   |   1 +-
 Documentation/locking/seqlock.rst | 170 +++++++++++++++++++++++++++++-
 include/linux/seqlock.h           |  85 +++++++--------
 3 files changed, 211 insertions(+), 45 deletions(-)
 create mode 100644 Documentation/locking/seqlock.rst

diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
index d785878..7003bd5 100644
--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -14,6 +14,7 @@ locking
     mutex-design
     rt-mutex-design
     rt-mutex
+    seqlock
     spinlocks
     ww-mutex-design
     preempt-locking
diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
new file mode 100644
index 0000000..366dd36
--- /dev/null
+++ b/Documentation/locking/seqlock.rst
@@ -0,0 +1,170 @@
+======================================
+Sequence counters and sequential locks
+======================================
+
+Introduction
+============
+
+Sequence counters are a reader-writer consistency mechanism with
+lockless readers (read-only retry loops), and no writer starvation. They
+are used for data that's rarely written to (e.g. system time), where the
+reader wants a consistent set of information and is willing to retry if
+that information changes.
+
+A data set is consistent when the sequence count at the beginning of the
+read side critical section is even and the same sequence count value is
+read again at the end of the critical section. The data in the set must
+be copied out inside the read side critical section. If the sequence
+count has changed between the start and the end of the critical section,
+the reader must retry.
+
+Writers increment the sequence count at the start and the end of their
+critical section. After starting the critical section the sequence count
+is odd and indicates to the readers that an update is in progress. At
+the end of the write side critical section the sequence count becomes
+even again which lets readers make progress.
+
+A sequence counter write side critical section must never be preempted
+or interrupted by read side sections. Otherwise the reader will spin for
+the entire scheduler tick due to the odd sequence count value and the
+interrupted writer. If that reader belongs to a real-time scheduling
+class, it can spin forever and the kernel will livelock.
+
+This mechanism cannot be used if the protected data contains pointers,
+as the writer can invalidate a pointer that the reader is following.
+
+
+.. _seqcount_t:
+
+Sequence counters (``seqcount_t``)
+==================================
+
+This is the raw counting mechanism, which does not protect against
+multiple writers.  Write side critical sections must thus be serialized
+by an external lock.
+
+If the write serialization primitive is not implicitly disabling
+preemption, preemption must be explicitly disabled before entering the
+write side section. If the read section can be invoked from hardirq or
+softirq contexts, interrupts or bottom halves must also be respectively
+disabled before entering the write section.
+
+If it's desired to automatically handle the sequence counter
+requirements of writer serialization and non-preemptibility, use
+:ref:`seqlock_t` instead.
+
+Initialization::
+
+	/* dynamic */
+	seqcount_t foo_seqcount;
+	seqcount_init(&foo_seqcount);
+
+	/* static */
+	static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
+
+	/* C99 struct init */
+	struct {
+		.seq   = SEQCNT_ZERO(foo.seq),
+	} foo;
+
+Write path::
+
+	/* Serialized context with disabled preemption */
+
+	write_seqcount_begin(&foo_seqcount);
+
+	/* ... [[write-side critical section]] ... */
+
+	write_seqcount_end(&foo_seqcount);
+
+Read path::
+
+	do {
+		seq = read_seqcount_begin(&foo_seqcount);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (read_seqcount_retry(&foo_seqcount, seq));
+
+
+.. _seqlock_t:
+
+Sequential locks (``seqlock_t``)
+================================
+
+This contains the :ref:`seqcount_t` mechanism discussed earlier, plus an
+embedded spinlock for writer serialization and non-preemptibility.
+
+If the read side section can be invoked from hardirq or softirq context,
+use the write side function variants which disable interrupts or bottom
+halves respectively.
+
+Initialization::
+
+	/* dynamic */
+	seqlock_t foo_seqlock;
+	seqlock_init(&foo_seqlock);
+
+	/* static */
+	static DEFINE_SEQLOCK(foo_seqlock);
+
+	/* C99 struct init */
+	struct {
+		.seql   = __SEQLOCK_UNLOCKED(foo.seql)
+	} foo;
+
+Write path::
+
+	write_seqlock(&foo_seqlock);
+
+	/* ... [[write-side critical section]] ... */
+
+	write_sequnlock(&foo_seqlock);
+
+Read path, three categories:
+
+1. Normal sequence readers which never block a writer, but must retry
+   if a writer is in progress, detected by a change in the sequence
+   number. Writers do not wait for sequence readers::
+
+	do {
+		seq = read_seqbegin(&foo_seqlock);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (read_seqretry(&foo_seqlock, seq));
+
+2. Locking readers which will wait if a writer or another locking reader
+   is in progress. A locking reader in progress will also block a writer
+   from entering its critical section. This read lock is
+   exclusive. Unlike rwlock_t, only one locking reader can acquire it::
+
+	read_seqlock_excl(&foo_seqlock);
+
+	/* ... [[read-side critical section]] ... */
+
+	read_sequnlock_excl(&foo_seqlock);
+
+3. Conditional lockless reader (as in 1), or locking reader (as in 2),
+   according to a passed marker. This is used to avoid lockless readers
+   starvation (too many retry loops) in case of a sharp spike in write
+   activity. First, a lockless read is tried (even marker passed). If
+   that trial fails (odd sequence counter is returned, which is used as
+   the next iteration marker), the lockless read is transformed to a
+   full locking read and no retry loop is necessary::
+
+	/* marker; even initialization */
+	int seq = 0;
+	do {
+		read_seqbegin_or_lock(&foo_seqlock, &seq);
+
+		/* ... [[read-side critical section]] ... */
+
+	} while (need_seqretry(&foo_seqlock, seq));
+	done_seqretry(&foo_seqlock, seq);
+
+
+API documentation
+=================
+
+.. kernel-doc:: include/linux/seqlock.h
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 8b97204..299d68f 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -1,36 +1,15 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef __LINUX_SEQLOCK_H
 #define __LINUX_SEQLOCK_H
+
 /*
- * Reader/writer consistent mechanism without starving writers. This type of
- * lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes. There are two types
- * of readers:
- * 1. Sequence readers which never block a writer but they may have to retry
- *    if a writer is in progress by detecting change in sequence number.
- *    Writers do not wait for a sequence reader.
- * 2. Locking readers which will wait if a writer or another locking reader
- *    is in progress. A locking reader in progress will also block a writer
- *    from going forward. Unlike the regular rwlock, the read lock here is
- *    exclusive so that only one locking reader can get it.
- *
- * This is not as cache friendly as brlock. Also, this may not work well
- * for data that contains pointers, because any writer could
- * invalidate a pointer that a reader was following.
- *
- * Expected non-blocking reader usage:
- * 	do {
- *	    seq = read_seqbegin(&foo);
- * 	...
- *      } while (read_seqretry(&foo, seq));
- *
- *
- * On non-SMP the spin locks disappear but the writer still needs
- * to increment the sequence variables because an interrupt routine could
- * change the state of the data.
- *
- * Based on x86_64 vsyscall gettimeofday 
- * by Keith Owens and Andrea Arcangeli
+ * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
+ * lockless readers (read-only retry loops), and no writer starvation.
+ *
+ * See Documentation/locking/seqlock.rst
+ *
+ * Copyrights:
+ * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
  */
 
 #include <linux/spinlock.h>
@@ -41,8 +20,8 @@
 #include <asm/processor.h>
 
 /*
- * The seqlock interface does not prescribe a precise sequence of read
- * begin/retry/end. For readers, typically there is a call to
+ * The seqlock seqcount_t interface does not prescribe a precise sequence of
+ * read begin/retry/end. For readers, typically there is a call to
  * read_seqcount_begin() and read_seqcount_retry(), however, there are more
  * esoteric cases which do not follow this pattern.
  *
@@ -50,16 +29,30 @@
  * via seqcount_t under KCSAN: upon beginning a seq-reader critical section,
  * pessimistically mark the next KCSAN_SEQLOCK_REGION_MAX memory accesses as
  * atomics; if there is a matching read_seqcount_retry() call, no following
- * memory operations are considered atomic. Usage of seqlocks via seqlock_t
- * interface is not affected.
+ * memory operations are considered atomic. Usage of the seqlock_t interface
+ * is not affected.
  */
 #define KCSAN_SEQLOCK_REGION_MAX 1000
 
 /*
- * Version using sequence counter only.
- * This can be used when code has its own mutex protecting the
- * updating starting before the write_seqcountbeqin() and ending
- * after the write_seqcount_end().
+ * Sequence counters (seqcount_t)
+ *
+ * This is the raw counting mechanism, without any writer protection.
+ *
+ * Write side critical sections must be serialized and non-preemptible.
+ *
+ * If readers can be invoked from hardirq or softirq contexts,
+ * interrupts or bottom halves must also be respectively disabled before
+ * entering the write section.
+ *
+ * This mechanism can't be used if the protected data contains pointers,
+ * as the writer can invalidate a pointer that a reader is following.
+ *
+ * If it's desired to automatically handle the sequence counter writer
+ * serialization and non-preemptibility requirements, use a sequential
+ * lock (seqlock_t) instead.
+ *
+ * See Documentation/locking/seqlock.rst
  */
 typedef struct seqcount {
 	unsigned sequence;
@@ -398,10 +391,6 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
        smp_wmb();      /* increment "sequence" before following stores */
 }
 
-/*
- * Sequence counter only version assumes that callers are using their
- * own mutexing.
- */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
 	raw_write_seqcount_begin(s);
@@ -434,15 +423,21 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
+/*
+ * Sequential locks (seqlock_t)
+ *
+ * Sequence counters with an embedded spinlock for writer serialization
+ * and non-preemptibility.
+ *
+ * For more info, see:
+ *    - Comments on top of seqcount_t
+ *    - Documentation/locking/seqlock.rst
+ */
 typedef struct {
 	struct seqcount seqcount;
 	spinlock_t lock;
 } seqlock_t;
 
-/*
- * These macros triggered gcc-3.x compile-time problems.  We think these are
- * OK now.  Be cautious.
- */
 #define __SEQLOCK_UNLOCKED(lockname)			\
 	{						\
 		.seqcount = SEQCNT_ZERO(lockname),	\
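
As a worked instance of the seqcount_t templates above -- a sketch with
hypothetical foo_* names. The writer is serialized by a spinlock, which
(on non-PREEMPT_RT kernels) also disables preemption, so no explicit
preempt_disable() is needed:

	static DEFINE_SPINLOCK(foo_lock);	/* serializes writers */
	static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
	static u64 foo_x, foo_y;		/* the protected data */

	static void foo_write(u64 x, u64 y)
	{
		spin_lock(&foo_lock);		/* also disables preemption */
		write_seqcount_begin(&foo_seqcount);

		foo_x = x;
		foo_y = y;

		write_seqcount_end(&foo_seqcount);
		spin_unlock(&foo_lock);
	}

	static void foo_read(u64 *x, u64 *y)
	{
		unsigned int seq;

		do {
			seq = read_seqcount_begin(&foo_seqcount);

			*x = foo_x;
			*y = foo_y;

		} while (read_seqcount_retry(&foo_seqcount, seq));
	}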

^ permalink raw reply related	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write
  2020-07-20 15:55   ` [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write Ahmed S. Darwish
  2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
@ 2020-08-08 23:21     ` Guenter Roeck
  2020-08-08 23:23       ` Guenter Roeck
  2020-08-09 18:42       ` Ahmed S. Darwish
  1 sibling, 2 replies; 258+ messages in thread
From: Guenter Roeck @ 2020-08-08 23:21 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML

On Mon, Jul 20, 2020 at 05:55:14PM +0200, Ahmed S. Darwish wrote:
> Preemption must be disabled before entering a sequence count write side
> critical section.  Failing to do so, the seqcount read side can preempt
> the write side section and spin for the entire scheduler tick.  If that
> reader belongs to a real-time scheduling class, it can spin forever and
> the kernel will livelock.
> 
> Assert through lockdep that preemption is disabled for seqcount writers.
> 

This patch is causing compile failures for various images (eg arm:allmodconfig,
arm:imx_v6_v7_defconfig, mips:allmodconfig).

In file included from arch/arm/include/asm/bug.h:60,
                 from include/linux/bug.h:5,
                 from include/linux/thread_info.h:12,
                 from include/asm-generic/current.h:5,
                 from ./arch/arm/include/generated/asm/current.h:1,
                 from include/linux/sched.h:12,
                 from arch/arm/kernel/asm-offsets.c:11:
include/linux/seqlock.h: In function 'write_seqcount_begin_nested':
include/asm-generic/percpu.h:31:40: error: implicit declaration of function 'raw_smp_processor_id'

Reverting it fixes the problem. Is this being addressed ?

Guenter

> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> ---
>  include/linux/seqlock.h | 29 +++++++++++++++++++++++------
>  1 file changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> index e885702d8b82..54bc20496392 100644
> --- a/include/linux/seqlock.h
> +++ b/include/linux/seqlock.h
> @@ -266,6 +266,12 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
>  	kcsan_nestable_atomic_end();
>  }
>  
> +static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
> +{
> +	raw_write_seqcount_begin(s);
> +	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
> +}
> +
>  /**
>   * write_seqcount_begin_nested() - start a seqcount_t write section with
>   *                                 custom lockdep nesting level
> @@ -276,8 +282,19 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
>   */
>  static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
>  {
> -	raw_write_seqcount_begin(s);
> -	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
> +	lockdep_assert_preemption_disabled();
> +	__write_seqcount_begin_nested(s, subclass);
> +}
> +
> +/*
> + * A write_seqcount_begin() variant w/o lockdep non-preemptibility checks.
> + *
> + * Use for internal seqlock.h code where it's known that preemption is
> + * already disabled. For example, seqlock_t write side functions.
> + */
> +static inline void __write_seqcount_begin(seqcount_t *s)
> +{
> +	__write_seqcount_begin_nested(s, 0);
>  }
>  
>  /**
> @@ -575,7 +592,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
>  static inline void write_seqlock(seqlock_t *sl)
>  {
>  	spin_lock(&sl->lock);
> -	write_seqcount_begin(&sl->seqcount);
> +	__write_seqcount_begin(&sl->seqcount);
>  }
>  
>  /**
> @@ -601,7 +618,7 @@ static inline void write_sequnlock(seqlock_t *sl)
>  static inline void write_seqlock_bh(seqlock_t *sl)
>  {
>  	spin_lock_bh(&sl->lock);
> -	write_seqcount_begin(&sl->seqcount);
> +	__write_seqcount_begin(&sl->seqcount);
>  }
>  
>  /**
> @@ -628,7 +645,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
>  static inline void write_seqlock_irq(seqlock_t *sl)
>  {
>  	spin_lock_irq(&sl->lock);
> -	write_seqcount_begin(&sl->seqcount);
> +	__write_seqcount_begin(&sl->seqcount);
>  }
>  
>  /**
> @@ -649,7 +666,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&sl->lock, flags);
> -	write_seqcount_begin(&sl->seqcount);
> +	__write_seqcount_begin(&sl->seqcount);
>  	return flags;
>  }
>  
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write
  2020-08-08 23:21     ` [PATCH v4 08/24] " Guenter Roeck
@ 2020-08-08 23:23       ` Guenter Roeck
  2020-08-09 18:42       ` Ahmed S. Darwish
  1 sibling, 0 replies; 258+ messages in thread
From: Guenter Roeck @ 2020-08-08 23:23 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML

On Sat, Aug 08, 2020 at 04:21:22PM -0700, Guenter Roeck wrote:
> On Mon, Jul 20, 2020 at 05:55:14PM +0200, Ahmed S. Darwish wrote:
> > Preemption must be disabled before entering a sequence count write side
> > critical section.  Failing to do so, the seqcount read side can preempt
> > the write side section and spin for the entire scheduler tick.  If that
> > reader belongs to a real-time scheduling class, it can spin forever and
> > the kernel will livelock.
> > 
> > Assert through lockdep that preemption is disabled for seqcount writers.
> > 
> 
> This patch is causing compile failures for various images (eg arm:allmodconfig,
> arm:imx_v6_v7_defconfig, mips:allmodconfig).
> 
> In file included from arch/arm/include/asm/bug.h:60,
>                  from include/linux/bug.h:5,
>                  from include/linux/thread_info.h:12,
>                  from include/asm-generic/current.h:5,
>                  from ./arch/arm/include/generated/asm/current.h:1,
>                  from include/linux/sched.h:12,
>                  from arch/arm/kernel/asm-offsets.c:11:
> include/linux/seqlock.h: In function 'write_seqcount_begin_nested':
> include/asm-generic/percpu.h:31:40: error: implicit declaration of function 'raw_smp_processor_id'
> 

Also:

Building sparc64:allmodconfig ... failed
--------------
Error log:
<stdin>:1511:2: warning: #warning syscall clone3 not implemented [-Wcpp]
In file included from arch/sparc/include/asm/bug.h:25,
                 from include/linux/bug.h:5,
                 from include/linux/thread_info.h:12,
                 from include/asm-generic/preempt.h:5,
                 from ./arch/sparc/include/generated/asm/preempt.h:1,
                 from include/linux/preempt.h:78,
                 from include/linux/spinlock.h:51,
                 from include/linux/seqlock.h:15,
                 from include/linux/time.h:6,
                 from arch/sparc/vdso/vclock_gettime.c:16:
include/linux/seqlock.h: In function 'write_seqcount_begin_nested':
arch/sparc/include/asm/percpu_64.h:19:25: error: '__local_per_cpu_offset' undeclared

Again, reverting this patch fixes the problem.

Guenter

> Reverting it fixes the problem. Is this being addressed ?
> 
> Guenter
> 
> > Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> > ---
> >  include/linux/seqlock.h | 29 +++++++++++++++++++++++------
> >  1 file changed, 23 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> > index e885702d8b82..54bc20496392 100644
> > --- a/include/linux/seqlock.h
> > +++ b/include/linux/seqlock.h
> > @@ -266,6 +266,12 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
> >  	kcsan_nestable_atomic_end();
> >  }
> >  
> > +static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
> > +{
> > +	raw_write_seqcount_begin(s);
> > +	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
> > +}
> > +
> >  /**
> >   * write_seqcount_begin_nested() - start a seqcount_t write section with
> >   *                                 custom lockdep nesting level
> > @@ -276,8 +282,19 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
> >   */
> >  static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
> >  {
> > -	raw_write_seqcount_begin(s);
> > -	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
> > +	lockdep_assert_preemption_disabled();
> > +	__write_seqcount_begin_nested(s, subclass);
> > +}
> > +
> > +/*
> > + * A write_seqcount_begin() variant w/o lockdep non-preemptibility checks.
> > + *
> > + * Use for internal seqlock.h code where it's known that preemption is
> > + * already disabled. For example, seqlock_t write side functions.
> > + */
> > +static inline void __write_seqcount_begin(seqcount_t *s)
> > +{
> > +	__write_seqcount_begin_nested(s, 0);
> >  }
> >  
> >  /**
> > @@ -575,7 +592,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
> >  static inline void write_seqlock(seqlock_t *sl)
> >  {
> >  	spin_lock(&sl->lock);
> > -	write_seqcount_begin(&sl->seqcount);
> > +	__write_seqcount_begin(&sl->seqcount);
> >  }
> >  
> >  /**
> > @@ -601,7 +618,7 @@ static inline void write_sequnlock(seqlock_t *sl)
> >  static inline void write_seqlock_bh(seqlock_t *sl)
> >  {
> >  	spin_lock_bh(&sl->lock);
> > -	write_seqcount_begin(&sl->seqcount);
> > +	__write_seqcount_begin(&sl->seqcount);
> >  }
> >  
> >  /**
> > @@ -628,7 +645,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
> >  static inline void write_seqlock_irq(seqlock_t *sl)
> >  {
> >  	spin_lock_irq(&sl->lock);
> > -	write_seqcount_begin(&sl->seqcount);
> > +	__write_seqcount_begin(&sl->seqcount);
> >  }
> >  
> >  /**
> > @@ -649,7 +666,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
> >  	unsigned long flags;
> >  
> >  	spin_lock_irqsave(&sl->lock, flags);
> > -	write_seqcount_begin(&sl->seqcount);
> > +	__write_seqcount_begin(&sl->seqcount);
> >  	return flags;
> >  }
> >  
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write
  2020-08-08 23:21     ` [PATCH v4 08/24] " Guenter Roeck
  2020-08-08 23:23       ` Guenter Roeck
@ 2020-08-09 18:42       ` Ahmed S. Darwish
  2020-08-10  8:59         ` Greg KH
  1 sibling, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-09 18:42 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML

On Sat, Aug 08, 2020 at 04:21:22PM -0700, Guenter Roeck wrote:
> On Mon, Jul 20, 2020 at 05:55:14PM +0200, Ahmed S. Darwish wrote:
> > Preemption must be disabled before entering a sequence count write side
> > critical section.  Failing to do so, the seqcount read side can preempt
> > the write side section and spin for the entire scheduler tick.  If that
> > reader belongs to a real-time scheduling class, it can spin forever and
> > the kernel will livelock.
> >
> > Assert through lockdep that preemption is disabled for seqcount writers.
> >
>
> This patch is causing compile failures for various images (eg arm:allmodconfig,
> arm:imx_v6_v7_defconfig, mips:allmodconfig).
>
> In file included from arch/arm/include/asm/bug.h:60,
>                  from include/linux/bug.h:5,
>                  from include/linux/thread_info.h:12,
>                  from include/asm-generic/current.h:5,
>                  from ./arch/arm/include/generated/asm/current.h:1,
>                  from include/linux/sched.h:12,
>                  from arch/arm/kernel/asm-offsets.c:11:
> include/linux/seqlock.h: In function 'write_seqcount_begin_nested':
> include/asm-generic/percpu.h:31:40: error: implicit declaration of function 'raw_smp_processor_id'
>
> Reverting it fixes the problem. Is this being addressed ?
>

@Peter, I think let's revert this one for now?

Shall I also send you a version of:

    [PATCH v4 09/24] seqlock: Extend seqcount API with associated locks
    https://lkml.kernel.org/r/20200720155530.1173732-10-a.darwish@linutronix.de/

with the problematic "lockdep_assert_preemption_disabled()" removed?

> Guenter
>

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write
  2020-08-09 18:42       ` Ahmed S. Darwish
@ 2020-08-10  8:59         ` Greg KH
  2020-08-10  9:48           ` peterz
                             ` (2 more replies)
  0 siblings, 3 replies; 258+ messages in thread
From: Greg KH @ 2020-08-10  8:59 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Guenter Roeck, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML

On Sun, Aug 09, 2020 at 08:42:51PM +0200, Ahmed S. Darwish wrote:
> On Sat, Aug 08, 2020 at 04:21:22PM -0700, Guenter Roeck wrote:
> > On Mon, Jul 20, 2020 at 05:55:14PM +0200, Ahmed S. Darwish wrote:
> > > Preemption must be disabled before entering a sequence count write side
> > > critical section.  Failing to do so, the seqcount read side can preempt
> > > the write side section and spin for the entire scheduler tick.  If that
> > > reader belongs to a real-time scheduling class, it can spin forever and
> > > the kernel will livelock.
> > >
> > > Assert through lockdep that preemption is disabled for seqcount writers.
> > >
> >
> > This patch is causing compile failures for various images (eg arm:allmodconfig,
> > arm:imx_v6_v7_defconfig, mips:allmodconfig).
> >
> > In file included from arch/arm/include/asm/bug.h:60,
> >                  from include/linux/bug.h:5,
> >                  from include/linux/thread_info.h:12,
> >                  from include/asm-generic/current.h:5,
> >                  from ./arch/arm/include/generated/asm/current.h:1,
> >                  from include/linux/sched.h:12,
> >                  from arch/arm/kernel/asm-offsets.c:11:
> > include/linux/seqlock.h: In function 'write_seqcount_begin_nested':
> > include/asm-generic/percpu.h:31:40: error: implicit declaration of function 'raw_smp_processor_id'
> >
> > Reverting it fixes the problem. Is this being addressed ?
> >
> 
> @Peter, I think let's revert this one for now?

Please do, it's blowing up my local builds as well :(

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write
  2020-08-10  8:59         ` Greg KH
@ 2020-08-10  9:48           ` peterz
  2020-08-10 10:03             ` Greg KH
  2020-08-10  9:54           ` [PATCH] Revert "seqlock: lockdep assert non-preemptibility on seqcount_t write" Ahmed S. Darwish
  2020-08-10 19:55           ` [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write Thomas Gleixner
  2 siblings, 1 reply; 258+ messages in thread
From: peterz @ 2020-08-10  9:48 UTC (permalink / raw)
  To: Greg KH
  Cc: Ahmed S. Darwish, Guenter Roeck, Ingo Molnar, Will Deacon,
	Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML

On Mon, Aug 10, 2020 at 10:59:54AM +0200, Greg KH wrote:
> On Sun, Aug 09, 2020 at 08:42:51PM +0200, Ahmed S. Darwish wrote:

> > @Peter, I think let's revert this one for now?
> 
> Please do, it's blowing up my local builds as well :(

There's a bunch of patches queued here:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/urgent

That should fix it all, but I'm not exactly sure when the pull request
for that will go out.

^ permalink raw reply	[flat|nested] 258+ messages in thread

* [PATCH] Revert "seqlock: lockdep assert non-preemptibility on seqcount_t write"
  2020-08-10  8:59         ` Greg KH
  2020-08-10  9:48           ` peterz
@ 2020-08-10  9:54           ` Ahmed S. Darwish
  2020-08-10 10:05             ` Greg KH
  2020-08-10 19:55           ` [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write Thomas Gleixner
  2 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-10  9:54 UTC (permalink / raw)
  To: gregkh
  Cc: a.darwish, bigeasy, linux-kernel, linux, mingo, paulmck, peterz,
	rostedt, tglx, will

This reverts commit 859247d39fb008ea812e8f0c398a58a20c12899e.

Current implementation of lockdep_assert_preemption_disabled() uses
per-CPU variables, which was done to untangle the existing
seqlock.h<=>sched.h 'current->' task_struct circular dependency.

Using per-CPU variables did not fully untangle the dependency for
various non-x86 architectures though, resulting in multiple broken
builds. For the affected architectures, raw_smp_processor_id() led
back to 'current->', thus having the original seqlock.h<=>sched.h
dependency in full-effect.

For now, revert adding lockdep_assert_preemption_disabled() to
seqlock.h.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lkml.kernel.org/r/20200808232122.GA176509@roeck-us.net
Link: https://lkml.kernel.org/r/20200810085954.GA1591892@kroah.com
References: Commit a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---

Notes:
    My apologies for the mess on this one :-(

 include/linux/seqlock.h | 29 ++++++-----------------------
 1 file changed, 6 insertions(+), 23 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 54bc20496392..e885702d8b82 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -266,12 +266,6 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
-static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
-{
-	raw_write_seqcount_begin(s);
-	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
-}
-
 /**
  * write_seqcount_begin_nested() - start a seqcount_t write section with
  *                                 custom lockdep nesting level
@@ -282,19 +276,8 @@ static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
  */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
-	lockdep_assert_preemption_disabled();
-	__write_seqcount_begin_nested(s, subclass);
-}
-
-/*
- * A write_seqcount_begin() variant w/o lockdep non-preemptibility checks.
- *
- * Use for internal seqlock.h code where it's known that preemption is
- * already disabled. For example, seqlock_t write side functions.
- */
-static inline void __write_seqcount_begin(seqcount_t *s)
-{
-	__write_seqcount_begin_nested(s, 0);
+	raw_write_seqcount_begin(s);
+	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
 /**
@@ -592,7 +575,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -618,7 +601,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -645,7 +628,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	__write_seqcount_begin(&sl->seqcount);
+	write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -666,7 +649,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	__write_seqcount_begin(&sl->seqcount);
+	write_seqcount_begin(&sl->seqcount);
 	return flags;
 }
 
-- 
2.28.0
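
The failure mode generalizes. A toy reproduction of such an include
cycle -- two made-up headers, not the actual kernel ones -- shows why
the assertion had to move out of seqlock.h:

	/* seq.h: stands in for seqlock.h */
	#ifndef SEQ_H
	#define SEQ_H
	#include "sched.h"
	static inline void seq_assert(void)
	{
		(void)current_cpu();		/* needs sched.h */
	}
	#endif

	/* sched.h: stands in for sched.h / the arch percpu headers */
	#ifndef SCHED_H
	#define SCHED_H
	#include "seq.h"			/* the cycle */
	static inline int current_cpu(void) { return 0; }
	#endif

	/*
	 * A file doing #include "sched.h" fails to build: the SCHED_H
	 * guard short-circuits the nested #include "sched.h" inside
	 * seq.h, so seq_assert() references current_cpu() before any
	 * declaration of it exists -- an "implicit declaration" error
	 * of the same shape as the raw_smp_processor_id() one above.
	 */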


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write
  2020-08-10  9:48           ` peterz
@ 2020-08-10 10:03             ` Greg KH
  0 siblings, 0 replies; 258+ messages in thread
From: Greg KH @ 2020-08-10 10:03 UTC (permalink / raw)
  To: peterz
  Cc: Ahmed S. Darwish, Guenter Roeck, Ingo Molnar, Will Deacon,
	Thomas Gleixner, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML

On Mon, Aug 10, 2020 at 11:48:20AM +0200, peterz@infradead.org wrote:
> On Mon, Aug 10, 2020 at 10:59:54AM +0200, Greg KH wrote:
> > On Sun, Aug 09, 2020 at 08:42:51PM +0200, Ahmed S. Darwish wrote:
> 
> > > @Peter, I think let's revert this one for now?
> > 
> > Please do, it's blowing up my local builds as well :(
> 
> There's a bunch of patches queued here:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/urgent
> 
> That should fix it all, but I'm not exactly sure when the pull request
> for that will go out.

Before 5.9-rc1 hopefully?  :)

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH] Revert "seqlock: lockdep assert non-preemptibility on seqcount_t write"
  2020-08-10  9:54           ` [PATCH] Revert "seqlock: lockdep assert non-preemptibility on seqcount_t write" Ahmed S. Darwish
@ 2020-08-10 10:05             ` Greg KH
  2020-08-10 10:35               ` Ahmed S. Darwish
  2020-08-10 14:10               ` Guenter Roeck
  0 siblings, 2 replies; 258+ messages in thread
From: Greg KH @ 2020-08-10 10:05 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: bigeasy, linux-kernel, linux, mingo, paulmck, peterz, rostedt,
	tglx, will

On Mon, Aug 10, 2020 at 11:54:28AM +0200, Ahmed S. Darwish wrote:
> This reverts commit 859247d39fb008ea812e8f0c398a58a20c12899e.
> 
> Current implementation of lockdep_assert_preemption_disabled() uses
> per-CPU variables, which was done to untangle the existing
> seqlock.h<=>sched.h 'current->' task_struct circular dependency.
> 
> Using per-CPU variables did not fully untangle the dependency for
> various non-x86 architectures though, resulting in multiple broken
> builds. For the affected architectures, raw_smp_processor_id() led
> back to 'current->', thus having the original seqlock.h<=>sched.h
> dependency in full-effect.
> 
> For now, revert adding lockdep_assert_preemption_disabled() to
> seqlock.h.
> 
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Link: https://lkml.kernel.org/r/20200808232122.GA176509@roeck-us.net
> Link: https://lkml.kernel.org/r/20200810085954.GA1591892@kroah.com
> References: Commit a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Even after this, there are still some build errors on arm32, but I don't
think they are due to this change:

	ERROR: modpost: "__aeabi_uldivmod" [drivers/net/ethernet/sfc/sfc.ko] undefined!
	ERROR: modpost: "__bad_udelay" [drivers/net/ethernet/aquantia/atlantic/atlantic.ko] undefined!

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH] Revert "seqlock: lockdep assert non-preemptibility on seqcount_t write"
  2020-08-10 10:05             ` Greg KH
@ 2020-08-10 10:35               ` Ahmed S. Darwish
  2020-08-10 14:10               ` Guenter Roeck
  1 sibling, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-10 10:35 UTC (permalink / raw)
  To: Greg KH
  Cc: bigeasy, linux-kernel, linux, mingo, paulmck, peterz, rostedt,
	tglx, will

On Mon, Aug 10, 2020 at 12:05:02PM +0200, Greg KH wrote:
> On Mon, Aug 10, 2020 at 11:54:28AM +0200, Ahmed S. Darwish wrote:
> > This reverts commit 859247d39fb008ea812e8f0c398a58a20c12899e.
> >
> > Current implementation of lockdep_assert_preemption_disabled() uses
> > per-CPU variables, which was done to untangle the existing
> > seqlock.h<=>sched.h 'current->' task_struct circular dependency.
> >
> > Using per-CPU variables did not fully untangle the dependency for
> > various non-x86 architectures though, resulting in multiple broken
> > builds. For the affected architectures, raw_smp_processor_id() led
> > back to 'current->', thus having the original seqlock.h<=>sched.h
> > dependency in full-effect.
> >
> > For now, revert adding lockdep_assert_preemption_disabled() to
> > seqlock.h.
> >
> > Reported-by: Guenter Roeck <linux@roeck-us.net>
> > Link: https://lkml.kernel.org/r/20200808232122.GA176509@roeck-us.net
> > Link: https://lkml.kernel.org/r/20200810085954.GA1591892@kroah.com
> > References: Commit a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
> > Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>
> Even after this, there are still some build errors on arm32, but I don't
> think they are due to this change:
>
> 	ERROR: modpost: "__aeabi_uldivmod" [drivers/net/ethernet/sfc/sfc.ko] undefined!
> 	ERROR: modpost: "__bad_udelay" [drivers/net/ethernet/aquantia/atlantic/atlantic.ko] undefined!
>

Yes, they are unrelated to the seqlock.h changes.

(I've locally reverted the whole series just to be sure, and the same
 linking errors as above were still there for an ARM allyesconfig.)

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH] Revert "seqlock: lockdep assert non-preemptibility on seqcount_t write"
  2020-08-10 10:05             ` Greg KH
  2020-08-10 10:35               ` Ahmed S. Darwish
@ 2020-08-10 14:10               ` Guenter Roeck
  2020-08-18 22:51                 ` Valdis Klētnieks
  1 sibling, 1 reply; 258+ messages in thread
From: Guenter Roeck @ 2020-08-10 14:10 UTC (permalink / raw)
  To: Greg KH, Ahmed S. Darwish
  Cc: bigeasy, linux-kernel, mingo, paulmck, peterz, rostedt, tglx, will

On 8/10/20 3:05 AM, Greg KH wrote:
> On Mon, Aug 10, 2020 at 11:54:28AM +0200, Ahmed S. Darwish wrote:
>> This reverts commit 859247d39fb008ea812e8f0c398a58a20c12899e.
>>
>> Current implementation of lockdep_assert_preemption_disabled() uses
>> per-CPU variables, which was done to untangle the existing
>> seqlock.h<=>sched.h 'current->' task_struct circular dependency.
>>
>> Using per-CPU variables did not fully untangle the dependency for
>> various non-x86 architectures though, resulting in multiple broken
>> builds. For the affected architectures, raw_smp_processor_id() led
>> back to 'current->', thus having the original seqlock.h<=>sched.h
>> dependency in full-effect.
>>
>> For now, revert adding lockdep_assert_preemption_disabled() to
>> seqlock.h.
>>
>> Reported-by: Guenter Roeck <linux@roeck-us.net>
>> Link: https://lkml.kernel.org/r/20200808232122.GA176509@roeck-us.net
>> Link: https://lkml.kernel.org/r/20200810085954.GA1591892@kroah.com
>> References: Commit a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
>> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> 
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> Even after this, there are still some build errors on arm32, but I don't
> think they are due to this change:
> 
> 	ERROR: modpost: "__aeabi_uldivmod" [drivers/net/ethernet/sfc/sfc.ko] undefined!

This affects every 32 bit architecture.

> 	ERROR: modpost: "__bad_udelay" [drivers/net/ethernet/aquantia/atlantic/atlantic.ko] undefined!
> 

I don't think that is new. If anything, it is surprising that builds don't fail more
widely because of it. AFAICS it was introduced back in 2018 (a hot 50-ms delay loop
really isn't such a good idea).

Guenter
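
For reference, __aeabi_uldivmod is the EABI helper for 64-bit division,
which the kernel intentionally does not provide on 32-bit ARM; a plain
u64 division in a driver must go through the <linux/math64.h> helpers
instead. A generic illustration, not the actual sfc fix:

	#include <linux/math64.h>

	static u64 calc_rate(u64 bytes, u64 nsecs)
	{
		/*
		 * A plain `return bytes / nsecs;` emits a call to
		 * __aeabi_uldivmod on 32-bit ARM and fails at link time.
		 */
		return div64_u64(bytes, nsecs);
	}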

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write
  2020-08-10  8:59         ` Greg KH
  2020-08-10  9:48           ` peterz
  2020-08-10  9:54           ` [PATCH] Revert "seqlock: lockdep assert non-preemptibility on seqcount_t write" Ahmed S. Darwish
@ 2020-08-10 19:55           ` Thomas Gleixner
  2020-08-11 10:06             ` Greg KH
  2 siblings, 1 reply; 258+ messages in thread
From: Thomas Gleixner @ 2020-08-10 19:55 UTC (permalink / raw)
  To: Greg KH, Ahmed S. Darwish
  Cc: Guenter Roeck, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Paul E. McKenney, Sebastian A. Siewior, Steven Rostedt, LKML

Greg KH <gregkh@linuxfoundation.org> writes:
> On Sun, Aug 09, 2020 at 08:42:51PM +0200, Ahmed S. Darwish wrote:
>> On Sat, Aug 08, 2020 at 04:21:22PM -0700, Guenter Roeck wrote:
>> > Reverting it fixes the problem. Is this being addressed ?
>> >
>> 
>> @Peter, I think let's revert this one for now?
>
> Please do, it's blowing up my local builds as well :(

Peter and Ingo sorted the header mess last week and I just sent a pull
request to Linus.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write
  2020-08-10 19:55           ` [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write Thomas Gleixner
@ 2020-08-11 10:06             ` Greg KH
  0 siblings, 0 replies; 258+ messages in thread
From: Greg KH @ 2020-08-11 10:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ahmed S. Darwish, Guenter Roeck, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Paul E. McKenney, Sebastian A. Siewior,
	Steven Rostedt, LKML

On Mon, Aug 10, 2020 at 09:55:20PM +0200, Thomas Gleixner wrote:
> Greg KH <gregkh@linuxfoundation.org> writes:
> > On Sun, Aug 09, 2020 at 08:42:51PM +0200, Ahmed S. Darwish wrote:
> >> On Sat, Aug 08, 2020 at 04:21:22PM -0700, Guenter Roeck wrote:
> >> > Reverting it fixes the problem. Is this being addressed ?
> >> >
> >> 
> >> @Peter, I think let's revert this one for now?
> >
> > Please do, it's blowing up my local builds as well :(
> 
> Peter and Ingo sorted the header mess last week and I just sent a pull
> request to Linus.

Thanks, that looks good and works for my build tests!

greg k-h

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH] Revert "seqlock: lockdep assert non-preemptibility on seqcount_t write"
  2020-08-10 14:10               ` Guenter Roeck
@ 2020-08-18 22:51                 ` Valdis Klētnieks
  2020-08-19  0:56                   ` Guenter Roeck
  0 siblings, 1 reply; 258+ messages in thread
From: Valdis Klētnieks @ 2020-08-18 22:51 UTC (permalink / raw)
  To: Guenter Roeck, Mark Starovoytov, Igor Russkikh, David S. Miller
  Cc: Greg KH, Ahmed S. Darwish, bigeasy, linux-kernel, mingo, paulmck,
	peterz, rostedt, tglx, will

On Mon, 10 Aug 2020 07:10:32 -0700, Guenter Roeck said:
> > 	ERROR: modpost: "__bad_udelay" [drivers/net/ethernet/aquantia/atlantic/atlantic.ko] undefined!
> > 
>
> I don't think that is new. If anything, it is surprising that builds don't fail more
> widely because of it. AFAICS it was introduced back in 2018 (a hot 50-ms delay loop
> really isn't such a good idea).

Well...it wasn't broken in next-20200720.  A bit of poking with nm,
and building hw_atl/hw_atl_b0.s, it looks like the culprit is this commit:

commit 8dcf2ad39fdb2d183b7bd4307c837713e3150b9a
Author: Mark Starovoytov <mstarovoitov@marvell.com>
Date:   Mon Jul 20 21:32:44 2020 +0300

    net: atlantic: add hwmon getter for MAC temperature

specifically this chunk around line 1634 of hw_atl/hw_atl_b0.c:


+       err = readx_poll_timeout_atomic(hw_atl_b0_ts_ready_and_latch_high_get,
+                                       self, val, val == 1, 10000U, 500000U);

And doing a 'git revert' makes the build work....
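
For context: readx_poll_timeout_atomic() busy-waits between polls via
udelay(), and on 32-bit ARM a compile-time-constant udelay() above
roughly 2 ms is rejected by emitting a call to the deliberately
undefined __bad_udelay() -- hence the link error. If the caller may
sleep, the sleeping variant avoids udelay() entirely. A sketch of one
possible fix, not necessarily the patch that was eventually merged:

	err = readx_poll_timeout(hw_atl_b0_ts_ready_and_latch_high_get,
				 self, val, val == 1, 10000U, 500000U);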

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH] Revert "seqlock: lockdep assert non-preemptibility on seqcount_t write"
  2020-08-18 22:51                 ` Valdis Klētnieks
@ 2020-08-19  0:56                   ` Guenter Roeck
  2020-08-19  7:00                     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 258+ messages in thread
From: Guenter Roeck @ 2020-08-19  0:56 UTC (permalink / raw)
  To: Valdis Klētnieks, Mark Starovoytov, Igor Russkikh, David S. Miller
  Cc: Greg KH, Ahmed S. Darwish, bigeasy, linux-kernel, mingo, paulmck,
	peterz, rostedt, tglx, will

On 8/18/20 3:51 PM, Valdis Klētnieks wrote:
> On Mon, 10 Aug 2020 07:10:32 -0700, Guenter Roeck said:
>>> 	ERROR: modpost: "__bad_udelay" [drivers/net/ethernet/aquantia/atlantic/atlantic.ko] undefined!
>>>
>>
>> I don't think that is new. If anything, it is surprising that builds don't fail more
>> widely because of it. AFAICS it was introduced back in 2018 (a hot 50-ms delay loop
>> really isn't such a good idea).
> 
> Well...it wasn't broken in next-20200720.  A bit of poking with nm,
> and building hw_atl/hw_atl_b0.s, it looks like the culprit is this commit:
> 
> commit 8dcf2ad39fdb2d183b7bd4307c837713e3150b9a
> Author: Mark Starovoytov <mstarovoitov@marvell.com>
> Date:   Mon Jul 20 21:32:44 2020 +0300
> 
>     net: atlantic: add hwmon getter for MAC temperature
> 
> specifically this chunk around line 1634 of hw_atl/hw_atl_b0.c:
> 
> 
> +       err = readx_poll_timeout_atomic(hw_atl_b0_ts_ready_and_latch_high_get,
> +                                       self, val, val == 1, 10000U, 500000U);
> 
> And doing a 'git revert' makes the build work....
> 

Nice catch. FWIW, there is no obvious reason why this would need to be atomic.
The calling code does not set a lock, meaning there can be two (or more)
callers entering this code. Weird, especially since the code looks like it
would actually need a mutex to work correctly. It might be interesting to
see what happens if there are, say, half a dozen scripts/processes trying
to read the hwmon attribute introduced by this patch at the same time.

Guenter
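
A minimal sketch of the serialization suggested here. The mutex and the
wrapper are hypothetical, and the exact signature of
hw_atl_b0_get_mac_temp() is assumed:

	static DEFINE_MUTEX(aq_mac_temp_mutex);		/* hypothetical */

	static int hw_atl_b0_get_mac_temp_serialized(struct aq_hw_s *self,
						     u32 *temp)
	{
		int err;

		/*
		 * Keep concurrent hwmon readers from interleaving the
		 * latch/poll sequence inside hw_atl_b0_get_mac_temp().
		 */
		mutex_lock(&aq_mac_temp_mutex);
		err = hw_atl_b0_get_mac_temp(self, temp);
		mutex_unlock(&aq_mac_temp_mutex);

		return err;
	}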

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH] Revert "seqlock: lockdep assert non-preemptibility on seqcount_t write"
  2020-08-19  0:56                   ` Guenter Roeck
@ 2020-08-19  7:00                     ` Sebastian Andrzej Siewior
  2020-08-19  7:34                       ` Valdis Klētnieks
  0 siblings, 1 reply; 258+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-08-19  7:00 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Valdis Klētnieks, Mark Starovoytov, Igor Russkikh,
	David S. Miller, Greg KH, Ahmed S. Darwish, linux-kernel, mingo,
	paulmck, peterz, rostedt, tglx, will

On 2020-08-18 17:56:49 [-0700], Guenter Roeck wrote:
> Nice catch. FWIW, there is no obvious reason why this would need to be atomic.
> The calling code does not set a lock, meaning there can be two (or more)
> callers entering this code. Weird, especially since the code looks like it
> would actually need a mutex to work correctly. It might be interesting to
> see what happens if there are, say, half a dozen scripts/processes trying
> to read the hwmon attribute introduced by this patch at the same time.

=> https://lkml.kernel.org/r/20200818161439.3dkf6jzp3vuwmvvh@linutronix.de

> Guenter

Sebastian

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH] Revert "seqlock: lockdep assert non-preemptibility on seqcount_t write"
  2020-08-19  7:00                     ` Sebastian Andrzej Siewior
@ 2020-08-19  7:34                       ` Valdis Klētnieks
  2020-08-19 16:15                         ` Guenter Roeck
  0 siblings, 1 reply; 258+ messages in thread
From: Valdis Klētnieks @ 2020-08-19  7:34 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Guenter Roeck, Mark Starovoytov, Igor Russkikh, David S. Miller,
	Greg KH, Ahmed S. Darwish, linux-kernel, mingo, paulmck, peterz,
	rostedt, tglx, will

On Wed, 19 Aug 2020 09:00:22 +0200, Sebastian Andrzej Siewior said:
> On 2020-08-18 17:56:49 [-0700], Guenter Roeck wrote:
> > Nice catch. FWIW, there is no obvious reason why this would need to be atomic.
> > The calling code does not set a lock, meaning there can be two (or more)
> > callers entering this code. Weird, especially since the code looks like it
> > would actually need a mutex to work correctly. It might be interesting to
> > see what happens if there are, say, half a dozen scripts/processes trying
> > to read the hwmon attribute introduced by this patch at the same time.
>
> => https://lkml.kernel.org/r/20200818161439.3dkf6jzp3vuwmvvh@linutronix.de

Looks reasonable to me, though I've not verified that it's preemptible at that
point...

^ permalink raw reply	[flat|nested] 258+ messages in thread

* Re: [PATCH] Revert "seqlock: lockdep assert non-preemptibility on seqcount_t write"
  2020-08-19  7:34                       ` Valdis Klētnieks
@ 2020-08-19 16:15                         ` Guenter Roeck
  0 siblings, 0 replies; 258+ messages in thread
From: Guenter Roeck @ 2020-08-19 16:15 UTC (permalink / raw)
  To: Valdis Klētnieks, Sebastian Andrzej Siewior
  Cc: Mark Starovoytov, Igor Russkikh, David S. Miller, Greg KH,
	Ahmed S. Darwish, linux-kernel, mingo, paulmck, peterz, rostedt,
	tglx, will


On 8/19/20 12:34 AM, Valdis Klētnieks wrote:
> On Wed, 19 Aug 2020 09:00:22 +0200, Sebastian Andrzej Siewior said:
>> On 2020-08-18 17:56:49 [-0700], Guenter Roeck wrote:
>>> Nice catch. FWIW, there is no obvious reason why this would need to be atomic.
>>> The calling code does not set a lock, meaning there can be two (or more)
>>> callers entering this code. Weird, especially since the code looks like it
>>> would actually need a mutex to work correctly. It might be interesting to
>>> see what happens if there are, say, half a dozen scripts/processes trying
>>> to read the hwmon attribute introduced by this patch at the same time.
>>
>> => https://lkml.kernel.org/r/20200818161439.3dkf6jzp3vuwmvvh@linutronix.de
> 
> Looks reasonable to me, though I've not verified that it's preemptible at that
> point...
> 

hw_atl_b0_get_mac_temp is called through the .hw_get_mac_temp callback.
This callback is executed from aq_hwmon_read(), which in turn is called
from the hwmon core, more specifically from hwmon_attr_show().
Calls from the hwmon core are unlocked.

Guenter


^ permalink raw reply	[flat|nested] 258+ messages in thread

* [PATCH v1 0/8] seqlock: Introduce seqcount_latch_t
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (27 preceding siblings ...)
  2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
@ 2020-08-27 11:40 ` Ahmed S. Darwish
  2020-08-27 11:40   ` [PATCH v1 1/8] time/sched_clock: Use raw_read_seqcount_latch() during suspend Ahmed S. Darwish
                     ` (7 more replies)
  2020-08-28  1:07 ` [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support Ahmed S. Darwish
  2020-09-10 15:08 ` [tip: locking/core] seqlock: seqcount_LOCKNAME_t: " tip-bot2 for Ahmed S. Darwish
  30 siblings, 8 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-27 11:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Sebastian A. Siewior, LKML, Ahmed S. Darwish,
	Borislav Petkov, x86, H. Peter Anvin

Latch sequence counters are a multiversion concurrency control mechanism
where the embedded seqcount_t counter even/odd value is used to switch
between two copies of protected data. This allows the sequence counter
read side to be invoked from NMIs and safely interrupt its own write
side critical section.
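
In condensed form -- with modify() and data_query() as placeholders; the
raw_write_seqcount_latch() kernel-doc in patch #3 carries the fully
commented version -- the scheme looks like this:

struct latch_struct {
	seqcount_latch_t	seq;
	struct data_struct	data[2];
};

/* The write side must still be serialized externally (e.g. a lock) */
void latch_modify(struct latch_struct *latch, ...)
{
	raw_write_seqcount_latch(&latch->seq);
	modify(latch->data[0], ...);	/* readers now use data[1] */

	raw_write_seqcount_latch(&latch->seq);
	modify(latch->data[1], ...);	/* readers now use data[0] */
}

/* Lockless reader; safe even if it interrupted the writer (NMI) */
struct entry *latch_query(struct latch_struct *latch, ...)
{
	struct entry *entry;
	unsigned int seq, idx;

	do {
		seq = raw_read_seqcount_latch(&latch->seq);
		idx = seq & 0x01;
		entry = data_query(latch->data[idx], ...);
	} while (read_seqcount_latch_retry(&latch->seq, seq));

	return entry;
}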

Initially, latch sequence counters were implemented as a single write
function above plain seqcount_t, raw_write_seqcount_latch(). The read
path was expected to use plain seqcount_t raw_read_seqcount().

A specialized latch read function, raw_read_seqcount_latch(), was later
added. It became the standardized way for latch read paths.  Due to the
dependent load, it has one read memory barrier less than the more
generic raw_read_seqcount() API.

Only raw_write_seqcount_latch() and raw_read_seqcount_latch() should be
used with latch sequence counters. Having unique read and write path
APIs means that latch sequence counters are actually a data type of
their own -- just inappropriately overloading plain seqcount_t.

Introduce seqcount_latch_t. This adds type-safety and ensures that only
the correct latch-safe APIs are used.

To avoid breaking bisection, let the latch APIs also accept plain
seqcount_t or seqcount_raw_spinlock_t. After converting all call sites
to seqcount_latch_t (patches #4 => #7), only allow seqcount_latch_t.

Thanks,

8<--------------

Ahmed S. Darwish (8):
  time/sched_clock: Use raw_read_seqcount_latch() during suspend
  mm/swap: Do not abuse the seqcount_t latching API
  seqlock: Introduce seqcount_latch_t
  time/sched_clock: Use seqcount_latch_t
  timekeeping: Use seqcount_latch_t
  x86/tsc: Use seqcount_latch_t
  rbtree_latch: Use seqcount_latch_t
  seqlock: seqcount latch APIs: Only allow seqcount_latch_t

 Documentation/locking/seqlock.rst | 18 ++++++
 arch/x86/kernel/tsc.c             | 12 ++--
 include/linux/rbtree_latch.h      |  6 +-
 include/linux/seqlock.h           | 96 +++++++++++++++++++++----------
 kernel/time/sched_clock.c         |  6 +-
 kernel/time/timekeeping.c         | 10 ++--
 mm/swap.c                         | 65 +++++++++++++++++----
 7 files changed, 156 insertions(+), 57 deletions(-)

base-commit: d012a7190fc1fd72ed48911e77ca97ba4521bccd
--
2.28.0

^ permalink raw reply	[flat|nested] 258+ messages in thread

* [PATCH v1 1/8] time/sched_clock: Use raw_read_seqcount_latch() during suspend
  2020-08-27 11:40 ` [PATCH v1 0/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
@ 2020-08-27 11:40   ` Ahmed S. Darwish
  2020-08-27 11:40   ` [PATCH v1 2/8] mm/swap: Do not abuse the seqcount_t latching API Ahmed S. Darwish
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-27 11:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Sebastian A. Siewior, LKML, Ahmed S. Darwish

sched_clock uses seqcount_t latching to switch between two storage
places protected by the sequence counter. This allows it to have
interruptible, NMI-safe, seqcount_t write side critical sections.

Since 7fc26327b756 ("seqlock: Introduce raw_read_seqcount_latch()"),
raw_read_seqcount_latch() became the standardized way for seqcount_t
latch read paths. Due to the dependent load, it has one read memory
barrier less than the currently used raw_read_seqcount() API.

Use raw_read_seqcount_latch() for the suspend path.

Commit aadd6e5caaac ("time/sched_clock: Use raw_read_seqcount_latch()")
missed changing that instance of raw_read_seqcount().

Link: https://lkml.kernel.org/r/20200625085745.GD117543@hirez.programming.kicks-ass.net
Link: https://lkml.kernel.org/r/20200715092345.GA231464@debian-buster-darwi.lab.linutronix.de
References: 1809bfa44e10 ("timers, sched/clock: Avoid deadlock during read from NMI")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 kernel/time/sched_clock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 1c03eec6ca9b..8c6b5febd7a0 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -258,7 +258,7 @@ void __init generic_sched_clock_init(void)
  */
 static u64 notrace suspended_sched_clock_read(void)
 {
-	unsigned int seq = raw_read_seqcount(&cd.seq);
+	unsigned int seq = raw_read_seqcount_latch(&cd.seq);
 
 	return cd.read_data[seq & 1].epoch_cyc;
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v1 2/8] mm/swap: Do not abuse the seqcount_t latching API
  2020-08-27 11:40 ` [PATCH v1 0/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
  2020-08-27 11:40   ` [PATCH v1 1/8] time/sched_clock: Use raw_read_seqcount_latch() during suspend Ahmed S. Darwish
@ 2020-08-27 11:40   ` Ahmed S. Darwish
  2020-08-27 11:40   ` [PATCH v1 3/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-27 11:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Andrew Morton,
	Konstantin Khlebnikov, linux-mm
  Cc: Thomas Gleixner, Sebastian A. Siewior, LKML, Ahmed S. Darwish

Commit eef1a429f234 ("mm/swap.c: piggyback lru_add_drain_all() calls")
implemented an optimization mechanism to exit the to-be-started LRU
drain operation (name it A) if another drain operation *started and
finished* while (A) was blocked on the LRU draining mutex.

This was done through a seqcount_t latch, which is an abuse of its
semantics:

  1. seqcount_t latching should be used for the purpose of switching
     between two storage places with sequence protection to allow
     interruptible, preemptible, writer sections. The referenced
     optimization mechanism has absolutely nothing to do with that.

  2. The used raw_write_seqcount_latch() has two SMP write memory
     barriers to ensure one consistent storage place out of the two
     storage places available. A full memory barrier is required
     instead: to guarantee that the pagevec counter stores visible to
     the local CPU are visible to other CPUs -- before loading the
     current drain generation.

Besides the seqcount_t API abuse, the semantics of a latch sequence
counter were force-fitted into the referenced optimization. What was
meant is to track "generations" of LRU draining operations, where
"global lru draining generation = x" implies that all generations
0 < n <= x are already *scheduled* for draining -- thus nothing needs
to be done if the current generation number n <= x.

Remove the conceptually-inappropriate seqcount_t latch usage. Manually
implement the referenced optimization using a counter and SMP memory
barriers.

Note, while at it, use the non-atomic variant of cpumask_set_cpu(),
__cpumask_set_cpu(), due to the already existing mutex protection.

Link: https://lkml.kernel.org/r/CALYGNiPSr-cxV9MX9czaVh6Wz_gzSv3H_8KPvgjBTGbJywUJpA@mail.gmail.com
Link: https://lkml.kernel.org/r/87y2pg9erj.fsf@vostro.fn.ogness.net
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 mm/swap.c | 65 +++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 54 insertions(+), 11 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index d16d65d9b4e0..a1ec807e325d 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -763,10 +763,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
  */
 void lru_add_drain_all(void)
 {
-	static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
-	static DEFINE_MUTEX(lock);
+	/*
+	 * lru_drain_gen - Global pages generation number
+	 *
+	 * (A) Definition: global lru_drain_gen = x implies that all generations
+	 *     0 < n <= x are already *scheduled* for draining.
+	 *
+	 * This is an optimization for the highly-contended use case where a
+	 * user space workload keeps constantly generating a flow of pages for
+	 * each CPU.
+	 */
+	static unsigned int lru_drain_gen;
 	static struct cpumask has_work;
-	int cpu, seq;
+	static DEFINE_MUTEX(lock);
+	unsigned cpu, this_gen;
 
 	/*
 	 * Make sure nobody triggers this path before mm_percpu_wq is fully
@@ -775,21 +785,54 @@ void lru_add_drain_all(void)
 	if (WARN_ON(!mm_percpu_wq))
 		return;
 
-	seq = raw_read_seqcount_latch(&seqcount);
+	/*
+	 * Guarantee pagevec counter stores visible by this CPU are visible to
+	 * other CPUs before loading the current drain generation.
+	 */
+	smp_mb();
+
+	/*
+	 * (B) Locally cache global LRU draining generation number
+	 *
+	 * The read barrier ensures that the counter is loaded before the mutex
+	 * is taken. It pairs with smp_mb() inside the mutex critical section
+	 * at (D).
+	 */
+	this_gen = smp_load_acquire(&lru_drain_gen);
 
 	mutex_lock(&lock);
 
 	/*
-	 * Piggyback on drain started and finished while we waited for lock:
-	 * all pages pended at the time of our enter were drained from vectors.
+	 * (C) Exit the draining operation if a newer generation, from another
+	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
 	 */
-	if (__read_seqcount_retry(&seqcount, seq))
+	if (unlikely(this_gen != lru_drain_gen))
 		goto done;
 
-	raw_write_seqcount_latch(&seqcount);
+	/*
+	 * (D) Increment global generation number
+	 *
+	 * Pairs with smp_load_acquire() at (B), outside of the critical
+	 * section. Use a full memory barrier to guarantee that the new global
+	 * drain generation number is stored before loading pagevec counters.
+	 *
+	 * This pairing must be done here, before the for_each_online_cpu loop
+	 * below which drains the page vectors.
+	 *
+	 * Let x, y, and z represent some system CPU numbers, where x < y < z.
+	 * Assume CPU #z is in the middle of the for_each_online_cpu loop
+	 * below and has already reached CPU #y's per-cpu data. CPU #x comes
+	 * along, adds some pages to its per-cpu vectors, then calls
+	 * lru_add_drain_all().
+	 *
+	 * If the paired barrier is done at any later step, e.g. after the
+	 * loop, CPU #x will just exit at (C) and miss flushing out all of its
+	 * added pages.
+	 */
+	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
+	smp_mb();
 
 	cpumask_clear(&has_work);
-
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
 
@@ -801,7 +844,7 @@ void lru_add_drain_all(void)
 		    need_activate_page_drain(cpu)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);
 			queue_work_on(cpu, mm_percpu_wq, work);
-			cpumask_set_cpu(cpu, &has_work);
+			__cpumask_set_cpu(cpu, &has_work);
 		}
 	}
 
@@ -816,7 +859,7 @@ void lru_add_drain_all(void)
 {
 	lru_add_drain();
 }
-#endif
+#endif /* CONFIG_SMP */
 
 /**
  * release_pages - batched put_page()
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v1 3/8] seqlock: Introduce seqcount_latch_t
  2020-08-27 11:40 ` [PATCH v1 0/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
  2020-08-27 11:40   ` [PATCH v1 1/8] time/sched_clock: Use raw_read_seqcount_latch() during suspend Ahmed S. Darwish
  2020-08-27 11:40   ` [PATCH v1 2/8] mm/swap: Do not abuse the seqcount_t latching API Ahmed S. Darwish
@ 2020-08-27 11:40   ` Ahmed S. Darwish
  2020-09-10 15:08     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-08-27 11:40   ` [PATCH v1 4/8] time/sched_clock: Use seqcount_latch_t Ahmed S. Darwish
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-27 11:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Sebastian A. Siewior, LKML, Ahmed S. Darwish

Latch sequence counters are a multiversion concurrency control mechanism
where the seqcount_t counter even/odd value is used to switch between
two copies of protected data. This allows the seqcount_t read path to
safely interrupt its write side critical section (e.g. from NMIs).

Initially, latch sequence counters were implemented as a single write
function above plain seqcount_t: raw_write_seqcount_latch(). The read
side was expected to use plain seqcount_t raw_read_seqcount().

A specialized latch read function, raw_read_seqcount_latch(), was later
added. It became the standardized way for latch read paths.  Due to the
dependent load, it has one read memory barrier less than the plain
seqcount_t raw_read_seqcount() API.

Only raw_write_seqcount_latch() and raw_read_seqcount_latch() should be
used with latch sequence counters. Having *unique* read and write path
APIs means that latch sequence counters are actually a data type of
their own -- just inappropriately overloading plain seqcount_t.

Introduce seqcount_latch_t. This adds type-safety and ensures that only
the correct latch-safe APIs are used.

To avoid breaking bisection, let the latch APIs also accept plain
seqcount_t or seqcount_raw_spinlock_t. After converting all call sites
to seqcount_latch_t, only that new data type will be allowed.

References: 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()")
References: 7fc26327b756 ("seqlock: Introduce raw_read_seqcount_latch()")
References: aadd6e5caaac ("time/sched_clock: Use raw_read_seqcount_latch()")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 Documentation/locking/seqlock.rst |  18 ++++++
 include/linux/seqlock.h           | 104 +++++++++++++++++++++---------
 2 files changed, 91 insertions(+), 31 deletions(-)

diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
index 62c5ad98c11c..a334b584f2b3 100644
--- a/Documentation/locking/seqlock.rst
+++ b/Documentation/locking/seqlock.rst
@@ -139,6 +139,24 @@ with the associated LOCKTYPE lock acquired.
 
 Read path: same as in :ref:`seqcount_t`.
 
+
+.. _seqcount_latch_t:
+
+Latch sequence counters (``seqcount_latch_t``)
+----------------------------------------------
+
+Latch sequence counters are a multiversion concurrency control mechanism
+where the embedded seqcount_t counter even/odd value is used to switch
+between two copies of protected data. This allows the sequence counter
+read path to safely interrupt its own write side critical section.
+
+Use seqcount_latch_t when the write side sections cannot be protected
+from interruption by readers. This is typically the case when the read
+side can be invoked from NMI handlers.
+
+Check `raw_write_seqcount_latch()` for more information.
+
+
 .. _seqlock_t:
 
 Sequential locks (``seqlock_t``)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 962d9768945f..f83fbf2180db 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -587,34 +587,76 @@ static inline void write_seqcount_t_invalidate(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
-/**
- * raw_read_seqcount_latch() - pick even/odd seqcount_t latch data copy
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+/*
+ * Latch sequence counters (seqcount_latch_t)
+ *
+ * A sequence counter variant where the counter even/odd value is used to
+ * switch between two copies of protected data. This allows the read path,
+ * typically NMIs, to safely interrupt the write side critical section.
  *
- * Use seqcount_t latching to switch between two storage places protected
- * by a sequence counter. Doing so allows having interruptible, preemptible,
- * seqcount_t write side critical sections.
+ * As the write sections are fully preemptible, no special handling for
+ * PREEMPT_RT is needed.
+ */
+typedef struct {
+	seqcount_t seqcount;
+} seqcount_latch_t;
+
+/**
+ * SEQCNT_LATCH_ZERO() - static initializer for seqcount_latch_t
+ * @seq_name: Name of the seqcount_latch_t instance
+ */
+#define SEQCNT_LATCH_ZERO(seq_name) {					\
+	.seqcount		= SEQCNT_ZERO(seq_name.seqcount),	\
+}
+
+/**
+ * seqcount_latch_init() - runtime initializer for seqcount_latch_t
+ * @s: Pointer to the seqcount_latch_t instance
+ */
+static inline void seqcount_latch_init(seqcount_latch_t *s)
+{
+	seqcount_init(&s->seqcount);
+}
+
+/**
+ * raw_read_seqcount_latch() - pick even/odd latch data copy
+ * @s: Pointer to seqcount_t, seqcount_raw_spinlock_t, or seqcount_latch_t
  *
- * Check raw_write_seqcount_latch() for more details and a full reader and
- * writer usage example.
+ * See raw_write_seqcount_latch() for details and a full reader/writer
+ * usage example.
  *
  * Return: sequence counter raw value. Use the lowest bit as an index for
- * picking which data copy to read. The full counter value must then be
- * checked with read_seqcount_retry().
+ * picking which data copy to read. The full counter must then be checked
+ * with read_seqcount_latch_retry().
  */
-#define raw_read_seqcount_latch(s)					\
-	raw_read_seqcount_t_latch(__seqcount_ptr(s))
+#define raw_read_seqcount_latch(s)						\
+({										\
+	/*									\
+	 * Pairs with the first smp_wmb() in raw_write_seqcount_latch().	\
+	 * Due to the dependent load, a full smp_rmb() is not needed.		\
+	 */									\
+	_Generic(*(s),								\
+		 seqcount_t:		  READ_ONCE(((seqcount_t *)s)->sequence),			\
+		 seqcount_raw_spinlock_t: READ_ONCE(((seqcount_raw_spinlock_t *)s)->seqcount.sequence),	\
+		 seqcount_latch_t:	  READ_ONCE(((seqcount_latch_t *)s)->seqcount.sequence));	\
+})
 
-static inline int raw_read_seqcount_t_latch(seqcount_t *s)
+/**
+ * read_seqcount_latch_retry() - end a seqcount_latch_t read section
+ * @s:		Pointer to seqcount_latch_t
+ * @start:	count, from raw_read_seqcount_latch()
+ *
+ * Return: true if a read section retry is required, else false
+ */
+static inline int
+read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
 {
-	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
-	int seq = READ_ONCE(s->sequence); /* ^^^ */
-	return seq;
+	return read_seqcount_retry(&s->seqcount, start);
 }
 
 /**
- * raw_write_seqcount_latch() - redirect readers to even/odd copy
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * raw_write_seqcount_latch() - redirect latch readers to even/odd copy
+ * @s: Pointer to seqcount_t, seqcount_raw_spinlock_t, or seqcount_latch_t
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -633,7 +675,7 @@ static inline int raw_read_seqcount_t_latch(seqcount_t *s)
  * The basic form is a data structure like::
  *
  *	struct latch_struct {
- *		seqcount_t		seq;
+ *		seqcount_latch_t	seq;
  *		struct data_struct	data[2];
  *	};
  *
@@ -643,13 +685,13 @@ static inline int raw_read_seqcount_t_latch(seqcount_t *s)
  *	void latch_modify(struct latch_struct *latch, ...)
  *	{
  *		smp_wmb();	// Ensure that the last data[1] update is visible
- *		latch->seq++;
+ *		latch->seq.sequence++;
  *		smp_wmb();	// Ensure that the seqcount update is visible
  *
  *		modify(latch->data[0], ...);
  *
  *		smp_wmb();	// Ensure that the data[0] update is visible
- *		latch->seq++;
+ *		latch->seq.sequence++;
  *		smp_wmb();	// Ensure that the seqcount update is visible
  *
  *		modify(latch->data[1], ...);
@@ -668,8 +710,8 @@ static inline int raw_read_seqcount_t_latch(seqcount_t *s)
  *			idx = seq & 0x01;
  *			entry = data_query(latch->data[idx], ...);
  *
- *		// read_seqcount_retry() includes needed smp_rmb()
- *		} while (read_seqcount_retry(&latch->seq, seq));
+ *		// This includes needed smp_rmb()
+ *		} while (read_seqcount_latch_retry(&latch->seq, seq));
  *
  *		return entry;
  *	}
@@ -693,14 +735,14 @@ static inline int raw_read_seqcount_t_latch(seqcount_t *s)
  *	When data is a dynamic data structure; one should use regular RCU
  *	patterns to manage the lifetimes of the objects within.
  */
-#define raw_write_seqcount_latch(s)					\
-	raw_write_seqcount_t_latch(__seqcount_ptr(s))
-
-static inline void raw_write_seqcount_t_latch(seqcount_t *s)
-{
-       smp_wmb();      /* prior stores before incrementing "sequence" */
-       s->sequence++;
-       smp_wmb();      /* increment "sequence" before following stores */
+#define raw_write_seqcount_latch(s)						\
+{										\
+       smp_wmb();      /* prior stores before incrementing "sequence" */	\
+       _Generic(*(s),								\
+		seqcount_t:		((seqcount_t *)s)->sequence++,		\
+		seqcount_raw_spinlock_t:((seqcount_raw_spinlock_t *)s)->seqcount.sequence++, \
+		seqcount_latch_t:	((seqcount_latch_t *)s)->seqcount.sequence++); \
+       smp_wmb();      /* increment "sequence" before following stores */	\
 }
 
 /*
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v1 4/8] time/sched_clock: Use seqcount_latch_t
  2020-08-27 11:40 ` [PATCH v1 0/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
                     ` (2 preceding siblings ...)
  2020-08-27 11:40   ` [PATCH v1 3/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
@ 2020-08-27 11:40   ` Ahmed S. Darwish
  2020-09-10 15:08     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-08-27 11:40   ` [PATCH v1 5/8] timekeeping: " Ahmed S. Darwish
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-27 11:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Sebastian A. Siewior, LKML, Ahmed S. Darwish

Latch sequence counters have unique read and write APIs, and thus
seqcount_latch_t was recently introduced in seqlock.h.

Use that new data type instead of plain seqcount_t. This adds the
necessary type-safety and ensures that only latching-safe seqcount APIs
are used.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 kernel/time/sched_clock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 8c6b5febd7a0..0642013dace4 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -35,7 +35,7 @@
  * into a single 64-byte cache line.
  */
 struct clock_data {
-	seqcount_t		seq;
+	seqcount_latch_t	seq;
 	struct clock_read_data	read_data[2];
 	ktime_t			wrap_kt;
 	unsigned long		rate;
@@ -76,7 +76,7 @@ struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
 
 int sched_clock_read_retry(unsigned int seq)
 {
-	return read_seqcount_retry(&cd.seq, seq);
+	return read_seqcount_latch_retry(&cd.seq, seq);
 }
 
 unsigned long long notrace sched_clock(void)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v1 5/8] timekeeping: Use seqcount_latch_t
  2020-08-27 11:40 ` [PATCH v1 0/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
                     ` (3 preceding siblings ...)
  2020-08-27 11:40   ` [PATCH v1 4/8] time/sched_clock: Use seqcount_latch_t Ahmed S. Darwish
@ 2020-08-27 11:40   ` Ahmed S. Darwish
  2020-09-10 15:08     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-08-27 11:40   ` [PATCH v1 6/8] x86/tsc: " Ahmed S. Darwish
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-27 11:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Sebastian A. Siewior, LKML, Ahmed S. Darwish

Latch sequence counters are a multiversion concurrency control mechanism
where the seqcount_t counter even/odd value is used to switch between
two data storage copies. This allows the seqcount_t read path to safely
interrupt its write side critical section (e.g. from NMIs).

Initially, latch sequence counters were implemented as a single write
function, raw_write_seqcount_latch(), above plain seqcount_t. The read
path was expected to use plain seqcount_t raw_read_seqcount().

A specialized read function was later added, raw_read_seqcount_latch(),
and became the standardized way for latch read paths. Having unique read
and write APIs means that latch sequence counters are basically a data
type of their own -- just inappropriately overloading plain seqcount_t.
The seqcount_latch_t data type was thus introduced in seqlock.h.

Use that new data type instead of seqcount_raw_spinlock_t. This ensures
that only latch-safe APIs are used with the sequence counter.

Note that using seqcount_raw_spinlock_t was not very useful in the first
place: only the "raw_" subset of the seqcount_t APIs was used in
timekeeping.c. This subset was created for contexts where lockdep cannot
be used. The seqcount_LOCKTYPE_t raison d'être -- verifying that the
seqcount_t writer serialization lock is held -- thus cannot be fulfilled.

References: 0c3351d451ae ("seqlock: Use raw_ prefix instead of _no_lockdep")
References: 55f3560df975 ("seqlock: Extend seqcount API with associated locks")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 kernel/time/timekeeping.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4c47f388a83f..999c981ae766 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -64,7 +64,7 @@ static struct timekeeper shadow_timekeeper;
  * See @update_fast_timekeeper() below.
  */
 struct tk_fast {
-	seqcount_raw_spinlock_t	seq;
+	seqcount_latch_t	seq;
 	struct tk_read_base	base[2];
 };
 
@@ -81,13 +81,13 @@ static struct clocksource dummy_clock = {
 };
 
 static struct tk_fast tk_fast_mono ____cacheline_aligned = {
-	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_mono.seq, &timekeeper_lock),
+	.seq     = SEQCNT_LATCH_ZERO(tk_fast_mono.seq),
 	.base[0] = { .clock = &dummy_clock, },
 	.base[1] = { .clock = &dummy_clock, },
 };
 
 static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
-	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_raw.seq, &timekeeper_lock),
+	.seq     = SEQCNT_LATCH_ZERO(tk_fast_raw.seq),
 	.base[0] = { .clock = &dummy_clock, },
 	.base[1] = { .clock = &dummy_clock, },
 };
@@ -467,7 +467,7 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
 					tk_clock_read(tkr),
 					tkr->cycle_last,
 					tkr->mask));
-	} while (read_seqcount_retry(&tkf->seq, seq));
+	} while (read_seqcount_latch_retry(&tkf->seq, seq));
 
 	return now;
 }
@@ -533,7 +533,7 @@ static __always_inline u64 __ktime_get_real_fast_ns(struct tk_fast *tkf)
 					tk_clock_read(tkr),
 					tkr->cycle_last,
 					tkr->mask));
-	} while (read_seqcount_retry(&tkf->seq, seq));
+	} while (read_seqcount_latch_retry(&tkf->seq, seq));
 
 	return now;
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v1 6/8] x86/tsc: Use seqcount_latch_t
  2020-08-27 11:40 ` [PATCH v1 0/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
                     ` (4 preceding siblings ...)
  2020-08-27 11:40   ` [PATCH v1 5/8] timekeeping: " Ahmed S. Darwish
@ 2020-08-27 11:40   ` Ahmed S. Darwish
  2020-09-04  7:41     ` peterz
  2020-09-10 15:08     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-08-27 11:40   ` [PATCH v1 7/8] rbtree_latch: " Ahmed S. Darwish
  2020-08-27 11:40   ` [PATCH v1 8/8] seqlock: seqcount latch APIs: Only allow seqcount_latch_t Ahmed S. Darwish
  7 siblings, 2 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-27 11:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Sebastian A. Siewior, LKML, Ahmed S. Darwish,
	Borislav Petkov, x86, H. Peter Anvin

Latch sequence counters have unique read and write APIs, and thus
seqcount_latch_t was recently introduced in seqlock.h.

Use that new data type instead of plain seqcount_t. This adds the
necessary type-safety and ensures that only latching-safe seqcount APIs
are used.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 arch/x86/kernel/tsc.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 49d925043171..cbf17e9f1d03 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -54,7 +54,7 @@ struct clocksource *art_related_clocksource;
 
 struct cyc2ns {
 	struct cyc2ns_data data[2];	/*  0 + 2*16 = 32 */
-	seqcount_t	   seq;		/* 32 + 4    = 36 */
+	seqcount_latch_t   seq;		/* 32 + 4    = 36 */
 
 }; /* fits one cacheline */
 
@@ -68,19 +68,21 @@ early_param("tsc_early_khz", tsc_early_khz_setup);
 
 __always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
 {
+	seqcount_latch_t *seqcount;
 	int seq, idx;
 
 	preempt_disable_notrace();
 
+	seqcount = &this_cpu_ptr(&cyc2ns)->seq;
 	do {
-		seq = this_cpu_read(cyc2ns.seq.sequence);
+		seq = raw_read_seqcount_latch(seqcount);
 		idx = seq & 1;
 
 		data->cyc2ns_offset = this_cpu_read(cyc2ns.data[idx].cyc2ns_offset);
 		data->cyc2ns_mul    = this_cpu_read(cyc2ns.data[idx].cyc2ns_mul);
 		data->cyc2ns_shift  = this_cpu_read(cyc2ns.data[idx].cyc2ns_shift);
 
-	} while (unlikely(seq != this_cpu_read(cyc2ns.seq.sequence)));
+	} while (read_seqcount_latch_retry(seqcount, seq));
 }
 
 __always_inline void cyc2ns_read_end(void)
@@ -186,7 +188,7 @@ static void __init cyc2ns_init_boot_cpu(void)
 {
 	struct cyc2ns *c2n = this_cpu_ptr(&cyc2ns);
 
-	seqcount_init(&c2n->seq);
+	seqcount_latch_init(&c2n->seq);
 	__set_cyc2ns_scale(tsc_khz, smp_processor_id(), rdtsc());
 }
 
@@ -203,7 +205,7 @@ static void __init cyc2ns_init_secondary_cpus(void)
 
 	for_each_possible_cpu(cpu) {
 		if (cpu != this_cpu) {
-			seqcount_init(&c2n->seq);
+			seqcount_latch_init(&c2n->seq);
 			c2n = per_cpu_ptr(&cyc2ns, cpu);
 			c2n->data[0] = data[0];
 			c2n->data[1] = data[1];
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v1 7/8] rbtree_latch: Use seqcount_latch_t
  2020-08-27 11:40 ` [PATCH v1 0/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
                     ` (5 preceding siblings ...)
  2020-08-27 11:40   ` [PATCH v1 6/8] x86/tsc: " Ahmed S. Darwish
@ 2020-08-27 11:40   ` Ahmed S. Darwish
  2020-09-10 15:08     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  2020-08-27 11:40   ` [PATCH v1 8/8] seqlock: seqcount latch APIs: Only allow seqcount_latch_t Ahmed S. Darwish
  7 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-27 11:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Sebastian A. Siewior, LKML, Ahmed S. Darwish

Latch sequence counters have unique read and write APIs, and thus
seqcount_latch_t was recently introduced in seqlock.h.

Use that new data type instead of plain seqcount_t. This adds the
necessary type-safety and ensures that only latching-safe seqcount APIs
are used.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/rbtree_latch.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/rbtree_latch.h b/include/linux/rbtree_latch.h
index 7d012faa509a..3d1a9e716b80 100644
--- a/include/linux/rbtree_latch.h
+++ b/include/linux/rbtree_latch.h
@@ -42,8 +42,8 @@ struct latch_tree_node {
 };
 
 struct latch_tree_root {
-	seqcount_t	seq;
-	struct rb_root	tree[2];
+	seqcount_latch_t	seq;
+	struct rb_root		tree[2];
 };
 
 /**
@@ -206,7 +206,7 @@ latch_tree_find(void *key, struct latch_tree_root *root,
 	do {
 		seq = raw_read_seqcount_latch(&root->seq);
 		node = __lt_find(key, root, seq & 1, ops->comp);
-	} while (read_seqcount_retry(&root->seq, seq));
+	} while (read_seqcount_latch_retry(&root->seq, seq));
 
 	return node;
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v1 8/8] seqlock: seqcount latch APIs: Only allow seqcount_latch_t
  2020-08-27 11:40 ` [PATCH v1 0/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
                     ` (6 preceding siblings ...)
  2020-08-27 11:40   ` [PATCH v1 7/8] rbtree_latch: " Ahmed S. Darwish
@ 2020-08-27 11:40   ` Ahmed S. Darwish
  2020-09-10 15:08     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  7 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-27 11:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Sebastian A. Siewior, LKML, Ahmed S. Darwish

All latch sequence counter call-sites have now been converted from plain
seqcount_t to the new seqcount_latch_t data type.

Enforce type-safety by modifying seqlock.h latch APIs to only accept
seqcount_latch_t.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 36 +++++++++++++++---------------------
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index f83fbf2180db..04db28499331 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -620,7 +620,7 @@ static inline void seqcount_latch_init(seqcount_latch_t *s)
 
 /**
  * raw_read_seqcount_latch() - pick even/odd latch data copy
- * @s: Pointer to seqcount_t, seqcount_raw_spinlock_t, or seqcount_latch_t
+ * @s: Pointer to seqcount_latch_t
  *
  * See raw_write_seqcount_latch() for details and a full reader/writer
  * usage example.
@@ -629,17 +629,14 @@ static inline void seqcount_latch_init(seqcount_latch_t *s)
  * picking which data copy to read. The full counter must then be checked
  * with read_seqcount_latch_retry().
  */
-#define raw_read_seqcount_latch(s)						\
-({										\
-	/*									\
-	 * Pairs with the first smp_wmb() in raw_write_seqcount_latch().	\
-	 * Due to the dependent load, a full smp_rmb() is not needed.		\
-	 */									\
-	_Generic(*(s),								\
-		 seqcount_t:		  READ_ONCE(((seqcount_t *)s)->sequence),			\
-		 seqcount_raw_spinlock_t: READ_ONCE(((seqcount_raw_spinlock_t *)s)->seqcount.sequence),	\
-		 seqcount_latch_t:	  READ_ONCE(((seqcount_latch_t *)s)->seqcount.sequence));	\
-})
+static inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *s)
+{
+	/*
+	 * Pairs with the first smp_wmb() in raw_write_seqcount_latch().
+	 * Due to the dependent load, a full smp_rmb() is not needed.
+	 */
+	return READ_ONCE(s->seqcount.sequence);
+}
 
 /**
  * read_seqcount_latch_retry() - end a seqcount_latch_t read section
@@ -656,7 +653,7 @@ read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
 
 /**
  * raw_write_seqcount_latch() - redirect latch readers to even/odd copy
- * @s: Pointer to seqcount_t, seqcount_raw_spinlock_t, or seqcount_latch_t
+ * @s: Pointer to seqcount_latch_t
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -735,14 +732,11 @@ read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
  *	When data is a dynamic data structure; one should use regular RCU
  *	patterns to manage the lifetimes of the objects within.
  */
-#define raw_write_seqcount_latch(s)						\
-{										\
-       smp_wmb();      /* prior stores before incrementing "sequence" */	\
-       _Generic(*(s),								\
-		seqcount_t:		((seqcount_t *)s)->sequence++,		\
-		seqcount_raw_spinlock_t:((seqcount_raw_spinlock_t *)s)->seqcount.sequence++, \
-		seqcount_latch_t:	((seqcount_latch_t *)s)->seqcount.sequence++); \
-       smp_wmb();      /* increment "sequence" before following stores */	\
+static inline void raw_write_seqcount_latch(seqcount_latch_t *s)
+{
+	smp_wmb();	/* prior stores before incrementing "sequence" */
+	s->seqcount.sequence++;
+	smp_wmb();      /* increment "sequence" before following stores */
 }
 
 /*
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (28 preceding siblings ...)
  2020-08-27 11:40 ` [PATCH v1 0/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
@ 2020-08-28  1:07 ` Ahmed S. Darwish
  2020-08-28  1:07   ` [PATCH v1 1/5] seqlock: seqcount_LOCKTYPE_t: Standardize naming convention Ahmed S. Darwish
                     ` (5 more replies)
  2020-09-10 15:08 ` [tip: locking/core] seqlock: seqcount_LOCKNAME_t: " tip-bot2 for Ahmed S. Darwish
  30 siblings, 6 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-28  1:07 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Sebastian A. Siewior, Paul E. McKenney,
	Steven Rostedt, LKML, Ahmed S. Darwish

Hi,

Preemption must be disabled before entering a sequence counter write
side critical section.  If that is not done, a reader can preempt the
write side critical section and then spin for the entire scheduler tick.
If that reader belongs to a real-time scheduling class, it can spin
forever and the kernel will livelock.

Disabling preemption is not an option on PREEMPT_RT, however: it can
lead to higher latencies, and the write side sections would not be able
to acquire locks which become sleeping locks there (e.g. spinlock_t).

To solve this dilemma, do not disable preemption for seqcount_LOCKTYPE_t
writers. Rather, have the read side detect if a seqcount_LOCKTYPE_t
writer is in progress. If that is the case, acquire then release the
associated LOCKTYPE writer serialization lock. This allows any preempted
writer to make progress until the end of its writer serialization lock
critical section.

Implement this technique for all of PREEMPT_RT sleeping locks.
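
In rough pseudo-C, the read path idea is the following (a simplified
sketch only, assuming the configuration where the associated lock
pointer is stored; the real implementation goes through the __seqprop()
machinery):

static inline unsigned
__seqcount_spinlock_begin(const seqcount_spinlock_t *s)
{
	unsigned seq = READ_ONCE(s->seqcount.sequence);

	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
		return seq;

	while (unlikely(seq & 1)) {
		/*
		 * A writer is active, and on RT it may have been
		 * preempted by us. Do not spin: acquiring then
		 * releasing the associated write serialization lock
		 * lets the preempted writer make progress.
		 */
		spin_lock(s->lock);
		spin_unlock(s->lock);
		seq = READ_ONCE(s->seqcount.sequence);
	}
	return seq;
}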

Thanks,

8<--------------

Ahmed S. Darwish (5):
  seqlock: seqcount_LOCKTYPE_t: Standardize naming convention
  seqlock: Use unique prefix for seqcount_t property accessors
  seqlock: seqcount_t: Implement all read APIs as statement expressions
  seqlock: seqcount_LOCKTYPE_t: Introduce PREEMPT_RT support
  seqlock: PREEMPT_RT: Do not starve seqlock_t writers

 include/linux/seqlock.h | 277 ++++++++++++++++++++++++----------------
 1 file changed, 167 insertions(+), 110 deletions(-)

base-commit: d012a7190fc1fd72ed48911e77ca97ba4521bccd
--
2.28.0

^ permalink raw reply	[flat|nested] 258+ messages in thread

* [PATCH v1 1/5] seqlock: seqcount_LOCKTYPE_t: Standardize naming convention
  2020-08-28  1:07 ` [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support Ahmed S. Darwish
@ 2020-08-28  1:07   ` Ahmed S. Darwish
  2020-08-28  8:18     ` peterz
  2020-08-28  1:07   ` [PATCH v1 2/5] seqlock: Use unique prefix for seqcount_t property accessors Ahmed S. Darwish
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-28  1:07 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Sebastian A. Siewior, Paul E. McKenney,
	Steven Rostedt, LKML, Ahmed S. Darwish

In seqlock.h, sequence counters with associated locks are variously
called seqcount_LOCKNAME_t, seqcount_LOCKTYPE_t, or seqcount_locktype_t.

Standardize on "seqcount_LOCKTYPE_t" for all instances in comments,
kernel-doc, and the SEQCOUNT_LOCKTYPE() generative macro parameters.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 80 +++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 39 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 962d9768945f..598af597f74e 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -138,56 +138,58 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
 #endif
 
 /**
- * typedef seqcount_LOCKNAME_t - sequence counter with LOCKTYPR associated
+ * typedef seqcount_LOCKTYPE_t - sequence counter with associated lock
  * @seqcount:	The real sequence counter
  * @lock:	Pointer to the associated spinlock
  *
- * A plain sequence counter with external writer synchronization by a
- * spinlock. The spinlock is associated to the sequence count in the
+ * A plain sequence counter with external writer synchronization by
+ * LOCKTYPE @lock. The lock is associated to the sequence counter in the
  * static initializer or init function. This enables lockdep to validate
  * that the write side critical section is properly serialized.
+ *
+ * LOCKTYPE:	raw_spinlock, spinlock, rwlock, mutex, or ww_mutex.
  */
 
 /**
- * seqcount_LOCKNAME_init() - runtime initializer for seqcount_LOCKNAME_t
- * @s:		Pointer to the seqcount_LOCKNAME_t instance
+ * seqcount_LOCKTYPE_init() - runtime initializer for seqcount_LOCKTYPE_t
+ * @s:		Pointer to the seqcount_LOCKTYPE_t instance
  * @lock:	Pointer to the associated LOCKTYPE
  */
 
 /*
- * SEQCOUNT_LOCKTYPE() - Instantiate seqcount_LOCKNAME_t and helpers
- * @locktype:		actual typename
- * @lockname:		name
+ * SEQCOUNT_LOCKTYPE() - Instantiate seqcount_LOCKTYPE_t and helpers
+ * @locktype:		"LOCKTYPE" part of seqcount_LOCKTYPE_t
+ * @locktype_t:		canonical/full LOCKTYPE C data type
  * @preemptible:	preemptibility of above locktype
  * @lockmember:		argument for lockdep_assert_held()
  */
-#define SEQCOUNT_LOCKTYPE(locktype, lockname, preemptible, lockmember)	\
-typedef struct seqcount_##lockname {					\
+#define SEQCOUNT_LOCKTYPE(locktype, locktype_t, preemptible, lockmember)	\
+typedef struct seqcount_##locktype {					\
 	seqcount_t		seqcount;				\
-	__SEQ_LOCK(locktype	*lock);					\
-} seqcount_##lockname##_t;						\
+	__SEQ_LOCK(locktype_t	*lock);					\
+} seqcount_##locktype##_t;						\
 									\
 static __always_inline void						\
-seqcount_##lockname##_init(seqcount_##lockname##_t *s, locktype *lock)	\
+seqcount_##locktype##_init(seqcount_##locktype##_t *s, locktype_t *lock)\
 {									\
 	seqcount_init(&s->seqcount);					\
 	__SEQ_LOCK(s->lock = lock);					\
 }									\
 									\
 static __always_inline seqcount_t *					\
-__seqcount_##lockname##_ptr(seqcount_##lockname##_t *s)			\
+__seqcount_##locktype##_ptr(seqcount_##locktype##_t *s)			\
 {									\
 	return &s->seqcount;						\
 }									\
 									\
 static __always_inline bool						\
-__seqcount_##lockname##_preemptible(seqcount_##lockname##_t *s)		\
+__seqcount_##locktype##_preemptible(seqcount_##locktype##_t *s)		\
 {									\
 	return preemptible;						\
 }									\
 									\
 static __always_inline void						\
-__seqcount_##lockname##_assert(seqcount_##lockname##_t *s)		\
+__seqcount_##locktype##_assert(seqcount_##locktype##_t *s)		\
 {									\
 	__SEQ_LOCK(lockdep_assert_held(lockmember));			\
 }
@@ -211,15 +213,15 @@ static inline void __seqcount_assert(seqcount_t *s)
 	lockdep_assert_preemption_disabled();
 }
 
-SEQCOUNT_LOCKTYPE(raw_spinlock_t,	raw_spinlock,	false,	s->lock)
-SEQCOUNT_LOCKTYPE(spinlock_t,		spinlock,	false,	s->lock)
-SEQCOUNT_LOCKTYPE(rwlock_t,		rwlock,		false,	s->lock)
-SEQCOUNT_LOCKTYPE(struct mutex,		mutex,		true,	s->lock)
-SEQCOUNT_LOCKTYPE(struct ww_mutex,	ww_mutex,	true,	&s->lock->base)
+SEQCOUNT_LOCKTYPE(raw_spinlock,	raw_spinlock_t,		false,	s->lock)
+SEQCOUNT_LOCKTYPE(spinlock,	spinlock_t,		false,	s->lock)
+SEQCOUNT_LOCKTYPE(rwlock,	rwlock_t,		false,	s->lock)
+SEQCOUNT_LOCKTYPE(mutex,	struct mutex,		true,	s->lock)
+SEQCOUNT_LOCKTYPE(ww_mutex,	struct ww_mutex,	true,	&s->lock->base)
 
 /**
- * SEQCNT_LOCKNAME_ZERO - static initializer for seqcount_LOCKNAME_t
- * @name:	Name of the seqcount_LOCKNAME_t instance
+ * SEQCNT_LOCKTYPE_ZERO - static initializer for seqcount_LOCKTYPE_t
+ * @name:	Name of the seqcount_LOCKTYPE_t instance
  * @lock:	Pointer to the associated LOCKTYPE
  */
 
@@ -235,8 +237,8 @@ SEQCOUNT_LOCKTYPE(struct ww_mutex,	ww_mutex,	true,	&s->lock->base)
 #define SEQCNT_WW_MUTEX_ZERO(name, lock) 	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
 
 
-#define __seqprop_case(s, lockname, prop)				\
-	seqcount_##lockname##_t: __seqcount_##lockname##_##prop((void *)(s))
+#define __seqprop_case(s, locktype, prop)				\
+	seqcount_##locktype##_t: __seqcount_##locktype##_##prop((void *)(s))
 
 #define __seqprop(s, prop) _Generic(*(s),				\
 	seqcount_t:		__seqcount_##prop((void *)(s)),		\
@@ -252,7 +254,7 @@ SEQCOUNT_LOCKTYPE(struct ww_mutex,	ww_mutex,	true,	&s->lock->base)
 
 /**
  * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  *
  * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
  * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@@ -283,7 +285,7 @@ static inline unsigned __read_seqcount_t_begin(const seqcount_t *s)
 
 /**
  * raw_read_seqcount_begin() - begin a seqcount_t read section w/o lockdep
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  *
  * Return: count to be passed to read_seqcount_retry()
  */
@@ -299,7 +301,7 @@ static inline unsigned raw_read_seqcount_t_begin(const seqcount_t *s)
 
 /**
  * read_seqcount_begin() - begin a seqcount_t read critical section
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  *
  * Return: count to be passed to read_seqcount_retry()
  */
@@ -314,7 +316,7 @@ static inline unsigned read_seqcount_t_begin(const seqcount_t *s)
 
 /**
  * raw_read_seqcount() - read the raw seqcount_t counter value
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  *
  * raw_read_seqcount opens a read critical section of the given
  * seqcount_t, without any lockdep checking, and without checking or
@@ -337,7 +339,7 @@ static inline unsigned raw_read_seqcount_t(const seqcount_t *s)
 /**
  * raw_seqcount_begin() - begin a seqcount_t read critical section w/o
  *                        lockdep and w/o counter stabilization
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  *
  * raw_seqcount_begin opens a read critical section of the given
  * seqcount_t. Unlike read_seqcount_begin(), this function will not wait
@@ -365,7 +367,7 @@ static inline unsigned raw_seqcount_t_begin(const seqcount_t *s)
 
 /**
  * __read_seqcount_retry() - end a seqcount_t read section w/o barrier
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  * @start: count, from read_seqcount_begin()
  *
  * __read_seqcount_retry is like read_seqcount_retry, but has no smp_rmb()
@@ -389,7 +391,7 @@ static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 
 /**
  * read_seqcount_retry() - end a seqcount_t read critical section
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  * @start: count, from read_seqcount_begin()
  *
  * read_seqcount_retry closes the read critical section of given
@@ -409,7 +411,7 @@ static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 
 /**
  * raw_write_seqcount_begin() - start a seqcount_t write section w/o lockdep
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  */
 #define raw_write_seqcount_begin(s)					\
 do {									\
@@ -428,7 +430,7 @@ static inline void raw_write_seqcount_t_begin(seqcount_t *s)
 
 /**
  * raw_write_seqcount_end() - end a seqcount_t write section w/o lockdep
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  */
 #define raw_write_seqcount_end(s)					\
 do {									\
@@ -448,7 +450,7 @@ static inline void raw_write_seqcount_t_end(seqcount_t *s)
 /**
  * write_seqcount_begin_nested() - start a seqcount_t write section with
  *                                 custom lockdep nesting level
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  * @subclass: lockdep nesting level
  *
  * See Documentation/locking/lockdep-design.rst
@@ -471,7 +473,7 @@ static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
 
 /**
  * write_seqcount_begin() - start a seqcount_t write side critical section
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  *
  * write_seqcount_begin opens a write side critical section of the given
  * seqcount_t.
@@ -497,7 +499,7 @@ static inline void write_seqcount_t_begin(seqcount_t *s)
 
 /**
  * write_seqcount_end() - end a seqcount_t write side critical section
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  *
  * The write section must've been opened with write_seqcount_begin().
  */
@@ -517,7 +519,7 @@ static inline void write_seqcount_t_end(seqcount_t *s)
 
 /**
  * raw_write_seqcount_barrier() - do a seqcount_t write barrier
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  *
  * This can be used to provide an ordering guarantee instead of the usual
  * consistency guarantee. It is one wmb cheaper, because it can collapse
@@ -571,7 +573,7 @@ static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
 /**
  * write_seqcount_invalidate() - invalidate in-progress seqcount_t read
  *                               side operations
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  *
  * After write_seqcount_invalidate, no seqcount_t read side operations
  * will complete successfully and see data older than this.
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 258+ messages in thread

* [PATCH v1 2/5] seqlock: Use unique prefix for seqcount_t property accessors
  2020-08-28  1:07 ` [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support Ahmed S. Darwish
  2020-08-28  1:07   ` [PATCH v1 1/5] seqlock: seqcount_LOCKTYPE_t: Standardize naming convention Ahmed S. Darwish
@ 2020-08-28  1:07   ` Ahmed S. Darwish
  2020-08-28  8:27     ` peterz
  2020-08-28  1:07   ` [PATCH v1 3/5] seqlock: seqcount_t: Implement all read APIs as statement expressions Ahmed S. Darwish
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-28  1:07 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Sebastian A. Siewior, Paul E. McKenney,
	Steven Rostedt, LKML, Ahmed S. Darwish

In seqlock.h, the following set of functions:

    - __seqcount_ptr()
    - __seqcount_preemptible()
    - __seqcount_assert()

act as plain seqcount_t "property" accessors. Meanwhile, the following
group:

    - __seqcount_ptr()
    - __seqcount_lock_preemptible()
    - __seqcount_assert_lock_held()

act as the equivalent set, but in the generic form, taking either
seqcount_t or any of the seqcount_LOCKTYPE_t variants.

This is quite confusing, especially since the first member is called
exactly the same in both groups.

Differentiate the first group by using "__seqcount_t_" as a prefix.
This also conforms to the rest of the seqlock.h naming conventions.

While at it, also constify the property accessors' first parameter
where appropriate.

References: 55f3560df975 ("seqlock: Extend seqcount API with associated locks")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 598af597f74e..5470e9cd52ce 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -183,13 +183,13 @@ __seqcount_##locktype##_ptr(seqcount_##locktype##_t *s)			\
 }									\
 									\
 static __always_inline bool						\
-__seqcount_##locktype##_preemptible(seqcount_##locktype##_t *s)		\
+__seqcount_##locktype##_preemptible(const seqcount_##locktype##_t *s)	\
 {									\
 	return preemptible;						\
 }									\
 									\
 static __always_inline void						\
-__seqcount_##locktype##_assert(seqcount_##locktype##_t *s)		\
+__seqcount_##locktype##_assert(const seqcount_##locktype##_t *s)	\
 {									\
 	__SEQ_LOCK(lockdep_assert_held(lockmember));			\
 }
@@ -198,17 +198,17 @@ __seqcount_##locktype##_assert(seqcount_##locktype##_t *s)		\
  * __seqprop() for seqcount_t
  */
 
-static inline seqcount_t *__seqcount_ptr(seqcount_t *s)
+static inline seqcount_t *__seqcount_t_ptr(seqcount_t *s)
 {
 	return s;
 }
 
-static inline bool __seqcount_preemptible(seqcount_t *s)
+static inline bool __seqcount_t_preemptible(const seqcount_t *s)
 {
 	return false;
 }
 
-static inline void __seqcount_assert(seqcount_t *s)
+static inline void __seqcount_t_assert(const seqcount_t *s)
 {
 	lockdep_assert_preemption_disabled();
 }
@@ -236,12 +236,11 @@ SEQCOUNT_LOCKTYPE(ww_mutex,	struct ww_mutex,	true,	&s->lock->base)
 #define SEQCNT_MUTEX_ZERO(name, lock)		SEQCOUNT_LOCKTYPE_ZERO(name, lock)
 #define SEQCNT_WW_MUTEX_ZERO(name, lock) 	SEQCOUNT_LOCKTYPE_ZERO(name, lock)
 
-
 #define __seqprop_case(s, locktype, prop)				\
 	seqcount_##locktype##_t: __seqcount_##locktype##_##prop((void *)(s))
 
 #define __seqprop(s, prop) _Generic(*(s),				\
-	seqcount_t:		__seqcount_##prop((void *)(s)),		\
+	seqcount_t:		__seqcount_t_##prop((void *)(s)),	\
 	__seqprop_case((s),	raw_spinlock,	prop),			\
 	__seqprop_case((s),	spinlock,	prop),			\
 	__seqprop_case((s),	rwlock,		prop),			\
-- 
2.28.0



* [PATCH v1 3/5] seqlock: seqcount_t: Implement all read APIs as statement expressions
  2020-08-28  1:07 ` [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support Ahmed S. Darwish
  2020-08-28  1:07   ` [PATCH v1 1/5] seqlock: seqcount_LOCKTYPE_t: Standardize naming convention Ahmed S. Darwish
  2020-08-28  1:07   ` [PATCH v1 2/5] seqlock: Use unique prefix for seqcount_t property accessors Ahmed S. Darwish
@ 2020-08-28  1:07   ` Ahmed S. Darwish
  2020-08-28  8:30     ` peterz
  2020-08-28  1:07   ` [PATCH v1 4/5] seqlock: seqcount_LOCKTYPE_t: Introduce PREEMPT_RT support Ahmed S. Darwish
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-28  1:07 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Sebastian A. Siewior, Paul E. McKenney,
	Steven Rostedt, LKML, Ahmed S. Darwish

The sequence counter read APIs are implemented as CPP macros so they
can take either seqcount_t or any of the seqcount_LOCKTYPE_t variants.
Such macros then expand *directly* into internal C functions that take
only plain seqcount_t.

Further commits need access to the seqcount_LOCKTYPE_t type inside the
actual read API code. Thus, transform all of the seqcount read APIs
into pure GCC statement expressions instead.

This will not break type-safety: all of the transformed APIs resolve to
a _Generic() selection that does not have a "default" case.
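
For illustration, here is a standalone userspace sketch -- toy types
and names, not the actual seqlock.h macros -- of why a _Generic()
selection without a "default" case keeps type-safety: an argument type
missing from the association list is rejected at compile time.

#include <stdio.h>

struct seq_a { unsigned sequence; };
struct seq_b { unsigned sequence; };

static unsigned read_a(const struct seq_a *s) { return s->sequence; }
static unsigned read_b(const struct seq_b *s) { return s->sequence; }

/* No "default" case: only the two listed types are accepted */
#define read_seq(s) _Generic(*(s),		\
	struct seq_a:	read_a,			\
	struct seq_b:	read_b)(s)

int main(void)
{
	struct seq_a a = { .sequence = 2 };
	struct seq_b b = { .sequence = 4 };

	/* Selects read_a() and read_b() respectively */
	printf("%u %u\n", read_seq(&a), read_seq(&b));

	/* read_seq(&(int){ 0 }); would fail to compile */
	return 0;
}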

This will also not affect the transformed APIs' readability: the
kernel-doc previously added above all of the seqlock.h functions makes
the expectations quite clear for call-site developers.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 98 ++++++++++++++++++++---------------------
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 5470e9cd52ce..d114a9f4e9d9 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -182,6 +182,12 @@ __seqcount_##locktype##_ptr(seqcount_##locktype##_t *s)			\
 	return &s->seqcount;						\
 }									\
 									\
+static __always_inline unsigned						\
+__seqcount_##locktype##_sequence(const seqcount_##locktype##_t *s)	\
+{									\
+	return READ_ONCE(s->seqcount.sequence);				\
+}									\
+									\
 static __always_inline bool						\
 __seqcount_##locktype##_preemptible(const seqcount_##locktype##_t *s)	\
 {									\
@@ -203,6 +209,11 @@ static inline seqcount_t *__seqcount_t_ptr(seqcount_t *s)
 	return s;
 }
 
+static inline unsigned __seqcount_t_sequence(const seqcount_t *s)
+{
+	return READ_ONCE(s->sequence);
+}
+
 static inline bool __seqcount_t_preemptible(const seqcount_t *s)
 {
 	return false;
@@ -248,6 +259,7 @@ SEQCOUNT_LOCKTYPE(ww_mutex,	struct ww_mutex,	true,	&s->lock->base)
 	__seqprop_case((s),	ww_mutex,	prop))
 
 #define __seqcount_ptr(s)		__seqprop(s, ptr)
+#define __seqcount_sequence(s)		__seqprop(s, sequence)
 #define __seqcount_lock_preemptible(s)	__seqprop(s, preemptible)
 #define __seqcount_assert_lock_held(s)	__seqprop(s, assert)
 
@@ -266,21 +278,19 @@ SEQCOUNT_LOCKTYPE(ww_mutex,	struct ww_mutex,	true,	&s->lock->base)
  * Return: count to be passed to read_seqcount_retry()
  */
 #define __read_seqcount_begin(s)					\
-	__read_seqcount_t_begin(__seqcount_ptr(s))
-
-static inline unsigned __read_seqcount_t_begin(const seqcount_t *s)
-{
-	unsigned ret;
-
-repeat:
-	ret = READ_ONCE(s->sequence);
-	if (unlikely(ret & 1)) {
-		cpu_relax();
-		goto repeat;
-	}
-	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
-	return ret;
-}
+({									\
+	unsigned seq;							\
+									\
+	do {								\
+		seq = __seqcount_sequence(s);				\
+		if (likely(! (seq & 1)))				\
+			break;						\
+		cpu_relax();						\
+	} while (true);							\
+									\
+	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);			\
+	seq;								\
+})
 
 /**
  * raw_read_seqcount_begin() - begin a seqcount_t read section w/o lockdep
@@ -289,14 +299,12 @@ static inline unsigned __read_seqcount_t_begin(const seqcount_t *s)
  * Return: count to be passed to read_seqcount_retry()
  */
 #define raw_read_seqcount_begin(s)					\
-	raw_read_seqcount_t_begin(__seqcount_ptr(s))
-
-static inline unsigned raw_read_seqcount_t_begin(const seqcount_t *s)
-{
-	unsigned ret = __read_seqcount_t_begin(s);
-	smp_rmb();
-	return ret;
-}
+({									\
+	unsigned seq = __read_seqcount_begin(s);			\
+									\
+	smp_rmb();							\
+	seq;								\
+})
 
 /**
  * read_seqcount_begin() - begin a seqcount_t read critical section
@@ -305,13 +313,10 @@ static inline unsigned raw_read_seqcount_t_begin(const seqcount_t *s)
  * Return: count to be passed to read_seqcount_retry()
  */
 #define read_seqcount_begin(s)						\
-	read_seqcount_t_begin(__seqcount_ptr(s))
-
-static inline unsigned read_seqcount_t_begin(const seqcount_t *s)
-{
-	seqcount_lockdep_reader_access(s);
-	return raw_read_seqcount_t_begin(s);
-}
+({									\
+	seqcount_lockdep_reader_access(__seqcount_ptr(s));		\
+	raw_read_seqcount_begin(s);					\
+})
 
 /**
  * raw_read_seqcount() - read the raw seqcount_t counter value
@@ -325,15 +330,13 @@ static inline unsigned read_seqcount_t_begin(const seqcount_t *s)
  * Return: count to be passed to read_seqcount_retry()
  */
 #define raw_read_seqcount(s)						\
-	raw_read_seqcount_t(__seqcount_ptr(s))
-
-static inline unsigned raw_read_seqcount_t(const seqcount_t *s)
-{
-	unsigned ret = READ_ONCE(s->sequence);
-	smp_rmb();
-	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
-	return ret;
-}
+({									\
+	unsigned seq = __seqcount_sequence(s);				\
+									\
+	smp_rmb();							\
+	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);			\
+	seq;								\
+})
 
 /**
  * raw_seqcount_begin() - begin a seqcount_t read critical section w/o
@@ -353,16 +356,13 @@ static inline unsigned raw_read_seqcount_t(const seqcount_t *s)
  * Return: count to be passed to read_seqcount_retry()
  */
 #define raw_seqcount_begin(s)						\
-	raw_seqcount_t_begin(__seqcount_ptr(s))
-
-static inline unsigned raw_seqcount_t_begin(const seqcount_t *s)
-{
-	/*
-	 * If the counter is odd, let read_seqcount_retry() fail
-	 * by decrementing the counter.
-	 */
-	return raw_read_seqcount_t(s) & ~1;
-}
+({									\
+	/*								\
+	 * If the counter is odd, let read_seqcount_retry() fail	\
+	 * by decrementing the counter.					\
+	 */								\
+	raw_read_seqcount(s) & ~1;					\
+})
 
 /**
  * __read_seqcount_retry() - end a seqcount_t read section w/o barrier
-- 
2.28.0



* [PATCH v1 4/5] seqlock: seqcount_LOCKTYPE_t: Introduce PREEMPT_RT support
  2020-08-28  1:07 ` [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support Ahmed S. Darwish
                     ` (2 preceding siblings ...)
  2020-08-28  1:07   ` [PATCH v1 3/5] seqlock: seqcount_t: Implement all read APIs as statement expressions Ahmed S. Darwish
@ 2020-08-28  1:07   ` Ahmed S. Darwish
  2020-08-28  8:57     ` peterz
  2020-08-28  8:59     ` peterz
  2020-08-28  1:07   ` [PATCH v1 5/5] seqlock: PREEMPT_RT: Do not starve seqlock_t writers Ahmed S. Darwish
  2020-09-04  6:52   ` [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support peterz
  5 siblings, 2 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-28  1:07 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Sebastian A. Siewior, Paul E. McKenney,
	Steven Rostedt, LKML, Ahmed S. Darwish

Preemption must be disabled before entering a sequence counter write
side critical section.  Otherwise the read side section can preempt the
write side section and spin for the entire scheduler tick.  If that
reader belongs to a real-time scheduling class, it can spin forever and
the kernel will livelock.

Disabling preemption cannot be done for PREEMPT_RT: it can lead to
higher latencies, and the write side sections will not be able to
acquire locks which become sleeping locks on RT (e.g. spinlock_t).

To solve this dilemma, do not disable preemption for seqcount_LOCKTYPE_t
writers. Rather, have the read side detect if a seqcount_LOCKTYPE_t
writer is in progress. If that is the case, acquire, then release, the
associated LOCKTYPE writer serialization lock. This will allow any
preempted writer to make progress until the end of its writer
serialization lock critical section.

Implement this technique for all of PREEMPT_RT sleeping locks.
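
For reference, a simplified sketch of the resulting reader-side path
for the spinlock case -- a plain-function form of what the generated
__seqcount_spinlock_sequence() below effectively does, not the literal
macro expansion (the real code wraps the lock accesses in __SEQ_LOCK()):

static unsigned rt_read_sequence(const seqcount_spinlock_t *s)
{
	unsigned seq = READ_ONCE(s->seqcount.sequence);

	if (IS_ENABLED(CONFIG_PREEMPT_RT) && unlikely(seq & 1)) {
		/* Writer in progress: block on its lock instead of spinning */
		spin_lock(s->lock);
		spin_unlock(s->lock);

		/* Re-read: the (possibly preempted) writer made progress */
		seq = READ_ONCE(s->seqcount.sequence);
	}

	return seq;
}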

Link: https://lkml.kernel.org/r/159708609435.2571.13948681727529247231.tglx@nanos
Link: https://lkml.kernel.org/r/20200519214547.352050-1-a.darwish@linutronix.de
References: 55f3560df975 ("seqlock: Extend seqcount API with associated locks")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 72 +++++++++++++++++++++++++++++++++--------
 1 file changed, 59 insertions(+), 13 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index d114a9f4e9d9..8d4bf12272ba 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -17,6 +17,7 @@
 #include <linux/kcsan-checks.h>
 #include <linux/lockdep.h>
 #include <linux/mutex.h>
+#include <linux/ww_mutex.h>
 #include <linux/preempt.h>
 #include <linux/spinlock.h>
 
@@ -131,7 +132,23 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  * See Documentation/locking/seqlock.rst
  */
 
-#ifdef CONFIG_LOCKDEP
+/*
+ * For PREEMPT_RT, seqcount_LOCKTYPE_t write side critical sections cannot
+ * disable preemption. It can lead to higher latencies, and the write side
+ * sections will not be able to acquire locks which become sleeping locks
+ * (e.g. spinlock_t).
+ *
+ * To remain preemptible while avoiding a possible livelock caused by the
+ * read side preempting the write side, use a different technique: detect
+ * if a seqcount_LOCKTYPE_t writer is in progress. If that is the case,
+ * acquire then release the associated LOCKTYPE writer serialization
+ * lock. This will allow any preempted writer to make progress until the
+ * end of its writer serialization lock critical section.
+ *
+ * This lock-unlock technique must be implemented for all of PREEMPT_RT
+ * sleeping locks.  See Documentation/locking/locktypes.rst
+ */
+#if defined(CONFIG_LOCKDEP) || defined(CONFIG_PREEMPT_RT)
 #define __SEQ_LOCK(expr)	expr
 #else
 #define __SEQ_LOCK(expr)
@@ -162,8 +179,10 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  * @locktype_t:		canonical/full LOCKTYPE C data type
  * @preemptible:	preemptibility of above locktype
  * @lockmember:		argument for lockdep_assert_held()
+ * @lockbase:		associated lock release function (prefix only)
+ * @lock_acquire:	associated lock acquisition function (full call)
  */
-#define SEQCOUNT_LOCKTYPE(locktype, locktype_t, preemptible, lockmember)	\
+#define SEQCOUNT_LOCKTYPE(locktype, locktype_t, preemptible, lockmember, lockbase, lock_acquire) \
 typedef struct seqcount_##locktype {					\
 	seqcount_t		seqcount;				\
 	__SEQ_LOCK(locktype_t	*lock);					\
@@ -185,7 +204,23 @@ __seqcount_##locktype##_ptr(seqcount_##locktype##_t *s)			\
 static __always_inline unsigned						\
 __seqcount_##locktype##_sequence(const seqcount_##locktype##_t *s)	\
 {									\
-	return READ_ONCE(s->seqcount.sequence);				\
+	unsigned seq = READ_ONCE(s->seqcount.sequence);			\
+									\
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))				\
+		return seq;						\
+									\
+	if (preemptible && unlikely(seq & 1)) {				\
+		__SEQ_LOCK(lock_acquire);				\
+		__SEQ_LOCK(lockbase##_unlock(s->lock));			\
+									\
+		/*							\
+		 * Re-read the sequence counter since the (possibly	\
+		 * preempted) writer made progress.			\
+		 */							\
+		seq = READ_ONCE(s->seqcount.sequence);			\
+	}								\
+									\
+	return seq;							\
 }									\
 									\
 static __always_inline bool						\
@@ -224,11 +259,13 @@ static inline void __seqcount_t_assert(const seqcount_t *s)
 	lockdep_assert_preemption_disabled();
 }
 
-SEQCOUNT_LOCKTYPE(raw_spinlock,	raw_spinlock_t,		false,	s->lock)
-SEQCOUNT_LOCKTYPE(spinlock,	spinlock_t,		false,	s->lock)
-SEQCOUNT_LOCKTYPE(rwlock,	rwlock_t,		false,	s->lock)
-SEQCOUNT_LOCKTYPE(mutex,	struct mutex,		true,	s->lock)
-SEQCOUNT_LOCKTYPE(ww_mutex,	struct ww_mutex,	true,	&s->lock->base)
+#define __SEQ_RT	IS_ENABLED(CONFIG_PREEMPT_RT)
+
+SEQCOUNT_LOCKTYPE(raw_spinlock, raw_spinlock_t,  false,    s->lock,        raw_spin, raw_spin_lock(s->lock))
+SEQCOUNT_LOCKTYPE(spinlock,     spinlock_t,      __SEQ_RT, s->lock,        spin,     spin_lock(s->lock))
+SEQCOUNT_LOCKTYPE(rwlock,       rwlock_t,        __SEQ_RT, s->lock,        read,     read_lock(s->lock))
+SEQCOUNT_LOCKTYPE(mutex,        struct mutex,    true,     s->lock,        mutex,    mutex_lock(s->lock))
+SEQCOUNT_LOCKTYPE(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mutex, ww_mutex_lock(s->lock, NULL))
 
 /**
  * SEQCNT_LOCKTYPE_ZERO - static initializer for seqcount_LOCKTYPE_t
@@ -408,13 +445,22 @@ static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
 	return __read_seqcount_t_retry(s, start);
 }
 
+/*
+ * Automatically disable preemption for seqcount_LOCKTYPE_t writers, if the
+ * associated lock does not implicitly disable preemption.
+ *
+ * Don't do it for PREEMPT_RT. Check __SEQ_LOCK().
+ */
+#define __seq_enforce_preemption_protection(s)				\
+	(!IS_ENABLED(CONFIG_PREEMPT_RT) && __seqcount_lock_preemptible(s))
+
 /**
  * raw_write_seqcount_begin() - start a seqcount_t write section w/o lockdep
  * @s: Pointer to seqcount_t or any of the seqcount_LOCKTYPE_t variants
  */
 #define raw_write_seqcount_begin(s)					\
 do {									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seq_enforce_preemption_protection(s))			\
 		preempt_disable();					\
 									\
 	raw_write_seqcount_t_begin(__seqcount_ptr(s));			\
@@ -435,7 +481,7 @@ static inline void raw_write_seqcount_t_begin(seqcount_t *s)
 do {									\
 	raw_write_seqcount_t_end(__seqcount_ptr(s));			\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seq_enforce_preemption_protection(s))			\
 		preempt_enable();					\
 } while (0)
 
@@ -458,7 +504,7 @@ static inline void raw_write_seqcount_t_end(seqcount_t *s)
 do {									\
 	__seqcount_assert_lock_held(s);					\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seq_enforce_preemption_protection(s))			\
 		preempt_disable();					\
 									\
 	write_seqcount_t_begin_nested(__seqcount_ptr(s), subclass);	\
@@ -485,7 +531,7 @@ static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
 do {									\
 	__seqcount_assert_lock_held(s);					\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seq_enforce_preemption_protection(s))			\
 		preempt_disable();					\
 									\
 	write_seqcount_t_begin(__seqcount_ptr(s));			\
@@ -506,7 +552,7 @@ static inline void write_seqcount_t_begin(seqcount_t *s)
 do {									\
 	write_seqcount_t_end(__seqcount_ptr(s));			\
 									\
-	if (__seqcount_lock_preemptible(s))				\
+	if (__seq_enforce_preemption_protection(s))			\
 		preempt_enable();					\
 } while (0)
 
-- 
2.28.0



* [PATCH v1 5/5] seqlock: PREEMPT_RT: Do not starve seqlock_t writers
  2020-08-28  1:07 ` [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support Ahmed S. Darwish
                     ` (3 preceding siblings ...)
  2020-08-28  1:07   ` [PATCH v1 4/5] seqlock: seqcount_LOCKTYPE_t: Introduce PREEMPT_RT support Ahmed S. Darwish
@ 2020-08-28  1:07   ` Ahmed S. Darwish
  2020-09-04  6:52   ` [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support peterz
  5 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-28  1:07 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Thomas Gleixner, Sebastian A. Siewior, Paul E. McKenney,
	Steven Rostedt, LKML, Ahmed S. Darwish

On PREEMPT_RT, seqlock_t is transformed to a sleeping lock that does
not disable preemption. A seqlock_t reader can thus preempt the write
side section and spin for the entire scheduler tick. If that reader
belongs to a real-time scheduling class, it can spin forever and the
kernel will livelock.

To break such a possible livelock on PREEMPT_RT, implement seqlock_t in
terms of "seqcount_spinlock_t" instead of plain "seqcount_t".

Besides the pure annotational value, this will leverage the already
existing seqcount_LOCKTYPE_t anti-livelock mechanisms -- without adding
any extra code.
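
For context, a minimal usage sketch (hypothetical caller and data, not
from this patch): the seqlock_t API stays unchanged for its users, only
the embedded seqcount type differs.

static seqlock_t foo_lock = __SEQLOCK_UNLOCKED(foo_lock);
static u64 foo_data;

static void foo_update(u64 val)
{
	write_seqlock(&foo_lock);	/* spin_lock() + enter write section */
	foo_data = val;
	write_sequnlock(&foo_lock);
}

static u64 foo_read(void)
{
	unsigned seq;
	u64 val;

	do {
		seq = read_seqbegin(&foo_lock);
		val = foo_data;
	} while (read_seqretry(&foo_lock, seq));

	return val;
}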

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 include/linux/seqlock.h | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 8d4bf12272ba..151e7a18fd7b 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -761,13 +761,17 @@ static inline void raw_write_seqcount_t_latch(seqcount_t *s)
  *    - Documentation/locking/seqlock.rst
  */
 typedef struct {
-	struct seqcount seqcount;
+	/*
+	 * Make sure that readers don't starve writers on PREEMPT_RT: use
+	 * seqcount_spinlock_t instead of seqcount_t. Check __SEQ_LOCK().
+	 */
+	seqcount_spinlock_t seqcount;
 	spinlock_t lock;
 } seqlock_t;
 
 #define __SEQLOCK_UNLOCKED(lockname)					\
 	{								\
-		.seqcount = SEQCNT_ZERO(lockname),			\
+		.seqcount = SEQCNT_SPINLOCK_ZERO(lockname, &(lockname).lock), \
 		.lock =	__SPIN_LOCK_UNLOCKED(lockname)			\
 	}
 
@@ -777,8 +781,8 @@ typedef struct {
  */
 #define seqlock_init(sl)						\
 	do {								\
-		seqcount_init(&(sl)->seqcount);				\
 		spin_lock_init(&(sl)->lock);				\
+		seqcount_spinlock_init(&(sl)->seqcount, &(sl)->lock);	\
 	} while (0)
 
 /**
@@ -825,6 +829,12 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 	return read_seqcount_retry(&sl->seqcount, start);
 }
 
+/*
+ * For all seqlock_t write side functions, use write_seqcount_*t*_begin()
+ * instead of the generic write_seqcount_begin(). This way, no redundant
+ * lockdep_assert_held() checks are added.
+ */
+
 /**
  * write_seqlock() - start a seqlock_t write side critical section
  * @sl: Pointer to seqlock_t
@@ -841,7 +851,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	write_seqcount_t_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount.seqcount);
 }
 
 /**
@@ -853,7 +863,7 @@ static inline void write_seqlock(seqlock_t *sl)
  */
 static inline void write_sequnlock(seqlock_t *sl)
 {
-	write_seqcount_t_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount.seqcount);
 	spin_unlock(&sl->lock);
 }
 
@@ -867,7 +877,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	write_seqcount_t_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount.seqcount);
 }
 
 /**
@@ -880,7 +890,7 @@ static inline void write_seqlock_bh(seqlock_t *sl)
  */
 static inline void write_sequnlock_bh(seqlock_t *sl)
 {
-	write_seqcount_t_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount.seqcount);
 	spin_unlock_bh(&sl->lock);
 }
 
@@ -894,7 +904,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	write_seqcount_t_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount.seqcount);
 }
 
 /**
@@ -906,7 +916,7 @@ static inline void write_seqlock_irq(seqlock_t *sl)
  */
 static inline void write_sequnlock_irq(seqlock_t *sl)
 {
-	write_seqcount_t_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount.seqcount);
 	spin_unlock_irq(&sl->lock);
 }
 
@@ -915,7 +925,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	write_seqcount_t_begin(&sl->seqcount);
+	write_seqcount_t_begin(&sl->seqcount.seqcount);
 	return flags;
 }
 
@@ -944,7 +954,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 static inline void
 write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
 {
-	write_seqcount_t_end(&sl->seqcount);
+	write_seqcount_t_end(&sl->seqcount.seqcount);
 	spin_unlock_irqrestore(&sl->lock, flags);
 }
 
-- 
2.28.0



* Re: [PATCH v1 1/5] seqlock: seqcount_LOCKTYPE_t: Standardize naming convention
  2020-08-28  1:07   ` [PATCH v1 1/5] seqlock: seqcount_LOCKTYPE_t: Standardize naming convention Ahmed S. Darwish
@ 2020-08-28  8:18     ` peterz
  2020-08-28  8:24       ` Ahmed S. Darwish
  0 siblings, 1 reply; 258+ messages in thread
From: peterz @ 2020-08-28  8:18 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	Paul E. McKenney, Steven Rostedt, LKML

On Fri, Aug 28, 2020 at 03:07:06AM +0200, Ahmed S. Darwish wrote:
> At seqlock.h, sequence counters with associated locks are either called
> seqcount_LOCKNAME_t, seqcount_LOCKTYPE_t, or seqcount_locktype_t.
> 
> Standardize on "seqcount_LOCKTYPE_t" for all instances in comments,
> kernel-doc, and SEQCOUNT_LOCKTYPE() generative macro parameters.

> +#define SEQCOUNT_LOCKTYPE(locktype, locktype_t, preemptible, lockmember)	\
> +typedef struct seqcount_##locktype {					\
> +	__SEQ_LOCK(locktype_t	*lock);					\
> +} seqcount_##locktype##_t;						\

Hurmph, so my thinking was that the above 'locktype' is not actually a
type and therefore a misnomer.

But I see your point about it being a bit of a mess.

Would:

 s/LOCKTYPE/LOCKNAME/g
 s/seqcount_locktype_t/seqcount_LOCKNAME_t/g

help? Then we're consistently at: seqcount_LOCKNAME_t, which is a type.



* Re: [PATCH v1 1/5] seqlock: seqcount_LOCKTYPE_t: Standardize naming convention
  2020-08-28  8:18     ` peterz
@ 2020-08-28  8:24       ` Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-28  8:24 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	Paul E. McKenney, Steven Rostedt, LKML

Hi :)

On Fri, Aug 28, 2020 at 10:18:44AM +0200, peterz@infradead.org wrote:
> On Fri, Aug 28, 2020 at 03:07:06AM +0200, Ahmed S. Darwish wrote:
> > At seqlock.h, sequence counters with associated locks are either called
> > seqcount_LOCKNAME_t, seqcount_LOCKTYPE_t, or seqcount_locktype_t.
> >
> > Standardize on "seqcount_LOCKTYPE_t" for all instances in comments,
> > kernel-doc, and SEQCOUNT_LOCKTYPE() generative macro parameters.
>
> > +#define SEQCOUNT_LOCKTYPE(locktype, locktype_t, preemptible, lockmember)	\
> > +typedef struct seqcount_##locktype {					\
> > +	__SEQ_LOCK(locktype_t	*lock);					\
> > +} seqcount_##locktype##_t;						\
>
> Hurmph, so my thinking was that the above 'locktype' is not actually a
> type and therefore a misnomer.
>
> But I see your point about it being a bit of a mess.
>
> Would:
>
>  s/LOCKTYPE/LOCKNAME/g
>  s/seqcount_locktype_t/seqcount_LOCKNAME_t/g
>
> help? Then we're consistently at: seqcount_LOCKNAME_t, which is a type.
>

Sounds good, will do.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: [PATCH v1 2/5] seqlock: Use unique prefix for seqcount_t property accessors
  2020-08-28  1:07   ` [PATCH v1 2/5] seqlock: Use unique prefix for seqcount_t property accessors Ahmed S. Darwish
@ 2020-08-28  8:27     ` peterz
  2020-08-28  8:59       ` Ahmed S. Darwish
  0 siblings, 1 reply; 258+ messages in thread
From: peterz @ 2020-08-28  8:27 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	Paul E. McKenney, Steven Rostedt, LKML

On Fri, Aug 28, 2020 at 03:07:07AM +0200, Ahmed S. Darwish wrote:
> Differentiate the first group by using "__seqcount_t_" as prefix. This
> also conforms with the rest of seqlock.h naming conventions.

>  #define __seqprop_case(s, locktype, prop)				\
>  	seqcount_##locktype##_t: __seqcount_##locktype##_##prop((void *)(s))
>  
>  #define __seqprop(s, prop) _Generic(*(s),				\
> -	seqcount_t:		__seqcount_##prop((void *)(s)),		\
> +	seqcount_t:		__seqcount_t_##prop((void *)(s)),	\
>  	__seqprop_case((s),	raw_spinlock,	prop),			\
>  	__seqprop_case((s),	spinlock,	prop),			\
>  	__seqprop_case((s),	rwlock,		prop),			\

If instead you do:

#define __seqprop_case(s, _lockname, prop) \
	seqcount##_lockname##_t: __seqcount##_lockname##_##prop((void *)(s))

You can have:

	__seqprop_case((s),	,		prop),
	__seqprop_case((s),	_raw_spinlock,	prop),
	__seqprop_case((s),	_spinlock,	prop),
	__seqprop_case((s),	_rwlock,	prop),
	__seqprop_case((s),	_mutex,		prop),
	__seqprop_case((s),	_ww_mutex,	prop),

And it's all good again.

Although arguably we should do something like s/__seqcount/__seqprop/
over this lot.




* Re: [PATCH v1 3/5] seqlock: seqcount_t: Implement all read APIs as statement expressions
  2020-08-28  1:07   ` [PATCH v1 3/5] seqlock: seqcount_t: Implement all read APIs as statement expressions Ahmed S. Darwish
@ 2020-08-28  8:30     ` peterz
  2020-08-28  8:37       ` Ahmed S. Darwish
  0 siblings, 1 reply; 258+ messages in thread
From: peterz @ 2020-08-28  8:30 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	Paul E. McKenney, Steven Rostedt, LKML

On Fri, Aug 28, 2020 at 03:07:08AM +0200, Ahmed S. Darwish wrote:
>  #define __read_seqcount_begin(s)					\
> +({									\
> +	unsigned seq;							\
> +									\
> +	do {								\
> +		seq = __seqcount_sequence(s);				\
> +		if (likely(! (seq & 1)))				\
> +			break;						\
> +		cpu_relax();						\
> +	} while (true);							\
> +									\
> +	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);			\
> +	seq;								\
> +})

Since we're there anyway, does it make sense to (re)write this like:

	while ((seq = __seqcount_sequence(s)) & 1)
		cpu_relax();

?
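
For reference, a sketch of how the full statement expression would then
read (assuming the names introduced in this patch):

#define __read_seqcount_begin(s)				\
({								\
	unsigned __seq;						\
								\
	while ((__seq = __seqcount_sequence(s)) & 1)		\
		cpu_relax();					\
								\
	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);		\
	__seq;							\
})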



* Re: [PATCH v1 3/5] seqlock: seqcount_t: Implement all read APIs as statement expressions
  2020-08-28  8:30     ` peterz
@ 2020-08-28  8:37       ` Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-28  8:37 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	Paul E. McKenney, Steven Rostedt, LKML

On Fri, Aug 28, 2020 at 10:30:22AM +0200, peterz@infradead.org wrote:
> On Fri, Aug 28, 2020 at 03:07:08AM +0200, Ahmed S. Darwish wrote:
> >  #define __read_seqcount_begin(s)					\
> > +({									\
> > +	unsigned seq;							\
> > +									\
> > +	do {								\
> > +		seq = __seqcount_sequence(s);				\
> > +		if (likely(! (seq & 1)))				\
> > +			break;						\
> > +		cpu_relax();						\
> > +	} while (true);							\
> > +									\
> > +	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);			\
> > +	seq;								\
> > +})
>
> Since we're there anyway, does it make sense to (re)write this like:
>
> 	while ((seq = __seqcount_sequence(s)) & 1)
> 		cpu_relax();
>
> ?
>

Yeah, much better of course; will do.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: [PATCH v1 4/5] seqlock: seqcount_LOCKTYPE_t: Introduce PREEMPT_RT support
  2020-08-28  1:07   ` [PATCH v1 4/5] seqlock: seqcount_LOCKTYPE_t: Introduce PREEMPT_RT support Ahmed S. Darwish
@ 2020-08-28  8:57     ` peterz
  2020-08-28  8:59     ` peterz
  1 sibling, 0 replies; 258+ messages in thread
From: peterz @ 2020-08-28  8:57 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	Paul E. McKenney, Steven Rostedt, LKML

On Fri, Aug 28, 2020 at 03:07:09AM +0200, Ahmed S. Darwish wrote:
> +/*
> + * Automatically disable preemption for seqcount_LOCKTYPE_t writers, if the
> + * associated lock does not implicitly disable preemption.
> + *
> + * Don't do it for PREEMPT_RT. Check __SEQ_LOCK().
> + */
> +#define __seq_enforce_preemption_protection(s)				\
> +	(!IS_ENABLED(CONFIG_PREEMPT_RT) && __seqcount_lock_preemptible(s))

Hurmph, so basically you want to make __seqcount_lock_preemptible()
return true on PREEMPT_RT? Should we then not muck about with the
property instead of this?

ISTR I had something like the below; would that not be the same, but
much clearer?

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 300cbf312546..3b5ad026ddfb 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -211,11 +211,13 @@ static inline void __seqcount_assert(seqcount_t *s)
 	lockdep_assert_preemption_disabled();
 }
 
-SEQCOUNT_LOCKTYPE(raw_spinlock_t,	raw_spinlock,	false,	s->lock)
-SEQCOUNT_LOCKTYPE(spinlock_t,		spinlock,	false,	s->lock)
-SEQCOUNT_LOCKTYPE(rwlock_t,		rwlock,		false,	s->lock)
-SEQCOUNT_LOCKTYPE(struct mutex,		mutex,		true,	s->lock)
-SEQCOUNT_LOCKTYPE(struct ww_mutex,	ww_mutex,	true,	&s->lock->base)
+#define __PREEMPT_RT	IS_BUILTIN(CONFIG_PREEMPT_RT)
+
+SEQCOUNT_LOCKTYPE(raw_spinlock_t,	raw_spinlock,	false,		s->lock)
+SEQCOUNT_LOCKTYPE(spinlock_t,		spinlock,	__PREEMPT_RT,	s->lock)
+SEQCOUNT_LOCKTYPE(rwlock_t,		rwlock,		__PREEMPT_RT,	s->lock)
+SEQCOUNT_LOCKTYPE(struct mutex,		mutex,		true,		s->lock)
+SEQCOUNT_LOCKTYPE(struct ww_mutex,	ww_mutex,	true,		&s->lock->base)
 
 /*
  * SEQCNT_LOCKNAME_ZERO - static initializer for seqcount_LOCKNAME_t



* Re: [PATCH v1 2/5] seqlock: Use unique prefix for seqcount_t property accessors
  2020-08-28  8:27     ` peterz
@ 2020-08-28  8:59       ` Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-28  8:59 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	Paul E. McKenney, Steven Rostedt, LKML

On Fri, Aug 28, 2020 at 10:27:54AM +0200, peterz@infradead.org wrote:
> On Fri, Aug 28, 2020 at 03:07:07AM +0200, Ahmed S. Darwish wrote:
> > Differentiate the first group by using "__seqcount_t_" as prefix. This
> > also conforms with the rest of seqlock.h naming conventions.
>
> >  #define __seqprop_case(s, locktype, prop)				\
> >  	seqcount_##locktype##_t: __seqcount_##locktype##_##prop((void *)(s))
> >
> >  #define __seqprop(s, prop) _Generic(*(s),				\
> > -	seqcount_t:		__seqcount_##prop((void *)(s)),		\
> > +	seqcount_t:		__seqcount_t_##prop((void *)(s)),	\
> >  	__seqprop_case((s),	raw_spinlock,	prop),			\
> >  	__seqprop_case((s),	spinlock,	prop),			\
> >  	__seqprop_case((s),	rwlock,		prop),			\
>
> If instead you do:
>
> #define __seqprop_case(s, _lockname, prop) \
> 	seqcount##_lockname##_t: __seqcount##_lockname##_##prop((void *)(s))
>
> You can have:
>
> 	__seqprop_case((s),	,		prop),
> 	__seqprop_case((s),	_raw_spinlock,	prop),
> 	__seqprop_case((s),	_spinlock,	prop),
> 	__seqprop_case((s),	_rwlock,	prop),
> 	__seqprop_case((s),	_mutex,		prop),
> 	__seqprop_case((s),	_ww_mutex,	prop),
>
> And it's all good again.
>
> Although arguably we should do something like s/__seqcount/__seqprop/
> over this lot.
>

ACK.


* Re: [PATCH v1 4/5] seqlock: seqcount_LOCKTYPE_t: Introduce PREEMPT_RT support
  2020-08-28  1:07   ` [PATCH v1 4/5] seqlock: seqcount_LOCKTYPE_t: Introduce PREEMPT_RT support Ahmed S. Darwish
  2020-08-28  8:57     ` peterz
@ 2020-08-28  8:59     ` peterz
  2020-08-28  9:31       ` Ahmed S. Darwish
  1 sibling, 1 reply; 258+ messages in thread
From: peterz @ 2020-08-28  8:59 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	Paul E. McKenney, Steven Rostedt, LKML

On Fri, Aug 28, 2020 at 03:07:09AM +0200, Ahmed S. Darwish wrote:
> +#define __SEQ_RT	IS_ENABLED(CONFIG_PREEMPT_RT)
> +
> +SEQCOUNT_LOCKTYPE(raw_spinlock, raw_spinlock_t,  false,    s->lock,        raw_spin, raw_spin_lock(s->lock))
> +SEQCOUNT_LOCKTYPE(spinlock,     spinlock_t,      __SEQ_RT, s->lock,        spin,     spin_lock(s->lock))
> +SEQCOUNT_LOCKTYPE(rwlock,       rwlock_t,        __SEQ_RT, s->lock,        read,     read_lock(s->lock))
> +SEQCOUNT_LOCKTYPE(mutex,        struct mutex,    true,     s->lock,        mutex,    mutex_lock(s->lock))
> +SEQCOUNT_LOCKTYPE(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mutex, ww_mutex_lock(s->lock, NULL))

Ooh, reading is hard, but ^^^^ you already have that.

> +/*
> + * Automatically disable preemption for seqcount_LOCKTYPE_t writers, if the
> + * associated lock does not implicitly disable preemption.
> + *
> + * Don't do it for PREEMPT_RT. Check __SEQ_LOCK().
> + */
> +#define __seq_enforce_preemption_protection(s)				\
> +	(!IS_ENABLED(CONFIG_PREEMPT_RT) && __seqcount_lock_preemptible(s))

Then what is this doing ?!? I'm confused now.


* Re: [PATCH v1 4/5] seqlock: seqcount_LOCKTYPE_t: Introduce PREEMPT_RT support
  2020-08-28  8:59     ` peterz
@ 2020-08-28  9:31       ` Ahmed S. Darwish
  2020-08-28 14:36         ` Ahmed S. Darwish
  0 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-28  9:31 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	Paul E. McKenney, Steven Rostedt, LKML

On Fri, Aug 28, 2020 at 10:59:38AM +0200, peterz@infradead.org wrote:
> On Fri, Aug 28, 2020 at 03:07:09AM +0200, Ahmed S. Darwish wrote:
> > +#define __SEQ_RT	IS_ENABLED(CONFIG_PREEMPT_RT)
> > +
> > +SEQCOUNT_LOCKTYPE(raw_spinlock, raw_spinlock_t,  false,    s->lock,        raw_spin, raw_spin_lock(s->lock))
> > +SEQCOUNT_LOCKTYPE(spinlock,     spinlock_t,      __SEQ_RT, s->lock,        spin,     spin_lock(s->lock))
> > +SEQCOUNT_LOCKTYPE(rwlock,       rwlock_t,        __SEQ_RT, s->lock,        read,     read_lock(s->lock))
> > +SEQCOUNT_LOCKTYPE(mutex,        struct mutex,    true,     s->lock,        mutex,    mutex_lock(s->lock))
> > +SEQCOUNT_LOCKTYPE(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mutex, ww_mutex_lock(s->lock, NULL))
>
> Ooh, reading is hard, but ^^^^ you already have that.
>

Haha, I was just answering the other mail :)

> > +/*
> > + * Automatically disable preemption for seqcount_LOCKTYPE_t writers, if the
> > + * associated lock does not implicitly disable preemption.
> > + *
> > + * Don't do it for PREEMPT_RT. Check __SEQ_LOCK().
> > + */
> > +#define __seq_enforce_preemption_protection(s)				\
> > +	(!IS_ENABLED(CONFIG_PREEMPT_RT) && __seqcount_lock_preemptible(s))
>
> Then what is this doing ?!? I'm confused now.

There were a number of call sites (at DRM mainly) that protected their
seqcount_t write sections with mutex and ww_mutex. So, they manually
disabled preemption before entering the seqcount_t write sections.

Unfortunately these write sections were big, and spinlocks were also
acquired inside them.  This was all very bad for RT...

So the idea was to relieve call sites from the responsibility of
disabling/enabling preemption upon entering a seqcount_LOCKNAME_t write
section, and let seqlock.h automatically handle it if LOCKNAME_t is
preemptible (the macro's condition, second part).

Having seqlock.h control the preempt disable/enable allows us to never
disable preemption for RT, which is exactly what is needed since we
already handle any possible livelock through the mechanisms described at
__SEQ_LOCK (the macro's condition test, first part).
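
To make that concrete, a hedged sketch of such a call site (hypothetical
struct and names, not an actual hunk from this series):

struct foo {
	struct mutex		lock;
	seqcount_mutex_t	seq;	/* associated with ->lock */
	u64			data;
};

static void foo_write(struct foo *obj, u64 val)
{
	mutex_lock(&obj->lock);

	/*
	 * Previously: a plain seqcount_t here required a manual
	 * preempt_disable() before write_seqcount_begin().  With
	 * seqcount_mutex_t, seqlock.h disables preemption itself on
	 * !PREEMPT_RT, and skips it on PREEMPT_RT.
	 */
	write_seqcount_begin(&obj->seq);
	obj->data = val;
	write_seqcount_end(&obj->seq);

	mutex_unlock(&obj->lock);
}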

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: [PATCH v1 4/5] seqlock: seqcount_LOCKTYPE_t: Introduce PREEMPT_RT support
  2020-08-28  9:31       ` Ahmed S. Darwish
@ 2020-08-28 14:36         ` Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-08-28 14:36 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	Paul E. McKenney, Steven Rostedt, LKML

On Fri, Aug 28, 2020 at 11:31:32AM +0200, Ahmed S. Darwish wrote:
> On Fri, Aug 28, 2020 at 10:59:38AM +0200, peterz@infradead.org wrote:
> > On Fri, Aug 28, 2020 at 03:07:09AM +0200, Ahmed S. Darwish wrote:
> > > +#define __SEQ_RT	IS_ENABLED(CONFIG_PREEMPT_RT)
> > > +
> > > +SEQCOUNT_LOCKTYPE(raw_spinlock, raw_spinlock_t,  false,    s->lock,        raw_spin, raw_spin_lock(s->lock))
> > > +SEQCOUNT_LOCKTYPE(spinlock,     spinlock_t,      __SEQ_RT, s->lock,        spin,     spin_lock(s->lock))
> > > +SEQCOUNT_LOCKTYPE(rwlock,       rwlock_t,        __SEQ_RT, s->lock,        read,     read_lock(s->lock))
> > > +SEQCOUNT_LOCKTYPE(mutex,        struct mutex,    true,     s->lock,        mutex,    mutex_lock(s->lock))
> > > +SEQCOUNT_LOCKTYPE(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mutex, ww_mutex_lock(s->lock, NULL))
> >
> > Ooh, reading is hard, but ^^^^ you already have that.
> >
>
> Haha, I was just answering the other mail :)
>
> > > +/*
> > > + * Automatically disable preemption for seqcount_LOCKTYPE_t writers, if the
> > > + * associated lock does not implicitly disable preemption.
> > > + *
> > > + * Don't do it for PREEMPT_RT. Check __SEQ_LOCK().
> > > + */
> > > +#define __seq_enforce_preemption_protection(s)				\
> > > +	(!IS_ENABLED(CONFIG_PREEMPT_RT) && __seqcount_lock_preemptible(s))
> >
> > Then what is this doing ?!? I'm confused now.
>
> There were a number of call sites (at DRM mainly) that protected their
> seqcount_t write sections with mutex and ww_mutex. So, they manually
> disabled preemption before entering the seqcount_t write sections.
>
> Unfortunately these write sections were big, and spinlocks were also
> acquired inside them.  This was all very bad for RT...
>
> So the idea was to relieve call sites from the responsibility of
> disabling/enabling preemption upon entering a seqcount_LOCKNAME_t write
> section, and let seqlock.h automatically handle it if LOCKNAME_t is
> preemptible (the macro's condition, second part).
>
> Having seqlock.h control the preempt disable/enable allows us to never
> disable preemption for RT, which is exactly what is needed since we
> already handle any possible livelock through the mechanisms described at
> __SEQ_LOCK (the macro's condition test, first part).
>

So to summarize, __seqcount_lock_preemptible() has two use cases:

    1. For !PREEMPT_RT, to automatically disable preemption on the write
       side when the seqcount associated lock is preemptible.

    2. For PREEMPT_RT, to lock/unlock the seqcount associated lock on
       the read side if it is RT-preemptible (sleeping lock).

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support
  2020-08-28  1:07 ` [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support Ahmed S. Darwish
                     ` (4 preceding siblings ...)
  2020-08-28  1:07   ` [PATCH v1 5/5] seqlock: PREEMPT_RT: Do not starve seqlock_t writers Ahmed S. Darwish
@ 2020-09-04  6:52   ` peterz
  2020-09-04  7:30     ` Ahmed S. Darwish
  5 siblings, 1 reply; 258+ messages in thread
From: peterz @ 2020-09-04  6:52 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	Paul E. McKenney, Steven Rostedt, LKML



FWIW, can you please start a new thread with every posting? I lost this
one because it got stuck onto some ancient thread.


* Re: [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support
  2020-09-04  6:52   ` [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support peterz
@ 2020-09-04  7:30     ` Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-09-04  7:30 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	Paul E. McKenney, Steven Rostedt, LKML

On Fri, Sep 04, 2020 at 08:52:45AM +0200, peterz@infradead.org wrote:
>
> FWIW, can you please start a new thread with ever posting? I lost this
> one because it got stuck onto some ancient thread.
>

Yeah, I used an old send-mail script, sorry.


* Re: [PATCH v1 6/8] x86/tsc: Use seqcount_latch_t
  2020-08-27 11:40   ` [PATCH v1 6/8] x86/tsc: " Ahmed S. Darwish
@ 2020-09-04  7:41     ` peterz
  2020-09-04  8:03       ` peterz
  2020-09-10 15:08     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
  1 sibling, 1 reply; 258+ messages in thread
From: peterz @ 2020-09-04  7:41 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	LKML, Borislav Petkov, x86, H. Peter Anvin

On Thu, Aug 27, 2020 at 01:40:42PM +0200, Ahmed S. Darwish wrote:

>  __always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
>  {
> +	seqcount_latch_t *seqcount;
>  	int seq, idx;
>  
>  	preempt_disable_notrace();
>  
> +	seqcount = &this_cpu_ptr(&cyc2ns)->seq;
>  	do {
> -		seq = this_cpu_read(cyc2ns.seq.sequence);
> +		seq = raw_read_seqcount_latch(seqcount);
>  		idx = seq & 1;
>  
>  		data->cyc2ns_offset = this_cpu_read(cyc2ns.data[idx].cyc2ns_offset);
>  		data->cyc2ns_mul    = this_cpu_read(cyc2ns.data[idx].cyc2ns_mul);
>  		data->cyc2ns_shift  = this_cpu_read(cyc2ns.data[idx].cyc2ns_shift);
>  
> -	} while (unlikely(seq != this_cpu_read(cyc2ns.seq.sequence)));
> +	} while (read_seqcount_latch_retry(seqcount, seq));
>  }

So I worried about this change, it obviously generates worse code. But I
was not expecting this:

Before:

196: 0000000000000110   189 FUNC    GLOBAL DEFAULT    1 native_sched_clock

After:

195: 0000000000000110   399 FUNC    GLOBAL DEFAULT    1 native_sched_clock

That's _210_ bytes extra!!

If you look at the disassembly of the thing after the change, it's a
complete trainwreck.


* Re: [PATCH v1 6/8] x86/tsc: Use seqcount_latch_t
  2020-09-04  7:41     ` peterz
@ 2020-09-04  8:03       ` peterz
  2020-09-07 16:29         ` Ahmed S. Darwish
  0 siblings, 1 reply; 258+ messages in thread
From: peterz @ 2020-09-04  8:03 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	LKML, Borislav Petkov, x86, H. Peter Anvin

On Fri, Sep 04, 2020 at 09:41:42AM +0200, peterz@infradead.org wrote:
> On Thu, Aug 27, 2020 at 01:40:42PM +0200, Ahmed S. Darwish wrote:
> 
> >  __always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
> >  {
> > +	seqcount_latch_t *seqcount;
> >  	int seq, idx;
> >  
> >  	preempt_disable_notrace();
> >  
> > +	seqcount = &this_cpu_ptr(&cyc2ns)->seq;
> >  	do {
> > -		seq = this_cpu_read(cyc2ns.seq.sequence);
> > +		seq = raw_read_seqcount_latch(seqcount);
> >  		idx = seq & 1;
> >  
> >  		data->cyc2ns_offset = this_cpu_read(cyc2ns.data[idx].cyc2ns_offset);
> >  		data->cyc2ns_mul    = this_cpu_read(cyc2ns.data[idx].cyc2ns_mul);
> >  		data->cyc2ns_shift  = this_cpu_read(cyc2ns.data[idx].cyc2ns_shift);
> >  
> > -	} while (unlikely(seq != this_cpu_read(cyc2ns.seq.sequence)));
> > +	} while (read_seqcount_latch_retry(seqcount, seq));
> >  }
> 
> So I worried about this change, it obviously generates worse code. But I
> was not expecting this:
> 
> Before:
> 
> 196: 0000000000000110   189 FUNC    GLOBAL DEFAULT    1 native_sched_clock
> 
> After:
> 
> 195: 0000000000000110   399 FUNC    GLOBAL DEFAULT    1 native_sched_clock
> 
> That's _210_ bytes extra!!
> 
> If you look at the disassembly of the thing after the change, it's a
> complete trainwreck.

The below delta fixes it again.

---
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -68,21 +68,19 @@ early_param("tsc_early_khz", tsc_early_k
 
 __always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
 {
-	seqcount_latch_t *seqcount;
 	int seq, idx;
 
 	preempt_disable_notrace();
 
-	seqcount = &this_cpu_ptr(&cyc2ns)->seq;
 	do {
-		seq = raw_read_seqcount_latch(seqcount);
+		seq = this_cpu_read(cyc2ns.seq.seqcount.sequence);
 		idx = seq & 1;
 
 		data->cyc2ns_offset = this_cpu_read(cyc2ns.data[idx].cyc2ns_offset);
 		data->cyc2ns_mul    = this_cpu_read(cyc2ns.data[idx].cyc2ns_mul);
 		data->cyc2ns_shift  = this_cpu_read(cyc2ns.data[idx].cyc2ns_shift);
 
-	} while (read_seqcount_latch_retry(seqcount, seq));
+	} while (unlikely(seq != this_cpu_read(cyc2ns.seq.seqcount.sequence)));
 }
 
 __always_inline void cyc2ns_read_end(void)


* Re: [PATCH v1 6/8] x86/tsc: Use seqcount_latch_t
  2020-09-04  8:03       ` peterz
@ 2020-09-07 16:29         ` Ahmed S. Darwish
  2020-09-07 17:30           ` peterz
  0 siblings, 1 reply; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-09-07 16:29 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	LKML, Borislav Petkov, x86, H. Peter Anvin

Hi,

On Fri, Sep 04, 2020 at 10:03:12AM +0200, peterz@infradead.org wrote:
> On Fri, Sep 04, 2020 at 09:41:42AM +0200, peterz@infradead.org wrote:
> > On Thu, Aug 27, 2020 at 01:40:42PM +0200, Ahmed S. Darwish wrote:
> >
> > >  __always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
> > >  {
> > > +	seqcount_latch_t *seqcount;
> > >  	int seq, idx;
> > >
> > >  	preempt_disable_notrace();
> > >
> > > +	seqcount = &this_cpu_ptr(&cyc2ns)->seq;
> > >  	do {
> > > -		seq = this_cpu_read(cyc2ns.seq.sequence);
> > > +		seq = raw_read_seqcount_latch(seqcount);
> > >  		idx = seq & 1;
> > >
> > >  		data->cyc2ns_offset = this_cpu_read(cyc2ns.data[idx].cyc2ns_offset);
> > >  		data->cyc2ns_mul    = this_cpu_read(cyc2ns.data[idx].cyc2ns_mul);
> > >  		data->cyc2ns_shift  = this_cpu_read(cyc2ns.data[idx].cyc2ns_shift);
> > >
> > > -	} while (unlikely(seq != this_cpu_read(cyc2ns.seq.sequence)));
> > > +	} while (read_seqcount_latch_retry(seqcount, seq));
> > >  }
> >
> > So I worried about this change, it obviously generates worse code. But I
> > was not expecting this:
> >
> > Before:
> >
> > 196: 0000000000000110   189 FUNC    GLOBAL DEFAULT    1 native_sched_clock
> >
> > After:
> >
> > 195: 0000000000000110   399 FUNC    GLOBAL DEFAULT    1 native_sched_clock
> >
> > That's _210_ bytes extra!!
> >
> > If you look at the disassembly of the thing after the change, it's a
> > complete trainwreck.
>
> The below delta fixes it again.
>
> ---
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -68,21 +68,19 @@ early_param("tsc_early_khz", tsc_early_k
>
>  __always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
>  {
> -	seqcount_latch_t *seqcount;
>  	int seq, idx;
>
>  	preempt_disable_notrace();
>
> -	seqcount = &this_cpu_ptr(&cyc2ns)->seq;
>  	do {
> -		seq = raw_read_seqcount_latch(seqcount);
> +		seq = this_cpu_read(cyc2ns.seq.seqcount.sequence);
>  		idx = seq & 1;
>
>  		data->cyc2ns_offset = this_cpu_read(cyc2ns.data[idx].cyc2ns_offset);
>  		data->cyc2ns_mul    = this_cpu_read(cyc2ns.data[idx].cyc2ns_mul);
>  		data->cyc2ns_shift  = this_cpu_read(cyc2ns.data[idx].cyc2ns_shift);
>
> -	} while (read_seqcount_latch_retry(seqcount, seq));
> +	} while (unlikely(seq != this_cpu_read(cyc2ns.seq.seqcount.sequence)));
>  }
>
>  __always_inline void cyc2ns_read_end(void)

I've been unsuccessful in reproducing this huge, 200+ byte, difference.
Can I please get the defconfig and GCC version?

Here are the two competing implementations:

noinline void cyc2ns_read_begin_v1(struct cyc2ns_data *data)
{
	seqcount_latch_t *seqcount;
	int seq, idx;

	preempt_disable_notrace();

	seqcount = &this_cpu_ptr(&cyc2ns)->seq;
	do {
		seq = raw_read_seqcount_latch(seqcount);
		idx = seq & 1;

		data->cyc2ns_offset = this_cpu_read(cyc2ns.data[idx].cyc2ns_offset);
		data->cyc2ns_mul    = this_cpu_read(cyc2ns.data[idx].cyc2ns_mul);
		data->cyc2ns_shift  = this_cpu_read(cyc2ns.data[idx].cyc2ns_shift);

	} while (read_seqcount_latch_retry(seqcount, seq));
}

noinline void cyc2ns_read_begin_v2(struct cyc2ns_data *data)
{
	int seq, idx;

	preempt_disable_notrace();

	do {
		seq = this_cpu_read(cyc2ns.seq.seqcount.sequence);
		idx = seq & 1;

		data->cyc2ns_offset = this_cpu_read(cyc2ns.data[idx].cyc2ns_offset);
		data->cyc2ns_mul    = this_cpu_read(cyc2ns.data[idx].cyc2ns_mul);
		data->cyc2ns_shift  = this_cpu_read(cyc2ns.data[idx].cyc2ns_shift);

	} while (unlikely(seq != this_cpu_read(cyc2ns.seq.seqcount.sequence)));
}

And here is the resulting gcc-7/8/4.9 binary output (DEBUG_PREEMPT=y;
setting DEBUG_KERNEL/DEBUG_PREEMPT=n also doesn't show any big _v1
vs. _v2 difference, and similarly when using gcc-10):

gcc-7
=====

ffffffff81028890 <cyc2ns_read_begin_v1>:
ffffffff81028890:	65 ff 05 a9 e6 fe 7e 	incl   %gs:0x7efee6a9(%rip)        # 16f40 <__preempt_count>
			ffffffff81028893: R_X86_64_PC32	__preempt_count-0x4
ffffffff81028897:	48 c7 c6 40 a5 1f 00 	mov    $0x1fa540,%rsi
			ffffffff8102889a: R_X86_64_32S	.data..percpu+0x1fa540
ffffffff8102889e:	48 89 f2             	mov    %rsi,%rdx
ffffffff810288a1:	65 48 03 15 bf 8a fe 	add    %gs:0x7efe8abf(%rip),%rdx        # 11368 <this_cpu_off>
ffffffff810288a8:	7e
			ffffffff810288a5: R_X86_64_PC32	this_cpu_off-0x4
ffffffff810288a9:	8b 4a 20             	mov    0x20(%rdx),%ecx
ffffffff810288ac:	89 c8                	mov    %ecx,%eax
ffffffff810288ae:	83 e0 01             	and    $0x1,%eax
ffffffff810288b1:	48 c1 e0 04          	shl    $0x4,%rax
ffffffff810288b5:	48 01 f0             	add    %rsi,%rax
ffffffff810288b8:	65 4c 8b 40 08       	mov    %gs:0x8(%rax),%r8
ffffffff810288bd:	4c 89 47 08          	mov    %r8,0x8(%rdi)
ffffffff810288c1:	65 44 8b 00          	mov    %gs:(%rax),%r8d
ffffffff810288c5:	44 89 07             	mov    %r8d,(%rdi)
ffffffff810288c8:	65 8b 40 04          	mov    %gs:0x4(%rax),%eax
ffffffff810288cc:	89 47 04             	mov    %eax,0x4(%rdi)
ffffffff810288cf:	8b 42 20             	mov    0x20(%rdx),%eax
ffffffff810288d2:	39 c8                	cmp    %ecx,%eax
ffffffff810288d4:	75 d3                	jne    ffffffff810288a9 <cyc2ns_read_begin_v1+0x19>
ffffffff810288d6:	f3 c3                	repz retq
ffffffff810288d8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
ffffffff810288df:	00

ffffffff810288e0 <cyc2ns_read_begin_v2>:
ffffffff810288e0:	65 ff 05 59 e6 fe 7e 	incl   %gs:0x7efee659(%rip)        # 16f40 <__preempt_count>
			ffffffff810288e3: R_X86_64_PC32	__preempt_count-0x4
ffffffff810288e7:	48 c7 c1 40 a5 1f 00 	mov    $0x1fa540,%rcx
			ffffffff810288ea: R_X86_64_32S	.data..percpu+0x1fa540
ffffffff810288ee:	48 89 c8             	mov    %rcx,%rax
ffffffff810288f1:	65 48 03 05 6f 8a fe 	add    %gs:0x7efe8a6f(%rip),%rax        # 11368 <this_cpu_off>
ffffffff810288f8:	7e
			ffffffff810288f5: R_X86_64_PC32	this_cpu_off-0x4
ffffffff810288f9:	65 8b 15 60 1c 1d 7f 	mov    %gs:0x7f1d1c60(%rip),%edx        # 1fa560 <cyc2ns+0x20>
			ffffffff810288fc: R_X86_64_PC32	.data..percpu+0x1fa55c
ffffffff81028900:	89 d0                	mov    %edx,%eax
ffffffff81028902:	83 e0 01             	and    $0x1,%eax
ffffffff81028905:	48 c1 e0 04          	shl    $0x4,%rax
ffffffff81028909:	48 01 c8             	add    %rcx,%rax
ffffffff8102890c:	65 48 8b 70 08       	mov    %gs:0x8(%rax),%rsi
ffffffff81028911:	48 89 77 08          	mov    %rsi,0x8(%rdi)
ffffffff81028915:	65 8b 30             	mov    %gs:(%rax),%esi
ffffffff81028918:	89 37                	mov    %esi,(%rdi)
ffffffff8102891a:	65 8b 40 04          	mov    %gs:0x4(%rax),%eax
ffffffff8102891e:	89 47 04             	mov    %eax,0x4(%rdi)
ffffffff81028921:	65 8b 05 38 1c 1d 7f 	mov    %gs:0x7f1d1c38(%rip),%eax        # 1fa560 <cyc2ns+0x20>
			ffffffff81028924: R_X86_64_PC32	.data..percpu+0x1fa55c
ffffffff81028928:	39 c2                	cmp    %eax,%edx
ffffffff8102892a:	75 cd                	jne    ffffffff810288f9 <cyc2ns_read_begin_v2+0x19>
ffffffff8102892c:	f3 c3                	repz retq
ffffffff8102892e:	66 90                	xchg   %ax,%ax

gcc-8
=====

ffffffff810281a0 <cyc2ns_read_begin_v1>:
ffffffff810281a0:	65 ff 05 99 ed fe 7e 	incl   %gs:0x7efeed99(%rip)        # 16f40 <__preempt_count>
			ffffffff810281a3: R_X86_64_PC32	__preempt_count-0x4
ffffffff810281a7:	49 c7 c0 40 a5 1f 00 	mov    $0x1fa540,%r8
			ffffffff810281aa: R_X86_64_32S	.data..percpu+0x1fa540
ffffffff810281ae:	4c 89 c1             	mov    %r8,%rcx
ffffffff810281b1:	65 48 03 0d af 91 fe 	add    %gs:0x7efe91af(%rip),%rcx        # 11368 <this_cpu_off>
ffffffff810281b8:	7e
			ffffffff810281b5: R_X86_64_PC32	this_cpu_off-0x4
ffffffff810281b9:	8b 51 20             	mov    0x20(%rcx),%edx
ffffffff810281bc:	89 d0                	mov    %edx,%eax
ffffffff810281be:	83 e0 01             	and    $0x1,%eax
ffffffff810281c1:	48 c1 e0 04          	shl    $0x4,%rax
ffffffff810281c5:	4c 01 c0             	add    %r8,%rax
ffffffff810281c8:	65 48 8b 70 08       	mov    %gs:0x8(%rax),%rsi
ffffffff810281cd:	48 89 77 08          	mov    %rsi,0x8(%rdi)
ffffffff810281d1:	65 8b 30             	mov    %gs:(%rax),%esi
ffffffff810281d4:	89 37                	mov    %esi,(%rdi)
ffffffff810281d6:	65 8b 40 04          	mov    %gs:0x4(%rax),%eax
ffffffff810281da:	89 47 04             	mov    %eax,0x4(%rdi)
ffffffff810281dd:	8b 41 20             	mov    0x20(%rcx),%eax
ffffffff810281e0:	39 d0                	cmp    %edx,%eax
ffffffff810281e2:	75 d5                	jne    ffffffff810281b9 <cyc2ns_read_begin_v1+0x19>
ffffffff810281e4:	c3                   	retq
ffffffff810281e5:	66 66 2e 0f 1f 84 00 	data16 nopw %cs:0x0(%rax,%rax,1)
ffffffff810281ec:	00 00 00 00

ffffffff810281f0 <cyc2ns_read_begin_v2>:
ffffffff810281f0:	65 ff 05 49 ed fe 7e 	incl   %gs:0x7efeed49(%rip)        # 16f40 <__preempt_count>
			ffffffff810281f3: R_X86_64_PC32	__preempt_count-0x4
ffffffff810281f7:	48 c7 c1 40 a5 1f 00 	mov    $0x1fa540,%rcx
			ffffffff810281fa: R_X86_64_32S	.data..percpu+0x1fa540
ffffffff810281fe:	48 89 c8             	mov    %rcx,%rax
ffffffff81028201:	65 48 03 05 5f 91 fe 	add    %gs:0x7efe915f(%rip),%rax        # 11368 <this_cpu_off>
ffffffff81028208:	7e
			ffffffff81028205: R_X86_64_PC32	this_cpu_off-0x4
ffffffff81028209:	65 8b 15 50 23 1d 7f 	mov    %gs:0x7f1d2350(%rip),%edx        # 1fa560 <cyc2ns+0x20>
			ffffffff8102820c: R_X86_64_PC32	.data..percpu+0x1fa55c
ffffffff81028210:	89 d0                	mov    %edx,%eax
ffffffff81028212:	83 e0 01             	and    $0x1,%eax
ffffffff81028215:	48 c1 e0 04          	shl    $0x4,%rax
ffffffff81028219:	48 01 c8             	add    %rcx,%rax
ffffffff8102821c:	65 48 8b 70 08       	mov    %gs:0x8(%rax),%rsi
ffffffff81028221:	48 89 77 08          	mov    %rsi,0x8(%rdi)
ffffffff81028225:	65 8b 30             	mov    %gs:(%rax),%esi
ffffffff81028228:	89 37                	mov    %esi,(%rdi)
ffffffff8102822a:	65 8b 40 04          	mov    %gs:0x4(%rax),%eax
ffffffff8102822e:	89 47 04             	mov    %eax,0x4(%rdi)
ffffffff81028231:	65 8b 05 28 23 1d 7f 	mov    %gs:0x7f1d2328(%rip),%eax        # 1fa560 <cyc2ns+0x20>
			ffffffff81028234: R_X86_64_PC32	.data..percpu+0x1fa55c
ffffffff81028238:	39 c2                	cmp    %eax,%edx
ffffffff8102823a:	75 cd                	jne    ffffffff81028209 <cyc2ns_read_begin_v2+0x19>
ffffffff8102823c:	c3                   	retq
ffffffff8102823d:	0f 1f 00             	nopl   (%rax)

gcc-4.9
=======

ffffffff810ab900 <cyc2ns_read_begin_v1>:
ffffffff810ab900:	55                   	push   %rbp
ffffffff810ab901:	48 89 fd             	mov    %rdi,%rbp
ffffffff810ab904:	53                   	push   %rbx
ffffffff810ab905:	65 ff 05 f4 b6 f6 7e 	incl   %gs:0x7ef6b6f4(%rip)        # 17000 <__preempt_count>
			ffffffff810ab908: R_X86_64_PC32	__preempt_count-0x4
ffffffff810ab90c:	e8 df da c9 00       	callq  ffffffff81d493f0 <debug_smp_processor_id>
			ffffffff810ab90d: R_X86_64_PC32	debug_smp_processor_id-0x4
ffffffff810ab911:	89 c0                	mov    %eax,%eax
ffffffff810ab913:	48 c7 c3 00 ab 02 00 	mov    $0x2ab00,%rbx
			ffffffff810ab916: R_X86_64_32S	.data..percpu+0x2ab00
ffffffff810ab91a:	48 03 1c c5 80 87 65 	add    -0x7d9a7880(,%rax,8),%rbx
ffffffff810ab921:	82
			ffffffff810ab91e: R_X86_64_32S	__per_cpu_offset
ffffffff810ab922:	8b 53 20             	mov    0x20(%rbx),%edx
ffffffff810ab925:	89 d0                	mov    %edx,%eax
ffffffff810ab927:	83 e0 01             	and    $0x1,%eax
ffffffff810ab92a:	48 c1 e0 04          	shl    $0x4,%rax
ffffffff810ab92e:	65 48 8b 88 08 ab 02 	mov    %gs:0x2ab08(%rax),%rcx
ffffffff810ab935:	00
			ffffffff810ab932: R_X86_64_32S	.data..percpu+0x2ab08
ffffffff810ab936:	48 89 4d 08          	mov    %rcx,0x8(%rbp)
ffffffff810ab93a:	65 8b 88 00 ab 02 00 	mov    %gs:0x2ab00(%rax),%ecx
			ffffffff810ab93d: R_X86_64_32S	.data..percpu+0x2ab00
ffffffff810ab941:	89 4d 00             	mov    %ecx,0x0(%rbp)
ffffffff810ab944:	65 8b 80 04 ab 02 00 	mov    %gs:0x2ab04(%rax),%eax
			ffffffff810ab947: R_X86_64_32S	.data..percpu+0x2ab04
ffffffff810ab94b:	89 45 04             	mov    %eax,0x4(%rbp)
ffffffff810ab94e:	8b 43 20             	mov    0x20(%rbx),%eax
ffffffff810ab951:	39 c2                	cmp    %eax,%edx
ffffffff810ab953:	75 cd                	jne    ffffffff810ab922 <cyc2ns_read_begin_v1+0x22>
ffffffff810ab955:	5b                   	pop    %rbx
ffffffff810ab956:	5d                   	pop    %rbp
ffffffff810ab957:	c3                   	retq
ffffffff810ab958:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
ffffffff810ab95f:	00

ffffffff810ab960 <cyc2ns_read_begin_v2>:
ffffffff810ab960:	65 ff 05 99 b6 f6 7e 	incl   %gs:0x7ef6b699(%rip)        # 17000 <__preempt_count>
			ffffffff810ab963: R_X86_64_PC32	__preempt_count-0x4
ffffffff810ab967:	65 8b 15 b2 f1 f7 7e 	mov    %gs:0x7ef7f1b2(%rip),%edx        # 2ab20 <cyc2ns+0x20>
			ffffffff810ab96a: R_X86_64_PC32	.data..percpu+0x2ab1c
ffffffff810ab96e:	89 d0                	mov    %edx,%eax
ffffffff810ab970:	83 e0 01             	and    $0x1,%eax
ffffffff810ab973:	48 c1 e0 04          	shl    $0x4,%rax
ffffffff810ab977:	65 48 8b 88 08 ab 02 	mov    %gs:0x2ab08(%rax),%rcx
ffffffff810ab97e:	00
			ffffffff810ab97b: R_X86_64_32S	.data..percpu+0x2ab08
ffffffff810ab97f:	48 89 4f 08          	mov    %rcx,0x8(%rdi)
ffffffff810ab983:	65 8b 88 00 ab 02 00 	mov    %gs:0x2ab00(%rax),%ecx
			ffffffff810ab986: R_X86_64_32S	.data..percpu+0x2ab00
ffffffff810ab98a:	89 0f                	mov    %ecx,(%rdi)
ffffffff810ab98c:	65 8b 80 04 ab 02 00 	mov    %gs:0x2ab04(%rax),%eax
			ffffffff810ab98f: R_X86_64_32S	.data..percpu+0x2ab04
ffffffff810ab993:	89 47 04             	mov    %eax,0x4(%rdi)
ffffffff810ab996:	65 8b 05 83 f1 f7 7e 	mov    %gs:0x7ef7f183(%rip),%eax        # 2ab20 <cyc2ns+0x20>
			ffffffff810ab999: R_X86_64_PC32	.data..percpu+0x2ab1c
ffffffff810ab99d:	39 c2                	cmp    %eax,%edx
ffffffff810ab99f:	75 c6                	jne    ffffffff810ab967 <cyc2ns_read_begin_v2+0x7>
ffffffff810ab9a1:	f3 c3                	repz retq
ffffffff810ab9a3:	66 66 66 66 2e 0f 1f 	data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
ffffffff810ab9aa:	84 00 00 00 00 00

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: [PATCH v1 6/8] x86/tsc: Use seqcount_latch_t
  2020-09-07 16:29         ` Ahmed S. Darwish
@ 2020-09-07 17:30           ` peterz
  2020-09-08  6:23             ` Ahmed S. Darwish
  0 siblings, 1 reply; 258+ messages in thread
From: peterz @ 2020-09-07 17:30 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	LKML, Borislav Petkov, x86, H. Peter Anvin

On Mon, Sep 07, 2020 at 06:29:13PM +0200, Ahmed S. Darwish wrote:

> I've been unsuccessful in reproducing this huge, 200+ byte difference.
> Can I please get the defconfig and GCC version?

I think I lost the config, and it was either gcc-9.3 or gcc-10; I can't
remember.

I just tried with:

  make defconfig
  ./scripts/config --enable PREEMPT --enable DEBUG_ATOMIC_SLEEP
  make oldconfig

And that reproduces things a little, but nowhere near as horrible as I
reported. Clearly I had something mad enabled by accident.

> Here are the two competing implementations:
> 
> noinline void cyc2ns_read_begin_v1(struct cyc2ns_data *data)
> {
> 	seqcount_latch_t *seqcount;
> 	int seq, idx;
> 
> 	preempt_disable_notrace();
> 
> 	seqcount = &this_cpu_ptr(&cyc2ns)->seq;
> 	do {
> 		seq = raw_read_seqcount_latch(seqcount);
> 		idx = seq & 1;
> 
> 		data->cyc2ns_offset = this_cpu_read(cyc2ns.data[idx].cyc2ns_offset);
> 		data->cyc2ns_mul    = this_cpu_read(cyc2ns.data[idx].cyc2ns_mul);
> 		data->cyc2ns_shift  = this_cpu_read(cyc2ns.data[idx].cyc2ns_shift);
> 
> 	} while (read_seqcount_latch_retry(seqcount, seq));
> }
> 
> noinline void cyc2ns_read_begin_v2(struct cyc2ns_data *data)
> {
> 	int seq, idx;
> 
> 	preempt_disable_notrace();
> 
> 	do {
> 		seq = this_cpu_read(cyc2ns.seq.seqcount.sequence);
> 		idx = seq & 1;
> 
> 		data->cyc2ns_offset = this_cpu_read(cyc2ns.data[idx].cyc2ns_offset);
> 		data->cyc2ns_mul    = this_cpu_read(cyc2ns.data[idx].cyc2ns_mul);
> 		data->cyc2ns_shift  = this_cpu_read(cyc2ns.data[idx].cyc2ns_shift);
> 
> 	} while (unlikely(seq != this_cpu_read(cyc2ns.seq.seqcount.sequence)));
> }

Don't look at this function in isolation, look at native_sched_clock()
where it's used as a whole.

What happened (afaict) is that the change caused it to use more
registers and ended up spilling crap on the stack.

GCC-9.3 gives me:

(this_cpu variant)


0000 0000000000000c00 <native_sched_clock>:
0000  c00:	e9 65 00 00 00       	jmpq   c6a <native_sched_clock+0x6a>
0005  c05:	0f 31                	rdtsc  
0007  c07:	48 c1 e2 20          	shl    $0x20,%rdx
000b  c0b:	48 09 c2             	or     %rax,%rdx
000e  c0e:	65 ff 05 00 00 00 00 	incl   %gs:0x0(%rip)        # c15 <native_sched_clock+0x15>
0011 			c11: R_X86_64_PC32	__preempt_count-0x4
0015  c15:	65 44 8b 05 00 00 00 	mov    %gs:0x0(%rip),%r8d        # c1d <native_sched_clock+0x1d>
001c  c1c:	00 
0019 			c19: R_X86_64_PC32	.data..percpu..shared_aligned+0x1c
001d  c1d:	44 89 c0             	mov    %r8d,%eax
0020  c20:	83 e0 01             	and    $0x1,%eax
0023  c23:	48 c1 e0 04          	shl    $0x4,%rax
0027  c27:	48 8d 88 00 00 00 00 	lea    0x0(%rax),%rcx
002a 			c2a: R_X86_64_32S	.data..percpu..shared_aligned
002e  c2e:	65 48 8b 79 08       	mov    %gs:0x8(%rcx),%rdi
0033  c33:	65 8b b0 00 00 00 00 	mov    %gs:0x0(%rax),%esi
0036 			c36: R_X86_64_32S	.data..percpu..shared_aligned
003a  c3a:	65 8b 49 04          	mov    %gs:0x4(%rcx),%ecx
003e  c3e:	65 8b 05 00 00 00 00 	mov    %gs:0x0(%rip),%eax        # c45 <native_sched_clock+0x45>
0041 			c41: R_X86_64_PC32	.data..percpu..shared_aligned+0x1c
0045  c45:	41 39 c0             	cmp    %eax,%r8d
0048  c48:	75 cb                	jne    c15 <native_sched_clock+0x15>
004a  c4a:	89 f0                	mov    %esi,%eax
004c  c4c:	48 f7 e2             	mul    %rdx
004f  c4f:	48 0f ad d0          	shrd   %cl,%rdx,%rax
0053  c53:	48 d3 ea             	shr    %cl,%rdx
0056  c56:	f6 c1 40             	test   $0x40,%cl
0059  c59:	48 0f 45 c2          	cmovne %rdx,%rax
005d  c5d:	48 01 f8             	add    %rdi,%rax
0060  c60:	65 ff 0d 00 00 00 00 	decl   %gs:0x0(%rip)        # c67 <native_sched_clock+0x67>
0063 			c63: R_X86_64_PC32	__preempt_count-0x4
0067  c67:	74 1a                	je     c83 <native_sched_clock+0x83>
0069  c69:	c3                   	retq   
006a  c6a:	48 69 05 00 00 00 00 	imul   $0xf4240,0x0(%rip),%rax        # c75 <native_sched_clock+0x75>
0071  c71:	40 42 0f 00 
006d 			c6d: R_X86_64_PC32	jiffies_64-0x8
0075  c75:	48 ba 00 b8 64 d9 05 	movabs $0xfff0be05d964b800,%rdx
007c  c7c:	be f0 ff 
007f  c7f:	48 01 d0             	add    %rdx,%rax
0082  c82:	c3                   	retq   
0083  c83:	55                   	push   %rbp
0084  c84:	48 89 e5             	mov    %rsp,%rbp
0087  c87:	e8 00 00 00 00       	callq  c8c <native_sched_clock+0x8c>
0088 			c88: R_X86_64_PLT32	preempt_schedule_notrace_thunk-0x4
008c  c8c:	5d                   	pop    %rbp
008d  c8d:	c3                   	retq   



(seqcount_latch variant)


0000 0000000000000c20 <native_sched_clock>:
0000  c20:	e9 89 00 00 00       	jmpq   cae <native_sched_clock+0x8e>
0005  c25:	55                   	push   %rbp
0006  c26:	48 89 e5             	mov    %rsp,%rbp
0009  c29:	41 54                	push   %r12
000b  c2b:	53                   	push   %rbx
000c  c2c:	48 83 e4 f0          	and    $0xfffffffffffffff0,%rsp
0010  c30:	0f 31                	rdtsc  
0012  c32:	48 c1 e2 20          	shl    $0x20,%rdx
0016  c36:	48 89 d3             	mov    %rdx,%rbx
0019  c39:	48 09 c3             	or     %rax,%rbx
001c  c3c:	65 ff 05 00 00 00 00 	incl   %gs:0x0(%rip)        # c43 <native_sched_clock+0x23>
001f 			c3f: R_X86_64_PC32	__preempt_count-0x4
0023  c43:	e8 00 00 00 00       	callq  c48 <native_sched_clock+0x28>
0024 			c44: R_X86_64_PLT32	debug_smp_processor_id-0x4
0028  c48:	49 c7 c4 00 00 00 00 	mov    $0x0,%r12
002b 			c4b: R_X86_64_32S	.data..percpu..shared_aligned+0x20
002f  c4f:	89 c0                	mov    %eax,%eax
0031  c51:	4c 03 24 c5 00 00 00 	add    0x0(,%rax,8),%r12
0038  c58:	00 
0035 			c55: R_X86_64_32S	__per_cpu_offset
0039  c59:	4c 89 e0             	mov    %r12,%rax
003c  c5c:	8b 30                	mov    (%rax),%esi
003e  c5e:	89 f1                	mov    %esi,%ecx
0040  c60:	83 e1 01             	and    $0x1,%ecx
0043  c63:	48 c1 e1 04          	shl    $0x4,%rcx
0047  c67:	48 8d b9 00 00 00 00 	lea    0x0(%rcx),%rdi
004a 			c6a: R_X86_64_32S	.data..percpu..shared_aligned
004e  c6e:	65 4c 8b 47 08       	mov    %gs:0x8(%rdi),%r8
0053  c73:	65 44 8b 89 00 00 00 	mov    %gs:0x0(%rcx),%r9d
005a  c7a:	00 
0057 			c77: R_X86_64_32S	.data..percpu..shared_aligned
005b  c7b:	65 8b 4f 04          	mov    %gs:0x4(%rdi),%ecx
005f  c7f:	8b 10                	mov    (%rax),%edx
0061  c81:	39 d6                	cmp    %edx,%esi
0063  c83:	75 d7                	jne    c5c <native_sched_clock+0x3c>
0065  c85:	44 89 c8             	mov    %r9d,%eax
0068  c88:	48 f7 e3             	mul    %rbx
006b  c8b:	48 0f ad d0          	shrd   %cl,%rdx,%rax
006f  c8f:	48 d3 ea             	shr    %cl,%rdx
0072  c92:	f6 c1 40             	test   $0x40,%cl
0075  c95:	48 0f 45 c2          	cmovne %rdx,%rax
0079  c99:	4c 01 c0             	add    %r8,%rax
007c  c9c:	65 ff 0d 00 00 00 00 	decl   %gs:0x0(%rip)        # ca3 <native_sched_clock+0x83>
007f 			c9f: R_X86_64_PC32	__preempt_count-0x4
0083  ca3:	74 22                	je     cc7 <native_sched_clock+0xa7>
0085  ca5:	48 8d 65 f0          	lea    -0x10(%rbp),%rsp
0089  ca9:	5b                   	pop    %rbx
008a  caa:	41 5c                	pop    %r12
008c  cac:	5d                   	pop    %rbp
008d  cad:	c3                   	retq   
008e  cae:	48 69 05 00 00 00 00 	imul   $0xf4240,0x0(%rip),%rax        # cb9 <native_sched_clock+0x99>
0095  cb5:	40 42 0f 00 
0091 			cb1: R_X86_64_PC32	jiffies_64-0x8
0099  cb9:	49 b8 00 b8 64 d9 05 	movabs $0xfff0be05d964b800,%r8
00a0  cc0:	be f0 ff 
00a3  cc3:	4c 01 c0             	add    %r8,%rax
00a6  cc6:	c3                   	retq   
00a7  cc7:	e8 00 00 00 00       	callq  ccc <native_sched_clock+0xac>
00a8 			cc8: R_X86_64_PLT32	preempt_schedule_notrace_thunk-0x4
00ac  ccc:	eb d7                	jmp    ca5 <native_sched_clock+0x85>


And you see it starting to spill stuff.

When we disable DEBUG_ATOMIC_SLEEP it becomes much saner again:


0000 0000000000000b40 <native_sched_clock>:
0000  b40:	e9 6b 00 00 00       	jmpq   bb0 <native_sched_clock+0x70>
0005  b45:	0f 31                	rdtsc  
0007  b47:	48 c1 e2 20          	shl    $0x20,%rdx
000b  b4b:	48 09 c2             	or     %rax,%rdx
000e  b4e:	65 ff 05 00 00 00 00 	incl   %gs:0x0(%rip)        # b55 <native_sched_clock+0x15>
0011 			b51: R_X86_64_PC32	__preempt_count-0x4
0015  b55:	49 c7 c0 00 00 00 00 	mov    $0x0,%r8
0018 			b58: R_X86_64_32S	.data..percpu..shared_aligned+0x20
001c  b5c:	65 4c 03 05 00 00 00 	add    %gs:0x0(%rip),%r8        # b64 <native_sched_clock+0x24>
0023  b63:	00 
0020 			b60: R_X86_64_PC32	this_cpu_off-0x4
0024  b64:	41 8b 30             	mov    (%r8),%esi
0027  b67:	89 f1                	mov    %esi,%ecx
0029  b69:	83 e1 01             	and    $0x1,%ecx
002c  b6c:	48 c1 e1 04          	shl    $0x4,%rcx
0030  b70:	48 8d b9 00 00 00 00 	lea    0x0(%rcx),%rdi
0033 			b73: R_X86_64_32S	.data..percpu..shared_aligned
0037  b77:	65 4c 8b 57 08       	mov    %gs:0x8(%rdi),%r10
003c  b7c:	65 44 8b 89 00 00 00 	mov    %gs:0x0(%rcx),%r9d
0043  b83:	00 
0040 			b80: R_X86_64_32S	.data..percpu..shared_aligned
0044  b84:	65 8b 4f 04          	mov    %gs:0x4(%rdi),%ecx
0048  b88:	41 8b 38             	mov    (%r8),%edi
004b  b8b:	39 fe                	cmp    %edi,%esi
004d  b8d:	75 d5                	jne    b64 <native_sched_clock+0x24>
004f  b8f:	44 89 c8             	mov    %r9d,%eax
0052  b92:	48 f7 e2             	mul    %rdx
0055  b95:	48 0f ad d0          	shrd   %cl,%rdx,%rax
0059  b99:	48 d3 ea             	shr    %cl,%rdx
005c  b9c:	f6 c1 40             	test   $0x40,%cl
005f  b9f:	48 0f 45 c2          	cmovne %rdx,%rax
0063  ba3:	4c 01 d0             	add    %r10,%rax
0066  ba6:	65 ff 0d 00 00 00 00 	decl   %gs:0x0(%rip)        # bad <native_sched_clock+0x6d>
0069 			ba9: R_X86_64_PC32	__preempt_count-0x4
006d  bad:	74 1a                	je     bc9 <native_sched_clock+0x89>
006f  baf:	c3                   	retq   
0070  bb0:	48 69 05 00 00 00 00 	imul   $0xf4240,0x0(%rip),%rax        # bbb <native_sched_clock+0x7b>
0077  bb7:	40 42 0f 00 
0073 			bb3: R_X86_64_PC32	jiffies_64-0x8
007b  bbb:	49 ba 00 b8 64 d9 05 	movabs $0xfff0be05d964b800,%r10
0082  bc2:	be f0 ff 
0085  bc5:	4c 01 d0             	add    %r10,%rax
0088  bc8:	c3                   	retq   
0089  bc9:	55                   	push   %rbp
008a  bca:	48 89 e5             	mov    %rsp,%rbp
008d  bcd:	e8 00 00 00 00       	callq  bd2 <native_sched_clock+0x92>
008e 			bce: R_X86_64_PLT32	preempt_schedule_notrace_thunk-0x4
0092  bd2:	5d                   	pop    %rbp
0093  bd3:	c3                   	retq   

But that's still slightly larger.
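
FWIW, the debug_smp_processor_id() call visible in the this_cpu_ptr()
based listings appears when CONFIG_DEBUG_PREEMPT is set: the per-cpu
offset lookup then goes out of line. Roughly -- a simplified sketch of
the macros in include/asm-generic/percpu.h, not the verbatim kernel
code:

	#ifdef CONFIG_DEBUG_PREEMPT
	/* out-of-line call, clobbers caller-saved registers */
	#define my_cpu_offset	per_cpu_offset(smp_processor_id())
	#else
	/* %gs-based, fully inline */
	#define my_cpu_offset	__my_cpu_offset
	#endif

So the this_cpu_ptr() variant has to materialize an absolute per-CPU
address and keep it live in a register across the whole retry loop,
while the this_cpu_read() variant's accesses are single %gs-relative
loads that never need a pointer register.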



Anyway, I frobbed the patch to use the this_cpu variant, and I've queued
the lot.


* Re: [PATCH v1 6/8] x86/tsc: Use seqcount_latch_t
  2020-09-07 17:30           ` peterz
@ 2020-09-08  6:23             ` Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: Ahmed S. Darwish @ 2020-09-08  6:23 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, Sebastian A. Siewior,
	LKML, Borislav Petkov, x86, H. Peter Anvin

On Mon, Sep 07, 2020 at 07:30:47PM +0200, peterz@infradead.org wrote:
...
>
> Don't look at this function in isolation, look at native_sched_clock()
> where it's used as a whole.
>
...
> What happened (afaict) is that the change caused it to use more
> registers and ended up spilling crap on the stack.
>
...
>
> Anyway, I frobbed the patch to use the this_cpu variant, and I've queued
> the lot.

Perfect. Thanks a lot ;-)


* [tip: locking/core] seqlock: seqcount_LOCKNAME_t: Introduce PREEMPT_RT support
  2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
                   ` (29 preceding siblings ...)
  2020-08-28  1:07 ` [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support Ahmed S. Darwish
@ 2020-09-10 15:08 ` tip-bot2 for Ahmed S. Darwish
  30 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-09-10 15:08 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     8117ab508f9c476e0a10b9db7f4818f784cf3176
Gitweb:        https://git.kernel.org/tip/8117ab508f9c476e0a10b9db7f4818f784cf3176
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Fri, 04 Sep 2020 17:32:30 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 10 Sep 2020 11:19:31 +02:00

seqlock: seqcount_LOCKNAME_t: Introduce PREEMPT_RT support

Preemption must be disabled before entering a sequence counter write
side critical section.  Otherwise the read side section can preempt the
write side section and spin for the entire scheduler tick.  If that
reader belongs to a real-time scheduling class, it can spin forever and
the kernel will livelock.

Disabling preemption cannot be done for PREEMPT_RT though: it can lead
to higher latencies, and the write side sections will not be able to
acquire locks which become sleeping locks (e.g. spinlock_t).

To remain preemptible, while avoiding a possible livelock caused by the
reader preempting the writer, use a different technique: let the reader
detect if a seqcount_LOCKNAME_t writer is in progress. If that's the
case, acquire then release the associated LOCKNAME writer serialization
lock. This will allow any possibly-preempted writer to make progress
until the end of its writer serialization lock critical section.

Implement this lock-unlock technique for all seqcount_LOCKNAME_t with
an associated (PREEMPT_RT) sleeping lock.

References: 55f3560df975 ("seqlock: Extend seqcount API with associated locks")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200519214547.352050-1-a.darwish@linutronix.de
---
 include/linux/seqlock.h | 61 +++++++++++++++++++++++++++++++++-------
 1 file changed, 51 insertions(+), 10 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index f3b7827..2bc9510 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -17,6 +17,7 @@
 #include <linux/kcsan-checks.h>
 #include <linux/lockdep.h>
 #include <linux/mutex.h>
+#include <linux/ww_mutex.h>
 #include <linux/preempt.h>
 #include <linux/spinlock.h>
 
@@ -131,7 +132,23 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  * See Documentation/locking/seqlock.rst
  */
 
-#ifdef CONFIG_LOCKDEP
+/*
+ * For PREEMPT_RT, seqcount_LOCKNAME_t write side critical sections cannot
+ * disable preemption. It can lead to higher latencies, and the write side
+ * sections will not be able to acquire locks which become sleeping locks
+ * (e.g. spinlock_t).
+ *
+ * To remain preemptible while avoiding a possible livelock caused by the
+ * reader preempting the writer, use a different technique: let the reader
+ * detect if a seqcount_LOCKNAME_t writer is in progress. If that is the
+ * case, acquire then release the associated LOCKNAME writer serialization
+ * lock. This will allow any possibly-preempted writer to make progress
+ * until the end of its writer serialization lock critical section.
+ *
+ * This lock-unlock technique must be implemented for all of PREEMPT_RT
+ * sleeping locks.  See Documentation/locking/locktypes.rst
+ */
+#if defined(CONFIG_LOCKDEP) || defined(CONFIG_PREEMPT_RT)
 #define __SEQ_LOCK(expr)	expr
 #else
 #define __SEQ_LOCK(expr)
@@ -162,10 +179,12 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  *
  * @lockname:		"LOCKNAME" part of seqcount_LOCKNAME_t
  * @locktype:		LOCKNAME canonical C data type
- * @preemptible:	preemptibility of above lockname
+ * @preemptible:	preemptibility of above locktype
  * @lockmember:		argument for lockdep_assert_held()
+ * @lockbase:		associated lock release function (prefix only)
+ * @lock_acquire:	associated lock acquisition function (full call)
  */
-#define SEQCOUNT_LOCKNAME(lockname, locktype, preemptible, lockmember)	\
+#define SEQCOUNT_LOCKNAME(lockname, locktype, preemptible, lockmember, lockbase, lock_acquire) \
 typedef struct seqcount_##lockname {					\
 	seqcount_t		seqcount;				\
 	__SEQ_LOCK(locktype	*lock);					\
@@ -187,13 +206,33 @@ __seqprop_##lockname##_ptr(seqcount_##lockname##_t *s)			\
 static __always_inline unsigned						\
 __seqprop_##lockname##_sequence(const seqcount_##lockname##_t *s)	\
 {									\
-	return READ_ONCE(s->seqcount.sequence);				\
+	unsigned seq = READ_ONCE(s->seqcount.sequence);			\
+									\
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))				\
+		return seq;						\
+									\
+	if (preemptible && unlikely(seq & 1)) {				\
+		__SEQ_LOCK(lock_acquire);				\
+		__SEQ_LOCK(lockbase##_unlock(s->lock));			\
+									\
+		/*							\
+		 * Re-read the sequence counter since the (possibly	\
+		 * preempted) writer made progress.			\
+		 */							\
+		seq = READ_ONCE(s->seqcount.sequence);			\
+	}								\
+									\
+	return seq;							\
 }									\
 									\
 static __always_inline bool						\
 __seqprop_##lockname##_preemptible(const seqcount_##lockname##_t *s)	\
 {									\
-	return preemptible;						\
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))				\
+		return preemptible;					\
+									\
+	/* PREEMPT_RT relies on the above LOCK+UNLOCK */		\
+	return false;							\
 }									\
 									\
 static __always_inline void						\
@@ -226,11 +265,13 @@ static inline void __seqprop_assert(const seqcount_t *s)
 	lockdep_assert_preemption_disabled();
 }
 
-SEQCOUNT_LOCKNAME(raw_spinlock,	raw_spinlock_t,		false,	s->lock)
-SEQCOUNT_LOCKNAME(spinlock,	spinlock_t,		false,	s->lock)
-SEQCOUNT_LOCKNAME(rwlock,	rwlock_t,		false,	s->lock)
-SEQCOUNT_LOCKNAME(mutex,	struct mutex,		true,	s->lock)
-SEQCOUNT_LOCKNAME(ww_mutex,	struct ww_mutex,	true,	&s->lock->base)
+#define __SEQ_RT	IS_ENABLED(CONFIG_PREEMPT_RT)
+
+SEQCOUNT_LOCKNAME(raw_spinlock, raw_spinlock_t,  false,    s->lock,        raw_spin, raw_spin_lock(s->lock))
+SEQCOUNT_LOCKNAME(spinlock,     spinlock_t,      __SEQ_RT, s->lock,        spin,     spin_lock(s->lock))
+SEQCOUNT_LOCKNAME(rwlock,       rwlock_t,        __SEQ_RT, s->lock,        read,     read_lock(s->lock))
+SEQCOUNT_LOCKNAME(mutex,        struct mutex,    true,     s->lock,        mutex,    mutex_lock(s->lock))
+SEQCOUNT_LOCKNAME(ww_mutex,     struct ww_mutex, true,     &s->lock->base, ww_mutex, ww_mutex_lock(s->lock, NULL))
 
 /*
  * SEQCNT_LOCKNAME_ZERO - static initializer for seqcount_LOCKNAME_t
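
For readability, the spinlock instantiation above hand-expands to
roughly the following read-side helper (a sketch of the macro expansion
with preemptible == __SEQ_RT; the __SEQ_LOCK() wrappers compile the
lock accesses out when neither LOCKDEP nor PREEMPT_RT is enabled):

	static __always_inline unsigned
	__seqprop_spinlock_sequence(const seqcount_spinlock_t *s)
	{
		unsigned seq = READ_ONCE(s->seqcount.sequence);

		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
			return seq;

		if (unlikely(seq & 1)) {
			/*
			 * An RT writer is active and may have been
			 * preempted: acquire then release its lock so
			 * it can make progress.
			 */
			spin_lock(s->lock);
			spin_unlock(s->lock);

			/* Re-read: the writer made progress meanwhile. */
			seq = READ_ONCE(s->seqcount.sequence);
		}

		return seq;
	}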


* [tip: locking/core] seqlock: seqcount latch APIs: Only allow seqcount_latch_t
  2020-08-27 11:40   ` [PATCH v1 8/8] seqlock: seqcount latch APIs: Only allow seqcount_latch_t Ahmed S. Darwish
@ 2020-09-10 15:08     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-09-10 15:08 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     0c9794c8b6781eb7dad8e19b78c5d4557790597a
Gitweb:        https://git.kernel.org/tip/0c9794c8b6781eb7dad8e19b78c5d4557790597a
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Thu, 27 Aug 2020 13:40:44 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 10 Sep 2020 11:19:30 +02:00

seqlock: seqcount latch APIs: Only allow seqcount_latch_t

All latch sequence counter call-sites have now been converted from plain
seqcount_t to the new seqcount_latch_t data type.

Enforce type-safety by modifying seqlock.h latch APIs to only accept
seqcount_latch_t.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200827114044.11173-9-a.darwish@linutronix.de
---
 include/linux/seqlock.h | 36 +++++++++++++++---------------------
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 88b917d..f2a7a46 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -620,7 +620,7 @@ static inline void seqcount_latch_init(seqcount_latch_t *s)
 
 /**
  * raw_read_seqcount_latch() - pick even/odd latch data copy
- * @s: Pointer to seqcount_t, seqcount_raw_spinlock_t, or seqcount_latch_t
+ * @s: Pointer to seqcount_latch_t
  *
  * See raw_write_seqcount_latch() for details and a full reader/writer
  * usage example.
@@ -629,17 +629,14 @@ static inline void seqcount_latch_init(seqcount_latch_t *s)
  * picking which data copy to read. The full counter must then be checked
  * with read_seqcount_latch_retry().
  */
-#define raw_read_seqcount_latch(s)						\
-({										\
-	/*									\
-	 * Pairs with the first smp_wmb() in raw_write_seqcount_latch().	\
-	 * Due to the dependent load, a full smp_rmb() is not needed.		\
-	 */									\
-	_Generic(*(s),								\
-		 seqcount_t:		  READ_ONCE(((seqcount_t *)s)->sequence),			\
-		 seqcount_raw_spinlock_t: READ_ONCE(((seqcount_raw_spinlock_t *)s)->seqcount.sequence),	\
-		 seqcount_latch_t:	  READ_ONCE(((seqcount_latch_t *)s)->seqcount.sequence));	\
-})
+static inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *s)
+{
+	/*
+	 * Pairs with the first smp_wmb() in raw_write_seqcount_latch().
+	 * Due to the dependent load, a full smp_rmb() is not needed.
+	 */
+	return READ_ONCE(s->seqcount.sequence);
+}
 
 /**
  * read_seqcount_latch_retry() - end a seqcount_latch_t read section
@@ -656,7 +653,7 @@ read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
 
 /**
  * raw_write_seqcount_latch() - redirect latch readers to even/odd copy
- * @s: Pointer to seqcount_t, seqcount_raw_spinlock_t, or seqcount_latch_t
+ * @s: Pointer to seqcount_latch_t
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -735,14 +732,11 @@ read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
  *	When data is a dynamic data structure; one should use regular RCU
  *	patterns to manage the lifetimes of the objects within.
  */
-#define raw_write_seqcount_latch(s)						\
-{										\
-       smp_wmb();      /* prior stores before incrementing "sequence" */	\
-       _Generic(*(s),								\
-		seqcount_t:		((seqcount_t *)s)->sequence++,		\
-		seqcount_raw_spinlock_t:((seqcount_raw_spinlock_t *)s)->seqcount.sequence++, \
-		seqcount_latch_t:	((seqcount_latch_t *)s)->seqcount.sequence++); \
-       smp_wmb();      /* increment "sequence" before following stores */	\
+static inline void raw_write_seqcount_latch(seqcount_latch_t *s)
+{
+	smp_wmb();	/* prior stores before incrementing "sequence" */
+	s->seqcount.sequence++;
+	smp_wmb();      /* increment "sequence" before following stores */
 }
 
 /*


* [tip: locking/core] rbtree_latch: Use seqcount_latch_t
  2020-08-27 11:40   ` [PATCH v1 7/8] rbtree_latch: " Ahmed S. Darwish
@ 2020-09-10 15:08     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-09-10 15:08 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     24bf401cebfd630cc9e2c3746e43945e836626f9
Gitweb:        https://git.kernel.org/tip/24bf401cebfd630cc9e2c3746e43945e836626f9
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Thu, 27 Aug 2020 13:40:43 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 10 Sep 2020 11:19:29 +02:00

rbtree_latch: Use seqcount_latch_t

Latch sequence counters have unique read and write APIs, and thus
seqcount_latch_t was recently introduced at seqlock.h.

Use that new data type instead of plain seqcount_t. This adds the
necessary type-safety and ensures that only latching-safe seqcount APIs
are used.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200827114044.11173-8-a.darwish@linutronix.de
---
 include/linux/rbtree_latch.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/rbtree_latch.h b/include/linux/rbtree_latch.h
index 7d012fa..3d1a9e7 100644
--- a/include/linux/rbtree_latch.h
+++ b/include/linux/rbtree_latch.h
@@ -42,8 +42,8 @@ struct latch_tree_node {
 };
 
 struct latch_tree_root {
-	seqcount_t	seq;
-	struct rb_root	tree[2];
+	seqcount_latch_t	seq;
+	struct rb_root		tree[2];
 };
 
 /**
@@ -206,7 +206,7 @@ latch_tree_find(void *key, struct latch_tree_root *root,
 	do {
 		seq = raw_read_seqcount_latch(&root->seq);
 		node = __lt_find(key, root, seq & 1, ops->comp);
-	} while (read_seqcount_retry(&root->seq, seq));
+	} while (read_seqcount_latch_retry(&root->seq, seq));
 
 	return node;
 }


* [tip: locking/core] x86/tsc: Use seqcount_latch_t
  2020-08-27 11:40   ` [PATCH v1 6/8] x86/tsc: " Ahmed S. Darwish
  2020-09-04  7:41     ` peterz
@ 2020-09-10 15:08     ` tip-bot2 for Ahmed S. Darwish
  1 sibling, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-09-10 15:08 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     a1f1066133d85d5f42217cc72a2490bb7aa889c5
Gitweb:        https://git.kernel.org/tip/a1f1066133d85d5f42217cc72a2490bb7aa889c5
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Thu, 27 Aug 2020 13:40:42 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 10 Sep 2020 11:19:29 +02:00

x86/tsc: Use seqcount_latch_t

Latch sequence counters have unique read and write APIs, and thus
seqcount_latch_t was recently introduced at seqlock.h.

Use that new data type instead of plain seqcount_t. This adds the
necessary type-safety and ensures that only latching-safe seqcount APIs
are used.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
[peterz: unwreck cyc2ns_read_begin()]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200827114044.11173-7-a.darwish@linutronix.de
---
 arch/x86/kernel/tsc.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 49d9250..f70dffc 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -54,7 +54,7 @@ struct clocksource *art_related_clocksource;
 
 struct cyc2ns {
 	struct cyc2ns_data data[2];	/*  0 + 2*16 = 32 */
-	seqcount_t	   seq;		/* 32 + 4    = 36 */
+	seqcount_latch_t   seq;		/* 32 + 4    = 36 */
 
 }; /* fits one cacheline */
 
@@ -73,14 +73,14 @@ __always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
 	preempt_disable_notrace();
 
 	do {
-		seq = this_cpu_read(cyc2ns.seq.sequence);
+		seq = this_cpu_read(cyc2ns.seq.seqcount.sequence);
 		idx = seq & 1;
 
 		data->cyc2ns_offset = this_cpu_read(cyc2ns.data[idx].cyc2ns_offset);
 		data->cyc2ns_mul    = this_cpu_read(cyc2ns.data[idx].cyc2ns_mul);
 		data->cyc2ns_shift  = this_cpu_read(cyc2ns.data[idx].cyc2ns_shift);
 
-	} while (unlikely(seq != this_cpu_read(cyc2ns.seq.sequence)));
+	} while (unlikely(seq != this_cpu_read(cyc2ns.seq.seqcount.sequence)));
 }
 
 __always_inline void cyc2ns_read_end(void)
@@ -186,7 +186,7 @@ static void __init cyc2ns_init_boot_cpu(void)
 {
 	struct cyc2ns *c2n = this_cpu_ptr(&cyc2ns);
 
-	seqcount_init(&c2n->seq);
+	seqcount_latch_init(&c2n->seq);
 	__set_cyc2ns_scale(tsc_khz, smp_processor_id(), rdtsc());
 }
 
@@ -203,7 +203,7 @@ static void __init cyc2ns_init_secondary_cpus(void)
 
 	for_each_possible_cpu(cpu) {
 		if (cpu != this_cpu) {
-			seqcount_init(&c2n->seq);
+			seqcount_latch_init(&c2n->seq);
 			c2n = per_cpu_ptr(&cyc2ns, cpu);
 			c2n->data[0] = data[0];
 			c2n->data[1] = data[1];


* [tip: locking/core] timekeeping: Use seqcount_latch_t
  2020-08-27 11:40   ` [PATCH v1 5/8] timekeeping: " Ahmed S. Darwish
@ 2020-09-10 15:08     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-09-10 15:08 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     249d053835320cb3e7c00066cf085a6ba9b1f126
Gitweb:        https://git.kernel.org/tip/249d053835320cb3e7c00066cf085a6ba9b1f126
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Thu, 27 Aug 2020 13:40:41 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 10 Sep 2020 11:19:29 +02:00

timekeeping: Use seqcount_latch_t

Latch sequence counters are a multiversion concurrency control mechanism
where the seqcount_t counter even/odd value is used to switch between
two data storage copies. This allows the seqcount_t read path to safely
interrupt its write side critical section (e.g. from NMIs).

Initially, latch sequence counters were implemented as a single write
function, raw_write_seqcount_latch(), above plain seqcount_t. The read
path was expected to use plain seqcount_t raw_read_seqcount().

A specialized read function was later added, raw_read_seqcount_latch(),
and became the standardized way for latch read paths. Having unique read
and write APIs meant that latch sequence counters are basically a data
type of their own -- just inappropriately overloading plain seqcount_t.
The seqcount_latch_t data type was thus introduced at seqlock.h.

Use that new data type instead of seqcount_raw_spinlock_t. This ensures
that only latch-safe APIs are to be used with the sequence counter.

Note that the use of seqcount_raw_spinlock_t was not very useful in the
first place. Only the "raw_" subset of the seqcount_t APIs was used at
timekeeping.c. This subset was created for contexts where lockdep cannot
be used. seqcount_LOCKTYPE_t's raison d'être -- verifying that the
seqcount_t writer serialization lock is held -- thus cannot be done.

References: 0c3351d451ae ("seqlock: Use raw_ prefix instead of _no_lockdep")
References: 55f3560df975 ("seqlock: Extend seqcount API with associated locks")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200827114044.11173-6-a.darwish@linutronix.de
---
 kernel/time/timekeeping.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4c47f38..999c981 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -64,7 +64,7 @@ static struct timekeeper shadow_timekeeper;
  * See @update_fast_timekeeper() below.
  */
 struct tk_fast {
-	seqcount_raw_spinlock_t	seq;
+	seqcount_latch_t	seq;
 	struct tk_read_base	base[2];
 };
 
@@ -81,13 +81,13 @@ static struct clocksource dummy_clock = {
 };
 
 static struct tk_fast tk_fast_mono ____cacheline_aligned = {
-	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_mono.seq, &timekeeper_lock),
+	.seq     = SEQCNT_LATCH_ZERO(tk_fast_mono.seq),
 	.base[0] = { .clock = &dummy_clock, },
 	.base[1] = { .clock = &dummy_clock, },
 };
 
 static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
-	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_raw.seq, &timekeeper_lock),
+	.seq     = SEQCNT_LATCH_ZERO(tk_fast_raw.seq),
 	.base[0] = { .clock = &dummy_clock, },
 	.base[1] = { .clock = &dummy_clock, },
 };
@@ -467,7 +467,7 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
 					tk_clock_read(tkr),
 					tkr->cycle_last,
 					tkr->mask));
-	} while (read_seqcount_retry(&tkf->seq, seq));
+	} while (read_seqcount_latch_retry(&tkf->seq, seq));
 
 	return now;
 }
@@ -533,7 +533,7 @@ static __always_inline u64 __ktime_get_real_fast_ns(struct tk_fast *tkf)
 					tk_clock_read(tkr),
 					tkr->cycle_last,
 					tkr->mask));
-	} while (read_seqcount_retry(&tkf->seq, seq));
+	} while (read_seqcount_latch_retry(&tkf->seq, seq));
 
 	return now;
 }


* [tip: locking/core] time/sched_clock: Use seqcount_latch_t
  2020-08-27 11:40   ` [PATCH v1 4/8] time/sched_clock: Use seqcount_latch_t Ahmed S. Darwish
@ 2020-09-10 15:08     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-09-10 15:08 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     a690ed07353ec45f056b0a6f87c23a12a59c030d
Gitweb:        https://git.kernel.org/tip/a690ed07353ec45f056b0a6f87c23a12a59c030d
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Thu, 27 Aug 2020 13:40:40 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 10 Sep 2020 11:19:29 +02:00

time/sched_clock: Use seqcount_latch_t

Latch sequence counters have unique read and write APIs, and thus
seqcount_latch_t was recently introduced at seqlock.h.

Use that new data type instead of plain seqcount_t. This adds the
necessary type-safety and ensures that only latching-safe seqcount APIs
are used.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200827114044.11173-5-a.darwish@linutronix.de
---
 kernel/time/sched_clock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 8c6b5fe..0642013 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -35,7 +35,7 @@
  * into a single 64-byte cache line.
  */
 struct clock_data {
-	seqcount_t		seq;
+	seqcount_latch_t	seq;
 	struct clock_read_data	read_data[2];
 	ktime_t			wrap_kt;
 	unsigned long		rate;
@@ -76,7 +76,7 @@ struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
 
 int sched_clock_read_retry(unsigned int seq)
 {
-	return read_seqcount_retry(&cd.seq, seq);
+	return read_seqcount_latch_retry(&cd.seq, seq);
 }
 
 unsigned long long notrace sched_clock(void)
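
For context, the consumer of this API pair is sched_clock() itself,
which reads the latched copy like so (paraphrased from
kernel/time/sched_clock.c; cyc_to_ns() is a local helper in that file):

	unsigned long long notrace sched_clock(void)
	{
		u64 cyc, res;
		unsigned int seq;
		struct clock_read_data *rd;

		do {
			rd = sched_clock_read_begin(&seq);

			cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
			      rd->sched_clock_mask;
			res = rd->epoch_ns + cyc_to_ns(cyc, rd->mult, rd->shift);
		} while (sched_clock_read_retry(seq));

		return res;
	}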


* [tip: locking/core] seqlock: Introduce seqcount_latch_t
  2020-08-27 11:40   ` [PATCH v1 3/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
@ 2020-09-10 15:08     ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-09-10 15:08 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     80793c3471d90d4dc2b48deadb6413bdfe39500f
Gitweb:        https://git.kernel.org/tip/80793c3471d90d4dc2b48deadb6413bdfe39500f
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Thu, 27 Aug 2020 13:40:39 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 10 Sep 2020 11:19:28 +02:00

seqlock: Introduce seqcount_latch_t

Latch sequence counters are a multiversion concurrency control mechanism
where the seqcount_t counter even/odd value is used to switch between
two copies of protected data. This allows the seqcount_t read path to
safely interrupt its write side critical section (e.g. from NMIs).

Initially, latch sequence counters were implemented as a single write
function above plain seqcount_t: raw_write_seqcount_latch(). The read
side was expected to use plain seqcount_t raw_read_seqcount().

A specialized latch read function, raw_read_seqcount_latch(), was later
added. It became the standardized way for latch read paths.  Due to the
dependent load, it has one read memory barrier less than the plain
seqcount_t raw_read_seqcount() API.

Only raw_write_seqcount_latch() and raw_read_seqcount_latch() should be
used with latch sequence counters. Having *unique* read and write path
APIs means that latch sequence counters are actually a data type of
their own -- just inappropriately overloading plain seqcount_t.

Introduce seqcount_latch_t. This adds type-safety and ensures that only
the correct latch-safe APIs are used.

To avoid breaking bisection, let the latch APIs also accept plain
seqcount_t or seqcount_raw_spinlock_t for now. After converting all call
sites to seqcount_latch_t, only that new data type will be allowed.

References: 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()")
References: 7fc26327b756 ("seqlock: Introduce raw_read_seqcount_latch()")
References: aadd6e5caaac ("time/sched_clock: Use raw_read_seqcount_latch()")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200827114044.11173-4-a.darwish@linutronix.de
---
 Documentation/locking/seqlock.rst |  18 +++++-
 include/linux/seqlock.h           | 104 ++++++++++++++++++++---------
 2 files changed, 91 insertions(+), 31 deletions(-)

diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
index 62c5ad9..a334b58 100644
--- a/Documentation/locking/seqlock.rst
+++ b/Documentation/locking/seqlock.rst
@@ -139,6 +139,24 @@ with the associated LOCKTYPE lock acquired.
 
 Read path: same as in :ref:`seqcount_t`.
 
+
+.. _seqcount_latch_t:
+
+Latch sequence counters (``seqcount_latch_t``)
+----------------------------------------------
+
+Latch sequence counters are a multiversion concurrency control mechanism
+where the embedded seqcount_t counter even/odd value is used to switch
+between two copies of protected data. This allows the sequence counter
+read path to safely interrupt its own write side critical section.
+
+Use seqcount_latch_t when the write side sections cannot be protected
+from interruption by readers. This is typically the case when the read
+side can be invoked from NMI handlers.
+
+Check `raw_write_seqcount_latch()` for more information.
+
+
 .. _seqlock_t:
 
 Sequential locks (``seqlock_t``)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 300cbf3..88b917d 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -587,34 +587,76 @@ static inline void write_seqcount_t_invalidate(seqcount_t *s)
 	kcsan_nestable_atomic_end();
 }
 
-/**
- * raw_read_seqcount_latch() - pick even/odd seqcount_t latch data copy
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+/*
+ * Latch sequence counters (seqcount_latch_t)
  *
- * Use seqcount_t latching to switch between two storage places protected
- * by a sequence counter. Doing so allows having interruptible, preemptible,
- * seqcount_t write side critical sections.
+ * A sequence counter variant where the counter even/odd value is used to
+ * switch between two copies of protected data. This allows the read path,
+ * typically NMIs, to safely interrupt the write side critical section.
  *
- * Check raw_write_seqcount_latch() for more details and a full reader and
- * writer usage example.
+ * As the write sections are fully preemptible, no special handling for
+ * PREEMPT_RT is needed.
+ */
+typedef struct {
+	seqcount_t seqcount;
+} seqcount_latch_t;
+
+/**
+ * SEQCNT_LATCH_ZERO() - static initializer for seqcount_latch_t
+ * @seq_name: Name of the seqcount_latch_t instance
+ */
+#define SEQCNT_LATCH_ZERO(seq_name) {					\
+	.seqcount		= SEQCNT_ZERO(seq_name.seqcount),	\
+}
+
+/**
+ * seqcount_latch_init() - runtime initializer for seqcount_latch_t
+ * @s: Pointer to the seqcount_latch_t instance
+ */
+static inline void seqcount_latch_init(seqcount_latch_t *s)
+{
+	seqcount_init(&s->seqcount);
+}
+
+/**
+ * raw_read_seqcount_latch() - pick even/odd latch data copy
+ * @s: Pointer to seqcount_t, seqcount_raw_spinlock_t, or seqcount_latch_t
+ *
+ * See raw_write_seqcount_latch() for details and a full reader/writer
+ * usage example.
  *
  * Return: sequence counter raw value. Use the lowest bit as an index for
- * picking which data copy to read. The full counter value must then be
- * checked with read_seqcount_retry().
+ * picking which data copy to read. The full counter must then be checked
+ * with read_seqcount_latch_retry().
  */
-#define raw_read_seqcount_latch(s)					\
-	raw_read_seqcount_t_latch(__seqcount_ptr(s))
+#define raw_read_seqcount_latch(s)						\
+({										\
+	/*									\
+	 * Pairs with the first smp_wmb() in raw_write_seqcount_latch().	\
+	 * Due to the dependent load, a full smp_rmb() is not needed.		\
+	 */									\
+	_Generic(*(s),								\
+		 seqcount_t:		  READ_ONCE(((seqcount_t *)s)->sequence),			\
+		 seqcount_raw_spinlock_t: READ_ONCE(((seqcount_raw_spinlock_t *)s)->seqcount.sequence),	\
+		 seqcount_latch_t:	  READ_ONCE(((seqcount_latch_t *)s)->seqcount.sequence));	\
+})
 
-static inline int raw_read_seqcount_t_latch(seqcount_t *s)
+/**
+ * read_seqcount_latch_retry() - end a seqcount_latch_t read section
+ * @s:		Pointer to seqcount_latch_t
+ * @start:	count, from raw_read_seqcount_latch()
+ *
+ * Return: true if a read section retry is required, else false
+ */
+static inline int
+read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
 {
-	/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
-	int seq = READ_ONCE(s->sequence); /* ^^^ */
-	return seq;
+	return read_seqcount_retry(&s->seqcount, start);
 }
 
 /**
- * raw_write_seqcount_latch() - redirect readers to even/odd copy
- * @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
+ * raw_write_seqcount_latch() - redirect latch readers to even/odd copy
+ * @s: Pointer to seqcount_t, seqcount_raw_spinlock_t, or seqcount_latch_t
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -633,7 +675,7 @@ static inline int raw_read_seqcount_t_latch(seqcount_t *s)
  * The basic form is a data structure like::
  *
  *	struct latch_struct {
- *		seqcount_t		seq;
+ *		seqcount_latch_t	seq;
  *		struct data_struct	data[2];
  *	};
  *
@@ -643,13 +685,13 @@ static inline int raw_read_seqcount_t_latch(seqcount_t *s)
  *	void latch_modify(struct latch_struct *latch, ...)
  *	{
  *		smp_wmb();	// Ensure that the last data[1] update is visible
- *		latch->seq++;
+ *		latch->seq.sequence++;
  *		smp_wmb();	// Ensure that the seqcount update is visible
  *
  *		modify(latch->data[0], ...);
  *
  *		smp_wmb();	// Ensure that the data[0] update is visible
- *		latch->seq++;
+ *		latch->seq.sequence++;
  *		smp_wmb();	// Ensure that the seqcount update is visible
  *
  *		modify(latch->data[1], ...);
@@ -668,8 +710,8 @@ static inline int raw_read_seqcount_t_latch(seqcount_t *s)
  *			idx = seq & 0x01;
  *			entry = data_query(latch->data[idx], ...);
  *
- *		// read_seqcount_retry() includes needed smp_rmb()
- *		} while (read_seqcount_retry(&latch->seq, seq));
+ *		// This includes needed smp_rmb()
+ *		} while (read_seqcount_latch_retry(&latch->seq, seq));
  *
  *		return entry;
  *	}
@@ -693,14 +735,14 @@ static inline int raw_read_seqcount_t_latch(seqcount_t *s)
  *	When data is a dynamic data structure; one should use regular RCU
  *	patterns to manage the lifetimes of the objects within.
  */
-#define raw_write_seqcount_latch(s)					\
-	raw_write_seqcount_t_latch(__seqcount_ptr(s))
-
-static inline void raw_write_seqcount_t_latch(seqcount_t *s)
-{
-       smp_wmb();      /* prior stores before incrementing "sequence" */
-       s->sequence++;
-       smp_wmb();      /* increment "sequence" before following stores */
+#define raw_write_seqcount_latch(s)						\
+{										\
+       smp_wmb();      /* prior stores before incrementing "sequence" */	\
+       _Generic(*(s),								\
+		seqcount_t:		((seqcount_t *)s)->sequence++,		\
+		seqcount_raw_spinlock_t:((seqcount_raw_spinlock_t *)s)->seqcount.sequence++, \
+		seqcount_latch_t:	((seqcount_latch_t *)s)->seqcount.sequence++); \
+       smp_wmb();      /* increment "sequence" before following stores */	\
 }
 
 /*
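
Putting the new type and its APIs together, a minimal usage sketch
(struct my_data and the my_* names are placeholders; as the kernel-doc
above notes, the writer side must still be serialized externally, e.g.
by a lock):

	struct my_latch {
		seqcount_latch_t	seq;
		struct my_data		data[2];
	};

	/* Writer, serialized by the caller: */
	void my_latch_modify(struct my_latch *l, const struct my_data *val)
	{
		raw_write_seqcount_latch(&l->seq);	/* readers -> data[1] */
		l->data[0] = *val;
		raw_write_seqcount_latch(&l->seq);	/* readers -> data[0] */
		l->data[1] = *val;
	}

	/* Reader; may interrupt the writer, e.g. from NMI: */
	struct my_data my_latch_read(struct my_latch *l)
	{
		struct my_data d;
		unsigned int seq;

		do {
			seq = raw_read_seqcount_latch(&l->seq);
			d = l->data[seq & 1];
		} while (read_seqcount_latch_retry(&l->seq, seq));

		return d;
	}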


* [tip: locking/core] mm/swap: Do not abuse the seqcount_t latching API
  2020-05-25 16:10     ` John Ogness
@ 2020-09-10 15:08       ` tip-bot2 for Ahmed S. Darwish
  0 siblings, 0 replies; 258+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-09-10 15:08 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     6446a5131e24a834606c15a965fa920041581c2c
Gitweb:        https://git.kernel.org/tip/6446a5131e24a834606c15a965fa920041581c2c
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Thu, 27 Aug 2020 13:40:38 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 10 Sep 2020 11:19:28 +02:00

mm/swap: Do not abuse the seqcount_t latching API

Commit eef1a429f234 ("mm/swap.c: piggyback lru_add_drain_all() calls")
implemented an optimization mechanism to exit the to-be-started LRU
drain operation (call it A) if another drain operation *started and
finished* while (A) was blocked on the LRU draining mutex.

This was done through a seqcount_t latch, which is an abuse of its
semantics:

  1. seqcount_t latching should be used for the purpose of switching
     between two storage places with sequence protection to allow
     interruptible, preemptible, writer sections. The referenced
     optimization mechanism has absolutely nothing to do with that.

  2. raw_write_seqcount_latch() has two SMP write memory barriers,
     ensuring one consistent storage place out of the two available. A
     full memory barrier is required instead: one that guarantees the
     pagevec counter stores visible to the local CPU are also visible
     to other CPUs -- before loading the current drain generation.

Besides the seqcount_t API abuse, the semantics of a latch sequence
counter were force-fitted into the referenced optimization. What was
actually meant is to track "generations" of LRU draining operations,
where "global lru draining generation = x" implies that all generations
0 < n <= x are already *scheduled* for draining -- thus nothing needs
to be done if the current generation number n <= x.

Remove the conceptually-inappropriate seqcount_t latch usage. Manually
implement the referenced optimization using a counter and SMP memory
barriers.

Note, while at it, use the non-atomic variant of cpumask_set_cpu(),
__cpumask_set_cpu(), due to the already existing mutex protection.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/87y2pg9erj.fsf@vostro.fn.ogness.net
---
 mm/swap.c | 65 ++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 54 insertions(+), 11 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index d16d65d..a1ec807 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -763,10 +763,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
  */
 void lru_add_drain_all(void)
 {
-	static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
-	static DEFINE_MUTEX(lock);
+	/*
+	 * lru_drain_gen - Global pages generation number
+	 *
+	 * (A) Definition: global lru_drain_gen = x implies that all generations
+	 *     0 < n <= x are already *scheduled* for draining.
+	 *
+	 * This is an optimization for the highly-contended use case where a
+	 * user space workload keeps constantly generating a flow of pages for
+	 * each CPU.
+	 */
+	static unsigned int lru_drain_gen;
 	static struct cpumask has_work;
-	int cpu, seq;
+	static DEFINE_MUTEX(lock);
+	unsigned cpu, this_gen;
 
 	/*
 	 * Make sure nobody triggers this path before mm_percpu_wq is fully
@@ -775,21 +785,54 @@ void lru_add_drain_all(void)
 	if (WARN_ON(!mm_percpu_wq))
 		return;
 
-	seq = raw_read_seqcount_latch(&seqcount);
+	/*
+	 * Guarantee pagevec counter stores visible by this CPU are visible to
+	 * other CPUs before loading the current drain generation.
+	 */
+	smp_mb();
+
+	/*
+	 * (B) Locally cache global LRU draining generation number
+	 *
+	 * The read barrier ensures that the counter is loaded before the mutex
+	 * is taken. It pairs with smp_mb() inside the mutex critical section
+	 * at (D).
+	 */
+	this_gen = smp_load_acquire(&lru_drain_gen);
 
 	mutex_lock(&lock);
 
 	/*
-	 * Piggyback on drain started and finished while we waited for lock:
-	 * all pages pended at the time of our enter were drained from vectors.
+	 * (C) Exit the draining operation if a newer generation, from another
+	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
 	 */
-	if (__read_seqcount_retry(&seqcount, seq))
+	if (unlikely(this_gen != lru_drain_gen))
 		goto done;
 
-	raw_write_seqcount_latch(&seqcount);
+	/*
+	 * (D) Increment global generation number
+	 *
+	 * Pairs with smp_load_acquire() at (B), outside of the critical
+	 * section. Use a full memory barrier to guarantee that the new global
+	 * drain generation number is stored before loading pagevec counters.
+	 *
+	 * This pairing must be done here, before the for_each_online_cpu loop
+	 * below which drains the page vectors.
+	 *
+	 * Let x, y, and z represent some system CPU numbers, where x < y < z.
+	 * Assume CPU #z is in the middle of the for_each_online_cpu loop
+	 * below and has already reached CPU #y's per-cpu data. CPU #x comes
+	 * along, adds some pages to its per-cpu vectors, then calls
+	 * lru_add_drain_all().
+	 *
+	 * If the paired barrier is done at any later step, e.g. after the
+	 * loop, CPU #x will just exit at (C) and miss flushing out all of its
+	 * added pages.
+	 */
+	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
+	smp_mb();
 
 	cpumask_clear(&has_work);
-
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
 
@@ -801,7 +844,7 @@ void lru_add_drain_all(void)
 		    need_activate_page_drain(cpu)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);
 			queue_work_on(cpu, mm_percpu_wq, work);
-			cpumask_set_cpu(cpu, &has_work);
+			__cpumask_set_cpu(cpu, &has_work);
 		}
 	}
 
@@ -816,7 +859,7 @@ void lru_add_drain_all(void)
 {
 	lru_add_drain();
 }
-#endif
+#endif /* CONFIG_SMP */
 
 /**
  * release_pages - batched put_page()


end of thread

Thread overview: 258+ messages
2020-05-19 21:45 [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount Ahmed S. Darwish
2020-05-19 22:01   ` Stephen Hemminger
2020-05-19 22:23     ` Thomas Gleixner
2020-05-19 23:11       ` Stephen Hemminger
2020-05-19 23:42         ` Thomas Gleixner
2020-05-20  0:06           ` Stephen Hemminger
2020-05-20  1:55             ` Thomas Gleixner
2020-05-20  2:57           ` David Miller
2020-05-20  3:18             ` Eric Dumazet
2020-05-20  4:36               ` Stephen Hemminger
2020-05-20 19:37             ` Thomas Gleixner
2020-05-20 21:36               ` Stephen Hemminger
2020-05-20  2:01   ` Eric Dumazet
2020-05-20  6:42     ` Ahmed S. Darwish
2020-05-20 12:51       ` Eric Dumazet
2020-06-03 14:33         ` Ahmed S. Darwish
2020-05-20 14:37   ` Dan Carpenter
2020-05-25 16:22     ` Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 02/25] mm/swap: Don't abuse the seqcount latching API Ahmed S. Darwish
2020-05-20 12:22   ` Konstantin Khlebnikov
2020-05-20 13:05     ` Peter Zijlstra
2020-05-22 14:57   ` Peter Zijlstra
2020-05-22 15:17     ` Sebastian A. Siewior
2020-05-22 16:23       ` Peter Zijlstra
2020-05-25 15:24     ` Ahmed S. Darwish
2020-05-25 15:45       ` Peter Zijlstra
2020-05-25 16:10     ` John Ogness
2020-09-10 15:08       ` [tip: locking/core] mm/swap: Do not abuse the seqcount_t " tip-bot2 for Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 03/25] net: phy: fixed_phy: Remove unused seqcount Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 04/25] block: nr_sects_write(): Disable preemption on seqcount write Ahmed S. Darwish
2020-05-22 16:39   ` Peter Zijlstra
2020-05-25  9:56     ` Ahmed S. Darwish
     [not found]   ` <20200522001237.A00E8206BE@mail.kernel.org>
2020-05-25 10:12     ` Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 05/25] u64_stats: Document writer non-preemptibility requirement Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 06/25] dma-buf: Remove custom seqcount lockdep class key Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 07/25] lockdep: Add preemption disabled assertion API Ahmed S. Darwish
2020-05-22 17:55   ` Peter Zijlstra
2020-05-23 14:59     ` Sebastian A. Siewior
2020-05-23 22:41       ` Peter Zijlstra
2020-05-24 10:50         ` Sebastian A. Siewior
2020-05-25 10:22         ` Peter Zijlstra
2020-05-26  0:52           ` Ahmed S. Darwish
2020-05-26  8:13             ` Peter Zijlstra
2020-05-26  9:45               ` Ahmed S. Darwish
2020-06-03 15:30               ` Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 08/25] seqlock: lockdep assert non-preemptibility on seqcount_t write Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 09/25] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
2020-05-22 18:01   ` Peter Zijlstra
2020-05-22 22:24     ` Steven Rostedt
2020-05-25 10:50       ` Ahmed S. Darwish
2020-05-25 11:02         ` Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 10/25] seqlock: Add RST directives to kernel-doc code samples and notes Ahmed S. Darwish
2020-05-22 18:02   ` Peter Zijlstra
2020-05-22 18:03     ` Peter Zijlstra
2020-05-22 18:26       ` Thomas Gleixner
2020-05-22 18:32         ` Peter Zijlstra
2020-05-25  9:36           ` Ahmed S. Darwish
2020-05-25 13:44             ` Peter Zijlstra
2020-05-25 14:07               ` Peter Zijlstra
2020-05-19 21:45 ` [PATCH v1 11/25] seqlock: Add missing kernel-doc annotations Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 12/25] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 13/25] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
2020-05-20 10:48   ` Christian König
2020-05-21  0:09     ` Ahmed S. Darwish
2020-05-21 13:20       ` Christian König
2020-05-19 21:45 ` [PATCH v1 14/25] sched: tasks: Use sequence counter with associated spinlock Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 15/25] netfilter: conntrack: " Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 16/25] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 17/25] xfrm: policy: Use sequence counters with associated lock Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 18/25] timekeeping: Use sequence counter with associated raw spinlock Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 19/25] vfs: Use sequence counter with associated spinlock Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 20/25] raid5: " Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 21/25] iocost: " Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 22/25] NFSv4: " Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 23/25] userfaultfd: " Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 24/25] kvm/eventfd: " Ahmed S. Darwish
2020-05-19 21:45 ` [PATCH v1 25/25] hrtimer: Use sequence counter with associated raw spinlock Ahmed S. Darwish
2020-06-08  0:57 ` [PATCH v2 00/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 01/18] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 02/18] seqlock: Properly format kernel-doc code samples Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 03/18] seqlock: Add missing kernel-doc annotations Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 04/18] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 05/18] dma-buf: Remove custom seqcount lockdep class key Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 06/18] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
2020-06-08 14:32     ` Daniel Vetter
2020-06-08  0:57   ` [PATCH v2 07/18] sched: tasks: Use sequence counter with associated spinlock Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 08/18] netfilter: conntrack: " Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 09/18] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 10/18] xfrm: policy: Use sequence counters with associated lock Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 11/18] timekeeping: Use sequence counter with associated raw spinlock Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 12/18] vfs: Use sequence counter with associated spinlock Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 13/18] raid5: " Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 14/18] iocost: " Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 15/18] NFSv4: " Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 16/18] userfaultfd: " Ahmed S. Darwish
2020-06-08  0:57   ` [PATCH v2 17/18] kvm/eventfd: " Ahmed S. Darwish
2020-06-08 12:57     ` Paolo Bonzini
2020-06-08  0:57   ` [PATCH v2 18/18] hrtimer: Use sequence counter with associated raw spinlock Ahmed S. Darwish
2020-06-30  5:44 ` [PATCH v3 00/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 01/20] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
2020-07-06 21:04     ` Peter Zijlstra
2020-07-06 21:12       ` Jonathan Corbet
2020-07-06 21:16       ` Peter Zijlstra
2020-07-07 10:12       ` Ahmed S. Darwish
2020-07-07 12:47         ` Peter Zijlstra
2020-06-30  5:44   ` [PATCH v3 02/20] seqlock: Properly format kernel-doc code samples Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 03/20] seqlock: Add missing kernel-doc annotations Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 04/20] lockdep: Add preemption enabled/disabled assertion APIs Ahmed S. Darwish
2020-07-06 20:50     ` Peter Zijlstra
2020-07-07  7:34       ` Sebastian A. Siewior
2020-06-30  5:44   ` [PATCH v3 05/20] seqlock: lockdep assert non-preemptibility on seqcount_t write Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 06/20] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
2020-07-06 21:21     ` Peter Zijlstra
2020-07-07  8:40       ` Ahmed S. Darwish
2020-07-07 13:04         ` Peter Zijlstra
2020-07-07 14:37           ` Peter Zijlstra
2020-07-08  9:12             ` Peter Zijlstra
2020-07-08 10:43               ` Ahmed S. Darwish
2020-07-08 10:33             ` Ahmed S. Darwish
2020-07-08 12:29               ` Peter Zijlstra
2020-07-08 14:13                 ` Peter Zijlstra
2020-07-08 14:25                   ` Peter Zijlstra
2020-07-08 15:09                 ` Ahmed S. Darwish
2020-07-08 15:35                   ` Peter Zijlstra
2020-07-08 15:58                     ` Ahmed S. Darwish
2020-07-08 16:16                       ` Peter Zijlstra
2020-07-08 16:18                       ` Peter Zijlstra
2020-07-08 16:01                     ` Peter Zijlstra
2020-06-30  5:44   ` [PATCH v3 07/20] dma-buf: Remove custom seqcount lockdep class key Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 08/20] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 09/20] sched: tasks: Use sequence counter with associated spinlock Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 10/20] netfilter: conntrack: " Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 11/20] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 12/20] xfrm: policy: Use sequence counters with associated lock Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 13/20] timekeeping: Use sequence counter with associated raw spinlock Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 14/20] vfs: Use sequence counter with associated spinlock Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 15/20] raid5: " Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 16/20] iocost: " Ahmed S. Darwish
2020-06-30  7:11     ` Daniel Wagner
2020-06-30  5:44   ` [PATCH v3 17/20] NFSv4: " Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 18/20] userfaultfd: " Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 19/20] kvm/eventfd: " Ahmed S. Darwish
2020-06-30  5:44   ` [PATCH v3 20/20] hrtimer: Use sequence counter with associated raw spinlock Ahmed S. Darwish
2020-07-20 15:55 ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 01/24] Documentation: locking: Describe seqlock design and usage Ahmed S. Darwish
2020-07-21  1:35     ` Steven Rostedt
2020-07-21  1:37       ` Steven Rostedt
2020-07-21  5:34       ` Ahmed S. Darwish
2020-07-21  1:44     ` Steven Rostedt
2020-07-21  1:51       ` Steven Rostedt
2020-07-21  7:15         ` Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 02/24] seqlock: Properly format kernel-doc code samples Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 03/24] seqlock: seqcount_t latch: End read sections with read_seqcount_retry() Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 04/24] seqlock: Reorder seqcount_t and seqlock_t API definitions Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 05/24] seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 06/24] seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount() Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 07/24] lockdep: Add preemption enabled/disabled assertion APIs Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-08-08 23:21     ` [PATCH v4 08/24] " Guenter Roeck
2020-08-08 23:23       ` Guenter Roeck
2020-08-09 18:42       ` Ahmed S. Darwish
2020-08-10  8:59         ` Greg KH
2020-08-10  9:48           ` peterz
2020-08-10 10:03             ` Greg KH
2020-08-10  9:54           ` [PATCH] Revert "seqlock: lockdep assert non-preemptibility on seqcount_t write" Ahmed S. Darwish
2020-08-10 10:05             ` Greg KH
2020-08-10 10:35               ` Ahmed S. Darwish
2020-08-10 14:10               ` Guenter Roeck
2020-08-18 22:51                 ` Valdis Klētnieks
2020-08-19  0:56                   ` Guenter Roeck
2020-08-19  7:00                     ` Sebastian Andrzej Siewior
2020-08-19  7:34                       ` Valdis Klētnieks
2020-08-19 16:15                         ` Guenter Roeck
2020-08-10 19:55           ` [PATCH v4 08/24] seqlock: lockdep assert non-preemptibility on seqcount_t write Thomas Gleixner
2020-08-11 10:06             ` Greg KH
2020-07-20 15:55   ` [PATCH v4 09/24] seqlock: Extend seqcount API with associated locks Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 10/24] seqlock: Align multi-line macros newline escapes at 72 columns Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 11/24] dma-buf: Remove custom seqcount lockdep class key Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 12/24] dma-buf: Use sequence counter with associated wound/wait mutex Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 13/24] sched: tasks: Use sequence counter with associated spinlock Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 14/24] netfilter: conntrack: " Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 15/24] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 16/24] xfrm: policy: Use sequence counters with associated lock Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 17/24] timekeeping: Use sequence counter with associated raw spinlock Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 18/24] vfs: Use sequence counter with associated spinlock Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 19/24] raid5: " Ahmed S. Darwish
2020-07-22  6:40     ` Song Liu
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 20/24] iocost: " Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 21/24] NFSv4: " Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 22/24] userfaultfd: " Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 23/24] kvm/eventfd: " Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 15:55   ` [PATCH v4 24/24] hrtimer: Use sequence counter with associated raw spinlock Ahmed S. Darwish
2020-07-29 14:33     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-07-20 16:49   ` [PATCH v4 00/24] seqlock: Extend seqcount API with associated locks Eric Biggers
2020-07-20 17:33     ` Ahmed S. Darwish
2020-08-27 11:40 ` [PATCH v1 0/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
2020-08-27 11:40   ` [PATCH v1 1/8] time/sched_clock: Use raw_read_seqcount_latch() during suspend Ahmed S. Darwish
2020-08-27 11:40   ` [PATCH v1 2/8] mm/swap: Do not abuse the seqcount_t latching API Ahmed S. Darwish
2020-08-27 11:40   ` [PATCH v1 3/8] seqlock: Introduce seqcount_latch_t Ahmed S. Darwish
2020-09-10 15:08     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-08-27 11:40   ` [PATCH v1 4/8] time/sched_clock: Use seqcount_latch_t Ahmed S. Darwish
2020-09-10 15:08     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-08-27 11:40   ` [PATCH v1 5/8] timekeeping: " Ahmed S. Darwish
2020-09-10 15:08     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-08-27 11:40   ` [PATCH v1 6/8] x86/tsc: " Ahmed S. Darwish
2020-09-04  7:41     ` peterz
2020-09-04  8:03       ` peterz
2020-09-07 16:29         ` Ahmed S. Darwish
2020-09-07 17:30           ` peterz
2020-09-08  6:23             ` Ahmed S. Darwish
2020-09-10 15:08     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-08-27 11:40   ` [PATCH v1 7/8] rbtree_latch: " Ahmed S. Darwish
2020-09-10 15:08     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-08-27 11:40   ` [PATCH v1 8/8] seqlock: seqcount latch APIs: Only allow seqcount_latch_t Ahmed S. Darwish
2020-09-10 15:08     ` [tip: locking/core] " tip-bot2 for Ahmed S. Darwish
2020-08-28  1:07 ` [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support Ahmed S. Darwish
2020-08-28  1:07   ` [PATCH v1 1/5] seqlock: seqcount_LOCKTYPE_t: Standardize naming convention Ahmed S. Darwish
2020-08-28  8:18     ` peterz
2020-08-28  8:24       ` Ahmed S. Darwish
2020-08-28  1:07   ` [PATCH v1 2/5] seqlock: Use unique prefix for seqcount_t property accessors Ahmed S. Darwish
2020-08-28  8:27     ` peterz
2020-08-28  8:59       ` Ahmed S. Darwish
2020-08-28  1:07   ` [PATCH v1 3/5] seqlock: seqcount_t: Implement all read APIs as statement expressions Ahmed S. Darwish
2020-08-28  8:30     ` peterz
2020-08-28  8:37       ` Ahmed S. Darwish
2020-08-28  1:07   ` [PATCH v1 4/5] seqlock: seqcount_LOCKTYPE_t: Introduce PREEMPT_RT support Ahmed S. Darwish
2020-08-28  8:57     ` peterz
2020-08-28  8:59     ` peterz
2020-08-28  9:31       ` Ahmed S. Darwish
2020-08-28 14:36         ` Ahmed S. Darwish
2020-08-28  1:07   ` [PATCH v1 5/5] seqlock: PREEMPT_RT: Do not starve seqlock_t writers Ahmed S. Darwish
2020-09-04  6:52   ` [PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support peterz
2020-09-04  7:30     ` Ahmed S. Darwish
2020-09-10 15:08 ` [tip: locking/core] seqlock: seqcount_LOCKNAME_t: " tip-bot2 for Ahmed S. Darwish
