linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] Detect SRCU related deadlocks
@ 2023-01-13  6:59 Boqun Feng
  2023-01-13  6:59 ` [PATCH 1/3] locking/lockdep: Introduce lock_sync() Boqun Feng
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Boqun Feng @ 2023-01-13  6:59 UTC (permalink / raw)
  To: linux-kernel, rcu, kvm
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, Lai Jiangshan, Paul E. McKenney, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, David Woodhouse,
	Paolo Bonzini, seanjc, Joel Fernandes, Matthew Wilcox,
	Michal Luczaj

This is actually a leftover of the recursive read deadlock detection
patchset:

	https://lore.kernel.org/lkml/20180411135647.21496-1-boqun.feng@gmail.com/

I have resolved the comments from back then and added more test cases,
and hopefully this can fulfill the request from KVM:

	https://lore.kernel.org/lkml/a14a13a690277d4cc95a4b26aa2d9a4d9b392a74.camel@infradead.org/

;-)

Patch #3 is currently WIP for two reasons:

*	It may conflict with Paul's patchset that removes CONFIG_SRCU

*	I haven't found a proper way to "reinit" an srcu_struct when
	the lockdep selftest runs: cleanup_srcu_struct() needs a
	workqueue, but the tests can run before one exists.

Anyway, these selftests prove the detection actually works. As always,
feedback and comments are welcome!

Regards,
Boqun

Boqun Feng (3):
  locking/lockdep: Introduce lock_sync()
  rcu: Equip sleepable RCU with lockdep dependency graph checks
  WIP: locking/lockdep: selftests: Add selftests for SRCU

 include/linux/lockdep.h  |  5 +++
 include/linux/srcu.h     | 23 +++++++++++--
 kernel/locking/lockdep.c | 34 +++++++++++++++++++
 kernel/rcu/srcutiny.c    |  2 ++
 kernel/rcu/srcutree.c    |  2 ++
 lib/locking-selftest.c   | 71 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 135 insertions(+), 2 deletions(-)

-- 
2.38.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/3] locking/lockdep: Introduce lock_sync()
  2023-01-13  6:59 [PATCH 0/3] Detect SRCU related deadlocks Boqun Feng
@ 2023-01-13  6:59 ` Boqun Feng
  2023-01-16 21:56   ` Waiman Long
  2023-01-13  6:59 ` [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks Boqun Feng
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Boqun Feng @ 2023-01-13  6:59 UTC (permalink / raw)
  To: linux-kernel, rcu, kvm
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, Lai Jiangshan, Paul E. McKenney, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, David Woodhouse,
	Paolo Bonzini, seanjc, Joel Fernandes, Matthew Wilcox,
	Michal Luczaj

Currently, in order to annotate functions like synchronize_srcu() for
lockdep, the following trick can be used:

	lock_acquire();
	lock_release();

, which indicates that synchronize_srcu() acts like an empty critical
section that waits for other (read-side) critical sections to finish.
This can surely catch some deadlocks, but as a discussion brought up by
Paul McKenney [1] shows, it could introduce false positives because of
the irq-safe/unsafe detection. Extra tricks might help with this:

	local_irq_disable(..);
	lock_acquire();
	lock_release();
	local_irq_enable(...);

But it would be better if lockdep provided an annotation for
synchronize_srcu()-like functions, so that people don't need to repeat
the ugly trick above. Therefore introduce lock_sync(). It is simply a
lock+unlock pair with no irq safe/unsafe deadlock check, since the
to-be-annotated functions don't create real critical sections, and
therefore there is no way that an irq can create extra dependencies.
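
For illustration, a wait-for-readers primitive could then be annotated
roughly as follows (a minimal sketch, not part of this patch; my_key,
my_dep_map, my_synchronize() and my_wait_for_readers() are made-up
names):

	static struct lock_class_key my_key;
	static struct lockdep_map my_dep_map =
		STATIC_LOCKDEP_MAP_INIT("my_sync", &my_key);

	void my_synchronize(void)
	{
		/* acquire+release pair, no real critical section entered */
		lock_map_sync(&my_dep_map);

		my_wait_for_readers();
	}

The read side would be annotated with lock_map_acquire_read() and
lock_map_release() as usual, which is what the SRCU patch in this
series does via srcu_lock_acquire()/srcu_lock_release().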

[1]: https://lore.kernel.org/lkml/20180412021233.ewncg5jjuzjw3x62@tardis/

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 include/linux/lockdep.h  |  5 +++++
 kernel/locking/lockdep.c | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 1f1099dac3f0..ba09df6a0872 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -268,6 +268,10 @@ extern void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 
 extern void lock_release(struct lockdep_map *lock, unsigned long ip);
 
+extern void lock_sync(struct lockdep_map *lock, unsigned int subclass,
+		      int read, int check, struct lockdep_map *nest_lock,
+		      unsigned long ip);
+
 /* lock_is_held_type() returns */
 #define LOCK_STATE_UNKNOWN	-1
 #define LOCK_STATE_NOT_HELD	0
@@ -555,6 +559,7 @@ do {									\
 #define lock_map_acquire_read(l)		lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_)
 #define lock_map_acquire_tryread(l)		lock_acquire_shared_recursive(l, 0, 1, NULL, _THIS_IP_)
 #define lock_map_release(l)			lock_release(l, _THIS_IP_)
+#define lock_map_sync(l)			lock_sync(l, 0, 0, 1, NULL, _THIS_IP_)
 
 #ifdef CONFIG_PROVE_LOCKING
 # define might_lock(lock)						\
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e3375bc40dad..cffa026a765f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -5692,6 +5692,40 @@ void lock_release(struct lockdep_map *lock, unsigned long ip)
 }
 EXPORT_SYMBOL_GPL(lock_release);
 
+/*
+ * lock_sync() - A special annotation for synchronize_{s,}rcu()-like API.
+ *
+ * No actual critical section is created by the APIs annotated with this: these
+ * APIs are used to wait for one or multiple critical sections (on other CPUs
+ * or threads), which means that calling these APIs inside one of those
+ * critical sections is a potential deadlock.
+ *
+ * This annotation acts as an acquire+release annotation pair with hardirqs_off
+ * being 1. Since there's no critical section, no interrupt can create extra
+ * dependencies "inside" the annotation, so hardirqs_off == 1 allows us to
+ * avoid false positives.
+ */
+void lock_sync(struct lockdep_map *lock, unsigned int subclass, int read,
+	       int check, struct lockdep_map *nest_lock, unsigned long ip)
+{
+	unsigned long flags;
+
+	if (unlikely(!lockdep_enabled()))
+		return;
+
+	raw_local_irq_save(flags);
+	check_flags(flags);
+
+	lockdep_recursion_inc();
+	__lock_acquire(lock, subclass, 0, read, check, 1, nest_lock, ip, 0, 0);
+
+	if (__lock_release(lock, ip))
+		check_chain_key(current);
+	lockdep_recursion_finish();
+	raw_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(lock_sync);
+
 noinstr int lock_is_held_type(const struct lockdep_map *lock, int read)
 {
 	unsigned long flags;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks
  2023-01-13  6:59 [PATCH 0/3] Detect SRCU related deadlocks Boqun Feng
  2023-01-13  6:59 ` [PATCH 1/3] locking/lockdep: Introduce lock_sync() Boqun Feng
@ 2023-01-13  6:59 ` Boqun Feng
  2023-01-13 11:29   ` Paul E. McKenney
  2023-01-16 22:01   ` Waiman Long
  2023-01-13  6:59 ` [PATCH 3/3] WIP: locking/lockdep: selftests: Add selftests for SRCU Boqun Feng
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Boqun Feng @ 2023-01-13  6:59 UTC (permalink / raw)
  To: linux-kernel, rcu, kvm
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, Lai Jiangshan, Paul E. McKenney, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, David Woodhouse,
	Paolo Bonzini, seanjc, Joel Fernandes, Matthew Wilcox,
	Michal Luczaj

Although all flavors of RCU are correctly annotated with lockdep as
recursive read locks, the 'check' parameter of their lock_acquire()
calls is unset. That means RCU read locks are not added into the
lockdep dependency graph, so deadlock detection based on the dependency
graph won't catch deadlocks caused by RCU. This is fine for
"non-sleepable" RCU flavors, since wait-context detection and other
context-based detection can catch those deadlocks. However, for
sleepable RCU this is limited.

We can actually detect the deadlocks caused by SRCU by 1) making
srcu_read_lock() a 'check'ed recursive read lock and 2) making
synchronize_srcu() an empty write-lock critical section. Even better,
with the newly introduced lock_sync(), we can avoid false positives
about irq-unsafe/safe. So do it.

Note that NMI-safe SRCU read-side critical sections are currently not
annotated, since a step-by-step approach helps us deal with false
positives. These may be annotated in the future.
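
As a concrete example, with these annotations lockdep can now report
the SRCU vs. mutex pattern below (a minimal sketch assuming a
DEFINE_MUTEX(my_mutex) and DEFINE_SRCU(my_srcu); the srcu_mutex_ABBA
case in the selftest patch exercises the same shape):

	/* Thread 1: establishes a mutex -> srcu dependency. */
	mutex_lock(&my_mutex);
	synchronize_srcu(&my_srcu);
	mutex_unlock(&my_mutex);

	/* Thread 2: establishes an srcu -> mutex dependency, closing the cycle. */
	idx = srcu_read_lock(&my_srcu);
	mutex_lock(&my_mutex);
	mutex_unlock(&my_mutex);
	srcu_read_unlock(&my_srcu, idx);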

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 include/linux/srcu.h  | 23 +++++++++++++++++++++--
 kernel/rcu/srcutiny.c |  2 ++
 kernel/rcu/srcutree.c |  2 ++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 9b9d0bbf1d3c..a1595f8c5155 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -102,6 +102,21 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
 	return lock_is_held(&ssp->dep_map);
 }
 
+static inline void srcu_lock_acquire(struct lockdep_map *map)
+{
+	lock_map_acquire_read(map);
+}
+
+static inline void srcu_lock_release(struct lockdep_map *map)
+{
+	lock_map_release(map);
+}
+
+static inline void srcu_lock_sync(struct lockdep_map *map)
+{
+	lock_map_sync(map);
+}
+
 #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 
 static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
@@ -109,6 +124,10 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
 	return 1;
 }
 
+#define srcu_lock_acquire(m) do { } while (0)
+#define srcu_lock_release(m) do { } while (0)
+#define srcu_lock_sync(m) do { } while (0)
+
 #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 
 #define SRCU_NMI_UNKNOWN	0x0
@@ -182,7 +201,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
 
 	srcu_check_nmi_safety(ssp, false);
 	retval = __srcu_read_lock(ssp);
-	rcu_lock_acquire(&(ssp)->dep_map);
+	srcu_lock_acquire(&(ssp)->dep_map);
 	return retval;
 }
 
@@ -226,7 +245,7 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
 {
 	WARN_ON_ONCE(idx & ~0x1);
 	srcu_check_nmi_safety(ssp, false);
-	rcu_lock_release(&(ssp)->dep_map);
+	srcu_lock_release(&(ssp)->dep_map);
 	__srcu_read_unlock(ssp, idx);
 }
 
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index b12fb0cec44d..336af24e0fe3 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -197,6 +197,8 @@ void synchronize_srcu(struct srcu_struct *ssp)
 {
 	struct rcu_synchronize rs;
 
+	srcu_lock_sync(&ssp->dep_map);
+
 	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
 			lock_is_held(&rcu_bh_lock_map) ||
 			lock_is_held(&rcu_lock_map) ||
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index ca4b5dcec675..408088c73e0e 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
 {
 	struct rcu_synchronize rcu;
 
+	srcu_lock_sync(&ssp->dep_map);
+
 	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
 			 lock_is_held(&rcu_bh_lock_map) ||
 			 lock_is_held(&rcu_lock_map) ||
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/3] WIP: locking/lockdep: selftests: Add selftests for SRCU
  2023-01-13  6:59 [PATCH 0/3] Detect SRCU related deadlocks Boqun Feng
  2023-01-13  6:59 ` [PATCH 1/3] locking/lockdep: Introduce lock_sync() Boqun Feng
  2023-01-13  6:59 ` [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks Boqun Feng
@ 2023-01-13  6:59 ` Boqun Feng
  2023-01-13 12:46 ` [PATCH 0/3] KVM: Make use of SRCU deadlock detection support David Woodhouse
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Boqun Feng @ 2023-01-13  6:59 UTC (permalink / raw)
  To: linux-kernel, rcu, kvm
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, Lai Jiangshan, Paul E. McKenney, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, David Woodhouse,
	Paolo Bonzini, seanjc, Joel Fernandes, Matthew Wilcox,
	Michal Luczaj

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 lib/locking-selftest.c | 71 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 8d24279fad05..5fc206a2f9f1 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -60,6 +60,7 @@ __setup("debug_locks_verbose=", setup_debug_locks_verbose);
 #define LOCKTYPE_RTMUTEX 0x20
 #define LOCKTYPE_LL	0x40
 #define LOCKTYPE_SPECIAL 0x80
+#define LOCKTYPE_SRCU	0x100
 
 static struct ww_acquire_ctx t, t2;
 static struct ww_mutex o, o2, o3;
@@ -100,6 +101,13 @@ static DEFINE_RT_MUTEX(rtmutex_D);
 
 #endif
 
+#ifdef CONFIG_SRCU
+static struct lock_class_key srcu_A_key;
+static struct lock_class_key srcu_B_key;
+static struct srcu_struct srcu_A;
+static struct srcu_struct srcu_B;
+#endif
+
 /*
  * Locks that we initialize dynamically as well so that
  * e.g. X1 and X2 becomes two instances of the same class,
@@ -1418,6 +1426,12 @@ static void reset_locks(void)
 	memset(&ww_lockdep.acquire_key, 0, sizeof(ww_lockdep.acquire_key));
 	memset(&ww_lockdep.mutex_key, 0, sizeof(ww_lockdep.mutex_key));
 	local_irq_enable();
+
+#ifdef CONFIG_SRCU
+	__init_srcu_struct(&srcu_A, "srcuA", &srcu_A_key);
+	__init_srcu_struct(&srcu_B, "srcuB", &srcu_B_key);
+#endif
+
 }
 
 #undef I
@@ -2360,6 +2374,58 @@ static void ww_tests(void)
 	pr_cont("\n");
 }
 
+static void srcu_ABBA(void)
+{
+	int ia, ib;
+
+	ia = srcu_read_lock(&srcu_A);
+	synchronize_srcu(&srcu_B);
+	srcu_read_unlock(&srcu_A, ia);
+
+	ib = srcu_read_lock(&srcu_B);
+	synchronize_srcu(&srcu_A);
+	srcu_read_unlock(&srcu_B, ib); // should fail
+}
+
+static void srcu_mutex_ABBA(void)
+{
+	int ia;
+
+	mutex_lock(&mutex_A);
+	synchronize_srcu(&srcu_A);
+	mutex_unlock(&mutex_A);
+
+	ia = srcu_read_lock(&srcu_A);
+	mutex_lock(&mutex_A);
+	mutex_unlock(&mutex_A);
+	srcu_read_unlock(&srcu_A, ia); // should fail
+}
+
+static void srcu_irqsafe(void)
+{
+	int ia;
+
+	HARDIRQ_ENTER();
+	ia = srcu_read_lock(&srcu_A);
+	srcu_read_unlock(&srcu_A, ia);
+	HARDIRQ_EXIT();
+
+	synchronize_srcu(&srcu_A); // should NOT fail
+}
+
+static void srcu_tests(void)
+{
+	printk("  --------------------------------------------------------------------------\n");
+	printk("  | SRCU tests |\n");
+	printk("  ---------------\n");
+	print_testname("ABBA read-sync/read-sync");
+	dotest(srcu_ABBA, FAILURE, LOCKTYPE_SRCU);
+	print_testname("ABBA mutex-sync/read-mutex");
+	dotest(srcu_mutex_ABBA, FAILURE, LOCKTYPE_SRCU);
+	print_testname("Irqsafe synchronize_srcu");
+	dotest(srcu_irqsafe, SUCCESS, LOCKTYPE_SRCU);
+	pr_cont("\n");
+}
 
 /*
  * <in hardirq handler>
@@ -2881,6 +2947,10 @@ void locking_selftest(void)
 	printk("  --------------------------------------------------------------------------\n");
 
 	init_shared_classes();
+#ifdef CONFIG_SRCU
+	__init_srcu_struct(&srcu_A, "srcuA", &srcu_A_key);
+	__init_srcu_struct(&srcu_B, "srcuB", &srcu_B_key);
+#endif
 	lockdep_set_selftest_task(current);
 
 	DO_TESTCASE_6R("A-A deadlock", AA);
@@ -2965,6 +3035,7 @@ void locking_selftest(void)
 	DO_TESTCASE_6x2x2RW("irq read-recursion #3", irq_read_recursion3);
 
 	ww_tests();
+	srcu_tests();
 
 	force_read_lock_recursive = 0;
 	/*
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks
  2023-01-13  6:59 ` [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks Boqun Feng
@ 2023-01-13 11:29   ` Paul E. McKenney
  2023-01-13 18:05     ` Boqun Feng
  2023-01-16 22:01   ` Waiman Long
  1 sibling, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2023-01-13 11:29 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, rcu, kvm, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Lai Jiangshan, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, David Woodhouse, Paolo Bonzini, seanjc,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj

On Thu, Jan 12, 2023 at 10:59:54PM -0800, Boqun Feng wrote:
> Although all flavors of RCU are correctly annotated with lockdep as
> recursive read locks, the 'check' parameter of their lock_acquire()
> calls is unset. That means RCU read locks are not added into the
> lockdep dependency graph, so deadlock detection based on the dependency
> graph won't catch deadlocks caused by RCU. This is fine for
> "non-sleepable" RCU flavors, since wait-context detection and other
> context-based detection can catch those deadlocks. However, for
> sleepable RCU this is limited.
> 
> We can actually detect the deadlocks caused by SRCU by 1) making
> srcu_read_lock() a 'check'ed recursive read lock and 2) making
> synchronize_srcu() an empty write-lock critical section. Even better,
> with the newly introduced lock_sync(), we can avoid false positives
> about irq-unsafe/safe. So do it.
> 
> Note that NMI-safe SRCU read-side critical sections are currently not
> annotated, since a step-by-step approach helps us deal with false
> positives. These may be annotated in the future.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Nice, thank you!!!

Acked-by: Paul E. McKenney <paulmck@kernel.org>

Or if you would prefer that I take the series through -rcu, please just
let me know.

							Thanx, Paul

> ---
>  include/linux/srcu.h  | 23 +++++++++++++++++++++--
>  kernel/rcu/srcutiny.c |  2 ++
>  kernel/rcu/srcutree.c |  2 ++
>  3 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> index 9b9d0bbf1d3c..a1595f8c5155 100644
> --- a/include/linux/srcu.h
> +++ b/include/linux/srcu.h
> @@ -102,6 +102,21 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
>  	return lock_is_held(&ssp->dep_map);
>  }
>  
> +static inline void srcu_lock_acquire(struct lockdep_map *map)
> +{
> +	lock_map_acquire_read(map);
> +}
> +
> +static inline void srcu_lock_release(struct lockdep_map *map)
> +{
> +	lock_map_release(map);
> +}
> +
> +static inline void srcu_lock_sync(struct lockdep_map *map)
> +{
> +	lock_map_sync(map);
> +}
> +
>  #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>  
>  static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> @@ -109,6 +124,10 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
>  	return 1;
>  }
>  
> +#define srcu_lock_acquire(m) do { } while (0)
> +#define srcu_lock_release(m) do { } while (0)
> +#define srcu_lock_sync(m) do { } while (0)
> +
>  #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>  
>  #define SRCU_NMI_UNKNOWN	0x0
> @@ -182,7 +201,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
>  
>  	srcu_check_nmi_safety(ssp, false);
>  	retval = __srcu_read_lock(ssp);
> -	rcu_lock_acquire(&(ssp)->dep_map);
> +	srcu_lock_acquire(&(ssp)->dep_map);
>  	return retval;
>  }
>  
> @@ -226,7 +245,7 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
>  {
>  	WARN_ON_ONCE(idx & ~0x1);
>  	srcu_check_nmi_safety(ssp, false);
> -	rcu_lock_release(&(ssp)->dep_map);
> +	srcu_lock_release(&(ssp)->dep_map);
>  	__srcu_read_unlock(ssp, idx);
>  }
>  
> diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> index b12fb0cec44d..336af24e0fe3 100644
> --- a/kernel/rcu/srcutiny.c
> +++ b/kernel/rcu/srcutiny.c
> @@ -197,6 +197,8 @@ void synchronize_srcu(struct srcu_struct *ssp)
>  {
>  	struct rcu_synchronize rs;
>  
> +	srcu_lock_sync(&ssp->dep_map);
> +
>  	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
>  			lock_is_held(&rcu_bh_lock_map) ||
>  			lock_is_held(&rcu_lock_map) ||
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index ca4b5dcec675..408088c73e0e 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
>  {
>  	struct rcu_synchronize rcu;
>  
> +	srcu_lock_sync(&ssp->dep_map);
> +
>  	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
>  			 lock_is_held(&rcu_bh_lock_map) ||
>  			 lock_is_held(&rcu_lock_map) ||
> -- 
> 2.38.1
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 0/3] KVM: Make use of SRCU deadlock detection support
  2023-01-13  6:59 [PATCH 0/3] Detect SRCU related deadlocks Boqun Feng
                   ` (2 preceding siblings ...)
  2023-01-13  6:59 ` [PATCH 3/3] WIP: locking/lockdep: selftests: Add selftests for SRCU Boqun Feng
@ 2023-01-13 12:46 ` David Woodhouse
  2023-01-13 12:46   ` [PATCH 1/3] KVM: Show lockdep the kvm->mutex vs. kvm->srcu ordering rule David Woodhouse
                     ` (3 more replies)
       [not found] ` <20230113130330.1027-1-hdanton@sina.com>
  2023-01-13 23:57 ` [PATCH 4/3] locking/lockdep: Improve the deadlock scenario print for sync and read lock Boqun Feng
  5 siblings, 4 replies; 27+ messages in thread
From: David Woodhouse @ 2023-01-13 12:46 UTC (permalink / raw)
  To: Boqun Feng, linux-kernel, rcu, kvm
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Lai Jiangshan, Paul E . McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, David Woodhouse, Paolo Bonzini, seanjc,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj


On Thu, 2023-01-12 at 22:59 -0800, Boqun Feng wrote:
> This is actually a leftover of the recursive read deadlock detection
> patchset:
> 
>         https://lore.kernel.org/lkml/20180411135647.21496-1-boqun.feng@gmail.com/
> 
> I have resolved the comments from back then and added more test cases,
> and hopefully this can fulfill the request from KVM:
> 
>         https://lore.kernel.org/lkml/a14a13a690277d4cc95a4b26aa2d9a4d9b392a74.camel@infradead.org/
> 
> ;-)

It definitely seems to work; thank you! I can revert some of the recent 
fixes from the KVM tree, apply your patches, and then I can trigger the 
lockdep warnings. To make it trigger reliably, we need to artificially
call synchronize_srcu(&kvm->srcu) under kvm->lock on KVM init, because
the circumstances under which that ordering happens naturally are a bit
esoteric and don't always occur, so lockdep wouldn't otherwise notice.
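
Concretely, that artificial call is the one the first patch below adds
to kvm_create_vm() under CONFIG_LOCKDEP:

	mutex_lock(&kvm->lock);
	synchronize_srcu(&kvm->srcu);
	mutex_unlock(&kvm->lock);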

> Patch #3 is currently WIP for two reasons:
> 
> *       It may conflict with Paul's patchset that removes CONFIG_SRCU
> 
> *       I haven't found a proper way to "reinit" an srcu_struct when
>         the lockdep selftest runs: cleanup_srcu_struct() needs a
>         workqueue, but the tests can run before one exists.

Understood. I think the KVM series which follows can stand alone and go 
via the KVM tree separately. As and when your series gets merged, it'll 
serve to protect against regressions.

Thanks again!

David Woodhouse (3):
      KVM: Show lockdep the kvm->mutex vs. kvm->srcu ordering rule
      KVM: selftests: Use enum for test numbers in xen_shinfo_test
      KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test

 .../testing/selftests/kvm/x86_64/xen_shinfo_test.c | 165 ++++++++++++++-------
 virt/kvm/kvm_main.c                                |  10 ++
 2 files changed, 124 insertions(+), 51 deletions(-)



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/3] KVM: Show lockdep the kvm->mutex vs. kvm->srcu ordering rule
  2023-01-13 12:46 ` [PATCH 0/3] KVM: Make use of SRCU deadlock detection support David Woodhouse
@ 2023-01-13 12:46   ` David Woodhouse
  2023-01-13 12:46   ` [PATCH 2/3] KVM: selftests: Use enum for test numbers in xen_shinfo_test David Woodhouse
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: David Woodhouse @ 2023-01-13 12:46 UTC (permalink / raw)
  To: Boqun Feng, linux-kernel, rcu, kvm
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Lai Jiangshan, Paul E . McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, David Woodhouse, Paolo Bonzini, seanjc,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj

From: David Woodhouse <dwmw@amazon.co.uk>

Lockdep is learning to spot deadlocks with sleepable RCU vs. mutexes,
which can occur when one code path calls synchronize_srcu() with a
mutex held, while another code path attempts to obtain the same mutex
while in a read-side section.

Since lockdep isn't very good at reading the English prose in
Documentation/virt/kvm/locking.rst, give it a demonstration by calling
synchronize_srcu(&kvm->srcu) while holding kvm->lock in kvm_create_vm().
The cases where this happens naturally are relatively esoteric and may
not happen otherwise.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 virt/kvm/kvm_main.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 13e88297f999..285b3c5a6364 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1173,6 +1173,16 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	if (init_srcu_struct(&kvm->irq_srcu))
 		goto out_err_no_irq_srcu;
 
+#ifdef CONFIG_LOCKDEP
+	/*
+	 * Ensure lockdep knows that it's not permitted to lock kvm->lock
+	 * from a SRCU read section on kvm->srcu.
+	 */
+	mutex_lock(&kvm->lock);
+	synchronize_srcu(&kvm->srcu);
+	mutex_unlock(&kvm->lock);
+#endif
+
 	refcount_set(&kvm->users_count, 1);
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
 		for (j = 0; j < 2; j++) {
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/3] KVM: selftests: Use enum for test numbers in xen_shinfo_test
  2023-01-13 12:46 ` [PATCH 0/3] KVM: Make use of SRCU deadlock detection support David Woodhouse
  2023-01-13 12:46   ` [PATCH 1/3] KVM: Show lockdep the kvm->mutex vs. kvm->srcu ordering rule David Woodhouse
@ 2023-01-13 12:46   ` David Woodhouse
  2023-01-13 17:13     ` David Woodhouse
  2023-01-13 12:46   ` [PATCH 3/3] KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test David Woodhouse
  2023-02-04  2:34   ` [PATCH 0/3] KVM: Make use of SRCU deadlock detection support Sean Christopherson
  3 siblings, 1 reply; 27+ messages in thread
From: David Woodhouse @ 2023-01-13 12:46 UTC (permalink / raw)
  To: Boqun Feng, linux-kernel, rcu, kvm
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Lai Jiangshan, Paul E . McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, David Woodhouse, Paolo Bonzini, seanjc,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj

From: David Woodhouse <dwmw@amazon.co.uk>

The xen_shinfo_test started off with very few iterations, and the numbers
we used in GUEST_SYNC() mapped precisely to the RUNSTATE_xxx values
anyway.

It has since grown quite a few more tests, and it's kind of awful to be
handling them all as bare numbers, especially when I want to add a new
test in the middle. Define an enum for the test stages, and use it both
in the guest code and in the host switch statement.

No functional change, if I can count to 24.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 .../selftests/kvm/x86_64/xen_shinfo_test.c    | 134 +++++++++++-------
 1 file changed, 83 insertions(+), 51 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
index 721f6a693799..644d614a9965 100644
--- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -44,6 +44,36 @@
 #define EVTCHN_TEST2 66
 #define EVTCHN_TIMER 13
 
+enum {
+	TEST_INJECT_VECTOR = 0,
+	TEST_RUNSTATE_runnable,
+	TEST_RUNSTATE_blocked,
+	TEST_RUNSTATE_offline,
+	TEST_RUNSTATE_ADJUST,
+	TEST_RUNSTATE_DATA,
+	TEST_STEAL_TIME,
+	TEST_EVTCHN_MASKED,
+	TEST_EVTCHN_UNMASKED,
+	TEST_EVTCHN_SLOWPATH,
+	TEST_EVTCHN_SEND_IOCTL,
+	TEST_EVTCHN_HCALL,
+	TEST_EVTCHN_HCALL_EVENTFD,
+	TEST_TIMER_SETUP,
+	TEST_TIMER_WAIT,
+	TEST_TIMER_RESTORE,
+	TEST_POLL_READY,
+	TEST_POLL_TIMEOUT,
+	TEST_POLL_MASKED,
+	TEST_POLL_WAKE,
+	TEST_TIMER_PAST,
+	TEST_LOCKING_SEND_RACE,
+	TEST_LOCKING_POLL_RACE,
+	TEST_LOCKING_POLL_TIMEOUT,
+	TEST_DONE,
+
+	TEST_GUEST_SAW_IRQ,
+};
+
 #define XEN_HYPERCALL_MSR	0x40000000
 
 #define MIN_STEAL_TIME		50000
@@ -147,7 +177,7 @@ static void evtchn_handler(struct ex_regs *regs)
 	vi->evtchn_pending_sel = 0;
 	guest_saw_irq = true;
 
-	GUEST_SYNC(0x20);
+	GUEST_SYNC(TEST_GUEST_SAW_IRQ);
 }
 
 static void guest_wait_for_irq(void)
@@ -168,41 +198,41 @@ static void guest_code(void)
 	);
 
 	/* Trigger an interrupt injection */
-	GUEST_SYNC(0);
+	GUEST_SYNC(TEST_INJECT_VECTOR);
 
 	guest_wait_for_irq();
 
 	/* Test having the host set runstates manually */
-	GUEST_SYNC(RUNSTATE_runnable);
+	GUEST_SYNC(TEST_RUNSTATE_runnable);
 	GUEST_ASSERT(rs->time[RUNSTATE_runnable] != 0);
 	GUEST_ASSERT(rs->state == 0);
 
-	GUEST_SYNC(RUNSTATE_blocked);
+	GUEST_SYNC(TEST_RUNSTATE_blocked);
 	GUEST_ASSERT(rs->time[RUNSTATE_blocked] != 0);
 	GUEST_ASSERT(rs->state == 0);
 
-	GUEST_SYNC(RUNSTATE_offline);
+	GUEST_SYNC(TEST_RUNSTATE_offline);
 	GUEST_ASSERT(rs->time[RUNSTATE_offline] != 0);
 	GUEST_ASSERT(rs->state == 0);
 
 	/* Test runstate time adjust */
-	GUEST_SYNC(4);
+	GUEST_SYNC(TEST_RUNSTATE_ADJUST);
 	GUEST_ASSERT(rs->time[RUNSTATE_blocked] == 0x5a);
 	GUEST_ASSERT(rs->time[RUNSTATE_offline] == 0x6b6b);
 
 	/* Test runstate time set */
-	GUEST_SYNC(5);
+	GUEST_SYNC(TEST_RUNSTATE_DATA);
 	GUEST_ASSERT(rs->state_entry_time >= 0x8000);
 	GUEST_ASSERT(rs->time[RUNSTATE_runnable] == 0);
 	GUEST_ASSERT(rs->time[RUNSTATE_blocked] == 0x6b6b);
 	GUEST_ASSERT(rs->time[RUNSTATE_offline] == 0x5a);
 
 	/* sched_yield() should result in some 'runnable' time */
-	GUEST_SYNC(6);
+	GUEST_SYNC(TEST_STEAL_TIME);
 	GUEST_ASSERT(rs->time[RUNSTATE_runnable] >= MIN_STEAL_TIME);
 
 	/* Attempt to deliver a *masked* interrupt */
-	GUEST_SYNC(7);
+	GUEST_SYNC(TEST_EVTCHN_MASKED);
 
 	/* Wait until we see the bit set */
 	struct shared_info *si = (void *)SHINFO_VADDR;
@@ -210,21 +240,21 @@ static void guest_code(void)
 		__asm__ __volatile__ ("rep nop" : : : "memory");
 
 	/* Now deliver an *unmasked* interrupt */
-	GUEST_SYNC(8);
+	GUEST_SYNC(TEST_EVTCHN_UNMASKED);
 
 	guest_wait_for_irq();
 
 	/* Change memslots and deliver an interrupt */
-	GUEST_SYNC(9);
+	GUEST_SYNC(TEST_EVTCHN_SLOWPATH);
 
 	guest_wait_for_irq();
 
 	/* Deliver event channel with KVM_XEN_HVM_EVTCHN_SEND */
-	GUEST_SYNC(10);
+	GUEST_SYNC(TEST_EVTCHN_SEND_IOCTL);
 
 	guest_wait_for_irq();
 
-	GUEST_SYNC(11);
+	GUEST_SYNC(TEST_EVTCHN_HCALL);
 
 	/* Our turn. Deliver event channel (to ourselves) with
 	 * EVTCHNOP_send hypercall. */
@@ -240,7 +270,7 @@ static void guest_code(void)
 
 	guest_wait_for_irq();
 
-	GUEST_SYNC(12);
+	GUEST_SYNC(TEST_EVTCHN_HCALL_EVENTFD);
 
 	/* Deliver "outbound" event channel to an eventfd which
 	 * happens to be one of our own irqfds. */
@@ -255,7 +285,7 @@ static void guest_code(void)
 
 	guest_wait_for_irq();
 
-	GUEST_SYNC(13);
+	GUEST_SYNC(TEST_TIMER_SETUP);
 
 	/* Set a timer 100ms in the future. */
 	__asm__ __volatile__ ("vmcall" :
@@ -264,17 +294,17 @@ static void guest_code(void)
 			      "D" (rs->state_entry_time + 100000000));
 	GUEST_ASSERT(rax == 0);
 
-	GUEST_SYNC(14);
+	GUEST_SYNC(TEST_TIMER_WAIT);
 
 	/* Now wait for the timer */
 	guest_wait_for_irq();
 
-	GUEST_SYNC(15);
+	GUEST_SYNC(TEST_TIMER_RESTORE);
 
 	/* The host has 'restored' the timer. Just wait for it. */
 	guest_wait_for_irq();
 
-	GUEST_SYNC(16);
+	GUEST_SYNC(TEST_POLL_READY);
 
 	/* Poll for an event channel port which is already set */
 	u32 ports[1] = { EVTCHN_TIMER };
@@ -292,7 +322,7 @@ static void guest_code(void)
 
 	GUEST_ASSERT(rax == 0);
 
-	GUEST_SYNC(17);
+	GUEST_SYNC(TEST_POLL_TIMEOUT);
 
 	/* Poll for an unset port and wait for the timeout. */
 	p.timeout = 100000000;
@@ -304,7 +334,7 @@ static void guest_code(void)
 
 	GUEST_ASSERT(rax == 0);
 
-	GUEST_SYNC(18);
+	GUEST_SYNC(TEST_POLL_MASKED);
 
 	/* A timer will wake the masked port we're waiting on, while we poll */
 	p.timeout = 0;
@@ -316,7 +346,7 @@ static void guest_code(void)
 
 	GUEST_ASSERT(rax == 0);
 
-	GUEST_SYNC(19);
+	GUEST_SYNC(TEST_POLL_WAKE);
 
 	/* A timer wake an *unmasked* port which should wake us with an
 	 * actual interrupt, while we're polling on a different port. */
@@ -332,17 +362,17 @@ static void guest_code(void)
 
 	guest_wait_for_irq();
 
-	GUEST_SYNC(20);
+	GUEST_SYNC(TEST_TIMER_PAST);
 
 	/* Timer should have fired already */
 	guest_wait_for_irq();
 
-	GUEST_SYNC(21);
+	GUEST_SYNC(TEST_LOCKING_SEND_RACE);
 	/* Racing host ioctls */
 
 	guest_wait_for_irq();
 
-	GUEST_SYNC(22);
+	GUEST_SYNC(TEST_LOCKING_POLL_RACE);
 	/* Racing vmcall against host ioctl */
 
 	ports[0] = 0;
@@ -375,12 +405,12 @@ static void guest_code(void)
 	 * expiring while the event channel was invalid.
 	 */
 	if (!guest_saw_irq) {
-		GUEST_SYNC(23);
+		GUEST_SYNC(TEST_LOCKING_POLL_TIMEOUT);
 		goto wait_for_timer;
 	}
 	guest_saw_irq = false;
 
-	GUEST_SYNC(24);
+	GUEST_SYNC(TEST_DONE);
 }
 
 static int cmp_timespec(struct timespec *a, struct timespec *b)
@@ -439,6 +469,7 @@ int main(int argc, char *argv[])
 	bool verbose;
 	int ret;
 
+	printf("TEST_DONE is %d\n", TEST_DONE);
 	verbose = argc > 1 && (!strncmp(argv[1], "-v", 3) ||
 			       !strncmp(argv[1], "--verbose", 10));
 
@@ -649,25 +680,26 @@ int main(int argc, char *argv[])
 					    "runstate times don't add up");
 
 			switch (uc.args[1]) {
-			case 0:
+			case TEST_INJECT_VECTOR:
 				if (verbose)
 					printf("Delivering evtchn upcall\n");
 				evtchn_irq_expected = true;
 				vinfo->evtchn_upcall_pending = 1;
 				break;
 
-			case RUNSTATE_runnable...RUNSTATE_offline:
+			case TEST_RUNSTATE_runnable...TEST_RUNSTATE_offline:
 				TEST_ASSERT(!evtchn_irq_expected, "Event channel IRQ not seen");
 				if (!do_runstate_tests)
 					goto done;
 				if (verbose)
 					printf("Testing runstate %s\n", runstate_names[uc.args[1]]);
 				rst.type = KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT;
-				rst.u.runstate.state = uc.args[1];
+				rst.u.runstate.state = uc.args[1] + RUNSTATE_runnable -
+					TEST_RUNSTATE_runnable;
 				vcpu_ioctl(vcpu, KVM_XEN_VCPU_SET_ATTR, &rst);
 				break;
 
-			case 4:
+			case TEST_RUNSTATE_ADJUST:
 				if (verbose)
 					printf("Testing RUNSTATE_ADJUST\n");
 				rst.type = KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST;
@@ -682,7 +714,7 @@ int main(int argc, char *argv[])
 				vcpu_ioctl(vcpu, KVM_XEN_VCPU_SET_ATTR, &rst);
 				break;
 
-			case 5:
+			case TEST_RUNSTATE_DATA:
 				if (verbose)
 					printf("Testing RUNSTATE_DATA\n");
 				rst.type = KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA;
@@ -694,7 +726,7 @@ int main(int argc, char *argv[])
 				vcpu_ioctl(vcpu, KVM_XEN_VCPU_SET_ATTR, &rst);
 				break;
 
-			case 6:
+			case TEST_STEAL_TIME:
 				if (verbose)
 					printf("Testing steal time\n");
 				/* Yield until scheduler delay exceeds target */
@@ -704,7 +736,7 @@ int main(int argc, char *argv[])
 				} while (get_run_delay() < rundelay);
 				break;
 
-			case 7:
+			case TEST_EVTCHN_MASKED:
 				if (!do_eventfd_tests)
 					goto done;
 				if (verbose)
@@ -714,7 +746,7 @@ int main(int argc, char *argv[])
 				alarm(1);
 				break;
 
-			case 8:
+			case TEST_EVTCHN_UNMASKED:
 				if (verbose)
 					printf("Testing unmasked event channel\n");
 				/* Unmask that, but deliver the other one */
@@ -725,7 +757,7 @@ int main(int argc, char *argv[])
 				alarm(1);
 				break;
 
-			case 9:
+			case TEST_EVTCHN_SLOWPATH:
 				TEST_ASSERT(!evtchn_irq_expected,
 					    "Expected event channel IRQ but it didn't happen");
 				shinfo->evtchn_pending[1] = 0;
@@ -738,7 +770,7 @@ int main(int argc, char *argv[])
 				alarm(1);
 				break;
 
-			case 10:
+			case TEST_EVTCHN_SEND_IOCTL:
 				TEST_ASSERT(!evtchn_irq_expected,
 					    "Expected event channel IRQ but it didn't happen");
 				if (!do_evtchn_tests)
@@ -758,7 +790,7 @@ int main(int argc, char *argv[])
 				alarm(1);
 				break;
 
-			case 11:
+			case TEST_EVTCHN_HCALL:
 				TEST_ASSERT(!evtchn_irq_expected,
 					    "Expected event channel IRQ but it didn't happen");
 				shinfo->evtchn_pending[1] = 0;
@@ -769,7 +801,7 @@ int main(int argc, char *argv[])
 				alarm(1);
 				break;
 
-			case 12:
+			case TEST_EVTCHN_HCALL_EVENTFD:
 				TEST_ASSERT(!evtchn_irq_expected,
 					    "Expected event channel IRQ but it didn't happen");
 				shinfo->evtchn_pending[0] = 0;
@@ -780,7 +812,7 @@ int main(int argc, char *argv[])
 				alarm(1);
 				break;
 
-			case 13:
+			case TEST_TIMER_SETUP:
 				TEST_ASSERT(!evtchn_irq_expected,
 					    "Expected event channel IRQ but it didn't happen");
 				shinfo->evtchn_pending[1] = 0;
@@ -789,7 +821,7 @@ int main(int argc, char *argv[])
 					printf("Testing guest oneshot timer\n");
 				break;
 
-			case 14:
+			case TEST_TIMER_WAIT:
 				memset(&tmr, 0, sizeof(tmr));
 				tmr.type = KVM_XEN_VCPU_ATTR_TYPE_TIMER;
 				vcpu_ioctl(vcpu, KVM_XEN_VCPU_GET_ATTR, &tmr);
@@ -803,7 +835,7 @@ int main(int argc, char *argv[])
 				alarm(1);
 				break;
 
-			case 15:
+			case TEST_TIMER_RESTORE:
 				TEST_ASSERT(!evtchn_irq_expected,
 					    "Expected event channel IRQ but it didn't happen");
 				shinfo->evtchn_pending[0] = 0;
@@ -817,7 +849,7 @@ int main(int argc, char *argv[])
 				alarm(1);
 				break;
 
-			case 16:
+			case TEST_POLL_READY:
 				TEST_ASSERT(!evtchn_irq_expected,
 					    "Expected event channel IRQ but it didn't happen");
 
@@ -827,14 +859,14 @@ int main(int argc, char *argv[])
 				alarm(1);
 				break;
 
-			case 17:
+			case TEST_POLL_TIMEOUT:
 				if (verbose)
 					printf("Testing SCHEDOP_poll timeout\n");
 				shinfo->evtchn_pending[0] = 0;
 				alarm(1);
 				break;
 
-			case 18:
+			case TEST_POLL_MASKED:
 				if (verbose)
 					printf("Testing SCHEDOP_poll wake on masked event\n");
 
@@ -843,7 +875,7 @@ int main(int argc, char *argv[])
 				alarm(1);
 				break;
 
-			case 19:
+			case TEST_POLL_WAKE:
 				shinfo->evtchn_pending[0] = shinfo->evtchn_mask[0] = 0;
 				if (verbose)
 					printf("Testing SCHEDOP_poll wake on unmasked event\n");
@@ -860,7 +892,7 @@ int main(int argc, char *argv[])
 				alarm(1);
 				break;
 
-			case 20:
+			case TEST_TIMER_PAST:
 				TEST_ASSERT(!evtchn_irq_expected,
 					    "Expected event channel IRQ but it didn't happen");
 				/* Read timer and check it is no longer pending */
@@ -877,7 +909,7 @@ int main(int argc, char *argv[])
 				alarm(1);
 				break;
 
-			case 21:
+			case TEST_LOCKING_SEND_RACE:
 				TEST_ASSERT(!evtchn_irq_expected,
 					    "Expected event channel IRQ but it didn't happen");
 				alarm(0);
@@ -899,7 +931,7 @@ int main(int argc, char *argv[])
 					__vm_ioctl(vm, KVM_XEN_HVM_EVTCHN_SEND, &uxe);
 				break;
 
-			case 22:
+			case TEST_LOCKING_POLL_RACE:
 				TEST_ASSERT(!evtchn_irq_expected,
 					    "Expected event channel IRQ but it didn't happen");
 
@@ -914,7 +946,7 @@ int main(int argc, char *argv[])
 				vcpu_ioctl(vcpu, KVM_XEN_VCPU_SET_ATTR, &tmr);
 				break;
 
-			case 23:
+			case TEST_LOCKING_POLL_TIMEOUT:
 				/*
 				 * Optional and possibly repeated sync point.
 				 * Injecting the timer IRQ may fail if the
@@ -936,7 +968,7 @@ int main(int argc, char *argv[])
 							 SHINFO_RACE_TIMEOUT * 1000000000ULL;
 				vcpu_ioctl(vcpu, KVM_XEN_VCPU_SET_ATTR, &tmr);
 				break;
-			case 24:
+			case TEST_DONE:
 				TEST_ASSERT(!evtchn_irq_expected,
 					    "Expected event channel IRQ but it didn't happen");
 
@@ -947,7 +979,7 @@ int main(int argc, char *argv[])
 				TEST_ASSERT(ret == 0, "pthread_join() failed: %s", strerror(ret));
 				goto done;
 
-			case 0x20:
+			case TEST_GUEST_SAW_IRQ:
 				TEST_ASSERT(evtchn_irq_expected, "Unexpected event channel IRQ");
 				evtchn_irq_expected = false;
 				break;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/3] KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test
  2023-01-13 12:46 ` [PATCH 0/3] KVM: Make use of SRCU deadlock detection support David Woodhouse
  2023-01-13 12:46   ` [PATCH 1/3] KVM: Show lockdep the kvm->mutex vs. kvm->srcu ordering rule David Woodhouse
  2023-01-13 12:46   ` [PATCH 2/3] KVM: selftests: Use enum for test numbers in xen_shinfo_test David Woodhouse
@ 2023-01-13 12:46   ` David Woodhouse
  2023-02-04  2:32     ` Sean Christopherson
  2023-02-04  2:34   ` [PATCH 0/3] KVM: Make use of SRCU deadlock detection support Sean Christopherson
  3 siblings, 1 reply; 27+ messages in thread
From: David Woodhouse @ 2023-01-13 12:46 UTC (permalink / raw)
  To: Boqun Feng, linux-kernel, rcu, kvm
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Lai Jiangshan, Paul E . McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, David Woodhouse, Paolo Bonzini, seanjc,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj

From: David Woodhouse <dwmw@amazon.co.uk>

When kvm_xen_evtchn_send() takes the slow path because the shinfo GPC
needs to be revalidated, it used to violate the SRCU vs. kvm->lock
locking rules and potentially cause a deadlock.

Now that lockdep is learning to catch such things, make sure that code
path is exercised by the selftest.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 .../selftests/kvm/x86_64/xen_shinfo_test.c    | 31 +++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
index 644d614a9965..3adc2e11b094 100644
--- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -29,6 +29,9 @@
 #define DUMMY_REGION_GPA	(SHINFO_REGION_GPA + (3 * PAGE_SIZE))
 #define DUMMY_REGION_SLOT	11
 
+#define DUMMY_REGION_GPA_2	(SHINFO_REGION_GPA + (4 * PAGE_SIZE))
+#define DUMMY_REGION_SLOT_2	12
+
 #define SHINFO_ADDR	(SHINFO_REGION_GPA)
 #define VCPU_INFO_ADDR	(SHINFO_REGION_GPA + 0x40)
 #define PVTIME_ADDR	(SHINFO_REGION_GPA + PAGE_SIZE)
@@ -57,6 +60,7 @@ enum {
 	TEST_EVTCHN_SLOWPATH,
 	TEST_EVTCHN_SEND_IOCTL,
 	TEST_EVTCHN_HCALL,
+	TEST_EVTCHN_HCALL_SLOWPATH,
 	TEST_EVTCHN_HCALL_EVENTFD,
 	TEST_TIMER_SETUP,
 	TEST_TIMER_WAIT,
@@ -270,6 +274,20 @@ static void guest_code(void)
 
 	guest_wait_for_irq();
 
+	GUEST_SYNC(TEST_EVTCHN_HCALL_SLOWPATH);
+
+	/* Same again, but this time the host has messed with memslots
+	 * so it should take the slow path in kvm_xen_set_evtchn(). */
+	__asm__ __volatile__ ("vmcall" :
+			      "=a" (rax) :
+			      "a" (__HYPERVISOR_event_channel_op),
+			      "D" (EVTCHNOP_send),
+			      "S" (&s));
+
+	GUEST_ASSERT(rax == 0);
+
+	guest_wait_for_irq();
+
 	GUEST_SYNC(TEST_EVTCHN_HCALL_EVENTFD);
 
 	/* Deliver "outbound" event channel to an eventfd which
@@ -801,6 +819,19 @@ int main(int argc, char *argv[])
 				alarm(1);
 				break;
 
+			case TEST_EVTCHN_HCALL_SLOWPATH:
+				TEST_ASSERT(!evtchn_irq_expected,
+					    "Expected event channel IRQ but it didn't happen");
+				shinfo->evtchn_pending[0] = 0;
+
+				if (verbose)
+					printf("Testing guest EVTCHNOP_send direct to evtchn after memslot change\n");
+				vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+							    DUMMY_REGION_GPA_2, DUMMY_REGION_SLOT_2, 1, 0);
+				evtchn_irq_expected = true;
+				alarm(1);
+				break;
+
 			case TEST_EVTCHN_HCALL_EVENTFD:
 				TEST_ASSERT(!evtchn_irq_expected,
 					    "Expected event channel IRQ but it didn't happen");
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] KVM: selftests: Use enum for test numbers in xen_shinfo_test
  2023-01-13 12:46   ` [PATCH 2/3] KVM: selftests: Use enum for test numbers in xen_shinfo_test David Woodhouse
@ 2023-01-13 17:13     ` David Woodhouse
  0 siblings, 0 replies; 27+ messages in thread
From: David Woodhouse @ 2023-01-13 17:13 UTC (permalink / raw)
  To: Boqun Feng, linux-kernel, rcu, kvm
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Lai Jiangshan, Paul E . McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Paolo Bonzini, seanjc, Joel Fernandes,
	Matthew Wilcox, Michal Luczaj

On Fri, 2023-01-13 at 12:46 +0000, David Woodhouse wrote:
> @@ -439,6 +469,7 @@ int main(int argc, char *argv[])
>         bool verbose;
>         int ret;
>  
> +       printf("TEST_DONE is %d\n", TEST_DONE);
>         verbose = argc > 1 && (!strncmp(argv[1], "-v", 3) ||
>                                !strncmp(argv[1], "--verbose", 10));
>  

Dammit.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks
       [not found] ` <20230113130330.1027-1-hdanton@sina.com>
@ 2023-01-13 17:58   ` Boqun Feng
       [not found]   ` <20230113235809.1085-1-hdanton@sina.com>
  1 sibling, 0 replies; 27+ messages in thread
From: Boqun Feng @ 2023-01-13 17:58 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-kernel, linux-mm, Peter Zijlstra, Paul E. McKenney,
	Paolo Bonzini, Joel Fernandes

On Fri, Jan 13, 2023 at 09:03:30PM +0800, Hillf Danton wrote:
> On 12 Jan 2023 22:59:54 -0800 Boqun Feng <boqun.feng@gmail.com>
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> >  {
> >  	struct rcu_synchronize rcu;
> >  
> > +	srcu_lock_sync(&ssp->dep_map);
> > +
> >  	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> >  			 lock_is_held(&rcu_bh_lock_map) ||
> >  			 lock_is_held(&rcu_lock_map) ||
> > -- 
> > 2.38.1
> 
> The following deadlock is able to escape srcu_lock_sync() because the
> __lock_release folded in sync leaves one lock on the sync side.
> 
> 	cpu9		cpu0
> 	---		---
> 	lock A		srcu_lock_acquire(&ssp->dep_map);
> 	srcu_lock_sync(&ssp->dep_map);
> 			lock A

But isn't that just the srcu_mutex_ABBA test case in patch #3? My run of
the lockdep selftest shows we can catch it. Is there anything subtle I'm
missing?

Regards,
Boqun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks
  2023-01-13 11:29   ` Paul E. McKenney
@ 2023-01-13 18:05     ` Boqun Feng
  2023-01-13 19:11       ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: Boqun Feng @ 2023-01-13 18:05 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, rcu, kvm, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Lai Jiangshan, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, David Woodhouse, Paolo Bonzini, seanjc,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj

On Fri, Jan 13, 2023 at 03:29:49AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 12, 2023 at 10:59:54PM -0800, Boqun Feng wrote:
> > Although all flavors of RCU are correctly annotated with lockdep as
> > recursive read locks, the 'check' parameter of their lock_acquire()
> > calls is unset. That means RCU read locks are not added into the
> > lockdep dependency graph, so deadlock detection based on the dependency
> > graph won't catch deadlocks caused by RCU. This is fine for
> > "non-sleepable" RCU flavors, since wait-context detection and other
> > context-based detection can catch those deadlocks. However, for
> > sleepable RCU this is limited.
> > 
> > We can actually detect the deadlocks caused by SRCU by 1) making
> > srcu_read_lock() a 'check'ed recursive read lock and 2) making
> > synchronize_srcu() an empty write-lock critical section. Even better,
> > with the newly introduced lock_sync(), we can avoid false positives
> > about irq-unsafe/safe. So do it.
> > 
> > Note that NMI-safe SRCU read-side critical sections are currently not
> > annotated, since a step-by-step approach helps us deal with false
> > positives. These may be annotated in the future.
> > 
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> 
> Nice, thank you!!!
> 
> Acked-by: Paul E. McKenney <paulmck@kernel.org>
> 
> Or if you would prefer that I take the series through -rcu, please just
> let me know.
> 

I prefer that the first two patches go through your tree, because it
reduces the synchronization among the locking, rcu and KVM trees to
just the synchronization between the rcu and KVM trees.

As for patch #3, it's not really ready yet, so I don't know, but I
guess when it's finished it should probably go through -rcu.

Regards,
Boqun

> 							Thanx, Paul
> 
> > ---
> >  include/linux/srcu.h  | 23 +++++++++++++++++++++--
> >  kernel/rcu/srcutiny.c |  2 ++
> >  kernel/rcu/srcutree.c |  2 ++
> >  3 files changed, 25 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> > index 9b9d0bbf1d3c..a1595f8c5155 100644
> > --- a/include/linux/srcu.h
> > +++ b/include/linux/srcu.h
> > @@ -102,6 +102,21 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> >  	return lock_is_held(&ssp->dep_map);
> >  }
> >  
> > +static inline void srcu_lock_acquire(struct lockdep_map *map)
> > +{
> > +	lock_map_acquire_read(map);
> > +}
> > +
> > +static inline void srcu_lock_release(struct lockdep_map *map)
> > +{
> > +	lock_map_release(map);
> > +}
> > +
> > +static inline void srcu_lock_sync(struct lockdep_map *map)
> > +{
> > +	lock_map_sync(map);
> > +}
> > +
> >  #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> >  
> >  static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> > @@ -109,6 +124,10 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> >  	return 1;
> >  }
> >  
> > +#define srcu_lock_acquire(m) do { } while (0)
> > +#define srcu_lock_release(m) do { } while (0)
> > +#define srcu_lock_sync(m) do { } while (0)
> > +
> >  #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> >  
> >  #define SRCU_NMI_UNKNOWN	0x0
> > @@ -182,7 +201,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
> >  
> >  	srcu_check_nmi_safety(ssp, false);
> >  	retval = __srcu_read_lock(ssp);
> > -	rcu_lock_acquire(&(ssp)->dep_map);
> > +	srcu_lock_acquire(&(ssp)->dep_map);
> >  	return retval;
> >  }
> >  
> > @@ -226,7 +245,7 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
> >  {
> >  	WARN_ON_ONCE(idx & ~0x1);
> >  	srcu_check_nmi_safety(ssp, false);
> > -	rcu_lock_release(&(ssp)->dep_map);
> > +	srcu_lock_release(&(ssp)->dep_map);
> >  	__srcu_read_unlock(ssp, idx);
> >  }
> >  
> > diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> > index b12fb0cec44d..336af24e0fe3 100644
> > --- a/kernel/rcu/srcutiny.c
> > +++ b/kernel/rcu/srcutiny.c
> > @@ -197,6 +197,8 @@ void synchronize_srcu(struct srcu_struct *ssp)
> >  {
> >  	struct rcu_synchronize rs;
> >  
> > +	srcu_lock_sync(&ssp->dep_map);
> > +
> >  	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> >  			lock_is_held(&rcu_bh_lock_map) ||
> >  			lock_is_held(&rcu_lock_map) ||
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index ca4b5dcec675..408088c73e0e 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> >  {
> >  	struct rcu_synchronize rcu;
> >  
> > +	srcu_lock_sync(&ssp->dep_map);
> > +
> >  	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> >  			 lock_is_held(&rcu_bh_lock_map) ||
> >  			 lock_is_held(&rcu_lock_map) ||
> > -- 
> > 2.38.1
> > 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks
  2023-01-13 18:05     ` Boqun Feng
@ 2023-01-13 19:11       ` Paul E. McKenney
  2023-01-16 17:36         ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2023-01-13 19:11 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, rcu, kvm, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Lai Jiangshan, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, David Woodhouse, Paolo Bonzini, seanjc,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj

On Fri, Jan 13, 2023 at 10:05:22AM -0800, Boqun Feng wrote:
> On Fri, Jan 13, 2023 at 03:29:49AM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 12, 2023 at 10:59:54PM -0800, Boqun Feng wrote:
> > > Although all flavors of RCU are correctly annotated with lockdep as
> > > recursive read locks, the 'check' parameter of their lock_acquire()
> > > calls is unset. That means RCU read locks are not added into the
> > > lockdep dependency graph, so deadlock detection based on the dependency
> > > graph won't catch deadlocks caused by RCU. This is fine for
> > > "non-sleepable" RCU flavors, since wait-context detection and other
> > > context-based detection can catch those deadlocks. However, for
> > > sleepable RCU this is limited.
> > > 
> > > We can actually detect the deadlocks caused by SRCU by 1) making
> > > srcu_read_lock() a 'check'ed recursive read lock and 2) making
> > > synchronize_srcu() an empty write-lock critical section. Even better,
> > > with the newly introduced lock_sync(), we can avoid false positives
> > > about irq-unsafe/safe. So do it.
> > > 
> > > Note that NMI-safe SRCU read-side critical sections are currently not
> > > annotated, since a step-by-step approach helps us deal with false
> > > positives. These may be annotated in the future.
> > > 
> > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > 
> > Nice, thank you!!!
> > 
> > Acked-by: Paul E. McKenney <paulmck@kernel.org>
> > 
> > Or if you would prefer that I take the series through -rcu, please just
> > let me know.
> 
> I prefer that the first two patches go through your tree, because it
> reduces the synchronization among the locking, rcu and KVM trees to
> just the synchronization between the rcu and KVM trees.

Very well, I have queued and pushed these with the usual wordsmithing,
thank you!

On the possibility of annotating __srcu_read_lock_nmisafe() and
__srcu_read_unlock_nmisafe(), are those lockdep annotations themselves
NMI-safe?

> As for patch #3, it's not really ready yet, so I don't know, but I
> guess when it's finished it should probably go through -rcu.

Please let me know when it is ready!

							Thanx, Paul

> Regards,
> Boqun
> 
> > 							Thanx, Paul
> > 
> > > ---
> > >  include/linux/srcu.h  | 23 +++++++++++++++++++++--
> > >  kernel/rcu/srcutiny.c |  2 ++
> > >  kernel/rcu/srcutree.c |  2 ++
> > >  3 files changed, 25 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> > > index 9b9d0bbf1d3c..a1595f8c5155 100644
> > > --- a/include/linux/srcu.h
> > > +++ b/include/linux/srcu.h
> > > @@ -102,6 +102,21 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> > >  	return lock_is_held(&ssp->dep_map);
> > >  }
> > >  
> > > +static inline void srcu_lock_acquire(struct lockdep_map *map)
> > > +{
> > > +	lock_map_acquire_read(map);
> > > +}
> > > +
> > > +static inline void srcu_lock_release(struct lockdep_map *map)
> > > +{
> > > +	lock_map_release(map);
> > > +}
> > > +
> > > +static inline void srcu_lock_sync(struct lockdep_map *map)
> > > +{
> > > +	lock_map_sync(map);
> > > +}
> > > +
> > >  #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> > >  
> > >  static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> > > @@ -109,6 +124,10 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> > >  	return 1;
> > >  }
> > >  
> > > +#define srcu_lock_acquire(m) do { } while (0)
> > > +#define srcu_lock_release(m) do { } while (0)
> > > +#define srcu_lock_sync(m) do { } while (0)
> > > +
> > >  #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> > >  
> > >  #define SRCU_NMI_UNKNOWN	0x0
> > > @@ -182,7 +201,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
> > >  
> > >  	srcu_check_nmi_safety(ssp, false);
> > >  	retval = __srcu_read_lock(ssp);
> > > -	rcu_lock_acquire(&(ssp)->dep_map);
> > > +	srcu_lock_acquire(&(ssp)->dep_map);
> > >  	return retval;
> > >  }
> > >  
> > > @@ -226,7 +245,7 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
> > >  {
> > >  	WARN_ON_ONCE(idx & ~0x1);
> > >  	srcu_check_nmi_safety(ssp, false);
> > > -	rcu_lock_release(&(ssp)->dep_map);
> > > +	srcu_lock_release(&(ssp)->dep_map);
> > >  	__srcu_read_unlock(ssp, idx);
> > >  }
> > >  
> > > diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> > > index b12fb0cec44d..336af24e0fe3 100644
> > > --- a/kernel/rcu/srcutiny.c
> > > +++ b/kernel/rcu/srcutiny.c
> > > @@ -197,6 +197,8 @@ void synchronize_srcu(struct srcu_struct *ssp)
> > >  {
> > >  	struct rcu_synchronize rs;
> > >  
> > > +	srcu_lock_sync(&ssp->dep_map);
> > > +
> > >  	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> > >  			lock_is_held(&rcu_bh_lock_map) ||
> > >  			lock_is_held(&rcu_lock_map) ||
> > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > > index ca4b5dcec675..408088c73e0e 100644
> > > --- a/kernel/rcu/srcutree.c
> > > +++ b/kernel/rcu/srcutree.c
> > > @@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> > >  {
> > >  	struct rcu_synchronize rcu;
> > >  
> > > +	srcu_lock_sync(&ssp->dep_map);
> > > +
> > >  	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> > >  			 lock_is_held(&rcu_bh_lock_map) ||
> > >  			 lock_is_held(&rcu_lock_map) ||
> > > -- 
> > > 2.38.1
> > > 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 4/3] locking/lockdep: Improve the deadlock scenario print for sync and read lock
  2023-01-13  6:59 [PATCH 0/3] Detect SRCU related deadlocks Boqun Feng
                   ` (4 preceding siblings ...)
       [not found] ` <20230113130330.1027-1-hdanton@sina.com>
@ 2023-01-13 23:57 ` Boqun Feng
  2023-01-16 22:21   ` Waiman Long
  5 siblings, 1 reply; 27+ messages in thread
From: Boqun Feng @ 2023-01-13 23:57 UTC (permalink / raw)
  To: linux-kernel, rcu, kvm
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, Lai Jiangshan, Paul E. McKenney, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, David Woodhouse,
	Paolo Bonzini, seanjc, Joel Fernandes, Matthew Wilcox,
	Michal Luczaj

Lock scenario print is always a weak spot of lockdep splats. Improvement
can be made if we rework the dependency search and the error printing.

However, without touching the graph search, we can improve it a little for
the circular deadlock case, since we have the to-be-added lock
dependency and know whether these two locks are read/write/sync.

In order to know whether a held_lock is sync or not, a bit was
"stolen" from ->references, which reduces our limit for nesting of the
same lock class from 2^12 to 2^11, and that should still be good enough.

Besides, since we now have a bit in held_lock for sync, we don't need
the "hardirqoff being 1" trick, and we can also avoid the __lock_release()
by jumping out of __lock_acquire() before the held_lock is stored.

With these changes, a deadlock case involving a read lock and sync gets
a better print-out, from:

	[...]  Possible unsafe locking scenario:
	[...]
	[...]        CPU0                    CPU1
	[...]        ----                    ----
	[...]   lock(srcuA);
	[...]                                lock(srcuB);
	[...]                                lock(srcuA);
	[...]   lock(srcuB);

to

	[...]  Possible unsafe locking scenario:
	[...]
	[...]        CPU0                    CPU1
	[...]        ----                    ----
	[...]   rlock(srcuA);
	[...]                                lock(srcuB);
	[...]                                lock(srcuA);
	[...]   sync(srcuB);

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 include/linux/lockdep.h  |  3 ++-
 kernel/locking/lockdep.c | 48 ++++++++++++++++++++++++++--------------
 2 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index ba09df6a0872..febd7ecc225c 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -134,7 +134,8 @@ struct held_lock {
 	unsigned int read:2;        /* see lock_acquire() comment */
 	unsigned int check:1;       /* see lock_acquire() comment */
 	unsigned int hardirqs_off:1;
-	unsigned int references:12;					/* 32 bits */
+	unsigned int sync:1;
+	unsigned int references:11;					/* 32 bits */
 	unsigned int pin_count;
 };
 
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index cffa026a765f..4031d87f6829 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1880,6 +1880,8 @@ print_circular_lock_scenario(struct held_lock *src,
 	struct lock_class *source = hlock_class(src);
 	struct lock_class *target = hlock_class(tgt);
 	struct lock_class *parent = prt->class;
+	int src_read = src->read;
+	int tgt_read = tgt->read;
 
 	/*
 	 * A direct locking problem where unsafe_class lock is taken
@@ -1907,7 +1909,10 @@ print_circular_lock_scenario(struct held_lock *src,
 	printk(" Possible unsafe locking scenario:\n\n");
 	printk("       CPU0                    CPU1\n");
 	printk("       ----                    ----\n");
-	printk("  lock(");
+	if (tgt_read != 0)
+		printk("  rlock(");
+	else
+		printk("  lock(");
 	__print_lock_name(target);
 	printk(KERN_CONT ");\n");
 	printk("                               lock(");
@@ -1916,7 +1921,12 @@ print_circular_lock_scenario(struct held_lock *src,
 	printk("                               lock(");
 	__print_lock_name(target);
 	printk(KERN_CONT ");\n");
-	printk("  lock(");
+	if (src_read != 0)
+		printk("  rlock(");
+	else if (src->sync)
+		printk("  sync(");
+	else
+		printk("  lock(");
 	__print_lock_name(source);
 	printk(KERN_CONT ");\n");
 	printk("\n *** DEADLOCK ***\n\n");
@@ -4530,7 +4540,13 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
 					return 0;
 		}
 	}
-	if (!hlock->hardirqs_off) {
+
+	/*
+	 * For lock_sync(), don't mark the ENABLED usage, since lock_sync()
+	 * creates no critical section and no extra dependency can be introduced
+	 * by interrupts
+	 */
+	if (!hlock->hardirqs_off && !hlock->sync) {
 		if (hlock->read) {
 			if (!mark_lock(curr, hlock,
 					LOCK_ENABLED_HARDIRQ_READ))
@@ -4909,7 +4925,7 @@ static int __lock_is_held(const struct lockdep_map *lock, int read);
 static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 			  int trylock, int read, int check, int hardirqs_off,
 			  struct lockdep_map *nest_lock, unsigned long ip,
-			  int references, int pin_count)
+			  int references, int pin_count, int sync)
 {
 	struct task_struct *curr = current;
 	struct lock_class *class = NULL;
@@ -4960,7 +4976,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 
 	class_idx = class - lock_classes;
 
-	if (depth) { /* we're holding locks */
+	if (depth && !sync) {
+		/* we're holding locks and the new held lock is not a sync */
 		hlock = curr->held_locks + depth - 1;
 		if (hlock->class_idx == class_idx && nest_lock) {
 			if (!references)
@@ -4994,6 +5011,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	hlock->trylock = trylock;
 	hlock->read = read;
 	hlock->check = check;
+	hlock->sync = !!sync;
 	hlock->hardirqs_off = !!hardirqs_off;
 	hlock->references = references;
 #ifdef CONFIG_LOCK_STAT
@@ -5055,6 +5073,10 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	if (!validate_chain(curr, hlock, chain_head, chain_key))
 		return 0;
 
+	/* For lock_sync(), we are done here since no actual critical section */
+	if (hlock->sync)
+		return 1;
+
 	curr->curr_chain_key = chain_key;
 	curr->lockdep_depth++;
 	check_chain_key(curr);
@@ -5196,7 +5218,7 @@ static int reacquire_held_locks(struct task_struct *curr, unsigned int depth,
 				    hlock->read, hlock->check,
 				    hlock->hardirqs_off,
 				    hlock->nest_lock, hlock->acquire_ip,
-				    hlock->references, hlock->pin_count)) {
+				    hlock->references, hlock->pin_count, 0)) {
 		case 0:
 			return 1;
 		case 1:
@@ -5666,7 +5688,7 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 
 	lockdep_recursion_inc();
 	__lock_acquire(lock, subclass, trylock, read, check,
-		       irqs_disabled_flags(flags), nest_lock, ip, 0, 0);
+		       irqs_disabled_flags(flags), nest_lock, ip, 0, 0, 0);
 	lockdep_recursion_finish();
 	raw_local_irq_restore(flags);
 }
@@ -5699,11 +5721,6 @@ EXPORT_SYMBOL_GPL(lock_release);
  * APIs are used to wait for one or multiple critical sections (on other CPUs
  * or threads), and it means that calling these APIs inside these critical
  * sections is potential deadlock.
- *
- * This annotation acts as an acqurie+release anontation pair with hardirqoff
- * being 1. Since there's no critical section, no interrupt can create extra
- * dependencies "inside" the annotation, hardirqoff == 1 allows us to avoid
- * false positives.
  */
 void lock_sync(struct lockdep_map *lock, unsigned subclass, int read,
 	       int check, struct lockdep_map *nest_lock, unsigned long ip)
@@ -5717,10 +5734,9 @@ void lock_sync(struct lockdep_map *lock, unsigned subclass, int read,
 	check_flags(flags);
 
 	lockdep_recursion_inc();
-	__lock_acquire(lock, subclass, 0, read, check, 1, nest_lock, ip, 0, 0);
-
-	if (__lock_release(lock, ip))
-		check_chain_key(current);
+	__lock_acquire(lock, subclass, 0, read, check,
+		       irqs_disabled_flags(flags), nest_lock, ip, 0, 0, 1);
+	check_chain_key(current);
 	lockdep_recursion_finish();
 	raw_local_irq_restore(flags);
 }
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks
       [not found]   ` <20230113235809.1085-1-hdanton@sina.com>
@ 2023-01-14  0:17     ` Boqun Feng
       [not found]     ` <20230114071832.1162-1-hdanton@sina.com>
  1 sibling, 0 replies; 27+ messages in thread
From: Boqun Feng @ 2023-01-14  0:17 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-kernel, linux-mm, Peter Zijlstra, Paul E. McKenney,
	Paolo Bonzini, Joel Fernandes

On Sat, Jan 14, 2023 at 07:58:09AM +0800, Hillf Danton wrote:
> On 13 Jan 2023 09:58:10 -0800 Boqun Feng <boqun.feng@gmail.com>
> > On Fri, Jan 13, 2023 at 09:03:30PM +0800, Hillf Danton wrote:
> > > On 12 Jan 2023 22:59:54 -0800 Boqun Feng <boqun.feng@gmail.com>
> > > > --- a/kernel/rcu/srcutree.c
> > > > +++ b/kernel/rcu/srcutree.c
> > > > @@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> > > >  {
> > > >  	struct rcu_synchronize rcu;
> > > >  
> > > > +	srcu_lock_sync(&ssp->dep_map);
> > > > +
> > > >  	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> > > >  			 lock_is_held(&rcu_bh_lock_map) ||
> > > >  			 lock_is_held(&rcu_lock_map) ||
> > > > -- 
> > > > 2.38.1
> > > 
> > > The following deadlock is able to escape srcu_lock_sync() because the
> > > __lock_release folded in sync leaves one lock on the sync side.
> > > 
> > > 	cpu9		cpu0
> > > 	---		---
> > > 	lock A		srcu_lock_acquire(&ssp->dep_map);
> > > 	srcu_lock_sync(&ssp->dep_map);
> > > 			lock A
> > 
> > But isn't it just the srcu_mutex_ABBA test case in patch #3, and my run
> > of lockdep selftest shows we can catch it. Anything subtle I'm missing?
> 
> I am leaning to not call it ABBA deadlock, because B is unlocked.
> 
> 	task X		task Y
> 	---		---
> 	lock A
> 	lock B
> 			lock B
> 	unlock B
> 	wait_for_completion E
> 
> 			lock A
> 			complete E
> 
> And no deadlock should be detected/caught after B goes home.

Your example makes me more confused.. given the case:

	task X		task Y
	---		---
	mutex_lock(A);
			srcu_read_lock(B);
	synchronze_srcu(B);
			mutex_lock(A);

isn't it a deadlock? In your example, which of A, B or E is srcu?

Regards,
Boqun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks
       [not found]     ` <20230114071832.1162-1-hdanton@sina.com>
@ 2023-01-14  7:32       ` Boqun Feng
       [not found]       ` <20230114102659.1219-1-hdanton@sina.com>
  1 sibling, 0 replies; 27+ messages in thread
From: Boqun Feng @ 2023-01-14  7:32 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-kernel, linux-mm, Peter Zijlstra, Paul E. McKenney,
	Paolo Bonzini, Joel Fernandes

On Sat, Jan 14, 2023 at 03:18:32PM +0800, Hillf Danton wrote:
> On Fri, 13 Jan 2023 16:17:59 -0800 Boqun Feng <boqun.feng@gmail.com>
> > On Sat, Jan 14, 2023 at 07:58:09AM +0800, Hillf Danton wrote:
> > > On 13 Jan 2023 09:58:10 -0800 Boqun Feng <boqun.feng@gmail.com>
> > > > On Fri, Jan 13, 2023 at 09:03:30PM +0800, Hillf Danton wrote:
> > > > > On 12 Jan 2023 22:59:54 -0800 Boqun Feng <boqun.feng@gmail.com>
> > > > > > --- a/kernel/rcu/srcutree.c
> > > > > > +++ b/kernel/rcu/srcutree.c
> > > > > > @@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> > > > > >  {
> > > > > >  	struct rcu_synchronize rcu;
> > > > > >  
> > > > > > +	srcu_lock_sync(&ssp->dep_map);
> > > > > > +
> > > > > >  	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> > > > > >  			 lock_is_held(&rcu_bh_lock_map) ||
> > > > > >  			 lock_is_held(&rcu_lock_map) ||
> > > > > > -- 
> > > > > > 2.38.1
> > > > > 
> > > > > The following deadlock is able to escape srcu_lock_sync() because the
> > > > > __lock_release folded in sync leaves one lock on the sync side.
> > > > > 
> > > > > 	cpu9		cpu0
> > > > > 	---		---
> > > > > 	lock A		srcu_lock_acquire(&ssp->dep_map);
> > > > > 	srcu_lock_sync(&ssp->dep_map);
> > > > > 			lock A
> > > > 
> > > > But isn't it just the srcu_mutex_ABBA test case in patch #3, and my run
> > > > of lockdep selftest shows we can catch it. Anything subtle I'm missing?
> > > 
> > > I am leaning to not call it ABBA deadlock, because B is unlocked.
> > > 
> > > 	task X		task Y
> > > 	---		---
> > > 	lock A
> > > 	lock B
> > > 			lock B
> > > 	unlock B
> > > 	wait_for_completion E
> > > 
> > > 			lock A
> > > 			complete E
> > > 
> > > And no deadlock should be detected/caught after B goes home.
> > 
> > Your example makes me more confused.. given the case:
> > 
> > 	task X		task Y
> > 	---		---
> > 	mutex_lock(A);
> > 			srcu_read_lock(B);
> > 	synchronze_srcu(B);
> > 			mutex_lock(A);
> > 
> > isn't it a deadlock?
> 
> Yes and nope, see below.
> 
> > In your example, which of A, B or E is srcu?
> 
> A and B are mutex, and E is completion in my example to show the failure
> of catching deadlock in case of non-fake lock. Now see srcu after your change.
> 
>  	task X			task Y
>  	---			---
>  	mutex_lock(A);
>  				srcu_read_lock(B);
> 				srcu_lock_acquire(&B->dep_map);
> 				a) lock_map_acquire_read(&B->dep_map);
>  	synchronze_srcu(B);
> 	__synchronize_srcu(B);
> 	srcu_lock_sync(&B->dep_map);
> 	lock_map_sync(&B->dep_map);
> 	lock_sync(&B->dep_map);
> 	__lock_acquire(&B->dep_map);

At this time, lockdep add dependency A -> B in the dependency graph.

> 				b) lock_map_acquire_read(&B->dep_map);
> 	__lock_release(&B->dep_map);
> 				c) lock_map_acquire_read(&B->dep_map);
>  				mutex_lock(A);

and here, lockdep will try to add dependency B -> A into the dependency
graph, and find that A -> B -> A will form a circle (with strong
dependency), therefore lockdep knows it's a deadlock.

>  
> No deadlock could be detected if taskY takes mutexA after taskX releases B,

The timing of taskX releasing B doesn't matter, since lockdep uses the
dependency graph to detect deadlocks rather than after-the-fact detection.

> and how taskY acquires B does not matter as per the a), b) and c) modes in
> the above chart, again because releasing lock can break deadlock in general.

I have test cases showing the above deadlock can be detected, so if you
think there is a deadlock that may dodge from my change, feel free to
add a test case in lib/locking-selftest.c ;-)

Regards,
Boqun


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks
       [not found]       ` <20230114102659.1219-1-hdanton@sina.com>
@ 2023-01-15  0:18         ` Boqun Feng
  0 siblings, 0 replies; 27+ messages in thread
From: Boqun Feng @ 2023-01-15  0:18 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-kernel, linux-mm, Peter Zijlstra, Paul E. McKenney,
	Paolo Bonzini, Joel Fernandes

On Sat, Jan 14, 2023 at 06:26:59PM +0800, Hillf Danton wrote:
> On Fri, 13 Jan 2023 23:32:01 -0800 Boqun Feng <boqun.feng@gmail.com>
> > On Sat, Jan 14, 2023 at 03:18:32PM +0800, Hillf Danton wrote:
> > >
> > >  	task X			task Y
> > >  	---			---
> > >  	mutex_lock(A);
> > >  				srcu_read_lock(B);
> > > 				srcu_lock_acquire(&B->dep_map);
> > > 				a) lock_map_acquire_read(&B->dep_map);
> > >  	synchronze_srcu(B);
> > > 	__synchronize_srcu(B);
> > > 	srcu_lock_sync(&B->dep_map);
> > > 	lock_map_sync(&B->dep_map);
> > > 	lock_sync(&B->dep_map);
> > > 	__lock_acquire(&B->dep_map);
> > 
> > At this time, lockdep add dependency A -> B in the dependency graph.
> > 
> > > 				b) lock_map_acquire_read(&B->dep_map);
> > > 	__lock_release(&B->dep_map);
> > > 				c) lock_map_acquire_read(&B->dep_map);
> > >  				mutex_lock(A);
> > 
> > and here, lockdep will try to add dependency B -> A into the dependency
> > graph, and find that A -> B -> A will form a circle (with strong
> > dependency), therefore lockdep knows it's a deadlock. 
> 
> Is the strong dependency applying to mode c)?
> If yes then deadlock should be also detected in the following locking
> pattern that has no deadlock.
> 
> 	cpu0			cpu1
> 	---			---
> 	mutex_lock A
> 	mutex_lock B
> 	mutex_unlock B
> 				mutex_lock B
> 				mutex_lock A

Well, of course, this is how lockdep works. Lockdep detects *potential*
deadlocks rather than detecting deadlocks only when they actually happen.
Otherwise lockdep would be useless.

The execution in your example shows a potential deadlock, i.e. one task
acquires A and then acquires B, while the other task acquires B and then
acquires A. A potential deadlock means that, given the right timing, a
real deadlock may happen.
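
To illustrate (only a sketch; threadX()/threadY() and the mutexes here are
hypothetical, not code from this series), the acquisition pattern that
lockdep records and then complains about looks roughly like:

	#include <linux/mutex.h>

	static DEFINE_MUTEX(A);
	static DEFINE_MUTEX(B);

	static void threadX(void)
	{
		mutex_lock(&A);
		mutex_lock(&B);		/* records the dependency A -> B */
		mutex_unlock(&B);
		mutex_unlock(&A);
	}

	static void threadY(void)
	{
		mutex_lock(&B);
		mutex_lock(&A);		/* records B -> A; together with A -> B this
					 * closes a circle, so lockdep reports it even
					 * if no deadlock happened on this run */
		mutex_unlock(&A);
		mutex_unlock(&B);
	}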

Regards,
Boqun

> > 
> > >  
> > > No deadlock could be detected if taskY takes mutexA after taskX releases B,
> > 
> > The timing of taskX releasing B doesn't matter, since lockdep uses the
> > dependency graph to detect deadlocks rather than after-the-fact detection.
> > 
> > > and how taskY acquires B does not matter as per the a), b) and c) modes in
> > > the above chart, again because releasing lock can break deadlock in general.
> > 
> > I have test cases showing the above deadlock can be detected, so if you
> > think there is a deadlock that may dodge from my change, feel free to
> > add a test case in lib/locking-selftest.c ;-)
> > 
> > Regards,
> > Boqun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks
  2023-01-13 19:11       ` Paul E. McKenney
@ 2023-01-16 17:36         ` Paolo Bonzini
  2023-01-16 17:54           ` Boqun Feng
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2023-01-16 17:36 UTC (permalink / raw)
  To: paulmck, Boqun Feng
  Cc: linux-kernel, rcu, kvm, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Lai Jiangshan, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, David Woodhouse, seanjc, Joel Fernandes,
	Matthew Wilcox, Michal Luczaj

On 1/13/23 20:11, Paul E. McKenney wrote:
> On Fri, Jan 13, 2023 at 10:05:22AM -0800, Boqun Feng wrote:
>> On Fri, Jan 13, 2023 at 03:29:49AM -0800, Paul E. McKenney wrote:
>> I prefer that the first two patches go through your tree, because it
>> reduces the synchronization among locking, rcu and KVM trees to the
>> synchronization betwen rcu and KVM trees.
> 
> Very well, I have queued and pushed these with the usual wordsmithing,
> thank you!

I'm worried about this case:

	CPU 0				CPU 1
	--------------------		------------------
	lock A				srcu lock B
	srcu lock B			lock A
	srcu unlock B			unlock A
	unlock A			srcu unlock B

While a bit unclean, there is nothing that downright forbids this; as 
long as synchronize_srcu does not happen inside lock A, no deadlock can 
occur.

However, if srcu is replaced with an rwlock then lockdep should and does 
report a deadlock.  Boqun, do you get a false positive or do your 
patches correctly suppress this?

Paolo


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks
  2023-01-16 17:36         ` Paolo Bonzini
@ 2023-01-16 17:54           ` Boqun Feng
  2023-01-16 18:56             ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: Boqun Feng @ 2023-01-16 17:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: paulmck, linux-kernel, rcu, kvm, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Lai Jiangshan, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, David Woodhouse, seanjc,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj

On Mon, Jan 16, 2023 at 06:36:43PM +0100, Paolo Bonzini wrote:
> On 1/13/23 20:11, Paul E. McKenney wrote:
> > On Fri, Jan 13, 2023 at 10:05:22AM -0800, Boqun Feng wrote:
> > > On Fri, Jan 13, 2023 at 03:29:49AM -0800, Paul E. McKenney wrote:
> > > I prefer that the first two patches go through your tree, because it
> > > reduces the synchronization among locking, rcu and KVM trees to the
> > > synchronization betwen rcu and KVM trees.
> > 
> > Very well, I have queued and pushed these with the usual wordsmithing,
> > thank you!
> 
> I'm worried about this case:
> 
> 	CPU 0				CPU 1
> 	--------------------		------------------
> 	lock A				srcu lock B
> 	srcu lock B			lock A
> 	srcu unlock B			unlock A
> 	unlock A			srcu unlock B
> 
> While a bit unclean, there is nothing that downright forbids this; as long
> as synchronize_srcu does not happen inside lock A, no deadlock can occur.
> 

First, even with my change, lockdep won't report this as a deadlock
because srcu_read_lock() is annotated as a recursive (unfair) read lock
(the "read" parameter for lock_acquire() is 2) and in this case lockdep
knows that it won't cause deadlock.

For SRCU, the annotation mapping is that 1) srcu_read_lock() is marked
as a recursive read lock and 2) synchronize_srcu() is marked as a write
lock sync. Recursive read locks themselves cannot cause deadlocks, and
lockdep is aware of that.

Will add a selftest for this later.

> However, if srcu is replaced with an rwlock then lockdep should and does
> report a deadlock.  Boqun, do you get a false positive or do your patches

To be more precise, to have a deadlock, the read lock on CPU 0 has to be
a *fair* read lock (i.e. a non-recursive reader, where the "read" parameter
for lock_acquire() is 1).

> correctly suppress this?
> 

I'm pretty sure that lockdep handles this ;-)

Regards,
Boqun

> Paolo
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks
  2023-01-16 17:54           ` Boqun Feng
@ 2023-01-16 18:56             ` Paul E. McKenney
  0 siblings, 0 replies; 27+ messages in thread
From: Paul E. McKenney @ 2023-01-16 18:56 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paolo Bonzini, linux-kernel, rcu, kvm, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Waiman Long, Lai Jiangshan,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	David Woodhouse, seanjc, Joel Fernandes, Matthew Wilcox,
	Michal Luczaj

On Mon, Jan 16, 2023 at 09:54:32AM -0800, Boqun Feng wrote:
> On Mon, Jan 16, 2023 at 06:36:43PM +0100, Paolo Bonzini wrote:
> > On 1/13/23 20:11, Paul E. McKenney wrote:
> > > On Fri, Jan 13, 2023 at 10:05:22AM -0800, Boqun Feng wrote:
> > > > On Fri, Jan 13, 2023 at 03:29:49AM -0800, Paul E. McKenney wrote:
> > > > I prefer that the first two patches go through your tree, because it
> > > > reduces the synchronization among locking, rcu and KVM trees to the
> > > > synchronization betwen rcu and KVM trees.
> > > 
> > > Very well, I have queued and pushed these with the usual wordsmithing,
> > > thank you!
> > 
> > I'm worried about this case:
> > 
> > 	CPU 0				CPU 1
> > 	--------------------		------------------
> > 	lock A				srcu lock B
> > 	srcu lock B			lock A
> > 	srcu unlock B			unlock A
> > 	unlock A			srcu unlock B
> > 
> > While a bit unclean, there is nothing that downright forbids this; as long
> > as synchronize_srcu does not happen inside lock A, no deadlock can occur.
> > 
> 
> First, even with my change, lockdep won't report this as a deadlock
> because srcu_read_lock() is annotated as a recursive (unfair) read lock
> (the "read" parameter for lock_acquire() is 2) and in this case lockdep
> knows that it won't cause deadlock.
> 
> For SRCU, the annotation mapping is that 1) srcu_read_lock() is marked
> as a recursive read lock and 2) synchronize_srcu() is marked as a write
> lock sync. Recursive read locks themselves cannot cause deadlocks, and
> lockdep is aware of that.
> 
> Will add a selftest for this later.
> 
> > However, if srcu is replaced with an rwlock then lockdep should and does
> > report a deadlock.  Boqun, do you get a false positive or do your patches
> 
> To be more precise, to have a deadlock, the read lock on CPU 0 has to be
> a *fair* read lock (i.e. non-recursive reader, the "read" parameter for
> lock_acquire is 1)
> 
> > correctly suppress this?
> 
> I'm pretty sure that lockdep handles this ;-)

And lockdep agrees, refusing to complain about the following:

	idx = srcu_read_lock(&srcu1);
	mutex_lock(&mut1);
	mutex_unlock(&mut1);
	srcu_read_unlock(&srcu1, idx);

	mutex_lock(&mut1);
	idx = srcu_read_lock(&srcu1);
	srcu_read_unlock(&srcu1, idx);
	mutex_unlock(&mut1);

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/3] locking/lockdep: Introduce lock_sync()
  2023-01-13  6:59 ` [PATCH 1/3] locking/lockdep: Introduce lock_sync() Boqun Feng
@ 2023-01-16 21:56   ` Waiman Long
  0 siblings, 0 replies; 27+ messages in thread
From: Waiman Long @ 2023-01-16 21:56 UTC (permalink / raw)
  To: Boqun Feng, linux-kernel, rcu, kvm
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Lai Jiangshan,
	Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, David Woodhouse, Paolo Bonzini, seanjc,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj

On 1/13/23 01:59, Boqun Feng wrote:
> Currently, in order to annonate functions like synchronize_srcu() for
> lockdep, a trick as follow can be used:
>
> 	lock_acquire();
> 	lock_release();
>
> , which indicates synchronize_srcu() acts like an empty critical section
> that waits for other (read-side) critical sections to finish. This
> surely can catch some deadlock, but as discussion brought up by Paul
> Mckenney [1], this could introduce false positives because of
> irq-safe/unsafe detection. Extra tricks might help this:
>
> 	local_irq_disable(..);
> 	lock_acquire();
> 	lock_release();
> 	local_irq_enable(...);
>
> But it's better that lockdep could provide an annonation for
> synchronize_srcu() like functions, so that people won't need to repeat
> the ugly tricks above. Therefore introduce lock_sync(). It's simply an
> lock+unlock pair with no irq safe/unsafe deadlock check, since the
> to-be-annontated functions don't create real critical sections therefore
> there is no way that irq can create extra dependencies.
>
> [1]: https://lore.kernel.org/lkml/20180412021233.ewncg5jjuzjw3x62@tardis/
>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>   include/linux/lockdep.h  |  5 +++++
>   kernel/locking/lockdep.c | 34 ++++++++++++++++++++++++++++++++++
>   2 files changed, 39 insertions(+)
>
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 1f1099dac3f0..ba09df6a0872 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -268,6 +268,10 @@ extern void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>   
>   extern void lock_release(struct lockdep_map *lock, unsigned long ip);
>   
> +extern void lock_sync(struct lockdep_map *lock, unsigned int subclass,
> +		      int read, int check, struct lockdep_map *nest_lock,
> +		      unsigned long ip);
> +
>   /* lock_is_held_type() returns */
>   #define LOCK_STATE_UNKNOWN	-1
>   #define LOCK_STATE_NOT_HELD	0
> @@ -555,6 +559,7 @@ do {									\
>   #define lock_map_acquire_read(l)		lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_)
>   #define lock_map_acquire_tryread(l)		lock_acquire_shared_recursive(l, 0, 1, NULL, _THIS_IP_)
>   #define lock_map_release(l)			lock_release(l, _THIS_IP_)
> +#define lock_map_sync(l)			lock_sync(l, 0, 0, 1, NULL, _THIS_IP_)
>   
>   #ifdef CONFIG_PROVE_LOCKING
>   # define might_lock(lock)						\
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index e3375bc40dad..cffa026a765f 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -5692,6 +5692,40 @@ void lock_release(struct lockdep_map *lock, unsigned long ip)
>   }
>   EXPORT_SYMBOL_GPL(lock_release);
>   
> +/*
> + * lock_sync() - A special annotation for synchronize_{s,}rcu()-like API.
> + *
> + * No actual critical section is created by the APIs annotated with this: these
> + * APIs are used to wait for one or multiple critical sections (on other CPUs
> + * or threads), and it means that calling these APIs inside these critical
> + * sections is potential deadlock.
> + *
> + * This annotation acts as an acqurie+release anontation pair with hardirqoff
> + * being 1. Since there's no critical section, no interrupt can create extra
> + * dependencies "inside" the annotation, hardirqoff == 1 allows us to avoid
> + * false positives.
> + */
> +void lock_sync(struct lockdep_map *lock, unsigned subclass, int read,
> +	       int check, struct lockdep_map *nest_lock, unsigned long ip)
> +{
> +	unsigned long flags;
> +
> +	if (unlikely(!lockdep_enabled()))
> +		return;
> +
> +	raw_local_irq_save(flags);
> +	check_flags(flags);
> +
> +	lockdep_recursion_inc();
> +	__lock_acquire(lock, subclass, 0, read, check, 1, nest_lock, ip, 0, 0);
> +
> +	if (__lock_release(lock, ip))
> +		check_chain_key(current);
> +	lockdep_recursion_finish();
> +	raw_local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(lock_sync);
> +
>   noinstr int lock_is_held_type(const struct lockdep_map *lock, int read)
>   {
>   	unsigned long flags;

This patch looks good to me.

Acked-by: Waiman Long <longman@redhat.com>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks
  2023-01-13  6:59 ` [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks Boqun Feng
  2023-01-13 11:29   ` Paul E. McKenney
@ 2023-01-16 22:01   ` Waiman Long
  1 sibling, 0 replies; 27+ messages in thread
From: Waiman Long @ 2023-01-16 22:01 UTC (permalink / raw)
  To: Boqun Feng, linux-kernel, rcu, kvm
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Lai Jiangshan,
	Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, David Woodhouse, Paolo Bonzini, seanjc,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj

On 1/13/23 01:59, Boqun Feng wrote:
> Although all flavors of RCU are annotated correctly with lockdep as
> recursive read locks, their 'check' parameter of lock_acquire() is
> unset. It means that RCU read locks are not added into the lockdep
> dependency graph therefore deadlock detection based on dependency graph
> won't catch deadlock caused by RCU. This is fine for "non-sleepable" RCU
> flavors since wait-context detection and other context based detection
> can catch these deadlocks. However for sleepable RCU, this is limited.
>
> Actually we can detect the deadlocks caused by SRCU by 1) making
> srcu_read_lock() a 'check'ed recursive read lock and 2) making
> synchronize_srcu() a empty write lock critical section. Even better,
> with the newly introduced lock_sync(), we can avoid false positives
> about irq-unsafe/safe. So do it.
>
> Note that NMI safe SRCU read side critical sections are currently not
> annonated, since step-by-step approach can help us deal with
> false-positives. These may be annotated in the future.
>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>   include/linux/srcu.h  | 23 +++++++++++++++++++++--
>   kernel/rcu/srcutiny.c |  2 ++
>   kernel/rcu/srcutree.c |  2 ++
>   3 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> index 9b9d0bbf1d3c..a1595f8c5155 100644
> --- a/include/linux/srcu.h
> +++ b/include/linux/srcu.h
> @@ -102,6 +102,21 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
>   	return lock_is_held(&ssp->dep_map);
>   }
>   
> +static inline void srcu_lock_acquire(struct lockdep_map *map)
> +{
> +	lock_map_acquire_read(map);
> +}
> +
> +static inline void srcu_lock_release(struct lockdep_map *map)
> +{
> +	lock_map_release(map);
> +}
> +
> +static inline void srcu_lock_sync(struct lockdep_map *map)
> +{
> +	lock_map_sync(map);
> +}
> +
>   #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */

It would be nice if a comment were added to document the difference
between the two sets of rcu_lock_* and srcu_lock_* functions. It is
described in the patch description, but it is not easy to figure that out
just by looking at the source files.
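
Something along these lines might be enough (only a sketch, and the exact
wording would of course be up to you):

	/*
	 * Unlike rcu_lock_acquire() and friends, the srcu_lock_*() helpers go
	 * through lock_map_acquire_read()/lock_map_release()/lock_map_sync(),
	 * i.e. with the 'check' parameter of lock_acquire() set, so SRCU
	 * read-side critical sections and synchronize_srcu() take part in the
	 * lockdep dependency graph and thus in deadlock detection.
	 */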

Cheers,
Longman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/3] locking/lockdep: Improve the deadlock scenario print for sync and read lock
  2023-01-13 23:57 ` [PATCH 4/3] locking/lockdep: Improve the deadlock scenario print for sync and read lock Boqun Feng
@ 2023-01-16 22:21   ` Waiman Long
  2023-01-16 22:35     ` Boqun Feng
  0 siblings, 1 reply; 27+ messages in thread
From: Waiman Long @ 2023-01-16 22:21 UTC (permalink / raw)
  To: Boqun Feng, linux-kernel, rcu, kvm
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Lai Jiangshan,
	Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, David Woodhouse, Paolo Bonzini, seanjc,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj

On 1/13/23 18:57, Boqun Feng wrote:
> Lock scenario print is always a weak spot of lockdep splats. Improvement
> can be made if we rework the dependency search and the error printing.
>
> However without touching the graph search, we can improve a little for
> the circular deadlock case, since we have the to-be-added lock
> dependency, and know whether these two locks are read/write/sync.
>
> In order to know whether a held_lock is sync or not, a bit was
> "stolen" from ->references, which reduce our limit for the same lock
> class nesting from 2^12 to 2^11, and it should still be good enough.
>
> Besides, since we now have bit in held_lock for sync, we don't need the
> "hardirqoffs being 1" trick, and also we can avoid the __lock_release()
> if we jump out of __lock_acquire() before the held_lock stored.
>
> With these changes, a deadlock case evolved with read lock and sync gets
> a better print-out from:
>
> 	[...]  Possible unsafe locking scenario:
> 	[...]
> 	[...]        CPU0                    CPU1
> 	[...]        ----                    ----
> 	[...]   lock(srcuA);
> 	[...]                                lock(srcuB);
> 	[...]                                lock(srcuA);
> 	[...]   lock(srcuB);
>
> to
>
> 	[...]  Possible unsafe locking scenario:
> 	[...]
> 	[...]        CPU0                    CPU1
> 	[...]        ----                    ----
> 	[...]   rlock(srcuA);
> 	[...]                                lock(srcuB);
> 	[...]                                lock(srcuA);
> 	[...]   sync(srcuB);
>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>   include/linux/lockdep.h  |  3 ++-
>   kernel/locking/lockdep.c | 48 ++++++++++++++++++++++++++--------------
>   2 files changed, 34 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index ba09df6a0872..febd7ecc225c 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -134,7 +134,8 @@ struct held_lock {
>   	unsigned int read:2;        /* see lock_acquire() comment */
>   	unsigned int check:1;       /* see lock_acquire() comment */
>   	unsigned int hardirqs_off:1;
> -	unsigned int references:12;					/* 32 bits */
> +	unsigned int sync:1;
> +	unsigned int references:11;					/* 32 bits */
>   	unsigned int pin_count;
>   };
>   
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index cffa026a765f..4031d87f6829 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -1880,6 +1880,8 @@ print_circular_lock_scenario(struct held_lock *src,
>   	struct lock_class *source = hlock_class(src);
>   	struct lock_class *target = hlock_class(tgt);
>   	struct lock_class *parent = prt->class;
> +	int src_read = src->read;
> +	int tgt_read = tgt->read;
>   
>   	/*
>   	 * A direct locking problem where unsafe_class lock is taken
> @@ -1907,7 +1909,10 @@ print_circular_lock_scenario(struct held_lock *src,
>   	printk(" Possible unsafe locking scenario:\n\n");
>   	printk("       CPU0                    CPU1\n");
>   	printk("       ----                    ----\n");
> -	printk("  lock(");
> +	if (tgt_read != 0)
> +		printk("  rlock(");
> +	else
> +		printk("  lock(");
>   	__print_lock_name(target);
>   	printk(KERN_CONT ");\n");
>   	printk("                               lock(");
> @@ -1916,7 +1921,12 @@ print_circular_lock_scenario(struct held_lock *src,
>   	printk("                               lock(");
>   	__print_lock_name(target);
>   	printk(KERN_CONT ");\n");
> -	printk("  lock(");
> +	if (src_read != 0)
> +		printk("  rlock(");
> +	else if (src->sync)
> +		printk("  sync(");
> +	else
> +		printk("  lock(");
>   	__print_lock_name(source);
>   	printk(KERN_CONT ");\n");
>   	printk("\n *** DEADLOCK ***\n\n");

src can be sync() but not the target. Is there a reason why that is the 
case?


> @@ -4530,7 +4540,13 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
>   					return 0;
>   		}
>   	}
> -	if (!hlock->hardirqs_off) {
> +
> +	/*
> +	 * For lock_sync(), don't mark the ENABLED usage, since lock_sync()
> +	 * creates no critical section and no extra dependency can be introduced
> +	 * by interrupts
> +	 */
> +	if (!hlock->hardirqs_off && !hlock->sync) {
>   		if (hlock->read) {
>   			if (!mark_lock(curr, hlock,
>   					LOCK_ENABLED_HARDIRQ_READ))
> @@ -4909,7 +4925,7 @@ static int __lock_is_held(const struct lockdep_map *lock, int read);
>   static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>   			  int trylock, int read, int check, int hardirqs_off,
>   			  struct lockdep_map *nest_lock, unsigned long ip,
> -			  int references, int pin_count)
> +			  int references, int pin_count, int sync)
>   {
>   	struct task_struct *curr = current;
>   	struct lock_class *class = NULL;
> @@ -4960,7 +4976,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>   
>   	class_idx = class - lock_classes;
>   
> -	if (depth) { /* we're holding locks */
> +	if (depth && !sync) {
> +		/* we're holding locks and the new held lock is not a sync */
>   		hlock = curr->held_locks + depth - 1;
>   		if (hlock->class_idx == class_idx && nest_lock) {
>   			if (!references)
> @@ -4994,6 +5011,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>   	hlock->trylock = trylock;
>   	hlock->read = read;
>   	hlock->check = check;
> +	hlock->sync = !!sync;
>   	hlock->hardirqs_off = !!hardirqs_off;
>   	hlock->references = references;
>   #ifdef CONFIG_LOCK_STAT
> @@ -5055,6 +5073,10 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>   	if (!validate_chain(curr, hlock, chain_head, chain_key))
>   		return 0;
>   
> +	/* For lock_sync(), we are done here since no actual critical section */
> +	if (hlock->sync)
> +		return 1;
> +
>   	curr->curr_chain_key = chain_key;
>   	curr->lockdep_depth++;
>   	check_chain_key(curr);

Even with sync, there is still a corresponding lock_acquire() and
lock_release() pair, so you can't exit here without increasing
lockdep_depth. That can cause an underflow.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/3] locking/lockdep: Improve the deadlock scenario print for sync and read lock
  2023-01-16 22:21   ` Waiman Long
@ 2023-01-16 22:35     ` Boqun Feng
  2023-01-17  1:36       ` Waiman Long
  0 siblings, 1 reply; 27+ messages in thread
From: Boqun Feng @ 2023-01-16 22:35 UTC (permalink / raw)
  To: Waiman Long
  Cc: linux-kernel, rcu, kvm, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Lai Jiangshan, Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, David Woodhouse, Paolo Bonzini, seanjc,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj

On Mon, Jan 16, 2023 at 05:21:09PM -0500, Waiman Long wrote:
> On 1/13/23 18:57, Boqun Feng wrote:
> > Lock scenario print is always a weak spot of lockdep splats. Improvement
> > can be made if we rework the dependency search and the error printing.
> > 
> > However without touching the graph search, we can improve a little for
> > the circular deadlock case, since we have the to-be-added lock
> > dependency, and know whether these two locks are read/write/sync.
> > 
> > In order to know whether a held_lock is sync or not, a bit was
> > "stolen" from ->references, which reduce our limit for the same lock
> > class nesting from 2^12 to 2^11, and it should still be good enough.
> > 
> > Besides, since we now have bit in held_lock for sync, we don't need the
> > "hardirqoffs being 1" trick, and also we can avoid the __lock_release()
> > if we jump out of __lock_acquire() before the held_lock stored.
> > 
> > With these changes, a deadlock case evolved with read lock and sync gets
> > a better print-out from:
> > 
> > 	[...]  Possible unsafe locking scenario:
> > 	[...]
> > 	[...]        CPU0                    CPU1
> > 	[...]        ----                    ----
> > 	[...]   lock(srcuA);
> > 	[...]                                lock(srcuB);
> > 	[...]                                lock(srcuA);
> > 	[...]   lock(srcuB);
> > 
> > to
> > 
> > 	[...]  Possible unsafe locking scenario:
> > 	[...]
> > 	[...]        CPU0                    CPU1
> > 	[...]        ----                    ----
> > 	[...]   rlock(srcuA);
> > 	[...]                                lock(srcuB);
> > 	[...]                                lock(srcuA);
> > 	[...]   sync(srcuB);
> > 
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > ---
> >   include/linux/lockdep.h  |  3 ++-
> >   kernel/locking/lockdep.c | 48 ++++++++++++++++++++++++++--------------
> >   2 files changed, 34 insertions(+), 17 deletions(-)
> > 
> > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> > index ba09df6a0872..febd7ecc225c 100644
> > --- a/include/linux/lockdep.h
> > +++ b/include/linux/lockdep.h
> > @@ -134,7 +134,8 @@ struct held_lock {
> >   	unsigned int read:2;        /* see lock_acquire() comment */
> >   	unsigned int check:1;       /* see lock_acquire() comment */
> >   	unsigned int hardirqs_off:1;
> > -	unsigned int references:12;					/* 32 bits */
> > +	unsigned int sync:1;
> > +	unsigned int references:11;					/* 32 bits */
> >   	unsigned int pin_count;
> >   };
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index cffa026a765f..4031d87f6829 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -1880,6 +1880,8 @@ print_circular_lock_scenario(struct held_lock *src,
> >   	struct lock_class *source = hlock_class(src);
> >   	struct lock_class *target = hlock_class(tgt);
> >   	struct lock_class *parent = prt->class;
> > +	int src_read = src->read;
> > +	int tgt_read = tgt->read;
> >   	/*
> >   	 * A direct locking problem where unsafe_class lock is taken
> > @@ -1907,7 +1909,10 @@ print_circular_lock_scenario(struct held_lock *src,
> >   	printk(" Possible unsafe locking scenario:\n\n");
> >   	printk("       CPU0                    CPU1\n");
> >   	printk("       ----                    ----\n");
> > -	printk("  lock(");
> > +	if (tgt_read != 0)
> > +		printk("  rlock(");
> > +	else
> > +		printk("  lock(");
> >   	__print_lock_name(target);
> >   	printk(KERN_CONT ");\n");
> >   	printk("                               lock(");
> > @@ -1916,7 +1921,12 @@ print_circular_lock_scenario(struct held_lock *src,
> >   	printk("                               lock(");
> >   	__print_lock_name(target);
> >   	printk(KERN_CONT ");\n");
> > -	printk("  lock(");
> > +	if (src_read != 0)
> > +		printk("  rlock(");
> > +	else if (src->sync)
> > +		printk("  sync(");
> > +	else
> > +		printk("  lock(");
> >   	__print_lock_name(source);
> >   	printk(KERN_CONT ");\n");
> >   	printk("\n *** DEADLOCK ***\n\n");
> 
> src can be sync() but not the target. Is there a reason why that is the
> case?
> 

The functions annotated by sync() don't create real critical sections,
so no lock dependency can be created from a sync(), for example:

	synchronize_srcu(A);
	mutex_lock(B);

no dependency is created from A to B. So in the printed scenario, if we see
a dependency target -> source, the target cannot be a lock_sync(). I will
add better documentation later.
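
For contrast (again only a sketch, reusing the A and B from the snippet
above, with A an srcu_struct and B a mutex), the opposite nesting does
create a dependency, because the synchronization waits while B is held:

	mutex_lock(B);
	synchronize_srcu(A);	/* waits inside B's critical section, so B -> A is recorded */
	mutex_unlock(B);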

> 
> > @@ -4530,7 +4540,13 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
> >   					return 0;
> >   		}
> >   	}
> > -	if (!hlock->hardirqs_off) {
> > +
> > +	/*
> > +	 * For lock_sync(), don't mark the ENABLED usage, since lock_sync()
> > +	 * creates no critical section and no extra dependency can be introduced
> > +	 * by interrupts
> > +	 */
> > +	if (!hlock->hardirqs_off && !hlock->sync) {
> >   		if (hlock->read) {
> >   			if (!mark_lock(curr, hlock,
> >   					LOCK_ENABLED_HARDIRQ_READ))
> > @@ -4909,7 +4925,7 @@ static int __lock_is_held(const struct lockdep_map *lock, int read);
> >   static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> >   			  int trylock, int read, int check, int hardirqs_off,
> >   			  struct lockdep_map *nest_lock, unsigned long ip,
> > -			  int references, int pin_count)
> > +			  int references, int pin_count, int sync)
> >   {
> >   	struct task_struct *curr = current;
> >   	struct lock_class *class = NULL;
> > @@ -4960,7 +4976,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> >   	class_idx = class - lock_classes;
> > -	if (depth) { /* we're holding locks */
> > +	if (depth && !sync) {
> > +		/* we're holding locks and the new held lock is not a sync */
> >   		hlock = curr->held_locks + depth - 1;
> >   		if (hlock->class_idx == class_idx && nest_lock) {
> >   			if (!references)
> > @@ -4994,6 +5011,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> >   	hlock->trylock = trylock;
> >   	hlock->read = read;
> >   	hlock->check = check;
> > +	hlock->sync = !!sync;
> >   	hlock->hardirqs_off = !!hardirqs_off;
> >   	hlock->references = references;
> >   #ifdef CONFIG_LOCK_STAT
> > @@ -5055,6 +5073,10 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> >   	if (!validate_chain(curr, hlock, chain_head, chain_key))
> >   		return 0;
> > +	/* For lock_sync(), we are done here since no actual critical section */
> > +	if (hlock->sync)
> > +		return 1;
> > +
> >   	curr->curr_chain_key = chain_key;
> >   	curr->lockdep_depth++;
> >   	check_chain_key(curr);
> 
> Even with sync, there is still a corresponding lock_acquire() and
> lock_release(), you can't exit here without increasing lockdep_depth. That
> can cause underflow.
> 

I actually removed the __lock_release() in lock_sync() in this patch, so
I think it's OK. But I must admit the whole submission is to give David
something to see whether the output is an improvement, so I probably
should separate the output changes and the lock_sync() internals into
two patches (and the latter can also be folded into the introduction
patch).

Regards,
Boqun

> Cheers,
> Longman
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/3] locking/lockdep: Improve the deadlock scenario print for sync and read lock
  2023-01-16 22:35     ` Boqun Feng
@ 2023-01-17  1:36       ` Waiman Long
  0 siblings, 0 replies; 27+ messages in thread
From: Waiman Long @ 2023-01-17  1:36 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, rcu, kvm, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Lai Jiangshan, Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, David Woodhouse, Paolo Bonzini, seanjc,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj

On 1/16/23 17:35, Boqun Feng wrote:
> On Mon, Jan 16, 2023 at 05:21:09PM -0500, Waiman Long wrote:
>> On 1/13/23 18:57, Boqun Feng wrote:
>>> Lock scenario print is always a weak spot of lockdep splats. Improvement
>>> can be made if we rework the dependency search and the error printing.
>>>
>>> However without touching the graph search, we can improve a little for
>>> the circular deadlock case, since we have the to-be-added lock
>>> dependency, and know whether these two locks are read/write/sync.
>>>
>>> In order to know whether a held_lock is sync or not, a bit was
>>> "stolen" from ->references, which reduce our limit for the same lock
>>> class nesting from 2^12 to 2^11, and it should still be good enough.
>>>
>>> Besides, since we now have bit in held_lock for sync, we don't need the
>>> "hardirqoffs being 1" trick, and also we can avoid the __lock_release()
>>> if we jump out of __lock_acquire() before the held_lock stored.
>>>
>>> With these changes, a deadlock case evolved with read lock and sync gets
>>> a better print-out from:
>>>
>>> 	[...]  Possible unsafe locking scenario:
>>> 	[...]
>>> 	[...]        CPU0                    CPU1
>>> 	[...]        ----                    ----
>>> 	[...]   lock(srcuA);
>>> 	[...]                                lock(srcuB);
>>> 	[...]                                lock(srcuA);
>>> 	[...]   lock(srcuB);
>>>
>>> to
>>>
>>> 	[...]  Possible unsafe locking scenario:
>>> 	[...]
>>> 	[...]        CPU0                    CPU1
>>> 	[...]        ----                    ----
>>> 	[...]   rlock(srcuA);
>>> 	[...]                                lock(srcuB);
>>> 	[...]                                lock(srcuA);
>>> 	[...]   sync(srcuB);
>>>
>>> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
>>> ---
>>>    include/linux/lockdep.h  |  3 ++-
>>>    kernel/locking/lockdep.c | 48 ++++++++++++++++++++++++++--------------
>>>    2 files changed, 34 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
>>> index ba09df6a0872..febd7ecc225c 100644
>>> --- a/include/linux/lockdep.h
>>> +++ b/include/linux/lockdep.h
>>> @@ -134,7 +134,8 @@ struct held_lock {
>>>    	unsigned int read:2;        /* see lock_acquire() comment */
>>>    	unsigned int check:1;       /* see lock_acquire() comment */
>>>    	unsigned int hardirqs_off:1;
>>> -	unsigned int references:12;					/* 32 bits */
>>> +	unsigned int sync:1;
>>> +	unsigned int references:11;					/* 32 bits */
>>>    	unsigned int pin_count;
>>>    };
>>> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
>>> index cffa026a765f..4031d87f6829 100644
>>> --- a/kernel/locking/lockdep.c
>>> +++ b/kernel/locking/lockdep.c
>>> @@ -1880,6 +1880,8 @@ print_circular_lock_scenario(struct held_lock *src,
>>>    	struct lock_class *source = hlock_class(src);
>>>    	struct lock_class *target = hlock_class(tgt);
>>>    	struct lock_class *parent = prt->class;
>>> +	int src_read = src->read;
>>> +	int tgt_read = tgt->read;
>>>    	/*
>>>    	 * A direct locking problem where unsafe_class lock is taken
>>> @@ -1907,7 +1909,10 @@ print_circular_lock_scenario(struct held_lock *src,
>>>    	printk(" Possible unsafe locking scenario:\n\n");
>>>    	printk("       CPU0                    CPU1\n");
>>>    	printk("       ----                    ----\n");
>>> -	printk("  lock(");
>>> +	if (tgt_read != 0)
>>> +		printk("  rlock(");
>>> +	else
>>> +		printk("  lock(");
>>>    	__print_lock_name(target);
>>>    	printk(KERN_CONT ");\n");
>>>    	printk("                               lock(");
>>> @@ -1916,7 +1921,12 @@ print_circular_lock_scenario(struct held_lock *src,
>>>    	printk("                               lock(");
>>>    	__print_lock_name(target);
>>>    	printk(KERN_CONT ");\n");
>>> -	printk("  lock(");
>>> +	if (src_read != 0)
>>> +		printk("  rlock(");
>>> +	else if (src->sync)
>>> +		printk("  sync(");
>>> +	else
>>> +		printk("  lock(");
>>>    	__print_lock_name(source);
>>>    	printk(KERN_CONT ");\n");
>>>    	printk("\n *** DEADLOCK ***\n\n");
>> src can be sync() but not the target. Is there a reason why that is the
>> case?
>>
> The functions annotated by sync() don't create real critical sections,
> so no lock dependency can be created from a sync(), for example:
>
> 	synchronize_srcu(A);
> 	mutex_lock(B);
>
> no dependency from A to B. In the scenario case, if we see a dependency
> target -> source, the target cannot be a lock_sync(). I will add better
> documentation later.
Right, the dependency won't happen since you reduce lock_sync() to mostly
doing validate_chain() without actually storing the held_lock in the lock
chain, which I missed in my initial review. Without that, a dependency
could be created if an NMI happens between the lock_acquire() and
lock_release() in lock_sync().
>>> @@ -4530,7 +4540,13 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
>>>    					return 0;
>>>    		}
>>>    	}
>>> -	if (!hlock->hardirqs_off) {
>>> +
>>> +	/*
>>> +	 * For lock_sync(), don't mark the ENABLED usage, since lock_sync()
>>> +	 * creates no critical section and no extra dependency can be introduced
>>> +	 * by interrupts
>>> +	 */
>>> +	if (!hlock->hardirqs_off && !hlock->sync) {
>>>    		if (hlock->read) {
>>>    			if (!mark_lock(curr, hlock,
>>>    					LOCK_ENABLED_HARDIRQ_READ))
>>> @@ -4909,7 +4925,7 @@ static int __lock_is_held(const struct lockdep_map *lock, int read);
>>>    static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>>    			  int trylock, int read, int check, int hardirqs_off,
>>>    			  struct lockdep_map *nest_lock, unsigned long ip,
>>> -			  int references, int pin_count)
>>> +			  int references, int pin_count, int sync)
>>>    {
>>>    	struct task_struct *curr = current;
>>>    	struct lock_class *class = NULL;
>>> @@ -4960,7 +4976,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>>    	class_idx = class - lock_classes;
>>> -	if (depth) { /* we're holding locks */
>>> +	if (depth && !sync) {
>>> +		/* we're holding locks and the new held lock is not a sync */
>>>    		hlock = curr->held_locks + depth - 1;
>>>    		if (hlock->class_idx == class_idx && nest_lock) {
>>>    			if (!references)
>>> @@ -4994,6 +5011,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>>    	hlock->trylock = trylock;
>>>    	hlock->read = read;
>>>    	hlock->check = check;
>>> +	hlock->sync = !!sync;
>>>    	hlock->hardirqs_off = !!hardirqs_off;
>>>    	hlock->references = references;
>>>    #ifdef CONFIG_LOCK_STAT
>>> @@ -5055,6 +5073,10 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>>    	if (!validate_chain(curr, hlock, chain_head, chain_key))
>>>    		return 0;
>>> +	/* For lock_sync(), we are done here since no actual critical section */
>>> +	if (hlock->sync)
>>> +		return 1;
>>> +
>>>    	curr->curr_chain_key = chain_key;
>>>    	curr->lockdep_depth++;
>>>    	check_chain_key(curr);
>> Even with sync, there is still a corresponding lock_acquire() and
>> lock_release(), you can't exit here without increasing lockdep_depth. That
>> can cause underflow.
>>
> I actually remove the __lock_release() in lock_sync() in this patch, so
> I think it's OK. But I must admit the whole submission is to give David
> something to see whether the output is an improvement, so I probably
> should separate the output changes and the lock_sync() internall into
> two patches (and the later can also be folded into the introduction
> patch).

I saw that now. You may not need to separate it into 2 patches since 
there is some dependency between the two. You do have to document the 2 
different changes in your patch description.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/3] KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test
  2023-01-13 12:46   ` [PATCH 3/3] KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test David Woodhouse
@ 2023-02-04  2:32     ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2023-02-04  2:32 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Boqun Feng, linux-kernel, rcu, kvm, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Lai Jiangshan, Paul E . McKenney,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Paolo Bonzini,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj

On Fri, Jan 13, 2023, David Woodhouse wrote:
> @@ -57,6 +60,7 @@ enum {
>  	TEST_EVTCHN_SLOWPATH,
>  	TEST_EVTCHN_SEND_IOCTL,
>  	TEST_EVTCHN_HCALL,
> +	TEST_EVTCHN_HCALL_SLOWPATH,
>  	TEST_EVTCHN_HCALL_EVENTFD,
>  	TEST_TIMER_SETUP,
>  	TEST_TIMER_WAIT,
> @@ -270,6 +274,20 @@ static void guest_code(void)
>  
>  	guest_wait_for_irq();
>  
> +	GUEST_SYNC(TEST_EVTCHN_HCALL_SLOWPATH);
> +
> +	/* Same again, but this time the host has messed with memslots
> +	 * so it should take the slow path in kvm_xen_set_evtchn(). */

	/*
	 * https://lore.kernel.org/all/CA+55aFyQYJerovMsSoSKS7PessZBr4vNp-3QUUwhqk4A4_jcbg@mail.gmail.com
	 */

> +	__asm__ __volatile__ ("vmcall" :
> +			      "=a" (rax) :
> +			      "a" (__HYPERVISOR_event_channel_op),
> +			      "D" (EVTCHNOP_send),
> +			      "S" (&s));
> +
> +	GUEST_ASSERT(rax == 0);

There's a lot of copy+paste in this file, and we really should do VMMCALL when
running on AMD.  That's easy to do with some changes that are in the queue for
6.3.  I'll repost these selftest patches on top of a few patches to add helpers for
doing hypercalls using the Xen ABI.
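
For reference, such a helper could look roughly like this (only a sketch:
the name and exact shape are made up here, and a real version would pick
VMCALL vs. VMMCALL based on the vendor):

	static unsigned long xen_hypercall(unsigned long nr, unsigned long a0,
					   unsigned long a1)
	{
		unsigned long ret;

		/* Xen 64-bit hypercall ABI: number in RAX, args in RDI, RSI, ... */
		asm volatile("vmcall"
			     : "=a" (ret)
			     : "a" (nr), "D" (a0), "S" (a1)
			     : "memory");
		return ret;
	}

so the guest code above would reduce to something like:

	rax = xen_hypercall(__HYPERVISOR_event_channel_op, EVTCHNOP_send,
			    (unsigned long)&s);
	GUEST_ASSERT(rax == 0);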

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/3] KVM: Make use of SRCU deadlock detection support
  2023-01-13 12:46 ` [PATCH 0/3] KVM: Make use of SRCU deadlock detection support David Woodhouse
                     ` (2 preceding siblings ...)
  2023-01-13 12:46   ` [PATCH 3/3] KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test David Woodhouse
@ 2023-02-04  2:34   ` Sean Christopherson
  3 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2023-02-04  2:34 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Boqun Feng, linux-kernel, rcu, kvm, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long, Lai Jiangshan, Paul E . McKenney,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Paolo Bonzini,
	Joel Fernandes, Matthew Wilcox, Michal Luczaj

On Fri, Jan 13, 2023, David Woodhouse wrote:
> David Woodhouse (3):
>       KVM: Show lockdep the kvm->mutex vs. kvm->srcu ordering rule
>       KVM: selftests: Use enum for test numbers in xen_shinfo_test
>       KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test
> 
>  .../testing/selftests/kvm/x86_64/xen_shinfo_test.c | 165 ++++++++++++++-------
>  virt/kvm/kvm_main.c                                |  10 ++
>  2 files changed, 124 insertions(+), 51 deletions(-)

As mentioned in patch three, I'm going to repost the selftests changes on top
of other cleanups, and will plan on applying them next week if all goes well.

Paolo, do you want to grab the KVM change directly?

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-02-04  2:34 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-13  6:59 [PATCH 0/3] Detect SRCU related deadlocks Boqun Feng
2023-01-13  6:59 ` [PATCH 1/3] locking/lockdep: Introduce lock_sync() Boqun Feng
2023-01-16 21:56   ` Waiman Long
2023-01-13  6:59 ` [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks Boqun Feng
2023-01-13 11:29   ` Paul E. McKenney
2023-01-13 18:05     ` Boqun Feng
2023-01-13 19:11       ` Paul E. McKenney
2023-01-16 17:36         ` Paolo Bonzini
2023-01-16 17:54           ` Boqun Feng
2023-01-16 18:56             ` Paul E. McKenney
2023-01-16 22:01   ` Waiman Long
2023-01-13  6:59 ` [PATCH 3/3] WIP: locking/lockdep: selftests: Add selftests for SRCU Boqun Feng
2023-01-13 12:46 ` [PATCH 0/3] KVM: Make use of SRCU deadlock detection support David Woodhouse
2023-01-13 12:46   ` [PATCH 1/3] KVM: Show lockdep the kvm->mutex vs. kvm->srcu ordering rule David Woodhouse
2023-01-13 12:46   ` [PATCH 2/3] KVM: selftests: Use enum for test numbers in xen_shinfo_test David Woodhouse
2023-01-13 17:13     ` David Woodhouse
2023-01-13 12:46   ` [PATCH 3/3] KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test David Woodhouse
2023-02-04  2:32     ` Sean Christopherson
2023-02-04  2:34   ` [PATCH 0/3] KVM: Make use of SRCU deadlock detection support Sean Christopherson
     [not found] ` <20230113130330.1027-1-hdanton@sina.com>
2023-01-13 17:58   ` [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks Boqun Feng
     [not found]   ` <20230113235809.1085-1-hdanton@sina.com>
2023-01-14  0:17     ` Boqun Feng
     [not found]     ` <20230114071832.1162-1-hdanton@sina.com>
2023-01-14  7:32       ` Boqun Feng
     [not found]       ` <20230114102659.1219-1-hdanton@sina.com>
2023-01-15  0:18         ` Boqun Feng
2023-01-13 23:57 ` [PATCH 4/3] locking/lockdep: Improve the deadlock scenario print for sync and read lock Boqun Feng
2023-01-16 22:21   ` Waiman Long
2023-01-16 22:35     ` Boqun Feng
2023-01-17  1:36       ` Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).