linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore
@ 2020-12-03 20:09 Eric W. Biederman
  2020-12-03 20:10 ` [PATCH 1/3] rwsem: Implement down_read_killable_nested Eric W. Biederman
                   ` (3 more replies)
  0 siblings, 4 replies; 46+ messages in thread
From: Eric W. Biederman @ 2020-12-03 20:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo


Recently syzbot reported[0] that there is a deadlock amongst the users
of exec_update_mutex.

The simplest and most robust solution appears to be making
exec_update_mutex a read/write lock and having everything execept for
exec take the lock for read.

This set of changes upgrades rwsem so it has the functionality needed
and uses a rw_semaphore to replace the current mutex.

Eric W. Biederman (3):
      rwsem: Implement down_read_killable_nested
      rwsem: Implement down_read_interruptible
      exec: Transform exec_update_mutex into a rw_semaphore

 fs/exec.c                    | 12 ++++++------
 fs/proc/base.c               | 10 +++++-----
 include/linux/rwsem.h        |  3 +++
 include/linux/sched/signal.h | 11 ++++++-----
 init/init_task.c             |  2 +-
 kernel/events/core.c         | 12 ++++++------
 kernel/fork.c                |  6 +++---
 kernel/kcmp.c                | 30 +++++++++++++++---------------
 kernel/locking/rwsem.c       | 40 ++++++++++++++++++++++++++++++++++++++++
 kernel/pid.c                 |  4 ++--
 10 files changed, 87 insertions(+), 43 deletions(-)

[0] https://lkml.kernel.org/r/00000000000063640c05ade8e3de@google.com

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 1/3] rwsem: Implement down_read_killable_nested
  2020-12-03 20:09 [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman
@ 2020-12-03 20:10 ` Eric W. Biederman
  2020-12-04  1:58   ` Waiman Long
  2020-12-09 18:38   ` [tip: locking/core] " tip-bot2 for Eric W. Biederman
  2020-12-03 20:11 ` [PATCH 2/3] rwsem: Implement down_read_interruptible Eric W. Biederman
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 46+ messages in thread
From: Eric W. Biederman @ 2020-12-03 20:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo


In preparation for converting exec_update_mutex to a rwsem so that
multiple readers can execute in parallel and not deadlock, add
down_read_killable_nested.  This is needed so that kcmp_lock
can be converted from working on a mutexes to working on rw_semaphores.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/rwsem.h  |  2 ++
 kernel/locking/rwsem.c | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 25e3fde85617..13021b08b2ed 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -171,6 +171,7 @@ extern void downgrade_write(struct rw_semaphore *sem);
  * See Documentation/locking/lockdep-design.rst for more details.)
  */
 extern void down_read_nested(struct rw_semaphore *sem, int subclass);
+extern int __must_check down_read_killable_nested(struct rw_semaphore *sem, int subclass);
 extern void down_write_nested(struct rw_semaphore *sem, int subclass);
 extern int down_write_killable_nested(struct rw_semaphore *sem, int subclass);
 extern void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest_lock);
@@ -191,6 +192,7 @@ extern void down_read_non_owner(struct rw_semaphore *sem);
 extern void up_read_non_owner(struct rw_semaphore *sem);
 #else
 # define down_read_nested(sem, subclass)		down_read(sem)
+# define down_read_killable_nested(sem, subclass)	down_read_killable(sem)
 # define down_write_nest_lock(sem, nest_lock)	down_write(sem)
 # define down_write_nested(sem, subclass)	down_write(sem)
 # define down_write_killable_nested(sem, subclass)	down_write_killable(sem)
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index f11b9bd3431d..54d11cb97551 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1605,6 +1605,20 @@ void down_read_nested(struct rw_semaphore *sem, int subclass)
 }
 EXPORT_SYMBOL(down_read_nested);
 
+int down_read_killable_nested(struct rw_semaphore *sem, int subclass)
+{
+	might_sleep();
+	rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_);
+
+	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_killable)) {
+		rwsem_release(&sem->dep_map, _RET_IP_);
+		return -EINTR;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(down_read_killable_nested);
+
 void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest)
 {
 	might_sleep();
-- 
2.25.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-03 20:09 [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman
  2020-12-03 20:10 ` [PATCH 1/3] rwsem: Implement down_read_killable_nested Eric W. Biederman
@ 2020-12-03 20:11 ` Eric W. Biederman
  2020-12-04  1:59   ` Waiman Long
  2020-12-09 18:38   ` [tip: locking/core] rwsem: Implement down_read_interruptible tip-bot2 for Eric W. Biederman
  2020-12-03 20:12 ` [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman
  2020-12-03 22:42 ` [PATCH 0/3] " Linus Torvalds
  3 siblings, 2 replies; 46+ messages in thread
From: Eric W. Biederman @ 2020-12-03 20:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo


In preparation for converting exec_update_mutex to a rwsem so that
multiple readers can execute in parallel and not deadlock, add
down_read_interruptible.  This is needed for perf_event_open to be
converted (with no semantic changes) from working on a mutex to
wroking on a rwsem.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/rwsem.h  |  1 +
 kernel/locking/rwsem.c | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 13021b08b2ed..4c715be48717 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -123,6 +123,7 @@ static inline int rwsem_is_contended(struct rw_semaphore *sem)
  * lock for reading
  */
 extern void down_read(struct rw_semaphore *sem);
+extern int __must_check down_read_interruptible(struct rw_semaphore *sem);
 extern int __must_check down_read_killable(struct rw_semaphore *sem);
 
 /*
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 54d11cb97551..a163542d178e 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1345,6 +1345,18 @@ static inline void __down_read(struct rw_semaphore *sem)
 	}
 }
 
+static inline int __down_read_interruptible(struct rw_semaphore *sem)
+{
+	if (!rwsem_read_trylock(sem)) {
+		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
+			return -EINTR;
+		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
+	} else {
+		rwsem_set_reader_owned(sem);
+	}
+	return 0;
+}
+
 static inline int __down_read_killable(struct rw_semaphore *sem)
 {
 	if (!rwsem_read_trylock(sem)) {
@@ -1495,6 +1507,20 @@ void __sched down_read(struct rw_semaphore *sem)
 }
 EXPORT_SYMBOL(down_read);
 
+int __sched down_read_interruptible(struct rw_semaphore *sem)
+{
+	might_sleep();
+	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
+
+	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_interruptible)) {
+		rwsem_release(&sem->dep_map, _RET_IP_);
+		return -EINTR;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(down_read_interruptible);
+
 int __sched down_read_killable(struct rw_semaphore *sem)
 {
 	might_sleep();
-- 
2.25.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-03 20:09 [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman
  2020-12-03 20:10 ` [PATCH 1/3] rwsem: Implement down_read_killable_nested Eric W. Biederman
  2020-12-03 20:11 ` [PATCH 2/3] rwsem: Implement down_read_interruptible Eric W. Biederman
@ 2020-12-03 20:12 ` Eric W. Biederman
  2020-12-04 16:08   ` Bernd Edlinger
  2020-12-03 22:42 ` [PATCH 0/3] " Linus Torvalds
  3 siblings, 1 reply; 46+ messages in thread
From: Eric W. Biederman @ 2020-12-03 20:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo


Recently syzbot reported[0] that there is a deadlock amongst the users
of exec_update_mutex.  The problematic lock ordering found by lockdep
was:

   perf_event_open  (exec_update_mutex -> ovl_i_mutex)
   chown            (ovl_i_mutex       -> sb_writes)
   sendfile         (sb_writes         -> p->lock)
     by reading from a proc file and writing to overlayfs
   proc_pid_syscall (p->lock           -> exec_update_mutex)

While looking at possible solutions it occured to me that all of the
users and possible users involved only wanted to state of the given
process to remain the same.  They are all readers.  The only writer is
exec.

There is no reason for readers to block on each other.  So fix
this deadlock by transforming exec_update_mutex into a rw_semaphore
named exec_update_lock that only exec takes for writing.

Cc: Jann Horn <jannh@google.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christopher Yeoh <cyeoh@au1.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Fixes: eea9673250db ("exec: Add exec_update_mutex to replace cred_guard_mutex")
[0] https://lkml.kernel.org/r/00000000000063640c05ade8e3de@google.com
Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/exec.c                    | 12 ++++++------
 fs/proc/base.c               | 10 +++++-----
 include/linux/sched/signal.h | 11 ++++++-----
 init/init_task.c             |  2 +-
 kernel/events/core.c         | 12 ++++++------
 kernel/fork.c                |  6 +++---
 kernel/kcmp.c                | 30 +++++++++++++++---------------
 kernel/pid.c                 |  4 ++--
 8 files changed, 44 insertions(+), 43 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 9e9368603168..b822aa0d682e 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -965,8 +965,8 @@ EXPORT_SYMBOL(read_code);
 
 /*
  * Maps the mm_struct mm into the current task struct.
- * On success, this function returns with the mutex
- * exec_update_mutex locked.
+ * On success, this function returns with exec_update_lock
+ * held for writing.
  */
 static int exec_mmap(struct mm_struct *mm)
 {
@@ -981,7 +981,7 @@ static int exec_mmap(struct mm_struct *mm)
 	if (old_mm)
 		sync_mm_rss(old_mm);
 
-	ret = mutex_lock_killable(&tsk->signal->exec_update_mutex);
+	ret = down_write_killable(&tsk->signal->exec_update_lock);
 	if (ret)
 		return ret;
 
@@ -995,7 +995,7 @@ static int exec_mmap(struct mm_struct *mm)
 		mmap_read_lock(old_mm);
 		if (unlikely(old_mm->core_state)) {
 			mmap_read_unlock(old_mm);
-			mutex_unlock(&tsk->signal->exec_update_mutex);
+			up_write(&tsk->signal->exec_update_lock);
 			return -EINTR;
 		}
 	}
@@ -1392,7 +1392,7 @@ int begin_new_exec(struct linux_binprm * bprm)
 	return 0;
 
 out_unlock:
-	mutex_unlock(&me->signal->exec_update_mutex);
+	up_write(&me->signal->exec_update_lock);
 out:
 	return retval;
 }
@@ -1433,7 +1433,7 @@ void setup_new_exec(struct linux_binprm * bprm)
 	 * some architectures like powerpc
 	 */
 	me->mm->task_size = TASK_SIZE;
-	mutex_unlock(&me->signal->exec_update_mutex);
+	up_write(&me->signal->exec_update_lock);
 	mutex_unlock(&me->signal->cred_guard_mutex);
 }
 EXPORT_SYMBOL(setup_new_exec);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 0f707003dda5..dd1f4f6f22bc 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -405,11 +405,11 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
 
 static int lock_trace(struct task_struct *task)
 {
-	int err = mutex_lock_killable(&task->signal->exec_update_mutex);
+	int err = down_read_killable(&task->signal->exec_update_lock);
 	if (err)
 		return err;
 	if (!ptrace_may_access(task, PTRACE_MODE_ATTACH_FSCREDS)) {
-		mutex_unlock(&task->signal->exec_update_mutex);
+		up_read(&task->signal->exec_update_lock);
 		return -EPERM;
 	}
 	return 0;
@@ -417,7 +417,7 @@ static int lock_trace(struct task_struct *task)
 
 static void unlock_trace(struct task_struct *task)
 {
-	mutex_unlock(&task->signal->exec_update_mutex);
+	up_read(&task->signal->exec_update_lock);
 }
 
 #ifdef CONFIG_STACKTRACE
@@ -2928,7 +2928,7 @@ static int do_io_accounting(struct task_struct *task, struct seq_file *m, int wh
 	unsigned long flags;
 	int result;
 
-	result = mutex_lock_killable(&task->signal->exec_update_mutex);
+	result = down_read_killable(&task->signal->exec_update_lock);
 	if (result)
 		return result;
 
@@ -2964,7 +2964,7 @@ static int do_io_accounting(struct task_struct *task, struct seq_file *m, int wh
 	result = 0;
 
 out_unlock:
-	mutex_unlock(&task->signal->exec_update_mutex);
+	up_read(&task->signal->exec_update_lock);
 	return result;
 }
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 1bad18a1d8ba..4b6a8234d7fc 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -228,12 +228,13 @@ struct signal_struct {
 					 * credential calculations
 					 * (notably. ptrace)
 					 * Deprecated do not use in new code.
-					 * Use exec_update_mutex instead.
-					 */
-	struct mutex exec_update_mutex;	/* Held while task_struct is being
-					 * updated during exec, and may have
-					 * inconsistent permissions.
+					 * Use exec_update_lock instead.
 					 */
+	struct rw_semaphore exec_update_lock;	/* Held while task_struct is
+						 * being updated during exec,
+						 * and may have inconsistent
+						 * permissions.
+						 */
 } __randomize_layout;
 
 /*
diff --git a/init/init_task.c b/init/init_task.c
index a56f0abb63e9..15f6eb93a04f 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -26,7 +26,7 @@ static struct signal_struct init_signals = {
 	.multiprocess	= HLIST_HEAD_INIT,
 	.rlim		= INIT_RLIMITS,
 	.cred_guard_mutex = __MUTEX_INITIALIZER(init_signals.cred_guard_mutex),
-	.exec_update_mutex = __MUTEX_INITIALIZER(init_signals.exec_update_mutex),
+	.exec_update_lock = __RWSEM_INITIALIZER(init_signals.exec_update_lock),
 #ifdef CONFIG_POSIX_TIMERS
 	.posix_timers = LIST_HEAD_INIT(init_signals.posix_timers),
 	.cputimer	= {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index da467e1dd49a..3189f690236e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1325,7 +1325,7 @@ static void put_ctx(struct perf_event_context *ctx)
  * function.
  *
  * Lock order:
- *    exec_update_mutex
+ *    exec_update_lock
  *	task_struct::perf_event_mutex
  *	  perf_event_context::mutex
  *	    perf_event::child_mutex;
@@ -11730,14 +11730,14 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	if (task) {
-		err = mutex_lock_interruptible(&task->signal->exec_update_mutex);
+		err = down_read_interruptible(&task->signal->exec_update_lock);
 		if (err)
 			goto err_task;
 
 		/*
 		 * Preserve ptrace permission check for backwards compatibility.
 		 *
-		 * We must hold exec_update_mutex across this and any potential
+		 * We must hold exec_update_lock across this and any potential
 		 * perf_install_in_context() call for this new event to
 		 * serialize against exec() altering our credentials (and the
 		 * perf_event_exit_task() that could imply).
@@ -12026,7 +12026,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	mutex_unlock(&ctx->mutex);
 
 	if (task) {
-		mutex_unlock(&task->signal->exec_update_mutex);
+		up_read(&task->signal->exec_update_lock);
 		put_task_struct(task);
 	}
 
@@ -12062,7 +12062,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		free_event(event);
 err_cred:
 	if (task)
-		mutex_unlock(&task->signal->exec_update_mutex);
+		up_read(&task->signal->exec_update_lock);
 err_task:
 	if (task)
 		put_task_struct(task);
@@ -12367,7 +12367,7 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
 /*
  * When a child task exits, feed back event values to parent events.
  *
- * Can be called with exec_update_mutex held when called from
+ * Can be called with exec_update_lock held when called from
  * setup_new_exec().
  */
 void perf_event_exit_task(struct task_struct *child)
diff --git a/kernel/fork.c b/kernel/fork.c
index 837b546528c8..4d0ae6f827df 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1221,7 +1221,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
 	struct mm_struct *mm;
 	int err;
 
-	err =  mutex_lock_killable(&task->signal->exec_update_mutex);
+	err =  down_read_killable(&task->signal->exec_update_lock);
 	if (err)
 		return ERR_PTR(err);
 
@@ -1231,7 +1231,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
 		mmput(mm);
 		mm = ERR_PTR(-EACCES);
 	}
-	mutex_unlock(&task->signal->exec_update_mutex);
+	up_read(&task->signal->exec_update_lock);
 
 	return mm;
 }
@@ -1591,7 +1591,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	sig->oom_score_adj_min = current->signal->oom_score_adj_min;
 
 	mutex_init(&sig->cred_guard_mutex);
-	mutex_init(&sig->exec_update_mutex);
+	init_rwsem(&sig->exec_update_lock);
 
 	return 0;
 }
diff --git a/kernel/kcmp.c b/kernel/kcmp.c
index 36e58eb5a11d..5353edfad8e1 100644
--- a/kernel/kcmp.c
+++ b/kernel/kcmp.c
@@ -70,25 +70,25 @@ get_file_raw_ptr(struct task_struct *task, unsigned int idx)
 	return file;
 }
 
-static void kcmp_unlock(struct mutex *m1, struct mutex *m2)
+static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2)
 {
-	if (likely(m2 != m1))
-		mutex_unlock(m2);
-	mutex_unlock(m1);
+	if (likely(l2 != l1))
+		up_read(l2);
+	up_read(l1);
 }
 
-static int kcmp_lock(struct mutex *m1, struct mutex *m2)
+static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2)
 {
 	int err;
 
-	if (m2 > m1)
-		swap(m1, m2);
+	if (l2 > l1)
+		swap(l1, l2);
 
-	err = mutex_lock_killable(m1);
-	if (!err && likely(m1 != m2)) {
-		err = mutex_lock_killable_nested(m2, SINGLE_DEPTH_NESTING);
+	err = down_read_killable(l1);
+	if (!err && likely(l1 != l2)) {
+		err = down_read_killable_nested(l2, SINGLE_DEPTH_NESTING);
 		if (err)
-			mutex_unlock(m1);
+			up_read(l1);
 	}
 
 	return err;
@@ -156,8 +156,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
 	/*
 	 * One should have enough rights to inspect task details.
 	 */
-	ret = kcmp_lock(&task1->signal->exec_update_mutex,
-			&task2->signal->exec_update_mutex);
+	ret = kcmp_lock(&task1->signal->exec_update_lock,
+			&task2->signal->exec_update_lock);
 	if (ret)
 		goto err;
 	if (!ptrace_may_access(task1, PTRACE_MODE_READ_REALCREDS) ||
@@ -212,8 +212,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
 	}
 
 err_unlock:
-	kcmp_unlock(&task1->signal->exec_update_mutex,
-		    &task2->signal->exec_update_mutex);
+	kcmp_unlock(&task1->signal->exec_update_lock,
+		    &task2->signal->exec_update_lock);
 err:
 	put_task_struct(task1);
 	put_task_struct(task2);
diff --git a/kernel/pid.c b/kernel/pid.c
index a96bc4bf4f86..4856818c9de1 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -628,7 +628,7 @@ static struct file *__pidfd_fget(struct task_struct *task, int fd)
 	struct file *file;
 	int ret;
 
-	ret = mutex_lock_killable(&task->signal->exec_update_mutex);
+	ret = down_read_killable(&task->signal->exec_update_lock);
 	if (ret)
 		return ERR_PTR(ret);
 
@@ -637,7 +637,7 @@ static struct file *__pidfd_fget(struct task_struct *task, int fd)
 	else
 		file = ERR_PTR(-EPERM);
 
-	mutex_unlock(&task->signal->exec_update_mutex);
+	up_read(&task->signal->exec_update_lock);
 
 	return file ?: ERR_PTR(-EBADF);
 }
-- 
2.25.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-03 20:09 [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman
                   ` (2 preceding siblings ...)
  2020-12-03 20:12 ` [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman
@ 2020-12-03 22:42 ` Linus Torvalds
  2020-12-04  1:56   ` Waiman Long
  2020-12-04  4:54   ` Davidlohr Bueso
  3 siblings, 2 replies; 46+ messages in thread
From: Linus Torvalds @ 2020-12-03 22:42 UTC (permalink / raw)
  To: Eric W. Biederman, Waiman Long, Davidlohr Bueso
  Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

On Thu, Dec 3, 2020 at 12:10 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> The simplest and most robust solution appears to be making
> exec_update_mutex a read/write lock and having everything execept for
> exec take the lock for read.

Looks sane to me.

I'd like the locking people to look at the down_read_*() functions,
even if they look simple. Adding Waiman and Davidlohr to the cc just
in case they would look at that too, since they've worked on that
code.

            Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-03 22:42 ` [PATCH 0/3] " Linus Torvalds
@ 2020-12-04  1:56   ` Waiman Long
  2020-12-04  4:54   ` Davidlohr Bueso
  1 sibling, 0 replies; 46+ messages in thread
From: Waiman Long @ 2020-12-04  1:56 UTC (permalink / raw)
  To: Linus Torvalds, Eric W. Biederman, Davidlohr Bueso
  Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

On 12/3/20 5:42 PM, Linus Torvalds wrote:
> On Thu, Dec 3, 2020 at 12:10 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> The simplest and most robust solution appears to be making
>> exec_update_mutex a read/write lock and having everything execept for
>> exec take the lock for read.
> Looks sane to me.
>
> I'd like the locking people to look at the down_read_*() functions,
> even if they look simple. Adding Waiman and Davidlohr to the cc just
> in case they would look at that too, since they've worked on that
> code.

I have looked at patches 1 & 2 on adding down_read_killable_nested() and 
down_read_interruptible(). They looks good to me.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 1/3] rwsem: Implement down_read_killable_nested
  2020-12-03 20:10 ` [PATCH 1/3] rwsem: Implement down_read_killable_nested Eric W. Biederman
@ 2020-12-04  1:58   ` Waiman Long
  2020-12-09 18:38   ` [tip: locking/core] " tip-bot2 for Eric W. Biederman
  1 sibling, 0 replies; 46+ messages in thread
From: Waiman Long @ 2020-12-04  1:58 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

On 12/3/20 3:10 PM, Eric W. Biederman wrote:
> In preparation for converting exec_update_mutex to a rwsem so that
> multiple readers can execute in parallel and not deadlock, add
> down_read_killable_nested.  This is needed so that kcmp_lock
> can be converted from working on a mutexes to working on rw_semaphores.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Will Deacon <will@kernel.org>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>   include/linux/rwsem.h  |  2 ++
>   kernel/locking/rwsem.c | 14 ++++++++++++++
>   2 files changed, 16 insertions(+)
>
> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
> index 25e3fde85617..13021b08b2ed 100644
> --- a/include/linux/rwsem.h
> +++ b/include/linux/rwsem.h
> @@ -171,6 +171,7 @@ extern void downgrade_write(struct rw_semaphore *sem);
>    * See Documentation/locking/lockdep-design.rst for more details.)
>    */
>   extern void down_read_nested(struct rw_semaphore *sem, int subclass);
> +extern int __must_check down_read_killable_nested(struct rw_semaphore *sem, int subclass);
>   extern void down_write_nested(struct rw_semaphore *sem, int subclass);
>   extern int down_write_killable_nested(struct rw_semaphore *sem, int subclass);
>   extern void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest_lock);
> @@ -191,6 +192,7 @@ extern void down_read_non_owner(struct rw_semaphore *sem);
>   extern void up_read_non_owner(struct rw_semaphore *sem);
>   #else
>   # define down_read_nested(sem, subclass)		down_read(sem)
> +# define down_read_killable_nested(sem, subclass)	down_read_killable(sem)
>   # define down_write_nest_lock(sem, nest_lock)	down_write(sem)
>   # define down_write_nested(sem, subclass)	down_write(sem)
>   # define down_write_killable_nested(sem, subclass)	down_write_killable(sem)
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index f11b9bd3431d..54d11cb97551 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -1605,6 +1605,20 @@ void down_read_nested(struct rw_semaphore *sem, int subclass)
>   }
>   EXPORT_SYMBOL(down_read_nested);
>   
> +int down_read_killable_nested(struct rw_semaphore *sem, int subclass)
> +{
> +	might_sleep();
> +	rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_);
> +
> +	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_killable)) {
> +		rwsem_release(&sem->dep_map, _RET_IP_);
> +		return -EINTR;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(down_read_killable_nested);
> +
>   void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest)
>   {
>   	might_sleep();

Acked-by: Waiman Long <longman@redhat.com>

-Longman


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-03 20:11 ` [PATCH 2/3] rwsem: Implement down_read_interruptible Eric W. Biederman
@ 2020-12-04  1:59   ` Waiman Long
  2020-12-07  9:02     ` Peter Zijlstra
  2020-12-09 18:38   ` [tip: locking/core] rwsem: Implement down_read_interruptible tip-bot2 for Eric W. Biederman
  1 sibling, 1 reply; 46+ messages in thread
From: Waiman Long @ 2020-12-04  1:59 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

On 12/3/20 3:11 PM, Eric W. Biederman wrote:
> In preparation for converting exec_update_mutex to a rwsem so that
> multiple readers can execute in parallel and not deadlock, add
> down_read_interruptible.  This is needed for perf_event_open to be
> converted (with no semantic changes) from working on a mutex to
> wroking on a rwsem.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Will Deacon <will@kernel.org>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>   include/linux/rwsem.h  |  1 +
>   kernel/locking/rwsem.c | 26 ++++++++++++++++++++++++++
>   2 files changed, 27 insertions(+)
>
> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
> index 13021b08b2ed..4c715be48717 100644
> --- a/include/linux/rwsem.h
> +++ b/include/linux/rwsem.h
> @@ -123,6 +123,7 @@ static inline int rwsem_is_contended(struct rw_semaphore *sem)
>    * lock for reading
>    */
>   extern void down_read(struct rw_semaphore *sem);
> +extern int __must_check down_read_interruptible(struct rw_semaphore *sem);
>   extern int __must_check down_read_killable(struct rw_semaphore *sem);
>   
>   /*
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 54d11cb97551..a163542d178e 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -1345,6 +1345,18 @@ static inline void __down_read(struct rw_semaphore *sem)
>   	}
>   }
>   
> +static inline int __down_read_interruptible(struct rw_semaphore *sem)
> +{
> +	if (!rwsem_read_trylock(sem)) {
> +		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
> +			return -EINTR;
> +		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
> +	} else {
> +		rwsem_set_reader_owned(sem);
> +	}
> +	return 0;
> +}
> +
>   static inline int __down_read_killable(struct rw_semaphore *sem)
>   {
>   	if (!rwsem_read_trylock(sem)) {
> @@ -1495,6 +1507,20 @@ void __sched down_read(struct rw_semaphore *sem)
>   }
>   EXPORT_SYMBOL(down_read);
>   
> +int __sched down_read_interruptible(struct rw_semaphore *sem)
> +{
> +	might_sleep();
> +	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
> +
> +	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_interruptible)) {
> +		rwsem_release(&sem->dep_map, _RET_IP_);
> +		return -EINTR;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(down_read_interruptible);
> +
>   int __sched down_read_killable(struct rw_semaphore *sem)
>   {
>   	might_sleep();

Acked-by: Waiman Long <longman@redhat.com>

-Longman


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-03 22:42 ` [PATCH 0/3] " Linus Torvalds
  2020-12-04  1:56   ` Waiman Long
@ 2020-12-04  4:54   ` Davidlohr Bueso
  1 sibling, 0 replies; 46+ messages in thread
From: Davidlohr Bueso @ 2020-12-04  4:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Waiman Long, Linux Kernel Mailing List,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn,
	Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov,
	Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

On Thu, 03 Dec 2020, Linus Torvalds wrote:

>On Thu, Dec 3, 2020 at 12:10 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> The simplest and most robust solution appears to be making
>> exec_update_mutex a read/write lock and having everything execept for
>> exec take the lock for read.
>
>Looks sane to me.
>
>I'd like the locking people to look at the down_read_*() functions,
>even if they look simple. Adding Waiman and Davidlohr to the cc just
>in case they would look at that too, since they've worked on that
>code.

rwsem changes look good to me - and 3/3 looks much nicer than the previous
alternatives to the deadlock.

Acked-by: Davidlohr Bueso <dbueso@suse.de>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-03 20:12 ` [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman
@ 2020-12-04 16:08   ` Bernd Edlinger
  2020-12-04 17:21     ` Linus Torvalds
  2020-12-04 17:39     ` Eric W. Biederman
  0 siblings, 2 replies; 46+ messages in thread
From: Bernd Edlinger @ 2020-12-04 16:08 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov,
	Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo,
	Waiman Long, Davidlohr Bueso

Hi Eric,

I think I remembered from a previous discussion about this topic,
that it was unclear if the rw_semaphores are working the same
in RT-Linux.  Will this fix work in RT as well?

On 12/3/20 9:12 PM, Eric W. Biederman wrote:
> --- a/kernel/kcmp.c
> +++ b/kernel/kcmp.c
> @@ -70,25 +70,25 @@ get_file_raw_ptr(struct task_struct *task, unsigned int idx)
>  	return file;
>  }
>  
> -static void kcmp_unlock(struct mutex *m1, struct mutex *m2)
> +static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2)
>  {
> -	if (likely(m2 != m1))
> -		mutex_unlock(m2);
> -	mutex_unlock(m1);
> +	if (likely(l2 != l1))

is this still necessary ?

> +		up_read(l2);
> +	up_read(l1);
>  }
>  
> -static int kcmp_lock(struct mutex *m1, struct mutex *m2)
> +static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2)
>  {
>  	int err;
>  
> -	if (m2 > m1)
> -		swap(m1, m2);
> +	if (l2 > l1)
> +		swap(l1, l2);

and this is probably also no longer necessary?


>  
> -	err = mutex_lock_killable(m1);
> -	if (!err && likely(m1 != m2)) {
> -		err = mutex_lock_killable_nested(m2, SINGLE_DEPTH_NESTING);
> +	err = down_read_killable(l1);
> +	if (!err && likely(l1 != l2)) {

and this can now be unconditionally, right?

> +		err = down_read_killable_nested(l2, SINGLE_DEPTH_NESTING);
>  		if (err)
> -			mutex_unlock(m1);
> +			up_read(l1);
>  	}
>  
>  	return err;
> @@ -156,8 +156,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
>  	/*
>  	 * One should have enough rights to inspect task details.
>  	 */
> -	ret = kcmp_lock(&task1->signal->exec_update_mutex,
> -			&task2->signal->exec_update_mutex);
> +	ret = kcmp_lock(&task1->signal->exec_update_lock,
> +			&task2->signal->exec_update_lock);
>  	if (ret)
>  		goto err;
>  	if (!ptrace_may_access(task1, PTRACE_MODE_READ_REALCREDS) ||
> @@ -212,8 +212,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
>  	}
>  
>  err_unlock:
> -	kcmp_unlock(&task1->signal->exec_update_mutex,
> -		    &task2->signal->exec_update_mutex);
> +	kcmp_unlock(&task1->signal->exec_update_lock,
> +		    &task2->signal->exec_update_lock);
>  err:
>  	put_task_struct(task1);
>  	put_task_struct(task2);


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-04 16:08   ` Bernd Edlinger
@ 2020-12-04 17:21     ` Linus Torvalds
  2020-12-04 19:34       ` Eric W. Biederman
  2020-12-04 17:39     ` Eric W. Biederman
  1 sibling, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2020-12-04 17:21 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Eric W. Biederman, Linux Kernel Mailing List, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo,
	Waiman Long, Davidlohr Bueso

On Fri, Dec 4, 2020 at 8:08 AM Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
>
> >
> > -static void kcmp_unlock(struct mutex *m1, struct mutex *m2)
> > +static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2)
> >  {
> > -     if (likely(m2 != m1))
> > -             mutex_unlock(m2);
> > -     mutex_unlock(m1);
> > +     if (likely(l2 != l1))
>
> is this still necessary ?
>
> > +             up_read(l2);
> > +     up_read(l1);
> >  }
> >
> > -static int kcmp_lock(struct mutex *m1, struct mutex *m2)
> > +static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2)
> >  {
> >       int err;
> >
> > -     if (m2 > m1)
> > -             swap(m1, m2);
> > +     if (l2 > l1)
> > +             swap(l1, l2);
>
> and this is probably also no longer necessary?

These are still necessary, because even a recursive read lock can
still block on a writer trying to come in between the two read locks
due to fairness guarantees.

So taking the same read lock twice is still a source of possible deadlocks.

For the same reason, read locks still have ABBA deadlock and need to
be taken in order.

So switching from a mutex to a rwlock doesn't really change the
locking rules in this respect.

In fact, I'm not convinced this change even fixes the deadlock that
syzbot reported, for the same reason: it just requires a write lock in
between two read locks to deadlock.

               Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-04 16:08   ` Bernd Edlinger
  2020-12-04 17:21     ` Linus Torvalds
@ 2020-12-04 17:39     ` Eric W. Biederman
  1 sibling, 0 replies; 46+ messages in thread
From: Eric W. Biederman @ 2020-12-04 17:39 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: linux-kernel, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov,
	Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo,
	Waiman Long, Davidlohr Bueso

Bernd Edlinger <bernd.edlinger@hotmail.de> writes:

> Hi Eric,
>
> I think I remembered from a previous discussion about this topic,
> that it was unclear if the rw_semaphores are working the same
> in RT-Linux.  Will this fix work in RT as well?

The locks should work close enough to the same that correct code is also
correct code on RT-linux.  If not it is an RT-linux bug.

An rw_semaphore may be less than optimal on RT-linux.  I do remember
that mutexes are prefered.  But this change is more about correctness
than anything else.

> On 12/3/20 9:12 PM, Eric W. Biederman wrote:
>> --- a/kernel/kcmp.c
>> +++ b/kernel/kcmp.c
>> @@ -70,25 +70,25 @@ get_file_raw_ptr(struct task_struct *task, unsigned int idx)
>>  	return file;
>>  }
>>  
>> -static void kcmp_unlock(struct mutex *m1, struct mutex *m2)
>> +static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2)
>>  {
>> -	if (likely(m2 != m1))
>> -		mutex_unlock(m2);
>> -	mutex_unlock(m1);
>> +	if (likely(l2 != l1))
>
> is this still necessary ?

Yes.  Both pids could be threads of the same process or even the same
value so yes this is definitely necessary.  rw_semaphores don't nest on
the same cpu.

>
>> +		up_read(l2);
>> +	up_read(l1);
>>  }
>>  
>> -static int kcmp_lock(struct mutex *m1, struct mutex *m2)
>> +static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2)
>>  {
>>  	int err;
>>  
>> -	if (m2 > m1)
>> -		swap(m1, m2);
>> +	if (l2 > l1)
>> +		swap(l1, l2);
>
> and this is probably also no longer necessary?

I think lockdep needs this, so it can be certain the same rw_semaphore
is not nesting on the cpu.   Otherwise we will have inconsitencies about
which is the nested lock.  It won't matter in practice, but I am not
certain lockdep knows enough to tell the difference.

If anything removing the swap is a candidate for a follow up patch
where it can be considered separately from other concerns.  For this
patch keeping the logic unchanged makes it trivial to verify that
the conversion from one lock to another is correct.

>>  
>> -	err = mutex_lock_killable(m1);
>> -	if (!err && likely(m1 != m2)) {
>> -		err = mutex_lock_killable_nested(m2, SINGLE_DEPTH_NESTING);
>> +	err = down_read_killable(l1);
>> +	if (!err && likely(l1 != l2)) {
>
> and this can now be unconditionally, right?

Nope.  The two locks can be the same lock, and they don't nest on a
single cpu.  I tested and verified that lockdep complains bitterly
if down_read_killable_nested is replaced with a simple
down_read_killable.


>> +		err = down_read_killable_nested(l2, SINGLE_DEPTH_NESTING);
>>  		if (err)
>> -			mutex_unlock(m1);
>> +			up_read(l1);
>>  	}
>>  
>>  	return err;
>> @@ -156,8 +156,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
>>  	/*
>>  	 * One should have enough rights to inspect task details.
>>  	 */
>> -	ret = kcmp_lock(&task1->signal->exec_update_mutex,
>> -			&task2->signal->exec_update_mutex);
>> +	ret = kcmp_lock(&task1->signal->exec_update_lock,
>> +			&task2->signal->exec_update_lock);
>>  	if (ret)
>>  		goto err;
>>  	if (!ptrace_may_access(task1, PTRACE_MODE_READ_REALCREDS) ||
>> @@ -212,8 +212,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
>>  	}
>>  
>>  err_unlock:
>> -	kcmp_unlock(&task1->signal->exec_update_mutex,
>> -		    &task2->signal->exec_update_mutex);
>> +	kcmp_unlock(&task1->signal->exec_update_lock,
>> +		    &task2->signal->exec_update_lock);
>>  err:
>>  	put_task_struct(task1);
>>  	put_task_struct(task2);
>
>
> Thanks
> Bernd.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-04 17:21     ` Linus Torvalds
@ 2020-12-04 19:34       ` Eric W. Biederman
  2020-12-04 20:10         ` Linus Torvalds
  0 siblings, 1 reply; 46+ messages in thread
From: Eric W. Biederman @ 2020-12-04 19:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bernd Edlinger, Linux Kernel Mailing List, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro,
	Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo,
	Waiman Long, Davidlohr Bueso

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, Dec 4, 2020 at 8:08 AM Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
>>
>> >
>> > -static void kcmp_unlock(struct mutex *m1, struct mutex *m2)
>> > +static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2)
>> >  {
>> > -     if (likely(m2 != m1))
>> > -             mutex_unlock(m2);
>> > -     mutex_unlock(m1);
>> > +     if (likely(l2 != l1))
>>
>> is this still necessary ?
>>
>> > +             up_read(l2);
>> > +     up_read(l1);
>> >  }
>> >
>> > -static int kcmp_lock(struct mutex *m1, struct mutex *m2)
>> > +static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2)
>> >  {
>> >       int err;
>> >
>> > -     if (m2 > m1)
>> > -             swap(m1, m2);
>> > +     if (l2 > l1)
>> > +             swap(l1, l2);
>>
>> and this is probably also no longer necessary?
>
> These are still necessary, because even a recursive read lock can
> still block on a writer trying to come in between the two read locks
> due to fairness guarantees.
>
> So taking the same read lock twice is still a source of possible deadlocks.
>
> For the same reason, read locks still have ABBA deadlock and need to
> be taken in order.
>
> So switching from a mutex to a rwlock doesn't really change the
> locking rules in this respect.

Thinking about the specific case of down_read on two instances of
exec_update_lock.  If there are two instances of kcmp being called with
the sames two pids, but in opposite order running, and the tasks of that
both of those pids refer to both exec, you could definitely get a
deadlock.

So yes.  We definitely need to keep the swap as well.

> In fact, I'm not convinced this change even fixes the deadlock that
> syzbot reported, for the same reason: it just requires a write lock in
> between two read locks to deadlock.

From a deadlock perspective the change is strictly better than what we
have today.  The readers will no longer block on each other.

For the specific case that syzbot reported it is readers who were
blocking on each other so that specific case if fixed.



On the write side of exec_update_lock we have:

cred_guard_mutex -> exec_update_lock

Which means that to get an ABBA deadlock cred_guard_mutex would need to
be involved and it is only acquired in 3 places: ptrace_attach,
do_seccomp, and proc_pid_attr_write.  In none of the 3 from the syscall
entry point until the code that takes cred_guard_mutex can I find
anything that takes any locks.  Perhaps there is something in io_uring I
did not completely trace that write path.

So given that the exec path would need to be involved, and the exec path
takes exec_update_lock pretty much at the top level.  I am not seeing
how there is any room for deadlocks after this change.

Am I missing something?

Eric









^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-04 19:34       ` Eric W. Biederman
@ 2020-12-04 20:10         ` Linus Torvalds
  2020-12-04 20:30           ` Bernd Edlinger
  2020-12-05 17:43           ` [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman
  0 siblings, 2 replies; 46+ messages in thread
From: Linus Torvalds @ 2020-12-04 20:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Bernd Edlinger, Linux Kernel Mailing List, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro,
	Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo,
	Waiman Long, Davidlohr Bueso

On Fri, Dec 4, 2020 at 11:35 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> From a deadlock perspective the change is strictly better than what we
> have today.  The readers will no longer block on each other.

No, agreed, it's better regardless.

> For the specific case that syzbot reported it is readers who were
> blocking on each other so that specific case if fixed.

So the thing is, a reader can still block another reader if a writer
comes in between them. Which is why I was thinking that the same
deadlock could happen if somebody does an execve at just the right
point.

> On the write side of exec_update_lock we have:
>
> cred_guard_mutex -> exec_update_lock
>
> Which means that to get an ABBA deadlock cred_guard_mutex would need to
> be involved

No, see above: you can get a deadlock with

 - first reader gets exec_update_lock

 - writer on exec_update_lock blocks on first reader (this is exec)

 - second reader of exec_update_lock now blocks on the writer.

So if that second reader holds something that the first one wants to
get (or is the same thread as the first one), you have a deadlock: the
first reader will never make any progress, will never release the
lock, and the writer will never get it, and the second reader will
forever wait for the writer that is ahead of it in the queue.

cred_guard_mutex is immaterial, it's not involved in the deadlock.
Yes, the writer holds it, but it's not relevant for anything else.

And that deadlock looks very much like what syzcaller detected, in
exactly that scenario:

  Chain exists of:
    &sig->exec_update_mutex --> sb_writers#4 --> &p->lock

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&p->lock);
                                 lock(sb_writers#4);
                                 lock(&p->lock);
    lock(&sig->exec_update_mutex);

   *** DEADLOCK ***

No?  The only thing that is missing is that writer that causes the
exec_update_mutex readers to block each other - exactly like they did
when it was a mutex.

But I may be missing something entirely obvious that keeps this from happening.

         Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-04 20:10         ` Linus Torvalds
@ 2020-12-04 20:30           ` Bernd Edlinger
  2020-12-04 20:48             ` Linus Torvalds
  2020-12-05 17:43           ` [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman
  1 sibling, 1 reply; 46+ messages in thread
From: Bernd Edlinger @ 2020-12-04 20:30 UTC (permalink / raw)
  To: Linus Torvalds, Eric W. Biederman
  Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov,
	Cyrill Gorcunov, Sargun Dhillon, Christian Brauner,
	Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long,
	Davidlohr Bueso

On 12/4/20 9:10 PM, Linus Torvalds wrote:
> On Fri, Dec 4, 2020 at 11:35 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> From a deadlock perspective the change is strictly better than what we
>> have today.  The readers will no longer block on each other.
> 
> No, agreed, it's better regardless.
> 
>> For the specific case that syzbot reported it is readers who were
>> blocking on each other so that specific case if fixed.
> 
> So the thing is, a reader can still block another reader if a writer
> comes in between them. Which is why I was thinking that the same
> deadlock could happen if somebody does an execve at just the right
> point.
> 
>> On the write side of exec_update_lock we have:
>>
>> cred_guard_mutex -> exec_update_lock
>>
>> Which means that to get an ABBA deadlock cred_guard_mutex would need to
>> be involved
> 
> No, see above: you can get a deadlock with
> 
>  - first reader gets exec_update_lock
> 
>  - writer on exec_update_lock blocks on first reader (this is exec)
> 
>  - second reader of exec_update_lock now blocks on the writer.
> 
> So if that second reader holds something that the first one wants to
> get (or is the same thread as the first one), you have a deadlock: the
> first reader will never make any progress, will never release the
> lock, and the writer will never get it, and the second reader will
> forever wait for the writer that is ahead of it in the queue.
> 
> cred_guard_mutex is immaterial, it's not involved in the deadlock.
> Yes, the writer holds it, but it's not relevant for anything else.
> 
> And that deadlock looks very much like what syzcaller detected, in
> exactly that scenario:
> 
>   Chain exists of:
>     &sig->exec_update_mutex --> sb_writers#4 --> &p->lock
> 
>    Possible unsafe locking scenario:
> 
>          CPU0                    CPU1
>          ----                    ----
>     lock(&p->lock);
>                                  lock(sb_writers#4);
>                                  lock(&p->lock);
>     lock(&sig->exec_update_mutex);
> 
>    *** DEADLOCK ***
> 
> No?  The only thing that is missing is that writer that causes the
> exec_update_mutex readers to block each other - exactly like they did
> when it was a mutex.
> 
> But I may be missing something entirely obvious that keeps this from happening.
> 

I think you convinced me:

>    perf_event_open  (exec_update_mutex -> ovl_i_mutex)
>    chown            (ovl_i_mutex       -> sb_writes)
>    sendfile         (sb_writes         -> p->lock)
>      by reading from a proc file and writing to overlayfs
>    proc_pid_syscall (p->lock           -> exec_update_mutex)


We need just 5 Philosophers (A-E):

Philosopher A calls proc_pid_syscall, takes p->lock, and falls asleep for a while now.

Philosphper B calls sendfile, takes sb_writes, and has to wait on p->lock.

Philosopher C calls chown, takes ovl_i_mutes, and has to wait on sb_writes.

Philosopher D calls perf_event_open, takes down_read(exec_mutex_lock), and has to wait on ovl_i_mutex.

Philosopher E calls exec, and wants to take down_write(exec_mutex_lock), has to wait for Philosopher D.

Now Philosopher A wakes up from his sleep, and wants to take down_read(exec_mutex_lock), but due
to fairness reasons, he has to wait until Philosopher E gets and releases his write lock again.

If Philosopher A is now blocked, we have a deadlock, right?


Bernd.



>          Linus
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-04 20:30           ` Bernd Edlinger
@ 2020-12-04 20:48             ` Linus Torvalds
  2020-12-04 21:48               ` Davidlohr Bueso
  2020-12-07  9:09               ` Peter Zijlstra
  0 siblings, 2 replies; 46+ messages in thread
From: Linus Torvalds @ 2020-12-04 20:48 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Eric W. Biederman, Linux Kernel Mailing List, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro,
	Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo,
	Waiman Long, Davidlohr Bueso

On Fri, Dec 4, 2020 at 12:30 PM Bernd Edlinger
<bernd.edlinger@hotmail.de> wrote:
>>
> >    perf_event_open  (exec_update_mutex -> ovl_i_mutex)

Side note: this one looks like it should be easy to fix.

Is there any real reason why exec_update_mutex is actually gotten that
early, and held for that long in the perf event code?

I _think_ we could move the ptrace check to be much later, to _just_ before that

         * This is the point on no return; we cannot fail hereafter.

point in the perf event install chain..

I don't think it needs to be moved down even that much, I think it
would be sufficient to move it down below the "perf_event_alloc()",
but I didn't check very much.

The fact that create_local_trace_uprobe() can end up going into a
lookup of an OVL filesystem path smells kind of odd to me to begin
with, but I didn't look at the whole thing.

PeterZ, is there something I'm missing?

          Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-04 20:48             ` Linus Torvalds
@ 2020-12-04 21:48               ` Davidlohr Bueso
  2020-12-05 18:05                 ` Eric W. Biederman
  2020-12-07  9:09               ` Peter Zijlstra
  1 sibling, 1 reply; 46+ messages in thread
From: Davidlohr Bueso @ 2020-12-04 21:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bernd Edlinger, Eric W. Biederman, Linux Kernel Mailing List,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn,
	Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov,
	Sargun Dhillon, Christian Brauner, Arnd Bergmann,
	Arnaldo Carvalho de Melo, Waiman Long

On Fri, 04 Dec 2020, Linus Torvalds wrote:

>On Fri, Dec 4, 2020 at 12:30 PM Bernd Edlinger
><bernd.edlinger@hotmail.de> wrote:
>>>
>> >    perf_event_open  (exec_update_mutex -> ovl_i_mutex)
>
>Side note: this one looks like it should be easy to fix.
>
>Is there any real reason why exec_update_mutex is actually gotten that
>early, and held for that long in the perf event code?

afaict just to validate the whole operation early. Per 79c9ce57eb2 the
mutex will guard the check and the perf_install_in_context vs exec.

>
>I _think_ we could move the ptrace check to be much later, to _just_ before that
>
>         * This is the point on no return; we cannot fail hereafter.
>
>point in the perf event install chain..

Peter had the idea of doing the ptrace_may_access() check twice: first
lockless and early, then under exec_update_mutex when it mattered right
before perf_install_in_context():

https://lore.kernel.org/linux-fsdevel/20200828123720.GZ1362448@hirez.programming.kicks-ass.net/

>
>I don't think it needs to be moved down even that much, I think it
>would be sufficient to move it down below the "perf_event_alloc()",
>but I didn't check very much.

Yeah we could just keep a single ptrace_may_access() check just further
down until it won't deadlock.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-04 20:10         ` Linus Torvalds
  2020-12-04 20:30           ` Bernd Edlinger
@ 2020-12-05 17:43           ` Eric W. Biederman
  1 sibling, 0 replies; 46+ messages in thread
From: Eric W. Biederman @ 2020-12-05 17:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bernd Edlinger, Linux Kernel Mailing List, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro,
	Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo,
	Waiman Long, Davidlohr Bueso

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, Dec 4, 2020 at 11:35 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> From a deadlock perspective the change is strictly better than what we
>> have today.  The readers will no longer block on each other.
>
> No, agreed, it's better regardless.
>
>> For the specific case that syzbot reported it is readers who were
>> blocking on each other so that specific case if fixed.
>
> So the thing is, a reader can still block another reader if a writer
> comes in between them. Which is why I was thinking that the same
> deadlock could happen if somebody does an execve at just the right
> point.

>> On the write side of exec_update_lock we have:
>>
>> cred_guard_mutex -> exec_update_lock
>>
>> Which means that to get an ABBA deadlock cred_guard_mutex would need to
>> be involved
>
> No, see above: you can get a deadlock with
>
>  - first reader gets exec_update_lock
>
>  - writer on exec_update_lock blocks on first reader (this is exec)
>
>  - second reader of exec_update_lock now blocks on the writer.
>
> So if that second reader holds something that the first one wants to
> get (or is the same thread as the first one), you have a deadlock: the
> first reader will never make any progress, will never release the
> lock, and the writer will never get it, and the second reader will
> forever wait for the writer that is ahead of it in the queue.

Oh.  I see what you are saying.  I knew the writer had to be involved
and block, but if the reader comes first all that has to be added to the
situation is that an exec happens and attempts to take the lock.  Any
reader will block the writer.

For some reason I was blind to the fact that a reader could block the
writer, and I was looking for anything else that might block the writer.

> And that deadlock looks very much like what syzcaller detected, in
> exactly that scenario:
>
>   Chain exists of:
>     &sig->exec_update_mutex --> sb_writers#4 --> &p->lock
>
>    Possible unsafe locking scenario:
>
>          CPU0                    CPU1
>          ----                    ----
>     lock(&p->lock);
>                                  lock(sb_writers#4);
>                                  lock(&p->lock);
>     lock(&sig->exec_update_mutex);
>
>    *** DEADLOCK ***
>
> No?  The only thing that is missing is that writer that causes the
> exec_update_mutex readers to block each other - exactly like they did
> when it was a mutex.
>
> But I may be missing something entirely obvious that keeps this from happening.

I don't think so.

It does look like my change makes it one step more difficult to cause
the deadlock, but the deadlock can still happen. :-(

Making it a read/writer lock both makes it more difficult to cause
a deadlock and makes a lot of sense from a documentation standpoint
so I still plan on merging the changes after the weekend.


It looks like we do need to have a close look at perf_event_open and
everything associated with it.

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-04 21:48               ` Davidlohr Bueso
@ 2020-12-05 18:05                 ` Eric W. Biederman
  2020-12-07  9:15                   ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Eric W. Biederman @ 2020-12-05 18:05 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Linus Torvalds, Bernd Edlinger, Linux Kernel Mailing List,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn,
	Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov,
	Sargun Dhillon, Christian Brauner, Arnd Bergmann,
	Arnaldo Carvalho de Melo, Waiman Long

Davidlohr Bueso <dave@stgolabs.net> writes:

> On Fri, 04 Dec 2020, Linus Torvalds wrote:
>
>>On Fri, Dec 4, 2020 at 12:30 PM Bernd Edlinger
>><bernd.edlinger@hotmail.de> wrote:
>>>>
>>> >    perf_event_open  (exec_update_mutex -> ovl_i_mutex)
>>
>>Side note: this one looks like it should be easy to fix.
>>
>>Is there any real reason why exec_update_mutex is actually gotten that
>>early, and held for that long in the perf event code?
>
> afaict just to validate the whole operation early. Per 79c9ce57eb2 the
> mutex will guard the check and the perf_install_in_context vs exec.
>
>>
>>I _think_ we could move the ptrace check to be much later, to _just_ before that
>>
>>         * This is the point on no return; we cannot fail hereafter.
>>
>>point in the perf event install chain..
>
> Peter had the idea of doing the ptrace_may_access() check twice: first
> lockless and early, then under exec_update_mutex when it mattered right
> before perf_install_in_context():
>
> https://lore.kernel.org/linux-fsdevel/20200828123720.GZ1362448@hirez.programming.kicks-ass.net/
>
>>
>>I don't think it needs to be moved down even that much, I think it
>>would be sufficient to move it down below the "perf_event_alloc()",
>>but I didn't check very much.
>
> Yeah we could just keep a single ptrace_may_access() check just further
> down until it won't deadlock.

I am trying to understand why the permission check is there.

The ptrace_may_access check came in with the earliest version of the
code I can find.  So going down that path hasn't helped.

There are about 65 calls of perf_pmu_register so it definitely makes me
nervous holding a semaphore over perf_event_alloc.

What is truly silly in all of this is perf_uprobe_event_init fails if
!perfmon_capable().   Which means the ptrace_may_access check and
holding exec_update_mutex is completely irrelevant to the function of
create_local_trace_uprobe.  Which is strange in and of itself.  I would
have thought uprobes would have been the kind of probe that it would
be safe for ordinary users to use.

This is a much deeper rabit hole than I had anticipated when I started
looking and I will have to come back to this after the weekend.

If at all possible I would love to be able to move the grabbing of
exec_update_mutex after perf_event_alloc and the pluggable functions it
calls.  It doesn't feel possible to audit that.

On the flip side the task is passed in as hw.target.  So it may not be
possible to move the permission check.  It is chaos.

Eric





^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-04  1:59   ` Waiman Long
@ 2020-12-07  9:02     ` Peter Zijlstra
  2020-12-07 15:33       ` Waiman Long
                         ` (4 more replies)
  0 siblings, 5 replies; 46+ messages in thread
From: Peter Zijlstra @ 2020-12-07  9:02 UTC (permalink / raw)
  To: Waiman Long
  Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

On Thu, Dec 03, 2020 at 08:59:13PM -0500, Waiman Long wrote:
> On 12/3/20 3:11 PM, Eric W. Biederman wrote:

> > +static inline int __down_read_interruptible(struct rw_semaphore *sem)
> > +{
> > +	if (!rwsem_read_trylock(sem)) {
> > +		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
> > +			return -EINTR;
> > +		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
> > +	} else {
> > +		rwsem_set_reader_owned(sem);
> > +	}
> > +	return 0;
> > +}
> > +
> >   static inline int __down_read_killable(struct rw_semaphore *sem)
> >   {
> >   	if (!rwsem_read_trylock(sem)) {
> > @@ -1495,6 +1507,20 @@ void __sched down_read(struct rw_semaphore *sem)
> >   }
> >   EXPORT_SYMBOL(down_read);
> > +int __sched down_read_interruptible(struct rw_semaphore *sem)
> > +{
> > +	might_sleep();
> > +	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
> > +
> > +	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_interruptible)) {
> > +		rwsem_release(&sem->dep_map, _RET_IP_);
> > +		return -EINTR;
> > +	}
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL(down_read_interruptible);
> > +
> >   int __sched down_read_killable(struct rw_semaphore *sem)
> >   {
> >   	might_sleep();
> 
> Acked-by: Waiman Long <longman@redhat.com>

Yeah, that seems correct.. There's an unfortunate amount of copy-paste
there though.

Do we want to follow that up with something like this?

---
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -275,7 +275,25 @@ static inline bool rwsem_read_trylock(st
 	long cnt = atomic_long_add_return_acquire(RWSEM_READER_BIAS, &sem->count);
 	if (WARN_ON_ONCE(cnt < 0))
 		rwsem_set_nonspinnable(sem);
-	return !(cnt & RWSEM_READ_FAILED_MASK);
+
+	if (!(cnt & RWSEM_READ_FAILED_MASK)) {
+		rwsem_set_reader_owned(sem);
+		return true;
+	}
+
+	return false;
+}
+
+static inline bool rwsem_write_trylock(struct rw_semaphore *sem)
+{
+	long tmp = RWSEM_UNLOCKED_VALUE;
+
+	if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, RWSEM_WRITER_LOCKED)) {
+		rwsem_set_owner(sem);
+		return true;
+	}
+
+	return false;
 }
 
 /*
@@ -1335,38 +1353,29 @@ static struct rw_semaphore *rwsem_downgr
 /*
  * lock for reading
  */
-static inline void __down_read(struct rw_semaphore *sem)
+static inline int __down_read_common(struct rw_semaphore *sem, int state)
 {
 	if (!rwsem_read_trylock(sem)) {
-		rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE);
+		if (IS_ERR(rwsem_down_read_slowpath(sem, state)))
+			return -EINTR;
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
-	} else {
-		rwsem_set_reader_owned(sem);
 	}
+	return 0;
 }
 
-static inline int __down_read_interruptible(struct rw_semaphore *sem)
+static __always_inline void __down_read(struct rw_semaphore *sem)
 {
-	if (!rwsem_read_trylock(sem)) {
-		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
-			return -EINTR;
-		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
-	} else {
-		rwsem_set_reader_owned(sem);
-	}
-	return 0;
+	__down_read_common(sem, TASK_UNINTERRUPTIBLE);
 }
 
-static inline int __down_read_killable(struct rw_semaphore *sem)
+static __always_inline int __down_read_interruptible(struct rw_semaphore *sem)
 {
-	if (!rwsem_read_trylock(sem)) {
-		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE)))
-			return -EINTR;
-		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
-	} else {
-		rwsem_set_reader_owned(sem);
-	}
-	return 0;
+	return __down_read_common(sem, TASK_INTERRUPTIBLE);
+}
+
+static __always_inline int __down_read_killable(struct rw_semaphore *sem)
+{
+	return __down_read_common(sem, TASK_KILLABLE);
 }
 
 static inline int __down_read_trylock(struct rw_semaphore *sem)
@@ -1392,44 +1401,29 @@ static inline int __down_read_trylock(st
 /*
  * lock for writing
  */
-static inline void __down_write(struct rw_semaphore *sem)
+static inline int __down_write_common(struct rw_semaphore *sem, int state)
 {
-	long tmp = RWSEM_UNLOCKED_VALUE;
-
-	if (unlikely(!atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
-						      RWSEM_WRITER_LOCKED)))
-		rwsem_down_write_slowpath(sem, TASK_UNINTERRUPTIBLE);
-	else
-		rwsem_set_owner(sem);
+	if (unlikely(!rwsem_write_trylock(sem))) {
+		if (IS_ERR(rwsem_down_write_slowpath(sem, state)))
+			return -EINTR;
+	}
+	return 0;
 }
 
-static inline int __down_write_killable(struct rw_semaphore *sem)
+static __always_inline void __down_write(struct rw_semaphore *sem)
 {
-	long tmp = RWSEM_UNLOCKED_VALUE;
+	__down_write_common(sem, TASK_UNINTERRUPTIBLE);
+}
 
-	if (unlikely(!atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
-						      RWSEM_WRITER_LOCKED))) {
-		if (IS_ERR(rwsem_down_write_slowpath(sem, TASK_KILLABLE)))
-			return -EINTR;
-	} else {
-		rwsem_set_owner(sem);
-	}
-	return 0;
+static __always_inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	return __down_write_common(sem, TASK_KILLABLE);
 }
 
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
-	long tmp;
-
 	DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);
-
-	tmp  = RWSEM_UNLOCKED_VALUE;
-	if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
-					    RWSEM_WRITER_LOCKED)) {
-		rwsem_set_owner(sem);
-		return true;
-	}
-	return false;
+	return rwsem_write_trylock(sem);
 }
 
 /*

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-04 20:48             ` Linus Torvalds
  2020-12-04 21:48               ` Davidlohr Bueso
@ 2020-12-07  9:09               ` Peter Zijlstra
  2020-12-07 18:40                 ` Linus Torvalds
  1 sibling, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2020-12-07  9:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bernd Edlinger, Eric W. Biederman, Linux Kernel Mailing List,
	Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro,
	Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo,
	Waiman Long, Davidlohr Bueso

On Fri, Dec 04, 2020 at 12:48:18PM -0800, Linus Torvalds wrote:
> On Fri, Dec 4, 2020 at 12:30 PM Bernd Edlinger
> <bernd.edlinger@hotmail.de> wrote:
> >>
> > >    perf_event_open  (exec_update_mutex -> ovl_i_mutex)
> 
> Side note: this one looks like it should be easy to fix.

> PeterZ, is there something I'm missing?

Like this?

  https://lkml.kernel.org/r/20200828123720.GZ1362448@hirez.programming.kicks-ass.net

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-05 18:05                 ` Eric W. Biederman
@ 2020-12-07  9:15                   ` Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2020-12-07  9:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Davidlohr Bueso, Linus Torvalds, Bernd Edlinger,
	Linux Kernel Mailing List, Ingo Molnar, Will Deacon, Jann Horn,
	Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov,
	Sargun Dhillon, Christian Brauner, Arnd Bergmann,
	Arnaldo Carvalho de Melo, Waiman Long

On Sat, Dec 05, 2020 at 12:05:32PM -0600, Eric W. Biederman wrote:
> I am trying to understand why the permission check is there.

It's about observability, is task A allowed to observe state of task B?

By installing a perf event on another task, we can very accurately tell
what it's doing, and isn't fundamentally different from attaching a
debugger (ie. ptrace).

Therefore we chose to use the same security checks. As is good custom,
one does security checks early.

Then Jann came and observed that race against execve mucking with privs,
and we got to hold that mutex across lots.

That patch I proposed earlier should solve that all.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-07  9:02     ` Peter Zijlstra
@ 2020-12-07 15:33       ` Waiman Long
  2020-12-07 16:58         ` David Laight
  2020-12-07 15:56       ` Eric W. Biederman
                         ` (3 subsequent siblings)
  4 siblings, 1 reply; 46+ messages in thread
From: Waiman Long @ 2020-12-07 15:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

On 12/7/20 4:02 AM, Peter Zijlstra wrote:
> On Thu, Dec 03, 2020 at 08:59:13PM -0500, Waiman Long wrote:
>> On 12/3/20 3:11 PM, Eric W. Biederman wrote:
>>> +static inline int __down_read_interruptible(struct rw_semaphore *sem)
>>> +{
>>> +	if (!rwsem_read_trylock(sem)) {
>>> +		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
>>> +			return -EINTR;
>>> +		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
>>> +	} else {
>>> +		rwsem_set_reader_owned(sem);
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>>>    static inline int __down_read_killable(struct rw_semaphore *sem)
>>>    {
>>>    	if (!rwsem_read_trylock(sem)) {
>>> @@ -1495,6 +1507,20 @@ void __sched down_read(struct rw_semaphore *sem)
>>>    }
>>>    EXPORT_SYMBOL(down_read);
>>> +int __sched down_read_interruptible(struct rw_semaphore *sem)
>>> +{
>>> +	might_sleep();
>>> +	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
>>> +
>>> +	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_interruptible)) {
>>> +		rwsem_release(&sem->dep_map, _RET_IP_);
>>> +		return -EINTR;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +EXPORT_SYMBOL(down_read_interruptible);
>>> +
>>>    int __sched down_read_killable(struct rw_semaphore *sem)
>>>    {
>>>    	might_sleep();
>> Acked-by: Waiman Long <longman@redhat.com>
> Yeah, that seems correct.. There's an unfortunate amount of copy-paste
> there though.
>
> Do we want to follow that up with something like this?

I am actually thinking about similar streamlining once the patch lands.

Your suggested changes look fine to me.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-07  9:02     ` Peter Zijlstra
  2020-12-07 15:33       ` Waiman Long
@ 2020-12-07 15:56       ` Eric W. Biederman
  2020-12-08 14:52         ` Peter Zijlstra
  2020-12-09 18:38       ` [tip: locking/core] locking/rwsem: Introduce rwsem_write_trylock() tip-bot2 for Peter Zijlstra
                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 46+ messages in thread
From: Eric W. Biederman @ 2020-12-07 15:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Dec 03, 2020 at 08:59:13PM -0500, Waiman Long wrote:
>> On 12/3/20 3:11 PM, Eric W. Biederman wrote:
>
>> > +static inline int __down_read_interruptible(struct rw_semaphore *sem)
>> > +{
>> > +	if (!rwsem_read_trylock(sem)) {
>> > +		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
>> > +			return -EINTR;
>> > +		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
>> > +	} else {
>> > +		rwsem_set_reader_owned(sem);
>> > +	}
>> > +	return 0;
>> > +}
>> > +
>> >   static inline int __down_read_killable(struct rw_semaphore *sem)
>> >   {
>> >   	if (!rwsem_read_trylock(sem)) {
>> > @@ -1495,6 +1507,20 @@ void __sched down_read(struct rw_semaphore *sem)
>> >   }
>> >   EXPORT_SYMBOL(down_read);
>> > +int __sched down_read_interruptible(struct rw_semaphore *sem)
>> > +{
>> > +	might_sleep();
>> > +	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
>> > +
>> > +	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_interruptible)) {
>> > +		rwsem_release(&sem->dep_map, _RET_IP_);
>> > +		return -EINTR;
>> > +	}
>> > +
>> > +	return 0;
>> > +}
>> > +EXPORT_SYMBOL(down_read_interruptible);
>> > +
>> >   int __sched down_read_killable(struct rw_semaphore *sem)
>> >   {
>> >   	might_sleep();
>> 
>> Acked-by: Waiman Long <longman@redhat.com>
>
> Yeah, that seems correct.. There's an unfortunate amount of copy-paste
> there though.
>
> Do we want to follow that up with something like this?

Do you want to pull these two into a topic branch in the tip tree
based on v10-rc1?

That way we can share these two patches, and then you folks can make
your locking cleanups and I can change exec_update_mutex to a rw_semaphore?

Eric

> ---
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -275,7 +275,25 @@ static inline bool rwsem_read_trylock(st
>  	long cnt = atomic_long_add_return_acquire(RWSEM_READER_BIAS, &sem->count);
>  	if (WARN_ON_ONCE(cnt < 0))
>  		rwsem_set_nonspinnable(sem);
> -	return !(cnt & RWSEM_READ_FAILED_MASK);
> +
> +	if (!(cnt & RWSEM_READ_FAILED_MASK)) {
> +		rwsem_set_reader_owned(sem);
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +static inline bool rwsem_write_trylock(struct rw_semaphore *sem)
> +{
> +	long tmp = RWSEM_UNLOCKED_VALUE;
> +
> +	if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, RWSEM_WRITER_LOCKED)) {
> +		rwsem_set_owner(sem);
> +		return true;
> +	}
> +
> +	return false;
>  }
>  
>  /*
> @@ -1335,38 +1353,29 @@ static struct rw_semaphore *rwsem_downgr
>  /*
>   * lock for reading
>   */
> -static inline void __down_read(struct rw_semaphore *sem)
> +static inline int __down_read_common(struct rw_semaphore *sem, int state)
>  {
>  	if (!rwsem_read_trylock(sem)) {
> -		rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE);
> +		if (IS_ERR(rwsem_down_read_slowpath(sem, state)))
> +			return -EINTR;
>  		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
> -	} else {
> -		rwsem_set_reader_owned(sem);
>  	}
> +	return 0;
>  }
>  
> -static inline int __down_read_interruptible(struct rw_semaphore *sem)
> +static __always_inline void __down_read(struct rw_semaphore *sem)
>  {
> -	if (!rwsem_read_trylock(sem)) {
> -		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
> -			return -EINTR;
> -		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
> -	} else {
> -		rwsem_set_reader_owned(sem);
> -	}
> -	return 0;
> +	__down_read_common(sem, TASK_UNINTERRUPTIBLE);
>  }
>  
> -static inline int __down_read_killable(struct rw_semaphore *sem)
> +static __always_inline int __down_read_interruptible(struct rw_semaphore *sem)
>  {
> -	if (!rwsem_read_trylock(sem)) {
> -		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE)))
> -			return -EINTR;
> -		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
> -	} else {
> -		rwsem_set_reader_owned(sem);
> -	}
> -	return 0;
> +	return __down_read_common(sem, TASK_INTERRUPTIBLE);
> +}
> +
> +static __always_inline int __down_read_killable(struct rw_semaphore *sem)
> +{
> +	return __down_read_common(sem, TASK_KILLABLE);
>  }
>  
>  static inline int __down_read_trylock(struct rw_semaphore *sem)
> @@ -1392,44 +1401,29 @@ static inline int __down_read_trylock(st
>  /*
>   * lock for writing
>   */
> -static inline void __down_write(struct rw_semaphore *sem)
> +static inline int __down_write_common(struct rw_semaphore *sem, int state)
>  {
> -	long tmp = RWSEM_UNLOCKED_VALUE;
> -
> -	if (unlikely(!atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
> -						      RWSEM_WRITER_LOCKED)))
> -		rwsem_down_write_slowpath(sem, TASK_UNINTERRUPTIBLE);
> -	else
> -		rwsem_set_owner(sem);
> +	if (unlikely(!rwsem_write_trylock(sem))) {
> +		if (IS_ERR(rwsem_down_write_slowpath(sem, state)))
> +			return -EINTR;
> +	}
> +	return 0;
>  }
>  
> -static inline int __down_write_killable(struct rw_semaphore *sem)
> +static __always_inline void __down_write(struct rw_semaphore *sem)
>  {
> -	long tmp = RWSEM_UNLOCKED_VALUE;
> +	__down_write_common(sem, TASK_UNINTERRUPTIBLE);
> +}
>  
> -	if (unlikely(!atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
> -						      RWSEM_WRITER_LOCKED))) {
> -		if (IS_ERR(rwsem_down_write_slowpath(sem, TASK_KILLABLE)))
> -			return -EINTR;
> -	} else {
> -		rwsem_set_owner(sem);
> -	}
> -	return 0;
> +static __always_inline int __down_write_killable(struct rw_semaphore *sem)
> +{
> +	return __down_write_common(sem, TASK_KILLABLE);
>  }
>  
>  static inline int __down_write_trylock(struct rw_semaphore *sem)
>  {
> -	long tmp;
> -
>  	DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);
> -
> -	tmp  = RWSEM_UNLOCKED_VALUE;
> -	if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
> -					    RWSEM_WRITER_LOCKED)) {
> -		rwsem_set_owner(sem);
> -		return true;
> -	}
> -	return false;
> +	return rwsem_write_trylock(sem);
>  }
>  
>  /*

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-07 15:33       ` Waiman Long
@ 2020-12-07 16:58         ` David Laight
  2020-12-07 19:02           ` Waiman Long
  0 siblings, 1 reply; 46+ messages in thread
From: David Laight @ 2020-12-07 16:58 UTC (permalink / raw)
  To: 'Waiman Long', Peter Zijlstra
  Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

From: Waiman Long 
> Sent: 07 December 2020 15:34
> 
> On 12/7/20 4:02 AM, Peter Zijlstra wrote:
> > On Thu, Dec 03, 2020 at 08:59:13PM -0500, Waiman Long wrote:
> >> On 12/3/20 3:11 PM, Eric W. Biederman wrote:
> >>> +static inline int __down_read_interruptible(struct rw_semaphore *sem)
> >>> +{
> >>> +	if (!rwsem_read_trylock(sem)) {
> >>> +		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
> >>> +			return -EINTR;
> >>> +		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
> >>> +	} else {
> >>> +		rwsem_set_reader_owned(sem);
> >>> +	}
> >>> +	return 0;
> >>> +}
> >>> +
> >>>    static inline int __down_read_killable(struct rw_semaphore *sem)
> >>>    {
> >>>    	if (!rwsem_read_trylock(sem)) {
> >>> @@ -1495,6 +1507,20 @@ void __sched down_read(struct rw_semaphore *sem)
> >>>    }
> >>>    EXPORT_SYMBOL(down_read);
> >>> +int __sched down_read_interruptible(struct rw_semaphore *sem)
> >>> +{
> >>> +	might_sleep();
> >>> +	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
> >>> +
> >>> +	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_interruptible)) {
> >>> +		rwsem_release(&sem->dep_map, _RET_IP_);
> >>> +		return -EINTR;
> >>> +	}
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +EXPORT_SYMBOL(down_read_interruptible);
> >>> +
> >>>    int __sched down_read_killable(struct rw_semaphore *sem)
> >>>    {
> >>>    	might_sleep();
> >> Acked-by: Waiman Long <longman@redhat.com>
> > Yeah, that seems correct.. There's an unfortunate amount of copy-paste
> > there though.
> >
> > Do we want to follow that up with something like this?
> 
> I am actually thinking about similar streamlining once the patch lands.
> 
> Your suggested changes look fine to me.

How much more difficult would it be to also add a timeout option?
I looked at adding one to the mutex code - and fell into a big pile
of replicated code.

ISTM that one the initial locked exchange (and spin) fails a few
extra instructions when heading for the sleep don't really matter

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore
  2020-12-07  9:09               ` Peter Zijlstra
@ 2020-12-07 18:40                 ` Linus Torvalds
  2020-12-08  8:34                   ` [PATCH] perf: Break deadlock involving exec_update_mutex Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Linus Torvalds @ 2020-12-07 18:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Bernd Edlinger, Eric W. Biederman, Linux Kernel Mailing List,
	Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro,
	Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo,
	Waiman Long, Davidlohr Bueso

On Mon, Dec 7, 2020 at 1:10 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> > PeterZ, is there something I'm missing?
>
> Like this?
>
>   https://lkml.kernel.org/r/20200828123720.GZ1362448@hirez.programming.kicks-ass.net

Yes, except I think you should remove the old ptrace_may_access() check.

Because otherwise we'll just end up having KCSAN complain about the
unlocked optimistic accesses or something like that.

So do the perfmon_capable() check early - it doesn't need the
exec_update_mutex - and then just do the ptrace_may_access() one late.

I don't see any point at all in checking privileges twice, and I do
see real downsides. Not just that KCSAN issue, but also lack of
coverage (ie the second check will then effectively never be tested,
which is bad too).

               Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-07 16:58         ` David Laight
@ 2020-12-07 19:02           ` Waiman Long
  2020-12-08  9:12             ` David Laight
  0 siblings, 1 reply; 46+ messages in thread
From: Waiman Long @ 2020-12-07 19:02 UTC (permalink / raw)
  To: David Laight, Peter Zijlstra
  Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

On 12/7/20 11:58 AM, David Laight wrote:
> From: Waiman Long
>> Sent: 07 December 2020 15:34
>>
>> On 12/7/20 4:02 AM, Peter Zijlstra wrote:
>>> On Thu, Dec 03, 2020 at 08:59:13PM -0500, Waiman Long wrote:
>>>> On 12/3/20 3:11 PM, Eric W. Biederman wrote:
>>>>> +static inline int __down_read_interruptible(struct rw_semaphore *sem)
>>>>> +{
>>>>> +	if (!rwsem_read_trylock(sem)) {
>>>>> +		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
>>>>> +			return -EINTR;
>>>>> +		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
>>>>> +	} else {
>>>>> +		rwsem_set_reader_owned(sem);
>>>>> +	}
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>>     static inline int __down_read_killable(struct rw_semaphore *sem)
>>>>>     {
>>>>>     	if (!rwsem_read_trylock(sem)) {
>>>>> @@ -1495,6 +1507,20 @@ void __sched down_read(struct rw_semaphore *sem)
>>>>>     }
>>>>>     EXPORT_SYMBOL(down_read);
>>>>> +int __sched down_read_interruptible(struct rw_semaphore *sem)
>>>>> +{
>>>>> +	might_sleep();
>>>>> +	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
>>>>> +
>>>>> +	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_interruptible)) {
>>>>> +		rwsem_release(&sem->dep_map, _RET_IP_);
>>>>> +		return -EINTR;
>>>>> +	}
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +EXPORT_SYMBOL(down_read_interruptible);
>>>>> +
>>>>>     int __sched down_read_killable(struct rw_semaphore *sem)
>>>>>     {
>>>>>     	might_sleep();
>>>> Acked-by: Waiman Long <longman@redhat.com>
>>> Yeah, that seems correct.. There's an unfortunate amount of copy-paste
>>> there though.
>>>
>>> Do we want to follow that up with something like this?
>> I am actually thinking about similar streamlining once the patch lands.
>>
>> Your suggested changes look fine to me.
> How much more difficult would it be to also add a timeout option?
> I looked at adding one to the mutex code - and fell into a big pile
> of replicated code.
>
> ISTM that one the initial locked exchange (and spin) fails a few
> extra instructions when heading for the sleep don't really matter
>
Actually, I had tried that before. See

https://lore.kernel.org/lkml/20190911150537.19527-1-longman@redhat.com/

That is for rwsem, but the same can be done for mutex. However, Peter 
didn't seem to like the idea of a timeout parameter. Anyway, it is 
certainly doable if there is a good use case for it.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] perf: Break deadlock involving exec_update_mutex
  2020-12-07 18:40                 ` Linus Torvalds
@ 2020-12-08  8:34                   ` Peter Zijlstra
  2020-12-08 18:37                     ` Linus Torvalds
  2020-12-10 18:38                     ` Davidlohr Bueso
  0 siblings, 2 replies; 46+ messages in thread
From: Peter Zijlstra @ 2020-12-08  8:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bernd Edlinger, Eric W. Biederman, Linux Kernel Mailing List,
	Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro,
	Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo,
	Waiman Long, Davidlohr Bueso

On Mon, Dec 07, 2020 at 10:40:11AM -0800, Linus Torvalds wrote:
> On Mon, Dec 7, 2020 at 1:10 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > > PeterZ, is there something I'm missing?
> >
> > Like this?
> >
> >   https://lkml.kernel.org/r/20200828123720.GZ1362448@hirez.programming.kicks-ass.net
> 
> Yes, except I think you should remove the old ptrace_may_access() check.

> I don't see any point at all in checking privileges twice, and I do
> see real downsides. Not just that KCSAN issue, but also lack of
> coverage (ie the second check will then effectively never be tested,
> which is bad too).

Fair enough, find below.

I suppose I'll queue the below into tip/perf/core for next merge window,
unless you want it in a hurry?

---
Subject: perf: Break deadlock involving exec_update_mutex
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri, 28 Aug 2020 14:37:20 +0200

Syzbot reported a lock inversion involving perf. The sore point being
perf holding exec_update_mutex() for a very long time, specifically
across a whole bunch of filesystem ops in pmu::event_init() (uprobes)
and anon_inode_getfile().

This then inverts against procfs code trying to take
exec_update_mutex.

Move the permission checks later, such that we need to hold the mutex
over less code.

Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |   46 +++++++++++++++++++++++-----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11832,24 +11832,6 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_task;
 	}
 
-	if (task) {
-		err = mutex_lock_interruptible(&task->signal->exec_update_mutex);
-		if (err)
-			goto err_task;
-
-		/*
-		 * Preserve ptrace permission check for backwards compatibility.
-		 *
-		 * We must hold exec_update_mutex across this and any potential
-		 * perf_install_in_context() call for this new event to
-		 * serialize against exec() altering our credentials (and the
-		 * perf_event_exit_task() that could imply).
-		 */
-		err = -EACCES;
-		if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
-			goto err_cred;
-	}
-
 	if (flags & PERF_FLAG_PID_CGROUP)
 		cgroup_fd = pid;
 
@@ -11857,7 +11839,7 @@ SYSCALL_DEFINE5(perf_event_open,
 				 NULL, NULL, cgroup_fd);
 	if (IS_ERR(event)) {
 		err = PTR_ERR(event);
-		goto err_cred;
+		goto err_task;
 	}
 
 	if (is_sampling_event(event)) {
@@ -11976,6 +11958,24 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_context;
 	}
 
+	if (task) {
+		err = mutex_lock_interruptible(&task->signal->exec_update_mutex);
+		if (err)
+			goto err_file;
+
+		/*
+		 * Preserve ptrace permission check for backwards compatibility.
+		 *
+		 * We must hold exec_update_mutex across this and any potential
+		 * perf_install_in_context() call for this new event to
+		 * serialize against exec() altering our credentials (and the
+		 * perf_event_exit_task() that could imply).
+		 */
+		err = -EACCES;
+		if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
+			goto err_cred;
+	}
+
 	if (move_group) {
 		gctx = __perf_event_ctx_lock_double(group_leader, ctx);
 
@@ -12151,7 +12151,10 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (move_group)
 		perf_event_ctx_unlock(group_leader, gctx);
 	mutex_unlock(&ctx->mutex);
-/* err_file: */
+err_cred:
+	if (task)
+		mutex_unlock(&task->signal->exec_update_mutex);
+err_file:
 	fput(event_file);
 err_context:
 	perf_unpin_context(ctx);
@@ -12163,9 +12166,6 @@ SYSCALL_DEFINE5(perf_event_open,
 	 */
 	if (!event_file)
 		free_event(event);
-err_cred:
-	if (task)
-		mutex_unlock(&task->signal->exec_update_mutex);
 err_task:
 	if (task)
 		put_task_struct(task);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-07 19:02           ` Waiman Long
@ 2020-12-08  9:12             ` David Laight
  2020-12-08 12:32               ` Peter Zijlstra
  2020-12-08 15:34               ` Waiman Long
  0 siblings, 2 replies; 46+ messages in thread
From: David Laight @ 2020-12-08  9:12 UTC (permalink / raw)
  To: 'Waiman Long', Peter Zijlstra
  Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

From: Waiman Long
> Sent: 07 December 2020 19:02
...
> > How much more difficult would it be to also add a timeout option?
> > I looked at adding one to the mutex code - and fell into a big pile
> > of replicated code.
> >
> > ISTM that one the initial locked exchange (and spin) fails a few
> > extra instructions when heading for the sleep don't really matter
> >
> Actually, I had tried that before. See
> 
> https://lore.kernel.org/lkml/20190911150537.19527-1-longman@redhat.com/
> 
> That is for rwsem, but the same can be done for mutex. However, Peter
> didn't seem to like the idea of a timeout parameter. Anyway, it is
> certainly doable if there is a good use case for it.

'Unfortunately' my use-case if for an out-of-tree driver.

The problem I was solving is a status call blocking because
some other code is 'stuck' (probably an oops) with a mutex held.

The code used to use down_timeout() (it was written for 2.4).
When I changed to mutex_(to get optimistic spinning) I lost
the ability to do the timeouts.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-08  9:12             ` David Laight
@ 2020-12-08 12:32               ` Peter Zijlstra
  2020-12-08 13:13                 ` David Laight
  2020-12-08 15:34               ` Waiman Long
  1 sibling, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2020-12-08 12:32 UTC (permalink / raw)
  To: David Laight
  Cc: 'Waiman Long',
	Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

On Tue, Dec 08, 2020 at 09:12:36AM +0000, David Laight wrote:
> From: Waiman Long
> > Sent: 07 December 2020 19:02
> ...
> > > How much more difficult would it be to also add a timeout option?
> > > I looked at adding one to the mutex code - and fell into a big pile
> > > of replicated code.
> > >
> > > ISTM that one the initial locked exchange (and spin) fails a few
> > > extra instructions when heading for the sleep don't really matter
> > >
> > Actually, I had tried that before. See
> > 
> > https://lore.kernel.org/lkml/20190911150537.19527-1-longman@redhat.com/
> > 
> > That is for rwsem, but the same can be done for mutex. However, Peter
> > didn't seem to like the idea of a timeout parameter. Anyway, it is
> > certainly doable if there is a good use case for it.
> 
> 'Unfortunately' my use-case if for an out-of-tree driver.
> 
> The problem I was solving is a status call blocking because
> some other code is 'stuck' (probably an oops) with a mutex held.

Working around oopses is not a sane use-case. If you oops, you get to
keep all the pieces.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-08 12:32               ` Peter Zijlstra
@ 2020-12-08 13:13                 ` David Laight
  0 siblings, 0 replies; 46+ messages in thread
From: David Laight @ 2020-12-08 13:13 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: 'Waiman Long',
	Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

From: Peter Zijlstra
> Sent: 08 December 2020 12:32
> 
> On Tue, Dec 08, 2020 at 09:12:36AM +0000, David Laight wrote:
> > From: Waiman Long
> > > Sent: 07 December 2020 19:02
> > ...
> > > > How much more difficult would it be to also add a timeout option?
> > > > I looked at adding one to the mutex code - and fell into a big pile
> > > > of replicated code.
> > > >
> > > > ISTM that one the initial locked exchange (and spin) fails a few
> > > > extra instructions when heading for the sleep don't really matter
> > > >
> > > Actually, I had tried that before. See
> > >
> > > https://lore.kernel.org/lkml/20190911150537.19527-1-longman@redhat.com/
> > >
> > > That is for rwsem, but the same can be done for mutex. However, Peter
> > > didn't seem to like the idea of a timeout parameter. Anyway, it is
> > > certainly doable if there is a good use case for it.
> >
> > 'Unfortunately' my use-case if for an out-of-tree driver.
> >
> > The problem I was solving is a status call blocking because
> > some other code is 'stuck' (probably an oops) with a mutex held.
> 
> Working around oopses is not a sane use-case. If you oops, you get to
> keep all the pieces.

It's not so much 'working around' as trying to get debug info out
to fix the bug.
It gets annoying when another process lock up.
ISTR there is something global of that nature after a panic.

I have written a version that reported an error if the process
holding the mutex is dead (wasn't race safe against process exit).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-07 15:56       ` Eric W. Biederman
@ 2020-12-08 14:52         ` Peter Zijlstra
  2020-12-08 18:27           ` Eric W. Biederman
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2020-12-08 14:52 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Waiman Long, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

On Mon, Dec 07, 2020 at 09:56:34AM -0600, Eric W. Biederman wrote:

> Do you want to pull these two into a topic branch in the tip tree
> based on v10-rc1?

I'll go do that. I'll let the robots chew on it before pushing it out
though, I'll reply once it's in tip.git.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-08  9:12             ` David Laight
  2020-12-08 12:32               ` Peter Zijlstra
@ 2020-12-08 15:34               ` Waiman Long
  2020-12-08 16:23                 ` David Laight
  1 sibling, 1 reply; 46+ messages in thread
From: Waiman Long @ 2020-12-08 15:34 UTC (permalink / raw)
  To: David Laight, Peter Zijlstra
  Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

On 12/8/20 4:12 AM, David Laight wrote:
> From: Waiman Long
>> Sent: 07 December 2020 19:02
> ...
>>> How much more difficult would it be to also add a timeout option?
>>> I looked at adding one to the mutex code - and fell into a big pile
>>> of replicated code.
>>>
>>> ISTM that one the initial locked exchange (and spin) fails a few
>>> extra instructions when heading for the sleep don't really matter
>>>
>> Actually, I had tried that before. See
>>
>> https://lore.kernel.org/lkml/20190911150537.19527-1-longman@redhat.com/
>>
>> That is for rwsem, but the same can be done for mutex. However, Peter
>> didn't seem to like the idea of a timeout parameter. Anyway, it is
>> certainly doable if there is a good use case for it.
> 'Unfortunately' my use-case if for an out-of-tree driver.
>
> The problem I was solving is a status call blocking because
> some other code is 'stuck' (probably an oops) with a mutex held.
>
> The code used to use down_timeout() (it was written for 2.4).
> When I changed to mutex_(to get optimistic spinning) I lost
> the ability to do the timeouts.

The primary reason for sending out that patchset was to work around some 
circular locking problem in existing code even though these circular 
locking scenarios are not likely to happen. Your case is certainly 
another potential circular locking problem as well.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-08 15:34               ` Waiman Long
@ 2020-12-08 16:23                 ` David Laight
  0 siblings, 0 replies; 46+ messages in thread
From: David Laight @ 2020-12-08 16:23 UTC (permalink / raw)
  To: 'Waiman Long', Peter Zijlstra
  Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

From: Waiman Long
> Sent: 08 December 2020 15:34
> 
> On 12/8/20 4:12 AM, David Laight wrote:
> > From: Waiman Long
> >> Sent: 07 December 2020 19:02
> > ...
> >>> How much more difficult would it be to also add a timeout option?
> >>> I looked at adding one to the mutex code - and fell into a big pile
> >>> of replicated code.
> >>>
> >>> ISTM that one the initial locked exchange (and spin) fails a few
> >>> extra instructions when heading for the sleep don't really matter
> >>>
> >> Actually, I had tried that before. See
> >>
> >> https://lore.kernel.org/lkml/20190911150537.19527-1-longman@redhat.com/
> >>
> >> That is for rwsem, but the same can be done for mutex. However, Peter
> >> didn't seem to like the idea of a timeout parameter. Anyway, it is
> >> certainly doable if there is a good use case for it.
> > 'Unfortunately' my use-case if for an out-of-tree driver.
> >
> > The problem I was solving is a status call blocking because
> > some other code is 'stuck' (probably an oops) with a mutex held.
> >
> > The code used to use down_timeout() (it was written for 2.4).
> > When I changed to mutex_(to get optimistic spinning) I lost
> > the ability to do the timeouts.
> 
> The primary reason for sending out that patchset was to work around some
> circular locking problem in existing code even though these circular
> locking scenarios are not likely to happen. Your case is certainly
> another potential circular locking problem as well.

If you've got lock-ordering problems they need fixing.
Neither signals nor timeouts are real solutions.
Either may help diagnose the problem, but they aren't fixes.

OTOH if it reasonable to have a request interrupted by a signal
it must also be reasonable to implement a timeout.
Of course, one might wonder whether 'correct' code should ever
be waiting on a mutex for any length of time.
So is there even a justification for interruptible waits for mutex.

FWIW I could implement my timeouts using SIGALARM - but it is a lot
of work :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-08 14:52         ` Peter Zijlstra
@ 2020-12-08 18:27           ` Eric W. Biederman
  2020-12-09 18:36             ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Eric W. Biederman @ 2020-12-08 18:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, Dec 07, 2020 at 09:56:34AM -0600, Eric W. Biederman wrote:
>
>> Do you want to pull these two into a topic branch in the tip tree
>> based on v10-rc1?
>
> I'll go do that. I'll let the robots chew on it before pushing it out
> though, I'll reply once it's in tip.git.

Thanks,

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] perf: Break deadlock involving exec_update_mutex
  2020-12-08  8:34                   ` [PATCH] perf: Break deadlock involving exec_update_mutex Peter Zijlstra
@ 2020-12-08 18:37                     ` Linus Torvalds
  2020-12-10 18:38                     ` Davidlohr Bueso
  1 sibling, 0 replies; 46+ messages in thread
From: Linus Torvalds @ 2020-12-08 18:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Bernd Edlinger, Eric W. Biederman, Linux Kernel Mailing List,
	Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro,
	Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo,
	Waiman Long, Davidlohr Bueso

On Tue, Dec 8, 2020 at 12:34 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> I suppose I'll queue the below into tip/perf/core for next merge window,
> unless you want it in a hurry?

LGTM, and there's no hurry. This is not a new thing, and I don't think
it has been a problem in practice.

            Linus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-08 18:27           ` Eric W. Biederman
@ 2020-12-09 18:36             ` Peter Zijlstra
  2020-12-10 19:33               ` Eric W. Biederman
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2020-12-09 18:36 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Waiman Long, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

On Tue, Dec 08, 2020 at 12:27:39PM -0600, Eric W. Biederman wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Mon, Dec 07, 2020 at 09:56:34AM -0600, Eric W. Biederman wrote:
> >
> >> Do you want to pull these two into a topic branch in the tip tree
> >> based on v10-rc1?
> >
> > I'll go do that. I'll let the robots chew on it before pushing it out
> > though, I'll reply once it's in tip.git.
> 
> Thanks,

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/rwsem


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [tip: locking/core] locking/rwsem: Fold __down_{read,write}*()
  2020-12-07  9:02     ` Peter Zijlstra
                         ` (2 preceding siblings ...)
  2020-12-09 18:38       ` [tip: locking/core] locking/rwsem: Introduce rwsem_write_trylock() tip-bot2 for Peter Zijlstra
@ 2020-12-09 18:38       ` tip-bot2 for Peter Zijlstra
  2020-12-09 18:38       ` [tip: locking/core] locking/rwsem: Better collate rwsem_read_trylock() tip-bot2 for Peter Zijlstra
  4 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2020-12-09 18:38 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     c995e638ccbbc65a76d1713c4fdcf927e7e2cb83
Gitweb:        https://git.kernel.org/tip/c995e638ccbbc65a76d1713c4fdcf927e7e2cb83
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 08 Dec 2020 10:27:41 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Dec 2020 17:08:47 +01:00

locking/rwsem: Fold __down_{read,write}*()

There's a lot needless duplication in __down_{read,write}*(), cure
that with a helper.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net
---
 kernel/locking/rwsem.c | 45 ++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 7915456..67ae366 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1354,32 +1354,29 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
 /*
  * lock for reading
  */
-static inline void __down_read(struct rw_semaphore *sem)
+static inline int __down_read_common(struct rw_semaphore *sem, int state)
 {
 	if (!rwsem_read_trylock(sem)) {
-		rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE);
+		if (IS_ERR(rwsem_down_read_slowpath(sem, state)))
+			return -EINTR;
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	}
+	return 0;
+}
+
+static inline void __down_read(struct rw_semaphore *sem)
+{
+	__down_read_common(sem, TASK_UNINTERRUPTIBLE);
 }
 
 static inline int __down_read_interruptible(struct rw_semaphore *sem)
 {
-	if (!rwsem_read_trylock(sem)) {
-		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
-			return -EINTR;
-		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
-	}
-	return 0;
+	return __down_read_common(sem, TASK_INTERRUPTIBLE);
 }
 
 static inline int __down_read_killable(struct rw_semaphore *sem)
 {
-	if (!rwsem_read_trylock(sem)) {
-		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE)))
-			return -EINTR;
-		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
-	}
-	return 0;
+	return __down_read_common(sem, TASK_KILLABLE);
 }
 
 static inline int __down_read_trylock(struct rw_semaphore *sem)
@@ -1405,22 +1402,26 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	if (unlikely(!rwsem_write_trylock(sem)))
-		rwsem_down_write_slowpath(sem, TASK_UNINTERRUPTIBLE);
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
+static inline int __down_write_common(struct rw_semaphore *sem, int state)
 {
 	if (unlikely(!rwsem_write_trylock(sem))) {
-		if (IS_ERR(rwsem_down_write_slowpath(sem, TASK_KILLABLE)))
+		if (IS_ERR(rwsem_down_write_slowpath(sem, state)))
 			return -EINTR;
 	}
 
 	return 0;
 }
 
+static inline void __down_write(struct rw_semaphore *sem)
+{
+	__down_write_common(sem, TASK_UNINTERRUPTIBLE);
+}
+
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	return __down_write_common(sem, TASK_KILLABLE);
+}
+
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
 	DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [tip: locking/core] locking/rwsem: Introduce rwsem_write_trylock()
  2020-12-07  9:02     ` Peter Zijlstra
  2020-12-07 15:33       ` Waiman Long
  2020-12-07 15:56       ` Eric W. Biederman
@ 2020-12-09 18:38       ` tip-bot2 for Peter Zijlstra
  2020-12-09 18:38       ` [tip: locking/core] locking/rwsem: Fold __down_{read,write}*() tip-bot2 for Peter Zijlstra
  2020-12-09 18:38       ` [tip: locking/core] locking/rwsem: Better collate rwsem_read_trylock() tip-bot2 for Peter Zijlstra
  4 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2020-12-09 18:38 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     285c61aedf6bc5d81b37e4dc48c19012e8ff9836
Gitweb:        https://git.kernel.org/tip/285c61aedf6bc5d81b37e4dc48c19012e8ff9836
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 08 Dec 2020 10:25:06 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Dec 2020 17:08:47 +01:00

locking/rwsem: Introduce rwsem_write_trylock()

One copy of this logic is better than three.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net
---
 kernel/locking/rwsem.c | 38 ++++++++++++++++----------------------
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 5c0dc7e..7915456 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -285,6 +285,18 @@ static inline bool rwsem_read_trylock(struct rw_semaphore *sem)
 	return false;
 }
 
+static inline bool rwsem_write_trylock(struct rw_semaphore *sem)
+{
+	long tmp = RWSEM_UNLOCKED_VALUE;
+
+	if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, RWSEM_WRITER_LOCKED)) {
+		rwsem_set_owner(sem);
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * Return just the real task structure pointer of the owner
  */
@@ -1395,42 +1407,24 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
  */
 static inline void __down_write(struct rw_semaphore *sem)
 {
-	long tmp = RWSEM_UNLOCKED_VALUE;
-
-	if (unlikely(!atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
-						      RWSEM_WRITER_LOCKED)))
+	if (unlikely(!rwsem_write_trylock(sem)))
 		rwsem_down_write_slowpath(sem, TASK_UNINTERRUPTIBLE);
-	else
-		rwsem_set_owner(sem);
 }
 
 static inline int __down_write_killable(struct rw_semaphore *sem)
 {
-	long tmp = RWSEM_UNLOCKED_VALUE;
-
-	if (unlikely(!atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
-						      RWSEM_WRITER_LOCKED))) {
+	if (unlikely(!rwsem_write_trylock(sem))) {
 		if (IS_ERR(rwsem_down_write_slowpath(sem, TASK_KILLABLE)))
 			return -EINTR;
-	} else {
-		rwsem_set_owner(sem);
 	}
+
 	return 0;
 }
 
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
-	long tmp;
-
 	DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);
-
-	tmp  = RWSEM_UNLOCKED_VALUE;
-	if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
-					    RWSEM_WRITER_LOCKED)) {
-		rwsem_set_owner(sem);
-		return true;
-	}
-	return false;
+	return rwsem_write_trylock(sem);
 }
 
 /*

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [tip: locking/core] rwsem: Implement down_read_interruptible
  2020-12-03 20:11 ` [PATCH 2/3] rwsem: Implement down_read_interruptible Eric W. Biederman
  2020-12-04  1:59   ` Waiman Long
@ 2020-12-09 18:38   ` tip-bot2 for Eric W. Biederman
  1 sibling, 0 replies; 46+ messages in thread
From: tip-bot2 for Eric W. Biederman @ 2020-12-09 18:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Eric W. Biederman, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     31784cff7ee073b34d6eddabb95e3be2880a425c
Gitweb:        https://git.kernel.org/tip/31784cff7ee073b34d6eddabb95e3be2880a425c
Author:        Eric W. Biederman <ebiederm@xmission.com>
AuthorDate:    Thu, 03 Dec 2020 14:11:13 -06:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Dec 2020 17:08:42 +01:00

rwsem: Implement down_read_interruptible

In preparation for converting exec_update_mutex to a rwsem so that
multiple readers can execute in parallel and not deadlock, add
down_read_interruptible.  This is needed for perf_event_open to be
converted (with no semantic changes) from working on a mutex to
wroking on a rwsem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/87k0tybqfy.fsf@x220.int.ebiederm.org
---
 include/linux/rwsem.h  |  1 +
 kernel/locking/rwsem.c | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 13021b0..4c715be 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -123,6 +123,7 @@ static inline int rwsem_is_contended(struct rw_semaphore *sem)
  * lock for reading
  */
 extern void down_read(struct rw_semaphore *sem);
+extern int __must_check down_read_interruptible(struct rw_semaphore *sem);
 extern int __must_check down_read_killable(struct rw_semaphore *sem);
 
 /*
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 54d11cb..a163542 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1345,6 +1345,18 @@ static inline void __down_read(struct rw_semaphore *sem)
 	}
 }
 
+static inline int __down_read_interruptible(struct rw_semaphore *sem)
+{
+	if (!rwsem_read_trylock(sem)) {
+		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
+			return -EINTR;
+		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
+	} else {
+		rwsem_set_reader_owned(sem);
+	}
+	return 0;
+}
+
 static inline int __down_read_killable(struct rw_semaphore *sem)
 {
 	if (!rwsem_read_trylock(sem)) {
@@ -1495,6 +1507,20 @@ void __sched down_read(struct rw_semaphore *sem)
 }
 EXPORT_SYMBOL(down_read);
 
+int __sched down_read_interruptible(struct rw_semaphore *sem)
+{
+	might_sleep();
+	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
+
+	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_interruptible)) {
+		rwsem_release(&sem->dep_map, _RET_IP_);
+		return -EINTR;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(down_read_interruptible);
+
 int __sched down_read_killable(struct rw_semaphore *sem)
 {
 	might_sleep();

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [tip: locking/core] locking/rwsem: Better collate rwsem_read_trylock()
  2020-12-07  9:02     ` Peter Zijlstra
                         ` (3 preceding siblings ...)
  2020-12-09 18:38       ` [tip: locking/core] locking/rwsem: Fold __down_{read,write}*() tip-bot2 for Peter Zijlstra
@ 2020-12-09 18:38       ` tip-bot2 for Peter Zijlstra
  4 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2020-12-09 18:38 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     3379116a0ca965b00e6522c7ea3f16c9dbd8f9f9
Gitweb:        https://git.kernel.org/tip/3379116a0ca965b00e6522c7ea3f16c9dbd8f9f9
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 08 Dec 2020 10:22:16 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Dec 2020 17:08:47 +01:00

locking/rwsem: Better collate rwsem_read_trylock()

All users of rwsem_read_trylock() do rwsem_set_reader_owned(sem) on
success, move it into rwsem_read_trylock() proper.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net
---
 kernel/locking/rwsem.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index a163542..5c0dc7e 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -273,9 +273,16 @@ static inline void rwsem_set_nonspinnable(struct rw_semaphore *sem)
 static inline bool rwsem_read_trylock(struct rw_semaphore *sem)
 {
 	long cnt = atomic_long_add_return_acquire(RWSEM_READER_BIAS, &sem->count);
+
 	if (WARN_ON_ONCE(cnt < 0))
 		rwsem_set_nonspinnable(sem);
-	return !(cnt & RWSEM_READ_FAILED_MASK);
+
+	if (!(cnt & RWSEM_READ_FAILED_MASK)) {
+		rwsem_set_reader_owned(sem);
+		return true;
+	}
+
+	return false;
 }
 
 /*
@@ -1340,8 +1347,6 @@ static inline void __down_read(struct rw_semaphore *sem)
 	if (!rwsem_read_trylock(sem)) {
 		rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE);
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
-	} else {
-		rwsem_set_reader_owned(sem);
 	}
 }
 
@@ -1351,8 +1356,6 @@ static inline int __down_read_interruptible(struct rw_semaphore *sem)
 		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
 			return -EINTR;
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
-	} else {
-		rwsem_set_reader_owned(sem);
 	}
 	return 0;
 }
@@ -1363,8 +1366,6 @@ static inline int __down_read_killable(struct rw_semaphore *sem)
 		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE)))
 			return -EINTR;
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
-	} else {
-		rwsem_set_reader_owned(sem);
 	}
 	return 0;
 }

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [tip: locking/core] rwsem: Implement down_read_killable_nested
  2020-12-03 20:10 ` [PATCH 1/3] rwsem: Implement down_read_killable_nested Eric W. Biederman
  2020-12-04  1:58   ` Waiman Long
@ 2020-12-09 18:38   ` tip-bot2 for Eric W. Biederman
  1 sibling, 0 replies; 46+ messages in thread
From: tip-bot2 for Eric W. Biederman @ 2020-12-09 18:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Eric W. Biederman, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     0f9368b5bf6db0c04afc5454b1be79022a681615
Gitweb:        https://git.kernel.org/tip/0f9368b5bf6db0c04afc5454b1be79022a681615
Author:        Eric W. Biederman <ebiederm@xmission.com>
AuthorDate:    Thu, 03 Dec 2020 14:10:32 -06:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Dec 2020 17:08:41 +01:00

rwsem: Implement down_read_killable_nested

In preparation for converting exec_update_mutex to a rwsem so that
multiple readers can execute in parallel and not deadlock, add
down_read_killable_nested.  This is needed so that kcmp_lock
can be converted from working on a mutexes to working on rw_semaphores.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/87o8jabqh3.fsf@x220.int.ebiederm.org
---
 include/linux/rwsem.h  |  2 ++
 kernel/locking/rwsem.c | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 25e3fde..13021b0 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -171,6 +171,7 @@ extern void downgrade_write(struct rw_semaphore *sem);
  * See Documentation/locking/lockdep-design.rst for more details.)
  */
 extern void down_read_nested(struct rw_semaphore *sem, int subclass);
+extern int __must_check down_read_killable_nested(struct rw_semaphore *sem, int subclass);
 extern void down_write_nested(struct rw_semaphore *sem, int subclass);
 extern int down_write_killable_nested(struct rw_semaphore *sem, int subclass);
 extern void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest_lock);
@@ -191,6 +192,7 @@ extern void down_read_non_owner(struct rw_semaphore *sem);
 extern void up_read_non_owner(struct rw_semaphore *sem);
 #else
 # define down_read_nested(sem, subclass)		down_read(sem)
+# define down_read_killable_nested(sem, subclass)	down_read_killable(sem)
 # define down_write_nest_lock(sem, nest_lock)	down_write(sem)
 # define down_write_nested(sem, subclass)	down_write(sem)
 # define down_write_killable_nested(sem, subclass)	down_write_killable(sem)
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index f11b9bd..54d11cb 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1605,6 +1605,20 @@ void down_read_nested(struct rw_semaphore *sem, int subclass)
 }
 EXPORT_SYMBOL(down_read_nested);
 
+int down_read_killable_nested(struct rw_semaphore *sem, int subclass)
+{
+	might_sleep();
+	rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_);
+
+	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_killable)) {
+		rwsem_release(&sem->dep_map, _RET_IP_);
+		return -EINTR;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(down_read_killable_nested);
+
 void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest)
 {
 	might_sleep();

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] perf: Break deadlock involving exec_update_mutex
  2020-12-08  8:34                   ` [PATCH] perf: Break deadlock involving exec_update_mutex Peter Zijlstra
  2020-12-08 18:37                     ` Linus Torvalds
@ 2020-12-10 18:38                     ` Davidlohr Bueso
  2020-12-10 19:40                       ` Eric W. Biederman
  1 sibling, 1 reply; 46+ messages in thread
From: Davidlohr Bueso @ 2020-12-10 18:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Bernd Edlinger, Eric W. Biederman,
	Linux Kernel Mailing List, Ingo Molnar, Will Deacon, Jann Horn,
	Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov,
	Sargun Dhillon, Christian Brauner, Arnd Bergmann,
	Arnaldo Carvalho de Melo, Waiman Long

On Tue, 08 Dec 2020, Peter Zijlstra wrote:

>I suppose I'll queue the below into tip/perf/core for next merge window,
>unless you want it in a hurry?

I'm thinking we'd still want Eric's series on top of this for the reader
benefits of the lock.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-09 18:36             ` Peter Zijlstra
@ 2020-12-10 19:33               ` Eric W. Biederman
  2020-12-11  8:16                 ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Eric W. Biederman @ 2020-12-10 19:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, Dec 08, 2020 at 12:27:39PM -0600, Eric W. Biederman wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> 
>> > On Mon, Dec 07, 2020 at 09:56:34AM -0600, Eric W. Biederman wrote:
>> >
>> >> Do you want to pull these two into a topic branch in the tip tree
>> >> based on v10-rc1?
>> >
>> > I'll go do that. I'll let the robots chew on it before pushing it out
>> > though, I'll reply once it's in tip.git.
>> 
>> Thanks,
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/rwsem

Is that branch supposed to be against 34816d20f173
("Merge tag 'gfs2-v5.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2")

If so I can live with that, but it is a little awkward to work with a
base that recent.  As all of my other branches have an older base.

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] perf: Break deadlock involving exec_update_mutex
  2020-12-10 18:38                     ` Davidlohr Bueso
@ 2020-12-10 19:40                       ` Eric W. Biederman
  0 siblings, 0 replies; 46+ messages in thread
From: Eric W. Biederman @ 2020-12-10 19:40 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Linus Torvalds, Bernd Edlinger,
	Linux Kernel Mailing List, Ingo Molnar, Will Deacon, Jann Horn,
	Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov,
	Sargun Dhillon, Christian Brauner, Arnd Bergmann,
	Arnaldo Carvalho de Melo, Waiman Long

Davidlohr Bueso <dave@stgolabs.net> writes:

> On Tue, 08 Dec 2020, Peter Zijlstra wrote:
>
>>I suppose I'll queue the below into tip/perf/core for next merge window,
>>unless you want it in a hurry?
>
> I'm thinking we'd still want Eric's series on top of this for the reader
> benefits of the lock.

We will see shortly what happens when the various branches all make it
into linux-next.   The two changes don't conflict in principle so it
should not be a problem.

Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
  2020-12-10 19:33               ` Eric W. Biederman
@ 2020-12-11  8:16                 ` Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2020-12-11  8:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Waiman Long, linux-kernel, Linus Torvalds, Ingo Molnar,
	Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger,
	Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon,
	Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo

On Thu, Dec 10, 2020 at 01:33:25PM -0600, Eric W. Biederman wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Tue, Dec 08, 2020 at 12:27:39PM -0600, Eric W. Biederman wrote:
> >> Peter Zijlstra <peterz@infradead.org> writes:
> >> 
> >> > On Mon, Dec 07, 2020 at 09:56:34AM -0600, Eric W. Biederman wrote:
> >> >
> >> >> Do you want to pull these two into a topic branch in the tip tree
> >> >> based on v10-rc1?
> >> >
> >> > I'll go do that. I'll let the robots chew on it before pushing it out
> >> > though, I'll reply once it's in tip.git.
> >> 
> >> Thanks,
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/rwsem
> 
> Is that branch supposed to be against 34816d20f173
> ("Merge tag 'gfs2-v5.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2")

That's what it looks like indeed. IIRC my script was supposed to pick
the most recent -rc when creating new branches, but then I've no idea
how I ended up on this. I'll go dig..

> If so I can live with that, but it is a little awkward to work with a
> base that recent.  As all of my other branches have an older base.

I missed you explicitly requested -rc1, sorry about that :/



^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2020-12-11  8:18 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-03 20:09 [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman
2020-12-03 20:10 ` [PATCH 1/3] rwsem: Implement down_read_killable_nested Eric W. Biederman
2020-12-04  1:58   ` Waiman Long
2020-12-09 18:38   ` [tip: locking/core] " tip-bot2 for Eric W. Biederman
2020-12-03 20:11 ` [PATCH 2/3] rwsem: Implement down_read_interruptible Eric W. Biederman
2020-12-04  1:59   ` Waiman Long
2020-12-07  9:02     ` Peter Zijlstra
2020-12-07 15:33       ` Waiman Long
2020-12-07 16:58         ` David Laight
2020-12-07 19:02           ` Waiman Long
2020-12-08  9:12             ` David Laight
2020-12-08 12:32               ` Peter Zijlstra
2020-12-08 13:13                 ` David Laight
2020-12-08 15:34               ` Waiman Long
2020-12-08 16:23                 ` David Laight
2020-12-07 15:56       ` Eric W. Biederman
2020-12-08 14:52         ` Peter Zijlstra
2020-12-08 18:27           ` Eric W. Biederman
2020-12-09 18:36             ` Peter Zijlstra
2020-12-10 19:33               ` Eric W. Biederman
2020-12-11  8:16                 ` Peter Zijlstra
2020-12-09 18:38       ` [tip: locking/core] locking/rwsem: Introduce rwsem_write_trylock() tip-bot2 for Peter Zijlstra
2020-12-09 18:38       ` [tip: locking/core] locking/rwsem: Fold __down_{read,write}*() tip-bot2 for Peter Zijlstra
2020-12-09 18:38       ` [tip: locking/core] locking/rwsem: Better collate rwsem_read_trylock() tip-bot2 for Peter Zijlstra
2020-12-09 18:38   ` [tip: locking/core] rwsem: Implement down_read_interruptible tip-bot2 for Eric W. Biederman
2020-12-03 20:12 ` [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman
2020-12-04 16:08   ` Bernd Edlinger
2020-12-04 17:21     ` Linus Torvalds
2020-12-04 19:34       ` Eric W. Biederman
2020-12-04 20:10         ` Linus Torvalds
2020-12-04 20:30           ` Bernd Edlinger
2020-12-04 20:48             ` Linus Torvalds
2020-12-04 21:48               ` Davidlohr Bueso
2020-12-05 18:05                 ` Eric W. Biederman
2020-12-07  9:15                   ` Peter Zijlstra
2020-12-07  9:09               ` Peter Zijlstra
2020-12-07 18:40                 ` Linus Torvalds
2020-12-08  8:34                   ` [PATCH] perf: Break deadlock involving exec_update_mutex Peter Zijlstra
2020-12-08 18:37                     ` Linus Torvalds
2020-12-10 18:38                     ` Davidlohr Bueso
2020-12-10 19:40                       ` Eric W. Biederman
2020-12-05 17:43           ` [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman
2020-12-04 17:39     ` Eric W. Biederman
2020-12-03 22:42 ` [PATCH 0/3] " Linus Torvalds
2020-12-04  1:56   ` Waiman Long
2020-12-04  4:54   ` Davidlohr Bueso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).