* [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore
@ 2020-12-03 20:09 UTC
From: Eric W. Biederman
To: linux-kernel
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn,
    Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh,
    Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann,
    Arnaldo Carvalho de Melo

Recently syzbot reported[0] that there is a deadlock amongst the users
of exec_update_mutex.  The simplest and most robust solution appears to
be making exec_update_mutex a read/write lock and having everything
except for exec take the lock for read.

This set of changes upgrades rwsem so it has the functionality needed,
and uses a rw_semaphore to replace the current mutex.

Eric W. Biederman (3):
      rwsem: Implement down_read_killable_nested
      rwsem: Implement down_read_interruptible
      exec: Transform exec_update_mutex into a rw_semaphore

 fs/exec.c                    | 12 ++++++------
 fs/proc/base.c               | 10 +++++-----
 include/linux/rwsem.h        |  3 +++
 include/linux/sched/signal.h | 11 ++++++-----
 init/init_task.c             |  2 +-
 kernel/events/core.c         | 12 ++++++------
 kernel/fork.c                |  6 +++---
 kernel/kcmp.c                | 30 +++++++++++++++---------------
 kernel/locking/rwsem.c       | 40 ++++++++++++++++++++++++++++++++++++++++
 kernel/pid.c                 |  4 ++--
 10 files changed, 87 insertions(+), 43 deletions(-)

[0] https://lkml.kernel.org/r/00000000000063640c05ade8e3de@google.com

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread
* [PATCH 1/3] rwsem: Implement down_read_killable_nested
@ 2020-12-03 20:10 UTC
From: Eric W. Biederman
To: linux-kernel
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn,
    Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh,
    Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann,
    Arnaldo Carvalho de Melo

In preparation for converting exec_update_mutex to a rwsem so that
multiple readers can execute in parallel and not deadlock, add
down_read_killable_nested.  This is needed so that kcmp_lock
can be converted from working on mutexes to working on rw_semaphores.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/rwsem.h  |  2 ++
 kernel/locking/rwsem.c | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 25e3fde85617..13021b08b2ed 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -171,6 +171,7 @@ extern void downgrade_write(struct rw_semaphore *sem);
  * See Documentation/locking/lockdep-design.rst for more details.)
  */
 extern void down_read_nested(struct rw_semaphore *sem, int subclass);
+extern int __must_check down_read_killable_nested(struct rw_semaphore *sem, int subclass);
 extern void down_write_nested(struct rw_semaphore *sem, int subclass);
 extern int down_write_killable_nested(struct rw_semaphore *sem, int subclass);
 extern void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest_lock);
@@ -191,6 +192,7 @@ extern void down_read_non_owner(struct rw_semaphore *sem);
 extern void up_read_non_owner(struct rw_semaphore *sem);
 #else
 # define down_read_nested(sem, subclass)		down_read(sem)
+# define down_read_killable_nested(sem, subclass)	down_read_killable(sem)
 # define down_write_nest_lock(sem, nest_lock)	down_write(sem)
 # define down_write_nested(sem, subclass)	down_write(sem)
 # define down_write_killable_nested(sem, subclass)	down_write_killable(sem)
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index f11b9bd3431d..54d11cb97551 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1605,6 +1605,20 @@ void down_read_nested(struct rw_semaphore *sem, int subclass)
 }
 EXPORT_SYMBOL(down_read_nested);
 
+int down_read_killable_nested(struct rw_semaphore *sem, int subclass)
+{
+	might_sleep();
+	rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_);
+
+	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_killable)) {
+		rwsem_release(&sem->dep_map, _RET_IP_);
+		return -EINTR;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(down_read_killable_nested);
+
 void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest)
 {
 	might_sleep();
-- 
2.25.0
* Re: [PATCH 1/3] rwsem: Implement down_read_killable_nested
@ 2020-12-04  1:58 UTC
From: Waiman Long
To: Eric W. Biederman, linux-kernel
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn,
    Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh,
    Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann,
    Arnaldo Carvalho de Melo

On 12/3/20 3:10 PM, Eric W. Biederman wrote:
> In preparation for converting exec_update_mutex to a rwsem so that
> multiple readers can execute in parallel and not deadlock, add
> down_read_killable_nested. This is needed so that kcmp_lock
> can be converted from working on a mutexes to working on rw_semaphores.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Will Deacon <will@kernel.org>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
[...]

Acked-by: Waiman Long <longman@redhat.com>

-Longman
* [tip: locking/core] rwsem: Implement down_read_killable_nested
@ 2020-12-09 18:38 UTC
From: tip-bot2 for Eric W. Biederman
To: linux-tip-commits
Cc: Eric W. Biederman, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     0f9368b5bf6db0c04afc5454b1be79022a681615
Gitweb:        https://git.kernel.org/tip/0f9368b5bf6db0c04afc5454b1be79022a681615
Author:        Eric W. Biederman <ebiederm@xmission.com>
AuthorDate:    Thu, 03 Dec 2020 14:10:32 -06:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Dec 2020 17:08:41 +01:00

rwsem: Implement down_read_killable_nested

In preparation for converting exec_update_mutex to a rwsem so that
multiple readers can execute in parallel and not deadlock, add
down_read_killable_nested.  This is needed so that kcmp_lock
can be converted from working on mutexes to working on rw_semaphores.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/87o8jabqh3.fsf@x220.int.ebiederm.org
---
 include/linux/rwsem.h  |  2 ++
 kernel/locking/rwsem.c | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

[diff as in the patch above]
* [PATCH 2/3] rwsem: Implement down_read_interruptible
@ 2020-12-03 20:11 UTC
From: Eric W. Biederman
To: linux-kernel
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn,
    Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh,
    Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann,
    Arnaldo Carvalho de Melo

In preparation for converting exec_update_mutex to a rwsem so that
multiple readers can execute in parallel and not deadlock, add
down_read_interruptible.  This is needed for perf_event_open to be
converted (with no semantic changes) from working on a mutex to
working on a rwsem.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/rwsem.h  |  1 +
 kernel/locking/rwsem.c | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 13021b08b2ed..4c715be48717 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -123,6 +123,7 @@ static inline int rwsem_is_contended(struct rw_semaphore *sem)
  * lock for reading
  */
 extern void down_read(struct rw_semaphore *sem);
+extern int __must_check down_read_interruptible(struct rw_semaphore *sem);
 extern int __must_check down_read_killable(struct rw_semaphore *sem);
 
 /*
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 54d11cb97551..a163542d178e 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1345,6 +1345,18 @@ static inline void __down_read(struct rw_semaphore *sem)
 	}
 }
 
+static inline int __down_read_interruptible(struct rw_semaphore *sem)
+{
+	if (!rwsem_read_trylock(sem)) {
+		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
+			return -EINTR;
+		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
+	} else {
+		rwsem_set_reader_owned(sem);
+	}
+	return 0;
+}
+
 static inline int __down_read_killable(struct rw_semaphore *sem)
 {
 	if (!rwsem_read_trylock(sem)) {
@@ -1495,6 +1507,20 @@ void __sched down_read(struct rw_semaphore *sem)
 }
 EXPORT_SYMBOL(down_read);
 
+int __sched down_read_interruptible(struct rw_semaphore *sem)
+{
+	might_sleep();
+	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
+
+	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_interruptible)) {
+		rwsem_release(&sem->dep_map, _RET_IP_);
+		return -EINTR;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(down_read_interruptible);
+
 int __sched down_read_killable(struct rw_semaphore *sem)
 {
 	might_sleep();
-- 
2.25.0
* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
@ 2020-12-04  1:59 UTC
From: Waiman Long
To: Eric W. Biederman, linux-kernel
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn,
    Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh,
    Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann,
    Arnaldo Carvalho de Melo

On 12/3/20 3:11 PM, Eric W. Biederman wrote:
> In preparation for converting exec_update_mutex to a rwsem so that
> multiple readers can execute in parallel and not deadlock, add
> down_read_interruptible. This is needed for perf_event_open to be
> converted (with no semantic changes) from working on a mutex to
> wroking on a rwsem.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Will Deacon <will@kernel.org>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
[...]

Acked-by: Waiman Long <longman@redhat.com>

-Longman
* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
@ 2020-12-07  9:02 UTC
From: Peter Zijlstra
To: Waiman Long
Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon,
    Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov,
    Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner,
    Arnd Bergmann, Arnaldo Carvalho de Melo

On Thu, Dec 03, 2020 at 08:59:13PM -0500, Waiman Long wrote:
> On 12/3/20 3:11 PM, Eric W. Biederman wrote:
> > +static inline int __down_read_interruptible(struct rw_semaphore *sem)
> > +{
> > +	if (!rwsem_read_trylock(sem)) {
> > +		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
> > +			return -EINTR;
> > +		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
> > +	} else {
> > +		rwsem_set_reader_owned(sem);
> > +	}
> > +	return 0;
> > +}
[...]
>
> Acked-by: Waiman Long <longman@redhat.com>

Yeah, that seems correct.. There's an unfortunate amount of copy-paste
there though.

Do we want to follow that up with something like this?

---
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -275,7 +275,25 @@ static inline bool rwsem_read_trylock(st
 	long cnt = atomic_long_add_return_acquire(RWSEM_READER_BIAS, &sem->count);
 	if (WARN_ON_ONCE(cnt < 0))
 		rwsem_set_nonspinnable(sem);
-	return !(cnt & RWSEM_READ_FAILED_MASK);
+
+	if (!(cnt & RWSEM_READ_FAILED_MASK)) {
+		rwsem_set_reader_owned(sem);
+		return true;
+	}
+
+	return false;
+}
+
+static inline bool rwsem_write_trylock(struct rw_semaphore *sem)
+{
+	long tmp = RWSEM_UNLOCKED_VALUE;
+
+	if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, RWSEM_WRITER_LOCKED)) {
+		rwsem_set_owner(sem);
+		return true;
+	}
+
+	return false;
 }
 
 /*
@@ -1335,38 +1353,29 @@ static struct rw_semaphore *rwsem_downgr
 /*
  * lock for reading
  */
-static inline void __down_read(struct rw_semaphore *sem)
+static inline int __down_read_common(struct rw_semaphore *sem, int state)
 {
 	if (!rwsem_read_trylock(sem)) {
-		rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE);
+		if (IS_ERR(rwsem_down_read_slowpath(sem, state)))
+			return -EINTR;
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
-	} else {
-		rwsem_set_reader_owned(sem);
 	}
+	return 0;
 }
 
-static inline int __down_read_interruptible(struct rw_semaphore *sem)
+static __always_inline void __down_read(struct rw_semaphore *sem)
 {
-	if (!rwsem_read_trylock(sem)) {
-		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE)))
-			return -EINTR;
-		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
-	} else {
-		rwsem_set_reader_owned(sem);
-	}
-	return 0;
+	__down_read_common(sem, TASK_UNINTERRUPTIBLE);
 }
 
-static inline int __down_read_killable(struct rw_semaphore *sem)
+static __always_inline int __down_read_interruptible(struct rw_semaphore *sem)
 {
-	if (!rwsem_read_trylock(sem)) {
-		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE)))
-			return -EINTR;
-		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
-	} else {
-		rwsem_set_reader_owned(sem);
-	}
-	return 0;
+	return __down_read_common(sem, TASK_INTERRUPTIBLE);
+}
+
+static __always_inline int __down_read_killable(struct rw_semaphore *sem)
+{
+	return __down_read_common(sem, TASK_KILLABLE);
 }
 
 static inline int __down_read_trylock(struct rw_semaphore *sem)
@@ -1392,44 +1401,29 @@ static inline int __down_read_trylock(st
 /*
  * lock for writing
  */
-static inline void __down_write(struct rw_semaphore *sem)
+static inline int __down_write_common(struct rw_semaphore *sem, int state)
 {
-	long tmp = RWSEM_UNLOCKED_VALUE;
-
-	if (unlikely(!atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
-						      RWSEM_WRITER_LOCKED)))
-		rwsem_down_write_slowpath(sem, TASK_UNINTERRUPTIBLE);
-	else
-		rwsem_set_owner(sem);
+	if (unlikely(!rwsem_write_trylock(sem))) {
+		if (IS_ERR(rwsem_down_write_slowpath(sem, state)))
+			return -EINTR;
+	}
+	return 0;
 }
 
-static inline int __down_write_killable(struct rw_semaphore *sem)
+static __always_inline void __down_write(struct rw_semaphore *sem)
 {
-	long tmp = RWSEM_UNLOCKED_VALUE;
+	__down_write_common(sem, TASK_UNINTERRUPTIBLE);
+}
 
-	if (unlikely(!atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
-						      RWSEM_WRITER_LOCKED))) {
-		if (IS_ERR(rwsem_down_write_slowpath(sem, TASK_KILLABLE)))
-			return -EINTR;
-	} else {
-		rwsem_set_owner(sem);
-	}
-	return 0;
+static __always_inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	return __down_write_common(sem, TASK_KILLABLE);
 }
 
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
-	long tmp;
-
 	DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);
-
-	tmp = RWSEM_UNLOCKED_VALUE;
-	if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
-					    RWSEM_WRITER_LOCKED)) {
-		rwsem_set_owner(sem);
-		return true;
-	}
-	return false;
+	return rwsem_write_trylock(sem);
 }
 
 /*
* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
@ 2020-12-07 15:33 UTC
From: Waiman Long
To: Peter Zijlstra
Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon,
    Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov,
    Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner,
    Arnd Bergmann, Arnaldo Carvalho de Melo

On 12/7/20 4:02 AM, Peter Zijlstra wrote:
> Yeah, that seems correct.. There's an unfortunate amount of copy-paste
> there though.
>
> Do we want to follow that up with something like this?
[...]

I am actually thinking about similar streamlining once the patch lands.

Your suggested changes look fine to me.

Cheers,
Longman
* RE: [PATCH 2/3] rwsem: Implement down_read_interruptible
@ 2020-12-07 16:58 UTC
From: David Laight
To: 'Waiman Long', Peter Zijlstra
Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon,
    Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov,
    Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner,
    Arnd Bergmann, Arnaldo Carvalho de Melo

From: Waiman Long
> Sent: 07 December 2020 15:34
>
> On 12/7/20 4:02 AM, Peter Zijlstra wrote:
> > Yeah, that seems correct.. There's an unfortunate amount of copy-paste
> > there though.
> >
> > Do we want to follow that up with something like this?
[...]
>
> I am actually thinking about similar streamlining once the patch lands.
>
> Your suggested changes look fine to me.

How much more difficult would it be to also add a timeout option?
I looked at adding one to the mutex code - and fell into a big pile
of replicated code.

ISTM that once the initial locked exchange (and spin) fails, a few
extra instructions when heading for the sleep don't really matter.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible
@ 2020-12-07 19:02 UTC
From: Waiman Long
To: David Laight, Peter Zijlstra
Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon,
    Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov,
    Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner,
    Arnd Bergmann, Arnaldo Carvalho de Melo

On 12/7/20 11:58 AM, David Laight wrote:
> How much more difficult would it be to also add a timeout option?
> I looked at adding one to the mutex code - and fell into a big pile
> of replicated code.
>
> ISTM that one the initial locked exchange (and spin) fails a few
> extra instructions when heading for the sleep don't really matter

Actually, I had tried that before. See

https://lore.kernel.org/lkml/20190911150537.19527-1-longman@redhat.com/

That is for rwsem, but the same can be done for mutex. However, Peter
didn't seem to like the idea of a timeout parameter. Anyway, it is
certainly doable if there is a good use case for it.

Cheers,
Longman
* RE: [PATCH 2/3] rwsem: Implement down_read_interruptible
@ 2020-12-08  9:12 UTC
From: David Laight
To: 'Waiman Long', Peter Zijlstra
Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon,
    Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov,
    Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner,
    Arnd Bergmann, Arnaldo Carvalho de Melo

From: Waiman Long
> Sent: 07 December 2020 19:02
...
> > How much more difficult would it be to also add a timeout option?
> > I looked at adding one to the mutex code - and fell into a big pile
> > of replicated code.
> >
> > ISTM that one the initial locked exchange (and spin) fails a few
> > extra instructions when heading for the sleep don't really matter
>
> Actually, I had tried that before. See
>
> https://lore.kernel.org/lkml/20190911150537.19527-1-longman@redhat.com/
>
> That is for rwsem, but the same can be done for mutex. However, Peter
> didn't seem to like the idea of a timeout parameter. Anyway, it is
> certainly doable if there is a good use case for it.

'Unfortunately' my use-case is for an out-of-tree driver.

The problem I was solving is a status call blocking because
some other code is 'stuck' (probably an oops) with a mutex held.

The code used to use down_timeout() (it was written for 2.4).
When I changed to mutex_(to get optimistic spinning) I lost
the ability to do the timeouts.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible 2020-12-08 9:12 ` David Laight @ 2020-12-08 12:32 ` Peter Zijlstra 2020-12-08 13:13 ` David Laight 2020-12-08 15:34 ` Waiman Long 1 sibling, 1 reply; 46+ messages in thread From: Peter Zijlstra @ 2020-12-08 12:32 UTC (permalink / raw) To: David Laight Cc: 'Waiman Long', Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo On Tue, Dec 08, 2020 at 09:12:36AM +0000, David Laight wrote: > From: Waiman Long > > Sent: 07 December 2020 19:02 > ... > > > How much more difficult would it be to also add a timeout option? > > > I looked at adding one to the mutex code - and fell into a big pile > > > of replicated code. > > > > > > ISTM that one the initial locked exchange (and spin) fails a few > > > extra instructions when heading for the sleep don't really matter > > > > > Actually, I had tried that before. See > > > > https://lore.kernel.org/lkml/20190911150537.19527-1-longman@redhat.com/ > > > > That is for rwsem, but the same can be done for mutex. However, Peter > > didn't seem to like the idea of a timeout parameter. Anyway, it is > > certainly doable if there is a good use case for it. > > 'Unfortunately' my use-case if for an out-of-tree driver. > > The problem I was solving is a status call blocking because > some other code is 'stuck' (probably an oops) with a mutex held. Working around oopses is not a sane use-case. If you oops, you get to keep all the pieces. ^ permalink raw reply [flat|nested] 46+ messages in thread
* RE: [PATCH 2/3] rwsem: Implement down_read_interruptible 2020-12-08 12:32 ` Peter Zijlstra @ 2020-12-08 13:13 ` David Laight 0 siblings, 0 replies; 46+ messages in thread From: David Laight @ 2020-12-08 13:13 UTC (permalink / raw) To: 'Peter Zijlstra' Cc: 'Waiman Long', Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo From: Peter Zijlstra > Sent: 08 December 2020 12:32 > > On Tue, Dec 08, 2020 at 09:12:36AM +0000, David Laight wrote: > > From: Waiman Long > > > Sent: 07 December 2020 19:02 > > ... > > > > How much more difficult would it be to also add a timeout option? > > > > I looked at adding one to the mutex code - and fell into a big pile > > > > of replicated code. > > > > > > > > ISTM that one the initial locked exchange (and spin) fails a few > > > > extra instructions when heading for the sleep don't really matter > > > > > > > Actually, I had tried that before. See > > > > > > https://lore.kernel.org/lkml/20190911150537.19527-1-longman@redhat.com/ > > > > > > That is for rwsem, but the same can be done for mutex. However, Peter > > > didn't seem to like the idea of a timeout parameter. Anyway, it is > > > certainly doable if there is a good use case for it. > > > > 'Unfortunately' my use-case if for an out-of-tree driver. > > > > The problem I was solving is a status call blocking because > > some other code is 'stuck' (probably an oops) with a mutex held. > > Working around oopses is not a sane use-case. If you oops, you get to > keep all the pieces. It's not so much 'working around' as trying to get debug info out to fix the bug. It gets annoying when another process locks up. ISTR there is something global of that nature after a panic. 
I have written a version that reported an error if the process holding the mutex was dead (it wasn't race-safe against process exit). David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible 2020-12-08 9:12 ` David Laight 2020-12-08 12:32 ` Peter Zijlstra @ 2020-12-08 15:34 ` Waiman Long 2020-12-08 16:23 ` David Laight 1 sibling, 1 reply; 46+ messages in thread From: Waiman Long @ 2020-12-08 15:34 UTC (permalink / raw) To: David Laight, Peter Zijlstra Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo On 12/8/20 4:12 AM, David Laight wrote: > From: Waiman Long >> Sent: 07 December 2020 19:02 > ... >>> How much more difficult would it be to also add a timeout option? >>> I looked at adding one to the mutex code - and fell into a big pile >>> of replicated code. >>> >>> ISTM that one the initial locked exchange (and spin) fails a few >>> extra instructions when heading for the sleep don't really matter >>> >> Actually, I had tried that before. See >> >> https://lore.kernel.org/lkml/20190911150537.19527-1-longman@redhat.com/ >> >> That is for rwsem, but the same can be done for mutex. However, Peter >> didn't seem to like the idea of a timeout parameter. Anyway, it is >> certainly doable if there is a good use case for it. > 'Unfortunately' my use-case if for an out-of-tree driver. > > The problem I was solving is a status call blocking because > some other code is 'stuck' (probably an oops) with a mutex held. > > The code used to use down_timeout() (it was written for 2.4). > When I changed to mutex_(to get optimistic spinning) I lost > the ability to do the timeouts. The primary reason for sending out that patchset was to work around some circular locking problem in existing code even though these circular locking scenarios are not likely to happen. Your case is certainly another potential circular locking problem as well. 
Cheers, Longman ^ permalink raw reply [flat|nested] 46+ messages in thread
* RE: [PATCH 2/3] rwsem: Implement down_read_interruptible 2020-12-08 15:34 ` Waiman Long @ 2020-12-08 16:23 ` David Laight 0 siblings, 0 replies; 46+ messages in thread From: David Laight @ 2020-12-08 16:23 UTC (permalink / raw) To: 'Waiman Long', Peter Zijlstra Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo From: Waiman Long > Sent: 08 December 2020 15:34 > > On 12/8/20 4:12 AM, David Laight wrote: > > From: Waiman Long > >> Sent: 07 December 2020 19:02 > > ... > >>> How much more difficult would it be to also add a timeout option? > >>> I looked at adding one to the mutex code - and fell into a big pile > >>> of replicated code. > >>> > >>> ISTM that one the initial locked exchange (and spin) fails a few > >>> extra instructions when heading for the sleep don't really matter > >>> > >> Actually, I had tried that before. See > >> > >> https://lore.kernel.org/lkml/20190911150537.19527-1-longman@redhat.com/ > >> > >> That is for rwsem, but the same can be done for mutex. However, Peter > >> didn't seem to like the idea of a timeout parameter. Anyway, it is > >> certainly doable if there is a good use case for it. > > 'Unfortunately' my use-case if for an out-of-tree driver. > > > > The problem I was solving is a status call blocking because > > some other code is 'stuck' (probably an oops) with a mutex held. > > > > The code used to use down_timeout() (it was written for 2.4). > > When I changed to mutex_(to get optimistic spinning) I lost > > the ability to do the timeouts. > > The primary reason for sending out that patchset was to work around some > circular locking problem in existing code even though these circular > locking scenarios are not likely to happen. Your case is certainly > another potential circular locking problem as well. 
If you've got lock-ordering problems, they need fixing. Neither signals nor timeouts are real solutions. Either may help diagnose the problem, but they aren't fixes. OTOH if it is reasonable to have a request interrupted by a signal, it must also be reasonable to implement a timeout. Of course, one might wonder whether 'correct' code should ever be waiting on a mutex for any length of time. So is there even a justification for interruptible waits on a mutex? FWIW I could implement my timeouts using SIGALRM - but it is a lot of work :-) David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible 2020-12-07 9:02 ` Peter Zijlstra 2020-12-07 15:33 ` Waiman Long @ 2020-12-07 15:56 ` Eric W. Biederman 2020-12-08 14:52 ` Peter Zijlstra 2020-12-09 18:38 ` [tip: locking/core] locking/rwsem: Introduce rwsem_write_trylock() tip-bot2 for Peter Zijlstra ` (2 subsequent siblings) 4 siblings, 1 reply; 46+ messages in thread From: Eric W. Biederman @ 2020-12-07 15:56 UTC (permalink / raw) To: Peter Zijlstra Cc: Waiman Long, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo Peter Zijlstra <peterz@infradead.org> writes: > On Thu, Dec 03, 2020 at 08:59:13PM -0500, Waiman Long wrote: >> On 12/3/20 3:11 PM, Eric W. Biederman wrote: > >> > +static inline int __down_read_interruptible(struct rw_semaphore *sem) >> > +{ >> > + if (!rwsem_read_trylock(sem)) { >> > + if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE))) >> > + return -EINTR; >> > + DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); >> > + } else { >> > + rwsem_set_reader_owned(sem); >> > + } >> > + return 0; >> > +} >> > + >> > static inline int __down_read_killable(struct rw_semaphore *sem) >> > { >> > if (!rwsem_read_trylock(sem)) { >> > @@ -1495,6 +1507,20 @@ void __sched down_read(struct rw_semaphore *sem) >> > } >> > EXPORT_SYMBOL(down_read); >> > +int __sched down_read_interruptible(struct rw_semaphore *sem) >> > +{ >> > + might_sleep(); >> > + rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_); >> > + >> > + if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_interruptible)) { >> > + rwsem_release(&sem->dep_map, _RET_IP_); >> > + return -EINTR; >> > + } >> > + >> > + return 0; >> > +} >> > +EXPORT_SYMBOL(down_read_interruptible); >> > + >> > int __sched down_read_killable(struct rw_semaphore *sem) >> > { >> > might_sleep(); >> >> Acked-by: 
Waiman Long <longman@redhat.com> > > Yeah, that seems correct.. There's an unfortunate amount of copy-paste > there though. > > Do we want to follow that up with something like this? Do you want to pull these two into a topic branch in the tip tree based on v10-rc1? That way we can share these two patches, and then you folks can make your locking cleanups and I can change exec_update_mutex to a rw_semaphore? Eric > --- > --- a/kernel/locking/rwsem.c > +++ b/kernel/locking/rwsem.c > @@ -275,7 +275,25 @@ static inline bool rwsem_read_trylock(st > long cnt = atomic_long_add_return_acquire(RWSEM_READER_BIAS, &sem->count); > if (WARN_ON_ONCE(cnt < 0)) > rwsem_set_nonspinnable(sem); > - return !(cnt & RWSEM_READ_FAILED_MASK); > + > + if (!(cnt & RWSEM_READ_FAILED_MASK)) { > + rwsem_set_reader_owned(sem); > + return true; > + } > + > + return false; > +} > + > +static inline bool rwsem_write_trylock(struct rw_semaphore *sem) > +{ > + long tmp = RWSEM_UNLOCKED_VALUE; > + > + if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, RWSEM_WRITER_LOCKED)) { > + rwsem_set_owner(sem); > + return true; > + } > + > + return false; > } > > /* > @@ -1335,38 +1353,29 @@ static struct rw_semaphore *rwsem_downgr > /* > * lock for reading > */ > -static inline void __down_read(struct rw_semaphore *sem) > +static inline int __down_read_common(struct rw_semaphore *sem, int state) > { > if (!rwsem_read_trylock(sem)) { > - rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE); > + if (IS_ERR(rwsem_down_read_slowpath(sem, state))) > + return -EINTR; > DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); > - } else { > - rwsem_set_reader_owned(sem); > } > + return 0; > } > > -static inline int __down_read_interruptible(struct rw_semaphore *sem) > +static __always_inline void __down_read(struct rw_semaphore *sem) > { > - if (!rwsem_read_trylock(sem)) { > - if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE))) > - return -EINTR; > - DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), 
sem); > - } else { > - rwsem_set_reader_owned(sem); > - } > - return 0; > + __down_read_common(sem, TASK_UNINTERRUPTIBLE); > } > > -static inline int __down_read_killable(struct rw_semaphore *sem) > +static __always_inline int __down_read_interruptible(struct rw_semaphore *sem) > { > - if (!rwsem_read_trylock(sem)) { > - if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE))) > - return -EINTR; > - DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); > - } else { > - rwsem_set_reader_owned(sem); > - } > - return 0; > + return __down_read_common(sem, TASK_INTERRUPTIBLE); > +} > + > +static __always_inline int __down_read_killable(struct rw_semaphore *sem) > +{ > + return __down_read_common(sem, TASK_KILLABLE); > } > > static inline int __down_read_trylock(struct rw_semaphore *sem) > @@ -1392,44 +1401,29 @@ static inline int __down_read_trylock(st > /* > * lock for writing > */ > -static inline void __down_write(struct rw_semaphore *sem) > +static inline int __down_write_common(struct rw_semaphore *sem, int state) > { > - long tmp = RWSEM_UNLOCKED_VALUE; > - > - if (unlikely(!atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, > - RWSEM_WRITER_LOCKED))) > - rwsem_down_write_slowpath(sem, TASK_UNINTERRUPTIBLE); > - else > - rwsem_set_owner(sem); > + if (unlikely(!rwsem_write_trylock(sem))) { > + if (IS_ERR(rwsem_down_write_slowpath(sem, state))) > + return -EINTR; > + } > + return 0; > } > > -static inline int __down_write_killable(struct rw_semaphore *sem) > +static __always_inline void __down_write(struct rw_semaphore *sem) > { > - long tmp = RWSEM_UNLOCKED_VALUE; > + __down_write_common(sem, TASK_UNINTERRUPTIBLE); > +} > > - if (unlikely(!atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, > - RWSEM_WRITER_LOCKED))) { > - if (IS_ERR(rwsem_down_write_slowpath(sem, TASK_KILLABLE))) > - return -EINTR; > - } else { > - rwsem_set_owner(sem); > - } > - return 0; > +static __always_inline int __down_write_killable(struct rw_semaphore *sem) > +{ > + return 
__down_write_common(sem, TASK_KILLABLE); > } > > static inline int __down_write_trylock(struct rw_semaphore *sem) > { > - long tmp; > - > DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem); > - > - tmp = RWSEM_UNLOCKED_VALUE; > - if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, > - RWSEM_WRITER_LOCKED)) { > - rwsem_set_owner(sem); > - return true; > - } > - return false; > + return rwsem_write_trylock(sem); > } > > /* ^ permalink raw reply [flat|nested] 46+ messages in thread
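The shape of the consolidation in the patch above is easy to see in miniature: three near-identical lock functions differ only in the sleep state they hand to the slowpath, so a single parameterized helper replaces all of them. A toy sketch with stubbed fast and slow paths (all names invented; this is not the kernel implementation):

```c
#include <errno.h>
#include <stdbool.h>

enum sleep_state { STATE_UNINTERRUPTIBLE, STATE_INTERRUPTIBLE, STATE_KILLABLE };

/* Stand-ins for the real fast path and slow path. */
static bool fastpath_ok;
static int slowpath_result;             /* 0 or -EINTR, as the slowpath decides */

static bool read_trylock(void) { return fastpath_ok; }
static int read_slowpath(enum sleep_state state) { (void)state; return slowpath_result; }

/* One common helper: only the sleep state varies between callers. */
static int down_read_common(enum sleep_state state)
{
    if (!read_trylock())
        return read_slowpath(state);
    return 0;
}

/* The three public entry points collapse to one-liners. */
static void down_read(void)               { down_read_common(STATE_UNINTERRUPTIBLE); }
static int  down_read_interruptible(void) { return down_read_common(STATE_INTERRUPTIBLE); }
static int  down_read_killable(void)      { return down_read_common(STATE_KILLABLE); }
```

Adding a fourth variant (say, a timed one) would then touch one function instead of copy-pasting a whole lock routine, which is the maintenance point being made in the thread.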
* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible 2020-12-07 15:56 ` Eric W. Biederman @ 2020-12-08 14:52 ` Peter Zijlstra 2020-12-08 18:27 ` Eric W. Biederman 0 siblings, 1 reply; 46+ messages in thread From: Peter Zijlstra @ 2020-12-08 14:52 UTC (permalink / raw) To: Eric W. Biederman Cc: Waiman Long, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo On Mon, Dec 07, 2020 at 09:56:34AM -0600, Eric W. Biederman wrote: > Do you want to pull these two into a topic branch in the tip tree > based on v10-rc1? I'll go do that. I'll let the robots chew on it before pushing it out though, I'll reply once it's in tip.git. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible 2020-12-08 14:52 ` Peter Zijlstra @ 2020-12-08 18:27 ` Eric W. Biederman 2020-12-09 18:36 ` Peter Zijlstra 0 siblings, 1 reply; 46+ messages in thread From: Eric W. Biederman @ 2020-12-08 18:27 UTC (permalink / raw) To: Peter Zijlstra Cc: Waiman Long, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo Peter Zijlstra <peterz@infradead.org> writes: > On Mon, Dec 07, 2020 at 09:56:34AM -0600, Eric W. Biederman wrote: > >> Do you want to pull these two into a topic branch in the tip tree >> based on v10-rc1? > > I'll go do that. I'll let the robots chew on it before pushing it out > though, I'll reply once it's in tip.git. Thanks, Eric ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible 2020-12-08 18:27 ` Eric W. Biederman @ 2020-12-09 18:36 ` Peter Zijlstra 2020-12-10 19:33 ` Eric W. Biederman 0 siblings, 1 reply; 46+ messages in thread From: Peter Zijlstra @ 2020-12-09 18:36 UTC (permalink / raw) To: Eric W. Biederman Cc: Waiman Long, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo On Tue, Dec 08, 2020 at 12:27:39PM -0600, Eric W. Biederman wrote: > Peter Zijlstra <peterz@infradead.org> writes: > > > On Mon, Dec 07, 2020 at 09:56:34AM -0600, Eric W. Biederman wrote: > > > >> Do you want to pull these two into a topic branch in the tip tree > >> based on v10-rc1? > > > > I'll go do that. I'll let the robots chew on it before pushing it out > > though, I'll reply once it's in tip.git. > > Thanks, git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/rwsem ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible 2020-12-09 18:36 ` Peter Zijlstra @ 2020-12-10 19:33 ` Eric W. Biederman 2020-12-11 8:16 ` Peter Zijlstra 0 siblings, 1 reply; 46+ messages in thread From: Eric W. Biederman @ 2020-12-10 19:33 UTC (permalink / raw) To: Peter Zijlstra Cc: Waiman Long, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo Peter Zijlstra <peterz@infradead.org> writes: > On Tue, Dec 08, 2020 at 12:27:39PM -0600, Eric W. Biederman wrote: >> Peter Zijlstra <peterz@infradead.org> writes: >> >> > On Mon, Dec 07, 2020 at 09:56:34AM -0600, Eric W. Biederman wrote: >> > >> >> Do you want to pull these two into a topic branch in the tip tree >> >> based on v10-rc1? >> > >> > I'll go do that. I'll let the robots chew on it before pushing it out >> > though, I'll reply once it's in tip.git. >> >> Thanks, > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/rwsem Is that branch supposed to be against 34816d20f173 ("Merge tag 'gfs2-v5.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2") If so I can live with that, but it is a little awkward to work with a base that recent. As all of my other branches have an older base. Eric ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 2/3] rwsem: Implement down_read_interruptible 2020-12-10 19:33 ` Eric W. Biederman @ 2020-12-11 8:16 ` Peter Zijlstra 0 siblings, 0 replies; 46+ messages in thread From: Peter Zijlstra @ 2020-12-11 8:16 UTC (permalink / raw) To: Eric W. Biederman Cc: Waiman Long, linux-kernel, Linus Torvalds, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo On Thu, Dec 10, 2020 at 01:33:25PM -0600, Eric W. Biederman wrote: > Peter Zijlstra <peterz@infradead.org> writes: > > > On Tue, Dec 08, 2020 at 12:27:39PM -0600, Eric W. Biederman wrote: > >> Peter Zijlstra <peterz@infradead.org> writes: > >> > >> > On Mon, Dec 07, 2020 at 09:56:34AM -0600, Eric W. Biederman wrote: > >> > > >> >> Do you want to pull these two into a topic branch in the tip tree > >> >> based on v10-rc1? > >> > > >> > I'll go do that. I'll let the robots chew on it before pushing it out > >> > though, I'll reply once it's in tip.git. > >> > >> Thanks, > > > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/rwsem > > Is that branch supposed to be against 34816d20f173 > ("Merge tag 'gfs2-v5.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2") That's what it looks like indeed. IIRC my script was supposed to pick the most recent -rc when creating new branches, but then I've no idea how I ended up on this. I'll go dig.. > If so I can live with that, but it is a little awkward to work with a > base that recent. As all of my other branches have an older base. I missed you explicitly requested -rc1, sorry about that :/ ^ permalink raw reply [flat|nested] 46+ messages in thread
* [tip: locking/core] locking/rwsem: Introduce rwsem_write_trylock() 2020-12-07 9:02 ` Peter Zijlstra 2020-12-07 15:33 ` Waiman Long 2020-12-07 15:56 ` Eric W. Biederman @ 2020-12-09 18:38 ` tip-bot2 for Peter Zijlstra 2020-12-09 18:38 ` [tip: locking/core] locking/rwsem: Fold __down_{read,write}*() tip-bot2 for Peter Zijlstra 2020-12-09 18:38 ` [tip: locking/core] locking/rwsem: Better collate rwsem_read_trylock() tip-bot2 for Peter Zijlstra 4 siblings, 0 replies; 46+ messages in thread From: tip-bot2 for Peter Zijlstra @ 2020-12-09 18:38 UTC (permalink / raw) To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel The following commit has been merged into the locking/core branch of tip: Commit-ID: 285c61aedf6bc5d81b37e4dc48c19012e8ff9836 Gitweb: https://git.kernel.org/tip/285c61aedf6bc5d81b37e4dc48c19012e8ff9836 Author: Peter Zijlstra <peterz@infradead.org> AuthorDate: Tue, 08 Dec 2020 10:25:06 +01:00 Committer: Peter Zijlstra <peterz@infradead.org> CommitterDate: Wed, 09 Dec 2020 17:08:47 +01:00 locking/rwsem: Introduce rwsem_write_trylock() One copy of this logic is better than three. 
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net --- kernel/locking/rwsem.c | 38 ++++++++++++++++---------------------- 1 file changed, 16 insertions(+), 22 deletions(-) diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index 5c0dc7e..7915456 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -285,6 +285,18 @@ static inline bool rwsem_read_trylock(struct rw_semaphore *sem) return false; } +static inline bool rwsem_write_trylock(struct rw_semaphore *sem) +{ + long tmp = RWSEM_UNLOCKED_VALUE; + + if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, RWSEM_WRITER_LOCKED)) { + rwsem_set_owner(sem); + return true; + } + + return false; +} + /* * Return just the real task structure pointer of the owner */ @@ -1395,42 +1407,24 @@ static inline int __down_read_trylock(struct rw_semaphore *sem) */ static inline void __down_write(struct rw_semaphore *sem) { - long tmp = RWSEM_UNLOCKED_VALUE; - - if (unlikely(!atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, - RWSEM_WRITER_LOCKED))) + if (unlikely(!rwsem_write_trylock(sem))) rwsem_down_write_slowpath(sem, TASK_UNINTERRUPTIBLE); - else - rwsem_set_owner(sem); } static inline int __down_write_killable(struct rw_semaphore *sem) { - long tmp = RWSEM_UNLOCKED_VALUE; - - if (unlikely(!atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, - RWSEM_WRITER_LOCKED))) { + if (unlikely(!rwsem_write_trylock(sem))) { if (IS_ERR(rwsem_down_write_slowpath(sem, TASK_KILLABLE))) return -EINTR; - } else { - rwsem_set_owner(sem); } + return 0; } static inline int __down_write_trylock(struct rw_semaphore *sem) { - long tmp; - DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem); - - tmp = RWSEM_UNLOCKED_VALUE; - if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, - RWSEM_WRITER_LOCKED)) { - rwsem_set_owner(sem); - return true; - } - return false; + return rwsem_write_trylock(sem); } /* ^ permalink raw reply related [flat|nested] 
46+ messages in thread
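The rwsem_write_trylock() helper introduced above is a compare-exchange fast path: claim the count word in one atomic operation and record the owner only on success. A toy C11 sketch of the same shape (struct layout, constants, and names are invented; the real count-word encoding in kernel/locking/rwsem.c is different):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SEM_UNLOCKED      0L
#define SEM_WRITER_LOCKED 1L

struct toy_rwsem {
    atomic_long count;
    void *owner;                /* stands in for rwsem_set_owner() bookkeeping */
};

/* One atomic compare-exchange claims the lock; on success the owner is
 * recorded here, so callers no longer open-code that pair of steps. */
static bool write_trylock(struct toy_rwsem *sem, void *me)
{
    long expected = SEM_UNLOCKED;

    if (atomic_compare_exchange_strong(&sem->count, &expected, SEM_WRITER_LOCKED)) {
        sem->owner = me;
        return true;
    }
    return false;
}
```

Folding the three open-coded copies of this sequence into one helper is the entire content of the patch: behavior is unchanged, only the duplication goes away.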
* [tip: locking/core] locking/rwsem: Fold __down_{read,write}*() 2020-12-07 9:02 ` Peter Zijlstra ` (2 preceding siblings ...) 2020-12-09 18:38 ` [tip: locking/core] locking/rwsem: Introduce rwsem_write_trylock() tip-bot2 for Peter Zijlstra @ 2020-12-09 18:38 ` tip-bot2 for Peter Zijlstra 2020-12-09 18:38 ` [tip: locking/core] locking/rwsem: Better collate rwsem_read_trylock() tip-bot2 for Peter Zijlstra 4 siblings, 0 replies; 46+ messages in thread From: tip-bot2 for Peter Zijlstra @ 2020-12-09 18:38 UTC (permalink / raw) To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel The following commit has been merged into the locking/core branch of tip: Commit-ID: c995e638ccbbc65a76d1713c4fdcf927e7e2cb83 Gitweb: https://git.kernel.org/tip/c995e638ccbbc65a76d1713c4fdcf927e7e2cb83 Author: Peter Zijlstra <peterz@infradead.org> AuthorDate: Tue, 08 Dec 2020 10:27:41 +01:00 Committer: Peter Zijlstra <peterz@infradead.org> CommitterDate: Wed, 09 Dec 2020 17:08:47 +01:00 locking/rwsem: Fold __down_{read,write}*() There's a lot of needless duplication in __down_{read,write}*(), cure that with a helper. 
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net --- kernel/locking/rwsem.c | 45 ++++++++++++++++++++--------------------- 1 file changed, 23 insertions(+), 22 deletions(-) diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index 7915456..67ae366 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -1354,32 +1354,29 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem) /* * lock for reading */ -static inline void __down_read(struct rw_semaphore *sem) +static inline int __down_read_common(struct rw_semaphore *sem, int state) { if (!rwsem_read_trylock(sem)) { - rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE); + if (IS_ERR(rwsem_down_read_slowpath(sem, state))) + return -EINTR; DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); } + return 0; +} + +static inline void __down_read(struct rw_semaphore *sem) +{ + __down_read_common(sem, TASK_UNINTERRUPTIBLE); } static inline int __down_read_interruptible(struct rw_semaphore *sem) { - if (!rwsem_read_trylock(sem)) { - if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE))) - return -EINTR; - DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); - } - return 0; + return __down_read_common(sem, TASK_INTERRUPTIBLE); } static inline int __down_read_killable(struct rw_semaphore *sem) { - if (!rwsem_read_trylock(sem)) { - if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE))) - return -EINTR; - DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); - } - return 0; + return __down_read_common(sem, TASK_KILLABLE); } static inline int __down_read_trylock(struct rw_semaphore *sem) @@ -1405,22 +1402,26 @@ static inline int __down_read_trylock(struct rw_semaphore *sem) /* * lock for writing */ -static inline void __down_write(struct rw_semaphore *sem) -{ - if (unlikely(!rwsem_write_trylock(sem))) - rwsem_down_write_slowpath(sem, TASK_UNINTERRUPTIBLE); -} - -static 
inline int __down_write_killable(struct rw_semaphore *sem) +static inline int __down_write_common(struct rw_semaphore *sem, int state) { if (unlikely(!rwsem_write_trylock(sem))) { - if (IS_ERR(rwsem_down_write_slowpath(sem, TASK_KILLABLE))) + if (IS_ERR(rwsem_down_write_slowpath(sem, state))) return -EINTR; } return 0; } +static inline void __down_write(struct rw_semaphore *sem) +{ + __down_write_common(sem, TASK_UNINTERRUPTIBLE); +} + +static inline int __down_write_killable(struct rw_semaphore *sem) +{ + return __down_write_common(sem, TASK_KILLABLE); +} + static inline int __down_write_trylock(struct rw_semaphore *sem) { DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem); ^ permalink raw reply related [flat|nested] 46+ messages in thread
* [tip: locking/core] locking/rwsem: Better collate rwsem_read_trylock() 2020-12-07 9:02 ` Peter Zijlstra ` (3 preceding siblings ...) 2020-12-09 18:38 ` [tip: locking/core] locking/rwsem: Fold __down_{read,write}*() tip-bot2 for Peter Zijlstra @ 2020-12-09 18:38 ` tip-bot2 for Peter Zijlstra 4 siblings, 0 replies; 46+ messages in thread From: tip-bot2 for Peter Zijlstra @ 2020-12-09 18:38 UTC (permalink / raw) To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel The following commit has been merged into the locking/core branch of tip: Commit-ID: 3379116a0ca965b00e6522c7ea3f16c9dbd8f9f9 Gitweb: https://git.kernel.org/tip/3379116a0ca965b00e6522c7ea3f16c9dbd8f9f9 Author: Peter Zijlstra <peterz@infradead.org> AuthorDate: Tue, 08 Dec 2020 10:22:16 +01:00 Committer: Peter Zijlstra <peterz@infradead.org> CommitterDate: Wed, 09 Dec 2020 17:08:47 +01:00 locking/rwsem: Better collate rwsem_read_trylock() All users of rwsem_read_trylock() do rwsem_set_reader_owned(sem) on success, move it into rwsem_read_trylock() proper. 
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201207090243.GE3040@hirez.programming.kicks-ass.net --- kernel/locking/rwsem.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index a163542..5c0dc7e 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -273,9 +273,16 @@ static inline void rwsem_set_nonspinnable(struct rw_semaphore *sem) static inline bool rwsem_read_trylock(struct rw_semaphore *sem) { long cnt = atomic_long_add_return_acquire(RWSEM_READER_BIAS, &sem->count); + if (WARN_ON_ONCE(cnt < 0)) rwsem_set_nonspinnable(sem); - return !(cnt & RWSEM_READ_FAILED_MASK); + + if (!(cnt & RWSEM_READ_FAILED_MASK)) { + rwsem_set_reader_owned(sem); + return true; + } + + return false; } /* @@ -1340,8 +1347,6 @@ static inline void __down_read(struct rw_semaphore *sem) if (!rwsem_read_trylock(sem)) { rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE); DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); - } else { - rwsem_set_reader_owned(sem); } } @@ -1351,8 +1356,6 @@ static inline int __down_read_interruptible(struct rw_semaphore *sem) if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE))) return -EINTR; DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); - } else { - rwsem_set_reader_owned(sem); } return 0; } @@ -1363,8 +1366,6 @@ static inline int __down_read_killable(struct rw_semaphore *sem) if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE))) return -EINTR; DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); - } else { - rwsem_set_reader_owned(sem); } return 0; } ^ permalink raw reply related [flat|nested] 46+ messages in thread
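The reader fast path being collated above adds a bias to the count word and succeeds only if none of the "failed" bits are set; the patch moves the rwsem_set_reader_owned() step inside the trylock so every caller stops repeating it. A toy C11 sketch of that shape (the bias and mask values here are invented for illustration and do not match the real RWSEM_* constants):

```c
#include <stdatomic.h>
#include <stdbool.h>

#define READER_BIAS      0x100L
#define READ_FAILED_MASK 0x0ffL   /* writer/handoff-style bits live below the bias */

struct toy_rwsem {
    atomic_long count;
    _Bool reader_owned;
};

/* Atomic add-and-return: add the reader bias, then check whether any
 * fail bit was set. On success, mark the sem reader-owned right here,
 * as rwsem_read_trylock() now does after the patch. */
static bool read_trylock(struct toy_rwsem *sem)
{
    long cnt = atomic_fetch_add(&sem->count, READER_BIAS) + READER_BIAS;

    if (!(cnt & READ_FAILED_MASK)) {
        sem->reader_owned = 1;
        return true;
    }
    return false;
}
```

Because every successful caller needed the same ownership bookkeeping, hoisting it into the trylock deletes three copies of the else-branch in __down_read*().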
* [tip: locking/core] rwsem: Implement down_read_interruptible 2020-12-03 20:11 ` [PATCH 2/3] rwsem: Implement down_read_interruptible Eric W. Biederman 2020-12-04 1:59 ` Waiman Long @ 2020-12-09 18:38 ` tip-bot2 for Eric W. Biederman 1 sibling, 0 replies; 46+ messages in thread From: tip-bot2 for Eric W. Biederman @ 2020-12-09 18:38 UTC (permalink / raw) To: linux-tip-commits Cc: Eric W. Biederman, Peter Zijlstra (Intel), x86, linux-kernel The following commit has been merged into the locking/core branch of tip: Commit-ID: 31784cff7ee073b34d6eddabb95e3be2880a425c Gitweb: https://git.kernel.org/tip/31784cff7ee073b34d6eddabb95e3be2880a425c Author: Eric W. Biederman <ebiederm@xmission.com> AuthorDate: Thu, 03 Dec 2020 14:11:13 -06:00 Committer: Peter Zijlstra <peterz@infradead.org> CommitterDate: Wed, 09 Dec 2020 17:08:42 +01:00 rwsem: Implement down_read_interruptible In preparation for converting exec_update_mutex to a rwsem so that multiple readers can execute in parallel and not deadlock, add down_read_interruptible. This is needed for perf_event_open to be converted (with no semantic changes) from working on a mutex to working on a rwsem. Signed-off-by: Eric W. 
Biederman <ebiederm@xmission.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/87k0tybqfy.fsf@x220.int.ebiederm.org --- include/linux/rwsem.h | 1 + kernel/locking/rwsem.c | 26 ++++++++++++++++++++++++++ 2 files changed, 27 insertions(+) diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index 13021b0..4c715be 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -123,6 +123,7 @@ static inline int rwsem_is_contended(struct rw_semaphore *sem) * lock for reading */ extern void down_read(struct rw_semaphore *sem); +extern int __must_check down_read_interruptible(struct rw_semaphore *sem); extern int __must_check down_read_killable(struct rw_semaphore *sem); /* diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index 54d11cb..a163542 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -1345,6 +1345,18 @@ static inline void __down_read(struct rw_semaphore *sem) } } +static inline int __down_read_interruptible(struct rw_semaphore *sem) +{ + if (!rwsem_read_trylock(sem)) { + if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_INTERRUPTIBLE))) + return -EINTR; + DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); + } else { + rwsem_set_reader_owned(sem); + } + return 0; +} + static inline int __down_read_killable(struct rw_semaphore *sem) { if (!rwsem_read_trylock(sem)) { @@ -1495,6 +1507,20 @@ void __sched down_read(struct rw_semaphore *sem) } EXPORT_SYMBOL(down_read); +int __sched down_read_interruptible(struct rw_semaphore *sem) +{ + might_sleep(); + rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_); + + if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_interruptible)) { + rwsem_release(&sem->dep_map, _RET_IP_); + return -EINTR; + } + + return 0; +} +EXPORT_SYMBOL(down_read_interruptible); + int __sched down_read_killable(struct rw_semaphore *sem) { might_sleep(); ^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-03 20:09 [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman 2020-12-03 20:10 ` [PATCH 1/3] rwsem: Implement down_read_killable_nested Eric W. Biederman 2020-12-03 20:11 ` [PATCH 2/3] rwsem: Implement down_read_interruptible Eric W. Biederman @ 2020-12-03 20:12 ` Eric W. Biederman 2020-12-04 16:08 ` Bernd Edlinger 2020-12-03 22:42 ` [PATCH 0/3] " Linus Torvalds 3 siblings, 1 reply; 46+ messages in thread From: Eric W. Biederman @ 2020-12-03 20:12 UTC (permalink / raw) To: linux-kernel Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo Recently syzbot reported[0] that there is a deadlock amongst the users of exec_update_mutex. The problematic lock ordering found by lockdep was: perf_event_open (exec_update_mutex -> ovl_i_mutex) chown (ovl_i_mutex -> sb_writers) sendfile (sb_writers -> p->lock) by reading from a proc file and writing to overlayfs proc_pid_syscall (p->lock -> exec_update_mutex) While looking at possible solutions it occurred to me that all of the users and possible users involved only wanted the state of the given process to remain the same. They are all readers. The only writer is exec. There is no reason for readers to block on each other. So fix this deadlock by transforming exec_update_mutex into a rw_semaphore named exec_update_lock that only exec takes for writing.
Cc: Jann Horn <jannh@google.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Bernd Edlinger <bernd.edlinger@hotmail.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christopher Yeoh <cyeoh@au1.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Fixes: eea9673250db ("exec: Add exec_update_mutex to replace cred_guard_mutex") [0] https://lkml.kernel.org/r/00000000000063640c05ade8e3de@google.com Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- fs/exec.c | 12 ++++++------ fs/proc/base.c | 10 +++++----- include/linux/sched/signal.h | 11 ++++++----- init/init_task.c | 2 +- kernel/events/core.c | 12 ++++++------ kernel/fork.c | 6 +++--- kernel/kcmp.c | 30 +++++++++++++++--------------- kernel/pid.c | 4 ++-- 8 files changed, 44 insertions(+), 43 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 9e9368603168..b822aa0d682e 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -965,8 +965,8 @@ EXPORT_SYMBOL(read_code); /* * Maps the mm_struct mm into the current task struct. - * On success, this function returns with the mutex - * exec_update_mutex locked. + * On success, this function returns with exec_update_lock + * held for writing. 
*/ static int exec_mmap(struct mm_struct *mm) { @@ -981,7 +981,7 @@ static int exec_mmap(struct mm_struct *mm) if (old_mm) sync_mm_rss(old_mm); - ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); + ret = down_write_killable(&tsk->signal->exec_update_lock); if (ret) return ret; @@ -995,7 +995,7 @@ static int exec_mmap(struct mm_struct *mm) mmap_read_lock(old_mm); if (unlikely(old_mm->core_state)) { mmap_read_unlock(old_mm); - mutex_unlock(&tsk->signal->exec_update_mutex); + up_write(&tsk->signal->exec_update_lock); return -EINTR; } } @@ -1392,7 +1392,7 @@ int begin_new_exec(struct linux_binprm * bprm) return 0; out_unlock: - mutex_unlock(&me->signal->exec_update_mutex); + up_write(&me->signal->exec_update_lock); out: return retval; } @@ -1433,7 +1433,7 @@ void setup_new_exec(struct linux_binprm * bprm) * some architectures like powerpc */ me->mm->task_size = TASK_SIZE; - mutex_unlock(&me->signal->exec_update_mutex); + up_write(&me->signal->exec_update_lock); mutex_unlock(&me->signal->cred_guard_mutex); } EXPORT_SYMBOL(setup_new_exec); diff --git a/fs/proc/base.c b/fs/proc/base.c index 0f707003dda5..dd1f4f6f22bc 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -405,11 +405,11 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns, static int lock_trace(struct task_struct *task) { - int err = mutex_lock_killable(&task->signal->exec_update_mutex); + int err = down_read_killable(&task->signal->exec_update_lock); if (err) return err; if (!ptrace_may_access(task, PTRACE_MODE_ATTACH_FSCREDS)) { - mutex_unlock(&task->signal->exec_update_mutex); + up_read(&task->signal->exec_update_lock); return -EPERM; } return 0; @@ -417,7 +417,7 @@ static int lock_trace(struct task_struct *task) static void unlock_trace(struct task_struct *task) { - mutex_unlock(&task->signal->exec_update_mutex); + up_read(&task->signal->exec_update_lock); } #ifdef CONFIG_STACKTRACE @@ -2928,7 +2928,7 @@ static int do_io_accounting(struct task_struct *task, struct seq_file 
*m, int wh unsigned long flags; int result; - result = mutex_lock_killable(&task->signal->exec_update_mutex); + result = down_read_killable(&task->signal->exec_update_lock); if (result) return result; @@ -2964,7 +2964,7 @@ static int do_io_accounting(struct task_struct *task, struct seq_file *m, int wh result = 0; out_unlock: - mutex_unlock(&task->signal->exec_update_mutex); + up_read(&task->signal->exec_update_lock); return result; } diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 1bad18a1d8ba..4b6a8234d7fc 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -228,12 +228,13 @@ struct signal_struct { * credential calculations * (notably. ptrace) * Deprecated do not use in new code. - * Use exec_update_mutex instead. - */ - struct mutex exec_update_mutex; /* Held while task_struct is being - * updated during exec, and may have - * inconsistent permissions. + * Use exec_update_lock instead. */ + struct rw_semaphore exec_update_lock; /* Held while task_struct is + * being updated during exec, + * and may have inconsistent + * permissions. + */ } __randomize_layout; /* diff --git a/init/init_task.c b/init/init_task.c index a56f0abb63e9..15f6eb93a04f 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -26,7 +26,7 @@ static struct signal_struct init_signals = { .multiprocess = HLIST_HEAD_INIT, .rlim = INIT_RLIMITS, .cred_guard_mutex = __MUTEX_INITIALIZER(init_signals.cred_guard_mutex), - .exec_update_mutex = __MUTEX_INITIALIZER(init_signals.exec_update_mutex), + .exec_update_lock = __RWSEM_INITIALIZER(init_signals.exec_update_lock), #ifdef CONFIG_POSIX_TIMERS .posix_timers = LIST_HEAD_INIT(init_signals.posix_timers), .cputimer = { diff --git a/kernel/events/core.c b/kernel/events/core.c index da467e1dd49a..3189f690236e 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1325,7 +1325,7 @@ static void put_ctx(struct perf_event_context *ctx) * function. 
* * Lock order: - * exec_update_mutex + * exec_update_lock * task_struct::perf_event_mutex * perf_event_context::mutex * perf_event::child_mutex; @@ -11730,14 +11730,14 @@ SYSCALL_DEFINE5(perf_event_open, } if (task) { - err = mutex_lock_interruptible(&task->signal->exec_update_mutex); + err = down_read_interruptible(&task->signal->exec_update_lock); if (err) goto err_task; /* * Preserve ptrace permission check for backwards compatibility. * - * We must hold exec_update_mutex across this and any potential + * We must hold exec_update_lock across this and any potential * perf_install_in_context() call for this new event to * serialize against exec() altering our credentials (and the * perf_event_exit_task() that could imply). @@ -12026,7 +12026,7 @@ SYSCALL_DEFINE5(perf_event_open, mutex_unlock(&ctx->mutex); if (task) { - mutex_unlock(&task->signal->exec_update_mutex); + up_read(&task->signal->exec_update_lock); put_task_struct(task); } @@ -12062,7 +12062,7 @@ SYSCALL_DEFINE5(perf_event_open, free_event(event); err_cred: if (task) - mutex_unlock(&task->signal->exec_update_mutex); + up_read(&task->signal->exec_update_lock); err_task: if (task) put_task_struct(task); @@ -12367,7 +12367,7 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn) /* * When a child task exits, feed back event values to parent events. * - * Can be called with exec_update_mutex held when called from + * Can be called with exec_update_lock held when called from * setup_new_exec(). 
*/ void perf_event_exit_task(struct task_struct *child) diff --git a/kernel/fork.c b/kernel/fork.c index 837b546528c8..4d0ae6f827df 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1221,7 +1221,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode) struct mm_struct *mm; int err; - err = mutex_lock_killable(&task->signal->exec_update_mutex); + err = down_read_killable(&task->signal->exec_update_lock); if (err) return ERR_PTR(err); @@ -1231,7 +1231,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode) mmput(mm); mm = ERR_PTR(-EACCES); } - mutex_unlock(&task->signal->exec_update_mutex); + up_read(&task->signal->exec_update_lock); return mm; } @@ -1591,7 +1591,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk) sig->oom_score_adj_min = current->signal->oom_score_adj_min; mutex_init(&sig->cred_guard_mutex); - mutex_init(&sig->exec_update_mutex); + init_rwsem(&sig->exec_update_lock); return 0; } diff --git a/kernel/kcmp.c b/kernel/kcmp.c index 36e58eb5a11d..5353edfad8e1 100644 --- a/kernel/kcmp.c +++ b/kernel/kcmp.c @@ -70,25 +70,25 @@ get_file_raw_ptr(struct task_struct *task, unsigned int idx) return file; } -static void kcmp_unlock(struct mutex *m1, struct mutex *m2) +static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2) { - if (likely(m2 != m1)) - mutex_unlock(m2); - mutex_unlock(m1); + if (likely(l2 != l1)) + up_read(l2); + up_read(l1); } -static int kcmp_lock(struct mutex *m1, struct mutex *m2) +static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2) { int err; - if (m2 > m1) - swap(m1, m2); + if (l2 > l1) + swap(l1, l2); - err = mutex_lock_killable(m1); - if (!err && likely(m1 != m2)) { - err = mutex_lock_killable_nested(m2, SINGLE_DEPTH_NESTING); + err = down_read_killable(l1); + if (!err && likely(l1 != l2)) { + err = down_read_killable_nested(l2, SINGLE_DEPTH_NESTING); if (err) - mutex_unlock(m1); + up_read(l1); } return err; @@ -156,8 +156,8 @@ 
SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type, /* * One should have enough rights to inspect task details. */ - ret = kcmp_lock(&task1->signal->exec_update_mutex, - &task2->signal->exec_update_mutex); + ret = kcmp_lock(&task1->signal->exec_update_lock, + &task2->signal->exec_update_lock); if (ret) goto err; if (!ptrace_may_access(task1, PTRACE_MODE_READ_REALCREDS) || @@ -212,8 +212,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type, } err_unlock: - kcmp_unlock(&task1->signal->exec_update_mutex, - &task2->signal->exec_update_mutex); + kcmp_unlock(&task1->signal->exec_update_lock, + &task2->signal->exec_update_lock); err: put_task_struct(task1); put_task_struct(task2); diff --git a/kernel/pid.c b/kernel/pid.c index a96bc4bf4f86..4856818c9de1 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -628,7 +628,7 @@ static struct file *__pidfd_fget(struct task_struct *task, int fd) struct file *file; int ret; - ret = mutex_lock_killable(&task->signal->exec_update_mutex); + ret = down_read_killable(&task->signal->exec_update_lock); if (ret) return ERR_PTR(ret); @@ -637,7 +637,7 @@ static struct file *__pidfd_fget(struct task_struct *task, int fd) else file = ERR_PTR(-EPERM); - mutex_unlock(&task->signal->exec_update_mutex); + up_read(&task->signal->exec_update_lock); return file ?: ERR_PTR(-EBADF); } -- 2.25.0 ^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-03 20:12 ` [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman @ 2020-12-04 16:08 ` Bernd Edlinger 2020-12-04 17:21 ` Linus Torvalds 2020-12-04 17:39 ` Eric W. Biederman 0 siblings, 2 replies; 46+ messages in thread From: Bernd Edlinger @ 2020-12-04 16:08 UTC (permalink / raw) To: Eric W. Biederman, linux-kernel Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long, Davidlohr Bueso Hi Eric, I think I remember from a previous discussion about this topic that it was unclear whether rw_semaphores work the same in RT-Linux. Will this fix work in RT as well? On 12/3/20 9:12 PM, Eric W. Biederman wrote: > --- a/kernel/kcmp.c > +++ b/kernel/kcmp.c > @@ -70,25 +70,25 @@ get_file_raw_ptr(struct task_struct *task, unsigned int idx) > return file; > } > > -static void kcmp_unlock(struct mutex *m1, struct mutex *m2) > +static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2) > { > - if (likely(m2 != m1)) > - mutex_unlock(m2); > - mutex_unlock(m1); > + if (likely(l2 != l1)) is this still necessary ? > + up_read(l2); > + up_read(l1); > } > > -static int kcmp_lock(struct mutex *m1, struct mutex *m2) > +static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2) > { > int err; > > - if (m2 > m1) > - swap(m1, m2); > + if (l2 > l1) > + swap(l1, l2); and this is probably also no longer necessary? > > - err = mutex_lock_killable(m1); > - if (!err && likely(m1 != m2)) { > - err = mutex_lock_killable_nested(m2, SINGLE_DEPTH_NESTING); and this can now be done unconditionally, right?
> + err = down_read_killable_nested(l2, SINGLE_DEPTH_NESTING); > if (err) > - mutex_unlock(m1); > + up_read(l1); > } > > return err; > @@ -156,8 +156,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type, > /* > * One should have enough rights to inspect task details. > */ > - ret = kcmp_lock(&task1->signal->exec_update_mutex, > - &task2->signal->exec_update_mutex); > + ret = kcmp_lock(&task1->signal->exec_update_lock, > + &task2->signal->exec_update_lock); > if (ret) > goto err; > if (!ptrace_may_access(task1, PTRACE_MODE_READ_REALCREDS) || > @@ -212,8 +212,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type, > } > > err_unlock: > - kcmp_unlock(&task1->signal->exec_update_mutex, > - &task2->signal->exec_update_mutex); > + kcmp_unlock(&task1->signal->exec_update_lock, > + &task2->signal->exec_update_lock); > err: > put_task_struct(task1); > put_task_struct(task2); Thanks Bernd. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-04 16:08 ` Bernd Edlinger @ 2020-12-04 17:21 ` Linus Torvalds 2020-12-04 19:34 ` Eric W. Biederman 2020-12-04 17:39 ` Eric W. Biederman 1 sibling, 1 reply; 46+ messages in thread From: Linus Torvalds @ 2020-12-04 17:21 UTC (permalink / raw) To: Bernd Edlinger Cc: Eric W. Biederman, Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long, Davidlohr Bueso On Fri, Dec 4, 2020 at 8:08 AM Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: > > > > > -static void kcmp_unlock(struct mutex *m1, struct mutex *m2) > > +static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2) > > { > > - if (likely(m2 != m1)) > > - mutex_unlock(m2); > > - mutex_unlock(m1); > > + if (likely(l2 != l1)) > > is this still necessary ? > > > + up_read(l2); > > + up_read(l1); > > } > > > > -static int kcmp_lock(struct mutex *m1, struct mutex *m2) > > +static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2) > > { > > int err; > > > > - if (m2 > m1) > > - swap(m1, m2); > > + if (l2 > l1) > > + swap(l1, l2); > > and this is probably also no longer necessary? These are still necessary, because even a recursive read lock can still block on a writer trying to come in between the two read locks due to fairness guarantees. So taking the same read lock twice is still a source of possible deadlocks. For the same reason, read locks still have ABBA deadlock and need to be taken in order. So switching from a mutex to a rwlock doesn't really change the locking rules in this respect. In fact, I'm not convinced this change even fixes the deadlock that syzbot reported, for the same reason: it just requires a write lock in between two read locks to deadlock. Linus ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-04 17:21 ` Linus Torvalds @ 2020-12-04 19:34 ` Eric W. Biederman 2020-12-04 20:10 ` Linus Torvalds 0 siblings, 1 reply; 46+ messages in thread From: Eric W. Biederman @ 2020-12-04 19:34 UTC (permalink / raw) To: Linus Torvalds Cc: Bernd Edlinger, Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long, Davidlohr Bueso Linus Torvalds <torvalds@linux-foundation.org> writes: > On Fri, Dec 4, 2020 at 8:08 AM Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: >> >> > >> > -static void kcmp_unlock(struct mutex *m1, struct mutex *m2) >> > +static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2) >> > { >> > - if (likely(m2 != m1)) >> > - mutex_unlock(m2); >> > - mutex_unlock(m1); >> > + if (likely(l2 != l1)) >> >> is this still necessary ? >> >> > + up_read(l2); >> > + up_read(l1); >> > } >> > >> > -static int kcmp_lock(struct mutex *m1, struct mutex *m2) >> > +static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2) >> > { >> > int err; >> > >> > - if (m2 > m1) >> > - swap(m1, m2); >> > + if (l2 > l1) >> > + swap(l1, l2); >> >> and this is probably also no longer necessary? > > These are still necessary, because even a recursive read lock can > still block on a writer trying to come in between the two read locks > due to fairness guarantees. > > So taking the same read lock twice is still a source of possible deadlocks. > > For the same reason, read locks still have ABBA deadlock and need to > be taken in order. > > So switching from a mutex to a rwlock doesn't really change the > locking rules in this respect. Thinking about the specific case of down_read on two instances of exec_update_lock. 
If there are two instances of kcmp being called with the same two pids, but with the arguments in opposite order, and both of the tasks those pids refer to exec, you could definitely get a deadlock. So yes. We definitely need to keep the swap as well. > In fact, I'm not convinced this change even fixes the deadlock that > syzbot reported, for the same reason: it just requires a write lock in > between two read locks to deadlock. From a deadlock perspective the change is strictly better than what we have today. The readers will no longer block on each other. For the specific case that syzbot reported it is readers who were blocking on each other so that specific case is fixed. On the write side of exec_update_lock we have: cred_guard_mutex -> exec_update_lock Which means that to get an ABBA deadlock cred_guard_mutex would need to be involved, and it is only acquired in 3 places: ptrace_attach, do_seccomp, and proc_pid_attr_write. In none of the 3, from the syscall entry point until the code that takes cred_guard_mutex, can I find anything that takes any locks. Perhaps there is something in io_uring; I did not completely trace that write path. So given that the exec path would need to be involved, and that the exec path takes exec_update_lock pretty much at the top level, I am not seeing how there is any room for deadlocks after this change. Am I missing something? Eric ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-04 19:34 ` Eric W. Biederman @ 2020-12-04 20:10 ` Linus Torvalds 2020-12-04 20:30 ` Bernd Edlinger 2020-12-05 17:43 ` [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman 0 siblings, 2 replies; 46+ messages in thread From: Linus Torvalds @ 2020-12-04 20:10 UTC (permalink / raw) To: Eric W. Biederman Cc: Bernd Edlinger, Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long, Davidlohr Bueso On Fri, Dec 4, 2020 at 11:35 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > > From a deadlock perspective the change is strictly better than what we > have today. The readers will no longer block on each other. No, agreed, it's better regardless. > For the specific case that syzbot reported it is readers who were > blocking on each other so that specific case if fixed. So the thing is, a reader can still block another reader if a writer comes in between them. Which is why I was thinking that the same deadlock could happen if somebody does an execve at just the right point. > On the write side of exec_update_lock we have: > > cred_guard_mutex -> exec_update_lock > > Which means that to get an ABBA deadlock cred_guard_mutex would need to > be involved No, see above: you can get a deadlock with - first reader gets exec_update_lock - writer on exec_update_lock blocks on first reader (this is exec) - second reader of exec_update_lock now blocks on the writer. So if that second reader holds something that the first one wants to get (or is the same thread as the first one), you have a deadlock: the first reader will never make any progress, will never release the lock, and the writer will never get it, and the second reader will forever wait for the writer that is ahead of it in the queue. 
cred_guard_mutex is immaterial, it's not involved in the deadlock. Yes, the writer holds it, but it's not relevant for anything else. And that deadlock looks very much like what syzkaller detected, in exactly that scenario: Chain exists of: &sig->exec_update_mutex --> sb_writers#4 --> &p->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&p->lock); lock(sb_writers#4); lock(&p->lock); lock(&sig->exec_update_mutex); *** DEADLOCK *** No? The only thing that is missing is that writer that causes the exec_update_mutex readers to block each other - exactly like they did when it was a mutex. But I may be missing something entirely obvious that keeps this from happening. Linus ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-04 20:10 ` Linus Torvalds @ 2020-12-04 20:30 ` Bernd Edlinger 2020-12-04 20:48 ` Linus Torvalds 2020-12-05 17:43 ` [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman 1 sibling, 1 reply; 46+ messages in thread From: Bernd Edlinger @ 2020-12-04 20:30 UTC (permalink / raw) To: Linus Torvalds, Eric W. Biederman Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long, Davidlohr Bueso On 12/4/20 9:10 PM, Linus Torvalds wrote: > On Fri, Dec 4, 2020 at 11:35 AM Eric W. Biederman <ebiederm@xmission.com> wrote: >> >> From a deadlock perspective the change is strictly better than what we >> have today. The readers will no longer block on each other. > > No, agreed, it's better regardless. > >> For the specific case that syzbot reported it is readers who were >> blocking on each other so that specific case if fixed. > > So the thing is, a reader can still block another reader if a writer > comes in between them. Which is why I was thinking that the same > deadlock could happen if somebody does an execve at just the right > point. > >> On the write side of exec_update_lock we have: >> >> cred_guard_mutex -> exec_update_lock >> >> Which means that to get an ABBA deadlock cred_guard_mutex would need to >> be involved > > No, see above: you can get a deadlock with > > - first reader gets exec_update_lock > > - writer on exec_update_lock blocks on first reader (this is exec) > > - second reader of exec_update_lock now blocks on the writer. 
> > So if that second reader holds something that the first one wants to > get (or is the same thread as the first one), you have a deadlock: the > first reader will never make any progress, will never release the > lock, and the writer will never get it, and the second reader will > forever wait for the writer that is ahead of it in the queue. > > cred_guard_mutex is immaterial, it's not involved in the deadlock. > Yes, the writer holds it, but it's not relevant for anything else. > > And that deadlock looks very much like what syzkaller detected, in > exactly that scenario: > > Chain exists of: > &sig->exec_update_mutex --> sb_writers#4 --> &p->lock > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(&p->lock); > lock(sb_writers#4); > lock(&p->lock); > lock(&sig->exec_update_mutex); > > *** DEADLOCK *** > > No? The only thing that is missing is that writer that causes the > exec_update_mutex readers to block each other - exactly like they did > when it was a mutex. > > But I may be missing something entirely obvious that keeps this from happening. > I think you convinced me: > perf_event_open (exec_update_mutex -> ovl_i_mutex) > chown (ovl_i_mutex -> sb_writers) > sendfile (sb_writers -> p->lock) > by reading from a proc file and writing to overlayfs > proc_pid_syscall (p->lock -> exec_update_mutex) We need just 5 Philosophers (A-E): Philosopher A calls proc_pid_syscall, takes p->lock, and falls asleep for a while now. Philosopher B calls sendfile, takes sb_writers, and has to wait on p->lock. Philosopher C calls chown, takes ovl_i_mutex, and has to wait on sb_writers. Philosopher D calls perf_event_open, takes down_read(exec_update_lock), and has to wait on ovl_i_mutex. Philosopher E calls exec, and wants to take down_write(exec_update_lock), has to wait for Philosopher D.
Now Philosopher A wakes up from his sleep, and wants to take down_read(exec_update_lock), but due to fairness reasons, he has to wait until Philosopher E gets and releases his write lock again. If Philosopher A is now blocked, we have a deadlock, right? Bernd. > Linus > ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-04 20:30 ` Bernd Edlinger @ 2020-12-04 20:48 ` Linus Torvalds 2020-12-04 21:48 ` Davidlohr Bueso 2020-12-07 9:09 ` Peter Zijlstra 0 siblings, 2 replies; 46+ messages in thread From: Linus Torvalds @ 2020-12-04 20:48 UTC (permalink / raw) To: Bernd Edlinger Cc: Eric W. Biederman, Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long, Davidlohr Bueso On Fri, Dec 4, 2020 at 12:30 PM Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: >> > > perf_event_open (exec_update_mutex -> ovl_i_mutex) Side note: this one looks like it should be easy to fix. Is there any real reason why exec_update_mutex is actually gotten that early, and held for that long in the perf event code? I _think_ we could move the ptrace check to be much later, to _just_ before that * This is the point on no return; we cannot fail hereafter. point in the perf event install chain.. I don't think it needs to be moved down even that much, I think it would be sufficient to move it down below the "perf_event_alloc()", but I didn't check very much. The fact that create_local_trace_uprobe() can end up going into a lookup of an OVL filesystem path smells kind of odd to me to begin with, but I didn't look at the whole thing. PeterZ, is there something I'm missing? Linus ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-04 20:48 ` Linus Torvalds @ 2020-12-04 21:48 ` Davidlohr Bueso 2020-12-05 18:05 ` Eric W. Biederman 2020-12-07 9:09 ` Peter Zijlstra 1 sibling, 1 reply; 46+ messages in thread From: Davidlohr Bueso @ 2020-12-04 21:48 UTC (permalink / raw) To: Linus Torvalds Cc: Bernd Edlinger, Eric W. Biederman, Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long On Fri, 04 Dec 2020, Linus Torvalds wrote: >On Fri, Dec 4, 2020 at 12:30 PM Bernd Edlinger ><bernd.edlinger@hotmail.de> wrote: >>> >> > perf_event_open (exec_update_mutex -> ovl_i_mutex) > >Side note: this one looks like it should be easy to fix. > >Is there any real reason why exec_update_mutex is actually gotten that >early, and held for that long in the perf event code? afaict just to validate the whole operation early. Per 79c9ce57eb2 the mutex will guard the check and the perf_install_in_context vs exec. > >I _think_ we could move the ptrace check to be much later, to _just_ before that > > * This is the point on no return; we cannot fail hereafter. > >point in the perf event install chain.. Peter had the idea of doing the ptrace_may_access() check twice: first lockless and early, then under exec_update_mutex when it mattered right before perf_install_in_context(): https://lore.kernel.org/linux-fsdevel/20200828123720.GZ1362448@hirez.programming.kicks-ass.net/ > >I don't think it needs to be moved down even that much, I think it >would be sufficient to move it down below the "perf_event_alloc()", >but I didn't check very much. Yeah we could just keep a single ptrace_may_access() check just further down until it won't deadlock. Thanks, Davidlohr ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-04 21:48 ` Davidlohr Bueso @ 2020-12-05 18:05 ` Eric W. Biederman 2020-12-07 9:15 ` Peter Zijlstra 0 siblings, 1 reply; 46+ messages in thread From: Eric W. Biederman @ 2020-12-05 18:05 UTC (permalink / raw) To: Davidlohr Bueso Cc: Linus Torvalds, Bernd Edlinger, Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long Davidlohr Bueso <dave@stgolabs.net> writes: > On Fri, 04 Dec 2020, Linus Torvalds wrote: > >>On Fri, Dec 4, 2020 at 12:30 PM Bernd Edlinger >><bernd.edlinger@hotmail.de> wrote: >>>> >>> > perf_event_open (exec_update_mutex -> ovl_i_mutex) >> >>Side note: this one looks like it should be easy to fix. >> >>Is there any real reason why exec_update_mutex is actually gotten that >>early, and held for that long in the perf event code? > > afaict just to validate the whole operation early. Per 79c9ce57eb2 the > mutex will guard the check and the perf_install_in_context vs exec. > >> >>I _think_ we could move the ptrace check to be much later, to _just_ before that >> >> * This is the point on no return; we cannot fail hereafter. >> >>point in the perf event install chain.. > > Peter had the idea of doing the ptrace_may_access() check twice: first > lockless and early, then under exec_update_mutex when it mattered right > before perf_install_in_context(): > > https://lore.kernel.org/linux-fsdevel/20200828123720.GZ1362448@hirez.programming.kicks-ass.net/ > >> >>I don't think it needs to be moved down even that much, I think it >>would be sufficient to move it down below the "perf_event_alloc()", >>but I didn't check very much. > > Yeah we could just keep a single ptrace_may_access() check just further > down until it won't deadlock. I am trying to understand why the permission check is there. 
The ptrace_may_access check came in with the earliest version of the code I can find. So going down that path hasn't helped. There are about 65 calls of perf_pmu_register so it definitely makes me nervous holding a semaphore over perf_event_alloc. What is truly silly in all of this is perf_uprobe_event_init fails if !perfmon_capable(). Which means the ptrace_may_access check and holding exec_update_mutex is completely irrelevant to the function of create_local_trace_uprobe. Which is strange in and of itself. I would have thought uprobes would have been the kind of probe that it would be safe for ordinary users to use. This is a much deeper rabbit hole than I had anticipated when I started looking and I will have to come back to this after the weekend. If at all possible I would love to be able to move the grabbing of exec_update_mutex after perf_event_alloc and the pluggable functions it calls. It doesn't feel possible to audit that. On the flip side the task is passed in as hw.target. So it may not be possible to move the permission check. It is chaos. Eric ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-05 18:05 ` Eric W. Biederman @ 2020-12-07 9:15 ` Peter Zijlstra 0 siblings, 0 replies; 46+ messages in thread From: Peter Zijlstra @ 2020-12-07 9:15 UTC (permalink / raw) To: Eric W. Biederman Cc: Davidlohr Bueso, Linus Torvalds, Bernd Edlinger, Linux Kernel Mailing List, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long On Sat, Dec 05, 2020 at 12:05:32PM -0600, Eric W. Biederman wrote: > I am trying to understand why the permission check is there. It's about observability, is task A allowed to observe state of task B? By installing a perf event on another task, we can very accurately tell what it's doing, and isn't fundamentally different from attaching a debugger (ie. ptrace). Therefore we chose to use the same security checks. As is good custom, one does security checks early. Then Jann came and observed that race against execve mucking with privs, and we got to hold that mutex across lots. That patch I proposed earlier should solve that all. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-04 20:48 ` Linus Torvalds 2020-12-04 21:48 ` Davidlohr Bueso @ 2020-12-07 9:09 ` Peter Zijlstra 2020-12-07 18:40 ` Linus Torvalds 1 sibling, 1 reply; 46+ messages in thread From: Peter Zijlstra @ 2020-12-07 9:09 UTC (permalink / raw) To: Linus Torvalds Cc: Bernd Edlinger, Eric W. Biederman, Linux Kernel Mailing List, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long, Davidlohr Bueso On Fri, Dec 04, 2020 at 12:48:18PM -0800, Linus Torvalds wrote: > On Fri, Dec 4, 2020 at 12:30 PM Bernd Edlinger > <bernd.edlinger@hotmail.de> wrote: > >> > > > perf_event_open (exec_update_mutex -> ovl_i_mutex) > > Side note: this one looks like it should be easy to fix. > PeterZ, is there something I'm missing? Like this? https://lkml.kernel.org/r/20200828123720.GZ1362448@hirez.programming.kicks-ass.net ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-07 9:09 ` Peter Zijlstra @ 2020-12-07 18:40 ` Linus Torvalds 2020-12-08 8:34 ` [PATCH] perf: Break deadlock involving exec_update_mutex Peter Zijlstra 0 siblings, 1 reply; 46+ messages in thread From: Linus Torvalds @ 2020-12-07 18:40 UTC (permalink / raw) To: Peter Zijlstra Cc: Bernd Edlinger, Eric W. Biederman, Linux Kernel Mailing List, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long, Davidlohr Bueso On Mon, Dec 7, 2020 at 1:10 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > PeterZ, is there something I'm missing? > > Like this? > > https://lkml.kernel.org/r/20200828123720.GZ1362448@hirez.programming.kicks-ass.net Yes, except I think you should remove the old ptrace_may_access() check. Because otherwise we'll just end up having KCSAN complain about the unlocked optimistic accesses or something like that. So do the perfmon_capable() check early - it doesn't need the exec_update_mutex - and then just do the ptrace_may_access() one late. I don't see any point at all in checking privileges twice, and I do see real downsides. Not just that KCSAN issue, but also lack of coverage (ie the second check will then effectively never be tested, which is bad too). Linus ^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH] perf: Break deadlock involving exec_update_mutex 2020-12-07 18:40 ` Linus Torvalds @ 2020-12-08 8:34 ` Peter Zijlstra 2020-12-08 18:37 ` Linus Torvalds 2020-12-10 18:38 ` Davidlohr Bueso 0 siblings, 2 replies; 46+ messages in thread From: Peter Zijlstra @ 2020-12-08 8:34 UTC (permalink / raw) To: Linus Torvalds Cc: Bernd Edlinger, Eric W. Biederman, Linux Kernel Mailing List, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long, Davidlohr Bueso On Mon, Dec 07, 2020 at 10:40:11AM -0800, Linus Torvalds wrote: > On Mon, Dec 7, 2020 at 1:10 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > > > PeterZ, is there something I'm missing? > > > > Like this? > > > > https://lkml.kernel.org/r/20200828123720.GZ1362448@hirez.programming.kicks-ass.net > > Yes, except I think you should remove the old ptrace_may_access() check. > I don't see any point at all in checking privileges twice, and I do > see real downsides. Not just that KCSAN issue, but also lack of > coverage (ie the second check will then effectively never be tested, > which is bad too). Fair enough, find below. I suppose I'll queue the below into tip/perf/core for next merge window, unless you want it in a hurry? --- Subject: perf: Break deadlock involving exec_update_mutex From: Peter Zijlstra <peterz@infradead.org> Date: Fri, 28 Aug 2020 14:37:20 +0200 Syzbot reported a lock inversion involving perf. The sore point being perf holding exec_update_mutex() for a very long time, specifically across a whole bunch of filesystem ops in pmu::event_init() (uprobes) and anon_inode_getfile(). This then inverts against procfs code trying to take exec_update_mutex. Move the permission checks later, such that we need to hold the mutex over less code. 
Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- kernel/events/core.c | 46 +++++++++++++++++++++++----------------------- 1 file changed, 23 insertions(+), 23 deletions(-) --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -11832,24 +11832,6 @@ SYSCALL_DEFINE5(perf_event_open, goto err_task; } - if (task) { - err = mutex_lock_interruptible(&task->signal->exec_update_mutex); - if (err) - goto err_task; - - /* - * Preserve ptrace permission check for backwards compatibility. - * - * We must hold exec_update_mutex across this and any potential - * perf_install_in_context() call for this new event to - * serialize against exec() altering our credentials (and the - * perf_event_exit_task() that could imply). - */ - err = -EACCES; - if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) - goto err_cred; - } - if (flags & PERF_FLAG_PID_CGROUP) cgroup_fd = pid; @@ -11857,7 +11839,7 @@ SYSCALL_DEFINE5(perf_event_open, NULL, NULL, cgroup_fd); if (IS_ERR(event)) { err = PTR_ERR(event); - goto err_cred; + goto err_task; } if (is_sampling_event(event)) { @@ -11976,6 +11958,24 @@ SYSCALL_DEFINE5(perf_event_open, goto err_context; } + if (task) { + err = mutex_lock_interruptible(&task->signal->exec_update_mutex); + if (err) + goto err_file; + + /* + * Preserve ptrace permission check for backwards compatibility. + * + * We must hold exec_update_mutex across this and any potential + * perf_install_in_context() call for this new event to + * serialize against exec() altering our credentials (and the + * perf_event_exit_task() that could imply). 
+ */ + err = -EACCES; + if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) + goto err_cred; + } + if (move_group) { gctx = __perf_event_ctx_lock_double(group_leader, ctx); @@ -12151,7 +12151,10 @@ SYSCALL_DEFINE5(perf_event_open, if (move_group) perf_event_ctx_unlock(group_leader, gctx); mutex_unlock(&ctx->mutex); -/* err_file: */ +err_cred: + if (task) + mutex_unlock(&task->signal->exec_update_mutex); +err_file: fput(event_file); err_context: perf_unpin_context(ctx); @@ -12163,9 +12166,6 @@ SYSCALL_DEFINE5(perf_event_open, */ if (!event_file) free_event(event); -err_cred: - if (task) - mutex_unlock(&task->signal->exec_update_mutex); err_task: if (task) put_task_struct(task); ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] perf: Break deadlock involving exec_update_mutex 2020-12-08 8:34 ` [PATCH] perf: Break deadlock involving exec_update_mutex Peter Zijlstra @ 2020-12-08 18:37 ` Linus Torvalds 2020-12-10 18:38 ` Davidlohr Bueso 1 sibling, 0 replies; 46+ messages in thread From: Linus Torvalds @ 2020-12-08 18:37 UTC (permalink / raw) To: Peter Zijlstra Cc: Bernd Edlinger, Eric W. Biederman, Linux Kernel Mailing List, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long, Davidlohr Bueso On Tue, Dec 8, 2020 at 12:34 AM Peter Zijlstra <peterz@infradead.org> wrote: > > I suppose I'll queue the below into tip/perf/core for next merge window, > unless you want it in a hurry? LGTM, and there's no hurry. This is not a new thing, and I don't think it has been a problem in practice. Linus ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] perf: Break deadlock involving exec_update_mutex 2020-12-08 8:34 ` [PATCH] perf: Break deadlock involving exec_update_mutex Peter Zijlstra 2020-12-08 18:37 ` Linus Torvalds @ 2020-12-10 18:38 ` Davidlohr Bueso 2020-12-10 19:40 ` Eric W. Biederman 1 sibling, 1 reply; 46+ messages in thread From: Davidlohr Bueso @ 2020-12-10 18:38 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, Bernd Edlinger, Eric W. Biederman, Linux Kernel Mailing List, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long On Tue, 08 Dec 2020, Peter Zijlstra wrote: >I suppose I'll queue the below into tip/perf/core for next merge window, >unless you want it in a hurry? I'm thinking we'd still want Eric's series on top of this for the reader benefits of the lock. Thanks, Davidlohr ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH] perf: Break deadlock involving exec_update_mutex 2020-12-10 18:38 ` Davidlohr Bueso @ 2020-12-10 19:40 ` Eric W. Biederman 0 siblings, 0 replies; 46+ messages in thread From: Eric W. Biederman @ 2020-12-10 19:40 UTC (permalink / raw) To: Davidlohr Bueso Cc: Peter Zijlstra, Linus Torvalds, Bernd Edlinger, Linux Kernel Mailing List, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long Davidlohr Bueso <dave@stgolabs.net> writes: > On Tue, 08 Dec 2020, Peter Zijlstra wrote: > >>I suppose I'll queue the below into tip/perf/core for next merge window, >>unless you want it in a hurry? > > I'm thinking we'd still want Eric's series on top of this for the reader > benefits of the lock. We will see shortly what happens when the various branches all make it into linux-next. The two changes don't conflict in principle so it should not be a problem. Eric ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-04 20:10 ` Linus Torvalds 2020-12-04 20:30 ` Bernd Edlinger @ 2020-12-05 17:43 ` Eric W. Biederman 1 sibling, 0 replies; 46+ messages in thread From: Eric W. Biederman @ 2020-12-05 17:43 UTC (permalink / raw) To: Linus Torvalds Cc: Bernd Edlinger, Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long, Davidlohr Bueso Linus Torvalds <torvalds@linux-foundation.org> writes: > On Fri, Dec 4, 2020 at 11:35 AM Eric W. Biederman <ebiederm@xmission.com> wrote: >> >> From a deadlock perspective the change is strictly better than what we >> have today. The readers will no longer block on each other. > > No, agreed, it's better regardless. > >> For the specific case that syzbot reported it is readers who were >> blocking on each other so that specific case if fixed. > > So the thing is, a reader can still block another reader if a writer > comes in between them. Which is why I was thinking that the same > deadlock could happen if somebody does an execve at just the right > point. >> On the write side of exec_update_lock we have: >> >> cred_guard_mutex -> exec_update_lock >> >> Which means that to get an ABBA deadlock cred_guard_mutex would need to >> be involved > > No, see above: you can get a deadlock with > > - first reader gets exec_update_lock > > - writer on exec_update_lock blocks on first reader (this is exec) > > - second reader of exec_update_lock now blocks on the writer. > > So if that second reader holds something that the first one wants to > get (or is the same thread as the first one), you have a deadlock: the > first reader will never make any progress, will never release the > lock, and the writer will never get it, and the second reader will > forever wait for the writer that is ahead of it in the queue. Oh. 
I see what you are saying. I knew the writer had to be involved and block, but if the reader comes first all that has to be added to the situation is that an exec happens and attempts to take the lock. Any reader will block the writer. For some reason I was blind to the fact that a reader could block the writer, and I was looking for anything else that might block the writer. > And that deadlock looks very much like what syzcaller detected, in > exactly that scenario: > > Chain exists of: > &sig->exec_update_mutex --> sb_writers#4 --> &p->lock > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(&p->lock); > lock(sb_writers#4); > lock(&p->lock); > lock(&sig->exec_update_mutex); > > *** DEADLOCK *** > > No? The only thing that is missing is that writer that causes the > exec_update_mutex readers to block each other - exactly like they did > when it was a mutex. > > But I may be missing something entirely obvious that keeps this from happening. I don't think so. It does look like my change makes it one step more difficult to cause the deadlock, but the deadlock can still happen. :-( Making it a read/writer lock both makes it more difficult to cause a deadlock and makes a lot of sense from a documentation standpoint so I still plan on merging the changes after the weekend. It looks like we do need to have a close look at perf_event_open and everything associated with it. Eric ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-04 16:08 ` Bernd Edlinger 2020-12-04 17:21 ` Linus Torvalds @ 2020-12-04 17:39 ` Eric W. Biederman 1 sibling, 0 replies; 46+ messages in thread From: Eric W. Biederman @ 2020-12-04 17:39 UTC (permalink / raw) To: Bernd Edlinger Cc: linux-kernel, Linus Torvalds, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo, Waiman Long, Davidlohr Bueso Bernd Edlinger <bernd.edlinger@hotmail.de> writes: > Hi Eric, > > I think I remembered from a previous discussion about this topic, > that it was unclear if the rw_semaphores are working the same > in RT-Linux. Will this fix work in RT as well? The locks should work close enough to the same that correct code is also correct code on RT-linux. If not it is an RT-linux bug. An rw_semaphore may be less than optimal on RT-linux. I do remember that mutexes are preferred. But this change is more about correctness than anything else. > On 12/3/20 9:12 PM, Eric W. Biederman wrote: >> --- a/kernel/kcmp.c >> +++ b/kernel/kcmp.c >> @@ -70,25 +70,25 @@ get_file_raw_ptr(struct task_struct *task, unsigned int idx) >> return file; >> } >> >> -static void kcmp_unlock(struct mutex *m1, struct mutex *m2) >> +static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2) >> { >> - if (likely(m2 != m1)) >> - mutex_unlock(m2); >> - mutex_unlock(m1); >> + if (likely(l2 != l1)) > > is this still necessary ? Yes. Both pids could be threads of the same process or even the same value so yes this is definitely necessary. rw_semaphores don't nest on the same cpu. 
> >> + up_read(l2); >> + up_read(l1); >> } >> >> -static int kcmp_lock(struct mutex *m1, struct mutex *m2) >> +static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2) >> { >> int err; >> >> - if (m2 > m1) >> - swap(m1, m2); >> + if (l2 > l1) >> + swap(l1, l2); > > and this is probably also no longer necessary? I think lockdep needs this, so it can be certain the same rw_semaphore is not nesting on the cpu. Otherwise we will have inconsistencies about which is the nested lock. It won't matter in practice, but I am not certain lockdep knows enough to tell the difference. If anything removing the swap is a candidate for a follow-up patch where it can be considered separately from other concerns. For this patch keeping the logic unchanged makes it trivial to verify that the conversion from one lock to another is correct. >> >> - err = mutex_lock_killable(m1); >> - if (!err && likely(m1 != m2)) { >> - err = mutex_lock_killable_nested(m2, SINGLE_DEPTH_NESTING); >> + err = down_read_killable(l1); >> + if (!err && likely(l1 != l2)) { > > and this can now be unconditionally, right? Nope. The two locks can be the same lock, and they don't nest on a single cpu. I tested and verified that lockdep complains bitterly if down_read_killable_nested is replaced with a simple down_read_killable. >> + err = down_read_killable_nested(l2, SINGLE_DEPTH_NESTING); >> if (err) >> - mutex_unlock(m1); >> + up_read(l1); >> } >> >> return err; >> @@ -156,8 +156,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type, >> /* >> * One should have enough rights to inspect task details. 
>> */ >> - ret = kcmp_lock(&task1->signal->exec_update_mutex, >> - &task2->signal->exec_update_mutex); >> + ret = kcmp_lock(&task1->signal->exec_update_lock, >> + &task2->signal->exec_update_lock); >> if (ret) >> goto err; >> if (!ptrace_may_access(task1, PTRACE_MODE_READ_REALCREDS) || >> @@ -212,8 +212,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type, >> } >> >> err_unlock: >> - kcmp_unlock(&task1->signal->exec_update_mutex, >> - &task2->signal->exec_update_mutex); >> + kcmp_unlock(&task1->signal->exec_update_lock, >> + &task2->signal->exec_update_lock); >> err: >> put_task_struct(task1); >> put_task_struct(task2); > > > Thanks > Bernd. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-03 20:09 [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman ` (2 preceding siblings ...) 2020-12-03 20:12 ` [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore Eric W. Biederman @ 2020-12-03 22:42 ` Linus Torvalds 2020-12-04 1:56 ` Waiman Long 2020-12-04 4:54 ` Davidlohr Bueso 3 siblings, 2 replies; 46+ messages in thread From: Linus Torvalds @ 2020-12-03 22:42 UTC (permalink / raw) To: Eric W. Biederman, Waiman Long, Davidlohr Bueso Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo On Thu, Dec 3, 2020 at 12:10 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > The simplest and most robust solution appears to be making > exec_update_mutex a read/write lock and having everything execept for > exec take the lock for read. Looks sane to me. I'd like the locking people to look at the down_read_*() functions, even if they look simple. Adding Waiman and Davidlohr to the cc just in case they would look at that too, since they've worked on that code. Linus ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-03 22:42 ` [PATCH 0/3] " Linus Torvalds @ 2020-12-04 1:56 ` Waiman Long 2020-12-04 4:54 ` Davidlohr Bueso 1 sibling, 0 replies; 46+ messages in thread From: Waiman Long @ 2020-12-04 1:56 UTC (permalink / raw) To: Linus Torvalds, Eric W. Biederman, Davidlohr Bueso Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo On 12/3/20 5:42 PM, Linus Torvalds wrote: > On Thu, Dec 3, 2020 at 12:10 PM Eric W. Biederman <ebiederm@xmission.com> wrote: >> The simplest and most robust solution appears to be making >> exec_update_mutex a read/write lock and having everything execept for >> exec take the lock for read. > Looks sane to me. > > I'd like the locking people to look at the down_read_*() functions, > even if they look simple. Adding Waiman and Davidlohr to the cc just > in case they would look at that too, since they've worked on that > code. I have looked at patches 1 & 2 on adding down_read_killable_nested() and down_read_interruptible(). They look good to me. Cheers, Longman ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 0/3] exec: Transform exec_update_mutex into a rw_semaphore 2020-12-03 22:42 ` [PATCH 0/3] " Linus Torvalds 2020-12-04 1:56 ` Waiman Long @ 2020-12-04 4:54 ` Davidlohr Bueso 1 sibling, 0 replies; 46+ messages in thread From: Davidlohr Bueso @ 2020-12-04 4:54 UTC (permalink / raw) To: Linus Torvalds Cc: Eric W. Biederman, Waiman Long, Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar, Will Deacon, Jann Horn, Vasiliy Kulikov, Al Viro, Bernd Edlinger, Oleg Nesterov, Christopher Yeoh, Cyrill Gorcunov, Sargun Dhillon, Christian Brauner, Arnd Bergmann, Arnaldo Carvalho de Melo On Thu, 03 Dec 2020, Linus Torvalds wrote: >On Thu, Dec 3, 2020 at 12:10 PM Eric W. Biederman <ebiederm@xmission.com> wrote: >> >> The simplest and most robust solution appears to be making >> exec_update_mutex a read/write lock and having everything execept for >> exec take the lock for read. > >Looks sane to me. > >I'd like the locking people to look at the down_read_*() functions, >even if they look simple. Adding Waiman and Davidlohr to the cc just >in case they would look at that too, since they've worked on that >code. rwsem changes look good to me - and 3/3 looks much nicer than the previous alternatives to the deadlock. Acked-by: Davidlohr Bueso <dbueso@suse.de> ^ permalink raw reply [flat|nested] 46+ messages in thread