* [PATCH] Fix data race in mark_rt_mutex_waiters
@ 2023-01-20 13:55 Hernan Ponce de Leon
  2023-01-20 14:58 ` Arjan van de Ven
  2023-01-20 16:23 ` Peter Zijlstra
  0 siblings, 2 replies; 24+ messages in thread
From: Hernan Ponce de Leon @ 2023-01-20 13:55 UTC
To: peterz, mingo, will, longman, boqun.feng, akpm, arjan, tglx, joel, paulmck, stern, diogo.behrens, jonas.oberhauser
Cc: linux-kernel, Hernan Ponce de Leon, stable

From: Hernan Ponce de Leon <hernanl.leon@huawei.com>

Following the definition of data race in
tools/memory-model/linux-kernel.cat, the dartagnan tool
https://github.com/hernanponcedeleon/Dat3M
reported a race between mark_rt_mutex_waiters and
rt_mutex_cmpxchg_release.

The data race was introduced by commit 23f78d4a03c5 ("[PATCH] pi-futex:
rt mutex core"), later removed in commit d0aa7a70bf03
("futex_requeue_pi optimization") and brought back by the revert in
commit bd197234b0a6 ("Revert "futex_requeue_pi optimization"").

Cc: stable@vger.kernel.org # v2.6.18.x
Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Signed-off-by: Hernan Ponce de Leon <hernanl.leon@huawei.com>
---
 kernel/locking/rtmutex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 010cf4e6d0b8..7ed9472edd48 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -235,7 +235,7 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
 	unsigned long owner, *p = (unsigned long *) &lock->owner;
 
 	do {
-		owner = *p;
+		owner = READ_ONCE(*p);
 	} while (cmpxchg_relaxed(p, owner,
 				 owner | RT_MUTEX_HAS_WAITERS) != owner);
 
-- 
2.25.1

* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
From: Arjan van de Ven @ 2023-01-20 14:58 UTC
To: Hernan Ponce de Leon, peterz, mingo, will, longman, boqun.feng, akpm, tglx, joel, paulmck, stern, diogo.behrens, jonas.oberhauser
Cc: linux-kernel, Hernan Ponce de Leon, stable

On 1/20/2023 5:55 AM, Hernan Ponce de Leon wrote:
> From: Hernan Ponce de Leon <hernanl.leon@huawei.com>
>
> kernel/locking/rtmutex.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index 010cf4e6d0b8..7ed9472edd48 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -235,7 +235,7 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
>  	unsigned long owner, *p = (unsigned long *) &lock->owner;
>
>  	do {
> -		owner = *p;
> +		owner = READ_ONCE(*p);
>  	} while (cmpxchg_relaxed(p, owner,

I don't see how this makes any difference at all.
*p can be read a dozen times and it's fine; cmpxchg has barrier semantics for compilers afaics

* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
From: Paul E. McKenney @ 2023-01-20 15:54 UTC
To: Arjan van de Ven
Cc: Hernan Ponce de Leon, peterz, mingo, will, longman, boqun.feng, akpm, tglx, joel, stern, diogo.behrens, jonas.oberhauser, linux-kernel, Hernan Ponce de Leon, stable

On Fri, Jan 20, 2023 at 06:58:20AM -0800, Arjan van de Ven wrote:
> On 1/20/2023 5:55 AM, Hernan Ponce de Leon wrote:
> > From: Hernan Ponce de Leon <hernanl.leon@huawei.com>
> >
>
> > kernel/locking/rtmutex.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> > index 010cf4e6d0b8..7ed9472edd48 100644
> > --- a/kernel/locking/rtmutex.c
> > +++ b/kernel/locking/rtmutex.c
> > @@ -235,7 +235,7 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
> >  	unsigned long owner, *p = (unsigned long *) &lock->owner;
> >  	do {
> > -		owner = *p;
> > +		owner = READ_ONCE(*p);
> >  	} while (cmpxchg_relaxed(p, owner,
>
>
> I don't see how this makes any difference at all.
> *p can be read a dozen times and it's fine; cmpxchg has barrier semantics for compilers afaics

Doing so does suppress a KCSAN warning. You could also use data_race()
if it turns out that the volatile semantics would prevent a valuable
compiler optimization.

							Thanx, Paul

* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
From: Hernan Ponce de Leon @ 2023-01-22 15:24 UTC
To: paulmck, Arjan van de Ven
Cc: peterz, mingo, will, longman, boqun.feng, akpm, tglx, joel, stern, diogo.behrens, jonas.oberhauser, linux-kernel, Hernan Ponce de Leon, stable

On 1/20/2023 4:54 PM, Paul E. McKenney wrote:
> On Fri, Jan 20, 2023 at 06:58:20AM -0800, Arjan van de Ven wrote:
>> On 1/20/2023 5:55 AM, Hernan Ponce de Leon wrote:
>>> From: Hernan Ponce de Leon <hernanl.leon@huawei.com>
>>>
>>
>>> kernel/locking/rtmutex.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
>>> index 010cf4e6d0b8..7ed9472edd48 100644
>>> --- a/kernel/locking/rtmutex.c
>>> +++ b/kernel/locking/rtmutex.c
>>> @@ -235,7 +235,7 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
>>>  	unsigned long owner, *p = (unsigned long *) &lock->owner;
>>>  	do {
>>> -		owner = *p;
>>> +		owner = READ_ONCE(*p);
>>>  	} while (cmpxchg_relaxed(p, owner,
>>
>>
>> I don't see how this makes any difference at all.
>> *p can be read a dozen times and it's fine; cmpxchg has barrier semantics for compilers afaics
>
> Doing so does suppress a KCSAN warning. You could also use data_race()
> if it turns out that the volatile semantics would prevent a valuable
> compiler optimization.

I think the important question is "is this a harmful data race (and
needs to be fixed as proposed by the patch) or a harmless one (and we
should use data_race() to silence tools)?".

In https://lkml.org/lkml/2023/1/22/160 I describe how this data race can
affect important ordering guarantees for the rest of the code. For this
reason I consider it a harmful one. If this is not the case, I would
appreciate some feedback or pointers to resources about which races we
care about, to avoid spamming the mailing list in the future.

Hernan

* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
From: Paul E. McKenney @ 2023-01-23 16:40 UTC
To: Hernan Ponce de Leon
Cc: Arjan van de Ven, peterz, mingo, will, longman, boqun.feng, akpm, tglx, joel, stern, diogo.behrens, jonas.oberhauser, linux-kernel, Hernan Ponce de Leon, stable

On Sun, Jan 22, 2023 at 04:24:21PM +0100, Hernan Ponce de Leon wrote:
> On 1/20/2023 4:54 PM, Paul E. McKenney wrote:
> > On Fri, Jan 20, 2023 at 06:58:20AM -0800, Arjan van de Ven wrote:
> > > On 1/20/2023 5:55 AM, Hernan Ponce de Leon wrote:
> > > > From: Hernan Ponce de Leon <hernanl.leon@huawei.com>
> > > >
> > >
> > > > kernel/locking/rtmutex.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> > > > index 010cf4e6d0b8..7ed9472edd48 100644
> > > > --- a/kernel/locking/rtmutex.c
> > > > +++ b/kernel/locking/rtmutex.c
> > > > @@ -235,7 +235,7 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
> > > >  	unsigned long owner, *p = (unsigned long *) &lock->owner;
> > > >  	do {
> > > > -		owner = *p;
> > > > +		owner = READ_ONCE(*p);
> > > >  	} while (cmpxchg_relaxed(p, owner,
> > >
> > >
> > > I don't see how this makes any difference at all.
> > > *p can be read a dozen times and it's fine; cmpxchg has barrier semantics for compilers afaics
> >
> > Doing so does suppress a KCSAN warning. You could also use data_race()
> > if it turns out that the volatile semantics would prevent a valuable
> > compiler optimization.
>
> I think the important question is "is this a harmful data race (and needs to be
> fixed as proposed by the patch) or a harmless one (and we should use
> data_race() to silence tools)?".
>
> In https://lkml.org/lkml/2023/1/22/160 I describe how this data race can
> affect important ordering guarantees for the rest of the code. For this
> reason I consider it a harmful one. If this is not the case, I would
> appreciate some feedback or pointers to resources about which races we care
> about, to avoid spamming the mailing list in the future.

In the case, the value read is passed into cmpxchg_relaxed(), which
checks the value against memory. In this case, as Arjan noted, the only
compiler-and-silicon difference between data_race() and READ_ONCE()
is that use of data_race() might allow the compiler to do things like
tear the load, thus forcing the occasional spurious cmpxchg_relaxed()
failure. In contrast, LKMM (by design) throws up its hands when it sees
a data race. Something about not being eager to track the idiosyncrasies
of many compiler versions.

My approach in my own code is to use *_ONCE() unless it causes a visible
performance regression or if it confuses KCSAN. An example of the latter
can be debug code, in which case use of data_race() avoids suppressing
KCSAN warnings (and also false positives, depending).

Except that your other email seems to also be arguing that additional
ordering is required. So is https://lkml.org/lkml/2023/1/20/702 really
sufficient just by itself, or is additional ordering required?

							Thanx, Paul
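
The loop Paul describes in the message above can be modelled in userspace. What follows is a minimal sketch, not kernel code: it assumes GCC or Clang (for `__typeof__` and the `__atomic` builtins), and `HAS_WAITERS`, `READ_ONCE_SKETCH` and `mark_waiters` are hypothetical stand-ins for the kernel's `RT_MUTEX_HAS_WAITERS`, `READ_ONCE()` and `mark_rt_mutex_waiters()`. It illustrates why the value loaded into `owner` is only a guess: the compare-and-swap stores `owner | HAS_WAITERS` only if memory still holds `owner`, and otherwise the loop reads again.

```c
/* Hypothetical stand-in for the kernel's RT_MUTEX_HAS_WAITERS flag bit. */
#define HAS_WAITERS 1UL

/*
 * Simplified userspace stand-in for the kernel's READ_ONCE(): a volatile
 * access, which compilers in practice emit as one load for an aligned word.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Set HAS_WAITERS in *p using a relaxed compare-and-swap loop. */
static void mark_waiters(unsigned long *p)
{
	unsigned long owner;

	do {
		/* Marked read: the guess later checked against memory. */
		owner = READ_ONCE_SKETCH(*p);
		/* Retry whenever *p no longer equals the guess. */
	} while (!__atomic_compare_exchange_n(p, &owner, owner | HAS_WAITERS,
					      0, __ATOMIC_RELAXED,
					      __ATOMIC_RELAXED));
}
```

Calling `mark_waiters()` on a word holding 0x1000 leaves 0x1001 behind, and a second call changes nothing, because the new value is always the old value with one extra bit set.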

* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
From: Alan Stern @ 2023-01-23 17:34 UTC
To: Paul E. McKenney
Cc: Hernan Ponce de Leon, Arjan van de Ven, peterz, mingo, will, longman, boqun.feng, akpm, tglx, joel, diogo.behrens, jonas.oberhauser, linux-kernel, Hernan Ponce de Leon, stable

On Mon, Jan 23, 2023 at 08:40:14AM -0800, Paul E. McKenney wrote:
> In the case, the value read is passed into cmpxchg_relaxed(), which
> checks the value against memory. In this case, as Arjan noted, the only
> compiler-and-silicon difference between data_race() and READ_ONCE()
> is that use of data_race() might allow the compiler to do things like
> tear the load, thus forcing the occasional spurious cmpxchg_relaxed()
> failure.

Is it possible in theory for a torn load to cause a spurious
cmpxchg_relaxed() success? Or would that not matter here?

Alan

* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
From: Paul E. McKenney @ 2023-01-23 17:48 UTC
To: Alan Stern
Cc: Hernan Ponce de Leon, Arjan van de Ven, peterz, mingo, will, longman, boqun.feng, akpm, tglx, joel, diogo.behrens, jonas.oberhauser, linux-kernel, Hernan Ponce de Leon, stable

On Mon, Jan 23, 2023 at 12:34:37PM -0500, Alan Stern wrote:
> On Mon, Jan 23, 2023 at 08:40:14AM -0800, Paul E. McKenney wrote:
> > In the case, the value read is passed into cmpxchg_relaxed(), which
> > checks the value against memory. In this case, as Arjan noted, the only
> > compiler-and-silicon difference between data_race() and READ_ONCE()
> > is that use of data_race() might allow the compiler to do things like
> > tear the load, thus forcing the occasional spurious cmpxchg_relaxed()
> > failure.
>
> Is it possible in theory for a torn load to cause a spurious
> cmpxchg_relaxed() success? Or would that not matter here?

In this case, the new value is the old value with an additional bit set.
There is no check for that bit being clear, so I am having a hard time
seeing a difference. Then again, much might depend on the ordering that
Hernan is referring to.

And Peter Zijlstra's suggestion of set_bit() is quite attractive, give or
take the casting issues called out by David Laight.

							Thanx, Paul

* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
From: Jonas Oberhauser @ 2023-01-23 20:02 UTC
To: Alan Stern, Paul E. McKenney
Cc: Hernan Ponce de Leon, Arjan van de Ven, peterz, mingo, will, longman, boqun.feng, akpm, tglx, joel, diogo.behrens, jonas.oberhauser, linux-kernel, Hernan Ponce de Leon, stable

On 1/23/2023 6:34 PM, Alan Stern wrote:
> On Mon, Jan 23, 2023 at 08:40:14AM -0800, Paul E. McKenney wrote:
>> In the case, the value read is passed into cmpxchg_relaxed(), which
>> checks the value against memory. In this case, as Arjan noted, the only
>> compiler-and-silicon difference between data_race() and READ_ONCE()
>> is that use of data_race() might allow the compiler to do things like
>> tear the load, thus forcing the occasional spurious cmpxchg_relaxed()
>> failure.
> Is it possible in theory for a torn load to cause a spurious
> cmpxchg_relaxed() success? Or would that not matter here?

Note that in this example there are no memory accesses between the read
and the CAS. So if the cmpxchg succeeds, what you non-atomically read
must be exactly the value that is read by the cmpxchg, and you could
pretend that the torn read happened at the same time as the cmpxchg.

This "pretend" part requires that there are no other events in the
middle, otherwise you could be violating some ordering constraint
between those events and the torn reads.

Otherwise you might get some issues. E.g., you might read a sequence
count of 259 from reading the lower half when the count is 3 and the
upper half when the count is 256, and then do the CAS when the sequence
count is 259, so if you had two peeks at sequence-count-protected data
between that read and the CAS you might see different states despite the
CAS succeeding.

have fun,
jonas
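
The arithmetic behind Jonas's 259 example can be checked with a small model. This is only an illustrative sketch: it assumes a 16-bit counter whose "halves" are its two bytes, and the helper name `torn_load` and its parameters are hypothetical. It does not perform a real torn load; it just combines the low byte of one observed value with the high byte of another, which is what a load split into two narrower loads could return.

```c
#include <stdint.h>

/*
 * Model a load of a 16-bit counter that has been split into two
 * byte-sized loads: the low byte comes from the value visible at one
 * moment, the high byte from the value visible at a later moment.
 */
static uint16_t torn_load(uint16_t seen_for_low, uint16_t seen_for_high)
{
	return (uint16_t)((seen_for_high & 0xFF00u) | (seen_for_low & 0x00FFu));
}
```

With the low byte taken while the count is 3 and the high byte taken while it is 256, `torn_load(3, 256)` yields 259, a value the counter did not hold at either moment. A later CAS against 259 can still succeed once the counter really reaches 259, which is why accesses placed between the torn read and the CAS could observe inconsistent states.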

* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
From: Hernan Ponce de Leon @ 2023-01-24 14:57 UTC
To: paulmck
Cc: Arjan van de Ven, peterz, mingo, will, longman, boqun.feng, akpm, tglx, joel, stern, diogo.behrens, jonas.oberhauser, linux-kernel, Hernan Ponce de Leon, stable

On 1/23/2023 5:40 PM, Paul E. McKenney wrote:
> On Sun, Jan 22, 2023 at 04:24:21PM +0100, Hernan Ponce de Leon wrote:
>> On 1/20/2023 4:54 PM, Paul E. McKenney wrote:
>>> On Fri, Jan 20, 2023 at 06:58:20AM -0800, Arjan van de Ven wrote:
>>>> On 1/20/2023 5:55 AM, Hernan Ponce de Leon wrote:
>>>>> From: Hernan Ponce de Leon <hernanl.leon@huawei.com>
>>>>>
>>>>
>>>>> kernel/locking/rtmutex.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
>>>>> index 010cf4e6d0b8..7ed9472edd48 100644
>>>>> --- a/kernel/locking/rtmutex.c
>>>>> +++ b/kernel/locking/rtmutex.c
>>>>> @@ -235,7 +235,7 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
>>>>>  	unsigned long owner, *p = (unsigned long *) &lock->owner;
>>>>>  	do {
>>>>> -		owner = *p;
>>>>> +		owner = READ_ONCE(*p);
>>>>>  	} while (cmpxchg_relaxed(p, owner,
>>>>
>>>>
>>>> I don't see how this makes any difference at all.
>>>> *p can be read a dozen times and it's fine; cmpxchg has barrier semantics for compilers afaics
>>>
>>> Doing so does suppress a KCSAN warning. You could also use data_race()
>>> if it turns out that the volatile semantics would prevent a valuable
>>> compiler optimization.
>>
>> I think the important question is "is this a harmful data race (and needs to be
>> fixed as proposed by the patch) or a harmless one (and we should use
>> data_race() to silence tools)?".
>>
>> In https://lkml.org/lkml/2023/1/22/160 I describe how this data race can
>> affect important ordering guarantees for the rest of the code. For this
>> reason I consider it a harmful one. If this is not the case, I would
>> appreciate some feedback or pointers to resources about which races we care
>> about, to avoid spamming the mailing list in the future.
>
> In the case, the value read is passed into cmpxchg_relaxed(), which
> checks the value against memory. In this case, as Arjan noted, the only
> compiler-and-silicon difference between data_race() and READ_ONCE()
> is that use of data_race() might allow the compiler to do things like
> tear the load, thus forcing the occasional spurious cmpxchg_relaxed()
> failure. In contrast, LKMM (by design) throws up its hands when it sees
> a data race. Something about not being eager to track the idiosyncrasies
> of many compiler versions.
>
> My approach in my own code is to use *_ONCE() unless it causes a visible
> performance regression or if it confuses KCSAN. An example of the latter
> can be debug code, in which case use of data_race() avoids suppressing
> KCSAN warnings (and also false positives, depending).

I understand that *_ONCE() might avoid some compiler optimization and
reduce performance in the general case. However, if I understand your
first paragraph correctly, in this particular case data_race() could
allow the CAS to fail more often, resulting in more spinning iterations
and degraded performance. Am I right?

>
> Except that your other email seems to also be arguing that additional
> ordering is required. So is https://lkml.org/lkml/2023/1/20/702 really
> sufficient just by itself, or is additional ordering required?

I do not claim that we need to mark the read to add the ordering that is
needed for correctness (mutual exclusion). What I claim in this patch is
that there is a data race, and since it can affect ordering constraints
in subtle ways, I consider it harmful and thus I want to fix it.

What I explain in the other email is that if we fix the data race,
either the fence or the acquire store might be relaxed (because marking
the read gives us some extra ordering guarantees). If the race is not
fixed, both the fence and the acquire are needed according to LKMM. The
situation is different wrt hardware models. In that case the tool cannot
find any violation even if we don't fix the race and we relax the store
/ remove the fence.

Hernan

* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
From: Waiman Long @ 2023-01-24 15:42 UTC
To: Hernan Ponce de Leon, paulmck
Cc: Arjan van de Ven, peterz, mingo, will, boqun.feng, akpm, tglx, joel, stern, diogo.behrens, jonas.oberhauser, linux-kernel, Hernan Ponce de Leon, stable

On 1/24/23 09:57, Hernan Ponce de Leon wrote:
>> In the case, the value read is passed into cmpxchg_relaxed(), which
>> checks the value against memory. In this case, as Arjan noted, the only
>> compiler-and-silicon difference between data_race() and READ_ONCE()
>> is that use of data_race() might allow the compiler to do things like
>> tear the load, thus forcing the occasional spurious cmpxchg_relaxed()
>> failure. In contrast, LKMM (by design) throws up its hands when it sees
>> a data race. Something about not being eager to track the
>> idiosyncrasies
>> of many compiler versions.
>>
>> My approach in my own code is to use *_ONCE() unless it causes a visible
>> performance regression or if it confuses KCSAN. An example of the
>> latter
>> can be debug code, in which case use of data_race() avoids suppressing
>> KCSAN warnings (and also false positives, depending).
>
> I understand that *_ONCE() might avoid some compiler optimization and
> reduce performance in the general case. However, if I understand your
> first paragraph correctly, in this particular case data_race() could
> allow the CAS to fail more often, resulting in more spinning
> iterations and degraded performance. Am I right?
>
>>
>> Except that your other email seems to also be arguing that additional
>> ordering is required. So is https://lkml.org/lkml/2023/1/20/702 really
>> sufficient just by itself, or is additional ordering required?
>
> I do not claim that we need to mark the read to add the ordering that
> is needed for correctness (mutual exclusion). What I claim in this
> patch is that there is a data race, and since it can affect ordering
> constraints in subtle ways, I consider it harmful and thus I want to
> fix it.
>
> What I explain in the other email is that if we fix the data race,
> either the fence or the acquire store might be relaxed (because
> marking the read gives us some extra ordering guarantees). If the race
> is not fixed, both the fence and the acquire are needed according to
> LKMM. The situation is different wrt hardware models. In that case the
> tool cannot find any violation even if we don't fix the race and we
> relax the store / remove the fence.

I would suggest to do it as suggested by PeterZ. Instead of set_bit(),
however, it is probably better to use atomic_long_or() like

atomic_long_or_relaxed(RT_MUTEX_HAS_WAITERS, (atomic_long_t *)&lock->owner)

The mutex code stores the lock owner as atomic_long_t. So it is natural
to treat &lock->owner as atomic_long_t here too.

Cheers,
Longman

* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
From: Peter Zijlstra @ 2023-01-24 15:52 UTC
To: Waiman Long
Cc: Hernan Ponce de Leon, paulmck, Arjan van de Ven, mingo, will, boqun.feng, akpm, tglx, joel, stern, diogo.behrens, jonas.oberhauser, linux-kernel, Hernan Ponce de Leon, stable

On Tue, Jan 24, 2023 at 10:42:24AM -0500, Waiman Long wrote:

> I would suggest to do it as suggested by PeterZ. Instead of set_bit(),
> however, it is probably better to use atomic_long_or() like
>
> atomic_long_or_relaxed(RT_MUTEX_HAS_WAITERS, (atomic_long_t *)&lock->owner)

That function doesn't exist, atomic_long_or() is implicitly relaxed for
not returning a value.

* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
From: Waiman Long @ 2023-01-24 16:04 UTC
To: Peter Zijlstra
Cc: Hernan Ponce de Leon, paulmck, Arjan van de Ven, mingo, will, boqun.feng, akpm, tglx, joel, stern, diogo.behrens, jonas.oberhauser, linux-kernel, Hernan Ponce de Leon, stable

On 1/24/23 10:52, Peter Zijlstra wrote:
> On Tue, Jan 24, 2023 at 10:42:24AM -0500, Waiman Long wrote:
>
>> I would suggest to do it as suggested by PeterZ. Instead of set_bit(),
>> however, it is probably better to use atomic_long_or() like
>>
>> atomic_long_or_relaxed(RT_MUTEX_HAS_WAITERS, (atomic_long_t *)&lock->owner)
> That function doesn't exist, atomic_long_or() is implicitly relaxed for
> not returning a value.
>
You are right. atomic_long_or() doesn't have variants like some others.

Cheers,
Longman
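
The single-operation alternative discussed above can be sketched in userspace with C11 atomics. This is an analogue under stated assumptions, not the kernel API: `atomic_fetch_or_explicit()` with `memory_order_relaxed` stands in for the kernel's `atomic_long_or()`, `HAS_WAITERS` stands in for `RT_MUTEX_HAS_WAITERS`, and the function name `mark_waiters_or` is hypothetical. Unlike `atomic_long_or()`, the C11 call returns the previous value, which the sketch discards.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's RT_MUTEX_HAS_WAITERS flag bit. */
#define HAS_WAITERS 1UL

/*
 * Set HAS_WAITERS with one relaxed atomic read-modify-write. There is no
 * separate plain load of the owner word, so nothing is left to race with
 * a concurrent compare-and-swap on the same word.
 */
static void mark_waiters_or(atomic_ulong *owner)
{
	(void)atomic_fetch_or_explicit(owner, HAS_WAITERS,
				       memory_order_relaxed);
}
```

Applied to a word holding 0x2000 this leaves 0x2001, and repeating the call is a no-op. Whether the barrier that follows the loop in the kernel function is still needed is a separate question, which the thread goes on to discuss.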

* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
From: Hernan Ponce de Leon @ 2023-01-26 9:42 UTC
To: Waiman Long, Peter Zijlstra
Cc: paulmck, Arjan van de Ven, mingo, will, boqun.feng, akpm, tglx, joel, stern, diogo.behrens, jonas.oberhauser, linux-kernel, Hernan Ponce de Leon, stable, Jonas Oberhauser

On 1/24/2023 5:04 PM, Waiman Long wrote:
>
> On 1/24/23 10:52, Peter Zijlstra wrote:
>> On Tue, Jan 24, 2023 at 10:42:24AM -0500, Waiman Long wrote:
>>
>>> I would suggest to do it as suggested by PeterZ. Instead of set_bit(),
>>> however, it is probably better to use atomic_long_or() like
>>>
>>> atomic_long_or_relaxed(RT_MUTEX_HAS_WAITERS, (atomic_long_t
>>> *)&lock->owner)
>> That function doesn't exist, atomic_long_or() is implicitly relaxed for
>> not returning a value.
>>
> You are right. atomic_long_or() doesn't have variants like some others.
>
> Cheers,
> Longman
>

When you say "replace the whole of that function", do you mean "barrier
included"? I argue in the other email that I think this should not
affect correctness (at least not obviously), but removing the barrier is
doing more than just fixing the data race as this patch suggests.

Hernan

* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
From: Peter Zijlstra @ 2023-01-26 12:20 UTC
To: Hernan Ponce de Leon
Cc: Waiman Long, paulmck, Arjan van de Ven, mingo, will, boqun.feng, akpm, tglx, joel, stern, diogo.behrens, jonas.oberhauser, linux-kernel, Hernan Ponce de Leon, stable, Jonas Oberhauser

On Thu, Jan 26, 2023 at 10:42:07AM +0100, Hernan Ponce de Leon wrote:
> On 1/24/2023 5:04 PM, Waiman Long wrote:
> >
> > On 1/24/23 10:52, Peter Zijlstra wrote:
> > > On Tue, Jan 24, 2023 at 10:42:24AM -0500, Waiman Long wrote:
> > >
> > > > I would suggest to do it as suggested by PeterZ. Instead of set_bit(),
> > > > however, it is probably better to use atomic_long_or() like
> > > >
> > > > atomic_long_or_relaxed(RT_MUTEX_HAS_WAITERS, (atomic_long_t
> > > > *)&lock->owner)
> > > That function doesn't exist, atomic_long_or() is implicitly relaxed for
> > > not returning a value.
> > >
> > You are right. atomic_long_or() doesn't have variants like some others.
> >
> > Cheers,
> > Longman
> >
>
> When you say "replace the whole of that function", do you mean "barrier
> included"? I argue in the other email that I think this should not affect
> correctness (at least not obviously), but removing the barrier is doing more
> than just fixing the data race as this patch suggests.

Well, set_bit() implies smp_mb(), atomic_long_or() does not and would
need to retain the barrier.

That said, the comments are all batshit. The top comment states relaxed
ordering is sufficient since holding lock, the comment with the barrier
utterly fails to explain what it's ordering against.

So all that would need to be updated as well.

That said, looking at 1c0908d8e441 I'm not at all sure we need that
barrier. Even in the try_to_take_rt_mutex(.waiter=NULL) case, where we
skip over the task->pi_lock region, rt_mutex_set_owner(.acquire=true)
will be ACQUIRE.

And in case of rt_mutex_owner(), we fail the trylock (return with 0) and
a failed trylock does not imply any ordering.

So why are we having this barrier?

* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
From: Peter Zijlstra @ 2023-01-26 14:20 UTC
To: Hernan Ponce de Leon
Cc: Waiman Long, paulmck, Arjan van de Ven, mingo, will, boqun.feng, akpm, tglx, joel, stern, diogo.behrens, jonas.oberhauser, linux-kernel, Hernan Ponce de Leon, stable, Jonas Oberhauser

On Thu, Jan 26, 2023 at 01:20:07PM +0100, Peter Zijlstra wrote:

> Well, set_bit() implies smp_mb(), atomic_long_or() does not and would
> need to retain the barrier.

set_bit() does not, must've had a brain-fart or so.
* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
  2023-01-26 12:20 ` Peter Zijlstra
  2023-01-26 14:20 ` Peter Zijlstra
@ 2023-01-26 21:07 ` Hernan Ponce de Leon
  2023-01-26 22:10 ` David Laight
  1 sibling, 1 reply; 24+ messages in thread
From: Hernan Ponce de Leon @ 2023-01-26 21:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, paulmck, Arjan van de Ven, mingo, will, boqun.feng,
	akpm, tglx, joel, stern, diogo.behrens, jonas.oberhauser,
	linux-kernel, Hernan Ponce de Leon, stable, Jonas Oberhauser

On 1/26/2023 1:20 PM, Peter Zijlstra wrote:
> On Thu, Jan 26, 2023 at 10:42:07AM +0100, Hernan Ponce de Leon wrote:
>> On 1/24/2023 5:04 PM, Waiman Long wrote:
>>>
>>> On 1/24/23 10:52, Peter Zijlstra wrote:
>>>> On Tue, Jan 24, 2023 at 10:42:24AM -0500, Waiman Long wrote:
>>>>
>>>>> I would suggest to do it as suggested by PeterZ. Instead of set_bit(),
>>>>> however, it is probably better to use atomic_long_or() like
>>>>>
>>>>> atomic_long_or_relaxed(RT_MUTEX_HAS_WAITERS, (atomic_long_t
>>>>> *)&lock->owner)
>>>> That function doesn't exist, atomic_long_or() is implicitly relaxed for
>>>> not returning a value.
>>>>
>>> You are right. atomic_long_or() doesn't have variants like some others.
>>>
>>> Cheers,
>>> Longman
>>>
>>
>> When you say "replace the whole of that function", do you mean "barrier
>> included"? I argue in the other email that I think this should not affect
>> correctness (at least not obviously), but removing the barrier is doing more
>> than just fixing the data race as this patch suggests.
>
> Well, set_bit() implies smp_mb(), atomic_long_or() does not and would
> need to retain the barrier.
>
> That said, the comments are all batshit. The top comment states relaxed
> ordering is sufficient since holding lock, the comment with the barrier
> utterly fails to explain what it's ordering against.

I think the top comment became obsolete after 1c0908d8e441 and this just
went unnoticed. I agree the comment with the barrier does not say much,
and getting more detailed information was one of the goals of my other
email.

>
> So all that would need to be updated as well.
>
> That said, looking at 1c0908d8e441 I'm not at all sure we need that
> barrier. Even in the try_to_take_rt_mutex(.waiter=NULL) case, where we
> skip over the task->pi_lock region, rt_mutex_set_owner(.acquire=true)
> will be ACQUIRE.

This sentence states clearly the idea I was trying to express in my other
email about why the barrier is not necessary. I think the same argument
holds if we keep the barrier and relax the store in rt_mutex_set_owner as
suggested by Boqun (see patch below).

>
> And in case of rt_mutex_owner(), we fail the trylock (return with 0) and
> a failed trylock does not imply any ordering.
>
> So why are we having this barrier?

I reran the verification with the following patch (I am aware the
comments still need to be updated, this was just to be able to run the
tool) and the tool still finds no violation.

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 010cf4e6d0b8..c62e409906a2 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -107,7 +107,7 @@ rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
 	 * lock->wait_lock is held but explicit acquire semantics are needed
 	 * for a new lock owner so WRITE_ONCE is insufficient.
 	 */
-	xchg_acquire(&lock->owner, rt_mutex_owner_encode(lock, owner));
+	WRITE_ONCE(lock->owner, rt_mutex_owner_encode(lock, owner));
 }

 static __always_inline void rt_mutex_clear_owner(struct rt_mutex_base *lock)
@@ -232,12 +232,7 @@ static __always_inline bool rt_mutex_cmpxchg_release(struct rt_mutex_base *lock,
  */
 static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
 {
-	unsigned long owner, *p = (unsigned long *) &lock->owner;
-
-	do {
-		owner = *p;
-	} while (cmpxchg_relaxed(p, owner,
-				 owner | RT_MUTEX_HAS_WAITERS) != owner);
+	atomic_long_or(RT_MUTEX_HAS_WAITERS, (atomic_long_t *)&lock->owner);

 	/*
 	 * The cmpxchg loop above is relaxed to avoid back-to-back ACQUIRE
--

^ permalink raw reply related [flat|nested] 24+ messages in thread
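For readers comparing the two forms in the patch above, here is a minimal userspace sketch in C11, not kernel code. It assumes relaxed <stdatomic.h> operations as rough stand-ins for cmpxchg_relaxed() and atomic_long_or(). The names HAS_WAITERS, mark_waiters_cas() and mark_waiters_or() are hypothetical, and the C11 memory model is not the LKMM, so the sketch only illustrates that both forms end up setting the same bit.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's RT_MUTEX_HAS_WAITERS (bit 0). */
#define HAS_WAITERS ((uintptr_t)1)

/* CAS-loop form, mirroring the loop that the patch removes. */
static inline void mark_waiters_cas(_Atomic uintptr_t *owner)
{
	/* Marked (atomic, relaxed) read of the current owner word. */
	uintptr_t old = atomic_load_explicit(owner, memory_order_relaxed);

	/* On failure, 'old' is refreshed with the value found in memory. */
	while (!atomic_compare_exchange_weak_explicit(owner, &old,
						      old | HAS_WAITERS,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;
}

/* Single read-modify-write form, analogous in spirit to atomic_long_or(). */
static inline void mark_waiters_or(_Atomic uintptr_t *owner)
{
	atomic_fetch_or_explicit(owner, HAS_WAITERS, memory_order_relaxed);
}
```

Neither function here includes the barrier discussed in this thread; whether one is needed after the update is exactly the open question.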
* RE: [PATCH] Fix data race in mark_rt_mutex_waiters
  2023-01-26 21:07 ` Hernan Ponce de Leon
@ 2023-01-26 22:10 ` David Laight
  2023-01-27  1:46 ` Waiman Long
  0 siblings, 1 reply; 24+ messages in thread
From: David Laight @ 2023-01-26 22:10 UTC (permalink / raw)
  To: 'Hernan Ponce de Leon', Peter Zijlstra
  Cc: Waiman Long, paulmck, Arjan van de Ven, mingo, will, boqun.feng,
	akpm, tglx, joel, stern, diogo.behrens, jonas.oberhauser,
	linux-kernel, Hernan Ponce de Leon, stable, Jonas Oberhauser

From: Hernan Ponce de Leon
> Sent: 26 January 2023 21:07
...
>   static __always_inline void rt_mutex_clear_owner(struct rt_mutex_base
> *lock)
> @@ -232,12 +232,7 @@ static __always_inline bool
> rt_mutex_cmpxchg_release(struct rt_mutex_base *lock,
>    */
>   static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base
> *lock)
>   {
> -	unsigned long owner, *p = (unsigned long *) &lock->owner;
> -
> -	do {
> -		owner = *p;
> -	} while (cmpxchg_relaxed(p, owner,
> -				 owner | RT_MUTEX_HAS_WAITERS) != owner);
> +	atomic_long_or(RT_MUTEX_HAS_WAITERS, (atomic_long_t *)&lock->owner);

These *(int_type *)&foo accesses (quite often just plain wrong)
made me look up the definitions.

All one big accident waiting to happen...
RT_MUTEX_HAS_WAITERS is defined in a different header to the structure.
The explanatory comment is in a 3rd file.

It would all be safer if lock->owner were atomic_long_t with a comment
that it was the waiting task_struct | RT_MUTEX_HAS_WAITERS.

Given the actual definition, is rt_mutex_base_is_locked() even correct?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
  2023-01-26 22:10 ` David Laight
@ 2023-01-27  1:46 ` Waiman Long
  2023-03-01 16:32 ` lock_torture results for different patches: Antonio Paolillo
  0 siblings, 1 reply; 24+ messages in thread
From: Waiman Long @ 2023-01-27  1:46 UTC (permalink / raw)
  To: David Laight, 'Hernan Ponce de Leon', Peter Zijlstra
  Cc: paulmck, Arjan van de Ven, mingo, will, boqun.feng, akpm, tglx,
	joel, stern, diogo.behrens, jonas.oberhauser, linux-kernel,
	Hernan Ponce de Leon, stable, Jonas Oberhauser

On 1/26/23 17:10, David Laight wrote:
> From: Hernan Ponce de Leon
>> Sent: 26 January 2023 21:07
> ...
>>    static __always_inline void rt_mutex_clear_owner(struct rt_mutex_base
>> *lock)
>> @@ -232,12 +232,7 @@ static __always_inline bool
>> rt_mutex_cmpxchg_release(struct rt_mutex_base *lock,
>>     */
>>    static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base
>> *lock)
>>    {
>> -	unsigned long owner, *p = (unsigned long *) &lock->owner;
>> -
>> -	do {
>> -		owner = *p;
>> -	} while (cmpxchg_relaxed(p, owner,
>> -				 owner | RT_MUTEX_HAS_WAITERS) != owner);
>> +	atomic_long_or(RT_MUTEX_HAS_WAITERS, (atomic_long_t *)&lock->owner);
> These *(int_type *)&foo accesses (quite often just plain wrong)
> made me look up the definitions.
>
> All one big accident waiting to happen...
> RT_MUTEX_HAS_WAITERS is defined in a different header to the structure.
> The explanatory comment is in a 3rd file.
>
> It would all be safer if lock->owner were atomic_long_t with a comment
> that it was the waiting task_struct | RT_MUTEX_HAS_WAITERS.
>
> Given the actual definition is rt_mutex_base_is_locked() even correct?

It is arguable whether the lock should be considered locked if a waiter
is waiting but the lock is in an unlocked state at the moment. Mutex has
a narrower definition of locked while others have a broader one.

Cheers,
Longman

^ permalink raw reply [flat|nested] 24+ messages in thread
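To make the "narrower" versus "broader" notion of locked concrete, here is a small illustrative sketch in plain C. It assumes an owner word that packs a task pointer together with a waiters flag in bit 0, as described in this thread. All names (HAS_WAITERS, owner_task(), is_locked_broad(), is_locked_narrow()) are hypothetical, and the sketch makes no claim about which of the two the kernel's rt_mutex_base_is_locked() implements.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for RT_MUTEX_HAS_WAITERS (bit 0 of the owner word). */
#define HAS_WAITERS ((uintptr_t)1)

/* Strip the flag bit to recover the owning task pointer (0 if none). */
static inline uintptr_t owner_task(uintptr_t word)
{
	return word & ~HAS_WAITERS;
}

/* True if the waiters flag is set in the owner word. */
static inline bool has_waiters(uintptr_t word)
{
	return (word & HAS_WAITERS) != 0;
}

/* Broad definition: any non-zero word counts, even "no owner, waiters pending". */
static inline bool is_locked_broad(uintptr_t word)
{
	return word != 0;
}

/* Narrow definition: locked only if some task currently owns the lock. */
static inline bool is_locked_narrow(uintptr_t word)
{
	return owner_task(word) != 0;
}
```

The two definitions differ only for a word that has the waiters bit set and no owner, which is the transient state Waiman describes.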
* lock_torture results for different patches:
  2023-01-27  1:46 ` Waiman Long
@ 2023-03-01 16:32 ` Antonio Paolillo
  2023-03-06  7:58 ` Hernan Ponce de Leon
  2023-03-10 17:11 ` Paul E. McKenney
  0 siblings, 2 replies; 24+ messages in thread
From: Antonio Paolillo @ 2023-03-01 16:32 UTC (permalink / raw)
  To: longman
  Cc: David.Laight, akpm, arjan, boqun.feng, diogo.behrens,
	hernan.poncedeleon, hernanl.leon, joel, jonas.oberhauser,
	jonas.oberhauser, linux-kernel, mingo, paulmck, peterz, stable,
	stern, tglx, will

Dear all,

I want to provide some support to Hernan regarding performance claims.

I used lock_torture to evaluate the different proposed patches on two
different server machines:
- a Huawei TaiShan 200 (Model 2280) rack server that has 128 GB of RAM
  and 2x Kunpeng 920-4826 processors, a HiSilicon chip with 48 ARMv8.2
  64-bit cores totaling 96 cores (no SMT) [1, 2],
  denoted as taishan200-96c;
- a GIGABYTE R182-Z91-00 rack server that has 128 GB of RAM and 2x
  EPYC 7352 processors, an AMD chip with 24 x86_64 cores, totaling 48
  cores (96 CPUs when counting hyperthreading) [3, 4],
  denoted as gigabyte-96c.

I ran the evaluation on an Ubuntu 22.04 distro, with custom kernels based
on v6.2-rc6 (6d796c50f84ca79f1722bb131799e5a5710c4700).
The different kernels are combinations of patches:
- (0) Stock kernel;
- (1) With relaxed set owner barrier (as discussed in [5] and questioned
  by Peter, the barrier seems not to be needed);
- (2) With READ_ONCE(), as originally proposed in this thread;
- (3) With atomic_long_or() as proposed by Peter;
- (4) With relaxed set owner barrier and READ_ONCE();
- (5) With relaxed set owner barrier and atomic_long_or().

I ran lock_torture several times, exploring the following parameter
space:
- torture_type="rtmutex_lock",
- nwriters_stress=[1, 2, 3, 4, 8, 16, 32, 64, 95],
- stat_interval=4,
- stutter=0,
- shuffle_interval=0.
For each value of "nwriters_stress", I ran the configuration 5 times.

By feeding the lock_torture kthread pids to "taskset -p", I overrode
the scheduling such that the distribution of kthreads to CPUs is fixed.
I also disabled the "irq balance" and "numa balance" daemons, fixed the
frequency to 1.5GHz using the "userspace" cpufreq governor and isolated
all the cores used (using isolcpus=1-95 at boot-time) to avoid any
source of interference.

As a warm-up phase, I ignored the first reported results and only
considered the last 60 seconds of execution (after all kthreads
migrated to their final CPU).
The reported throughput is computed by dividing the reported number of
operations by the duration of the measurement for each data point
(60 seconds), so higher is better.

Here follow the results on taishan200-96c (the 'rel' column is the mean
relative to the mean of the stock kernel, in percent, and each mean is
the average over 5 independent runs):

Kernel:              k0-stock-6.2.0-rc6   k1-rmacq                 k2-readonce              k3-alongor               k4-rmacq+readonce        k5-rmacq+alongor
Statistic (kops/s):      mean       std      mean     std    rel      mean     std    rel      mean     std    rel      mean     std    rel      mean     std    rel
nwriters_stress:
 1                     899.91     24.95    880.10   29.62    -2%    871.57   44.27    -3%    888.65   37.90    -1%    898.63   29.82    -0%    889.83   25.64    -1%
 2                     359.30     25.92    416.83   32.77   +16%    360.65   28.32    +0%    404.79   42.64   +13%    380.65   21.29    +6%    404.37   23.27   +13%
 3                     314.97     24.32    308.41    9.68    -2%    315.00    9.97    +0%    313.86   13.47    -0%    313.47    4.01    -0%    322.77   20.82    +2%
 4                     328.02     15.09    330.65   29.33    +1%    314.83   24.28    -4%    305.71   12.72    -7%    322.95   10.39    -2%    343.32   13.73    +5%
 8                     292.16     22.03    288.85   10.50    -1%    288.28   18.84    -1%    285.42   24.58    -2%    310.23   26.08    +6%    285.67   20.03    -2%
16                     297.03     26.89    281.89   29.22    -5%    265.19   33.73   -11%    279.02   22.43    -6%    284.40   36.21    -4%    285.21   36.33    -4%
32                     187.36     28.59    175.71   19.77    -6%    186.44   48.15    -0%    206.59   14.11   +10%    174.08   24.30    -7%    185.80   45.12    -1%
64                     148.13     48.65    172.48   34.29   +16%    154.59   47.05    +4%    164.22   29.81   +11%    142.13   47.40    -4%    136.39   29.95    -8%
95                     174.35     57.89    148.59   38.03   -15%    156.85   43.64   -10%    132.92   32.35   -24%    126.44   28.24   -27%    146.82   60.04   -16%

And the results on gigabyte-96c:

Kernel:              k0-stock-6.2.0-rc6   k1-rmacq                 k2-readonce              k3-alongor               k4-rmacq+readonce        k5-rmacq+alongor
Statistic (kops/s):      mean       std      mean     std    rel      mean     std    rel      mean     std    rel      mean     std    rel      mean     std    rel
nwriters_stress:
 1                     713.72     25.68    707.32   17.73    -1%    718.81   12.63    +1%    712.80   13.57    -0%    709.17   14.10    -1%    730.33    9.14    +2%
 2                     376.25      8.19    400.09   16.24    +6%    396.71   26.09    +5%    412.61   17.80   +10%    396.48    7.02    +5%    409.90   14.61    +9%
 3                     415.07     16.83    410.19   19.82    -1%    423.39    9.68    +2%    417.28   10.23    +1%    424.94   17.48    +2%    422.92   11.75    +2%
 4                     286.77     26.63    285.13    6.80    -1%    297.33   23.62    +4%    296.49   16.60    +3%    303.99   30.38    +6%    296.93    9.90    +4%
 8                     296.56     20.45    308.97   12.53    +4%    305.49   19.91    +3%    294.24   17.24    -1%    294.71   24.03    -1%    294.09   25.20    -1%
16                     257.34     33.94    266.03   29.60    +3%    270.72   35.22    +5%    252.28   50.16    -2%    263.83   45.84    +3%    247.42   41.01    -4%
32                     278.78     51.45    215.35   68.40   -23%    259.77   87.44    -7%    217.26   79.67   -22%    201.23   70.46   -28%    282.47  116.65    +1%
64                      75.82     64.87    194.52  137.19  +157%     35.57   12.14   -53%     74.24   72.04    -2%     71.29   45.55    -6%     77.93   43.57    +3%
95                      60.37     68.13    198.38  116.93  +229%     43.12   17.60   -29%     58.80   36.47    -3%     57.78   63.00    -4%     61.33   71.18    +2%

We can safely conclude that the patches do not significantly affect
the throughput of the lock_torture benchmark for rtmutex_lock.
The values for nwriters_stress>=64 can safely be ignored as they are too
widely spread.

Please note that I pushed a landing page [6] with results in HTML that
may be more convenient to browse together with interactive charts.

Cheers,

Antonio

[1] https://e.huawei.com/uk/products/servers/taishan-server/taishan-2280-v2
[2] https://en.wikichip.org/wiki/hisilicon/kunpeng/920-4826
[3] https://www.gigabyte.com/Rack-Server/R182-Z91-rev-100
[4] https://www.amd.com/en/products/cpu/amd-epyc-7352
[5] https://lkml.org/lkml/2023/1/22/160
[6] https://antonio.paolillo.be/public/rtlocks-locktorture-patches.html

^ permalink raw reply [flat|nested] 24+ messages in thread
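As a reading aid for the 'rel' column and for the remark about spread, here is a small sketch in plain C. It assumes that 'rel' is the change of a kernel's mean against the stock-kernel mean, rounded to a whole percent, and it uses the ratio std/mean as one possible measure of spread. Both function names are hypothetical, the rounding rule is a guess, and this is not the script used to produce the tables.

```c
/*
 * Relative change of 'mean' against the stock-kernel mean, in percent,
 * rounded half away from zero (assumed rounding rule).
 */
static int rel_percent(double stock_mean, double mean)
{
	double rel = (mean - stock_mean) / stock_mean * 100.0;

	return (int)(rel >= 0 ? rel + 0.5 : rel - 0.5);
}

/* Coefficient of variation: standard deviation relative to the mean. */
static double coeff_var(double mean, double std)
{
	return std / mean;
}
```

For example, rel_percent(359.30, 416.83) reproduces the +16% reported for k1-rmacq at nwriters_stress=2 on taishan200-96c, while coeff_var(75.82, 64.87) for the stock kernel at nwriters_stress=64 on gigabyte-96c is above 0.8, which illustrates how noisy those rows are.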
* Re: lock_torture results for different patches:
  2023-03-01 16:32 ` lock_torture results for different patches: Antonio Paolillo
@ 2023-03-06  7:58 ` Hernan Ponce de Leon
  2023-03-10 17:11 ` Paul E. McKenney
  1 sibling, 0 replies; 24+ messages in thread
From: Hernan Ponce de Leon @ 2023-03-06  7:58 UTC (permalink / raw)
  To: Antonio Paolillo, longman
  Cc: David.Laight, akpm, arjan, boqun.feng, diogo.behrens,
	hernanl.leon, joel, jonas.oberhauser, jonas.oberhauser,
	linux-kernel, mingo, paulmck, peterz, stable, stern, tglx, will

Thanks Antonio for the performance results.

Taking both correctness and performance into consideration, this is my
current understanding of the situation.

Removing the acquire barrier does not improve performance, and while it
might be unnecessary, it seems better to play it safe and keep it. That
being said, a comment about this should be added to the code (Paul
suggested something like this at some point).

Neither of the proposed fixes to the data race significantly affects
performance. Using READ_ONCE shows better performance on the AMD machine,
but using atomic_long_or shows the opposite on the HiSilicon one.
However, the difference is more prominent using READ_ONCE.

I am not sure I got David's comment. Are you proposing to change the
definition of rt_mutex_base to

struct rt_mutex_base {
	raw_spinlock_t		wait_lock;
	struct rb_root_cached   waiters;
	/* This is the waiting task_struct | RT_MUTEX_HAS_WAITERS. */
	atomic_long_t *owner;
};

I don't fully understand the consequences of this. At minimum this would
require changes in several methods such as the one below, right?

static __always_inline bool rt_mutex_cmpxchg_release(struct rt_mutex_base *lock,
						     atomic_long_t *old,
						     atomic_long_t *new)

Is there anything else that I am missing?

Hernan

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: lock_torture results for different patches:
  2023-03-01 16:32 ` lock_torture results for different patches: Antonio Paolillo
  2023-03-06  7:58 ` Hernan Ponce de Leon
@ 2023-03-10 17:11 ` Paul E. McKenney
  1 sibling, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2023-03-10 17:11 UTC (permalink / raw)
  To: Antonio Paolillo
  Cc: longman, David.Laight, akpm, arjan, boqun.feng, diogo.behrens,
	hernan.poncedeleon, hernanl.leon, joel, jonas.oberhauser,
	jonas.oberhauser, linux-kernel, mingo, peterz, stable, stern,
	tglx, will

On Wed, Mar 01, 2023 at 05:32:14PM +0100, Antonio Paolillo wrote:
> Dear all,
>
> I want to provide some support to Hernan regarding performance claims.

[ . . . ]

> We can safely conclude that the patches do not significantly affect
> the throughput of the lock_torture benchmark for rtmutex_lock.
> The values for nwriters_stress>=64 can safely be ignored as they are too
> widely spread.

Just so you know, locktorture is intended to be a stress test rather than
a performance benchmark.  Hugo Guiroux's dissertation gives a much better
locking performance methodology:

https://hugoguiroux.github.io/assets/these.pdf

							Thanx, Paul

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
  2023-01-24 14:57 ` Hernan Ponce de Leon
  2023-01-24 15:42 ` Waiman Long
@ 2023-01-24 16:12 ` Paul E. McKenney
  1 sibling, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2023-01-24 16:12 UTC (permalink / raw)
  To: Hernan Ponce de Leon
  Cc: Arjan van de Ven, peterz, mingo, will, longman, boqun.feng, akpm,
	tglx, joel, stern, diogo.behrens, jonas.oberhauser, linux-kernel,
	Hernan Ponce de Leon, stable

On Tue, Jan 24, 2023 at 03:57:55PM +0100, Hernan Ponce de Leon wrote:
> On 1/23/2023 5:40 PM, Paul E. McKenney wrote:
> > On Sun, Jan 22, 2023 at 04:24:21PM +0100, Hernan Ponce de Leon wrote:
> > > On 1/20/2023 4:54 PM, Paul E. McKenney wrote:
> > > > On Fri, Jan 20, 2023 at 06:58:20AM -0800, Arjan van de Ven wrote:
> > > > > On 1/20/2023 5:55 AM, Hernan Ponce de Leon wrote:
> > > > > > From: Hernan Ponce de Leon <hernanl.leon@huawei.com>
> > > > > >
> > > > > >
> > > > > >    kernel/locking/rtmutex.c | 2 +-
> > > > > >    1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> > > > > > index 010cf4e6d0b8..7ed9472edd48 100644
> > > > > > --- a/kernel/locking/rtmutex.c
> > > > > > +++ b/kernel/locking/rtmutex.c
> > > > > > @@ -235,7 +235,7 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
> > > > > >     	unsigned long owner, *p = (unsigned long *) &lock->owner;
> > > > > >     	do {
> > > > > > -		owner = *p;
> > > > > > +		owner = READ_ONCE(*p);
> > > > > >     	} while (cmpxchg_relaxed(p, owner,
> > > > >
> > > > >
> > > > > I don't see how this makes any difference at all.
> > > > > *p can be read a dozen times and it's fine; cmpxchg has barrier semantics for compilers afaics
> > > >
> > > > Doing so does suppress a KCSAN warning.  You could also use data_race()
> > > > if it turns out that the volatile semantics would prevent a valuable
> > > > compiler optimization.
> > >
> > > I think the important question is "is this a harmful data race (and needs to be
> > > fixed as proposed by the patch) or a harmless one (and we should use
> > > data_race() to silence tools)?".
> > >
> > > In https://lkml.org/lkml/2023/1/22/160 I describe how this data race can
> > > affect important ordering guarantees for the rest of the code. For this
> > > reason I consider it a harmful one. If this is not the case, I would
> > > appreciate some feedback or pointer to resources about which races to
> > > care about, to avoid spamming the mailing list in the future.
> >
> > In this case, the value read is passed into cmpxchg_relaxed(), which
> > checks the value against memory.  As Arjan noted, the only
> > compiler-and-silicon difference between data_race() and READ_ONCE()
> > is that use of data_race() might allow the compiler to do things like
> > tear the load, thus forcing the occasional spurious cmpxchg_relaxed()
> > failure.  In contrast, LKMM (by design) throws up its hands when it sees
> > a data race.  Something about not being eager to track the idiosyncrasies
> > of many compiler versions.
> >
> > My approach in my own code is to use *_ONCE() unless it causes a visible
> > performance regression or if it confuses KCSAN.  An example of the latter
> > can be debug code, in which case use of data_race() avoids suppressing
> > KCSAN warnings (and also false positives, depending).
>
> I understand that *_ONCE() might avoid some compiler optimization and reduce
> performance in the general case. However, if I understand your first
> paragraph correctly, in this particular case data_race() could allow the CAS
> to fail more often, resulting in more spinning iterations and degraded
> performance. Am I right?

In theory, yes.  The overall effect on performance will depend on the
hardware, the compiler, the compiler version, the flags passed to that
compiler, and who knows what all else.

> > Except that your other email seems to also be arguing that additional
> > ordering is required.  So is https://lkml.org/lkml/2023/1/20/702 really
> > sufficient just by itself, or is additional ordering required?
>
> I do not claim that we need to mark the read to add the ordering that is
> needed for correctness (mutual exclusion). What I claim in this patch is
> that there is a data race, and since it can affect ordering constraints in
> subtle ways, I consider it harmful and thus I want to fix it.
>
> What I explain in the other email is that if we fix the data race, either
> the fence or the acquire store might be relaxed (because marking the read
> gives us some extra ordering guarantees). If the race is not fixed, both the
> fence and the acquire are needed according to LKMM. The situation is
> different wrt hardware models. In that case the tool cannot find any
> violation even if we don't fix the race and we relax the store / remove the
> fence.

Plus there might be other options, as Waiman and Peter are discussing.

							Thanx, Paul

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] Fix data race in mark_rt_mutex_waiters
  2023-01-20 13:55 [PATCH] Fix data race in mark_rt_mutex_waiters Hernan Ponce de Leon
  2023-01-20 14:58 ` Arjan van de Ven
@ 2023-01-20 16:23 ` Peter Zijlstra
  2023-01-20 16:58 ` David Laight
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2023-01-20 16:23 UTC (permalink / raw)
  To: Hernan Ponce de Leon
  Cc: mingo, will, longman, boqun.feng, akpm, arjan, tglx, joel,
	paulmck, stern, diogo.behrens, jonas.oberhauser, linux-kernel,
	Hernan Ponce de Leon, stable

On Fri, Jan 20, 2023 at 02:55:25PM +0100, Hernan Ponce de Leon wrote:
> From: Hernan Ponce de Leon <hernanl.leon@huawei.com>
>
> Following the definition of data race in
> tools/memory-model/linux-kernel.cat the dartagnan tool
> https://github.com/hernanponcedeleon/Dat3M
> reported a race between mark_rt_mutex_waiters and rt_mutex_cmpxchg_release.
>
> Commit 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
> later removed in commit d0aa7a70bf03 ("futex_requeue_pi optimization")
> and reverted in commit bd197234b0a6
> ("Revert "futex_requeue_pi optimization"")
>
> The original commit introduced the data race.
>
> Cc: stable@vger.kernel.org # v2.6.18.x
> Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
> Signed-off-by: Hernan Ponce de Leon <hernanl.leon@huawei.com>
> ---
>  kernel/locking/rtmutex.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index 010cf4e6d0b8..7ed9472edd48 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -235,7 +235,7 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
>  	unsigned long owner, *p = (unsigned long *) &lock->owner;
>
>  	do {
> -		owner = *p;
> +		owner = READ_ONCE(*p);
>  	} while (cmpxchg_relaxed(p, owner,
>  				 owner | RT_MUTEX_HAS_WAITERS) != owner);
>

Can't we replace the whole of that function with:

	set_bit(0, (unsigned long *)&lock->owner);

?
^ permalink raw reply [flat|nested] 24+ messages in thread
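The loop in the quoted hunk reads the owner word and hands it to cmpxchg_relaxed() as the expected value. The following userspace C11 sketch (not kernel code) illustrates one aspect of that pattern: a wrong expected value costs a retry but does not change the final result. The names HAS_WAITERS and mark_waiters_from_guess() are hypothetical, and the sketch says nothing about whether a plain read counts as a data race under the LKMM, which is the point of the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for RT_MUTEX_HAS_WAITERS (bit 0). */
#define HAS_WAITERS ((uintptr_t)1)

/*
 * Set HAS_WAITERS starting from a caller-supplied guess of the current
 * owner word, and return the number of failed compare-exchange attempts.
 */
static inline unsigned int mark_waiters_from_guess(_Atomic uintptr_t *owner,
						   uintptr_t guess)
{
	unsigned int failures = 0;

	/* The strong variant fails only when 'guess' differs from memory. */
	while (!atomic_compare_exchange_strong_explicit(owner, &guess,
							guess | HAS_WAITERS,
							memory_order_relaxed,
							memory_order_relaxed))
		failures++;	/* 'guess' now holds the value actually found */

	return failures;
}
```

With no concurrent writers, an accurate guess succeeds on the first attempt and an inaccurate one fails exactly once before succeeding, so the cost of a stale or torn initial read shows up as extra loop iterations.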
* RE: [PATCH] Fix data race in mark_rt_mutex_waiters
  2023-01-20 16:23 ` Peter Zijlstra
@ 2023-01-20 16:58 ` David Laight
  0 siblings, 0 replies; 24+ messages in thread
From: David Laight @ 2023-01-20 16:58 UTC (permalink / raw)
  To: 'Peter Zijlstra', Hernan Ponce de Leon
  Cc: mingo, will, longman, boqun.feng, akpm, arjan, tglx, joel,
	paulmck, stern, diogo.behrens, jonas.oberhauser, linux-kernel,
	Hernan Ponce de Leon, stable

From: Peter Zijlstra
> Sent: 20 January 2023 16:23
...
> >  	do {
> > -		owner = *p;
> > +		owner = READ_ONCE(*p);
> >  	} while (cmpxchg_relaxed(p, owner,
> >  				 owner | RT_MUTEX_HAS_WAITERS) != owner);
> >
>
> Can't we replace the whole of that function with:
>
> 	set_bit(0, (unsigned long *)&lock->owner);
>
> ?

If you need the cast then probably not...

There really ought to be a compile-time test (somehow) that
set_bit() is only used on large bit arrays.

OTOH atomic_or32/64() and atomic_and32/64() might be usable in
many places.
On x86 I doubt it makes much difference whether you use 'bis'
or 'lock or'.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2023-03-10 17:11 UTC | newest]

Thread overview: 24+ messages
2023-01-20 13:55 [PATCH] Fix data race in mark_rt_mutex_waiters Hernan Ponce de Leon
2023-01-20 14:58 ` Arjan van de Ven
2023-01-20 15:54 ` Paul E. McKenney
2023-01-22 15:24 ` Hernan Ponce de Leon
2023-01-23 16:40 ` Paul E. McKenney
2023-01-23 17:34 ` Alan Stern
2023-01-23 17:48 ` Paul E. McKenney
2023-01-23 20:02 ` Jonas Oberhauser
2023-01-24 14:57 ` Hernan Ponce de Leon
2023-01-24 15:42 ` Waiman Long
2023-01-24 15:52 ` Peter Zijlstra
2023-01-24 16:04 ` Waiman Long
2023-01-26  9:42 ` Hernan Ponce de Leon
2023-01-26 12:20 ` Peter Zijlstra
2023-01-26 14:20 ` Peter Zijlstra
2023-01-26 21:07 ` Hernan Ponce de Leon
2023-01-26 22:10 ` David Laight
2023-01-27  1:46 ` Waiman Long
2023-03-01 16:32 ` lock_torture results for different patches: Antonio Paolillo
2023-03-06  7:58 ` Hernan Ponce de Leon
2023-03-10 17:11 ` Paul E. McKenney
2023-01-24 16:12 ` [PATCH] Fix data race in mark_rt_mutex_waiters Paul E. McKenney
2023-01-20 16:23 ` Peter Zijlstra
2023-01-20 16:58 ` David Laight