* [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() @ 2019-04-19 17:21 Alan Stern 2019-04-19 17:54 ` Paul E. McKenney 2019-04-19 18:00 ` Peter Zijlstra 0 siblings, 2 replies; 21+ messages in thread From: Alan Stern @ 2019-04-19 17:21 UTC (permalink / raw) To: LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, Daniel Lustig, David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra, Will Deacon Cc: Kernel development list The description of smp_mb__before_atomic() and smp_mb__after_atomic() in Documentation/atomic_t.txt is slightly terse and misleading. It does not clearly state that these barriers only affect the ordering of other instructions with respect to the atomic operation. This improves the text to make the actual ordering implications clear, and also to explain how these barriers differ from a RELEASE or ACQUIRE ordering. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> --- Documentation/atomic_t.txt | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) Index: usb-devel/Documentation/atomic_t.txt =================================================================== --- usb-devel.orig/Documentation/atomic_t.txt +++ usb-devel/Documentation/atomic_t.txt @@ -171,7 +171,10 @@ The barriers: smp_mb__{before,after}_atomic() only apply to the RMW ops and can be used to augment/upgrade the ordering -inherent to the used atomic op. These barriers provide a full smp_mb(). +inherent to the used atomic op. Unlike normal smp_mb() barriers, they order +only the RMW op itself against the instructions preceding the +smp_mb__before_atomic() or following the smp_mb__after_atomic(); they do +not order instructions on the other side of the RMW op at all. These helper barriers exist because architectures have varying implicit ordering on their SMP atomic primitives. 
For example our TSO architectures @@ -195,7 +198,8 @@ Further, while something like: atomic_dec(&X); is a 'typical' RELEASE pattern, the barrier is strictly stronger than -a RELEASE. Similarly for something like: +a RELEASE because it orders preceding instructions against both the read +and write parts of the atomic_dec(). Similarly, something like: atomic_inc(&X); smp_mb__after_atomic(); @@ -227,7 +231,8 @@ strictly stronger than ACQUIRE. As illus This should not happen; but a hypothetical atomic_inc_acquire() -- (void)atomic_fetch_inc_acquire() for instance -- would allow the outcome, -since then: +because it would not order the W part of the RMW against the following +WRITE_ONCE. Thus: P1 P2 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() 2019-04-19 17:21 [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() Alan Stern @ 2019-04-19 17:54 ` Paul E. McKenney 2019-04-19 18:00 ` Peter Zijlstra 1 sibling, 0 replies; 21+ messages in thread From: Paul E. McKenney @ 2019-04-19 17:54 UTC (permalink / raw) To: Alan Stern Cc: LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, Daniel Lustig, David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin, Peter Zijlstra, Will Deacon, Kernel development list On Fri, Apr 19, 2019 at 01:21:45PM -0400, Alan Stern wrote: > The description of smp_mb__before_atomic() and smp_mb__after_atomic() > in Documentation/atomic_t.txt is slightly terse and misleading. It > does not clearly state that these barriers only affect the ordering of > other instructions with respect to the atomic operation. > > This improves the text to make the actual ordering implications clear, > and also to explain how these barriers differ from a RELEASE or > ACQUIRE ordering. > > Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Queued for further review, thank you, Alan! Thanx, Paul > --- > > > Documentation/atomic_t.txt | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > Index: usb-devel/Documentation/atomic_t.txt > =================================================================== > --- usb-devel.orig/Documentation/atomic_t.txt > +++ usb-devel/Documentation/atomic_t.txt > @@ -171,7 +171,10 @@ The barriers: > smp_mb__{before,after}_atomic() > > only apply to the RMW ops and can be used to augment/upgrade the ordering > -inherent to the used atomic op. These barriers provide a full smp_mb(). > +inherent to the used atomic op. 
Unlike normal smp_mb() barriers, they order > +only the RMW op itself against the instructions preceding the > +smp_mb__before_atomic() or following the smp_mb__after_atomic(); they do > +not order instructions on the other side of the RMW op at all. > > These helper barriers exist because architectures have varying implicit > ordering on their SMP atomic primitives. For example our TSO architectures > @@ -195,7 +198,8 @@ Further, while something like: > atomic_dec(&X); > > is a 'typical' RELEASE pattern, the barrier is strictly stronger than > -a RELEASE. Similarly for something like: > +a RELEASE because it orders preceding instructions against both the read > +and write parts of the atomic_dec(). Similarly, something like: > > atomic_inc(&X); > smp_mb__after_atomic(); > @@ -227,7 +231,8 @@ strictly stronger than ACQUIRE. As illus > > This should not happen; but a hypothetical atomic_inc_acquire() -- > (void)atomic_fetch_inc_acquire() for instance -- would allow the outcome, > -since then: > +because it would not order the W part of the RMW against the following > +WRITE_ONCE. Thus: > > P1 P2 > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() 2019-04-19 17:21 [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() Alan Stern 2019-04-19 17:54 ` Paul E. McKenney @ 2019-04-19 18:00 ` Peter Zijlstra 2019-04-19 18:26 ` Paul E. McKenney 1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2019-04-19 18:00 UTC (permalink / raw) To: Alan Stern Cc: LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, Daniel Lustig, David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin, Paul E. McKenney, Will Deacon, Kernel development list
On Fri, Apr 19, 2019 at 01:21:45PM -0400, Alan Stern wrote:
> Index: usb-devel/Documentation/atomic_t.txt
> ===================================================================
> --- usb-devel.orig/Documentation/atomic_t.txt
> +++ usb-devel/Documentation/atomic_t.txt
> @@ -171,7 +171,10 @@ The barriers:
>  smp_mb__{before,after}_atomic()
>
> only apply to the RMW ops and can be used to augment/upgrade the ordering
> -inherent to the used atomic op. These barriers provide a full smp_mb().
> +inherent to the used atomic op. Unlike normal smp_mb() barriers, they order
> +only the RMW op itself against the instructions preceding the
> +smp_mb__before_atomic() or following the smp_mb__after_atomic(); they do
> +not order instructions on the other side of the RMW op at all.

Now it is I who is confused; what?

	x = 1;
	smp_mb__before_atomic();
	atomic_add(1, &a);
	y = 1;

the stores to both x and y will be ordered as if an smp_mb() were there. There is no order between a and y otoh.

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() 2019-04-19 18:00 ` Peter Zijlstra @ 2019-04-19 18:26 ` Paul E. McKenney 2019-04-20 0:26 ` Nicholas Piggin 0 siblings, 1 reply; 21+ messages in thread From: Paul E. McKenney @ 2019-04-19 18:26 UTC (permalink / raw) To: Peter Zijlstra Cc: Alan Stern, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, Daniel Lustig, David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin, Will Deacon, Kernel development list On Fri, Apr 19, 2019 at 08:00:17PM +0200, Peter Zijlstra wrote: > On Fri, Apr 19, 2019 at 01:21:45PM -0400, Alan Stern wrote: > > Index: usb-devel/Documentation/atomic_t.txt > > =================================================================== > > --- usb-devel.orig/Documentation/atomic_t.txt > > +++ usb-devel/Documentation/atomic_t.txt > > @@ -171,7 +171,10 @@ The barriers: > > smp_mb__{before,after}_atomic() > > > > only apply to the RMW ops and can be used to augment/upgrade the ordering > > -inherent to the used atomic op. These barriers provide a full smp_mb(). > > +inherent to the used atomic op. Unlike normal smp_mb() barriers, they order > > +only the RMW op itself against the instructions preceding the > > +smp_mb__before_atomic() or following the smp_mb__after_atomic(); they do > > +not order instructions on the other side of the RMW op at all. > > Now it is I who is confused; what? > > x = 1; > smp_mb__before_atomic(); > atomic_add(1, &a); > y = 1; > > the stores to both x and y will be ordered as if an smp_mb() where > there. There is no order between a and y otoh. Let's look at x86. And a slightly different example: x = 1; smp_mb__before_atomic(); atomic_add(1, &a); r1 = y; The atomic_add() asm does not have the "memory" constraint, which is completely legitimate because atomic_add() does not return a value, and thus guarantees no ordering. 
The compiler is therefore within its rights to transform the code into the following: x = 1; smp_mb__before_atomic(); r1 = y; atomic_add(1, &a); But x86's smp_mb__before_atomic() is just a compiler barrier, and x86 is further allowed to reorder prior stores with later loads. The CPU can therefore execute this code as follows: r1 = y; x = 1; smp_mb__before_atomic(); atomic_add(1, &a); So in general, the ordering is guaranteed only to the atomic itself, not to accesses on the other side of the atomic. Thanx, Paul ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() 2019-04-19 18:26 ` Paul E. McKenney @ 2019-04-20 0:26 ` Nicholas Piggin 2019-04-20 8:54 ` Paul E. McKenney 0 siblings, 1 reply; 21+ messages in thread From: Nicholas Piggin @ 2019-04-20 0:26 UTC (permalink / raw) To: paulmck, Peter Zijlstra Cc: LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon Paul E. McKenney's on April 20, 2019 4:26 am: > On Fri, Apr 19, 2019 at 08:00:17PM +0200, Peter Zijlstra wrote: >> On Fri, Apr 19, 2019 at 01:21:45PM -0400, Alan Stern wrote: >> > Index: usb-devel/Documentation/atomic_t.txt >> > =================================================================== >> > --- usb-devel.orig/Documentation/atomic_t.txt >> > +++ usb-devel/Documentation/atomic_t.txt >> > @@ -171,7 +171,10 @@ The barriers: >> > smp_mb__{before,after}_atomic() >> > >> > only apply to the RMW ops and can be used to augment/upgrade the ordering >> > -inherent to the used atomic op. These barriers provide a full smp_mb(). >> > +inherent to the used atomic op. Unlike normal smp_mb() barriers, they order >> > +only the RMW op itself against the instructions preceding the >> > +smp_mb__before_atomic() or following the smp_mb__after_atomic(); they do >> > +not order instructions on the other side of the RMW op at all. >> >> Now it is I who is confused; what? >> >> x = 1; >> smp_mb__before_atomic(); >> atomic_add(1, &a); >> y = 1; >> >> the stores to both x and y will be ordered as if an smp_mb() where >> there. There is no order between a and y otoh. > > Let's look at x86. And a slightly different example: > > x = 1; > smp_mb__before_atomic(); > atomic_add(1, &a); > r1 = y; > > The atomic_add() asm does not have the "memory" constraint, which is > completely legitimate because atomic_add() does not return a value, > and thus guarantees no ordering. 
The compiler is therefore within > its rights to transform the code into the following: > > x = 1; > smp_mb__before_atomic(); > r1 = y; > atomic_add(1, &a); > > But x86's smp_mb__before_atomic() is just a compiler barrier, and > x86 is further allowed to reorder prior stores with later loads. > The CPU can therefore execute this code as follows: > > r1 = y; > x = 1; > smp_mb__before_atomic(); > atomic_add(1, &a); > > So in general, the ordering is guaranteed only to the atomic itself, > not to accesses on the other side of the atomic. That's interesting. I don't think that's what all our code expects. I had the same idea as Peter. IIRC the primitive was originally introduced exactly so x86 would not need to have the unnecessary hardware barrier with sequences like smp_mb(); ... atomic_inc(&v); The "new" semantics are a bit subtle. One option might be just to replace it entirely with atomic_xxx_mb() ? Thanks, Nick ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() 2019-04-20 0:26 ` Nicholas Piggin @ 2019-04-20 8:54 ` Paul E. McKenney 2019-04-23 12:17 ` Peter Zijlstra 2019-04-23 12:32 ` Peter Zijlstra 0 siblings, 2 replies; 21+ messages in thread From: Paul E. McKenney @ 2019-04-20 8:54 UTC (permalink / raw) To: Nicholas Piggin Cc: Peter Zijlstra, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon On Sat, Apr 20, 2019 at 10:26:15AM +1000, Nicholas Piggin wrote: > Paul E. McKenney's on April 20, 2019 4:26 am: > > On Fri, Apr 19, 2019 at 08:00:17PM +0200, Peter Zijlstra wrote: > >> On Fri, Apr 19, 2019 at 01:21:45PM -0400, Alan Stern wrote: > >> > Index: usb-devel/Documentation/atomic_t.txt > >> > =================================================================== > >> > --- usb-devel.orig/Documentation/atomic_t.txt > >> > +++ usb-devel/Documentation/atomic_t.txt > >> > @@ -171,7 +171,10 @@ The barriers: > >> > smp_mb__{before,after}_atomic() > >> > > >> > only apply to the RMW ops and can be used to augment/upgrade the ordering > >> > -inherent to the used atomic op. These barriers provide a full smp_mb(). > >> > +inherent to the used atomic op. Unlike normal smp_mb() barriers, they order > >> > +only the RMW op itself against the instructions preceding the > >> > +smp_mb__before_atomic() or following the smp_mb__after_atomic(); they do > >> > +not order instructions on the other side of the RMW op at all. > >> > >> Now it is I who is confused; what? > >> > >> x = 1; > >> smp_mb__before_atomic(); > >> atomic_add(1, &a); > >> y = 1; > >> > >> the stores to both x and y will be ordered as if an smp_mb() where > >> there. There is no order between a and y otoh. > > > > Let's look at x86. 
And a slightly different example: > > > > x = 1; > > smp_mb__before_atomic(); > > atomic_add(1, &a); > > r1 = y; > > > > The atomic_add() asm does not have the "memory" constraint, which is > > completely legitimate because atomic_add() does not return a value, > > and thus guarantees no ordering. The compiler is therefore within > > its rights to transform the code into the following: > > > > x = 1; > > smp_mb__before_atomic(); > > r1 = y; > > atomic_add(1, &a); > > > > But x86's smp_mb__before_atomic() is just a compiler barrier, and > > x86 is further allowed to reorder prior stores with later loads. > > The CPU can therefore execute this code as follows: > > > > r1 = y; > > x = 1; > > smp_mb__before_atomic(); > > atomic_add(1, &a); > > > > So in general, the ordering is guaranteed only to the atomic itself, > > not to accesses on the other side of the atomic. > > That's interesting. I don't think that's what all our code expects. > I had the same idea as Peter. > > IIRC the primitive was originally introduced exactly so x86 would not > need to have the unnecessary hardware barrier with sequences like > > smp_mb(); > ... > atomic_inc(&v); > > The "new" semantics are a bit subtle. One option might be just to > replace it entirely with atomic_xxx_mb() ? Hmmm... There are more than 2,000 uses of atomic_inc() in the kernel. There are about 300-400 total between smp_mb__before_atomic() and smp_mb__after_atomic(). So what are our options? 1. atomic_xxx_mb() as you say. From a quick scan of smp_mb__before_atomic() uses, we need this for atomic_inc(), atomic_dec(), atomic_add(), atomic_sub(), clear_bit(), set_bit(), test_bit(), atomic_long_dec(), atomic_long_add(), refcount_dec(), cmpxchg_relaxed(), set_tsk_thread_flag(), clear_bit_unlock(). Another random look identifies atomic_andnot(). And atomic_set(): set_preempt_state(). This fails on x86, s390, and TSO friends, does it not? Or is this ARM-only? Still, why not just smp_mb() before and after? 
Same issue in __kernfs_new_node(), bio_cnt_set(), sbitmap_queue_update_wake_batch(), Ditto for atomic64_set() in __ceph_dir_set_complete(). Ditto for atomic_read() in rvt_qp_is_avail(). This function has a couple of other oddly placed smp_mb__before_atomic(). And atomic_cmpxchg(): msc_buffer_alloc(). This instance of smp_mb__before_atomic() can be removed unless I am missing something subtle. Ditto for kvm_vcpu_exiting_guest_mode(), pv_kick_node(), __sbq_wake_up(), And lock acquisition??? acm_read_bulk_callback(). In nfnl_acct_fill_info(), a smp_mb__before_atomic() after a atomic64_xchg()??? Also before a clear_bit(), but the clear_bit() is inside an "if". There are a few cases that would see added overhead. For example, svc_get_next_xprt() has the following: smp_mb__before_atomic(); clear_bit(SP_CONGESTED, &pool->sp_flags); clear_bit(RQ_BUSY, &rqstp->rq_flags); smp_mb__after_atomic(); And xs_sock_reset_connection_flags() has this: smp_mb__before_atomic(); clear_bit(XPRT_CLOSE_WAIT, &xprt->state); clear_bit(XPRT_CLOSING, &xprt->state); xs_sock_reset_state_flags(xprt); /* Also a clear_bit(). */ smp_mb__after_atomic(); Yeah, there are more than a few misuses, aren't there? :-/ A coccinelle script seems in order. In 0day test robot. But there are a number of helper functions whose purpose seems to be to wrap an atomic in smp_mb__before_atomic() and smp_mb__after_atomic(), so some of the atomic_xxx_mb() functions might be a good idea just for improved readability. 2. Add something to checkpatch.pl warning about non-total ordering, with the error message explaining that the ordering is partial. 3. Make non-value-returning atomics provide full ordering. This would of course need some benchmarking, but would be a simple change to make and would eliminate a large class of potential bugs. My guess is that the loss in performance would be non-negligible, but who knows? 4. Your idea here! Thanx, Paul ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() 2019-04-20 8:54 ` Paul E. McKenney @ 2019-04-23 12:17 ` Peter Zijlstra 2019-04-23 13:21 ` Paul E. McKenney 2019-04-23 12:32 ` Peter Zijlstra 1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2019-04-23 12:17 UTC (permalink / raw) To: Paul E. McKenney Cc: Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon
On Sat, Apr 20, 2019 at 01:54:40AM -0700, Paul E. McKenney wrote:
> 3.	Make non-value-returning atomics provide full ordering.
>	This would of course need some benchmarking, but would be a
>	simple change to make and would eliminate a large class of
>	potential bugs. My guess is that the loss in performance
>	would be non-negligible, but who knows?

Well, only for the architectures that have smp_mb__{before,after}_atomic() as barrier(), which are: ia64, mips, s390, sparc, x86 and xtensa.

$ ./compare.sh defconfig-build defconfig-build1 vmlinux
do_profile_hits                275      278     +3,+0
freezer_apply_state             86       98     +12,+0
perf_event_alloc              2232     2261     +29,+0
_free_event                    631      660     +29,+0
shmem_add_to_page_cache        712      722     +10,+0
_enable_swap_info              333      337     +4,+0
do_mmu_notifier_register       303      311     +8,+0
__nfs_commit_inode             356      359     +3,+0
tcp_try_coalesce               246      250     +4,+0
i915_gem_free_object            90       97     +7,+0
mce_intel_hcpu_update           39       47     +8,+0
__ia32_sys_swapoff            1177     1181     +4,+0
pci_enable_ats                 124      131     +7,+0
__x64_sys_swapoff             1178     1182     +4,+0
i915_gem_madvise_ioctl         447      443     -4,+0
calc_global_load_tick           75       82     +7,+0
i915_gem_object_set_tiling     712      708     -4,+0
total                     11374236 11374367     +131,+0

Which doesn't look too bad.
---
diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index ea3d95275b43..115127c7ad28 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -54,7 +54,7 @@ static __always_inline void arch_atomic_add(int i, atomic_t *v)
 {
 	asm volatile(LOCK_PREFIX "addl %1,%0"
 		     : "+m" (v->counter)
-		     : "ir" (i));
+		     : "ir" (i) : "memory");
 }
 
 /**
@@ -68,7 +68,7 @@ static __always_inline void arch_atomic_sub(int i, atomic_t *v)
 {
 	asm volatile(LOCK_PREFIX "subl %1,%0"
 		     : "+m" (v->counter)
-		     : "ir" (i));
+		     : "ir" (i) : "memory");
 }
 
 /**
@@ -95,7 +95,7 @@ static __always_inline bool arch_atomic_sub_and_test(int i, atomic_t *v)
 static __always_inline void arch_atomic_inc(atomic_t *v)
 {
 	asm volatile(LOCK_PREFIX "incl %0"
-		     : "+m" (v->counter));
+		     : "+m" (v->counter) :: "memory");
 }
 #define arch_atomic_inc arch_atomic_inc
 
@@ -108,7 +108,7 @@ static __always_inline void arch_atomic_inc(atomic_t *v)
 static __always_inline void arch_atomic_dec(atomic_t *v)
 {
 	asm volatile(LOCK_PREFIX "decl %0"
-		     : "+m" (v->counter));
+		     : "+m" (v->counter) :: "memory");
 }
 #define arch_atomic_dec arch_atomic_dec
 
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index dadc20adba21..5e86c0d68ac1 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -45,7 +45,7 @@ static __always_inline void arch_atomic64_add(long i, atomic64_t *v)
 {
 	asm volatile(LOCK_PREFIX "addq %1,%0"
 		     : "=m" (v->counter)
-		     : "er" (i), "m" (v->counter));
+		     : "er" (i), "m" (v->counter) : "memory");
 }
 
 /**
@@ -59,7 +59,7 @@ static inline void arch_atomic64_sub(long i, atomic64_t *v)
 {
 	asm volatile(LOCK_PREFIX "subq %1,%0"
 		     : "=m" (v->counter)
-		     : "er" (i), "m" (v->counter));
+		     : "er" (i), "m" (v->counter) : "memory");
 }
 
 /**
@@ -87,7 +87,7 @@ static __always_inline void arch_atomic64_inc(atomic64_t *v)
 {
 	asm volatile(LOCK_PREFIX "incq %0"
 		     : "=m" (v->counter)
-		     : "m" (v->counter));
+		     : "m" (v->counter) : "memory");
 }
 #define arch_atomic64_inc arch_atomic64_inc
 
@@ -101,7 +101,7 @@ static __always_inline void arch_atomic64_dec(atomic64_t *v)
 {
 	asm volatile(LOCK_PREFIX "decq %0"
 		     : "=m" (v->counter)
-		     : "m" (v->counter));
+		     : "m" (v->counter) : "memory");
 }
 #define arch_atomic64_dec arch_atomic64_dec

^ permalink raw reply related	[flat|nested] 21+ messages in thread
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() 2019-04-23 12:17 ` Peter Zijlstra @ 2019-04-23 13:21 ` Paul E. McKenney 2019-04-23 13:26 ` Peter Zijlstra 0 siblings, 1 reply; 21+ messages in thread From: Paul E. McKenney @ 2019-04-23 13:21 UTC (permalink / raw) To: Peter Zijlstra Cc: Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon On Tue, Apr 23, 2019 at 02:17:15PM +0200, Peter Zijlstra wrote: > On Sat, Apr 20, 2019 at 01:54:40AM -0700, Paul E. McKenney wrote: > > 3. Make non-value-returning atomics provide full ordering. > > This would of course need some benchmarking, but would be a > > simple change to make and would eliminate a large class of > > potential bugs. My guess is that the loss in performance > > would be non-negligible, but who knows? > > Well, only for the architectures that have > smp_mb__{before,after}_atomic() as barrier(), which are: ia64, mips, > s390, sparc, x86 and xtense. The weakly ordered architectures would need to add the equivalent of smp_mb() before and after, right? This might result in a more noticeable loss of performance. 
Thanx, Paul > $ ./compare.sh defconfig-build defconfig-build1 vmlinux > do_profile_hits 275 278 +3,+0 > freezer_apply_state 86 98 +12,+0 > perf_event_alloc 2232 2261 +29,+0 > _free_event 631 660 +29,+0 > shmem_add_to_page_cache 712 722 +10,+0 > _enable_swap_info 333 337 +4,+0 > do_mmu_notifier_register 303 311 +8,+0 > __nfs_commit_inode 356 359 +3,+0 > tcp_try_coalesce 246 250 +4,+0 > i915_gem_free_object 90 97 +7,+0 > mce_intel_hcpu_update 39 47 +8,+0 > __ia32_sys_swapoff 1177 1181 +4,+0 > pci_enable_ats 124 131 +7,+0 > __x64_sys_swapoff 1178 1182 +4,+0 > i915_gem_madvise_ioctl 447 443 -4,+0 > calc_global_load_tick 75 82 +7,+0 > i915_gem_object_set_tiling 712 708 -4,+0 > total 11374236 11374367 +131,+0 > Which doesn't look too bad. > > --- > diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h > index ea3d95275b43..115127c7ad28 100644 > --- a/arch/x86/include/asm/atomic.h > +++ b/arch/x86/include/asm/atomic.h > @@ -54,7 +54,7 @@ static __always_inline void arch_atomic_add(int i, atomic_t *v) > { > asm volatile(LOCK_PREFIX "addl %1,%0" > : "+m" (v->counter) > - : "ir" (i)); > + : "ir" (i) : "memory"); > } > > /** > @@ -68,7 +68,7 @@ static __always_inline void arch_atomic_sub(int i, atomic_t *v) > { > asm volatile(LOCK_PREFIX "subl %1,%0" > : "+m" (v->counter) > - : "ir" (i)); > + : "ir" (i) : "memory"); > } > > /** > @@ -95,7 +95,7 @@ static __always_inline bool arch_atomic_sub_and_test(int i, atomic_t *v) > static __always_inline void arch_atomic_inc(atomic_t *v) > { > asm volatile(LOCK_PREFIX "incl %0" > - : "+m" (v->counter)); > + : "+m" (v->counter) :: "memory"); > } > #define arch_atomic_inc arch_atomic_inc > > @@ -108,7 +108,7 @@ static __always_inline void arch_atomic_inc(atomic_t *v) > static __always_inline void arch_atomic_dec(atomic_t *v) > { > asm volatile(LOCK_PREFIX "decl %0" > - : "+m" (v->counter)); > + : "+m" (v->counter) :: "memory"); > } > #define arch_atomic_dec arch_atomic_dec > > diff --git 
a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h > index dadc20adba21..5e86c0d68ac1 100644 > --- a/arch/x86/include/asm/atomic64_64.h > +++ b/arch/x86/include/asm/atomic64_64.h > @@ -45,7 +45,7 @@ static __always_inline void arch_atomic64_add(long i, atomic64_t *v) > { > asm volatile(LOCK_PREFIX "addq %1,%0" > : "=m" (v->counter) > - : "er" (i), "m" (v->counter)); > + : "er" (i), "m" (v->counter) : "memory"); > } > > /** > @@ -59,7 +59,7 @@ static inline void arch_atomic64_sub(long i, atomic64_t *v) > { > asm volatile(LOCK_PREFIX "subq %1,%0" > : "=m" (v->counter) > - : "er" (i), "m" (v->counter)); > + : "er" (i), "m" (v->counter) : "memory"); > } > > /** > @@ -87,7 +87,7 @@ static __always_inline void arch_atomic64_inc(atomic64_t *v) > { > asm volatile(LOCK_PREFIX "incq %0" > : "=m" (v->counter) > - : "m" (v->counter)); > + : "m" (v->counter) : "memory"); > } > #define arch_atomic64_inc arch_atomic64_inc > > @@ -101,7 +101,7 @@ static __always_inline void arch_atomic64_dec(atomic64_t *v) > { > asm volatile(LOCK_PREFIX "decq %0" > : "=m" (v->counter) > - : "m" (v->counter)); > + : "m" (v->counter) : "memory"); > } > #define arch_atomic64_dec arch_atomic64_dec > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() 2019-04-23 13:21 ` Paul E. McKenney @ 2019-04-23 13:26 ` Peter Zijlstra 2019-04-23 20:16 ` Paul E. McKenney 0 siblings, 1 reply; 21+ messages in thread From: Peter Zijlstra @ 2019-04-23 13:26 UTC (permalink / raw) To: Paul E. McKenney Cc: Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon On Tue, Apr 23, 2019 at 06:21:16AM -0700, Paul E. McKenney wrote: > On Tue, Apr 23, 2019 at 02:17:15PM +0200, Peter Zijlstra wrote: > > On Sat, Apr 20, 2019 at 01:54:40AM -0700, Paul E. McKenney wrote: > > > 3. Make non-value-returning atomics provide full ordering. > > > This would of course need some benchmarking, but would be a > > > simple change to make and would eliminate a large class of > > > potential bugs. My guess is that the loss in performance > > > would be non-negligible, but who knows? > > > > Well, only for the architectures that have > > smp_mb__{before,after}_atomic() as barrier(), which are: ia64, mips, > > s390, sparc, x86 and xtense. > > The weakly ordered architectures would need to add the equivalent of > smp_mb() before and after, right? This might result in a more noticeable > loss of performance. The weak archs already have: smp_mb__{before,after}_atomic() := smp_mb(). ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() 2019-04-23 13:26 ` Peter Zijlstra @ 2019-04-23 20:16 ` Paul E. McKenney 2019-04-23 20:28 ` Peter Zijlstra 0 siblings, 1 reply; 21+ messages in thread From: Paul E. McKenney @ 2019-04-23 20:16 UTC (permalink / raw) To: Peter Zijlstra Cc: Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon On Tue, Apr 23, 2019 at 03:26:20PM +0200, Peter Zijlstra wrote: > On Tue, Apr 23, 2019 at 06:21:16AM -0700, Paul E. McKenney wrote: > > On Tue, Apr 23, 2019 at 02:17:15PM +0200, Peter Zijlstra wrote: > > > On Sat, Apr 20, 2019 at 01:54:40AM -0700, Paul E. McKenney wrote: > > > > 3. Make non-value-returning atomics provide full ordering. > > > > This would of course need some benchmarking, but would be a > > > > simple change to make and would eliminate a large class of > > > > potential bugs. My guess is that the loss in performance > > > > would be non-negligible, but who knows? > > > > > > Well, only for the architectures that have > > > smp_mb__{before,after}_atomic() as barrier(), which are: ia64, mips, > > > s390, sparc, x86 and xtense. > > > > The weakly ordered architectures would need to add the equivalent of > > smp_mb() before and after, right? This might result in a more noticeable > > loss of performance. > > The weak archs already have: smp_mb__{before,after}_atomic() := > smp_mb(). Agreed, but I thought that one of the ideas going forward was to get rid of smp_mb__{before,after}_atomic(). Thanx, Paul ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() 2019-04-23 20:16 ` Paul E. McKenney @ 2019-04-23 20:28 ` Peter Zijlstra 2019-04-24 8:29 ` Paul E. McKenney 0 siblings, 1 reply; 21+ messages in thread From: Peter Zijlstra @ 2019-04-23 20:28 UTC (permalink / raw) To: Paul E. McKenney Cc: Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon On Tue, Apr 23, 2019 at 01:16:37PM -0700, Paul E. McKenney wrote: > Agreed, but I thought that one of the ideas going forward was to get > rid of smp_mb__{before,after}_atomic(). It's not one I had considered.. I just wanted to get rid of this 'surprise' behaviour. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() 2019-04-23 20:28 ` Peter Zijlstra @ 2019-04-24 8:29 ` Paul E. McKenney 2019-04-24 8:44 ` Peter Zijlstra 0 siblings, 1 reply; 21+ messages in thread From: Paul E. McKenney @ 2019-04-24 8:29 UTC (permalink / raw) To: Peter Zijlstra Cc: Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon On Tue, Apr 23, 2019 at 10:28:31PM +0200, Peter Zijlstra wrote: > On Tue, Apr 23, 2019 at 01:16:37PM -0700, Paul E. McKenney wrote: > > > Agreed, but I thought that one of the ideas going forward was to get > > rid of smp_mb__{before,after}_atomic(). > > It's not one I had considered.. I just wanted to get rid of this > 'surprise' behaviour. Ah, good point, your patch is in fact a midpoint between those two positions. Just to make sure I understand: 1. Without your patch, smp_mb__{before,after}_atomic() orders only against the atomic itself. 2. With your patch, smp_mb__{before,after}_atomic() orders against the atomic itself and the accesses on the other side of that atomic. However, it does not order the atomic against the accesses on the other side of that atomic. Putting things between the smp_mb__{before,after}_atomic() and the atomic is in my opinion a bad idea, but in this case they are not necessarily ordered. 3. Dispensing with smp_mb__{before,after}_atomic() would have void RMW atomics fully ordered, but I suspect that it results in ugly performance regressions. Or am I still missing something? Thanx, Paul ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()

From: Peter Zijlstra @ 2019-04-24 8:44 UTC
To: Paul E. McKenney
Cc: Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon

On Wed, Apr 24, 2019 at 01:29:48AM -0700, Paul E. McKenney wrote:
> On Tue, Apr 23, 2019 at 10:28:31PM +0200, Peter Zijlstra wrote:
> > On Tue, Apr 23, 2019 at 01:16:37PM -0700, Paul E. McKenney wrote:
> >
> > > Agreed, but I thought that one of the ideas going forward was to get
> > > rid of smp_mb__{before,after}_atomic().
> >
> > It's not one I had considered.. I just wanted to get rid of this
> > 'surprise' behaviour.
>
> Ah, good point, your patch is in fact a midpoint between those two
> positions.  Just to make sure I understand:
>
> 1.	Without your patch, smp_mb__{before,after}_atomic() orders
>	only against the atomic itself.

Right, and that was not intentional.

> 2.	With your patch, smp_mb__{before,after}_atomic() orders against
>	the atomic itself and the accesses on the other side of that
>	atomic.  However, it does not order the atomic against the
>	accesses on the other side of that atomic.

Right.  I'll go make a more complete patch, covering all the
architectures.

> 	Putting things between the smp_mb__{before,after}_atomic()
> 	and the atomic is in my opinion a bad idea, but in this case
> 	they are not necessarily ordered.

Agreed, that is an unsupported idiom and it would be good to have
something check for this.

> 3.	Dispensing with smp_mb__{before,after}_atomic() would have
>	void RMW atomics fully ordered, but I suspect that it results
>	in ugly performance regressions.
>
> Or am I still missing something?

I think we're good :-)
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()

From: Peter Zijlstra @ 2019-04-23 12:32 UTC
To: Paul E. McKenney
Cc: Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon

On Sat, Apr 20, 2019 at 01:54:40AM -0700, Paul E. McKenney wrote:

> 	And atomic_set(): set_preempt_state().  This fails
> 	on x86, s390, and TSO friends, does it not?  Or is
> 	this ARM-only?  Still, why not just smp_mb() before and
> 	after?  Same issue in __kernfs_new_node(), bio_cnt_set(),
> 	sbitmap_queue_update_wake_batch(),
>
> 	Ditto for atomic64_set() in __ceph_dir_set_complete().
>
> 	Ditto for atomic_read() in rvt_qp_is_avail().  This function
> 	has a couple of other oddly placed smp_mb__before_atomic().

That are just straight up bugs.  The atomic_t.txt file clearly specifies
the barriers only apply to RmW ops and both _set() and _read() are
specified to not be a RmW.

> 	And atomic_cmpxchg(): msc_buffer_alloc().  This instance
> 	of smp_mb__before_atomic() can be removed unless I am missing
> 	something subtle.  Ditto for kvm_vcpu_exiting_guest_mode(),
> 	pv_kick_node(), __sbq_wake_up(),

Note that pv_kick_node() uses cmpxchg_relaxed(), which does not
otherwise imply barriers.

> 	And lock acquisition???  acm_read_bulk_callback().

I think it goes with the set_bit() earlier, but what do I know.

> 	In nfnl_acct_fill_info(), a smp_mb__before_atomic() after
> 	a atomic64_xchg()???  Also before a clear_bit(), but the
> 	clear_bit() is inside an "if".

Since it is _before, I'm thinking the pairing was intended with the
clear_bit(), and yes, then I would expect the smp_mb__before_atomic() to
be part of that same branch.

> There are a few cases that would see added overhead.  For example,
> svc_get_next_xprt() has the following:
>
> 	smp_mb__before_atomic();
> 	clear_bit(SP_CONGESTED, &pool->sp_flags);
> 	clear_bit(RQ_BUSY, &rqstp->rq_flags);
> 	smp_mb__after_atomic();
>
> And xs_sock_reset_connection_flags() has this:
>
> 	smp_mb__before_atomic();
> 	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> 	clear_bit(XPRT_CLOSING, &xprt->state);
> 	xs_sock_reset_state_flags(xprt);	/* Also a clear_bit(). */
> 	smp_mb__after_atomic();
>
> Yeah, there are more than a few misuses, aren't there?  :-/
> A coccinelle script seems in order.  In 0day test robot.

If we can get it to flag the right patterns, then yes that might be
useful regardless of the issue at hand, people seem to get this one
wrong a lot.

> But there are a number of helper functions whose purpose
> seems to be to wrap an atomic in smp_mb__before_atomic() and
> smp_mb__after_atomic(), so some of the atomic_xxx_mb() functions
> might be a good idea just for improved readability.

Are there really sites where _mb() makes sense?  The above is just a lot
of buggy code.
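The misuse class Peter calls out — barriers attached to non-RMW ops — can be sketched in userspace C11, where smp_mb() corresponds to atomic_thread_fence(memory_order_seq_cst) and atomic_set()/atomic_read() to relaxed store/load. This is an analogue for illustration, not kernel code, and the function names are invented:

```c
#include <stdatomic.h>
#include <assert.h>

static atomic_int state;

/* Analogue of the buggy pattern: smp_mb__{before,after}_atomic() has no
 * defined effect here, because atomic_set() is a plain store, not an
 * RMW op, so the barrier contract in atomic_t.txt does not apply. */
static void set_state_broken(int v)
{
	/* smp_mb__before_atomic();  -- no ordering guaranteed */
	atomic_store_explicit(&state, v, memory_order_relaxed);
}

/* The fix suggested above: plain smp_mb() before and after, which works
 * for any access, RMW or not. */
static void set_state_fixed(int v)
{
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */
	atomic_store_explicit(&state, v, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */
}

static int get_state(void)
{
	return atomic_load_explicit(&state, memory_order_relaxed);
}
```

Functionally both variants store the value; the difference is only in the ordering other CPUs may observe, which is why such bugs go unnoticed on strongly ordered machines.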
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()

From: Paul E. McKenney @ 2019-04-23 13:30 UTC
To: Peter Zijlstra
Cc: Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon

On Tue, Apr 23, 2019 at 02:32:09PM +0200, Peter Zijlstra wrote:
> On Sat, Apr 20, 2019 at 01:54:40AM -0700, Paul E. McKenney wrote:
>
> > 	And atomic_set(): set_preempt_state().  This fails
> > 	on x86, s390, and TSO friends, does it not?  Or is
> > 	this ARM-only?  Still, why not just smp_mb() before and
> > 	after?  Same issue in __kernfs_new_node(), bio_cnt_set(),
> > 	sbitmap_queue_update_wake_batch(),
> >
> > 	Ditto for atomic64_set() in __ceph_dir_set_complete().
> >
> > 	Ditto for atomic_read() in rvt_qp_is_avail().  This function
> > 	has a couple of other oddly placed smp_mb__before_atomic().
>
> That are just straight up bugs.  The atomic_t.txt file clearly specifies
> the barriers only apply to RmW ops and both _set() and _read() are
> specified to not be a RmW.

Agreed.  The "Ditto" covers my atomic_set() consternation.  ;-)

> > 	And atomic_cmpxchg(): msc_buffer_alloc().  This instance
> > 	of smp_mb__before_atomic() can be removed unless I am missing
> > 	something subtle.  Ditto for kvm_vcpu_exiting_guest_mode(),
> > 	pv_kick_node(), __sbq_wake_up(),
>
> Note that pv_kick_node() uses cmpxchg_relaxed(), which does not
> otherwise imply barriers.

Good point, my eyes must have been going funny.

> > 	And lock acquisition???  acm_read_bulk_callback().
>
> I think it goes with the set_bit() earlier, but what do I know.

Quite possibly!  In that case it should be smp_mb__after_atomic(),
and it would be nice if it immediately followed the set_bit().

> > 	In nfnl_acct_fill_info(), a smp_mb__before_atomic() after
> > 	a atomic64_xchg()???  Also before a clear_bit(), but the
> > 	clear_bit() is inside an "if".
>
> Since it is _before, I'm thinking the pairing was intended with the
> clear_bit(), and yes, then I would expect the smp_mb__before_atomic() to
> be part of that same branch.

It is quite possible that this one is a leftover, where the atomic
operation was removed but the smp_mb__{before,after}_atomic() lived on.
I had one of those in RCU, which now has a patch in -rcu.

> > There are a few cases that would see added overhead.  For example,
> > svc_get_next_xprt() has the following:
> >
> > 	smp_mb__before_atomic();
> > 	clear_bit(SP_CONGESTED, &pool->sp_flags);
> > 	clear_bit(RQ_BUSY, &rqstp->rq_flags);
> > 	smp_mb__after_atomic();
> >
> > And xs_sock_reset_connection_flags() has this:
> >
> > 	smp_mb__before_atomic();
> > 	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> > 	clear_bit(XPRT_CLOSING, &xprt->state);
> > 	xs_sock_reset_state_flags(xprt);	/* Also a clear_bit(). */
> > 	smp_mb__after_atomic();
> >
> > Yeah, there are more than a few misuses, aren't there?  :-/
> > A coccinelle script seems in order.  In 0day test robot.
>
> If we can get it to flag the right patterns, then yes that might be
> useful regardless of the issue at hand, people seem to get this one
> wrong a lot.

To be fair, the odd-looking ones are maybe 5% of the total.  Still
too many wrong, but the vast majority look OK.

> > But there are a number of helper functions whose purpose
> > seems to be to wrap an atomic in smp_mb__before_atomic() and
> > smp_mb__after_atomic(), so some of the atomic_xxx_mb() functions
> > might be a good idea just for improved readability.
>
> Are there really sites where _mb() makes sense?  The above is just a lot
> of buggy code.

There are a great many that look like this:

	smp_mb__before_atomic();
	clear_bit(NFSD4_CLIENT_UPCALL_LOCK, &clp->cl_flags);
	smp_mb__after_atomic();

Replacing these three lines with this would not be a bad thing:

	clear_bit_mb(NFSD4_CLIENT_UPCALL_LOCK, &clp->cl_flags);

							Thanx, Paul
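The clear_bit_mb() helper Paul floats does not exist in the kernel; a userspace C11 sketch of what it could look like, modeling clear_bit() as a relaxed fetch_and and the two barriers as seq_cst fences (names and signature are hypothetical):

```c
#include <stdatomic.h>
#include <assert.h>

static atomic_ulong flags;

/* Hypothetical clear_bit_mb(): a fully ordered bit clear, i.e. the
 * three-line smp_mb__before_atomic() / clear_bit() /
 * smp_mb__after_atomic() pattern collapsed into one helper. */
static inline void clear_bit_mb(int nr, atomic_ulong *addr)
{
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb__before_atomic() */
	atomic_fetch_and_explicit(addr, ~(1UL << nr),
				  memory_order_relaxed);	/* clear_bit() */
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb__after_atomic() */
}
```

Bundling the barriers into the helper would also make the intended ordering impossible to separate from the bit operation by later refactoring, which is exactly the failure mode seen elsewhere in this thread.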
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()

From: Peter Zijlstra @ 2019-04-23 13:40 UTC
To: Paul E. McKenney
Cc: Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon

On Tue, Apr 23, 2019 at 06:30:10AM -0700, Paul E. McKenney wrote:

> There are a great many that look like this:
>
> 	smp_mb__before_atomic();
> 	clear_bit(NFSD4_CLIENT_UPCALL_LOCK, &clp->cl_flags);
> 	smp_mb__after_atomic();

Ooh, marvel at the comment describing the ordering there.  Oh wait :-(
So much for checkpatch.pl I suppose.

I think the first is a release order for the 'LOCK' bit and the second
is because of wake_up_bit() being a shitty interface.

So maybe that should've been:

	clear_bit_unlock(NFSD4_CLIENT_UPCALL_LOCK, &clp->cl_flags);
	smp_mb__after_atomic();
	wake_up_bit(&clp->cl_flags, NFSD4_CLIENT_UPCALL_LOCK);

instead?
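Peter's suggested rewrite can likewise be modeled in userspace C11: the release-ordered clear corresponds to a memory_order_release RMW, and wake_up_bit() has no userspace equivalent, so it is left as a comment. The names are stand-ins; this sketches the intent of the rewrite, not kernel code:

```c
#include <stdatomic.h>
#include <assert.h>

static atomic_ulong cl_flags;
#define UPCALL_LOCK	0	/* stand-in for NFSD4_CLIENT_UPCALL_LOCK */

static void release_upcall_lock(void)
{
	/* clear_bit_unlock(): the clearing RMW itself carries RELEASE
	 * ordering, so no separate barrier is needed before it. */
	atomic_fetch_and_explicit(&cl_flags, ~(1UL << UPCALL_LOCK),
				  memory_order_release);

	/* smp_mb__after_atomic(): full barrier between clearing the bit
	 * and inspecting the waitqueue, so the waker cannot miss a
	 * sleeper that checked the bit just before it was cleared. */
	atomic_thread_fence(memory_order_seq_cst);

	/* wake_up_bit(&clp->cl_flags, NFSD4_CLIENT_UPCALL_LOCK)
	 * would go here in the kernel version. */
}
```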
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()

From: Paul E. McKenney @ 2019-04-23 20:19 UTC
To: Peter Zijlstra
Cc: Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon

On Tue, Apr 23, 2019 at 03:40:03PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 23, 2019 at 06:30:10AM -0700, Paul E. McKenney wrote:
>
> > There are a great many that look like this:
> >
> > 	smp_mb__before_atomic();
> > 	clear_bit(NFSD4_CLIENT_UPCALL_LOCK, &clp->cl_flags);
> > 	smp_mb__after_atomic();
>
> Ooh, marvel at the comment describing the ordering there.  Oh wait :-(
> So much for checkpatch.pl I suppose.

Especially if the code was in the kernel before checkpatch.pl started
asking for comments.  Which might or might not be the case with this
code.  No idea either way.

> I think the first is a release order for the 'LOCK' bit and the second
> is because of wake_up_bit() being a shitty interface.
>
> So maybe that should've been:
>
> 	clear_bit_unlock(NFSD4_CLIENT_UPCALL_LOCK, &clp->cl_flags);
> 	smp_mb__after_atomic();
> 	wake_up_bit(&clp->cl_flags, NFSD4_CLIENT_UPCALL_LOCK);
>
> instead?

Quite possibly, but my brain is too fried to be sure.

							Thanx, Paul
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()

From: Andrea Parri @ 2019-04-27 8:17 UTC
To: Paul E. McKenney
Cc: Peter Zijlstra, Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon

On Tue, Apr 23, 2019 at 06:30:10AM -0700, Paul E. McKenney wrote:
> On Tue, Apr 23, 2019 at 02:32:09PM +0200, Peter Zijlstra wrote:
> > On Sat, Apr 20, 2019 at 01:54:40AM -0700, Paul E. McKenney wrote:
> >
> > > 	And atomic_set(): set_preempt_state().  This fails
> > > 	on x86, s390, and TSO friends, does it not?  Or is
> > > 	this ARM-only?  Still, why not just smp_mb() before and
> > > 	after?  Same issue in __kernfs_new_node(), bio_cnt_set(),
> > > 	sbitmap_queue_update_wake_batch(),
> > >
> > > 	Ditto for atomic64_set() in __ceph_dir_set_complete().
> > >
> > > 	Ditto for atomic_read() in rvt_qp_is_avail().  This function
> > > 	has a couple of other oddly placed smp_mb__before_atomic().
> >
> > That are just straight up bugs.  The atomic_t.txt file clearly specifies
> > the barriers only apply to RmW ops and both _set() and _read() are
> > specified to not be a RmW.
>
> Agreed.  The "Ditto" covers my atomic_set() consternation.  ;-)

I was working on some of these before the Easter break [1, 2]: the plan
was to continue next week, but by addressing the remaining cases with a
conservative s/that barrier/smp_mb at first; unless you've other plans?

  Andrea

[1] http://lkml.kernel.org/r/1555417031-27356-1-git-send-email-andrea.parri@amarulasolutions.com
[2] http://lkml.kernel.org/r/1555404968-39927-1-git-send-email-pbonzini@redhat.com
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()

From: Paul E. McKenney @ 2019-04-27 8:36 UTC
To: Andrea Parri
Cc: Peter Zijlstra, Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon

On Sat, Apr 27, 2019 at 10:17:38AM +0200, Andrea Parri wrote:
> On Tue, Apr 23, 2019 at 06:30:10AM -0700, Paul E. McKenney wrote:
> > On Tue, Apr 23, 2019 at 02:32:09PM +0200, Peter Zijlstra wrote:
> > > On Sat, Apr 20, 2019 at 01:54:40AM -0700, Paul E. McKenney wrote:
> > >
> > > > 	And atomic_set(): set_preempt_state().  This fails
> > > > 	on x86, s390, and TSO friends, does it not?  Or is
> > > > 	this ARM-only?  Still, why not just smp_mb() before and
> > > > 	after?  Same issue in __kernfs_new_node(), bio_cnt_set(),
> > > > 	sbitmap_queue_update_wake_batch(),
> > > >
> > > > 	Ditto for atomic64_set() in __ceph_dir_set_complete().
> > > >
> > > > 	Ditto for atomic_read() in rvt_qp_is_avail().  This function
> > > > 	has a couple of other oddly placed smp_mb__before_atomic().
> > >
> > > That are just straight up bugs.  The atomic_t.txt file clearly specifies
> > > the barriers only apply to RmW ops and both _set() and _read() are
> > > specified to not be a RmW.
> >
> > Agreed.  The "Ditto" covers my atomic_set() consternation.  ;-)
>
> I was working on some of these before the Easter break [1, 2]: the plan
> was to continue next week, but by addressing the remaining cases with a
> conservative s/that barrier/smp_mb at first; unless you've other plans?
>
> [1] http://lkml.kernel.org/r/1555417031-27356-1-git-send-email-andrea.parri@amarulasolutions.com
> [2] http://lkml.kernel.org/r/1555404968-39927-1-git-send-email-pbonzini@redhat.com

Sounds good to me!  ;-)

							Thanx, Paul
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()

From: Johan Hovold @ 2019-04-29 9:24 UTC
To: Paul E. McKenney
Cc: Peter Zijlstra, Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon

On Tue, Apr 23, 2019 at 06:30:10AM -0700, Paul E. McKenney wrote:
> On Tue, Apr 23, 2019 at 02:32:09PM +0200, Peter Zijlstra wrote:
> > On Sat, Apr 20, 2019 at 01:54:40AM -0700, Paul E. McKenney wrote:
> >
> > > 	And lock acquisition???  acm_read_bulk_callback().
> >
> > I think it goes with the set_bit() earlier, but what do I know.
>
> Quite possibly!  In that case it should be smp_mb__after_atomic(),
> and it would be nice if it immediately followed the set_bit().

I noticed this one last week as well.  The set_bit() had been incorrectly
moved without noticing the smp_mb__before_atomic().  I've submitted a
patch to restore it and to fix a related issue due to missing barriers:

	https://lkml.kernel.org/r/20190425160540.10036-5-johan@kernel.org

Johan
* Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()

From: Paul E. McKenney @ 2019-04-29 14:49 UTC
To: Johan Hovold
Cc: Peter Zijlstra, Nicholas Piggin, LKMM Maintainers -- Akira Yokosawa, Andrea Parri, Boqun Feng, David Howells, Daniel Lustig, Jade Alglave, Kernel development list, Luc Maranget, Alan Stern, Will Deacon

On Mon, Apr 29, 2019 at 11:24:30AM +0200, Johan Hovold wrote:
> On Tue, Apr 23, 2019 at 06:30:10AM -0700, Paul E. McKenney wrote:
> > On Tue, Apr 23, 2019 at 02:32:09PM +0200, Peter Zijlstra wrote:
> > > On Sat, Apr 20, 2019 at 01:54:40AM -0700, Paul E. McKenney wrote:
> > >
> > > > 	And lock acquisition???  acm_read_bulk_callback().
> > >
> > > I think it goes with the set_bit() earlier, but what do I know.
> >
> > Quite possibly!  In that case it should be smp_mb__after_atomic(),
> > and it would be nice if it immediately followed the set_bit().
>
> I noticed this one last week as well.  The set_bit() had been incorrectly
> moved without noticing the smp_mb__before_atomic().  I've submitted a
> patch to restore it and to fix a related issue due to missing barriers:
>
> 	https://lkml.kernel.org/r/20190425160540.10036-5-johan@kernel.org

Good to know, thank you!

							Thanx, Paul
end of thread [~2019-04-29 14:51 UTC]

Thread overview: 21+ messages

2019-04-19 17:21 [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic() Alan Stern
2019-04-19 17:54 ` Paul E. McKenney
2019-04-19 18:00 ` Peter Zijlstra
2019-04-19 18:26 ` Paul E. McKenney
2019-04-20  0:26 ` Nicholas Piggin
2019-04-20  8:54 ` Paul E. McKenney
2019-04-23 12:17 ` Peter Zijlstra
2019-04-23 13:21 ` Paul E. McKenney
2019-04-23 13:26 ` Peter Zijlstra
2019-04-23 20:16 ` Paul E. McKenney
2019-04-23 20:28 ` Peter Zijlstra
2019-04-24  8:29 ` Paul E. McKenney
2019-04-24  8:44 ` Peter Zijlstra
2019-04-23 12:32 ` Peter Zijlstra
2019-04-23 13:30 ` Paul E. McKenney
2019-04-23 13:40 ` Peter Zijlstra
2019-04-23 20:19 ` Paul E. McKenney
2019-04-27  8:17 ` Andrea Parri
2019-04-27  8:36 ` Paul E. McKenney
2019-04-29  9:24 ` Johan Hovold
2019-04-29 14:49 ` Paul E. McKenney