All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] xfs: insert memory barriers before wake_up_bit()
@ 2013-02-04 16:12 Alex Elder
  2013-02-04 16:13 ` [PATCH 1/2] xfs: memory barrier " Alex Elder
  2013-02-04 16:13 ` [PATCH 2/2] xfs: another " Alex Elder
  0 siblings, 2 replies; 8+ messages in thread
From: Alex Elder @ 2013-02-04 16:12 UTC (permalink / raw)
  To: xfs; +Cc: Sage Weil

These two patches insert memory barriers in xfs_iunlock() and
xfs_inode_item_unpin() before their calls to wake_up_bit(),
to ensure just-queued processes are properly awakened.

					-Alex

[PATCH 1/2] xfs: memory barrier before wake_up_bit()
[PATCH 2/2] xfs: another memory barrier before wake_up_bit()

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] xfs: memory barrier before wake_up_bit()
  2013-02-04 16:12 [PATCH 0/2] xfs: insert memory barriers before wake_up_bit() Alex Elder
@ 2013-02-04 16:13 ` Alex Elder
  2013-02-04 23:06   ` Dave Chinner
  2013-02-04 16:13 ` [PATCH 2/2] xfs: another " Alex Elder
  1 sibling, 1 reply; 8+ messages in thread
From: Alex Elder @ 2013-02-04 16:13 UTC (permalink / raw)
  To: xfs

In xfs_ifunlock() there is a call to wake_up_bit() after clearing
the flush lock on the xfs inode.  This is not guaranteed to be safe,
as noted in the comments above wake_up_bit() beginning with:

    In order for this to function properly, as it uses
    waitqueue_active() internally, some kind of memory
    barrier must be done prior to calling this.

I claim no mastery of the details and subtlety of memory barrier
use, but I believe the issue is that the call to waitqueue_active()
in __wake_up_bit(), could be operating on a value of "wq" that is
out of date.  This patch fixes this by inserting a call to smp_mb()
in xfs_iunlock before calling wake_up_bit(), along the lines of
what's done in unlock_new_inode().  A litte more explanation
follows.


In __xfs_iflock(), prepare_to_wait_exclusive() adds a wait queue
entry to the end of a bit wait queue before setting the current task
state to UNINTERRUPTIBLE.  And although setting the task state
issues a full smp_mb() (which ensures changes made are visible to
the rest of the system at that point) that alone does not guarantee
that other CPUs will instantly avail themselves of the updated
value.  A separate CPU needs to issue at least a read barrier in
order to ensure the wq value it uses to determine whether there are
waiters is up-to-date, and waitqueue_active() does not do that.

I came to suspect this code because we had a customer with a system
that was hung with one or more tasks stuck in __xfs_iflock().  A
little poking around the affected code led me to the comments in
wake_up_bit().

Signed-off-by: Alex Elder <elder@inktank.com>
---
 fs/xfs/xfs_inode.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 22baf6e..237e7f6 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -419,6 +419,7 @@ static inline void xfs_iflock(struct xfs_inode *ip)
 static inline void xfs_ifunlock(struct xfs_inode *ip)
 {
 	xfs_iflags_clear(ip, XFS_IFLOCK);
+	smp_mb();
 	wake_up_bit(&ip->i_flags, __XFS_IFLOCK_BIT);
 }

-- 
1.7.9.5

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] xfs: another memory barrier before wake_up_bit()
  2013-02-04 16:12 [PATCH 0/2] xfs: insert memory barriers before wake_up_bit() Alex Elder
  2013-02-04 16:13 ` [PATCH 1/2] xfs: memory barrier " Alex Elder
@ 2013-02-04 16:13 ` Alex Elder
  2013-02-04 23:26   ` Dave Chinner
  1 sibling, 1 reply; 8+ messages in thread
From: Alex Elder @ 2013-02-04 16:13 UTC (permalink / raw)
  To: xfs

In xfs_inode_item_unpin() there is a call to wake_up_bit() following
an independent test for whether waiters should be awakened.  This
requires a memory barrier in order to guarantee correct operation
(see the comment above wake_up_bit()).

Signed-off-by: Alex Elder <elder@inktank.com>
---
 fs/xfs/xfs_inode_item.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index d041d47..a7cacf7 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -474,8 +474,10 @@ xfs_inode_item_unpin(

 	trace_xfs_inode_unpin(ip, _RET_IP_);
 	ASSERT(atomic_read(&ip->i_pincount) > 0);
-	if (atomic_dec_and_test(&ip->i_pincount))
-		wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);
+	if (!atomic_dec_and_test(&ip->i_pincount))
+		return;
+	smp_mb();
+	wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);
 }

 STATIC uint
-- 
1.7.9.5

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] xfs: memory barrier before wake_up_bit()
  2013-02-04 16:13 ` [PATCH 1/2] xfs: memory barrier " Alex Elder
@ 2013-02-04 23:06   ` Dave Chinner
  2013-02-05  1:35     ` Alex Elder
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2013-02-04 23:06 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Mon, Feb 04, 2013 at 10:13:11AM -0600, Alex Elder wrote:
> In xfs_ifunlock() there is a call to wake_up_bit() after clearing
> the flush lock on the xfs inode.  This is not guaranteed to be safe,
> as noted in the comments above wake_up_bit() beginning with:
> 
>     In order for this to function properly, as it uses
>     waitqueue_active() internally, some kind of memory
>     barrier must be done prior to calling this.
> 
> I claim no mastery of the details and subtlety of memory barrier
> use, but I believe the issue is that the call to waitqueue_active()
> in __wake_up_bit(), could be operating on a value of "wq" that is
> out of date.  This patch fixes this by inserting a call to smp_mb()
> in xfs_iunlock before calling wake_up_bit(), along the lines of
> what's done in unlock_new_inode().  A litte more explanation
> follows.
> 
> 
> In __xfs_iflock(), prepare_to_wait_exclusive() adds a wait queue
> entry to the end of a bit wait queue before setting the current task
> state to UNINTERRUPTIBLE.  And although setting the task state
> issues a full smp_mb() (which ensures changes made are visible to
> the rest of the system at that point) that alone does not guarantee
> that other CPUs will instantly avail themselves of the updated
> value.  A separate CPU needs to issue at least a read barrier in
> order to ensure the wq value it uses to determine whether there are
> waiters is up-to-date, and waitqueue_active() does not do that.

You can probably trim most of this and simply point at the comment
describing wake_up_bit()....

> I came to suspect this code because we had a customer with a system
> that was hung with one or more tasks stuck in __xfs_iflock().  A
> little poking around the affected code led me to the comments in
> wake_up_bit().
> 
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
>  fs/xfs/xfs_inode.h |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 22baf6e..237e7f6 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -419,6 +419,7 @@ static inline void xfs_iflock(struct xfs_inode *ip)
>  static inline void xfs_ifunlock(struct xfs_inode *ip)
>  {
>  	xfs_iflags_clear(ip, XFS_IFLOCK);
> +	smp_mb();
>  	wake_up_bit(&ip->i_flags, __XFS_IFLOCK_BIT);

ACK, smp_mb() is needed because spin_unlock() is not a memory
barrier and so not everyone will have seen the bit being cleared.

Reviewed-by: Dave Chinner <david@fromorbit.com>

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] xfs: another memory barrier before wake_up_bit()
  2013-02-04 16:13 ` [PATCH 2/2] xfs: another " Alex Elder
@ 2013-02-04 23:26   ` Dave Chinner
  2013-02-05  1:38     ` Alex Elder
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2013-02-04 23:26 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Mon, Feb 04, 2013 at 10:13:23AM -0600, Alex Elder wrote:
> In xfs_inode_item_unpin() there is a call to wake_up_bit() following
> an independent test for whether waiters should be awakened.  This
> requires a memory barrier in order to guarantee correct operation
> (see the comment above wake_up_bit()).
> 
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
>  fs/xfs/xfs_inode_item.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index d041d47..a7cacf7 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -474,8 +474,10 @@ xfs_inode_item_unpin(
> 
>  	trace_xfs_inode_unpin(ip, _RET_IP_);
>  	ASSERT(atomic_read(&ip->i_pincount) > 0);
> -	if (atomic_dec_and_test(&ip->i_pincount))
> -		wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);
> +	if (!atomic_dec_and_test(&ip->i_pincount))
> +		return;
> +	smp_mb();
> +	wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);

I'm not sure this a barrier is actually needed here.  The "wake up"
bit is never stored or cleared anywhere in this case, it is used
only to define a wait channel and directed wake up. Hence the "need
a barrier so all CPUs see the cleared bit" case doesn't arise here.
We use an atomic variable instead, and that makes it safe.

If you read Documentation/atomic_ops.txt, you'll find that atomic
modification operations are required to have explicit barrier
semantics. i.e. that atomic_dec_and_test() must behave like it has
both a smp_mb() before and after the atomic operation. i.e:

	Unlike the above routines, it is required that explicit memory
	barriers are performed before and after the operation.  It must be
	done such that all memory operations before and after the atomic
	operation calls are strongly ordered with respect to the atomic
	operation itself.

So, the smp_mb() that is added here is redundant - the
atomic_dec_and_test() call already has the necesary memory barriers
that wake_up_bit() requires.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] xfs: memory barrier before wake_up_bit()
  2013-02-04 23:06   ` Dave Chinner
@ 2013-02-05  1:35     ` Alex Elder
  2013-02-07 15:44       ` Ben Myers
  0 siblings, 1 reply; 8+ messages in thread
From: Alex Elder @ 2013-02-05  1:35 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 02/04/2013 05:06 PM, Dave Chinner wrote:
> On Mon, Feb 04, 2013 at 10:13:11AM -0600, Alex Elder wrote:
>> In xfs_ifunlock() there is a call to wake_up_bit() after clearing
>> the flush lock on the xfs inode.  This is not guaranteed to be safe,
>> as noted in the comments above wake_up_bit() beginning with:
>>
>>     In order for this to function properly, as it uses
>>     waitqueue_active() internally, some kind of memory
>>     barrier must be done prior to calling this.
>>
>> I claim no mastery of the details and subtlety of memory barrier
>> use, but I believe the issue is that the call to waitqueue_active()
>> in __wake_up_bit(), could be operating on a value of "wq" that is
>> out of date.  This patch fixes this by inserting a call to smp_mb()
>> in xfs_iunlock before calling wake_up_bit(), along the lines of
>> what's done in unlock_new_inode().  A litte more explanation
>> follows.
>>
>>
>> In __xfs_iflock(), prepare_to_wait_exclusive() adds a wait queue
>> entry to the end of a bit wait queue before setting the current task
>> state to UNINTERRUPTIBLE.  And although setting the task state
>> issues a full smp_mb() (which ensures changes made are visible to
>> the rest of the system at that point) that alone does not guarantee
>> that other CPUs will instantly avail themselves of the updated
>> value.  A separate CPU needs to issue at least a read barrier in
>> order to ensure the wq value it uses to determine whether there are
>> waiters is up-to-date, and waitqueue_active() does not do that.
> 
> You can probably trim most of this and simply point at the comment
> describing wake_up_bit()....

Yeah, I know.  I just wanted to sort of say what I was
thinking to get confirmation (or correction).  I now
have a much better understanding of barriers than I did
before, but there are still corners I haven't wrapped
my head around.

Ben, please feel free do trim off this stuff as you
see fit.

					-Alex

> 
>> I came to suspect this code because we had a customer with a system
>> that was hung with one or more tasks stuck in __xfs_iflock().  A
>> little poking around the affected code led me to the comments in
>> wake_up_bit().
>>
>> Signed-off-by: Alex Elder <elder@inktank.com>
>> ---
>>  fs/xfs/xfs_inode.h |    1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
>> index 22baf6e..237e7f6 100644
>> --- a/fs/xfs/xfs_inode.h
>> +++ b/fs/xfs/xfs_inode.h
>> @@ -419,6 +419,7 @@ static inline void xfs_iflock(struct xfs_inode *ip)
>>  static inline void xfs_ifunlock(struct xfs_inode *ip)
>>  {
>>  	xfs_iflags_clear(ip, XFS_IFLOCK);
>> +	smp_mb();
>>  	wake_up_bit(&ip->i_flags, __XFS_IFLOCK_BIT);
> 
> ACK, smp_mb() is needed because spin_unlock() is not a memory
> barrier and so not everyone will have seen the bit being cleared.
> 
> Reviewed-by: Dave Chinner <david@fromorbit.com>
> 
> Cheers,
> 
> Dave.
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] xfs: another memory barrier before wake_up_bit()
  2013-02-04 23:26   ` Dave Chinner
@ 2013-02-05  1:38     ` Alex Elder
  0 siblings, 0 replies; 8+ messages in thread
From: Alex Elder @ 2013-02-05  1:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 02/04/2013 05:26 PM, Dave Chinner wrote:
> On Mon, Feb 04, 2013 at 10:13:23AM -0600, Alex Elder wrote:
>> In xfs_inode_item_unpin() there is a call to wake_up_bit() following
>> an independent test for whether waiters should be awakened.  This
>> requires a memory barrier in order to guarantee correct operation
>> (see the comment above wake_up_bit()).
>>
>> Signed-off-by: Alex Elder <elder@inktank.com>
>> ---
>>  fs/xfs/xfs_inode_item.c |    6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
>> index d041d47..a7cacf7 100644
>> --- a/fs/xfs/xfs_inode_item.c
>> +++ b/fs/xfs/xfs_inode_item.c
>> @@ -474,8 +474,10 @@ xfs_inode_item_unpin(
>>
>>  	trace_xfs_inode_unpin(ip, _RET_IP_);
>>  	ASSERT(atomic_read(&ip->i_pincount) > 0);
>> -	if (atomic_dec_and_test(&ip->i_pincount))
>> -		wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);
>> +	if (!atomic_dec_and_test(&ip->i_pincount))
>> +		return;
>> +	smp_mb();
>> +	wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);
> 
> I'm not sure this a barrier is actually needed here.  The "wake up"
> bit is never stored or cleared anywhere in this case, it is used
> only to define a wait channel and directed wake up. Hence the "need
> a barrier so all CPUs see the cleared bit" case doesn't arise here.
> We use an atomic variable instead, and that makes it safe.
> 
> If you read Documentation/atomic_ops.txt, you'll find that atomic
> modification operations are required to have explicit barrier
> semantics. i.e. that atomic_dec_and_test() must behave like it has
> both a smp_mb() before and after the atomic operation. i.e:
> 
> 	Unlike the above routines, it is required that explicit memory
> 	barriers are performed before and after the operation.  It must be
> 	done such that all memory operations before and after the atomic
> 	operation calls are strongly ordered with respect to the atomic
> 	operation itself.
> 
> So, the smp_mb() that is added here is redundant - the
> atomic_dec_and_test() call already has the necesary memory barriers
> that wake_up_bit() requires.

I hadn't looked at that in as much detail, but now that you point it
out I concur.

I retract this patch.

Thanks.

					-Alex

> 
> Cheers,
> 
> Dave.
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] xfs: memory barrier before wake_up_bit()
  2013-02-05  1:35     ` Alex Elder
@ 2013-02-07 15:44       ` Ben Myers
  0 siblings, 0 replies; 8+ messages in thread
From: Ben Myers @ 2013-02-07 15:44 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Mon, Feb 04, 2013 at 07:35:09PM -0600, Alex Elder wrote:
> On 02/04/2013 05:06 PM, Dave Chinner wrote:
> > On Mon, Feb 04, 2013 at 10:13:11AM -0600, Alex Elder wrote:
> >> In xfs_ifunlock() there is a call to wake_up_bit() after clearing
> >> the flush lock on the xfs inode.  This is not guaranteed to be safe,
> >> as noted in the comments above wake_up_bit() beginning with:
> >>
> >>     In order for this to function properly, as it uses
> >>     waitqueue_active() internally, some kind of memory
> >>     barrier must be done prior to calling this.
> >>
> >> I claim no mastery of the details and subtlety of memory barrier
> >> use, but I believe the issue is that the call to waitqueue_active()
> >> in __wake_up_bit(), could be operating on a value of "wq" that is
> >> out of date.  This patch fixes this by inserting a call to smp_mb()
> >> in xfs_iunlock before calling wake_up_bit(), along the lines of
> >> what's done in unlock_new_inode().  A litte more explanation
> >> follows.
> >>
> >>
> >> In __xfs_iflock(), prepare_to_wait_exclusive() adds a wait queue
> >> entry to the end of a bit wait queue before setting the current task
> >> state to UNINTERRUPTIBLE.  And although setting the task state
> >> issues a full smp_mb() (which ensures changes made are visible to
> >> the rest of the system at that point) that alone does not guarantee
> >> that other CPUs will instantly avail themselves of the updated
> >> value.  A separate CPU needs to issue at least a read barrier in
> >> order to ensure the wq value it uses to determine whether there are
> >> waiters is up-to-date, and waitqueue_active() does not do that.
> > 
> > You can probably trim most of this and simply point at the comment
> > describing wake_up_bit()....
> 
> Yeah, I know.  I just wanted to sort of say what I was
> thinking to get confirmation (or correction).  I now
> have a much better understanding of barriers than I did
> before, but there are still corners I haven't wrapped
> my head around.
> 
> Ben, please feel free do trim off this stuff as you
> see fit.

Applied.

Thanks Alex!

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2013-02-07 15:44 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-04 16:12 [PATCH 0/2] xfs: insert memory barriers before wake_up_bit() Alex Elder
2013-02-04 16:13 ` [PATCH 1/2] xfs: memory barrier " Alex Elder
2013-02-04 23:06   ` Dave Chinner
2013-02-05  1:35     ` Alex Elder
2013-02-07 15:44       ` Ben Myers
2013-02-04 16:13 ` [PATCH 2/2] xfs: another " Alex Elder
2013-02-04 23:26   ` Dave Chinner
2013-02-05  1:38     ` Alex Elder

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.