linux-fsdevel.vger.kernel.org archive mirror
* [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
@ 2006-03-09  7:54 Suzuki
  2006-03-09 12:03 ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Suzuki @ 2006-03-09  7:54 UTC (permalink / raw)
  To: linux-fsdevel, linux-aio@kvack.org, lkml; +Cc: suparna, akpm

[-- Attachment #1: Type: text/plain, Size: 134 bytes --]


Missed out linux-aio & linux-fs-devel lists. Forwarding.

Comments?


Suzuki K P
Linux Technology Center,
IBM Software Labs, India.

[-- Attachment #2: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests.eml --]
[-- Type: message/rfc822, Size: 2028 bytes --]

From: Suzuki <suzuki@in.ibm.com>
To: lkml <linux-kernel@vger.kernel.org>
Cc: suparna <suparna@in.ibm.com>,  akpm@osdl.org
Subject: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
Date: Thu, 09 Mar 2006 12:47:01 +0530
Message-ID: <440FD66D.6060308@in.ibm.com>

Hi all,


I was working on an issue where we get "Badness in
__mutex_unlock_slowpath" (and hence a stack trace) while running FS
stress tests on XFS on a 2.6.16-rc5 kernel.

The dmesg output looks like:

Badness in __mutex_unlock_slowpath at kernel/mutex.c:207
  [<c0103c0c>] show_trace+0x20/0x22
  [<c0103d4b>] dump_stack+0x1e/0x20
  [<c0473f1f>] __mutex_unlock_slowpath+0x12a/0x23b
  [<c0473938>] mutex_unlock+0xb/0xd
  [<c02a5720>] xfs_read+0x230/0x2d9
  [<c02a1bed>] linvfs_aio_read+0x8d/0x98
  [<c015f3df>] do_sync_read+0xb8/0x107
  [<c015f4f7>] vfs_read+0xc9/0x19b
  [<c015f8b2>] sys_read+0x47/0x6e
  [<c0102db7>] sysenter_past_esp+0x54/0x75


This happens with XFS DIO reads. xfs_read holds the i_mutex and calls
__generic_file_aio_read(), which falls into __blockdev_direct_IO with
the DIO_OWN_LOCKING flag (since XFS uses own-locking). Now
__blockdev_direct_IO releases the i_mutex for READs with
DIO_OWN_LOCKING. When it returns to xfs_read, it tries to unlock the
i_mutex (which is now already unlocked), causing the "Badness".
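
To make the failure mode concrete, here is a minimal userspace sketch of
the same pattern (purely illustrative -- a POSIX error-checking mutex
stands in for i_mutex, and the names are made up, not kernel code):

#include <pthread.h>
#include <stdio.h>
#include <errno.h>

static pthread_mutex_t m;

/* stands in for __blockdev_direct_IO dropping i_mutex for READs
 * under DIO_OWN_LOCKING */
static void dio_like_callee(void)
{
	pthread_mutex_unlock(&m);
}

int main(void)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&m, &attr);

	pthread_mutex_lock(&m);		/* xfs_read takes i_mutex */
	dio_like_callee();		/* lock dropped underneath us */

	/* xfs_read's unlock of an already-unlocked mutex; the
	 * error-checking mutex reports it, much as the kernel's
	 * mutex debugging prints the "Badness" */
	if (pthread_mutex_unlock(&m) == EPERM)
		fprintf(stderr, "double unlock caught\n");
	return 0;
}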

The possible solution we can think of is not to unlock the
i_mutex for DIO_OWN_LOCKING. This will only affect the DIO_OWN_LOCKING
users (as of now, only XFS) with concurrent DIO sync read requests. AIO
read requests would not suffer this problem, since they would just return
once the DIO is submitted.

Another workaround for this would be adding a "mutex_is_locked" check
before trying to unlock i_mutex in xfs_read. But this seems to be an
ugly hack. :(
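
For what it's worth, here is a userspace sketch of why such a check is
racy (illustrative only; pthreads stand in for the kernel primitives):
between observing "locked" and calling unlock, another task may have
become the owner, so we would release a mutex we do not hold.

#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>

static pthread_mutex_t m;

static void *other_task(void *arg)
{
	pthread_mutex_lock(&m);		/* acquires m in the race window */
	sleep(2);
	pthread_mutex_unlock(&m);
	return NULL;
}

int main(void)
{
	pthread_mutexattr_t attr;
	pthread_t t;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&m, &attr);

	/* i_mutex has already been dropped on our behalf (as by the
	 * DIO code); meanwhile another task takes it */
	pthread_create(&t, NULL, other_task, NULL);
	sleep(1);

	/* the proposed check: "is it locked?" -- yes, but not by us */
	if (pthread_mutex_trylock(&m) == EBUSY) {
		/* a plain unlock here would release other_task's lock;
		 * the error-checking mutex refuses with EPERM */
		if (pthread_mutex_unlock(&m) == EPERM)
			fprintf(stderr, "would have unlocked another task's mutex\n");
	}

	pthread_join(t, NULL);
	return 0;
}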

Comments?


thanks,

Suzuki





* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-09  7:54 [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests Suzuki
@ 2006-03-09 12:03 ` Christoph Hellwig
  2006-03-09 22:30   ` Nathan Scott
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2006-03-09 12:03 UTC (permalink / raw)
  To: Suzuki; +Cc: linux-fsdevel, linux-aio@kvack.org, lkml, suparna, akpm, linux-xfs

On Thu, Mar 09, 2006 at 01:24:38PM +0530, Suzuki wrote:
> 
> Missed out linux-aio & linux-fs-devel lists. Forwarding.
> 
> Comments?

I've seen this too.  The problem is that __generic_file_aio_read can return
with or without the i_mutex locked in the direct I/O case for filesystems
that set DIO_OWN_LOCKING.  It's a nasty one and I haven't found a better solution
than copying lots of code from filemap.c into xfs.



* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-09 12:03 ` Christoph Hellwig
@ 2006-03-09 22:30   ` Nathan Scott
  2006-03-09 22:42     ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Nathan Scott @ 2006-03-09 22:30 UTC (permalink / raw)
  To: Christoph Hellwig, Suzuki
  Cc: linux-fsdevel, linux-aio@kvack.org, lkml, suparna, akpm, linux-xfs

On Thu, Mar 09, 2006 at 12:03:06PM +0000, Christoph Hellwig wrote:
> On Thu, Mar 09, 2006 at 01:24:38PM +0530, Suzuki wrote:
> > 
> > Missed out linux-aio & linux-fs-devel lists. Forwarding.
> > 
> > Comments?
> 
> I've seen this too.  The problem is that __generic_file_aio_read can return
> with or without the i_mutex locked in the direct I/O case for filesystems
> that set DIO_OWN_LOCKING.

Not for reads AFAICT - __generic_file_aio_read + own-locking
should always have released i_mutex at the end of the direct
read - are you thinking of writes or have I missed something?

> It's a nasty one and I haven't found a better solution
> than copying lots of code from filemap.c into xfs.

Er, eek?  Hopefully that's not needed - from my reading of the
code, all the i_mutex locking for direct reads lives inside
direct-io.c, not filemap.c -- is the solution from my other
mail not workable?  (isn't it only writes that have the weird
buffered I/O fallback + i_sem/i_mutex locking interaction?).

thanks.

-- 
Nathan


* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-09 22:30   ` Nathan Scott
@ 2006-03-09 22:42     ` Christoph Hellwig
  2006-03-09 23:14       ` Nathan Scott
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2006-03-09 22:42 UTC (permalink / raw)
  To: Nathan Scott
  Cc: Christoph Hellwig, Suzuki, linux-fsdevel, linux-aio@kvack.org,
	lkml, suparna, akpm, linux-xfs

On Fri, Mar 10, 2006 at 09:30:42AM +1100, Nathan Scott wrote:
> Not for reads AFAICT - __generic_file_aio_read + own-locking
> should always have released i_mutex at the end of the direct
> read - are you thinking of writes or have I missed something?

If an error occurs before a_ops->direct_IO is called, __generic_file_aio_read
will return with i_mutex still locked.  Note that checking for negative
return values is not enough, as __blockdev_direct_IO can return errors
as well.
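
The ambiguity can be shown in a self-contained sketch (hypothetical
names, not the filemap.c code): one error path returns before the lock
is dropped, another returns after a deeper layer has already dropped
it, so the caller cannot infer the lock state from the return value.

#include <pthread.h>
#include <errno.h>

static pthread_mutex_t i_mutex = PTHREAD_MUTEX_INITIALIZER;

/* stands in for ->direct_IO under DIO_OWN_LOCKING: drops the lock,
 * and can still fail afterwards */
static int deeper_layer_direct_io(void)
{
	pthread_mutex_unlock(&i_mutex);
	return -EIO;
}

static int generic_read_like(int fail_early)
{
	pthread_mutex_lock(&i_mutex);
	if (fail_early)
		return -EINVAL;		/* error BEFORE ->direct_IO: lock still held */
	return deeper_layer_direct_io();/* error AFTER: lock already dropped */
}

int main(void)
{
	/* both calls return a negative error but leave the lock in
	 * opposite states -- the caller cannot tell which */
	generic_read_like(1);
	pthread_mutex_unlock(&i_mutex);	/* caller must clean up here */
	generic_read_like(0);		/* nothing to clean up here */
	return 0;
}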



* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-09 22:42     ` Christoph Hellwig
@ 2006-03-09 23:14       ` Nathan Scott
  2006-03-10  0:50         ` Nathan Scott
  0 siblings, 1 reply; 14+ messages in thread
From: Nathan Scott @ 2006-03-09 23:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Suzuki, linux-fsdevel, linux-aio@kvack.org, lkml, suparna, akpm,
	linux-xfs

On Thu, Mar 09, 2006 at 10:42:19PM +0000, Christoph Hellwig wrote:
> On Fri, Mar 10, 2006 at 09:30:42AM +1100, Nathan Scott wrote:
> > Not for reads AFAICT - __generic_file_aio_read + own-locking
> > should always have released i_mutex at the end of the direct
> > read - are you thinking of writes or have I missed something?
> 
> If an error occurs before a_ops->direct_IO is called, __generic_file_aio_read
> will return with i_mutex still locked.  Note that checking for negative
> return values is not enough, as __blockdev_direct_IO can return errors
> as well.

*groan* - right you are.  Another option may be to have the
generic dio+own-locking case reacquire i_mutex if it drops
it, before returning... thoughts?  Seems a lot less invasive
than the filemap.c code dup'ing thing.

cheers.

-- 
Nathan


* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-09 23:14       ` Nathan Scott
@ 2006-03-10  0:50         ` Nathan Scott
  2006-03-10 15:49           ` Christoph Hellwig
                             ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Nathan Scott @ 2006-03-10  0:50 UTC (permalink / raw)
  To: Christoph Hellwig, Suzuki
  Cc: linux-fsdevel, linux-aio@kvack.org, lkml, suparna, akpm, linux-xfs

On Fri, Mar 10, 2006 at 10:14:22AM +1100, Nathan Scott wrote:
> On Thu, Mar 09, 2006 at 10:42:19PM +0000, Christoph Hellwig wrote:
> > On Fri, Mar 10, 2006 at 09:30:42AM +1100, Nathan Scott wrote:
> > > Not for reads AFAICT - __generic_file_aio_read + own-locking
> > > should always have released i_mutex at the end of the direct
> > > read - are you thinking of writes or have I missed something?
> > 
> > If an error occurs before a_ops->direct_IO is called, __generic_file_aio_read
> > will return with i_mutex still locked.  Note that checking for negative
> > return values is not enough, as __blockdev_direct_IO can return errors
> > as well.
> 
> *groan* - right you are.  Another option may be to have the
> generic dio+own-locking case reacquire i_mutex if it drops
> it, before returning... thoughts?  Seems a lot less invasive
> than the filemap.c code dup'ing thing.

Something like this (works OK for me)...

cheers.

-- 
Nathan


Index: 2.6.x-xfs/fs/direct-io.c
===================================================================
--- 2.6.x-xfs.orig/fs/direct-io.c
+++ 2.6.x-xfs/fs/direct-io.c
@@ -1155,15 +1155,16 @@ direct_io_worker(int rw, struct kiocb *i
  * For writes, i_mutex is not held on entry; it is never taken.
  *
  * DIO_LOCKING (simple locking for regular files)
- * For writes we are called under i_mutex and return with i_mutex held, even though
- * it is internally dropped.
+ * For writes we are called under i_mutex and return with i_mutex held, even
+ * though it is internally dropped.
  * For reads, i_mutex is not held on entry, but it is taken and dropped before
  * returning.
  *
  * DIO_OWN_LOCKING (filesystem provides synchronisation and handling of
  *	uninitialised data, allowing parallel direct readers and writers)
  * For writes we are called without i_mutex, return without it, never touch it.
- * For reads, i_mutex is held on entry and will be released before returning.
+ * For reads we are called under i_mutex and return with i_mutex held, even
+ * though it may be internally dropped.
  *
  * Additional i_alloc_sem locking requirements described inline below.
  */
@@ -1182,7 +1183,8 @@ __blockdev_direct_IO(int rw, struct kioc
 	ssize_t retval = -EINVAL;
 	loff_t end = offset;
 	struct dio *dio;
-	int reader_with_isem = (rw == READ && dio_lock_type == DIO_OWN_LOCKING);
+	int release_i_mutex = 0;
+	int acquire_i_mutex = 0;
 
 	if (rw & WRITE)
 		current->flags |= PF_SYNCWRITE;
@@ -1225,7 +1227,6 @@ __blockdev_direct_IO(int rw, struct kioc
 	 *	writers need to grab i_alloc_sem only (i_mutex is already held)
 	 * For regular files using DIO_OWN_LOCKING,
 	 *	neither readers nor writers take any locks here
-	 *	(i_mutex is already held and release for writers here)
 	 */
 	dio->lock_type = dio_lock_type;
 	if (dio_lock_type != DIO_NO_LOCKING) {
@@ -1236,7 +1237,7 @@ __blockdev_direct_IO(int rw, struct kioc
 			mapping = iocb->ki_filp->f_mapping;
 			if (dio_lock_type != DIO_OWN_LOCKING) {
 				mutex_lock(&inode->i_mutex);
-				reader_with_isem = 1;
+				release_i_mutex = 1;
 			}
 
 			retval = filemap_write_and_wait_range(mapping, offset,
@@ -1248,7 +1249,7 @@ __blockdev_direct_IO(int rw, struct kioc
 
 			if (dio_lock_type == DIO_OWN_LOCKING) {
 				mutex_unlock(&inode->i_mutex);
-				reader_with_isem = 0;
+				acquire_i_mutex = 1;
 			}
 		}
 
@@ -1269,11 +1270,13 @@ __blockdev_direct_IO(int rw, struct kioc
 				nr_segs, blkbits, get_blocks, end_io, dio);
 
 	if (rw == READ && dio_lock_type == DIO_LOCKING)
-		reader_with_isem = 0;
+		release_i_mutex = 0;
 
 out:
-	if (reader_with_isem)
+	if (release_i_mutex)
 		mutex_unlock(&inode->i_mutex);
+	else if (acquire_i_mutex)
+		mutex_lock(&inode->i_mutex);
 	if (rw & WRITE)
 		current->flags &= ~PF_SYNCWRITE;
 	return retval;


* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-10  0:50         ` Nathan Scott
@ 2006-03-10 15:49           ` Christoph Hellwig
  2006-03-14  4:46             ` Suparna Bhattacharya
  2006-03-17 17:22           ` Adrian Bunk
  2006-07-10 16:46           ` Stephane Doyon
  2 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2006-03-10 15:49 UTC (permalink / raw)
  To: Nathan Scott
  Cc: Christoph Hellwig, Suzuki, linux-fsdevel, linux-aio@kvack.org,
	lkml, suparna, akpm, linux-xfs

On Fri, Mar 10, 2006 at 11:50:20AM +1100, Nathan Scott wrote:
> Something like this (works OK for me)...

Yeah, that should work for now.  But long-term we really need to redo
direct I/O locking to have a common scheme for all filesystems.  I've heard
birds whistling RH patches yet another scheme into RHEL4 for GFS, and it's
definitely already far too complex now.



* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-10 15:49           ` Christoph Hellwig
@ 2006-03-14  4:46             ` Suparna Bhattacharya
  0 siblings, 0 replies; 14+ messages in thread
From: Suparna Bhattacharya @ 2006-03-14  4:46 UTC (permalink / raw)
  To: Christoph Hellwig, Nathan Scott, Suzuki, linux-fsdevel,
	linux-aio@kvack.org, lkml, akpm, linux-xfs

On Fri, Mar 10, 2006 at 03:49:25PM +0000, Christoph Hellwig wrote:
> On Fri, Mar 10, 2006 at 11:50:20AM +1100, Nathan Scott wrote:
> > Something like this (works OK for me)...
> 
> Yeah, that should work for now.  But long-term we really need to redo
> direct I/O locking to have a common scheme for all filesystems.  I've heard
> birds whistling RH patches yet another scheme into RHEL4 for GFS, and it's
> definitely already far too complex now.

Yup, getting rid of the need for all these confusing locking
modes was one of the objectives in mind for DIO simplification.
(http://www.kernel.org/pub/linux/kernel/people/suparna/DIO-simplify.txt)
Once we have an efficient range locking or similar mechanism in place
(Chris Mason is working on a patch), then it should be possible to push
out all of the i_mutex locking to higher level routines, outside of
direct-io.c.
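
To illustrate the range-locking idea (a toy sketch only -- not Chris
Mason's patch; all names are made up): lockers are admitted concurrently
as long as their [start, end) byte ranges do not overlap, which is what
parallel direct readers and writers to disjoint regions of a file need.

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct range { long start, end; struct range *next; };

static pthread_mutex_t lk = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static struct range *locked;

static bool overlaps(long s, long e)
{
	struct range *r;

	for (r = locked; r; r = r->next)
		if (s < r->end && r->start < e)
			return true;
	return false;
}

static void range_lock(struct range *r, long s, long e)
{
	pthread_mutex_lock(&lk);
	while (overlaps(s, e))		/* wait only on overlapping rangeholders */
		pthread_cond_wait(&cv, &lk);
	r->start = s; r->end = e;
	r->next = locked; locked = r;
	pthread_mutex_unlock(&lk);
}

static void range_unlock(struct range *r)
{
	struct range **p;

	pthread_mutex_lock(&lk);
	for (p = &locked; *p; p = &(*p)->next)
		if (*p == r) { *p = r->next; break; }
	pthread_cond_broadcast(&cv);
	pthread_mutex_unlock(&lk);
}

int main(void)
{
	struct range a, b;

	range_lock(&a, 0, 4096);	/* "writer" locks the first page */
	range_lock(&b, 4096, 8192);	/* disjoint "reader" is not blocked */
	range_unlock(&a);
	range_unlock(&b);
	return 0;
}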

Longer term, it would be nice to rethink and further simplify the
whole family of _nolock variants of the VFS write methods, especially
the percolation down to sync_page_range_nolock, etc.

Regards
Suparna


-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India



* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-10  0:50         ` Nathan Scott
  2006-03-10 15:49           ` Christoph Hellwig
@ 2006-03-17 17:22           ` Adrian Bunk
  2006-03-18  3:34             ` Nathan Scott
  2006-07-10 16:46           ` Stephane Doyon
  2 siblings, 1 reply; 14+ messages in thread
From: Adrian Bunk @ 2006-03-17 17:22 UTC (permalink / raw)
  To: Nathan Scott
  Cc: Christoph Hellwig, Suzuki, linux-fsdevel, linux-aio@kvack.org,
	lkml, suparna, akpm, linux-xfs

On Fri, Mar 10, 2006 at 11:50:20AM +1100, Nathan Scott wrote:
> On Fri, Mar 10, 2006 at 10:14:22AM +1100, Nathan Scott wrote:
> > On Thu, Mar 09, 2006 at 10:42:19PM +0000, Christoph Hellwig wrote:
> > > On Fri, Mar 10, 2006 at 09:30:42AM +1100, Nathan Scott wrote:
> > > > Not for reads AFAICT - __generic_file_aio_read + own-locking
> > > > should always have released i_mutex at the end of the direct
> > > > read - are you thinking of writes or have I missed something?
> > > 
> > > If an error occurs before a_ops->direct_IO is called, __generic_file_aio_read
> > > will return with i_mutex still locked.  Note that checking for negative
> > > return values is not enough, as __blockdev_direct_IO can return errors
> > > as well.
> > 
> > *groan* - right you are.  Another option may be to have the
> > generic dio+own-locking case reacquire i_mutex if it drops
> > it, before returning... thoughts?  Seems a lot less invasive
> > than the filemap.c code dup'ing thing.
> 
> Something like this (works OK for me)...

Is this 2.6.16 material?

> cheers.
> Nathan
>...

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-17 17:22           ` Adrian Bunk
@ 2006-03-18  3:34             ` Nathan Scott
  2006-03-18  5:03               ` Adrian Bunk
  0 siblings, 1 reply; 14+ messages in thread
From: Nathan Scott @ 2006-03-18  3:34 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Christoph Hellwig, Suzuki, linux-fsdevel, linux-aio@kvack.org,
	lkml, suparna, akpm, linux-xfs

On Fri, Mar 17, 2006 at 06:22:10PM +0100, Adrian Bunk wrote:
> On Fri, Mar 10, 2006 at 11:50:20AM +1100, Nathan Scott wrote:
> > Something like this (works OK for me)...
> 
> Is this 2.6.16 material?

It's been merged already.

cheers.

-- 
Nathan


* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-18  3:34             ` Nathan Scott
@ 2006-03-18  5:03               ` Adrian Bunk
  0 siblings, 0 replies; 14+ messages in thread
From: Adrian Bunk @ 2006-03-18  5:03 UTC (permalink / raw)
  To: Nathan Scott
  Cc: Christoph Hellwig, Suzuki, linux-fsdevel, linux-aio@kvack.org,
	lkml, suparna, akpm, linux-xfs

On Sat, Mar 18, 2006 at 02:34:44PM +1100, Nathan Scott wrote:
> On Fri, Mar 17, 2006 at 06:22:10PM +0100, Adrian Bunk wrote:
> > On Fri, Mar 10, 2006 at 11:50:20AM +1100, Nathan Scott wrote:
> > > Something like this (works OK for me)...
> > 
> > Is this 2.6.16 material?
> 
> It's been merged already.

Oops, sorry for missing this.

> cheers.
> Nathan

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-03-10  0:50         ` Nathan Scott
  2006-03-10 15:49           ` Christoph Hellwig
  2006-03-17 17:22           ` Adrian Bunk
@ 2006-07-10 16:46           ` Stephane Doyon
  2006-07-11  0:18             ` Nathan Scott
  2 siblings, 1 reply; 14+ messages in thread
From: Stephane Doyon @ 2006-07-10 16:46 UTC (permalink / raw)
  To: Christoph Hellwig, Nathan Scott
  Cc: Suzuki, linux-fsdevel, linux-aio@kvack.org, lkml, suparna, akpm,
	linux-xfs

Hi,

A few months back, a fix was introduced for a mutex double unlock warning 
related to direct I/O in XFS. I believe that fix has a lock ordering 
problem that can cause a deadlock.

The problem was that __blockdev_direct_IO() would unlock the i_mutex in 
the READ and DIO_OWN_LOCKING case, and the mutex would be unlocked again 
in xfs_read() soon after returning from __generic_file_aio_read(). Because 
there are lots of execution paths down from __generic_file_aio_read() that 
do not consistently release the i_mutex, the safest fix was deemed to be 
to reacquire the i_mutex before returning from __blockdev_direct_IO(). At 
this point however, the reader is holding an xfs ilock, and AFAICT the 
i_mutex is usually taken BEFORE the xfs ilock.

We are seeing a deadlock between a process writing and another process 
reading the same file, when the reader is using direct I/O. (The writer 
must apparently be growing the file, using either direct or buffered 
I/O.) Something like this, on XFS:
(dd if=/dev/zero of=f bs=128K count=5000 & ) ; sleep 1 ; dd of=/dev/null 
if=f iflag=direct bs=128K count=5000

Seen on kernels 2.6.16 and 2.6.17.

The deadlock scenario appears to be this:
-The reader goes into xfs_read(), takes the i_mutex and then an xfs_ilock
XFS_IOLOCK_SHARED, then calls down to __generic_file_aio_read() which
eventually goes down to __blockdev_direct_IO(). In there it drops the 
i_mutex.
-The writer goes into xfs_write() and obtains the i_mutex. It then tries
to get an xfs_ilock XFS_ILOCK_EXCL and must wait on it since it's held by 
the reader.
-The reader, still in __blockdev_direct_IO(), executes direct_io_worker() 
and then tries to reacquire the i_mutex, and must wait on it because the 
writer has it.

And so we have a deadlock.
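
The interleaving can be reproduced outside the kernel. A self-contained
sketch (names illustrative: i_mutex modelled as a plain mutex, the xfs
ilock as an rwlock; a timed lock is used so the demo reports the
deadlock instead of hanging):

#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <time.h>
#include <errno.h>

static pthread_mutex_t i_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t ilock = PTHREAD_RWLOCK_INITIALIZER;

static void *reader(void *arg)
{
	struct timespec until;

	pthread_mutex_lock(&i_mutex);	/* xfs_read takes i_mutex... */
	pthread_rwlock_rdlock(&ilock);	/* ...then the ilock (shared) */
	pthread_mutex_unlock(&i_mutex);	/* __blockdev_direct_IO drops it */
	sleep(2);			/* direct_io_worker(); writer sneaks in */

	clock_gettime(CLOCK_REALTIME, &until);
	until.tv_sec += 3;
	/* the reacquire that __blockdev_direct_IO now performs */
	if (pthread_mutex_timedlock(&i_mutex, &until) == ETIMEDOUT)
		printf("reader: i_mutex reacquire blocked -> deadlock\n");
	else
		pthread_mutex_unlock(&i_mutex);
	pthread_rwlock_unlock(&ilock);
	return NULL;
}

static void *writer(void *arg)
{
	sleep(1);			/* start while the reader is in the DIO */
	pthread_mutex_lock(&i_mutex);	/* xfs_write takes i_mutex... */
	pthread_rwlock_wrlock(&ilock);	/* ...and waits on the reader's ilock */
	pthread_rwlock_unlock(&ilock);
	pthread_mutex_unlock(&i_mutex);
	return NULL;
}

int main(void)
{
	pthread_t r, w;

	pthread_create(&r, NULL, reader, NULL);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(r, NULL);		/* reader gives up, drops the ilock... */
	pthread_join(w, NULL);		/* ...so the writer (and demo) finish */
	return 0;
}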

I leave it to the experts to figure out what the right fix for this might 
be. A workaround might be to not release the i_mutex during 
__blockdev_direct_IO().

Thanks

On Thu, 9 Mar 2006, Christoph Hellwig wrote:

> On Thu, Mar 09, 2006 at 01:24:38PM +0530, Suzuki wrote:
>>
>> Missed out linux-aio & linux-fs-devel lists. Forwarding.
>>
>> Comments?
>
> I've seen this too.  The problem is that __generic_file_aio_read can return
> with or without the i_mutex locked in the direct I/O case for filesystems
> that set DIO_OWN_LOCKING.  It's a nasty one and I haven't found a better solution
> than copying lots of code from filemap.c into xfs.
>
>
>

On Fri, 10 Mar 2006, Nathan Scott wrote:

> On Thu, Mar 09, 2006 at 12:47:01PM +0530, Suzuki wrote:
>> Hi all,
>
> Hi there Suzuki,
>
>> I was working on an issue where we get "Badness in
>> __mutex_unlock_slowpath" (and hence a stack trace) while running FS
>> stress tests on XFS on a 2.6.16-rc5 kernel.
>
> Thanks for looking into this.
>
>> The dmesg output looks like:
>>
>> Badness in __mutex_unlock_slowpath at kernel/mutex.c:207
>>  [<c0103c0c>] show_trace+0x20/0x22
>>  [<c0103d4b>] dump_stack+0x1e/0x20
>>  [<c0473f1f>] __mutex_unlock_slowpath+0x12a/0x23b
>>  [<c0473938>] mutex_unlock+0xb/0xd
>>  [<c02a5720>] xfs_read+0x230/0x2d9
>>  [<c02a1bed>] linvfs_aio_read+0x8d/0x98
>>  [<c015f3df>] do_sync_read+0xb8/0x107
>>  [<c015f4f7>] vfs_read+0xc9/0x19b
>>  [<c015f8b2>] sys_read+0x47/0x6e
>>  [<c0102db7>] sysenter_past_esp+0x54/0x75
>
> Yeah, test 008 from the xfstests suite was reliably hitting this for
> me, it'd just not percolated to the top of my todo list yet.
>
>> This happens with XFS DIO reads. xfs_read holds the i_mutex and calls
>> __generic_file_aio_read(), which falls into __blockdev_direct_IO with
>> the DIO_OWN_LOCKING flag (since XFS uses own-locking). Now
>> __blockdev_direct_IO releases the i_mutex for READs with
>> DIO_OWN_LOCKING. When it returns to xfs_read, it tries to unlock the
>> i_mutex (which is now already unlocked), causing the "Badness".
>
> Indeed.  And there's the problem - why is XFS releasing i_mutex
> for the direct read in xfs_read?  Shouldn't be - fs/direct-io.c
> will always release i_mutex for a direct read in the own-locking
> case, so XFS shouldn't be doing it too (that's what the code does
> and that's what the comment preceding __blockdev_direct_IO says).
>
> The only piece of the puzzle I don't understand is why we don't
> always get that badness message at the end of every direct read.
> Perhaps it's some subtle fastpath/slowpath difference, or maybe
> "debug_mutex_on" gets switched off after the first occurrence...
>
> Anyway, with the above change (remove 2 lines near xfs_read end),
> I can no longer reproduce the problem in that previously-warning
> test case, and all the other XFS tests seem to be chugging along
> OK (which includes a healthy mix of dio testing).
>
>> The possible solution we can think of is not to unlock the
>> i_mutex for DIO_OWN_LOCKING. This will only affect the DIO_OWN_LOCKING
>> users (as of now, only XFS) with concurrent DIO sync read requests. AIO
>> read requests would not suffer this problem, since they would just return
>> once the DIO is submitted.
>
> I don't think that level of invasiveness is necessary at this stage,
> but perhaps you're seeing something that I've missed?  Do you see
> any reason why removing the xfs_read unlock won't work?
>
>> Another workaround for this would be adding a "mutex_is_locked" check
>> before trying to unlock i_mutex in xfs_read. But this seems to be an
>> ugly hack. :(
>
> Hmm, that just plain wouldn't work - what if the lock was released
> in generic direct IO code, and someone else had acquired it before
> we got to the end of xfs_read?  Badness for sure.
>
> cheers.
>
>

On Fri, 10 Mar 2006, Nathan Scott wrote:

> On Fri, Mar 10, 2006 at 10:14:22AM +1100, Nathan Scott wrote:
>> On Thu, Mar 09, 2006 at 10:42:19PM +0000, Christoph Hellwig wrote:
>>> On Fri, Mar 10, 2006 at 09:30:42AM +1100, Nathan Scott wrote:
>>>> Not for reads AFAICT - __generic_file_aio_read + own-locking
>>>> should always have released i_mutex at the end of the direct
>>>> read - are you thinking of writes or have I missed something?
>>>
>>> If an error occurs before a_ops->direct_IO is called, __generic_file_aio_read
>>> will return with i_mutex still locked.  Note that checking for negative
>>> return values is not enough, as __blockdev_direct_IO can return errors
>>> as well.
>>
>> *groan* - right you are.  Another option may be to have the
>> generic dio+own-locking case reacquire i_mutex if it drops
>> it, before returning... thoughts?  Seems a lot less invasive
>> than the filemap.c code dup'ing thing.
>
> Something like this (works OK for me)...
>
> cheers.
>
>


* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-07-10 16:46           ` Stephane Doyon
@ 2006-07-11  0:18             ` Nathan Scott
  2006-07-11 13:40               ` Stephane Doyon
  0 siblings, 1 reply; 14+ messages in thread
From: Nathan Scott @ 2006-07-11  0:18 UTC (permalink / raw)
  To: Stephane Doyon
  Cc: Christoph Hellwig, Suzuki, linux-fsdevel, linux-aio@kvack.org,
	lkml, suparna, akpm, xfs

On Mon, Jul 10, 2006 at 12:46:23PM -0400, Stephane Doyon wrote:
> Hi,

Hi Stephane,

> A few months back, a fix was introduced for a mutex double unlock warning 
> related to direct I/O in XFS. I believe that fix has a lock ordering 
> problem that can cause a deadlock.
> 
> The problem was that __blockdev_direct_IO() would unlock the i_mutex in 
> the READ and DIO_OWN_LOCKING case, and the mutex would be unlocked again 
> in xfs_read() soon after returning from __generic_file_aio_read(). Because 
> there are lots of execution paths down from __generic_file_aio_read() that 
> do not consistently release the i_mutex, the safest fix was deemed to be 
> to reacquire the i_mutex before returning from __blockdev_direct_IO(). At 
> this point however, the reader is holding an xfs ilock, and AFAICT the 
> i_mutex is usually taken BEFORE the xfs ilock.

That is correct, yes.  Hmm.  Subtle.  Painful.  Thanks for the detailed
report and your analysis.

> We are seeing a deadlock between a process writing and another process 
> reading the same file, when the reader is using direct I/O. (The writer 

Is that a direct writer or a buffered writer?

> must apparently be growing the file, using either direct or buffered 
> I/O.) Something like this, on XFS:
> (dd if=/dev/zero of=f bs=128K count=5000 & ) ; sleep 1 ; dd of=/dev/null 
> if=f iflag=direct bs=128K count=5000
> 
> Seen on kernels 2.6.16 and 2.6.17.
> 
> The deadlock scenario appears to be this:
> -The reader goes into xfs_read(), takes the i_mutex and then an xfs_ilock
> XFS_IOLOCK_SHARED, then calls down to __generic_file_aio_read() which
> eventually goes down to __blockdev_direct_IO(). In there it drops the 
> i_mutex.
> -The writer goes into xfs_write() and obtains the i_mutex. It then tries
> to get an xfs_ilock XFS_ILOCK_EXCL and must wait on it since it's held by 
> the reader.
> -The reader, still in __blockdev_direct_IO(), executes directio_worker() 
> and then tries to reacquire the i_mutex, and must wait on it because the 
> writer has it.
> 
> And so we have a deadlock.

*nod*.  This will require some thought; I'm not sure I like the sound of
your suggested workaround there a whole lot, unfortunately, but maybe it
is all we can do at this stage.  Let me ponder further and get back to you
(but if you want to prototype your fix further, that'd be most welcome, of
course).

cheers.

--
Nathan


* Re: [RFC] Badness in __mutex_unlock_slowpath with XFS stress tests
  2006-07-11  0:18             ` Nathan Scott
@ 2006-07-11 13:40               ` Stephane Doyon
  0 siblings, 0 replies; 14+ messages in thread
From: Stephane Doyon @ 2006-07-11 13:40 UTC (permalink / raw)
  To: Nathan Scott
  Cc: Christoph Hellwig, Suzuki, linux-fsdevel, linux-aio@kvack.org,
	lkml, suparna, akpm, xfs

On Tue, 11 Jul 2006, Nathan Scott wrote:

> On Mon, Jul 10, 2006 at 12:46:23PM -0400, Stephane Doyon wrote:
>> Hi,
>
> Hi Stephane,
>
>> A few months back, a fix was introduced for a mutex double unlock warning
>> related to direct I/O in XFS. I believe that fix has a lock ordering
>> problem that can cause a deadlock.
>>
>> The problem was that __blockdev_direct_IO() would unlock the i_mutex in
>> the READ and DIO_OWN_LOCKING case, and the mutex would be unlocked again
>> in xfs_read() soon after returning from __generic_file_aio_read(). Because
>> there are lots of execution paths down from __generic_file_aio_read() that
>> do not consistently release the i_mutex, the safest fix was deemed to be
>> to reacquire the i_mutex before returning from __blockdev_direct_IO(). At
>> this point however, the reader is holding an xfs ilock, and AFAICT the
>> i_mutex is usually taken BEFORE the xfs ilock.
>
> That is correct, yes.  Hmm.  Subtle.  Painful.  Thanks for the detailed
> report and your analysis.
>
>> We are seeing a deadlock between a process writing and another process
>> reading the same file, when the reader is using direct I/O. (The writer
>
> Is that a direct writer or a buffered writer?

Whichever, both cases trigger the deadlock.

>> must apparently be growing the file, using either direct or buffered
>> I/O.) Something like this, on XFS:
>> (dd if=/dev/zero of=f bs=128K count=5000 & ) ; sleep 1 ; dd of=/dev/null
>> if=f iflag=direct bs=128K count=5000
>>
>> Seen on kernels 2.6.16 and 2.6.17.
>>
>> The deadlock scenario appears to be this:
>> -The reader goes into xfs_read(), takes the i_mutex and then an xfs_ilock
>> XFS_IOLOCK_SHARED, then calls down to __generic_file_aio_read() which
>> eventually goes down to __blockdev_direct_IO(). In there it drops the
>> i_mutex.
>> -The writer goes into xfs_write() and obtains the i_mutex. It then tries
>> to get an xfs_ilock XFS_ILOCK_EXCL and must wait on it since it's held by
>> the reader.
>> -The reader, still in __blockdev_direct_IO(), executes direct_io_worker()
>> and then tries to reacquire the i_mutex, and must wait on it because the
>> writer has it.
>>
>> And so we have a deadlock.
>
> *nod*.  This will require some thought; I'm not sure I like the sound of
> your suggested workaround there a whole lot, unfortunately, but maybe it
> is all we can do at this stage.  Let me ponder further and get back to you

Thank you.

> (but if you want to prototype your fix further, that'd be most welcome, of
> course).

Sure, well it's not very subtle. The below patch is what I'm using for 
now. I haven't seen problems with it yet but it hasn't been seriously 
tested.

I'm assuming that it's OK to keep holding the i_mutex during the call to 
direct_io_worker(), because in the DIO_LOCKING case, direct_io_worker() 
is called with the i_mutex held, and XFS touches i_mutex only in 
xfs_read() and xfs_write(), so hopefully nothing will conflict.

Signed-off-by: Stephane Doyon <sdoyon@max-t.com>
--- linux/fs/direct-io.c.orig	2006-07-11 09:23:20.000000000 -0400
+++ linux/fs/direct-io.c	2006-07-11 09:27:54.000000000 -0400
@@ -1191,7 +1191,6 @@ __blockdev_direct_IO(int rw, struct kioc
 	loff_t end = offset;
 	struct dio *dio;
 	int release_i_mutex = 0;
-	int acquire_i_mutex = 0;
 
 	if (rw & WRITE)
 		current->flags |= PF_SYNCWRITE;
@@ -1254,11 +1253,7 @@ __blockdev_direct_IO(int rw, struct kioc
 				goto out;
 			}
 
-			if (dio_lock_type == DIO_OWN_LOCKING) {
-				mutex_unlock(&inode->i_mutex);
-				acquire_i_mutex = 1;
-			}
 		}
 
 		if (dio_lock_type == DIO_LOCKING)
 			down_read(&inode->i_alloc_sem);
@@ -1282,8 +1277,6 @@ __blockdev_direct_IO(int rw, struct kioc
 out:
 	if (release_i_mutex)
 		mutex_unlock(&inode->i_mutex);
-	else if (acquire_i_mutex)
-		mutex_lock(&inode->i_mutex);
 	if (rw & WRITE)
 		current->flags &= ~PF_SYNCWRITE;
 	return retval;

