All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ext4: fix race aio-dio vs freeze_fs
@ 2015-11-23 16:02 Dmitry Monakhov
  2015-11-23 16:37 ` Dmitry Monakhov
  2015-11-24 13:24   ` Jan Kara
  0 siblings, 2 replies; 12+ messages in thread
From: Dmitry Monakhov @ 2015-11-23 16:02 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, tytso, Dmitry Monakhov

After freeze_fs was revoked (from Jan Kara) pages's write-back completion
is deffered before unwritten conversion, so explicit flush_unwritten_io()
was removed here: c724585b62411
But we still may face deferred conversion for aio-dio case
# Trivial testcase
for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
    --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite
NOTE: Sane testcase should be integrated to xfstests, but it requires
changes in common/* code, so let's use this this test at the moment.

In order to fix this race we have to guard journal transaction with explicit
sb_{start,end}_intwrite()  as we do with ext4_evict_inode here:8e8ad8a5

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 3a6197a..4cba944 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -5040,6 +5040,12 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
 	max_blocks = ((EXT4_BLOCK_ALIGN(len + offset, blkbits) >> blkbits) -
 		      map.m_lblk);
 	/*
+	 * Protect us against freezing - AIO-DIO case. Caller didn't have to
+	 * have any protection against it
+	 */
+	sb_start_intwrite(inode->i_sb);
+
+	/*
 	 * This is somewhat ugly but the idea is clear: When transaction is
 	 * reserved, everything goes into it. Otherwise we rather start several
 	 * smaller transactions for conversion of each extent separately.
@@ -5083,6 +5089,7 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
 	}
 	if (!credits)
 		ret2 = ext4_journal_stop(handle);
+	sb_end_intwrite(inode->i_sb);
 	return ret > 0 ? ret2 : ret;
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: fix race aio-dio vs freeze_fs
  2015-11-23 16:02 [PATCH] ext4: fix race aio-dio vs freeze_fs Dmitry Monakhov
@ 2015-11-23 16:37 ` Dmitry Monakhov
  2015-11-24 13:31   ` Jan Kara
  2015-11-24 13:24   ` Jan Kara
  1 sibling, 1 reply; 12+ messages in thread
From: Dmitry Monakhov @ 2015-11-23 16:37 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, tytso

[-- Attachment #1: Type: text/plain, Size: 3907 bytes --]

Dmitry Monakhov <dmonakhov@openvz.org> writes:

> After freeze_fs was revoked (from Jan Kara) pages's write-back completion
> is deffered before unwritten conversion, so explicit flush_unwritten_io()
> was removed here: c724585b62411
> But we still may face deferred conversion for aio-dio case
> # Trivial testcase
> for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
> fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
>     --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite
> NOTE: Sane testcase should be integrated to xfstests, but it requires
> changes in common/* code, so let's use this this test at the moment.
>
> In order to fix this race we have to guard journal transaction with explicit
> sb_{start,end}_intwrite()  as we do with ext4_evict_inode here:8e8ad8a5
Fairly to say I'm not very happy with the fix because it continues bad
practice of ad-hock fixes for generic journal vs freeze synchronization

Ideal fix would be to move sb_start_intwrite/sb_end_intwrite() to
ext4_journal_start()/ext4_journal_stop() but this is not possible due to
limitations introduced by nojournal mode (described here:8e8ad8a5)
So let's fix nojournal instead. In order to do that we somehow have
store ref_count and pointer to sb inside nojournal_handle.
There are two possible ways to do that.
1) Embed second journal related field to task_struct and guard it with
   compile macros definition.
void *journal_info;
+ #ifdef CONFIG_EXTRA_JOURNAL_INFO
+   void *journal_info2;
+ #endif

2) Encode ref and sb in to single long. This can be done by aligning
   ext4_sb_info pointer to 4096. So we can embed ref count to lower bits
   like follows.
#define EXT4_NOJOURNAL_SHIFT 12
#define EXT4_NOJOURNAL_MAX_REF_COUNT 1 << (EXT4_NOJOURNAL_SHIFT-1)
#define EXT4_NOJOURNAL_MASK  (1 << EXT4_NOJOURNAL_SHIFT) -1
#define NOJOURNAL_SB(handle) (handle & ~EXT4_NOJOURNAL_MASK)
#define NOJOURNAL_REF(handle) ((handle & ~EXT4_NOJOURNAL_MASK) >> 1)
static int ext4_handle_valid(handle_t *handle)
{
        return !(handle & 0x1);
}
static handle_t *get_nojournal_handle(struct super_block *sb)
{
        handle_t *handle = current->journal_info;
        struct super_block *old_sb  = NOJOURNAL_SB(handle);
        unsigned long ref_cnt = NOJOURNAL_REF(handle);
        BUG_ON(old_sb && old_sb != sb);
        ref++;
        current->journal_info = NOJOURNAL_SB(handle);
}

What do you think about this? Are where any better way to fix this?

>
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/extents.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 3a6197a..4cba944 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -5040,6 +5040,12 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
>  	max_blocks = ((EXT4_BLOCK_ALIGN(len + offset, blkbits) >> blkbits) -
>  		      map.m_lblk);
>  	/*
> +	 * Protect us against freezing - AIO-DIO case. Caller didn't have to
> +	 * have any protection against it
> +	 */
> +	sb_start_intwrite(inode->i_sb);
> +
> +	/*
>  	 * This is somewhat ugly but the idea is clear: When transaction is
>  	 * reserved, everything goes into it. Otherwise we rather start several
>  	 * smaller transactions for conversion of each extent separately.
> @@ -5083,6 +5089,7 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
>  	}
>  	if (!credits)
>  		ret2 = ext4_journal_stop(handle);
> +	sb_end_intwrite(inode->i_sb);
>  	return ret > 0 ? ret2 : ret;
>  }
>  
> -- 
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: fix race aio-dio vs freeze_fs
  2015-11-23 16:02 [PATCH] ext4: fix race aio-dio vs freeze_fs Dmitry Monakhov
@ 2015-11-24 13:24   ` Jan Kara
  2015-11-24 13:24   ` Jan Kara
  1 sibling, 0 replies; 12+ messages in thread
From: Jan Kara @ 2015-11-24 13:24 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, jack, tytso, xfs, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1264 bytes --]

On Mon 23-11-15 20:02:48, Dmitry Monakhov wrote:
> After freeze_fs was revoked (from Jan Kara) pages's write-back completion
> is deffered before unwritten conversion, so explicit flush_unwritten_io()
> was removed here: c724585b62411
> But we still may face deferred conversion for aio-dio case
> # Trivial testcase
> for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
> fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
>     --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite
> NOTE: Sane testcase should be integrated to xfstests, but it requires
> changes in common/* code, so let's use this this test at the moment.
> 
> In order to fix this race we have to guard journal transaction with explicit
> sb_{start,end}_intwrite()  as we do with ext4_evict_inode here:8e8ad8a5

Well, this problem seems to suggest that we have the freeze protection for
AIO writes wrong. We should call file_end_write() from aio_complete() and
not from aio_run_iocb()... I believe XFS and other filesystems may have
problems with this as well (CCed). Attached patch (so far only compile
tested since my test machine is pondering on something else) should fix
this.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-aio-Fix-freeze-protection-of-aio-writes.patch --]
[-- Type: text/x-patch, Size: 2142 bytes --]

>From a7332719d80dc94c11d1c1cb32c88b7f25e1ae61 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 24 Nov 2015 14:19:22 +0100
Subject: [PATCH] aio: Fix freeze protection of aio writes

Currently we dropped freeze protection of aio writes just after IO was
submitted. Thus aio write could be in flight while the filesystem was
frozen and that could result in unexpected situation like aio completion
wanting to convert extent type on frozen filesystem. Testcase from
Dmitry triggering this is like:

for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
    --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite

Fix the problem by dropping freeze protection only once IO is completed
in aio_complete().

Reported-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/aio.c           | 10 +++++++---
 include/linux/fs.h |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 155f84253f33..3775030053f7 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1065,6 +1065,9 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
 	unsigned tail, pos, head;
 	unsigned long	flags;
 
+	if (kiocb->ki_flags & IOCB_WRITE)
+		file_end_write(kiocb->ki_filp);
+
 	/*
 	 * Special case handling for sync iocbs:
 	 *  - events go directly into the iocb for fast handling
@@ -1449,13 +1452,14 @@ rw_common:
 
 		len = ret;
 
-		if (rw == WRITE)
+		/* We drop freeze protection in aio_complete() */
+		if (rw == WRITE) {
 			file_start_write(file);
+			req->ki_flags |= IOCB_WRITE;
+		}
 
 		ret = iter_op(req, &iter);
 
-		if (rw == WRITE)
-			file_end_write(file);
 		kfree(iovec);
 		break;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3aa514254161..54af40ed6a26 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -319,6 +319,7 @@ struct writeback_control;
 #define IOCB_EVENTFD		(1 << 0)
 #define IOCB_APPEND		(1 << 1)
 #define IOCB_DIRECT		(1 << 2)
+#define IOCB_WRITE		(1 << 3)
 
 struct kiocb {
 	struct file		*ki_filp;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: fix race aio-dio vs freeze_fs
@ 2015-11-24 13:24   ` Jan Kara
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2015-11-24 13:24 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-fsdevel, tytso, linux-ext4, jack, xfs

[-- Attachment #1: Type: text/plain, Size: 1264 bytes --]

On Mon 23-11-15 20:02:48, Dmitry Monakhov wrote:
> After freeze_fs was revoked (from Jan Kara) pages's write-back completion
> is deffered before unwritten conversion, so explicit flush_unwritten_io()
> was removed here: c724585b62411
> But we still may face deferred conversion for aio-dio case
> # Trivial testcase
> for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
> fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
>     --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite
> NOTE: Sane testcase should be integrated to xfstests, but it requires
> changes in common/* code, so let's use this this test at the moment.
> 
> In order to fix this race we have to guard journal transaction with explicit
> sb_{start,end}_intwrite()  as we do with ext4_evict_inode here:8e8ad8a5

Well, this problem seems to suggest that we have the freeze protection for
AIO writes wrong. We should call file_end_write() from aio_complete() and
not from aio_run_iocb()... I believe XFS and other filesystems may have
problems with this as well (CCed). Attached patch (so far only compile
tested since my test machine is pondering on something else) should fix
this.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-aio-Fix-freeze-protection-of-aio-writes.patch --]
[-- Type: text/x-patch, Size: 2142 bytes --]

>From a7332719d80dc94c11d1c1cb32c88b7f25e1ae61 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 24 Nov 2015 14:19:22 +0100
Subject: [PATCH] aio: Fix freeze protection of aio writes

Currently we dropped freeze protection of aio writes just after IO was
submitted. Thus aio write could be in flight while the filesystem was
frozen and that could result in unexpected situation like aio completion
wanting to convert extent type on frozen filesystem. Testcase from
Dmitry triggering this is like:

for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
    --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite

Fix the problem by dropping freeze protection only once IO is completed
in aio_complete().

Reported-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/aio.c           | 10 +++++++---
 include/linux/fs.h |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 155f84253f33..3775030053f7 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1065,6 +1065,9 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
 	unsigned tail, pos, head;
 	unsigned long	flags;
 
+	if (kiocb->ki_flags & IOCB_WRITE)
+		file_end_write(kiocb->ki_filp);
+
 	/*
 	 * Special case handling for sync iocbs:
 	 *  - events go directly into the iocb for fast handling
@@ -1449,13 +1452,14 @@ rw_common:
 
 		len = ret;
 
-		if (rw == WRITE)
+		/* We drop freeze protection in aio_complete() */
+		if (rw == WRITE) {
 			file_start_write(file);
+			req->ki_flags |= IOCB_WRITE;
+		}
 
 		ret = iter_op(req, &iter);
 
-		if (rw == WRITE)
-			file_end_write(file);
 		kfree(iovec);
 		break;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3aa514254161..54af40ed6a26 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -319,6 +319,7 @@ struct writeback_control;
 #define IOCB_EVENTFD		(1 << 0)
 #define IOCB_APPEND		(1 << 1)
 #define IOCB_DIRECT		(1 << 2)
+#define IOCB_WRITE		(1 << 3)
 
 struct kiocb {
 	struct file		*ki_filp;
-- 
2.1.4


[-- Attachment #3: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: fix race aio-dio vs freeze_fs
  2015-11-23 16:37 ` Dmitry Monakhov
@ 2015-11-24 13:31   ` Jan Kara
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2015-11-24 13:31 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, jack, tytso

On Mon 23-11-15 19:37:56, Dmitry Monakhov wrote:
> Dmitry Monakhov <dmonakhov@openvz.org> writes:
> 
> > After freeze_fs was revoked (from Jan Kara) pages's write-back completion
> > is deffered before unwritten conversion, so explicit flush_unwritten_io()
> > was removed here: c724585b62411
> > But we still may face deferred conversion for aio-dio case
> > # Trivial testcase
> > for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
> > fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
> >     --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite
> > NOTE: Sane testcase should be integrated to xfstests, but it requires
> > changes in common/* code, so let's use this this test at the moment.
> >
> > In order to fix this race we have to guard journal transaction with explicit
> > sb_{start,end}_intwrite()  as we do with ext4_evict_inode here:8e8ad8a5
> Fairly to say I'm not very happy with the fix because it continues bad
> practice of ad-hock fixes for generic journal vs freeze synchronization
> 
> Ideal fix would be to move sb_start_intwrite/sb_end_intwrite() to
> ext4_journal_start()/ext4_journal_stop() but this is not possible due to
> limitations introduced by nojournal mode (described here:8e8ad8a5)
> So let's fix nojournal instead. In order to do that we somehow have
> store ref_count and pointer to sb inside nojournal_handle.
> There are two possible ways to do that.
> 1) Embed second journal related field to task_struct and guard it with
>    compile macros definition.
> void *journal_info;
> + #ifdef CONFIG_EXTRA_JOURNAL_INFO
> +   void *journal_info2;
> + #endif
> 
> 2) Encode ref and sb in to single long. This can be done by aligning
>    ext4_sb_info pointer to 4096. So we can embed ref count to lower bits
>    like follows.
> #define EXT4_NOJOURNAL_SHIFT 12
> #define EXT4_NOJOURNAL_MAX_REF_COUNT 1 << (EXT4_NOJOURNAL_SHIFT-1)
> #define EXT4_NOJOURNAL_MASK  (1 << EXT4_NOJOURNAL_SHIFT) -1
> #define NOJOURNAL_SB(handle) (handle & ~EXT4_NOJOURNAL_MASK)
> #define NOJOURNAL_REF(handle) ((handle & ~EXT4_NOJOURNAL_MASK) >> 1)
> static int ext4_handle_valid(handle_t *handle)
> {
>         return !(handle & 0x1);
> }
> static handle_t *get_nojournal_handle(struct super_block *sb)
> {
>         handle_t *handle = current->journal_info;
>         struct super_block *old_sb  = NOJOURNAL_SB(handle);
>         unsigned long ref_cnt = NOJOURNAL_REF(handle);
>         BUG_ON(old_sb && old_sb != sb);
>         ref++;
>         current->journal_info = NOJOURNAL_SB(handle);
> }
> 
> What do you think about this? Are where any better way to fix this?

Well, we do have:

WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);

in ext4_journal_check_start() so we should get a warning for each
unexpected transaction started on frozen filesystem. The idea really is
there should be no write activity happening on frozen fs and the few cases
like MMP are handled by explicit sb_start_intwrite() calls. I'd prefer to
keep those cases special and don't add general freeze protection in
transaction handling code since that may only result in deadlocks instead
of the warning when starting transaction in unexpectected places...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: fix race aio-dio vs freeze_fs
  2015-11-24 13:24   ` Jan Kara
@ 2015-11-24 16:07     ` Christoph Hellwig
  -1 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2015-11-24 16:07 UTC (permalink / raw)
  To: Jan Kara; +Cc: Dmitry Monakhov, linux-ext4, tytso, xfs, linux-fsdevel

On Tue, Nov 24, 2015 at 02:24:21PM +0100, Jan Kara wrote:
> Well, this problem seems to suggest that we have the freeze protection for
> AIO writes wrong. We should call file_end_write() from aio_complete() and
> not from aio_run_iocb()... I believe XFS and other filesystems may have
> problems with this as well (CCed). Attached patch (so far only compile
> tested since my test machine is pondering on something else) should fix
> this.

Sounds like one way to do it, but we'd really want a vfs_* helper for
this so that it doesn't have to duplicated in other write_iter users
like the loop driver, which seems to be missing file
file_start_write/file_end_write entirely.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: fix race aio-dio vs freeze_fs
@ 2015-11-24 16:07     ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2015-11-24 16:07 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-ext4, Dmitry Monakhov, tytso, xfs

On Tue, Nov 24, 2015 at 02:24:21PM +0100, Jan Kara wrote:
> Well, this problem seems to suggest that we have the freeze protection for
> AIO writes wrong. We should call file_end_write() from aio_complete() and
> not from aio_run_iocb()... I believe XFS and other filesystems may have
> problems with this as well (CCed). Attached patch (so far only compile
> tested since my test machine is pondering on something else) should fix
> this.

Sounds like one way to do it, but we'd really want a vfs_* helper for
this so that it doesn't have to duplicated in other write_iter users
like the loop driver, which seems to be missing file
file_start_write/file_end_write entirely.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: fix race aio-dio vs freeze_fs
  2015-11-24 13:24   ` Jan Kara
  (?)
  (?)
@ 2015-11-24 16:55   ` Dmitry Monakhov
  2015-11-25  9:19       ` Jan Kara
  -1 siblings, 1 reply; 12+ messages in thread
From: Dmitry Monakhov @ 2015-11-24 16:55 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-ext4, Dmitriy Monakhov, tytso, xfs


[-- Attachment #1.1: Type: text/plain, Size: 1786 bytes --]

On Nov 24, 2015 16:25, "Jan Kara" <jack@suse.cz> wrote:
>
> On Mon 23-11-15 20:02:48, Dmitry Monakhov wrote:
> > After freeze_fs was revoked (from Jan Kara) pages's write-back
completion
> > is deffered before unwritten conversion, so explicit
flush_unwritten_io()
> > was removed here: c724585b62411
> > But we still may face deferred conversion for aio-dio case
> > # Trivial testcase
> > for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done
&
> > fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
> >     --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite
> > NOTE: Sane testcase should be integrated to xfstests, but it requires
> > changes in common/* code, so let's use this this test at the moment.
> >
> > In order to fix this race we have to guard journal transaction with
explicit
> > sb_{start,end}_intwrite()  as we do with ext4_evict_inode here:8e8ad8a5
>
> Well, this problem seems to suggest that we have the freeze protection for
> AIO writes wrong. We should call file_end_write() from aio_complete() and
> not from aio_run_iocb()...
Yep. It was my first attempt to fix that issue, but  unfortunately this
trick will break lockdep. Caller will do file_start_write and exit to
userspace. Lockdep treats such behaviour as bug (return to userspace with a
lock held)

There are two way to fix that
1) add specific 'long' lock primitive to lockdep
2) let sync_filesystems to wait pended aio-dio

> I believe XFS and other filesystems may have
> problems with this as well (CCed). Attached patch (so far only compile
> tested since my test machine is pondering on something else) should fix
> this.
>
>                                                                 Honza
>
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

[-- Attachment #1.2: Type: text/html, Size: 2288 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: fix race aio-dio vs freeze_fs
  2015-11-24 16:55   ` Dmitry Monakhov
@ 2015-11-25  9:19       ` Jan Kara
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2015-11-25  9:19 UTC (permalink / raw)
  To: Dmitry Monakhov
  Cc: Jan Kara, Dmitriy Monakhov, linux-fsdevel, xfs, tytso, linux-ext4

On Tue 24-11-15 20:55:40, Dmitry Monakhov wrote:
> On Nov 24, 2015 16:25, "Jan Kara" <jack@suse.cz> wrote:
> > On Mon 23-11-15 20:02:48, Dmitry Monakhov wrote:
> > > After freeze_fs was revoked (from Jan Kara) pages's write-back
> completion
> > > is deffered before unwritten conversion, so explicit
> flush_unwritten_io()
> > > was removed here: c724585b62411
> > > But we still may face deferred conversion for aio-dio case
> > > # Trivial testcase
> > > for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done
> &
> > > fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
> > >     --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite
> > > NOTE: Sane testcase should be integrated to xfstests, but it requires
> > > changes in common/* code, so let's use this this test at the moment.
> > >
> > > In order to fix this race we have to guard journal transaction with
> explicit
> > > sb_{start,end}_intwrite()  as we do with ext4_evict_inode here:8e8ad8a5
> >
> > Well, this problem seems to suggest that we have the freeze protection for
> > AIO writes wrong. We should call file_end_write() from aio_complete() and
> > not from aio_run_iocb()...
> Yep. It was my first attempt to fix that issue, but  unfortunately this
> trick will break lockdep. Caller will do file_start_write and exit to
> userspace. Lockdep treats such behaviour as bug (return to userspace with a
> lock held)
> 
> There are two way to fix that
> 1) add specific 'long' lock primitive to lockdep

The way we tell lockdep about transfer of context is that we just lie to
lockdep and tell it that the lock got unlocked at appropriate place and
then tell it we locked it again at another place. It is somewhat ugly but
not that hard to do... Generally lockdep is a tool that should help but by
no means it should be a reason for poor locking decisions just because
lockdep cannot handle them.

								Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: fix race aio-dio vs freeze_fs
@ 2015-11-25  9:19       ` Jan Kara
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2015-11-25  9:19 UTC (permalink / raw)
  To: Dmitry Monakhov
  Cc: tytso, xfs, Dmitriy Monakhov, linux-fsdevel, Jan Kara, linux-ext4

On Tue 24-11-15 20:55:40, Dmitry Monakhov wrote:
> On Nov 24, 2015 16:25, "Jan Kara" <jack@suse.cz> wrote:
> > On Mon 23-11-15 20:02:48, Dmitry Monakhov wrote:
> > > After freeze_fs was revoked (from Jan Kara) pages's write-back
> completion
> > > is deffered before unwritten conversion, so explicit
> flush_unwritten_io()
> > > was removed here: c724585b62411
> > > But we still may face deferred conversion for aio-dio case
> > > # Trivial testcase
> > > for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done
> &
> > > fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
> > >     --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite
> > > NOTE: Sane testcase should be integrated to xfstests, but it requires
> > > changes in common/* code, so let's use this this test at the moment.
> > >
> > > In order to fix this race we have to guard journal transaction with
> explicit
> > > sb_{start,end}_intwrite()  as we do with ext4_evict_inode here:8e8ad8a5
> >
> > Well, this problem seems to suggest that we have the freeze protection for
> > AIO writes wrong. We should call file_end_write() from aio_complete() and
> > not from aio_run_iocb()...
> Yep. It was my first attempt to fix that issue, but  unfortunately this
> trick will break lockdep. Caller will do file_start_write and exit to
> userspace. Lockdep treats such behaviour as bug (return to userspace with a
> lock held)
> 
> There are two way to fix that
> 1) add specific 'long' lock primitive to lockdep

The way we tell lockdep about transfer of context is that we just lie to
lockdep and tell it that the lock got unlocked at appropriate place and
then tell it we locked it again at another place. It is somewhat ugly but
not that hard to do... Generally lockdep is a tool that should help but by
no means it should be a reason for poor locking decisions just because
lockdep cannot handle them.

								Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: fix race aio-dio vs freeze_fs
  2015-11-24 16:07     ` Christoph Hellwig
@ 2015-11-25 10:25       ` Jan Kara
  -1 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2015-11-25 10:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Dmitry Monakhov, linux-ext4, tytso, xfs, linux-fsdevel

On Tue 24-11-15 08:07:23, Christoph Hellwig wrote:
> On Tue, Nov 24, 2015 at 02:24:21PM +0100, Jan Kara wrote:
> > Well, this problem seems to suggest that we have the freeze protection for
> > AIO writes wrong. We should call file_end_write() from aio_complete() and
> > not from aio_run_iocb()... I believe XFS and other filesystems may have
> > problems with this as well (CCed). Attached patch (so far only compile
> > tested since my test machine is pondering on something else) should fix
> > this.
> 
> Sounds like one way to do it, but we'd really want a vfs_* helper for
> this so that it doesn't have to duplicated in other write_iter users
> like the loop driver, which seems to be missing file
> file_start_write/file_end_write entirely.

That is mostly a separate issue, isn't it? I guess you mean a helper like
vfs_write_iter() that would get freeze protection and call ->write_iter()?
And what about files which have ->write (or ->splice_write()) and don't end
up calling ->write_iter? Also there is stuff like __kernel_write() which
ends up calling ->write_iter() but e.g. kernel/acct.c: do_acct_process()
wants to do it's own thing...

So the current status is not ideal but I don't see how to make it
substantially better...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: fix race aio-dio vs freeze_fs
@ 2015-11-25 10:25       ` Jan Kara
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2015-11-25 10:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, xfs, Dmitry Monakhov, linux-fsdevel, tytso, linux-ext4

On Tue 24-11-15 08:07:23, Christoph Hellwig wrote:
> On Tue, Nov 24, 2015 at 02:24:21PM +0100, Jan Kara wrote:
> > Well, this problem seems to suggest that we have the freeze protection for
> > AIO writes wrong. We should call file_end_write() from aio_complete() and
> > not from aio_run_iocb()... I believe XFS and other filesystems may have
> > problems with this as well (CCed). Attached patch (so far only compile
> > tested since my test machine is pondering on something else) should fix
> > this.
> 
> Sounds like one way to do it, but we'd really want a vfs_* helper for
> this so that it doesn't have to duplicated in other write_iter users
> like the loop driver, which seems to be missing file
> file_start_write/file_end_write entirely.

That is mostly a separate issue, isn't it? I guess you mean a helper like
vfs_write_iter() that would get freeze protection and call ->write_iter()?
And what about files which have ->write (or ->splice_write()) and don't end
up calling ->write_iter? Also there is stuff like __kernel_write() which
ends up calling ->write_iter() but e.g. kernel/acct.c: do_acct_process()
wants to do it's own thing...

So the current status is not ideal but I don't see how to make it
substantially better...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-11-25 10:25 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-23 16:02 [PATCH] ext4: fix race aio-dio vs freeze_fs Dmitry Monakhov
2015-11-23 16:37 ` Dmitry Monakhov
2015-11-24 13:31   ` Jan Kara
2015-11-24 13:24 ` Jan Kara
2015-11-24 13:24   ` Jan Kara
2015-11-24 16:07   ` Christoph Hellwig
2015-11-24 16:07     ` Christoph Hellwig
2015-11-25 10:25     ` Jan Kara
2015-11-25 10:25       ` Jan Kara
2015-11-24 16:55   ` Dmitry Monakhov
2015-11-25  9:19     ` Jan Kara
2015-11-25  9:19       ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.