All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/4] ext4: serialize dio nonlocked reads with defrag workers
@ 2012-09-04 17:22 Dmitry Monakhov
  2012-09-04 17:22 ` [PATCH 2/4] ext4: serialize unlocked dio reads with truncate Dmitry Monakhov
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Dmitry Monakhov @ 2012-09-04 17:22 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

Inode's block defrag and ext4_change_inode_journal_flag() may
affect nonlocked DIO reads result, so proper synchronization
required.

- add missed inode_dio_wait() calls where appropriate
- recheck ext4_should_dioread_nolock under extra i_dio_count reference.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/ext4.h        |    2 ++
 fs/ext4/ext4_jbd2.h   |    2 ++
 fs/ext4/indirect.c    |   12 ++++++++++++
 fs/ext4/inode.c       |    5 +++++
 fs/ext4/move_extent.c |   10 ++++++++++
 5 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f9024a6..36e8b84 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1349,6 +1349,8 @@ enum {
 	EXT4_STATE_DIO_UNWRITTEN,	/* need convert on dio done*/
 	EXT4_STATE_NEWENTRY,		/* File just added to dir */
 	EXT4_STATE_DELALLOC_RESERVED,	/* blks already reserved for delalloc */
+	EXT4_STATE_DIOREAD_LOCK,	/* Disable support for dio read
+					   nolocking */
 };
 
 #define EXT4_INODE_BIT_FNS(name, field, offset)				\
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 56d258c..318d177 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -399,6 +399,8 @@ static inline int ext4_should_dioread_nolock(struct inode *inode)
 		return 0;
 	if (ext4_should_journal_data(inode))
 		return 0;
+	if (ext4_test_inode_state(inode, EXT4_STATE_DIOREAD_LOCK))
+		return 0;
 	return 1;
 }
 
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 830e1b2..ba40309 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -812,10 +812,22 @@ retry:
 			ext4_flush_completed_IO(inode);
 			mutex_unlock(&inode->i_mutex);
 		}
+		/*
+		 * Inode's locking behaviour may change due to number
+		 * of reasons, in order to be shure that nolock dioreads
+		 * is still allowed we have to recheck inode's flags
+		 * while i_dio_count > 0
+		 */
+		atomic_inc(&inode->i_dio_count);
+		if (!unlikely(ext4_should_dioread_nolock(inode))) {
+			inode_dio_done(inode);
+			goto retry;
+		}
 		ret = __blockdev_direct_IO(rw, iocb, inode,
 				 inode->i_sb->s_bdev, iov,
 				 offset, nr_segs,
 				 ext4_get_block, NULL, NULL, 0);
+		inode_dio_done(inode);
 	} else {
 		ret = blockdev_direct_IO(rw, iocb, inode, iov,
 				 offset, nr_segs, ext4_get_block);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d12d30e..58ef61a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4741,6 +4741,10 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 			return err;
 	}
 
+	/* Wait for all existing dio workers */
+	ext4_set_inode_state(inode, EXT4_STATE_DIOREAD_LOCK);
+	inode_dio_wait(inode);
+
 	jbd2_journal_lock_updates(journal);
 
 	/*
@@ -4760,6 +4764,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 	ext4_set_aops(inode);
 
 	jbd2_journal_unlock_updates(journal);
+	ext4_clear_inode_state(inode, EXT4_STATE_DIOREAD_LOCK);
 
 	/* Finally we can mark the inode as dirty. */
 
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index c5826c6..a6a4278 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -1213,6 +1213,14 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
 	ret1 = mext_inode_double_lock(orig_inode, donor_inode);
 	if (ret1 < 0)
 		return ret1;
+	/* Protect inodes against DIO workers
+	 * - Disable dio nonlock reads, so all new dio workers will block
+	 *   on i_mutex.
+	 * - wait for existing DIO in flight */
+	ext4_set_inode_state(orig_inode, EXT4_STATE_DIOREAD_LOCK);
+	ext4_set_inode_state(donor_inode, EXT4_STATE_DIOREAD_LOCK);
+	inode_dio_wait(orig_inode);
+	inode_dio_wait(donor_inode);
 
 	/* Protect extent tree against block allocations via delalloc */
 	double_down_write_data_sem(orig_inode, donor_inode);
@@ -1412,6 +1420,8 @@ out:
 		kfree(holecheck_path);
 	}
 	double_up_write_data_sem(orig_inode, donor_inode);
+	ext4_clear_inode_state(orig_inode, EXT4_STATE_DIOREAD_LOCK);
+	ext4_clear_inode_state(donor_inode, EXT4_STATE_DIOREAD_LOCK);
 	ret2 = mext_inode_double_unlock(orig_inode, donor_inode);
 
 	if (ret1)
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/4] ext4: serialize unlocked dio reads with truncate
  2012-09-04 17:22 [PATCH 1/4] ext4: serialize dio nonlocked reads with defrag workers Dmitry Monakhov
@ 2012-09-04 17:22 ` Dmitry Monakhov
  2012-09-04 17:22 ` [PATCH 3/4] ext4: endless truncate due to nonlocked dio readers Dmitry Monakhov
  2012-09-04 17:22 ` [PATCH 4/4] ext4: serialize truncate with owerwrite DIO workers Dmitry Monakhov
  2 siblings, 0 replies; 5+ messages in thread
From: Dmitry Monakhov @ 2012-09-04 17:22 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

Current serialization will works only for DIO which holds
i_mutex, but nonlocked DIO following race is possible:

dio_nolock_read_task            truncate_task
				->ext4_setattr()
				 ->inode_dio_wait()
->ext4_ext_direct_IO
  ->ext4_ind_direct_IO
    ->__blockdev_direct_IO
      ->ext4_get_block
				 ->truncate_setsize()
				 ->ext4_truncate()
				 #alloc truncated blocks
				 #to other inode
      ->submit_io()
     #INFORMATION LEAK

In order to serialize with unlocked DIO reads we have to
rearrange wait sequence
1) update i_size first
2) wait for outstanding DIO requests
3) and only after that truncate inode blocks

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/inode.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 58ef61a..709ec5a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4304,7 +4304,6 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 	}
 
 	if (attr->ia_valid & ATTR_SIZE) {
-		inode_dio_wait(inode);
 
 		if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
 			struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -4355,6 +4354,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 	if (attr->ia_valid & ATTR_SIZE) {
 		if (attr->ia_size != i_size_read(inode))
 			truncate_setsize(inode, attr->ia_size);
+		inode_dio_wait(inode);
 		ext4_truncate(inode);
 	}
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/4] ext4: endless truncate due to nonlocked dio readers
  2012-09-04 17:22 [PATCH 1/4] ext4: serialize dio nonlocked reads with defrag workers Dmitry Monakhov
  2012-09-04 17:22 ` [PATCH 2/4] ext4: serialize unlocked dio reads with truncate Dmitry Monakhov
@ 2012-09-04 17:22 ` Dmitry Monakhov
  2012-09-04 17:31   ` Dmitry Monakhov
  2012-09-04 17:22 ` [PATCH 4/4] ext4: serialize truncate with owerwrite DIO workers Dmitry Monakhov
  2 siblings, 1 reply; 5+ messages in thread
From: Dmitry Monakhov @ 2012-09-04 17:22 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

If we have enough aggressive DIO readers, truncate will wait
forever inside inode_dio_wait(). It is reasonable to disable
nonlock DIO read optimization during truncate.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/inode.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 709ec5a..e696f19 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4354,7 +4354,9 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 	if (attr->ia_valid & ATTR_SIZE) {
 		if (attr->ia_size != i_size_read(inode))
 			truncate_setsize(inode, attr->ia_size);
+		ext4_set_inode_state(inode, EXT4_STATE_DIOREAD_LOCK);
 		inode_dio_wait(inode);
+		ext4_set_inode_state(inode, EXT4_STATE_DIOREAD_LOCK);
 		ext4_truncate(inode);
 	}
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 4/4] ext4: serialize truncate with owerwrite DIO workers
  2012-09-04 17:22 [PATCH 1/4] ext4: serialize dio nonlocked reads with defrag workers Dmitry Monakhov
  2012-09-04 17:22 ` [PATCH 2/4] ext4: serialize unlocked dio reads with truncate Dmitry Monakhov
  2012-09-04 17:22 ` [PATCH 3/4] ext4: endless truncate due to nonlocked dio readers Dmitry Monakhov
@ 2012-09-04 17:22 ` Dmitry Monakhov
  2 siblings, 0 replies; 5+ messages in thread
From: Dmitry Monakhov @ 2012-09-04 17:22 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, Dmitry Monakhov

Jan Kara have spotted interesting issue:
There are potential data corruption issue with direct IO overwrites
racing with truncate:
 Like:
  dio write                      truncate_task
  ->ext4_ext_direct_IO
   ->overwrite == 1
    ->down_read(&EXT4_I(inode)->i_data_sem);
    ->mutex_unlock(&inode->i_mutex);
                               ->ext4_setattr()
                                ->inode_dio_wait()
                                ->truncate_setsize()
                                ->ext4_truncate()
                                 ->down_write(&EXT4_I(inode)->i_data_sem);
    ->__blockdev_direct_IO
     ->ext4_get_block
     ->submit_io()
    ->up_read(&EXT4_I(inode)->i_data_sem);
                                 # truncate data blocks, allocate them to
                                 # other inode - bad stuff happens because
                                 # dio is still in flight.

In order to serialize with truncate dio worker should grab extra i_dio_count
reference before drop i_mutex.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/inode.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e696f19..46606e3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3035,6 +3035,7 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 		overwrite = *((int *)iocb->private);
 
 		if (overwrite) {
+			atomic_inc(&inode->i_dio_count);
 			down_read(&EXT4_I(inode)->i_data_sem);
 			mutex_unlock(&inode->i_mutex);
 		}
@@ -3134,6 +3135,7 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 		if (overwrite) {
 			up_read(&EXT4_I(inode)->i_data_sem);
 			mutex_lock(&inode->i_mutex);
+			inode_dio_done(inode);
 		}
 
 		return ret;
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 3/4] ext4: endless truncate due to nonlocked dio readers
  2012-09-04 17:22 ` [PATCH 3/4] ext4: endless truncate due to nonlocked dio readers Dmitry Monakhov
@ 2012-09-04 17:31   ` Dmitry Monakhov
  0 siblings, 0 replies; 5+ messages in thread
From: Dmitry Monakhov @ 2012-09-04 17:31 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack

On Tue,  4 Sep 2012 21:22:50 +0400, Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> If we have enough aggressive DIO readers, truncate will wait
> forever inside inode_dio_wait(). It is reasonable to disable
> nonlock DIO read optimization during truncate.
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/inode.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 709ec5a..e696f19 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4354,7 +4354,9 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
>  	if (attr->ia_valid & ATTR_SIZE) {
>  		if (attr->ia_size != i_size_read(inode))
>  			truncate_setsize(inode, attr->ia_size);
> +		ext4_set_inode_state(inode, EXT4_STATE_DIOREAD_LOCK);
>  		inode_dio_wait(inode);
> +		ext4_set_inode_state(inode, EXT4_STATE_DIOREAD_LOCK);
OOps, sorry  ^^^^^^^^^^^^^^^ this is wrong
should be ext4_clear_inode_state()
>  		ext4_truncate(inode);
>  	}
>  
> -- 
> 1.7.7.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-09-04 17:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-04 17:22 [PATCH 1/4] ext4: serialize dio nonlocked reads with defrag workers Dmitry Monakhov
2012-09-04 17:22 ` [PATCH 2/4] ext4: serialize unlocked dio reads with truncate Dmitry Monakhov
2012-09-04 17:22 ` [PATCH 3/4] ext4: endless truncate due to nonlocked dio readers Dmitry Monakhov
2012-09-04 17:31   ` Dmitry Monakhov
2012-09-04 17:22 ` [PATCH 4/4] ext4: serialize truncate with owerwrite DIO workers Dmitry Monakhov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.