linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ext4: Optimize ext4 DIO overwrites
@ 2019-12-18 17:44 Jan Kara
  2019-12-19 13:53 ` Ritesh Harjani
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2019-12-18 17:44 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Dan Williams, Berrocal, Eduardo, Jan Kara

Currently we start transaction for mapping every extent for writing
using direct IO. This is unnecessary when we know we are overwriting
already allocated blocks and the overhead of starting a transaction can
be significant especially for multithreaded workloads doing small writes.
Use iomap operations that avoid starting a transaction for direct IO
overwrites.

This improves throughput of 4k random writes - fio jobfile:
[global]
rw=randrw
norandommap=1
invalidate=0
bs=4k
numjobs=16
time_based=1
ramp_time=30
runtime=120
group_reporting=1
ioengine=psync
direct=1
size=16G
filename=file1.0.0:file1.0.1:file1.0.2:file1.0.3:file1.0.4:file1.0.5:file1.0.6:file1.0.7:file1.0.8:file1.0.9:file1.0.10:file1.0.11:file1.0.12:file1.0.13:file1.0.14:file1.0.15:file1.0.16:file1.0.17:file1.0.18:file1.0.19:file1.0.20:file1.0.21:file1.0.22:file1.0.23:file1.0.24:file1.0.25:file1.0.26:file1.0.27:file1.0.28:file1.0.29:file1.0.30:file1.0.31
file_service_type=random
nrfiles=32

from 3018MB/s to 4059MB/s in my test VM running test against simulated
pmem device (note that before iomap conversion, this workload was able
to achieve 3708MB/s because old direct IO path avoided transaction start
for overwrites as well). For dax, the win is even larger improving
throughput from 3042MB/s to 4311MB/s.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4.h  |  1 +
 fs/ext4/file.c  |  4 +++-
 fs/ext4/inode.c | 21 +++++++++++++++++++++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f8578caba40d..e31fc5749a19 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3390,6 +3390,7 @@ static inline void ext4_clear_io_unwritten_flag(ext4_io_end_t *io_end)
 }
 
 extern const struct iomap_ops ext4_iomap_ops;
+extern const struct iomap_ops ext4_iomap_overwrite_ops;
 extern const struct iomap_ops ext4_iomap_report_ops;
 
 static inline int ext4_buffer_uptodate(struct buffer_head *bh)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 6a7293a5cda2..f8e4af72d64d 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -370,6 +370,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	loff_t offset;
 	handle_t *handle;
 	struct inode *inode = file_inode(iocb->ki_filp);
+	const struct iomap_ops *iomap_ops = &ext4_iomap_ops;
 	bool extend = false, overwrite = false, unaligned_aio = false;
 
 	if (iocb->ki_flags & IOCB_NOWAIT) {
@@ -415,6 +416,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (!unaligned_aio && ext4_overwrite_io(inode, offset, count) &&
 	    ext4_should_dioread_nolock(inode)) {
 		overwrite = true;
+		iomap_ops = &ext4_iomap_overwrite_ops;
 		downgrade_write(&inode->i_rwsem);
 	}
 
@@ -435,7 +437,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		ext4_journal_stop(handle);
 	}
 
-	ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, &ext4_dio_write_ops,
+	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
 			   is_sync_kiocb(iocb) || unaligned_aio || extend);
 
 	if (extend)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 28f28de0c1b6..e1eb4493aacc 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3448,6 +3448,22 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	return 0;
 }
 
+static int ext4_iomap_overwrite_begin(struct inode *inode, loff_t offset,
+		loff_t length, unsigned flags, struct iomap *iomap,
+		struct iomap *srcmap)
+{
+	int ret;
+
+	/*
+	 * Even for writes we don't need to allocate blocks, so just pretend
+	 * we are reading to save overhead of starting a transaction.
+	 */
+	flags &= ~IOMAP_WRITE;
+	ret = ext4_iomap_begin(inode, offset, length, flags, iomap, srcmap);
+	WARN_ON_ONCE(iomap->type != IOMAP_MAPPED);
+	return ret;
+}
+
 static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length,
 			  ssize_t written, unsigned flags, struct iomap *iomap)
 {
@@ -3469,6 +3485,11 @@ const struct iomap_ops ext4_iomap_ops = {
 	.iomap_end		= ext4_iomap_end,
 };
 
+const struct iomap_ops ext4_iomap_overwrite_ops = {
+	.iomap_begin		= ext4_iomap_overwrite_begin,
+	.iomap_end		= ext4_iomap_end,
+};
+
 static bool ext4_iomap_is_delalloc(struct inode *inode,
 				   struct ext4_map_blocks *map)
 {
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] ext4: Optimize ext4 DIO overwrites
  2019-12-18 17:44 [PATCH] ext4: Optimize ext4 DIO overwrites Jan Kara
@ 2019-12-19 13:53 ` Ritesh Harjani
  2019-12-19 19:28   ` Jan Kara
  0 siblings, 1 reply; 6+ messages in thread
From: Ritesh Harjani @ 2019-12-19 13:53 UTC (permalink / raw)
  To: Jan Kara, Ted Tso; +Cc: linux-ext4, Dan Williams, Berrocal, Eduardo



On 12/18/19 11:14 PM, Jan Kara wrote:
> Currently we start transaction for mapping every extent for writing
> using direct IO. This is unnecessary when we know we are overwriting
> already allocated blocks and the overhead of starting a transaction can
> be significant especially for multithreaded workloads doing small writes.
> Use iomap operations that avoid starting a transaction for direct IO
> overwrites.
> 
> This improves throughput of 4k random writes - fio jobfile:
> [global]
> rw=randrw
> norandommap=1
> invalidate=0
> bs=4k
> numjobs=16
> time_based=1
> ramp_time=30
> runtime=120
> group_reporting=1
> ioengine=psync
> direct=1
> size=16G
> filename=file1.0.0:file1.0.1:file1.0.2:file1.0.3:file1.0.4:file1.0.5:file1.0.6:file1.0.7:file1.0.8:file1.0.9:file1.0.10:file1.0.11:file1.0.12:file1.0.13:file1.0.14:file1.0.15:file1.0.16:file1.0.17:file1.0.18:file1.0.19:file1.0.20:file1.0.21:file1.0.22:file1.0.23:file1.0.24:file1.0.25:file1.0.26:file1.0.27:file1.0.28:file1.0.29:file1.0.30:file1.0.31
> file_service_type=random
> nrfiles=32
> 
> from 3018MB/s to 4059MB/s in my test VM running test against simulated
> pmem device (note that before iomap conversion, this workload was able
> to achieve 3708MB/s because old direct IO path avoided transaction start
> for overwrites as well). For dax, the win is even larger improving
> throughput from 3042MB/s to 4311MB/s.

However for dax via ext4_dax_write_iter() path, we still need a way to
detect if it's overwrite and that path can be optimized too right?
I see, that this path could use both `shared inode locking` and
`no journal transaction` optimizations in case of overwrites. Correct?


> 
> Reported-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Jan Kara <jack@suse.cz>

This was one of the next AI I too wanted to do. I guess since everyone
loves performance improvements. :)

No problem with current patch. Looks good. Gave it a run too on my
system.

Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>


However depending on which patch lands first one may need a
re-basing. Will conflict with this-
https://marc.info/?l=linux-ext4&m=157613016931238&w=2



> ---
>   fs/ext4/ext4.h  |  1 +
>   fs/ext4/file.c  |  4 +++-
>   fs/ext4/inode.c | 21 +++++++++++++++++++++
>   3 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index f8578caba40d..e31fc5749a19 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -3390,6 +3390,7 @@ static inline void ext4_clear_io_unwritten_flag(ext4_io_end_t *io_end)
>   }
> 
>   extern const struct iomap_ops ext4_iomap_ops;
> +extern const struct iomap_ops ext4_iomap_overwrite_ops;
>   extern const struct iomap_ops ext4_iomap_report_ops;
> 
>   static inline int ext4_buffer_uptodate(struct buffer_head *bh)
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 6a7293a5cda2..f8e4af72d64d 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -370,6 +370,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>   	loff_t offset;
>   	handle_t *handle;
>   	struct inode *inode = file_inode(iocb->ki_filp);
> +	const struct iomap_ops *iomap_ops = &ext4_iomap_ops;
>   	bool extend = false, overwrite = false, unaligned_aio = false;
> 
>   	if (iocb->ki_flags & IOCB_NOWAIT) {
> @@ -415,6 +416,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>   	if (!unaligned_aio && ext4_overwrite_io(inode, offset, count) &&
>   	    ext4_should_dioread_nolock(inode)) {
>   		overwrite = true;
> +		iomap_ops = &ext4_iomap_overwrite_ops;
>   		downgrade_write(&inode->i_rwsem);
>   	}
> 
> @@ -435,7 +437,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>   		ext4_journal_stop(handle);
>   	}
> 
> -	ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, &ext4_dio_write_ops,
> +	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
>   			   is_sync_kiocb(iocb) || unaligned_aio || extend);
> 
>   	if (extend)
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 28f28de0c1b6..e1eb4493aacc 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3448,6 +3448,22 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>   	return 0;
>   }
> 
> +static int ext4_iomap_overwrite_begin(struct inode *inode, loff_t offset,
> +		loff_t length, unsigned flags, struct iomap *iomap,
> +		struct iomap *srcmap)
> +{
> +	int ret;
> +
> +	/*
> +	 * Even for writes we don't need to allocate blocks, so just pretend
> +	 * we are reading to save overhead of starting a transaction.
> +	 */
> +	flags &= ~IOMAP_WRITE;
> +	ret = ext4_iomap_begin(inode, offset, length, flags, iomap, srcmap);
> +	WARN_ON_ONCE(iomap->type != IOMAP_MAPPED);
> +	return ret;
> +}
> +
>   static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length,
>   			  ssize_t written, unsigned flags, struct iomap *iomap)
>   {
> @@ -3469,6 +3485,11 @@ const struct iomap_ops ext4_iomap_ops = {
>   	.iomap_end		= ext4_iomap_end,
>   };
> 
> +const struct iomap_ops ext4_iomap_overwrite_ops = {
> +	.iomap_begin		= ext4_iomap_overwrite_begin,
> +	.iomap_end		= ext4_iomap_end,
> +};
> +
>   static bool ext4_iomap_is_delalloc(struct inode *inode,
>   				   struct ext4_map_blocks *map)
>   {
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] ext4: Optimize ext4 DIO overwrites
  2019-12-19 13:53 ` Ritesh Harjani
@ 2019-12-19 19:28   ` Jan Kara
  2019-12-26 17:17     ` Theodore Y. Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2019-12-19 19:28 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Jan Kara, Ted Tso, linux-ext4, Dan Williams, Berrocal, Eduardo

On Thu 19-12-19 19:23:28, Ritesh Harjani wrote:
> On 12/18/19 11:14 PM, Jan Kara wrote:
> > Currently we start transaction for mapping every extent for writing
> > using direct IO. This is unnecessary when we know we are overwriting
> > already allocated blocks and the overhead of starting a transaction can
> > be significant especially for multithreaded workloads doing small writes.
> > Use iomap operations that avoid starting a transaction for direct IO
> > overwrites.
> > 
> > This improves throughput of 4k random writes - fio jobfile:
> > [global]
> > rw=randrw
> > norandommap=1
> > invalidate=0
> > bs=4k
> > numjobs=16
> > time_based=1
> > ramp_time=30
> > runtime=120
> > group_reporting=1
> > ioengine=psync
> > direct=1
> > size=16G
> > filename=file1.0.0:file1.0.1:file1.0.2:file1.0.3:file1.0.4:file1.0.5:file1.0.6:file1.0.7:file1.0.8:file1.0.9:file1.0.10:file1.0.11:file1.0.12:file1.0.13:file1.0.14:file1.0.15:file1.0.16:file1.0.17:file1.0.18:file1.0.19:file1.0.20:file1.0.21:file1.0.22:file1.0.23:file1.0.24:file1.0.25:file1.0.26:file1.0.27:file1.0.28:file1.0.29:file1.0.30:file1.0.31
> > file_service_type=random
> > nrfiles=32
> > 
> > from 3018MB/s to 4059MB/s in my test VM running test against simulated
> > pmem device (note that before iomap conversion, this workload was able
> > to achieve 3708MB/s because old direct IO path avoided transaction start
> > for overwrites as well). For dax, the win is even larger improving
> > throughput from 3042MB/s to 4311MB/s.
> 
> However for dax via ext4_dax_write_iter() path, we still need a way to
> detect if it's overwrite and that path can be optimized too right?
> I see, that this path could use both `shared inode locking` and
> `no journal transaction` optimizations in case of overwrites. Correct?

I don't think we can really afford the shared locking in
ext4_dax_write_iter() as POSIX requires overlapping writes to be
serialized. But we could still optimize-away the transaction starts.

> > Reported-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> This was one of the next AI I too wanted to do. I guess since everyone
> loves performance improvements. :)
> 
> No problem with current patch. Looks good. Gave it a run too on my
> system.
> 
> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>

Thanks!

> However depending on which patch lands first one may need a
> re-basing. Will conflict with this-
> https://marc.info/?l=linux-ext4&m=157613016931238&w=2

Yes, but the conflict is minor and trivial to resolve.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] ext4: Optimize ext4 DIO overwrites
  2019-12-19 19:28   ` Jan Kara
@ 2019-12-26 17:17     ` Theodore Y. Ts'o
  2019-12-27  5:32       ` Ritesh Harjani
  2020-01-06  9:33       ` Jan Kara
  0 siblings, 2 replies; 6+ messages in thread
From: Theodore Y. Ts'o @ 2019-12-26 17:17 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ritesh Harjani, linux-ext4, Dan Williams, Berrocal, Eduardo

On Thu, Dec 19, 2019 at 08:28:23PM +0100, Jan Kara wrote:
> > However depending on which patch lands first one may need a
> > re-basing. Will conflict with this-
> > https://marc.info/?l=linux-ext4&m=157613016931238&w=2
> 
> Yes, but the conflict is minor and trivial to resolve.
> 

Is this the correct resolution?

--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -447,6 +447,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	struct inode *inode = file_inode(iocb->ki_filp);
 	loff_t offset = iocb->ki_pos;
 	size_t count = iov_iter_count(from);
+	const struct iomap_ops *iomap_ops = &ext4_iomap_ops;
 	bool extend = false, unaligned_io = false;
 	bool ilock_shared = true;
 
@@ -526,7 +527,9 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		ext4_journal_stop(handle);
 	}
 
-	ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, &ext4_dio_write_ops,
+	if (ilock_shared)
+		iomap_ops = &ext4_iomap_overwrite_ops;
+	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
 			   is_sync_kiocb(iocb) || unaligned_io || extend);
 
 	if (extend)

     	   	    	      	  - Ted
				  

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] ext4: Optimize ext4 DIO overwrites
  2019-12-26 17:17     ` Theodore Y. Ts'o
@ 2019-12-27  5:32       ` Ritesh Harjani
  2020-01-06  9:33       ` Jan Kara
  1 sibling, 0 replies; 6+ messages in thread
From: Ritesh Harjani @ 2019-12-27  5:32 UTC (permalink / raw)
  To: Theodore Y. Ts'o, Jan Kara
  Cc: linux-ext4, Dan Williams, Berrocal, Eduardo



On 12/26/19 10:47 PM, Theodore Y. Ts'o wrote:
> On Thu, Dec 19, 2019 at 08:28:23PM +0100, Jan Kara wrote:
>>> However depending on which patch lands first one may need a
>>> re-basing. Will conflict with this-
>>> https://marc.info/?l=linux-ext4&m=157613016931238&w=2
>>
>> Yes, but the conflict is minor and trivial to resolve.
>>
> 
> Is this the correct resolution?
> 
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -447,6 +447,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>   	struct inode *inode = file_inode(iocb->ki_filp);
>   	loff_t offset = iocb->ki_pos;
>   	size_t count = iov_iter_count(from);
> +	const struct iomap_ops *iomap_ops = &ext4_iomap_ops;
>   	bool extend = false, unaligned_io = false;
>   	bool ilock_shared = true;
>   
> @@ -526,7 +527,9 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>   		ext4_journal_stop(handle);
>   	}
>   
> -	ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, &ext4_dio_write_ops,
> +	if (ilock_shared)
> +		iomap_ops = &ext4_iomap_overwrite_ops;
> +	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
>   			   is_sync_kiocb(iocb) || unaligned_io || extend);
>   
>   	if (extend)
> 
>  

Yes, this looks correct to me.

Thanks
-ritesh


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] ext4: Optimize ext4 DIO overwrites
  2019-12-26 17:17     ` Theodore Y. Ts'o
  2019-12-27  5:32       ` Ritesh Harjani
@ 2020-01-06  9:33       ` Jan Kara
  1 sibling, 0 replies; 6+ messages in thread
From: Jan Kara @ 2020-01-06  9:33 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Jan Kara, Ritesh Harjani, linux-ext4, Dan Williams, Berrocal, Eduardo

On Thu 26-12-19 12:17:31, Theodore Y. Ts'o wrote:
> On Thu, Dec 19, 2019 at 08:28:23PM +0100, Jan Kara wrote:
> > > However depending on which patch lands first one may need a
> > > re-basing. Will conflict with this-
> > > https://marc.info/?l=linux-ext4&m=157613016931238&w=2
> > 
> > Yes, but the conflict is minor and trivial to resolve.
> > 
> 
> Is this the correct resolution?

Looks good to me as well.

								Honza

> 
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -447,6 +447,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  	struct inode *inode = file_inode(iocb->ki_filp);
>  	loff_t offset = iocb->ki_pos;
>  	size_t count = iov_iter_count(from);
> +	const struct iomap_ops *iomap_ops = &ext4_iomap_ops;
>  	bool extend = false, unaligned_io = false;
>  	bool ilock_shared = true;
>  
> @@ -526,7 +527,9 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  		ext4_journal_stop(handle);
>  	}
>  
> -	ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, &ext4_dio_write_ops,
> +	if (ilock_shared)
> +		iomap_ops = &ext4_iomap_overwrite_ops;
> +	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
>  			   is_sync_kiocb(iocb) || unaligned_io || extend);
>  
>  	if (extend)
> 
>      	   	    	      	  - Ted
> 				  
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-01-06  9:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-18 17:44 [PATCH] ext4: Optimize ext4 DIO overwrites Jan Kara
2019-12-19 13:53 ` Ritesh Harjani
2019-12-19 19:28   ` Jan Kara
2019-12-26 17:17     ` Theodore Y. Ts'o
2019-12-27  5:32       ` Ritesh Harjani
2020-01-06  9:33       ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).