All of lore.kernel.org
 help / color / mirror / Atom feed
* mm: don't read i_size of inode unless we need it
@ 2021-10-26 18:15 Jens Axboe
  2021-10-26 19:11 ` Chris Mason
  2021-10-28 14:19 ` Christoph Hellwig
  0 siblings, 2 replies; 5+ messages in thread
From: Jens Axboe @ 2021-10-26 18:15 UTC (permalink / raw)
  To: Andrew Morton, Dave Chinner, linux-fsdevel,
	Linux Memory Management List, Chris Mason, Pavel Begunkov

We always go through i_size_read(), and we rarely end up needing it. Push
the read to down where we need to check it, which avoids it for most
cases.

It looks like we can even remove this check entirely, which might be
worth pursuing. But at least this takes it out of the hot path.

Acked-by: Chris Mason <clm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

I came across this and wrote the patch the other day, then Pavel pointed
me at his original posting of a very similar patch back in August.
Discussed it with Chris, and it sure _seems_ like this would be fine.

In an attempt to move the original discussion forward, here's this
posting.

diff --git a/mm/filemap.c b/mm/filemap.c
index 44b4b551e430..850920276846 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2736,9 +2736,7 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 		struct file *file = iocb->ki_filp;
 		struct address_space *mapping = file->f_mapping;
 		struct inode *inode = mapping->host;
-		loff_t size;
 
-		size = i_size_read(inode);
 		if (iocb->ki_flags & IOCB_NOWAIT) {
 			if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
 						iocb->ki_pos + count - 1))
@@ -2770,8 +2768,9 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 		 * the rest of the read.  Buffered reads will not work for
 		 * DAX files, so don't bother trying.
 		 */
-		if (retval < 0 || !count || iocb->ki_pos >= size ||
-		    IS_DAX(inode))
+		if (retval < 0 || !count || IS_DAX(inode))
+			return retval;
+		if (iocb->ki_pos >= i_size_read(inode))
 			return retval;
 	}
 
-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: mm: don't read i_size of inode unless we need it
  2021-10-26 18:15 mm: don't read i_size of inode unless we need it Jens Axboe
@ 2021-10-26 19:11 ` Chris Mason
  2021-10-27 15:50   ` Jens Axboe
  2021-10-28 14:19 ` Christoph Hellwig
  1 sibling, 1 reply; 5+ messages in thread
From: Chris Mason @ 2021-10-26 19:11 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Andrew Morton, Dave Chinner, linux-fsdevel,
	Linux Memory Management List, Pavel Begunkov


> On Oct 26, 2021, at 2:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
> 
> We always go through i_size_read(), and we rarely end up needing it. Push
> the read to down where we need to check it, which avoids it for most
> cases.
> 
> It looks like we can even remove this check entirely, which might be
> worth pursuing. But at least this takes it out of the hot path.
> 
> Acked-by: Chris Mason <clm@fb.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> ---
> 
> I came across this and wrote the patch the other day, then Pavel pointed
> me at his original posting of a very similar patch back in August.
> Discussed it with Chris, and it sure _seems_ like this would be fine.
> 
> In an attempt to move the original discussion forward, here's this
> posting.
> 

I had the same concerns Dave Chinner did, but I think the i_size check inside generic_file_read_iter() is dead code at this point.  Checking ki_pos against i_size was added for Btrfs:

commit 66f998f611897319b555364cefd5d6e88a205866
Author: Josef Bacik <josef@redhat.com>
Date:   Sun May 23 11:00:54 2010 -0400

    fs: allow short direct-io reads to be completed via buffered IO

And we’ve switched to btrfs_file_read_iter(), which does the check the same way PavelJens have done it here.

I don’t think checking i_size before or after O_DIRECT makes the race fundamentally different.  We might return a short read at different  times than we did before, but we won’t be returning stale/incorrect data.

-chris

> diff --git a/mm/filemap.c b/mm/filemap.c
> index 44b4b551e430..850920276846 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2736,9 +2736,7 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
> 		struct file *file = iocb->ki_filp;
> 		struct address_space *mapping = file->f_mapping;
> 		struct inode *inode = mapping->host;
> -		loff_t size;
> 
> -		size = i_size_read(inode);
> 		if (iocb->ki_flags & IOCB_NOWAIT) {
> 			if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
> 						iocb->ki_pos + count - 1))
> @@ -2770,8 +2768,9 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
> 		 * the rest of the read.  Buffered reads will not work for
> 		 * DAX files, so don't bother trying.
> 		 */
> -		if (retval < 0 || !count || iocb->ki_pos >= size ||
> -		    IS_DAX(inode))
> +		if (retval < 0 || !count || IS_DAX(inode))
> +			return retval;
> +		if (iocb->ki_pos >= i_size_read(inode))
> 			return retval;
> 	}
> 




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: mm: don't read i_size of inode unless we need it
  2021-10-26 19:11 ` Chris Mason
@ 2021-10-27 15:50   ` Jens Axboe
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2021-10-27 15:50 UTC (permalink / raw)
  To: Chris Mason
  Cc: Andrew Morton, Dave Chinner, linux-fsdevel,
	Linux Memory Management List, Pavel Begunkov

On 10/26/21 1:11 PM, Chris Mason wrote:
> 
>> On Oct 26, 2021, at 2:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> We always go through i_size_read(), and we rarely end up needing it. Push
>> the read to down where we need to check it, which avoids it for most
>> cases.
>>
>> It looks like we can even remove this check entirely, which might be
>> worth pursuing. But at least this takes it out of the hot path.
>>
>> Acked-by: Chris Mason <clm@fb.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> ---
>>
>> I came across this and wrote the patch the other day, then Pavel pointed
>> me at his original posting of a very similar patch back in August.
>> Discussed it with Chris, and it sure _seems_ like this would be fine.
>>
>> In an attempt to move the original discussion forward, here's this
>> posting.
>>
> 
> I had the same concerns Dave Chinner did, but I think the i_size check
> inside generic_file_read_iter() is dead code at this point.  Checking
> ki_pos against i_size was added for Btrfs:
> 
> commit 66f998f611897319b555364cefd5d6e88a205866
> Author: Josef Bacik <josef@redhat.com>
> Date:   Sun May 23 11:00:54 2010 -0400
> 
>     fs: allow short direct-io reads to be completed via buffered IO
> 
> And we’ve switched to btrfs_file_read_iter(), which does the check the
> same way PavelJens have done it here.
> 
> I don’t think checking i_size before or after O_DIRECT makes the race
> fundamentally different.  We might return a short read at different
> times than we did before, but we won’t be returning stale/incorrect
> data.

Andrew, can you queue this one up in the mm branch?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: mm: don't read i_size of inode unless we need it
  2021-10-26 18:15 mm: don't read i_size of inode unless we need it Jens Axboe
  2021-10-26 19:11 ` Chris Mason
@ 2021-10-28 14:19 ` Christoph Hellwig
  2021-10-28 15:00   ` Jens Axboe
  1 sibling, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2021-10-28 14:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Andrew Morton, Dave Chinner, linux-fsdevel,
	Linux Memory Management List, Chris Mason, Pavel Begunkov

Btw, given that you're micro-optimizing in this area:

block devices still use ->direct_IO in the I/O path.  It might make
sense to switch to the model of the file systems that use iomap where
we avoid that indirect call and can optimize the code for the direct
I/O fast path.  With that you woudn't even end up using this i_size_read
at all

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: mm: don't read i_size of inode unless we need it
  2021-10-28 14:19 ` Christoph Hellwig
@ 2021-10-28 15:00   ` Jens Axboe
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2021-10-28 15:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Dave Chinner, linux-fsdevel,
	Linux Memory Management List, Chris Mason, Pavel Begunkov

On 10/28/21 8:19 AM, Christoph Hellwig wrote:
> Btw, given that you're micro-optimizing in this area:
> 
> block devices still use ->direct_IO in the I/O path.  It might make
> sense to switch to the model of the file systems that use iomap where
> we avoid that indirect call and can optimize the code for the direct
> I/O fast path.  With that you woudn't even end up using this i_size_read
> at all

It's not a bad idea, we do kind of jump through some hoops here by
calling generic_file_iter_read(), we should just check for dio upfront
and handle that, falling back to filemap_read() if needed.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-10-28 15:00 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-26 18:15 mm: don't read i_size of inode unless we need it Jens Axboe
2021-10-26 19:11 ` Chris Mason
2021-10-27 15:50   ` Jens Axboe
2021-10-28 14:19 ` Christoph Hellwig
2021-10-28 15:00   ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.