linux-btrfs.vger.kernel.org archive mirror
* Major btrfs fiemap slowdown on file with many extents once in cache (RCU stalls?) (Was: [PATCH 1/3] filemap: Correct the conditions for marking a folio as accessed)
       [not found] ` <20220619151143.1054746-2-willy@infradead.org>
@ 2022-06-30  7:29   ` Dominique MARTINET
  2022-06-30 14:37     ` Theodore Ts'o
  2022-07-01  1:55     ` Yu Kuai
From: Dominique MARTINET @ 2022-06-30  7:29 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-btrfs
  Cc: linux-mm, linux-fsdevel, linux-kernel, Yu Kuai, Kent Overstreet

Hi Willy, linux-btrfs@vger,

Matthew Wilcox (Oracle) wrote on Sun, Jun 19, 2022 at 04:11:41PM +0100:
> We had an off-by-one error which meant that we never marked the first page
> in a read as accessed.  This was visible as a slowdown when re-reading
> a file as pages were being evicted from cache too soon.  In reviewing
> this code, we noticed a second bug where a multi-page folio would be
> marked as accessed multiple times when doing reads that were less than
> the size of the folio.

when debugging an unrelated issue (short reads on btrfs with io_uring
and O_DIRECT[1]), I noticed that my horrible big file copy speeds fell
down from ~2GB/s (there's compression and lots of zeroes) to ~100MB/s
the second time I was copying it with cp.

I've taken a moment to bisect this and came down to this patch.

[1] https://lore.kernel.org/all/YrrFGO4A1jS0GI0G@atmark-techno.com/T/#u



Dropping caches (echo 3 > /proc/sys/vm/drop_caches) restores the speed,
so there appears to be some bad effect from having the file in cache
when fiemap runs?
To be fair, that file is pretty horrible:
---
# compsize bigfile
Processed 1 file, 194955 regular extents (199583 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       15%      3.7G          23G          23G       
none       100%      477M         477M         514M       
zstd        14%      3.2G          23G          23G       
---

Here's what perf has to say about it on top of this patch when running
`cp bigfile /dev/null` the first time:

98.97%     0.00%  cp       [kernel.kallsyms]    [k]
entry_SYSCALL_64_after_hwframe
 entry_SYSCALL_64_after_hwframe
 do_syscall_64
  - 93.40% ksys_read
     - 93.36% vfs_read
        - 93.25% new_sync_read
           - 93.20% filemap_read
              - 83.38% filemap_get_pages
                 - 82.76% page_cache_ra_unbounded
                    + 59.72% folio_alloc
                    + 13.43% read_pages
                    + 8.75% filemap_add_folio
                      0.64% xa_load
                   0.52% filemap_get_read_batch
              + 8.75% copy_page_to_iter
  - 4.73% __x64_sys_ioctl
     - 4.72% do_vfs_ioctl
        - btrfs_fiemap
           - 4.70% extent_fiemap
              + 3.95% btrfs_check_shared
              + 0.70% get_extent_skip_holes

and second time:
99.90%     0.00%  cp       [kernel.kallsyms]    [k]
entry_SYSCALL_64_after_hwframe
 entry_SYSCALL_64_after_hwframe
 do_syscall_64
  - 94.62% __x64_sys_ioctl
       do_vfs_ioctl
       btrfs_fiemap
     - extent_fiemap
        - 50.01% get_extent_skip_holes
           - 50.00% btrfs_get_extent_fiemap
              - 49.97% count_range_bits
                   rb_next
        + 28.72% lock_extent_bits
        + 15.55% __clear_extent_bit
  - 5.21% ksys_read
     + 5.21% vfs_read

(if this isn't readable, 95% of the time is spent on fiemap the second
time around)
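
For anyone wanting to poke at this: profiles like the above can be
collected with something along the lines of `perf record -g -- cp
bigfile /dev/null` + `perf report`, and the fiemap side can be hit
without cp at all, e.g. with filefrag -v or a small FS_IOC_FIEMAP loop.
Rough, untested sketch of such a loop (plain fiemap uapi, nothing
btrfs-specific, minimal error handling):
---
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
	size_t nr = 512;	/* extents fetched per ioctl */
	struct fiemap *fm;
	unsigned long long start = 0, total = 0;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	fm = calloc(1, sizeof(*fm) + nr * sizeof(struct fiemap_extent));
	if (fd < 0 || !fm) {
		perror("setup");
		return 1;
	}
	for (;;) {
		struct fiemap_extent *last;

		memset(fm, 0, sizeof(*fm) + nr * sizeof(struct fiemap_extent));
		fm->fm_start = start;
		fm->fm_length = FIEMAP_MAX_OFFSET;	/* up to EOF */
		fm->fm_extent_count = nr;
		if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
			perror("FS_IOC_FIEMAP");
			return 1;
		}
		if (!fm->fm_mapped_extents)
			break;
		total += fm->fm_mapped_extents;
		last = &fm->fm_extents[fm->fm_mapped_extents - 1];
		if (last->fe_flags & FIEMAP_EXTENT_LAST)
			break;
		/* next batch starts right after the last extent returned */
		start = last->fe_logical + last->fe_length;
	}
	printf("%llu extents\n", total);
	free(fm);
	close(fd);
	return 0;
}
---
(cp apparently goes through the same ioctl, which is how btrfs_fiemap
shows up in the traces above)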




I've also been observing RCU stalls on my laptop with the same workload
(cp to /dev/null), but unfortunately I could not reproduce it in qemu,
so I could not take traces to confirm they are caused by the same
commit; given the workload, though, I'd say it is the same issue.
I can rebuild a kernel for my laptop and confirm if you think it could
be something else.


I didn't look at the patch itself (yet) so I have no suggestion at this
point - it's plausible the patch fixed something and just exposed slow
code that had been there all along, so it might be better to look at
the btrfs side first, I don't know.
If you don't manage to reproduce this, I'll be happy to test anything
thrown at me at the very least.


Thanks,
-- 
Dominique Martinet | Asmadeus


* Re: Major btrfs fiemap slowdown on file with many extents once in cache (RCU stalls?) (Was: [PATCH 1/3] filemap: Correct the conditions for marking a folio as accessed)
  2022-06-30  7:29   ` Major btrfs fiemap slowdown on file with many extents once in cache (RCU stalls?) (Was: [PATCH 1/3] filemap: Correct the conditions for marking a folio as accessed) Dominique MARTINET
@ 2022-06-30 14:37     ` Theodore Ts'o
  2022-06-30 22:43       ` Dominique MARTINET
  2022-07-01  1:55     ` Yu Kuai
From: Theodore Ts'o @ 2022-06-30 14:37 UTC (permalink / raw)
  To: Dominique MARTINET
  Cc: Matthew Wilcox (Oracle),
	linux-btrfs, linux-mm, linux-fsdevel, linux-kernel, Yu Kuai,
	Kent Overstreet

On Thu, Jun 30, 2022 at 04:29:05PM +0900, Dominique MARTINET wrote:
> Hi Willy, linux-btrfs@vger,
> 
> Matthew Wilcox (Oracle) wrote on Sun, Jun 19, 2022 at 04:11:41PM +0100:
> > We had an off-by-one error which meant that we never marked the first page
> > in a read as accessed.  This was visible as a slowdown when re-reading
> > a file as pages were being evicted from cache too soon.  In reviewing
> > this code, we noticed a second bug where a multi-page folio would be
> > marked as accessed multiple times when doing reads that were less than
> > the size of the folio.
> 
> when debugging an unrelated issue (short reads on btrfs with io_uring
> and O_DIRECT[1]), I noticed that my horrible big file copy speeds fell
> down from ~2GB/s (there's compression and lots of zeroes) to ~100MB/s
> the second time I was copying it with cp.
> 
> I've taken a moment to bisect this and came down to this patch.

I think you may have forgotten to include the commit-id that was the
result of your bisect...?

					- Ted


* Re: Major btrfs fiemap slowdown on file with many extents once in cache (RCU stalls?) (Was: [PATCH 1/3] filemap: Correct the conditions for marking a folio as accessed)
  2022-06-30 14:37     ` Theodore Ts'o
@ 2022-06-30 22:43       ` Dominique MARTINET
From: Dominique MARTINET @ 2022-06-30 22:43 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Matthew Wilcox (Oracle),
	linux-btrfs, linux-mm, linux-fsdevel, linux-kernel, Yu Kuai,
	Kent Overstreet

Theodore Ts'o wrote on Thu, Jun 30, 2022 at 10:37:58AM -0400:
> On Thu, Jun 30, 2022 at 04:29:05PM +0900, Dominique MARTINET wrote:
> > Hi Willy, linux-btrfs@vger,
> > 
> > Matthew Wilcox (Oracle) wrote on Sun, Jun 19, 2022 at 04:11:41PM +0100:
> > > We had an off-by-one error which meant that we never marked the first page
> > > in a read as accessed.  This was visible as a slowdown when re-reading
> > > a file as pages were being evicted from cache too soon.  In reviewing
> > > this code, we noticed a second bug where a multi-page folio would be
> > > marked as accessed multiple times when doing reads that were less than
> > > the size of the folio.
> > 
> > when debugging an unrelated issue (short reads on btrfs with io_uring
> > and O_DIRECT[1]), I noticed that my horrible big file copy speeds fell
> > down from ~2GB/s (there's compression and lots of zeroes) to ~100MB/s
> > the second time I was copying it with cp.
> > 
> > I've taken a moment to bisect this and came down to this patch.
> 
> I think you may have forgotten to include the commit-id that was the
> results of your bisect.... ?

Sorry, this is the patch I replied to, and it was recent enough that I
assumed it'd still be in mailboxes, but you're right, it's better with
a commit id. This was merged as 5ccc944dce3d ("filemap: Correct the
conditions for marking a folio as accessed").


Thanks,
-- 
Dominique


* Re: Major btrfs fiemap slowdown on file with many extents once in cache (RCU stalls?) (Was: [PATCH 1/3] filemap: Correct the conditions for marking a folio as accessed)
  2022-06-30  7:29   ` Major btrfs fiemap slowdown on file with many extents once in cache (RCU stalls?) (Was: [PATCH 1/3] filemap: Correct the conditions for marking a folio as accessed) Dominique MARTINET
  2022-06-30 14:37     ` Theodore Ts'o
@ 2022-07-01  1:55     ` Yu Kuai
  2022-07-07  8:42       ` Dominique MARTINET
From: Yu Kuai @ 2022-07-01  1:55 UTC (permalink / raw)
  To: Dominique MARTINET, Matthew Wilcox (Oracle), linux-btrfs
  Cc: linux-mm, linux-fsdevel, linux-kernel, Kent Overstreet

On 2022/06/30 15:29, Dominique MARTINET wrote:
> Hi Willy, linux-btrfs@vger,
> 
> Matthew Wilcox (Oracle) wrote on Sun, Jun 19, 2022 at 04:11:41PM +0100:
>> We had an off-by-one error which meant that we never marked the first page
>> in a read as accessed.  This was visible as a slowdown when re-reading
>> a file as pages were being evicted from cache too soon.  In reviewing
>> this code, we noticed a second bug where a multi-page folio would be
>> marked as accessed multiple times when doing reads that were less than
>> the size of the folio.
> 
> when debugging an unrelated issue (short reads on btrfs with io_uring
> and O_DIRECT[1]), I noticed that my horrible big file copy speeds fell
> down from ~2GB/s (there's compression and lots of zeroes) to ~100MB/s
> the second time I was copying it with cp.
> 

Hi,

With this patch, activate_page() will be called the second time that a
page is marked accessed, which has some extra overhead. However,
2GB/s -> 100MB/s is insane; I'm not sure how this is possible, but it
seems like it has something to do with this change. (Note that it was
also problematic that the page would not get marked accessed before
this patch.)
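
To spell out the overhead I mean: the accessed logic looks roughly like
this (heavily simplified from mm/swap.c, from memory, so don't trust
the details):
---
void folio_mark_accessed(struct folio *folio)
{
	if (!folio_test_referenced(folio)) {
		/* first access: only set the referenced bit */
		folio_set_referenced(folio);
	} else if (!folio_test_active(folio)) {
		/* second access: activate so reclaim keeps the folio */
		folio_activate(folio);
		folio_clear_referenced(folio);
	}
}
---
Before this patch the first folio of a read was not marked accessed at
all, so it never got this far and was evicted earlier; with the fix it
does reach the activation path and stays in the page cache.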

BTW, during my test, the speed of buffered reads on ext4 only fell a
little.

Thanks,
Kuai
> I've taken a moment to bisect this and came down to this patch.
> 
> [1] https://lore.kernel.org/all/YrrFGO4A1jS0GI0G@atmark-techno.com/T/#u
> 
> 
> 
> Dropping caches (echo 3 > /proc/sys/vm/drop_caches) restore the speed,
> so there appears to be some bad effect to having the file in cache for
> fiemap?
> To be fair that file is pretty horrible:
> ---
> # compsize bigfile
> Processed 1 file, 194955 regular extents (199583 refs), 0 inline.
> Type       Perc     Disk Usage   Uncompressed Referenced
> TOTAL       15%      3.7G          23G          23G
> none       100%      477M         477M         514M
> zstd        14%      3.2G          23G          23G
> ---
> 
> Here's what perf has to say about it on top of this patch when running
> `cp bigfile /dev/null` the first time:
> 
> 98.97%     0.00%  cp       [kernel.kallsyms]    [k]
> entry_SYSCALL_64_after_hwframe
>   entry_SYSCALL_64_after_hwframe
>   do_syscall_64
>    - 93.40% ksys_read
>       - 93.36% vfs_read
>          - 93.25% new_sync_read
>             - 93.20% filemap_read
>                - 83.38% filemap_get_pages
>                   - 82.76% page_cache_ra_unbounded
>                      + 59.72% folio_alloc
>                      + 13.43% read_pages
>                      + 8.75% filemap_add_folio
>                        0.64% xa_load
>                     0.52% filemap_get_read_batch
>                + 8.75% copy_page_to_iter
>    - 4.73% __x64_sys_ioctl
>       - 4.72% do_vfs_ioctl
>          - btrfs_fiemap
>             - 4.70% extent_fiemap
>                + 3.95% btrfs_check_shared
>                + 0.70% get_extent_skip_holes
> 
> and second time:
> 99.90%     0.00%  cp       [kernel.kallsyms]    [k]
> entry_SYSCALL_64_after_hwfram
>   entry_SYSCALL_64_after_hwframe
>   do_syscall_64
>    - 94.62% __x64_sys_ioctl
>         do_vfs_ioctl
>         btrfs_fiemap
>       - extent_fiemap
>          - 50.01% get_extent_skip_holes
>             - 50.00% btrfs_get_extent_fiemap
>                - 49.97% count_range_bits
>                     rb_next
>          + 28.72% lock_extent_bits
>          + 15.55% __clear_extent_bit
>    - 5.21% ksys_read
>       + 5.21% vfs_read
> 
> (if this isn't readable, 95% of the time is spent on fiemap the second
> time around)
> 
> 
> 
> 
> I've also been observing RCU stalls on my laptop with the same workload
> (cp to /dev/null), but unfortunately I could not reproduce in qemu so I
> could not take traces to confirm they are caused by the same commit but
> given the workload I'd say that is it?
> I can rebuild a kernel for my laptop and confirm if you think it should
> be something else.
> 
> 
> I didn't look at the patch itself (yet) so have no suggestion at this
> point - it's plausible the patch fixed something and just exposed slow
> code that had been there all along so it might be better to look at the
> btrfs side first, I don't know.
> If you don't manage to reproduce I'll be happy to test anything thrown
> at me at the very least.
> 
> 
> Thanks,
> 


* Re: Major btrfs fiemap slowdown on file with many extents once in cache (RCU stalls?) (Was: [PATCH 1/3] filemap: Correct the conditions for marking a folio as accessed)
  2022-07-01  1:55     ` Yu Kuai
@ 2022-07-07  8:42       ` Dominique MARTINET
From: Dominique MARTINET @ 2022-07-07  8:42 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Matthew Wilcox (Oracle),
	linux-btrfs, linux-mm, linux-fsdevel, linux-kernel,
	Kent Overstreet, Chris Mason, Josef Bacik, David Sterba

(added btrfs maintainers in direct cc)

Yu Kuai wrote on Fri, Jul 01, 2022 at 09:55:31AM +0800:
> With this patch, activate_page() will be called the second time that a
> page is marked accessed, which has some extra overhead. However,
> 2GB/s -> 100MB/s is insane; I'm not sure how this is possible, but it
> seems like it has something to do with this change. (Note that it was
> also problematic that the page would not get marked accessed before
> this patch.)

I honestly don't understand why a folio being marked as accessed
affects how fiemap is processed...
My guess would be that this indeed "just fixes" things: pages didn't
get marked as accessed -> were dropped from cache -> that kept the
inode io_tree small -> fiemap was fast; and it is really just a problem
that the fiemap algorithm doesn't scale, but I haven't really checked
if I'm right here.


So I don't think we should focus so much on the regression part as on
figuring out what's actually different the second time around and
making that faster.


Checking with 'perf script', btrfs_get_extent_fiemap() spends most of
its time on this:
 delalloc_len = count_range_bits(&inode->io_tree, &delalloc_start,
                                 end, len, EXTENT_DELALLOC, 1);

I have no idea what delalloc is supposed to be, but I can guess there
are just way too many nodes in the io_tree: why is that, and why
weren't there so many the first time around? I would have assumed that
as the file gets read it is put into cache, so the end of the first
read should slow down as well, but it didn't, so I'm sure I
misunderstood something and I'm wasting everyone's time. Feel free to
ignore me and find the issue instead :)
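
To put (very rough) numbers on that guess: if every extent reported by
fiemap triggers a walk over most of the remaining cached extent states
looking for delalloc that isn't there, ~200k extents means on the order
of 200k^2/2 node visits, which would at least explain rb_next
dominating the profile. Silly stand-alone model of that (absolutely not
btrfs code, all names made up):
---
#include <stdio.h>
#include <stdlib.h>

struct toy_state {
	unsigned long long start, end;
	int delalloc;		/* all zero: nothing to find */
};

static unsigned long long visits;

/* stand-in for the count_range_bits() call above: walk to the end */
static unsigned long long count_delalloc(const struct toy_state *s,
					 size_t n, unsigned long long from)
{
	unsigned long long total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (s[i].end < from)
			continue;
		visits++;
		if (s[i].delalloc)
			total += s[i].end - s[i].start + 1;
	}
	return total;
}

int main(void)
{
	enum { N = 20000 };	/* bigfile has ~10x this -> ~100x the visits */
	struct toy_state *s = calloc(N, sizeof(*s));
	size_t i;

	if (!s)
		return 1;
	for (i = 0; i < N; i++) {
		s[i].start = (unsigned long long)i * 4096;
		s[i].end = s[i].start + 4095;
	}
	/* fiemap-style loop: one delalloc lookup per extent */
	for (i = 0; i < N; i++)
		count_delalloc(s, N, s[i].start);

	printf("%d queries, %llu state visits\n", N, visits);
	free(s);
	return 0;
}
---
That is obviously a caricature of what btrfs does, but quadratic
behaviour on ~200k extents would at least be the right ballpark for a
copy falling from ~2GB/s to ~100MB/s.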


> BTW, during my test, the speed of buffered reads on ext4 only fell a
> little.

For "normal" files that don't have ~200k extents full of holes and
compression changes and whatever else this has gone through, I can
confirm the slowdown is not as bad -- almost unnoticeable when few
extents.
but I still have my laptop cashing when I'm copying this file twice
(well, I -could- just turn off panic_on_stall...) so it can go from a
little to infinity...


Thanks,

(Leaving the rest of the message for anyone catching up now; if
there's anything you'd like me to do, feel free to ask.)

> > I've taken a moment to bisect this and came down to this patch.
> > (5ccc944dce3d ("filemap: Correct the conditions for marking a folio
> > as accessed"))
> > 
> > [1] https://lore.kernel.org/all/YrrFGO4A1jS0GI0G@atmark-techno.com/T/#u
> > 
> > 
> > 
> > Dropping caches (echo 3 > /proc/sys/vm/drop_caches) restore the speed,
> > so there appears to be some bad effect to having the file in cache for
> > fiemap?
> > To be fair that file is pretty horrible:
> > ---
> > # compsize bigfile
> > Processed 1 file, 194955 regular extents (199583 refs), 0 inline.
> > Type       Perc     Disk Usage   Uncompressed Referenced
> > TOTAL       15%      3.7G          23G          23G
> > none       100%      477M         477M         514M
> > zstd        14%      3.2G          23G          23G
> > ---
> > 
> > Here's what perf has to say about it on top of this patch when running
> > `cp bigfile /dev/null` the first time:
> > 
> > 98.97%     0.00%  cp       [kernel.kallsyms]    [k]
> > entry_SYSCALL_64_after_hwframe
> >   entry_SYSCALL_64_after_hwframe
> >   do_syscall_64
> >    - 93.40% ksys_read
> >       - 93.36% vfs_read
> >          - 93.25% new_sync_read
> >             - 93.20% filemap_read
> >                - 83.38% filemap_get_pages
> >                   - 82.76% page_cache_ra_unbounded
> >                      + 59.72% folio_alloc
> >                      + 13.43% read_pages
> >                      + 8.75% filemap_add_folio
> >                        0.64% xa_load
> >                     0.52% filemap_get_read_batch
> >                + 8.75% copy_page_to_iter
> >    - 4.73% __x64_sys_ioctl
> >       - 4.72% do_vfs_ioctl
> >          - btrfs_fiemap
> >             - 4.70% extent_fiemap
> >                + 3.95% btrfs_check_shared
> >                + 0.70% get_extent_skip_holes
> > 
> > and second time:
> > 99.90%     0.00%  cp       [kernel.kallsyms]    [k]
> > entry_SYSCALL_64_after_hwfram
> >   entry_SYSCALL_64_after_hwframe
> >   do_syscall_64
> >    - 94.62% __x64_sys_ioctl
> >         do_vfs_ioctl
> >         btrfs_fiemap
> >       - extent_fiemap
> >          - 50.01% get_extent_skip_holes
> >             - 50.00% btrfs_get_extent_fiemap
> >                - 49.97% count_range_bits
> >                     rb_next
> >          + 28.72% lock_extent_bits
> >          + 15.55% __clear_extent_bit
> >    - 5.21% ksys_read
> >       + 5.21% vfs_read
> > 
> > (if this isn't readable, 95% of the time is spent on fiemap the second
> > time around)
> > 
> > 
> > 
> > 
> > I've also been observing RCU stalls on my laptop with the same workload
> > (cp to /dev/null), but unfortunately I could not reproduce in qemu so I
> > could not take traces to confirm they are caused by the same commit but
> > given the workload I'd say that is it?
> > I can rebuild a kernel for my laptop and confirm if you think it should
> > be something else.
> > 
> > 
> > I didn't look at the patch itself (yet) so have no suggestion at this
> > point - it's plausible the patch fixed something and just exposed slow
> > code that had been there all along so it might be better to look at the
> > btrfs side first, I don't know.
> > If you don't manage to reproduce I'll be happy to test anything thrown
> > at me at the very least.

-- 
Dominique

