Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
* cpu bound I/O behaviour in linux 5.4 (possibly others)
@ 2020-02-14 11:30 Marc Lehmann
  2020-02-14 11:57 ` Nikolay Borisov
  2020-02-14 12:36 ` Roman Mamedov
  0 siblings, 2 replies; 7+ messages in thread
From: Marc Lehmann @ 2020-02-14 11:30 UTC (permalink / raw)
  To: linux-btrfs

Hi!

I've upgraded a machine to linux 5.4.15 that runs a small netnews
system. It normally pulls news with about 20MB/s. After upgrading (it
seems) that this process is now CPU bound, and I get only about 10mb/s
throughput. Otherwise, everything seems fine - no obvious bugs, and no
obvious performance problems.

"CPU-bound" specifically means that the disk(s) seem pretty idle (it an
6x10TB raid5), I can do a lot of I/O without slowing down the transfer,
but there is always a single kworker which is constantly at 100% cpu (i.e.
one core) in top:

 8963 root      20   0       0      0      0 R 2 100.0   0.0   2:04 [kworker/u8:15+flush-btrfs-3]

When I cat /proc/8963/task/8963/stack regularly, I get either no output or
(most often) this single line:

   [<0>] tree_search_offset.isra.0+0x16a/0x1d0 [btrfs]

It is possible that this is _not_ new behaviour with 5.4, but I often use
top, and I can't remember having a kworker stuck at 100% cpu for days.
(The fs is about a year old and had no issues so far, the last scrub is about
a week old).

Another symptom is that Dirty in /proc/meminfo is typically at 7-8GB,
which is more or less the value of /proc/sys/vm/dirty_ratio, Writeback is
usually 0 or has small values, and running sync often takes 30m or more.

The 100% cpu is definitely caused by the news transfer - pausing it and
waiting a while makes it effectively disappear and everything goes back to
normal.

The news process effectively does this in multiple parallel loops:

   openat(AT_FDCWD, "/store/04267/26623~", O_WRONLY|O_CREAT|O_EXCL, 0600...
   write(75, "Path: ask005.abavia.com!"..., 656453...
   close(75)                   = 0
   renameat2(AT_FDCWD, "/store/04267/26623~", AT_FDCWD, "/store/04267/26623", 0 ...

The file layout is one layer of subdirectories with 100000 files inside
each, which has posed absolutely no probelms withe xt4/xfs in the past,
and also btrfs didn't seem to mind.

My question is, would this be expected behaviour? If yes, is it something
that can be influenced/improved on my side?

I can investigate and do some experiments, but I cannot easily update
kernels/do reboots on this system.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: cpu bound I/O behaviour in linux 5.4 (possibly others)
  2020-02-14 11:30 cpu bound I/O behaviour in linux 5.4 (possibly others) Marc Lehmann
@ 2020-02-14 11:57 ` Nikolay Borisov
  2020-02-14 12:20   ` Filipe Manana
  2020-02-14 12:40   ` Marc Lehmann
  2020-02-14 12:36 ` Roman Mamedov
  1 sibling, 2 replies; 7+ messages in thread
From: Nikolay Borisov @ 2020-02-14 11:57 UTC (permalink / raw)
  To: Marc Lehmann, linux-btrfs



On 14.02.20 г. 13:30 ч., Marc Lehmann wrote:
> Hi!
> 
> I've upgraded a machine to linux 5.4.15 that runs a small netnews
> system. It normally pulls news with about 20MB/s. After upgrading (it
> seems) that this process is now CPU bound, and I get only about 10mb/s
> throughput. Otherwise, everything seems fine - no obvious bugs, and no
> obvious performance problems.
> 
> "CPU-bound" specifically means that the disk(s) seem pretty idle (it an
> 6x10TB raid5), I can do a lot of I/O without slowing down the transfer,
> but there is always a single kworker which is constantly at 100% cpu (i.e.
> one core) in top:

So this is a 50tb useful space, right?

> 
>  8963 root      20   0       0      0      0 R 2 100.0   0.0   2:04 [kworker/u8:15+flush-btrfs-3]
> 
> When I cat /proc/8963/task/8963/stack regularly, I get either no output or
> (most often) this single line:
> 
>    [<0>] tree_search_offset.isra.0+0x16a/0x1d0 [btrfs]

This points to freespace cache. One thing that I might suggest is try
using free-space-tree (aka free space cache v2 ) as opposed to v1. You
can achieve this with the space_cache=2 mount option. I suspect what
might be happening is the freespace cache is rather fragmented and the
rb tree which is really pointer chasing is having a hard time keeping
up. space cache v2 is a lot better in that regard, since it's a btree as
its backing data structure.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: cpu bound I/O behaviour in linux 5.4 (possibly others)
  2020-02-14 11:57 ` Nikolay Borisov
@ 2020-02-14 12:20   ` Filipe Manana
  2020-02-14 20:07     ` Marc Lehmann
  2020-02-14 12:40   ` Marc Lehmann
  1 sibling, 1 reply; 7+ messages in thread
From: Filipe Manana @ 2020-02-14 12:20 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Marc Lehmann, linux-btrfs

On Fri, Feb 14, 2020 at 11:58 AM Nikolay Borisov <nborisov@suse.com> wrote:
>
>
>
> On 14.02.20 г. 13:30 ч., Marc Lehmann wrote:
> > Hi!
> >
> > I've upgraded a machine to linux 5.4.15 that runs a small netnews
> > system. It normally pulls news with about 20MB/s. After upgrading (it
> > seems) that this process is now CPU bound, and I get only about 10mb/s
> > throughput. Otherwise, everything seems fine - no obvious bugs, and no
> > obvious performance problems.
> >
> > "CPU-bound" specifically means that the disk(s) seem pretty idle (it an
> > 6x10TB raid5), I can do a lot of I/O without slowing down the transfer,
> > but there is always a single kworker which is constantly at 100% cpu (i.e.
> > one core) in top:
>
> So this is a 50tb useful space, right?
>
> >
> >  8963 root      20   0       0      0      0 R 2 100.0   0.0   2:04 [kworker/u8:15+flush-btrfs-3]
> >
> > When I cat /proc/8963/task/8963/stack regularly, I get either no output or
> > (most often) this single line:
> >
> >    [<0>] tree_search_offset.isra.0+0x16a/0x1d0 [btrfs]
>
> This points to freespace cache. One thing that I might suggest is try
> using free-space-tree (aka free space cache v2 ) as opposed to v1. You
> can achieve this with the space_cache=2 mount option. I suspect what
> might be happening is the freespace cache is rather fragmented and the
> rb tree which is really pointer chasing is having a hard time keeping
> up. space cache v2 is a lot better in that regard, since it's a btree as
> its backing data structure.

Both free space cache and free space tree use the same in memory
representation of free space, which is a red black tree containing
extent and bitmap entries.
What differs between them is how they persist that information - the
space cache uses inodes and data extents, while the free space tree
uses a btree for that (no use of data extents nor inodes).

tree_search_offset is used to search the in-memory red black tree,
used by the free space cache and the free space tree implementations.

So you can't deduce that the free space cache is being used, and
despite being the default, it was not mentioned by Marc if he's not
using already the free space tree (-o space_cache=v2).
Switching from one to the other might make the problem go away, simply
because it cause free space to be scanned and build a new cache or
tree.



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: cpu bound I/O behaviour in linux 5.4 (possibly others)
  2020-02-14 11:30 cpu bound I/O behaviour in linux 5.4 (possibly others) Marc Lehmann
  2020-02-14 11:57 ` Nikolay Borisov
@ 2020-02-14 12:36 ` Roman Mamedov
  2020-02-14 12:45   ` Marc Lehmann
  1 sibling, 1 reply; 7+ messages in thread
From: Roman Mamedov @ 2020-02-14 12:36 UTC (permalink / raw)
  To: Marc Lehmann; +Cc: linux-btrfs

On Fri, 14 Feb 2020 12:30:27 +0100
Marc Lehmann <schmorp@schmorp.de> wrote:

> I've upgraded a machine to linux 5.4.15 that runs a small netnews

You don't seem to mention which version you upgraded from. If a full bisect is
impractical, this is the (very distant) next best thing you can do. Was it from
5.14.14, or from 3.4? :)

Also would be nice if you can double-check that returning to that previous
version right now makes the issue go away, and it's not a coincidence of
something else changed on the FS or OS (such as other package upgrades beside
the kernel).

> system. It normally pulls news with about 20MB/s. After upgrading (it
> seems) that this process is now CPU bound, and I get only about 10mb/s
> throughput. Otherwise, everything seems fine - no obvious bugs, and no
> obvious performance problems.
> 
> "CPU-bound" specifically means that the disk(s) seem pretty idle (it an
> 6x10TB raid5), I can do a lot of I/O without slowing down the transfer,
> but there is always a single kworker which is constantly at 100% cpu (i.e.
> one core) in top:
> 
>  8963 root      20   0       0      0      0 R 2 100.0   0.0   2:04 [kworker/u8:15+flush-btrfs-3]
> 
> When I cat /proc/8963/task/8963/stack regularly, I get either no output or
> (most often) this single line:
> 
>    [<0>] tree_search_offset.isra.0+0x16a/0x1d0 [btrfs]
> 
> It is possible that this is _not_ new behaviour with 5.4, but I often use
> top, and I can't remember having a kworker stuck at 100% cpu for days.
> (The fs is about a year old and had no issues so far, the last scrub is about
> a week old).
> 
> Another symptom is that Dirty in /proc/meminfo is typically at 7-8GB,
> which is more or less the value of /proc/sys/vm/dirty_ratio, Writeback is
> usually 0 or has small values, and running sync often takes 30m or more.
> 
> The 100% cpu is definitely caused by the news transfer - pausing it and
> waiting a while makes it effectively disappear and everything goes back to
> normal.
> 
> The news process effectively does this in multiple parallel loops:
> 
>    openat(AT_FDCWD, "/store/04267/26623~", O_WRONLY|O_CREAT|O_EXCL, 0600...
>    write(75, "Path: ask005.abavia.com!"..., 656453...
>    close(75)                   = 0
>    renameat2(AT_FDCWD, "/store/04267/26623~", AT_FDCWD, "/store/04267/26623", 0 ...
> 
> The file layout is one layer of subdirectories with 100000 files inside
> each, which has posed absolutely no probelms withe xt4/xfs in the past,
> and also btrfs didn't seem to mind.
> 
> My question is, would this be expected behaviour? If yes, is it something
> that can be influenced/improved on my side?
> 
> I can investigate and do some experiments, but I cannot easily update
> kernels/do reboots on this system.
> 


-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: cpu bound I/O behaviour in linux 5.4 (possibly others)
  2020-02-14 11:57 ` Nikolay Borisov
  2020-02-14 12:20   ` Filipe Manana
@ 2020-02-14 12:40   ` Marc Lehmann
  1 sibling, 0 replies; 7+ messages in thread
From: Marc Lehmann @ 2020-02-14 12:40 UTC (permalink / raw)
  To: Nikolay Borisov, Filipe Manana; +Cc: linux-btrfs

First of all, thanks for the response - and if my mails are somehow
off-topic for the list, also feelf ree to tell me :)

On Fri, Feb 14, 2020 at 01:57:33PM +0200, Nikolay Borisov <nborisov@suse.com> wrote:
> > one core) in top:
> 
> So this is a 50tb useful space, right?

Yup, pretty full, too. I also forgot to mention that the vast majority
of files are between 400kB and 2MB, and not, as one might expect from
textgroups, a few kilobytes.

> >    [<0>] tree_search_offset.isra.0+0x16a/0x1d0 [btrfs]
> 
> This points to freespace cache. One thing that I might suggest is try

Hmm, I did switch to the free space tree on all larger filesystems long
ago, or at least I thought so. I use these mount options on the fs in
question:

   rw,noatime,nossd,space_cache=v2,commit=120,subvolid=5,subvol=/

I assume this is the correct way to get it (and your space_cache=2 is a
typo, or an alternative?).

So either I am not switching on the free space tree properly, or it's not
the problem. I did notice major speedups form it in the past, though.

> So you can't deduce that the free space cache is being used, and
> despite being the default, it was not mentioned by Marc if he's not
> using already the free space tree (-o space_cache=v2).

Yes, sorry, it's alwayss hard to strike a balance between needed info and
too much.

> Switching from one to the other might make the problem go away, simply
> because it cause free space to be scanned and build a new cache or
> tree.

So clearing the free space tree might also help? Can I do this while its
mounted using remount or do I haver to umount/mount (or use btrfs check)?

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: cpu bound I/O behaviour in linux 5.4 (possibly others)
  2020-02-14 12:36 ` Roman Mamedov
@ 2020-02-14 12:45   ` Marc Lehmann
  0 siblings, 0 replies; 7+ messages in thread
From: Marc Lehmann @ 2020-02-14 12:45 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: linux-btrfs

On Fri, Feb 14, 2020 at 05:36:54PM +0500, Roman Mamedov <rm@romanrm.net> wrote:
> 
> You don't seem to mention which version you upgraded from. If a full bisect is
> impractical, this is the (very distant) next best thing you can do. Was it from
> 5.14.14, or from 3.4? :)

It was 5.2.21, and before that, 4.19. I might be able to do this, but it's
not something that I can quickly do to test this out.

> Also would be nice if you can double-check that returning to that previous
> version right now makes the issue go away, and it's not a coincidence of
> something else changed on the FS or OS (such as other package upgrades beside
> the kernel).

Would have done so, if I could easily do that. In the meantime, it's a
definite possibility that this is not actually something new, but only
something that recently manifested due to different I/O patterns.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: cpu bound I/O behaviour in linux 5.4 (possibly others)
  2020-02-14 12:20   ` Filipe Manana
@ 2020-02-14 20:07     ` Marc Lehmann
  0 siblings, 0 replies; 7+ messages in thread
From: Marc Lehmann @ 2020-02-14 20:07 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs

On Fri, Feb 14, 2020 at 12:20:46PM +0000, Filipe Manana <fdmanana@gmail.com> wrote:
> So you can't deduce that the free space cache is being used, and
> despite being the default, it was not mentioned by Marc if he's not
> using already the free space tree (-o space_cache=v2).

Today, during an idle period, I stopped the news process, waited 15
minutes for I/O tand CPU to become mostly idle, and did this:

   # mount -noremount,clear_cache,space_cache=v1 /localvol5
   # mount -noremount,clear_cache,space_cache=v2 /localvol5

Which, to my surprise, didn't make btrfs complain too much:

   [1378546.558533] BTRFS info (device dm-17): force clearing of disk cache
   [1378546.558536] BTRFS info (device dm-17): enabling disk space caching
   [1378546.558537] BTRFS info (device dm-17): disk space caching is enabled

   [1378553.868438] BTRFS info (device dm-17): enabling free space tree
   [1378553.868440] BTRFS info (device dm-17): using free space tree

I don't know if this was effective in clearing the cache, but it didn't
change the behaviour - as soon as the new process started writing files
again, it was at 100% cpu.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-14 11:30 cpu bound I/O behaviour in linux 5.4 (possibly others) Marc Lehmann
2020-02-14 11:57 ` Nikolay Borisov
2020-02-14 12:20   ` Filipe Manana
2020-02-14 20:07     ` Marc Lehmann
2020-02-14 12:40   ` Marc Lehmann
2020-02-14 12:36 ` Roman Mamedov
2020-02-14 12:45   ` Marc Lehmann

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org
	public-inbox-index linux-btrfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git