* performance recommendations
@ 2021-02-15 14:53 Pal, Laszlo
  2021-02-15 19:30 ` Pal, Laszlo
  2021-02-16  7:17 ` Nikolay Borisov
  0 siblings, 2 replies; 13+ messages in thread
From: Pal, Laszlo @ 2021-02-15 14:53 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I'm not sure this is the right place to ask, but let me try :) I have
a server where I'm mainly using btrfs because of the built-in
compression feature. This is a central log server, storing logs from
tens of thousands of devices as text files: millions of files in
thousands of directories.

I've started to think it was not the best idea to choose btrfs for this :)

The performance of this server was always worse than that of others
where I don't use btrfs, but I thought this was just because of the I/O
overhead of compression and the not-so-good ESX host providing the disk
to this machine. But now even rm of a single file takes ages, so
something is definitely wrong. So, I'm looking for recommendations for
an environment like this, where the data-security functions of btrfs
are not as important as performance.

I have been searching the net for comprehensive performance
documentation for months, but have not found any so far.

Thank you in advance
Laszlo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: performance recommendations
  2021-02-15 14:53 performance recommendations Pal, Laszlo
@ 2021-02-15 19:30 ` Pal, Laszlo
       [not found]   ` <B7BDEFC2-2444-4926-8FFC-D78B6CE5CB4E@vlad.hu>
  2021-02-16  7:17 ` Nikolay Borisov
  1 sibling, 1 reply; 13+ messages in thread
From: Pal, Laszlo @ 2021-02-15 19:30 UTC (permalink / raw)
  To: linux-btrfs

So,

I'm trying to recover this stuff... this is a CentOS 7 based system
that has been running for almost two years. It was never too fast, but
it did what I intended it to do. Today, however, I've observed very,
very bad performance on ls, rm and other complicated commands. For
example, rm <any single file> takes forever, and in iotop I can see
this command using 50% of the I/O together with btrfs-transacti, so
something is definitely wrong.

I've added RAM and CPU to the VM, but it does not help. Now I'm also
trying to modify fstab to add noatime and autodefrag.

In the journal I can see some "free cache file invalid, skip" warnings.

Can anyone offer me some help, so that I can at least boot the machine?
(Right now the boot times out on the mount task, so I can only use
emergency mode or a rescue CD.)

Thank you
Laszlo



* Re: performance recommendations
       [not found]   ` <B7BDEFC2-2444-4926-8FFC-D78B6CE5CB4E@vlad.hu>
@ 2021-02-15 21:50     ` Pal, Laszlo
  2021-02-16  6:08       ` Piotr Szymaniak
  0 siblings, 1 reply; 13+ messages in thread
From: Pal, Laszlo @ 2021-02-15 21:50 UTC (permalink / raw)
  To: linux-btrfs

So,

After booting from the rescue CD and adding the mount options mentioned
earlier (noatime, autodefrag), the system can be booted, but it is
still performing terribly. The main observation is this:

# time rm -f qradar-leef-17.log
real    0m13.648s
user    0m0.000s
sys     0m0.105s

This is a 2 MB file and it takes 13 seconds to rm. What can be the
reason behind it, and how can such weird behavior be fixed? Is this
file system corruption? Some misconfiguration? Or should I start
planning to use a more traditional filesystem for this use case?


Thank you for your help

Pál, László
vlad@vlad.hu



* Re: performance recommendations
  2021-02-15 21:50     ` Pal, Laszlo
@ 2021-02-16  6:08       ` Piotr Szymaniak
  0 siblings, 0 replies; 13+ messages in thread
From: Piotr Szymaniak @ 2021-02-16  6:08 UTC (permalink / raw)
  To: Pal, Laszlo; +Cc: linux-btrfs


On Mon, Feb 15, 2021 at 10:50:15PM +0100, Pal, Laszlo wrote:
> So,

You should start here:
https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list#What_information_to_provide_when_asking_a_support_question

Best regards,
Piotr Szymaniak.
-- 
  - You're not one of those bleeding-heart liberals, are you?
  - I refuse to answer, because it could be used against me,
I replied. The cabbie gave a why-do-I-always-get-the-wise-guys
snort... but he shut up.
  -- Stephen King, "The Breathing Method"


* Re: performance recommendations
  2021-02-15 14:53 performance recommendations Pal, Laszlo
  2021-02-15 19:30 ` Pal, Laszlo
@ 2021-02-16  7:17 ` Nikolay Borisov
  2021-02-16  8:54   ` Pal, Laszlo
  1 sibling, 1 reply; 13+ messages in thread
From: Nikolay Borisov @ 2021-02-16  7:17 UTC (permalink / raw)
  To: Pal, Laszlo, linux-btrfs



On 15.02.21 at 16:53, Pal, Laszlo wrote:
> Hi,
> 
> I'm not sure this is the right place to ask, but let me try :) I have
> a server where I mainly using btrfs because of the builtin compress
> feature. This is a central log server, storing logs from tens of
> thousands devices, using a text files in thousands of directories in
> millions of files.
> 
> I've started to think it was not the best idea to choose btrfs for this :)
> 
> The performance of this server was always worst than others where I
> don't use btrfs, but I thought this is just because the i/o overhead
> of compression and the not-so-good esx host providing the disk to this
> machine. But now, even rm a single file takes ages, so there is
> something definitely wrong. So, I'm looking for some recommendations
> for such an environment where the data-security functions of btrfs is
> not as important than the performance.
> 
> I was searching the net for some comprehensive performance documents
> for months, but I cannot find it so far.
> 
> Thank you in advance
> Laszlo
> 

You are likely suffering from fragmentation. Given that you hold log
files, I'd assume you do a lot of small writes, each of which results
in a CoW operation that allocates space. This increases the size of the
metadata tree, and since you are likely using hard drives, seeking is
slow. To ascertain whether that's really the case, I'd advise you to
show the output of the following commands:

btrfs fi usage <mountpoint>

This will show the total used space on the filesystem. Then run:

btrfs inspect-internal dump-tree -t5 </dev/xxx> | grep -c EXTENT_DATA

which will show how many data extents there are in the filesystem.
Subsequently run:

btrfs inspect-internal dump-tree -t5 </dev/xxx> | grep -c leaf

which will show how many leaves there are in the filesystem.
Then you have 2 options:

a) Use btrfs filesystem defragment to actually rewrite leaves and try
to place them closer together, so that seeks become somewhat cheaper.

b) Rewrite the log files by copying them with no reflinks, so that
instead of one file consisting of multiple small extents, each file
consists of one giant extent. With your use case I'd assume you also
want nocow to be enabled; unfortunately, nodatacow precludes using
compression.
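
As a rough sketch of the steps above, the commands could be wrapped as
shell functions. The function names and their device/mount-point
arguments are placeholders that do not come from the thread. dump-tree
needs root and here reads tree 5 (the top-level subvolume), so files
living in another subvolume would need that subvolume's tree id
instead. avg_extent_bytes is only an illustrative helper for turning
"bytes used" and "extent count" into an average extent size, and
rewrite_file assumes nothing is writing to the file while it runs.

```shell
#!/bin/sh
# Sketch only: arguments such as /dev/sdb1 or /var are placeholders.

# Allocated and used space per block-group type.
show_usage() { btrfs filesystem usage "$1"; }

# Number of data extent items in tree 5 (top-level subvolume).
count_extents() { btrfs inspect-internal dump-tree -t 5 "$1" | grep -c EXTENT_DATA; }

# Number of leaves in the same tree.
count_leaves() { btrfs inspect-internal dump-tree -t 5 "$1" | grep -c leaf; }

# Illustrative helper: average bytes of data per extent (integer division).
# Very small values hint at heavy fragmentation.
avg_extent_bytes() { echo $(( $1 / $2 )); }

# Option a): recursively defragment a directory tree.
defrag_tree() { btrfs filesystem defragment -r "$1"; }

# Option b): rewrite one file as a fresh data copy (cat never reflinks),
# then replace the original. Ownership and timestamps are not preserved
# in this sketch; only the permission bits are copied (GNU chmod).
rewrite_file() {
    tmp="$1.rewrite"
    cat "$1" > "$tmp" && chmod --reference="$1" "$tmp" && mv "$tmp" "$1"
}
```

For example, avg_extent_bytes "$used_bytes" "$(count_extents /dev/xxx)"
would give a single number to compare before and after a defrag run.
Whether a rewritten file really ends up in one extent depends on free
space layout, so the result is worth checking rather than assuming.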




* Re: performance recommendations
  2021-02-16  7:17 ` Nikolay Borisov
@ 2021-02-16  8:54   ` Pal, Laszlo
  2021-02-16  9:02     ` Nikolay Borisov
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Pal, Laszlo @ 2021-02-16  8:54 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

Thank you all for the quick response. The server is running, but as I
said, the I/O performance is not as good as it should be. I also think
fragmentation is the issue, but I would also like to optimise my config
and, if possible, keep this server running with acceptable performance,
so let me answer the questions below.

So, as far as I can see, the action plan is the following:
- Enable v2 space_cache. Is this safe/stable enough?
- Run defrag on old data. I suppose it will run for weeks, but I'm OK
  with that if the server can run smoothly during the process.
- Is compress=zstd the recommended mount option? Does it perform
  better than the default?
- I'm also thinking of compressing my logs with traditional gzip after
  the defrag and turning off on-the-fly compression. (Is this a huge
  performance gain?)

Any other suggestions?

Thank you
Laszlo
---

uname -a
3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64
x86_64 x86_64 GNU/Linux

btrfs --version
btrfs-progs v4.9.1

btrfs fi show
Label: 'centos'  uuid: 7017204b-1582-4b4e-ad04-9e55212c7d46
    Total devices 2 FS bytes used 4.03TiB
    devid    1 size 491.12GiB used 119.02GiB path /dev/sda2
    devid    2 size 4.50TiB used 4.14TiB path /dev/sdb1

btrfs fi df /var
Data, single: total=4.09TiB, used=3.96TiB
System, RAID1: total=8.00MiB, used=464.00KiB
Metadata, RAID1: total=81.00GiB, used=75.17GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

dmesg > dmesg.log
dmesg | grep -i btrfs
[  491.729364] BTRFS warning (device sdb1): block group 4619266686976 has wrong amount of free space
[  491.729371] BTRFS warning (device sdb1): failed to load free space cache for block group 4619266686976, rebuilding it now

  CPU type and model
  processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU           E5540  @ 2.53GHz
stepping : 4
microcode : 0x1d
cpu MHz : 2533.423
cache size : 8192 KB
12 vCPU on esxi

how much memory
48 GB RAM

type and model of hard disk
virtualized Fujitsu RAID on esxi

is it raid
yes, the underlying virtualization provides redundancy, no sw RAID

Kernel version
3.10.0-1160.6.1.el7.x86_64

your btrfs mount options probably in /etc/fstab
UUID=7017204b-1582-4b4e-ad04-9e55212c7d46 /    btrfs defaults,noatime,autodefrag,subvol=root 0 0
UUID=7017204b-1582-4b4e-ad04-9e55212c7d46 /var btrfs defaults,subvol=var,noatime,autodefrag 0 0

size of log files
4.5 TB on /var

have you snapshots
no

have you tried tools like dedup remover
not yet

things you do

1. Kernel update: the LTS kernel has been updated to 5.10 (maybe you
have to install it manually, because CentOS will be dropped) -> reboot.
Maybe you have to remove your mount point from fstab, boot into the
system and mount it manually later.
Is this absolutely necessary?

2. set mount options in fstab
    defaults,autodefrag,space_cache=v2,compress=zstd (autodefrag only on HDD)
    defaults,ssd,space_cache=v2,compress=zstd (for ssd)

  autodefrag is already enabled. Is v2 space_cache safe enough?

3. sudo btrfs scrub start /dev/sda (use your device)
    watch sudo btrfs scrub status /dev/sda (watch and wait until finished)

4. sudo btrfs device stats /dev/sda (your disk)

5. install smartmontools
   run sudo smartctl -x /dev/sda (use your disk)
   check
I think this is not applicable because this is a virtual disk.


* Re: performance recommendations
  2021-02-16  8:54   ` Pal, Laszlo
@ 2021-02-16  9:02     ` Nikolay Borisov
  2021-02-16 10:28     ` Lionel Bouton
       [not found]     ` <aeed56c3-e641-46a1-5692-04c6ae75d212@gmail.com>
  2 siblings, 0 replies; 13+ messages in thread
From: Nikolay Borisov @ 2021-02-16  9:02 UTC (permalink / raw)
  To: Pal, Laszlo; +Cc: linux-btrfs



On 16.02.21 at 10:54, Pal, Laszlo wrote:
> Thank you all for the quick response. The server is running, but as I
> said the i/o perf. is not as good as it should be. I'm also thinking
> the fragmentation is the issue but I also would like to optimise my
> config and if possible keep this server running with acceptable
> performance, so let me answer the questions below
> 
> So, as far as I see the action plan is the following
> - enable v2 space_cache. is this safe/stable enough?
> - run defrag on old data, I suppose it will run weeks, but I'm ok with
> it if the server can run smoothly during this process
> - compress=zstd is the recommended mount option? is this performing
> better than the default?
> - I'm also thinking to -after defrag- compress my logs with
> traditional gzip compression and turn off on-the-fly compress (is this
> a huge performance gain?)
> 
> Any other suggestions?
> 
> Thank you
> Laszlo
> ---
> 
> uname -a
> 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64
> x86_64 x86_64 GNU/Linux

Ok, first of all this is a vendor kernel, most likely CentOS or
something like that. I have no idea what the state of btrfs is in this
kernel, so any statements regarding the stability of a particular
feature are essentially invalid: I can only base my answers on upstream
kernels or on SUSE-derived kernels.

Given this, my suggestion is for you to try to upgrade to a recent
upstream kernel. The latest is best; if you prefer stability you can
try one of the older (4.4/4.14/4.19/5.4) kernels. But you can expect
btrfs to perform best with the latest stable, which is 5.10.x at the
moment.

Bear in mind that btrfs is in active development, so between 3.10 and
the current upstream, 5.10, there have been _a lot_ of changes which
result in better performance as well as fixed bugs.

> 
>   btrfs --version
>   btrfs-progs v4.9.1
> 

<snip>

> how much memory
> 48 GB RAM
> 
> type and model of hard disk
> virtualized Fujitsu RAID on esxi
> 
> is it raid
> yes, the underlying virtualization provides redundancy, no sw RAID
> 
> Kernel version
> 3.10.0-1160.6.1.el7.x86_64
> 
> your btrfs mount options probably in /etc/fstab
> UUID=7017204b-1582-4b4e-ad04-9e55212c7d46 /
> btrfs   defaults,noatime,autodefrag,subvol=root     0 0
> UUID=7017204b-1582-4b4e-ad04-9e55212c7d46 /var
> btrfs   defaults,subvol=var,noatime,autodefrag      0 0
> 
> size of log files
> 4,5TB on /var
> 
> have you snapshots
> no
> 
> have you tries tools like dedup remover
> not yet
> 
> things you do
> 
> 1. Kernel update LTS kernel has been updated to 5.10 (maybe you have
> to install it manually, because centos will be dropped -> reboot
> maybe you have to remove your mount point in fstab and boot into
> system and mount it later manually.
> Is this absolutely necessary?
> 
> 2. set mount options in fstab
>     defaults,autodefrag,space_cache=v2,compress=zstd (autodefrag only on HDD)
>     defaults,ssd,space_cache=v2,compress=zstd (for ssd)
> 
>   autodefrag is already enabled. v2 space_cache is safe enough?
> 
> 3. sudo btrfs scrub start /dev/sda (use your device)
>     watch sudo btrfs scrub status /dev/sda (watch and wait until finished)
> 
> 4. sudo btrfs device stats /dev/sda (your disk)
> 
> 5.install smartmontools
>    run sudo smartctl -x /dev/sda (use your disk)
>    check
> I think this is not applicable because this is a virtual disk,

<snip>


* Re: performance recommendations
  2021-02-16  8:54   ` Pal, Laszlo
  2021-02-16  9:02     ` Nikolay Borisov
@ 2021-02-16 10:28     ` Lionel Bouton
       [not found]     ` <aeed56c3-e641-46a1-5692-04c6ae75d212@gmail.com>
  2 siblings, 0 replies; 13+ messages in thread
From: Lionel Bouton @ 2021-02-16 10:28 UTC (permalink / raw)
  To: Pal, Laszlo, Nikolay Borisov; +Cc: linux-btrfs

Hi,

On 16/02/2021 at 09:54, Pal, Laszlo wrote:
> [...]
> So, as far as I see the action plan is the following
> - enable v2 space_cache. is this safe/stable enough?
> - run defrag on old data, I suppose it will run weeks, but I'm ok with
> it if the server can run smoothly during this process
> - compress=zstd is the recommended mount option? is this performing
> better than the default?
> - I'm also thinking to -after defrag- compress my logs with
> traditional gzip compression and turn off on-the-fly compress (is this
> a huge performance gain?)
>
> [...]
>
> 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64
> x86_64 x86_64 GNU/Linux
>

As Nikolay pointed out, this is a vendor kernel based on a very (more
than 7 years) old kernel version and this can create problems.
For reference, see
https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature

- v2 space_cache appeared in 4.5.
- compress=zstd appeared in 4.14.

So for the 2 questions related to those, you'll have to ask the
distribution if they back-ported them (I doubt it, usually only bugfixes
are backported).
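
A quick way to compare a running kernel against an upstream version
number like the ones above is sketched below. version_ge is a
hypothetical helper, not an existing tool, and it only compares the
upstream major.minor numbers: it cannot tell whether a vendor kernel
has a feature back-ported.

```shell
#!/bin/sh
# version_ge RELEASE MAJOR MINOR
# Succeeds if the kernel release string is at least MAJOR.MINOR.
version_ge() {
    rel=${1%%-*}        # "3.10.0-1160.6.1.el7.x86_64" -> "3.10.0"
    major=${rel%%.*}    # "3"
    rest=${rel#*.}      # "10.0"
    minor=${rest%%.*}   # "10"
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Example: does the running kernel reach 4.14, where compress=zstd appeared?
if version_ge "$(uname -r)" 4 14; then
    echo "upstream version is new enough for compress=zstd"
else
    echo "older than 4.14: zstd only if the vendor back-ported it"
fi
```

On the 3.10.0-1160.6.1.el7.x86_64 kernel from this thread the check
takes the second branch, which is exactly the case where the
distribution has to be asked.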

I wouldn't be comfortable using a 3.10 based kernel with BTRFS. For
example there was at least one compress=lzo bug (race condition?) that
corrupted data on occasions that was fixed (from memory) in either a
late 3.x kernel or a 4.x kernel. The stability and performance on such a
base will not compare well with the current state of BTRFS.

If you really want to go ahead with this kernel and BTRFS I would at
least avoid compression with it and as you suggested in your last point
compress at the application level.

Note that compression will make fragmentation worse. BTRFS uses small
individually compressed extents (at most 128 KiB of data each, probably
because there isn't any other decent way to minimize the cost of
seeking into a file). The more extents you have, the more
opportunities for fragmentation exist.

For defragmentation, I use something I coded to replace autodefrag which
was not usable in my use cases :
https://github.com/jtek/ceph-utils/blob/master/btrfs-defrag-scheduler.rb
The complexity is worth it for me because it allows good performance on
filesystems where I need BTRFS (either for checksums or snapshots). For
a log server I wouldn't consider it, but you already bit the BTRFS
bullet so it might help depending on the details (it could be adapted
to handle a transition to a more sane state, for example).
Initially it was fine-tuned to handle Ceph OSDs and later on adapted to
very large BTRFS volumes backing NFS servers, backup servers (see the
README in the same repository) or even some of our PostgreSQL replicas.
The initialization on an existing filesystem needs special (undocumented
unfortunately) care though. By default the first pass goes very fast to
get an estimation of the number of files and this can create a very
large I/O load. If you decide to test it, I can provide directions (or
update the documentation). For a pure log server it is overkill (you
could simply defragment files on file rotation).


To sum up : if I were in your position I would probably choose between
these alternatives :
- switch to ext4 (maybe the easiest unless the migration is impractical),
- defragment old files that aren't written to anymore and schedule
defragmentation when log files are archived (maybe using logrotate),
- use my defragmentation scheduler as a last resort (might be a solution
if you store other data than logs in the filesystem too).

In all cases I would avoid BTRFS compression and compress on log
rotation. You'll get better performance and compression this way.
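
The "defragment old files" and "compress on log rotation" ideas could
look roughly like this when done by hand. archive_log and
defrag_old_logs are made-up names, and the '*.log' pattern and the day
threshold are arbitrary examples; in practice logrotate's own compress
option can do the gzip step.

```shell
#!/bin/sh
# Compress one rotated log file that is no longer being written to.
# gzip writes a new file (NAME.gz) and removes the original on success.
archive_log() {
    gzip -9 -- "$1"
}

# Defragment log files under DIR that were not modified for more than
# DAYS days. Assumes DIR is on a mounted btrfs filesystem and that the
# caller has sufficient privileges.
defrag_old_logs() {
    find "$1" -type f -name '*.log' -mtime +"$2" \
        -exec btrfs filesystem defragment {} +
}
```

A call such as defrag_old_logs /var/log 7 would touch only files that
have been idle for over a week. Because gzip writes its output as a
new file, archived logs do not inherit the fragmentation of the file
they were appended to.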

If you can update the kernel, use space_cache=v2, it is stable on recent
kernels (I don't even remember it being buggy even with the earlier
kernels).

Best regards,

Lionel


* Re: performance recommendations
       [not found]     ` <aeed56c3-e641-46a1-5692-04c6ae75d212@gmail.com>
@ 2021-02-16 17:03       ` Pal, Laszlo
  2021-02-16 17:49         ` Leonidas Spyropoulos
  0 siblings, 1 reply; 13+ messages in thread
From: Pal, Laszlo @ 2021-02-16 17:03 UTC (permalink / raw)
  To: Roman Stingler, linux-btrfs

Hi,

Thank you. So, I've installed a new CentOS 7 with the same
configuration, the old kernel and btrfs. Then I upgraded the kernel
to 5.11 and all went well, so I thought: let's do it on the prod server.

Unfortunately, when I boot on 5.11 the sysroot mount times out and I
have something like this in the log:

btrfs open ctree failed

Any quick fix for this? I'm able to mount the btrfs volume using a
rescue CD, but I have the same issues: rm of a big file takes 10
minutes....

Thank you
Laszlo

On Tue, Feb 16, 2021 at 2:08 PM Roman Stingler <roman.stingler@gmail.com> wrote:
>
> First update your kernel to 5.10 (it is LTS now) and try again.
>
> There have been a great many stability and performance improvements
> in the past year.


* Re: performance recommendations
  2021-02-16 17:03       ` Pal, Laszlo
@ 2021-02-16 17:49         ` Leonidas Spyropoulos
  2021-02-16 18:01           ` Pal, Laszlo
  0 siblings, 1 reply; 13+ messages in thread
From: Leonidas Spyropoulos @ 2021-02-16 17:49 UTC (permalink / raw)
  To: linux-btrfs

Hi Laszlo,

On 16/02/21, Pal, Laszlo wrote:
> Hi,
> 
> Thank you. So, I've installed a new centos7 with the same
> configuration, old kernel and using btrfs. Then, upgraded the kernel
> to 5.11 and all went well, so I thought let's do it on the prod server
> 
Since this is a VM, can you clone the disk / partition and attach it to
another VM that is running a newer kernel and btrfs-progs?

This way you can try debugging it without affecting the prod server.

> Unfortunately when I boot on 5.11 sysroot mount times out and I have
> something like this in log
> 
> btrfs open ctree failed
So `dmesg` doesn't show any relevant messages before that?
> 
> Any quick fix for this? I'm able to mount btrfs volume using a rescuCD
> but I have the same issues, like rm a big file takes 10 minutes....

If you manage to mount the disk with a newer kernel and btrfs-progs, try
creating a new file system to take advantage of the new features (which
are enabled at creation), then migrate the data and follow the
recommendations already mentioned.
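
A rough sketch of that migration, not a tested procedure. NEW_DEV, OLD_MNT
and NEW_MNT are hypothetical placeholders, the run and migrate helpers are
made up for illustration, and the compress=zstd and noatime mount options are
assumptions rather than something taken from this thread. Since mkfs.btrfs
destroys whatever is on the target device, the sketch only prints the
commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only. NEW_DEV, OLD_MNT and NEW_MNT are hypothetical placeholders.
NEW_DEV="${NEW_DEV:-/dev/sdX}"
OLD_MNT="${OLD_MNT:-/mnt/old}"
NEW_MNT="${NEW_MNT:-/mnt/new}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate() {
    # Create the new filesystem with the defaults of the newer btrfs-progs.
    run mkfs.btrfs "$NEW_DEV"
    run mkdir -p "$NEW_MNT"
    # Mount options are an assumption; adjust to taste.
    run mount -o compress=zstd,noatime "$NEW_DEV" "$NEW_MNT"
    # Plain copy: every file is written out fresh on the new filesystem.
    run rsync -aHAX "$OLD_MNT"/ "$NEW_MNT"/
}
```

Calling migrate with the default DRY_RUN=1 lists the four commands so they
can be reviewed before anything is touched.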

Cheers,

-- 
Leonidas Spyropoulos

A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?



* Re: performance recommendations
  2021-02-16 17:49         ` Leonidas Spyropoulos
@ 2021-02-16 18:01           ` Pal, Laszlo
  2021-02-16 18:21             ` Lionel Bouton
  0 siblings, 1 reply; 13+ messages in thread
From: Pal, Laszlo @ 2021-02-16 18:01 UTC (permalink / raw)
  To: Leonidas Spyropoulos, linux-btrfs

Hi,

Thank you. If I have to clone, I think I'll just get rid of the
machine and recreate it with some other file system. I'm aware this is
my fault (lack of research and time pressure), but I think if I can
boot it with the old kernel I'll keep it running as long as it can, and
I'll use this time to create another, better-designed machine.

Answering your question regarding the ctree: no, there is nothing else
in the log, but when I check dmesg on the booted rescue CD during mount,
I can see a similar message, "btrfs transaction blocked more than
xxx seconds", and at the end "open_ctree", so it seems I really have
some file system corruption as the root cause (maybe created by some
bugs in the old code, or some unexpected reboot).

Thx
Laszlo


* Re: performance recommendations
  2021-02-16 18:01           ` Pal, Laszlo
@ 2021-02-16 18:21             ` Lionel Bouton
  2021-02-16 19:31               ` Pal, Laszlo
  0 siblings, 1 reply; 13+ messages in thread
From: Lionel Bouton @ 2021-02-16 18:21 UTC (permalink / raw)
  To: Pal, Laszlo, Leonidas Spyropoulos, linux-btrfs

Hi,

Le 16/02/2021 à 19:01, Pal, Laszlo a écrit :
> Hi,
>
> Thank you. If I have to clone, I think I'll just get rid of the
> machine and recreate it with some other file system. I'm aware this is
> my fault (lack of research and time pressure), but I think if I can
> boot it with the old kernel I'll keep it running as long as it can, and
> I'll use this time to create another, better-designed machine.
>
> Answering your question regarding the ctree: no, there is nothing else
> in the log, but when I check dmesg on the booted rescue CD during mount,
> I can see a similar message, "btrfs transaction blocked more than
> xxx seconds", and at the end "open_ctree", so it seems I really have
> some file system corruption as the root cause (maybe created by some
> bugs in the old code, or some unexpected reboot).

From experience, systemd has mount timeouts that result in
open_ctree errors on healthy but slow filesystems.
IIRC, x-systemd.mount-timeout=infinity in fstab can be used to avoid
these errors.
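
A minimal sketch of such an fstab entry, assuming a systemd recent enough to
know the option (older releases may not honor it). The UUID, the mount point
and the compress option are placeholders, not values from this thread:

```
# /etc/fstab (sketch; UUID and mount point are placeholders)
UUID=<filesystem-uuid>  /srv/logs  btrfs  defaults,compress,x-systemd.mount-timeout=infinity  0  0
```

For a root filesystem that is mounted from the initramfs, the option may
also have to reach the initramfs; that part is not covered here.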

Best regards,

Lionel


* Re: performance recommendations
  2021-02-16 18:21             ` Lionel Bouton
@ 2021-02-16 19:31               ` Pal, Laszlo
  0 siblings, 0 replies; 13+ messages in thread
From: Pal, Laszlo @ 2021-02-16 19:31 UTC (permalink / raw)
  To: Lionel Bouton; +Cc: Leonidas Spyropoulos, linux-btrfs

Thanks. Unfortunately, this machine with this file system can somehow
boot only with the 3.10 kernel. It is now started, and I hope it can
hold the line while I'm creating another machine with a different FS.
This is not a criticism of BTRFS; it is a criticism of myself
for not thinking enough before implementing this service.

Thanks all for the help

Laszlo



end of thread, other threads:[~2021-02-16 19:33 UTC | newest]

Thread overview: 13+ messages
2021-02-15 14:53 performance recommendations Pal, Laszlo
2021-02-15 19:30 ` Pal, Laszlo
     [not found]   ` <B7BDEFC2-2444-4926-8FFC-D78B6CE5CB4E@vlad.hu>
2021-02-15 21:50     ` Pal, Laszlo
2021-02-16  6:08       ` Piotr Szymaniak
2021-02-16  7:17 ` Nikolay Borisov
2021-02-16  8:54   ` Pal, Laszlo
2021-02-16  9:02     ` Nikolay Borisov
2021-02-16 10:28     ` Lionel Bouton
     [not found]     ` <aeed56c3-e641-46a1-5692-04c6ae75d212@gmail.com>
2021-02-16 17:03       ` Pal, Laszlo
2021-02-16 17:49         ` Leonidas Spyropoulos
2021-02-16 18:01           ` Pal, Laszlo
2021-02-16 18:21             ` Lionel Bouton
2021-02-16 19:31               ` Pal, Laszlo
