* BTRFS and databases
@ 2018-08-01  3:45 MegaBrutal
  2018-08-01  8:48 ` Duncan
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: MegaBrutal @ 2018-08-01  3:45 UTC (permalink / raw)
  To: linux-btrfs

Hi all,

I know it's a decade-old question, but I'd like to hear your thoughts
as of today. By now, I have become a heavy BTRFS user. I use BTRFS
almost everywhere, except in situations where it obviously has no benefit
(e.g. /var/log, /boot). At home, all my desktop, laptop and server
computers mainly run on BTRFS, with only a few file systems on
ext4. I have even installed BTRFS on corporate production systems (in
those cases, the systems were mainly on ext4, but there were some
specific file systems that exploited BTRFS features).

But there is still one question that I can't get over: if you store a
database (e.g. MySQL), would you prefer having a BTRFS volume mounted
with nodatacow, or would you just simply use ext4?

I know that with nodatacow, I take away most of the benefits of BTRFS
(those are actually what hurts database performance: the very CoW
nature that is a blessing elsewhere is a drawback with databases).
But are there any advantages to still sticking with BTRFS for a database
even though CoW is disabled, or should I just return to the old and
reliable ext4 for those applications?


Kind regards,
MegaBrutal

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-01  3:45 BTRFS and databases MegaBrutal
@ 2018-08-01  8:48 ` Duncan
  2018-08-01  8:56 ` Hugo Mills
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Duncan @ 2018-08-01  8:48 UTC (permalink / raw)
  To: linux-btrfs

MegaBrutal posted on Wed, 01 Aug 2018 05:45:15 +0200 as excerpted:

> But there is still one question that I can't get over: if you store a
> database (e.g. MySQL), would you prefer having a BTRFS volume mounted
> with nodatacow, or would you just simply use ext4?
> 
> I know that with nodatacow, I take away most of the benefits of BTRFS
> (those are actually hurting database performance – the exact CoW nature
> that is elsewhere a blessing, with databases it's a drawback). But are
> there any advantages of still sticking to BTRFS for a database albeit
> CoW is disabled, or should I just return to the old and reliable ext4
> for those applications?

Good question, on which I might expect some honest disagreement on the 
answer.

Personally, I tend to hate nocow with a passion, and would thus recommend
putting databases and files with a similar write pattern (VM images...) on
their own dedicated non-btrfs filesystem (ext4, etc.) if at all reasonable.

But that comes from a general split partition-favoring viewpoint, where 
doing another partition/lvm-volume and putting a different filesystem on 
it is no big deal, as it's just one more partition/volume to manage of 
(likely) several.

Some distros/companies/installations have policies strongly favoring 
btrfs for its "storage pool" features, trying to keep things simple and 
flexible by using just the one solution and one big btrfs and throwing 
everything onto it, often using btrfs subvolumes where others would use 
separate partitions/volumes with independent filesystems.  For these 
folks, the flexibility of being able to throw it all on one filesystem 
with subvolumes overrides the down sides of having to deal with nocow and 
its conditions, rules and additional risk.

And a big part of that flexibility, along with being a feature in its own 
right, is btrfs built-in multi-device, without having to resort to an 
additional multi-device layer such as lvm or mdraid.


So if you're using btrfs for multi-device or other features that nocow
doesn't affect, it's plausible that you'd prefer nocow on btrfs to
/having/ to do partitioning/lvm/mdraid and set up a separate non-btrfs
filesystem just for your database (or VM image) files.

But from your post, you're already perfectly fine with partitioning and
the like, and won't consider it a heavy imposition to deal with a separate
non-btrfs filesystem, ext4 or whatever. In that case, at least here, I'd
strongly recommend you do just that and avoid nocow, which I honestly
see as a compromise best left to those who really need it because they
aren't prepared to deal with the hassle of setting up a separate
filesystem and all that entails.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-01  3:45 BTRFS and databases MegaBrutal
  2018-08-01  8:48 ` Duncan
@ 2018-08-01  8:56 ` Hugo Mills
  2018-08-02  9:16   ` Martin Steigerwald
  2018-08-01  8:59 ` Mike Fleetwood
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Hugo Mills @ 2018-08-01  8:56 UTC (permalink / raw)
  To: MegaBrutal; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1745 bytes --]

On Wed, Aug 01, 2018 at 05:45:15AM +0200, MegaBrutal wrote:
> I know it's a decade-old question, but I'd like to hear your thoughts
> of today. By now, I became a heavy BTRFS user. Almost everywhere I use
> BTRFS, except in situations when it is obvious there is no benefit
> (e.g. /var/log, /boot). At home, all my desktop, laptop and server
> computers are mainly running on BTRFS with only a few file systems on
> ext4. I even installed BTRFS in corporate productive systems (in those
> cases, the systems were mainly on ext4; but there were some specific
> file systems those exploited BTRFS features).
> 
> But there is still one question that I can't get over: if you store a
> database (e.g. MySQL), would you prefer having a BTRFS volume mounted
> with nodatacow, or would you just simply use ext4?

   Personally, I'd start with btrfs with autodefrag. It has some
degree of I/O overhead, but if the database isn't performance-critical
and already near the limits of the hardware, it's unlikely to make
much difference. Autodefrag should keep the fragmentation down to a
minimum.

   Hugo.

> I know that with nodatacow, I take away most of the benefits of BTRFS
> (those are actually hurting database performance – the exact CoW
> nature that is elsewhere a blessing, with databases it's a drawback).
> But are there any advantages of still sticking to BTRFS for a database
> albeit CoW is disabled, or should I just return to the old and
> reliable ext4 for those applications?
> 
> 
> Kind regards,
> MegaBrutal

-- 
Hugo Mills             | In theory, theory and practice are the same. In
hugo@... carfax.org.uk | practice, they're different.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-01  3:45 BTRFS and databases MegaBrutal
  2018-08-01  8:48 ` Duncan
  2018-08-01  8:56 ` Hugo Mills
@ 2018-08-01  8:59 ` Mike Fleetwood
  2018-08-01 11:21 ` Adam Borowski
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Mike Fleetwood @ 2018-08-01  8:59 UTC (permalink / raw)
  To: MegaBrutal; +Cc: linux-btrfs

On 1 August 2018 at 04:45, MegaBrutal <megabrutal@gmail.com> wrote:

> But there is still one question that I can't get over: if you store a
> database (e.g. MySQL), would you prefer having a BTRFS volume mounted
> with nodatacow, or would you just simply use ext4?
>
> I know that with nodatacow, I take away most of the benefits of BTRFS
> (those are actually hurting database performance – the exact CoW
> nature that is elsewhere a blessing, with databases it's a drawback).
> But are there any advantages of still sticking to BTRFS for a database
> albeit CoW is disabled, or should I just return to the old and
> reliable ext4 for those applications?

Also note that no data CoW implies no data checksums too.
https://btrfs.wiki.kernel.org/index.php/FAQ#Can_I_have_nodatacow_.28or_chattr_.2BC.29_but_still_have_checksumming.3F
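
Since nodatacow files lose their data checksums, it can be worth checking
which files actually carry the No_COW attribute (the one set by chattr +C).
The following is only a minimal sketch: has_nocow is a made-up helper name,
the ioctl number is built the way x86 and ARM encode it (other architectures
may differ), and the function returns None on filesystems that do not
support inode flags.

```python
import fcntl
import os
import struct
import tempfile

# Linux inode flag set by `chattr +C` (value from <linux/fs.h>).
FS_NOCOW_FL = 0x00800000

# _IOR('f', 1, long) as encoded on x86/ARM; this layout is an assumption of
# the sketch and is not valid on every architecture.
FS_IOC_GETFLAGS = (2 << 30) | (struct.calcsize("l") << 16) | (ord("f") << 8) | 1


def has_nocow(path):
    """Return True/False for the No_COW flag, or None if it cannot be read."""
    buf = bytearray(struct.calcsize("l"))
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return None
    try:
        fcntl.ioctl(fd, FS_IOC_GETFLAGS, buf)
    except OSError:
        return None  # e.g. the filesystem does not support inode flags
    finally:
        os.close(fd)
    # The kernel stores the flags as a 32-bit integer at the start of the buffer.
    (flags,) = struct.unpack("I", buf[:4])
    return bool(flags & FS_NOCOW_FL)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        print(tmp.name, has_nocow(tmp.name))
```

A file for which this reports True has no data checksums on btrfs, so scrub
cannot detect corruption in it.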

Mike

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-01  3:45 BTRFS and databases MegaBrutal
                   ` (2 preceding siblings ...)
  2018-08-01  8:59 ` Mike Fleetwood
@ 2018-08-01 11:21 ` Adam Borowski
  2018-08-01 12:19 ` Austin S. Hemmelgarn
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Adam Borowski @ 2018-08-01 11:21 UTC (permalink / raw)
  To: linux-btrfs

On Wed, Aug 01, 2018 at 05:45:15AM +0200, MegaBrutal wrote:
> But there is still one question that I can't get over: if you store a
> database (e.g. MySQL), would you prefer having a BTRFS volume mounted
> with nodatacow, or would you just simply use ext4?
> 
> I know that with nodatacow, I take away most of the benefits of BTRFS
> (those are actually hurting database performance – the exact CoW
> nature that is elsewhere a blessing, with databases it's a drawback).
> But are there any advantages of still sticking to BTRFS for a database
> albeit CoW is disabled, or should I just return to the old and
> reliable ext4 for those applications?

Is this database performance-critical?

If yes, you'd want ext4 -- nocow is a crappy ext4 lookalike, with no
benefits of btrfs.  Or, if you snapshot it, you get bad fragmentation yet no
checksums/etc.

If no, regular cow (especially with autodefrag) will be enough.  Sure, this
particular load won't be as performant (mysql really loves fsync, which is
an anathema to btrfs), but you get all the data safety improvements,
frequent cheap backups, and so on.

Thus: if the server's primary purpose is that database, you don't want
btrfs.  If the database is merely incidental, not microoptimizing it will
save a lot of your time.

In neither case is nocow a good idea.  Especially if raid (!= 0) is
involved.


Meow!
-- 
// If you believe in so-called "intellectual property", please immediately
// cease using counterfeit alphabets.  Instead, contact the nearest temple
// of Amon, whose priests will provide you with scribal services for all
// your writing needs, for Reasonable And Non-Discriminatory prices.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-01  3:45 BTRFS and databases MegaBrutal
                   ` (3 preceding siblings ...)
  2018-08-01 11:21 ` Adam Borowski
@ 2018-08-01 12:19 ` Austin S. Hemmelgarn
  2018-08-01 14:33 ` Remi Gauvin
  2018-08-02  7:02 ` Qu Wenruo
  6 siblings, 0 replies; 19+ messages in thread
From: Austin S. Hemmelgarn @ 2018-08-01 12:19 UTC (permalink / raw)
  To: MegaBrutal, linux-btrfs

On 2018-07-31 23:45, MegaBrutal wrote:
> Hi all,
> 
> I know it's a decade-old question, but I'd like to hear your thoughts
> of today. By now, I became a heavy BTRFS user. Almost everywhere I use
> BTRFS, except in situations when it is obvious there is no benefit
> (e.g. /var/log, /boot). At home, all my desktop, laptop and server
> computers are mainly running on BTRFS with only a few file systems on
> ext4. I even installed BTRFS in corporate productive systems (in those
> cases, the systems were mainly on ext4; but there were some specific
> file systems those exploited BTRFS features).
> 
> But there is still one question that I can't get over: if you store a
> database (e.g. MySQL), would you prefer having a BTRFS volume mounted
> with nodatacow, or would you just simply use ext4?
> 
> I know that with nodatacow, I take away most of the benefits of BTRFS
> (those are actually hurting database performance – the exact CoW
> nature that is elsewhere a blessing, with databases it's a drawback).
> But are there any advantages of still sticking to BTRFS for a database
> albeit CoW is disabled, or should I just return to the old and
> reliable ext4 for those applications?
You still have snapshotting, which can be useful if you for some reason 
can't just dump all the tables to SQL for backups (but seriously, that's 
_really_ what you should be doing to back up your database, it decouples 
it from whatever back-end storage you're using).  You also still have
the guaranteed metadata consistency (nodatacow disables COW for data
chunks only; COW still happens for metadata chunks), but you can get
close to that with the newest versions of XFS too.

If you want to use BTRFS, I'd suggest playing around with the different
on-disk storage formats offered by your RDBMS (MySQL in this case).
Some of them offer measurably better performance on BTRFS than others,
but it's at least partially dependent on how your software uses that
database.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-01  3:45 BTRFS and databases MegaBrutal
                   ` (4 preceding siblings ...)
  2018-08-01 12:19 ` Austin S. Hemmelgarn
@ 2018-08-01 14:33 ` Remi Gauvin
  2018-08-02  7:07   ` Qu Wenruo
  2018-08-02  7:02 ` Qu Wenruo
  6 siblings, 1 reply; 19+ messages in thread
From: Remi Gauvin @ 2018-08-01 14:33 UTC (permalink / raw)
  To: MegaBrutal, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 621 bytes --]

On 2018-07-31 11:45 PM, MegaBrutal wrote:

> I know that with nodatacow, I take away most of the benefits of BTRFS
> (those are actually hurting database performance – the exact CoW
> nature that is elsewhere a blessing, with databases it's a drawback).
> But are there any advantages of still sticking to BTRFS for a database
> albeit CoW is disabled, or should I just return to the old and
> reliable ext4 for those applications?
> 

Be very careful about nodatacow and btrfs 'raid'.  BTRFS has no data
syncing mechanism for raid, so if your mirrors end up different
somehow, your array is going to be inconsistent.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-01  3:45 BTRFS and databases MegaBrutal
                   ` (5 preceding siblings ...)
  2018-08-01 14:33 ` Remi Gauvin
@ 2018-08-02  7:02 ` Qu Wenruo
  2018-08-02 10:45   ` Andrei Borzenkov
  6 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2018-08-02  7:02 UTC (permalink / raw)
  To: MegaBrutal, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1960 bytes --]



On 2018-08-01 11:45, MegaBrutal wrote:
> Hi all,
> 
> I know it's a decade-old question, but I'd like to hear your thoughts
> of today. By now, I became a heavy BTRFS user. Almost everywhere I use
> BTRFS, except in situations when it is obvious there is no benefit
> (e.g. /var/log, /boot). At home, all my desktop, laptop and server
> computers are mainly running on BTRFS with only a few file systems on
> ext4. I even installed BTRFS in corporate productive systems (in those
> cases, the systems were mainly on ext4; but there were some specific
> file systems those exploited BTRFS features).
> 
> But there is still one question that I can't get over: if you store a
> database (e.g. MySQL), would you prefer having a BTRFS volume mounted
> with nodatacow, or would you just simply use ext4?
> 
> I know that with nodatacow, I take away most of the benefits of BTRFS
> (those are actually hurting database performance – the exact CoW
> nature that is elsewhere a blessing, with databases it's a drawback).
> But are there any advantages of still sticking to BTRFS for a database
> albeit CoW is disabled, or should I just return to the old and
> reliable ext4 for those applications?

I'm not an expert in databases, so I could be totally wrong, but what
about completely disabling the database write-ahead log (WAL) and letting
btrfs' data CoW handle data consistency completely?

If there is some concern about the commit interval, it can be tuned with
the commit= mount option.

It may lead either to unexpectedly fast behavior or to some unknown
disaster. (And for the latter, we could at least get some interesting
feedback and bugs to fix.)
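
To see which commit interval is currently in effect before tuning it, one
can read the mount options back. A minimal sketch, assuming the usual
/proc/mounts layout and the btrfs default of 30 seconds when commit= is not
given; the function names are made up for illustration.

```python
BTRFS_DEFAULT_COMMIT_SECONDS = 30  # btrfs default when commit= is not given


def commit_interval(mount_options):
    """Return the commit interval in seconds from a comma-separated option string."""
    for option in mount_options.split(","):
        key, _, value = option.partition("=")
        if key == "commit" and value.isdigit():
            return int(value)
    return BTRFS_DEFAULT_COMMIT_SECONDS


def btrfs_commit_intervals(mounts_path="/proc/mounts"):
    """Map each btrfs mount point to its commit interval."""
    result = {}
    with open(mounts_path) as mounts:
        for line in mounts:
            # Fields: device, mount point, filesystem type, options, dump, pass
            fields = line.split()
            if len(fields) >= 4 and fields[2] == "btrfs":
                result[fields[1]] = commit_interval(fields[3])
    return result


if __name__ == "__main__":
    # Illustrative option strings, not taken from a real system.
    print(commit_interval("rw,noatime,commit=5"))
    print(commit_interval("rw,noatime"))
```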

Thanks,
Qu

> 
> 
> Kind regards,
> MegaBrutal
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-01 14:33 ` Remi Gauvin
@ 2018-08-02  7:07   ` Qu Wenruo
  2018-08-02 12:32     ` Remi Gauvin
  0 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2018-08-02  7:07 UTC (permalink / raw)
  To: Remi Gauvin, MegaBrutal, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1497 bytes --]



On 2018-08-01 22:33, Remi Gauvin wrote:
> On 2018-07-31 11:45 PM, MegaBrutal wrote:
> 
>> I know that with nodatacow, I take away most of the benefits of BTRFS
>> (those are actually hurting database performance – the exact CoW
>> nature that is elsewhere a blessing, with databases it's a drawback).
>> But are there any advantages of still sticking to BTRFS for a database
>> albeit CoW is disabled, or should I just return to the old and
>> reliable ext4 for those applications?
>>
> 
> Be very careful about nodatacow and btrfs 'raid'.  BTRFS has no data
> synching mechanism for raid,

Not completely the case though.
As discussed in another thread, for the nodatacow/nodatasum case we indeed
don't have any way to keep data correct.

But for the RAID1 datacow or metadata case, that should not be the case.
For tree blocks, we have generation/first_key/level checks done when
searching tree blocks.
And for scrub, we also have a generation check, so in theory we could
recover from such a problem during scrub.

For data, since we have CoW (along with csum), it should be no problem
to recover.

And since datacow is used, the transaction on each device should be atomic,
so we should be able to handle a one-time device out-of-sync case.
(For multiple out-of-sync events, we don't have any good way though.)
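
The difference between the two cases can be shown with a toy model. This is
only a sketch of the idea, not of btrfs internals: pick_good_copy is a
made-up function, and zlib.crc32 stands in for the real data checksum.

```python
import zlib


def pick_good_copy(copies, stored_checksum=None):
    """Return the copy matching the stored checksum, or None if undecidable."""
    if stored_checksum is None:
        # nodatacow/nodatasum: nothing to compare against. If the mirrors
        # differ, either one could be the stale copy.
        return copies[0] if all(c == copies[0] for c in copies) else None
    for copy in copies:
        if zlib.crc32(copy) == stored_checksum:
            return copy
    return None


new = b"balance=90"
stale = b"balance=100"  # mirror that missed the last write

# datacow + csum: the checksum recorded at write time identifies the good mirror.
print(pick_good_copy([stale, new], zlib.crc32(new)))  # b'balance=90'
# nodatacow: the mirrors disagree and there is no way to choose.
print(pick_good_copy([stale, new]))                    # None
```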

Or did I miss something from previous discussion?

Thanks,
Qu

> so if your mirrors end up different
> somehow, your Array is going to be inconsistent.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-01  8:56 ` Hugo Mills
@ 2018-08-02  9:16   ` Martin Steigerwald
  2018-08-02 10:15     ` ein
  2018-08-02 10:35     ` Andrei Borzenkov
  0 siblings, 2 replies; 19+ messages in thread
From: Martin Steigerwald @ 2018-08-02  9:16 UTC (permalink / raw)
  To: Hugo Mills; +Cc: MegaBrutal, linux-btrfs

Hugo Mills - 01.08.18, 10:56:
> On Wed, Aug 01, 2018 at 05:45:15AM +0200, MegaBrutal wrote:
> > I know it's a decade-old question, but I'd like to hear your
> > thoughts
> > of today. By now, I became a heavy BTRFS user. Almost everywhere I
> > use BTRFS, except in situations when it is obvious there is no
> > benefit (e.g. /var/log, /boot). At home, all my desktop, laptop and
> > server computers are mainly running on BTRFS with only a few file
> > systems on ext4. I even installed BTRFS in corporate productive
> > systems (in those cases, the systems were mainly on ext4; but there
> > were some specific file systems those exploited BTRFS features).
> > 
> > But there is still one question that I can't get over: if you store
> > a
> > database (e.g. MySQL), would you prefer having a BTRFS volume
> > mounted
> > with nodatacow, or would you just simply use ext4?
> 
>    Personally, I'd start with btrfs with autodefrag. It has some
> degree of I/O overhead, but if the database isn't performance-critical
> and already near the limits of the hardware, it's unlikely to make
> much difference. Autodefrag should keep the fragmentation down to a
> minimum.

I read that autodefrag would only help with small databases.

I also read that even on SSDs there is a notable performance penalty.
A 4.2 GiB Akonadi database for tons of mails appears to work okayish here
on dual SSD BTRFS RAID 1 with LZO compression. However, I have no
comparison, for example with how it would run on XFS. And it's fragmented
quite a bit. Here is an example for the largest file, 3 GiB in size (I know
this is in part also due to LZO compression):

[…].local/share/akonadi/db_data/akonadi> time /usr/sbin/filefrag parttable.ibd
parttable.ibd: 45380 extents found
/usr/sbin/filefrag parttable.ibd  0,00s user 0,86s system 41% cpu 2,054 total

However it digs out those extents quite fast.
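
To put the filefrag number into perspective, here is a rough calculation.
It is only a sketch: it assumes the file is exactly 3 GiB and uses the fact
that btrfs stores compressed data in extents of at most 128 KiB, so a fully
compressed file of that size can never have fewer than 24576 extents.
Whether the whole file is actually compressed is not known.

```python
import re

KIB = 1024
GIB = 1024 ** 3
# btrfs stores compressed data in extents of at most 128 KiB.
MAX_COMPRESSED_EXTENT = 128 * KIB


def extent_count(filefrag_line):
    """Extract N from a 'name: N extents found' line printed by filefrag."""
    match = re.search(r":\s*(\d+) extents? found", filefrag_line)
    if match is None:
        raise ValueError("not a filefrag summary line: %r" % filefrag_line)
    return int(match.group(1))


extents = extent_count("parttable.ibd: 45380 extents found")
file_size = 3 * GIB  # assumption: the file is exactly 3 GiB

average_kib = file_size / extents / KIB
floor_if_all_compressed = file_size // MAX_COMPRESSED_EXTENT

print(f"average extent size: {average_kib:.1f} KiB")
print(f"extents if every extent were a full 128 KiB: {floor_if_all_compressed}")
```

Under these assumptions the average extent is about 69 KiB, and the extent
count is a bit less than twice the floor that compression alone would impose.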

I would not feel comfortable with setting this file to nodatacow.


However I wonder: Is this it? Is there nothing that can be improved in 
BTRFS to handle database and VM files in a better way, without altering 
any default settings?

Is it also an issue on ZFS? ZFS also does copy on write. How does ZFS
handle this? Can anything be learned from it? I never heard people
complain about poor database performance on ZFS, but… I don't use it and
I am not subscribed to any ZFS mailing lists, so they may have similar
issues and I just do not know about it.

Well there seems to be a performance penalty at least when compared to 
XFS:

About ZFS Performance
Yves Trudeau, May 15, 2018

https://www.percona.com/blog/2018/05/15/about-zfs-performance/

The article describes how you can use NVMe devices as cache to mitigate
the performance impact. That would hint that BTRFS with VFS Hot Data
Tracking and relocating data to SSD or NVMe devices could be a way to
set this up.


But as I said, I have read about bad database performance even on SSDs
with BTRFS. I cannot find the original reference at the moment, but I
found this one, for example. It is from 2015, however (on kernel 4.0,
which is a bit old):

Friends don't let friends use BTRFS for OLTP
2015/09/16 by Tomas Vondra

https://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp

Interestingly it also compares with ZFS which is doing much better. So 
maybe there is really something to be learned from ZFS.

It was not clear to me whether the benchmark ran on an SSD. Since Tomas
mentions the "ssd" mount option, it might have.

Thanks,
-- 
Martin



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-02  9:16   ` Martin Steigerwald
@ 2018-08-02 10:15     ` ein
  2018-08-02 10:35     ` Andrei Borzenkov
  1 sibling, 0 replies; 19+ messages in thread
From: ein @ 2018-08-02 10:15 UTC (permalink / raw)
  To: Martin Steigerwald, Hugo Mills; +Cc: MegaBrutal, linux-btrfs

On 08/02/2018 11:16 AM, Martin Steigerwald wrote:
> However I wonder: Is this it? Is there nothing that can be improved in 
> BTRFS to handle database and VM files in a better way, without altering 
> any default settings?

Poor performance is not the biggest BTRFS problem. It is known for silent data corruption, for
instance when using KVM with cache=none,aio=native. The error counters are worthless too: they do not
increment on csum mismatches. The performance penalty is huge: iowait for 4x SSD in RAID10 with
BTRFS is the same as for RAID1 on 2x SAS with ext4, for a ~20 GB Firebird database that could
fit entirely in RAM for better read performance.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-02  9:16   ` Martin Steigerwald
  2018-08-02 10:15     ` ein
@ 2018-08-02 10:35     ` Andrei Borzenkov
  2018-08-02 10:42       ` Martin Steigerwald
  2018-08-02 10:53       ` Qu Wenruo
  1 sibling, 2 replies; 19+ messages in thread
From: Andrei Borzenkov @ 2018-08-02 10:35 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: Hugo Mills, MegaBrutal, linux-btrfs



Sent from iPhone

> On 2 Aug 2018, at 12:16, Martin Steigerwald <martin@lichtvoll.de> wrote:
> 
> Hugo Mills - 01.08.18, 10:56:
>>> On Wed, Aug 01, 2018 at 05:45:15AM +0200, MegaBrutal wrote:
>>> I know it's a decade-old question, but I'd like to hear your
>>> thoughts
>>> of today. By now, I became a heavy BTRFS user. Almost everywhere I
>>> use BTRFS, except in situations when it is obvious there is no
>>> benefit (e.g. /var/log, /boot). At home, all my desktop, laptop and
>>> server computers are mainly running on BTRFS with only a few file
>>> systems on ext4. I even installed BTRFS in corporate productive
>>> systems (in those cases, the systems were mainly on ext4; but there
>>> were some specific file systems those exploited BTRFS features).
>>> 
>>> But there is still one question that I can't get over: if you store
>>> a
>>> database (e.g. MySQL), would you prefer having a BTRFS volume
>>> mounted
>>> with nodatacow, or would you just simply use ext4?
>> 
>>   Personally, I'd start with btrfs with autodefrag. It has some
>> degree of I/O overhead, but if the database isn't performance-critical
>> and already near the limits of the hardware, it's unlikely to make
>> much difference. Autodefrag should keep the fragmentation down to a
>> minimum.
> 
> I read that autodefrag would only help with small databases.
> 

I wonder if anyone actually 

a) quantified performance impact
b) analyzed the cause

I have worked with NetApp for a long time, and I can say from first-hand experience that fragmentation had zero impact on OLTP workloads. It did affect backup performance, as was expected, but this could be fixed by periodic reallocation (defragmentation).

And even that took quite some time (years) to become observable on a pretty high-load database with regular backup and replication snapshots.

If btrfs is so susceptible to fragmentation, what is the reason for it?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-02 10:35     ` Andrei Borzenkov
@ 2018-08-02 10:42       ` Martin Steigerwald
  2018-08-02 10:53       ` Qu Wenruo
  1 sibling, 0 replies; 19+ messages in thread
From: Martin Steigerwald @ 2018-08-02 10:42 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Hugo Mills, MegaBrutal, linux-btrfs

Andrei Borzenkov - 02.08.18, 12:35:
> Sent from iPhone
> 
> > On 2 Aug 2018, at 12:16, Martin Steigerwald <martin@lichtvoll.de>
> > wrote:
> > 
> > Hugo Mills - 01.08.18, 10:56:
> >>> On Wed, Aug 01, 2018 at 05:45:15AM +0200, MegaBrutal wrote:
> >>> I know it's a decade-old question, but I'd like to hear your
> >>> thoughts
> >>> of today. By now, I became a heavy BTRFS user. Almost everywhere I
> >>> use BTRFS, except in situations when it is obvious there is no
> >>> benefit (e.g. /var/log, /boot). At home, all my desktop, laptop
> >>> and
> >>> server computers are mainly running on BTRFS with only a few file
> >>> systems on ext4. I even installed BTRFS in corporate productive
> >>> systems (in those cases, the systems were mainly on ext4; but
> >>> there
> >>> were some specific file systems those exploited BTRFS features).
> >>> 
> >>> But there is still one question that I can't get over: if you
> >>> store
> >>> a
> >>> database (e.g. MySQL), would you prefer having a BTRFS volume
> >>> mounted
> >>> with nodatacow, or would you just simply use ext4?
> >>> 
> >>   Personally, I'd start with btrfs with autodefrag. It has some
> >> 
> >> degree of I/O overhead, but if the database isn't
> >> performance-critical and already near the limits of the hardware,
> >> it's unlikely to make much difference. Autodefrag should keep the
> >> fragmentation down to a minimum.
> > 
> > I read that autodefrag would only help with small databases.
> 
> I wonder if anyone actually
> 
> a) quantified performance impact
> b) analyzed the cause
> 
> I work with NetApp for a long time and I can say from first hand
> experience that fragmentation had zero impact on OLTP workload. It
> did affect backup performance as was expected, but this could be
> fixed by periodic reallocation (defragmentation).
> 
> And even that needed quite some time to observe (years) on pretty high
>  load database with regular backup and replication snapshots.
> 
> If btrfs is so susceptible to fragmentation, what is the reason for
> it?

In the end of my original mail I mentioned a blog article that also had 
some performance graphs. Did you actually read it?

Thanks,
-- 
Martin



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-02  7:02 ` Qu Wenruo
@ 2018-08-02 10:45   ` Andrei Borzenkov
  2018-08-02 10:56     ` Qu Wenruo
  0 siblings, 1 reply; 19+ messages in thread
From: Andrei Borzenkov @ 2018-08-02 10:45 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: MegaBrutal, linux-btrfs



Sent from iPhone

> On 2 Aug 2018, at 10:02, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> 
> 
> 
>> On 2018-08-01 11:45, MegaBrutal wrote:
>> Hi all,
>> 
>> I know it's a decade-old question, but I'd like to hear your thoughts
>> of today. By now, I became a heavy BTRFS user. Almost everywhere I use
>> BTRFS, except in situations when it is obvious there is no benefit
>> (e.g. /var/log, /boot). At home, all my desktop, laptop and server
>> computers are mainly running on BTRFS with only a few file systems on
>> ext4. I even installed BTRFS in corporate productive systems (in those
>> cases, the systems were mainly on ext4; but there were some specific
>> file systems those exploited BTRFS features).
>> 
>> But there is still one question that I can't get over: if you store a
>> database (e.g. MySQL), would you prefer having a BTRFS volume mounted
>> with nodatacow, or would you just simply use ext4?
>> 
>> I know that with nodatacow, I take away most of the benefits of BTRFS
>> (those are actually hurting database performance – the exact CoW
>> nature that is elsewhere a blessing, with databases it's a drawback).
>> But are there any advantages of still sticking to BTRFS for a database
>> albeit CoW is disabled, or should I just return to the old and
>> reliable ext4 for those applications?
> 
> Since I'm not a expert in database, so I can totally be wrong, but what
> about completely disabling database write-ahead-log (WAL), and let
> btrfs' data CoW to handle data consistency completely?
> 

This would make the content of the database after a crash completely unpredictable, making it impossible to reliably roll back a transaction.
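
A toy model of the problem, as a sketch only: the "filesystem commit" here
is a stand-in that makes everything written so far durable at once, and the
"WAL" is a simple undo record rather than what a real database engine does.
The point is that a filesystem commit can land in the middle of a database
transaction, and only the database's own log knows how to undo the half
that made it to disk.

```python
import copy


def run(use_wal):
    disk = {"A": 100, "B": 0, "wal": []}   # what survives a crash
    pages = copy.deepcopy(disk)            # live, in-memory view

    def fs_commit():
        # Stand-in for a filesystem transaction commit: everything written so
        # far becomes durable atomically, wherever the database happens to be.
        disk.clear()
        disk.update(copy.deepcopy(pages))

    # One database transaction: move 10 from A to B (two page writes).
    if use_wal:
        pages["wal"].append({"A": pages["A"], "B": pages["B"]})  # undo record
    pages["A"] -= 10
    fs_commit()          # periodic commit lands between the two page writes
    pages["B"] += 10     # never reaches disk: the crash happens here

    # Recovery after the crash sees only `disk`.
    recovered = copy.deepcopy(disk)
    if use_wal and recovered["wal"]:
        recovered.update(recovered["wal"].pop())  # roll back the open transaction
    return recovered["A"], recovered["B"]


print(run(use_wal=False))  # (90, 0): 10 units vanished, nothing to undo with
print(run(use_wal=True))   # (100, 0): transaction rolled back cleanly
```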


> If there is some concern about the commit interval, it could be tuned by
> commit= mount option.
> 
> It may either lead to super unexpected fast behavior, or some unknown
> disaster. (And for latter, we at least could get some interesting
> feedback and bugs to fix)
> 
> Thanks,
> Qu
> 
>> 
>> 
>> Kind regards,
>> MegaBrutal
>> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-02 10:35     ` Andrei Borzenkov
  2018-08-02 10:42       ` Martin Steigerwald
@ 2018-08-02 10:53       ` Qu Wenruo
  1 sibling, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2018-08-02 10:53 UTC (permalink / raw)
  To: Andrei Borzenkov, Martin Steigerwald; +Cc: Hugo Mills, MegaBrutal, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 4259 bytes --]



On 2018-08-02 18:35, Andrei Borzenkov wrote:
> 
> 
> Sent from iPhone
> 
>> On 2 Aug 2018, at 12:16, Martin Steigerwald <martin@lichtvoll.de> wrote:
>>
>> Hugo Mills - 01.08.18, 10:56:
>>>> On Wed, Aug 01, 2018 at 05:45:15AM +0200, MegaBrutal wrote:
>>>> I know it's a decade-old question, but I'd like to hear your
>>>> thoughts
>>>> of today. By now, I became a heavy BTRFS user. Almost everywhere I
>>>> use BTRFS, except in situations when it is obvious there is no
>>>> benefit (e.g. /var/log, /boot). At home, all my desktop, laptop and
>>>> server computers are mainly running on BTRFS with only a few file
>>>> systems on ext4. I even installed BTRFS in corporate productive
>>>> systems (in those cases, the systems were mainly on ext4; but there
>>>> were some specific file systems those exploited BTRFS features).
>>>>
>>>> But there is still one question that I can't get over: if you store
>>>> a
>>>> database (e.g. MySQL), would you prefer having a BTRFS volume
>>>> mounted
>>>> with nodatacow, or would you just simply use ext4?
>>>
>>>   Personally, I'd start with btrfs with autodefrag. It has some
>>> degree of I/O overhead, but if the database isn't performance-critical
>>> and already near the limits of the hardware, it's unlikely to make
>>> much difference. Autodefrag should keep the fragmentation down to a
>>> minimum.
>>
>> I read that autodefrag would only help with small databases.
>>
> 
> I wonder if anyone actually 
> 
> a) quantified performance impact
> b) analyzed the cause

It's caused by btrfs' poor fsync() performance and lock-hot metadata
operations.

The root cause is how we designed btrfs' btree.
To support snapshots, and only for that reason, we use one btree per
*subvolume*, unlike other filesystems, which normally use one btree per
*inode* (both directories and files).

This means that each time we modify anything, including updating an
EXTENT_DATA pointer or adding a new child inode pointer, we need to
write-lock the path from the subvolume tree root down to the leaf,
which makes the tree root pretty lock hot.

In short, in btrfs we need to lock and race on one big tree, while
other filesystems only need to lock and race on many different small trees.
(And that's why they can't support fast fs-level snapshots.)
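
The "one big tree versus many small trees" point can be illustrated with a deliberately crude counting model. This is only a sketch: it ignores how btrfs actually locks individual tree blocks, and the names `lock_queue_depths`, `per_subvolume` and `per_inode` are made up for the illustration. It counts how many concurrent writers end up waiting on the same lock when the lock is chosen per subvolume versus per inode.

```python
from collections import Counter


def lock_queue_depths(writes, lock_key):
    """Count how many pending writes map to each lock.

    `writes` is a list of (subvolume, inode) pairs; `lock_key` picks the
    lock a write must take. This is a toy model, not btrfs's real locking.
    """
    return Counter(lock_key(subvol, inode) for subvol, inode in writes)


# Eight writers, each updating a different file in the same subvolume.
writes = [("subvol0", inode) for inode in range(8)]

# One tree per subvolume: every writer goes through the same tree root.
per_subvolume = lock_queue_depths(writes, lambda subvol, inode: subvol)

# One tree per inode: writers to different files never share a lock.
per_inode = lock_queue_depths(writes, lambda subvol, inode: (subvol, inode))

worst_per_subvolume = max(per_subvolume.values())
worst_per_inode = max(per_inode.values())
```

In this model all eight writers queue on one lock in the per-subvolume layout, while the per-inode layout has at most one writer per lock.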

That's the root cause of btrfs' slow metadata performance.
We have a lot of optimizations to speed up the process, from delayed-ref
to tree log. But it's still slow compared to other filesystems.

For fsync() we have the log tree optimization, which only logs the related
data pointer and inode updates, and skips some full transaction operations.
It indeed makes fsync() much faster, but it is still slower than other
filesystems, due to the metadata design.
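
For anyone who wants to put a first number on the fsync() cost, a small timing loop along these lines could be run against a directory on each filesystem under comparison. It is only a sketch: `time_fsync` and its parameters are hypothetical names, and a real database benchmark would look quite different.

```python
import os
import tempfile
import time


def time_fsync(directory, writes=50, size=4096):
    """Return the average seconds per write+fsync of `size` bytes.

    A temporary file is created in `directory`, so the result reflects
    whatever filesystem that directory lives on.
    """
    payload = b"\0" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            # Overwrite the same block each time, roughly like a
            # database updating one page in place.
            os.pwrite(fd, payload, 0)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / writes
```

Calling it once for a btrfs mount point and once for an ext4 or xfs mount point on the same device class gives a rough per-fsync comparison, nothing more.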


BTW, nodatacow does improve performance, but mostly due to the
following factors:
1) No csum calculation
   Although csum calculation can be spread across multiple CPU
   cores/threads, and CRC32 is pretty fast, it still introduces overhead.

2) Some overwrites no longer need to modify the subvolume tree
   If we're doing an overwrite and there is no need to CoW (for a
   snapshot), we can skip updating EXTENT_DATA, which avoids a lot of
   tree write locking and improves performance.
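
Besides the nodatacow mount option, the same behaviour can be requested per directory with the No_COW file attribute (`chattr +C`), which new files created in that directory inherit. A minimal sketch, assuming a btrfs filesystem and the `chattr` tool from e2fsprogs; the helper name `make_nocow_dir` and the injectable `run` parameter are made up for the illustration:

```python
import os
import subprocess


def make_nocow_dir(path, run=subprocess.run):
    """Create `path` and mark it No_COW so new files in it skip data CoW.

    The attribute only takes effect for files created after it is set,
    so this should be done before the database writes its data files.
    `run` is injectable purely so the helper can be exercised without
    touching a real filesystem attribute.
    """
    os.makedirs(path, exist_ok=True)
    run(["chattr", "+C", path], check=True)
    return path
```

As noted above, files without data CoW also go without checksums, so this trades integrity checking for speed.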

> 
> I work with NetApp for a long time and I can say from first hand experience that fragmentation had zero impact on OLTP workload. It did affect backup performance as was expected, but this could be fixed by periodic reallocation (defragmentation).
> 
> And even that needed quite some time to observe (years) on pretty high  load database with regular backup and replication snapshots.
> 
> If btrfs is so susceptible to fragmentation, what is the reason for it?

I have heard some reports of fragmentation, but they were mostly related
to extent booking and ENOSPC, not really to performance.
And IIRC I ran some performance tests a while ago on HDD with btrfs and
xfs/ext4. Using the autodefrag mount option in fact reduces performance
on btrfs.

Thanks,
Qu

> 
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-02 10:45   ` Andrei Borzenkov
@ 2018-08-02 10:56     ` Qu Wenruo
  2018-08-02 12:27       ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2018-08-02 10:56 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: MegaBrutal, linux-btrfs





On 2018-08-02 18:45, Andrei Borzenkov wrote:
> 
> 
> Sent from iPhone
> 
>> On 2 Aug 2018, at 10:02, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>>> On 2018-08-01 11:45, MegaBrutal wrote:
>>> Hi all,
>>>
>>> I know it's a decade-old question, but I'd like to hear your thoughts
>>> of today. By now, I became a heavy BTRFS user. Almost everywhere I use
>>> BTRFS, except in situations when it is obvious there is no benefit
>>> (e.g. /var/log, /boot). At home, all my desktop, laptop and server
>>> computers are mainly running on BTRFS with only a few file systems on
>>> ext4. I even installed BTRFS in corporate productive systems (in those
>>> cases, the systems were mainly on ext4; but there were some specific
>>> file systems those exploited BTRFS features).
>>>
>>> But there is still one question that I can't get over: if you store a
>>> database (e.g. MySQL), would you prefer having a BTRFS volume mounted
>>> with nodatacow, or would you just simply use ext4?
>>>
>>> I know that with nodatacow, I take away most of the benefits of BTRFS
>>> (those are actually hurting database performance – the exact CoW
>>> nature that is elsewhere a blessing, with databases it's a drawback).
>>> But are there any advantages of still sticking to BTRFS for a database
>>> albeit CoW is disabled, or should I just return to the old and
>>> reliable ext4 for those applications?
>>
>> Since I'm not a expert in database, so I can totally be wrong, but what
>> about completely disabling database write-ahead-log (WAL), and let
>> btrfs' data CoW to handle data consistency completely?
>>
> 
> This would make content of database after crash completely unpredictable, thus making it impossible to reliably roll back transaction.

Btrfs itself (with datacow) can ensure the fs is updated completely.

That is to say, even if a crash happens, the content of the fs will be in
the same state as of the previous btrfs transaction (btrfs sync).

Thus there is no need to roll back a database transaction.
(Unless the database transaction is not in sync with the btrfs transaction.)
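
The caveat in parentheses can be made concrete with a toy model. This is a sketch under simplifying assumptions, not a description of btrfs internals: writes are treated as lost unless a filesystem commit follows them, fsync() and the log tree are ignored, and the function name `state_after_crash` is made up.

```python
def state_after_crash(initial, events):
    """Return the page contents visible after a crash.

    `events` is a list of ("write", page, value) or ("commit",) tuples.
    Only writes followed by a filesystem commit survive; everything
    written after the last commit is discarded, as in a CoW rollback.
    """
    durable = dict(initial)
    pending = {}
    for event in events:
        if event[0] == "write":
            _, page, value = event
            pending[page] = value
        elif event[0] == "commit":
            durable.update(pending)
            pending.clear()
    return durable


initial = {"A": "old", "B": "old"}

# One database transaction must update pages A and B together.
# Case 1: the filesystem commit happens to land between the two writes.
torn = state_after_crash(initial, [
    ("write", "A", "new"),
    ("commit",),
    ("write", "B", "new"),  # crash follows, so this write is lost
])

# Case 2: the filesystem commit lines up with the database transaction.
aligned = state_after_crash(initial, [
    ("write", "A", "new"),
    ("write", "B", "new"),
    ("commit",),
])
```

In case 1 the filesystem itself is consistent, yet the database sees half of its transaction, which is the situation a write-ahead log is there to repair.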

Thanks,
Qu

> 
> 
>> If there is some concern about the commit interval, it could be tuned by
>> commit= mount option.
>>
>> It may either lead to super unexpected fast behavior, or some unknown
>> disaster. (And for latter, we at least could get some interesting
>> feedback and bugs to fix)
>>
>> Thanks,
>> Qu
>>
>>>
>>>
>>> Kind regards,
>>> MegaBrutal
>>>
>>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-02 10:56     ` Qu Wenruo
@ 2018-08-02 12:27       ` Austin S. Hemmelgarn
  2018-08-02 13:14         ` Martin Raiber
  0 siblings, 1 reply; 19+ messages in thread
From: Austin S. Hemmelgarn @ 2018-08-02 12:27 UTC (permalink / raw)
  To: Qu Wenruo, Andrei Borzenkov; +Cc: MegaBrutal, linux-btrfs

On 2018-08-02 06:56, Qu Wenruo wrote:
> 
> 
> On 2018-08-02 18:45, Andrei Borzenkov wrote:
>>
>>
>> Sent from iPhone
>>
>>> On 2 Aug 2018, at 10:02, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>>
>>>
>>>> On 2018-08-01 11:45, MegaBrutal wrote:
>>>> Hi all,
>>>>
>>>> I know it's a decade-old question, but I'd like to hear your thoughts
>>>> of today. By now, I became a heavy BTRFS user. Almost everywhere I use
>>>> BTRFS, except in situations when it is obvious there is no benefit
>>>> (e.g. /var/log, /boot). At home, all my desktop, laptop and server
>>>> computers are mainly running on BTRFS with only a few file systems on
>>>> ext4. I even installed BTRFS in corporate productive systems (in those
>>>> cases, the systems were mainly on ext4; but there were some specific
>>>> file systems those exploited BTRFS features).
>>>>
>>>> But there is still one question that I can't get over: if you store a
>>>> database (e.g. MySQL), would you prefer having a BTRFS volume mounted
>>>> with nodatacow, or would you just simply use ext4?
>>>>
>>>> I know that with nodatacow, I take away most of the benefits of BTRFS
>>>> (those are actually hurting database performance – the exact CoW
>>>> nature that is elsewhere a blessing, with databases it's a drawback).
>>>> But are there any advantages of still sticking to BTRFS for a database
>>>> albeit CoW is disabled, or should I just return to the old and
>>>> reliable ext4 for those applications?
>>>
>>> Since I'm not a expert in database, so I can totally be wrong, but what
>>> about completely disabling database write-ahead-log (WAL), and let
>>> btrfs' data CoW to handle data consistency completely?
>>>
>>
>> This would make content of database after crash completely unpredictable, thus making it impossible to reliably roll back transaction.
> 
> Btrfs itself (with datacow) can ensure the fs is updated completely.
> 
> That's to say, even a crash happens, the content of the fs will be the
> same state as previous btrfs transaction (btrfs sync).
> 
> Thus there is no need to rollback database transaction though.
> (Unless database transaction is not sync to btrfs transaction)
> 
Two issues with this statement:

1. Not all database software properly groups logically related 
operations that need to be atomic as a unit into transactions.
2. Even aside from point 1 and the possibility of database corruption,
there are other legitimate reasons that you might need to roll back a
transaction (for example, the rather obvious case of a transaction that
should not have happened in the first place).


^ permalink raw reply	[flat|nested] 19+ messages in thread

* BTRFS and databases
  2018-08-02  7:07   ` Qu Wenruo
@ 2018-08-02 12:32     ` Remi Gauvin
  0 siblings, 0 replies; 19+ messages in thread
From: Remi Gauvin @ 2018-08-02 12:32 UTC (permalink / raw)
  To: linux-btrfs



On 2018-08-02 03:07 AM, Qu Wenruo wrote:


> For data, since we have cow (along with csum), it should be no problem
> to recover.
> 
> And since datacow is used, transaction on each device should be atomic,
> thus we should be able to handle one-time device out-of-sync case.
> (For multiple out-of-sync events, we don't have any good way though).
> 
> Or did I miss something from previous discussion?

As far as I know, that is indeed correct and works very well.  The
question was specifically about using nodatacow for databases, and
that's the question I was responding to.  In the current state, I do not
believe btrfs nodatacow is in any way appropriate for databases/vm
hosting when combined with multi-device.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: BTRFS and databases
  2018-08-02 12:27       ` Austin S. Hemmelgarn
@ 2018-08-02 13:14         ` Martin Raiber
  0 siblings, 0 replies; 19+ messages in thread
From: Martin Raiber @ 2018-08-02 13:14 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Qu Wenruo, Andrei Borzenkov; +Cc: MegaBrutal, linux-btrfs

On 02.08.2018 14:27 Austin S. Hemmelgarn wrote:
> On 2018-08-02 06:56, Qu Wenruo wrote:
>>
>> On 2018-08-02 18:45, Andrei Borzenkov wrote:
>>>
>>> Sent from iPhone
>>>
>>>> On 2 Aug 2018, at 10:02, Qu Wenruo <quwenruo.btrfs@gmx.com>
>>>> wrote:
>>>>
>>>>> On 2018-08-01 11:45, MegaBrutal wrote:
>>>>> Hi all,
>>>>>
>>>>> I know it's a decade-old question, but I'd like to hear your thoughts
>>>>> of today. By now, I became a heavy BTRFS user. Almost everywhere I
>>>>> use
>>>>> BTRFS, except in situations when it is obvious there is no benefit
>>>>> (e.g. /var/log, /boot). At home, all my desktop, laptop and server
>>>>> computers are mainly running on BTRFS with only a few file systems on
>>>>> ext4. I even installed BTRFS in corporate productive systems (in
>>>>> those
>>>>> cases, the systems were mainly on ext4; but there were some specific
>>>>> file systems those exploited BTRFS features).
>>>>>
>>>>> But there is still one question that I can't get over: if you store a
>>>>> database (e.g. MySQL), would you prefer having a BTRFS volume mounted
>>>>> with nodatacow, or would you just simply use ext4?
>>>>>
>>>>> I know that with nodatacow, I take away most of the benefits of BTRFS
>>>>> (those are actually hurting database performance – the exact CoW
>>>>> nature that is elsewhere a blessing, with databases it's a drawback).
>>>>> But are there any advantages of still sticking to BTRFS for a
>>>>> database
>>>>> albeit CoW is disabled, or should I just return to the old and
>>>>> reliable ext4 for those applications?
>>>>
>>>> Since I'm not a expert in database, so I can totally be wrong, but
>>>> what
>>>> about completely disabling database write-ahead-log (WAL), and let
>>>> btrfs' data CoW to handle data consistency completely?
>>>>
>>>
>>> This would make content of database after crash completely
>>> unpredictable, thus making it impossible to reliably roll back
>>> transaction.
>>
>> Btrfs itself (with datacow) can ensure the fs is updated completely.
>>
>> That's to say, even a crash happens, the content of the fs will be the
>> same state as previous btrfs transaction (btrfs sync).
>>
>> Thus there is no need to rollback database transaction though.
>> (Unless database transaction is not sync to btrfs transaction)
>>
> Two issues with this statement:
>
> 1. Not all database software properly groups logically related
> operations that need to be atomic as a unit into transactions.
> 2. Even aside from point 1 and the possibility of database corruption,
> there are other legitimate reasons that you might need to roll-back a
> transaction (for example, the rather obvious case of a transaction
> that should not have happened in the first place).

I have previously thought about a database transaction scheme based on
btrfs features. It has practical issues, though.
One would put a b-tree database file into a subvolume (e.g. trans_0).
When changing the b-tree database, one would create a snapshot (trans_1),
then change the file in the snapshot. On commit, sync trans_1, then
delete trans_0. On rollback, delete trans_1.

Problems:
* Large overhead for small transactions (OLTP), which is a problem in
general for copy-on-write b-tree databases
* Only root can create or destroy snapshots
* By default the Linux memory system starts write-back pretty much
immediately, so pages that get overwritten more than once in a
transaction are written out repeatedly (rather than being kept in RAM)
unless Linux is tuned not to do this.

I have used this method, albeit by reflinking the database and then
modifying the reflink, but I think reflinking is slower than creating a
snapshot?
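
The snapshot-per-transaction scheme described above could be sketched roughly as follows. This is an illustration under assumptions, not the code that was actually used: the `transaction` helper and the `btrfs_*` wrappers are made-up names, the `trans_N` layout follows the description, and the snapshot, delete and sync steps are injectable so the flow can be tried out with plain directory copies.

```python
import os
import subprocess
from contextlib import contextmanager


def btrfs_snapshot(src, dst):
    subprocess.run(["btrfs", "subvolume", "snapshot", src, dst], check=True)


def btrfs_delete(path):
    subprocess.run(["btrfs", "subvolume", "delete", path], check=True)


def btrfs_sync(path):
    subprocess.run(["btrfs", "filesystem", "sync", path], check=True)


@contextmanager
def transaction(base_dir, generation,
                snapshot=btrfs_snapshot, delete=btrfs_delete, sync=btrfs_sync):
    """Run one transaction against a snapshot of trans_<generation>.

    Yields the path of the new snapshot, trans_<generation + 1>. If the
    body finishes normally the snapshot is synced and the old subvolume
    is deleted (commit); if it raises, the snapshot is deleted (rollback).
    """
    old = os.path.join(base_dir, f"trans_{generation}")
    new = os.path.join(base_dir, f"trans_{generation + 1}")
    snapshot(old, new)
    try:
        yield new
    except BaseException:
        delete(new)  # rollback: drop the modified snapshot
        raise
    sync(new)        # commit: make the new state durable first
    delete(old)      # then drop the previous state
```

A caller would open the database file inside the yielded path, modify it, and let the `with` block end to commit or raise to roll back. The default wrappers shell out to the btrfs tool, so the privilege issue listed under Problems applies to them as well.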

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-08-02 15:05 UTC | newest]

Thread overview: 19+ messages
2018-08-01  3:45 BTRFS and databases MegaBrutal
2018-08-01  8:48 ` Duncan
2018-08-01  8:56 ` Hugo Mills
2018-08-02  9:16   ` Martin Steigerwald
2018-08-02 10:15     ` ein
2018-08-02 10:35     ` Andrei Borzenkov
2018-08-02 10:42       ` Martin Steigerwald
2018-08-02 10:53       ` Qu Wenruo
2018-08-01  8:59 ` Mike Fleetwood
2018-08-01 11:21 ` Adam Borowski
2018-08-01 12:19 ` Austin S. Hemmelgarn
2018-08-01 14:33 ` Remi Gauvin
2018-08-02  7:07   ` Qu Wenruo
2018-08-02 12:32     ` Remi Gauvin
2018-08-02  7:02 ` Qu Wenruo
2018-08-02 10:45   ` Andrei Borzenkov
2018-08-02 10:56     ` Qu Wenruo
2018-08-02 12:27       ` Austin S. Hemmelgarn
2018-08-02 13:14         ` Martin Raiber
