* Transactional btrfs
From: Nathan Dehnel @ 2018-09-06  7:23 UTC
  To: linux-btrfs

https://lwn.net/Articles/287289/

In 2008, HP released the source code for a filesystem called AdvFS so
that its features could be incorporated into Linux filesystems. AdvFS
had a feature where a group of file writes was treated as one atomic
transaction.

https://www.usenix.org/system/files/conference/fast15/fast15-paper-verma.pdf

These guys used advfs to add a "syncv" system call that makes writes
across multiple files atomic.

https://lwn.net/Articles/715918/

A patch was later submitted based on the previous paper in some way.

So I guess my question is, does btrfs support atomic writes across
multiple files? Or is anyone interested in such a feature?


* Re: Transactional btrfs
From: Austin S. Hemmelgarn @ 2018-09-06 10:08 UTC
  To: Nathan Dehnel, linux-btrfs

On 2018-09-06 03:23, Nathan Dehnel wrote:
> https://lwn.net/Articles/287289/
> 
> In 2008, HP released the source code for a filesystem called AdvFS so
> that its features could be incorporated into Linux filesystems. AdvFS
> had a feature where a group of file writes was treated as one atomic
> transaction.
> 
> https://www.usenix.org/system/files/conference/fast15/fast15-paper-verma.pdf
> 
> These guys used advfs to add a "syncv" system call that makes writes
> across multiple files atomic.
> 
> https://lwn.net/Articles/715918/
> 
> A patch was later submitted based on the previous paper in some way.
> 
> So I guess my question is, does btrfs support atomic writes across
> multiple files? Or is anyone interested in such a feature?
> 
I'm fairly certain that it does not currently, but in theory it would 
not be hard to add.

Realistically, the only cases I can think of where cross-file atomic 
_writes_ would be of any benefit are database systems.

However, if this were extended to include rename, unlink, touch, and a 
handful of other VFS operations, then I can easily think of a few dozen 
use cases.  Package managers in particular would likely be very 
interested in being able to atomically rename a group of files as a 
single transaction, as it would make their job _much_ easier.


* Re: Transactional btrfs
From: Adam Borowski @ 2018-09-08 16:24 UTC
  To: Austin S. Hemmelgarn; +Cc: Nathan Dehnel, linux-btrfs

On Thu, Sep 06, 2018 at 06:08:33AM -0400, Austin S. Hemmelgarn wrote:
> On 2018-09-06 03:23, Nathan Dehnel wrote:
> > So I guess my question is, does btrfs support atomic writes across
> > multiple files? Or is anyone interested in such a feature?
> > 
> I'm fairly certain that it does not currently, but in theory it would not be
> hard to add.
> 
> Realistically, the only cases I can think of where cross-file atomic
> _writes_ would be of any benefit are database systems.
> 
> However, if this were extended to include rename, unlink, touch, and a
> handful of other VFS operations, then I can easily think of a few dozen use
> cases.  Package managers in particular would likely be very interested in
> being able to atomically rename a group of files as a single transaction, as
> it would make their job _much_ easier.

I wonder, what about:
sync; mount -o remount,commit=9999999,flushoncommit
eatmydata apt dist-upgrade
sync; mount -o remount,commit=30,noflushoncommit

Obviously, this gets fooled by fsyncs, and makes the transaction affect the
whole system (if you have unrelated writes, they won't get committed until
the end of the transaction).  Then there are nocow files, but you already made
the decision to disable most features of btrfs for them.

So unless something forces a commit, this should already work, giving
cross-file atomic writes, renames and so on.
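
As a minimal sketch (not a tested tool), the three commands above could be
wrapped in a function so that the normal commit settings are restored even
when the wrapped command fails.  The function names, the explicit mount point
argument, and the DRY_RUN switch are assumptions added for illustration; the
mount options and eatmydata are the ones shown above.

```shell
#!/bin/sh
# Sketch only: wraps a command in the remount trick described above.
# Set DRY_RUN=1 to print the commands instead of running them
# (running them for real needs root and a btrfs mount).

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# one_big_commit MOUNTPOINT COMMAND [ARGS...]   (hypothetical helper)
one_big_commit() {
    mnt=$1
    shift
    run sync
    # Push the periodic commit far into the future; bail out if that fails.
    run mount -o remount,commit=9999999,flushoncommit "$mnt" || return 1
    # eatmydata turns the wrapped command's fsync calls into no-ops,
    # so they cannot force an early commit.
    run eatmydata "$@"
    status=$?
    # Restore the normal settings whether or not the command succeeded.
    run sync
    run mount -o remount,commit=30,noflushoncommit "$mnt"
    return "$status"
}
```

Usage would look like `one_big_commit / apt dist-upgrade`, run as root.  The
caveats above still apply: anything else that forces a commit breaks the
atomicity, and unrelated writes are held back until the final sync.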


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ What Would Jesus Do, MUD/MMORPG edition:
⣾⠁⢰⠒⠀⣿⡁ • multiplay with an admin char to benefit your mortal [Mt3:16-17]
⢿⡄⠘⠷⠚⠋⠀ • abuse item cloning bugs [Mt14:17-20, Mt15:34-37]
⠈⠳⣄⠀⠀⠀⠀ • use glitches to walk on water [Mt14:25-26]


* Re: Transactional btrfs
From: Martin Raiber @ 2018-09-08 20:45 UTC
  Cc: linux-btrfs

On 08.09.2018 at 18:24, Adam Borowski wrote:
> On Thu, Sep 06, 2018 at 06:08:33AM -0400, Austin S. Hemmelgarn wrote:
>> On 2018-09-06 03:23, Nathan Dehnel wrote:
>>> So I guess my question is, does btrfs support atomic writes across
>>> multiple files? Or is anyone interested in such a feature?
>>>
>> I'm fairly certain that it does not currently, but in theory it would not be
>> hard to add.
>>
>> Realistically, the only cases I can think of where cross-file atomic
>> _writes_ would be of any benefit are database systems.
>>
>> However, if this were extended to include rename, unlink, touch, and a
>> handful of other VFS operations, then I can easily think of a few dozen use
>> cases.  Package managers in particular would likely be very interested in
>> being able to atomically rename a group of files as a single transaction, as
>> it would make their job _much_ easier.
> I wonder, what about:
> sync; mount -o remount,commit=9999999,flushoncommit
> eatmydata apt dist-upgrade
> sync; mount -o remount,commit=30,noflushoncommit
>
> Obviously, this gets fooled by fsyncs, and makes the transaction affect the
> whole system (if you have unrelated writes, they won't get committed until
> the end of the transaction).  Then there are nocow files, but you already made
> the decision to disable most features of btrfs for them.
>
> So unless something forces a commit, this should already work, giving
> cross-file atomic writes, renames and so on.

Now combine this with a snapshot of root, then on success rename-exchange it
with root, and you are there.

Btrfs had in the past TRANS_START and TRANS_END ioctls (for ceph, I
think), but no rollback (and therefore no error handling incl. ENOSPC).

If you want to look at a working file system transaction mechanism, you
should look at Transactional NTFS (TxF). Microsoft writes that it is
deprecating it, so it's perhaps not very widely used. Windows uses it
for updates, I think.

Specifically for btrfs, the problem would be that it really needs to
support multiple simultaneous writers, otherwise one transaction can
block the whole system.


* Re: Transactional btrfs
From: Adam Borowski @ 2018-09-08 21:28 UTC
  To: Martin Raiber; +Cc: linux-btrfs

On Sat, Sep 08, 2018 at 08:45:47PM +0000, Martin Raiber wrote:
> On 08.09.2018 at 18:24, Adam Borowski wrote:
> > On Thu, Sep 06, 2018 at 06:08:33AM -0400, Austin S. Hemmelgarn wrote:
> >> On 2018-09-06 03:23, Nathan Dehnel wrote:
> >>> So I guess my question is, does btrfs support atomic writes across
> >>> multiple files? Or is anyone interested in such a feature?
> >>>
> >> I'm fairly certain that it does not currently, but in theory it would not be
> >> hard to add.

> >> However, if this were extended to include rename, unlink, touch, and a
> >> handful of other VFS operations, then I can easily think of a few dozen use
> >> cases.  Package managers in particular would likely be very interested in
> >> being able to atomically rename a group of files as a single transaction, as
> >> it would make their job _much_ easier.

> > I wonder, what about:
> > sync; mount -o remount,commit=9999999,flushoncommit
> > eatmydata apt dist-upgrade
> > sync; mount -o remount,commit=30,noflushoncommit
> >
> > Obviously, this gets fooled by fsyncs, and makes the transaction affect the
> > whole system (if you have unrelated writes, they won't get committed until
> > the end of the transaction).  Then there are nocow files, but you already made
> > the decision to disable most features of btrfs for them.

> Now combine this with a snapshot of root, then on success rename-exchange it
> with root, and you are there.

No need: no unsuccessful transactions ever get written to the disk.
(Not counting unreachable stuff.)

> Btrfs had in the past TRANS_START and TRANS_END ioctls (for ceph, I
> think), but no rollback (and therefore no error handling incl. ENOSPC).
> 
> If you want to look at a working file system transaction mechanism, you
> should look at Transactional NTFS (TxF). Microsoft writes that it is
> deprecating it, so it's perhaps not very widely used. Windows uses it
> for updates, I think.

You're talking about multiple simultaneous transactions; they have a massive
complexity cost.  And btrfs is already ridiculously complex.  I don't really
see a good way to tie this with the POSIX API without some serious
rethinking.

dpkg can already recover from a properly returned error (although not as
nicely as a full rollback); what is fatal for it is having its status
database corrupted/out of sync.  That's why it does a multiple fsync dance
and keeps fully rewriting its files over and over and over.
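
For comparison, the usual crash-safe way to replace a single file without
filesystem transactions looks roughly like the sketch below: write a temporary
file, fsync it, rename it over the target, then fsync the directory.  This is
the general pattern, not dpkg's actual code; atomic_replace is a hypothetical
name, and calling sync with a file argument assumes GNU coreutils 8.24 or
later.

```shell
#!/bin/sh
# atomic_replace TARGET: replace TARGET with the contents of stdin so that,
# after a crash, TARGET holds either the old or the new contents.
atomic_replace() {
    target=$1
    dir=$(dirname -- "$target")
    # The temp file lives in the same directory (hence the same filesystem),
    # which is what makes the final rename atomic.
    tmp=$(mktemp "$dir/.tmp.XXXXXX") || return 1
    if cat > "$tmp" &&
       sync "$tmp" &&                 # flush the new data to disk
       mv -f -- "$tmp" "$target" &&   # atomically swap it into place
       sync "$dir"                    # make the rename itself durable
    then
        return 0
    fi
    rm -f -- "$tmp"
    return 1
}
```

This covers one file at a time, so a set of related files (such as a status
database plus the files it describes) still needs several such steps, each
with its own fsyncs.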

Atomic operations are pretty useful even without a rollback: you still need
to be able to handle failure, but not a crash.

> Specifically for btrfs, the problem would be that it really needs to
> support multiple simultaneous writers, otherwise one transaction can
> block the whole system.

My dirty hack above doesn't suffer from such a block: it only suffers from
compromising durability of concurrent writers.  During that userspace
transaction, there are no commits until it finishes; this means that if
there's unrelated activity it may suffer from losing writes that were done
between transaction start and crash.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ What Would Jesus Do, MUD/MMORPG edition:
⣾⠁⢰⠒⠀⣿⡁ • multiplay with an admin char to benefit your mortal [Mt3:16-17]
⢿⡄⠘⠷⠚⠋⠀ • abuse item cloning bugs [Mt14:17-20, Mt15:34-37]
⠈⠳⣄⠀⠀⠀⠀ • use glitches to walk on water [Mt14:25-26]

