* Transactional XFS?
From: Grozdan @ 2012-02-15 19:15 UTC
  To: xfs

Hi,

I just finished watching Dave Chinner's excellent talk at
linux.conf.au and I must say I'm impressed by the recent improvements
to XFS. Towards the end of the talk, Dave talked about upcoming
improvements to metadata reliability and other features. What I'm
wondering is whether there are any plans to make XFS transactional
(fully atomic), as is the case with recent NTFS versions on
Windows Vista and higher?

Thanks

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Transactional XFS?
From: Dave Chinner @ 2012-02-16  0:22 UTC
  To: Grozdan; +Cc: xfs

On Wed, Feb 15, 2012 at 08:15:46PM +0100, Grozdan wrote:
> Hi,
> 
> I just finished watching Dave Chinner's excellent talk at
> linux.conf.au and I must say I'm impressed by the recent improvements
> to XFS. Towards the end of the talk, Dave talked about upcoming
> improvements to metadata reliability and other features. What I'm
> wondering is whether there are any plans to make XFS transactional
> (fully atomic), as is the case with recent NTFS versions on
> Windows Vista and higher?

What do you mean by "fully atomic"? NTFS is not fully atomic - it
doesn't journal data so can lose data on a crash - so I'm not sure
what you mean here....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Transactional XFS?
From: Stewart Smith @ 2012-02-16  1:01 UTC
  To: Dave Chinner, Grozdan; +Cc: xfs

On Thu, 16 Feb 2012 11:22:37 +1100, Dave Chinner <david@fromorbit.com> wrote:
> On Wed, Feb 15, 2012 at 08:15:46PM +0100, Grozdan wrote:
> > Hi,
> > 
> > I just finished watching Dave Chinner's excellent talk at
> > linux.conf.au and I must say I'm impressed by the recent improvements
> > to XFS. Towards the end of the talk, Dave talked about upcoming
> > improvements to metadata reliability and other features. What I'm
> > wondering is whether there are any plans to make XFS transactional
> > (fully atomic), as is the case with recent NTFS versions on
> > Windows Vista and higher?
> 
> What do you mean by "fully atomic"? NTFS is not fully atomic - it
> doesn't journal data so can lose data on a crash - so I'm not sure
> what you mean here....

There's another API in Windows that lets you do operations in an
all-or-nothing way. Originally this was scoped to be able to just add a
couple of API calls to the Windows file API and have it all "just work"
(imagine adding just three syscalls: begin(), commit(),
rollback()). This didn't really work out so well, and by the final Vista
release, it was a wholly different set of API calls (more like tx_begin,
tx_open, tx_read, tx_write... so you had to have code explicitly aware
of transactions).

AFAIK the current big user is Windows Update. That is, Windows Update
will either apply all its changes to the system or none. Think of being
able to hit the reset button halfway through a Windows update and have
everything "just work" and come back to a sane state. I've had a Linux
box crash during a dist-upgrade before... not pretty.

It's a neat idea, but as you can imagine, fraught with difficulties.

I think it'd be possible to do.. you know, if you lock a number of FS
and VFS devs in a room with database people for a month or so we may
theoretically solve nearly all the problems....

-- 
Stewart Smith


* Re: Transactional XFS?
From: Dave Chinner @ 2012-02-16  1:43 UTC
  To: Stewart Smith; +Cc: Grozdan, xfs

On Thu, Feb 16, 2012 at 12:01:01PM +1100, Stewart Smith wrote:
> On Thu, 16 Feb 2012 11:22:37 +1100, Dave Chinner <david@fromorbit.com> wrote:
> > On Wed, Feb 15, 2012 at 08:15:46PM +0100, Grozdan wrote:
> > > Hi,
> > > 
> > > I just finished watching Dave Chinner's excellent talk at
> > > linux.conf.au and I must say I'm impressed by the recent improvements
> > > to XFS. Towards the end of the talk, Dave talked about upcoming
> > > improvements to metadata reliability and other features. What I'm
> > > wondering is whether there are any plans to make XFS transactional
> > > (fully atomic), as is the case with recent NTFS versions on
> > > Windows Vista and higher?
> > 
> > What do you mean by "fully atomic"? NTFS is not fully atomic - it
> > doesn't journal data so can lose data on a crash - so I'm not sure
> > what you mean here....
> 
> There's another API in Windows that lets you do operations in an
> all-or-nothing way. Originally this was scoped to be able to just add a
> couple of API calls to the Windows file API and have it all "just work"
> (imagine adding just three syscalls: begin(), commit(),
> rollback()). This didn't really work out so well, and by the final Vista
> release, it was a wholly different set of API calls (more like tx_begin,
> tx_open, tx_read, tx_write... so you had to have code explicitly aware
> of transactions).

Oh, so making some set of random user changes to random user data
have ACID properties? That's what databases are for, isn't it?  :P

I don't see us implementing anything like this in XFS anytime soon.
We are looking to add transaction grouping so that we can make
things that currently require multiple transactions (e.g. create a
file, add a default ACL) atomic, but I don't have any plans to
open the can of worms that is userspace controlled transactions any
time soon.

> AFAIK the current big user is Windows Update. That is, Windows Update
> will either apply all its changes to the system or none. Think of being
> able to hit the reset button halfway through a Windows update and have
> everything "just work" and come back to a sane state. I've had a Linux
> box crash during a dist-upgrade before... not pretty.
> 
> It's a neat idea, but as you can imagine, fraught with difficulties.

We already have this upgrade rollback functionality in development
with none of that complexity - it uses filesystem snapshots so is
effectively filesystem independent and already works with yum and
btrfs. You don't need any special application support for this -
rollback from a failed upgrade is as simple as a reboot.

> I think it'd be possible to do.. you know, if you lock a number of FS
> and VFS devs in a room with database people for a month or so we may
> theoretically solve nearly all the problems....

Sure, Microsoft have been trying to make their filesystem a database
for years. It's theoretically possible, but in practice they've
fallen short in every attempt in the past 15 years.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Transactional XFS?
From: Stewart Smith @ 2012-02-16  5:38 UTC
  To: Dave Chinner; +Cc: Grozdan, xfs

On Thu, 16 Feb 2012 12:43:38 +1100, Dave Chinner <david@fromorbit.com> wrote:
> Oh, so making some set of random user changes to random user data
> have ACID properties? That's what databases are for, isn't it?  :P

Yep :)

> I don't see us implementing anything like this in XFS anytime soon.
> We are looking to add transaction grouping so that we can make
> things that currently require multiple transactions (e.g. create a
> file, add a default ACL) atomic, but I don't have any plans to
> open the can of worms that is userspace controlled transactions any
> time soon.

The worst part is working out the semantics so as not to break existing
apps (without completely sacrificing concurrency).

> We already have this upgrade rollback functionality in development
> with none of that complexity - it uses filesystem snapshots so is
> effectively filesystem independent and already works with yum and
> btrfs. You don't need any special application support for this -
> rollback from a failed upgrade is as simple as a reboot.

The downside being you also roll back your logs and any other changes
made during that time. On the whole though, it's probably sufficient.

> Sure, Microsoft have been trying to make their filesystem a database
> for years. It's theoretically possible, but in practice they've
> fallen short in every attempt in the past 15 years.

err... try 20 years :)

It's funny in a way, sqlite succeeds at effectively doing this for an
awfully large number of applications.
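
As a rough illustration of that point, here is a minimal sketch using
Python's standard sqlite3 module. The `files` table and the
`apply_update` helper are made up for this example; they are not taken
from any application mentioned in the thread. A group of "file"
updates either all land or none do:

```python
import sqlite3

def apply_update(conn, files):
    """Store every (path, data) pair, or none of them if anything fails."""
    # Used as a context manager, the connection commits on success and
    # rolls back if an exception escapes the block.
    with conn:
        for path, data in files.items():
            if data is None:
                raise ValueError(f"no content for {path}")
            conn.execute(
                "INSERT OR REPLACE INTO files (path, data) VALUES (?, ?)",
                (path, data),
            )

def contents(conn):
    """Return the current path -> data mapping."""
    return dict(conn.execute("SELECT path, data FROM files ORDER BY path"))

conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, data BLOB)")

apply_update(conn, {"etc/a.conf": b"v1", "etc/b.conf": b"v1"})
```

If a later call fails partway through (for example because one entry
has no content), the rows written earlier in that call are rolled back
and readers still see the previous versions.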

-- 
Stewart Smith


* Re: Transactional XFS?
From: Dave Chinner @ 2012-02-16  6:42 UTC
  To: Stewart Smith; +Cc: Grozdan, xfs

On Thu, Feb 16, 2012 at 04:38:02PM +1100, Stewart Smith wrote:
> On Thu, 16 Feb 2012 12:43:38 +1100, Dave Chinner <david@fromorbit.com> wrote:
> > Oh, so making some set of random user changes to random user data
> > have ACID properties? That's what databases are for, isn't it?  :P
> 
> Yep :)
> 
> > I don't see us implementing anything like this in XFS anytime soon.
> > We are looking to add transaction grouping so that we can make
> > things that currently require multiple transactions (e.g. create a
> > file, add a default ACL) atomic, but I don't have any plans to
> > open the can of worms that is userspace controlled transactions any
> > time soon.
> 
> The worst part is working out the semantics so as not to break existing
> apps (without completely sacrificing concurrency).

That doesn't seem like a show stopper to me.

The part that I see is that it is basically impossible to do
arbitrarily large transactions in a filesystem - they are limited by
the size of the log. e.g. you can't have a user transaction that
writes more data or modifies more data than the log allows in a
single checkpoint/transaction. e.g. you can't just overwrite a 100MB
file in a transaction and expect it to work. It might work if you've
got a 2GB log, but if you've only got a 10MB log, then that
overwrite transaction is full of fail.
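
The arithmetic behind those numbers can be sketched as below. This is a
deliberately simplified model, assuming a transaction needs log space
at least equal to the bytes it modifies; real log reservations carry
overheads that are ignored here, and the function names are invented
for the illustration:

```python
import math

MB = 1000 * 1000
GB = 1000 * MB

def may_fit_in_log(bytes_modified, log_bytes):
    """True if the change is no larger than the log (necessary, not sufficient)."""
    return bytes_modified <= log_bytes

def min_transactions(bytes_modified, log_bytes):
    """Lower bound on how many log-sized pieces the change must be split into."""
    return math.ceil(bytes_modified / log_bytes)

overwrite = 100 * MB
for log_size in (2 * GB, 10 * MB):
    print(log_size, may_fit_in_log(overwrite, log_size),
          min_transactions(overwrite, log_size))
```

Once the change has to be split into more than one piece, it is no
longer a single atomic unit, which is the limitation described above.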

It's issues like that that doom the generic usefulness of
userspace controlled filesystem transactions as part of the normal
filesystem operation. If you need this sort of functionality, it has
to be layered over the top of the filesystem to avoid filesystem
atomicity limitations. i.e. another layer of tracking and
journalling. And at that point you're talking about implementing a
database on top of the filesystem in the filesystem....

> > We already have this upgrade rollback functionality in development
> > with none of that complexity - it uses filesystem snapshots so is
> > effectively filesystem independent and already works with yum and
> > btrfs. You don't need any special application support for this -
> > rollback from a failed upgrade is as simple as a reboot.
> 
> The downside being you also roll back your logs and any other changes
> made during that time. On the whole though, it's probably sufficient.

That, IMO, is one of the good things about it. You go back to a
pristine condition, but still have the failed upgrade image that you
can mount and debug. The logs and all the failed state is still
intact in the upgrade image, and when you are done debugging it you
can blow it away and try again....

> > Sure, Microsoft have been trying to make their filesystem a database
> > for years. It's theoretically possible, but in practice they've
> > fallen short in every attempt in the past 15 years.
> 
> err... try 20 years :)

Time gets away from me these days ;)

> It's funny in a way, sqlite succeeds at effectively doing this for an
> awfully large number of applications.

/me nods

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Transactional XFS?
From: Matthias Schniedermeyer @ 2012-02-16 12:01 UTC
  To: Grozdan; +Cc: xfs

On 15.02.2012 20:15, Grozdan wrote:
> Hi,
> 
> I just finished watching Dave Chinner's excellent talk at
> linux.conf.au and I must say I'm impressed by the recent improvements
> to XFS. Towards the end of the talk, Dave talked about upcoming
> improvements to metadata reliability and other features. What I'm
> wondering is whether there are any plans to make XFS transactional
> (fully atomic), as is the case with recent NTFS versions on
> Windows Vista and higher?

You could argue about whether NTFS is doing the work at all.
I glanced over a document describing it, and as far as I remember the
KTM component does all the work and stores the changes in a
specialized database.
So effectively you have a shim at the VFS layer that lets "others" see
the old data while your application sees the new data, and when you
"commit", all the filesystem changes stored in the database are applied
to the filesystem.

As far as I understand it, you wouldn't necessarily need support for that
in the filesystem itself; you could do it at the VFS level.

So one of the union/layered "things" should be able to do that.
IOW, store all the necessary changes and "replay" them to the
actual filesystem when doing the commit. (Or the opposite, depending on
whether you expect a commit or a rollback as the default operation at
transaction end.)
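
The store-then-replay idea can be sketched in userspace as follows.
This is only an illustration under stated assumptions: the
`StagedTransaction` class is hypothetical, it is not how KTM or any
union filesystem is implemented, and the replay loop is atomic per file
only, not across the whole set of files:

```python
import os
import tempfile

class StagedTransaction:
    """Keep pending writes private until commit, then replay them."""

    def __init__(self, root):
        self.root = root
        self.staged = {}  # relative path -> bytes, seen only by this transaction

    def write(self, relpath, data):
        self.staged[relpath] = data

    def read(self, relpath):
        # The transaction sees its own pending data; everyone else
        # keeps reading the unchanged file on disk.
        if relpath in self.staged:
            return self.staged[relpath]
        with open(os.path.join(self.root, relpath), "rb") as f:
            return f.read()

    def commit(self):
        # Replay each staged change: write a temp file, then rename it
        # over the target so each individual file switches in one step.
        for relpath, data in self.staged.items():
            target = os.path.join(self.root, relpath)
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(target))
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, target)
        self.staged.clear()

    def rollback(self):
        # Nothing reached the filesystem yet, so just drop the changes.
        self.staged.clear()
```

A crash in the middle of `commit()` would leave some files updated and
others not, so a real implementation would also need a durable record
of the pending changes to finish or undo the replay after a restart.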

Or BTRFS should be able to do that once it implements snapshots at
directory level (AFAIR BTRFS currently supports snapshots at subvolume
level, so if you use a subvolume you could already do that). You would
snapshot the dir, do your work in the snapshot and switch the original
dir with the snapshot on commit.
Although I don't know whether you can switch a mounted subvolume, or
whether it has to be unmounted first. Having to unmount might be
problematic, depending on the use case.
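
The snapshot-and-switch pattern can be imitated without BTRFS, which is
what the sketch below does. It is a stand-in and not BTRFS itself: a
plain directory copy plays the role of the snapshot and a symlink plays
the role of the "original dir", and the helper names are invented for
the example. It assumes a POSIX system where renaming over an existing
symlink is atomic:

```python
import os
import shutil
import tempfile

def begin(link):
    """Copy the live tree behind `link`; return the private working copy."""
    live = os.path.realpath(link)
    work = tempfile.mkdtemp(prefix="txn-", dir=os.path.dirname(live))
    shutil.copytree(live, work, dirs_exist_ok=True)
    return work

def commit(link, work):
    """Repoint `link` at the working copy in a single rename."""
    tmp = link + ".new"
    os.symlink(work, tmp)
    os.replace(tmp, link)  # replaces the old symlink in one step

def rollback(work):
    """Throw the working copy away; the live tree was never touched."""
    shutil.rmtree(work)
```

Readers that open paths through the symlink see either the old tree or
the new one. A full copy is far more expensive than a copy-on-write
snapshot, so this only shows the shape of the idea.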

See you

-- 
Real Programmers consider "what you see is what you get" to be just as 
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated, 
cryptic, powerful, unforgiving, dangerous.


* Re: Transactional XFS?
From: Peter Grandi @ 2012-02-16 22:10 UTC
  To: Linux fs XFS

[ ... ]

> Oh, so making some set of random user changes to random user data
> have ACID properties? That's what databases are for, isn't it?  :P

I am going to use this and in particular "That's what databases
are for, isn't it?" as a quote to throw at people who try to use
filesystems as database managers, usually with very many very
small files (also known as "records" to database people), but
not only.

>> I think it'd be possible to do.. you know, if you lock a
>> number of FS and VFS devs in a room with database people for
>> a month or so we may theoretically solve nearly all the
>> problems....

The DBMS people have given up long, long ago. At least since
the article by Stonebraker mentioned here:

  http://WWW.sabi.co.UK/blog/anno05-4th.html#051012d

Anyhow Oracle has sponsored two filesystem designs, one being
OCFS2, which is pretty decent, targeted at DBMS storage and does
not have ACID as such, and one being BTRFS which is not targeted
at DBMS storage and that has snapshots for rollback of failed
transactions.

> Sure, Microsoft have been trying to make their filesystem a
> database for years. It's theoretically possible, but in
> practice they've fallen short in every attempt in the past 15
> years.

I think it would be easier to do the opposite, and there have
been indeed filesystems implemented on top of DBMSes (with the
DBMS storing their data directly on top of block devices).


* Re: Transactional XFS?
From: Stewart Smith @ 2012-02-17  4:40 UTC
  To: Dave Chinner; +Cc: Grozdan, xfs

On Thu, 16 Feb 2012 17:42:30 +1100, Dave Chinner <david@fromorbit.com> wrote:
> > The worst part is working out the semantics so as not to break existing
> > apps (without completely sacrificing concurrency).
> 
> That doesn't seem like a show stopper to me.
> 
> The part that I see is that it is basically impossible to do
> arbitrarily large transactions in a filesystem - they are limited by
> the size of the log. e.g. you can't have a user transaction that
> writes more data or modifies more data than the log allows in a
> single checkpoint/transaction. e.g. you can't just overwrite a 100MB
> file in a transaction and expect it to work. It might work if you've
> got a 2GB log, but if you've only got a 10MB log, then that
> overwrite transaction is full of fail.

We have this problem too. None of the solutions are particularly pretty,
and they certainly do have a performance impact.

> It's issues like that that doom the generic usefulness of
> userspace controlled filesystem transactions as part of the normal
> filesystem operation. If you need this sort of functionality, it has
> to be layered over the top of the filesystem to avoid filesystem
> atomicity limitations. i.e. another layer of tracking and
> journalling. And at that point you're talking about implementing a
> database on top of the filesystem in the filesystem....

As I said... it's tricky to solve all the problems :)
-- 
Stewart Smith

