linux-btrfs.vger.kernel.org archive mirror
* Disk space accounting and subvolume delete
@ 2010-05-10 18:23 Bruce Guenter
  2010-05-10 18:50 ` Josef Bacik
  2010-05-11  0:10 ` Yan, Zheng 
  0 siblings, 2 replies; 9+ messages in thread
From: Bruce Guenter @ 2010-05-10 18:23 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 989 bytes --]

Hi.

When deleting a snapshot, I have observed that the disk space used by
that snapshot is not immediately released (according to statvfs or df).
Neither "sync" nor "btrfs filesystem sync" releases the disk space.  The
only way I have found to actually fully release the disk space is to
issue the sync and then sleep until the statvfs free numbers stop
changing.
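The sync-then-poll workaround described above can be sketched in Python as follows. This is a minimal sketch, not a btrfs interface. It assumes free space is read with os.statvfs (f_bavail times f_frsize, roughly what df reports as available). The polling interval, the number of unchanged polls treated as "settled", and the timeout are arbitrary illustrative choices. The statvfs, sleep and sync parameters exist only so the function can be exercised without a real filesystem.

```python
import os
import time


def free_bytes(path, statvfs=os.statvfs):
    """Bytes available to unprivileged users, roughly as df reports them."""
    st = statvfs(path)
    return st.f_bavail * st.f_frsize


def wait_for_stable_free(path, interval=1.0, stable_polls=3, timeout=600.0,
                         statvfs=os.statvfs, sleep=time.sleep, sync=None):
    """Sync, then poll free space until it is unchanged for `stable_polls`
    consecutive polls or `timeout` seconds of sleeping have elapsed.

    Returns the last observed free byte count. This is only a heuristic:
    a pause in background cleanup looks the same as completion.
    """
    (sync or os.sync)()  # flush and commit first, as in the workaround
    last = free_bytes(path, statvfs)
    unchanged = 0
    waited = 0.0
    while unchanged < stable_polls and waited < timeout:
        sleep(interval)
        waited += interval
        now = free_bytes(path, statvfs)
        if now == last:
            unchanged += 1
        else:
            unchanged = 0  # still changing: restart the stability count
            last = now
    return last
```

A caller would run something like wait_for_stable_free("/backup") after deleting a snapshot. The timeout bounds the wait, since (as the replies below explain) dropping a large snapshot can take a long time.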

This is a rather problematic approach to managing disk space.  Is there
any way to force a wait until the disk space has been released?

My application is automatically managing disk space in the presence of
snapshots.  I allow the disk (a backup) to fill up with snapshots until
it is nearly full, and then to delete snapshots until I have a threshold
free.  However, without the disk space being released promptly and no
way to wait until it is released, the loop can't tell how many snapshots
to delete.
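The pruning loop described above might look like the following sketch. The oldest-first ordering of the snapshot list and the delete_snapshot and settled_free_bytes callables are hypothetical stand-ins. settled_free_bytes would have to block until the previous deletion's space has actually been returned (for example by polling statvfs until it stops changing), which is exactly the missing piece this question is about. If it returns a stale value, the loop deletes more snapshots than necessary.

```python
from typing import Callable, List


def prune_snapshots(snapshots: List[str],
                    threshold: int,
                    delete_snapshot: Callable[[str], None],
                    settled_free_bytes: Callable[[], int]) -> List[str]:
    """Delete snapshots oldest-first until settled free space reaches
    `threshold` bytes. Returns the names that were deleted.

    Both callables are hypothetical; settled_free_bytes must not return
    until space from earlier deletions has been released.
    """
    deleted = []
    remaining = list(snapshots)  # assumed ordered oldest first
    while remaining and settled_free_bytes() < threshold:
        name = remaining.pop(0)
        delete_snapshot(name)
        deleted.append(name)
    return deleted
```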

-- 
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

* Re: Disk space accounting and subvolume delete
  2010-05-10 18:23 Disk space accounting and subvolume delete Bruce Guenter
@ 2010-05-10 18:50 ` Josef Bacik
  2010-05-11  0:10 ` Yan, Zheng 
  1 sibling, 0 replies; 9+ messages in thread
From: Josef Bacik @ 2010-05-10 18:50 UTC (permalink / raw)
  To: linux-btrfs

On Mon, May 10, 2010 at 12:23:52PM -0600, Bruce Guenter wrote:
> Hi.
> 
> When deleting a snapshot, I have observed that the disk space used by
> that snapshot is not immediately released (according to statvfs or df).
> Neither "sync" nor "btrfs filesystem sync" releases the disk space
> neither.  The only way I have found to actually fully release the disk
> space is to issue the sync and then sleep until the statvfs free numbers
> stop changing.
> 
> This is a rather problematic approach to managing disk space.  Is there
> any way to either force a wait until the disk space has been released?
> 
> My application is automatically managing disk space in the presence of
> snapshots.  I allow the disk (a backup) to fill up with snapshots until
> it is nearly full, and then to delete snapshots until I have a threshold
> free.  However, without the disk space being released promptly and no
> way to wait until it is released, the loop can't tell how many snapshots
> to delete.
> 

The way BTRFS's COW works is that we can't free up space until after a
transaction has committed.  After the transaction commits (after a sync) we walk
the list of pinned extents and free them asynchronously.  We could probably make
btrfs filesystem sync wait for that part to finish, though.  It shouldn't be too
hard to do; feel free to take a crack at it.  Thanks,
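A toy Python model of the sequence described here may make the ordering clearer. It is not btrfs code, and the class and method names are invented for illustration. Space released during a transaction is pinned because the last committed tree may still reference it; the commit makes it eligible for reuse; a separate, later step actually returns it to the free count, one batch at a time.

```python
class ToyAllocator:
    """Toy model (not btrfs code) of copy-on-write space reclamation."""

    def __init__(self, total):
        self.free = total
        self.pinned = []    # released in the running transaction
        self.to_unpin = []  # released in a committed transaction, not yet free

    def allocate(self, n):
        if n > self.free:
            raise OSError("no space left in toy allocator")
        self.free -= n

    def release(self, n):
        # The old copy may still be referenced by the last committed
        # tree, so it cannot be reused yet.
        self.pinned.append(n)

    def commit(self):
        # After commit the old tree is no longer needed, so the pinned
        # extents become eligible for freeing, but free is not updated yet.
        self.to_unpin.extend(self.pinned)
        self.pinned.clear()

    def unpin_some(self, max_extents=1):
        # Stands in for the asynchronous step: return a batch to free space.
        batch = self.to_unpin[:max_extents]
        del self.to_unpin[:max_extents]
        self.free += sum(batch)
        return len(batch)
```

In this model, a statvfs-style reading of free taken right after commit() still shows the old value, which matches the behavior reported at the start of the thread.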

Josef

* Re: Disk space accounting and subvolume delete
  2010-05-10 18:23 Disk space accounting and subvolume delete Bruce Guenter
  2010-05-10 18:50 ` Josef Bacik
@ 2010-05-11  0:10 ` Yan, Zheng 
  2010-05-11 15:45   ` Bruce Guenter
  1 sibling, 1 reply; 9+ messages in thread
From: Yan, Zheng  @ 2010-05-11  0:10 UTC (permalink / raw)
  To: linux-btrfs

On Tue, May 11, 2010 at 2:23 AM, Bruce Guenter <bruce@untroubled.org> wrote:
> Hi.
>
> When deleting a snapshot, I have observed that the disk space used by
> that snapshot is not immediately released (according to statvfs or df).
> Neither "sync" nor "btrfs filesystem sync" releases the disk space
> neither.  The only way I have found to actually fully release the disk
> space is to issue the sync and then sleep until the statvfs free numbers
> stop changing.
>
> This is a rather problematic approach to managing disk space.  Is there
> any way to either force a wait until the disk space has been released?
>
> My application is automatically managing disk space in the presence of
> snapshots.  I allow the disk (a backup) to fill up with snapshots until
> it is nearly full, and then to delete snapshots until I have a threshold
> free.  However, without the disk space being released promptly and no
> way to wait until it is released, the loop can't tell how many snapshots
> to delete.
>

This is because the snapshot deleting ioctl only removes the link.
The corresponding tree is dropped in the background by a kernel thread.
We could probably add another ioctl that waits until the tree has been
completely dropped.
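The split described here (the ioctl removes the link and returns, and a kernel thread drops the tree later) can be modeled with a Python thread. This is a toy model under stated assumptions, not kernel code: ToySnapshotCleaner, delete_snapshot and wait_for_drop are invented names, and wait_for_drop only stands in for the hypothetical waiting ioctl proposed here. Nothing in the sketch implies that such an ioctl exists.

```python
import threading
import time


class ToySnapshotCleaner:
    """Toy model, not kernel code: deleting a snapshot only removes its
    link, and a background thread frees the snapshot's blocks later."""

    def __init__(self, free_bytes=0):
        self._free = free_bytes
        self._lock = threading.Lock()
        self._dropped = {}  # snapshot name -> Event set once fully dropped

    def free_bytes(self):
        with self._lock:
            return self._free

    def delete_snapshot(self, name, block_sizes, delay=0.0):
        """Returns immediately, like the delete ioctl; space comes back later."""
        done = threading.Event()
        self._dropped[name] = done
        worker = threading.Thread(target=self._drop,
                                  args=(block_sizes, delay, done), daemon=True)
        worker.start()

    def _drop(self, block_sizes, delay, done):
        for size in block_sizes:
            time.sleep(delay)  # stands in for walking and updating tree blocks
            with self._lock:
                self._free += size
        done.set()

    def wait_for_drop(self, name, timeout=None):
        """Hypothetical "wait until the tree is dropped" call; returns True
        once every block has been freed, False on timeout."""
        return self._dropped[name].wait(timeout)
```

With such a call, the pruning loop from the original question could delete one snapshot, wait for its drop to finish, and then read free space once instead of polling.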

Yan, Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: Disk space accounting and subvolume delete
  2010-05-11  0:10 ` Yan, Zheng 
@ 2010-05-11 15:45   ` Bruce Guenter
  2010-05-12  5:02     ` Yan, Zheng 
  0 siblings, 1 reply; 9+ messages in thread
From: Bruce Guenter @ 2010-05-11 15:45 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1162 bytes --]

On Tue, May 11, 2010 at 08:10:38AM +0800, Yan, Zheng  wrote:
> This is because the snapshot deleting ioctl only removes the a link.

Right, I understand that.  That part is not unexpected, as it works just
like unlink would.  However...

> The corresponding tree is dropped in the background by a kernel thread.

The surprise is that 'sync', in any form I was able to try, does not
wait until all or even most of the I/O is completed.  Apparently the
standards spec for sync(2) says it is not required to wait for I/O to
complete, but AFAIK all other Linux FS do wait (the man page for sync(2)
implies as much, as does the info page for sync in glibc).

The only way I've found so far to force this behavior is to unmount, and
that's rather intrusive to other users of the FS.

> We could probably add another ioctl that waits until the tree has been
> completely dropped.

Since the expected behavior for sync is to wait until all pending I/O
has been completed, I would argue this should be the default action for
sync.  Am I misunderstanding something?

-- 
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

* Re: Disk space accounting and subvolume delete
  2010-05-11 15:45   ` Bruce Guenter
@ 2010-05-12  5:02     ` Yan, Zheng 
  2010-05-12 21:56       ` Mike Fleetwood
  2010-05-31 19:01       ` Bruce Guenter
  0 siblings, 2 replies; 9+ messages in thread
From: Yan, Zheng  @ 2010-05-12  5:02 UTC (permalink / raw)
  To: linux-btrfs

On Tue, May 11, 2010 at 11:45 PM, Bruce Guenter <bruce@untroubled.org> wrote:
> On Tue, May 11, 2010 at 08:10:38AM +0800, Yan, Zheng  wrote:
>> This is because the snapshot deleting ioctl only removes the a link.
>
> Right, I understand that.  That part is not unexpected, as it works just
> like unlink would.  However...
>
>> The corresponding tree is dropped in the background by a kernel thread.
>
> The surprise is that 'sync', in any form I was able to try, does not
> wait until all or even most of the I/O is completed.  Apparently the
> standards spec for sync(2) says it is not required to wait for I/O to
> complete, but AFAIK all other Linux FS do wait (the man page for sync(2)
> implies as much, as does the info page for sync in glibc).
>
> The only way I've found so far to force this behavior is to unmount, and
> that's rather intrusive to other users of the FS.
>
>> We could probably add another ioctl that waits until the tree has been
>> completely dropped.
>
> Since the expected behavior for sync is to wait until all pending I/O
> has been completed, I would argue this should be the default action for
> sync.  Am I misunderstanding something?
>

Dropping a tree can be lengthy. It's not good to let sync wait for hours.
For most Linux FS, 'sync' just forces a transaction/journal commit. I don't
think they wait for large operations that can span multiple transactions to
complete.
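To illustrate why sync returns long before a large tree is gone, here is a toy model (not btrfs code; the per-transaction batch size is an invented parameter) in which the drop frees at most a fixed number of blocks per transaction. sync() commits only what the running transaction holds; blocks that later transactions will drop are not waited for.

```python
class ToyTreeDrop:
    """Toy model (not btrfs code): a large tree is dropped a bounded
    number of blocks per transaction."""

    def __init__(self, nblocks, blocks_per_txn):
        self.remaining = nblocks   # blocks not yet dropped
        self.blocks_per_txn = blocks_per_txn
        self.in_txn = 0            # dropped in the running transaction
        self.committed = 0         # drops made durable by a commit

    def step(self):
        """Background worker: drop one block if the transaction has room."""
        if self.remaining and self.in_txn < self.blocks_per_txn:
            self.remaining -= 1
            self.in_txn += 1
            return True
        return False

    def sync(self):
        """Commit the running transaction only; later work is not awaited."""
        self.committed += self.in_txn
        self.in_txn = 0
```

For example, ToyTreeDrop(10, 4) followed by stepping until the transaction is full and one sync() leaves 4 blocks committed and 6 still waiting for future transactions.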

Yan, Zheng

* Re: Disk space accounting and subvolume delete
  2010-05-12  5:02     ` Yan, Zheng 
@ 2010-05-12 21:56       ` Mike Fleetwood
  2010-05-31 19:01       ` Bruce Guenter
  1 sibling, 0 replies; 9+ messages in thread
From: Mike Fleetwood @ 2010-05-12 21:56 UTC (permalink / raw)
  To: linux-btrfs

On 12 May 2010 06:02, Yan, Zheng <yanzheng@21cn.com> wrote:
> On Tue, May 11, 2010 at 11:45 PM, Bruce Guenter <bruce@untroubled.org> wrote:
>> On Tue, May 11, 2010 at 08:10:38AM +0800, Yan, Zheng  wrote:
>>> This is because the snapshot deleting ioctl only removes the a link.
>>
>> Right, I understand that.  That part is not unexpected, as it works just
>> like unlink would.  However...
>>
>>> The corresponding tree is dropped in the background by a kernel thread.
>>
>> The surprise is that 'sync', in any form I was able to try, does not
>> wait until all or even most of the I/O is completed.  Apparently the
>> standards spec for sync(2) says it is not required to wait for I/O to
>> complete, but AFAIK all other Linux FS do wait (the man page for sync(2)
>> implies as much, as does the info page for sync in glibc).
>>
>> The only way I've found so far to force this behavior is to unmount, and
>> that's rather intrusive to other users of the FS.
>>
>>> We could probably add another ioctl that waits until the tree has been
>>> completely dropped.
>>
>> Since the expected behavior for sync is to wait until all pending I/O
>> has been completed, I would argue this should be the default action for
>> sync.  Am I misunderstanding something?
>>
>
> Dropping a tree can be lengthy. It's not good to let sync wait for hours.
> For most linux FS, 'sync' just force an transaction/journal commit. I don't
> think they wait for large operations that can span multiple transactions to
> complete.

Disclaimer: I know nothing about the internals of Btrfs!

I have an analogy as a way of thinking about what deleting a snapshot
entails (which I hope isn't totally bogus).

Deleting a clone of a file system is not like unlinking a single file.
It is analogous to deleting a directory tree.  Syncing in the middle
of a recursive delete will wait for the in-flight I/O to complete, but
it would not wait for the unlink requests from the portion of the
directory tree not yet traversed.  The same would be true when the
kernel thread deletes the snapshot by recursing through its tree.

Mike

* Re: Disk space accounting and subvolume delete
  2010-05-12  5:02     ` Yan, Zheng 
  2010-05-12 21:56       ` Mike Fleetwood
@ 2010-05-31 19:01       ` Bruce Guenter
  2010-05-31 20:34         ` Mike Fedyk
  2010-06-01  2:32         ` Yan, Zheng 
  1 sibling, 2 replies; 9+ messages in thread
From: Bruce Guenter @ 2010-05-31 19:01 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 487 bytes --]

On Wed, May 12, 2010 at 01:02:07PM +0800, Yan, Zheng  wrote:
> Dropping a tree can be lengthy. It's not good to let sync wait for hours.
> For most linux FS, 'sync' just force an transaction/journal commit. I don't
> think they wait for large operations that can span multiple transactions to
> complete.

What happens to the consistency of the filesystem if a crash happens
during this process?

-- 
Bruce Guenter <bruce@untroubled.org>                http://untroubled.org/

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

* Re: Disk space accounting and subvolume delete
  2010-05-31 19:01       ` Bruce Guenter
@ 2010-05-31 20:34         ` Mike Fedyk
  2010-06-01  2:32         ` Yan, Zheng 
  1 sibling, 0 replies; 9+ messages in thread
From: Mike Fedyk @ 2010-05-31 20:34 UTC (permalink / raw)
  To: linux-btrfs

On Mon, May 31, 2010 at 12:01 PM, Bruce Guenter <bruce@untroubled.org> wrote:
> On Wed, May 12, 2010 at 01:02:07PM +0800, Yan, Zheng  wrote:
>> Dropping a tree can be lengthy. It's not good to let sync wait for hours.
>> For most linux FS, 'sync' just force an transaction/journal commit. I don't
>> think they wait for large operations that can span multiple transactions to
>> complete.
>
> What happens to the consistency of the filesystem if a crash happens
> during this process?

There's a good test case for you to try.  Let us know what you find.

* Re: Disk space accounting and subvolume delete
  2010-05-31 19:01       ` Bruce Guenter
  2010-05-31 20:34         ` Mike Fedyk
@ 2010-06-01  2:32         ` Yan, Zheng 
  1 sibling, 0 replies; 9+ messages in thread
From: Yan, Zheng  @ 2010-06-01  2:32 UTC (permalink / raw)
  To: linux-btrfs

On Tue, Jun 1, 2010 at 3:01 AM, Bruce Guenter <bruce@untroubled.org> wrote:
> On Wed, May 12, 2010 at 01:02:07PM +0800, Yan, Zheng  wrote:
>> Dropping a tree can be lengthy. It's not good to let sync wait for hours.
>> For most linux FS, 'sync' just force an transaction/journal commit. I don't
>> think they wait for large operations that can span multiple transactions to
>> complete.
>
> What happens to the consistency of the filesystem if a crash happens
> during this process?
>

This does not break the consistency of the filesystem. The next mount will
find the partially dropped tree and restart the dropping process.
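One way such a restart can work, assumed here for illustration, is to record the drop's position in every committed transaction, so a crash loses only the work done since the last commit. The toy model below shows that idea. It is not btrfs code, and it makes no claim about the real on-disk fields or layout.

```python
class ToyResumableDrop:
    """Toy model (not btrfs code) of crash-safe, resumable tree dropping."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.durable_progress = 0  # first undropped key index, as committed
        self.mem_progress = 0      # in-memory position, may run ahead of disk

    def drop_some(self, n):
        self.mem_progress = min(self.mem_progress + n, len(self.keys))

    def commit(self):
        # The progress marker is written atomically with the frees it covers.
        self.durable_progress = self.mem_progress

    def crash_and_mount(self):
        # Uncommitted progress is lost; resume from the committed position.
        self.mem_progress = self.durable_progress
        return self.keys[self.mem_progress:]  # what is left to drop
```

Re-dropping the uncommitted part after a crash is safe in this model because none of those frees were ever made durable.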

Yan, Zheng

Thread overview: 9+ messages
2010-05-10 18:23 Disk space accounting and subvolume delete Bruce Guenter
2010-05-10 18:50 ` Josef Bacik
2010-05-11  0:10 ` Yan, Zheng 
2010-05-11 15:45   ` Bruce Guenter
2010-05-12  5:02     ` Yan, Zheng 
2010-05-12 21:56       ` Mike Fleetwood
2010-05-31 19:01       ` Bruce Guenter
2010-05-31 20:34         ` Mike Fedyk
2010-06-01  2:32         ` Yan, Zheng 
