linux-fsdevel.vger.kernel.org archive mirror
* [TOPIC LPC] Filesystem Shrink
@ 2021-09-08  7:27 Allison Henderson
  2021-09-08 17:31 ` Amir Goldstein
  2021-09-08 22:25 ` Andreas Dilger
  0 siblings, 2 replies; 5+ messages in thread
From: Allison Henderson @ 2021-09-08  7:27 UTC
  To: linux-fsdevel

Hi All,

Earlier this month I sent out an LPC microconference proposal for
file system shrink.  It sounds like the talk is of interest, but folks
recommended I forward the discussion to fsdevel for more feedback.
Below is the abstract for the talk:


File system shrink allows a file system to be reduced in size by a
specified number of blocks, as long as the file system has enough
unallocated space to do so.  This operation is currently unsupported in
xfs.  Though a file system can be backed up and recreated at a smaller
size, this is not functionally the same as an in-place resize.
Implementing this feature is costly in terms of developer time and
resources, so it is important to consider the motivations for
implementing it.  This talk aims to discuss user stories for this
feature.  What are the possible cases for a user needing to shrink the
file system after creation, and by how much?  Can these requirements be
satisfied with a simpler mkfs option to back up an existing file system
into a new but smaller filesystem?  In the case of creating a rootfs,
will a protofile suffice?  If the shrink feature is needed, we should
further discuss the APIs that users would need.

Beyond the user stories, it is also worth discussing implementation
challenges.  Reflink and parent pointers can facilitate shrink
operations, but is it reasonable to make them requirements for
shrink?  Gathering feedback and addressing these challenges will help
guide future development efforts for this feature.


Comments and feedback are appreciated!
Thanks!

Allison


* Re: [TOPIC LPC] Filesystem Shrink
  2021-09-08  7:27 [TOPIC LPC] Filesystem Shrink Allison Henderson
@ 2021-09-08 17:31 ` Amir Goldstein
  2021-09-09  2:34   ` Dave Chinner
  2021-09-08 22:25 ` Andreas Dilger
  1 sibling, 1 reply; 5+ messages in thread
From: Amir Goldstein @ 2021-09-08 17:31 UTC
  To: Allison Henderson; +Cc: linux-fsdevel, Dave Chinner

On Wed, Sep 8, 2021 at 10:51 AM Allison Henderson
<allison.henderson@oracle.com> wrote:
> [...]

Hi Allison,

That sounds like an interesting topic for discussion.
It reminds me of a cool proposal that Dave posted a while back [1]
about limiting the thin-provisioned disk usage of xfs.

I imagine that online shrinking would involve limiting new block
allocations to a certain blockdev offset (or AG), am I right?
I wonder how statfs() is going to present the available/free blocks
information in that state.

If high blocks are presented as free, then users may encounter
surprising ENOSPC.
If all high blocks are presented as used, then removing files
in high space won't free up available disk space.
There is an option to reduce total size and present the high blocks
as overcommitted disk usage, but that is going to be weird...

Have you spent any time considering these user-visible
implications?
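The three options above can be compared with a small hypothetical model. This is a sketch only, not kernel code: `ShrinkState`, `report()` and the policy names are invented for illustration, and the two returned numbers merely stand in for the `f_blocks` and `f_bfree` fields of `struct statfs`.

```python
from dataclasses import dataclass


@dataclass
class ShrinkState:
    """Hypothetical block accounting while a shrink is pending (units: fs blocks)."""
    low_blocks: int   # blocks below the shrink boundary (the part that stays)
    high_blocks: int  # blocks above the boundary (the part to be removed)
    low_used: int
    high_used: int

    @property
    def low_free(self) -> int:
        return self.low_blocks - self.low_used

    @property
    def high_free(self) -> int:
        return self.high_blocks - self.high_used


def report(state: ShrinkState, policy: str) -> tuple[int, int]:
    """Return (f_blocks, f_bfree) as a hypothetical reporting policy would."""
    total = state.low_blocks + state.high_blocks
    if policy == "high_free":
        # High free blocks still count as free: the numbers look normal, but
        # allocations can fail with ENOSPC while f_bfree is still positive.
        return total, state.low_free + state.high_free
    if policy == "high_used":
        # Every high block counts as used: deleting a file that lives in the
        # high region leaves f_bfree unchanged.
        return total, state.low_free
    if policy == "shrunk":
        # Report the post-shrink size and charge data still in the high region
        # against it; the result can go negative ("overcommitted"), which the
        # unsigned statfs fields cannot represent.
        return state.low_blocks, state.low_free - state.high_used
    raise ValueError(f"unknown policy: {policy}")
```

With 20 blocks of data in the high region and 10 free blocks below the boundary, "shrunk" reports -10 free blocks; deleting a 15-block file from the high region changes f_bfree under "high_free" but not under "high_used", which is the trade-off described above.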

Thanks,
Amir.

[1] https://lore.kernel.org/linux-xfs/20171026083322.20428-1-david@fromorbit.com/


* Re: [TOPIC LPC] Filesystem Shrink
  2021-09-08  7:27 [TOPIC LPC] Filesystem Shrink Allison Henderson
  2021-09-08 17:31 ` Amir Goldstein
@ 2021-09-08 22:25 ` Andreas Dilger
  2021-09-14  6:12   ` Allison Henderson
  1 sibling, 1 reply; 5+ messages in thread
From: Andreas Dilger @ 2021-09-08 22:25 UTC
  To: Allison Henderson; +Cc: linux-fsdevel

On Sep 8, 2021, at 1:27 AM, Allison Henderson <allison.henderson@oracle.com> wrote:
> [...]

This is an issue that has come up occasionally in the past, and more
frequently these days because of virtualization: "accidental resize"
kinds of mistakes, or an installer formatting a huge root filesystem
when the user would rather carve off separate filesystems for more
robustness (e.g. so /var/log and /var/tmp don't fill the single root
filesystem and cause the system to fail).

There was some prototype work on a "lazy" online shrink mechanism
for ext4 that essentially just prevented block allocations at the
end of the filesystem.  This required userspace to move any files
and inodes that were beyond the high watermark, and then, some time
later, either do the shrink offline once the end of the filesystem
was empty, or later enhance the online resize code to remove unused
block groups at the end of the filesystem.  This turns out to be less
complex than one might expect if the filesystem is already mostly empty,
which is true in the majority of real use cases ("accidental resize"
or "huge root partition" cases).
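A rough model of the "lazy" approach described above, assuming only block counts matter (inodes are ignored) and that a single group index acts as the high watermark. `Group`, `LazyShrinkFS` and its methods are hypothetical names, not the ext4 prototype's code.

```python
import errno
from dataclasses import dataclass


@dataclass
class Group:
    """One block group (hypothetical model: block counts only, no inodes)."""
    size: int
    used: int = 0

    @property
    def free(self) -> int:
        return self.size - self.used


@dataclass
class LazyShrinkFS:
    groups: list[Group]
    watermark: int  # groups with index >= watermark get no new allocations

    def allocate(self, nblocks: int) -> int:
        """Allocate in the first group below the watermark with enough room.

        Raises OSError(ENOSPC) when nothing below the watermark fits, even if
        groups above the watermark still have free blocks.
        """
        for i, g in enumerate(self.groups[: self.watermark]):
            if g.free >= nblocks:
                g.used += nblocks
                return i
        raise OSError(errno.ENOSPC, "no space below the high watermark")

    def groups_to_evacuate(self) -> list[int]:
        """Groups above the watermark that userspace still has to empty."""
        return [i for i in range(self.watermark, len(self.groups))
                if self.groups[i].used > 0]

    def removable_tail(self) -> int:
        """Number of empty groups at the end that a later shrink could drop."""
        count = 0
        for g in reversed(self.groups):
            if g.used:
                break
            count += 1
        return count
```

With four 10-block groups and the watermark at group 2, new allocations land only in groups 0 and 1; once userspace has emptied group 2, `removable_tail()` reports two trailing empty groups that an offline shrink, or a future group-remove ioctl, could drop.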

There is an old patch available in Patchwork and some discussion
about what would be needed to make it suitable for production use:

https://patchwork.ozlabs.org/project/linux-ext4/patch/9ba7e5de79b8b25e335026d57ec0640fc25e5ce0.1534905460.git.jaco@uls.co.za/

I don't think it would need a huge effort to update that patch and add
the minor changes that are needed to make it really usable (stop inode
allocations beyond the high watermark, add a group remove ioctl, etc.)

Cheers, Andreas


* Re: [TOPIC LPC] Filesystem Shrink
  2021-09-08 17:31 ` Amir Goldstein
@ 2021-09-09  2:34   ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2021-09-09  2:34 UTC
  To: Amir Goldstein; +Cc: Allison Henderson, linux-fsdevel

On Wed, Sep 08, 2021 at 08:31:24PM +0300, Amir Goldstein wrote:
> On Wed, Sep 8, 2021 at 10:51 AM Allison Henderson
> <allison.henderson@oracle.com> wrote:
> >
> > Hi All,
> >
> > Earlier this month I had sent out a lpc micro conference proposal for
> > file system shrink.  It sounds like the talk is of interest, but folks
> > recommended I forward the discussion to fsdevel for more feed back.
> > Below is the abstract for the talk:
> >
> >
> > File system shrink allows a file system to be reduced in size by some
> > specified size blocks as long as the file system has enough unallocated
> > space to do so.  This operation is currently unsupported in xfs.  Though
> > a file system can be backed up and recreated in smaller sizes, this is
> > not functionally the same as an in place resize.  Implementing this
> > feature is costly in terms of developer time and resources, so it is
> > important to consider the motivations to implement this feature.  This
> > talk would aim to discuss any user stories for this feature.  What are
> > the possible cases for a user needing to shrink the file system after
> > creation, and by how much?  Can these requirements be satisfied with a
> > simpler mkfs option to backup an existing file system into a new but
> > smaller filesystem?

That has been the traditional answer - create a new block device,
mkfs, xfsdump/xfsrestore and off you go. It has the benefit of only
needing to read/write data once, but has the downside that it does
not keep extent sharing information (reflink/dedupe) intact.

The problem I hear about most regularly these days is management of
cloudy stuff, where there aren't new block devices and/or storage
space available, so this mechanism is not really available for use.
The ideal solution for these environments is sparse storage plus
fstrim to release storage space that the filesystem isn't using,
which does not require shrink at all, but that seems difficult because
cloudy management interfaces don't seem to have a concept of
storage consumption vs assigned filesystem capacity....

> > In the cases of creating a rootfs, will a protofile
> > suffice?  If the shrink feature is needed, we should further discuss the
> > APIs that users would need.
> >
> > Beyond the user stories, it is also worth discussing implementation
> > challenges.  Reflink and parent pointers can assist in facilitating

"Reflink, reverse mapping and parent pointers"....

> > shrink operations, but is it reasonable to make them requirements for
> > shrink?

If the filesystem metadata is larger than what can be cached in
memory, then the only way of doing a performant shrink operation is
to have rmap (for GETFSMAP queries) and parent pointers (for path
name reconstruction).

Indeed, GETFSMAP is the only way we can find owners of all the
metadata in the AG that needs to be moved, as some per-inode metadata
can otherwise be invisible to userspace (e.g. BMBT blocks).  Finding
such metadata without rmapbt support requires GETFSMAP to implement
a brute-force in-kernel used space scanner to identify such blocks
to report their owners. That's a lot of new code just to replicate
what rmapbt already does...
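The query userspace needs can be sketched as a range lookup over reverse-mapping records. The real interface is the `FS_IOC_GETFSMAP` ioctl; the `Mapping` record, its owner encoding and the helper names below are simplified and hypothetical, and the AG layout assumes equal-size AGs placed one after another.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    """One reverse-mapping record (simplified; not the real struct fsmap)."""
    start: int             # first physical fs block
    length: int            # number of blocks
    owner: Optional[int]   # owning inode number, None for free space
    kind: str = "data"     # "data", "bmbt" (extent map btree block) or "free"


def ag_range(agno: int, agblocks: int) -> tuple[int, int]:
    """Half-open block range of AG `agno`, assuming equal-size AGs laid out in order."""
    return agno * agblocks, (agno + 1) * agblocks


def inodes_to_relocate(rmap: list[Mapping], first_removed_ag: int,
                       agcount: int, agblocks: int) -> set[int]:
    """Inodes owning any data or metadata block in AGs first_removed_ag..agcount-1."""
    lo, _ = ag_range(first_removed_ag, agblocks)
    _, hi = ag_range(agcount - 1, agblocks)
    return {m.owner for m in rmap
            if m.owner is not None and m.start < hi and m.start + m.length > lo}
```

For example, with `agblocks=100` and AGs 2 and 3 being removed, a "bmbt" record for inode 132 at block 250 makes inode 132 a relocation candidate even if all of its data extents sit below block 200; without reverse mapping, that block would be invisible to a userspace scan.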

Really, though, userspace should just rely on having GETFSMAP tell
it everything that needs to move. Support for filesystems
that don't support rmapbt and require dumb, brute force searches to
provide the information can be added in future as they aren't
actually required to implement a working shrink algorithm.

As for reflink, it's been the default for a few years now, so making
shrink require it so that it can do atomic data movement in
userspace without any additional kernel support requirements doesn't
seem particularly bothersome to me...

> > Gathering feedback and addressing these challenges will help
> > guide future development efforts for this feature.
> >
> >
> > Comments and feedback are appreciated!
> > Thanks!
> >
> 
> Hi Allison,
> 
> That sounds like an interesting topic for discussion.
> It reminds me of a cool proposal that Dave posted a while back [1]
> about limiting the thin provisioned disk usage of xfs.

That's a different kettle of fish altogether - it allows for the
filesystem to grow and shrink logically, not physically, and has a
fundamental requirement for a sparse block device to decouple the
filesystem LBA from the physical storage LBAs. In the extreme,
the filesystem still needs a physical shrink operation if the user
requires the sparse device size to change....

> I imagine that online shrinking would involve limiting new block
> allocations to a certain blockdev offset (or AG) am I right?

Sort of.

We do need to limit new _user_ allocations (data and metadata) in
AGs that we are going to shrink away. We still need to be able
to atomically move data and metadata out of those AGs and that may
require allocation of new AG internal metadata to facilitate. e.g.
modifying freespace, rmaps, refcounts, etc can all require
allocation of new btree blocks in the offline AG.
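A minimal sketch of that allocation policy, assuming three coarse allocation classes. `AllocKind`, `may_allocate()` and `pick_ag_for_user()` are hypothetical; the rotor-style scan is just one plausible way to skip offline AGs, not XFS's actual allocator.

```python
from enum import Enum, auto
from typing import Optional


class AllocKind(Enum):
    USER_DATA = auto()      # file data written by applications
    USER_METADATA = auto()  # inodes, directory blocks, per-file extent map blocks
    AG_INTERNAL = auto()    # the AG's own free-space, rmap and refcount btree blocks


def may_allocate(agno: int, kind: AllocKind, offline: set[int]) -> bool:
    """Offline AGs refuse user allocations but still accept AG-internal ones,
    since moving data out of an AG updates that AG's own btrees."""
    if agno not in offline:
        return True
    return kind is AllocKind.AG_INTERNAL


def pick_ag_for_user(nblocks: int, free: list[int], offline: set[int],
                     start: int = 0) -> Optional[int]:
    """Rotor scan for a user allocation, skipping offline AGs.

    AG-internal blocks are not placed this way: they always come from the AG
    whose btree is being modified. Returns None if no online AG has room.
    """
    agcount = len(free)
    for i in range(agcount):
        agno = (start + i) % agcount
        if may_allocate(agno, AllocKind.USER_DATA, offline) and free[agno] >= nblocks:
            return agno
    return None
```

With AGs 2 and 3 offline, a user allocation that does not fit in AG 0 or AG 1 gets no AG at all (an ENOSPC candidate), while a refcount btree split inside AG 3 is still permitted.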

> I wonder, how is statfs() going to present the available/free blocks
> information in that state?

No matter what we do, it will be "wrong" for someone.

In the current design, visible filesystem size does not change until
the final stage where the physical space is atomically removed via a
recoverable transaction.  There are several reasons for this, not the
least of which is that turning off allocation is intended to be used
by more than just shrink, e.g. an AG could be offline for repair, etc.

As it is, ENOSPC can already happen when there are heaps of free
space available in the filesystem, e.g. reflink copies can fail with
ENOSPC because there isn't space in the AG for the new AG-internal
refcount or rmap records to be recorded in the relevant AG btrees.

Indeed, the only way we are going to know if shrink cannot move all the
data out of the AGs we want to shrink away is to have all the other
AGs hit "AG full and no other allocation candidate" ENOSPC
conditions during data movement.

e.g. we start a shrink by checking if there's space available in the
lower AGs for all the data that needs to be moved (via
XFS_IOC_AG_GEOMETRY) so we know it should succeed. But if the user
starts consuming space after this check, there's every chance that
the shrink is going to fail because there is no longer enough space
available in the lower AGs to move all the data.
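The pre-check can be written as a comparison between used space in the AGs being removed and free space in the AGs that remain. `AGInfo` and `shrink_precheck()` are hypothetical stand-ins for what a tool might derive from per-AG geometry queries such as `XFS_IOC_AG_GEOMETRY`; the sketch ignores metadata overhead entirely.

```python
from dataclasses import dataclass


@dataclass
class AGInfo:
    """Per-AG summary, loosely modelled on the length and free-block counts
    an AG geometry query reports (hypothetical class, not the real struct)."""
    blocks: int
    free: int

    @property
    def used(self) -> int:
        return self.blocks - self.free


def shrink_precheck(ags: list[AGInfo], new_agcount: int) -> bool:
    """True if the surviving AGs currently have room for everything stored in
    the AGs being removed. A point-in-time snapshot only: it ignores metadata
    overhead and cannot account for allocations made after the check."""
    to_move = sum(ag.used for ag in ags[new_agcount:])
    room = sum(ag.free for ag in ags[:new_agcount])
    return to_move <= room
```

With four 100-block AGs holding 70, 60, 20 and 10 used blocks, removing the last two AGs needs room for 30 blocks and the first two have 70 free, so the check passes. If applications then use up 50 of those free blocks, the same data no longer fits, and the shrink fails part way through despite the earlier check.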

Changing what statfs() reports isn't going to fix/prevent problems
like this...

> If high blocks are presented as free then users may encounter
> surprising ENOSPC.
> If all high blocks are presented as used, then removing files
> in high space, won't free up available disk space.

Yup. And if you present them as used the userspace data movement
algorithm may not be able to make progress even when there is still
internal space available in the remaining AGs that could be used.

> There is an option to reduce total size and present the high blocks
> as over committed disk usage, but that is going to be weird...

Not to mention complex to account for and incredibly fragile to
maintain.

Of course, I haven't really even mentioned shrink failure semantics.
If the data movement fails because of a transient ENOSPC condition,
should the applications even be aware that a shrink was in progress?

> Have you spent any time considering these user visible
> implications?

An awful lot, in fact. Physically shrinking an active filesystem
cannot be done instantly, and so there are always going to be
situations where the behaviour we choose is going to be the wrong
choice for some user. Remember that the data movement part of a
physical shrink operation could take hours, days or even weeks to
complete; this is the dominating user visible implication of
physical shrinking...

The likelihood of a physical shrink failing is quite high - data
movement to empty physical space is not guaranteed to succeed.
There's all sorts of complexity around moving shared data extents
(reflink/deduped copies) that actually increase filesystem space
usage during a shrink (transient increase as well as permanent).
That can result in a shrink failing even though there's technically
enough free space in the lower AGs to complete the shrink...
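A simplified model of why shared extents make shrink space usage grow. It assumes a single extent of `length` blocks shared by `owners` files and ignores the refcount and rmap btree blocks mentioned above; `move_shared_extent()` is a hypothetical helper, not part of any real tool.

```python
def move_shared_extent(length: int, owners: int, preserve_sharing: bool) -> tuple[int, int]:
    """Return (peak_blocks, final_blocks) used by one shared extent while it is
    relocated, under a simplified model.

    preserve_sharing=True: copy once, remap every owner to the new copy, then
    free the old extent.
    preserve_sharing=False: copy and remap one owner at a time; the old extent
    stays allocated until its last owner has been remapped.
    """
    if length <= 0 or owners <= 0:
        raise ValueError("length and owners must be positive")
    if preserve_sharing:
        # old extent + one new copy, then the old one is freed
        return 2 * length, length
    # old extent + one copy per owner just before the old extent is freed
    return (owners + 1) * length, owners * length
```

For a 10-block extent shared by three files, a sharing-preserving move peaks at 20 blocks and ends at 10, while moving one owner at a time peaks at 40 and leaves 30 blocks permanently used: a 20-block increase that can make a shrink fail even when the lower AGs looked big enough.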

So when you take into account the likelihood of failure, transient
ENOSPC conditions during a shrink, the heavy impact on performance
the data movement will have, the difficulty in doing atomic
relocation on actively modified files and directories, etc., the
answer to all these problems is "don't run shrink on production
filesystems", i.e. "online" only means the filesystem is mounted
while the shrink runs, not that it's something you run in
production...

With that in mind, worrying about how applications react to shrink
changing the allocation patterns and the amount of space available
is pretty much the least of my concerns at this point in time...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [TOPIC LPC] Filesystem Shrink
  2021-09-08 22:25 ` Andreas Dilger
@ 2021-09-14  6:12   ` Allison Henderson
  0 siblings, 0 replies; 5+ messages in thread
From: Allison Henderson @ 2021-09-14  6:12 UTC
  To: Andreas Dilger; +Cc: linux-fsdevel



On 9/8/21 3:25 PM, Andreas Dilger wrote:
> [...]
> 
> There is an old a patch available in Patchworks and some discussion
> about what would be needed to make it suitable for production use:
> 
> https://patchwork.ozlabs.org/project/linux-ext4/patch/9ba7e5de79b8b25e335026d57ec0640fc25e5ce0.1534905460.git.jaco@uls.co.za/
> 
> I don't think it would need a huge effort to update that patch and add
> the minor changes that are needed to make it really usable (stop inode
> allocations beyond the high watermark, add a group remove ioctl, etc.)
> 
> Cheers, Andreas
> 
I see, thanks for the link.  I didn't know there had been effort made on
the ext4 side; I had been looking more at the xfs implementation.  This
certainly looks like it might be a good starting point for ext4, and it
seems both solutions need to limit user allocations one way or another.
I suspect both will also have to deal with the statfs reporting
challenges discussed for the xfs approach.  Virtualization issues
seem to be a common motivator for shrink support in both fs types, which
is a good indication that it is a worthwhile pursuit.  Perhaps, then, this
discussion topic will be of interest for both the xfs and ext4 solutions.

Thanks for the feedback!
Allison




end of thread

Thread overview: 5+ messages
2021-09-08  7:27 [TOPIC LPC] Filesystem Shrink Allison Henderson
2021-09-08 17:31 ` Amir Goldstein
2021-09-09  2:34   ` Dave Chinner
2021-09-08 22:25 ` Andreas Dilger
2021-09-14  6:12   ` Allison Henderson
