linux-lvm.redhat.com archive mirror
* [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
@ 2018-07-26 16:31 Marc MERLIN
  2018-07-27 12:59 ` Zdenek Kabelac
  0 siblings, 1 reply; 15+ messages in thread
From: Marc MERLIN @ 2018-07-26 16:31 UTC (permalink / raw)
  To: linux-lvm

Still learning about thin volumes.
Why do I want my thin pool to get auto extended? Does "extended" mean
resized? 
Why would I want to set thin_pool_autoextend_threshold below 100 and
have the pool auto extend as needed, versus leaving it at 100, knowing
that the underlying block allocation will fail if I run out of physical
blocks underneath?


Details:
I have a 14TB bcache block device.
On top, I'd like to put multiple btrfs filesystems.
There is however an issue with btrfs where it gets more unsafe (and
slower) to use if you have too many snapshots (over 50, and especially
over 100).
The workaround, sadly, is to have multiple separate filesystems, which
kind of negates the nice part where you make subvolumes and let them
grow independently.

So, I'm going to make about 10 thin volumes, one for each of my btrfs
subvolumes, so that they are all separate filesystems.
However, my plan is to make them all 14TiB in size so that I never have
to resize the filesystems, with the full understanding, of course, that
the sum of all of them still only has 14TiB underneath.

Right now, I'm getting this:
gargamel:~# lvcreate -V14TiB -T vgds2/thinpool2 -n debian64
  Using default stripesize 64.00 KiB.
  WARNING: Sum of all thin volume sizes (28.00 TiB) exceeds the size of thin pool vgds2/thinpool2 and the size of whole volume group (14.55 TiB).
  WARNING: You have not turned on protection against thin pools running out of space.
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
  Logical volume "debian64" created.

I'm looking at lvm.conf, and I'll be honest that it's not clear to me:
# Configuration option activation/thin_pool_autoextend_threshold.
# Auto-extend a thin pool when its usage exceeds this percent.
# Setting this to 100 disables automatic extension.
# The minimum value is 50 (a smaller value is treated as 50.)
# Also see thin_pool_autoextend_percent.
# Automatic extension requires dmeventd to be monitoring the LV.
# 
# Example
# Using 70% autoextend threshold and 20% autoextend size, when a 1G
# thin pool exceeds 700M, it is extended to 1.2G, and when it exceeds
# 840M, it is extended to 1.44G:
# thin_pool_autoextend_threshold = 70
# 
thin_pool_autoextend_threshold = 100
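
For reference, I gather from the comments above (I haven't tested this)
that turning it on would mean setting the pair of options the example
mentions, and that the effective values can be checked with lvmconfig.
The values below are just the documented example, not what I'm running:

  thin_pool_autoextend_threshold = 70
  thin_pool_autoextend_percent = 20

  gargamel:~# lvmconfig activation/thin_pool_autoextend_threshold
  gargamel:~# lvmconfig activation/thin_pool_autoextend_percent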

What's the downside of just leaving it at 100?

Thanks
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
  2018-07-26 16:31 [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ? Marc MERLIN
@ 2018-07-27 12:59 ` Zdenek Kabelac
  2018-07-27 18:26   ` Marc MERLIN
  0 siblings, 1 reply; 15+ messages in thread
From: Zdenek Kabelac @ 2018-07-27 12:59 UTC (permalink / raw)
  To: LVM general discussion and development, Marc MERLIN

On 26.7.2018 at 18:31, Marc MERLIN wrote:
> Still learning about thin volumes.
> Why do I want my thin pool to get auto extended? Does "extended" mean
> resized?

yes   extension == resize


> Why would I want to have thin_pool_autoextend_threshold below 100 and
> have it auto extend as needed vs having all of them be at 100, knowing
> that underlying block allocation will fail if I run out of physical
> blocks underneath?

Hi

man lvmthin.


In general - do not plan on 'running out of space on thin-pool' being part
of your regular daily workflow.

Running out of space in a thin-pool (data, and even more so metadata) will
always have a MAJOR impact on the usability of your system. It's always an
unpleasant moment and it's not even closely comparable to running out of
space in your filesystem - it's a much more problematic case - so you
should try to avoid it at all costs.

If you want to live on the corner case of running out of space, thin-pool
is probably not the best technology to use.




> 
> 
> Details:
> I have a 14TB bcache block device.
> On top, I'd like to put multiple btrfs filesystems.

IMHO it's a bad plan to combine 2 overprovisioning technologies.

btrfs HAS its own built-in volume manager (i.e. its own lvm-like layer built in)


> There is however an issue with btrfs where it gets more unsafe (and
> slower) to use if you have too many snapshots (over 50, and especially
> over 100).

It's better to pair thin-pool with ext4 or XFS.

BTRFS will suffer great pain from lvm2 snapshots - btrfs will see the very
same block device present multiple times in your system - so I'd highly
discourage using thin-pool with btrfs unless you are very well aware of
the weaknesses and can avoid running into them...


> I'm looking at lvm.conf, and I'll be honest that it's not clear
> # Configuration option activation/thin_pool_autoextend_threshold.
> # Auto-extend a thin pool when its usage exceeds this percent.
> # Setting this to 100 disables automatic extension.
> # The minimum value is 50 (a smaller value is treated as 50.)
> # Also see thin_pool_autoextend_percent.
> # Automatic extension requires dmeventd to be monitoring the LV.
> #
> # Example
> # Using 70% autoextend threshold and 20% autoextend size, when a 1G
> # thin pool exceeds 700M, it is extended to 1.2G, and when it exceeds
> # 840M, it is extended to 1.44G:
> # thin_pool_autoextend_threshold = 70
> #
> thin_pool_autoextend_threshold = 100
> 
> What's the downside of just leaving it at 100?

Possible loss of your data in case you run out of space and hit some
corner cases - note that only with the 4.18 kernel will one quite annoying
bug be fixed, involving use of TRIM on a full pool, which could lead to
some problematic metadata recovery.

Regards

Zdenek

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
  2018-07-27 12:59 ` Zdenek Kabelac
@ 2018-07-27 18:26   ` Marc MERLIN
  2018-07-27 19:31     ` John Stoffel
  2018-07-31  2:44     ` Marc MERLIN
  0 siblings, 2 replies; 15+ messages in thread
From: Marc MERLIN @ 2018-07-27 18:26 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: LVM general discussion and development

Hi Zdenek,

Thanks for your helpful reply.

On Fri, Jul 27, 2018 at 02:59:28PM +0200, Zdenek Kabelac wrote:
> Dne 26.7.2018 v 18:31 Marc MERLIN napsal(a):
> >Still learning about thin volumes.
> >Why do I want my thin pool to get auto extended? Does "extended" mean
> >resized?
> 
> yes   extension == resize
 
Gotcha. Then I don't want to have to worry about my filesystem being resized
multiple times, especially since I'm not sure how it will help.
 
> man lvmthin.
 
Thanks. I had read it, but not carefully enough.
So, I just re-read "Automatic extend settings".
I'm still not entirely sure how using extension would help me there. I
can't set it to 10% for all 10 filesystems (50% is the minimum).
If I set it to anything less than 100%, it could be that it blocks later,
tries to extend and resize, but ultimately I'll still have multiple
filesystems that together exceed the space available, so I can run out.
I'm not seeing how the automatic extend setting is helpful, at least in my case.
Am I missing something?

To be clear, my case is that I will have 10 filesystems holding data that
used to live in a single filesystem which I sadly must segment now. More
than a few will take more than 1/10th of the space, but I don't want to have
to worry about which ones are going to use how much, as long as all together
they stay below 100% of course.
I don't want to have to manage space for each of those 10 and have to resize
them by hand multiple times up and down to share the space, hence dm-thin.

My understanding is that I have to watch this carefully:
  LV Name                thinpool2  
  VG Name                vgds2
  LV Pool metadata       thinpool2_tmeta
  LV Pool data           thinpool2_tdata
  LV Status              available
  # open                 8
  LV Size                14.50 TiB
  Allocated pool data    20.26%
  Allocated metadata     10.66%

I'll have to make sure to run fstrim so that 'Allocated pool data' never
gets too high.
Metadata I need to read more about, to see whether it may become a problem.
I think as long as I don't use LVM snapshots I should be ok (and I won't).
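
For instance - just a sketch, the field names come from lvs(8) and the
pool name is mine from above - I could watch both numbers from cron with
something like:

  gargamel:~# lvs -o lv_name,data_percent,metadata_percent vgds2/thinpool2

and alert if either percentage crosses some limit.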

> Running out-of-space in thin-pool (data and even more on metadata) will 
> have always MAJOR impact on usability of your system. It's always 
> unpleasant moment and it's not even closely comparable with something like 
> running out-of-space in your filesystem - it's much more problematic case - 
> so you should at all cost try to avoid it.
 
Thanks for confirming.
I suppose in my case I should set 'errorwhenfull y' so that the FS immediately
remounts read-only on write failure. Delaying for up to 60 seconds is not
going to help in my case.
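
If I'm reading lvchange(8) right - untested, pool name as above - that
would be something along the lines of:

  gargamel:~# lvchange --errorwhenfull y vgds2/thinpool2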

> If you want to be living on corner case of out-of-space, thin-pool is 
> probably not the best technology for use.
 
I don't want to be using dm-thin at all, but I have too many subvolumes for
a single btrfs filesystem, so I need to segment my btrfs filesystem into 10
or so, to be safe (as discussed with btrfs developers).
 
> IMHO bad plan to combine 2 overprovisioning technologies together.
> btrfs HAS its own built-in volume manager  (aka built-in it's own like lvm)
 
btrfs does not overprovision, and sadly I found out that if you have more
than 50 or 100 snapshots, you are going to run into problems with balancing,
and bigger problems with filesystem corruption and repair later (as I found
out over the last 3 weeks dealing with this).
 
> >There is however an issue with btrfs where it gets more unsafe (and
> >slower) to use if you have too many snapshots (over 50, and especially
> >over 100).
> 
> It's better to pair  thin-pool with ext4 of  XFS.
 
I need btrfs send/receive, so that's not an option.

> BTRFS will suffer great pain from problems of lvm2 snapshots - where btrfs 

I will not be using lvm snapshots at all.

> will see the very same block device multiple times present in your system - 
> so I'd highly discourage usage of thin-pool with btrfs unless you are very 
> well aware of the weaknesses and you can avoid running into them...

I'm only using thin-pool to allow dynamic block allocation for
overprovisioning. I will use no other LVM feature. Is that ok?

> Possible lose of your data in case you run out of space and you hit some 
> corner cases - note just with 4.18 kernel will be fixed one quite annoying 
> bug with usage of  TRIM and full pool which could have lead to some 
> problematic metadata recovery.

So, as long as I run trim in btrfs and make very sure I don't run out of blocks
on the VG side, should I be safe-ish enough?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
  2018-07-27 18:26   ` Marc MERLIN
@ 2018-07-27 19:31     ` John Stoffel
  2018-07-27 19:58       ` Marc MERLIN
  2018-07-31  4:52       ` Chris Murphy
  2018-07-31  2:44     ` Marc MERLIN
  1 sibling, 2 replies; 15+ messages in thread
From: John Stoffel @ 2018-07-27 19:31 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Zdenek Kabelac

>>>>> "Marc" == Marc MERLIN <marc@merlins.org> writes:

Marc> Hi Zdenek,
Marc> Thanks for your helpful reply.

Marc> On Fri, Jul 27, 2018 at 02:59:28PM +0200, Zdenek Kabelac wrote:
>> Dne 26.7.2018 v 18:31 Marc MERLIN napsal(a):
>> >Still learning about thin volumes.
>> >Why do I want my thin pool to get auto extended? Does "extended" mean
>> >resized?
>> 
>> yes   extension == resize
 
Marc> Gotcha. Then I don't want to have to worry about my filesystem being resized
Marc> multiple times, especially since I'm not sure how it will help.
 
>> man lvmthin.
 
Marc> Thanks. Had read it, but not carefully enough.
Marc> So, I just re-read "Automatic extend settings"
Marc> I'm still I'm not entirely sure how using extension would help me there. I
Marc> can't set it to 10% for all 10 filesystems (50% is minimum).
Marc> If I set it to anything less than 100%, it could later that it can block,
Marc> and try to extend and resize later, but ultimately I'll still have multiple
Marc> filesystems that together exceed the space available, so I can run out.
Marc> I'm not seeing how the automatic extend setting is helpful, at least in my case.
Marc> Am I missing something?

Marc> To be clear, my case is that I will have 10 filesystems in a
Marc> place where the same data was in a single filesystem that sadly
Marc> I must segment now. More than a few will take more than 1/10th
Marc> of the space, but I don't want to have to worry about which ones
Marc> are going to use how much as long as all together they stay
Marc> below 100% of course.

Marc> I don't want to have to manage space for each of those 10 and
Marc> have to resize them by hand multiple times up and down to share
Marc> the space, hence dm-thin.

Why don't you run quotas on your filesystems?  Also, none of the
filesystems in Linux land that I'm aware of supports shrinking the
filesystem while live; it's all an unmount, shrink FS, shrink volume
(carefully!) and then re-mount the filesystem.

But again, I think you might really prefer quotas instead, unless you
need complete logical separation.

John

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
  2018-07-27 19:31     ` John Stoffel
@ 2018-07-27 19:58       ` Marc MERLIN
  2018-07-27 21:09         ` John Stoffel
  2018-07-31  4:52       ` Chris Murphy
  1 sibling, 1 reply; 15+ messages in thread
From: Marc MERLIN @ 2018-07-27 19:58 UTC (permalink / raw)
  To: LVM general discussion and development, john; +Cc: Zdenek Kabelac

On Fri, Jul 27, 2018 at 03:31:36PM -0400, John Stoffel wrote:
> Why don't you run quotas on your filesystems?  Also, none of the
> filesystems in Linux land that I'm aware of supports shrinking the
> filesystem while live, it's all a unmount, shrink FS, shrink volume
> (carefully!) and then re-mount the filesystem.
 
Those filesystems can be unmounted, so shrinking while live is not
something I need, even if btrfs might actually support it.

> But again, I think you might really prefer quotas instead, unless you
> need complete logical seperation.

Since I know more than I wish I did about btrfs :) let me explain a bit
more

0) I will not be using lvm for its own snapshot capabilities, or resize.
I'm cheating by using overcommit with dm-thin so that I don't have to worry
about segmenting space between each filesystem, or about shrinking one to
grow another from time to time.

1) quotas don't work well on btrfs when you have snapshots, and by that
I mean btrfs snapshots. Because blocks are shared between snapshots,
calculating quotas is a performance problem.

2) I don't have a space or quota problem on btrfs, the problem I have is
I use btrfs send/receive a lot for backups (it's a backup server) and
history (go back a month ago or whatever).
See http://marc.merlins.org/perso/btrfs/post_2014-03-22_Btrfs-Tips_-Doing-Fast-Incremental-Backups-With-Btrfs-Send-and-Receive.html
if you aren't familiar with btrfs send/receive backups.
Btrfs starts having performance problems for some operations (re-balance,
or fsck) when you have too many subvolumes (each snapshot creates a
subvolume).

3) I hit severe enough problems that filesystem checks were taking days
to complete, which was not workable. The only way around it is to have
fewer subvolumes.

4) because I still need the same amount of backups and want the same
amount of history, fewer subvolumes means moving each separate subvolume
into its own separate filesystem.

Then there is the last part: btrfs is still not super stable and can
have corruption problems (although in my case I had clear problems due
to an underlying unreliable SATA subsystem which caused writes not to
make it to all the blocks of each drive of a raid set, something that
even careful journalling does not deal with).
So, I have:

5) when things go wrong with btrfs, you're better off having smaller
filesystems with less data, as they are quicker to check and repair, as
well as quicker to rebuild if they are corrupted beyond repair
(btrfs can easily get into a state where all or most of your data is
still there read-only, but the filesystem has extent issues that can't
be fixed at this moment and require a rebuild).

Makes sense?

Am I crazy to want to use dm-thin the way I'm trying to? :)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
  2018-07-27 19:58       ` Marc MERLIN
@ 2018-07-27 21:09         ` John Stoffel
  2018-07-27 23:35           ` Marc MERLIN
  0 siblings, 1 reply; 15+ messages in thread
From: John Stoffel @ 2018-07-27 21:09 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Zdenek Kabelac, LVM general discussion and development

>>>>> "Marc" == Marc MERLIN <marc@merlins.org> writes:

Marc> On Fri, Jul 27, 2018 at 03:31:36PM -0400, John Stoffel wrote:
>> Why don't you run quotas on your filesystems?  Also, none of the
>> filesystems in Linux land that I'm aware of supports shrinking the
>> filesystem while live, it's all a unmount, shrink FS, shrink volume
>> (carefully!) and then re-mount the filesystem.
 
Marc> Those filesystems can be umounted, so shrinking while live is not
Marc> something I need even if btrfs might actually support it.

>> But again, I think you might really prefer quotas instead, unless you
>> need complete logical seperation.

Marc> Since I know more than I wish I did about btrfs :) let me explain a bit
Marc> more

Marc> 0) I will not be using lvm for its own snapshot capabilities, or
Marc> resize.  I'm cheating by using overcommit with dm-thin and not
Marc> wanting to worry about segmenting space between each fileystem
Marc> and having to worry about shrinking one to grow another one from
Marc> time to time.

Marc> 1) quotas don't work well on btrfs when you have snapshots, and
Marc> by that I mean btfrs snapshots. Because blocks are shared
Marc> between snapshots, calculating quotas is a performance problem.

Marc> 2) I don't have a space or quota problem on btrfs, the problem I
Marc> have is I use btrfs send/receive a lot for backups (it's a
Marc> backup server) and history (go back a month ago or whatever).
Marc> http://marc.merlins.org/perso/btrfs/post_2014-03-22_Btrfs-Tips_-Doing-Fast-Incremental-Backups-With-Btrfs-Send-and-Receive.html
Marc> if you aren't familiar with btrfs send/receive backups.  Btrfs
Marc> starts having performance problems for some operations
Marc> (re-balance, or fsck) when you have too many subvolumes (each
Marc> snapshot creates a subvolume).

That's the key part that I didn't realize.  And this is why I'm still
leery of btrfs (and zfs for that matter): as you push the limits,
they tend to fall off a cliff performance-wise, instead of degrading
more gracefully.  So you're obviously also using source btrfs
volume(s) for your data being backed up.  So I can understand what
you're trying to do...

So is it a single 14TB source btrfs volume, and did you make snapshots
on a rotating basis to the destination volumes?

Marc> 3) I hit severe enough problems that filesystem checks were
Marc> taking days to complete, which was not workable. The only way
Marc> around it is to have fewer subvolumes.

Ouch!  This is not an easy space to be in.  

Marc> 4) because I still need the same amount of backups and want the same
Marc> amount of history, fewer subvolumes means moving each separate subvolume
Marc> into its own separate filesystem.

So you're doing snapshots of source sub-volumes?  I figure you must be
running into performance problems no matter which end you're talking
about here, because the btrfs stuff is just going to bite you one way
or another.

Marc> Then there is the last part that btrfs is still not super stable
Marc> and can have corruption problems (although in my case I had
Marc> clear problems due to an underlying unreliable SATA subsystem
Marc> which caused writes not to make it to all the blocks of each
Marc> drive of a raid set, something that even careful journalling
Marc> does not deal with with).  So, I have:

Man, you love living dangerously!  *grin*

Marc> 5) when things go wrong with btrfs, you're better off having smaller
Marc> filesystems with less data as they are quicker to check and repair as
Marc> well we quicker to rebuild if they are corrupted beyond repair
Marc> (btrfs can easily get into a state where all or most of your data is
Marc> still there read only, but the filesystem has extent issues that can't
Marc> be fixed at this moment and require a rebuild)

Ouch!  You really enjoy living on the edge.  :-)   

Marc> Am I crazy to want to use dm-thin the way I'm trying to? :)

I think you're a little crazy using btrfs in this way, *grin*, since
losing my data is a big no-no in my world.  Personally, I love my Netapps
because they're super reliable and super easy to grow/shrink volumes on,
and snapshots just work, along with cloning volumes across to other
systems.

But I also agree that backups are a pain in the ass, no matter how you
look at it, and it's only gotten worse as filesystem sizes and file
counts have gone up, while underlying filesystems and such haven't
managed to keep up.

Good luck for sure!
John

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
  2018-07-27 21:09         ` John Stoffel
@ 2018-07-27 23:35           ` Marc MERLIN
  0 siblings, 0 replies; 15+ messages in thread
From: Marc MERLIN @ 2018-07-27 23:35 UTC (permalink / raw)
  To: John Stoffel; +Cc: Zdenek Kabelac, LVM general discussion and development

On Fri, Jul 27, 2018 at 05:09:54PM -0400, John Stoffel wrote:
> That's the key part that I didn't realize.  And this is why I'm still
> leary of btrfs (and zfs for that matter) since as you push the limits,
> they tend to fall off a cliff performance wise, instead of degrading
> more gracefully.  So you're obvisously also using source brtfs
> volume(s) for your data being backed up.  So can understand what
> you're trying to do...
> 
> So is it a single 14tb source btrfs volume, and did you make snapshots
> on a rotating basis to the destinations volumes?  
 
Maybe we should continue this on the btrfs list, I don't want to spam
people here who don't care about btrfs :) but I'll answer this and if we
continue, let's move lists if you don't mind.

btrfs send/receive needs a snapshot for each copy. I then have a script
that decides how many (X) of the older snapshots to keep - ones I don't
need anymore for send/receive to work, but that I keep around for posterity.

Snapshots do not actually cause performance issues that I've noticed day
to day with btrfs, but if you do quotas, or balance (which is a
complicated operation), or btrfsck, then the number of snapshots
matters, and performance gets hurt quite a bit if you have 270
snapshots, like I ended up having in the end :)

> Marc> 4) because I still need the same amount of backups and want the same
> Marc> amount of history, fewer subvolumes means moving each separate subvolume
> Marc> into its own separate filesystem.
> 
> So you're doing snapshots of source sub-volumes?  I figure you must be
> running into performance problems no matter which end you're talking
> about here, because the btrfs stuff is just going to bite you one way
> or another.
 
Not really, performance was fine. It was so much better than using
rsync (sometimes by 100x or more).
But yeah, send/receive makes a snapshot of the source, and leaves a
snapshot on the destination volume.
You can work with only 2 snapshots, but I keep more for historical
restores.

> Marc> Then there is the last part that btrfs is still not super stable
> Marc> and can have corruption problems (although in my case I had
> Marc> clear problems due to an underlying unreliable SATA subsystem
> Marc> which caused writes not to make it to all the blocks of each
> Marc> drive of a raid set, something that even careful journalling
> Marc> does not deal with with).  So, I have:
> 
> Man, you love living dangerously!  *grin*

Is it a good time to say that I actually use all of this on one
filesystem?

mdadm raid5
bcache
dmcrypt
dm-thin
lvm
btrfs

:)

> I think you're a little crazy using btrfs in this way, *grin* since
> losing my data is a big no-no in my world.  Personally love my Netapps
> because they're super reliable and super easy to grow-shrink volumes
> and snapshots just work, along with cloning volumes across to other
> systems.
 
I used to work at NetApp; they're great, but they don't work inside my
laptop, they're obviously not open source, and I'd rather avoid using
NFS if I can at this point (ok, they also do iSCSI).

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
  2018-07-27 18:26   ` Marc MERLIN
  2018-07-27 19:31     ` John Stoffel
@ 2018-07-31  2:44     ` Marc MERLIN
  2018-07-31 12:35       ` Zdenek Kabelac
  1 sibling, 1 reply; 15+ messages in thread
From: Marc MERLIN @ 2018-07-31  2:44 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: LVM general discussion and development

On Fri, Jul 27, 2018 at 11:26:58AM -0700, Marc MERLIN wrote:
> Hi Zdenek,
> 
> Thanks for your helpful reply.
 
Hi again Zdenek,

Just to confirm, am I going to be ok enough with the scheme I described
as long as I ensure that 'Allocated pool data' does not get to 100% ?

For now, I have my btrfs filesystems mounted with "discard", so
hopefully it should tell dm-thin when it can free up/reuse blocks.
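
The alternative I'm considering, if the discard mount option turns out to
be too costly, is a periodic trim from cron - just a sketch, the mount
point is made up:

  gargamel:~# fstrim -v /mnt/btrfs_pool2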

Given that, am I more or less ok using dm-thin that way?

And for my own understanding, is there any reason why I would even want
to consider thin_pool_autoextend_threshold < 100 ?

As a reminder, I have:
mdadm raid5
bcache
dmcrypt
dm-thin
lvm
btrfs

Thanks,
Marc

> On Fri, Jul 27, 2018 at 02:59:28PM +0200, Zdenek Kabelac wrote:
> > Dne 26.7.2018 v 18:31 Marc MERLIN napsal(a):
> > >Still learning about thin volumes.
> > >Why do I want my thin pool to get auto extended? Does "extended" mean
> > >resized?
> > 
> > yes   extension == resize
>  
> Gotcha. Then I don't want to have to worry about my filesystem being resized
> multiple times, especially since I'm not sure how it will help.
>  
> > man lvmthin.
>  
> Thanks. Had read it, but not carefully enough.
> So, I just re-read "Automatic extend settings"
> I'm still I'm not entirely sure how using extension would help me there. I
> can't set it to 10% for all 10 filesystems (50% is minimum).
> If I set it to anything less than 100%, it could later that it can block,
> and try to extend and resize later, but ultimately I'll still have multiple
> filesystems that together exceed the space available, so I can run out.
> I'm not seeing how the automatic extend setting is helpful, at least in my case.
> Am I missing something?
> 
> To be clear, my case is that I will have 10 filesystems in a place where the
> same data was in a single filesystem that sadly I must segment now. More
> than a few will take more than 1/10th of the space, but I don't want to have
> to worry about which ones are going to use how much as long as all together
> they stay below 100% of course.
> I don't want to have to manage space for each of those 10 and have to resize
> them by hand multiple times up and down to share the space, hence dm-thin.
> 
> My understanding is that I have to watch this carefully
>   LV Name                thinpool2  
>   VG Name                vgds2
>   LV Pool metadata       thinpool2_tmeta
>   LV Pool data           thinpool2_tdata
>   LV Status              available
>   # open                 8
>   LV Size                14.50 TiB
>   Allocated pool data    20.26%
>   Allocated metadata     10.66%
> 
> I'll have to make sure to run fstrim so that 'Allocated pool data' never 
> gets too high.
> Metadata, I need to read more about to see whether that may become a problem.
> I think as long as I don't use LVM snapshots I should be ok (and I won't).
> 
> > Running out-of-space in thin-pool (data and even more on metadata) will 
> > have always MAJOR impact on usability of your system. It's always 
> > unpleasant moment and it's not even closely comparable with something like 
> > running out-of-space in your filesystem - it's much more problematic case - 
> > so you should at all cost try to avoid it.
>  
> Thanks for confirming.
> I suppose in my case I should set 'errorwhenfull y' so that the FS immmediately 
> remounts read only on write failure. Delaying for up to 60 seconds is not
> going to help in my case.
> 
> > If you want to be living on corner case of out-of-space, thin-pool is 
> > probably not the best technology for use.
>  
> I don't want to be using dm-thin at all, but I have too many subvolumes for
> a single btrfs filesystem, so I need to segement my btrfs filesystem in 10
> or so, to be safe (as discussed with btrfs developers)
>  
> > IMHO bad plan to combine 2 overprovisioning technologies together.
> > btrfs HAS its own built-in volume manager  (aka built-in it's own like lvm)
>  
> btrfs does not over provision, and sadly I found out that if you have more
> than 50 or 100 snapshots, you are going to run into problems with balancing,
> and bigger problems with filesystem corruption and repair later (as I found
> out over the last 3 weeks dealing with this)
>  
> > >There is however an issue with btrfs where it gets more unsafe (and
> > >slower) to use if you have too many snapshots (over 50, and especially
> > >over 100).
> > 
> > It's better to pair  thin-pool with ext4 of  XFS.
>  
> I need btrfs send/receive, so that's not an option.
> 
> > BTRFS will suffer great pain from problems of lvm2 snapshots - where btrfs 
> 
> I will not be using lvm snapshots at all.
> 
> > will see the very same block device multiple times present in your system - 
> > so I'd highly discourage usage of thin-pool with btrfs unless you are very 
> > well aware of the weaknesses and you can avoid running into them...
> 
> I'm only using thin-pool to allow dynamic block allocation for over
> provisioning. I will use no other LVM feature. Is that ok?
> 
> > Possible lose of your data in case you run out of space and you hit some 
> > corner cases - note just with 4.18 kernel will be fixed one quite annoying 
> > bug with usage of  TRIM and full pool which could have lead to some 
> > problematic metadata recovery.
> 
> So, as long as I run trim in btrfs and make very sure I don't run out of blocks
> on the VG side, should I be safe-ish enough?
> 
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/  
> 

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
  2018-07-27 19:31     ` John Stoffel
  2018-07-27 19:58       ` Marc MERLIN
@ 2018-07-31  4:52       ` Chris Murphy
  2018-08-01  1:33         ` John Stoffel
  1 sibling, 1 reply; 15+ messages in thread
From: Chris Murphy @ 2018-07-31  4:52 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Zdenek Kabelac

On Fri, Jul 27, 2018 at 1:31 PM, John Stoffel <john@stoffel.org> wrote:

>
> Why don't you run quotas on your filesystems?  Also, none of the
> filesystems in Linux land that I'm aware of supports shrinking the
> filesystem while live, it's all a unmount, shrink FS, shrink volume
> (carefully!) and then re-mount the filesystem.

Btrfs supports grow and shrink resizes only when mounted. It's not
possible to resize when unmounted.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
  2018-07-31  2:44     ` Marc MERLIN
@ 2018-07-31 12:35       ` Zdenek Kabelac
  2018-07-31 21:17         ` Marc MERLIN
  0 siblings, 1 reply; 15+ messages in thread
From: Zdenek Kabelac @ 2018-07-31 12:35 UTC (permalink / raw)
  To: LVM general discussion and development, Marc MERLIN

On 31.7.2018 at 04:44, Marc MERLIN wrote:
> On Fri, Jul 27, 2018 at 11:26:58AM -0700, Marc MERLIN wrote:
>> Hi Zdenek,
>>
>> Thanks for your helpful reply.
>   
> Ha again Zdenek,
> 
> Just to confirm, am I going to be ok enough with the scheme I described
> as long as I ensure that 'Allocated pool data' does not get to 100% ?
> 
> For now, I have my btrfs filesystems mounted with "discard", so
> hopefully it should tell dm-thin when it can free up/reuse blocks.
> 
> Given that, am I more or less ok using dm-thin that way?
> 
> And for my own understanding, is there any reason why I would even want
> to consider thin_pool_autoextend_threshold < 100 ?
> 

Hi

If you monitor the amount of free space for data AND for metadata in the
thin-pool yourself, you can easily keep threshold == 100.

Just don't forget that when you upsize 'data' you should typically also
extend metadata - it's not an uncommon issue that users start with small
'data' & 'metadata' LVs for a thin-pool, then continue to extend only the
thin-pool 'data' volume and ignore/forget about metadata completely,
and hit a full metadata device - which can lead to many troubles
(hitting a full dataLV is normally not a big deal).
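
As an illustration only (the LV name is from Marc's setup, the sizes are
arbitrary), extending both could look like:

  lvextend -L+1T vgds2/thinpool2
  lvextend --poolmetadatasize +2G vgds2/thinpool2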

Regards

Zdenek

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
  2018-07-31 12:35       ` Zdenek Kabelac
@ 2018-07-31 21:17         ` Marc MERLIN
  2018-08-01 11:37           ` Zdenek Kabelac
  0 siblings, 1 reply; 15+ messages in thread
From: Marc MERLIN @ 2018-07-31 21:17 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: LVM general discussion and development

On Tue, Jul 31, 2018 at 02:35:42PM +0200, Zdenek Kabelac wrote:
> If you monitor amount of free space for data AND for metadata in thin-pool
> yourself you can keep easily threshold == 100.
 
Understood. Two things:
1) basically threshold < 100 allows you to hit the limit, have LVM pause
IO, allocate more blocks, and resize the filesystem for you.
However, if you're not monitoring this, it's ultimately just the same as
having threshold = 100 and hoping that you won't hit the limit, except
that you're adding the complexity of resizes in the mix. Correct?

2) I wasn't quite clear on what metadata was used for, and I let
vgcreate pick a default amount for me. Am I correct that it basically
tracks block usage and maybe LVM snapshots that I'm not going to use,
and that therefore if I don't resize my LV, I don't really have to
worry about metadata running out?

> Just don't forget when you upsize 'data' - you should also typically
> extend also metadata -  it's not uncommon issue user  start with small
> 'data' & 'metadata' LV with thin-pool - then  continue to only extend
> thin-pool 'data' volume and ignore/forget about metadata completely
> and hit the full metadata device - which can lead to many troubles
> (hitting full dataLV is normally not a big deal).

Thanks for the warning. Given that I started with the maximum size and
don't plan on ever extending (to be fair, I can't), I should be ok
there, correct?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
  2018-07-31  4:52       ` Chris Murphy
@ 2018-08-01  1:33         ` John Stoffel
  2018-08-01  2:43           ` Chris Murphy
  0 siblings, 1 reply; 15+ messages in thread
From: John Stoffel @ 2018-08-01  1:33 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Zdenek Kabelac

>>>>> "Chris" == Chris Murphy <lists@colorremedies.com> writes:

Chris> On Fri, Jul 27, 2018 at 1:31 PM, John Stoffel <john@stoffel.org> wrote:
>> 
>> Why don't you run quotas on your filesystems?  Also, none of the
>> filesystems in Linux land that I'm aware of supports shrinking the
>> filesystem while live, it's all a unmount, shrink FS, shrink volume
>> (carefully!) and then re-mount the filesystem.

Chris> Btrfs supports grow and shrink resizes only when mounted. It's
Chris> not possible to resize when unmounted.

That's... bizarre.  Good to know, but bizarre.  That does make it more
appealing to use in day to day situations for sure.  Any thoughts on
how stable this is in real life?

John

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
  2018-08-01  1:33         ` John Stoffel
@ 2018-08-01  2:43           ` Chris Murphy
  2018-08-02 17:42             ` Chris Murphy
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Murphy @ 2018-08-01  2:43 UTC (permalink / raw)
  To: LVM general discussion and development

On Tue, Jul 31, 2018 at 7:33 PM, John Stoffel <john@stoffel.org> wrote:
>>>>>> "Chris" == Chris Murphy <lists@colorremedies.com> writes:
>
> Chris> On Fri, Jul 27, 2018 at 1:31 PM, John Stoffel <john@stoffel.org> wrote:
>>>
>>> Why don't you run quotas on your filesystems?  Also, none of the
>>> filesystems in Linux land that I'm aware of supports shrinking the
>>> filesystem while live, it's all a unmount, shrink FS, shrink volume
>>> (carefully!) and then re-mount the filesystem.
>
> Chris> Btrfs supports grow and shrink resizes only when mounted. It's
> Chris> not possible to resize when unmounted.
>
> That's... bizarre.  Good to know, but bizarre.  That does make it more
> appealing to use in day to day situations for sure.  Any thoughts on
> how stable this is in real life?

I've never heard of it failing in many years of being on the Btrfs
list. The resize leverages the same block group handling as the balance
code, so the relocation of block groups during resize is the same as
you'd get with a filtered balance; it's integral to the file system's
operation.

The shrink operation first moves block groups in the region subject to
shrink (the part that's going away), and this is an atomic operation
per block group. You could pull the plug on it (and I have) in
progress and you'd just get a reversion to a prior state before the
last file system metadata and superblock commit (assumes the hardware
isn't lying and some hardware does lie). Once all the block groups are
moved, and the dev and chunk trees are updated to reflect the new
location of those chunks (block groups), the superblocks are updated
to reflect the new device size.

Literally the shrink operation changes very little metadata, it's just
moving block groups, and then the actual "resize" is merely a
superblock change. The file system metadata doesn't change much
because Btrfs uses an internal logical block addressing to reference
file extents and those references stay the same during a resize. The
logical block range mapping to physical block range mapping is a tiny
update (maybe 1/2 dozen 16K leaf and node writes) and those updates
are always COW, not overwrites. That's also how this is an atomic
operation. If the block group copy fails, the dev and chunk trees that
are used to translate between logical and physical block ranges never
get updated.
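
For what it's worth (the mount point is hypothetical, the sizes arbitrary),
a live shrink or grow is just:

  btrfs filesystem resize -100G /mnt/point
  btrfs filesystem resize max /mnt/point

both run against the mounted file system, using the mechanism described
above.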


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
  2018-07-31 21:17         ` Marc MERLIN
@ 2018-08-01 11:37           ` Zdenek Kabelac
  0 siblings, 0 replies; 15+ messages in thread
From: Zdenek Kabelac @ 2018-08-01 11:37 UTC (permalink / raw)
  To: LVM general discussion and development, Marc MERLIN

On 31.7.2018 at 23:17, Marc MERLIN wrote:
> On Tue, Jul 31, 2018 at 02:35:42PM +0200, Zdenek Kabelac wrote:
>> If you monitor amount of free space for data AND for metadata in thin-pool
>> yourself you can keep easily threshold == 100.
>   
> Understood. Two things:
> 1) basically threshold < 100 allows you to hit the limit, have LVM pause
> IO, allocate more blocks, and resize the filesystem for you.
> However, if you're not monitoring this, it's ultimately just the same as
> having threshold = 100 and hoping that you won't hit the limit, except
> that you're adding the complexity of resizes in the mix. Correct?

Sure thing - when there is no free space to extend your overprovisioned
thin-pool and you run out of space, you hit the limit at some point...


> 2) I wasn't quite clear on what metadata was used for, and I let
> vgcreate pick a default amount for me. Am I correct that it basically
> tracks block usage and maybe LVM snapshots that I'm not going to use,
> and that therefore if I don't resize my LV, I don't really have to
> worry about metadata running out?

The kernel metadata stored in the _tmeta LV holds the mapping of all
thin volumes, i.e. which thin-pool chunk belongs to which thin volume.

>> Just don't forget when you upsize 'data' - you should also typically
>> extend also metadata -  it's not uncommon issue user  start with small
>> 'data' & 'metadata' LV with thin-pool - then  continue to only extend
>> thin-pool 'data' volume and ignore/forget about metadata completely
>> and hit the full metadata device - which can lead to many troubles
>> (hitting full dataLV is normally not a big deal).
> 
> Thanks for the warning. Given that I started with the maximum size and
> don't plain on ever extending (to be fair, I can't), I should be ok
> there, correct?

Yep - once you make ~16GiB of metadata you can't make it any bigger (a hard
internal limitation of the existing thin-pool target implementation).

But you still need to remember that you can run out of space in your
metadata if there is heavy usage of many large thin volumes - so the amount
of free space should always be monitored somehow...


Regards

Zdenek

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
  2018-08-01  2:43           ` Chris Murphy
@ 2018-08-02 17:42             ` Chris Murphy
  0 siblings, 0 replies; 15+ messages in thread
From: Chris Murphy @ 2018-08-02 17:42 UTC (permalink / raw)
  To: LVM general discussion and development

On Tue, Jul 31, 2018 at 8:43 PM, Chris Murphy <lists@colorremedies.com> wrote:
> On Tue, Jul 31, 2018 at 7:33 PM, John Stoffel <john@stoffel.org> wrote:
>>>>>>> "Chris" == Chris Murphy <lists@colorremedies.com> writes:
>>
>> Chris> On Fri, Jul 27, 2018 at 1:31 PM, John Stoffel <john@stoffel.org> wrote:
>>>>
>>>> Why don't you run quotas on your filesystems?  Also, none of the
>>>> filesystems in Linux land that I'm aware of supports shrinking the
>>>> filesystem while live, it's all a unmount, shrink FS, shrink volume
>>>> (carefully!) and then re-mount the filesystem.
>>
>> Chris> Btrfs supports grow and shrink resizes only when mounted. It's
>> Chris> not possible to resize when unmounted.
>>
>> That's... bizarre.  Good to know, but bizarre.  That does make it more
>> appealing to use in day to day situations for sure.  Any thoughts on
>> how stable this is in real life?
>
> I've never heard of it failing in many years of being on the Btrfs
> list. The resize leverages the same block group handling as balance
> code, so the relocation of block groups during resize is the same as
> you'd get with a filtered balance, it's integral to the file system's
> operation.
>
> The shrink operation first moves block groups in the region subject to
> shrink (the part that's going away), and this is an atomic operation
> per block group. You could pull the plug on it (and I have) in
> progress and you'd just get a reversion to a prior state before the
> last file system metadata and superblock commit (assumes the hardware
> isn't lying and some hardware does lie). Once all the block groups are
> moved, and the dev and chunk trees are updated to reflect the new
> location of those chunks (block groups), the superblocks are updated
> to reflect the new device size.
>
> Literally the shrink operation changes very little metadata, it's just
> moving block groups, and then the actual "resize" is merely a
> superblock change. The file system metadata doesn't change much
> because Btrfs uses an internal logical block addressing to reference
> file extents and those references stay the same during a resize. The
> logical block range mapping to physical block range mapping is a tiny
> update (maybe 1/2 dozen 16K leaf and node writes) and those updates
> are always COW, not overwrites. That's also how this is an atomic
> operation. If the block group copy fails, the dev and chunk trees that
> are used to translate between logical and physical block ranges never
> get updated.
>
>
> --
> Chris Murphy

Also, fs resize always happens when doing a device add or device remove,
so resize is integral to Btrfs's multiple-device support. Device add
and remove can likewise only be done while the file system is mounted.
Removing a device means migrating block groups off that device,
shrinking the file system by an amount identical to the device size,
updating superblocks on the remaining devices, and wiping the Btrfs
signature on the removed device. And there are similar behaviors when
converting block group profiles: e.g. from single to raid1, single to
DUP, DUP to single, raid5 to raid6 or vice versa, and so on.
Conversions are only possible while the file system is mounted.

LVM pvmove isn't entirely different in concept. The LVM extents are
smaller (4MB by default) than Btrfs block groups (which are dynamically
variable in size, but most typically 1GiB for data bg's, 256MB for
metadata bg's, and 32MB for system bg's; Btrfs block groups are
collections of extents). But basically the file system just keeps on
reading and writing to its usual LBAs, which are abstracted and
translated into real physical LBAs and a device by LVM. I don't know
how atomic pvmove is without the --atomic flag, or what the chances
of resuming pvmove in case of a crash or an urgent reboot are.
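
For reference (the device names are made up), the flag I mean is used like:

  pvmove --atomic /dev/sdb1 /dev/sdc1

which, as I understand it, makes the whole move of allocated extents off
/dev/sdb1 either succeed or roll back as a unit.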

The gotcha with ext4 and XFS is that they put filesystem metadata in fixed
locations on a block device, so it all has to be relocated, along with
data, to new fixed positions based on the new block device size.
The shrink operation is probably sufficiently complicated for ext234
that they just don't want concurrent read/write operations happening
while shrinking. The resize also introduces inherent inefficiency
in subsequent operation: the greater the difference between the mkfs
volume size and the resized size, the greater the inefficiency. That
applies to both ext4 and XFS, whether shrinking or growing; of course XFS
doesn't have shrink at all - the expectation for its more sophisticated
environment use cases was that it would only ever be grown.

Whereas Btrfs has no fixed locations for any of its block groups, so
from its perspective a resize is just not that unique of an operation,
leveraging code that's regularly exercised in normal operation anyway.
And it also doesn't suffer from any resize inefficiencies either; in
fact depending on the operation it might become more efficient.

Anyway, probably a better way of handling shrink with ext4 and XFS is
having them on LVM thin volumes, and just using fstrim to remove
unused LVM extents from the LV, releasing them back to the pool for
use by any other LV in that pool. It's not exactly the same thing as a
shrink of course, but if the idea is to let one file system use the
unused but "reserved" space of a second file system, merely trimming
the second file system on a thin LV does achieve that. The bigger issue
here is that you can't then shrink the pool, so you can still get stuck in
some circumstances.
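
As a sketch (the VG, LV, and mount point names are made up), the
"shrink by trimming" idea is simply:

  fstrim -v /srv/data                  # XFS or ext4 on a thin LV
  lvs -o lv_name,data_percent vg0      # the pool's data_percent should drop

assuming discards aren't disabled for the pool (allocation/thin_pool_discards
in lvm.conf; I believe the default, "passdown", is fine for this).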

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 15+ messages in thread

