* [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: Marc MERLIN @ 2018-07-26 16:31 UTC
To: linux-lvm

Still learning about thin volumes.
Why do I want my thin pool to get auto extended? Does "extended" mean
resized?
Why would I want to have thin_pool_autoextend_threshold below 100 and
have it auto extend as needed vs having all of them be at 100, knowing
that underlying block allocation will fail if I run out of physical
blocks underneath?

Details:
I have a 14TB bcache block device.
On top, I'd like to put multiple btrfs filesystems. There is however an
issue with btrfs where it gets more unsafe (and slower) to use if you
have too many snapshots (over 50, and especially over 100). The fix
around this is sadly to have multiple separate filesystems, which kind
of negates the nice part where you make subvolumes and let them grow
independently.

So, I'm going to make about 10 thin volumes, one for each of my btrfs
subvolumes, so that they are all separate filesystems. However, my plan
is to make them all 14TB in size so that I never have to resize the
filesystems, with the full understanding, of course, that all of them
together still only have 14TiB of real space underneath.

Right now, I'm getting this:
gargamel:~# lvcreate -V14TiB -T vgds2/thinpool2 -n debian64
  Using default stripesize 64.00 KiB.
  WARNING: Sum of all thin volume sizes (28.00 TiB) exceeds the size of thin pool vgds2/thinpool2 and the size of whole volume group (14.55 TiB).
  WARNING: You have not turned on protection against thin pools running out of space.
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
  Logical volume "debian64" created.

I'm looking at lvm.conf, and I'll be honest that it's not clear to me:
# Configuration option activation/thin_pool_autoextend_threshold.
# Auto-extend a thin pool when its usage exceeds this percent.
# Setting this to 100 disables automatic extension.
# The minimum value is 50 (a smaller value is treated as 50.)
# Also see thin_pool_autoextend_percent.
# Automatic extension requires dmeventd to be monitoring the LV.
#
# Example
# Using 70% autoextend threshold and 20% autoextend size, when a 1G
# thin pool exceeds 700M, it is extended to 1.2G, and when it exceeds
# 840M, it is extended to 1.44G:
# thin_pool_autoextend_threshold = 70
#
thin_pool_autoextend_threshold = 100

What's the downside of just leaving it at 100?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                    .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
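A minimal sketch of what the lvcreate warning is suggesting, using the
example numbers from the lvm.conf comment above (70/20 are illustrative,
not a recommendation):

    # /etc/lvm/lvm.conf -- activation section
    activation {
        thin_pool_autoextend_threshold = 70   # act once the pool is 70% full
        thin_pool_autoextend_percent = 20     # grow it by 20% of its size each time
    }

As the comment notes, this only fires if dmeventd is monitoring the pool
LV, e.g. lvchange --monitor y vgds2/thinpool2.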
* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: Zdenek Kabelac @ 2018-07-27 12:59 UTC
To: LVM general discussion and development, Marc MERLIN

On 26.7.2018 18:31, Marc MERLIN wrote:
> Still learning about thin volumes.
> Why do I want my thin pool to get auto extended? Does "extended" mean
> resized?

Yes, extension == resize.

> Why would I want to have thin_pool_autoextend_threshold below 100 and
> have it auto extend as needed vs having all of them be at 100, knowing
> that underlying block allocation will fail if I run out of physical
> blocks underneath?

Hi

See man lvmthin.

In general - do not plan on 'running out of space on the thin-pool' as a
regular part of your daily workflow.

Running out of space in a thin-pool (data, and even more so metadata)
will always have a MAJOR impact on the usability of your system. It's
always an unpleasant moment, and it's not even remotely comparable to
running out of space in your filesystem - it's a much more problematic
case - so you should try to avoid it at all costs.

If you want to live on the corner case of out-of-space, thin-pool is
probably not the best technology to use.

> Details:
> I have a 14TB bcache block device.
> On top, I'd like to put multiple btrfs filesystems.

IMHO it's a bad plan to combine two overprovisioning technologies.
btrfs HAS its own built-in volume manager (i.e. its own lvm-like layer).

> There is however an issue with btrfs where it gets more unsafe (and
> slower) to use if you have too many snapshots (over 50, and especially
> over 100).

It's better to pair a thin-pool with ext4 or XFS.

BTRFS will suffer great pain from the problems of lvm2 snapshots -
btrfs will see the very same block device present multiple times in
your system - so I'd highly discourage use of thin-pool with btrfs
unless you are very well aware of the weaknesses and can avoid running
into them...

> I'm looking at lvm.conf, and I'll be honest that it's not clear to me:
> # Configuration option activation/thin_pool_autoextend_threshold.
> # Auto-extend a thin pool when its usage exceeds this percent.
> # Setting this to 100 disables automatic extension.
> # The minimum value is 50 (a smaller value is treated as 50.)
> # Also see thin_pool_autoextend_percent.
> # Automatic extension requires dmeventd to be monitoring the LV.
> #
> # Example
> # Using 70% autoextend threshold and 20% autoextend size, when a 1G
> # thin pool exceeds 700M, it is extended to 1.2G, and when it exceeds
> # 840M, it is extended to 1.44G:
> # thin_pool_autoextend_threshold = 70
> #
> thin_pool_autoextend_threshold = 100
>
> What's the downside of just leaving it at 100?

Possible loss of your data in case you run out of space and hit some
corner cases - note that only the 4.18 kernel will fix one quite
annoying bug with the use of TRIM on a full pool, which could lead to
problematic metadata recovery.

Regards

Zdenek
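As a sketch of the "avoid it at all costs" advice, the pool's data and
metadata usage can be watched with plain lvs (pool name as in Marc's
transcript; the polling interval is an arbitrary example):

    # data_percent / metadata_percent are standard lvs report fields
    lvs -o lv_name,lv_size,data_percent,metadata_percent vgds2/thinpool2
    # or poll it, so a human notices long before it reaches 100%
    watch -n 60 'lvs --noheadings -o data_percent,metadata_percent vgds2/thinpool2'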
* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: Marc MERLIN @ 2018-07-27 18:26 UTC
To: Zdenek Kabelac; +Cc: LVM general discussion and development

Hi Zdenek,

Thanks for your helpful reply.

On Fri, Jul 27, 2018 at 02:59:28PM +0200, Zdenek Kabelac wrote:
> On 26.7.2018 18:31, Marc MERLIN wrote:
> > Still learning about thin volumes.
> > Why do I want my thin pool to get auto extended? Does "extended" mean
> > resized?
>
> Yes, extension == resize.

Gotcha. Then I don't want to have to worry about my filesystems being
resized multiple times, especially since I'm not sure how it would help.

> See man lvmthin.

Thanks. Had read it, but not carefully enough.
So, I just re-read "Automatic extend settings".
I'm still not entirely sure how using extension would help me there. I
can't set it to 10% for all 10 filesystems (50% is the minimum).
If I set it to anything less than 100%, LVM could block I/O later, try
to extend the pool, and resize, but ultimately I'll still have multiple
filesystems that together exceed the space available, so I can run out.
I'm not seeing how the automatic extend setting is helpful, at least in
my case. Am I missing something?

To be clear, my case is that I will have 10 filesystems in a place where
the same data was in a single filesystem that sadly I must segment now.
More than a few will take more than 1/10th of the space, but I don't
want to have to worry about which ones are going to use how much, as
long as all together they stay below 100% of course.
I don't want to have to manage space for each of those 10 and have to
resize them by hand multiple times up and down to share the space, hence
dm-thin.

My understanding is that I have to watch this carefully:
  LV Name                thinpool2
  VG Name                vgds2
  LV Pool metadata       thinpool2_tmeta
  LV Pool data           thinpool2_tdata
  LV Status              available
  # open                 8
  LV Size                14.50 TiB
  Allocated pool data    20.26%
  Allocated metadata     10.66%

I'll have to make sure to run fstrim so that 'Allocated pool data' never
gets too high.
Metadata, I need to read more about to see whether that may become a
problem. I think as long as I don't use LVM snapshots I should be ok
(and I won't).

> Running out of space in a thin-pool (data, and even more so metadata)
> will always have a MAJOR impact on the usability of your system. It's
> always an unpleasant moment, and it's not even remotely comparable to
> running out of space in your filesystem - it's a much more problematic
> case - so you should try to avoid it at all costs.

Thanks for confirming.
I suppose in my case I should set 'errorwhenfull y' so that the FS
immediately remounts read-only on write failure. Delaying for up to 60
seconds is not going to help in my case.

> If you want to live on the corner case of out-of-space, thin-pool is
> probably not the best technology to use.

I don't want to be using dm-thin at all, but I have too many subvolumes
for a single btrfs filesystem, so I need to segment my btrfs filesystem
into 10 or so to be safe (as discussed with btrfs developers).

> IMHO it's a bad plan to combine two overprovisioning technologies.
> btrfs HAS its own built-in volume manager (i.e. its own lvm-like layer).

btrfs does not overprovision, and sadly I found out that if you have
more than 50 or 100 snapshots, you are going to run into problems with
balancing, and bigger problems with filesystem corruption and repair
later (as I found out over the last 3 weeks dealing with this).

> > There is however an issue with btrfs where it gets more unsafe (and
> > slower) to use if you have too many snapshots (over 50, and
> > especially over 100).
>
> It's better to pair a thin-pool with ext4 or XFS.

I need btrfs send/receive, so that's not an option.

> BTRFS will suffer great pain from the problems of lvm2 snapshots -

I will not be using lvm snapshots at all.

> btrfs will see the very same block device present multiple times in
> your system - so I'd highly discourage use of thin-pool with btrfs
> unless you are very well aware of the weaknesses and can avoid running
> into them...

I'm only using thin-pool to allow dynamic block allocation for
overprovisioning. I will use no other LVM feature. Is that ok?

> Possible loss of your data in case you run out of space and hit some
> corner cases - note that only the 4.18 kernel will fix one quite
> annoying bug with the use of TRIM on a full pool, which could lead to
> problematic metadata recovery.

So, as long as I run trim in btrfs and make very sure I don't run out of
blocks on the VG side, should I be safe-ish enough?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                    .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
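A hedged sketch of the two knobs Marc refers to above, using the pool
name from this thread and a hypothetical mountpoint:

    # fail writes immediately instead of queuing them for 60s when the pool fills
    lvchange --errorwhenfull y vgds2/thinpool2
    # hand no-longer-used blocks of a mounted btrfs back to the thin pool
    fstrim -v /mnt/btrfs_pool2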
* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: John Stoffel @ 2018-07-27 19:31 UTC
To: LVM general discussion and development; +Cc: Zdenek Kabelac

>>>>> "Marc" == Marc MERLIN <marc@merlins.org> writes:

Marc> Hi Zdenek,
Marc> Thanks for your helpful reply.

Marc> On Fri, Jul 27, 2018 at 02:59:28PM +0200, Zdenek Kabelac wrote:
>> On 26.7.2018 18:31, Marc MERLIN wrote:
>> > Still learning about thin volumes.
>> > Why do I want my thin pool to get auto extended? Does "extended"
>> > mean resized?
>>
>> Yes, extension == resize.

Marc> Gotcha. Then I don't want to have to worry about my filesystems
Marc> being resized multiple times, especially since I'm not sure how it
Marc> would help.

>> See man lvmthin.

Marc> Thanks. Had read it, but not carefully enough.
Marc> So, I just re-read "Automatic extend settings".
Marc> I'm still not entirely sure how using extension would help me
Marc> there. I can't set it to 10% for all 10 filesystems (50% is the
Marc> minimum).
Marc> If I set it to anything less than 100%, LVM could block I/O later,
Marc> try to extend the pool, and resize, but ultimately I'll still have
Marc> multiple filesystems that together exceed the space available, so
Marc> I can run out.
Marc> I'm not seeing how the automatic extend setting is helpful, at
Marc> least in my case. Am I missing something?

Marc> To be clear, my case is that I will have 10 filesystems in a place
Marc> where the same data was in a single filesystem that sadly I must
Marc> segment now. More than a few will take more than 1/10th of the
Marc> space, but I don't want to have to worry about which ones are
Marc> going to use how much, as long as all together they stay below
Marc> 100% of course.

Marc> I don't want to have to manage space for each of those 10 and have
Marc> to resize them by hand multiple times up and down to share the
Marc> space, hence dm-thin.

Why don't you run quotas on your filesystems? Also, none of the
filesystems in Linux land that I'm aware of supports shrinking the
filesystem while live; it's all an unmount, shrink FS, shrink volume
(carefully!), and then re-mount the filesystem.

But again, I think you might really prefer quotas instead, unless you
need complete logical separation.

John
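For concreteness, the offline shrink dance John describes looks roughly
like this for ext4 on classic LVM (names and sizes are invented; newer
lvm2 also offers lvreduce -r/--resizefs, which does the two resize steps
together and is less error-prone):

    umount /mnt/data
    e2fsck -f /dev/vg0/data        # mandatory check before an ext4 shrink
    resize2fs /dev/vg0/data 100G   # shrink the filesystem first...
    lvreduce -L 100G vg0/data      # ...then the LV, to the SAME size (carefully!)
    mount /dev/vg0/data /mnt/data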
* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: Marc MERLIN @ 2018-07-27 19:58 UTC
To: LVM general discussion and development, john; +Cc: Zdenek Kabelac

On Fri, Jul 27, 2018 at 03:31:36PM -0400, John Stoffel wrote:
> Why don't you run quotas on your filesystems? Also, none of the
> filesystems in Linux land that I'm aware of supports shrinking the
> filesystem while live; it's all an unmount, shrink FS, shrink volume
> (carefully!), and then re-mount the filesystem.

Those filesystems can be unmounted, so shrinking while live is not
something I need, even if btrfs might actually support it.

> But again, I think you might really prefer quotas instead, unless you
> need complete logical separation.

Since I know more than I wish I did about btrfs :) let me explain a bit
more:

0) I will not be using lvm for its own snapshot capabilities, or resize.
I'm cheating by using overcommit with dm-thin, not wanting to worry
about segmenting space between each filesystem and having to worry about
shrinking one to grow another one from time to time.

1) quotas don't work well on btrfs when you have snapshots, and by that
I mean btrfs snapshots. Because blocks are shared between snapshots,
calculating quotas is a performance problem.

2) I don't have a space or quota problem on btrfs; the problem I have is
that I use btrfs send/receive a lot for backups (it's a backup server)
and history (go back a month ago or whatever). See
http://marc.merlins.org/perso/btrfs/post_2014-03-22_Btrfs-Tips_-Doing-Fast-Incremental-Backups-With-Btrfs-Send-and-Receive.html
if you aren't familiar with btrfs send/receive backups. Btrfs starts
having performance problems for some operations (re-balance, or fsck)
when you have too many subvolumes (each snapshot creates a subvolume).

3) I hit severe enough problems that filesystem checks were taking days
to complete, which was not workable. The only way around it is to have
fewer subvolumes.

4) Because I still need the same amount of backups and want the same
amount of history, fewer subvolumes means moving each separate subvolume
into its own separate filesystem.

Then there is the last part: btrfs is still not super stable and can
have corruption problems (although in my case I had clear problems due
to an underlying unreliable SATA subsystem which caused writes not to
make it to all the blocks of each drive of a raid set, something that
even careful journalling does not deal with). So, I have:

5) when things go wrong with btrfs, you're better off having smaller
filesystems with less data, as they are quicker to check and repair as
well as quicker to rebuild if they are corrupted beyond repair (btrfs
can easily get into a state where all or most of your data is still
there read-only, but the filesystem has extent issues that can't be
fixed at this moment and require a rebuild).

Makes sense? Am I crazy to want to use dm-thin the way I'm trying to? :)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                    .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
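For readers unfamiliar with it, the send/receive cycle Marc links to
boils down to something like this (paths are illustrative):

    # initial full copy to the backup filesystem
    btrfs subvolume snapshot -r /data /data/snap.1
    btrfs send /data/snap.1 | btrfs receive /backup
    # later: incremental copy, sending only the delta against the parent
    btrfs subvolume snapshot -r /data /data/snap.2
    btrfs send -p /data/snap.1 /data/snap.2 | btrfs receive /backup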
* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: John Stoffel @ 2018-07-27 21:09 UTC
To: Marc MERLIN; +Cc: Zdenek Kabelac, LVM general discussion and development

>>>>> "Marc" == Marc MERLIN <marc@merlins.org> writes:

Marc> On Fri, Jul 27, 2018 at 03:31:36PM -0400, John Stoffel wrote:
>> Why don't you run quotas on your filesystems? Also, none of the
>> filesystems in Linux land that I'm aware of supports shrinking the
>> filesystem while live; it's all an unmount, shrink FS, shrink volume
>> (carefully!), and then re-mount the filesystem.

Marc> Those filesystems can be unmounted, so shrinking while live is not
Marc> something I need, even if btrfs might actually support it.

>> But again, I think you might really prefer quotas instead, unless you
>> need complete logical separation.

Marc> Since I know more than I wish I did about btrfs :) let me explain
Marc> a bit more:

Marc> 0) I will not be using lvm for its own snapshot capabilities, or
Marc> resize. I'm cheating by using overcommit with dm-thin, not wanting
Marc> to worry about segmenting space between each filesystem and having
Marc> to worry about shrinking one to grow another one from time to time.

Marc> 1) quotas don't work well on btrfs when you have snapshots, and by
Marc> that I mean btrfs snapshots. Because blocks are shared between
Marc> snapshots, calculating quotas is a performance problem.

Marc> 2) I don't have a space or quota problem on btrfs; the problem I
Marc> have is that I use btrfs send/receive a lot for backups (it's a
Marc> backup server) and history (go back a month ago or whatever). See
Marc> http://marc.merlins.org/perso/btrfs/post_2014-03-22_Btrfs-Tips_-Doing-Fast-Incremental-Backups-With-Btrfs-Send-and-Receive.html
Marc> if you aren't familiar with btrfs send/receive backups. Btrfs
Marc> starts having performance problems for some operations
Marc> (re-balance, or fsck) when you have too many subvolumes (each
Marc> snapshot creates a subvolume).

That's the key part that I didn't realize. And this is why I'm still
leery of btrfs (and zfs for that matter): as you push the limits, they
tend to fall off a cliff performance-wise instead of degrading more
gracefully. So you're obviously also using source btrfs volume(s) for
the data being backed up, so I can understand what you're trying to
do...

So is it a single 14TB source btrfs volume, and did you make snapshots
on a rotating basis to the destination volumes?

Marc> 3) I hit severe enough problems that filesystem checks were taking
Marc> days to complete, which was not workable. The only way around it
Marc> is to have fewer subvolumes.

Ouch! This is not an easy space to be in.

Marc> 4) Because I still need the same amount of backups and want the
Marc> same amount of history, fewer subvolumes means moving each
Marc> separate subvolume into its own separate filesystem.

So you're doing snapshots of source sub-volumes? I figure you must be
running into performance problems no matter which end you're talking
about here, because the btrfs stuff is just going to bite you one way or
another.

Marc> Then there is the last part: btrfs is still not super stable and
Marc> can have corruption problems (although in my case I had clear
Marc> problems due to an underlying unreliable SATA subsystem which
Marc> caused writes not to make it to all the blocks of each drive of a
Marc> raid set, something that even careful journalling does not deal
Marc> with). So, I have:

Man, you love living dangerously! *grin*

Marc> 5) when things go wrong with btrfs, you're better off having
Marc> smaller filesystems with less data, as they are quicker to check
Marc> and repair as well as quicker to rebuild if they are corrupted
Marc> beyond repair (btrfs can easily get into a state where all or most
Marc> of your data is still there read-only, but the filesystem has
Marc> extent issues that can't be fixed at this moment and require a
Marc> rebuild).

Ouch! You really enjoy living on the edge. :-)

Marc> Am I crazy to want to use dm-thin the way I'm trying to? :)

I think you're a little crazy using btrfs in this way, *grin* since
losing my data is a big no-no in my world. I personally love my Netapps
because they're super reliable, it's super easy to grow/shrink volumes,
and snapshots just work, along with cloning volumes across to other
systems.

But I also agree that backups are a pain in the ass, no matter how you
look at it, and it's only gotten worse as filesystem sizes and file
counts have gone up, but underlying filesystems and tools haven't
managed to keep up.

Good luck for sure!

John
* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: Marc MERLIN @ 2018-07-27 23:35 UTC
To: John Stoffel; +Cc: Zdenek Kabelac, LVM general discussion and development

On Fri, Jul 27, 2018 at 05:09:54PM -0400, John Stoffel wrote:
> That's the key part that I didn't realize. And this is why I'm still
> leery of btrfs (and zfs for that matter): as you push the limits, they
> tend to fall off a cliff performance-wise instead of degrading more
> gracefully. So you're obviously also using source btrfs volume(s) for
> the data being backed up, so I can understand what you're trying to
> do...
>
> So is it a single 14TB source btrfs volume, and did you make snapshots
> on a rotating basis to the destination volumes?

Maybe we should continue this on the btrfs list, I don't want to spam
people here who don't care about btrfs :) but I'll answer this, and if
we continue, let's move lists if you don't mind.

btrfs send/receive needs a snapshot for each copy. I then have a script
that keeps X of the older snapshots - ones I don't need anymore for
send/receive to work, but that I keep around for posterity.
Snapshots do not actually cause performance issues that I've noticed day
to day with btrfs, but if you do quotas, or balance (which is a
complicated operation), or btrfsck, then the number of snapshots
matters, and performance gets hurt quite a bit if you have 270
snapshots, like I ended up having in the end :)

Marc> 4) Because I still need the same amount of backups and want the
Marc> same amount of history, fewer subvolumes means moving each
Marc> separate subvolume into its own separate filesystem.
>
> So you're doing snapshots of source sub-volumes? I figure you must be
> running into performance problems no matter which end you're talking
> about here, because the btrfs stuff is just going to bite you one way
> or another.

Not really, performance was fine. It was so much better than using rsync
(sometimes by 100x or more).
But yeah, send/receive makes a snapshot of the source, and leaves a
snapshot on the destination volume. You can work with only 2 snapshots,
but I keep more for historical restores.

Marc> Then there is the last part: btrfs is still not super stable and
Marc> can have corruption problems (although in my case I had clear
Marc> problems due to an underlying unreliable SATA subsystem which
Marc> caused writes not to make it to all the blocks of each drive of a
Marc> raid set, something that even careful journalling does not deal
Marc> with). So, I have:
>
> Man, you love living dangerously! *grin*

Is it a good time to say that I actually use all of this on one
filesystem?
  mdadm raid5
  bcache
  dmcrypt
  dm-thin / lvm
  btrfs
:)

> I think you're a little crazy using btrfs in this way, *grin* since
> losing my data is a big no-no in my world. I personally love my
> Netapps because they're super reliable, it's super easy to grow/shrink
> volumes, and snapshots just work, along with cloning volumes across to
> other systems.

I used to work at Netapp, they're great, but they don't work inside my
laptop, obviously they're not open source, and I'd rather avoid using
NFS if I can at this point (ok, they also do iSCSI).

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                    .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: Chris Murphy @ 2018-07-31 4:52 UTC
To: LVM general discussion and development; +Cc: Zdenek Kabelac

On Fri, Jul 27, 2018 at 1:31 PM, John Stoffel <john@stoffel.org> wrote:
>
> Why don't you run quotas on your filesystems? Also, none of the
> filesystems in Linux land that I'm aware of supports shrinking the
> filesystem while live; it's all an unmount, shrink FS, shrink volume
> (carefully!), and then re-mount the filesystem.

Btrfs supports grow and shrink resizes only when mounted. It's not
possible to resize when unmounted.

-- 
Chris Murphy
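Concretely, and assuming a btrfs filesystem mounted at /mnt, the
mounted-only resize Chris describes looks like:

    btrfs filesystem resize -10G /mnt   # shrink the mounted filesystem by 10GiB
    btrfs filesystem resize max /mnt    # grow it to fill the underlying device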
* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: John Stoffel @ 2018-08-01 1:33 UTC
To: LVM general discussion and development; +Cc: Zdenek Kabelac

>>>>> "Chris" == Chris Murphy <lists@colorremedies.com> writes:

Chris> On Fri, Jul 27, 2018 at 1:31 PM, John Stoffel <john@stoffel.org> wrote:
>>
>> Why don't you run quotas on your filesystems? Also, none of the
>> filesystems in Linux land that I'm aware of supports shrinking the
>> filesystem while live; it's all an unmount, shrink FS, shrink volume
>> (carefully!), and then re-mount the filesystem.

Chris> Btrfs supports grow and shrink resizes only when mounted. It's
Chris> not possible to resize when unmounted.

That's... bizarre. Good to know, but bizarre. That does make it more
appealing to use in day-to-day situations for sure. Any thoughts on how
stable this is in real life?

John
* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: Chris Murphy @ 2018-08-01 2:43 UTC
To: LVM general discussion and development

On Tue, Jul 31, 2018 at 7:33 PM, John Stoffel <john@stoffel.org> wrote:
>>>>>> "Chris" == Chris Murphy <lists@colorremedies.com> writes:
>
> Chris> On Fri, Jul 27, 2018 at 1:31 PM, John Stoffel <john@stoffel.org> wrote:
>>>
>>> Why don't you run quotas on your filesystems? Also, none of the
>>> filesystems in Linux land that I'm aware of supports shrinking the
>>> filesystem while live; it's all an unmount, shrink FS, shrink volume
>>> (carefully!), and then re-mount the filesystem.
>
> Chris> Btrfs supports grow and shrink resizes only when mounted. It's
> Chris> not possible to resize when unmounted.
>
> That's... bizarre. Good to know, but bizarre. That does make it more
> appealing to use in day-to-day situations for sure. Any thoughts on
> how stable this is in real life?

I've never heard of it failing in many years of being on the Btrfs
list. The resize leverages the same block group handling as the balance
code, so the relocation of block groups during a resize is the same as
you'd get with a filtered balance; it's integral to the file system's
operation.

The shrink operation first moves block groups in the region subject to
shrink (the part that's going away), and this is an atomic operation per
block group. You could pull the plug on it (and I have) in progress, and
you'd just get a reversion to the prior state, before the last file
system metadata and superblock commit (this assumes the hardware isn't
lying, and some hardware does lie). Once all the block groups are moved,
and the dev and chunk trees are updated to reflect the new location of
those chunks (block groups), the superblocks are updated to reflect the
new device size.

Literally the shrink operation changes very little metadata; it's just
moving block groups, and then the actual "resize" is merely a superblock
change. The file system metadata doesn't change much because Btrfs uses
internal logical block addressing to reference file extents, and those
references stay the same during a resize. The logical block range to
physical block range mapping is a tiny update (maybe half a dozen 16K
leaf and node writes), and those updates are always COW, not overwrites.
That's also how this is an atomic operation. If the block group copy
fails, the dev and chunk trees that are used to translate between
logical and physical block ranges never get updated.

-- 
Chris Murphy
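For reference, the "filtered balance" Chris compares the resize to is an
everyday block-group relocation, e.g. (the usage cutoff is an arbitrary
example):

    # rewrite/relocate only data block groups that are less than 10% used
    btrfs balance start -dusage=10 /mnt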
* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: Chris Murphy @ 2018-08-02 17:42 UTC
To: LVM general discussion and development

On Tue, Jul 31, 2018 at 8:43 PM, Chris Murphy <lists@colorremedies.com> wrote:
> I've never heard of it failing in many years of being on the Btrfs
> list. The resize leverages the same block group handling as the
> balance code, so the relocation of block groups during a resize is the
> same as you'd get with a filtered balance; it's integral to the file
> system's operation.
>
> The shrink operation first moves block groups in the region subject to
> shrink (the part that's going away), and this is an atomic operation
> per block group. You could pull the plug on it (and I have) in
> progress, and you'd just get a reversion to the prior state, before
> the last file system metadata and superblock commit (this assumes the
> hardware isn't lying, and some hardware does lie). Once all the block
> groups are moved, and the dev and chunk trees are updated to reflect
> the new location of those chunks (block groups), the superblocks are
> updated to reflect the new device size.
>
> Literally the shrink operation changes very little metadata; it's just
> moving block groups, and then the actual "resize" is merely a
> superblock change. The file system metadata doesn't change much
> because Btrfs uses internal logical block addressing to reference file
> extents, and those references stay the same during a resize. The
> logical block range to physical block range mapping is a tiny update
> (maybe half a dozen 16K leaf and node writes), and those updates are
> always COW, not overwrites. That's also how this is an atomic
> operation. If the block group copy fails, the dev and chunk trees that
> are used to translate between logical and physical block ranges never
> get updated.

Also, fs resize always happens when doing a device add or device
remove, so resize is integral to Btrfs multiple-device support. Device
add and remove can likewise only be done while the file system is
mounted. Removing a device means migrating block groups off that
device, shrinking the file system by an amount identical to the device
size, updating the superblocks on the remaining devices, and wiping the
Btrfs signature on the removed device.

And there are similar behaviors when converting block group profiles:
e.g. from single to raid1, single to DUP, DUP to single, raid5 to raid6
or vice versa, and so on. Conversions are only possible while the file
system is mounted.

LVM pvmove isn't entirely different in concept. The LVM extents are
smaller (4MB by default) than Btrfs block groups (dynamically variable
in size, but most typically 1GiB for data bg's, 256MB for metadata
bg's, and 32MB for system bg's; Btrfs block groups are collections of
extents). But basically the file system just keeps on reading and
writing to its usual LBAs, which are abstracted and translated into
real physical LBAs and a device by LVM. I don't know how atomic pvmove
is without the --atomic flag, or what the chances are of resuming
pvmove after a crash or an urgent reboot.

The gotcha with ext4 and XFS is that they put filesystem metadata in
fixed locations on a block device, so all of that has to be relocated
to new fixed positions based on the new block device size, as well as
the data. The shrink operation is probably sufficiently complicated for
ext234 that they just don't want concurrent read/write operations
happening while shrinking. The resize also introduces inherent
inefficiency into subsequent operation: the greater the difference
between the mkfs volume size and the resized size, the greater the
inefficiency. That applies to both ext4 and XFS, whether shrink or
grow; of course XFS doesn't have shrink at all - the expectation for
its more sophisticated environment use cases was that it would only
ever be grown.

Whereas Btrfs has no fixed locations for any of its block groups, so
from its perspective a resize is just not that unique an operation,
leveraging code that's regularly exercised in normal operation anyway.
And it doesn't suffer from any resize inefficiencies either; in fact,
depending on the operation, it might become more efficient.

Anyway, probably a better way of handling shrink with ext4 and XFS is
having them on LVM thin volumes, and just using fstrim to remove unused
extents from the LV, releasing them back to the pool for use by any
other LV in that pool. It's not exactly the same thing as a shrink, of
course, but if the idea is to let one file system use the unused but
"reserved" space of a second file system, merely trimming the second
file system on a thin LV does achieve that. The bigger issue here is
that you can't then shrink the pool, so you can still get stuck in some
circumstances.

-- 
Chris Murphy
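A sketch of that last suggestion, reusing the pool name from this thread
with invented LV names and sizes: an overprovisioned XFS on a thin LV,
with fstrim handing unused space back to the pool instead of a shrink:

    lvcreate -V 2TiB -T vgds2/thinpool2 -n data1   # thin LV with a 2TiB virtual size
    mkfs.xfs /dev/vgds2/data1
    mount /dev/vgds2/data1 /mnt/data1
    fstrim -v /mnt/data1   # returns unused thin-pool chunks to the pool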
* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: Marc MERLIN @ 2018-07-31 2:44 UTC
To: Zdenek Kabelac; +Cc: LVM general discussion and development

On Fri, Jul 27, 2018 at 11:26:58AM -0700, Marc MERLIN wrote:
> Hi Zdenek,
>
> Thanks for your helpful reply.

Hi again Zdenek,

Just to confirm, am I going to be ok enough with the scheme I described,
as long as I ensure that 'Allocated pool data' does not get to 100%?

For now, I have my btrfs filesystems mounted with "discard", so
hopefully that tells dm-thin when it can free up/reuse blocks.

Given that, am I more or less ok using dm-thin that way?

And for my own understanding, is there any reason why I would even want
to consider thin_pool_autoextend_threshold < 100?

As a reminder, I have:
  mdadm raid5
  bcache
  dmcrypt
  dm-thin / lvm
  btrfs

Thanks,
Marc

> On Fri, Jul 27, 2018 at 02:59:28PM +0200, Zdenek Kabelac wrote:
> > On 26.7.2018 18:31, Marc MERLIN wrote:
> > > Still learning about thin volumes.
> > > Why do I want my thin pool to get auto extended? Does "extended"
> > > mean resized?
> >
> > Yes, extension == resize.
>
> Gotcha. Then I don't want to have to worry about my filesystems being
> resized multiple times, especially since I'm not sure how it would
> help.
>
> > See man lvmthin.
>
> Thanks. Had read it, but not carefully enough.
> So, I just re-read "Automatic extend settings".
> I'm still not entirely sure how using extension would help me there. I
> can't set it to 10% for all 10 filesystems (50% is the minimum).
> If I set it to anything less than 100%, LVM could block I/O later, try
> to extend the pool, and resize, but ultimately I'll still have
> multiple filesystems that together exceed the space available, so I
> can run out.
> I'm not seeing how the automatic extend setting is helpful, at least
> in my case. Am I missing something?
>
> To be clear, my case is that I will have 10 filesystems in a place
> where the same data was in a single filesystem that sadly I must
> segment now. More than a few will take more than 1/10th of the space,
> but I don't want to have to worry about which ones are going to use
> how much, as long as all together they stay below 100% of course.
> I don't want to have to manage space for each of those 10 and have to
> resize them by hand multiple times up and down to share the space,
> hence dm-thin.
>
> My understanding is that I have to watch this carefully:
>   LV Name                thinpool2
>   VG Name                vgds2
>   LV Pool metadata       thinpool2_tmeta
>   LV Pool data           thinpool2_tdata
>   LV Status              available
>   # open                 8
>   LV Size                14.50 TiB
>   Allocated pool data    20.26%
>   Allocated metadata     10.66%
>
> I'll have to make sure to run fstrim so that 'Allocated pool data'
> never gets too high.
> Metadata, I need to read more about to see whether that may become a
> problem. I think as long as I don't use LVM snapshots I should be ok
> (and I won't).
>
> > Running out of space in a thin-pool (data, and even more so
> > metadata) will always have a MAJOR impact on the usability of your
> > system. It's always an unpleasant moment, and it's not even remotely
> > comparable to running out of space in your filesystem - it's a much
> > more problematic case - so you should try to avoid it at all costs.
>
> Thanks for confirming.
> I suppose in my case I should set 'errorwhenfull y' so that the FS
> immediately remounts read-only on write failure. Delaying for up to 60
> seconds is not going to help in my case.
>
> > If you want to live on the corner case of out-of-space, thin-pool is
> > probably not the best technology to use.
>
> I don't want to be using dm-thin at all, but I have too many
> subvolumes for a single btrfs filesystem, so I need to segment my
> btrfs filesystem into 10 or so to be safe (as discussed with btrfs
> developers).
>
> > IMHO it's a bad plan to combine two overprovisioning technologies.
> > btrfs HAS its own built-in volume manager (i.e. its own lvm-like
> > layer).
>
> btrfs does not overprovision, and sadly I found out that if you have
> more than 50 or 100 snapshots, you are going to run into problems with
> balancing, and bigger problems with filesystem corruption and repair
> later (as I found out over the last 3 weeks dealing with this).
>
> > > There is however an issue with btrfs where it gets more unsafe
> > > (and slower) to use if you have too many snapshots (over 50, and
> > > especially over 100).
> >
> > It's better to pair a thin-pool with ext4 or XFS.
>
> I need btrfs send/receive, so that's not an option.
>
> > BTRFS will suffer great pain from the problems of lvm2 snapshots -
>
> I will not be using lvm snapshots at all.
>
> > btrfs will see the very same block device present multiple times in
> > your system - so I'd highly discourage use of thin-pool with btrfs
> > unless you are very well aware of the weaknesses and can avoid
> > running into them...
>
> I'm only using thin-pool to allow dynamic block allocation for
> overprovisioning. I will use no other LVM feature. Is that ok?
>
> > Possible loss of your data in case you run out of space and hit some
> > corner cases - note that only the 4.18 kernel will fix one quite
> > annoying bug with the use of TRIM on a full pool, which could lead
> > to problematic metadata recovery.
>
> So, as long as I run trim in btrfs and make very sure I don't run out
> of blocks on the VG side, should I be safe-ish enough?
>
> Thanks,
> Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                    .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: Zdenek Kabelac @ 2018-07-31 12:35 UTC
To: LVM general discussion and development, Marc MERLIN

On 31.7.2018 04:44, Marc MERLIN wrote:
> On Fri, Jul 27, 2018 at 11:26:58AM -0700, Marc MERLIN wrote:
>> Hi Zdenek,
>>
>> Thanks for your helpful reply.
>
> Hi again Zdenek,
>
> Just to confirm, am I going to be ok enough with the scheme I
> described, as long as I ensure that 'Allocated pool data' does not get
> to 100%?
>
> For now, I have my btrfs filesystems mounted with "discard", so
> hopefully that tells dm-thin when it can free up/reuse blocks.
>
> Given that, am I more or less ok using dm-thin that way?
>
> And for my own understanding, is there any reason why I would even
> want to consider thin_pool_autoextend_threshold < 100?

Hi

If you monitor the amount of free space for data AND for metadata in the
thin-pool yourself, you can easily keep the threshold == 100.

Just don't forget that when you upsize 'data', you should typically also
extend metadata - it's not an uncommon issue: users start with small
'data' & 'metadata' LVs for the thin-pool, then continue to extend only
the thin-pool 'data' volume, ignore/forget about metadata completely,
and hit a full metadata device - which can lead to many troubles
(hitting a full data LV is normally not a big deal).

Regards

Zdenek
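A hedged sketch of the "extend both" rule, with made-up sizes and the
names from this thread:

    lvextend -L +4T vgds2/thinpool2                   # grow the pool's data LV
    lvextend --poolmetadatasize +1G vgds2/thinpool2   # grow its metadata LV too
    lvs -a -o lv_name,lv_size,data_percent,metadata_percent vgds2   # verify both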
* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: Marc MERLIN @ 2018-07-31 21:17 UTC
To: Zdenek Kabelac; +Cc: LVM general discussion and development

On Tue, Jul 31, 2018 at 02:35:42PM +0200, Zdenek Kabelac wrote:
> If you monitor the amount of free space for data AND for metadata in
> the thin-pool yourself, you can easily keep the threshold == 100.

Understood. Two things:

1) Basically, threshold < 100 allows you to hit the limit, have LVM
pause IO, allocate more blocks, and resize the filesystem for you.
However, if you're not monitoring this, it's ultimately just the same as
having threshold = 100 and hoping that you won't hit the limit, except
that you're adding the complexity of resizes into the mix. Correct?

2) I wasn't quite clear on what metadata was used for, and I let
vgcreate pick a default amount for me. Am I correct that it basically
tracks block usage and maybe LVM snapshots that I'm not going to use,
and that therefore if I don't resize my LV, I don't really have to worry
about metadata running out?

> Just don't forget that when you upsize 'data', you should typically
> also extend metadata - it's not an uncommon issue: users start with
> small 'data' & 'metadata' LVs for the thin-pool, then continue to
> extend only the thin-pool 'data' volume, ignore/forget about metadata
> completely, and hit a full metadata device - which can lead to many
> troubles (hitting a full data LV is normally not a big deal).

Thanks for the warning. Given that I started with the maximum size and
don't plan on ever extending (to be fair, I can't), I should be ok
there, correct?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                    .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
* Re: [linux-lvm] Why use thin_pool_autoextend_threshold < 100 ?
From: Zdenek Kabelac @ 2018-08-01 11:37 UTC
To: LVM general discussion and development, Marc MERLIN

On 31.7.2018 23:17, Marc MERLIN wrote:
> On Tue, Jul 31, 2018 at 02:35:42PM +0200, Zdenek Kabelac wrote:
>> If you monitor the amount of free space for data AND for metadata in
>> the thin-pool yourself, you can easily keep the threshold == 100.
>
> Understood. Two things:
> 1) Basically, threshold < 100 allows you to hit the limit, have LVM
> pause IO, allocate more blocks, and resize the filesystem for you.
> However, if you're not monitoring this, it's ultimately just the same
> as having threshold = 100 and hoping that you won't hit the limit,
> except that you're adding the complexity of resizes into the mix.
> Correct?

Sure thing - when there is no free space left to extend your
overprovisioned thin-pool and you run out of space, you hit the limit at
some point...

> 2) I wasn't quite clear on what metadata was used for, and I let
> vgcreate pick a default amount for me. Am I correct that it basically
> tracks block usage and maybe LVM snapshots that I'm not going to use,
> and that therefore if I don't resize my LV, I don't really have to
> worry about metadata running out?

The kernel metadata stored in the _tmeta LV holds the mapping of all
thin volumes - i.e., which thin-pool chunk belongs to which thin volume.

>> Just don't forget that when you upsize 'data', you should typically
>> also extend metadata - it's not an uncommon issue: users start with
>> small 'data' & 'metadata' LVs for the thin-pool, then continue to
>> extend only the thin-pool 'data' volume, ignore/forget about metadata
>> completely, and hit a full metadata device - which can lead to many
>> troubles (hitting a full data LV is normally not a big deal).
>
> Thanks for the warning. Given that I started with the maximum size and
> don't plan on ever extending (to be fair, I can't), I should be ok
> there, correct?

Yep - once you make ~16GiB metadata, you can't make it any bigger (a
hard internal limitation of the existing thin-pool target
implementation).

But you still need to remember that you can run out of space in your
metadata if there is heavy usage of many large thin volumes - so the
amount of free space should always be monitored somehow...

Regards

Zdenek