* Ext4 and xfs problems in dm-thin on allocation and discard
2012-06-18 21:33 ` Spelic
To: xfs, linux-ext4, device-mapper development

Hello all,

I am doing some testing of dm-thin on kernel 3.4.2 and the latest LVM built from source (the rest is Ubuntu Precise 12.04). There are a few problems with ext4, and different ones with xfs.

I am doing this:

    dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync
    lvs
    rm zeroes   # optional
    dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync   # again
    lvs
    rm zeroes   # optional
    ...
    dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync   # again
    lvs
    rm zeroes
    fstrim /mnt/mountpoint
    lvs

On ext4 the problem is that it always allocates blocks in different places, so you can see from lvs that space occupation in the pool and the thin LV increases at each iteration of dd, again and again, until it has allocated the whole thin device (really 100% of it). This is true regardless of whether I run rm between one dd and the next.

The other problem is that, because of this, ext4 always gets the worst performance out of thinp, about 140 MB/sec on my system, since it is constantly allocating new blocks, instead of the 350 MB/sec I would get if it reused already-allocated regions (see the comparison with xfs below). I am on an MD RAID-5 of 5 HDDs.

I would suggest adding a "thinp mode" mount option to ext4 affecting the allocator, so that it tries to reallocate recently used and freed areas rather than constantly new ones. Note that mount -o discard does work and prevents the allocation bloat, but it still always gets the worst write performance from thinp. Alternatively, thinp could be improved so that block allocation is fast :-P (*)

However, the good news is that fstrim works correctly on ext4 and is able to drop all the space allocated by the dd runs. Mount -o discard also works.
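The test loop above can be wrapped in a small script. This is a sketch rather than the exact commands from the report: the mount point, iteration count, and the reduced dd size are assumptions (the real test wrote 1000 MB each pass into a filesystem on a thin LV), and lvs/fstrim only produce useful output when run as root against a real thin pool.

```shell
# Sketch of the reproduction loop. MNT, the dd size, and the iteration
# count are assumptions; point MNT at a filesystem on a thin LV and
# raise count to 1000 to match the original test.
MNT=${MNT:-$(mktemp -d)}
cd "$MNT" || exit 1
for i in 1 2 3; do
    dd if=/dev/zero of=zeroes bs=1M count=10 conv=fsync 2>/dev/null
    lvs 2>/dev/null              # watch Data% of the pool and thin LV grow
    rm -f zeroes                 # optional between iterations
done
# After the final rm, discard the freed space and re-check allocation:
fstrim "$MNT" 2>/dev/null || echo "fstrim needs root and a discard-capable mounted fs"
lvs 2>/dev/null
```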
On xfs there is a different problem. Xfs apparently reuses the same blocks, so after the first write at 140 MB/sec, subsequent overwrites of the same file run at full speed, around 350 MB/sec (the same speed as non-thin LVM), and space occupation does not go up at every iteration of dd, with or without rm in between. [Ok, actually, on retrying it needed 3 rewrites to stabilize allocation... probably an AG count thing.]

However, the problem with xfs is that discard doesn't appear to work. Fstrim doesn't work, and neither does "mount -o discard ... + rm zeroes". There is apparently no way to drop the allocated blocks, as seen from lvs. This is in contrast to what is written at http://xfs.org/index.php/FITRIM/discard , which declares fstrim and mount -o discard to be working.

Please note that since I am above MD RAID-5 (I believe this is the reason), the passdown of discards does not work; my dmesg says:

    [160508.497879] device-mapper: thin: Discard unsupported by data device (dm-1): Disabling discard passdown.

But AFAIU, unless there is a thinp bug, this should not affect the unmapping of thin blocks when fstrimming xfs... and in fact ext4 is able to do that.

(*) The strange thing is that write performance appears to be roughly the same for the default thin chunk size and for a 1 MB thin chunk size. I would have expected thinp allocation to be faster with larger chunk sizes, but it is actually slower (note that there are no snapshots here and hence no CoW). This also holds if I set the thin pool not to zero newly allocated blocks: performance is about 240 MB/sec then, but again it does not increase with larger chunk sizes; it actually decreases slightly with very large chunk sizes such as 16 MB. Why is that?

Thanks for your help
S.
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
2012-06-19 1:57 ` Dave Chinner
To: Spelic; Cc: xfs, linux-ext4, device-mapper development

On Mon, Jun 18, 2012 at 11:33:50PM +0200, Spelic wrote:
> Hello all
> I am doing some testing of dm-thin on kernel 3.4.2 and latest lvm
> from source (the rest is Ubuntu Precise 12.04).
> There are a few problems with ext4 and (different ones with) xfs
>
> I am doing this:
> dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync
> lvs
> rm zeroes #optional
> ...
> dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync #again
> lvs
> rm zeroes
> fstrim /mnt/mountpoint
> lvs

[snip ext4 problems]

> On xfs there is a different problem.
> Xfs apparently correctly re-uses the same blocks so that after the
> first write at 140MB/sec, subsequent overwrites of the same file are
> at full speed such as 350MB/sec (same speed as with non-thin lvm),
> and also you don't see space occupation going up at every iteration
> of dd, either with or without rm in-between the dd's. [ok actually
> now retrying it needed 3 rewrites to stabilize allocation...
> probably an AG count thing.]

That's just a characteristic of the allocation algorithm. It's not
something that you see in day-to-day operation of the filesystem,
though, because you rarely remove and rewrite a file like this
repeatedly. So in the real world, performance will be more like ext4
when you are running workloads where you actually store data for
longer than a millisecond...

Expect the 140MB/s number to be the normal performance case, because
as soon as you take a snapshot, the overwrite requires new blocks to
be allocated in dm-thinp. You don't get thinp for nothing - it has an
associated performance cost, as you are now finding out....
> However the problem with XFS is that discard doesn't appear to work.
> Fstrim doesn't work, and neither does "mount -o discard ... + rm
> zeroes". There is apparently no way to drop the allocated blocks,
> as seen from lvs. This is in contrast to what is written here
> http://xfs.org/index.php/FITRIM/discard which declares fstrim and
> mount -o discard to be working.

I don't see why it wouldn't be, if the underlying device supports it.
Have you looked at a block trace or an xfs event trace to see if
discards are being issued by XFS? Are you getting messages like:

    XFS: (dev) discard failed for extent [0x123,4096], error -5

in dmesg, or is fstrim seeing errors returned from the trim ioctl?

> Please note that since I am above MD raid5 (I believe this is the
> reason), the passdown of discards does not work, as my dmesg says:
> [160508.497879] device-mapper: thin: Discard unsupported by data
> device (dm-1): Disabling discard passdown.
> but AFAIU, unless there is a thinp bug, this should not affect the
> unmapping of thin blocks by fstrimming xfs... and in fact ext4 is
> able to do that.

Does ext4 report the same error?

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
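A quick way to check the first part of Dave's question, before reaching for blktrace or the xfs tracepoints, is to grep the kernel log for the XFS discard failure message. The grep pattern below is derived from the message format quoted above and is an assumption about how it appears on this kernel; it is a diagnostic sketch, not something from the thread.

```shell
# Look for XFS discard failures in the kernel log; if none are present
# (or the log is not readable without root) a fallback line is printed.
check_xfs_discard_errors() {
    dmesg 2>/dev/null | grep -i "discard failed for extent" \
        || echo "no XFS discard errors logged (or dmesg not readable)"
}
check_xfs_discard_errors
```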
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
2012-06-19 3:12 ` Mike Snitzer
To: Dave Chinner; Cc: Spelic, device-mapper development, linux-ext4, xfs

On Mon, Jun 18 2012 at 9:57pm -0400, Dave Chinner <david@fromorbit.com> wrote:

> On Mon, Jun 18, 2012 at 11:33:50PM +0200, Spelic wrote:
> > Please note that since I am above MD raid5 (I believe this is the
> > reason), the passdown of discards does not work, as my dmesg says:
> > [160508.497879] device-mapper: thin: Discard unsupported by data
> > device (dm-1): Disabling discard passdown.
> > but AFAIU, unless there is a thinp bug, this should not affect the
> > unmapping of thin blocks by fstrimming xfs... and in fact ext4 is
> > able to do that.
>
> Does ext4 report that same error?

That message says the underlying device doesn't support discards
(because it is an MD device). But the thinp device still has discards
enabled -- it just won't pass the discards down to the underlying data
device.

So yes, it'll happen with ext4 -- it is generated when the thin-pool
device is loaded (which happens independent of the filesystem that is
layered on top).

The discards still inform the thin-pool that the corresponding extents
are no longer allocated.
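One way to see where in the stack discard support stops is to read each layer's advertised discard limit from sysfs: a device reporting discard_max_bytes of 0 (or lacking the attribute) does not accept discards. The device names below are assumptions for a setup like this one (the MD array plus the thin-pool's mapped devices); the sysfs attribute path itself is the standard block-layer queue attribute.

```shell
# Print each device's maximum discard size; 0 or "n/a" means that layer
# does not accept discards. Device names are made up for illustration;
# substitute your own from 'lsblk' or 'dmsetup ls'.
discard_max() {
    cat "/sys/block/$1/queue/discard_max_bytes" 2>/dev/null || echo "n/a"
}
for dev in md0 dm-1 dm-2; do
    echo "$dev: $(discard_max "$dev")"
done
```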
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
2012-06-19 6:32 ` Lukáš Czerner
To: Mike Snitzer; Cc: Dave Chinner, Spelic, device-mapper development, linux-ext4, xfs

On Mon, 18 Jun 2012, Mike Snitzer wrote:

> That message says the underlying device doesn't support discards
> (because it is an MD device). But the thinp device still has discards
> enabled -- it just won't pass the discards down to the underlying data
> device.
>
> So yes, it'll happen with ext4 -- it is generated when the thin-pool
> device is loaded (which happens independent of the filesystem that is
> layered ontop).
>
> The discards still inform the thin-pool that the corresponding extents
> are no longer allocated.

So do I understand correctly that even though the discard came through
and thinp took advantage of it, it still returns EOPNOTSUPP? That seems
rather suboptimal.

IIRC there was a discussion about adding an option to enable/disable
sending the discard in the thinp target down to the device. Maybe it
could be a bit smarter than that and actually enable/disable discard
pass-through depending on the underlying support, so that we do not
blindly send discards down to a device that does not support them. We
would then have three options:

pass through - always send discard down to the device
backstop - never send discard down to the device
auto - send discard down only if the underlying device supports it

What do you think?

-Lukas
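The three modes Lukáš proposes (pass through / backstop / auto) reduce to a simple decision table. The following is a hypothetical sketch of that policy, not the real dm-thin code; the function name and mode tokens are made up for illustration.

```shell
# Hypothetical sketch of the proposed three-mode discard policy:
#   passthrough - always send the discard down to the data device
#   backstop    - never send it down
#   auto        - send it down only if the data device supports discard
pass_down() {  # $1 = configured mode, $2 = 1 if data device supports discard
    case "$1" in
        passthrough) echo 1 ;;
        backstop)    echo 0 ;;
        auto)        echo "$2" ;;
        *)           echo "unknown mode" >&2; return 1 ;;
    esac
}
pass_down passthrough 0   # -> 1
pass_down backstop 1      # -> 0
pass_down auto 0          # -> 0
```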
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
2012-06-19 11:29 ` Spelic
To: Lukáš Czerner; Cc: Mike Snitzer, Dave Chinner, device-mapper development, linux-ext4, xfs

On 06/19/12 08:32, Lukáš Czerner wrote:
>
> So do I understand correctly that even though the discard came
> through and thinp took advantage of it it still returns EOPNOTSUPP ?
> This seems rather suboptimal. IIRC there was a discussion to add an
> option to enable/disable sending discard in thinp target down
> to the device.

I'll ask this too... do I understand correctly that dm-thin returns
EOPNOTSUPP to the filesystem layer even though it is using the discard
to unmap blocks, and that at that point XFS stops sending discards down
(while ext4 keeps sending them)?

This looks like a dm-thin bug to me. Discards are "supported" in such
a scenario.

Do you have a patch for dm-thin to prevent it returning EOPNOTSUPP?

Thank you
S.
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
2012-06-19 12:20 ` Lukáš Czerner
To: Spelic; Cc: Mike Snitzer, Dave Chinner, device-mapper development, linux-ext4, xfs

On Tue, 19 Jun 2012, Spelic wrote:

> I'll ask this too...
> do I understand correctly that dm-thin returns EOPNOTSUPP to the
> filesystem layer even though it is using the discard to unmap blocks,
> and at that point XFS stops sending discards down there (while ext4
> keeps sending them)?
>
> This looks like a bug of dm-thin to me. Discards are "supported" in
> such a scenario.
>
> Do you have a patch for dm-thin so to prevent it sending EOPNOTSUPP ?

Yes, this behaviour definitely needs to change in dm-thin. I do not
have a patch; it was merely a proposal for how things could be done.
Not sure what Mike and the rest of the dm folks think about this.

-Lukas
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
2012-06-19 13:34 ` Mike Snitzer
To: Spelic; Cc: Lukáš Czerner, Dave Chinner, device-mapper development, linux-ext4, xfs

On Tue, Jun 19 2012 at 7:29am -0400, Spelic <spelic@shiftmail.org> wrote:

> On 06/19/12 08:32, Lukáš Czerner wrote:
> > So do I understand correctly that even though the discard came
> > through and thinp took advantage of it it still returns EOPNOTSUPP ?
> > This seems rather suboptimal. IIRC there was a discussion to add an
> > option to enable/disable sending discard in thinp target down
> > to the device.
>
> I'll ask this too...
> do I understand correctly that dm-thin returns EOPNOTSUPP to the
> filesystem layer even though it is using the discard to unmap
> blocks, and at that point XFS stops sending discards down there
> (while ext4 keeps sending them)?

Are you actually seeing that? Or are you just seizing on Lukas'
misunderstanding?

> This looks like a bug of dm-thin to me. Discards are "supported" in
> such a scenario.
>
> Do you have a patch for dm-thin so to prevent it sending EOPNOTSUPP ?

thinp should _not_ be returning -EOPNOTSUPP unless 'ignore_discard' is
provided as a feature when loading the thin-pool's DM table.
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-19 6:32 ` Lukáš Czerner @ 2012-06-19 13:16 ` Mike Snitzer -1 siblings, 0 replies; 72+ messages in thread From: Mike Snitzer @ 2012-06-19 13:16 UTC (permalink / raw) To: Lukáš Czerner Cc: Dave Chinner, Spelic, device-mapper development, linux-ext4, xfs On Tue, Jun 19 2012 at 2:32am -0400, Lukáš Czerner <lczerner@redhat.com> wrote: > On Mon, 18 Jun 2012, Mike Snitzer wrote: > > > Date: Mon, 18 Jun 2012 23:12:42 -0400 > > From: Mike Snitzer <snitzer@redhat.com> > > To: Dave Chinner <david@fromorbit.com> > > Cc: Spelic <spelic@shiftmail.org>, > > device-mapper development <dm-devel@redhat.com>, > > linux-ext4@vger.kernel.org, xfs@oss.sgi.com > > Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard > > > > On Mon, Jun 18 2012 at 9:57pm -0400, > > Dave Chinner <david@fromorbit.com> wrote: > > > > > On Mon, Jun 18, 2012 at 11:33:50PM +0200, Spelic wrote: > > > > > > > Please note that since I am above MD raid5 (I believe this is the > > > > reason), the passdown of discards does not work, as my dmesg says: > > > > [160508.497879] device-mapper: thin: Discard unsupported by data > > > > device (dm-1): Disabling discard passdown. > > > > but AFAIU, unless there is a thinp bug, this should not affect the > > > > unmapping of thin blocks by fstrimming xfs... and in fact ext4 is > > > > able to do that. > > > > > > Does ext4 report that same error? > > > > That message says the underlying device doesn't support discards > > (because it is an MD device). But the thinp device still has discards > > enabled -- it just won't pass the discards down to the underlying data > > device. > > > > So yes, it'll happen with ext4 -- it is generated when the thin-pool > > device is loaded (which happens independent of the filesystem that is > > layered ontop). > > > > The discards still inform the thin-pool that the corresponding extents > > are no longer allocated. 
> > So do I understand correctly that even though the discard came > through and thinp took advantage of it it still returns EOPNOTSUPP ? No, not correct. Why are you assuming this? I must be missing something from this discussion that led you there. > This seems rather suboptimal. IIRC there was a discussion to add an > option to enable/disable sending discard in thinp target down > to the device. > > So maybe it might be a bit smarter than that and actually > enable/disable discard pass through depending on the underlying > support, so we do not blindly send discard down to the device even > though it does not support it. Yes, that is what we did. Discards are enabled by default (including discard passdown), but if the underlying data device doesn't support discards then the discards will not be passed down. And here are the feature controls that can be provided when loading the thin-pool's DM table: ignore_discard: disable discard no_discard_passdown: don't pass discards down to the data device -EOPNOTSUPP is only ever returned if 'ignore_discard' is provided. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
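[Editor's note: a minimal sketch of how those two feature controls are passed when loading a thin-pool DM table, following the documented `thin-pool` target format. The device paths and sizes below are hypothetical, and the actual `dmsetup` call is left commented out since it needs root and real devices:]

```shell
# thin-pool table format: start length thin-pool <metadata dev> <data dev>
#   <data block size (sectors)> <low water mark> <#features> [feature args...]
# device paths and sizes here are made up for illustration
META=/dev/mapper/vg1-pool_tmeta
DATA=/dev/mapper/vg1-pool_tdata
TABLE="0 20971520 thin-pool $META $DATA 2048 32768 1 ignore_discard"
echo "$TABLE"
# dmsetup create pool --table "$TABLE"   # needs root and the real devices
# with "1 no_discard_passdown" instead, discards unmap thin blocks but are
# never forwarded to the data device; with "0" features, both are enabled
```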
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-19 13:16 ` Mike Snitzer @ 2012-06-19 13:25 ` Lukáš Czerner -1 siblings, 0 replies; 72+ messages in thread From: Lukáš Czerner @ 2012-06-19 13:25 UTC (permalink / raw) To: Mike Snitzer Cc: Lukáš Czerner, Dave Chinner, Spelic, device-mapper development, linux-ext4, xfs [-- Attachment #1: Type: TEXT/PLAIN, Size: 4041 bytes --] On Tue, 19 Jun 2012, Mike Snitzer wrote: > Date: Tue, 19 Jun 2012 09:16:49 -0400 > From: Mike Snitzer <snitzer@redhat.com> > To: Lukáš Czerner <lczerner@redhat.com> > Cc: Dave Chinner <david@fromorbit.com>, Spelic <spelic@shiftmail.org>, > device-mapper development <dm-devel@redhat.com>, > linux-ext4@vger.kernel.org, xfs@oss.sgi.com > Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard > > On Tue, Jun 19 2012 at 2:32am -0400, > Lukáš Czerner <lczerner@redhat.com> wrote: > > > On Mon, 18 Jun 2012, Mike Snitzer wrote: > > > > > Date: Mon, 18 Jun 2012 23:12:42 -0400 > > > From: Mike Snitzer <snitzer@redhat.com> > > > To: Dave Chinner <david@fromorbit.com> > > > Cc: Spelic <spelic@shiftmail.org>, > > > device-mapper development <dm-devel@redhat.com>, > > > linux-ext4@vger.kernel.org, xfs@oss.sgi.com > > > Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard > > > > > > On Mon, Jun 18 2012 at 9:57pm -0400, > > > Dave Chinner <david@fromorbit.com> wrote: > > > > > > > On Mon, Jun 18, 2012 at 11:33:50PM +0200, Spelic wrote: > > > > > > > > > Please note that since I am above MD raid5 (I believe this is the > > > > > reason), the passdown of discards does not work, as my dmesg says: > > > > > [160508.497879] device-mapper: thin: Discard unsupported by data > > > > > device (dm-1): Disabling discard passdown. > > > > > but AFAIU, unless there is a thinp bug, this should not affect the > > > > > unmapping of thin blocks by fstrimming xfs... and in fact ext4 is > > > > > able to do that. > > > > > > > > Does ext4 report that same error? 
> > > > > > That message says the underlying device doesn't support discards > > > (because it is an MD device). But the thinp device still has discards > > > enabled -- it just won't pass the discards down to the underlying data > > > device. > > > > > > So yes, it'll happen with ext4 -- it is generated when the thin-pool > > > device is loaded (which happens independent of the filesystem that is > > > layered ontop). > > > > > > The discards still inform the thin-pool that the corresponding extents > > > are no longer allocated. > > > > So do I understand correctly that even though the discard came > > through and thinp took advantage of it it still returns EOPNOTSUPP ? > > No, not correct. Why are you assuming this? I must be missing > something from this discussion that led you there. Those two paragraphs led me to that conclusion: That message says the underlying device doesn't support discards (because it is an MD device). But the thinp device still has discards enabled -- it just won't pass the discards down to the underlying data device. The discards still inform the thin-pool that the corresponding extents are no longer allocated. so I am a bit confused now. Why did dm-thin return EOPNOTSUPP then? Is that because it has been configured with 'ignore_discard', or does it actually take advantage of the discard but, since the underlying device does not support it (and no_discard_passdown is not set), return EOPNOTSUPP? > > > This seems rather suboptimal. IIRC there was a discussion to add an > > option to enable/disable sending discard in thinp target down > > to the device. > > > > So maybe it might be a bit smarter than that and actually > > enable/disable discard pass through depending on the underlying > > support, so we do not blindly send discard down to the device even > > though it does not support it. > > Yes, that is what we did.
> > Discards are enabled by default (including discard passdown), but if the > underlying data device doesn't support discards then the discards will > not be passed down. > > And here are the feature controls that can be provided when loading the > thin-pool's DM table: > > ignore_discard: disable discard > no_discard_passdown: don't pass discards down to the data device > > -EOPNOTSUPP is only ever returned if 'ignore_discard' is provided. Ok, so in this case 'ignore_discard' has been configured? Thanks! -Lukas ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-19 13:25 ` Lukáš Czerner @ 2012-06-19 13:30 ` Mike Snitzer -1 siblings, 0 replies; 72+ messages in thread From: Mike Snitzer @ 2012-06-19 13:30 UTC (permalink / raw) To: Lukáš Czerner Cc: Dave Chinner, Spelic, device-mapper development, linux-ext4, xfs On Tue, Jun 19 2012 at 9:25am -0400, Lukáš Czerner <lczerner@redhat.com> wrote: > On Tue, 19 Jun 2012, Mike Snitzer wrote: > > > Date: Tue, 19 Jun 2012 09:16:49 -0400 > > From: Mike Snitzer <snitzer@redhat.com> > > To: Lukáš Czerner <lczerner@redhat.com> > > Cc: Dave Chinner <david@fromorbit.com>, Spelic <spelic@shiftmail.org>, > > device-mapper development <dm-devel@redhat.com>, > > linux-ext4@vger.kernel.org, xfs@oss.sgi.com > > Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard > > > > On Tue, Jun 19 2012 at 2:32am -0400, > > Lukáš Czerner <lczerner@redhat.com> wrote: > > > > > So do I understand correctly that even though the discard came > > > through and thinp took advantage of it it still returns EOPNOTSUPP ? > > > > No, not correct. Why are you assuming this? I must be missing > > something from this discussion that led you there. > > Those two paragraphs led me to that conclusion: > > That message says the underlying device doesn't support discards > (because it is an MD device). But the thinp device still has discards > enabled -- it just won't pass the discards down to the underlying data > device. > > The discards still inform the thin-pool that the corresponding extents > are no longer allocated. > > so I am a bit confused now. Why the dm-thin returned EOPNOTSUPP then > ? Is that because it has been configured to ignore_discard, or it > actually takes advantage of the discard but underlying device does > not support it (and no_discard_passdown is not set) so it return > EOPNOTSUPP ? > > > > > > This seems rather suboptimal. 
IIRC there was a discussion to add an > > > option to enable/disable sending discard in thinp target down > > > to the device. > > > > > > So maybe it might be a bit smarter than that and actually > > > enable/disable discard pass through depending on the underlying > > > support, so we do not blindly send discard down to the device even > > > though it does not support it. > > > > Yes, that is what we did. > > > > Discards are enabled by default (including discard passdown), but if the > > underlying data device doesn't support discards then the discards will > > not be passed down. > > > > And here are the feature controls that can be provided when loading the > > thin-pool's DM table: > > > > ignore_discard: disable discard > > no_discard_passdown: don't pass discards down to the data device > > > > -EOPNOTSUPP is only ever returned if 'ignore_discard' is provided. > > Ok, so in this case 'ignore_discard' has been configured ? I don't recall Spelic saying anything about EOPNOTSUPP. So what has made you zero in on an -EOPNOTSUPP return (which should not be happening)? -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-19 13:30 ` Mike Snitzer @ 2012-06-19 13:52 ` Spelic -1 siblings, 0 replies; 72+ messages in thread From: Spelic @ 2012-06-19 13:52 UTC (permalink / raw) To: Mike Snitzer Cc: Lukáš Czerner, Dave Chinner, Spelic, device-mapper development, linux-ext4, xfs On 06/19/12 15:30, Mike Snitzer wrote: > I don't recall Spelic saying anything about EOPNOTSUPP. So what has > made you zero in on an -EOPNOTSUPP return (which should not be > happening)? Exactly: I do not know if EOPNOTSUPP is being returned or not. If this helps, I have configured dm-thin via lvm2 LVM version: 2.02.95(2) (2012-03-06) Library version: 1.02.74 (2012-03-06) Driver version: 4.22.0 From the dmsetup table I only see one option, "skip_block_zeroing", and only if I configure it with -Zn. I do not see anything regarding ignore_discard vg1-pooltry1-tpool: 0 20971520 thin-pool 252:1 252:2 2048 0 1 skip_block_zeroing vg1-pooltry1_tdata: 0 20971520 linear 9:20 62922752 vg1-pooltry1_tmeta: 0 8192 linear 9:20 83894272 vg1-thinlv1: 0 31457280 thin 252:3 1 and in dmesg: [ 33.685200] device-mapper: thin: Discard unsupported by data device (dm-2): Disabling discard passdown. [ 33.709586] device-mapper: thin: Discard unsupported by data device (dm-6): Disabling discard passdown. I do not know the mechanism by which xfs fails to unmap blocks from dm-thin, but it really cannot. Anyone with dm-thin installed can try it. This is 100% reproducible for me. ^ permalink raw reply [flat|nested] 72+ messages in thread
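[Editor's note: a quick way to cross-check a table line like the one above is to classify it by its discard-related feature args. This is a rough sketch, not an official tool; the awk patterns and the sample line are taken from the dmsetup output quoted above:]

```shell
# classify a thin-pool dmsetup table line by its discard-related features
classify_pool() {
  awk '/thin-pool/ {
    state = "discards on, passdown per data-device support"
    if (/no_discard_passdown/) state = "discards on, passdown off"
    if (/ignore_discard/)      state = "discards disabled (ignore_discard)"
    print $1, state
  }'
}

# on a live system (needs root): dmsetup table | classify_pool
# sample line from the table output above:
printf '%s\n' 'vg1-pooltry1-tpool: 0 20971520 thin-pool 252:1 252:2 2048 0 1 skip_block_zeroing' | classify_pool
# prints: vg1-pooltry1-tpool: discards on, passdown per data-device support
```

For Spelic's pool this reports discards enabled, consistent with Mike's statement that -EOPNOTSUPP is only returned when 'ignore_discard' was given.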
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-19 13:52 ` Spelic @ 2012-06-19 14:05 ` Eric Sandeen -1 siblings, 0 replies; 72+ messages in thread From: Eric Sandeen @ 2012-06-19 14:05 UTC (permalink / raw) To: Spelic Cc: Mike Snitzer, Lukáš Czerner, Dave Chinner, device-mapper development, linux-ext4, xfs On 6/19/12 8:52 AM, Spelic wrote: > On 06/19/12 15:30, Mike Snitzer wrote: >> I don't recall Spelic saying anything about EOPNOTSUPP. So what has made you zero in on an -EOPNOTSUPP return (which should not be happening)? > > Exactly: I do not know if EOPNOTSUPP is being returned or not. > > If this helps, I have configured dm-thin via lvm2 > LVM version: 2.02.95(2) (2012-03-06) > Library version: 1.02.74 (2012-03-06) > Driver version: 4.22.0 > > from dmsetup table I only see one option : "skip_block_zeroing", if and only if I configure it with -Zn . I do not see anything regarding ignore_discard > > vg1-pooltry1-tpool: 0 20971520 thin-pool 252:1 252:2 2048 0 1 skip_block_zeroing > vg1-pooltry1_tdata: 0 20971520 linear 9:20 62922752 > vg1-pooltry1_tmeta: 0 8192 linear 9:20 83894272 > vg1-thinlv1: 0 31457280 thin 252:3 1 > > > and in dmesg: > [ 33.685200] device-mapper: thin: Discard unsupported by data device (dm-2): Disabling discard passdown. > [ 33.709586] device-mapper: thin: Discard unsupported by data device (dm-6): Disabling discard passdown. > > > I do not know what is the mechanism for which xfs cannot unmap blocks from dm-thin, but it really can't. > If anyone has dm-thin installed he can try. This is 100% reproducible for me. Might be worth seeing if xfs is ever getting to its discard code? There is a tracepoint... # mount -t debugfs none /sys/kernel/debug # echo 1 > /sys/kernel/debug/tracing/tracing_enabled # echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_discard_extent/enable <run test> # cat /sys/kernel/debug/tracing/trace -Eric ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-19 13:52 ` Spelic @ 2012-06-19 14:44 ` Mike Snitzer -1 siblings, 0 replies; 72+ messages in thread From: Mike Snitzer @ 2012-06-19 14:44 UTC (permalink / raw) To: Spelic Cc: Lukáš Czerner, Dave Chinner, device-mapper development, linux-ext4, xfs On Tue, Jun 19 2012 at 9:52am -0400, Spelic <spelic@shiftmail.org> wrote: > On 06/19/12 15:30, Mike Snitzer wrote: > >I don't recall Spelic saying anything about EOPNOTSUPP. So what > >has made you zero in on an -EOPNOTSUPP return (which should not be > >happening)? > > Exactly: I do not know if EOPNOTSUPP is being returned or not. > > If this helps, I have configured dm-thin via lvm2 > LVM version: 2.02.95(2) (2012-03-06) > Library version: 1.02.74 (2012-03-06) > Driver version: 4.22.0 > > from dmsetup table I only see one option : "skip_block_zeroing", if > and only if I configure it with -Zn . I do not see anything > regarding ignore_discard > > vg1-pooltry1-tpool: 0 20971520 thin-pool 252:1 252:2 2048 0 1 > skip_block_zeroing > vg1-pooltry1_tdata: 0 20971520 linear 9:20 62922752 > vg1-pooltry1_tmeta: 0 8192 linear 9:20 83894272 > vg1-thinlv1: 0 31457280 thin 252:3 1 > > > and in dmesg: > [ 33.685200] device-mapper: thin: Discard unsupported by data > device (dm-2): Disabling discard passdown. > [ 33.709586] device-mapper: thin: Discard unsupported by data > device (dm-6): Disabling discard passdown. > > > I do not know what is the mechanism for which xfs cannot unmap > blocks from dm-thin, but it really can't. > If anyone has dm-thin installed he can try. This is 100% > reproducible for me. I was initially surprised by this considering the thinp-test-suite does test a compilebench workload against xfs and ext4 using online discard (-o discard). But I just modified that test to use a thin-pool with 'ignore_discard' and the test still passed on both ext4 and xfs. 
So there is more work needed in the thinp-test-suite to use blktrace hooks to verify that discards are occurring when the compilebench-generated files are removed. I'll work through that and report back. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
From: Mike Snitzer @ 2012-06-19 18:48 UTC
To: Spelic
Cc: Lukáš Czerner, device-mapper development, linux-ext4, Dave Chinner, xfs

On Tue, Jun 19 2012 at 10:44am -0400, Mike Snitzer <snitzer@redhat.com> wrote:
> On Tue, Jun 19 2012 at 9:52am -0400, Spelic <spelic@shiftmail.org> wrote:
>
> > I do not know what is the mechanism for which xfs cannot unmap
> > blocks from dm-thin, but it really can't.
> > If anyone has dm-thin installed he can try. This is 100%
> > reproducible for me.
>
> I was initially surprised by this considering the thinp-test-suite does
> test a compilebench workload against xfs and ext4 using online discard
> (-o discard).
>
> But I just modified that test to use a thin-pool with 'ignore_discard'
> and the test still passed on both ext4 and xfs.
>
> So there is more work needed in the thinp-test-suite to use blktrace
> hooks to verify that discards are occurring when the compilebench-
> generated files are removed.
>
> I'll work through that and report back.

blktrace shows discards for both xfs and ext4.

But in general xfs is issuing discards with much smaller extents than ext4 does, e.g.:

  to the thin device:             + 128 vs + 32
  to the thin-pool's data device: + 120 vs + 16
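The "+ N" fields in blktrace output are counts of 512-byte sectors, so the extent sizes being compared can be converted to bytes with a one-liner. A minimal sketch (the pairing of numbers to filesystems is inferred from "xfs is issuing discards with much smaller extents"; the larger value would be ext4's):

```python
# blktrace "+ N" fields count 512-byte sectors; convert the extents
# quoted above to bytes. Pairing inferred: the larger extent is ext4's.
SECTOR_BYTES = 512

def extent_bytes(nsectors: int) -> int:
    return nsectors * SECTOR_BYTES

for target, ext4_sectors, xfs_sectors in [
    ("thin device", 128, 32),
    ("thin-pool data device", 120, 16),
]:
    print(f"{target}: ext4 {extent_bytes(ext4_sectors)} B "
          f"vs xfs {extent_bytes(xfs_sectors)} B")
```

That is, roughly 64 KiB extents from ext4 against 16 KiB ones from xfs - both far below the 1 MB pool chunk size discussed later in the thread.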
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
From: Dave Chinner @ 2012-06-19 20:06 UTC
To: Mike Snitzer
Cc: Spelic, Lukáš Czerner, device-mapper development, linux-ext4, xfs

On Tue, Jun 19, 2012 at 02:48:59PM -0400, Mike Snitzer wrote:
> [...]
> blktrace shows discards for both xfs and ext4.
>
> But in general xfs is issuing discards with much smaller extents than
> ext4 does, e.g.:

That's normal when you use -o discard - XFS sends extremely fine-grained discards as they have to be issued during the checkpoint commit that frees the extent. Hence they can't be aggregated like is done in ext4.

As it is, no-one really should be using -o discard - it is extremely inefficient compared to a background fstrim run, given that discards are unqueued, blocking IOs. It's just a bad idea until the lower layers get fixed to allow asynchronous, vectored discards and SATA supports queued discards...

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
From: Ted Ts'o @ 2012-06-19 20:21 UTC
To: Dave Chinner
Cc: Mike Snitzer, Spelic, Lukáš Czerner, device-mapper development, linux-ext4, xfs

On Wed, Jun 20, 2012 at 06:06:31AM +1000, Dave Chinner wrote:
> > But in general xfs is issuing discards with much smaller extents than
> > ext4 does, e.g.:
>
> That's normal when you use -o discard - XFS sends extremely
> fine-grained discards as they have to be issued during the checkpoint
> commit that frees the extent. Hence they can't be aggregated like is
> done in ext4.

Actually, ext4 is also sending the discards during (well, actually, after) the commit which frees the extent/inode. We do aggregate them while the commit is open, but once the transaction is committed, we send out the discards. I suspect the difference is in the granularity of the transactions between ext4 and xfs.

> As it is, no-one really should be using -o discard - it is extremely
> inefficient compared to a background fstrim run given that discards
> are unqueued, blocking IOs. It's just a bad idea until the lower
> layers get fixed to allow asynchronous, vectored discards and SATA
> supports queued discards...

What Dave said. :-) This is true for both ext4 and xfs.

As a result, I can very easily see there being a distinction made between when we *do* want to pass the discards all the way down to the device, and when we only want the thinp layer to process them --- because for current devices, sending discards down to the physical device is very heavyweight.

I'm not sure how we could do this without a nasty layering violation, but some way in which we could label fstrim discards versus "we've committed the unlink/truncate and so thinp can feel free to reuse these blocks" discards would be interesting to consider.

- Ted
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
From: Dave Chinner @ 2012-06-19 20:39 UTC
To: Ted Ts'o
Cc: Mike Snitzer, Spelic, Lukáš Czerner, device-mapper development, linux-ext4, xfs

On Tue, Jun 19, 2012 at 04:21:30PM -0400, Ted Ts'o wrote:
> Actually, ext4 is also sending the discards during (well, actually,
> after) the commit which frees the extent/inode. We do aggregate them
> while the commit is open, but once the transaction is committed, we
> send out the discards. I suspect the difference is in the granularity
> of the transactions between ext4 and xfs.

Exactly - XFS transactions are fine grained, checkpoints are coarse. We don't merge extents freed in fine grained transactions inside checkpoints. We probably could, but, well, it's complex to do in XFS, and merging adjacent requests is something the block layer is supposed to do...

> As a result, I can very easily see there being a distinction made
> between when we *do* want to pass the discards all the way down to the
> device, and when we only want the thinp layer to process them ---
> because for current devices, sending discards down to the physical
> device is very heavyweight.
>
> I'm not sure how we could do this without a nasty layering violation,
> but some way in which we could label fstrim discards versus "we've
> committed the unlink/truncate and so thinp can feel free to reuse
> these blocks" discards would be interesting to consider.

I think if we had better discard support from the block layer, it wouldn't matter from a filesystem POV what discard support is present in the block layer below it. I think it's better to get the block layer interface fixed than to add new request types/labels to filesystems to work around the current deficiencies.

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
From: Christoph Hellwig @ 2012-06-20 9:01 UTC
To: Dave Chinner
Cc: Ted Ts'o, Mike Snitzer, xfs, device-mapper development, Spelic, Lukáš Czerner, linux-ext4

On Wed, Jun 20, 2012 at 06:39:38AM +1000, Dave Chinner wrote:
> Exactly - XFS transactions are fine grained, checkpoints are coarse.
> We don't merge extents freed in fine grained transactions inside
> checkpoints. We probably could, but, well, it's complex to do in XFS
> and merging adjacent requests is something the block layer is
> supposed to do....

Last time I checked it actually tries to do that for discard requests, but then falls flat badly (= oopses). That's the reason why the XFS transaction commit code still uses the highly suboptimal synchronous blkdev_issue_discard instead of the async variant I wrote when designing the code.

Another "issue" with the XFS discard pattern and the current block layer implementation is that XFS frees a lot of small metadata like inode clusters and btree blocks and discards them as well. If those simply fill one of the vectors of a ranged ATA TRIM command and/or a queueable command, that's not much of an issue, but with the current combination of non-queueable, non-vectored TRIM that's a fairly nasty pattern.

So until the block layer is sorted out I cannot recommend actually using -o discard. I planned to sort out the block layer issues ASAP when writing that code, but other things have kept me busy ever since.
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
From: Spelic @ 2012-06-19 21:37 UTC
To: Dave Chinner
Cc: Mike Snitzer, Lukáš Czerner, device-mapper development, linux-ext4, xfs

On 06/19/12 22:06, Dave Chinner wrote:
> [...]
> That's normal when you use -o discard - XFS sends extremely
> fine-grained discards as they have to be issued during the checkpoint
> commit that frees the extent. Hence they can't be aggregated like is
> done in ext4.
>
> As it is, no-one really should be using -o discard - it is extremely
> inefficient compared to a background fstrim run given that discards
> are unqueued, blocking IOs. It's just a bad idea until the lower
> layers get fixed to allow asynchronous, vectored discards and SATA
> supports queued discards...

Could it be that the thin blocksize is larger than the granularity of the discards issued by xfs, so nothing ever gets unmapped? I have tried thin pools with the default blocksize (64k afair with lvm2) and 1MB.

HOWEVER, I have also tried fstrim on xfs, and that is likewise not capable of unmapping things from the dm-thin. What is the granularity of fstrim on xfs?

Sorry, I can't access the machine right now; maybe tomorrow, or on the weekend.
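The question above has a purely arithmetic core: dm-thin can only unmap a pool chunk that a discard covers completely (as the trace analysis later in the thread concludes). A small sketch of that rounding - an illustration of the principle, not dm-thin's actual code - using 2048-sector (1 MiB) chunks:

```python
# Illustration (not dm-thin source): a thin-pool chunk can only be
# unmapped when a discard covers it entirely, so round the discarded
# range inward to chunk boundaries and count the fully covered chunks.
def chunks_freed(start: int, length: int, chunk: int) -> int:
    """All values in 512-byte sectors; chunk is the thin-pool chunk size."""
    first = -(-start // chunk)        # first boundary at or after start
    last = (start + length) // chunk  # last boundary at or before the end
    return max(0, last - first)

CHUNK = 2048  # 1 MiB in sectors

# A 1 MiB discard aligned to a chunk boundary frees exactly one chunk:
print(chunks_freed(4096, 2048, CHUNK))       # 1
# The same-sized discard misaligned by a few sectors frees nothing:
print(chunks_freed(4096 + 96, 2048, CHUNK))  # 0
```

So even discards of exactly the chunk size unmap nothing if they straddle a chunk boundary, which matters for both -o discard and fstrim.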
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
From: Dave Chinner @ 2012-06-19 23:12 UTC
To: Spelic
Cc: Mike Snitzer, Lukáš Czerner, device-mapper development, linux-ext4, xfs

On Tue, Jun 19, 2012 at 11:37:54PM +0200, Spelic wrote:
> [...]
> Could it be that the thin blocksize is larger than the discard
> granularity by xfs so nothing ever gets unmapped?

For -o discard, possibly. For fstrim, unlikely.

> I have tried thin pools with the default blocksize (64k afair with
> lvm2) and 1MB.
> HOWEVER I also have tried fstrim on xfs, and that is also not
> capable to unmap things from the dm-thin.
> What is the granularity with fstrim in xfs?

Whatever granularity you passed to fstrim. You need to run an event trace on XFS to find out if it is issuing discards before going any further...

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
From: Spelic @ 2012-06-20 12:11 UTC
To: Dave Chinner
Cc: Spelic, xfs, linux-ext4, device-mapper development

Ok guys, I think I found the bug. One or more bugs.

The pool has a chunksize of 1MB. In sysfs the thin volume has queue/discard_max_bytes and queue/discard_granularity both equal to 1048576, and discard_alignment = 0, which based on the sysfs-block documentation is correct (a less misleading name would have been discard_offset, imho).

Here is the blktrace from the ext4 fstrim:

...
252,9 17 498 0.030466556 841 Q D 19898368 + 2048 [fstrim]
252,9 17 499 0.030467501 841 Q D 19900416 + 2048 [fstrim]
252,9 17 500 0.030468359 841 Q D 19902464 + 2048 [fstrim]
252,9 17 501 0.030469313 841 Q D 19904512 + 2048 [fstrim]
252,9 17 502 0.030470144 841 Q D 19906560 + 2048 [fstrim]
252,9 17 503 0.030471381 841 Q D 19908608 + 2048 [fstrim]
252,9 17 504 0.030472473 841 Q D 19910656 + 2048 [fstrim]
252,9 17 505 0.030473504 841 Q D 19912704 + 2048 [fstrim]
252,9 17 506 0.030474561 841 Q D 19914752 + 2048 [fstrim]
252,9 17 507 0.030475571 841 Q D 19916800 + 2048 [fstrim]
252,9 17 508 0.030476423 841 Q D 19918848 + 2048 [fstrim]
252,9 17 509 0.030477341 841 Q D 19920896 + 2048 [fstrim]
252,9 17 510 0.034299630 841 Q D 19922944 + 2048 [fstrim]
252,9 17 511 0.034306880 841 Q D 19924992 + 2048 [fstrim]
252,9 17 512 0.034307955 841 Q D 19927040 + 2048 [fstrim]
252,9 17 513 0.034308928 841 Q D 19929088 + 2048 [fstrim]
252,9 17 514 0.034309945 841 Q D 19931136 + 2048 [fstrim]
252,9 17 515 0.034311007 841 Q D 19933184 + 2048 [fstrim]
252,9 17 516 0.034312008 841 Q D 19935232 + 2048 [fstrim]
252,9 17 517 0.034313122 841 Q D 19937280 + 2048 [fstrim]
252,9 17 518 0.034314013 841 Q D 19939328 + 2048 [fstrim]
252,9 17 519 0.034314940 841 Q D 19941376 + 2048 [fstrim]
252,9 17 520 0.034315835 841 Q D 19943424 + 2048 [fstrim]
252,9 17 521 0.034316662 841 Q D 19945472 + 2048 [fstrim] 252,9 17 522 0.034317547 841 Q D 19947520 + 2048 [fstrim] ... Here is the blktrace from xfs fstrim: 252,12 16 1 0.000000000 554 Q D 96 + 2048 [fstrim] 252,12 16 2 0.000010149 554 Q D 2144 + 2048 [fstrim] 252,12 16 3 0.000011349 554 Q D 4192 + 2048 [fstrim] 252,12 16 4 0.000012584 554 Q D 6240 + 2048 [fstrim] 252,12 16 5 0.000013685 554 Q D 8288 + 2048 [fstrim] 252,12 16 6 0.000014660 554 Q D 10336 + 2048 [fstrim] 252,12 16 7 0.000015707 554 Q D 12384 + 2048 [fstrim] 252,12 16 8 0.000016692 554 Q D 14432 + 2048 [fstrim] 252,12 16 9 0.000017594 554 Q D 16480 + 2048 [fstrim] 252,12 16 10 0.000018539 554 Q D 18528 + 2048 [fstrim] 252,12 16 11 0.000019434 554 Q D 20576 + 2048 [fstrim] 252,12 16 12 0.000020879 554 Q D 22624 + 2048 [fstrim] 252,12 16 13 0.000021856 554 Q D 24672 + 2048 [fstrim] 252,12 16 14 0.000022786 554 Q D 26720 + 2048 [fstrim] 252,12 16 15 0.000023699 554 Q D 28768 + 2048 [fstrim] 252,12 16 16 0.000024672 554 Q D 30816 + 2048 [fstrim] 252,12 16 17 0.000025467 554 Q D 32864 + 2048 [fstrim] 252,12 16 18 0.000026374 554 Q D 34912 + 2048 [fstrim] 252,12 16 19 0.000027194 554 Q D 36960 + 2048 [fstrim] 252,12 16 20 0.000028137 554 Q D 39008 + 2048 [fstrim] 252,12 16 21 0.000029524 554 Q D 41056 + 2048 [fstrim] 252,12 16 22 0.000030479 554 Q D 43104 + 2048 [fstrim] 252,12 16 23 0.000031306 554 Q D 45152 + 2048 [fstrim] 252,12 16 24 0.000032134 554 Q D 47200 + 2048 [fstrim] 252,12 16 25 0.000032964 554 Q D 49248 + 2048 [fstrim] 252,12 16 26 0.000033794 554 Q D 51296 + 2048 [fstrim] As you can see, while ext4 correctly aligns the discards to 1MB, xfs does not. It looks like an fstrim or xfs bug: they don't look at discard_alignment (=0 ... a less misleading name would be discard_offset imho) + discard_granularity (=1MB) and they don't base alignments on those. Clearly the dm-thin cannot unmap anything if the 1MB regions are not fully covered by a single discard. 
Note that specifying a large -m option for fstrim does NOT widen the discard messages above 2048, and this is correct because discard_max_bytes for that device is 1048576 . If discard_max_bytes could be made much larger these kind of bugs could be ameliorated, especially in complex situations like layers over layers, virtualization etc. Note that also in ext4 there are parts of the discard without the 1MB alignment as seen with blktrace (out of my snippet), so this also might need to be fixed, but most of it is aligned to 1MB. In xfs there are no parts aligned to 1MB. Now, another problem: Firstly I wanted to say that in my original post I missed the conv=notrunc for dd: I complained about the performances because I expected the zerofiles would have been rewritten in-place without block re-provisioning by dm-thin, but clearly without conv=notrunc this was not happening. I confirm that with conv=notrunc performances are high at the first rewrite, also in ext4, and occupied space in the thin volume does not increase at every rewrite by dd. HOWEVER by NOT specifying conv=notrunc, the behaviour of dd / ext4 / dm-thin is different if skip_block_zeroing is specified or not. If skip_block_zeroing is not specified (provisioned blocks are pre-zeroed) the space occupied by dd truncate + rewrite INCREASES at every rewrite, while if skip_block_zeroing is NOT specified, dd truncate + rewrite DOES NOT increase space occupied on the thin volume. Note: try this on ext4, not xfs. This looks very strange to me. The only reason I can think of is some kind of cooperative behaviour of ext4 with the variable dm-X/queue/discard_zeroes_data which is different in the two cases. Can anyone give an explanation or check if this is the intended behaviour? And still an open question is: why the speed of provisioning new blocks does not increase with increasing chunk size (64K --> 1MB --> 16MB...), not even when skip_block_zeroing has been set and there is no CoW? 
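The alignment mismatch between the two traces can be checked mechanically. A small Python sketch (sector math only; a 1MB thin chunk is 2048 sectors of 512 bytes): dm-thin can only unmap a chunk that a single discard covers completely, and the xfs offsets never do.

```python
GRANULE = 2048  # sectors: 1 MiB thin chunksize / 512 B sectors

def covered_chunks(start, nsec, granule=GRANULE):
    """Whole thin chunks fully covered by a discard of nsec sectors at start."""
    first = -(-start // granule)       # round start up to a chunk boundary
    last = (start + nsec) // granule   # round end down to a chunk boundary
    return max(0, last - first)

# ext4 fstrim: 1 MiB-aligned, so each 2048-sector request frees one chunk
assert 19898368 % GRANULE == 0
assert covered_chunks(19898368, 2048) == 1

# xfs fstrim: starts at sector 96, so no request ever spans a full chunk
assert 96 % GRANULE == 96
assert covered_chunks(96, 2048) == 0
assert covered_chunks(2144, 2048) == 0
```

This is exactly why lvs shows space dropping after the ext4 fstrim but not after the xfs one.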
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-20 12:11 ` Spelic
@ 2012-06-20 22:53 ` Dave Chinner
  -1 siblings, 0 replies; 72+ messages in thread
From: Dave Chinner @ 2012-06-20 22:53 UTC (permalink / raw)
  To: Spelic; +Cc: xfs, linux-ext4, device-mapper development

On Wed, Jun 20, 2012 at 02:11:31PM +0200, Spelic wrote:
> Ok guys, I think I found the bug. One or more bugs.
>
> Pool has chunksize 1MB.
> In sysfs the thin volume has: queue/discard_max_bytes and
> queue/discard_granularity are 1048576 .
> And it has discard_alignment = 0, which based on sysfs-block
> documentation is correct (a less misleading name would have been
> discard_offset imho).
> Here is the blktrace from ext4 fstrim:
> ...
> 252,9 17 498 0.030466556 841 Q D 19898368 + 2048 [fstrim]
> 252,9 17 499 0.030467501 841 Q D 19900416 + 2048 [fstrim]
> 252,9 17 500 0.030468359 841 Q D 19902464 + 2048 [fstrim]
....
> Here is the blktrace from xfs fstrim:
> 252,12 16 1 0.000000000 554 Q D 96 + 2048 [fstrim]
> 252,12 16 2 0.000010149 554 Q D 2144 + 2048 [fstrim]
> 252,12 16 3 0.000011349 554 Q D 4192 + 2048 [fstrim]
.....
> As you can see, while ext4 correctly aligns the discards to 1MB, xfs
> does not.

XFS just sends a large extent to blkdev_issue_discard(), and cares
nothing about discard alignment or granularity.

> It looks like an fstrim or xfs bug: they don't look at
> discard_alignment (=0 ... a less misleading name would be
> discard_offset imho) + discard_granularity (=1MB) and they don't
> base alignments on those.

It looks like blkdev_issue_discard() has reduced each discard to bios
of a single "granule" (1MB), and not aligned them, hence they are
ignored by dm-thinp.

What are the discard parameters exposed by dm-thinp in
/sys/block/<thinp-blkdev>/queue/discard* ?

It looks to me that dm-thinp might be setting discard_max_bytes to
1MB rather than discard_granularity. Looking at dm-thin.c:

static void set_discard_limits(struct pool *pool, struct queue_limits *limits)
{
	/*
	 * FIXME: these limits may be incompatible with the pool's data device
	 */
	limits->max_discard_sectors = pool->sectors_per_block;

	/*
	 * This is just a hint, and not enforced. We have to cope with
	 * bios that overlap 2 blocks.
	 */
	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
	limits->discard_zeroes_data = pool->pf.zero_new_blocks;
}

Yes - discard_max_bytes == discard_granularity, and so
blkdev_issue_discard fails to align the request properly. As it is,
setting discard_max_bytes to the thinp block size is silly - it means
you'll never get range requests, and we send a discard for every
single block in a range rather than having the thinp code iterate
over a range itself.

i.e. this is not a filesystem bug that is causing the problem....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 72+ messages in thread
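Dave's diagnosis can be reproduced with a toy model (Python, not the actual kernel code): if the splitter carves the extent into max_discard_sectors-sized bios starting from an unaligned offset, every bio straddles two thin chunks, so none ever covers a full chunk.

```python
GRANULE = 2048  # 1 MiB thin chunk, in 512 B sectors

def split_discard(start, nsec, max_sectors):
    """Naive splitter in the spirit of the old blkdev_issue_discard():
    carve the range into max_sectors-sized bios from 'start' onward,
    with no attempt to align each bio to a granularity boundary."""
    bios = []
    while nsec:
        n = min(nsec, max_sectors)
        bios.append((start, n))
        start += n
        nsec -= n
    return bios

def chunks_unmapped(bios, granule=GRANULE):
    """dm-thin only unmaps chunks that a single bio covers entirely."""
    total = 0
    for start, n in bios:
        first = -(-start // granule)      # round start up
        last = (start + n) // granule     # round end down
        total += max(0, last - first)
    return total

# XFS hands down one big extent starting at sector 96 (unaligned):
bios = split_discard(96, 26 * 2048, max_sectors=GRANULE)
assert all(n == GRANULE for _, n in bios)  # every bio is one "granule"...
assert chunks_unmapped(bios) == 0          # ...but none is chunk-aligned
```

With an aligned start (e.g. ext4's sector 19898368) the same splitter would free one chunk per bio, matching the behaviour seen in lvs.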
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-20 22:53 ` Dave Chinner
@ 2012-06-21 17:47 ` Mike Snitzer
  -1 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-21 17:47 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Spelic, device-mapper development, linux-ext4, xfs, Paolo Bonzini,
	axboe, hch

On Wed, Jun 20 2012 at 6:53pm -0400, Dave Chinner <david@fromorbit.com> wrote:

> On Wed, Jun 20, 2012 at 02:11:31PM +0200, Spelic wrote:
> > Ok guys, I think I found the bug. One or more bugs.
> >
> > Pool has chunksize 1MB.
> > In sysfs the thin volume has: queue/discard_max_bytes and
> > queue/discard_granularity are 1048576 .
> > Here is the blktrace from ext4 fstrim:
> > ...
> > 252,9 17 498 0.030466556 841 Q D 19898368 + 2048 [fstrim]
> > 252,9 17 499 0.030467501 841 Q D 19900416 + 2048 [fstrim]
....
> > Here is the blktrace from xfs fstrim:
> > 252,12 16 1 0.000000000 554 Q D 96 + 2048 [fstrim]
> > 252,12 16 2 0.000010149 554 Q D 2144 + 2048 [fstrim]
.....
> > As you can see, while ext4 correctly aligns the discards to 1MB, xfs
> > does not.
>
> XFS just sends a large extent to blkdev_issue_discard(), and cares
> nothing about discard alignment or granularity.
>
> It looks like blkdev_issue_discard() has reduced each discard to
> bios of a single "granule" (1MB), and not aligned them, hence they
> are ignored by dm-thinp.
....
> Yes - discard_max_bytes == discard_granularity, and so
> blkdev_issue_discard fails to align the request properly. As it is,
> setting discard_max_bytes to the thinp block size is silly - it
> means you'll never get range requests, and we send a discard for
> every single block in a range rather than having the thinp code
> iterate over a range itself.

So 2 different issues:

1) blkdev_issue_discard isn't properly aligning

2) thinp should accept larger discards (up to the stacked
   discard_max_bytes rather than setting an override)

> i.e. this is not a filesystem bug that is causing the problem....

Paolo Bonzini fixed blkdev_issue_discard to properly align some time
ago; unfortunately the patches slipped through the cracks (cc'ing
Paolo, Jens, and Christoph). Here are references to Paolo's patches:

0/2 https://lkml.org/lkml/2012/3/14/323
1/2 https://lkml.org/lkml/2012/3/14/324
2/2 https://lkml.org/lkml/2012/3/14/325

Patch 2/2 specifically addresses the case where:
discard_max_bytes == discard_granularity

Paolo, any chance you could resend to Jens (maybe with hch's comments
on patch #2 accounted for)? Also, please add hch's Reviewed-by when
reposting. (Would love to see this fixed for 3.5-rcX, but if not, 3.6
it is?)

^ permalink raw reply	[flat|nested] 72+ messages in thread
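The effect of granularity-aware splitting (the behaviour Paolo's patch 2/2 aims for, sketched here as a toy model in Python rather than the actual patch): shorten the first bio so that every subsequent split starts on a granularity boundary, turning an unaligned extent into mostly chunk-aligned discards that dm-thin can honour.

```python
GRANULE = 2048  # 1 MiB thin chunk, in 512 B sectors

def split_discard_aligned(start, nsec, max_sectors, granule=GRANULE):
    """Carve a discard range into bios of at most max_sectors, but end
    each bio on a granularity boundary when possible so that later
    splits are aligned."""
    bios = []
    while nsec:
        n = min(nsec, max_sectors)
        end = start + n
        aligned_end = (end // granule) * granule
        if aligned_end > start:       # shrink to end on a boundary
            n = aligned_end - start
        bios.append((start, n))
        start += n
        nsec -= n
    return bios

# The unaligned xfs extent from the trace: sector 96, 26 MiB long.
bios = split_discard_aligned(96, 26 * 2048, max_sectors=GRANULE)
assert bios[0] == (96, 2048 - 96)                  # short head fragment
assert all(s % GRANULE == 0 for s, _ in bios[1:])  # rest are aligned
# 25 of the bios are now full, aligned chunks dm-thin can unmap:
assert sum(1 for s, n in bios if s % GRANULE == 0 and n == GRANULE) == 25
```

Compare with the unaligned splitter earlier in the thread, which frees zero chunks from the same extent.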
* Re: Ext4 and xfs problems in dm-thin on allocation and discard @ 2012-06-21 17:47 ` Mike Snitzer 0 siblings, 0 replies; 72+ messages in thread From: Mike Snitzer @ 2012-06-21 17:47 UTC (permalink / raw) To: Dave Chinner Cc: axboe, xfs, hch, device-mapper development, Spelic, Paolo Bonzini, linux-ext4 On Wed, Jun 20 2012 at 6:53pm -0400, Dave Chinner <david@fromorbit.com> wrote: > On Wed, Jun 20, 2012 at 02:11:31PM +0200, Spelic wrote: > > Ok guys, I think I found the bug. One or more bugs. > > > > > > Pool has chunksize 1MB. > > In sysfs the thin volume has: queue/discard_max_bytes and > > queue/discard_granularity are 1048576 . > > And it has discard_alignment = 0, which based on sysfs-block > > documentation is correct (a less misleading name would have been > > discard_offset imho). > > Here is the blktrace from ext4 fstrim: > > ... > > 252,9 17 498 0.030466556 841 Q D 19898368 + 2048 [fstrim] > > 252,9 17 499 0.030467501 841 Q D 19900416 + 2048 [fstrim] > > 252,9 17 500 0.030468359 841 Q D 19902464 + 2048 [fstrim] > > 252,9 17 501 0.030469313 841 Q D 19904512 + 2048 [fstrim] > > 252,9 17 502 0.030470144 841 Q D 19906560 + 2048 [fstrim] > > 252,9 17 503 0.030471381 841 Q D 19908608 + 2048 [fstrim] > > 252,9 17 504 0.030472473 841 Q D 19910656 + 2048 [fstrim] > > 252,9 17 505 0.030473504 841 Q D 19912704 + 2048 [fstrim] > > 252,9 17 506 0.030474561 841 Q D 19914752 + 2048 [fstrim] > > 252,9 17 507 0.030475571 841 Q D 19916800 + 2048 [fstrim] > > 252,9 17 508 0.030476423 841 Q D 19918848 + 2048 [fstrim] > > 252,9 17 509 0.030477341 841 Q D 19920896 + 2048 [fstrim] > > 252,9 17 510 0.034299630 841 Q D 19922944 + 2048 [fstrim] > > 252,9 17 511 0.034306880 841 Q D 19924992 + 2048 [fstrim] > > 252,9 17 512 0.034307955 841 Q D 19927040 + 2048 [fstrim] > > 252,9 17 513 0.034308928 841 Q D 19929088 + 2048 [fstrim] > > 252,9 17 514 0.034309945 841 Q D 19931136 + 2048 [fstrim] > > 252,9 17 515 0.034311007 841 Q D 19933184 + 2048 [fstrim] > > 252,9 17 516 0.034312008 841 
Q D 19935232 + 2048 [fstrim] > > 252,9 17 517 0.034313122 841 Q D 19937280 + 2048 [fstrim] > > 252,9 17 518 0.034314013 841 Q D 19939328 + 2048 [fstrim] > > 252,9 17 519 0.034314940 841 Q D 19941376 + 2048 [fstrim] > > 252,9 17 520 0.034315835 841 Q D 19943424 + 2048 [fstrim] > > 252,9 17 521 0.034316662 841 Q D 19945472 + 2048 [fstrim] > > 252,9 17 522 0.034317547 841 Q D 19947520 + 2048 [fstrim] > > ... > > > > Here is the blktrace from xfs fstrim: > > 252,12 16 1 0.000000000 554 Q D 96 + 2048 [fstrim] > > 252,12 16 2 0.000010149 554 Q D 2144 + 2048 [fstrim] > > 252,12 16 3 0.000011349 554 Q D 4192 + 2048 [fstrim] > > 252,12 16 4 0.000012584 554 Q D 6240 + 2048 [fstrim] > > 252,12 16 5 0.000013685 554 Q D 8288 + 2048 [fstrim] > > 252,12 16 6 0.000014660 554 Q D 10336 + 2048 [fstrim] > > 252,12 16 7 0.000015707 554 Q D 12384 + 2048 [fstrim] > > 252,12 16 8 0.000016692 554 Q D 14432 + 2048 [fstrim] > > 252,12 16 9 0.000017594 554 Q D 16480 + 2048 [fstrim] > > 252,12 16 10 0.000018539 554 Q D 18528 + 2048 [fstrim] > > 252,12 16 11 0.000019434 554 Q D 20576 + 2048 [fstrim] > > 252,12 16 12 0.000020879 554 Q D 22624 + 2048 [fstrim] > > 252,12 16 13 0.000021856 554 Q D 24672 + 2048 [fstrim] > > 252,12 16 14 0.000022786 554 Q D 26720 + 2048 [fstrim] > > 252,12 16 15 0.000023699 554 Q D 28768 + 2048 [fstrim] > > 252,12 16 16 0.000024672 554 Q D 30816 + 2048 [fstrim] > > 252,12 16 17 0.000025467 554 Q D 32864 + 2048 [fstrim] > > 252,12 16 18 0.000026374 554 Q D 34912 + 2048 [fstrim] > > 252,12 16 19 0.000027194 554 Q D 36960 + 2048 [fstrim] > > 252,12 16 20 0.000028137 554 Q D 39008 + 2048 [fstrim] > > 252,12 16 21 0.000029524 554 Q D 41056 + 2048 [fstrim] > > 252,12 16 22 0.000030479 554 Q D 43104 + 2048 [fstrim] > > 252,12 16 23 0.000031306 554 Q D 45152 + 2048 [fstrim] > > 252,12 16 24 0.000032134 554 Q D 47200 + 2048 [fstrim] > > 252,12 16 25 0.000032964 554 Q D 49248 + 2048 [fstrim] > > 252,12 16 26 0.000033794 554 Q D 51296 + 2048 [fstrim] > > > > > > As you can 
see, while ext4 correctly aligns the discards to 1MB, xfs > > does not. > > XFs just sends a large extent to blkdev_issue_discard(), and cares > nothing about discard alignment or granularity. > > > It looks like an fstrim or xfs bug: they don't look at > > discard_alignment (=0 ... a less misleading name would be > > discard_offset imho) + discard_granularity (=1MB) and they don't > > base alignments on those. > > It looks like blkdev_issue_discard() has reduced each discard to > bios of a single "granule" (1MB), and not aligned them, hence they > are ignore by dm-thinp. > > what are the discard parameters exposed by dm-thinp in > /sys/block/<thinp-blkdev>/queue/discard* > > It looks to me that dmthinp might be setting discard_max_bytes to > 1MB rather than discard_granularity. Looking at dm-thin.c: > > static void set_discard_limits(struct pool *pool, struct queue_limits *limits) > { > /* > * FIXME: these limits may be incompatible with the pool's data device > */ > limits->max_discard_sectors = pool->sectors_per_block; > > /* > * This is just a hint, and not enforced. We have to cope with > * bios that overlap 2 blocks. > */ > limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT; > limits->discard_zeroes_data = pool->pf.zero_new_blocks; > } > > > Yes - discard_max_bytes == discard_granularity, and so > blkdev_issue_discard fails to align the request properly. As it is, > setting discard_max_bytes to the thinp block size is silly - it > means you'll never get range requests, and we sent a discard for > every single block in a range rather than having the thinp code > iterate over a range itself. So 2 different issues: 1) blkdev_issue_discard isn't properly aligning 2) thinp should accept larger discards (up to the stacked discard_max_bytes rather than setting an override) > i.e. this is not a filesystem bug that is causing the problem.... 
Paolo Bonzini fixed blkdev_issue_discard to properly align some time ago; unfortunately the patches slipped through the cracks (cc'ing Paolo, Jens, and Christoph). Here are references to Paolo's patches: 0/2 https://lkml.org/lkml/2012/3/14/323 1/2 https://lkml.org/lkml/2012/3/14/324 2/2 https://lkml.org/lkml/2012/3/14/325 Patch 2/2 specifically addresses the case where: discard_max_bytes == discard_granularity Paolo, any chance you could resend to Jens (maybe with hch's comments on patch#2 accounted for)? Also, please add hch's Reviewed-by when reposting. (would love to see this fixed for 3.5-rcX but if not 3.6 it is?) _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 72+ messages in thread
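[Editorial note] The behaviour described above — 1MB xfs bios that dm-thin silently drops — follows from dm-thin only being able to unmap thin blocks that a discard covers completely. A tiny userspace model makes the arithmetic concrete (helper names are illustrative, this is not dm-thin's actual code; all values are in 512-byte sectors):

```c
#include <assert.h>

/*
 * Illustrative model, not dm-thin's implementation: a thin block can
 * only be unmapped when the discard covers it entirely, so round the
 * discard's start up and its end down to thin-block boundaries and
 * count what is left.  All values are in 512-byte sectors.
 */
unsigned long long first_covered_block(unsigned long long start,
                                       unsigned long long sectors_per_block)
{
    /* Round up: first block whose start is at or after 'start'. */
    return (start + sectors_per_block - 1) / sectors_per_block;
}

unsigned long long covered_block_count(unsigned long long start,
                                       unsigned long long nr,
                                       unsigned long long sectors_per_block)
{
    unsigned long long begin = first_covered_block(start, sectors_per_block);
    /* Round down: blocks must end at or before the discard's end. */
    unsigned long long end = (start + nr) / sectors_per_block;

    return end > begin ? end - begin : 0;
}
```

Plugging in the traces quoted above (1MB chunk = 2048 sectors): xfs's bio at 2144 + 2048 fully covers zero thin blocks, so nothing is unmapped, while ext4's aligned 19898368 + 2048 covers exactly one — matching the lvs behaviour reported.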
* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-21 17:47 ` Mike Snitzer
@ 2012-06-21 23:29 ` Dave Chinner
  -1 siblings, 0 replies; 72+ messages in thread
From: Dave Chinner @ 2012-06-21 23:29 UTC (permalink / raw)
To: Mike Snitzer
Cc: Spelic, device-mapper development, linux-ext4, xfs, Paolo Bonzini, axboe, hch

On Thu, Jun 21, 2012 at 01:47:43PM -0400, Mike Snitzer wrote:
> On Wed, Jun 20 2012 at 6:53pm -0400,
> Dave Chinner <david@fromorbit.com> wrote:
>
> > On Wed, Jun 20, 2012 at 02:11:31PM +0200, Spelic wrote:
> > > Ok guys, I think I found the bug. One or more bugs.
> > >
> > > Pool has chunksize 1MB.
> > > In sysfs the thin volume has: queue/discard_max_bytes and
> > > queue/discard_granularity are 1048576 .
> > > And it has discard_alignment = 0, which based on sysfs-block
> > > documentation is correct (a less misleading name would have been
> > > discard_offset imho).
> > > Here is the blktrace from ext4 fstrim:
> > > ...
> > > 252,9 17 498 0.030466556 841 Q D 19898368 + 2048 [fstrim]
> > > 252,9 17 499 0.030467501 841 Q D 19900416 + 2048 [fstrim]
> > > 252,9 17 500 0.030468359 841 Q D 19902464 + 2048 [fstrim]
....
> > > Here is the blktrace from xfs fstrim:
> > > 252,12 16 1 0.000000000 554 Q D 96 + 2048 [fstrim]
> > > 252,12 16 2 0.000010149 554 Q D 2144 + 2048 [fstrim]
> > > 252,12 16 3 0.000011349 554 Q D 4192 + 2048 [fstrim]
.....
> > It looks like blkdev_issue_discard() has reduced each discard to
> > bios of a single "granule" (1MB), and not aligned them, hence they
> > are ignored by dm-thinp.
> >
> > what are the discard parameters exposed by dm-thinp in
> > /sys/block/<thinp-blkdev>/queue/discard*
> >
> > It looks to me that dmthinp might be setting discard_max_bytes to
> > 1MB rather than discard_granularity. Looking at dm-thin.c:
> >
> > static void set_discard_limits(struct pool *pool, struct queue_limits *limits)
> > {
> > 	/*
> > 	 * FIXME: these limits may be incompatible with the pool's data device
> > 	 */
> > 	limits->max_discard_sectors = pool->sectors_per_block;
> >
> > 	/*
> > 	 * This is just a hint, and not enforced. We have to cope with
> > 	 * bios that overlap 2 blocks.
> > 	 */
> > 	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> > 	limits->discard_zeroes_data = pool->pf.zero_new_blocks;
> > }
> >
> > Yes - discard_max_bytes == discard_granularity, and so
> > blkdev_issue_discard fails to align the request properly. As it is,
> > setting discard_max_bytes to the thinp block size is silly - it
> > means you'll never get range requests, and we send a discard for
> > every single block in a range rather than having the thinp code
> > iterate over a range itself.
>
> So 2 different issues:
> 1) blkdev_issue_discard isn't properly aligning
> 2) thinp should accept larger discards (up to the stacked
> discard_max_bytes rather than setting an override)

Yes, in effect, but there's no real reason I can see why thinp can't
accept larger discard requests than the underlying stack and break
them up appropriately itself....

> > i.e. this is not a filesystem bug that is causing the problem....
>
> Paolo Bonzini fixed blkdev_issue_discard to properly align some time
> ago; unfortunately the patches slipped through the cracks (cc'ing Paolo,
> Jens, and Christoph).
>
> Here are references to Paolo's patches:
> 0/2 https://lkml.org/lkml/2012/3/14/323
> 1/2 https://lkml.org/lkml/2012/3/14/324
> 2/2 https://lkml.org/lkml/2012/3/14/325
>
> Patch 2/2 specifically addresses the case where:
> discard_max_bytes == discard_granularity
>
> Paolo, any chance you could resend to Jens (maybe with hch's comments on
> patch#2 accounted for)? Also, please add hch's Reviewed-by when
> reposting.
>
> (would love to see this fixed for 3.5-rcX but if not 3.6 it is?)

That would be good...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply [flat|nested] 72+ messages in thread
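[Editorial note] The splitting rule under discussion can be sketched in userspace (hypothetical helper names; this is a simplified model, not Paolo's actual patch): every sub-request after the first starts on an LBA of the form n * granularity + alignment, so a device whose discard_max_bytes equals its discard_granularity — dm-thin here — receives whole, aligned granules.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace sketch of granularity-aware discard splitting.  Assumes
 * start >= alignment; all units are 512-byte sectors.
 */
unsigned long long next_discard_boundary(unsigned long long start,
                                         unsigned long long granularity,
                                         unsigned long long alignment)
{
    /* First LBA of the form n * granularity + alignment that is
     * strictly greater than 'start'. */
    return ((start - alignment) / granularity + 1) * granularity + alignment;
}

struct drange {
    unsigned long long start, nr;
};

/* Split [start, start + nr) at aligned boundaries; returns how many
 * sub-ranges were written into 'out'. */
size_t split_discard(unsigned long long start, unsigned long long nr,
                     unsigned long long granularity,
                     unsigned long long alignment,
                     struct drange *out, size_t max_out)
{
    unsigned long long end = start + nr;
    size_t n = 0;

    while (start < end && n < max_out) {
        unsigned long long next =
            next_discard_boundary(start, granularity, alignment);

        if (next > end)
            next = end;
        out[n].start = start;
        out[n].nr = next - start;
        n++;
        start = next;
    }
    return n;
}
```

With the xfs numbers from the trace (granularity 2048 sectors, extent starting at sector 96), only the short head [96, 2048) remains misaligned; every later piece starts on a granule boundary, unlike the raw 96, 2144, 4192, ... stream that dm-thin had to ignore.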
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-21 17:47 ` Mike Snitzer @ 2012-07-01 14:53 ` Paolo Bonzini -1 siblings, 0 replies; 72+ messages in thread From: Paolo Bonzini @ 2012-07-01 14:53 UTC (permalink / raw) To: Mike Snitzer Cc: Dave Chinner, Spelic, device-mapper development, linux-ext4, xfs, axboe, hch Il 21/06/2012 19:47, Mike Snitzer ha scritto: > Paolo Bonzini fixed blkdev_issue_discard to properly align some time > ago; unfortunately the patches slipped through the cracks (cc'ing Paolo, > Jens, and Christoph). > > Here are references to Paolo's patches: > 0/2 https://lkml.org/lkml/2012/3/14/323 > 1/2 https://lkml.org/lkml/2012/3/14/324 > 2/2 https://lkml.org/lkml/2012/3/14/325 > > Patch 2/2 specifically addresses the case where: > discard_max_bytes == discard_granularity > > Paolo, any chance you could resend to Jens (maybe with hch's comments on > patch#2 accounted for)? Also, please add hch's Reviewed-by when > reposting. Sure, I'll do it this week. I just need to retest. Paolo ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-07-01 14:53 ` Paolo Bonzini @ 2012-07-02 13:00 ` Mike Snitzer -1 siblings, 0 replies; 72+ messages in thread From: Mike Snitzer @ 2012-07-02 13:00 UTC (permalink / raw) To: Paolo Bonzini Cc: Dave Chinner, Spelic, device-mapper development, linux-ext4, xfs, axboe, hch, Martin K. Petersen On Sun, Jul 01 2012 at 10:53am -0400, Paolo Bonzini <pbonzini@redhat.com> wrote: > Il 21/06/2012 19:47, Mike Snitzer ha scritto: > > Paolo Bonzini fixed blkdev_issue_discard to properly align some time > > ago; unfortunately the patches slipped through the cracks (cc'ing Paolo, > > Jens, and Christoph). > > > > Here are references to Paolo's patches: > > 0/2 https://lkml.org/lkml/2012/3/14/323 > > 1/2 https://lkml.org/lkml/2012/3/14/324 > > 2/2 https://lkml.org/lkml/2012/3/14/325 > > > > Patch 2/2 specifically addresses the case where: > > discard_max_bytes == discard_granularity > > > > Paolo, any chance you could resend to Jens (maybe with hch's comments on > > patch#2 accounted for)? Also, please add hch's Reviewed-by when > > reposting. > > Sure, I'll do it this week. I just need to retest. Great, thanks. (cc'ing mkp) One thing that seemed odd was your adjustment for discard_alignment (in patch 1/2). I need to better understand how discard_alignment (an offset despite the name not saying as much) relates to alignment_offset. Could just be that once a partition tool, or lvm, etc account for alignment_offset (which they do now) that discard_alignment is automagically accounted for as a side-effect? (I haven't actually seen discard_alignment != 0 in the wild) Mike ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-07-02 13:00 ` Mike Snitzer @ 2012-07-02 13:15 ` Paolo Bonzini -1 siblings, 0 replies; 72+ messages in thread From: Paolo Bonzini @ 2012-07-02 13:15 UTC (permalink / raw) To: Mike Snitzer Cc: Dave Chinner, Spelic, device-mapper development, linux-ext4, xfs, axboe, hch, Martin K. Petersen Il 02/07/2012 15:00, Mike Snitzer ha scritto: > On Sun, Jul 01 2012 at 10:53am -0400, > Paolo Bonzini <pbonzini@redhat.com> wrote: > >> Il 21/06/2012 19:47, Mike Snitzer ha scritto: >>> Paolo Bonzini fixed blkdev_issue_discard to properly align some time >>> ago; unfortunately the patches slipped through the cracks (cc'ing Paolo, >>> Jens, and Christoph). >>> >>> Here are references to Paolo's patches: >>> 0/2 https://lkml.org/lkml/2012/3/14/323 >>> 1/2 https://lkml.org/lkml/2012/3/14/324 >>> 2/2 https://lkml.org/lkml/2012/3/14/325 >>> >>> Patch 2/2 specifically addresses the case where: >>> discard_max_bytes == discard_granularity >>> >>> Paolo, any chance you could resend to Jens (maybe with hch's comments on >>> patch#2 accounted for)? Also, please add hch's Reviewed-by when >>> reposting. >> >> Sure, I'll do it this week. I just need to retest. > > Great, thanks. > > (cc'ing mkp) > > One thing that seemed odd was your adjustment for discard_alignment (in > patch 1/2). > > I need to better understand how discard_alignment (an offset despite the > name not saying as much) relates to alignment_offset. In principle, it doesn't. All SBC says is: The UNMAP GRANULARITY ALIGNMENT field indicates the LBA of the first logical block to which the OPTIMAL UNMAP GRANULARITY field applies. The unmap granularity alignment is used to calculate an optimal unmap request starting LBA as follows: optimal unmap request starting LBA = (n * optimal unmap granularity) + unmap granularity alignment and what my patch does is ensure that all requests except the first start at such an LBA. 
In practice, there is a connection between the two, because a sane disk will make all discard_alignment-aligned sectors also alignment_offset-aligned, or vice versa, or both (depending on whether 1<<phys_exp is <, >, or = to discard_granularity).

> Could just be that once a partition tool, or lvm, etc account for
> alignment_offset (which they do now) that discard_alignment is
> automagically accounted for as a side-effect?

Yes, if discard_granularity <= 1<<phys_exp. In that case, the condition above simplifies to discard_alignment == alignment_offset % discard_granularity, and your partitions will already be aligned to both alignment_offset and discard_alignment.

It seems more likely that discard_granularity > 1<<phys_exp if they differ at all, in which case the partition tool will improve the situation but still not reach an optimal setting. The optimal positioning of partitions/logical volumes/etc. would be to align them to lcm(1<<phys_exp, discard_granularity), and "misalign" the starting sector by max(discard_alignment, alignment_offset).

> (I haven't actually seen discard_alignment != 0 in the wild)

Me neither, but it was easy to account for it in the patch.

Paolo

^ permalink raw reply [flat|nested] 72+ messages in thread
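[Editorial note] The two rules in the message above — the SBC optimal-unmap-starting-LBA formula and the lcm-based placement for partitions and logical volumes — reduce to a little integer arithmetic. A sketch (function names are mine, not from SBC or the kernel; all values in logical blocks/sectors):

```c
#include <assert.h>

/* Smallest LBA >= x of the form n * granularity + alignment; this is
 * the "optimal unmap request starting LBA" from the SBC text quoted
 * above.  Illustrative code, not from the standard. */
unsigned long long optimal_unmap_start(unsigned long long x,
                                       unsigned long long granularity,
                                       unsigned long long alignment)
{
    if (x <= alignment)
        return alignment;
    return ((x - alignment + granularity - 1) / granularity) * granularity
           + alignment;
}

static unsigned long long gcd(unsigned long long a, unsigned long long b)
{
    while (b) {
        unsigned long long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

unsigned long long lcm(unsigned long long a, unsigned long long b)
{
    return a / gcd(a, b) * b;
}

/* Partition placement per the suggestion above: next multiple of
 * lcm(physical block size, discard granularity) at or after min_start,
 * "misaligned" by max(discard_alignment, alignment_offset). */
unsigned long long suggested_partition_start(unsigned long long min_start,
                                             unsigned long long phys,
                                             unsigned long long granularity,
                                             unsigned long long disc_align,
                                             unsigned long long align_off)
{
    unsigned long long step = lcm(phys, granularity);
    unsigned long long off = disc_align > align_off ? disc_align : align_off;

    return ((min_start + step - 1) / step) * step + off;
}
```

For example, with a 1MB granularity (2048 sectors) and zero alignment, the first optimal starting LBA at or after sector 100 is 2048 — which is exactly the boundary ext4's fstrim requests hit and xfs's missed.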
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-18 21:33 ` Spelic @ 2012-06-19 14:09 ` Lukáš Czerner -1 siblings, 0 replies; 72+ messages in thread From: Lukáš Czerner @ 2012-06-19 14:09 UTC (permalink / raw) To: Spelic; +Cc: xfs, linux-ext4, device-mapper development On Mon, 18 Jun 2012, Spelic wrote: > Date: Mon, 18 Jun 2012 23:33:50 +0200 > From: Spelic <spelic@shiftmail.org> > To: xfs@oss.sgi.com, linux-ext4@vger.kernel.org, > device-mapper development <dm-devel@redhat.com> > Subject: Ext4 and xfs problems in dm-thin on allocation and discard > > Hello all > I am doing some testing of dm-thin on kernel 3.4.2 and latest lvm from source > (the rest is Ubuntu Precise 12.04). > There are a few problems with ext4 and (different ones with) xfs > > I am doing this: > dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync > lvs > rm zeroes #optional > dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync #again > lvs > rm zeroes #optional > ... > dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync #again > lvs > rm zeroes > fstrim /mnt/mountpoint > lvs > > On ext4 the problem is that it always reallocates blocks at different places, > so you can see from lvs that space occupation in the pool and thinlv increases > at each iteration of dd, again and again, until it has allocated the whole > thin device (really 100% of it). And this is true regardless of me doing rm or > not between one dd and the other. > The other problem is that by doing this, ext4 always gets the worst > performance from thinp, about 140MB/sec on my system, because it is constantly > allocating blocks, instead of 350MB/sec which should have been with my system > if it used already allocated regions (see below compared to xfs). I am on an > MD raid-5 of 5 hdds. > I could suggest to add a "thinp mode" mount option to ext4 affecting the > allocator, so that it tries to reallocate recently used and freed areas and > not constantly new areas. 
Note that mount -o discard does work and prevents > allocation bloating, but it still always gets the worst write performances > from thinp. Alternatively thinp could be improved so that block allocation is > fast :-P (*) > However, good news is that fstrim works correctly on ext4, and is able to drop > all space allocated by all dd's. Also mount -o discard works.

I am happy to hear that discard actually works with ext4.

Regarding the performance problem, part of it has already been explained by Dave and I agree with him. With thin provisioning you'll get a totally different file system layout than on a fully provisioned disk as you push more and more writes to your drive. This unfortunately has a great impact on performance, since file systems usually have a lot of optimizations for where to put data/metadata on the drive and how to read them. However, in the case of thinly provisioned storage those optimizations do not help. And yes, you just have to expect lower performance from a file system on top of dm-thin. It is not, and never will be, an ideal solution for workloads where you expect the best performance. However, optimizations have to be done on both the dm and fs sides; that work is currently in progress, and now that we have a "cheap" thinp solution I expect progress in that regard to be quite a bit faster.

-Lukas

> > On xfs there is a different problem. > Xfs apparently correctly re-uses the same blocks so that after the first write > at 140MB/sec, subsequent overwrites of the same file are at full speed such as > 350MB/sec (same speed as with non-thin lvm), and also you don't see space > occupation going up at every iteration of dd, either with or without rm > in-between the dd's. [ok actually now retrying it needed 3 rewrites to > stabilize allocation... probably an AG count thing.] > However the problem with XFS is that discard doesn't appear to work. Fstrim > doesn't work, and neither does "mount -o discard ... + rm zeroes" .
There is > apparently no way to drop the allocated blocks, as seen from lvs. This is in > contrast to what it is written here http://xfs.org/index.php/FITRIM/discard > which declare fstrim and mount -o discard to be working. > Please note that since I am above MD raid5 (I believe this is the reason), the > passdown of discards does not work, as my dmesg says: > [160508.497879] device-mapper: thin: Discard unsupported by data device > (dm-1): Disabling discard passdown. > but AFAIU, unless there is a thinp bug, this should not affect the unmapping > of thin blocks by fstrimming xfs... and in fact ext4 is able to do that. > > (*) Strange thing is that write performance appears to be roughly the same for > default thin chunksize and for 1MB thin chunksize. I would have expected thinp > allocation to be faster with larger thin chunksizes but instead it is actually > slower (note that there are no snapshots here and hence no CoW). This is also > true if I set the thinpool to not zero newly allocated blocks: performances > are about 240 MB/sec then, but again they don't increase with larger > chunksizes, they actually decrease slightly with very large chunksizes such as > 16MB. Why is that? > > Thanks for your help > S. > ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-19 14:09 ` Lukáš Czerner @ 2012-06-19 14:19 ` Ted Ts'o -1 siblings, 0 replies; 72+ messages in thread From: Ted Ts'o @ 2012-06-19 14:19 UTC (permalink / raw) To: Lukáš Czerner Cc: Spelic, xfs, linux-ext4, device-mapper development

On Tue, Jun 19, 2012 at 04:09:48PM +0200, Lukáš Czerner wrote: > > With thin provisioning you'll get totally different file system > layout than on fully provisioned disk as you push more and more > writes to your drive. This unfortunately has great impact on > performance since file systems usually have a lot of optimization on > where to put data/metadata on the drive and how to read them. > However in case of thinly provisioned storage those optimization > would not help. And yes, you just have to expect lower performance > with dm-thin from the file system on top of it. It is not and it > will never be ideal solution for workloads where you expect the best > performance.

One of the things which would be nice to be able to easily set up is a configuration where we get the benefits of thin provisioning with respect to snapshots, but where the underlying block device used by the file system is contiguous. That is, it would be really useful to *not* use thin provisioning for the underlying file system, but to use thin provisioned snapshots. That way we only pay the thinp performance penalty for the snapshots, and not for normal file system operations. This is something that would be very useful both for ext4 and xfs.

I talked to Alasdair about this a few months ago at the Collab Summit, and I think it's doable today, but it was somewhat complicated to set up. I don't recall the details now, but perhaps someone who's more familiar with device mapper could outline the details, and perhaps we can either simplify it or abstract it away in a convenient front-end script?
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-19 14:19 ` Ted Ts'o @ 2012-06-19 14:23 ` Eric Sandeen 0 siblings, 0 replies; 72+ messages in thread
From: Eric Sandeen @ 2012-06-19 14:23 UTC (permalink / raw)
To: Ted Ts'o
Cc: Lukáš Czerner, Spelic, xfs, linux-ext4, device-mapper development

On 6/19/12 9:19 AM, Ted Ts'o wrote:
> On Tue, Jun 19, 2012 at 04:09:48PM +0200, Lukáš Czerner wrote:
>> With thin provisioning you'll get a totally different file system
>> layout than on a fully provisioned disk as you push more and more
>> writes to your drive. This unfortunately has a great impact on
>> performance, since file systems usually have a lot of optimizations
>> for where to put data/metadata on the drive and how to read them.
>> However, in the case of thinly provisioned storage those
>> optimizations would not help. And yes, you just have to expect lower
>> performance with dm-thin from the file system on top of it. It is
>> not and will never be an ideal solution for workloads where you
>> expect the best performance.
>
> One of the things which would be nice to be able to easily set up is a
> configuration where we get the benefits of thin provisioning with
> respect to snapshots, but where the underlying block device used by
> the file system is contiguous. That is, it would be really useful to
> *not* use thin provisioning for the underlying file system, but to use
> thin provisioned snapshots. That way we only pay the thinp
> performance penalty for the snapshots, and not for normal file system
> operations. This is something that would be very useful both for ext4
> and xfs.

I agree, and have asked for exactly the same thing... though I have no idea how hard it is to disentangle allocation-aware snapshots from thin provisioned storage.
-Eric
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-19 14:19 ` Ted Ts'o @ 2012-06-19 14:37 ` Lukáš Czerner 0 siblings, 0 replies; 72+ messages in thread
From: Lukáš Czerner @ 2012-06-19 14:37 UTC (permalink / raw)
To: Ted Ts'o
Cc: Lukáš Czerner, Spelic, xfs, linux-ext4, device-mapper development

On Tue, 19 Jun 2012, Ted Ts'o wrote:

> Date: Tue, 19 Jun 2012 10:19:33 -0400
> From: Ted Ts'o <tytso@mit.edu>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: Spelic <spelic@shiftmail.org>, xfs@oss.sgi.com,
>     linux-ext4@vger.kernel.org,
>     device-mapper development <dm-devel@redhat.com>
> Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
>
> One of the things which would be nice to be able to easily set up is a
> configuration where we get the benefits of thin provisioning with
> respect to snapshots, but where the underlying block device used by
> the file system is contiguous. That is, it would be really useful to
> *not* use thin provisioning for the underlying file system, but to use
> thin provisioned snapshots. That way we only pay the thinp
> performance penalty for the snapshots, and not for normal file system
> operations. This is something that would be very useful both for ext4
> and xfs.
>
> I talked to Alasdair about this a few months ago at the Collab Summit,
> and I think it's doable today, but it was somewhat complicated to set
> up. I don't recall the details now, but perhaps someone who's more
> familiar with device mapper could outline the details, and perhaps we
> can either simplify it or abstract it away in a convenient front-end
> script?

like ssm for example ? :)

Yes, this would definitely help, and I think there are actually more possible optimizations like this. If we "cripple" dm-thin so that only the snapshot feature is provided and the actual thinp feature is not used, it would definitely help performance for those who are only interested in snapshots. You'll still have your file system layout mixed up once you start using snapshots, but it'll definitely be better. Also, some kind of fs/dm interface for optimizing the layout might be helpful as well.

The other thing which could be done is to still enable use of the thinp feature, but try to keep the file systems on the dm-thin relatively separated and contiguous (although probably not across their entire size). It would certainly work only up to some thin pool utilization threshold, but it is something. Also, if we can add some fs-side optimizations to try not to span the entire file system but rather utilize smaller parts first (alter the block allocator so it does not allocate blocks from random groups across the entire fs but rather keeps a smaller block-group working set at the start), this can be even more useful.

-Lukas

> - Ted
* Re: [dm-devel] Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-19 14:19 ` Ted Ts'o @ 2012-06-19 14:43 ` Alasdair G Kergon 0 siblings, 0 replies; 72+ messages in thread
From: Alasdair G Kergon @ 2012-06-19 14:43 UTC (permalink / raw)
To: device-mapper development
Cc: Lukáš Czerner, linux-ext4, xfs, Spelic

On Tue, Jun 19, 2012 at 10:19:33AM -0400, Ted Ts'o wrote:
> One of the things which would be nice to be able to easily set up is a
> configuration where we get the benefits of thin provisioning with
> respect to snapshots, but where the underlying block device used by
> the file system is contiguous.

We're tracking this requirement (for lvm2) here:
https://bugzilla.redhat.com/show_bug.cgi?id=814737

Alasdair
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-19 14:43 ` Alasdair G Kergon @ 2012-06-19 15:28 ` Mike Snitzer 0 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-19 15:28 UTC (permalink / raw)
To: device-mapper development, Lukáš Czerner, linux-ext4, xfs, Spelic

On Tue, Jun 19 2012 at 10:43am -0400, Alasdair G Kergon <agk@redhat.com> wrote:

> On Tue, Jun 19, 2012 at 10:19:33AM -0400, Ted Ts'o wrote:
> > One of the things which would be nice to be able to easily set up is a
> > configuration where we get the benefits of thin provisioning with
> > respect to snapshots, but where the underlying block device used by
> > the file system is contiguous.
>
> We're tracking this requirement (for lvm2) here:
> https://bugzilla.redhat.com/show_bug.cgi?id=814737

That is an lvm2 BZ but there is further kernel work needed.

It should be noted that the "external origin" feature was added to the thinp target with this commit:
http://git.kernel.org/linus/2dd9c257fbc243aa76ee6d

It is a start, but the external origin is kept read-only and any writes trigger allocation of new blocks within the thin-pool.

We've talked some about the desire to have a fully provisioned volume that only starts to get fragmented once snapshots are taken. The idea is to move the origin into the data volume via mapping, rather than copying:

Dec 14 10:37:08 <ejt> we then build a data dev that consists of a linear mapping to that origin
Dec 14 10:37:12 <ejt> plus some extra stuff
Dec 14 10:37:23 <ejt> (the additional free space for snapshots)
Dec 14 10:37:49 <ejt> we then prepare thinp metadata with a mapping to that origin
* Re: [dm-devel] Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-19 15:28 ` Mike Snitzer @ 2012-06-19 16:03 ` Alasdair G Kergon 0 siblings, 0 replies; 72+ messages in thread
From: Alasdair G Kergon @ 2012-06-19 16:03 UTC (permalink / raw)
To: device-mapper development
Cc: Lukáš Czerner, linux-ext4, xfs, Spelic

On Tue, Jun 19, 2012 at 11:28:56AM -0400, Mike Snitzer wrote:
> That is an lvm2 BZ but there is further kernel work needed.

In principle, userspace should already be able to handle the replumbing, I think. (But when we work through the details of an online import, perhaps we'll want some further kernel change for atomicity/speed reasons? In particular we need to be able to do the last part of the metadata merge quickly.)

Roughly:
1. rejig the lvm metadata for the new configuration [lvm] - appends the "whole LV" data to the pool's data
2. generate metadata for the appended data and append this to the metadata area [dmpd]
3. suspend all the affected devices [lvm]
4. link the already-prepared metadata into the existing metadata [dmpd]
5. resume all the devices (now using the new extended pool)

Alasdair
* Re: [dm-devel] Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-19 15:28 ` Mike Snitzer @ 2012-06-19 19:58 ` Ted Ts'o 0 siblings, 0 replies; 72+ messages in thread
From: Ted Ts'o @ 2012-06-19 19:58 UTC (permalink / raw)
To: device-mapper development
Cc: Lukáš Czerner, linux-ext4, xfs, Spelic

On Tue, Jun 19, 2012 at 11:28:56AM -0400, Mike Snitzer wrote:
> That is an lvm2 BZ but there is further kernel work needed.
>
> It should be noted that the "external origin" feature was added to the
> thinp target with this commit:
> http://git.kernel.org/linus/2dd9c257fbc243aa76ee6d
>
> It is a start, but the external origin is kept read-only and any writes
> trigger allocation of new blocks within the thin-pool.

Hmm... maybe this is what I had been told. I thought there was some feature where you could take a read-only thinp snapshot of an external volume (i.e., a pre-existing LVM2 volume, or a block device), and then after that, make read-write snapshots using the read-only snapshot as a base? Is that something that works today, or is planned? Or am I totally confused?

And if it is something that works today, is there a web site or documentation file that gives a recipe for how to use it if we want to do some performance experiments (i.e., it doesn't have to be a user-friendly interface if that's not ready yet).

Thanks,

- Ted
* Re: Ext4 and xfs problems in dm-thin on allocation and discard 2012-06-19 19:58 ` Ted Ts'o @ 2012-06-19 20:44 ` Mike Snitzer 0 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-19 20:44 UTC (permalink / raw)
To: Ted Ts'o
Cc: device-mapper development, Lukáš Czerner, Spelic, linux-ext4, xfs

On Tue, Jun 19 2012 at 3:58pm -0400, Ted Ts'o <tytso@mit.edu> wrote:

> On Tue, Jun 19, 2012 at 11:28:56AM -0400, Mike Snitzer wrote:
> > That is an lvm2 BZ but there is further kernel work needed.
> >
> > It should be noted that the "external origin" feature was added to the
> > thinp target with this commit:
> > http://git.kernel.org/linus/2dd9c257fbc243aa76ee6d
> >
> > It is a start, but the external origin is kept read-only and any writes
> > trigger allocation of new blocks within the thin-pool.
>
> Hmm... maybe this is what I had been told. I thought there was some
> feature where you could take a read-only thinp snapshot of an external
> volume (i.e., a pre-existing LVM2 volume, or a block device), and then
> after that, make read-write snapshots using the read-only snapshot as
> a base? Is that something that works today, or is planned? Or am I
> totally confused?

The commit I referenced basically provides that capability.

> And if it is something that works today, is there a web site or
> documentation file that gives a recipe for how to use it if we want to
> do some performance experiments (i.e., it doesn't have to be a
> user-friendly interface if that's not ready yet).

Documentation/device-mapper/thin-provisioning.txt has details on how to use dmsetup to create a thin device that uses a read-only external origin volume (so all reads to unprovisioned areas of the thin device will be remapped to the external origin -- "external" meaning the volume outside of the thin-pool). The creation of a thin device w/ a read-only external origin gets you started with a thin device that is effectively a snapshot of the origin volume.
That thin device is read-write -- all writes are provisioned from the thin-pool that is backing the thin device. And you can take snapshots (or recursive snapshots) of that thin device.
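[Editor's note: the thin-provisioning.txt recipe Mike refers to boils down to something like the following sketch. The pool device, origin device, size, and dev id below are placeholder values; the commands need root and a pool created beforehand.]

```shell
# Assumes a thin-pool already exists as /dev/mapper/pool
# (see Documentation/device-mapper/thin-provisioning.txt).

# Reserve a new thin device id (0 here) in the pool's metadata:
dmsetup message /dev/mapper/pool 0 "create_thin 0"

# Create the thin device with an external origin: reads of
# unprovisioned areas fall through to the read-only origin
# /dev/vg/origin; the size (2097152) is in 512-byte sectors:
dmsetup create thin-snap --table \
    "0 2097152 thin /dev/mapper/pool 0 /dev/vg/origin"

# /dev/mapper/thin-snap is now a writable snapshot of the origin;
# writes allocate new blocks in the pool, and further snapshots of
# it can be taken with "create_snap <new id> 0" messages.
```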