* Ext4 and xfs problems in dm-thin on allocation and discard
@ 2012-06-18 21:33 ` Spelic
  0 siblings, 0 replies; 72+ messages in thread
From: Spelic @ 2012-06-18 21:33 UTC (permalink / raw)
  To: xfs, linux-ext4, device-mapper development


Hello all,
I am doing some testing of dm-thin on kernel 3.4.2 with the latest lvm
built from source (the rest is Ubuntu Precise 12.04).
There are a few problems with ext4, and different ones with xfs.

I am doing this:
dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync
lvs
rm zeroes #optional
dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync  #again
lvs
rm zeroes #optional
...
dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync  #again
lvs
rm zeroes
fstrim /mnt/mountpoint
lvs
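
The sequence above can be wrapped into a small script. This is a
minimal sketch: the mountpoint path is an assumption, and by default it
only echoes the commands (a dry run). Set RUN to the empty string to
actually execute them, which requires root and a mounted thin volume.

```shell
#!/bin/sh
# Repro sketch for the allocation test above.
# MNT is an assumed mountpoint; RUN=echo makes this a dry run by
# default: nothing is written until RUN is set to the empty string.
MNT=${MNT:-/mnt/mountpoint}
RUN=${RUN:-echo}

write_pass() {
    $RUN dd if=/dev/zero of="$MNT/zeroes" bs=1M count=1000 conv=fsync
    $RUN lvs                    # watch pool/thin-LV space occupation
    $RUN rm "$MNT/zeroes"
}

write_pass
write_pass
write_pass
$RUN fstrim "$MNT"              # should return the space to the pool
$RUN lvs
```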

On ext4 the problem is that it always allocates blocks in different
places, so lvs shows space occupation in the pool and in the thin LV
increasing at every iteration of dd, again and again, until the whole
thin device (really 100% of it) has been allocated. This is true
regardless of whether I run rm between one dd and the next.
The other problem is that, because of this, ext4 always gets the worst
write performance out of thinp, about 140MB/sec on my system, since it
is constantly triggering block allocation, instead of the 350MB/sec I
would get on my system if it reused already-allocated regions (see the
comparison with xfs below). I am on an MD raid-5 of 5 hdds.
I would suggest adding a "thinp mode" mount option to ext4 that affects
the allocator, so that it tries to reallocate recently used and freed
areas rather than constantly picking new ones. Note that mount -o
discard does work and prevents the allocation bloat, but it still
always gets the worst write performance out of thinp. Alternatively,
thinp could be improved so that block allocation is fast :-P (*)
The good news, however, is that fstrim works correctly on ext4 and is
able to drop all the space allocated by the dd's. Mount -o discard also
works.

On xfs there is a different problem.
Xfs apparently re-uses the same blocks correctly, so after the first
write at 140MB/sec, subsequent overwrites of the same file run at full
speed, around 350MB/sec (the same as non-thin lvm), and space
occupation does not go up at every iteration of dd, with or without rm
in between the dd's. [Actually, retrying it now, it needed 3 rewrites
before the allocation stabilized... probably an AG count thing.]
However, the problem with XFS is that discard doesn't appear to work.
Fstrim doesn't work, and neither does "mount -o discard ... + rm
zeroes". There is apparently no way to drop the allocated blocks, as
seen from lvs. This contradicts what is written at
http://xfs.org/index.php/FITRIM/discard, which declares fstrim and
mount -o discard to be working.
Please note that since I am above MD raid5 (I believe this is the 
reason), the passdown of discards does not work, as my dmesg says:
[160508.497879] device-mapper: thin: Discard unsupported by data device 
(dm-1): Disabling discard passdown.
but AFAIU, unless there is a thinp bug, this should not affect the 
unmapping of thin blocks by fstrimming xfs... and in fact ext4 is able 
to do that.

(*) A strange thing is that write performance appears to be roughly the
same with the default thin chunksize and with a 1MB thin chunksize. I
would have expected thinp allocation to be faster with larger thin
chunksizes, but it is actually slower (note that there are no snapshots
here and hence no CoW). This also holds if I set the thinpool not to
zero newly allocated blocks: performance is then about 240MB/sec, but
again it does not increase with larger chunksizes; it actually
decreases slightly with very large chunksizes such as 16MB. Why is that?

Thanks for your help
S.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-18 21:33 ` Spelic
@ 2012-06-19  1:57   ` Dave Chinner
  -1 siblings, 0 replies; 72+ messages in thread
From: Dave Chinner @ 2012-06-19  1:57 UTC (permalink / raw)
  To: Spelic; +Cc: xfs, linux-ext4, device-mapper development

On Mon, Jun 18, 2012 at 11:33:50PM +0200, Spelic wrote:
> Hello all
> I am doing some testing of dm-thin on kernel 3.4.2 and latest lvm
> from source (the rest is Ubuntu Precise 12.04).
> There are a few problems with ext4 and (different ones with) xfs
> 
> I am doing this:
> dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync
> lvs
> rm zeroes #optional
> dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync  #again
> lvs
> rm zeroes #optional
> ...
> dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync  #again
> lvs
> rm zeroes
> fstrim /mnt/mountpoint
> lvs

[snip ext4 problems]

> On xfs there is a different problem.
> Xfs apparently correctly re-uses the same blocks so that after the
> first write at 140MB/sec, subsequent overwrites of the same file are
> at full speed such as 350MB/sec (same speed as with non-thin lvm),
> and also you don't see space occupation going up at every iteration
> of dd, either with or without rm in-between the dd's. [ok actually
> now retrying it needed 3 rewrites to stabilize allocation...
> probably an AG count thing.]

That's just a characteristic of the allocation algorithm. It's not
something that you see in day-to-day operation of the filesystem,
though, because you rarely remove and rewrite a file like this
repeatedly. So in the real world, performance will be more like ext4
when you are running workloads where you actually store data for
longer than a millisecond...

Expect that the 140MB/s number is the normal performance case,
because as soon as you take a snapshot, the overwrite requires new
blocks to be allocated in dm-thinp. You don't get thinp for nothing
- it has an associated performance cost as you are now finding
out....

> However the problem with XFS is that discard doesn't appear to work.
> Fstrim doesn't work, and neither does "mount -o discard ... + rm
> zeroes" . There is apparently no way to drop the allocated blocks,
> as seen from lvs. This is in contrast to what it is written here
> http://xfs.org/index.php/FITRIM/discard which declare fstrim and
> mount -o discard to be working.

I don't see why it wouldn't be, if the underlying device supports it.
Have you looked at a block trace or an XFS event trace to see whether
discards are being issued by XFS?

Are you getting messages like:

XFS: (dev) discard failed for extent [0x123,4096], error -5

in dmesg, or is fstrim seeing errors returned from the trim ioctl?
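
One way to check is to filter a block trace for discards while fstrim
runs. A sketch under stated assumptions: the device path and the
tracepoint name (xfs_discard_extent, if your kernel has it) are
assumptions, and with RUN=echo (the default) the commands are only
printed, since blktrace needs root and debugfs.

```shell
#!/bin/sh
# Sketch: watch whether XFS actually issues discards during fstrim.
# DEV and the tracepoint path are assumptions; RUN=echo is a dry run.
DEV=${DEV:-/dev/mapper/vg-thinlv}
RUN=${RUN:-echo}

# Trace only discard requests on the thin device (-a discard filters
# blktrace's action mask); run "fstrim /mnt/mountpoint" in parallel.
$RUN sh -c "blktrace -a discard -d $DEV -o - | blkparse -i -"

# Alternatively, enable the XFS discard tracepoint (if present on
# your kernel) and read the trace buffer:
$RUN sh -c "echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_discard_extent/enable"
$RUN cat /sys/kernel/debug/tracing/trace_pipe
```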

> Please note that since I am above MD raid5 (I believe this is the
> reason), the passdown of discards does not work, as my dmesg says:
> [160508.497879] device-mapper: thin: Discard unsupported by data
> device (dm-1): Disabling discard passdown.
> but AFAIU, unless there is a thinp bug, this should not affect the
> unmapping of thin blocks by fstrimming xfs... and in fact ext4 is
> able to do that.

Does ext4 report that same error?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19  1:57   ` Dave Chinner
@ 2012-06-19  3:12     ` Mike Snitzer
  -1 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-19  3:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Spelic, device-mapper development, linux-ext4, xfs

On Mon, Jun 18 2012 at  9:57pm -0400,
Dave Chinner <david@fromorbit.com> wrote:

> On Mon, Jun 18, 2012 at 11:33:50PM +0200, Spelic wrote:
>
> > Please note that since I am above MD raid5 (I believe this is the
> > reason), the passdown of discards does not work, as my dmesg says:
> > [160508.497879] device-mapper: thin: Discard unsupported by data
> > device (dm-1): Disabling discard passdown.
> > but AFAIU, unless there is a thinp bug, this should not affect the
> > unmapping of thin blocks by fstrimming xfs... and in fact ext4 is
> > able to do that.
> 
> Does ext4 report that same error?

That message says the underlying device doesn't support discards
(because it is an MD device).  But the thinp device still has discards
enabled -- it just won't pass the discards down to the underlying data
device.

So yes, it'll happen with ext4 too -- the message is generated when the
thin-pool device is loaded (which happens independently of the
filesystem layered on top).

The discards still inform the thin-pool that the corresponding extents
are no longer allocated.

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19  3:12     ` Mike Snitzer
@ 2012-06-19  6:32       ` Lukáš Czerner
  -1 siblings, 0 replies; 72+ messages in thread
From: Lukáš Czerner @ 2012-06-19  6:32 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Dave Chinner, Spelic, device-mapper development, linux-ext4, xfs

On Mon, 18 Jun 2012, Mike Snitzer wrote:

> Date: Mon, 18 Jun 2012 23:12:42 -0400
> From: Mike Snitzer <snitzer@redhat.com>
> To: Dave Chinner <david@fromorbit.com>
> Cc: Spelic <spelic@shiftmail.org>,
>     device-mapper development <dm-devel@redhat.com>,
>     linux-ext4@vger.kernel.org, xfs@oss.sgi.com
> Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
> 
> On Mon, Jun 18 2012 at  9:57pm -0400,
> Dave Chinner <david@fromorbit.com> wrote:
> 
> > On Mon, Jun 18, 2012 at 11:33:50PM +0200, Spelic wrote:
> >
> > > Please note that since I am above MD raid5 (I believe this is the
> > > reason), the passdown of discards does not work, as my dmesg says:
> > > [160508.497879] device-mapper: thin: Discard unsupported by data
> > > device (dm-1): Disabling discard passdown.
> > > but AFAIU, unless there is a thinp bug, this should not affect the
> > > unmapping of thin blocks by fstrimming xfs... and in fact ext4 is
> > > able to do that.
> > 
> > Does ext4 report that same error?
> 
> That message says the underlying device doesn't support discards
> (because it is an MD device).  But the thinp device still has discards
> enabled -- it just won't pass the discards down to the underlying data
> device.
> 
> So yes, it'll happen with ext4 -- it is generated when the thin-pool
> device is loaded (which happens independent of the filesystem that is
> layered ontop).
> 
> The discards still inform the thin-pool that the corresponding extents
> are no longer allocated.

So do I understand correctly that even though the discard came
through and thinp took advantage of it, it still returns EOPNOTSUPP?
This seems rather suboptimal. IIRC there was a discussion about adding
an option to enable/disable the thinp target sending discards down
to the device.

Maybe it could be a bit smarter than that and actually enable/disable
discard passdown depending on the underlying support, so that we do
not blindly send discards down to a device that does not support them.

So we would have three options:

pass through - always send discard down
backstop - never send discard down to the device
auto - send discard only if the underlying device supports it

What do you think?

-Lukas

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19  6:32       ` Lukáš Czerner
@ 2012-06-19 11:29         ` Spelic
  -1 siblings, 0 replies; 72+ messages in thread
From: Spelic @ 2012-06-19 11:29 UTC (permalink / raw)
  To: Lukáš Czerner
  Cc: Mike Snitzer, Dave Chinner, Spelic, device-mapper development,
	linux-ext4, xfs

On 06/19/12 08:32, Lukáš Czerner wrote:
>
> So do I understand correctly that even though the discard came
> through and thinp took advantage of it it still returns EOPNOTSUPP ?
> This seems rather suboptimal. IIRC there was a discussion to add an
> option to enable/disable sending discard in thinp target down
> to the device.

I'll ask this too:
do I understand correctly that dm-thin returns EOPNOTSUPP to the
filesystem layer even though it is using the discard to unmap blocks,
and that at that point XFS stops sending discards down (while ext4
keeps sending them)?

This looks like a bug in dm-thin to me. Discards are "supported" in
such a scenario.

Do you have a patch for dm-thin to prevent it from returning
EOPNOTSUPP?

Thank you
S.

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 11:29         ` Spelic
@ 2012-06-19 12:20           ` Lukáš Czerner
  -1 siblings, 0 replies; 72+ messages in thread
From: Lukáš Czerner @ 2012-06-19 12:20 UTC (permalink / raw)
  To: Spelic
  Cc: Lukáš Czerner, Mike Snitzer, Dave Chinner,
	device-mapper development, linux-ext4, xfs

On Tue, 19 Jun 2012, Spelic wrote:

> Date: Tue, 19 Jun 2012 13:29:55 +0200
> From: Spelic <spelic@shiftmail.org>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>, Dave Chinner <david@fromorbit.com>,
>     Spelic <spelic@shiftmail.org>,
>     device-mapper development <dm-devel@redhat.com>,
>     linux-ext4@vger.kernel.org, xfs@oss.sgi.com
> Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
> 
> On 06/19/12 08:32, Lukáš Czerner wrote:
> > 
> > So do I understand correctly that even though the discard came
> > through and thinp took advantage of it it still returns EOPNOTSUPP ?
> > This seems rather suboptimal. IIRC there was a discussion to add an
> > option to enable/disable sending discard in thinp target down
> > to the device.
> 
> I'll ask this too...
> do I understand correctly that dm-thin returns EOPNOTSUPP to the filesystem
> layer even though it is using the discard to unmap blocks, and at that point
> XFS stops sending discards down there (while ext4 keeps sending them)?
> 
> This looks like a bug of dm-thin to me. Discards are "supported" in such a
> scenario.
> 
> Do you have a patch for dm-thin so to prevent it sending EOPNOTSUPP ?

Yes, this behaviour definitely needs to change in dm-thin. I do not
have a patch; it was merely a proposal for how things could be done.
Not sure what Mike and the rest of the dm folks think about this.

-Lukas

> 
> Thank you
> S.
> 

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19  6:32       ` Lukáš Czerner
@ 2012-06-19 13:16         ` Mike Snitzer
  -1 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-19 13:16 UTC (permalink / raw)
  To: Lukáš Czerner
  Cc: Dave Chinner, Spelic, device-mapper development, linux-ext4, xfs

On Tue, Jun 19 2012 at  2:32am -0400,
Lukáš Czerner <lczerner@redhat.com> wrote:

> On Mon, 18 Jun 2012, Mike Snitzer wrote:
> 
> > Date: Mon, 18 Jun 2012 23:12:42 -0400
> > From: Mike Snitzer <snitzer@redhat.com>
> > To: Dave Chinner <david@fromorbit.com>
> > Cc: Spelic <spelic@shiftmail.org>,
> >     device-mapper development <dm-devel@redhat.com>,
> >     linux-ext4@vger.kernel.org, xfs@oss.sgi.com
> > Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
> > 
> > On Mon, Jun 18 2012 at  9:57pm -0400,
> > Dave Chinner <david@fromorbit.com> wrote:
> > 
> > > On Mon, Jun 18, 2012 at 11:33:50PM +0200, Spelic wrote:
> > >
> > > > Please note that since I am above MD raid5 (I believe this is the
> > > > reason), the passdown of discards does not work, as my dmesg says:
> > > > [160508.497879] device-mapper: thin: Discard unsupported by data
> > > > device (dm-1): Disabling discard passdown.
> > > > but AFAIU, unless there is a thinp bug, this should not affect the
> > > > unmapping of thin blocks by fstrimming xfs... and in fact ext4 is
> > > > able to do that.
> > > 
> > > Does ext4 report that same error?
> > 
> > That message says the underlying device doesn't support discards
> > (because it is an MD device).  But the thinp device still has discards
> > enabled -- it just won't pass the discards down to the underlying data
> > device.
> > 
> > So yes, it'll happen with ext4 -- it is generated when the thin-pool
> > device is loaded (which happens independent of the filesystem that is
> > layered ontop).
> > 
> > The discards still inform the thin-pool that the corresponding extents
> > are no longer allocated.
> 
> So do I understand correctly that even though the discard came
> through and thinp took advantage of it it still returns EOPNOTSUPP ?

No, not correct.  Why are you assuming this?  I must be missing
something from this discussion that led you there.

> This seems rather suboptimal. IIRC there was a discussion to add an
> option to enable/disable sending discard in thinp target down
> to the device.
> 
> So maybe it might be a bit smarter than that and actually
> enable/disable discard pass through depending on the underlying
> support, so we do not blindly send discard down to the device even
> though it does not support it.

Yes, that is what we did.

Discards are enabled my default (including discard passdown), but if the
underlying data device doesn't support discards then the discards will
not be passed down.

And here are the feature controls that can be provided when loading the
thin-pool's DM table:

ignore_discard: disable discard
no_discard_passdown: don't pass discards down to the data device

-EOPNOTSUPP is only ever returned if 'ignore_discard' is provided.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
@ 2012-06-19 13:16         ` Mike Snitzer
  0 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-19 13:16 UTC (permalink / raw)
  To: Lukáš Czerner
  Cc: device-mapper development, linux-ext4, xfs, Spelic

On Tue, Jun 19 2012 at  2:32am -0400,
Lukáš Czerner <lczerner@redhat.com> wrote:

> On Mon, 18 Jun 2012, Mike Snitzer wrote:
> 
> > Date: Mon, 18 Jun 2012 23:12:42 -0400
> > From: Mike Snitzer <snitzer@redhat.com>
> > To: Dave Chinner <david@fromorbit.com>
> > Cc: Spelic <spelic@shiftmail.org>,
> >     device-mapper development <dm-devel@redhat.com>,
> >     linux-ext4@vger.kernel.org, xfs@oss.sgi.com
> > Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
> > 
> > On Mon, Jun 18 2012 at  9:57pm -0400,
> > Dave Chinner <david@fromorbit.com> wrote:
> > 
> > > On Mon, Jun 18, 2012 at 11:33:50PM +0200, Spelic wrote:
> > >
> > > > Please note that since I am above MD raid5 (I believe this is the
> > > > reason), the passdown of discards does not work, as my dmesg says:
> > > > [160508.497879] device-mapper: thin: Discard unsupported by data
> > > > device (dm-1): Disabling discard passdown.
> > > > but AFAIU, unless there is a thinp bug, this should not affect the
> > > > unmapping of thin blocks by fstrimming xfs... and in fact ext4 is
> > > > able to do that.
> > > 
> > > Does ext4 report that same error?
> > 
> > That message says the underlying device doesn't support discards
> > (because it is an MD device).  But the thinp device still has discards
> > enabled -- it just won't pass the discards down to the underlying data
> > device.
> > 
> > So yes, it'll happen with ext4 -- it is generated when the thin-pool
> > device is loaded (which happens independent of the filesystem that is
> > layered ontop).
> > 
> > The discards still inform the thin-pool that the corresponding extents
> > are no longer allocated.
> 
> So do I understand correctly that even though the discard came
> through and thinp took advantage of it it still returns EOPNOTSUPP ?

No, not correct.  Why are you assuming this?  I must be missing
something from this discussion that led you there.

> This seems rather suboptimal. IIRC there was a discussion to add an
> option to enable/disable sending discard in thinp target down
> to the device.
> 
> So maybe it might be a bit smarter than that and actually
> enable/disable discard pass through depending on the underlying
> support, so we do not blindly send discard down to the device even
> though it does not support it.

Yes, that is what we did.

Discards are enabled my default (including discard passdown), but if the
underlying data device doesn't support discards then the discards will
not be passed down.

And here are the feature controls that can be provided when loading the
thin-pool's DM table:

ignore_discard: disable discard
no_discard_passdown: don't pass discards down to the data device

-EOPNOTSUPP is only ever returned if 'ignore_discard' is provided.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 13:16         ` Mike Snitzer
@ 2012-06-19 13:25           ` Lukáš Czerner
  -1 siblings, 0 replies; 72+ messages in thread
From: Lukáš Czerner @ 2012-06-19 13:25 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Lukáš Czerner, Dave Chinner, Spelic,
	device-mapper development, linux-ext4, xfs

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4041 bytes --]

On Tue, 19 Jun 2012, Mike Snitzer wrote:

> Date: Tue, 19 Jun 2012 09:16:49 -0400
> From: Mike Snitzer <snitzer@redhat.com>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: Dave Chinner <david@fromorbit.com>, Spelic <spelic@shiftmail.org>,
>     device-mapper development <dm-devel@redhat.com>,
>     linux-ext4@vger.kernel.org, xfs@oss.sgi.com
> Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
> 
> On Tue, Jun 19 2012 at  2:32am -0400,
> Lukáš Czerner <lczerner@redhat.com> wrote:
> 
> > On Mon, 18 Jun 2012, Mike Snitzer wrote:
> > 
> > > Date: Mon, 18 Jun 2012 23:12:42 -0400
> > > From: Mike Snitzer <snitzer@redhat.com>
> > > To: Dave Chinner <david@fromorbit.com>
> > > Cc: Spelic <spelic@shiftmail.org>,
> > >     device-mapper development <dm-devel@redhat.com>,
> > >     linux-ext4@vger.kernel.org, xfs@oss.sgi.com
> > > Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
> > > 
> > > On Mon, Jun 18 2012 at  9:57pm -0400,
> > > Dave Chinner <david@fromorbit.com> wrote:
> > > 
> > > > On Mon, Jun 18, 2012 at 11:33:50PM +0200, Spelic wrote:
> > > >
> > > > > Please note that since I am above MD raid5 (I believe this is the
> > > > > reason), the passdown of discards does not work, as my dmesg says:
> > > > > [160508.497879] device-mapper: thin: Discard unsupported by data
> > > > > device (dm-1): Disabling discard passdown.
> > > > > but AFAIU, unless there is a thinp bug, this should not affect the
> > > > > unmapping of thin blocks by fstrimming xfs... and in fact ext4 is
> > > > > able to do that.
> > > > 
> > > > Does ext4 report that same error?
> > > 
> > > That message says the underlying device doesn't support discards
> > > (because it is an MD device).  But the thinp device still has discards
> > > enabled -- it just won't pass the discards down to the underlying data
> > > device.
> > > 
> > > So yes, it'll happen with ext4 -- it is generated when the thin-pool
> > > device is loaded (which happens independent of the filesystem that is
> > > layered ontop).
> > > 
> > > The discards still inform the thin-pool that the corresponding extents
> > > are no longer allocated.
> > 
> > So do I understand correctly that even though the discard came
> > through and thinp took advantage of it it still returns EOPNOTSUPP ?
> 
> No, not correct.  Why are you assuming this?  I must be missing
> something from this discussion that led you there.

Those two paragraphs led me to that conclusion:

  That message says the underlying device doesn't support discards
  (because it is an MD device).  But the thinp device still has discards
  enabled -- it just won't pass the discards down to the underlying data
  device.

  The discards still inform the thin-pool that the corresponding extents
  are no longer allocated.

so I am a bit confused now. Why the dm-thin returned EOPNOTSUPP then
? Is that because it has been configured to ignore_discard, or it
actually takes advantage of the discard but underlying device does
not support it (and no_discard_passdown is not set) so it return
EOPNOTSUPP ?

> 
> > This seems rather suboptimal. IIRC there was a discussion to add an
> > option to enable/disable sending discard in thinp target down
> > to the device.
> > 
> > So maybe it might be a bit smarter than that and actually
> > enable/disable discard pass through depending on the underlying
> > support, so we do not blindly send discard down to the device even
> > though it does not support it.
> 
> Yes, that is what we did.
> 
> Discards are enabled my default (including discard passdown), but if the
> underlying data device doesn't support discards then the discards will
> not be passed down.
> 
> And here are the feature controls that can be provided when loading the
> thin-pool's DM table:
> 
> ignore_discard: disable discard
> no_discard_passdown: don't pass discards down to the data device
> 
> -EOPNOTSUPP is only ever returned if 'ignore_discard' is provided.

Ok, so in this case 'ignore_discard' has been configured ?

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
@ 2012-06-19 13:25           ` Lukáš Czerner
  0 siblings, 0 replies; 72+ messages in thread
From: Lukáš Czerner @ 2012-06-19 13:25 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: xfs, device-mapper development, Spelic, Lukáš Czerner,
	linux-ext4

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4041 bytes --]

On Tue, 19 Jun 2012, Mike Snitzer wrote:

> Date: Tue, 19 Jun 2012 09:16:49 -0400
> From: Mike Snitzer <snitzer@redhat.com>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: Dave Chinner <david@fromorbit.com>, Spelic <spelic@shiftmail.org>,
>     device-mapper development <dm-devel@redhat.com>,
>     linux-ext4@vger.kernel.org, xfs@oss.sgi.com
> Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
> 
> On Tue, Jun 19 2012 at  2:32am -0400,
> Lukáš Czerner <lczerner@redhat.com> wrote:
> 
> > On Mon, 18 Jun 2012, Mike Snitzer wrote:
> > 
> > > Date: Mon, 18 Jun 2012 23:12:42 -0400
> > > From: Mike Snitzer <snitzer@redhat.com>
> > > To: Dave Chinner <david@fromorbit.com>
> > > Cc: Spelic <spelic@shiftmail.org>,
> > >     device-mapper development <dm-devel@redhat.com>,
> > >     linux-ext4@vger.kernel.org, xfs@oss.sgi.com
> > > Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
> > > 
> > > On Mon, Jun 18 2012 at  9:57pm -0400,
> > > Dave Chinner <david@fromorbit.com> wrote:
> > > 
> > > > On Mon, Jun 18, 2012 at 11:33:50PM +0200, Spelic wrote:
> > > >
> > > > > Please note that since I am above MD raid5 (I believe this is the
> > > > > reason), the passdown of discards does not work, as my dmesg says:
> > > > > [160508.497879] device-mapper: thin: Discard unsupported by data
> > > > > device (dm-1): Disabling discard passdown.
> > > > > but AFAIU, unless there is a thinp bug, this should not affect the
> > > > > unmapping of thin blocks by fstrimming xfs... and in fact ext4 is
> > > > > able to do that.
> > > > 
> > > > Does ext4 report that same error?
> > > 
> > > That message says the underlying device doesn't support discards
> > > (because it is an MD device).  But the thinp device still has discards
> > > enabled -- it just won't pass the discards down to the underlying data
> > > device.
> > > 
> > > So yes, it'll happen with ext4 -- it is generated when the thin-pool
> > > device is loaded (which happens independent of the filesystem that is
> > > layered ontop).
> > > 
> > > The discards still inform the thin-pool that the corresponding extents
> > > are no longer allocated.
> > 
> > So do I understand correctly that even though the discard came
> > through and thinp took advantage of it it still returns EOPNOTSUPP ?
> 
> No, not correct.  Why are you assuming this?  I must be missing
> something from this discussion that led you there.

Those two paragraphs led me to that conclusion:

  That message says the underlying device doesn't support discards
  (because it is an MD device).  But the thinp device still has discards
  enabled -- it just won't pass the discards down to the underlying data
  device.

  The discards still inform the thin-pool that the corresponding extents
  are no longer allocated.

so I am a bit confused now. Why the dm-thin returned EOPNOTSUPP then
? Is that because it has been configured to ignore_discard, or it
actually takes advantage of the discard but underlying device does
not support it (and no_discard_passdown is not set) so it return
EOPNOTSUPP ?

> 
> > This seems rather suboptimal. IIRC there was a discussion to add an
> > option to enable/disable sending discard in thinp target down
> > to the device.
> > 
> > So maybe it might be a bit smarter than that and actually
> > enable/disable discard pass through depending on the underlying
> > support, so we do not blindly send discard down to the device even
> > though it does not support it.
> 
> Yes, that is what we did.
> 
> Discards are enabled my default (including discard passdown), but if the
> underlying data device doesn't support discards then the discards will
> not be passed down.
> 
> And here are the feature controls that can be provided when loading the
> thin-pool's DM table:
> 
> ignore_discard: disable discard
> no_discard_passdown: don't pass discards down to the data device
> 
> -EOPNOTSUPP is only ever returned if 'ignore_discard' is provided.

Ok, so in this case 'ignore_discard' has been configured ?

Thanks!
-Lukas

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 13:25           ` Lukáš Czerner
@ 2012-06-19 13:30             ` Mike Snitzer
  -1 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-19 13:30 UTC (permalink / raw)
  To: Lukáš Czerner
  Cc: Dave Chinner, Spelic, device-mapper development, linux-ext4, xfs

On Tue, Jun 19 2012 at  9:25am -0400,
Lukáš Czerner <lczerner@redhat.com> wrote:

> On Tue, 19 Jun 2012, Mike Snitzer wrote:
> 
> > Date: Tue, 19 Jun 2012 09:16:49 -0400
> > From: Mike Snitzer <snitzer@redhat.com>
> > To: Lukáš Czerner <lczerner@redhat.com>
> > Cc: Dave Chinner <david@fromorbit.com>, Spelic <spelic@shiftmail.org>,
> >     device-mapper development <dm-devel@redhat.com>,
> >     linux-ext4@vger.kernel.org, xfs@oss.sgi.com
> > Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
> > 
> > On Tue, Jun 19 2012 at  2:32am -0400,
> > Lukáš Czerner <lczerner@redhat.com> wrote:
> >
> > > So do I understand correctly that even though the discard came
> > > through and thinp took advantage of it it still returns EOPNOTSUPP ?
> > 
> > No, not correct.  Why are you assuming this?  I must be missing
> > something from this discussion that led you there.
> 
> Those two paragraphs led me to that conclusion:
> 
>   That message says the underlying device doesn't support discards
>   (because it is an MD device).  But the thinp device still has discards
>   enabled -- it just won't pass the discards down to the underlying data
>   device.
> 
>   The discards still inform the thin-pool that the corresponding extents
>   are no longer allocated.
> 
> so I am a bit confused now. Why the dm-thin returned EOPNOTSUPP then
> ? Is that because it has been configured to ignore_discard, or it
> actually takes advantage of the discard but underlying device does
> not support it (and no_discard_passdown is not set) so it return
> EOPNOTSUPP ?
> 
> > 
> > > This seems rather suboptimal. IIRC there was a discussion to add an
> > > option to enable/disable sending discard in thinp target down
> > > to the device.
> > > 
> > > So maybe it might be a bit smarter than that and actually
> > > enable/disable discard pass through depending on the underlying
> > > support, so we do not blindly send discard down to the device even
> > > though it does not support it.
> > 
> > Yes, that is what we did.
> > 
> > Discards are enabled my default (including discard passdown), but if the
> > underlying data device doesn't support discards then the discards will
> > not be passed down.
> > 
> > And here are the feature controls that can be provided when loading the
> > thin-pool's DM table:
> > 
> > ignore_discard: disable discard
> > no_discard_passdown: don't pass discards down to the data device
> > 
> > -EOPNOTSUPP is only ever returned if 'ignore_discard' is provided.
> 
> Ok, so in this case 'ignore_discard' has been configured ?

I don't recall Spelic saying anything about EOPNOTSUPP.  So what has
made you zero in on an -EOPNOTSUPP return (which should not be
happening)?
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
@ 2012-06-19 13:30             ` Mike Snitzer
  0 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-19 13:30 UTC (permalink / raw)
  To: Lukáš Czerner
  Cc: device-mapper development, linux-ext4, xfs, Spelic

On Tue, Jun 19 2012 at  9:25am -0400,
Lukáš Czerner <lczerner@redhat.com> wrote:

> On Tue, 19 Jun 2012, Mike Snitzer wrote:
> 
> > Date: Tue, 19 Jun 2012 09:16:49 -0400
> > From: Mike Snitzer <snitzer@redhat.com>
> > To: Lukáš Czerner <lczerner@redhat.com>
> > Cc: Dave Chinner <david@fromorbit.com>, Spelic <spelic@shiftmail.org>,
> >     device-mapper development <dm-devel@redhat.com>,
> >     linux-ext4@vger.kernel.org, xfs@oss.sgi.com
> > Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
> > 
> > On Tue, Jun 19 2012 at  2:32am -0400,
> > Lukáš Czerner <lczerner@redhat.com> wrote:
> >
> > > So do I understand correctly that even though the discard came
> > > through and thinp took advantage of it it still returns EOPNOTSUPP ?
> > 
> > No, not correct.  Why are you assuming this?  I must be missing
> > something from this discussion that led you there.
> 
> Those two paragraphs led me to that conclusion:
> 
>   That message says the underlying device doesn't support discards
>   (because it is an MD device).  But the thinp device still has discards
>   enabled -- it just won't pass the discards down to the underlying data
>   device.
> 
>   The discards still inform the thin-pool that the corresponding extents
>   are no longer allocated.
> 
> so I am a bit confused now. Why the dm-thin returned EOPNOTSUPP then
> ? Is that because it has been configured to ignore_discard, or it
> actually takes advantage of the discard but underlying device does
> not support it (and no_discard_passdown is not set) so it return
> EOPNOTSUPP ?
> 
> > 
> > > This seems rather suboptimal. IIRC there was a discussion to add an
> > > option to enable/disable sending discard in thinp target down
> > > to the device.
> > > 
> > > So maybe it might be a bit smarter than that and actually
> > > enable/disable discard pass through depending on the underlying
> > > support, so we do not blindly send discard down to the device even
> > > though it does not support it.
> > 
> > Yes, that is what we did.
> > 
> > Discards are enabled my default (including discard passdown), but if the
> > underlying data device doesn't support discards then the discards will
> > not be passed down.
> > 
> > And here are the feature controls that can be provided when loading the
> > thin-pool's DM table:
> > 
> > ignore_discard: disable discard
> > no_discard_passdown: don't pass discards down to the data device
> > 
> > -EOPNOTSUPP is only ever returned if 'ignore_discard' is provided.
> 
> Ok, so in this case 'ignore_discard' has been configured ?

I don't recall Spelic saying anything about EOPNOTSUPP.  So what has
made you zero in on an -EOPNOTSUPP return (which should not be
happening)?

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 11:29         ` Spelic
@ 2012-06-19 13:34           ` Mike Snitzer
  -1 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-19 13:34 UTC (permalink / raw)
  To: Spelic
  Cc: Lukáš Czerner, device-mapper development, linux-ext4,
	Dave Chinner, xfs

On Tue, Jun 19 2012 at  7:29am -0400,
Spelic <spelic@shiftmail.org> wrote:

> On 06/19/12 08:32, Lukáš Czerner wrote:
> >
> >So do I understand correctly that even though the discard came
> >through and thinp took advantage of it it still returns EOPNOTSUPP ?
> >This seems rather suboptimal. IIRC there was a discussion to add an
> >option to enable/disable sending discard in thinp target down
> >to the device.
> 
> I'll ask this too...
> do I understand correctly that dm-thin returns EOPNOTSUPP to the
> filesystem layer even though it is using the discard to unmap
> blocks, and at that point XFS stops sending discards down there
> (while ext4 keeps sending them)?

Are you actually seeing that?  Or are you just seizing on Lukas'
misunderstanding?

> This looks like a bug of dm-thin to me. Discards are "supported" in
> such a scenario.
> 
> Do you have a patch for dm-thin so to prevent it sending EOPTNOTSUPP ?

thinp should _not_ be sending -EOPNOTSUPP unless 'ignore_discard' is
provided as a feature when loading thin-pool's DM  table.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
@ 2012-06-19 13:34           ` Mike Snitzer
  0 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-19 13:34 UTC (permalink / raw)
  To: Spelic
  Cc: Lukáš Czerner, device-mapper development, linux-ext4, xfs

On Tue, Jun 19 2012 at  7:29am -0400,
Spelic <spelic@shiftmail.org> wrote:

> On 06/19/12 08:32, Lukáš Czerner wrote:
> >
> >So do I understand correctly that even though the discard came
> >through and thinp took advantage of it it still returns EOPNOTSUPP ?
> >This seems rather suboptimal. IIRC there was a discussion to add an
> >option to enable/disable sending discard in thinp target down
> >to the device.
> 
> I'll ask this too...
> do I understand correctly that dm-thin returns EOPNOTSUPP to the
> filesystem layer even though it is using the discard to unmap
> blocks, and at that point XFS stops sending discards down there
> (while ext4 keeps sending them)?

Are you actually seeing that?  Or are you just seizing on Lukas'
misunderstanding?

> This looks like a bug of dm-thin to me. Discards are "supported" in
> such a scenario.
> 
> Do you have a patch for dm-thin so to prevent it sending EOPTNOTSUPP ?

thinp should _not_ be sending -EOPNOTSUPP unless 'ignore_discard' is
provided as a feature when loading thin-pool's DM  table.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 13:30             ` Mike Snitzer
@ 2012-06-19 13:52               ` Spelic
  -1 siblings, 0 replies; 72+ messages in thread
From: Spelic @ 2012-06-19 13:52 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Lukáš Czerner, Dave Chinner, Spelic,
	device-mapper development, linux-ext4, xfs

On 06/19/12 15:30, Mike Snitzer wrote:
> I don't recall Spelic saying anything about EOPNOTSUPP. So what has 
> made you zero in on an -EOPNOTSUPP return (which should not be 
> happening)? 

Exactly: I do not know if EOPNOTSUPP is being returned or not.

If this helps, I have configured dm-thin via lvm2
   LVM version:     2.02.95(2) (2012-03-06)
   Library version: 1.02.74 (2012-03-06)
   Driver version:  4.22.0

from dmsetup table I only see one option : "skip_block_zeroing", if and 
only if I configure it with -Zn . I do not see anything regarding 
ignore_discard

      vg1-pooltry1-tpool: 0 20971520 thin-pool 252:1 252:2 2048 0 1 
skip_block_zeroing
      vg1-pooltry1_tdata: 0 20971520 linear 9:20 62922752
      vg1-pooltry1_tmeta: 0 8192 linear 9:20 83894272
      vg1-thinlv1: 0 31457280 thin 252:3 1


and in dmesg:
[   33.685200] device-mapper: thin: Discard unsupported by data device 
(dm-2): Disabling discard passdown.
[   33.709586] device-mapper: thin: Discard unsupported by data device 
(dm-6): Disabling discard passdown.


I do not know what is the mechanism for which xfs cannot unmap blocks 
from dm-thin, but it really can't.
If anyone has dm-thin installed he can try. This is 100% reproducible 
for me.



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
@ 2012-06-19 13:52               ` Spelic
  0 siblings, 0 replies; 72+ messages in thread
From: Spelic @ 2012-06-19 13:52 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: xfs, device-mapper development, Spelic, Lukáš Czerner,
	linux-ext4

On 06/19/12 15:30, Mike Snitzer wrote:
> I don't recall Spelic saying anything about EOPNOTSUPP. So what has 
> made you zero in on an -EOPNOTSUPP return (which should not be 
> happening)? 

Exactly: I do not know if EOPNOTSUPP is being returned or not.

If this helps, I have configured dm-thin via lvm2
   LVM version:     2.02.95(2) (2012-03-06)
   Library version: 1.02.74 (2012-03-06)
   Driver version:  4.22.0

from dmsetup table I only see one option : "skip_block_zeroing", if and 
only if I configure it with -Zn . I do not see anything regarding 
ignore_discard

      vg1-pooltry1-tpool: 0 20971520 thin-pool 252:1 252:2 2048 0 1 
skip_block_zeroing
      vg1-pooltry1_tdata: 0 20971520 linear 9:20 62922752
      vg1-pooltry1_tmeta: 0 8192 linear 9:20 83894272
      vg1-thinlv1: 0 31457280 thin 252:3 1


and in dmesg:
[   33.685200] device-mapper: thin: Discard unsupported by data device 
(dm-2): Disabling discard passdown.
[   33.709586] device-mapper: thin: Discard unsupported by data device 
(dm-6): Disabling discard passdown.


I do not know what is the mechanism for which xfs cannot unmap blocks 
from dm-thin, but it really can't.
If anyone has dm-thin installed he can try. This is 100% reproducible 
for me.


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 13:52               ` Spelic
@ 2012-06-19 14:05                 ` Eric Sandeen
  -1 siblings, 0 replies; 72+ messages in thread
From: Eric Sandeen @ 2012-06-19 14:05 UTC (permalink / raw)
  To: Spelic
  Cc: Mike Snitzer, Lukáš Czerner, Dave Chinner,
	device-mapper development, linux-ext4, xfs

On 6/19/12 8:52 AM, Spelic wrote:
> On 06/19/12 15:30, Mike Snitzer wrote:
>> I don't recall Spelic saying anything about EOPNOTSUPP. So what has made you zero in on an -EOPNOTSUPP return (which should not be happening)? 
> 
> Exactly: I do not know if EOPNOTSUPP is being returned or not.
> 
> If this helps, I have configured dm-thin via lvm2
>   LVM version:     2.02.95(2) (2012-03-06)
>   Library version: 1.02.74 (2012-03-06)
>   Driver version:  4.22.0
> 
> from dmsetup table I only see one option : "skip_block_zeroing", if and only if I configure it with -Zn . I do not see anything regarding ignore_discard
> 
>      vg1-pooltry1-tpool: 0 20971520 thin-pool 252:1 252:2 2048 0 1 skip_block_zeroing
>      vg1-pooltry1_tdata: 0 20971520 linear 9:20 62922752
>      vg1-pooltry1_tmeta: 0 8192 linear 9:20 83894272
>      vg1-thinlv1: 0 31457280 thin 252:3 1
> 
> 
> and in dmesg:
> [   33.685200] device-mapper: thin: Discard unsupported by data device (dm-2): Disabling discard passdown.
> [   33.709586] device-mapper: thin: Discard unsupported by data device (dm-6): Disabling discard passdown.
> 
> 
> I do not know what is the mechanism for which xfs cannot unmap blocks from dm-thin, but it really can't.
> If anyone has dm-thin installed he can try. This is 100% reproducible for me.

Might be worth seeing if xfs is ever getting to its discard code?  There is a tracepoint...

# mount -t debugfs none /sys/kernel/debug
# echo 1 > /sys/kernel/debug/tracing/tracing_enabled
# echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_discard_extent/enable

<run test>

# cat /sys/kernel/debug/tracing/trace

-Eric

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
@ 2012-06-19 14:05                 ` Eric Sandeen
  0 siblings, 0 replies; 72+ messages in thread
From: Eric Sandeen @ 2012-06-19 14:05 UTC (permalink / raw)
  To: Spelic
  Cc: Mike Snitzer, xfs, device-mapper development,
	Lukáš Czerner, linux-ext4

On 6/19/12 8:52 AM, Spelic wrote:
> On 06/19/12 15:30, Mike Snitzer wrote:
>> I don't recall Spelic saying anything about EOPNOTSUPP. So what has made you zero in on an -EOPNOTSUPP return (which should not be happening)? 
> 
> Exactly: I do not know if EOPNOTSUPP is being returned or not.
> 
> If this helps, I have configured dm-thin via lvm2
>   LVM version:     2.02.95(2) (2012-03-06)
>   Library version: 1.02.74 (2012-03-06)
>   Driver version:  4.22.0
> 
> from dmsetup table I only see one option : "skip_block_zeroing", if and only if I configure it with -Zn . I do not see anything regarding ignore_discard
> 
>      vg1-pooltry1-tpool: 0 20971520 thin-pool 252:1 252:2 2048 0 1 skip_block_zeroing
>      vg1-pooltry1_tdata: 0 20971520 linear 9:20 62922752
>      vg1-pooltry1_tmeta: 0 8192 linear 9:20 83894272
>      vg1-thinlv1: 0 31457280 thin 252:3 1
> 
> 
> and in dmesg:
> [   33.685200] device-mapper: thin: Discard unsupported by data device (dm-2): Disabling discard passdown.
> [   33.709586] device-mapper: thin: Discard unsupported by data device (dm-6): Disabling discard passdown.
> 
> 
> I do not know what is the mechanism for which xfs cannot unmap blocks from dm-thin, but it really can't.
> If anyone has dm-thin installed he can try. This is 100% reproducible for me.

Might be worth seeing if xfs is ever getting to its discard code?  There is a tracepoint...

# mount -t debugfs none /sys/kernel/debug
# echo 1 > /sys/kernel/debug/tracing/tracing_enabled
# echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_discard_extent/enable

<run test>

# cat /sys/kernel/debug/tracing/trace

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-18 21:33 ` Spelic
@ 2012-06-19 14:09   ` Lukáš Czerner
  -1 siblings, 0 replies; 72+ messages in thread
From: Lukáš Czerner @ 2012-06-19 14:09 UTC (permalink / raw)
  To: Spelic; +Cc: xfs, linux-ext4, device-mapper development

On Mon, 18 Jun 2012, Spelic wrote:

> Date: Mon, 18 Jun 2012 23:33:50 +0200
> From: Spelic <spelic@shiftmail.org>
> To: xfs@oss.sgi.com, linux-ext4@vger.kernel.org,
>     device-mapper development <dm-devel@redhat.com>
> Subject: Ext4 and xfs problems in dm-thin on allocation and discard
> 
> Hello all
> I am doing some testing of dm-thin on kernel 3.4.2 and latest lvm from source
> (the rest is Ubuntu Precise 12.04).
> There are a few problems with ext4 and (different ones with) xfs
> 
> I am doing this:
> dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync
> lvs
> rm zeroes #optional
> dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync  #again
> lvs
> rm zeroes #optional
> ...
> dd if=/dev/zero of=zeroes bs=1M count=1000 conv=fsync  #again
> lvs
> rm zeroes
> fstrim /mnt/mountpoint
> lvs
> 
> On ext4 the problem is that it always reallocates blocks at different places,
> so you can see from lvs that space occupation in the pool and thinlv increases
> at each iteration of dd, again and again, until it has allocated the whole
> thin device (really 100% of it). And this is true regardless of me doing rm or
> not between one dd and the other.
> The other problem is that by doing this, ext4 always gets the worst
> performance from thinp, about 140MB/sec on my system, because it is constantly
> allocating blocks, instead of 350MB/sec which should have been with my system
> if it used already allocated regions (see below compared to xfs). I am on an
> MD raid-5 of 5 hdds.
> I could suggest to add a "thinp mode" mount option to ext4 affecting the
> allocator, so that it tries to reallocate recently used and freed areas and
> not constantly new areas. Note that mount -o discard does work and prevents
> allocation bloating, but it still always gets the worst write performances
> from thinp. Alternatively thinp could be improved so that block allocation is
> fast :-P (*)
> However, good news is that fstrim works correctly on ext4, and is able to drop
> all space allocated by all dd's. Also mount -o discard works.

I am happy to hear that discard actually works with ext4. As for the
performance problem, part of it has already been explained by Dave,
and I agree with him.

With thin provisioning you will get a totally different file system
layout than on a fully provisioned disk as you push more and more
writes to the drive. Unfortunately this has a great impact on
performance, since file systems carry a lot of optimizations about
where to put data and metadata on the drive and how to read them
back, and on thinly provisioned storage those optimizations do not
help. So yes, you simply have to expect lower performance from a
file system on top of dm-thin. It is not, and never will be, the
ideal solution for workloads where you expect the best performance.

That said, optimizations need to happen on both the dm and fs sides;
that work is currently in progress, and now that we have a "cheap"
thinp solution I expect progress there to be considerably faster.
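
The effect Spelic describes can be modeled in a few lines: a thin pool provisions a chunk on first write, so an allocator that keeps choosing fresh LBAs for each rewrite maps new chunks on every pass, while one that reuses the same LBAs stabilizes after the first pass. A toy model (chunk and write sizes are illustrative, not the reporter's actual pool parameters):

```python
CHUNK = 64 * 1024            # illustrative thin chunk size in bytes
WRITE = 1000 * 1024 * 1024   # one dd pass: 1000 MiB

def provisioned_chunks(offset_for_pass, passes):
    """Return how many thin chunks are mapped after `passes` rewrites.

    `offset_for_pass(i)` gives the starting byte offset the filesystem
    chooses for the file data on pass i.
    """
    mapped = set()
    for i in range(passes):
        start = offset_for_pass(i)
        for off in range(start, start + WRITE, CHUNK):
            mapped.add(off // CHUNK)   # first write to a chunk provisions it
    return len(mapped)

# xfs-like: reuse the same blocks on every overwrite
reuse = provisioned_chunks(lambda i: 0, passes=5)
# ext4-like: pick a fresh region on each pass
fresh = provisioned_chunks(lambda i: i * WRITE, passes=5)
print(reuse, fresh)   # fresh grows linearly with the number of passes
```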

-Lukas

> 
> On xfs there is a different problem.
> Xfs apparently correctly re-uses the same blocks so that after the first write
> at 140MB/sec, subsequent overwrites of the same file are at full speed such as
> 350MB/sec (same speed as with non-thin lvm), and also you don't see space
> occupation going up at every iteration of dd, either with or without rm
> in-between the dd's. [ok actually now retrying it needed 3 rewrites to
> stabilize allocation... probably an AG count thing.]
> However the problem with XFS is that discard doesn't appear to work. Fstrim
> doesn't work, and neither does "mount -o discard ... + rm zeroes" . There is
> apparently no way to drop the allocated blocks, as seen from lvs. This is in
> contrast to what it is written here http://xfs.org/index.php/FITRIM/discard
> which declare fstrim and mount -o discard to be working.
> Please note that since I am above MD raid5 (I believe this is the reason), the
> passdown of discards does not work, as my dmesg says:
> [160508.497879] device-mapper: thin: Discard unsupported by data device
> (dm-1): Disabling discard passdown.
> but AFAIU, unless there is a thinp bug, this should not affect the unmapping
> of thin blocks by fstrimming xfs... and in fact ext4 is able to do that.
> 
> (*) Strange thing is that write performance appears to be roughly the same for
> default thin chunksize and for 1MB thin chunksize. I would have expected thinp
> allocation to be faster with larger thin chunksizes but instead it is actually
> slower (note that there are no snapshots here and hence no CoW). This is also
> true if I set the thinpool to not zero newly allocated blocks: performances
> are about 240 MB/sec then, but again they don't increase with larger
> chunksizes, they actually decrease slightly with very large chunksizes such as
> 16MB. Why is that?
> 
> Thanks for your help
> S.
> 
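
One mechanism worth ruling out for the quoted xfs behaviour: a thin pool can only unmap a chunk that a discard covers completely, so discards smaller than, or misaligned to, the pool chunk size may succeed without freeing anything. A sketch of that alignment arithmetic (the chunk size is illustrative, and the whole-chunk assumption should be checked against the dm-thin code):

```python
def chunks_unmappable(discard_offset, discard_len, chunk_size):
    """How many whole chunks does a discard [offset, offset+len) cover?

    Assumption: the pool only drops a mapping for a chunk that a
    discard covers end to end.
    """
    first = -(-discard_offset // chunk_size)              # round start up
    last = (discard_offset + discard_len) // chunk_size   # round end down
    return max(0, last - first)

CHUNK = 1024 * 1024  # illustrative 1 MiB pool chunk
# A 4 MiB discard aligned to the chunk grid can free 4 chunks...
print(chunks_unmappable(0, 4 * CHUNK, CHUNK))
# ...the same 4 MiB shifted by 4 KiB can free only 3...
print(chunks_unmappable(4096, 4 * CHUNK, CHUNK))
# ...and a sub-chunk discard frees none:
print(chunks_unmappable(0, CHUNK // 2, CHUNK))
```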


* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 14:09   ` Lukáš Czerner
@ 2012-06-19 14:19     ` Ted Ts'o
  -1 siblings, 0 replies; 72+ messages in thread
From: Ted Ts'o @ 2012-06-19 14:19 UTC (permalink / raw)
  To: Lukáš Czerner
  Cc: Spelic, xfs, linux-ext4, device-mapper development

On Tue, Jun 19, 2012 at 04:09:48PM +0200, Lukáš Czerner wrote:
> 
> With thin provisioning you'll get totally different file system
> layout than on fully provisioned disk as you push more and more
> writes to your drive. This unfortunately has great impact on
> performance since file systems usually have a lot of optimization on
> where to put data/metadata on the drive and how to read them.
> However in case of thinly provisioned storage those optimization
> would not help. And yes, you just have to expect lower performance
> with dm-thin from the file system on top of it. It is not and it
> will never be ideal solution for workloads where you expect the best
> performance.

One of the things which would be nice to be able to easily set up is a
configuration where we get the benefits of thin provisioning with
respect to snapshots, but where the underlying block device used by
the file system is contiguous.  That is, it would be really useful to
*not* use thin provisioning for the underlying file system, but to use
thin provisioned snapshots.  That way we only pay the thinp
performance penalty for the snapshots, and not for normal file system
operations.  This is something that would be very useful both for ext4
and xfs.

I talked to Alasdair about this a few months ago at the Collab Summit,
and I think it's doable today, but it was somewhat complicated to set
up.  I don't recall the details now, but perhaps someone who's more
familiar with device mapper could outline the details, and perhaps we can
either simplify it or abstract it away in a convenient front-end
script?

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 14:19     ` Ted Ts'o
@ 2012-06-19 14:23       ` Eric Sandeen
  -1 siblings, 0 replies; 72+ messages in thread
From: Eric Sandeen @ 2012-06-19 14:23 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Lukáš Czerner, Spelic, xfs, linux-ext4,
	device-mapper development

On 6/19/12 9:19 AM, Ted Ts'o wrote:
> On Tue, Jun 19, 2012 at 04:09:48PM +0200, Lukáš Czerner wrote:
>>
>> With thin provisioning you'll get totally different file system
>> layout than on fully provisioned disk as you push more and more
>> writes to your drive. This unfortunately has great impact on
>> performance since file systems usually have a lot of optimization on
>> where to put data/metadata on the drive and how to read them.
>> However in case of thinly provisioned storage those optimization
>> would not help. And yes, you just have to expect lower performance
>> with dm-thin from the file system on top of it. It is not and it
>> will never be ideal solution for workloads where you expect the best
>> performance.
> 
> One of the things which would be nice to be able to easily set up is a
> configuration where we get the benefits of thin provisioning with
> respect to snapshost, but where the underlying block device used by
> the file system is contiguous.  That is, it would be really useful to
> *not* use thin provisioning for the underlying file system, but to use
> thin provisioned snapshots.  That way we only pay the thinp
> performance penalty for the snapshots, and not for normal file system
> operations.  This is something that would be very useful both for ext4
> and xfs.

I agree, and have asked for exactly the same thing... though I have no
idea how hard it is to disentangle allocation-aware snapshots from thin
provisioned storage.

-Eric


* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 14:19     ` Ted Ts'o
@ 2012-06-19 14:37       ` Lukáš Czerner
  -1 siblings, 0 replies; 72+ messages in thread
From: Lukáš Czerner @ 2012-06-19 14:37 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Lukáš Czerner, Spelic, xfs, linux-ext4,
	device-mapper development

On Tue, 19 Jun 2012, Ted Ts'o wrote:

> Date: Tue, 19 Jun 2012 10:19:33 -0400
> From: Ted Ts'o <tytso@mit.edu>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: Spelic <spelic@shiftmail.org>, xfs@oss.sgi.com,
>     linux-ext4@vger.kernel.org,
>     device-mapper development <dm-devel@redhat.com>
> Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
> 
> On Tue, Jun 19, 2012 at 04:09:48PM +0200, Lukáš Czerner wrote:
> > 
> > With thin provisioning you'll get totally different file system
> > layout than on fully provisioned disk as you push more and more
> > writes to your drive. This unfortunately has great impact on
> > performance since file systems usually have a lot of optimization on
> > where to put data/metadata on the drive and how to read them.
> > However in case of thinly provisioned storage those optimization
> > would not help. And yes, you just have to expect lower performance
> > with dm-thin from the file system on top of it. It is not and it
> > will never be ideal solution for workloads where you expect the best
> > performance.
> 
> One of the things which would be nice to be able to easily set up is a
> configuration where we get the benefits of thin provisioning with
> respect to snapshost, but where the underlying block device used by
> the file system is contiguous.  That is, it would be really useful to
> *not* use thin provisioning for the underlying file system, but to use
> thin provisioned snapshots.  That way we only pay the thinp
> performance penalty for the snapshots, and not for normal file system
> operations.  This is something that would be very useful both for ext4
> and xfs.
> 
> I talked to Alasdair about this a few months ago at the Collab Summit,
> and I think it's doable today, but it was somewhat complicaed to set
> up.  I don't recall the details now, but perhaps someone who's more
> familiar device mapper could outline the details, and perhaps we can
> either simplify it or abstract it away in a convenient front-end
> script?

like ssm, for example? :)

Yes, this would definitely help, and I think there are actually more
possible optimizations like this.

If we "crippled" dm-thin so that only the snapshot feature were
provided while the actual thin-provisioning feature went unused, it
would definitely help performance for those who are only interested
in snapshots. You would still have your file system layout mixed up
once you start using snapshots, but it would definitely be better.
Some kind of fs/dm interface for optimizing the layout might be
helpful as well.

The other thing which could be done is to still allow the thinp
feature to be used, but try to keep file systems on the dm-thin
relatively separated and contiguous (although probably not across
their entire size). It would certainly only work up to some thin-pool
utilization threshold, but it is something. Also, if we add some
fs-side optimization to avoid spanning the entire file system and
instead use smaller parts first (alter the block allocator so that it
does not allocate blocks from random groups across the entire fs, but
rather keeps a smaller block-group working set at the start), this
could be even more useful.
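
That last idea can be sketched independently of ext4's real mballoc code; the policy below (prefer a small set of low-numbered groups, widen the set only under pressure) is hypothetical:

```python
class WorkingSetAllocator:
    """Toy allocator that keeps allocations in a small set of
    low-numbered block groups, widening the set only when the current
    groups cannot satisfy a request.  Hypothetical policy, not ext4's
    actual mballoc logic.
    """
    def __init__(self, ngroups, blocks_per_group, initial_set=4):
        self.free = [blocks_per_group] * ngroups
        self.working_set = initial_set   # groups 0..working_set-1 preferred

    def alloc(self, nblocks):
        while True:
            for g in range(min(self.working_set, len(self.free))):
                if self.free[g] >= nblocks:
                    self.free[g] -= nblocks
                    return g
            if self.working_set >= len(self.free):
                raise RuntimeError("filesystem full")
            self.working_set *= 2        # widen only under pressure

a = WorkingSetAllocator(ngroups=64, blocks_per_group=1024)
groups = {a.alloc(256) for _ in range(16)}
print(sorted(groups))   # all 16 allocations land in the initial groups
```

On thin storage this keeps writes confined to the lowest chunks, so the pool provisions far less than if allocations were spread over random groups.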

-Lukas

> 
> 						- Ted
> 


* Re: [dm-devel] Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 14:19     ` Ted Ts'o
@ 2012-06-19 14:43       ` Alasdair G Kergon
  -1 siblings, 0 replies; 72+ messages in thread
From: Alasdair G Kergon @ 2012-06-19 14:43 UTC (permalink / raw)
  To: device-mapper development
  Cc: Lukáš Czerner, linux-ext4, xfs, Spelic

On Tue, Jun 19, 2012 at 10:19:33AM -0400, Ted Ts'o wrote:
> One of the things which would be nice to be able to easily set up is a
> configuration where we get the benefits of thin provisioning with
> respect to snapshost, but where the underlying block device used by
> the file system is contiguous.  

We're tracking this requirement (for lvm2) here:
  https://bugzilla.redhat.com/show_bug.cgi?id=814737

Alasdair



* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 13:52               ` Spelic
@ 2012-06-19 14:44                 ` Mike Snitzer
  -1 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-19 14:44 UTC (permalink / raw)
  To: Spelic
  Cc: Lukáš Czerner, Dave Chinner, device-mapper development,
	linux-ext4, xfs

On Tue, Jun 19 2012 at  9:52am -0400,
Spelic <spelic@shiftmail.org> wrote:

> On 06/19/12 15:30, Mike Snitzer wrote:
> >I don't recall Spelic saying anything about EOPNOTSUPP. So what
> >has made you zero in on an -EOPNOTSUPP return (which should not be
> >happening)?
> 
> Exactly: I do not know if EOPNOTSUPP is being returned or not.
> 
> If this helps, I have configured dm-thin via lvm2
>   LVM version:     2.02.95(2) (2012-03-06)
>   Library version: 1.02.74 (2012-03-06)
>   Driver version:  4.22.0
> 
> from dmsetup table I only see one option : "skip_block_zeroing", if
> and only if I configure it with -Zn . I do not see anything
> regarding ignore_discard
> 
>      vg1-pooltry1-tpool: 0 20971520 thin-pool 252:1 252:2 2048 0 1
> skip_block_zeroing
>      vg1-pooltry1_tdata: 0 20971520 linear 9:20 62922752
>      vg1-pooltry1_tmeta: 0 8192 linear 9:20 83894272
>      vg1-thinlv1: 0 31457280 thin 252:3 1
> 
> 
> and in dmesg:
> [   33.685200] device-mapper: thin: Discard unsupported by data
> device (dm-2): Disabling discard passdown.
> [   33.709586] device-mapper: thin: Discard unsupported by data
> device (dm-6): Disabling discard passdown.
> 
> 
> I do not know what is the mechanism for which xfs cannot unmap
> blocks from dm-thin, but it really can't.
> If anyone has dm-thin installed he can try. This is 100%
> reproducible for me.

I was initially surprised by this considering the thinp-test-suite does
test a compilebench workload against xfs and ext4 using online discard
(-o discard).

But I just modified that test to use a thin-pool with 'ignore_discard'
and the test still passed on both ext4 and xfs.

So more work is needed in the thinp-test-suite to use blktrace hooks
to verify that discards are occurring when the compilebench-generated
files are removed.

I'll work through that and report back.
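
For that kind of verification, blkparse output can be filtered for queued discard requests; a sketch (the sample lines are hypothetical, and blkparse's exact column formatting can vary):

```python
import re

# Hypothetical blkparse output; real formatting may differ slightly.
sample = """\
252,3    0        1     0.000000000  4521  Q   D 204800 + 2048 [fstrim]
252,3    0        2     0.000150000     0  C   D 204800 + 2048 [0]
252,3    0        3     0.000200000  4521  Q  WS 409600 + 8 [fstrim]
"""

# action 'Q' (queued) with an RWBS field containing 'D' marks a discard
discard_re = re.compile(r"\sQ\s+\S*D\S*\s+(\d+)\s\+\s(\d+)\s")

def queued_discard_sectors(text):
    """Total sectors covered by queued discard requests in a blkparse dump."""
    return sum(int(m.group(2)) for m in discard_re.finditer(text))

print(queued_discard_sectors(sample))
```

Running the compilebench test under blktrace and feeding blkparse output through something like this would show whether file removal actually generates discards.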


* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 14:43       ` Alasdair G Kergon
@ 2012-06-19 15:28         ` Mike Snitzer
  -1 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-19 15:28 UTC (permalink / raw)
  To: device-mapper development, Lukáš Czerner, linux-ext4,
	xfs, Spelic

On Tue, Jun 19 2012 at 10:43am -0400,
Alasdair G Kergon <agk@redhat.com> wrote:

> On Tue, Jun 19, 2012 at 10:19:33AM -0400, Ted Ts'o wrote:
> > One of the things which would be nice to be able to easily set up is a
> > configuration where we get the benefits of thin provisioning with
> > respect to snapshots, but where the underlying block device used by
> > the file system is contiguous.  
> 
> We're tracking this requirement (for lvm2) here:
>   https://bugzilla.redhat.com/show_bug.cgi?id=814737

That is an lvm2 BZ but there is further kernel work needed.

It should be noted that the "external origin" feature was added to the
thinp target with this commit: 
http://git.kernel.org/linus/2dd9c257fbc243aa76ee6d

It is a start, but external origin is kept read-only and any writes
trigger allocation of new blocks within the thin-pool.

We've talked some about the desire to have a fully provisioned volume
that only starts to get fragmented once snapshots are taken.  The idea
is to move the origin into the data volume, via mapping, rather than
copying:

Dec 14 10:37:08 <ejt> we then build a data dev that consists of a linear mapping to that origin
Dec 14 10:37:12 <ejt> plus some extra stuff
Dec 14 10:37:23 <ejt> (the additional free space for snapshots)
Dec 14 10:37:49 <ejt> we then prepare thinp metadata with a mapping to that origin

* Re: [dm-devel] Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 15:28         ` Mike Snitzer
@ 2012-06-19 16:03           ` Alasdair G Kergon
  -1 siblings, 0 replies; 72+ messages in thread
From: Alasdair G Kergon @ 2012-06-19 16:03 UTC (permalink / raw)
  To: device-mapper development
  Cc: Lukáš Czerner, linux-ext4, xfs, Spelic

On Tue, Jun 19, 2012 at 11:28:56AM -0400, Mike Snitzer wrote:
> That is an lvm2 BZ but there is further kernel work needed.
 
In principle, userspace should already be able to handle the replumbing,
I think.  (But when we work through the details of an online import, perhaps
we'll want some further kernel changes for atomicity/speed reasons?  In
particular, we need to be able to do the last part of the metadata merge
quickly.)

Roughly:
1. rejig the lvm metadata for the new configuration [lvm]
  - appends the "whole LV" data to the pool's data
2. Generate metadata for the appended data and append this to the metadata area [dmpd]
3. suspend all the affected devices [lvm]
4. link the already-prepared metadata into the existing metadata [dmpd]
5. resume all the devices (now using the new extended pool)
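Steps 3-5 might look roughly like the following. This is a hypothetical dry-run sketch: the device names are examples from earlier in this thread, and the metadata linking in step 4 is assumed to use the device-mapper-persistent-data tools (shown here as a `thin_restore` call, which is an illustrative placeholder for whatever dmpd operation step 4 ends up needing).

```shell
# Dry-run wrapper: echoes each command instead of executing it.
# Drop the "echo +" to run for real (needs root and the dmpd tools).
run() { echo "+ $*"; }

run dmsetup suspend vg1-thinlv1                # 3. suspend affected devices
run dmsetup suspend vg1-pooltry1-tpool
run thin_restore -i prepared.xml -o /dev/mapper/vg1-pooltry1_tmeta  # 4. link prepared metadata (placeholder)
run dmsetup resume vg1-pooltry1-tpool          # 5. resume with the extended pool
run dmsetup resume vg1-thinlv1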

Alasdair


* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 14:44                 ` Mike Snitzer
@ 2012-06-19 18:48                   ` Mike Snitzer
  -1 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-19 18:48 UTC (permalink / raw)
  To: Spelic
  Cc: Lukáš Czerner, device-mapper development, linux-ext4,
	Dave Chinner, xfs

On Tue, Jun 19 2012 at 10:44am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Jun 19 2012 at  9:52am -0400,
> Spelic <spelic@shiftmail.org> wrote:
>
> > I do not know the mechanism by which xfs fails to unmap
> > blocks from dm-thin, but it really can't.
> > Anyone who has dm-thin installed can try this. It is 100%
> > reproducible for me.
> 
> I was initially surprised by this considering the thinp-test-suite does
> test a compilebench workload against xfs and ext4 using online discard
> (-o discard).
> 
> But I just modified that test to use a thin-pool with 'ignore_discard'
> and the test still passed on both ext4 and xfs.
> 
> So there is more work needed in the thinp-test-suite to use blktrace
> hooks to verify that discards are occurring when the compilebench
> generated files are removed.
> 
> I'll work through that and report back.

blktrace shows discards for both xfs and ext4.

But in general xfs is issuing discards with much smaller extents than
ext4 does, e.g.:

to the thin device:
+ 128 vs + 32

to the thin-pool's data device:
+ 120 vs + 16
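For scale (my arithmetic, not from the trace): blktrace's "+ N" is a length in 512-byte sectors, so these extents are small compared with the pool chunk of 2048 sectors shown in the dmsetup table earlier in the thread. The snippet below just converts the numbers; it does not restate which value belongs to which filesystem.

```shell
# Convert the "+ N" sector counts above to KiB and compare with the
# pool chunk from the earlier table ("thin-pool 252:1 252:2 2048").
chunk_kib=$(( 2048 * 512 / 1024 ))
for extent in 128 120 32 16; do
  echo "+${extent} sectors = $(( extent * 512 / 1024 )) KiB (pool chunk = ${chunk_kib} KiB)"
done
```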

* Re: [dm-devel] Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 15:28         ` Mike Snitzer
@ 2012-06-19 19:58           ` Ted Ts'o
  -1 siblings, 0 replies; 72+ messages in thread
From: Ted Ts'o @ 2012-06-19 19:58 UTC (permalink / raw)
  To: device-mapper development
  Cc: Lukáš Czerner, linux-ext4, xfs, Spelic

On Tue, Jun 19, 2012 at 11:28:56AM -0400, Mike Snitzer wrote:
> 
> That is an lvm2 BZ but there is further kernel work needed.
> 
> It should be noted that the "external origin" feature was added to the
> thinp target with this commit: 
> http://git.kernel.org/linus/2dd9c257fbc243aa76ee6d
> 
> It is start, but external origin is kept read-only and any writes
> trigger allocation of new blocks within the thin-pool.

Hmm... maybe this is what I had been told.  I thought there was some
feature where you could take a read-only thinp snapshot of an external
volume (i.e., a pre-existing LVM2 volume, or a block device), and then
after that, make read-write snapshots using the read-only snapshot as
a base?  Is that something that works today, or is planned?  Or am I
totally confused?

And if it is something that works today, is there a web site or
documentation file that gives a recipe for how to use it if we want to
do some performance experiments (i.e., it doesn't have to be a user
friendly interface if that's not ready yet).

Thanks,

						- Ted

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 18:48                   ` Mike Snitzer
@ 2012-06-19 20:06                     ` Dave Chinner
  -1 siblings, 0 replies; 72+ messages in thread
From: Dave Chinner @ 2012-06-19 20:06 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Spelic, Lukáš Czerner, device-mapper development,
	linux-ext4, xfs

On Tue, Jun 19, 2012 at 02:48:59PM -0400, Mike Snitzer wrote:
> On Tue, Jun 19 2012 at 10:44am -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > On Tue, Jun 19 2012 at  9:52am -0400,
> > Spelic <spelic@shiftmail.org> wrote:
> >
> > > I do not know the mechanism by which xfs fails to unmap
> > > blocks from dm-thin, but it really can't.
> > > Anyone who has dm-thin installed can try this. It is 100%
> > > reproducible for me.
> > 
> > I was initially surprised by this considering the thinp-test-suite does
> > test a compilebench workload against xfs and ext4 using online discard
> > (-o discard).
> > 
> > But I just modified that test to use a thin-pool with 'ignore_discard'
> > and the test still passed on both ext4 and xfs.
> > 
> > So there is more work needed in the thinp-test-suite to use blktrace
> > hooks to verify that discards are occurring when the compilebench
> > generated files are removed.
> > 
> > I'll work through that and report back.
> 
> blktrace shows discards for both xfs and ext4.
> 
> But in general xfs is issuing discards with much smaller extents than
> ext4 does, e.g.:

That's normal when you use -o discard - XFS sends extremely
fine-grained discards, as they have to be issued during the checkpoint
commit that frees the extent. Hence they can't be aggregated as is
done in ext4.

As it is, no-one really should be using -o discard - it is extremely
inefficient compared to a background fstrim run given that discards
are unqueued, blocking IOs. It's just a bad idea until the lower
layers get fixed to allow asynchronous, vectored discards and SATA
supports queued discards...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 20:06                     ` Dave Chinner
@ 2012-06-19 20:21                       ` Ted Ts'o
  -1 siblings, 0 replies; 72+ messages in thread
From: Ted Ts'o @ 2012-06-19 20:21 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Mike Snitzer, Spelic, Lukáš Czerner,
	device-mapper development, linux-ext4, xfs

On Wed, Jun 20, 2012 at 06:06:31AM +1000, Dave Chinner wrote:
> > But in general xfs is issuing discards with much smaller extents than
> > ext4 does, e.g.:
> 
> That's normal when you use -o discard - XFS sends extremely
> fine-grained discards, as they have to be issued during the checkpoint
> commit that frees the extent. Hence they can't be aggregated as is
> done in ext4.

Actually, ext4 is also sending the discards during (well, actually,
after) the commit which frees the extent/inode.  We do aggregate them
while the commit is open, but once the transaction is committed, we
send out the discards.  I suspect the difference is in the granularity
of the transactions between ext4 and xfs.

> As it is, no-one really should be using -o discard - it is extremely
> inefficient compared to a background fstrim run given that discards
> are unqueued, blocking IOs. It's just a bad idea until the lower
> layers get fixed to allow asynchronous, vectored discards and SATA
> supports queued discards...

What Dave said.  :-) This is true for both ext4 and xfs.

As a result, I can very easily see there being a distinction made
between when we *do* want to pass the discards all the way down to the
device, and when we only want the thinp layer to process them ---
because for current devices, sending discards down to the physical
device is very heavyweight.

I'm not sure how we could do this without a nasty layering violation,
but some way in which we could label fstrim discards versus "we've
committed the unlink/truncate and so thinp can feel free to reuse
these blocks" discards would be interesting to consider.

     	  	     	     	      - Ted

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 20:21                       ` Ted Ts'o
@ 2012-06-19 20:39                         ` Dave Chinner
  -1 siblings, 0 replies; 72+ messages in thread
From: Dave Chinner @ 2012-06-19 20:39 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Mike Snitzer, Spelic, Lukáš Czerner,
	device-mapper development, linux-ext4, xfs

On Tue, Jun 19, 2012 at 04:21:30PM -0400, Ted Ts'o wrote:
> On Wed, Jun 20, 2012 at 06:06:31AM +1000, Dave Chinner wrote:
> > > But in general xfs is issuing discards with much smaller extents than
> > > ext4 does, e.g.:
> > 
> > That's normal when you use -o discard - XFS sends extremely
> > fine-grained discards, as they have to be issued during the checkpoint
> > commit that frees the extent. Hence they can't be aggregated as is
> > done in ext4.
> 
> Actually, ext4 is also sending the discards during (well, actually,
> after) the commit which frees the extent/inode.  We do aggregate them
> while the commit is open, but once the transaction is committed, we
> send out the discards.  I suspect the difference is in the granularity
> of the transactions between ext4 and xfs.

Exactly - XFS transactions are fine grained, checkpoints are coarse.
We don't merge extents freed in fine grained transactions inside
checkpoints. We probably could, but, well, it's complex to do in XFS
and merging adjacent requests is something the block layer is
supposed to do....

> > As it is, no-one really should be using -o discard - it is extremely
> > inefficient compared to a background fstrim run given that discards
> > are unqueued, blocking IOs. It's just a bad idea until the lower
> > layers get fixed to allow asynchronous, vectored discards and SATA
> > supports queued discards...
> 
> What Dave said.  :-) This is true for both ext4 and xfs.
> 
> As a result, I can very easily see there being a distinction made
> between when we *do* want to pass the discards all the way down to the
> device, and when we only want the thinp layer to process them ---
> because for current devices, sending discards down to the physical
> device is very heavyweight.
> 
> I'm not sure how we could do this without a nasty layering violation,
> but some way in which we could label fstrim discards versus "we've
> committed the unlink/truncate and so thinp can feel free to reuse
> these blocks" discards would be interesting to consider.

I think if we had better discard support from the block layer, it
wouldn't matter from a filesystem POV what discard support is
present in the block layer below it. I think it's better to get the
block layer interface fixed than to add new request types/labels to
filesystems to work around the current deficiencies.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 19:58           ` Ted Ts'o
@ 2012-06-19 20:44             ` Mike Snitzer
  -1 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-19 20:44 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: device-mapper development, Lukáš Czerner, Spelic,
	linux-ext4, xfs

On Tue, Jun 19 2012 at  3:58pm -0400,
Ted Ts'o <tytso@mit.edu> wrote:

> On Tue, Jun 19, 2012 at 11:28:56AM -0400, Mike Snitzer wrote:
> > 
> > That is an lvm2 BZ but there is further kernel work needed.
> > 
> > It should be noted that the "external origin" feature was added to the
> > thinp target with this commit: 
> > http://git.kernel.org/linus/2dd9c257fbc243aa76ee6d
> > 
> > It is start, but external origin is kept read-only and any writes
> > trigger allocation of new blocks within the thin-pool.
> 
> Hmm... maybe this is what I had been told.  I thought there was some
> feature where you could take a read-only thinp snapshot of an external
> volume (i.e., a pre-existing LVM2 volume, or a block device), and then
> after that, make read-write snapshots using the read-only snapshot as
> a base?  Is that something that works today, or is planned?  Or am I
> totally confused?

The commit I referenced basically provides that capability.

> And if it is something that works today, is there a web site or
> documentation file that gives a recipe for how to use it if we want to
> do some performance experiments (i.e., it doesn't have to be a user
> friendly interface if that's not ready yet).

Documentation/device-mapper/thin-provisioning.txt has details on how to
use dmsetup to create a thin device that uses a read-only external
origin volume (so all reads to unprovisioned areas of the thin device
will be remapped to the external origin -- "external" meaning the volume
outside of the thin-pool).

The creation of a thin device w/ a read-only external origin gets you
started with a thin device that is effectively a snapshot of the origin
volume.  That thin device is read-write -- all writes are provisioned
from the thin-pool that is backing the thin device.  And you can take
snapshots (or recursive snapshots) of that thin device.
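As a concrete sketch based on my reading of that document (device names and the 20971520-sector size are examples, not a tested recipe): the external origin is simply appended as an extra argument to the usual thin target line.

```shell
# Build the thin target table for a device whose unprovisioned reads
# fall through to a read-only external origin (last table argument).
pool=/dev/mapper/vg1-pooltry1-tpool
origin=/dev/mapper/vg1-origlv      # hypothetical pre-existing volume
table="0 20971520 thin $pool 0 $origin"
echo "$table"
# Then, after creating thin id 0 in the pool (needs root):
#   dmsetup message "$pool" 0 "create_thin 0"
#   dmsetup create thin-ext --table "$table"
```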

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 20:06                     ` Dave Chinner
@ 2012-06-19 21:37                       ` Spelic
  -1 siblings, 0 replies; 72+ messages in thread
From: Spelic @ 2012-06-19 21:37 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Mike Snitzer, Spelic, Lukáš Czerner,
	device-mapper development, linux-ext4, xfs

On 06/19/12 22:06, Dave Chinner wrote:
> On Tue, Jun 19, 2012 at 02:48:59PM -0400, Mike Snitzer wrote:
>> On Tue, Jun 19 2012 at 10:44am -0400,
>> Mike Snitzer<snitzer@redhat.com>  wrote:
>>
>>> On Tue, Jun 19 2012 at  9:52am -0400,
>>> Spelic<spelic@shiftmail.org>  wrote:
>>>
>>>> I do not know the mechanism by which xfs fails to unmap
>>>> blocks from dm-thin, but it really can't.
>>>> Anyone who has dm-thin installed can try this. It is 100%
>>>> reproducible for me.
>>> I was initially surprised by this considering the thinp-test-suite does
>>> test a compilebench workload against xfs and ext4 using online discard
>>> (-o discard).
>>>
>>> But I just modified that test to use a thin-pool with 'ignore_discard'
>>> and the test still passed on both ext4 and xfs.
>>>
>>> So there is more work needed in the thinp-test-suite to use blktrace
>>> hooks to verify that discards are occurring when the compilebench
>>> generated files are removed.
>>>
>>> I'll work through that and report back.
>> blktrace shows discards for both xfs and ext4.
>>
>> But in general xfs is issuing discards with much smaller extents than
>> ext4 does, e.g.:
> That's normal when you use -o discard - XFS sends extremely
> fine-grained discards as they have to be issued during the checkpoint
> commit that frees the extent. Hence they can't be aggregated as is
> done in ext4.
>
> As it is, no-one really should be using -o discard - it is extremely
> inefficient compared to a background fstrim run given that discards
> are unqueued, blocking IOs. It's just a bad idea until the lower
> layers get fixed to allow asynchronous, vectored discards and SATA
> supports queued discards...
>

Could it be that the thin blocksize is larger than the discard
granularity used by xfs, so nothing ever gets unmapped?
I have tried thin pools with the default blocksize (64k, AFAIR, with
lvm2) and 1MB.
HOWEVER, I have also tried fstrim on xfs, and that is likewise not
capable of unmapping anything from the dm-thin.
What is the granularity of fstrim on xfs?
Sorry, I can't access the machine right now; maybe tomorrow, or on the
weekend.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 21:37                       ` Spelic
@ 2012-06-19 23:12                         ` Dave Chinner
  -1 siblings, 0 replies; 72+ messages in thread
From: Dave Chinner @ 2012-06-19 23:12 UTC (permalink / raw)
  To: Spelic
  Cc: Mike Snitzer, Lukáš Czerner, device-mapper development,
	linux-ext4, xfs

On Tue, Jun 19, 2012 at 11:37:54PM +0200, Spelic wrote:
> On 06/19/12 22:06, Dave Chinner wrote:
> >On Tue, Jun 19, 2012 at 02:48:59PM -0400, Mike Snitzer wrote:
> >>On Tue, Jun 19 2012 at 10:44am -0400,
> >>Mike Snitzer<snitzer@redhat.com>  wrote:
> >>
> >>>On Tue, Jun 19 2012 at  9:52am -0400,
> >>>Spelic<spelic@shiftmail.org>  wrote:
> >>>
> >>>>I do not know what is the mechanism for which xfs cannot unmap
> >>>>blocks from dm-thin, but it really can't.
> >>>>If anyone has dm-thin installed he can try. This is 100%
> >>>>reproducible for me.
> >>>I was initially surprised by this considering the thinp-test-suite does
> >>>test a compilebench workload against xfs and ext4 using online discard
> >>>(-o discard).
> >>>
> >>>But I just modified that test to use a thin-pool with 'ignore_discard'
> >>>and the test still passed on both ext4 and xfs.
> >>>
> >>>So there is more work needed in the thinp-test-suite to use blktrace
> >>>hooks to verify that discards are occurring when the compilebench
> >>>generated files are removed.
> >>>
> >>>I'll work through that and report back.
> >>blktrace shows discards for both xfs and ext4.
> >>
> >>But in general xfs is issuing discards with much smaller extents than
> >>ext4 does, e.g.:
> >That's normal when you use -o discard - XFS sends extremely
> >fine-grained discards as they have to be issued during the checkpoint
> >commit that frees the extent. Hence they can't be aggregated like is
> >done in ext4.
> >
> >As it is, no-one really should be using -o discard - it is extremely
> >inefficient compared to a background fstrim run given that discards
> >are unqueued, blocking IOs. It's just a bad idea until the lower
> >layers get fixed to allow asynchronous, vectored discards and SATA
> >supports queued discards...
> >
> 
> Could it be that the thin blocksize is larger than the discard
> granularity by xfs so nothing ever gets unmapped?

For -o discard, possibly; for fstrim, unlikely.

> I have tried thin pools with the default blocksize (64k afair with
> lvm2) and 1MB.
> HOWEVER I also have tried fstrim on xfs, and that is also not
> capable to unmap things from the dm-thin.
> What is the granularity with fstrim in xfs?

Whatever granularity you passed to fstrim. You need to run an event
trace on XFS to find out whether it is issuing discards before going
any further.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 72+ messages in thread


* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19 20:39                         ` Dave Chinner
@ 2012-06-20  9:01                           ` Christoph Hellwig
  -1 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2012-06-20  9:01 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Ted Ts'o, Mike Snitzer, xfs, device-mapper development,
	Spelic, Lukáš Czerner, linux-ext4

On Wed, Jun 20, 2012 at 06:39:38AM +1000, Dave Chinner wrote:
> Exactly - XFS transactions are fine grained, checkpoints are coarse.
> We don't merge extents freed in fine grained transactions inside
> checkpoints. We probably could, but, well, it's complex to do in XFS
> and merging adjacent requests is something the block layer is
> supposed to do....

Last time I checked it actually tries to do that for discard requests,
but then badly falls flat (=oopses).  That's the reason why the XFS
transaction commit code still uses the highly suboptimal synchronous
blkdev_issue_discard instead of the async variant I wrote when designing
the code.

Another "issue" with the XFS discard pattern and the current block
layer implementation is that XFS frees a lot of small metadata like
inode clusters and btree blocks and discards them as well.  If those
simply fill one of the vectors in a ranged ATA TRIM command and/or a
queueable command that's not much of an issue, but with the current
combination of non-queueable, non-vectored TRIM that's a fairly nasty
pattern.

So until the block layer is sorted out I cannot recommend actually
using -o discard.  I planned to sort out the block layer issues ASAP
when writing that code, but other things have kept me busy ever since.


^ permalink raw reply	[flat|nested] 72+ messages in thread


* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-19  1:57   ` Dave Chinner
@ 2012-06-20 12:11     ` Spelic
  -1 siblings, 0 replies; 72+ messages in thread
From: Spelic @ 2012-06-20 12:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Spelic, xfs, linux-ext4, device-mapper development

Ok guys, I think I found the bug. One or more bugs.


Pool has chunksize 1MB.
In sysfs the thin volume has queue/discard_max_bytes and
queue/discard_granularity both equal to 1048576.
It also has discard_alignment = 0, which according to the sysfs-block
documentation is correct (a less misleading name would have been
discard_offset, imho).
Here is the blktrace from ext4 fstrim:
...
252,9   17      498     0.030466556   841  Q   D 19898368 + 2048 [fstrim]
252,9   17      499     0.030467501   841  Q   D 19900416 + 2048 [fstrim]
252,9   17      500     0.030468359   841  Q   D 19902464 + 2048 [fstrim]
252,9   17      501     0.030469313   841  Q   D 19904512 + 2048 [fstrim]
252,9   17      502     0.030470144   841  Q   D 19906560 + 2048 [fstrim]
252,9   17      503     0.030471381   841  Q   D 19908608 + 2048 [fstrim]
252,9   17      504     0.030472473   841  Q   D 19910656 + 2048 [fstrim]
252,9   17      505     0.030473504   841  Q   D 19912704 + 2048 [fstrim]
252,9   17      506     0.030474561   841  Q   D 19914752 + 2048 [fstrim]
252,9   17      507     0.030475571   841  Q   D 19916800 + 2048 [fstrim]
252,9   17      508     0.030476423   841  Q   D 19918848 + 2048 [fstrim]
252,9   17      509     0.030477341   841  Q   D 19920896 + 2048 [fstrim]
252,9   17      510     0.034299630   841  Q   D 19922944 + 2048 [fstrim]
252,9   17      511     0.034306880   841  Q   D 19924992 + 2048 [fstrim]
252,9   17      512     0.034307955   841  Q   D 19927040 + 2048 [fstrim]
252,9   17      513     0.034308928   841  Q   D 19929088 + 2048 [fstrim]
252,9   17      514     0.034309945   841  Q   D 19931136 + 2048 [fstrim]
252,9   17      515     0.034311007   841  Q   D 19933184 + 2048 [fstrim]
252,9   17      516     0.034312008   841  Q   D 19935232 + 2048 [fstrim]
252,9   17      517     0.034313122   841  Q   D 19937280 + 2048 [fstrim]
252,9   17      518     0.034314013   841  Q   D 19939328 + 2048 [fstrim]
252,9   17      519     0.034314940   841  Q   D 19941376 + 2048 [fstrim]
252,9   17      520     0.034315835   841  Q   D 19943424 + 2048 [fstrim]
252,9   17      521     0.034316662   841  Q   D 19945472 + 2048 [fstrim]
252,9   17      522     0.034317547   841  Q   D 19947520 + 2048 [fstrim]
...

Here is the blktrace from xfs fstrim:
252,12  16        1     0.000000000   554  Q   D 96 + 2048 [fstrim]
252,12  16        2     0.000010149   554  Q   D 2144 + 2048 [fstrim]
252,12  16        3     0.000011349   554  Q   D 4192 + 2048 [fstrim]
252,12  16        4     0.000012584   554  Q   D 6240 + 2048 [fstrim]
252,12  16        5     0.000013685   554  Q   D 8288 + 2048 [fstrim]
252,12  16        6     0.000014660   554  Q   D 10336 + 2048 [fstrim]
252,12  16        7     0.000015707   554  Q   D 12384 + 2048 [fstrim]
252,12  16        8     0.000016692   554  Q   D 14432 + 2048 [fstrim]
252,12  16        9     0.000017594   554  Q   D 16480 + 2048 [fstrim]
252,12  16       10     0.000018539   554  Q   D 18528 + 2048 [fstrim]
252,12  16       11     0.000019434   554  Q   D 20576 + 2048 [fstrim]
252,12  16       12     0.000020879   554  Q   D 22624 + 2048 [fstrim]
252,12  16       13     0.000021856   554  Q   D 24672 + 2048 [fstrim]
252,12  16       14     0.000022786   554  Q   D 26720 + 2048 [fstrim]
252,12  16       15     0.000023699   554  Q   D 28768 + 2048 [fstrim]
252,12  16       16     0.000024672   554  Q   D 30816 + 2048 [fstrim]
252,12  16       17     0.000025467   554  Q   D 32864 + 2048 [fstrim]
252,12  16       18     0.000026374   554  Q   D 34912 + 2048 [fstrim]
252,12  16       19     0.000027194   554  Q   D 36960 + 2048 [fstrim]
252,12  16       20     0.000028137   554  Q   D 39008 + 2048 [fstrim]
252,12  16       21     0.000029524   554  Q   D 41056 + 2048 [fstrim]
252,12  16       22     0.000030479   554  Q   D 43104 + 2048 [fstrim]
252,12  16       23     0.000031306   554  Q   D 45152 + 2048 [fstrim]
252,12  16       24     0.000032134   554  Q   D 47200 + 2048 [fstrim]
252,12  16       25     0.000032964   554  Q   D 49248 + 2048 [fstrim]
252,12  16       26     0.000033794   554  Q   D 51296 + 2048 [fstrim]


As you can see, while ext4 correctly aligns its discards to 1MB, xfs
does not.
It looks like an fstrim or xfs bug: they do not look at
discard_alignment (=0; a less misleading name would be discard_offset,
imho) and discard_granularity (=1MB), and they do not align their
requests accordingly.
Clearly dm-thin cannot unmap anything if a 1MB region is not fully
covered by a single discard. Note that specifying a large -m option for
fstrim does NOT widen the discards beyond 2048 sectors, and this is
correct because discard_max_bytes for that device is 1048576. If
discard_max_bytes could be made much larger, these kinds of bugs could
be ameliorated, especially in complex situations like layers over
layers, virtualization, etc.

Note that in ext4, too, some discards lack the 1MB alignment, as seen
with blktrace (outside my snippet), so this might also need to be
fixed, but most of them are aligned to 1MB. In xfs no discards are
aligned to 1MB at all.
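A quick sanity check of the alignment arithmetic (assuming 512-byte
sectors and the 1MB chunk size above; the helper function is mine, just
to illustrate the point):

```python
# A discard can only unmap the aligned 1MB chunks it covers completely.
CHUNK = 2048  # sectors per thin chunk (1MB with 512-byte sectors)

def whole_chunks_covered(start, length, chunk=CHUNK):
    """Count aligned chunks lying entirely inside [start, start+length)."""
    first = -(-start // chunk) * chunk        # round start up to a chunk
    last = (start + length) // chunk * chunk  # round end down to a chunk
    return max(0, (last - first) // chunk)

# ext4 discard from the trace: aligned start, so one chunk can be unmapped.
ext4_chunks = whole_chunks_covered(19898368, 2048)  # -> 1
# xfs discard from the trace: starts at sector 96, straddles two chunks
# and covers neither completely, so dm-thin can unmap nothing.
xfs_chunks = whole_chunks_covered(96, 2048)         # -> 0
```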


Now, another problem.
First, I should say that in my original post I omitted conv=notrunc for
dd: I complained about the performance because I expected the zero
files to be rewritten in place without block re-provisioning by
dm-thin, but clearly without conv=notrunc that was not happening. I
confirm that with conv=notrunc performance is high on the first
rewrite, also on ext4, and the space occupied in the thin volume does
not increase at every rewrite by dd.
HOWEVER
when conv=notrunc is NOT specified, the behaviour of dd / ext4 /
dm-thin differs depending on whether skip_block_zeroing is set. If
skip_block_zeroing is not set (provisioned blocks are pre-zeroed), the
space occupied by dd truncate + rewrite INCREASES at every rewrite,
while if skip_block_zeroing IS set, dd truncate + rewrite DOES NOT
increase the space occupied on the thin volume. Note: try this on ext4,
not xfs.
This looks very strange to me. The only explanation I can think of is
some kind of cooperative behaviour of ext4 with the variable
dm-X/queue/discard_zeroes_data
which differs between the two cases. Can anyone give an explanation, or
check whether this is the intended behaviour?


And still an open question: why does the speed of provisioning new
blocks not increase with increasing chunk size (64K --> 1MB -->
16MB...), not even when skip_block_zeroing is set and there is no CoW?


^ permalink raw reply	[flat|nested] 72+ messages in thread


* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-20 12:11     ` Spelic
@ 2012-06-20 22:53       ` Dave Chinner
  -1 siblings, 0 replies; 72+ messages in thread
From: Dave Chinner @ 2012-06-20 22:53 UTC (permalink / raw)
  To: Spelic; +Cc: xfs, linux-ext4, device-mapper development

On Wed, Jun 20, 2012 at 02:11:31PM +0200, Spelic wrote:
> Ok guys, I think I found the bug. One or more bugs.
> 
> 
> Pool has chunksize 1MB.
> In sysfs the thin volume has: queue/discard_max_bytes and
> queue/discard_granularity are 1048576 .
> And it has discard_alignment = 0, which based on sysfs-block
> documentation is correct (a less misleading name would have been
> discard_offset imho).
> Here is the blktrace from ext4 fstrim:
> ...
> 252,9   17      498     0.030466556   841  Q   D 19898368 + 2048 [fstrim]
> 252,9   17      499     0.030467501   841  Q   D 19900416 + 2048 [fstrim]
> 252,9   17      500     0.030468359   841  Q   D 19902464 + 2048 [fstrim]
> 252,9   17      501     0.030469313   841  Q   D 19904512 + 2048 [fstrim]
> 252,9   17      502     0.030470144   841  Q   D 19906560 + 2048 [fstrim]
> 252,9   17      503     0.030471381   841  Q   D 19908608 + 2048 [fstrim]
> 252,9   17      504     0.030472473   841  Q   D 19910656 + 2048 [fstrim]
> 252,9   17      505     0.030473504   841  Q   D 19912704 + 2048 [fstrim]
> 252,9   17      506     0.030474561   841  Q   D 19914752 + 2048 [fstrim]
> 252,9   17      507     0.030475571   841  Q   D 19916800 + 2048 [fstrim]
> 252,9   17      508     0.030476423   841  Q   D 19918848 + 2048 [fstrim]
> 252,9   17      509     0.030477341   841  Q   D 19920896 + 2048 [fstrim]
> 252,9   17      510     0.034299630   841  Q   D 19922944 + 2048 [fstrim]
> 252,9   17      511     0.034306880   841  Q   D 19924992 + 2048 [fstrim]
> 252,9   17      512     0.034307955   841  Q   D 19927040 + 2048 [fstrim]
> 252,9   17      513     0.034308928   841  Q   D 19929088 + 2048 [fstrim]
> 252,9   17      514     0.034309945   841  Q   D 19931136 + 2048 [fstrim]
> 252,9   17      515     0.034311007   841  Q   D 19933184 + 2048 [fstrim]
> 252,9   17      516     0.034312008   841  Q   D 19935232 + 2048 [fstrim]
> 252,9   17      517     0.034313122   841  Q   D 19937280 + 2048 [fstrim]
> 252,9   17      518     0.034314013   841  Q   D 19939328 + 2048 [fstrim]
> 252,9   17      519     0.034314940   841  Q   D 19941376 + 2048 [fstrim]
> 252,9   17      520     0.034315835   841  Q   D 19943424 + 2048 [fstrim]
> 252,9   17      521     0.034316662   841  Q   D 19945472 + 2048 [fstrim]
> 252,9   17      522     0.034317547   841  Q   D 19947520 + 2048 [fstrim]
> ...
> 
> Here is the blktrace from xfs fstrim:
> 252,12  16        1     0.000000000   554  Q   D 96 + 2048 [fstrim]
> 252,12  16        2     0.000010149   554  Q   D 2144 + 2048 [fstrim]
> 252,12  16        3     0.000011349   554  Q   D 4192 + 2048 [fstrim]
> 252,12  16        4     0.000012584   554  Q   D 6240 + 2048 [fstrim]
> 252,12  16        5     0.000013685   554  Q   D 8288 + 2048 [fstrim]
> 252,12  16        6     0.000014660   554  Q   D 10336 + 2048 [fstrim]
> 252,12  16        7     0.000015707   554  Q   D 12384 + 2048 [fstrim]
> 252,12  16        8     0.000016692   554  Q   D 14432 + 2048 [fstrim]
> 252,12  16        9     0.000017594   554  Q   D 16480 + 2048 [fstrim]
> 252,12  16       10     0.000018539   554  Q   D 18528 + 2048 [fstrim]
> 252,12  16       11     0.000019434   554  Q   D 20576 + 2048 [fstrim]
> 252,12  16       12     0.000020879   554  Q   D 22624 + 2048 [fstrim]
> 252,12  16       13     0.000021856   554  Q   D 24672 + 2048 [fstrim]
> 252,12  16       14     0.000022786   554  Q   D 26720 + 2048 [fstrim]
> 252,12  16       15     0.000023699   554  Q   D 28768 + 2048 [fstrim]
> 252,12  16       16     0.000024672   554  Q   D 30816 + 2048 [fstrim]
> 252,12  16       17     0.000025467   554  Q   D 32864 + 2048 [fstrim]
> 252,12  16       18     0.000026374   554  Q   D 34912 + 2048 [fstrim]
> 252,12  16       19     0.000027194   554  Q   D 36960 + 2048 [fstrim]
> 252,12  16       20     0.000028137   554  Q   D 39008 + 2048 [fstrim]
> 252,12  16       21     0.000029524   554  Q   D 41056 + 2048 [fstrim]
> 252,12  16       22     0.000030479   554  Q   D 43104 + 2048 [fstrim]
> 252,12  16       23     0.000031306   554  Q   D 45152 + 2048 [fstrim]
> 252,12  16       24     0.000032134   554  Q   D 47200 + 2048 [fstrim]
> 252,12  16       25     0.000032964   554  Q   D 49248 + 2048 [fstrim]
> 252,12  16       26     0.000033794   554  Q   D 51296 + 2048 [fstrim]
> 
> 
> As you can see, while ext4 correctly aligns the discards to 1MB, xfs
> does not.

XFs just sends a large extent to blkdev_issue_discard(), and cares
nothing about discard alignment or granularity.

> It looks like an fstrim or xfs bug: they don't look at
> discard_alignment (=0 ... a less misleading name would be
> discard_offset imho) + discard_granularity (=1MB) and they don't
> base alignments on those.

It looks like blkdev_issue_discard() has reduced each discard to
bios of a single "granule" (1MB), and not aligned them, hence they
are ignored by dm-thinp.

What are the discard parameters exposed by dm-thinp in
/sys/block/<thinp-blkdev>/queue/discard* ?

It looks to me that dm-thinp might be setting discard_max_bytes to
1MB rather than discard_granularity. Looking at dm-thin.c:

static void set_discard_limits(struct pool *pool, struct queue_limits *limits)
{
        /*
         * FIXME: these limits may be incompatible with the pool's data device
         */
        limits->max_discard_sectors = pool->sectors_per_block;

        /*
         * This is just a hint, and not enforced.  We have to cope with
         * bios that overlap 2 blocks.
         */
        limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
        limits->discard_zeroes_data = pool->pf.zero_new_blocks;
}


Yes - discard_max_bytes == discard_granularity, and so
blkdev_issue_discard fails to align the request properly. As it is,
setting discard_max_bytes to the thinp block size is silly - it
means you'll never get range requests, and we send a discard for
every single block in a range rather than having the thinp code
iterate over a range itself.

i.e. this is not a filesystem bug that is causing the problem....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 72+ messages in thread


* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-20 22:53       ` Dave Chinner
@ 2012-06-21 17:47         ` Mike Snitzer
  -1 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-06-21 17:47 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Spelic, device-mapper development, linux-ext4, xfs,
	Paolo Bonzini, axboe, hch

On Wed, Jun 20 2012 at  6:53pm -0400,
Dave Chinner <david@fromorbit.com> wrote:

> On Wed, Jun 20, 2012 at 02:11:31PM +0200, Spelic wrote:
> > Ok guys, I think I found the bug. One or more bugs.
> > 
> > 
> > Pool has chunksize 1MB.
> > In sysfs the thin volume has: queue/discard_max_bytes and
> > queue/discard_granularity are 1048576 .
> > And it has discard_alignment = 0, which based on sysfs-block
> > documentation is correct (a less misleading name would have been
> > discard_offset imho).
> > Here is the blktrace from ext4 fstrim:
> > ...
> > 252,9   17      498     0.030466556   841  Q   D 19898368 + 2048 [fstrim]
> > 252,9   17      499     0.030467501   841  Q   D 19900416 + 2048 [fstrim]
> > 252,9   17      500     0.030468359   841  Q   D 19902464 + 2048 [fstrim]
> > 252,9   17      501     0.030469313   841  Q   D 19904512 + 2048 [fstrim]
> > 252,9   17      502     0.030470144   841  Q   D 19906560 + 2048 [fstrim]
> > 252,9   17      503     0.030471381   841  Q   D 19908608 + 2048 [fstrim]
> > 252,9   17      504     0.030472473   841  Q   D 19910656 + 2048 [fstrim]
> > 252,9   17      505     0.030473504   841  Q   D 19912704 + 2048 [fstrim]
> > 252,9   17      506     0.030474561   841  Q   D 19914752 + 2048 [fstrim]
> > 252,9   17      507     0.030475571   841  Q   D 19916800 + 2048 [fstrim]
> > 252,9   17      508     0.030476423   841  Q   D 19918848 + 2048 [fstrim]
> > 252,9   17      509     0.030477341   841  Q   D 19920896 + 2048 [fstrim]
> > 252,9   17      510     0.034299630   841  Q   D 19922944 + 2048 [fstrim]
> > 252,9   17      511     0.034306880   841  Q   D 19924992 + 2048 [fstrim]
> > 252,9   17      512     0.034307955   841  Q   D 19927040 + 2048 [fstrim]
> > 252,9   17      513     0.034308928   841  Q   D 19929088 + 2048 [fstrim]
> > 252,9   17      514     0.034309945   841  Q   D 19931136 + 2048 [fstrim]
> > 252,9   17      515     0.034311007   841  Q   D 19933184 + 2048 [fstrim]
> > 252,9   17      516     0.034312008   841  Q   D 19935232 + 2048 [fstrim]
> > 252,9   17      517     0.034313122   841  Q   D 19937280 + 2048 [fstrim]
> > 252,9   17      518     0.034314013   841  Q   D 19939328 + 2048 [fstrim]
> > 252,9   17      519     0.034314940   841  Q   D 19941376 + 2048 [fstrim]
> > 252,9   17      520     0.034315835   841  Q   D 19943424 + 2048 [fstrim]
> > 252,9   17      521     0.034316662   841  Q   D 19945472 + 2048 [fstrim]
> > 252,9   17      522     0.034317547   841  Q   D 19947520 + 2048 [fstrim]
> > ...
> > 
> > Here is the blktrace from xfs fstrim:
> > 252,12  16        1     0.000000000   554  Q   D 96 + 2048 [fstrim]
> > 252,12  16        2     0.000010149   554  Q   D 2144 + 2048 [fstrim]
> > 252,12  16        3     0.000011349   554  Q   D 4192 + 2048 [fstrim]
> > 252,12  16        4     0.000012584   554  Q   D 6240 + 2048 [fstrim]
> > 252,12  16        5     0.000013685   554  Q   D 8288 + 2048 [fstrim]
> > 252,12  16        6     0.000014660   554  Q   D 10336 + 2048 [fstrim]
> > 252,12  16        7     0.000015707   554  Q   D 12384 + 2048 [fstrim]
> > 252,12  16        8     0.000016692   554  Q   D 14432 + 2048 [fstrim]
> > 252,12  16        9     0.000017594   554  Q   D 16480 + 2048 [fstrim]
> > 252,12  16       10     0.000018539   554  Q   D 18528 + 2048 [fstrim]
> > 252,12  16       11     0.000019434   554  Q   D 20576 + 2048 [fstrim]
> > 252,12  16       12     0.000020879   554  Q   D 22624 + 2048 [fstrim]
> > 252,12  16       13     0.000021856   554  Q   D 24672 + 2048 [fstrim]
> > 252,12  16       14     0.000022786   554  Q   D 26720 + 2048 [fstrim]
> > 252,12  16       15     0.000023699   554  Q   D 28768 + 2048 [fstrim]
> > 252,12  16       16     0.000024672   554  Q   D 30816 + 2048 [fstrim]
> > 252,12  16       17     0.000025467   554  Q   D 32864 + 2048 [fstrim]
> > 252,12  16       18     0.000026374   554  Q   D 34912 + 2048 [fstrim]
> > 252,12  16       19     0.000027194   554  Q   D 36960 + 2048 [fstrim]
> > 252,12  16       20     0.000028137   554  Q   D 39008 + 2048 [fstrim]
> > 252,12  16       21     0.000029524   554  Q   D 41056 + 2048 [fstrim]
> > 252,12  16       22     0.000030479   554  Q   D 43104 + 2048 [fstrim]
> > 252,12  16       23     0.000031306   554  Q   D 45152 + 2048 [fstrim]
> > 252,12  16       24     0.000032134   554  Q   D 47200 + 2048 [fstrim]
> > 252,12  16       25     0.000032964   554  Q   D 49248 + 2048 [fstrim]
> > 252,12  16       26     0.000033794   554  Q   D 51296 + 2048 [fstrim]
> > 
> > 
> > As you can see, while ext4 correctly aligns the discards to 1MB, xfs
> > does not.
> 
> XFs just sends a large extent to blkdev_issue_discard(), and cares
> nothing about discard alignment or granularity.
> 
> > It looks like an fstrim or xfs bug: they don't look at
> > discard_alignment (=0 ... a less misleading name would be
> > discard_offset imho) + discard_granularity (=1MB) and they don't
> > base alignments on those.
> 
> It looks like blkdev_issue_discard() has reduced each discard to
> bios of a single "granule" (1MB), and not aligned them, hence they
> are ignore by dm-thinp.
> 
> what are the discard parameters exposed by dm-thinp in
> /sys/block/<thinp-blkdev>/queue/discard*
> 
> It looks to me that dmthinp might be setting discard_max_bytes to
> 1MB rather than discard_granularity. Looking at dm-thin.c:
> 
> static void set_discard_limits(struct pool *pool, struct queue_limits *limits)
> {
>         /*
>          * FIXME: these limits may be incompatible with the pool's data device
>          */
>         limits->max_discard_sectors = pool->sectors_per_block;
> 
>         /*
>          * This is just a hint, and not enforced.  We have to cope with
>          * bios that overlap 2 blocks.
>          */
>         limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
>         limits->discard_zeroes_data = pool->pf.zero_new_blocks;
> }
> 
> 
> Yes - discard_max_bytes == discard_granularity, and so
> blkdev_issue_discard fails to align the request properly. As it is,
> setting discard_max_bytes to the thinp block size is silly - it
> means you'll never get range requests, and we sent a discard for
> every single block in a range rather than having the thinp code
> iterate over a range itself.

So 2 different issues:
1) blkdev_issue_discard isn't properly aligning
2) thinp should accept larger discards (up to the stacked
   discard_max_bytes rather than setting an override)

> i.e. this is not a filesystem bug that is causing the problem....

Paolo Bonzini fixed blkdev_issue_discard to properly align some time
ago; unfortunately the patches slipped through the cracks (cc'ing Paolo,
Jens, and Christoph).

Here are references to Paolo's patches:
0/2 https://lkml.org/lkml/2012/3/14/323
1/2 https://lkml.org/lkml/2012/3/14/324
2/2 https://lkml.org/lkml/2012/3/14/325

Patch 2/2 specifically addresses the case where:
 discard_max_bytes == discard_granularity 

Paolo, any chance you could resend to Jens (maybe with hch's comments on
patch#2 accounted for)?  Also, please add hch's Reviewed-by when
reposting.

(would love to see this fixed for 3.5-rcX but if not 3.6 it is?)

^ permalink raw reply	[flat|nested] 72+ messages in thread


* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-21 17:47         ` Mike Snitzer
@ 2012-06-21 23:29           ` Dave Chinner
  -1 siblings, 0 replies; 72+ messages in thread
From: Dave Chinner @ 2012-06-21 23:29 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Spelic, device-mapper development, linux-ext4, xfs,
	Paolo Bonzini, axboe, hch

On Thu, Jun 21, 2012 at 01:47:43PM -0400, Mike Snitzer wrote:
> On Wed, Jun 20 2012 at  6:53pm -0400,
> Dave Chinner <david@fromorbit.com> wrote:
> 
> > On Wed, Jun 20, 2012 at 02:11:31PM +0200, Spelic wrote:
> > > Ok guys, I think I found the bug. One or more bugs.
> > > 
> > > 
> > > Pool has chunksize 1MB.
> > > In sysfs the thin volume has: queue/discard_max_bytes and
> > > queue/discard_granularity are 1048576 .
> > > And it has discard_alignment = 0, which based on sysfs-block
> > > documentation is correct (a less misleading name would have been
> > > discard_offset imho).
> > > Here is the blktrace from ext4 fstrim:
> > > ...
> > > 252,9   17      498     0.030466556   841  Q   D 19898368 + 2048 [fstrim]
> > > 252,9   17      499     0.030467501   841  Q   D 19900416 + 2048 [fstrim]
> > > 252,9   17      500     0.030468359   841  Q   D 19902464 + 2048 [fstrim]
....
> > > Here is the blktrace from xfs fstrim:
> > > 252,12  16        1     0.000000000   554  Q   D 96 + 2048 [fstrim]
> > > 252,12  16        2     0.000010149   554  Q   D 2144 + 2048 [fstrim]
> > > 252,12  16        3     0.000011349   554  Q   D 4192 + 2048 [fstrim]
.....
> > It looks like blkdev_issue_discard() has reduced each discard to
> > bios of a single "granule" (1MB), and not aligned them, hence they
> > are ignore by dm-thinp.
> > 
> > what are the discard parameters exposed by dm-thinp in
> > /sys/block/<thinp-blkdev>/queue/discard*
> > 
> > It looks to me that dmthinp might be setting discard_max_bytes to
> > 1MB rather than discard_granularity. Looking at dm-thin.c:
> > 
> > static void set_discard_limits(struct pool *pool, struct queue_limits *limits)
> > {
> >         /*
> >          * FIXME: these limits may be incompatible with the pool's data device
> >          */
> >         limits->max_discard_sectors = pool->sectors_per_block;
> > 
> >         /*
> >          * This is just a hint, and not enforced.  We have to cope with
> >          * bios that overlap 2 blocks.
> >          */
> >         limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> >         limits->discard_zeroes_data = pool->pf.zero_new_blocks;
> > }
> > 
> > 
> > Yes - discard_max_bytes == discard_granularity, and so
> > blkdev_issue_discard fails to align the request properly. As it is,
> > setting discard_max_bytes to the thinp block size is silly - it
> > means you'll never get range requests, and we sent a discard for
> > every single block in a range rather than having the thinp code
> > iterate over a range itself.
> 
> So 2 different issues:
> 1) blkdev_issue_discard isn't properly aligning
> 2) thinp should accept larger discards (up to the stacked
>    discard_max_bytes rather than setting an override)

Yes, in effect, but there's no real reason I can see why thinp can't
accept larger discard requests than the underlying stack and break
them up appropriately itself....

> > i.e. this is not a filesystem bug that is causing the problem....
> 
> Paolo Bonzini fixed blkdev_issue_discard to properly align some time
> ago; unfortunately the patches slipped through the cracks (cc'ing Paolo,
> Jens, and Christoph).
> 
> Here are references to Paolo's patches:
> 0/2 https://lkml.org/lkml/2012/3/14/323
> 1/2 https://lkml.org/lkml/2012/3/14/324
> 2/2 https://lkml.org/lkml/2012/3/14/325
> 
> Patch 2/2 specifically addresses the case where:
>  discard_max_bytes == discard_granularity 
> 
> Paolo, any chance you could resend to Jens (maybe with hch's comments on
> patch#2 accounted for)?  Also, please add hch's Reviewed-by when
> reposting.
> 
> (would love to see this fixed for 3.5-rcX but if not 3.6 it is?)

That would be good...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 72+ messages in thread


* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-06-21 17:47         ` Mike Snitzer
@ 2012-07-01 14:53           ` Paolo Bonzini
  -1 siblings, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-07-01 14:53 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Dave Chinner, Spelic, device-mapper development, linux-ext4, xfs,
	axboe, hch

On 21/06/2012 19:47, Mike Snitzer wrote:
> Paolo Bonzini fixed blkdev_issue_discard to properly align some time
> ago; unfortunately the patches slipped through the cracks (cc'ing Paolo,
> Jens, and Christoph).
> 
> Here are references to Paolo's patches:
> 0/2 https://lkml.org/lkml/2012/3/14/323
> 1/2 https://lkml.org/lkml/2012/3/14/324
> 2/2 https://lkml.org/lkml/2012/3/14/325
> 
> Patch 2/2 specifically addresses the case where:
>  discard_max_bytes == discard_granularity 
> 
> Paolo, any chance you could resend to Jens (maybe with hch's comments on
> patch#2 accounted for)?  Also, please add hch's Reviewed-by when
> reposting.

Sure, I'll do it this week.  I just need to retest.

Paolo



* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-07-01 14:53           ` Paolo Bonzini
@ 2012-07-02 13:00             ` Mike Snitzer
  -1 siblings, 0 replies; 72+ messages in thread
From: Mike Snitzer @ 2012-07-02 13:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Dave Chinner, Spelic, device-mapper development, linux-ext4, xfs,
	axboe, hch, Martin K. Petersen

On Sun, Jul 01 2012 at 10:53am -0400,
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 21/06/2012 19:47, Mike Snitzer wrote:
> > Paolo Bonzini fixed blkdev_issue_discard to properly align some time
> > ago; unfortunately the patches slipped through the cracks (cc'ing Paolo,
> > Jens, and Christoph).
> > 
> > Here are references to Paolo's patches:
> > 0/2 https://lkml.org/lkml/2012/3/14/323
> > 1/2 https://lkml.org/lkml/2012/3/14/324
> > 2/2 https://lkml.org/lkml/2012/3/14/325
> > 
> > Patch 2/2 specifically addresses the case where:
> >  discard_max_bytes == discard_granularity 
> > 
> > Paolo, any chance you could resend to Jens (maybe with hch's comments on
> > patch#2 accounted for)?  Also, please add hch's Reviewed-by when
> > reposting.
> 
> Sure, I'll do it this week.  I just need to retest.

Great, thanks.

(cc'ing mkp)

One thing that seemed odd was your adjustment for discard_alignment (in
patch 1/2).

I need to better understand how discard_alignment (an offset despite the
name not saying as much) relates to alignment_offset.

Could just be that once a partition tool, or lvm, etc account for
alignment_offset (which they do now) that discard_alignment is
automagically accounted for as a side-effect?

(I haven't actually seen discard_alignment != 0 in the wild)

Mike


* Re: Ext4 and xfs problems in dm-thin on allocation and discard
  2012-07-02 13:00             ` Mike Snitzer
@ 2012-07-02 13:15               ` Paolo Bonzini
  -1 siblings, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-07-02 13:15 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Dave Chinner, Spelic, device-mapper development, linux-ext4, xfs,
	axboe, hch, Martin K. Petersen

On 02/07/2012 15:00, Mike Snitzer wrote:
> On Sun, Jul 01 2012 at 10:53am -0400,
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> On 21/06/2012 19:47, Mike Snitzer wrote:
>>> Paolo Bonzini fixed blkdev_issue_discard to properly align some time
>>> ago; unfortunately the patches slipped through the cracks (cc'ing Paolo,
>>> Jens, and Christoph).
>>>
>>> Here are references to Paolo's patches:
>>> 0/2 https://lkml.org/lkml/2012/3/14/323
>>> 1/2 https://lkml.org/lkml/2012/3/14/324
>>> 2/2 https://lkml.org/lkml/2012/3/14/325
>>>
>>> Patch 2/2 specifically addresses the case where:
>>>  discard_max_bytes == discard_granularity 
>>>
>>> Paolo, any chance you could resend to Jens (maybe with hch's comments on
>>> patch#2 accounted for)?  Also, please add hch's Reviewed-by when
>>> reposting.
>>
>> Sure, I'll do it this week.  I just need to retest.
> 
> Great, thanks.
> 
> (cc'ing mkp)
> 
> One thing that seemed odd was your adjustment for discard_alignment (in
> patch 1/2).
> 
> I need to better understand how discard_alignment (an offset despite the
> name not saying as much) relates to alignment_offset.

In principle, it doesn't.  All SBC says is:

  The UNMAP GRANULARITY ALIGNMENT field indicates the LBA of the first
  logical block to which the OPTIMAL UNMAP GRANULARITY field applies.
  The unmap granularity alignment is used to calculate an optimal unmap
  request starting LBA as follows:

   optimal unmap request starting LBA = (n * optimal unmap granularity)
      + unmap granularity alignment

and what my patch does is ensure that all requests except the first
start at such an LBA.
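
A minimal sketch of that formula, computing the smallest optimal starting
LBA at or above a given LBA (the helper name and interface are mine, not
from the patch):

```python
def next_optimal_unmap_lba(lba, granularity, alignment):
    """Smallest LBA at or above 'lba' of the form
    n * granularity + alignment, per the SBC formula quoted above.

    Illustrative helper, not from the patch.
    """
    if lba <= alignment:
        return alignment
    n = -(-(lba - alignment) // granularity)  # ceiling division
    return n * granularity + alignment
```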

In practice, there is a connection between the two, because a sane disk
will make all discard_alignment-aligned sectors also
alignment_offset-aligned, or vice versa, or both (depending on whether
1<<phys_exp is less than, greater than, or equal to discard_granularity).

> Could just be that once a partition tool, or lvm, etc account for
> alignment_offset (which they do now) that discard_alignment is
> automagically accounted for as a side-effect?

Yes, if discard_granularity <= 1<<phys_exp.  In that case, the condition
above simplifies to discard_alignment == alignment_offset %
discard_granularity.  Your partitions will be already aligned to both
alignment_offset and discard_alignment.

It seems more likely that discard_granularity > 1<<phys_exp if they
differ at all, in which case the partition tool will improve the
situation but still not reach an optimal setting.

The optimal positioning of partitions/logical volumes/etc. would be to
align them to lcm(1<<phys_exp, discard_granularity), and "misalign" the
starting sector by max(discard_alignment, alignment_offset).
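
That positioning rule can be sketched as follows (the function name and
interface are illustrative assumptions, not taken from any partitioning
tool):

```python
from math import gcd

def optimal_partition_start(phys_exp, discard_granularity,
                            discard_alignment, alignment_offset):
    """Return (alignment, offset) for placing a partition or LV:
    align to lcm(1 << phys_exp, discard_granularity) and shift the
    starting sector by max(discard_alignment, alignment_offset).

    Illustrative sketch of the rule described above.
    """
    phys = 1 << phys_exp
    lcm = phys * discard_granularity // gcd(phys, discard_granularity)
    return lcm, max(discard_alignment, alignment_offset)
```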

> (I haven't actually seen discard_alignment != 0 in the wild)

Me neither, but it was easy to account for it in the patch.

Paolo


Thread overview: 72+ messages
2012-06-18 21:33 Ext4 and xfs problems in dm-thin on allocation and discard Spelic
2012-06-19  1:57 ` Dave Chinner
2012-06-19  3:12   ` Mike Snitzer
2012-06-19  6:32     ` Lukáš Czerner
2012-06-19 11:29       ` Spelic
2012-06-19 12:20         ` Lukáš Czerner
2012-06-19 13:34         ` Mike Snitzer
2012-06-19 13:16       ` Mike Snitzer
2012-06-19 13:25         ` Lukáš Czerner
2012-06-19 13:30           ` Mike Snitzer
2012-06-19 13:52             ` Spelic
2012-06-19 14:05               ` Eric Sandeen
2012-06-19 14:44               ` Mike Snitzer
2012-06-19 18:48                 ` Mike Snitzer
2012-06-19 20:06                   ` Dave Chinner
2012-06-19 20:21                     ` Ted Ts'o
2012-06-19 20:39                       ` Dave Chinner
2012-06-20  9:01                         ` Christoph Hellwig
2012-06-19 21:37                     ` Spelic
2012-06-19 23:12                       ` Dave Chinner
2012-06-20 12:11   ` Spelic
2012-06-20 22:53     ` Dave Chinner
2012-06-21 17:47       ` Mike Snitzer
2012-06-21 23:29         ` Dave Chinner
2012-07-01 14:53         ` Paolo Bonzini
2012-07-02 13:00           ` Mike Snitzer
2012-07-02 13:15             ` Paolo Bonzini
2012-06-19 14:09 ` Lukáš Czerner
2012-06-19 14:19   ` Ted Ts'o
2012-06-19 14:23     ` Eric Sandeen
2012-06-19 14:37     ` Lukáš Czerner
2012-06-19 14:43     ` [dm-devel] " Alasdair G Kergon
2012-06-19 15:28       ` Mike Snitzer
2012-06-19 16:03         ` [dm-devel] " Alasdair G Kergon
2012-06-19 19:58         ` Ted Ts'o
2012-06-19 20:44           ` Mike Snitzer
