* Re: call for slideware ;)
       [not found] <20110209231656.GB5193@redhat.com>
@ 2011-02-10 14:59 ` Joe Thornber
  2011-02-23  1:21   ` Mike Snitzer
  0 siblings, 1 reply; 5+ messages in thread
From: Joe Thornber @ 2011-02-10 14:59 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Heinz Mauelshagen, Alasdair G. Kergon

Hi Mike,

On Wed, 2011-02-09 at 18:16 -0500, Mike Snitzer wrote:
> Joe and/or Heinz,
> 
> Could you provide a few slides on the thinp and shared snapshot
> infrastructure and targets?  Planned features and performance
> benefits,
> etc.
> 
> 

I've started a new project on GitHub:

https://github.com/jthornber/storage-papers

Heinz and I have started putting stuff in there.  We'll add more content
this afternoon.  I'm using a text format with svg files for the
diagrams.  Currently just generating html which I'll try and upload
somewhere.

- Joe

* Re: call for slideware ;)
  2011-02-10 14:59 ` call for slideware ;) Joe Thornber
@ 2011-02-23  1:21   ` Mike Snitzer
  2011-02-23 12:18     ` Heinz Mauelshagen
  2011-02-23 12:24     ` Joe Thornber
  0 siblings, 2 replies; 5+ messages in thread
From: Mike Snitzer @ 2011-02-23  1:21 UTC (permalink / raw)
  To: Joe Thornber; +Cc: Heinz Mauelshagen

On Thu, Feb 10 2011 at  9:59am -0500,
Joe Thornber <thornber@redhat.com> wrote:

> Hi Mike,
> 
> On Wed, 2011-02-09 at 18:16 -0500, Mike Snitzer wrote:
> > Joe and/or Heinz,
> > 
> > Could you provide a few slides on the thinp and shared snapshot
> > infrastructure and targets?  Planned features and performance
> > benefits,
> > etc.
> > 
> > 
> 
> I've started a new project on GitHub:
> 
> https://github.com/jthornber/storage-papers
> 
> Heinz and I have started putting stuff in there.

I just had a look at the latest content and have some questions (way
more than I'd imagine you'd like to see.. means I'm clearly missing a
lot):

1) from "Solution" slide:
   "Space comes from a preallocated ‘pool’, which is itself just another
   logical volume, thus can be resized on demand."
   ...
   "Separate metadata device simplifies extension, this is hidden by the
    LVM system so sys admin unlikely to be aware of it."
    Q: Can you elaborate on the role of the metadata?  It maps between
       physical "area" (allocated from pool) for all writes to the
       logical address space?
    Q: can thinp and snapshot metadata coexist in the same pool? -- ask
       similar question below.

2) from "Block size choice" slide:
   The larger the block size:
   - the less chance there is of fragmentation (describe this)
     Q: can you please "describe this"? :)
   - the less frequently we need the expensive mapping operation
     Q: "expensive" is all relative, seems you contradict the expense of
        the mapping operation in the "Performance" slide?
   - the smaller the metadata tables are, so more of them can be held in core
     at a time. Leading to faster access to the provisioned blocks by
     minimizing reading in mapping information
     Q: "more of them" -- "them" being metadata tables?  So the take
        away is more thinp devices available on the same host?

3) from "Performance" slide:
   "Expensive operation is mapping in a new ‘area’"
   Q: is area the same as a block in the pool?  Why not call block size:
   "area size"?  "Block size" is familiar to people?  Original snapshot
   had "chunk size".

4) Q: what did you decide to run with for reads to logical address space
      that weren't previously mapped?  Just return zeroes like was
      discussed on lvm-team?

The "Metadata object" section is where you lose me:

5) I'm not clear on the notion of "external" vs "internal" snapshots.
   Q: can you elaborate on their characteristics?
   Maybe the following question has some relation to external vs
   internal?

6) I'm not clear on how you're going to clone the metadata tree for
   userspace to walk (for snapshot merge, etc).  Is that "clone" really
   a snapshot of the metadata device? -- seems unlikely as you'd need a
   metadata device for your metadata device's snapshots?
   - you said: "Userland will be given the location of an alternative
     superblock for the metadata device. This is the root of a tree of
     blocks referring to other blocks in a variety of data structures
     (btrees, space maps etc.). Blocks will be shared with the ‘live’
     version of the metadata, their reference counts may change as
     sharing is broken, but we know the blocks will never be updated."
     - Q: is this describing an "internal snapshot"?

7) from the 'thin' target section:
"All devices stored within a metadata object are instanced with this
target. Be they fully mapped devices, thin provisioned devices, internal
snapshots or external snapshots."
Q: what is a fully mapped device?

8) "The target line:

thin <pool object> <internal device id>"
Q: so by <pool object>, that is the _id_ of a pool object that was
returned from the 'create virtual device' message?


In general my understanding of all this shared store infrastructure is
muddled.  I need the audience to take away big concepts, not get tripped
up (or trip me up!) on the minutiae.

Subtle inconsistencies and/or opaque explanations aren't helping, e.g.:
1) the detail of "Configuration/Use" for thinp volume
   - "Allocate (empty) logical volume for the thin provisioning pool"
      Q: how can it be "empty"?  Isn't it the data volume you hand to
         the pool target?
   - "Allocate small logical volume for the thin provisioning metadata"
      Q: before in "Solution" slide you said "Separate metadata device
         simplifies extension", can the metadata volume be extended too?
   - "Set up thin provisioning mapped device on aforementioned 2 LVs"
      Q: so there is no distinct step for creating a pool?
      Q: pool is implicitly created at the time the thinp device is
         created? (doubtful but how you enumerated the steps makes it
	 misleading/confusing).
      Q: can snapshot and thinp volumes share the same pool?
         (if possible I could see it being brittle?)
         (but expressing such capability will help the audience "get"
	 the fact that the pool is nicely abstracted/sound design,
	 etc).

versus:

2) the description of the 'pool' and 'thin' targets
   - "This (pool) target ties together a shared metadata volume and a
     shared data volume."
     Q: when does the "block size" get defined if it isn't provided in
        the target line of "pool"?
   - "Be they fully mapped devices, thin provisioned devices, internal
     snapshots or external snapshots."
     Q: where does the notion of a thinp-snapshot (or whatever you are
        calling it) get expressed as a distinct target?  This is all
	very opaque to me...


p.s. I was going to hold off sending this and take another pass of your
slides but decided your feedback to all my Q:s would likely be much more
helpful than me trying to parse the slides again.

* Re: call for slideware ;)
  2011-02-23  1:21   ` Mike Snitzer
@ 2011-02-23 12:18     ` Heinz Mauelshagen
  2011-02-23 12:24     ` Joe Thornber
  1 sibling, 0 replies; 5+ messages in thread
From: Heinz Mauelshagen @ 2011-02-23 12:18 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Joe Thornber

On Tue, 2011-02-22 at 20:22 -0500, Mike Snitzer wrote:
> On Thu, Feb 10 2011 at  9:59am -0500,
> Joe Thornber <thornber@redhat.com> wrote:
> 
> > Hi Mike,
> > 
> > On Wed, 2011-02-09 at 18:16 -0500, Mike Snitzer wrote:
> > > Joe and/or Heinz,
> > > 
> > > Could you provide a few slides on the thinp and shared snapshot
> > > infrastructure and targets?  Planned features and performance
> > > benefits,
> > > etc.
> > > 
> > > 
> > 
> > I've started a new project on GitHub:
> > 
> > https://github.com/jthornber/storage-papers
> > 
> > Heinz and I have started putting stuff in there.
> 
> I just had a look at the latest content and have some questions (way
> more than I'd imagine you'd like to see.. means I'm clearly missing a
> lot):
> 
> 1) from "Solution" slide:
>    "Space comes from a preallocated ‘pool’, which is itself just another
>    logical volume, thus can be resized on demand."
>    ...
>    "Separate metadata device simplifies extension, this is hidden by the
>     LVM system so sys admin unlikely to be aware of it."
>     Q: Can you elaborate on the role of the metadata?  It maps between
>        physical "area" (allocated from pool) for all writes to the
>        logical address space?

Yes.

>     Q: can thinp and snapshot metadata coexist in the same pool? -- ask
>        similar question below.

Theoretically yes, if they could share the same block size.
Practically no, because thinp (and hsm) will use rather large block
sizes whereas snapshots use small ones.

> 
> 2) from "Block size choice" slide:
>    The larger the block size:
>    - the less chance there is of fragmentation (describe this)
>      Q: can you please "describe this"? :)

With large blocks, the thinp user (e.g. a filesystem) is less likely to
fragment its metadata and data.

>    - the less frequently we need the expensive mapping operation



>      Q: "expensive" is all relative, seems you contradict the expense of
>         the mapping operation in the "Performance" slide?

No; fewer lookups reduce the memory footprint needed to keep the btree
in memory and save cpu cycles in general, so larger block sizes help
here.

>    - the smaller the metadata tables are, so more of them can be held in core
>      at a time. Leading to faster access to the provisioned blocks by
>      minimizing reading in mapping information
>      Q: "more of them" -- "them" being metadata tables?  So the take
>         away is more thinp devices available on the same host?

Not more thinp devices, but more of their mapping tables in memory.
These are btrees with only a partial working set in memory, i.e. with
smaller tables there are fewer nodes and thus a greater likelihood of
having them in core.
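
For a rough feel of the numbers (purely illustrative, not real figures):
a 1 TiB thin device mapped in 64 KiB blocks needs ~16 million mapping
entries, whereas with 512 KiB blocks it needs only ~2 million, i.e. an
8x smaller btree to keep in core.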

> 
> 3) from "Performance" slide:
>    "Expensive operation is mapping in a new ‘area’"
>    Q: is area the same as a block in the pool?  Why not call block size:
>    "area size"?  "Block size" is familiar to people?  Original snapshot
>    had "chunk size".

Yes, it's the allocation unit.  I think 'block size' works well because
it names the logical allocation entity.

> 
> 4) Q: what did you decide to run with for reads to logical address space
>       that weren't previously mapped?  Just return zeroes like was
>       discussed on lvm-team?

Well, we're returning zeroes for initial reads before writes, in order
to prevent discovery from provisioning blocks unnecessarily.  After the
first write, whatever data is in the block will be returned, which
doesn't cause any harm: the application never wrote to the block before
and so can never expect to retrieve meaningful data from segments of the
block it never wrote to.  Discovery after such initial provisioning of a
block should be fine too, because we have to assume that the application
initialized the thinp device properly for future discovery.
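
To make that concrete (the device name is made up and this just restates
the behaviour described above, it's not a test I've run):

    # read of a never-written block: returns zeroes, nothing gets provisioned
    dd if=/dev/mapper/thin0 bs=64k count=1 skip=100 2>/dev/null | hexdump -C

    # the first write provisions the block
    dd if=/dev/urandom of=/dev/mapper/thin0 bs=64k count=1 seek=100

    # subsequent reads return whatever the block now contains
    dd if=/dev/mapper/thin0 bs=64k count=1 skip=100 2>/dev/null | hexdump -C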

> 
> The "Metadata object" section is where you lose me:
> 
> 5) I'm not clear on the notion of "external" vs "internal" snapshots.
>    Q: can you elaborate on their characteristics?
>    Maybe the following question has some relation to external vs
>    internal?

Joe?

> 
> 6) I'm not clear on how you're going to clone the metadata tree for
>    userspace to walk (for snapshot merge, etc).  Is that "clone" really
>    a snapshot of the metadata device? -- seems unlikely as you'd need a
>    metadata device for your metadata device's snapshots?
>    - you said: "Userland will be given the location of an alternative
>      superblock for the metadata device. This is the root of a tree of
>      blocks referring to other blocks in a variety of data structures
>      (btrees, space maps etc.). Blocks will be shared with the ‘live’
>      version of the metadata, their reference counts may change as
>      sharing is broken, but we know the blocks will never be updated."
>      - Q: is this describing an "internal snapshot"?

Joe?

> 
> 7) from the 'thin' target section:
> "All devices stored within a metadata object are instanced with this
> target. Be they fully mapped devices, thin provisioned devices, internal
> snapshots or external snapshots."
> Q: what is a fully mapped device?

All blocks mapped.

> 
> 8) "The target line:
> 
> thin <pool object> <internal device id>"
> Q: so by <pool object>, that is the _id_ of a pool object that was
> returned from the 'create virtual device' message?
> 
> 
> In general my understanding of all this shared store infrastructure is
> muddled.  I need the audience to take away big concepts, not get tripped
> up (or trip me up!) on the minutiae.
> 
> Subtle inconsistencies and/or opaque explanations aren't helping, e.g.:
> 1) the detail of "Configuration/Use" for thinp volume
>    - "Allocate (empty) logical volume for the thin provisioning pool"
>       Q: how can it be "empty"?  Isn't it the data volume you hand to
>          the pool target?

It can start out with zero size, and userspace will grow it at the
thinp target's request.
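
For illustration only (volume name made up, and the exact low-water
notification mechanism is still to be decided), the userspace reaction
would amount to something like

    lvextend -L +5G vg/thin_data

on the LV backing the pool.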

>    - "Allocate small logical volume for the thin provisioning metadata"
>       Q: before in "Solution" slide you said "Separate metadata device
>          simplifies extension", can the metadata volume be extended too?

Planned, not yet.

>    - "Set up thin provisioning mapped device on aforementioned 2 LVs"
>       Q: so there is no distinct step for creating a pool?

Not yet, but we agreed to have 2 distinct steps with the multi-device
thinp targets, i.e. one target for the pool, responsible for all pool
properties such as creating it in an initial step, and another target
responsible for the thin provisioned device properties.
Same with shared snapshots.
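
Roughly speaking (exact syntax undecided, the parameter names below are
just placeholders), the table lines would split along these lines:

    pool <metadata dev> <data dev> [pool-wide parameters]
    thin <pool> <internal device id>

with all pool-wide properties on the first line and the per-device
properties on the second.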

>       Q: pool is implicitly created at the time the thinp device is
>          created? (doubtful but how you enumerated the steps makes it
> 	 misleading/confusing).

This describes the single pool/device target, not the shared-pool one,
which is work in progress while we settle the interfaces as mentioned
above.

>       Q: can snapshot and thinp volumes share the same pool?
>          (if possible I could see it being brittle?)
>          (but expressing such capability will help the audience "get"
> 	 the fact that the pool is nicely abstracted/sound design,
> 	 etc).

See my theoretically/practically remark above.

> 
> versus:
> 
> 2) the description of the 'pool' and 'thin' targets
>    - "This (pool) target ties together a shared metadata volume and a
>      shared data volume."
>      Q: when does the "block size" get defined if it isn't provided in
>         the target line of "pool"?

Target lines are subject to change as mentioned above, i.e. pool
properties will be handled by the pool target and the thin provisioned
device related ones by the device target.

>    - "Be they fully mapped devices, thin provisioned devices, internal
>      snapshots or external snapshots."
>      Q: where does the notion of a thinp-snapshot (or whatever you are
>         calling it) get expressed as a distinct target?  This is all
> 	very opaque to me...

Joe?

> 
> 
> p.s. I was going to hold off sending this and take another pass of your
> slides but decided your feedback to all my Q:s would likely be much more
> helpful than me trying to parse the slides again.


* Re: call for slideware ;)
  2011-02-23  1:21   ` Mike Snitzer
  2011-02-23 12:18     ` Heinz Mauelshagen
@ 2011-02-23 12:24     ` Joe Thornber
  2011-02-23 12:59       ` Joe Thornber
  1 sibling, 1 reply; 5+ messages in thread
From: Joe Thornber @ 2011-02-23 12:24 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Heinz Mauelshagen

Mike,

On Tue, 2011-02-22 at 20:22 -0500, Mike Snitzer wrote:
> I just had a look at the latest content and have some questions (way
> more than I'd imagine you'd like to see.. means I'm clearly missing a
> lot):

Thanks a lot for taking the time to go through this.  I'm updating the
document as I answer your questions.  I'll put the git commit hashes in
square brackets to make it easier for you to pick out the changes for
each question. 

> 1) from "Solution" slide:
>    "Space comes from a preallocated ‘pool’, which is itself just another
>    logical volume, thus can be resized on demand."
>    ...
>    "Separate metadata device simplifies extension, this is hidden by the
>     LVM system so sys admin unlikely to be aware of it."
>     Q: Can you elaborate on the role of the metadata?  It maps between
>        physical "area" (allocated from pool) for all writes to the
>        logical address space?

[0127dd9]

>     Q: can thinp and snapshot metadata coexist in the same pool? -- ask
>        similar question below.

I've added a new introduction section at the start of the document that
tries to explain that the thinp target is just a simple thin
provisioning solution, whereas multisnap will provide both thinp and
snapshots. [70e448f]


> 
> 2) from "Block size choice" slide:
>    The larger the block size:
>    - the less chance there is of fragmentation (describe this)
>      Q: can you please "describe this"? :)

[a6306c8]


>    - the less frequently we need the expensive mapping operation
>      Q: "expensive" is all relative, seems you contradict the expense of
>         the mapping operation in the "Performance" slide?

[938422d]  You still want to minimise it.  The performance at small
block sizes is better than I expected.

>    - the smaller the metadata tables are, so more of them can be held in core
>      at a time. Leading to faster access to the provisioned blocks by
>      minimizing reading in mapping information
>      Q: "more of them" -- "them" being metadata tables?  So the take
>         away is more thinp devices available on the same host?

No, fewer reads to load bits of the mapping table that aren't in the
cache. [9ba3ae3]


> 
> 3) from "Performance" slide:
>    "Expensive operation is mapping in a new ‘area’"
>    Q: is area the same as a block in the pool?  Why not call block size:
>    "area size"?  "Block size" is familiar to people?  Original snapshot
>    had "chunk size".

I switched from 'chunk' to 'block' because we seem to be the only people
who use the term chunk (my fault), and I was reading lots of filesystem
papers in preparation for this work, where 'block' is more ubiquitous.
I've changed 'area' and 'region' to 'block' [1c6a5352].  If you think it's
still confusing I'll change everything to 'chunk' (the LVM2 tools are
still going to use --chunksize etc.).

> 4) Q: what did you decide to run with for reads to logical address space
>       that weren't previously mapped?  Just return zeroes like was
>       discussed on lvm-team?

[49c8490]


I've added a 'target parameters' section [8332c43].


> The "Metadata object" section is where you lose me:

I've added some more background stuff [c8e1685].

> 
> 5) I'm not clear on the notion of "external" vs "internal" snapshots.
>    Q: can you elaborate on their characteristics?

See above commit.

> 6) I'm not clear on how you're going to clone the metadata tree for
>    userspace to walk (for snapshot merge, etc).  Is that "clone" really
>    a snapshot of the metadata device? -- seems unlikely as you'd need a
>    metadata device for your metadata device's snapshots?

No.

>    - you said: "Userland will be given the location of an alternative
>      superblock for the metadata device. This is the root of a tree of
>      blocks referring to other blocks in a variety of data structures
>      (btrees, space maps etc.). Blocks will be shared with the ‘live’
>      version of the metadata, their reference counts may change as
>      sharing is broken, but we know the blocks will never be updated."
>      - Q: is this describing an "internal snapshot"?

No.  I don't really want to go into how the persistent-data library
works.  I should start a separate document for that.  If you think I'm
just confusing people by adding these issues then I can take this
section out?

> 7) from the 'thin' target section:
> "All devices stored within a metadata object are instanced with this
> target. Be they fully mapped devices, thin provisioned devices, internal
> snapshots or external snapshots."
> Q: what is a fully mapped device?

A thinp device that's fully mapped; I'll take it out [831c136].

> 
> 8) "The target line:
> 
> thin <pool object> <internal device id>"
> Q: so by <pool object>, that is the _id_ of a pool object that was
> returned from the 'create virtual device' message?

Yep, or rather the id that was passed in to that call.  Userland is in
charge of allocating these numbers.
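
To sketch the whole flow (purely hypothetical: device names, sizes and
the message syntax are placeholders, none of this is final):

    # tie the metadata and data LVs together with the 'pool' target
    dmsetup create pool0 --table "0 20971520 pool /dev/vg/tmeta /dev/vg/tdata"

    # ask the pool to create a virtual device; userland picks the id (0 here)
    dmsetup message pool0 0 "create virtual device 0"

    # instance it with the 'thin' target: thin <pool object> <internal device id>
    dmsetup create thin0 --table "0 2097152 thin /dev/mapper/pool0 0"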

> In general my understanding of all this shared store infrastructure is
> muddled.  I need the audience to take away big concepts, not get tripped
> up (or trip me up!) on the minutiae.

Agreed, let's try and restrict this document to high level stuff.  I'll
do a separate persistent-data doc with the detail in.

> 
> Subtle inconsistencies and/or opaque explanations aren't helping, e.g.:
> 1) the detail of "Configuration/Use" for thinp volume
>    - "Allocate (empty) logical volume for the thin provisioning pool"
>       Q: how can it be "empty"?  Isn't it the data volume you hand to
>          the pool target?

Changed to 'possibly empty' [3ce2226].  I think this scenario will occur
quite often; for example, a VM hosting service might create a new VM for
a client with a bunch of thinp devices, but not want to commit any space
to the VM until the client actually starts using the devices.


>    - "Allocate small logical volume for the thin provisioning metadata"
>       Q: before in "Solution" slide you said "Separate metadata device
>          simplifies extension", can the metadata volume be extended too?

That's the plan.  A userland library will make the necessary tweaks to
the metadata while the device is suspended.
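
Something like this is what I have in mind (names are hypothetical, and
the library doing the metadata tweak doesn't exist yet):

    dmsetup suspend pool0           # quiesce the target
    lvextend -L +128M vg/thin_meta  # grow the metadata LV
    # (userland library extends the on-disk metadata structures here)
    dmsetup resume pool0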

>    - "Set up thin provisioning mapped device on aforementioned 2 LVs"
>       Q: so there is no distinct step for creating a pool?

For the thinp target, the data device that you pass in to the target is
the 'pool'.  I hope the 'target parameters' section I've added helps
explain this?

>       Q: pool is implicitly created at the time the thinp device is
>          created? (doubtful but how you enumerated the steps makes it
> 	 misleading/confusing).

The LVM tools will implicitly create the data/backing device and the
metadata device.  agk is envisioning a command line like:

	lvcreate --target-type=thinp --chunksize=512k --low-water-mark=4 -L10G

>       Q: can snapshot and thinp volumes share the same pool?
>          (if possible I could see it being brittle?)
>          (but expressing such capability will help the audience "get"
> 	 the fact that the pool is nicely abstracted/sound design,
> 	 etc).

I'm not sure if you're talking about the thinp target or multisnap
here.  Why 'brittle'?


> p.s. I was going to hold off sending this and take another pass of your
> slides but decided your feedback to all my Q:s would likely be much more
> helpful than me trying to parse the slides again.

You definitely did right to send these, it gives me a kick to keep
improving it.  Have a read through it now and see if it's any better.
I'm quite happy to keep revising it for you.

- Joe


* Re: call for slideware ;)
  2011-02-23 12:24     ` Joe Thornber
@ 2011-02-23 12:59       ` Joe Thornber
  0 siblings, 0 replies; 5+ messages in thread
From: Joe Thornber @ 2011-02-23 12:59 UTC (permalink / raw)
  To: device-mapper development

For those that want the context of this email:

https://github.com/jthornber/storage-papers/blob/master/thinp-snapshots-2011/thinp-and-snapshots-talk.org

- Joe
