Re: call for slideware ;)

From: Joe Thornber <thornber@redhat.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Heinz Mauelshagen <heinzm@redhat.com>
Subject: Re: call for slideware ;)
Date: Wed, 23 Feb 2011 12:24:48 +0000
Message-ID: <1298463888.19562.138.camel@ubuntu>
In-Reply-To: <20110223012159.GA13983@redhat.com>

Mike,

On Tue, 2011-02-22 at 20:22 -0500, Mike Snitzer wrote:
> I just had a look at the latest content and have some questions (way
> more than I'd imagine you'd like to see.. means I'm clearly missing a
> lot):

Thanks a lot for taking the time to go through this.  I'm updating the
document as I answer your questions.  I'll put the git commit hashes in
square brackets to make it easier for you to pick out the changes for
each question. 

> 1) from "Solution" slide:
>    "Space comes from a preallocated ‘pool’, which is itself just another
>    logical volume, thus can be resized on demand."
>    ...
>    "Separate metadata device simplifies extension, this is hidden by the
>     LVM system so sys admin unlikely to be aware of it."
>     Q: Can you elaborate on the role of the metadata?  It maps between
>        physical "area" (allocated from pool) for all writes to the
>        logical address space?

[0127dd9]
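
As a rough illustration of that role (a toy Python sketch, not how the
kernel target is implemented; the class and method names are made up):
the metadata holds a map from logical block to pool block, and a pool
block is only allocated the first time a logical block is written.

```python
class ThinMetadata:
    """Toy stand-in for the metadata device: maps logical blocks to pool blocks."""

    def __init__(self, pool_blocks):
        self.free = list(range(pool_blocks))  # unallocated blocks in the pool
        self.mapping = {}                     # logical block -> pool block

    def map_for_write(self, logical_block):
        """Return the pool block backing logical_block, allocating on first write."""
        if logical_block not in self.mapping:
            if not self.free:
                raise RuntimeError("pool exhausted; it would need extending")
            self.mapping[logical_block] = self.free.pop(0)
        return self.mapping[logical_block]


md = ThinMetadata(pool_blocks=4)
first = md.map_for_write(1000)  # first write to logical block 1000 allocates
again = md.map_for_write(1000)  # later writes reuse the same pool block
other = md.map_for_write(7)     # a different logical block gets a new pool block
```

Only two of the four pool blocks are consumed here, however large the
logical address space is.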

>     Q: can thinp and snapshot metadata coexist in the same pool? -- ask
>        similar question below.

I've added a new introduction section at the start of the document that
tries to explain that the thinp target is just a simple thin
provisioning solution, whereas multisnap will provide both thinp and
snapshots. [70e448f]

> 
> 2) from "Block size choice" slide:
>    The larger the block size:
>    - the less chance there is of fragmentation (describe this)
>      Q: can you please "describe this"? :)

[a6306c8]

>    - the less frequently we need the expensive mapping operation
>      Q: "expensive" is all relative, seems you contradict the expense of
>         the mapping operation in the "Performance" slide?

[938422d]  You still want to minimise it.  The performance at small
block sizes is better than I expected.
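
To put numbers on "less frequently", here is a back-of-envelope sketch
in Python.  It assumes a fresh, block-aligned sequential write and says
nothing about the real cost of each mapping operation; the function
name is hypothetical.

```python
def new_mappings_needed(write_bytes, block_size):
    """Blocks that must be mapped in to cover a fresh, aligned sequential write."""
    return -(-write_bytes // block_size)  # ceiling division


GiB = 1024 ** 3
KiB = 1024
for block_size in (64 * KiB, 512 * KiB, 4096 * KiB):
    count = new_mappings_needed(GiB, block_size)
    print(block_size // KiB, "KiB blocks:", count, "mapping operations")
```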

>    - the smaller the metadata tables are, so more of them can be held in core
>      at a time. Leading to faster access to the provisioned blocks by
>      minimizing reading in mapping information
>      Q: "more of them" -- "them" being metadata tables?  So the take
>         away is more thinp devices available on the same host?

No, it means fewer reads to load bits of the mapping table that aren't
in the cache. [9ba3ae3]
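
A small Python sketch of the effect.  The 16-byte entry size and the
16 MiB cache are invented purely for illustration and are not taken
from the real metadata format.

```python
ENTRY_BYTES = 16  # hypothetical size of one mapping entry


def table_bytes(device_bytes, block_size):
    """Approximate size of a fully populated mapping table."""
    entries = -(-device_bytes // block_size)  # ceiling division
    return entries * ENTRY_BYTES


def cached_fraction(device_bytes, block_size, cache_bytes):
    """Fraction of the mapping table that fits in a cache of cache_bytes."""
    return min(1.0, cache_bytes / table_bytes(device_bytes, block_size))


TiB, MiB, KiB = 1024 ** 4, 1024 ** 2, 1024
small = cached_fraction(TiB, 64 * KiB, 16 * MiB)   # small blocks, big table
large = cached_fraction(TiB, 512 * KiB, 16 * MiB)  # large blocks, small table
```

Under those assumptions a fully mapped 1 TiB device needs a 256 MiB
table at 64 KiB blocks but only 32 MiB at 512 KiB blocks, so far more
of it stays in core and fewer lookups have to go to the metadata
device.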

> 
> 3) from "Performance" slide:
>    "Expensive operation is mapping in a new ‘area’"
>    Q: is area the same as a block in the pool?  Why not call block size:
>    "area size"?  "Block size" is familiar to people?  Original snapshot
>    had "chunk size".

I switched from 'chunk' to 'block' because we seem to be the only people
who use the term chunk (my fault) and I was reading lots of filesystem
papers in preparation for this work where block is more ubiquitous.
I've changed 'area' and 'region' to block [1c6a5352].  If you think it's
still confusing I'll change everything to 'chunk' (the LVM2 tools are
still going to use --chunksize etc.).

> 4) Q: what did you decide to run with for reads to logical address space
>       that weren't previously mapped?  Just return zeroes like was
>       discussed on lvm-team?

[49c8490]

I've added a 'target parameter' section [8332c43].
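
For illustration only, and assuming the "return zeroes" behaviour from
the question rather than quoting the document, a read path could be
sketched in Python like this (all names are hypothetical):

```python
def read_block(mapping, pool, logical_block, block_size):
    """Read one logical block; unmapped blocks read back as zeroes (assumed)."""
    pool_block = mapping.get(logical_block)
    if pool_block is None:
        return bytes(block_size)  # never written: no pool space is consumed
    return pool[pool_block]


block_size = 8
pool = {0: b"ABCDEFGH"}  # pool block -> data
mapping = {5: 0}         # logical block 5 lives in pool block 0
mapped = read_block(mapping, pool, 5, block_size)
unmapped = read_block(mapping, pool, 6, block_size)
```

Note that the read of block 6 leaves the mapping untouched; only a
write would provision it.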

> The "Metadata object" section is where you lose me:

I've added some more background stuff [c8e1685].

> 
> 5) I'm not clear on the notion of "external" vs "internal" snapshots.
>    Q: can you elaborate on their characteristics?

See above commit.

> 6) I'm not clear on how you're going to clone the metadata tree for
>    userspace to walk (for snapshot merge, etc).  Is that "clone" really
>    a snapshot of the metadata device? -- seems unlikely as you'd need a
>    metadata device for your metadata device's snapshots?

No.

>    - you said: "Userland will be given the location of an alternative
>      superblock for the metadata device. This is the root of a tree of
>      blocks referring to other blocks in a variety of data structures
>      (btrees, space maps etc.). Blocks will be shared with the ‘live’
>      version of the metadata, their reference counts may change as
>      sharing is broken, but we know the blocks will never be updated."
>      - Q: is this describing an "internal snapshot"?

No.  I don't really want to go into how the persistent-data library
works.  I should start a separate document for that.  If you think I'm
just confusing people by adding these issues then I can take this
section out?

> 7) from the 'thin' target section:
> "All devices stored within a metadata object are instanced with this
> target. Be they fully mapped devices, thin provisioned devices, internal
> snapshots or external snapshots."
> Q: what is a fully mapped device?

A thinp device that's fully mapped.  I'll take it out [831c136].

> 
> 8) "The target line:
> 
> thin <pool object> <internal device id>"
> Q: so by <pool object>, that is the _id_ of a pool object that was
> returned from the 'create virtual device' message?

Yep, or rather the id that was passed in to that call.  Userland is in
charge of allocating these numbers.
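
A sketch of what that division of labour might look like on the
userland side, in Python.  The allocator class and the "pool0"
placeholder are hypothetical; only the shape of the target line comes
from the text quoted above.

```python
import itertools


class DeviceIdAllocator:
    """Hypothetical userland bookkeeping: hands out unused device ids."""

    def __init__(self, start=0):
        self._next = itertools.count(start)

    def allocate(self):
        return next(self._next)


def thin_target_line(pool_object, device_id):
    """Format a target line as: thin <pool object> <internal device id>."""
    return f"thin {pool_object} {device_id}"


ids = DeviceIdAllocator()
dev_a = ids.allocate()
dev_b = ids.allocate()
line = thin_target_line("pool0", dev_b)  # "pool0" is a placeholder name
```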

> In general my understanding of all this shared store infrastructure is
> muddled.  I need the audience to take away big concepts not get tripped
> up (or trip me up!) on the minutiae.

Agreed, let's try and restrict this document to high level stuff.  I'll
do a separate persistent-data doc with the detail in.

> 
> Subtle inconsistencies and/or opaque explanations aren't helping, e.g.:
> 1) the detail of "Configuration/Use" for thinp volume
>    - "Allocate (empty) logical volume for the thin provisioning pool"
>       Q: how can it be "empty"?  Isn't it the data volume you hand to
>          the pool target?

Changed to 'possibly empty' [3ce2226].  I think this scenario will occur
quite often, for example a VM hosting service might create a new VM for
a client with a bunch of thinp devices, but not want to commit any space
to the VM until the client actually starts using the devices.

>    - "Allocate small logical volume for the thin provisioning metadata"
>       Q: before in "Solution" slide you said "Separate metadata device
>          simplifies extension", can the metadata volume be extended too?

That's the plan.  A userland library will make the necessary tweaks to
the metadata while the device is suspended.

>    - "Set up thin provisioning mapped device on aforementioned 2 LVs"
>       Q: so there is no distinct step for creating a pool?

For the thinp target, the data device that you pass in to the target is
the 'pool'.  I hope the 'target parameters' section I've added helps
explain this?

>       Q: pool is implicitly created at the time the thinp device is
>          created? (doubtful but how you enumerated the steps makes it
>          misleading/confusing).

The LVM tools will implicitly create the data/backing device and the
metadata device.  agk is envisioning a command line like:

	lvcreate --target-type=thinp --chunksize=512k --low-water-mark=4 -L10G
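
The command line doesn't pin down what the low water mark means, so
treat the following Python sketch as a guess: it assumes the mark is
counted in free pool blocks and that dropping below it is the point at
which userland would be asked to extend the pool.

```python
def needs_extending(free_blocks, low_water_mark):
    """True once the pool has fewer free blocks than the mark (assumed semantics)."""
    return free_blocks < low_water_mark


free = 6
events = []
for _ in range(4):  # each iteration maps in one new block
    free -= 1
    if needs_extending(free, low_water_mark=4):
        events.append(free)  # record the free count at each warning
```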

>       Q: can snapshot and thinp volumes share the same pool?
>          (if possible I could see it being brittle?)
>          (but expressing such capability will help the audience "get"
>          the fact that the pool is nicely abstracted/sound design,
>          etc).

I'm not sure whether you're talking about the thinp target or multisnap
here.  Why 'brittle'?

> p.s. I was going to hold off sending this and take another pass of your
> slides but decided your feedback to all my Q:s would likely be much more
> helpful than me trying to parse the slides again.

You definitely did right to send these, it gives me a kick to keep
improving it.  Have a read through it now and see if it's any better.
I'm quite happy to keep revising it for you.

- Joe
