From: Joe Thornber <thornber@redhat.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Heinz Mauelshagen <heinzm@redhat.com>
Subject: Re: call for slideware ;)
Date: Wed, 23 Feb 2011 12:24:48 +0000
Message-ID: <1298463888.19562.138.camel@ubuntu>
In-Reply-To: <20110223012159.GA13983@redhat.com>

Mike,

On Tue, 2011-02-22 at 20:22 -0500, Mike Snitzer wrote:
> I just had a look at the latest content and have some questions (way
> more than I'd imagine you'd like to see.. means I'm clearly missing a
> lot):

Thanks a lot for taking the time to go through this.  I'm updating the
document as I answer your questions.  I'll put the git commit hashes in
square brackets to make it easier for you to pick out the changes for
each question. 

> 1) from "Solution" slide:
>    "Space comes from a preallocated ‘pool’, which is itself just another
>    logical volume, thus can be resized on demand."
>    ...
>    "Separate metadata device simplifies extension, this is hidden by the
>     LVM system so sys admin unlikely to be aware of it."
>     Q: Can you elaborate on the role of the metadata?  It maps between
>        physical "area" (allocated from pool) for all writes to the
>        logical address space?

[0127dd9]

>     Q: can thinp and snapshot metadata coexist in the same pool? -- ask
>        similar question below.

I've added a new introduction section at the start of the document that
tries to explain that the thinp target is just a simple thin
provisioning solution, whereas multisnap will provide both thinp and
snapshots. [70e448f]


> 
> 2) from "Block size choice" slide:
>    The larger the block size:
>    - the less chance there is of fragmentation (describe this)
>      Q: can you please "describe this"? :)

[a6306c8]


>    - the less frequently we need the expensive mapping operation
>      Q: "expensive" is all relative, seems you contradict the expense of
>         the mapping operation in the "Performance" slide?

[938422d]  You still want to minimise it.  The performance at small
block sizes is better than I expected.

>    - the smaller the metadata tables are, so more of them can be held in core
>      at a time. Leading to faster access to the provisioned blocks by
>      minimizing reading in mapping information
>      Q: "more of them" -- "them" being metadata tables?  So the take
>         away is more thinp devices available on the same host?

No, it means fewer reads to load the bits of the mapping table that
aren't in the cache. [9ba3ae3]
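
To put rough numbers on that, here is a back-of-envelope sketch (not
taken from the slides).  It only assumes one mapping entry per block;
the 8 bytes per entry is a made-up figure for illustration and says
nothing about the real on-disk format.

```python
# Rough sketch: how block size affects the size of the mapping table.
# ASSUMED_ENTRY_BYTES is a hypothetical figure, not the real metadata format.

KIB = 1024
GIB = 1024 ** 3
ASSUMED_ENTRY_BYTES = 8


def mapping_entries(device_bytes: int, block_bytes: int) -> int:
    """Blocks needed to cover the device (ceiling division)."""
    return -(-device_bytes // block_bytes)


def table_bytes(device_bytes: int, block_bytes: int) -> int:
    """Approximate table size for a fully mapped device."""
    return mapping_entries(device_bytes, block_bytes) * ASSUMED_ENTRY_BYTES


if __name__ == "__main__":
    for block_kib in (64, 512, 4096):
        entries = mapping_entries(10 * GIB, block_kib * KIB)
        size_kib = table_bytes(10 * GIB, block_kib * KIB) // KIB
        print(f"{block_kib:>5} KiB blocks: {entries:>7} entries, ~{size_kib} KiB of table")
```

Under those assumptions, going from 64k to 512k blocks cuts the number
of entries for a 10G device by a factor of eight, so far more of the
table fits in the cache.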


> 
> 3) from "Performance" slide:
>    "Expensive operation is mapping in a new ‘area’"
>    Q: is area the same as a block in the pool?  Why not call block size:
>    "area size"?  "Block size" is familiar to people?  Original snapshot
>    had "chunk size".

I switched from 'chunk' to 'block' because we seem to be the only people
who use the term chunk (my fault) and I was reading lots of filesystem
papers in preparation for this work where block is more ubiquitous.
I've changed 'area' and 'region' to block [1c6a5352].  If you think it's
still confusing I'll change everything to 'chunk' (the LVM2 tools are
still going to use --chunksize etc.).

> 4) Q: what did you decide to run with for reads to logical address space
>       that weren't previously mapped?  Just return zeroes like was
>       discussed on lvm-team?

[49c8490]


I've added a 'target parameter' section [8332c43].


> The "Metadata object" section is where you lose me:

I've added some more background stuff [c8e1685].

> 
> 5) I'm not clear on the notion of "external" vs "internal" snapshots.
>    Q: can you elaborate on their characteristics?

See above commit.

> 6) I'm not clear on how you're going to clone the metadata tree for
>    userspace to walk (for snapshot merge, etc).  Is that "clone" really
>    a snapshot of the metadata device? -- seems unlikely as you'd need a
>    metadata device for your metadata device's snapshots?

No.

>    - you said: "Userland will be given the location of an alternative
>      superblock for the metadata device. This is the root of a tree of
>      blocks referring to other blocks in a variety of data structures
>      (btrees, space maps etc.). Blocks will be shared with the ‘live’
>      version of the metadata, their reference counts may change as
>      sharing is broken, but we know the blocks will never be updated."
>      - Q: is this describing an "internal snapshot"?

No.  I don't really want to go into how the persistent-data library
works.  I should start a separate document for that.  If you think I'm
just confusing people by adding these issues then I can take this
section out?

> 7) from the "thin' target section:
> "All devices stored within a metadata object are instanced with this
> target. Be they fully mapped devices, thin provisioned devices, internal
> snapshots or external snapshots."
> Q: what is a fully mapped device?

A thinp device that's fully mapped.  I'll take it out [831c136].

> 
> 8) "The target line:
> 
> thin <pool object> <internal device id>"
> Q: so by <pool object>, that is the _id_ of a pool object that was
> returned from the 'create virtual device' message?

Yep, or rather the id that was passed in to that call.  Userland is in
charge of allocating these numbers.

> In general my understanding of all this shared store infrastructure is a
> muddled.  I need the audience to take away big concepts not get tripped
> up (or trip me up!) on the minutia.

Agreed, let's try and restrict this document to high level stuff.  I'll
do a separate persistent-data doc with the detail in.

> 
> Subtle inconsistencies and/or opaque explanation aren't helping, e.g.:
> 1) the detail of "Configuration/Use" for thinp volume
>    - "Allocate (empty) logical volume for the thin provisioning pool"
>       Q: how can it be "empty"?  Isn't it the data volume you hand to
>          the pool target?

Changed to 'possibly empty' [3ce2226].  I think this scenario will occur
quite often, for example a VM hosting service might create a new VM for
a client with a bunch of thinp devices, but not want to commit any space
to the VM until the client actually starts using the devices.
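
A toy model of that scenario may help (a sketch only; the class and
method names are hypothetical and this is not how the kernel target is
written).  Pool blocks are handed out on the first write to each
virtual block, so freshly created devices consume nothing.  Reads of
unmapped blocks are deliberately left out.

```python
# Toy model of allocate-on-first-write.  ThinPool and ThinDevice are
# hypothetical names for illustration, not kernel or LVM2 interfaces.


class ThinPool:
    """A fixed number of data blocks shared by several thin devices."""

    def __init__(self, nr_blocks: int) -> None:
        self.nr_blocks = nr_blocks
        self.next_free = 0  # simple bump allocator, no freeing

    def alloc(self) -> int:
        if self.next_free >= self.nr_blocks:
            raise RuntimeError("pool exhausted")
        block = self.next_free
        self.next_free += 1
        return block

    @property
    def used(self) -> int:
        return self.next_free


class ThinDevice:
    """A virtual device whose blocks are mapped lazily from a pool."""

    def __init__(self, pool: ThinPool, virtual_blocks: int) -> None:
        self.pool = pool
        self.virtual_blocks = virtual_blocks
        self.mapping = {}  # virtual block -> pool block

    def write(self, vblock: int) -> int:
        """Return the pool block backing vblock, allocating on first write."""
        if not 0 <= vblock < self.virtual_blocks:
            raise IndexError("write beyond end of device")
        if vblock not in self.mapping:
            self.mapping[vblock] = self.pool.alloc()
        return self.mapping[vblock]


if __name__ == "__main__":
    pool = ThinPool(nr_blocks=100)
    # Three devices, each advertising far more space than the pool holds.
    devices = [ThinDevice(pool, virtual_blocks=1000) for _ in range(3)]
    print("used after creation:", pool.used)
    devices[0].write(42)
    devices[0].write(42)  # second write to the same block reuses the mapping
    devices[1].write(7)
    print("used after writes:", pool.used)
```

The devices together advertise 3000 blocks against a pool of 100, yet
nothing is committed until the client writes.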


>    - "Allocate small logical volume for the thin provisioning metadata"
>       Q: before in "Solution" slide you said "Separate metadata device
>          simplifies extension", can the metadata volume be extended too?

That's the plan.  A userland library will make the necessary tweaks to
the metadata while the device is suspended.

>    - "Set up thin provisioning mapped device on aforementioned 2 LVs"
>       Q: so there is no distinct step for creating a pool?

For the thinp target, the data device that you pass in to the target is
the 'pool'.  I hope the 'target parameters' section I've added helps
explain this?

>       Q: pool is implicitly created at the time the thinp device is
>          created? (doubtful but how you enumerated the steps makes it
> 	 misleading/confusing).

The LVM tools will implicitly create the data/backing device and the
metadata device.  agk is envisioning a command line like:

	lvcreate --target-type=thinp --chunksize=512k --low-water-mark=4 -L10G

>       Q: can snapshot and thinp volumes share the same pool?
>          (if possible I could see it being brittle?)
>          (but expressing such capability will help the audience "get"
> 	 the fact that the pool is nicely abstracted/sound design,
> 	 etc).

I'm not sure if you're talking about the thinp target or multisnap
here.  Why 'brittle'?


> p.s. I was going to hold off sending this and take another pass of your
> slides but decided your feedback to all my Q:s would likely be much more
> helpful than me trying to parse the slides again.

You definitely did right to send these; it gives me a kick to keep
improving it.  Have a read through it now and see if it's any better.
I'm quite happy to keep revising it for you.

- Joe


