From: Sage Weil <sweil@redhat.com>
To: ifedotov@mirantis.com, allen.samuels@sandisk.com
Cc: ceph-devel@vger.kernel.org
Subject: 2 related bluestore questions
Date: Mon, 9 May 2016 14:31:38 -0400 (EDT)
Message-ID: <alpine.DEB.2.11.1605091417590.336@cpach.fuggernut.com>

1. In 7fb649a3800a5653f5f7ddf942c53503f88ad3f1 I added an extent_ref_map_t 
to the blob_t.  This lets us keep track, for each blob, of references to 
the logical blob extents (in addition to the raw num_refs that just counts 
how many lextent_t's point to us).  It will let us make decisions about 
deallocating unused portions of the blob that are no longer referenced 
(e.g., when the blob is uncompressed).  It will also let us sanely reason 
about whether we can write into the blob's allocated space that is not 
referenced (e.g., past end of object/file, but within a min_alloc_size 
chunk).

The downside is that it's a bit more metadata to maintain.  OTOH, we need 
it in many cases, and it would be slow/tedious to create it on the fly.

Should we keep it?  I think yes, though the current extent_ref_map_t needs 
some minor changes, since it currently has weird assumptions about empty 
meaning a ref count of 1.
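
To make the idea concrete, here is a rough Python sketch of per-blob
reference tracking over byte ranges.  It is not the extent_ref_map_t
code from the commit above; the class name ExtentRefMap and the methods
get, put, and unreferenced are made up for illustration.  It assumes
that an empty map means nothing is referenced, i.e. it avoids the
"empty means a ref count of 1" convention.

```python
class ExtentRefMap:
    """Hypothetical sketch: reference counts over byte ranges of one blob."""

    def __init__(self):
        # Sorted, non-overlapping (offset, length, refs) tuples with refs >= 1.
        # Assumption: an empty list means no byte of the blob is referenced.
        self.ranges = []

    def _adjust(self, offset, length, delta):
        end = offset + length
        # Cut at every existing boundary so each piece has one ref count.
        cuts = {offset, end}
        for o, l, _ in self.ranges:
            cuts.update((o, o + l))
        cuts = sorted(cuts)
        out = []
        for lo, hi in zip(cuts, cuts[1:]):
            refs = sum(r for o, l, r in self.ranges if o <= lo and hi <= o + l)
            if offset <= lo and hi <= end:
                refs += delta
            assert refs >= 0, "put() on a range that is not referenced"
            if refs == 0:
                continue
            # Coalesce with the previous piece when contiguous and equal.
            if out and out[-1][0] + out[-1][1] == lo and out[-1][2] == refs:
                out[-1] = (out[-1][0], out[-1][1] + (hi - lo), refs)
            else:
                out.append((lo, hi - lo, refs))
        self.ranges = out

    def get(self, offset, length):
        """Take one reference on blob bytes [offset, offset+length)."""
        self._adjust(offset, length, +1)

    def put(self, offset, length):
        """Drop one reference on blob bytes [offset, offset+length)."""
        self._adjust(offset, length, -1)

    def unreferenced(self, blob_length):
        """Return (offset, length) holes that no lextent references."""
        holes, pos = [], 0
        for o, l, _ in self.ranges:
            if o > pos:
                holes.append((pos, o - pos))
            pos = o + l
        if pos < blob_length:
            holes.append((pos, blob_length - pos))
        return holes


refs = ExtentRefMap()
refs.get(0, 10)                  # one lextent references blob bytes [0, 10)
print(refs.unreferenced(4096))   # [(10, 4086)]
```

The unreferenced() result is the kind of information that would drive
both decisions mentioned above: which parts of a blob could be
deallocated, and which allocated-but-unused space a later write might
reuse.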

2. Allow lextent_t's to be byte-granularity.

For example, if we write 10 bytes into the object, we'd have a blob of 
min_alloc_size, and an lextent_t that indicates [0,10) points to that 
blob.

The upside here is that truncate and zero are trivial updates to the 
lextent map and never need to do any IO--we just punch holes in our 
mapping.
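
As a rough illustration of why truncate and zero need no IO under this
scheme, the Python sketch below treats the lextent map as a dict keyed
by logical offset and implements hole punching purely as a map update.
The names Lextent, punch_hole, truncate, and fmt are hypothetical, not
BlueStore code.  fmt prints entries as
"logical_offset: blob_offset~length->blob", which is an assumed reading
of the notation.

```python
from dataclasses import dataclass


@dataclass
class Lextent:
    blob: int     # id of the blob this extent points into
    offset: int   # byte offset within the blob
    length: int   # byte length (byte granularity, not block granularity)


def punch_hole(lextents, offset, length):
    """Drop logical bytes [offset, offset+length) from the map; no data IO."""
    if length <= 0:
        return dict(lextents)
    end = offset + length
    out = {}
    for lo, le in sorted(lextents.items()):
        le_end = lo + le.length
        if le_end <= offset or lo >= end:
            out[lo] = le                      # entirely outside the hole
            continue
        if lo < offset:                       # keep the part before the hole
            out[lo] = Lextent(le.blob, le.offset, offset - lo)
        if le_end > end:                      # keep the part after the hole
            out[end] = Lextent(le.blob, le.offset + (end - lo), le_end - end)
    return out


def truncate(lextents, size):
    """Truncate is a hole punched from `size` to the end of the mapping."""
    end = max((lo + le.length for lo, le in lextents.items()), default=0)
    return punch_hole(lextents, size, max(0, end - size))


def fmt(lextents):
    return [f"{lo}: {le.offset}~{le.length}->{le.blob}"
            for lo, le in sorted(lextents.items())]


m = {0: Lextent(blob=1, offset=0, length=4096)}
print(fmt(punch_hole(m, 10, 3990)))   # ['0: 0~10->1', '4000: 4000~96->1']
print(fmt(truncate(m, 10)))           # ['0: 0~10->1']
```

Neither operation touches the blob's data; only the mapping shrinks or
splits.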

The downside is that we might get odd mappings like

 0: 0~10->1
 4000: 4000~96->1

after a hole (10~3990) has been punched, and we may need to piece the 
mapping back together.  I think we will need most of this complexity 
(e.g., merging adjacent lextents that map to adjacent regions of the same 
blob) anyway.
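
Piecing the mapping back together could look roughly like the Python
sketch below.  It assumes a map of
logical_offset -> (blob, blob_offset, length) and merges two neighbours
only when they are contiguous both logically and within the same blob.
The function name merge_adjacent is made up for illustration.

```python
def merge_adjacent(lextents):
    """Merge lextents that are adjacent logically and in the same blob.

    lextents: dict of logical_offset -> (blob, blob_offset, length).
    Returns a new dict; the input is not modified.
    """
    out = {}
    prev_lo = None
    for lo, (blob, boff, length) in sorted(lextents.items()):
        if prev_lo is not None:
            pblob, pboff, plen = out[prev_lo]
            if (pblob == blob
                    and prev_lo + plen == lo      # logically contiguous
                    and pboff + plen == boff):    # contiguous in the blob
                out[prev_lo] = (blob, pboff, plen + length)
                continue
        out[lo] = (blob, boff, length)
        prev_lo = lo
    return out


# The hole 10~3990 was later rewritten into the same blob:
pieces = {0: (1, 0, 10), 10: (1, 10, 3990), 4000: (1, 4000, 96)}
print(merge_adjacent(pieces))     # {0: (1, 0, 4096)}

# With the hole still present, nothing can be merged:
holey = {0: (1, 0, 10), 4000: (1, 4000, 96)}
print(merge_adjacent(holey))      # {0: (1, 0, 10), 4000: (1, 4000, 96)}
```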

Hmm, there is probably some other downside but now I can't think of a good 
reason not to do this.  It'll basically put all of the onus on the write 
code to do the right thing... which is probably a good thing.

Yes?


Also, one note on the WAL changes: we'll need to have any read portion of 
a wal event include the raw pextents *and* the associated checksum(s).  
This is because the events need to be idempotent and may overwrite the 
read region, or interact with wal ops that come before/after.

sage
