From: Sage Weil <sweil@redhat.com>
To: ifedotov@mirantis.com, allen.samuels@sandisk.com
Cc: ceph-devel@vger.kernel.org
Subject: 2 related bluestore questions
Date: Mon, 9 May 2016 14:31:38 -0400 (EDT)
Message-ID: <alpine.DEB.2.11.1605091417590.336@cpach.fuggernut.com>

1. In 7fb649a3800a5653f5f7ddf942c53503f88ad3f1 I added an extent_ref_map_t
to the blob_t. This lets us keep track, for each blob, of references to
the logical blob extents (in addition to the raw num_refs that just counts
how many lextent_t's point to us). It will let us make decisions about
deallocating unused portions of the blob that are no longer referenced
(e.g., when the blob is uncompressed). It will also let us sanely reason
about whether we can write into the blob's allocated space that is not
referenced (e.g., past end of object/file, but within a min_alloc_size
chunk).
The downside is that it's a bit more metadata to maintain. OTOH, we need
it in many cases, and it would be slow/tedious to create it on the fly.
I think it's worth it, though the current extent_ref_map_t needs some minor
changes, since it currently makes the weird assumption that an empty map
means a ref count of 1.
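A minimal sketch of the kind of per-blob reference map described above, assuming a std::map keyed by blob offset where each record is a (length, refs) pair. An absent entry means zero references, so there is no special meaning attached to an empty map. The names ref_map_sketch, get, put, intersects and refs_at are hypothetical and are not the real extent_ref_map_t interface.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical per-blob reference map (not the real extent_ref_map_t).
// Each record says blob bytes [offset, offset + length) have `refs` references.
// A byte with no covering record has zero references.
struct ref_map_sketch {
  struct record { uint32_t length; uint32_t refs; };
  std::map<uint32_t, record> ref_map;  // keyed by blob offset

  // Split any record straddling `pos` so that `pos` becomes a record boundary.
  void split_at(uint32_t pos) {
    auto p = ref_map.upper_bound(pos);
    if (p == ref_map.begin()) return;
    --p;
    uint32_t start = p->first, end = start + p->second.length;
    if (start < pos && pos < end) {
      ref_map[pos] = {end - pos, p->second.refs};
      p->second.length = pos - start;
    }
  }

  // Take one reference on [offset, offset + length).
  void get(uint32_t offset, uint32_t length) {
    uint32_t end = offset + length;
    split_at(offset);
    split_at(end);
    uint32_t pos = offset;
    while (pos < end) {
      auto p = ref_map.find(pos);
      if (p == ref_map.end()) {
        // Unreferenced gap: cover it up to the next record (or `end`).
        auto n = ref_map.upper_bound(pos);
        uint32_t gap_end = (n == ref_map.end() || n->first > end) ? end : n->first;
        ref_map[pos] = {gap_end - pos, 1};
        pos = gap_end;
      } else {
        p->second.refs++;
        pos += p->second.length;
      }
    }
  }

  // Drop one reference on [offset, offset + length); records reaching zero are
  // erased, which is where the caller could consider deallocating that space.
  // Assumes the caller only puts ranges it previously got.
  void put(uint32_t offset, uint32_t length) {
    split_at(offset);
    split_at(offset + length);
    auto p = ref_map.lower_bound(offset);
    while (p != ref_map.end() && p->first < offset + length) {
      if (--p->second.refs == 0)
        p = ref_map.erase(p);
      else
        ++p;
    }
  }

  // True if any byte of [offset, offset + length) is referenced.
  bool intersects(uint32_t offset, uint32_t length) const {
    auto p = ref_map.upper_bound(offset);
    if (p != ref_map.begin()) {
      auto prev = std::prev(p);
      if (prev->first + prev->second.length > offset) return true;
    }
    return p != ref_map.end() && p->first < offset + length;
  }

  // Reference count of the single byte at `pos`.
  uint32_t refs_at(uint32_t pos) const {
    auto p = ref_map.upper_bound(pos);
    if (p == ref_map.begin()) return 0;
    --p;
    return pos < p->first + p->second.length ? p->second.refs : 0;
  }
};
```

With this, "can we write into the allocated but unreferenced tail of the blob?" becomes a single intersects() call. The sketch never re-merges adjacent records with equal counts; a real version probably would, to keep the metadata small.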

2. Allow lextent_t's to have byte granularity.
For example, if we write 10 bytes into the object, we'd have a blob of
min_alloc_size, and an lextent_t that indicates [0,10) points to that
blob.
The upside here is that truncate and zero are trivial updates to the
lextent map and never need to do any IO--we just punch holes in our
mapping.
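A minimal sketch of how zero/truncate could become pure map edits with byte-granular lextents, assuming a std::map from logical offset to a (blob, blob_offset, length) triple. The names lextent_sketch, lextent_map and punch_hole are hypothetical, not BlueStore's actual types.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical byte-granular lextent: logical [key, key + length) maps to
// [blob_offset, blob_offset + length) within blob `blob`. No alignment needed.
struct lextent_sketch {
  uint32_t blob;         // blob id
  uint32_t blob_offset;  // offset within the blob
  uint32_t length;       // bytes
};

using lextent_map = std::map<uint64_t, lextent_sketch>;  // keyed by logical offset

// Zero or truncate [offset, offset + length) by editing only the mapping:
// overlapping lextents are trimmed or split; no data is read or written.
void punch_hole(lextent_map& m, uint64_t offset, uint64_t length) {
  uint64_t end = offset + length;

  // An lextent starting before the hole may overlap its beginning.
  auto p = m.lower_bound(offset);
  if (p != m.begin()) {
    auto prev = std::prev(p);
    uint64_t pend = prev->first + prev->second.length;
    if (pend > offset) {
      if (pend > end) {
        // Hole lies strictly inside prev: keep the tail as a new lextent.
        uint64_t skip = end - prev->first;
        m[end] = {prev->second.blob,
                  static_cast<uint32_t>(prev->second.blob_offset + skip),
                  static_cast<uint32_t>(pend - end)};
      }
      prev->second.length = static_cast<uint32_t>(offset - prev->first);
    }
  }

  // Drop lextents inside the hole; trim one that runs past its end.
  p = m.lower_bound(offset);
  while (p != m.end() && p->first < end) {
    uint64_t pend = p->first + p->second.length;
    if (pend <= end) {
      p = m.erase(p);
    } else {
      uint64_t cut = end - p->first;
      lextent_sketch tail{p->second.blob,
                          static_cast<uint32_t>(p->second.blob_offset + cut),
                          static_cast<uint32_t>(pend - end)};
      m.erase(p);
      m[end] = tail;
      break;
    }
  }
}
```

Starting from a single lextent 0: 0~4096->1, punching the hole 10~3990 leaves exactly the two-entry mapping discussed next, and truncating to 5 bytes just drops and trims entries. The blob itself is untouched; whether its now-unreferenced space can be released is a separate question for the blob's ref map.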
The downside is that we might get odd mappings like

  0: 0~10->1
  4000: 4000~96->1

after a hole (10~3990) has been punched, and we may need to piece the
mapping back together. I think we will need most of this complexity
(e.g., merging adjacent lextents that map to adjacent regions of the same
blob) anyway.
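A minimal sketch of the merge step mentioned above, using the same hypothetical lextent_sketch/lextent_map shape (logical offset to blob, blob_offset, length). Two neighbours are merged only when they are contiguous both logically and within the same blob; everything here is illustrative, not BlueStore's code.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical byte-granular lextent (same shape as a logical-to-blob mapping).
struct lextent_sketch {
  uint32_t blob;         // blob id
  uint32_t blob_offset;  // offset within the blob
  uint32_t length;       // bytes
};

using lextent_map = std::map<uint64_t, lextent_sketch>;  // keyed by logical offset

// Merge neighbours where the first ends exactly where the second begins,
// both in logical space and inside the same blob.
void merge_adjacent(lextent_map& m) {
  if (m.empty()) return;
  auto p = m.begin();
  auto n = std::next(p);
  while (n != m.end()) {
    bool contiguous =
        p->first + p->second.length == n->first &&
        p->second.blob == n->second.blob &&
        p->second.blob_offset + p->second.length == n->second.blob_offset;
    if (contiguous) {
      p->second.length += n->second.length;  // absorb n into p
      n = m.erase(n);
    } else {
      p = n;
      ++n;
    }
  }
}
```

For example, if the 10~3990 hole is later rewritten into the same region of blob 1, the three entries 0: 0~10->1, 10: 10~3990->1 and 4000: 4000~96->1 collapse back into a single 0: 0~4096->1. A write path would more likely merge only around the range it just touched rather than scanning the whole map.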

Hmm, there is probably some other downside, but right now I can't think of
a good reason not to do this. It'll basically put all of the onus on the
write code to do the right thing... which is probably a good thing.
Yes?
Also, one note on the WAL changes: we'll need to have any read portion of
a wal event include the raw pextents *and* the associated checksum(s).
This is because the events need to be idempotent and may overwrite the
read region, or interact with wal ops that come before/after.
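A rough sketch of what such a WAL event might carry, assuming the read side is described by raw physical extents plus one checksum per fixed-size chunk of the data read. wal_rmw_sketch, pextent_sketch, csum_sketch and verify_read are hypothetical names, and 32-bit FNV-1a stands in for whatever checksum the store is actually configured with (crc32c or similar would be the likely real choice).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical raw physical extent on the block device.
struct pextent_sketch {
  uint64_t offset;
  uint32_t length;
};

// Hypothetical read-modify-write WAL event: the read side names raw pextents
// (not logical offsets) and records checksums for that data, so replay does
// not depend on the logical mapping at replay time.
struct wal_rmw_sketch {
  std::vector<pextent_sketch> read_extents;
  uint32_t csum_chunk = 0;           // bytes covered by each checksum
  std::vector<uint32_t> read_csums;  // one per csum_chunk of read data
  // ... the write portion of the event would follow here
};

// Stand-in checksum: 32-bit FNV-1a.
uint32_t csum_sketch(const uint8_t* p, size_t n) {
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < n; ++i) {
    h ^= p[i];
    h *= 16777619u;
  }
  return h;
}

// Check the bytes read from read_extents (concatenated) against the checksums
// recorded in the event before using them.
bool verify_read(const wal_rmw_sketch& ev, const std::vector<uint8_t>& data) {
  if (ev.csum_chunk == 0) return false;
  size_t chunks = (data.size() + ev.csum_chunk - 1) / ev.csum_chunk;
  if (chunks != ev.read_csums.size()) return false;
  for (size_t i = 0; i < chunks; ++i) {
    size_t off = i * ev.csum_chunk;
    size_t n = std::min<size_t>(ev.csum_chunk, data.size() - off);
    if (csum_sketch(data.data() + off, n) != ev.read_csums[i]) return false;
  }
  return true;
}
```

How replay should react to a mismatch (for example, because an earlier pass of the same event already overwrote the read region) is exactly the idempotency question raised above, and the sketch deliberately leaves it open.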
sage