* question about snapset
@ 2017-02-02 19:04 sheng qiu
  2017-02-03 14:05 ` Sage Weil
  0 siblings, 1 reply; 5+ messages in thread
From: sheng qiu @ 2017-02-02 19:04 UTC
  To: ceph-devel

Hi cephers,

We are reading the code for the Ceph I/O path. We found that do_op()
fetches the object context on every call, as well as the snapset
context.

What is the snapset context used for, and what is the relationship
between the snapset context and an individual object? Does each object
have a snapset context? Is it stored as an attribute within the onode
associated with the object?
I read in some articles that only the head object has a snapset
context. What is the head object, and what is the relationship between
a head object and the other, non-head objects?

Thanks in advance.

Thanks,
Sheng


* Re: question about snapset
  2017-02-02 19:04 question about snapset sheng qiu
@ 2017-02-03 14:05 ` Sage Weil
       [not found]   ` <CAB7xdi=NFjC4W3XYSLPYFrs_vN18+TY-5F_UGzCe_TCweYHxdg@mail.gmail.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Sage Weil @ 2017-02-03 14:05 UTC
  To: sheng qiu; +Cc: ceph-devel

A given logical object may be contained by several snapshots, and we'll 
have a separate clone for each unique version of the object. The 
head (snapid == CEPH_NOSNAP) is the latest read/write version of the 
object.  A clone (snapid < CEPH_NOSNAP) is the version for some number of 
snapshots.  For example, if you wrote X, took snapshot 1, wrote X', wrote 
X'', took snap 2, took snap 3, then wrote X''', you'd have 2 clones plus 
the head, something like

X    (snapid 1) snaps=[1]
X''  (snapid 3) snaps=[2,3]
X''' (head) clones=[1,3]
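To make the copy-on-write step concrete, here is a minimal sketch that replays a write/snapshot sequence against one logical object. It assumes a simplified model in which the first write after one or more new snaps freezes the current head into a clone named after the newest of those snaps, and the head records the newest snap it already accounts for in `seq`. `SnapSetModel`, `LogicalObject`, `SnapContext`, and `apply_write` are hypothetical names; the fields only loosely echo Ceph's SnapSet, and deletes, whiteouts, and data-overlap tracking are not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model; field names loosely echo Ceph's SnapSet.
struct SnapSetModel {
  uint64_t seq = 0;              // newest snap the head already accounts for
  std::vector<uint64_t> clones;  // clone ids, oldest first
  std::map<uint64_t, std::vector<uint64_t>> clone_snaps;  // clone id -> snaps it serves
};

struct LogicalObject {
  bool exists = false;
  std::string head;                            // the head (CEPH_NOSNAP) data
  std::map<uint64_t, std::string> clone_data;  // clone id -> frozen data
  SnapSetModel ss;                             // kept with the head
};

// Snap ids taken so far, oldest first (stand-in for the write's snap context).
using SnapContext = std::vector<uint64_t>;

// Copy-on-write: the first write after one or more new snaps freezes the
// current head into a clone named after the newest of those snaps.
void apply_write(LogicalObject& obj, const SnapContext& snapc,
                 const std::string& data) {
  std::vector<uint64_t> newer;
  for (uint64_t s : snapc)
    if (s > obj.ss.seq) newer.push_back(s);

  if (obj.exists && !newer.empty()) {
    uint64_t clone_id = newer.back();
    // Clone ids must grow monotonically so lookups can binary-search them.
    assert(obj.ss.clones.empty() || clone_id > obj.ss.clones.back());
    obj.clone_data[clone_id] = obj.head;
    obj.ss.clones.push_back(clone_id);
    obj.ss.clone_snaps[clone_id] = newer;
  }
  if (!snapc.empty()) obj.ss.seq = snapc.back();
  obj.head = data;
  obj.exists = true;
}
```

Replaying the example (write X; snap 1; write X' then X''; snaps 2 and 3; write X''') leaves clones {1, 3}, clone_snaps[3] == {2, 3}, and a head of X'''. X' is never preserved, because it was overwritten before any snapshot captured it.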

The SnapSet is attached to the head and tells us we have 2 clones (1 and 
3).  Each clone has a snaps vector that tells us which snaps it exists in.  
There's some other bookkeeping in SnapSet as well that tells us what data 
extents are identical across adjacent clones (so that they can share 
blocks on disk efficiently).
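The clone list and per-clone snaps vectors are what let a read at a given snap find the right object. A minimal sketch of that lookup, assuming an ascending clone list, a snaps vector per clone, and a `seq` field holding the newest snap the head already accounts for (`SnapSetView`, `resolve_read`, and that field layout are illustrative assumptions, not Ceph's API): take the first clone whose id is at least the requested snap, then confirm that clone's snaps vector actually contains it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

constexpr uint64_t CEPH_NOSNAP = static_cast<uint64_t>(-2);  // the head

// Hypothetical minimal SnapSet shape for illustration.
struct SnapSetView {
  uint64_t seq;                                           // newest snap covered by head
  std::vector<uint64_t> clones;                           // ascending clone ids
  std::map<uint64_t, std::vector<uint64_t>> clone_snaps;  // clone -> snaps
};

// Which version serves a read at `snap`: a clone id, CEPH_NOSNAP for the
// head, or std::nullopt if the object did not exist in that snapshot.
std::optional<uint64_t> resolve_read(const SnapSetView& ss, uint64_t snap) {
  if (snap > ss.seq) return CEPH_NOSNAP;  // head unchanged since that snap
  assert(std::is_sorted(ss.clones.begin(), ss.clones.end()));
  auto it = std::lower_bound(ss.clones.begin(), ss.clones.end(), snap);
  if (it == ss.clones.end()) return std::nullopt;
  const std::vector<uint64_t>& snaps = ss.clone_snaps.at(*it);
  if (std::find(snaps.begin(), snaps.end(), snap) == snaps.end())
    return std::nullopt;  // object absent in this snap
  return *it;
}
```

With the example's values (seq 3, clones {1, 3}), snap 1 resolves to clone 1, snaps 2 and 3 to clone 3, and anything newer than 3 to the head.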

There's one annoying oddity: if the head doesn't logically exist, we
create an object with snapid CEPH_SNAPDIR and attach the SnapSet to
that.  We hope to remove this soon (by storing SnapSet on head and marking 
head as a whiteout) as it complicates the code.
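Both reserved snap ids sit at the very top of the 64-bit snapid space (in Ceph's headers CEPH_NOSNAP is (u64)-2 and CEPH_SNAPDIR is (u64)-1), so they never collide with a real snapshot id. A tiny sketch of where the SnapSet lives under the scheme described above; `snapset_holder` and `is_clone` are hypothetical helpers, not Ceph functions. Under the proposed whiteout-head design, `snapset_holder` would presumably always return CEPH_NOSNAP.

```cpp
#include <cstdint>

// Reserved snap ids, matching the values in Ceph's rados.h.
constexpr uint64_t CEPH_NOSNAP = static_cast<uint64_t>(-2);   // head
constexpr uint64_t CEPH_SNAPDIR = static_cast<uint64_t>(-1);  // snapdir

// Hypothetical: which object version carries the SnapSet today.
constexpr uint64_t snapset_holder(bool head_exists) {
  return head_exists ? CEPH_NOSNAP : CEPH_SNAPDIR;
}

// Any snapid below CEPH_NOSNAP names a clone.
constexpr bool is_clone(uint64_t snapid) { return snapid < CEPH_NOSNAP; }
```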

Hope that helps!
sage


* Fwd: question about snapset
       [not found]   ` <CAB7xdi=NFjC4W3XYSLPYFrs_vN18+TY-5F_UGzCe_TCweYHxdg@mail.gmail.com>
@ 2017-02-03 17:02     ` sheng qiu
  2017-02-03 18:07       ` Sage Weil
  0 siblings, 1 reply; 5+ messages in thread
From: sheng qiu @ 2017-02-03 17:02 UTC
  To: ceph-devel

---------- Forwarded message ----------
From: sheng qiu <herbert1984106@gmail.com>
Date: Fri, Feb 3, 2017 at 7:45 AM
Subject: Re: question about snapset
To: Sage Weil <sage@newdream.net>
Cc: ceph-devel <ceph-devel@vger.kernel.org>


Hi Sage,

Thanks a lot for your reply. It's very helpful.
We are trying to avoid querying the snapset object when writing a new
object, since we think that may save some latency. Currently, when we
measure the op prepare latency, it's around 300 us, which seems quite
high. Is this normal? In our test, we configure 64 PGs per OSD and use
five shards with two workers per shard. The test machine is a
four-socket server with 40 cores and plenty of memory.
We are thinking about how to reduce this latency; do you have any
suggestions?

Thanks
Sheng


* Re: Fwd: question about snapset
  2017-02-03 17:02     ` Fwd: " sheng qiu
@ 2017-02-03 18:07       ` Sage Weil
  2017-02-05 23:28         ` Ming Lin
  0 siblings, 1 reply; 5+ messages in thread
From: Sage Weil @ 2017-02-03 18:07 UTC
  To: sheng qiu; +Cc: ceph-devel

There are several sources, and this is an active area of 
investigation and optimization.  I'm not sure the snapset specifically 
is a big part of the problem; it's likely the overall work involved in 
get_object_context(), which fetches the attributes from the object.  The 
snapset will be a small part of this.

I suggest joining the weekly performance call if you can.  Or we can 
discuss some of the specific efforts on the list.  The main efforts here 
are

- simplifying ms_fast_dispatch so that incoming messages get queued more 
quickly
- making the new BlueStore (ObjectStore implementation) faster
- a big planned refactor for most of the do_op work that happens in 
between.
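As a rough illustration of why the dispatch side should do as little as possible, here is a hedged sketch of a sharded op queue of the kind implied by a setup with five shards and two workers per shard: the dispatch thread only maps the op's PG to a shard and appends it, and the expensive work (do_op(), get_object_context()) happens later on a worker. `QueuedOp`, `ShardedOpQueue`, and the modulo mapping are hypothetical; this is not Ceph's actual queue implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical op record: just enough to route it.
struct QueuedOp {
  uint32_t pg_seed;  // placement-group hash seed (illustrative)
  uint64_t tid;      // client transaction id
};

// Illustrative sharded op queue, not Ceph's real implementation.
class ShardedOpQueue {
 public:
  explicit ShardedOpQueue(std::size_t num_shards) : shards_(num_shards) {
    assert(num_shards > 0);  // shard_of() divides by the shard count
  }

  // All ops for one PG map to one shard, so per-PG FIFO order is kept.
  std::size_t shard_of(uint32_t pg_seed) const {
    return pg_seed % shards_.size();
  }

  // Dispatch-side fast path: pick a shard, lock it briefly, append.
  void enqueue(const QueuedOp& op) {
    Shard& s = shards_[shard_of(op.pg_seed)];
    std::lock_guard<std::mutex> lock(s.mu);
    s.ops.push_back(op);
  }

  // Worker side: each shard's worker threads would call this in a loop.
  bool try_dequeue(std::size_t shard, QueuedOp* out) {
    Shard& s = shards_.at(shard);
    std::lock_guard<std::mutex> lock(s.mu);
    if (s.ops.empty()) return false;
    *out = s.ops.front();
    s.ops.pop_front();
    return true;
  }

 private:
  struct Shard {
    std::mutex mu;
    std::deque<QueuedOp> ops;
  };
  std::vector<Shard> shards_;
};
```

With more than one worker per shard, a real implementation also needs some per-PG exclusion so two workers never run ops for the same PG concurrently; the sketch only preserves enqueue order.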

That's pretty vague, but it's going to be a big project.  Right now we're 
trying to remove as much legacy complexity as possible first, to make our 
lives a bit easier...

sage


* Re: Fwd: question about snapset
  2017-02-03 18:07       ` Sage Weil
@ 2017-02-05 23:28         ` Ming Lin
  0 siblings, 0 replies; 5+ messages in thread
From: Ming Lin @ 2017-02-05 23:28 UTC
  To: Sage Weil; +Cc: sheng qiu, ceph-devel

On Fri, Feb 3, 2017 at 10:07 AM, Sage Weil <sage@newdream.net> wrote:
> - simplifying ms_fast_dispatch so that incoming messages get queued more
> quickly

Looking at the call stack below, could you share more detail on which
parts you plan to simplify?

(gdb) bt
#0  PG::queue_op ()
#1  OSD::enqueue_op ()
#2  OSD::handle_op ()
#3  OSD::dispatch_op_fast ()
#4  OSD::dispatch_session_waiting ()
#5  OSD::ms_fast_dispatch ()
#6  Messenger::ms_fast_dispatch ()
#7  DispatchQueue::fast_dispatch ()
#8  AsyncConnection::process ()
#9  EventCenter::process_events ()
#10 NetworkStack::<lambda()>::operator()
#12 start_thread ()
#13 clone ()
