* Snapshots of consistency groups
@ 2016-08-15 23:46 Victor Denisov
  2016-08-16 13:26 ` Jason Dillaman
From: Victor Denisov @ 2016-08-15 23:46 UTC (permalink / raw)
  To: ceph-devel, Jason Dillaman, Josh Durgin, Mykola Golub

Gentlemen,

I'm writing to ask for your opinion on quiescing writes.

Here is the situation. In order to take snapshots of all images in a
consistency group, we first need to quiesce all the image writers in
the group. Let me define two terms:
"group client" - the client that asks a consistency group to take a
snapshot;
"image client" - a client that writes to an image.
Suppose the group client starts sending notify_quiesce to all the
image clients that write to images in the group, and then dies after
quiescing half of them.

This presents us with a dilemma: what should we do with the quiesced
image clients?

Option 1 - wait until someone manually runs a recover command for that
consistency group. We could show a warning next to such unfinished
groups when the user runs the group list command. There would be a
command like group recover that lets users roll back unsuccessful
snapshots or complete them using the create snapshot command.

Option 2 - establish heartbeats between the group client and the image
clients. If the group client fails to heartbeat, an image client
unquiesces itself and resumes normal operation.

Option 3 - give each image client a timeout. If the group client fails
to take the group snapshot within the timeout, the image client
resumes normal operation and informs the group client of that fact.

Which of these options do you prefer? There may be other options that
I've missed.

Thanks,
Victor.


* Re: Snapshots of consistency groups
  2016-08-15 23:46 Snapshots of consistency groups Victor Denisov
@ 2016-08-16 13:26 ` Jason Dillaman
  2016-08-16 13:29   ` Jason Dillaman
From: Jason Dillaman @ 2016-08-16 13:26 UTC (permalink / raw)
  To: Victor Denisov; +Cc: ceph-devel, Josh Durgin, Mykola Golub

Way back in April when we had the CDM, I was originally thinking we
should implement option 3. Essentially, you would have a prepare group
snapshot RPC message that extends a "paused IO" lease to the caller.
When that lease expires, IO would automatically resume even if the
group snapshot hasn't been created yet. This would also require
commit/abort group snapshot RPC messages.
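
As a rough sketch of what that lease could look like on the image
client side (all names below are placeholders, not existing librbd
types):

```
#include <chrono>

// Hedged sketch of the "paused IO" lease; placeholder types, not
// actual librbd RPC messages.
struct QuiesceLease {
  std::chrono::steady_clock::time_point expiry;

  // Called when a prepare-group-snapshot RPC arrives (or is renewed).
  void grant(std::chrono::seconds ttl) {
    expiry = std::chrono::steady_clock::now() + ttl;
  }

  // Polled by the image client's IO path: once the lease lapses
  // without a commit/abort RPC, paused writes resume automatically.
  bool expired() const {
    return std::chrono::steady_clock::now() >= expiry;
  }
};
```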

However, thinking about this last night, here is another potential option:

Option 4 - require images to have the exclusive-lock feature before
they can be added to a consistency group (and prevent disabling of
exclusive-lock while they are part of a group). Then librbd, via the
rbd CLI (or a client application using the rbd consistency group snap
create API), can cooperatively acquire the lock from all active image
clients within the group (i.e. all IO has been flushed and paused) and
can proceed with snapshot creation. If the rbd CLI dies, the normal
exclusive-lock handling process will automatically take care of
re-acquiring the lock from the dead client and resuming IO.

This option not only reuses existing code, it would also eliminate
the need to add/update the RPC messages for prepare/commit/abort
snapshot creation to support group snapshots (since it could all be
handled internally).
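
To make the flow concrete, here is a minimal sketch of what option 4
implies (GroupCtx and the lock/snapshot helpers below are placeholders,
not the actual librbd API):

```
#include <string>
#include <vector>

struct ImageCtx;                             // stand-in for librbd::ImageCtx
struct GroupCtx { std::vector<ImageCtx*> images; };

// Placeholder primitives for the steps described above.
int acquire_exclusive_lock(ImageCtx *ictx);  // flushes and pauses the writer
int snap_create(ImageCtx *ictx, const std::string &snap_name);
void release_exclusive_lock(ImageCtx *ictx);

int group_snap_create(GroupCtx *gctx, const std::string &snap_name) {
  // Cooperatively take the exclusive lock on every image in the group.
  for (ImageCtx *ictx : gctx->images) {
    int r = acquire_exclusive_lock(ictx);
    if (r < 0)
      return r;  // if this process dies, normal lock recovery resumes IO
  }
  // All writers are paused, so the snapshots share one point in time.
  for (ImageCtx *ictx : gctx->images) {
    int r = snap_create(ictx, snap_name);
    if (r < 0)
      return r;
  }
  for (ImageCtx *ictx : gctx->images)
    release_exclusive_lock(ictx);  // original writers re-acquire and resume
  return 0;
}
```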

On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
> [quoted text trimmed]



-- 
Jason


* Re: Snapshots of consistency groups
  2016-08-16 13:26 ` Jason Dillaman
@ 2016-08-16 13:29   ` Jason Dillaman
  2016-08-18 23:26     ` Victor Denisov
From: Jason Dillaman @ 2016-08-16 13:29 UTC (permalink / raw)
  To: Victor Denisov; +Cc: ceph-devel, Josh Durgin, Mykola Golub

... one more thing:

I was also thinking that we need a new RBD feature bit to indicate
that an image is part of a consistency group, to prevent older librbd
clients from removing the image or group snapshots. This could be an
RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can still
open the image R/O while it's part of a group.
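
For illustration, a hypothetical definition mirroring how the existing
feature bits are declared (the name and bit position here are
assumptions, not an actual definition):

```
// Hypothetical new feature bit; bit 7 assumed to be the next free one
// (layering=0 ... journaling=6 at the time of writing).
#define RBD_FEATURE_CONSISTENCY_GROUP (1ULL << 7)

// Folding it into the RW-incompatible mask: clients that do not know
// the bit can still open the image read-only, but not read/write.
#define RBD_FEATURES_RW_INCOMPATIBLE_WITH_GROUPS \
        (RBD_FEATURES_RW_INCOMPATIBLE | RBD_FEATURE_CONSISTENCY_GROUP)
```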

On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> wrote:
> [quoted text trimmed]



-- 
Jason


* Re: Snapshots of consistency groups
  2016-08-16 13:29   ` Jason Dillaman
@ 2016-08-18 23:26     ` Victor Denisov
       [not found]       ` <CA+aFP1BMK1=daZpUFdNDJ+i8+nfQt42ahCXVaxy9fWdAY-kXwA@mail.gmail.com>
From: Victor Denisov @ 2016-08-18 23:26 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: ceph-devel, Josh Durgin, Mykola Golub

If an image already has a writer who owns the lock, should I implement
a notification that asks the writer to release the lock, or is there
already a standard way to take over the exclusive lock?

On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> wrote:
> [quoted text trimmed]


* Re: Snapshots of consistency groups
       [not found]       ` <CA+aFP1BMK1=daZpUFdNDJ+i8+nfQt42ahCXVaxy9fWdAY-kXwA@mail.gmail.com>
@ 2016-08-19  4:20         ` Victor Denisov
  2016-08-19 12:48           ` Mykola Golub
From: Victor Denisov @ 2016-08-19  4:20 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: ceph-devel, Josh Durgin, Mykola Golub

Could you please point me to the place in the source code where a
writer acquires an exclusive lock on an image?
I presume you were talking about the exclusive_lock feature (as
opposed to shared locks), which can be managed from the command line
using the lock list and lock break commands.

On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote:
> There is already a "request lock" RPC message, and this is handled
> transparently within librbd when you attempt to acquire the lock and
> another client owns it.
>
>
> On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote:
>> [quoted text trimmed]


* Re: Snapshots of consistency groups
  2016-08-19  4:20         ` Victor Denisov
@ 2016-08-19 12:48           ` Mykola Golub
  2016-08-20  0:36             ` Victor Denisov
From: Mykola Golub @ 2016-08-19 12:48 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Jason Dillaman, ceph-devel, Josh Durgin

On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote:
> Could you please point me to the place in the source code where a
> writer acquires an exclusive lock on an image?

Grep for 'exclusive_lock->request_lock'. Basically, what you need
(after opening the image) is:

```
  C_SaferCond lock_ctx;
  {
    RWLock::WLocker l(ictx->owner_lock);

    if (ictx->exclusive_lock == nullptr) {
      // the exclusive-lock feature is not enabled
      return -EINVAL;
    }

    // Request the lock. If it is currently owned by another client, an
    // RPC message will be sent asking that client to release the lock.
    ictx->exclusive_lock->request_lock(&lock_ctx);
  } // release owner_lock before waiting, to avoid a potential deadlock

  int r = lock_ctx.wait();
  if (r < 0) {
    return r;
  }

  RWLock::RLocker l(ictx->owner_lock);
  if (ictx->exclusive_lock == nullptr ||
      !ictx->exclusive_lock->is_lock_owner()) {
    // failed to acquire the exclusive lock
    return -EROFS;
  }

  // At this point the lock is acquired.
  ...

```

You might want to look at this PR

 https://github.com/ceph/ceph/pull/9592

where we discuss adding API methods to directly acquire and release
the exclusive lock. You don't need the API itself, but you will find
examples in the patch, along with useful comments from Jason.

-- 
Mykola Golub



* Re: Snapshots of consistency groups
  2016-08-19 12:48           ` Mykola Golub
@ 2016-08-20  0:36             ` Victor Denisov
  2016-08-20 16:27               ` Mykola Golub
From: Victor Denisov @ 2016-08-20  0:36 UTC (permalink / raw)
  To: Mykola Golub; +Cc: Jason Dillaman, ceph-devel, Josh Durgin

What if I'm holding the lock and somebody else tries to acquire it?
How do I get notified about that?


On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote:
> [quoted text trimmed]

* Re: Snapshots of consistency groups
  2016-08-20  0:36             ` Victor Denisov
@ 2016-08-20 16:27               ` Mykola Golub
  2016-08-26 21:05                 ` Victor Denisov
From: Mykola Golub @ 2016-08-20 16:27 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Jason Dillaman, ceph-devel, Josh Durgin

On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote:
> What if I'm holding the lock and somebody else tries to acquire it?
> How do I get notified about that?

The image watcher is notified, which triggers its handler:

 ImageWatcher<I>::handle_payload(const RequestLockPayload &, C_NotifyAck *ack_ctx)

The handler calls the current lock policy method `lock_requested()`,
which defines what to do with the lock request. The StandardPolicy is
to release the lock, so the lock may ping-pong between clients. You
may define a different policy -- rbd-mirror is an example where a
custom policy is used.
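
For the group-snapshot case, a policy that simply declines to release
the lock could look roughly like this sketch (the exact Policy
interface is from memory and may differ):

```
#include "librbd/exclusive_lock/Policy.h"

namespace librbd {
namespace exclusive_lock {

// Hedged sketch: unlike StandardPolicy (which schedules a release so
// the lock can move to the requester), this policy ignores peer
// requests, so the lock -- and the paused IO -- stays put until the
// holder releases it explicitly.
struct RetainLockPolicy : public Policy {
  void lock_requested(bool force) override {
    // deliberately a no-op
  }
};

} // namespace exclusive_lock
} // namespace librbd
```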

Everywhere an operation needs the exclusive lock, we first check
whether we are currently the lock owner, i.e.:

 ictx->exclusive_lock->is_lock_owner()

and if that is false, the exclusive lock is requested. Before this
check you need to acquire ictx->owner_lock, and until you release
owner_lock you can be sure the exclusive lock will not leak to
another client. After releasing owner_lock, you will need to repeat
the check when you next need the lock.
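
In code, that re-check pattern looks roughly like this
(do_protected_operation is a placeholder for whatever needs the lock):

```
{
  RWLock::RLocker owner_locker(ictx->owner_lock);
  if (ictx->exclusive_lock != nullptr &&
      ictx->exclusive_lock->is_lock_owner()) {
    // While owner_lock is held, the exclusive lock cannot leak to
    // another client, so this operation is safe.
    do_protected_operation(ictx);
  }
}
// owner_lock released: ownership must be re-checked before the next use.
```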

-- 
Mykola Golub



* Re: Snapshots of consistency groups
  2016-08-20 16:27               ` Mykola Golub
@ 2016-08-26 21:05                 ` Victor Denisov
  2016-08-29  0:37                   ` Jason Dillaman
From: Victor Denisov @ 2016-08-26 21:05 UTC (permalink / raw)
  To: Mykola Golub; +Cc: Jason Dillaman, ceph-devel, Josh Durgin

Guys,

I updated the Snapshots section of this document:
http://pad.ceph.com/p/consistency_groups, in accordance with my
improved understanding of how it should be implemented.
Please take a look and provide your comments. I highlighted some of
my concerns about the implementation in bold.

Looking forward to your valuable remarks.
Thanks in advance.
V.


On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@mirantis.com> wrote:
> [quoted text trimmed]


* Re: Snapshots of consistency groups
  2016-08-26 21:05                 ` Victor Denisov
@ 2016-08-29  0:37                   ` Jason Dillaman
  2016-08-29 21:10                     ` Victor Denisov
From: Jason Dillaman @ 2016-08-29  0:37 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Mykola Golub, ceph-devel, Josh Durgin

I think the first step is to implement the concept of snapshot "namespaces".

This could be implemented as an optional variant structure associated
with each snapshot at creation (see the ImageWatcher RPC messages or
the journaling event type encoding for examples of this). For
consistency group snapshots, this structure would identify the
snapshot as belonging to the consistency group and carry a unique id
linking back to the specific group snapshot.
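
As a rough sketch of such a variant (placeholder names, loosely
modeled on the existing boost::variant payload encodings; none of
these types exist yet):

```
#include <cstdint>
#include <string>
#include <boost/variant.hpp>

struct UserSnapshotNamespace {
  // an ordinary user-created snapshot carries no extra linkage
};

struct GroupSnapshotNamespace {
  int64_t group_pool_id;          // pool containing the group object
  std::string group_id;           // consistency group the snapshot belongs to
  std::string group_snapshot_id;  // unique id back to the group snapshot record
};

using SnapshotNamespace = boost::variant<UserSnapshotNamespace,
                                         GroupSnapshotNamespace>;
```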

When creating a snapshot, the state machine would (1) create the group
snapshot record, (2) set the state of the group to "creating snapshot"
(to prevent images from being added to or removed from the group while
the operation is in progress), (3) acquire the lock for all images in
the group, (4) create the individual image snapshots with the linkage
back to the group snapshot record (which can be performed in parallel,
up to the maximum number of concurrent ops), (5) release the exclusive
locks, and (6) reset the group status to "ready".
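
A hedged sketch of that sequence, extending the option 4 sketch from
earlier in the thread (every type and helper below is a placeholder,
not an existing librbd function):

```
#include <string>
#include <vector>

struct ImageCtx;  // stand-in for librbd::ImageCtx
struct GroupCtx { std::vector<ImageCtx*> images; };
enum GroupState { GROUP_STATE_READY, GROUP_STATE_CREATING_SNAPSHOT };

// Placeholder primitives for steps (1)-(6) above.
int create_group_snap_record(GroupCtx*, const std::string&, std::string *id);
int set_group_state(GroupCtx*, GroupState);
int acquire_exclusive_lock(ImageCtx*);
int snap_create_in_group_namespace(ImageCtx*, const std::string &snap_name,
                                   const std::string &group_snap_id);
void release_exclusive_lock(ImageCtx*);

int group_snap_create(GroupCtx *gctx, const std::string &snap_name) {
  std::string group_snap_id;
  int r = create_group_snap_record(gctx, snap_name, &group_snap_id);  // (1)
  if (r < 0)
    return r;
  r = set_group_state(gctx, GROUP_STATE_CREATING_SNAPSHOT);           // (2)
  if (r < 0)
    return r;
  for (ImageCtx *ictx : gctx->images) {                               // (3)
    if ((r = acquire_exclusive_lock(ictx)) < 0)
      return r;  // a later "snap remove" cleans up any half-done state
  }
  for (ImageCtx *ictx : gctx->images) {                               // (4)
    r = snap_create_in_group_namespace(ictx, snap_name, group_snap_id);
    if (r < 0)
      return r;
  }
  for (ImageCtx *ictx : gctx->images)                                 // (5)
    release_exclusive_lock(ictx);
  return set_group_state(gctx, GROUP_STATE_READY);                    // (6)
}
```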

If you have a hard crash/failure anywhere, a "snap remove" operation
should be designed to get the group back into a consistent state (i.e.
remove any snapshots linked to the group and reset the group state
back to "ready").

On Fri, Aug 26, 2016 at 5:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
> Guys,
>
> I updated Snapshots section of this document:
> http://pad.ceph.com/p/consistency_groups, in accordance with my
> improved understanding of how it should be implemented.
> Please take a look and provide your comments. Some of my concerns
> regarding the implementation I highlighted in bold.
>
> Looking forward to your valuable remarks.
> Thanks in advance.
> V.
>
>
> On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@mirantis.com> wrote:
>> On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote:
>>> What if I'm holding this lock and somebody else is trying to reacquire the lock.
>>> How do I get notified about it?
>>
>> The image watcher is notified, which triggers its handler:
>>
>>  ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx)
>>
>> The handler calls the current lock policy method `lock_requested()`,
>> which will define what to do with the lock request. The StandartPolicy
>> is to release the lock, so it may ping-ponging between the
>> clients. You may define a different policy -- rbd-mirror is an example
>> where it is used.
>>
>> Everywhere where an operation needs the exclusive lock, it is always
>> checked if we currently are a lock owner, i.e:
>>
>>  ictx->exclusive_lock->is_lock_owner()
>>
>> and if it is false, the exlusive lock is requested. Before this check
>> you need to aquire ctx->owner_lock, and until you release owner_lock
>> you can be sure your exclusive lock will not leak to another
>> client. After releasing owner_lock, you will need to repeate the check
>> again when you need it.
>>
>> --
>> Mykola Golub



-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-08-29  0:37                   ` Jason Dillaman
@ 2016-08-29 21:10                     ` Victor Denisov
  2016-09-09 22:41                       ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2016-08-29 21:10 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin

Right, I forgot about snapshot "namespaces". I'll add this part.
I guess it makes sense to discuss the whole thing at the next CDM.
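
To make sure I read it right, here is a rough sketch of the creation
flow you describe below. Every type and helper name in it is made up
for illustration; none of this is existing librbd code:

```
// Sketch only: hypothetical helpers; step numbers refer to Jason's (1)-(6).
#include <string>
#include <vector>

struct GroupCtx;   // stand-in for whatever context the group ops will carry
struct ImageCtx;   // stand-in for librbd::ImageCtx

enum GroupState { GROUP_STATE_READY, GROUP_STATE_CREATING_SNAPSHOT };

int create_group_snap_record(GroupCtx &group, const std::string &name,
                             std::string *group_snap_id);                // (1)
int set_group_state(GroupCtx &group, GroupState state);                  // (2), (6)
std::vector<ImageCtx*> open_group_images(GroupCtx &group);
int acquire_exclusive_lock(ImageCtx *ictx);                              // (3)
int create_image_snap(ImageCtx *ictx, const std::string &group_snap_id); // (4)
void release_exclusive_lock(ImageCtx *ictx);                             // (5)

int group_snap_create(GroupCtx &group, const std::string &snap_name) {
  std::string group_snap_id;
  int r = create_group_snap_record(group, snap_name, &group_snap_id);
  if (r < 0) return r;
  r = set_group_state(group, GROUP_STATE_CREATING_SNAPSHOT);
  if (r < 0) return r;
  std::vector<ImageCtx*> images = open_group_images(group);
  for (ImageCtx *ictx : images) {
    if ((r = acquire_exclusive_lock(ictx)) < 0) return r;
  }
  for (ImageCtx *ictx : images) {   // could run in parallel up to a cap
    if ((r = create_image_snap(ictx, group_snap_id)) < 0) return r;
  }
  for (ImageCtx *ictx : images) release_exclusive_lock(ictx);
  return set_group_state(group, GROUP_STATE_READY);
}
```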

On Sun, Aug 28, 2016 at 5:37 PM, Jason Dillaman <jdillama@redhat.com> wrote:
> I think the first step is to implement the concept of snapshot "namespaces".
>
> This could be implemented as an optional variant structure associated
> with each snapshot at creation (see the ImageWatcher RPC messages or
> journaling event type encoding for examples of this). For consistency
> group snapshots, this structure would identify the snapshot as
> belonging to the consistency group and have a unique id back to the
> specific group snapshot.
>
> When creating a snapshot, the state machine would (1) create the group
> snapshot record, (2) set the state of the group to "creating snapshot"
> (to prevent new images from being added/removed from the group while
> the op is in-progress), (3) acquire the lock for all images in the
> group, (4) create the individual image snapshots with the linkage back
> to the group snapshot record (can be performed in parallel up to max
> concurrent ops), (5) release the exclusive locks, and (6) reset the
> group status to "ready".
>
> If you have a hard crash/failure anywhere, a "snap remove" operation
> should be designed to get the group back into consistent state (i.e.
> remove any snapshots linked to the group and reset the group state
> back to ready).
>
> On Fri, Aug 26, 2016 at 5:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>> Guys,
>>
>> I updated Snapshots section of this document:
>> http://pad.ceph.com/p/consistency_groups, in accordance with my
>> improved understanding of how it should be implemented.
>> Please take a look and provide your comments. Some of my concerns
>> regarding the implementation I highlighted in bold.
>>
>> Looking forward to your valuable remarks.
>> Thanks in advance.
>> V.
>>
>>
>> On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@mirantis.com> wrote:
>>> On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote:
>>>> What if I'm holding this lock and somebody else is trying to reacquire the lock?
>>>> How do I get notified about it?
>>>
>>> The image watcher is notified, which triggers its handler:
>>>
>>>  ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx)
>>>
>>> The handler calls the current lock policy method `lock_requested()`,
>>> which defines what to do with the lock request. The StandardPolicy
>>> is to release the lock, so the lock may ping-pong between the
>>> clients. You may define a different policy -- rbd-mirror is an example
>>> where it is used.
>>>
>>> Everywhere an operation needs the exclusive lock, it always checks
>>> whether we are currently the lock owner, i.e.:
>>>
>>>  ictx->exclusive_lock->is_lock_owner()
>>>
>>> and if it is false, the exclusive lock is requested. Before this check
>>> you need to acquire ictx->owner_lock, and until you release owner_lock
>>> you can be sure your exclusive lock will not leak to another
>>> client. After releasing owner_lock, you will need to repeat the check
>>> again when you need it.
>>>
>>> --
>>> Mykola Golub
>>>
>>>>
>>>>
>>>> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote:
>>>> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote:
>>>> >> Could you please point me to the place in source code where writer
>>>> >> acquires an exclusive lock on the image.
>>>> >
>>>> > Grep for 'exclusive_lock->request_lock'. Basically, what you need
>>>> > (after opening the image) is:
>>>> >
>>>> > ```
>>>> >   C_SaferCond lock_ctx;
>>>> >   {
>>>> >     RWLock::WLocker l(ictx->owner_lock);
>>>> >
>>>> >     if (ictx->exclusive_lock == nullptr) {
>>>> >       // exclusive-lock feature is not enabled
>>>> >       return -EINVAL;
>>>> >     }
>>>> >
>>>> >     // Request the lock. If it is currently owned by another client,
>>>> >     // RPC message will be sent to the client to release the lock.
>>>> >     ictx->exclusive_lock->request_lock(&lock_ctx);
>>>> >   } // release owner_lock before waiting to avoid potential deadlock
>>>> >
>>>> >   int r = lock_ctx.wait();
>>>> >   if (r < 0) {
>>>> >     return r;
>>>> >   }
>>>> >
>>>> >   RWLock::RLocker l(ictx->owner_lock);
>>>> >   if (ictx->exclusive_lock == nullptr || !ictx->exclusive_lock->is_lock_owner()) {
>>>> >        // failed to acquire exclusive lock
>>>> >        return -EROFS;
>>>> >   }
>>>> >
>>>> >   // At this point lock is acquired
>>>> >   ...
>>>> >
>>>> > ```
>>>> >
>>>> > You might want to look at this PR
>>>> >
>>>> >  https://github.com/ceph/ceph/pull/9592
>>>> >
>>>> > where we discuss adding API methods to directly acquire and release
>>>> > the exclusive lock. You don't need the API, but you will find examples in
>>>> > the patch, and also useful comments from Jason.
>>>> >
>>>> > --
>>>> > Mykola Golub
> --
> Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-08-29 21:10                     ` Victor Denisov
@ 2016-09-09 22:41                       ` Victor Denisov
  2016-09-10 18:37                         ` Jason Dillaman
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2016-09-09 22:41 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin

I have a question about where the SnapshotNamespace type should be placed.
I placed it in cls/rbd/cls_rbd_types.h because both the cls client and the
cls backend components need access to this type.
The type is also required in librbd/Operations.cc, because we want to
specify in which namespace Operations::snap_create should create
snapshots.
However, Operations.cc doesn't include cls_rbd_types.h right now. If the
question were about the public interface of librbd/librbd.cc, I would
create a duplicate of the SnapshotNamespace type in the librbd layer
without hesitation.
But these functions are internal, so my question is whether it's really
justified to create another SnapshotNamespace type in the librbd layer.
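
For concreteness, the shape I have in mind is roughly the following --
a sketch only: the field names are still open for discussion and the
encode/decode plumbing is omitted:

```
// cls/rbd/cls_rbd_types.h (sketch)
#include <cstdint>
#include <string>
#include <boost/variant.hpp>

namespace cls {
namespace rbd {

struct UserSnapshotNamespace {
  // an ordinary user-created snapshot: no extra linkage
};

struct GroupSnapshotNamespace {
  int64_t group_pool = -1;        // pool that holds the group object
  std::string group_id;           // id of the consistency group
  std::string group_snapshot_id;  // id of the specific group snapshot
};

typedef boost::variant<UserSnapshotNamespace,
                       GroupSnapshotNamespace> SnapshotNamespace;

} // namespace rbd
} // namespace cls
```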


On Mon, Aug 29, 2016 at 2:10 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
> Right, I forgot about snapshot "namespaces". I'll add this part.
> I guess it makes sense to discuss the whole thing at the next CDM.
>
> On Sun, Aug 28, 2016 at 5:37 PM, Jason Dillaman <jdillama@redhat.com> wrote:
>> I think the first step is to implement the concept of snapshot "namespaces".
>>
>> This could be implemented as an optional variant structure associated
>> with each snapshot at creation (see the ImageWatcher RPC messages or
>> journaling event type encoding for examples of this). For consistency
>> group snapshots, this structure would identify the snapshot as
>> belonging to the consistency group and have a unique id back to the
>> specific group snapshot.
>>
>> When creating a snapshot, the state machine would (1) create the group
>> snapshot record, (2) set the state of the group to "creating snapshot"
>> (to prevent new images from being added/removed from the group while
>> the op is in-progress), (3) acquire the lock for all images in the
>> group, (4) create the individual image snapshots with the linkage back
>> to the group snapshot record (can be performed in parallel up to max
>> concurrent ops), (5) release the exclusive locks, and (6) reset the
>> group status to "ready".
>>
>> If you have a hard crash/failure anywhere, a "snap remove" operation
>> should be designed to get the group back into consistent state (i.e.
>> remove any snapshots linked to the group and reset the group state
>> back to ready).
>>
>> --
>> Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-09-09 22:41                       ` Victor Denisov
@ 2016-09-10 18:37                         ` Jason Dillaman
  2016-09-11  1:46                           ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Dillaman @ 2016-09-10 18:37 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Mykola Golub, ceph-devel, Josh Durgin

Those are all internal classes -- the cls types are already
dependencies within the librbd internals. Feel free to add the
necessary include and use it directly from within librbd.
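
In other words, something as simple as this (a sketch -- the extra
namespace parameter on snap_create is the proposed change, not existing
API):

```
// librbd/Operations.cc (sketch): include the cls header and thread the
// namespace through to the snapshot-creation path.
#include "cls/rbd/cls_rbd_types.h"

template <typename I>
int Operations<I>::snap_create(const cls::rbd::SnapshotNamespace &snap_namespace,
                               const char *snap_name) {
  // ... existing snap_create logic, passing snap_namespace down to the cls
  // call that records the snapshot on the OSD ...
  return 0;  // placeholder
}
```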

On Fri, Sep 9, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
> I have a question about where the SnapshotNamespace type should be placed.
> I placed it in cls/rbd/cls_rbd_types.h because both the cls client and the
> cls backend components need access to this type.
> The type is also required in librbd/Operations.cc, because we want to
> specify in which namespace Operations::snap_create should create
> snapshots.
> However, Operations.cc doesn't include cls_rbd_types.h right now. If the
> question were about the public interface of librbd/librbd.cc, I would
> create a duplicate of the SnapshotNamespace type in the librbd layer
> without hesitation.
> But these functions are internal, so my question is whether it's really
> justified to create another SnapshotNamespace type in the librbd layer.
>



-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-09-10 18:37                         ` Jason Dillaman
@ 2016-09-11  1:46                           ` Victor Denisov
  2016-09-12 13:18                             ` Jason Dillaman
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2016-09-11  1:46 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin

Another question. Maybe not really a question, but I would like to
verify that I understood what you wrote in the etherpad.

You suggest creating the image snapshots simultaneously.
If everything shuts down while we are making those individual
snapshots, then we end up with a SnapshotRecord in an incomplete state
and images either with or without their snapshots.
Do I understand correctly that if the user wants to remove this
unfinished group snapshot, we have to (sketched below):
 - list all images in this group
 - look for snapshots in those images with the guid as their name
 - delete those individual snapshots, ignoring errors if those
snapshots don't exist
 - delete the entry.
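
In code, I picture the cleanup roughly as follows (every helper name
here is invented, and I'm assuming the <guid>_<group_snap_id> naming I
mention below):

```
// Hypothetical cleanup of an unfinished group snapshot.
#include <cerrno>
#include <string>
#include <vector>

struct GroupCtx;
struct ImageCtx;

std::vector<ImageCtx*> list_group_images(GroupCtx &group);
int image_snap_remove(ImageCtx *ictx, const std::string &snap_name);
int remove_group_snap_record(GroupCtx &group, const std::string &group_snap_id);

int group_snap_cleanup(GroupCtx &group, const std::string &guid,
                       const std::string &group_snap_id) {
  const std::string snap_name = guid + "_" + group_snap_id;
  for (ImageCtx *ictx : list_group_images(group)) {
    int r = image_snap_remove(ictx, snap_name);
    if (r < 0 && r != -ENOENT) {  // a missing snapshot means we never got to it
      return r;
    }
  }
  return remove_group_snap_record(group, group_snap_id);  // delete the entry last
}
```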

One thing that I don't understand in this case: what if the user
decides to remove one of the images from the group while there are
dangling group snapshots? Let's call this image A.
The dangling group snapshot could have successfully created a
snapshot of image A. Let's call this snapshot A_snap.
Now, if we remove image A from the group, then when we try to clean up
the dangling group snapshot, the A_snap snapshot will be overlooked,
because image A is not a member of the group any more.
I don't see how we can prevent this from happening in this
approach, except by disallowing image removal while there are dangling
group snapshots.


You mentioned that we should name an image's individual snapshots after
the group's guid. I assume we should name them something like
<guid>_<group_snap_id>.
If we named them using the guid alone, we would only ever be able to
create one group snapshot per group.

Thanks,
V.


On Sat, Sep 10, 2016 at 11:37 AM, Jason Dillaman <jdillama@redhat.com> wrote:
> Those are all internal classes -- the cls types are already
> dependencies within the librbd internals. Feel free to add the
> necessary include and use it directly from within librbd.
>
> --
> Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-09-11  1:46                           ` Victor Denisov
@ 2016-09-12 13:18                             ` Jason Dillaman
  2016-09-12 22:52                               ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Dillaman @ 2016-09-12 13:18 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Mykola Golub, ceph-devel, Josh Durgin

On Sat, Sep 10, 2016 at 9:46 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
> Another question. Maybe not really a question, but I would like to
> verify that I understood what you wrote in the etherpad.
>
> You suggest creating the image snapshots simultaneously.
> If everything shuts down while we are making those individual
> snapshots, then we end up with a SnapshotRecord in an incomplete state
> and images either with or without their snapshots.
> Do I understand correctly that if the user wants to remove this
> unfinished group snapshot, we have to:
>  - list all images in this group
>  - look for snapshots in those images with the guid as their name
>  - delete those individual snapshots, ignoring errors if those
> snapshots don't exist
>  - delete the entry.

It would be the standard remove state machine, which is basically the
steps you have above. Note that you would always need to handle the
"-ENOENT" case, since an image can always be associated with a group
after a group snap was created (i.e. the new image doesn't have a
matching image snapshot for that group snapshot).

> One thing that I don't understand in this case: what if the user
> decides to delete one of the images while there are dangling group
> snapshots? Let's call this image A.
> This dangling group snapshot could have successfully created a
> snapshot of this image A. Let's call this snapshot A_snap.
> Now if we remove image A from this group, then when we try to clean up
> the dangling group snapshot, the A_snap snapshot will be overlooked,
> because image A is no longer a member of the group.
> And I don't understand how we can prevent this from happening in this
> approach, except by disallowing the removal of images while there are
> dangling group snapshots.

How is the snapshot dangling for image A? If it successfully created
the snapshot on image A, it has a snapshot record that associates it
to the group. Therefore, when the image is removed from the group, I
would think you would automatically delete the group snapshots
contained within the image.
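
The same idea in toy form for the removal path (again, hypothetical
names rather than real librbd calls): when an image leaves the group,
its group-named snapshots get purged first so nothing is left behind:

```
// Toy sketch: an image leaving the group drops its group snapshots.
#include <set>
#include <string>

struct Image {
  std::set<std::string> snaps;  // snapshot names held by the image
};

void remove_image_from_group(Image &image, const std::string &guid) {
  // Purge every snapshot named "<guid>_<group_snap_id>" before the
  // image's group membership is dropped, so a later group
  // "snap remove" has nothing left to miss.
  const std::string prefix = guid + "_";
  for (auto it = image.snaps.begin(); it != image.snaps.end();) {
    if (it->rfind(prefix, 0) == 0)   // name starts with "<guid>_"
      it = image.snaps.erase(it);    // drop the group's snapshot
    else
      ++it;
  }
  // ...the real implementation would now update the cls group
  // membership record to detach the image.
}
```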

> You mentioned that we should name the images' individual snapshots
> after the group's guid. I assume we should name them something like
> <guid>_<group_snap_id>.
> If we named them using only the guid, then we would only be able to
> create one group snapshot.

Yup -- that should be fine.

> Thanks,
> V.

-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-09-12 13:18                             ` Jason Dillaman
@ 2016-09-12 22:52                               ` Victor Denisov
  2016-09-12 23:06                                 ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2016-09-12 22:52 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin

Understood. Thank you, Jason.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-09-12 22:52                               ` Victor Denisov
@ 2016-09-12 23:06                                 ` Victor Denisov
  2016-09-12 23:09                                   ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2016-09-12 23:06 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin

Another quick question.
Do you think it makes sense to introduce the snapshot namespaces in a
separate pull request and review it first?
It looks like a self-contained change that we could merge before
introducing group snapshots.
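
For reference, a rough sketch of what such a namespace type might look
like; the boost::variant usage and every field name below are
assumptions for illustration, not the actual cls_rbd_types.h
definitions:

```
#include <boost/variant.hpp>
#include <cstdint>
#include <string>

// A snapshot created directly by a user, outside any group.
struct UserSnapshotNamespace {};

// A snapshot created as one member of a consistency group snapshot,
// with linkage back to the specific group snapshot record.
struct GroupSnapshotNamespace {
  int64_t group_pool = -1;
  std::string group_id;           // owning consistency group
  std::string group_snapshot_id;  // specific group snapshot record
};

using SnapshotNamespace =
    boost::variant<UserSnapshotNamespace, GroupSnapshotNamespace>;
```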


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-09-12 23:06                                 ` Victor Denisov
@ 2016-09-12 23:09                                   ` Victor Denisov
  2016-09-13 12:33                                     ` Jason Dillaman
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2016-09-12 23:09 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin

I also think that moving the existing group API to Group.cc could
probably be done in a separate pull request.

On Mon, Sep 12, 2016 at 4:06 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
> Another quick question.
> Do you think it makes sense to introduce snapshot namespaces in a pull
> request and review it first?
> It looks like a self sufficient change that we can merge before
> introducing snapshots.
>
> On Mon, Sep 12, 2016 at 3:52 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>> Understood. Thank you, Jason.
>>
>> On Mon, Sep 12, 2016 at 6:18 AM, Jason Dillaman <jdillama@redhat.com> wrote:
>>> On Sat, Sep 10, 2016 at 9:46 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>>>> Another question. Maybe not really a question, but I would like to
>>>> verify if I understood what you wrote in the ether pad.
>>>>
>>>> You suggest to create image snapshots simultaneously.
>>>> If everything shuts down when we are making those individual
>>>> snapshots, then we end up with a SnapshotRecord in incomplete state
>>>> and images either with snapshots or without them.
>>>> Do I understand correctly that if the user wants to remove this
>>>> unfinished group snapshot then we have to:
>>>>  - list all images in this group
>>>>  - look for snapshots in those images with the guid as their name.
>>>>  - delete those individual snapshots and ignore errors if those
>>>> snapshots don't exist.
>>>>  - delete then entry.
>>>
>>> It would be the standard remove state machine, which is basically the
>>> steps you have above. Note that you would always need to handle the
>>> "-ENOENT" case since I could always associate an image to a group
>>> after a group snap was created (i.e. so the new image doesn't have a
>>> matching image snapshot for a group snapshot).
>>>
>>>> One thing that I don't understand in this case is, what if the user
>>>> decides to delete one of the images when there are dangling group
>>>> snapshots. Let's call this image A.
>>>> This dangling group snapshot could have successfully created a
>>>> snapshot of this image A. Let's call this snapshot A_snap.
>>>> Now if we remove image A from this group then once we try to cleanup
>>>> dangling group snapshot then A_snap shapshot will be overlooked,
>>>> because image A is not a member of the group any more.
>>>> And I don't understand how we can prevent this from happening in this
>>>> approach, except by disallowing to remove images if there are dandling
>>>> group snapshots.
>>>
>>> How is the snapshot dangling for image A? If it successfully created
>>> the snapshot on image A, it has a snapshot record that associates it
>>> to the group. Therefore, when the image is removed from the group, I
>>> would think you would automatically delete the the group snapshots
>>> contained within the image.
>>>
>>>> You mentioned that we should call image's individual snapshots after
>>>> the groups guid. I assume we should name them something like
>>>> <guid>_<group_snap_id>.
>>>> If we named them only using guid, then we would be able to create only
>>>> one group snapshot.
>>>
>>> Yup -- that should be fine.
>>>
>>>> Thanks,
>>>> V.
>>>>
>>>>
>>>> On Sat, Sep 10, 2016 at 11:37 AM, Jason Dillaman <jdillama@redhat.com> wrote:
>>>>> Those are all internal classes -- the cls types are already
>>>>> dependencies within the librbd internals. Feel free to add the
>>>>> necessary include and use it directly from within librbd.
>>>>>
>>>>> On Fri, Sep 9, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>>>>>> I have a question about where SnapshotNamespace type should be placed.
>>>>>> I placed it in cls/rbd/cls_rbd_types.h because cls client and cls
>>>>>> backend components should have access to this type.
>>>>>> Also this type is required in librbd/Operations.cc - because we want
>>>>>> to specify in what namespace Operations::snap_create should create
>>>>>> snapshots.
>>>>>> However Operations.cc doesn't import cls_rbd_types.h right now. If the
>>>>>> question was about public interface of librbd/librbd.cc, then I would
>>>>>> create a duplicate of SnapshotNamespace type in librbd layer without
>>>>>> hesitation.
>>>>>> But these functions are internal, so, my question is whether it's
>>>>>> really feasible to create another type for SnapshotNamespace in librbd
>>>>>> layer.
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 29, 2016 at 2:10 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>>>>>>> Right, I forgot about snaphot "namespaces". I'll add this part.
>>>>>>> I guess it makes sense to discuss the whole thing on the next CDM.
>>>>>>>
>>>>>>> On Sun, Aug 28, 2016 at 5:37 PM, Jason Dillaman <jdillama@redhat.com> wrote:
>>>>>>>> I think the first step is to implement the concept of snapshot "namespaces".
>>>>>>>>
>>>>>>>> This could be implemented as an optional variant structure associated
>>>>>>>> with each snapshot at creation (see the ImageWatcher RPC messages or
>>>>>>>> journaling event type encoding for examples of this). For consistency
>>>>>>>> group snapshots, this structure would identify the snapshot as
>>>>>>>> belonging to the consistency group and have a unique id back to the
>>>>>>>> specific group snapshot.
>>>>>>>>
>>>>>>>> When creating a snapshot, the state machine would (1) create the group
>>>>>>>> snapshot record, (2) set the state of the group to "creating snapshot"
>>>>>>>> (to prevent new images from being added/removed from the group while
>>>>>>>> the op is in-progress), (3) acquire the lock for all images in the
>>>>>>>> group, (4) create the individual image snapshots with the linkage back
>>>>>>>> to the group snapshot record (can be performed in parallel up to max
>>>>>>>> concurrent ops), (5) release the exclusive locks, and (6) reset the
>>>>>>>> group status to "ready".
>>>>>>>>
>>>>>>>> If you have a hard crash/failure anywhere, a "snap remove" operation
>>>>>>>> should be designed to get the group back into consistent state (i.e.
>>>>>>>> remove any snapshots linked to the group and reset the group state
>>>>>>>> back to ready).
>>>>>>>>
>>>>>>>> On Fri, Aug 26, 2016 at 5:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>>>>>>>>> Guys,
>>>>>>>>>
>>>>>>>>> I updated Snapshots section of this document:
>>>>>>>>> http://pad.ceph.com/p/consistency_groups, in accordance with my
>>>>>>>>> improved understanding of how it should be implemented.
>>>>>>>>> Please take a look and provide your comments. Some of my concerns
>>>>>>>>> regarding the implementation I highlighted in bold.
>>>>>>>>>
>>>>>>>>> Looking forward to your valuable remarks.
>>>>>>>>> Thanks in advance.
>>>>>>>>> V.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@mirantis.com> wrote:
>>>>>>>>>> On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote:
>>>>>>>>>>> What if I'm holding this lock and somebody else is trying to reacquire the lock.
>>>>>>>>>>> How do I get notified about it?
>>>>>>>>>>
>>>>>>>>>> The image watcher is notified, which triggers its handler:
>>>>>>>>>>
>>>>>>>>>>  ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx)
>>>>>>>>>>
>>>>>>>>>> The handler calls the current lock policy method `lock_requested()`,
>>>>>>>>>> which will define what to do with the lock request. The StandartPolicy
>>>>>>>>>> is to release the lock, so it may ping-ponging between the
>>>>>>>>>> clients. You may define a different policy -- rbd-mirror is an example
>>>>>>>>>> where it is used.
>>>>>>>>>>
>>>>>>>>>> Everywhere where an operation needs the exclusive lock, it is always
>>>>>>>>>> checked if we currently are a lock owner, i.e:
>>>>>>>>>>
>>>>>>>>>>  ictx->exclusive_lock->is_lock_owner()
>>>>>>>>>>
>>>>>>>>>> and if it is false, the exlusive lock is requested. Before this check
>>>>>>>>>> you need to aquire ctx->owner_lock, and until you release owner_lock
>>>>>>>>>> you can be sure your exclusive lock will not leak to another
>>>>>>>>>> client. After releasing owner_lock, you will need to repeate the check
>>>>>>>>>> again when you need it.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Mykola Golub
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote:
>>>>>>>>>>> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote:
>>>>>>>>>>> >> Could you please point me to the place in source code where writer
>>>>>>>>>>> >> acquires an exclusive lock on the image.
>>>>>>>>>>> >
>>>>>>>>>>> > Grep for 'exclusive_lock->request_lock'. Basically, what you need
>>>>>>>>>>> > (after opening the image) is:
>>>>>>>>>>> >
>>>>>>>>>>> > ```
>>>>>>>>>>> >   C_SaferCond lock_ctx;
>>>>>>>>>>> >   {
>>>>>>>>>>> >     RWLock::WLocker l(ictx->owner_lock);
>>>>>>>>>>> >
>>>>>>>>>>> >     if (ictx->exclusive_lock == nullptr) {
>>>>>>>>>>> >       // exclusive-lock feature is not enabled
>>>>>>>>>>> >       return -EINVAL;
>>>>>>>>>>> >     }
>>>>>>>>>>> >
>>>>>>>>>>> >     // Request the lock. If it is currently owned by another client,
>>>>>>>>>>> >     // an RPC message will be sent to that client to release the lock.
>>>>>>>>>>> >     ictx->exclusive_lock->request_lock(&lock_ctx);
>>>>>>>>>>> >   } // release owner_lock before waiting to avoid potential deadlock
>>>>>>>>>>> >
>>>>>>>>>>> >   int r = lock_ctx.wait();
>>>>>>>>>>> >   if (r < 0) {
>>>>>>>>>>> >     return r;
>>>>>>>>>>> >   }
>>>>>>>>>>> >
>>>>>>>>>>> >   RWLock::RLocker l(ictx->owner_lock);
>>>>>>>>>>> >   if (ictx->exclusive_lock == nullptr || !ictx->exclusive_lock->is_lock_owner()) {
>>>>>>>>>>> >        // failed to acquire exclusive lock
>>>>>>>>>>> >        return -EROFS;
>>>>>>>>>>> >   }
>>>>>>>>>>> >
>>>>>>>>>>> >   // At this point lock is acquired
>>>>>>>>>>> >   ...
>>>>>>>>>>> >
>>>>>>>>>>> > ```
>>>>>>>>>>> >
>>>>>>>>>>> > You might want to look at this PR
>>>>>>>>>>> >
>>>>>>>>>>> >  https://github.com/ceph/ceph/pull/9592
>>>>>>>>>>> >
>>>>>>>>>>> > where we discuss adding API methods to directly acquire and release
>>>>>>>>>>> > the exclusive lock. You don't need the API, but will find examples in
>>>>>>>>>>> > the patch, and also useful comments from Jason.
>>>>>>>>>>> >
>>>>>>>>>>> > --
>>>>>>>>>>> > Mykola Golub
>>>>>>>>>>> >
>>>>>>>>>>> >> I presume you were talking about the exclusive_lock/shared_lock
>>>>>>>>>>> >> feature, which can be managed from the command line using the
>>>>>>>>>>> >> commands lock list and lock break.
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote:
>>>>>>>>>>> >> > There is already a "request lock" RPC message and this is already handled
>>>>>>>>>>> >> > transparently within librbd when you attempt to acquire the lock and another
>>>>>>>>>>> >> > client owns it.
>>>>>>>>>>> >> >
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote:
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> If an image already has a writer who owns the lock,
>>>>>>>>>>> >> >> should I implement a notification that asks the writer to
>>>>>>>>>>> >> >> release the lock,
>>>>>>>>>>> >> >> or is there already a standard way to take over the exclusive lock?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com>
>>>>>>>>>>> >> >> wrote:
>>>>>>>>>>> >> >> > ... one more thing:
>>>>>>>>>>> >> >> >
>>>>>>>>>>> >> >> > I was also thinking that we need a new RBD feature bit to be used to
>>>>>>>>>>> >> >> > indicate that an image is part of a consistency group to prevent older
>>>>>>>>>>> >> >> > librbd clients from removing the image or group snapshots.  This could
>>>>>>>>>>> >> >> > be an RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can
>>>>>>>>>>> >> >> > still open the image R/O while it's part of a group.
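A small sketch of how such a feature bit could gate access. The bit values below are illustrative placeholders, not the real rbd feature definitions; the point is only that an RW-incompatible feature rejects writers while still allowing read-only opens:

```
#include <cstdint>
#include <iostream>

// Illustrative bit values only; the real ones live in the rbd feature
// definitions.
constexpr uint64_t FEATURE_EXCLUSIVE_LOCK = 1ULL << 2;
constexpr uint64_t FEATURE_GROUP          = 1ULL << 7;  // hypothetical

// Features this (old) client understands for read/write access.
constexpr uint64_t SUPPORTED_RW = FEATURE_EXCLUSIVE_LOCK;

bool can_open(uint64_t image_features, bool read_only) {
  uint64_t unknown = image_features & ~SUPPORTED_RW;
  // An RW-incompatible feature blocks writers but not read-only opens.
  return read_only || unknown == 0;
}

int main() {
  uint64_t f = FEATURE_EXCLUSIVE_LOCK | FEATURE_GROUP;
  std::cout << "rw open: " << can_open(f, false) << "\n";  // 0 (rejected)
  std::cout << "ro open: " << can_open(f, true)  << "\n";  // 1 (allowed)
}
```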
>>>>>>>>>>> >> >> >
>>>>>>>>>>> >> >> >
>>>>>>>>>>> >> >> >
>>>>>>>>>>> >> >> > --
>>>>>>>>>>> >> >> > Jason
>>>>>>>>>>> >> >
>>>>>>>>>>> >> >
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > --
>>>>>>>>>>> >> > Jason
>>>>>>>>>>> >> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jason
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jason
>>>
>>>
>>>
>>> --
>>> Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-09-12 23:09                                   ` Victor Denisov
@ 2016-09-13 12:33                                     ` Jason Dillaman
  2016-09-13 23:41                                       ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Dillaman @ 2016-09-13 12:33 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Mykola Golub, ceph-devel, Josh Durgin

On Mon, Sep 12, 2016 at 7:09 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
> I also think, probably, moving existing group api to Group.cc can be
> done in a separate pull request.

Agreed -- no worries.

> On Mon, Sep 12, 2016 at 4:06 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>> Another quick question.
>> Do you think it makes sense to introduce snapshot namespaces in a pull
>> request and review it first?
>> It looks like a self-sufficient change that we can merge before
>> introducing snapshots.

Yes, I think that would be a good idea; it would simplify the PR and
potentially get it merged quicker.

-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-09-13 12:33                                     ` Jason Dillaman
@ 2016-09-13 23:41                                       ` Victor Denisov
  2016-09-15 23:17                                         ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2016-09-13 23:41 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin

Published a pull request for extracting Group.cc

https://github.com/ceph/ceph/pull/11070

Please review.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-09-13 23:41                                       ` Victor Denisov
@ 2016-09-15 23:17                                         ` Victor Denisov
  2016-09-16 17:37                                           ` Jason Dillaman
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2016-09-15 23:17 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin

Hi Jason,

I have a question about the implementation of snapshot namespaces in cls_rbd.

What if I implement encode and decode functions for SnapshotNamespace
types as follows:

inline void encode(const SnapshotNamespace &c, bufferlist &bl,
                   uint64_t features=0)
{
  ENCODE_DUMP_PRE();
  boost::apply_visitor(EncodeSnapshotTypeVisitor(bl), c);
  ENCODE_DUMP_POST(SnapshotNamespace);
}

inline void decode(SnapshotNamespace &c, bufferlist::iterator &p)
{
  uint32_t snap_type;
  ::decode(snap_type, p);
  switch (snap_type) {
    case cls::rbd::SNAPSHOT_NAMESPACE_TYPE_USER:
      c = UserSnapshotNamespace();
      break;
    case cls::rbd::SNAPSHOT_NAMESPACE_TYPE_GROUP:
      c = GroupSnapshotNamespace();
      break;
    default:
      c = UnknownSnapshotNamespace();
      break;
  }
  boost::apply_visitor(DecodeSnapshotTypeVisitor(p), c);
}

https://github.com/VictorDenisov/ceph/blob/consistency_groups_namespaces/src/cls/rbd/cls_rbd_types.h#L314

and then instead of


if (struct_v >= 5) {
      uint32_t snap_type;
      ::decode(snap_type, p);
      snapshot_namespace_type =
static_cast<cls::rbd::SnapshotNamespaceType>(snap_type);
      switch (snap_type) {
        case cls::rbd::SNAPSHOT_NAMESPACE_TYPE_USER:
          snapshot_namespace = cls::rbd::UserSnapshotNamespace();
          break;
        case cls::rbd::SNAPSHOT_NAMESPACE_TYPE_GROUP:
          snapshot_namespace = cls::rbd::GroupSnapshotNamespace();
          break;
        default:
          snapshot_namespace = cls::rbd::UnknownSnapshotNamespace();
          break;
      }
      boost::apply_visitor(cls::rbd::DecodeSnapshotTypeVisitor(p),
snapshot_namespace);
    } else {
      snapshot_namespace = cls::rbd::UserSnapshotNamespace();
    }


in cls_rbd_snap decoding I would write just:

if (struct_v >= 5) {
      ::decode(snapshot_namespace, p);
    } else {
      snapshot_namespace = cls::rbd::UserSnapshotNamespace();
    }

then the code for the ::encode function of cls_rbd_snap would change accordingly:

instead of

boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl),
snapshot_namespace);

I would do:
::encode(snapshot_namespace, bl);

The reason I would like to have encode and decode for
SnapshotNamespace implemented like this is that when I implement the
get_snap_namespace method in cls_rbd.cc it's very convenient to have
conventional encode and decode functions: get_snap_namespace has to
serialize and deserialize its results.

https://github.com/VictorDenisov/ceph/blob/consistency_groups_namespaces/src/cls/rbd/cls_rbd.cc#L1492


What do you think?

Thanks,
V.
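For readers unfamiliar with the visitor pattern discussed above, here is a self-contained toy version of the proposed encode/decode shape. It uses boost::variant with a plain byte stream standing in for bufferlist; the type names imitate, but are not, the actual cls_rbd types:

```
#include <boost/variant.hpp>
#include <cstdint>
#include <iostream>
#include <sstream>

struct UserSnapshotNamespace  { };                         // tag 0
struct GroupSnapshotNamespace { uint64_t group_id = 0; };  // tag 1

using SnapshotNamespace =
    boost::variant<UserSnapshotNamespace, GroupSnapshotNamespace>;

// The visitor writes the per-type payload after the caller wrote the tag.
struct EncodeVisitor : boost::static_visitor<void> {
  explicit EncodeVisitor(std::ostream &os) : os(os) {}
  void operator()(const UserSnapshotNamespace &) const { /* no payload */ }
  void operator()(const GroupSnapshotNamespace &n) const {
    os.write(reinterpret_cast<const char*>(&n.group_id), sizeof(n.group_id));
  }
  std::ostream &os;
};

void encode(const SnapshotNamespace &c, std::ostream &os) {
  uint32_t tag = static_cast<uint32_t>(c.which());
  os.write(reinterpret_cast<const char*>(&tag), sizeof(tag));
  boost::apply_visitor(EncodeVisitor(os), c);
}

void decode(SnapshotNamespace &c, std::istream &is) {
  uint32_t tag = 0;
  is.read(reinterpret_cast<char*>(&tag), sizeof(tag));
  // Default-construct the matching alternative, then fill in its payload
  // (real code would apply a decode visitor here, as in the proposal above).
  if (tag == 1) {
    GroupSnapshotNamespace g;
    is.read(reinterpret_cast<char*>(&g.group_id), sizeof(g.group_id));
    c = g;
  } else {
    c = UserSnapshotNamespace();
  }
}

int main() {
  std::stringstream ss;
  SnapshotNamespace n = GroupSnapshotNamespace{42};
  encode(n, ss);
  SnapshotNamespace out;
  decode(out, ss);
  std::cout << "decoded tag: " << out.which() << "\n";  // 1
}
```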


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-09-15 23:17                                         ` Victor Denisov
@ 2016-09-16 17:37                                           ` Jason Dillaman
  2016-10-07 18:26                                             ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Dillaman @ 2016-09-16 17:37 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Mykola Golub, ceph-devel, Josh Durgin

On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
> if (struct_v >= 5) {
>       ::decode(snapshot_namespace, p);
>     } else {
>       snapshot_namespace = cls::rbd::UserSnapshotNamespace();
>     }
>
> then the code for the ::encode function of cls_rbd_snap would change accordingly:
>
> instead of
>
> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl),
> snapshot_namespace);
>
> I would do:
> ::encode(snapshot_namespace, bl);


+1 -- looks good to me

-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-09-16 17:37                                           ` Jason Dillaman
@ 2016-10-07 18:26                                             ` Victor Denisov
  2016-10-10 14:18                                               ` Jason Dillaman
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2016-10-07 18:26 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin

Are any exceptions used in the librbd code? Should the code be exception-safe?

Thanks,
V.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-10-07 18:26                                             ` Victor Denisov
@ 2016-10-10 14:18                                               ` Jason Dillaman
  2016-10-10 22:14                                                 ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Dillaman @ 2016-10-10 14:18 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Mykola Golub, ceph-devel, Josh Durgin

The only place exceptions are routinely used is within the "::decode"
functions. I would prefer that the code not throw any new exceptions
on purpose.
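A sketch of the convention described here: callers wrap the decode call in try/catch and translate the throw into an error return. The exception type below is a stand-in for the buffer error type that Ceph's ::decode throws; everything else is simplified:

```
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <string>

// Stand-in for the buffer error type that ::decode throws on bad input.
struct decode_error : std::runtime_error {
  using std::runtime_error::runtime_error;
};

struct Snapshot { uint64_t id = 0; };

void decode(Snapshot &s, const std::string &buf) {
  if (buf.empty()) {
    throw decode_error("buffer too short");  // the only "routine" throw
  }
  s.id = static_cast<unsigned char>(buf[0]);
}

// The caller stays exception-free at its boundary: it catches the decode
// throw and converts it into an error return code.
int get_snapshot(const std::string &buf, Snapshot *out) {
  try {
    decode(*out, buf);
  } catch (const decode_error &) {
    return -22;  // -EINVAL
  }
  return 0;
}

int main() {
  Snapshot s;
  std::cout << get_snapshot("", &s) << "\n";      // -22
  std::cout << get_snapshot("\x2a", &s) << "\n";  // 0, s.id == 42
}
```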




-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-10-10 14:18                                               ` Jason Dillaman
@ 2016-10-10 22:14                                                 ` Victor Denisov
  2016-10-26  3:07                                                   ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2016-10-10 22:14 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin

Ok. I didn't have any intention to throw exceptions.
I was more concerned about whether it's OK to allocate and delete
objects manually or whether I should use smart pointers.
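As a general-purpose illustration of the trade-off raised here (not an answer from the thread): raw new/delete is easy to leak on early returns, while std::unique_ptr ties cleanup to scope and can still release ownership to a callback-style API:

```
#include <iostream>
#include <memory>

struct Request { int id = 0; };

// Stand-in for an async API that takes ownership of a raw pointer.
void submit(Request *r) {
  std::cout << "submitted " << r->id << "\n";
  delete r;
}

int run(int id) {
  auto req = std::make_unique<Request>();
  req->id = id;
  if (id < 0) {
    return -1;               // early return: unique_ptr frees automatically
  }
  submit(req.release());     // hand ownership to the raw-pointer API
  return 0;
}

int main() {
  run(1);
  run(-1);
}
```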


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-10-10 22:14                                                 ` Victor Denisov
@ 2016-10-26  3:07                                                   ` Victor Denisov
  2016-10-26 13:16                                                     ` Jason Dillaman
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2016-10-26  3:07 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin

Question: when we print out the snapshots of an image, should the group
snapshots be listed, or should they be marked as special snapshots?

Thanks,
V.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-10-26  3:07                                                   ` Victor Denisov
@ 2016-10-26 13:16                                                     ` Jason Dillaman
  2016-10-28 22:41                                                       ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Dillaman @ 2016-10-26 13:16 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Mykola Golub, ceph-devel, Josh Durgin

In a perfect world, it would be nice to add a new option to "rbd
snap ls" to show all snapshots (with a new column to indicate the
associated namespace).




-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-10-26 13:16                                                     ` Jason Dillaman
@ 2016-10-28 22:41                                                       ` Victor Denisov
  2016-10-31 13:07                                                         ` Jason Dillaman
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2016-10-28 22:41 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin

Another thing that bothers me: when we remove an image from a consistency group,
should we remove all snapshots of this image that were created as part
of a consistency group snapshot?

The easiest solution would be to remove all snapshots that are in
GroupSnapshotNamespace and reference this consistency group.
I looked into the Cinder docs for this feature:
http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html

But it's not clear to me which behavior Cinder expects.

Thanks,
V.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-10-28 22:41                                                       ` Victor Denisov
@ 2016-10-31 13:07                                                         ` Jason Dillaman
  2016-11-01 15:01                                                           ` Jason Dillaman
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Dillaman @ 2016-10-31 13:07 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Mykola Golub, ceph-devel

Looking at the Cinder codebase, I don't see any restriction that
would prevent you from removing a volume from a consistency group that
has associated snapshots. I would double-check on the OpenStack
development mailing list whether this is correct and intended. Worst
case, the RBD driver could raise an exception if there are still
consistency group snapshots associated with the image.





-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-10-31 13:07                                                         ` Jason Dillaman
@ 2016-11-01 15:01                                                           ` Jason Dillaman
  2016-12-14  1:03                                                             ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Dillaman @ 2016-11-01 15:01 UTC (permalink / raw)
  To: dillaman; +Cc: Victor Denisov, Mykola Golub, ceph-devel

I chatted with Xing on IRC this morning re: Cinder generic groups. It
sounds like RBD will need to support preserving the image's
consistency group snapshots even if the image is removed from the
group. In the OpenStack case, you won't have to worry about the image
being deleted while it still has associated group snapshots.

We will also want to support being able to clone child images from a
group snapshot to ensure that we can thin-provision new group volumes
when creating a new group from a group snapshot. This means that the
group snapshots should be able to be protected/unprotected just like
standard user snapshots.
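Purely as a sketch of this requirement, with entirely hypothetical function names (the real librbd group API is still under design in this thread): a group snapshot would be protected before cloning, mirroring what protect/unprotect already means for user snapshots:

```
#include <iostream>
#include <string>

// Entirely hypothetical function names, for illustration only.
void group_snap_protect(const std::string &group, const std::string &snap) {
  std::cout << "protect " << group << "@" << snap << "\n";
}

void clone_from_group_snap(const std::string &group, const std::string &snap,
                           const std::string &image, const std::string &child) {
  // A thin-provisioned child of the per-image snapshot taken as part of
  // the group snapshot; valid only while the group snapshot is protected.
  std::cout << "clone " << image << "@" << snap << " -> " << child << "\n";
}

int main() {
  group_snap_protect("grp", "snap1");
  clone_from_group_snap("grp", "snap1", "vol0", "vol0-restored");
}
```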




-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-11-01 15:01                                                           ` Jason Dillaman
@ 2016-12-14  1:03                                                             ` Victor Denisov
  2016-12-15 15:10                                                               ` Jason Dillaman
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2016-12-14  1:03 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel

Jason,

My current implementation of consistency group snapshot feature names
image snapshots like: <group_pool>_<group_id>_<group_snap_id>
I rely on this fact when I need to remove a consistency group. It's
necessary because if some of the image snapshots were created but the
whole group snapshot operation failed,
then the only way to find those dangling image snapshots is by this name.
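To make the fragility concrete, a toy version of this naming scheme: the snapshot name is the only link back to the group, so a rename breaks the reverse lookup. The helpers below are illustrative only, following the <group_pool>_<group_id>_<group_snap_id> pattern described above:

```
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>

std::string group_snap_name(int64_t pool, const std::string &group_id,
                            const std::string &group_snap_id) {
  std::ostringstream os;
  os << pool << "_" << group_id << "_" << group_snap_id;
  return os.str();
}

// Reverse lookup used during cleanup: recover the group linkage from the
// snapshot name alone. If the snapshot was renamed, this parse fails and
// the dangling snapshot can no longer be found.
bool parse_group_snap_name(const std::string &name, int64_t *pool,
                           std::string *group_id, std::string *group_snap_id) {
  std::istringstream is(name);
  char sep = 0;
  if (!(is >> *pool >> sep) || sep != '_') return false;
  std::string rest;
  std::getline(is, rest);
  auto pos = rest.find('_');
  if (pos == std::string::npos) return false;
  *group_id = rest.substr(0, pos);
  *group_snap_id = rest.substr(pos + 1);
  return true;
}

int main() {
  std::string n = group_snap_name(3, "grp1", "gs7");
  int64_t pool; std::string gid, gsid;
  std::cout << n << " parses: "
            << parse_group_snap_name(n, &pool, &gid, &gsid) << "\n";      // 1
  std::cout << "renamed parses: "
            << parse_group_snap_name("my-snap", &pool, &gid, &gsid) << "\n";  // 0
}
```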

It means that we should forbid renaming snapshots in the
ConsistencyGroupSnapshot namespace.

Another option is to allocate the image snapshot ids during the creation
of the group snapshot, but this requires a major rewrite of the whole
process of snapshot creation for images.

What is your opinion on this?

Thanks,
V.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-12-14  1:03                                                             ` Victor Denisov
@ 2016-12-15 15:10                                                               ` Jason Dillaman
  2016-12-16  0:05                                                                 ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Dillaman @ 2016-12-15 15:10 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Mykola Golub, ceph-devel

I think I might be confused. When creating a group snapshot, we have
the ConsistencyGroupSnapshot namespace that allows you to store the
necessary linkage between the image's snapshot and its associated group
snapshot [1]. Why not just give the image's snapshots the same name as
the parent group snapshot and search the snapshot's metadata to get
the linkage?

[1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L255
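A toy version of the lookup suggested here: the group linkage lives in the snapshot's namespace metadata (as in the GroupSnapshotNamespace fields linked above), so finding an image snapshot for a group snapshot is a metadata match rather than a name parse, and it survives renames. The types are simplified stand-ins for the cls_rbd ones:

```
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Simplified stand-in for the linkage stored per image snapshot.
struct GroupSnapshotNamespace {
  int64_t group_pool = -1;
  std::string group_id;
  std::string group_snapshot_id;
};

struct ImageSnapshot {
  std::string name;                 // free to rename; not used for lookup
  bool in_group_namespace = false;
  GroupSnapshotNamespace ns;
};

// Find the image snapshot belonging to a given group snapshot by matching
// the stored metadata, independent of the snapshot's display name.
const ImageSnapshot *find_group_snap(const std::vector<ImageSnapshot> &snaps,
                                     int64_t pool, const std::string &group_id,
                                     const std::string &group_snap_id) {
  for (const auto &s : snaps) {
    if (s.in_group_namespace && s.ns.group_pool == pool &&
        s.ns.group_id == group_id && s.ns.group_snapshot_id == group_snap_id) {
      return &s;
    }
  }
  return nullptr;
}

int main() {
  std::vector<ImageSnapshot> snaps = {
    {"renamed-by-user", true, {3, "grp1", "gs7"}},
    {"plain-user-snap", false, {}},
  };
  const ImageSnapshot *s = find_group_snap(snaps, 3, "grp1", "gs7");
  std::cout << (s ? s->name : "not found") << "\n";  // renamed-by-user
}
```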




-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-12-15 15:10                                                               ` Jason Dillaman
@ 2016-12-16  0:05                                                                 ` Victor Denisov
  2016-12-16 14:53                                                                   ` Jason Dillaman
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2016-12-16  0:05 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel

Yes, but if the image's snapshot is renamed, then I'm not able to find
this snapshot when all I have is the group's snapshot, for example when
the group snapshot is in an inconsistent state.

V.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-12-16  0:05                                                                 ` Victor Denisov
@ 2016-12-16 14:53                                                                   ` Jason Dillaman
  2017-01-04  0:40                                                                     ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Dillaman @ 2016-12-16 14:53 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Mykola Golub, ceph-devel

Can you give a little background on this specific inconsistent case
you are referring to?

On Thu, Dec 15, 2016 at 7:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
> Yes, but if image's snapshot is renamed then I'm not able to find this
> snapshot having only group's snapshot in an inconsistent state for
> example.
>
> V.
>
> On Thu, Dec 15, 2016 at 7:10 AM, Jason Dillaman <jdillama@redhat.com> wrote:
>> I think I might be confused. When creating a group snapshot, we have
>> the ConsistencyGroupSnapshot that allows you to store the necessary
>> linkage between the image's snapshot and its associated group snapshot
>> [1]. Why not just name the image's snapshots to the same name as the
>> parent group snapshot name and search the snapshot's metadata to get
>> the linkage?
>>
>> [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L255
>>
>> On Tue, Dec 13, 2016 at 8:03 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>>> Jason,
>>>
>>> My current implementation of consistency group snapshot feature names
>>> image snapshots like: <group_pool>_<group_id>_<group_snap_id>
>>> I rely on this fact when I need to remove a consistency group. It's
>>> necessary because if some of image snapshots were created, but the
>>> whole group snapshot operation failed,
>>> then the only way to find those dangling image snapshots is by this name.
>>>
>>> It means that we should forbid renaming snapshots from
>>> ConsistencyGroupSnapshot namespace.
>>>
>>> Another option is to allocate image snapshot ids during the creation
>>> of group snapshot, but this requires a major rewrite of the whole
>>> process of snapshot creation for images.
>>>
>>> What is your opinion on this?
>>>
>>> Thanks,
>>> V.
>>>
>>>
>>> On Tue, Nov 1, 2016 at 8:01 AM, Jason Dillaman <jdillama@redhat.com> wrote:
>>>> I chatted with Xing on IRC this morning re: Cinder generic groups. It
>>>> sounds like RBD will need to support preserving the image's
>>>> consistency group snapshots even if the image is removed from the
>>>> group. In the OpenStack case, you won't have to worry about the image
>>>> being deleted while it still has associated group snapshots.
>>>>
>>>> We will also want to support being able to clone child images from a
>>>> group snapshot to ensure that we can thin provision new groups volumes
>>>> when creating a new group from a group snapshot. This means that the
>>>> group snapshots should be able to be protected/unprotected just like
>>>> standard user snapshots.
>>>>
>>>> On Mon, Oct 31, 2016 at 9:07 AM, Jason Dillaman <jdillama@redhat.com> wrote:
>>>>> Looking at the Cinder codebase, I don't see any such restriction that
>>>>> would prevent you from removing a volume from a consistency group that
>>>>> has associated snapshots. I would double-check on the OpenStack
>>>>> development mailing list if this is correct and is the intent. Worst
>>>>> case, the RBD driver could raise an exception if there are still
>>>>> consistency group snapshots associated to the image.
>>>>>
>>>>>
>>>>> On Fri, Oct 28, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>>>>>> Another thing that bothers me: when we remove an image from a
>>>>>> consistency group, should we remove all snapshots of this image that
>>>>>> were created as part of a consistency group snapshot?
>>>>>>
>>>>>> The easiest solution would be to remove all snapshots that are in
>>>>>> GroupSnapshotNamespace and reference this consistency group.
>>>>>> I looked into cinder docs for this feature:
>>>>>> http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html
>>>>>>
>>>>>> But it's not clear to me which behavior cinder expects.
>>>>>>
>>>>>> Thanks,
>>>>>> V.
>>>>>>
>>>>>> On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote:
>>>>>>> In a perfect world, it would be nice to add a new option to "rbd
>>>>>>> snap ls" to show all snapshots (with a new column to indicate the
>>>>>>> associated namespace).
>>>>>>>
>>>>>>> On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>>>>>>>> Question. When we print out snapshots of an image, should the group
>>>>>>>> snapshots be listed, or should they be marked as special snapshots?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> V.
>>>>>>>>
>>>>>>>> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>>>>>>>>> Ok. I didn't have any intention to throw exceptions.
>>>>>>>>> I was more concerned about whether it's ok to allocate and delete
>>>>>>>>> objects manually or whether I should use smart pointers.
>>>>>>>>>
>>>>>>>>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote:
>>>>>>>>>> The only place exceptions are routinely used is within the "::decode"
>>>>>>>>>> functions. I would prefer to see the code not throwing new exceptions
>>>>>>>>>> on purpose.
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>>>>>>>>>>> Are any exceptions used in librbd code? Should the code be exception safe?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> V.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote:
>>>>>>>>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>>>>>>>>>>>>> if (struct_v >= 5) {
>>>>>>>>>>>>>   ::decode(snapshot_namespace, p);
>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>   snapshot_namespace = cls::rbd::UserSnapshotNamespace();
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then the code for the ::encode function of cls_rbd_snap would
>>>>>>>>>>>>> change accordingly: instead of
>>>>>>>>>>>>>
>>>>>>>>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl),
>>>>>>>>>>>>>                      snapshot_namespace);
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would do:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ::encode(snapshot_namespace, bl);
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> +1 -- looks good to me
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Jason
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jason
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Jason
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jason
>>>>
>>>>
>>>>
>>>> --
>>>> Jason
>>
>>
>>
>> --
>> Jason



-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2016-12-16 14:53                                                                   ` Jason Dillaman
@ 2017-01-04  0:40                                                                     ` Victor Denisov
  2017-01-04  2:56                                                                       ` Jason Dillaman
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2017-01-04  0:40 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel

Let's say we start creating a group snapshot.
We invoke the async snap_create method of the Operations class,
providing it with the snapshot name.

While we are waiting for the response we can be aborted.
As a result, the only way to find the exact image snapshot afterwards
is by its name, since the name was the only information we had at the
time we invoked snap_create.

If snap_create was successful we will be able to find the snapshot;
otherwise we will not.
However, if we allow renaming snapshots from GroupSnapshotNamespace,
then we may fail to find the snapshot even if it was created
successfully.
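
To make the failure mode concrete, here is a rough sketch of the
name-based recovery; lookup_snap_id_by_name is a hypothetical helper,
not an existing librbd call:

// The group client restarts after a crash; all it can reconstruct is
// the name it originally passed to snap_create().
// group_pool, group_id and group_snap_id are std::strings read back
// from the pending group snapshot record.
std::string expected_name = group_pool + "_" + group_id + "_" + group_snap_id;
uint64_t snap_id;
int r = lookup_snap_id_by_name(image_ctx, expected_name, &snap_id);
if (r == -ENOENT) {
  // Ambiguous: either snap_create never completed, or the snapshot
  // exists but was renamed -- in which case it dangles forever.
}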

On Fri, Dec 16, 2016 at 6:53 AM, Jason Dillaman <jdillama@redhat.com> wrote:
> Can you give a little background on this specific inconsistent case
> you are referring to?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2017-01-04  0:40                                                                     ` Victor Denisov
@ 2017-01-04  2:56                                                                       ` Jason Dillaman
  2017-01-04 18:29                                                                         ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Dillaman @ 2017-01-04  2:56 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Mykola Golub, ceph-devel

After starting the process of creating a group snapshot, you will
already have all the necessary data for the group snapshot namespace
[1] (group pool, group id, and group snapshot id) and the group
snapshot should be persistently recorded to disk as
GROUP_SNAPSHOT_STATE_PENDING.

Looking at the snapshot create state machine [2], I don't see any
place where a crash (or similar failure) would matter before the
actual image snapshot record is created atomically. You would pass the
fully populated GroupSnapshotNamespace to snap_create. If the snapshot
is created, it is linked to the group via that namespace, and any
failure afterwards doesn't matter since the two are already linked; if
the snapshot fails to be created, it isn't linked to the group, but
the snapshot doesn't exist either, so there is nothing to clean up.

Since you know which images are linked to the group and you know which
snapshots are in the group and which group snapshots are in the image,
you can reconcile any issues using the details in the
GroupSnapshotNamespace -- there shouldn't be any need to depend on the
actual snapshot name (it could technically just be a randomly assigned
UUID).
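
As a rough sketch -- the snap_create signature below is illustrative
and assumes the operation is extended to accept a snapshot namespace
argument:

// All three namespace fields are known before any image snapshot is
// taken, so the linkage survives a crash at any point.
cls::rbd::GroupSnapshotNamespace ns(group_pool, group_id, group_snap_id);
for (auto image_ctx : group_image_ctxs) {
  // The snapshot name can be anything unique (even a random UUID)
  // because recovery goes through the namespace, not the name.
  image_ctx->operations->snap_create(ns, generate_uuid(), on_finish);
}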

Perhaps we could talk about this at a future RBD standup meeting that
you are able to join (or the next CDM).

[1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L249
[2] https://github.com/ceph/ceph/blob/master/src/librbd/operation/SnapshotCreateRequest.h#L28

On Tue, Jan 3, 2017 at 7:40 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
> Let's say we start creating a group snapshot.
> We invoke the async snap_create method of the Operations class,
> providing it with the snapshot name.
>
> While we are waiting for the response we can be aborted.
> As a result, the only way to find the exact image snapshot afterwards
> is by its name, since the name was the only information we had at the
> time we invoked snap_create.
>
> If snap_create was successful we will be able to find the snapshot;
> otherwise we will not.
> However, if we allow renaming snapshots from GroupSnapshotNamespace,
> then we may fail to find the snapshot even if it was created
> successfully.

-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2017-01-04  2:56                                                                       ` Jason Dillaman
@ 2017-01-04 18:29                                                                         ` Victor Denisov
  2017-01-04 18:34                                                                           ` Jason Dillaman
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2017-01-04 18:29 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel

It looks like the next CDM is only next month. Let's try to figure it out over email.

> Since you know which images are linked to the group and you know which
> snapshots are in the group and which group snapshots are in the image,
> you can reconcile any issues using the details in the
> GroupSnapshotNamespace -- there shouldn't be any need to depend on the
> actual snapshot name (it could technically just be a randomly assigned
> UUID).

Let's say I have a consistency group CG and a snapshot of this group,
CGSNAP. I'll be writing in pseudo code.

The images in the group I'll define as:
CG.images[0] - image1
CG.images[1] - image2
CG.images[2] - image3

The image snapshots referenced by the group snapshot I'll denote as:
CG.CGSNAP.snaps[0] - reference to a snapshot of image1
CG.CGSNAP.snaps[1] - reference to a snapshot of image2

Now imagine that this group snapshot was created but wasn't finalized:
CG.CGSNAP.state == PENDING;
CG.CGSNAP.snaps.length == 0;

Now, let's say we want to remove this pending CGSNAP. This is how it's
currently implemented:

for (image : CG.images) {
  // This name is unique because the tuple
  // (image.id, CG.CGSNAP.id, CG.id) is unique.
  snap_name = image.id + "_" + CG.CGSNAP.id + "_" + CG.id;
  remove_image_snapshot(snap_name);
}
remove_cg_snap(CGSNAP);

However, if we don't rely on the name, then this is how I envision the code:

for (image : CG.images) {
  for (snap : image.snaps) {
    if (snap.namespace.cg_id == CG.id &&
        snap.namespace.cg_snap_id == CG.CGSNAP.id) {
      // This is our snapshot.
      remove_image_snapshot(snap.name);
    }
  }
}
remove_cg_snap(CGSNAP);

What I don't like about this solution is the inner loop.
What do you think?

Thanks,
Victor.


On Tue, Jan 3, 2017 at 6:56 PM, Jason Dillaman <jdillama@redhat.com> wrote:
> After starting the process of creating a group snapshot, you will
> already have all the necessary data for the group snapshot namespace
> [1] (group pool, group id, and group snapshot id) and the group
> snapshot should be persistently recorded to disk as
> GROUP_SNAPSHOT_STATE_PENDING.
>
> Looking at the snapshot create state machine [2], I don't see any
> place where a crash (or similar failure) would matter before the
> actual image snapshot record is created atomically. You would pass the
> fully populated GroupSnapshotNamespace to snap_create. If the snapshot
> is created, it is linked to the group via that namespace, and any
> failure afterwards doesn't matter since the two are already linked; if
> the snapshot fails to be created, it isn't linked to the group, but
> the snapshot doesn't exist either, so there is nothing to clean up.
>
> Since you know which images are linked to the group and you know which
> snapshots are in the group and which group snapshots are in the image,
> you can reconcile any issues using the details in the
> GroupSnapshotNamespace -- there shouldn't be any need to depend on the
> actual snapshot name (it could technically just be a randomly assigned
> UUID).
>
> Perhaps we could talk about this at a future RBD standup meeting that
> you are able to join (or the next CDM).
>
> [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L249
> [2] https://github.com/ceph/ceph/blob/master/src/librbd/operation/SnapshotCreateRequest.h#L28

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2017-01-04 18:29                                                                         ` Victor Denisov
@ 2017-01-04 18:34                                                                           ` Jason Dillaman
  2017-01-04 18:39                                                                             ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Dillaman @ 2017-01-04 18:34 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Mykola Golub, ceph-devel

The loop isn't that big of a deal -- but you could eliminate it
entirely if you just index the in-memory snapshot table via the
SnapshotNamespace variant instead of just indexing snapshots by name
(e.g. ImageCtx::snap_ids key switches from a string to a namespace).
This would be required anyway since you might otherwise have duplicate
names between namespaces.
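
A rough sketch of that indexing change (types simplified; the real
ImageCtx member differs):

// Key the in-memory snapshot table by (namespace, name) instead of
// name alone, so equal names in different namespaces stay distinct.
typedef std::pair<cls::rbd::SnapshotNamespace, std::string> SnapKey;
std::map<SnapKey, uint64_t> snap_ids;

// Removing a pending group snapshot then becomes a direct lookup
// instead of a scan over every snapshot of the image:
cls::rbd::SnapshotNamespace ns =
  cls::rbd::GroupSnapshotNamespace(group_pool, group_id, group_snap_id);
auto it = snap_ids.lower_bound(std::make_pair(ns, std::string()));
if (it != snap_ids.end() && it->first.first == ns) {
  remove_image_snapshot(it->second);  // hypothetical helper
}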

On Wed, Jan 4, 2017 at 1:29 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
> However, if we don't rely on the name, then this is how I envision the code:
>
> for (image : CG.images) {
>   for (snap : image.snaps) {
>     if (snap.namespace.cg_id == CG.id &&
>         snap.namespace.cg_snap_id == CG.CGSNAP.id) {
>       // This is our snapshot.
>       remove_image_snapshot(snap.name);
>     }
>   }
> }
> remove_cg_snap(CGSNAP);
>
> What I don't like about this solution is the inner loop.
> What do you think?



-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2017-01-04 18:34                                                                           ` Jason Dillaman
@ 2017-01-04 18:39                                                                             ` Victor Denisov
  2017-01-12  0:14                                                                               ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2017-01-04 18:39 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel

I understood. Thanks!

On Wed, Jan 4, 2017 at 10:34 AM, Jason Dillaman <jdillama@redhat.com> wrote:
> The loop isn't that big of a deal -- but you could eliminate it
> entirely if you just index the in-memory snapshot table via the
> SnapshotNamespace variant instead of just indexing snapshots by name
> (e.g. ImageCtx::snap_ids key switches from a string to a namespace).
> This would be required anyway since you might otherwise have duplicate
> names between namespaces.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2017-01-04 18:39                                                                             ` Victor Denisov
@ 2017-01-12  0:14                                                                               ` Victor Denisov
  2017-01-12  0:18                                                                                 ` Jason Dillaman
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2017-01-12  0:14 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel

Can I open an image and then redirect it to one of its snapshots?

I see that there is a snap_set method.
Do I understand correctly that I need to:
- open the image,
- invoke the snap_set method,
- invoke the refresh method?
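
For context, a minimal sketch of that sequence against the public
librbd C++ API (names are made up; whether an explicit refresh is
still needed internally is exactly the question above):

librbd::RBD rbd;
librbd::Image image;
// io_ctx is an already-open librados::IoCtx for the pool.
int r = rbd.open(io_ctx, image, "image1");
if (r == 0) {
  // Redirect subsequent reads to the snapshot; the image is
  // effectively read-only from here on.
  r = image.snap_set("snap1");
}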

Thanks in advance.
V.

On Wed, Jan 4, 2017 at 10:39 AM, Victor Denisov <vdenisov@mirantis.com> wrote:
> I understood. Thanks!
>
> [remainder of quoted thread trimmed]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2017-01-12  0:14                                                                               ` Victor Denisov
@ 2017-01-12  0:18                                                                                 ` Jason Dillaman
  2017-01-12  1:10                                                                                   ` Victor Denisov
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Dillaman @ 2017-01-12  0:18 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Mykola Golub, ceph-devel

You can specify the snapshot at image open as well. There is no need to refresh.
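
For example (a quick sketch; io_ctx assumed to be an initialized
librados::IoCtx):

  librbd::RBD rbd;
  librbd::Image image;
  // the extra snap argument opens the image read-only at that snapshot
  rbd.open(io_ctx, image, "image1", "snap1");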

On Wed, Jan 11, 2017 at 7:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
> Can I open an image and then redirect it to one of its snapshots?
>
> I see that there is a snap_set method.
> Do I understand correctly that I need to:
> open the image,
> invoke the snap_set method, and then
> invoke the refresh method?
>
> Thanks in advance.
> V.
>
> [remainder of quoted thread trimmed]



-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2017-01-12  0:18                                                                                 ` Jason Dillaman
@ 2017-01-12  1:10                                                                                   ` Victor Denisov
  2017-01-12  1:36                                                                                     ` Jason Dillaman
  0 siblings, 1 reply; 43+ messages in thread
From: Victor Denisov @ 2017-01-12  1:10 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel

If I understand correctly, I can't open the required snapshot directly
if I know only its namespace.

I need to get the whole list of snapshots first and then find mine
by its namespace.
After that I want to redirect the opened image to my snapshot, roughly
as sketched below.
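
Roughly, in pseudo code (the per-snapshot namespace lookup below is
hypothetical -- nothing like it exists in the API yet):

  std::vector<librbd::snap_info_t> snaps;
  image.snap_list(snaps);                 // existing API: ids and names only
  for (auto& snap : snaps) {
    // hypothetical accessor returning the namespace of one snapshot
    SnapshotNamespace ns = get_snap_namespace(image, snap.id);
    if (ns.group_id == cg_id && ns.group_snapshot_id == cg_snap_id) {
      image.snap_set(snap.name.c_str());  // redirect to the matching snapshot
      break;
    }
  }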

On Wed, Jan 11, 2017 at 4:18 PM, Jason Dillaman <jdillama@redhat.com> wrote:
> You can specify the snapshot at image open as well. There is no need to refresh.
>
> [remainder of quoted thread trimmed]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Snapshots of consistency groups
  2017-01-12  1:10                                                                                   ` Victor Denisov
@ 2017-01-12  1:36                                                                                     ` Jason Dillaman
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Dillaman @ 2017-01-12  1:36 UTC (permalink / raw)
  To: Victor Denisov; +Cc: Mykola Golub, ceph-devel

Given that specific use-case, yes, you can switch into and out of a
snapshot whenever you please using the snap_set API. Of course, the
snapshot API methods will need to be expanded to support namespaces;
one possible shape is sketched below.
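
For example (only a sketch of one possible shape, not a committed
interface):

  // address a snapshot by namespace + name instead of name alone
  int snap_set(const cls::rbd::SnapshotNamespace& snap_namespace,
               const char *snap_name);
  // recover the namespace attached to an existing snapshot id
  int snap_get_namespace(uint64_t snap_id,
                         cls::rbd::SnapshotNamespace *snap_namespace);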

On Wed, Jan 11, 2017 at 8:10 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
> If I understand correctly, I can't open the required snapshot directly
> if I know only its namespace.
>
> I need to get the whole list of snapshots first and then find mine
> by its namespace.
> After that I want to redirect the opened image to my snapshot.
>
> [remainder of quoted thread trimmed]



-- 
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2017-01-12  1:36 UTC | newest]

Thread overview: 43+ messages
2016-08-15 23:46 Snapshots of consistency groups Victor Denisov
2016-08-16 13:26 ` Jason Dillaman
2016-08-16 13:29   ` Jason Dillaman
2016-08-18 23:26     ` Victor Denisov
     [not found]       ` <CA+aFP1BMK1=daZpUFdNDJ+i8+nfQt42ahCXVaxy9fWdAY-kXwA@mail.gmail.com>
2016-08-19  4:20         ` Victor Denisov
2016-08-19 12:48           ` Mykola Golub
2016-08-20  0:36             ` Victor Denisov
2016-08-20 16:27               ` Mykola Golub
2016-08-26 21:05                 ` Victor Denisov
2016-08-29  0:37                   ` Jason Dillaman
2016-08-29 21:10                     ` Victor Denisov
2016-09-09 22:41                       ` Victor Denisov
2016-09-10 18:37                         ` Jason Dillaman
2016-09-11  1:46                           ` Victor Denisov
2016-09-12 13:18                             ` Jason Dillaman
2016-09-12 22:52                               ` Victor Denisov
2016-09-12 23:06                                 ` Victor Denisov
2016-09-12 23:09                                   ` Victor Denisov
2016-09-13 12:33                                     ` Jason Dillaman
2016-09-13 23:41                                       ` Victor Denisov
2016-09-15 23:17                                         ` Victor Denisov
2016-09-16 17:37                                           ` Jason Dillaman
2016-10-07 18:26                                             ` Victor Denisov
2016-10-10 14:18                                               ` Jason Dillaman
2016-10-10 22:14                                                 ` Victor Denisov
2016-10-26  3:07                                                   ` Victor Denisov
2016-10-26 13:16                                                     ` Jason Dillaman
2016-10-28 22:41                                                       ` Victor Denisov
2016-10-31 13:07                                                         ` Jason Dillaman
2016-11-01 15:01                                                           ` Jason Dillaman
2016-12-14  1:03                                                             ` Victor Denisov
2016-12-15 15:10                                                               ` Jason Dillaman
2016-12-16  0:05                                                                 ` Victor Denisov
2016-12-16 14:53                                                                   ` Jason Dillaman
2017-01-04  0:40                                                                     ` Victor Denisov
2017-01-04  2:56                                                                       ` Jason Dillaman
2017-01-04 18:29                                                                         ` Victor Denisov
2017-01-04 18:34                                                                           ` Jason Dillaman
2017-01-04 18:39                                                                             ` Victor Denisov
2017-01-12  0:14                                                                               ` Victor Denisov
2017-01-12  0:18                                                                                 ` Jason Dillaman
2017-01-12  1:10                                                                                   ` Victor Denisov
2017-01-12  1:36                                                                                     ` Jason Dillaman
