* Snapshots of consistency groups @ 2016-08-15 23:46 Victor Denisov 2016-08-16 13:26 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-08-15 23:46 UTC (permalink / raw) To: ceph-devel, Jason Dillaman, Josh Durgin, Mykola Golub Gentlemen, I'm writing to ask for your opinion on quiescing writes. Here is the situation: in order to take snapshots of all images in a consistency group, we first need to quiesce all the image writers in the group. Let me call the client that requests a snapshot of a consistency group the "group client", and a client that writes to an image an "image client". Let's say the group client starts sending notify_quiesce to all image clients that write to images in the group. After quiescing half of the image clients, the group client can die. This presents us with a dilemma: what should we do with those quiesced image clients? Option 1 - wait until someone manually runs recovery for that consistency group. We can show a warning next to such unfinished groups when the user runs the group list command, and provide a command like group recover that lets users roll back unsuccessful snapshots or complete them with the create snapshot command. Option 2 - establish heartbeats between the group client and the image clients. If the group client fails to heartbeat, an image client unquiesces itself and continues normal operation. Option 3 - give each image client a timeout. If the group client fails to complete the group snapshot within this timeout, the image client resumes normal operation and informs the group client of that fact. Which of these options do you prefer? There are probably other options I have missed. Thanks, Victor. ^ permalink raw reply [flat|nested] 43+ messages in thread
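[Editor's note] Option 3's lease semantics can be sketched as a small piece of state on the image-client side. This is a purely illustrative, self-contained sketch; `QuiesceLease` and its method names are invented for this example and are not librbd API:

```cpp
#include <chrono>

// Illustrative sketch (not librbd API) of the Option 3 bookkeeping an
// image client could keep after accepting a prepare-group-snapshot
// request: IO stays paused only while the lease is active and unexpired.
class QuiesceLease {
public:
  using Clock = std::chrono::steady_clock;

  // Pause IO and start a lease of the given length.
  void grant(std::chrono::seconds length, Clock::time_point now) {
    expires_at_ = now + length;
    active_ = true;
  }

  // Group client committed or aborted the snapshot: resume IO now.
  void release() { active_ = false; }

  // Checked on the IO path. An expired lease auto-resumes IO, which is
  // exactly the "group client died" recovery behaviour of Option 3.
  bool io_paused(Clock::time_point now) {
    if (active_ && now >= expires_at_) {
      active_ = false;  // lease expired without commit: unquiesce
    }
    return active_;
  }

private:
  bool active_ = false;
  Clock::time_point expires_at_{};
};
```

In this model the prepare RPC maps onto grant() and the commit/abort RPCs map onto release(); Option 2 (heartbeats) is the same idea with the lease renewed on every heartbeat instead of granted once.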
* Re: Snapshots of consistency groups 2016-08-15 23:46 Snapshots of consistency groups Victor Denisov @ 2016-08-16 13:26 ` Jason Dillaman 2016-08-16 13:29 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2016-08-16 13:26 UTC (permalink / raw) To: Victor Denisov; +Cc: ceph-devel, Josh Durgin, Mykola Golub Way back in April when we had the CDM, I was originally thinking we should implement option 3. Essentially, you have a prepare group snapshot RPC message that extends a "paused IO" lease to the caller. When that lease expires, IO would automatically be resumed even if the group snapshot hasn't been created yet. This would also require commit/abort group snapshot RPC messages. However, thinking about this last night, here is another potential option: Option 4 - require images to have the exclusive lock feature before they can be added to a consistency group (and prevent disabling of exclusive-lock while they are part of a group). Then librbd, via the rbd CLI (or client application of the rbd consistency group snap create API), can co-operatively acquire the lock from all active image clients within the group (i.e. all IO has been flushed and paused) and can proceed with snapshot creation. If the rbd CLI dies, the normal exclusive lock handling process will automatically take care of re-acquiring the lock from the dead client and resuming IO. This option not only re-uses existing code, it would also eliminate the need to add/update the RPC messages for prepare/commit/abort snapshot creation to support group snapshots (since it could all be handled internally). On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Gentlemen, > > I'm writing to you to ask for your opinion regarding quiescing writes. > > Here is the situation. In order to take snapshots of all images in a > consistency group, > we first need to quiesce all the image writers in the consistency group. 
> Let me call > group client - a client which requests a consistency group to take a snapshot. > Image client - the client that writes to an image. > Let's say group client starts sending notify_quiesce to all image > clients that write to the images in the group. After quiescing half of > the image clients the group client can die. > > It presents us with a dilemma - what should we do with those quiesced > image clients. > > Option 1 - is to wait till someone manually runs recover for that > consistency group. > We can show warning next to those unfinished groups when user runs > group list command. > There will be a command like group recover, which allows users to > rollback unsuccessful snapshots > or continue them using create snapshot command. > > Option 2 - is to establish some heart beats between group client and > image client. If group client fails to heart beat then image client > unquiesces itself and continues normal operation. > > Option 3 - is to have a timeout for each image client. If group client > fails to make a group snapshot within this timeout then we resume our > normal operation informing group client of the fact. > > Which of these options do you prefer? Probably there are other options > that I miss. > > Thanks, > Victor. -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-08-16 13:26 ` Jason Dillaman @ 2016-08-16 13:29 ` Jason Dillaman 2016-08-18 23:26 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2016-08-16 13:29 UTC (permalink / raw) To: Victor Denisov; +Cc: ceph-devel, Josh Durgin, Mykola Golub ... one more thing: I was also thinking that we need a new RBD feature bit to indicate that an image is part of a consistency group, to prevent older librbd clients from removing the image or group snapshots. This could be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can still open the image R/O while it's part of a group. On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> wrote: > Way back in April when we had the CDM, I was originally thinking we > should implement option 3. Essentially, you have a prepare group > snapshot RPC message that extends a "paused IO" lease to the caller. > When that lease expires, IO would automatically be resumed even if the > group snapshot hasn't been created yet. This would also require > commit/abort group snapshot RPC messages. > > However, thinking about this last night, here is another potential option: > > Option 4 - require images to have the exclusive lock feature before > they can be added to a consistency group (and prevent disabling of > exclusive-lock while they are part of a group). Then librbd, via the > rbd CLI (or client application of the rbd consistency group snap > create API), can co-operatively acquire the lock from all active image > clients within the group (i.e. all IO has been flushed and paused) and > can proceed with snapshot creation. If the rbd CLI dies, the normal > exclusive lock handling process will automatically take care of > re-acquiring the lock from the dead client and resuming IO. 
> > This option not only re-uses existing code, it would also eliminate > the need to add/update the RPC messages for prepare/commit/abort > snapshot creation to support group snapshots (since it could all be > handled internally). > > On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> Gentlemen, >> >> I'm writing to you to ask for your opinion regarding quiescing writes. >> >> Here is the situation. In order to take snapshots of all images in a >> consistency group, >> we first need to quiesce all the image writers in the consistency group. >> Let me call >> group client - a client which requests a consistency group to take a snapshot. >> Image client - the client that writes to an image. >> Let's say group client starts sending notify_quiesce to all image >> clients that write to the images in the group. After quiescing half of >> the image clients the group client can die. >> >> It presents us with a dilemma - what should we do with those quiesced >> image clients. >> >> Option 1 - is to wait till someone manually runs recover for that >> consistency group. >> We can show warning next to those unfinished groups when user runs >> group list command. >> There will be a command like group recover, which allows users to >> rollback unsuccessful snapshots >> or continue them using create snapshot command. >> >> Option 2 - is to establish some heart beats between group client and >> image client. If group client fails to heart beat then image client >> unquiesces itself and continues normal operation. >> >> Option 3 - is to have a timeout for each image client. If group client >> fails to make a group snapshot within this timeout then we resume our >> normal operation informing group client of the fact. >> >> Which of these options do you prefer? Probably there are other options >> that I miss. >> >> Thanks, >> Victor. > > > > -- > Jason -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
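[Editor's note] The Option 4 flow can be modeled as two passes over the group: one to quiesce every image by taking its exclusive lock, one to snapshot. The `Image` type and its methods below are stand-ins invented for this sketch, not the librbd image API:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for an RBD image; acquiring the exclusive lock implies the
// active writer has flushed and paused its IO.
struct Image {
  std::string name;
  bool locked = false;
  bool snapshotted = false;
  bool try_acquire_exclusive_lock() { locked = true; return true; }
  void create_snapshot() { snapshotted = true; }
  void release_exclusive_lock() { locked = false; }
};

// Option 4 group snapshot: quiesce the whole group, snapshot, release.
bool group_snapshot(std::vector<Image>& images) {
  // Pass 1: take the exclusive lock on every image in the group.
  std::size_t acquired = 0;
  for (auto& img : images) {
    if (!img.try_acquire_exclusive_lock()) {
      break;
    }
    ++acquired;
  }
  if (acquired != images.size()) {
    // Could not quiesce the whole group: unwind and give up.
    for (std::size_t i = 0; i < acquired; ++i) {
      images[i].release_exclusive_lock();
    }
    return false;
  }
  // All IO in the group is flushed and paused: a consistent point in time.
  for (auto& img : images) {
    img.create_snapshot();
  }
  for (auto& img : images) {
    img.release_exclusive_lock();
  }
  return true;
}
```

The crash-safety Jason describes lives outside this sketch: if the group client dies while holding locks, the standard exclusive-lock recovery reclaims them from the dead client, so writers resume without any new RPC machinery.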
* Re: Snapshots of consistency groups 2016-08-16 13:29 ` Jason Dillaman @ 2016-08-18 23:26 ` Victor Denisov [not found] ` <CA+aFP1BMK1=daZpUFdNDJ+i8+nfQt42ahCXVaxy9fWdAY-kXwA@mail.gmail.com> 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-08-18 23:26 UTC (permalink / raw) To: Jason Dillaman; +Cc: ceph-devel, Josh Durgin, Mykola Golub If an image already has a writer who owns the lock, should I implement a notification that asks the writer to release the lock, or is there already a standard way to take over the exclusive lock? On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> wrote: > ... one more thing: > > I was also thinking that we need a new RBD feature bit to be used to > indicate that an image is part of a consistency group to prevent older > librbd clients from removing the image or group snapshots. This could > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can > still open the image R/O while its part of a group. > > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> wrote: >> Way back in April when we had the CDM, I was originally thinking we >> should implement option 3. Essentially, you have a prepare group >> snapshot RPC message that extends a "paused IO" lease to the caller. >> When that lease expires, IO would automatically be resumed even if the >> group snapshot hasn't been created yet. This would also require >> commit/abort group snapshot RPC messages. >> >> However, thinking about this last night, here is another potential option: >> >> Option 4 - require images to have the exclusive lock feature before >> they can be added to a consistency group (and prevent disabling of >> exclusive-lock while they are part of a group). Then librbd, via the >> rbd CLI (or client application of the rbd consistency group snap >> create API), can co-operatively acquire the lock from all active image >> clients within the group (i.e. 
all IO has been flushed and paused) and >> can proceed with snapshot creation. If the rbd CLI dies, the normal >> exclusive lock handling process will automatically take care of >> re-acquiring the lock from the dead client and resuming IO. >> >> This option not only re-uses existing code, it would also eliminate >> the need to add/update the RPC messages for prepare/commit/abort >> snapshot creation to support group snapshots (since it could all be >> handled internally). >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> Gentlemen, >>> >>> I'm writing to you to ask for your opinion regarding quiescing writes. >>> >>> Here is the situation. In order to take snapshots of all images in a >>> consistency group, >>> we first need to quiesce all the image writers in the consistency group. >>> Let me call >>> group client - a client which requests a consistency group to take a snapshot. >>> Image client - the client that writes to an image. >>> Let's say group client starts sending notify_quiesce to all image >>> clients that write to the images in the group. After quiescing half of >>> the image clients the group client can die. >>> >>> It presents us with a dilemma - what should we do with those quiesced >>> image clients. >>> >>> Option 1 - is to wait till someone manually runs recover for that >>> consistency group. >>> We can show warning next to those unfinished groups when user runs >>> group list command. >>> There will be a command like group recover, which allows users to >>> rollback unsuccessful snapshots >>> or continue them using create snapshot command. >>> >>> Option 2 - is to establish some heart beats between group client and >>> image client. If group client fails to heart beat then image client >>> unquiesces itself and continues normal operation. >>> >>> Option 3 - is to have a timeout for each image client. 
If group client >>> fails to make a group snapshot within this timeout then we resume our >>> normal operation informing group client of the fact. >>> >>> Which of these options do you prefer? Probably there are other options >>> that I miss. >>> >>> Thanks, >>> Victor. >> >> >> >> -- >> Jason > > > > -- > Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
[parent not found: <CA+aFP1BMK1=daZpUFdNDJ+i8+nfQt42ahCXVaxy9fWdAY-kXwA@mail.gmail.com>]
* Re: Snapshots of consistency groups [not found] ` <CA+aFP1BMK1=daZpUFdNDJ+i8+nfQt42ahCXVaxy9fWdAY-kXwA@mail.gmail.com> @ 2016-08-19 4:20 ` Victor Denisov 2016-08-19 12:48 ` Mykola Golub 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-08-19 4:20 UTC (permalink / raw) To: Jason Dillaman; +Cc: ceph-devel, Josh Durgin, Mykola Golub Could you please point me to the place in source code where writer acquires an exclusive lock on the image. I presume you were talking about the feature: exclusive_lock, shared_lock which can be used from command line using commands lock list, lock break. On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote: > There is already a "request lock" RPC message and this is already handled > transparently within librbd when you attempt to acquire the lock and another > client owns it. > > > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote: >> >> If an image already has a writer who owns the lock, >> should I implement a notification that allows to ask the writer to >> release the lock, >> is there already a standard way to intercept the exclusive lock? >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> >> wrote: >> > ... one more thing: >> > >> > I was also thinking that we need a new RBD feature bit to be used to >> > indicate that an image is part of a consistency group to prevent older >> > librbd clients from removing the image or group snapshots. This could >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can >> > still open the image R/O while its part of a group. >> > >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> >> > wrote: >> >> Way back in April when we had the CDM, I was originally thinking we >> >> should implement option 3. Essentially, you have a prepare group >> >> snapshot RPC message that extends a "paused IO" lease to the caller. 
>> >> When that lease expires, IO would automatically be resumed even if the >> >> group snapshot hasn't been created yet. This would also require >> >> commit/abort group snapshot RPC messages. >> >> >> >> However, thinking about this last night, here is another potential >> >> option: >> >> >> >> Option 4 - require images to have the exclusive lock feature before >> >> they can be added to a consistency group (and prevent disabling of >> >> exclusive-lock while they are part of a group). Then librbd, via the >> >> rbd CLI (or client application of the rbd consistency group snap >> >> create API), can co-operatively acquire the lock from all active image >> >> clients within the group (i.e. all IO has been flushed and paused) and >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal >> >> exclusive lock handling process will automatically take care of >> >> re-acquiring the lock from the dead client and resuming IO. >> >> >> >> This option not only re-uses existing code, it would also eliminate >> >> the need to add/update the RPC messages for prepare/commit/abort >> >> snapshot creation to support group snapshots (since it could all be >> >> handled internally). >> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> >> >> wrote: >> >>> Gentlemen, >> >>> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes. >> >>> >> >>> Here is the situation. In order to take snapshots of all images in a >> >>> consistency group, >> >>> we first need to quiesce all the image writers in the consistency >> >>> group. >> >>> Let me call >> >>> group client - a client which requests a consistency group to take a >> >>> snapshot. >> >>> Image client - the client that writes to an image. >> >>> Let's say group client starts sending notify_quiesce to all image >> >>> clients that write to the images in the group. After quiescing half of >> >>> the image clients the group client can die. 
>> >>> >> >>> It presents us with a dilemma - what should we do with those quiesced >> >>> image clients. >> >>> >> >>> Option 1 - is to wait till someone manually runs recover for that >> >>> consistency group. >> >>> We can show warning next to those unfinished groups when user runs >> >>> group list command. >> >>> There will be a command like group recover, which allows users to >> >>> rollback unsuccessful snapshots >> >>> or continue them using create snapshot command. >> >>> >> >>> Option 2 - is to establish some heart beats between group client and >> >>> image client. If group client fails to heart beat then image client >> >>> unquiesces itself and continues normal operation. >> >>> >> >>> Option 3 - is to have a timeout for each image client. If group client >> >>> fails to make a group snapshot within this timeout then we resume our >> >>> normal operation informing group client of the fact. >> >>> >> >>> Which of these options do you prefer? Probably there are other options >> >>> that I miss. >> >>> >> >>> Thanks, >> >>> Victor. >> >> >> >> >> >> >> >> -- >> >> Jason >> > >> > >> > >> > -- >> > Jason > > > > -- > Jason > ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-08-19 4:20 ` Victor Denisov @ 2016-08-19 12:48 ` Mykola Golub 2016-08-20 0:36 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Mykola Golub @ 2016-08-19 12:48 UTC (permalink / raw) To: Victor Denisov; +Cc: Jason Dillaman, ceph-devel, Josh Durgin On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote: > Could you please point me to the place in source code where writer > acquires an exclusive lock on the image. Grep for 'exclusive_lock->request_lock'. Basically, what you need (after opening the image) is:

```
C_SaferCond lock_ctx;
{
  RWLock::WLocker l(ictx->owner_lock);

  if (ictx->exclusive_lock == nullptr) {
    // exclusive-lock feature is not enabled
    return -EINVAL;
  }

  // Request the lock. If it is currently owned by another client,
  // RPC message will be sent to the client to release the lock.
  ictx->exclusive_lock->request_lock(&lock_ctx);
} // release owner_lock before waiting to avoid potential deadlock

int r = lock_ctx.wait();
if (r < 0) {
  return r;
}

RWLock::RLocker l(ictx->owner_lock);
if (ictx->exclusive_lock == nullptr ||
    !ictx->exclusive_lock->is_lock_owner()) {
  // failed to acquire exclusive lock
  return -EROFS;
}

// At this point lock is acquired
...
```

You might want to look at this PR https://github.com/ceph/ceph/pull/9592 where we discuss adding API methods to directly acquire and release the exclusive lock. You don't need the API, but will find examples in the patch, and also useful comments from Jason. -- Mykola Golub > I presume you were talking about the feature: > exclusive_lock, shared_lock which can be used from command line using > commands lock list, lock break. > > On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote: > > There is already a "request lock" RPC message and this is already handled > > transparently within librbd when you attempt to acquire the lock and another > > client owns it. 
> > > > > > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote: > >> > >> If an image already has a writer who owns the lock, > >> should I implement a notification that allows to ask the writer to > >> release the lock, > >> is there already a standard way to intercept the exclusive lock? > >> > >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> > >> wrote: > >> > ... one more thing: > >> > > >> > I was also thinking that we need a new RBD feature bit to be used to > >> > indicate that an image is part of a consistency group to prevent older > >> > librbd clients from removing the image or group snapshots. This could > >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can > >> > still open the image R/O while its part of a group. > >> > > >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> > >> > wrote: > >> >> Way back in April when we had the CDM, I was originally thinking we > >> >> should implement option 3. Essentially, you have a prepare group > >> >> snapshot RPC message that extends a "paused IO" lease to the caller. > >> >> When that lease expires, IO would automatically be resumed even if the > >> >> group snapshot hasn't been created yet. This would also require > >> >> commit/abort group snapshot RPC messages. > >> >> > >> >> However, thinking about this last night, here is another potential > >> >> option: > >> >> > >> >> Option 4 - require images to have the exclusive lock feature before > >> >> they can be added to a consistency group (and prevent disabling of > >> >> exclusive-lock while they are part of a group). Then librbd, via the > >> >> rbd CLI (or client application of the rbd consistency group snap > >> >> create API), can co-operatively acquire the lock from all active image > >> >> clients within the group (i.e. all IO has been flushed and paused) and > >> >> can proceed with snapshot creation. 
If the rbd CLI dies, the normal > >> >> exclusive lock handling process will automatically take care of > >> >> re-acquiring the lock from the dead client and resuming IO. > >> >> > >> >> This option not only re-uses existing code, it would also eliminate > >> >> the need to add/update the RPC messages for prepare/commit/abort > >> >> snapshot creation to support group snapshots (since it could all be > >> >> handled internally). > >> >> > >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> > >> >> wrote: > >> >>> Gentlemen, > >> >>> > >> >>> I'm writing to you to ask for your opinion regarding quiescing writes. > >> >>> > >> >>> Here is the situation. In order to take snapshots of all images in a > >> >>> consistency group, > >> >>> we first need to quiesce all the image writers in the consistency > >> >>> group. > >> >>> Let me call > >> >>> group client - a client which requests a consistency group to take a > >> >>> snapshot. > >> >>> Image client - the client that writes to an image. > >> >>> Let's say group client starts sending notify_quiesce to all image > >> >>> clients that write to the images in the group. After quiescing half of > >> >>> the image clients the group client can die. > >> >>> > >> >>> It presents us with a dilemma - what should we do with those quiesced > >> >>> image clients. > >> >>> > >> >>> Option 1 - is to wait till someone manually runs recover for that > >> >>> consistency group. > >> >>> We can show warning next to those unfinished groups when user runs > >> >>> group list command. > >> >>> There will be a command like group recover, which allows users to > >> >>> rollback unsuccessful snapshots > >> >>> or continue them using create snapshot command. > >> >>> > >> >>> Option 2 - is to establish some heart beats between group client and > >> >>> image client. If group client fails to heart beat then image client > >> >>> unquiesces itself and continues normal operation. 
> >> >>> > >> >>> Option 3 - is to have a timeout for each image client. If group client > >> >>> fails to make a group snapshot within this timeout then we resume our > >> >>> normal operation informing group client of the fact. > >> >>> > >> >>> Which of these options do you prefer? Probably there are other options > >> >>> that I miss. > >> >>> > >> >>> Thanks, > >> >>> Victor. > >> >> > >> >> > >> >> > >> >> -- > >> >> Jason > >> > > >> > > >> > > >> > -- > >> > Jason > > > > > > > > -- > > Jason > > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-08-19 12:48 ` Mykola Golub @ 2016-08-20 0:36 ` Victor Denisov 2016-08-20 16:27 ` Mykola Golub 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-08-20 0:36 UTC (permalink / raw) To: Mykola Golub; +Cc: Jason Dillaman, ceph-devel, Josh Durgin What if I'm holding this lock and somebody else is trying to reacquire the lock. How do I get notified about it? On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote: > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote: >> Could you please point me to the place in source code where writer >> acquires an exclusive lock on the image. > > Grep for 'exclusive_lock->request_lock'. Basically, what you need > (after opening the image) is: > > ``` > C_SaferCond lock_ctx; > { > RWLock::WLocker l(ictx->owner_lock); > > if (ictx->exclusive_lock == nullptr) { > // exclusive-lock feature is not enabled > return -EINVAL; > } > > // Request the lock. If it is currently owned by another client, > // RPC message will be sent to the client to release the lock. > ictx->exclusive_lock->request_lock(&lock_ctx); > } // release owner_lock before waiting to avoid potential deadlock > > int r = lock_ctx.wait(); > if (r < 0) { > return r; > } > > RWLock::RLocker l(ictx->owner_lock); > if (ictx->exclusive_lock == nullptr || !ictx->exclusive_lock->is_lock_owner()) { > // failed to acquire exclusive lock > return -EROFS; > } > > // At this point lock is acquired > ... > > ``` > > You might want to look at this PR > > https://github.com/ceph/ceph/pull/9592 > > where we discuss adding API methods to directly acquire and release > the exclusive lock. You don't need the API, but will find examples in > the patch, and also useful comments from Jason. > > -- > Mykola Golub > >> I presume you were talking about the feature: >> exclusive_lock, shared_lock which can be used from command line using >> commands lock list, lock break. 
>> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote: >> > There is already a "request lock" RPC message and this is already handled >> > transparently within librbd when you attempt to acquire the lock and another >> > client owns it. >> > >> > >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote: >> >> >> >> If an image already has a writer who owns the lock, >> >> should I implement a notification that allows to ask the writer to >> >> release the lock, >> >> is there already a standard way to intercept the exclusive lock? >> >> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> >> >> wrote: >> >> > ... one more thing: >> >> > >> >> > I was also thinking that we need a new RBD feature bit to be used to >> >> > indicate that an image is part of a consistency group to prevent older >> >> > librbd clients from removing the image or group snapshots. This could >> >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can >> >> > still open the image R/O while its part of a group. >> >> > >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> >> >> > wrote: >> >> >> Way back in April when we had the CDM, I was originally thinking we >> >> >> should implement option 3. Essentially, you have a prepare group >> >> >> snapshot RPC message that extends a "paused IO" lease to the caller. >> >> >> When that lease expires, IO would automatically be resumed even if the >> >> >> group snapshot hasn't been created yet. This would also require >> >> >> commit/abort group snapshot RPC messages. >> >> >> >> >> >> However, thinking about this last night, here is another potential >> >> >> option: >> >> >> >> >> >> Option 4 - require images to have the exclusive lock feature before >> >> >> they can be added to a consistency group (and prevent disabling of >> >> >> exclusive-lock while they are part of a group). 
Then librbd, via the >> >> >> rbd CLI (or client application of the rbd consistency group snap >> >> >> create API), can co-operatively acquire the lock from all active image >> >> >> clients within the group (i.e. all IO has been flushed and paused) and >> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal >> >> >> exclusive lock handling process will automatically take care of >> >> >> re-acquiring the lock from the dead client and resuming IO. >> >> >> >> >> >> This option not only re-uses existing code, it would also eliminate >> >> >> the need to add/update the RPC messages for prepare/commit/abort >> >> >> snapshot creation to support group snapshots (since it could all be >> >> >> handled internally). >> >> >> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> >> >> >> wrote: >> >> >>> Gentlemen, >> >> >>> >> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes. >> >> >>> >> >> >>> Here is the situation. In order to take snapshots of all images in a >> >> >>> consistency group, >> >> >>> we first need to quiesce all the image writers in the consistency >> >> >>> group. >> >> >>> Let me call >> >> >>> group client - a client which requests a consistency group to take a >> >> >>> snapshot. >> >> >>> Image client - the client that writes to an image. >> >> >>> Let's say group client starts sending notify_quiesce to all image >> >> >>> clients that write to the images in the group. After quiescing half of >> >> >>> the image clients the group client can die. >> >> >>> >> >> >>> It presents us with a dilemma - what should we do with those quiesced >> >> >>> image clients. >> >> >>> >> >> >>> Option 1 - is to wait till someone manually runs recover for that >> >> >>> consistency group. >> >> >>> We can show warning next to those unfinished groups when user runs >> >> >>> group list command. 
>> >> >>> There will be a command like group recover, which allows users to >> >> >>> rollback unsuccessful snapshots >> >> >>> or continue them using create snapshot command. >> >> >>> >> >> >>> Option 2 - is to establish some heart beats between group client and >> >> >>> image client. If group client fails to heart beat then image client >> >> >>> unquiesces itself and continues normal operation. >> >> >>> >> >> >>> Option 3 - is to have a timeout for each image client. If group client >> >> >>> fails to make a group snapshot within this timeout then we resume our >> >> >>> normal operation informing group client of the fact. >> >> >>> >> >> >>> Which of these options do you prefer? Probably there are other options >> >> >>> that I miss. >> >> >>> >> >> >>> Thanks, >> >> >>> Victor. >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> Jason >> >> > >> >> > >> >> > >> >> > -- >> >> > Jason >> > >> > >> > >> > -- >> > Jason >> > >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-08-20 0:36 ` Victor Denisov @ 2016-08-20 16:27 ` Mykola Golub 2016-08-26 21:05 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Mykola Golub @ 2016-08-20 16:27 UTC (permalink / raw) To: Victor Denisov; +Cc: Jason Dillaman, ceph-devel, Josh Durgin On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote: > What if I'm holding this lock and somebody else is trying to reacquire the lock. > How do I get notified about it? The image watcher is notified, which triggers its handler: ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx) The handler calls the current lock policy's `lock_requested()` method, which defines what to do with the lock request. The StandardPolicy is to release the lock, so the lock may ping-pong between clients. You may define a different policy -- rbd-mirror is an example where this is used. Wherever an operation needs the exclusive lock, there is always a check whether we are currently the lock owner, i.e.: ictx->exclusive_lock->is_lock_owner() and if it is false, the exclusive lock is requested. Before this check you need to acquire ictx->owner_lock, and until you release owner_lock you can be sure your exclusive lock will not leak to another client. After releasing owner_lock, you will need to repeat the check when you need it again. -- Mykola Golub > > > On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote: > > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote: > >> Could you please point me to the place in source code where writer > >> acquires an exclusive lock on the image. > > > > Grep for 'exclusive_lock->request_lock'. Basically, what you need > > (after opening the image) is: > > > > ``` > > C_SaferCond lock_ctx; > > { > > RWLock::WLocker l(ictx->owner_lock); > > > > if (ictx->exclusive_lock == nullptr) { > > // exclusive-lock feature is not enabled > > return -EINVAL; > > } > > > > // Request the lock. 
If it is currently owned by another client, > > // RPC message will be sent to the client to release the lock. > > ictx->exclusive_lock->request_lock(&lock_ctx); > > } // release owner_lock before waiting to avoid potential deadlock > > > > int r = lock_ctx.wait(); > > if (r < 0) { > > return r; > > } > > > > RWLock::RLocker l(ictx->owner_lock); > > if (ictx->exclusive_lock == nullptr || !ictx->exclusive_lock->is_lock_owner()) { > > // failed to acquire exclusive lock > > return -EROFS; > > } > > > > // At this point lock is acquired > > ... > > > > ``` > > > > You might want to look at this PR > > > > https://github.com/ceph/ceph/pull/9592 > > > > where we discuss adding API methods to directly acquire and release > > the exclusive lock. You don't need the API, but will find examples in > > the patch, and also useful comments from Jason. > > > > -- > > Mykola Golub > > > >> I presume you were talking about the feature: > >> exclusive_lock, shared_lock which can be used from command line using > >> commands lock list, lock break. > >> > >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote: > >> > There is already a "request lock" RPC message and this is already handled > >> > transparently within librbd when you attempt to acquire the lock and another > >> > client owns it. > >> > > >> > > >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote: > >> >> > >> >> If an image already has a writer who owns the lock, > >> >> should I implement a notification that allows to ask the writer to > >> >> release the lock, > >> >> is there already a standard way to intercept the exclusive lock? > >> >> > >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> > >> >> wrote: > >> >> > ... 
one more thing: > >> >> > > >> >> > I was also thinking that we need a new RBD feature bit to be used to > >> >> > indicate that an image is part of a consistency group to prevent older > >> >> > librbd clients from removing the image or group snapshots. This could > >> >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can > >> >> > still open the image R/O while its part of a group. > >> >> > > >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> > >> >> > wrote: > >> >> >> Way back in April when we had the CDM, I was originally thinking we > >> >> >> should implement option 3. Essentially, you have a prepare group > >> >> >> snapshot RPC message that extends a "paused IO" lease to the caller. > >> >> >> When that lease expires, IO would automatically be resumed even if the > >> >> >> group snapshot hasn't been created yet. This would also require > >> >> >> commit/abort group snapshot RPC messages. > >> >> >> > >> >> >> However, thinking about this last night, here is another potential > >> >> >> option: > >> >> >> > >> >> >> Option 4 - require images to have the exclusive lock feature before > >> >> >> they can be added to a consistency group (and prevent disabling of > >> >> >> exclusive-lock while they are part of a group). Then librbd, via the > >> >> >> rbd CLI (or client application of the rbd consistency group snap > >> >> >> create API), can co-operatively acquire the lock from all active image > >> >> >> clients within the group (i.e. all IO has been flushed and paused) and > >> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal > >> >> >> exclusive lock handling process will automatically take care of > >> >> >> re-acquiring the lock from the dead client and resuming IO. 
> >> >> >> > >> >> >> This option not only re-uses existing code, it would also eliminate > >> >> >> the need to add/update the RPC messages for prepare/commit/abort > >> >> >> snapshot creation to support group snapshots (since it could all be > >> >> >> handled internally). > >> >> >> > >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> > >> >> >> wrote: > >> >> >>> Gentlemen, > >> >> >>> > >> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes. > >> >> >>> > >> >> >>> Here is the situation. In order to take snapshots of all images in a > >> >> >>> consistency group, > >> >> >>> we first need to quiesce all the image writers in the consistency > >> >> >>> group. > >> >> >>> Let me call > >> >> >>> group client - a client which requests a consistency group to take a > >> >> >>> snapshot. > >> >> >>> Image client - the client that writes to an image. > >> >> >>> Let's say group client starts sending notify_quiesce to all image > >> >> >>> clients that write to the images in the group. After quiescing half of > >> >> >>> the image clients the group client can die. > >> >> >>> > >> >> >>> It presents us with a dilemma - what should we do with those quiesced > >> >> >>> image clients. > >> >> >>> > >> >> >>> Option 1 - is to wait till someone manually runs recover for that > >> >> >>> consistency group. > >> >> >>> We can show warning next to those unfinished groups when user runs > >> >> >>> group list command. > >> >> >>> There will be a command like group recover, which allows users to > >> >> >>> rollback unsuccessful snapshots > >> >> >>> or continue them using create snapshot command. > >> >> >>> > >> >> >>> Option 2 - is to establish some heart beats between group client and > >> >> >>> image client. If group client fails to heart beat then image client > >> >> >>> unquiesces itself and continues normal operation. > >> >> >>> > >> >> >>> Option 3 - is to have a timeout for each image client. 
If group client > >> >> >>> fails to make a group snapshot within this timeout then we resume our > >> >> >>> normal operation informing group client of the fact. > >> >> >>> > >> >> >>> Which of these options do you prefer? Probably there are other options > >> >> >>> that I miss. > >> >> >>> > >> >> >>> Thanks, > >> >> >>> Victor. > >> >> >> > >> >> >> > >> >> >> > >> >> >> -- > >> >> >> Jason > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > Jason > >> > > >> > > >> > > >> > -- > >> > Jason > >> > > >> -- > >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-08-20 16:27 ` Mykola Golub @ 2016-08-26 21:05 ` Victor Denisov 2016-08-29 0:37 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-08-26 21:05 UTC (permalink / raw) To: Mykola Golub; +Cc: Jason Dillaman, ceph-devel, Josh Durgin Guys, I updated Snapshots section of this document: http://pad.ceph.com/p/consistency_groups, in accordance with my improved understanding of how it should be implemented. Please take a look and provide your comments. Some of my concerns regarding the implementation I highlighted in bold. Looking forward to your valuable remarks. Thanks in advance. V. On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@mirantis.com> wrote: > On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote: >> What if I'm holding this lock and somebody else is trying to reacquire the lock. >> How do I get notified about it? > > The image watcher is notified, which triggers its handler: > > ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx) > > The handler calls the current lock policy method `lock_requested()`, > which will define what to do with the lock request. The StandartPolicy > is to release the lock, so it may ping-ponging between the > clients. You may define a different policy -- rbd-mirror is an example > where it is used. > > Everywhere where an operation needs the exclusive lock, it is always > checked if we currently are a lock owner, i.e: > > ictx->exclusive_lock->is_lock_owner() > > and if it is false, the exlusive lock is requested. Before this check > you need to aquire ctx->owner_lock, and until you release owner_lock > you can be sure your exclusive lock will not leak to another > client. After releasing owner_lock, you will need to repeate the check > again when you need it. 
> > -- > Mykola Golub > >> >> >> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote: >> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote: >> >> Could you please point me to the place in source code where writer >> >> acquires an exclusive lock on the image. >> > >> > Grep for 'exclusive_lock->request_lock'. Basically, what you need >> > (after opening the image) is: >> > >> > ``` >> > C_SaferCond lock_ctx; >> > { >> > RWLock::WLocker l(ictx->owner_lock); >> > >> > if (ictx->exclusive_lock == nullptr) { >> > // exclusive-lock feature is not enabled >> > return -EINVAL; >> > } >> > >> > // Request the lock. If it is currently owned by another client, >> > // RPC message will be sent to the client to release the lock. >> > ictx->exclusive_lock->request_lock(&lock_ctx); >> > } // release owner_lock before waiting to avoid potential deadlock >> > >> > int r = lock_ctx.wait(); >> > if (r < 0) { >> > return r; >> > } >> > >> > RWLock::RLocker l(ictx->owner_lock); >> > if (ictx->exclusive_lock == nullptr || !ictx->exclusive_lock->is_lock_owner()) { >> > // failed to acquire exclusive lock >> > return -EROFS; >> > } >> > >> > // At this point lock is acquired >> > ... >> > >> > ``` >> > >> > You might want to look at this PR >> > >> > https://github.com/ceph/ceph/pull/9592 >> > >> > where we discuss adding API methods to directly acquire and release >> > the exclusive lock. You don't need the API, but will find examples in >> > the patch, and also useful comments from Jason. >> > >> > -- >> > Mykola Golub >> > >> >> I presume you were talking about the feature: >> >> exclusive_lock, shared_lock which can be used from command line using >> >> commands lock list, lock break. 
>> >> >> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote: >> >> > There is already a "request lock" RPC message and this is already handled >> >> > transparently within librbd when you attempt to acquire the lock and another >> >> > client owns it. >> >> > >> >> > >> >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote: >> >> >> >> >> >> If an image already has a writer who owns the lock, >> >> >> should I implement a notification that allows to ask the writer to >> >> >> release the lock, >> >> >> is there already a standard way to intercept the exclusive lock? >> >> >> >> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> >> >> >> wrote: >> >> >> > ... one more thing: >> >> >> > >> >> >> > I was also thinking that we need a new RBD feature bit to be used to >> >> >> > indicate that an image is part of a consistency group to prevent older >> >> >> > librbd clients from removing the image or group snapshots. This could >> >> >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can >> >> >> > still open the image R/O while its part of a group. >> >> >> > >> >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> >> >> >> > wrote: >> >> >> >> Way back in April when we had the CDM, I was originally thinking we >> >> >> >> should implement option 3. Essentially, you have a prepare group >> >> >> >> snapshot RPC message that extends a "paused IO" lease to the caller. >> >> >> >> When that lease expires, IO would automatically be resumed even if the >> >> >> >> group snapshot hasn't been created yet. This would also require >> >> >> >> commit/abort group snapshot RPC messages. 
>> >> >> >> >> >> >> >> However, thinking about this last night, here is another potential >> >> >> >> option: >> >> >> >> >> >> >> >> Option 4 - require images to have the exclusive lock feature before >> >> >> >> they can be added to a consistency group (and prevent disabling of >> >> >> >> exclusive-lock while they are part of a group). Then librbd, via the >> >> >> >> rbd CLI (or client application of the rbd consistency group snap >> >> >> >> create API), can co-operatively acquire the lock from all active image >> >> >> >> clients within the group (i.e. all IO has been flushed and paused) and >> >> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal >> >> >> >> exclusive lock handling process will automatically take care of >> >> >> >> re-acquiring the lock from the dead client and resuming IO. >> >> >> >> >> >> >> >> This option not only re-uses existing code, it would also eliminate >> >> >> >> the need to add/update the RPC messages for prepare/commit/abort >> >> >> >> snapshot creation to support group snapshots (since it could all be >> >> >> >> handled internally). >> >> >> >> >> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> >> >> >> >> wrote: >> >> >> >>> Gentlemen, >> >> >> >>> >> >> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes. >> >> >> >>> >> >> >> >>> Here is the situation. In order to take snapshots of all images in a >> >> >> >>> consistency group, >> >> >> >>> we first need to quiesce all the image writers in the consistency >> >> >> >>> group. >> >> >> >>> Let me call >> >> >> >>> group client - a client which requests a consistency group to take a >> >> >> >>> snapshot. >> >> >> >>> Image client - the client that writes to an image. >> >> >> >>> Let's say group client starts sending notify_quiesce to all image >> >> >> >>> clients that write to the images in the group. After quiescing half of >> >> >> >>> the image clients the group client can die. 
>> >> >> >>> >> >> >> >>> It presents us with a dilemma - what should we do with those quiesced >> >> >> >>> image clients. >> >> >> >>> >> >> >> >>> Option 1 - is to wait till someone manually runs recover for that >> >> >> >>> consistency group. >> >> >> >>> We can show warning next to those unfinished groups when user runs >> >> >> >>> group list command. >> >> >> >>> There will be a command like group recover, which allows users to >> >> >> >>> rollback unsuccessful snapshots >> >> >> >>> or continue them using create snapshot command. >> >> >> >>> >> >> >> >>> Option 2 - is to establish some heart beats between group client and >> >> >> >>> image client. If group client fails to heart beat then image client >> >> >> >>> unquiesces itself and continues normal operation. >> >> >> >>> >> >> >> >>> Option 3 - is to have a timeout for each image client. If group client >> >> >> >>> fails to make a group snapshot within this timeout then we resume our >> >> >> >>> normal operation informing group client of the fact. >> >> >> >>> >> >> >> >>> Which of these options do you prefer? Probably there are other options >> >> >> >>> that I miss. >> >> >> >>> >> >> >> >>> Thanks, >> >> >> >>> Victor. >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> >> Jason >> >> >> > >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > Jason >> >> > >> >> > >> >> > >> >> > -- >> >> > Jason >> >> > >> >> -- >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> >> the body of a message to majordomo@vger.kernel.org >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-08-26 21:05 ` Victor Denisov @ 2016-08-29 0:37 ` Jason Dillaman 2016-08-29 21:10 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2016-08-29 0:37 UTC (permalink / raw) To: Victor Denisov; +Cc: Mykola Golub, ceph-devel, Josh Durgin I think the first step is to implement the concept of snapshot "namespaces". This could be implemented as an optional variant structure associated with each snapshot at creation (see the ImageWatcher RPC messages or journaling event type encoding for examples of this). For consistency group snapshots, this structure would identify the snapshot as belonging to the consistency group and have a unique id back to the specific group snapshot. When creating a snapshot, the state machine would (1) create the group snapshot record, (2) set the state of the group to "creating snapshot" (to prevent new images from being added/removed from the group while the op is in progress), (3) acquire the lock for all images in the group, (4) create the individual image snapshots with the linkage back to the group snapshot record (can be performed in parallel up to max concurrent ops), (5) release the exclusive locks, and (6) reset the group status to "ready". If you have a hard crash/failure anywhere, a "snap remove" operation should be designed to get the group back into a consistent state (i.e. remove any snapshots linked to the group and reset the group state back to ready). On Fri, Aug 26, 2016 at 5:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Guys, > > I updated Snapshots section of this document: > http://pad.ceph.com/p/consistency_groups, in accordance with my > improved understanding of how it should be implemented. > Please take a look and provide your comments. Some of my concerns > regarding the implementation I highlighted in bold. > > Looking forward to your valuable remarks. > Thanks in advance. > V. 
> > > On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@mirantis.com> wrote: >> On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote: >>> What if I'm holding this lock and somebody else is trying to reacquire the lock. >>> How do I get notified about it? >> >> The image watcher is notified, which triggers its handler: >> >> ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx) >> >> The handler calls the current lock policy method `lock_requested()`, >> which will define what to do with the lock request. The StandartPolicy >> is to release the lock, so it may ping-ponging between the >> clients. You may define a different policy -- rbd-mirror is an example >> where it is used. >> >> Everywhere where an operation needs the exclusive lock, it is always >> checked if we currently are a lock owner, i.e: >> >> ictx->exclusive_lock->is_lock_owner() >> >> and if it is false, the exlusive lock is requested. Before this check >> you need to aquire ctx->owner_lock, and until you release owner_lock >> you can be sure your exclusive lock will not leak to another >> client. After releasing owner_lock, you will need to repeate the check >> again when you need it. >> >> -- >> Mykola Golub >> >>> >>> >>> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote: >>> >> Could you please point me to the place in source code where writer >>> >> acquires an exclusive lock on the image. >>> > >>> > Grep for 'exclusive_lock->request_lock'. Basically, what you need >>> > (after opening the image) is: >>> > >>> > ``` >>> > C_SaferCond lock_ctx; >>> > { >>> > RWLock::WLocker l(ictx->owner_lock); >>> > >>> > if (ictx->exclusive_lock == nullptr) { >>> > // exclusive-lock feature is not enabled >>> > return -EINVAL; >>> > } >>> > >>> > // Request the lock. If it is currently owned by another client, >>> > // RPC message will be sent to the client to release the lock. 
>>> > ictx->exclusive_lock->request_lock(&lock_ctx); >>> > } // release owner_lock before waiting to avoid potential deadlock >>> > >>> > int r = lock_ctx.wait(); >>> > if (r < 0) { >>> > return r; >>> > } >>> > >>> > RWLock::RLocker l(ictx->owner_lock); >>> > if (ictx->exclusive_lock == nullptr || !ictx->exclusive_lock->is_lock_owner()) { >>> > // failed to acquire exclusive lock >>> > return -EROFS; >>> > } >>> > >>> > // At this point lock is acquired >>> > ... >>> > >>> > ``` >>> > >>> > You might want to look at this PR >>> > >>> > https://github.com/ceph/ceph/pull/9592 >>> > >>> > where we discuss adding API methods to directly acquire and release >>> > the exclusive lock. You don't need the API, but will find examples in >>> > the patch, and also useful comments from Jason. >>> > >>> > -- >>> > Mykola Golub >>> > >>> >> I presume you were talking about the feature: >>> >> exclusive_lock, shared_lock which can be used from command line using >>> >> commands lock list, lock break. >>> >> >>> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>> >> > There is already a "request lock" RPC message and this is already handled >>> >> > transparently within librbd when you attempt to acquire the lock and another >>> >> > client owns it. >>> >> > >>> >> > >>> >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote: >>> >> >> >>> >> >> If an image already has a writer who owns the lock, >>> >> >> should I implement a notification that allows to ask the writer to >>> >> >> release the lock, >>> >> >> is there already a standard way to intercept the exclusive lock? >>> >> >> >>> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> >>> >> >> wrote: >>> >> >> > ... 
one more thing: >>> >> >> > >>> >> >> > I was also thinking that we need a new RBD feature bit to be used to >>> >> >> > indicate that an image is part of a consistency group to prevent older >>> >> >> > librbd clients from removing the image or group snapshots. This could >>> >> >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can >>> >> >> > still open the image R/O while its part of a group. >>> >> >> > >>> >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> >>> >> >> > wrote: >>> >> >> >> Way back in April when we had the CDM, I was originally thinking we >>> >> >> >> should implement option 3. Essentially, you have a prepare group >>> >> >> >> snapshot RPC message that extends a "paused IO" lease to the caller. >>> >> >> >> When that lease expires, IO would automatically be resumed even if the >>> >> >> >> group snapshot hasn't been created yet. This would also require >>> >> >> >> commit/abort group snapshot RPC messages. >>> >> >> >> >>> >> >> >> However, thinking about this last night, here is another potential >>> >> >> >> option: >>> >> >> >> >>> >> >> >> Option 4 - require images to have the exclusive lock feature before >>> >> >> >> they can be added to a consistency group (and prevent disabling of >>> >> >> >> exclusive-lock while they are part of a group). Then librbd, via the >>> >> >> >> rbd CLI (or client application of the rbd consistency group snap >>> >> >> >> create API), can co-operatively acquire the lock from all active image >>> >> >> >> clients within the group (i.e. all IO has been flushed and paused) and >>> >> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal >>> >> >> >> exclusive lock handling process will automatically take care of >>> >> >> >> re-acquiring the lock from the dead client and resuming IO. 
>>> >> >> >> >>> >> >> >> This option not only re-uses existing code, it would also eliminate >>> >> >> >> the need to add/update the RPC messages for prepare/commit/abort >>> >> >> >> snapshot creation to support group snapshots (since it could all be >>> >> >> >> handled internally). >>> >> >> >> >>> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> >>> >> >> >> wrote: >>> >> >> >>> Gentlemen, >>> >> >> >>> >>> >> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes. >>> >> >> >>> >>> >> >> >>> Here is the situation. In order to take snapshots of all images in a >>> >> >> >>> consistency group, >>> >> >> >>> we first need to quiesce all the image writers in the consistency >>> >> >> >>> group. >>> >> >> >>> Let me call >>> >> >> >>> group client - a client which requests a consistency group to take a >>> >> >> >>> snapshot. >>> >> >> >>> Image client - the client that writes to an image. >>> >> >> >>> Let's say group client starts sending notify_quiesce to all image >>> >> >> >>> clients that write to the images in the group. After quiescing half of >>> >> >> >>> the image clients the group client can die. >>> >> >> >>> >>> >> >> >>> It presents us with a dilemma - what should we do with those quiesced >>> >> >> >>> image clients. >>> >> >> >>> >>> >> >> >>> Option 1 - is to wait till someone manually runs recover for that >>> >> >> >>> consistency group. >>> >> >> >>> We can show warning next to those unfinished groups when user runs >>> >> >> >>> group list command. >>> >> >> >>> There will be a command like group recover, which allows users to >>> >> >> >>> rollback unsuccessful snapshots >>> >> >> >>> or continue them using create snapshot command. >>> >> >> >>> >>> >> >> >>> Option 2 - is to establish some heart beats between group client and >>> >> >> >>> image client. If group client fails to heart beat then image client >>> >> >> >>> unquiesces itself and continues normal operation. 
>>> >> >> >>> >>> >> >> >>> Option 3 - is to have a timeout for each image client. If group client >>> >> >> >>> fails to make a group snapshot within this timeout then we resume our >>> >> >> >>> normal operation informing group client of the fact. >>> >> >> >>> >>> >> >> >>> Which of these options do you prefer? Probably there are other options >>> >> >> >>> that I miss. >>> >> >> >>> >>> >> >> >>> Thanks, >>> >> >> >>> Victor. >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> -- >>> >> >> >> Jason >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > -- >>> >> >> > Jason >>> >> > >>> >> > >>> >> > >>> >> > -- >>> >> > Jason >>> >> > >>> >> -- >>> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>> >> the body of a message to majordomo@vger.kernel.org >>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> -- >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-08-29 0:37 ` Jason Dillaman @ 2016-08-29 21:10 ` Victor Denisov 2016-09-09 22:41 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-08-29 21:10 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin Right, I forgot about snapshot "namespaces". I'll add this part. I guess it makes sense to discuss the whole thing on the next CDM. On Sun, Aug 28, 2016 at 5:37 PM, Jason Dillaman <jdillama@redhat.com> wrote: > I think the first step is to implement the concept of snapshot "namespaces". > > This could be implemented as an optional variant structure associated > with each snapshot at creation (see the ImageWatcher RPC messages or > journaling event type encoding for examples of this). For consistency > group snapshots, this structure would identify the snapshot as > belonging to the consistency group and have a unique id back to the > specific group snapshot. > > When creating a snapshot, the state machine would (1) create the group > snapshot record, (2) set the state of the group to "creating snapshot" > (to prevent new images from being added/removed from the group while > the op is in-progress), (3) acquire the lock for all images in the > group, (4) create the individual image snapshots with the linkage back > to the group snapshot record (can be performed in parallel up to max > concurrent ops), (5) release the exclusive locks, and (6) reset the > group status to "ready". > > If you have a hard crash/failure anywhere, a "snap remove" operation > should be designed to get the group back into consistent state (i.e. > remove any snapshots linked to the group and reset the group state > back to ready). 
> > On Fri, Aug 26, 2016 at 5:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> Guys, >> >> I updated Snapshots section of this document: >> http://pad.ceph.com/p/consistency_groups, in accordance with my >> improved understanding of how it should be implemented. >> Please take a look and provide your comments. Some of my concerns >> regarding the implementation I highlighted in bold. >> >> Looking forward to your valuable remarks. >> Thanks in advance. >> V. >> >> >> On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>> On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote: >>>> What if I'm holding this lock and somebody else is trying to reacquire the lock. >>>> How do I get notified about it? >>> >>> The image watcher is notified, which triggers its handler: >>> >>> ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx) >>> >>> The handler calls the current lock policy method `lock_requested()`, >>> which will define what to do with the lock request. The StandartPolicy >>> is to release the lock, so it may ping-ponging between the >>> clients. You may define a different policy -- rbd-mirror is an example >>> where it is used. >>> >>> Everywhere where an operation needs the exclusive lock, it is always >>> checked if we currently are a lock owner, i.e: >>> >>> ictx->exclusive_lock->is_lock_owner() >>> >>> and if it is false, the exlusive lock is requested. Before this check >>> you need to aquire ctx->owner_lock, and until you release owner_lock >>> you can be sure your exclusive lock will not leak to another >>> client. After releasing owner_lock, you will need to repeate the check >>> again when you need it. 
>>> >>> -- >>> Mykola Golub >>> >>>> >>>> >>>> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote: >>>> >> Could you please point me to the place in source code where writer >>>> >> acquires an exclusive lock on the image. >>>> > >>>> > Grep for 'exclusive_lock->request_lock'. Basically, what you need >>>> > (after opening the image) is: >>>> > >>>> > ``` >>>> > C_SaferCond lock_ctx; >>>> > { >>>> > RWLock::WLocker l(ictx->owner_lock); >>>> > >>>> > if (ictx->exclusive_lock == nullptr) { >>>> > // exclusive-lock feature is not enabled >>>> > return -EINVAL; >>>> > } >>>> > >>>> > // Request the lock. If it is currently owned by another client, >>>> > // RPC message will be sent to the client to release the lock. >>>> > ictx->exclusive_lock->request_lock(&lock_ctx); >>>> > } // release owner_lock before waiting to avoid potential deadlock >>>> > >>>> > int r = lock_ctx.wait(); >>>> > if (r < 0) { >>>> > return r; >>>> > } >>>> > >>>> > RWLock::RLocker l(ictx->owner_lock); >>>> > if (ictx->exclusive_lock == nullptr || !ictx->exclusive_lock->is_lock_owner()) { >>>> > // failed to acquire exclusive lock >>>> > return -EROFS; >>>> > } >>>> > >>>> > // At this point lock is acquired >>>> > ... >>>> > >>>> > ``` >>>> > >>>> > You might want to look at this PR >>>> > >>>> > https://github.com/ceph/ceph/pull/9592 >>>> > >>>> > where we discuss adding API methods to directly acquire and release >>>> > the exclusive lock. You don't need the API, but will find examples in >>>> > the patch, and also useful comments from Jason. >>>> > >>>> > -- >>>> > Mykola Golub >>>> > >>>> >> I presume you were talking about the feature: >>>> >> exclusive_lock, shared_lock which can be used from command line using >>>> >> commands lock list, lock break. 
>>>> >> >>>> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>> >> > There is already a "request lock" RPC message and this is already handled >>>> >> > transparently within librbd when you attempt to acquire the lock and another >>>> >> > client owns it. >>>> >> > >>>> >> > >>>> >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> >> >> >>>> >> >> If an image already has a writer who owns the lock, >>>> >> >> should I implement a notification that allows to ask the writer to >>>> >> >> release the lock, >>>> >> >> is there already a standard way to intercept the exclusive lock? >>>> >> >> >>>> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> >>>> >> >> wrote: >>>> >> >> > ... one more thing: >>>> >> >> > >>>> >> >> > I was also thinking that we need a new RBD feature bit to be used to >>>> >> >> > indicate that an image is part of a consistency group to prevent older >>>> >> >> > librbd clients from removing the image or group snapshots. This could >>>> >> >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can >>>> >> >> > still open the image R/O while its part of a group. >>>> >> >> > >>>> >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> >>>> >> >> > wrote: >>>> >> >> >> Way back in April when we had the CDM, I was originally thinking we >>>> >> >> >> should implement option 3. Essentially, you have a prepare group >>>> >> >> >> snapshot RPC message that extends a "paused IO" lease to the caller. >>>> >> >> >> When that lease expires, IO would automatically be resumed even if the >>>> >> >> >> group snapshot hasn't been created yet. This would also require >>>> >> >> >> commit/abort group snapshot RPC messages. 
>>>> >> >> >> >>>> >> >> >> However, thinking about this last night, here is another potential >>>> >> >> >> option: >>>> >> >> >> >>>> >> >> >> Option 4 - require images to have the exclusive lock feature before >>>> >> >> >> they can be added to a consistency group (and prevent disabling of >>>> >> >> >> exclusive-lock while they are part of a group). Then librbd, via the >>>> >> >> >> rbd CLI (or client application of the rbd consistency group snap >>>> >> >> >> create API), can co-operatively acquire the lock from all active image >>>> >> >> >> clients within the group (i.e. all IO has been flushed and paused) and >>>> >> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal >>>> >> >> >> exclusive lock handling process will automatically take care of >>>> >> >> >> re-acquiring the lock from the dead client and resuming IO. >>>> >> >> >> >>>> >> >> >> This option not only re-uses existing code, it would also eliminate >>>> >> >> >> the need to add/update the RPC messages for prepare/commit/abort >>>> >> >> >> snapshot creation to support group snapshots (since it could all be >>>> >> >> >> handled internally). >>>> >> >> >> >>>> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> >>>> >> >> >> wrote: >>>> >> >> >>> Gentlemen, >>>> >> >> >>> >>>> >> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes. >>>> >> >> >>> >>>> >> >> >>> Here is the situation. In order to take snapshots of all images in a >>>> >> >> >>> consistency group, >>>> >> >> >>> we first need to quiesce all the image writers in the consistency >>>> >> >> >>> group. >>>> >> >> >>> Let me call >>>> >> >> >>> group client - a client which requests a consistency group to take a >>>> >> >> >>> snapshot. >>>> >> >> >>> Image client - the client that writes to an image. >>>> >> >> >>> Let's say group client starts sending notify_quiesce to all image >>>> >> >> >>> clients that write to the images in the group. 
After quiescing half of >>>> >> >> >>> the image clients the group client can die. >>>> >> >> >>> >>>> >> >> >>> It presents us with a dilemma - what should we do with those quiesced >>>> >> >> >>> image clients. >>>> >> >> >>> >>>> >> >> >>> Option 1 - is to wait till someone manually runs recover for that >>>> >> >> >>> consistency group. >>>> >> >> >>> We can show warning next to those unfinished groups when user runs >>>> >> >> >>> group list command. >>>> >> >> >>> There will be a command like group recover, which allows users to >>>> >> >> >>> rollback unsuccessful snapshots >>>> >> >> >>> or continue them using create snapshot command. >>>> >> >> >>> >>>> >> >> >>> Option 2 - is to establish some heart beats between group client and >>>> >> >> >>> image client. If group client fails to heart beat then image client >>>> >> >> >>> unquiesces itself and continues normal operation. >>>> >> >> >>> >>>> >> >> >>> Option 3 - is to have a timeout for each image client. If group client >>>> >> >> >>> fails to make a group snapshot within this timeout then we resume our >>>> >> >> >>> normal operation informing group client of the fact. >>>> >> >> >>> >>>> >> >> >>> Which of these options do you prefer? Probably there are other options >>>> >> >> >>> that I miss. >>>> >> >> >>> >>>> >> >> >>> Thanks, >>>> >> >> >>> Victor. 
>>>> >> >> >> >>>> >> >> >> >>>> >> >> >> -- >>>> >> >> >> Jason >>>> >> >> > >>>> >> >> > >>>> >> >> > >>>> >> >> > -- >>>> >> >> > Jason >>>> >> > >>>> >> > >>>> >> > >>>> >> > -- >>>> >> > Jason >>>> >> > >>>> >> -- >>>> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>> >> the body of a message to majordomo@vger.kernel.org >>>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-08-29 21:10 ` Victor Denisov @ 2016-09-09 22:41 ` Victor Denisov 2016-09-10 18:37 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-09-09 22:41 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin I have a question about where SnapshotNamespace type should be placed. I placed it in cls/rbd/cls_rbd_types.h because cls client and cls backend components should have access to this type. Also this type is required in librbd/Operations.cc - because we want to specify in what namespace Operations::snap_create should create snapshots. However Operations.cc doesn't import cls_rbd_types.h right now. If the question was about public interface of librbd/librbd.cc, then I would create a duplicate of SnapshotNamespace type in librbd layer without hesitation. But these functions are internal, so, my question is whether it's really feasible to create another type for SnapshotNamespace in librbd layer. On Mon, Aug 29, 2016 at 2:10 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Right, I forgot about snaphot "namespaces". I'll add this part. > I guess it makes sense to discuss the whole thing on the next CDM. > > On Sun, Aug 28, 2016 at 5:37 PM, Jason Dillaman <jdillama@redhat.com> wrote: >> I think the first step is to implement the concept of snapshot "namespaces". >> >> This could be implemented as an optional variant structure associated >> with each snapshot at creation (see the ImageWatcher RPC messages or >> journaling event type encoding for examples of this). For consistency >> group snapshots, this structure would identify the snapshot as >> belonging to the consistency group and have a unique id back to the >> specific group snapshot. 
>> >> When creating a snapshot, the state machine would (1) create the group >> snapshot record, (2) set the state of the group to "creating snapshot" >> (to prevent new images from being added/removed from the group while >> the op is in-progress), (3) acquire the lock for all images in the >> group, (4) create the individual image snapshots with the linkage back >> to the group snapshot record (can be performed in parallel up to max >> concurrent ops), (5) release the exclusive locks, and (6) reset the >> group status to "ready". >> >> If you have a hard crash/failure anywhere, a "snap remove" operation >> should be designed to get the group back into consistent state (i.e. >> remove any snapshots linked to the group and reset the group state >> back to ready). >> >> On Fri, Aug 26, 2016 at 5:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> Guys, >>> >>> I updated Snapshots section of this document: >>> http://pad.ceph.com/p/consistency_groups, in accordance with my >>> improved understanding of how it should be implemented. >>> Please take a look and provide your comments. Some of my concerns >>> regarding the implementation I highlighted in bold. >>> >>> Looking forward to your valuable remarks. >>> Thanks in advance. >>> V. >>> >>> >>> On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>> On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote: >>>>> What if I'm holding this lock and somebody else is trying to reacquire the lock. >>>>> How do I get notified about it? >>>> >>>> The image watcher is notified, which triggers its handler: >>>> >>>> ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx) >>>> >>>> The handler calls the current lock policy method `lock_requested()`, >>>> which will define what to do with the lock request. The StandardPolicy >>>> is to release the lock, so it may ping-pong between the >>>> clients. 
You may define a different policy -- rbd-mirror is an example >>>> where it is used. >>>> >>>> Everywhere where an operation needs the exclusive lock, it is always >>>> checked if we currently are a lock owner, i.e: >>>> >>>> ictx->exclusive_lock->is_lock_owner() >>>> >>>> and if it is false, the exclusive lock is requested. Before this check >>>> you need to acquire ictx->owner_lock, and until you release owner_lock >>>> you can be sure your exclusive lock will not leak to another >>>> client. After releasing owner_lock, you will need to repeat the check >>>> again when you need it. >>>> >>>> -- >>>> Mykola Golub >>>> >>>>> >>>>> >>>>> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>>> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote: >>>>> >> Could you please point me to the place in source code where writer >>>>> >> acquires an exclusive lock on the image. >>>>> > >>>>> > Grep for 'exclusive_lock->request_lock'. Basically, what you need >>>>> > (after opening the image) is: >>>>> > >>>>> > ``` >>>>> > C_SaferCond lock_ctx; >>>>> > { >>>>> > RWLock::WLocker l(ictx->owner_lock); >>>>> > >>>>> > if (ictx->exclusive_lock == nullptr) { >>>>> > // exclusive-lock feature is not enabled >>>>> > return -EINVAL; >>>>> > } >>>>> > >>>>> > // Request the lock. If it is currently owned by another client, >>>>> > // RPC message will be sent to the client to release the lock. >>>>> > ictx->exclusive_lock->request_lock(&lock_ctx); >>>>> > } // release owner_lock before waiting to avoid potential deadlock >>>>> > >>>>> > int r = lock_ctx.wait(); >>>>> > if (r < 0) { >>>>> > return r; >>>>> > } >>>>> > >>>>> > RWLock::RLocker l(ictx->owner_lock); >>>>> > if (ictx->exclusive_lock == nullptr || !ictx->exclusive_lock->is_lock_owner()) { >>>>> > // failed to acquire exclusive lock >>>>> > return -EROFS; >>>>> > } >>>>> > >>>>> > // At this point lock is acquired >>>>> > ... 
>>>>> > >>>>> > ``` >>>>> > >>>>> > You might want to look at this PR >>>>> > >>>>> > https://github.com/ceph/ceph/pull/9592 >>>>> > >>>>> > where we discuss adding API methods to directly acquire and release >>>>> > the exclusive lock. You don't need the API, but will find examples in >>>>> > the patch, and also useful comments from Jason. >>>>> > >>>>> > -- >>>>> > Mykola Golub >>>>> > >>>>> >> I presume you were talking about the feature: >>>>> >> exclusive_lock, shared_lock which can be used from command line using >>>>> >> commands lock list, lock break. >>>>> >> >>>>> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>> >> > There is already a "request lock" RPC message and this is already handled >>>>> >> > transparently within librbd when you attempt to acquire the lock and another >>>>> >> > client owns it. >>>>> >> > >>>>> >> > >>>>> >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>> >> >> >>>>> >> >> If an image already has a writer who owns the lock, >>>>> >> >> should I implement a notification that allows to ask the writer to >>>>> >> >> release the lock, >>>>> >> >> is there already a standard way to intercept the exclusive lock? >>>>> >> >> >>>>> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> >>>>> >> >> wrote: >>>>> >> >> > ... one more thing: >>>>> >> >> > >>>>> >> >> > I was also thinking that we need a new RBD feature bit to be used to >>>>> >> >> > indicate that an image is part of a consistency group to prevent older >>>>> >> >> > librbd clients from removing the image or group snapshots. This could >>>>> >> >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can >>>>> >> >> > still open the image R/O while its part of a group. 
>>>>> >> >> > >>>>> >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> >>>>> >> >> > wrote: >>>>> >> >> >> Way back in April when we had the CDM, I was originally thinking we >>>>> >> >> >> should implement option 3. Essentially, you have a prepare group >>>>> >> >> >> snapshot RPC message that extends a "paused IO" lease to the caller. >>>>> >> >> >> When that lease expires, IO would automatically be resumed even if the >>>>> >> >> >> group snapshot hasn't been created yet. This would also require >>>>> >> >> >> commit/abort group snapshot RPC messages. >>>>> >> >> >> >>>>> >> >> >> However, thinking about this last night, here is another potential >>>>> >> >> >> option: >>>>> >> >> >> >>>>> >> >> >> Option 4 - require images to have the exclusive lock feature before >>>>> >> >> >> they can be added to a consistency group (and prevent disabling of >>>>> >> >> >> exclusive-lock while they are part of a group). Then librbd, via the >>>>> >> >> >> rbd CLI (or client application of the rbd consistency group snap >>>>> >> >> >> create API), can co-operatively acquire the lock from all active image >>>>> >> >> >> clients within the group (i.e. all IO has been flushed and paused) and >>>>> >> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal >>>>> >> >> >> exclusive lock handling process will automatically take care of >>>>> >> >> >> re-acquiring the lock from the dead client and resuming IO. >>>>> >> >> >> >>>>> >> >> >> This option not only re-uses existing code, it would also eliminate >>>>> >> >> >> the need to add/update the RPC messages for prepare/commit/abort >>>>> >> >> >> snapshot creation to support group snapshots (since it could all be >>>>> >> >> >> handled internally). 
>>>>> >> >> >> >>>>> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> >>>>> >> >> >> wrote: >>>>> >> >> >>> Gentlemen, >>>>> >> >> >>> >>>>> >> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes. >>>>> >> >> >>> >>>>> >> >> >>> Here is the situation. In order to take snapshots of all images in a >>>>> >> >> >>> consistency group, >>>>> >> >> >>> we first need to quiesce all the image writers in the consistency >>>>> >> >> >>> group. >>>>> >> >> >>> Let me call >>>>> >> >> >>> group client - a client which requests a consistency group to take a >>>>> >> >> >>> snapshot. >>>>> >> >> >>> Image client - the client that writes to an image. >>>>> >> >> >>> Let's say group client starts sending notify_quiesce to all image >>>>> >> >> >>> clients that write to the images in the group. After quiescing half of >>>>> >> >> >>> the image clients the group client can die. >>>>> >> >> >>> >>>>> >> >> >>> It presents us with a dilemma - what should we do with those quiesced >>>>> >> >> >>> image clients. >>>>> >> >> >>> >>>>> >> >> >>> Option 1 - is to wait till someone manually runs recover for that >>>>> >> >> >>> consistency group. >>>>> >> >> >>> We can show warning next to those unfinished groups when user runs >>>>> >> >> >>> group list command. >>>>> >> >> >>> There will be a command like group recover, which allows users to >>>>> >> >> >>> rollback unsuccessful snapshots >>>>> >> >> >>> or continue them using create snapshot command. >>>>> >> >> >>> >>>>> >> >> >>> Option 2 - is to establish some heart beats between group client and >>>>> >> >> >>> image client. If group client fails to heart beat then image client >>>>> >> >> >>> unquiesces itself and continues normal operation. >>>>> >> >> >>> >>>>> >> >> >>> Option 3 - is to have a timeout for each image client. 
If group client >>>>> >> >> >>> fails to make a group snapshot within this timeout then we resume our >>>>> >> >> >>> normal operation informing group client of the fact. >>>>> >> >> >>> >>>>> >> >> >>> Which of these options do you prefer? Probably there are other options >>>>> >> >> >>> that I miss. >>>>> >> >> >>> >>>>> >> >> >>> Thanks, >>>>> >> >> >>> Victor. >>>>> >> >> >> >>>>> >> >> >> >>>>> >> >> >> >>>>> >> >> >> -- >>>>> >> >> >> Jason >>>>> >> >> > >>>>> >> >> > >>>>> >> >> > >>>>> >> >> > -- >>>>> >> >> > Jason >>>>> >> > >>>>> >> > >>>>> >> > >>>>> >> > -- >>>>> >> > Jason >>>>> >> > >> >> >> >> -- >> Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-09-09 22:41 ` Victor Denisov @ 2016-09-10 18:37 ` Jason Dillaman 2016-09-11 1:46 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2016-09-10 18:37 UTC (permalink / raw) To: Victor Denisov; +Cc: Mykola Golub, ceph-devel, Josh Durgin Those are all internal classes -- the cls types are already dependencies within the librbd internals. Feel free to add the necessary include and use it directly from within librbd. On Fri, Sep 9, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > I have a question about where SnapshotNamespace type should be placed. > I placed it in cls/rbd/cls_rbd_types.h because cls client and cls > backend components should have access to this type. > Also this type is required in librbd/Operations.cc - because we want > to specify in what namespace Operations::snap_create should create > snapshots. > However Operations.cc doesn't import cls_rbd_types.h right now. If the > question was about public interface of librbd/librbd.cc, then I would > create a duplicate of SnapshotNamespace type in librbd layer without > hesitation. > But these functions are internal, so, my question is whether it's > really feasible to create another type for SnapshotNamespace in librbd > layer. > > > On Mon, Aug 29, 2016 at 2:10 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> Right, I forgot about snaphot "namespaces". I'll add this part. >> I guess it makes sense to discuss the whole thing on the next CDM. >> >> On Sun, Aug 28, 2016 at 5:37 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>> I think the first step is to implement the concept of snapshot "namespaces". >>> >>> This could be implemented as an optional variant structure associated >>> with each snapshot at creation (see the ImageWatcher RPC messages or >>> journaling event type encoding for examples of this). 
For consistency >>> group snapshots, this structure would identify the snapshot as >>> belonging to the consistency group and have a unique id back to the >>> specific group snapshot. >>> >>> When creating a snapshot, the state machine would (1) create the group >>> snapshot record, (2) set the state of the group to "creating snapshot" >>> (to prevent new images from being added/removed from the group while >>> the op is in-progress), (3) acquire the lock for all images in the >>> group, (4) create the individual image snapshots with the linkage back >>> to the group snapshot record (can be performed in parallel up to max >>> concurrent ops), (5) release the exclusive locks, and (6) reset the >>> group status to "ready". >>> >>> If you have a hard crash/failure anywhere, a "snap remove" operation >>> should be designed to get the group back into consistent state (i.e. >>> remove any snapshots linked to the group and reset the group state >>> back to ready). >>> >>> On Fri, Aug 26, 2016 at 5:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> Guys, >>>> >>>> I updated Snapshots section of this document: >>>> http://pad.ceph.com/p/consistency_groups, in accordance with my >>>> improved understanding of how it should be implemented. >>>> Please take a look and provide your comments. Some of my concerns >>>> regarding the implementation I highlighted in bold. >>>> >>>> Looking forward to your valuable remarks. >>>> Thanks in advance. >>>> V. >>>> >>>> >>>> On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>>> On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote: >>>>>> What if I'm holding this lock and somebody else is trying to reacquire the lock. >>>>>> How do I get notified about it? 
>>>>> >>>>> The image watcher is notified, which triggers its handler: >>>>> >>>>> ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx) >>>>> >>>>> The handler calls the current lock policy method `lock_requested()`, >>>>> which will define what to do with the lock request. The StandardPolicy >>>>> is to release the lock, so it may ping-pong between the >>>>> clients. You may define a different policy -- rbd-mirror is an example >>>>> where it is used. >>>>> >>>>> Everywhere where an operation needs the exclusive lock, it is always >>>>> checked if we currently are a lock owner, i.e: >>>>> >>>>> ictx->exclusive_lock->is_lock_owner() >>>>> >>>>> and if it is false, the exclusive lock is requested. Before this check >>>>> you need to acquire ictx->owner_lock, and until you release owner_lock >>>>> you can be sure your exclusive lock will not leak to another >>>>> client. After releasing owner_lock, you will need to repeat the check >>>>> again when you need it. >>>>> >>>>> -- >>>>> Mykola Golub >>>>> >>>>>> >>>>>> >>>>>> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>>>> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote: >>>>>> >> Could you please point me to the place in source code where writer >>>>>> >> acquires an exclusive lock on the image. >>>>>> > >>>>>> > Grep for 'exclusive_lock->request_lock'. Basically, what you need >>>>>> > (after opening the image) is: >>>>>> > >>>>>> > ``` >>>>>> > C_SaferCond lock_ctx; >>>>>> > { >>>>>> > RWLock::WLocker l(ictx->owner_lock); >>>>>> > >>>>>> > if (ictx->exclusive_lock == nullptr) { >>>>>> > // exclusive-lock feature is not enabled >>>>>> > return -EINVAL; >>>>>> > } >>>>>> > >>>>>> > // Request the lock. If it is currently owned by another client, >>>>>> > // RPC message will be sent to the client to release the lock. 
>>>>>> > ictx->exclusive_lock->request_lock(&lock_ctx); >>>>>> > } // release owner_lock before waiting to avoid potential deadlock >>>>>> > >>>>>> > int r = lock_ctx.wait(); >>>>>> > if (r < 0) { >>>>>> > return r; >>>>>> > } >>>>>> > >>>>>> > RWLock::RLocker l(ictx->owner_lock); >>>>>> > if (ictx->exclusive_lock == nullptr || !ictx->exclusive_lock->is_lock_owner()) { >>>>>> > // failed to acquire exclusive lock >>>>>> > return -EROFS; >>>>>> > } >>>>>> > >>>>>> > // At this point lock is acquired >>>>>> > ... >>>>>> > >>>>>> > ``` >>>>>> > >>>>>> > You might want to look at this PR >>>>>> > >>>>>> > https://github.com/ceph/ceph/pull/9592 >>>>>> > >>>>>> > where we discuss adding API methods to directly acquire and release >>>>>> > the exclusive lock. You don't need the API, but will find examples in >>>>>> > the patch, and also useful comments from Jason. >>>>>> > >>>>>> > -- >>>>>> > Mykola Golub >>>>>> > >>>>>> >> I presume you were talking about the feature: >>>>>> >> exclusive_lock, shared_lock which can be used from command line using >>>>>> >> commands lock list, lock break. >>>>>> >> >>>>>> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>> >> > There is already a "request lock" RPC message and this is already handled >>>>>> >> > transparently within librbd when you attempt to acquire the lock and another >>>>>> >> > client owns it. >>>>>> >> > >>>>>> >> > >>>>>> >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>> >> >> >>>>>> >> >> If an image already has a writer who owns the lock, >>>>>> >> >> should I implement a notification that allows to ask the writer to >>>>>> >> >> release the lock, >>>>>> >> >> is there already a standard way to intercept the exclusive lock? >>>>>> >> >> >>>>>> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> >>>>>> >> >> wrote: >>>>>> >> >> > ... 
one more thing: >>>>>> >> >> > >>>>>> >> >> > I was also thinking that we need a new RBD feature bit to be used to >>>>>> >> >> > indicate that an image is part of a consistency group to prevent older >>>>>> >> >> > librbd clients from removing the image or group snapshots. This could >>>>>> >> >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can >>>>>> >> >> > still open the image R/O while its part of a group. >>>>>> >> >> > >>>>>> >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> >>>>>> >> >> > wrote: >>>>>> >> >> >> Way back in April when we had the CDM, I was originally thinking we >>>>>> >> >> >> should implement option 3. Essentially, you have a prepare group >>>>>> >> >> >> snapshot RPC message that extends a "paused IO" lease to the caller. >>>>>> >> >> >> When that lease expires, IO would automatically be resumed even if the >>>>>> >> >> >> group snapshot hasn't been created yet. This would also require >>>>>> >> >> >> commit/abort group snapshot RPC messages. >>>>>> >> >> >> >>>>>> >> >> >> However, thinking about this last night, here is another potential >>>>>> >> >> >> option: >>>>>> >> >> >> >>>>>> >> >> >> Option 4 - require images to have the exclusive lock feature before >>>>>> >> >> >> they can be added to a consistency group (and prevent disabling of >>>>>> >> >> >> exclusive-lock while they are part of a group). Then librbd, via the >>>>>> >> >> >> rbd CLI (or client application of the rbd consistency group snap >>>>>> >> >> >> create API), can co-operatively acquire the lock from all active image >>>>>> >> >> >> clients within the group (i.e. all IO has been flushed and paused) and >>>>>> >> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal >>>>>> >> >> >> exclusive lock handling process will automatically take care of >>>>>> >> >> >> re-acquiring the lock from the dead client and resuming IO. 
>>>>>> >> >> >> >>>>>> >> >> >> This option not only re-uses existing code, it would also eliminate >>>>>> >> >> >> the need to add/update the RPC messages for prepare/commit/abort >>>>>> >> >> >> snapshot creation to support group snapshots (since it could all be >>>>>> >> >> >> handled internally). >>>>>> >> >> >> >>>>>> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> >>>>>> >> >> >> wrote: >>>>>> >> >> >>> Gentlemen, >>>>>> >> >> >>> >>>>>> >> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes. >>>>>> >> >> >>> >>>>>> >> >> >>> Here is the situation. In order to take snapshots of all images in a >>>>>> >> >> >>> consistency group, >>>>>> >> >> >>> we first need to quiesce all the image writers in the consistency >>>>>> >> >> >>> group. >>>>>> >> >> >>> Let me call >>>>>> >> >> >>> group client - a client which requests a consistency group to take a >>>>>> >> >> >>> snapshot. >>>>>> >> >> >>> Image client - the client that writes to an image. >>>>>> >> >> >>> Let's say group client starts sending notify_quiesce to all image >>>>>> >> >> >>> clients that write to the images in the group. After quiescing half of >>>>>> >> >> >>> the image clients the group client can die. >>>>>> >> >> >>> >>>>>> >> >> >>> It presents us with a dilemma - what should we do with those quiesced >>>>>> >> >> >>> image clients. >>>>>> >> >> >>> >>>>>> >> >> >>> Option 1 - is to wait till someone manually runs recover for that >>>>>> >> >> >>> consistency group. >>>>>> >> >> >>> We can show warning next to those unfinished groups when user runs >>>>>> >> >> >>> group list command. >>>>>> >> >> >>> There will be a command like group recover, which allows users to >>>>>> >> >> >>> rollback unsuccessful snapshots >>>>>> >> >> >>> or continue them using create snapshot command. >>>>>> >> >> >>> >>>>>> >> >> >>> Option 2 - is to establish some heart beats between group client and >>>>>> >> >> >>> image client. 
If group client fails to heart beat then image client >>>>>> >> >> >>> unquiesces itself and continues normal operation. >>>>>> >> >> >>> >>>>>> >> >> >>> Option 3 - is to have a timeout for each image client. If group client >>>>>> >> >> >>> fails to make a group snapshot within this timeout then we resume our >>>>>> >> >> >>> normal operation informing group client of the fact. >>>>>> >> >> >>> >>>>>> >> >> >>> Which of these options do you prefer? Probably there are other options >>>>>> >> >> >>> that I miss. >>>>>> >> >> >>> >>>>>> >> >> >>> Thanks, >>>>>> >> >> >>> Victor. >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> -- >>>>>> >> >> >> Jason >>>>>> >> >> > >>>>>> >> >> > >>>>>> >> >> > >>>>>> >> >> > -- >>>>>> >> >> > Jason >>>>>> >> > >>>>>> >> > >>>>>> >> > >>>>>> >> > -- >>>>>> >> > Jason >>>>>> >> > >>> >>> >>> >>> -- >>> Jason -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-09-10 18:37 ` Jason Dillaman @ 2016-09-11 1:46 ` Victor Denisov 2016-09-12 13:18 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-09-11 1:46 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin Another question. Maybe not really a question, but I would like to verify that I understood what you wrote in the etherpad. You suggest creating the image snapshots simultaneously. If everything shuts down while we are making those individual snapshots, then we end up with a SnapshotRecord in an incomplete state and images either with snapshots or without them. Do I understand correctly that if the user wants to remove this unfinished group snapshot we have to: - list all images in this group - look for snapshots in those images with the guid as their name - delete those individual snapshots, ignoring errors if those snapshots don't exist - delete the group snapshot entry. One thing that I don't understand in this case is: what if the user decides to delete one of the images while there are dangling group snapshots? Let's call this image A. The dangling group snapshot could have successfully created a snapshot of image A. Let's call this snapshot A_snap. Now, if we remove image A from the group, then when we try to clean up the dangling group snapshot, A_snap will be overlooked, because image A is no longer a member of the group. I don't see how we can prevent this in this approach, except by disallowing removal of images while there are dangling group snapshots. You mentioned that we should name the images' individual snapshots after the group's guid. I assume we should name them something like <guid>_<group_snap_id>; if we named them using only the guid, we would be able to create only one group snapshot. Thanks, V. 
On Sat, Sep 10, 2016 at 11:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: > Those are all internal classes -- the cls types are already > dependencies within the librbd internals. Feel free to add the > necessary include and use it directly from within librbd. > > On Fri, Sep 9, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> I have a question about where SnapshotNamespace type should be placed. >> I placed it in cls/rbd/cls_rbd_types.h because cls client and cls >> backend components should have access to this type. >> Also this type is required in librbd/Operations.cc - because we want >> to specify in what namespace Operations::snap_create should create >> snapshots. >> However Operations.cc doesn't import cls_rbd_types.h right now. If the >> question was about public interface of librbd/librbd.cc, then I would >> create a duplicate of SnapshotNamespace type in librbd layer without >> hesitation. >> But these functions are internal, so, my question is whether it's >> really feasible to create another type for SnapshotNamespace in librbd >> layer. >> >> >> On Mon, Aug 29, 2016 at 2:10 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> Right, I forgot about snaphot "namespaces". I'll add this part. >>> I guess it makes sense to discuss the whole thing on the next CDM. >>> >>> On Sun, Aug 28, 2016 at 5:37 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>> I think the first step is to implement the concept of snapshot "namespaces". >>>> >>>> This could be implemented as an optional variant structure associated >>>> with each snapshot at creation (see the ImageWatcher RPC messages or >>>> journaling event type encoding for examples of this). For consistency >>>> group snapshots, this structure would identify the snapshot as >>>> belonging to the consistency group and have a unique id back to the >>>> specific group snapshot. 
>>>> >>>> When creating a snapshot, the state machine would (1) create the group >>>> snapshot record, (2) set the state of the group to "creating snapshot" >>>> (to prevent new images from being added/removed from the group while >>>> the op is in-progress), (3) acquire the lock for all images in the >>>> group, (4) create the individual image snapshots with the linkage back >>>> to the group snapshot record (can be performed in parallel up to max >>>> concurrent ops), (5) release the exclusive locks, and (6) reset the >>>> group status to "ready". >>>> >>>> If you have a hard crash/failure anywhere, a "snap remove" operation >>>> should be designed to get the group back into consistent state (i.e. >>>> remove any snapshots linked to the group and reset the group state >>>> back to ready). >>>> >>>> On Fri, Aug 26, 2016 at 5:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>> Guys, >>>>> >>>>> I updated Snapshots section of this document: >>>>> http://pad.ceph.com/p/consistency_groups, in accordance with my >>>>> improved understanding of how it should be implemented. >>>>> Please take a look and provide your comments. Some of my concerns >>>>> regarding the implementation I highlighted in bold. >>>>> >>>>> Looking forward to your valuable remarks. >>>>> Thanks in advance. >>>>> V. >>>>> >>>>> >>>>> On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>>>> On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote: >>>>>>> What if I'm holding this lock and somebody else is trying to reacquire the lock. >>>>>>> How do I get notified about it? >>>>>> >>>>>> The image watcher is notified, which triggers its handler: >>>>>> >>>>>> ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx) >>>>>> >>>>>> The handler calls the current lock policy method `lock_requested()`, >>>>>> which will define what to do with the lock request. 
The StandardPolicy >>>>>> is to release the lock, so it may ping-pong between the >>>>>> clients. You may define a different policy -- rbd-mirror is an example >>>>>> where it is used. >>>>>> >>>>>> Everywhere where an operation needs the exclusive lock, it is always >>>>>> checked if we currently are a lock owner, i.e: >>>>>> >>>>>> ictx->exclusive_lock->is_lock_owner() >>>>>> >>>>>> and if it is false, the exclusive lock is requested. Before this check >>>>>> you need to acquire ictx->owner_lock, and until you release owner_lock >>>>>> you can be sure your exclusive lock will not leak to another >>>>>> client. After releasing owner_lock, you will need to repeat the check >>>>>> again when you need it. >>>>>> >>>>>> -- >>>>>> Mykola Golub >>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>>>>> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote: >>>>>>> >> Could you please point me to the place in source code where writer >>>>>>> >> acquires an exclusive lock on the image. >>>>>>> > >>>>>>> > Grep for 'exclusive_lock->request_lock'. Basically, what you need >>>>>>> > (after opening the image) is: >>>>>>> > >>>>>>> > ``` >>>>>>> > C_SaferCond lock_ctx; >>>>>>> > { >>>>>>> > RWLock::WLocker l(ictx->owner_lock); >>>>>>> > >>>>>>> > if (ictx->exclusive_lock == nullptr) { >>>>>>> > // exclusive-lock feature is not enabled >>>>>>> > return -EINVAL; >>>>>>> > } >>>>>>> > >>>>>>> > // Request the lock. If it is currently owned by another client, >>>>>>> > // RPC message will be sent to the client to release the lock. 
>>>>>>> > ictx->exclusive_lock->request_lock(&lock_ctx); >>>>>>> > } // release owner_lock before waiting to avoid potential deadlock >>>>>>> > >>>>>>> > int r = lock_ctx.wait(); >>>>>>> > if (r < 0) { >>>>>>> > return r; >>>>>>> > } >>>>>>> > >>>>>>> > RWLock::RLocker l(ictx->owner_lock); >>>>>>> > if (ictx->exclusive_lock == nullptr || !ictx->exclusive_lock->is_lock_owner()) { >>>>>>> > // failed to acquire exclusive lock >>>>>>> > return -EROFS; >>>>>>> > } >>>>>>> > >>>>>>> > // At this point lock is acquired >>>>>>> > ... >>>>>>> > >>>>>>> > ``` >>>>>>> > >>>>>>> > You might want to look at this PR >>>>>>> > >>>>>>> > https://github.com/ceph/ceph/pull/9592 >>>>>>> > >>>>>>> > where we discuss adding API methods to directly acquire and release >>>>>>> > the exclusive lock. You don't need the API, but will find examples in >>>>>>> > the patch, and also useful comments from Jason. >>>>>>> > >>>>>>> > -- >>>>>>> > Mykola Golub >>>>>>> > >>>>>>> >> I presume you were talking about the feature: >>>>>>> >> exclusive_lock, shared_lock which can be used from command line using >>>>>>> >> commands lock list, lock break. >>>>>>> >> >>>>>>> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>> >> > There is already a "request lock" RPC message and this is already handled >>>>>>> >> > transparently within librbd when you attempt to acquire the lock and another >>>>>>> >> > client owns it. >>>>>>> >> > >>>>>>> >> > >>>>>>> >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>> >> >> >>>>>>> >> >> If an image already has a writer who owns the lock, >>>>>>> >> >> should I implement a notification that allows to ask the writer to >>>>>>> >> >> release the lock, >>>>>>> >> >> is there already a standard way to intercept the exclusive lock? >>>>>>> >> >> >>>>>>> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> >>>>>>> >> >> wrote: >>>>>>> >> >> > ... 
one more thing: >>>>>>> >> >> > >>>>>>> >> >> > I was also thinking that we need a new RBD feature bit to be used to >>>>>>> >> >> > indicate that an image is part of a consistency group to prevent older >>>>>>> >> >> > librbd clients from removing the image or group snapshots. This could >>>>>>> >> >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can >>>>>>> >> >> > still open the image R/O while its part of a group. >>>>>>> >> >> > >>>>>>> >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> >>>>>>> >> >> > wrote: >>>>>>> >> >> >> Way back in April when we had the CDM, I was originally thinking we >>>>>>> >> >> >> should implement option 3. Essentially, you have a prepare group >>>>>>> >> >> >> snapshot RPC message that extends a "paused IO" lease to the caller. >>>>>>> >> >> >> When that lease expires, IO would automatically be resumed even if the >>>>>>> >> >> >> group snapshot hasn't been created yet. This would also require >>>>>>> >> >> >> commit/abort group snapshot RPC messages. >>>>>>> >> >> >> >>>>>>> >> >> >> However, thinking about this last night, here is another potential >>>>>>> >> >> >> option: >>>>>>> >> >> >> >>>>>>> >> >> >> Option 4 - require images to have the exclusive lock feature before >>>>>>> >> >> >> they can be added to a consistency group (and prevent disabling of >>>>>>> >> >> >> exclusive-lock while they are part of a group). Then librbd, via the >>>>>>> >> >> >> rbd CLI (or client application of the rbd consistency group snap >>>>>>> >> >> >> create API), can co-operatively acquire the lock from all active image >>>>>>> >> >> >> clients within the group (i.e. all IO has been flushed and paused) and >>>>>>> >> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal >>>>>>> >> >> >> exclusive lock handling process will automatically take care of >>>>>>> >> >> >> re-acquiring the lock from the dead client and resuming IO. 
>>>>>>> >> >> >> >>>>>>> >> >> >> This option not only re-uses existing code, it would also eliminate >>>>>>> >> >> >> the need to add/update the RPC messages for prepare/commit/abort >>>>>>> >> >> >> snapshot creation to support group snapshots (since it could all be >>>>>>> >> >> >> handled internally). >>>>>>> >> >> >> >>>>>>> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> >>>>>>> >> >> >> wrote: >>>>>>> >> >> >>> Gentlemen, >>>>>>> >> >> >>> >>>>>>> >> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes. >>>>>>> >> >> >>> >>>>>>> >> >> >>> Here is the situation. In order to take snapshots of all images in a >>>>>>> >> >> >>> consistency group, >>>>>>> >> >> >>> we first need to quiesce all the image writers in the consistency >>>>>>> >> >> >>> group. >>>>>>> >> >> >>> Let me call >>>>>>> >> >> >>> group client - a client which requests a consistency group to take a >>>>>>> >> >> >>> snapshot. >>>>>>> >> >> >>> Image client - the client that writes to an image. >>>>>>> >> >> >>> Let's say group client starts sending notify_quiesce to all image >>>>>>> >> >> >>> clients that write to the images in the group. After quiescing half of >>>>>>> >> >> >>> the image clients the group client can die. >>>>>>> >> >> >>> >>>>>>> >> >> >>> It presents us with a dilemma - what should we do with those quiesced >>>>>>> >> >> >>> image clients. >>>>>>> >> >> >>> >>>>>>> >> >> >>> Option 1 - is to wait till someone manually runs recover for that >>>>>>> >> >> >>> consistency group. >>>>>>> >> >> >>> We can show warning next to those unfinished groups when user runs >>>>>>> >> >> >>> group list command. >>>>>>> >> >> >>> There will be a command like group recover, which allows users to >>>>>>> >> >> >>> rollback unsuccessful snapshots >>>>>>> >> >> >>> or continue them using create snapshot command. 
>>>>>>> >> >> >>> >>>>>>> >> >> >>> Option 2 - is to establish some heart beats between group client and >>>>>>> >> >> >>> image client. If group client fails to heart beat then image client >>>>>>> >> >> >>> unquiesces itself and continues normal operation. >>>>>>> >> >> >>> >>>>>>> >> >> >>> Option 3 - is to have a timeout for each image client. If group client >>>>>>> >> >> >>> fails to make a group snapshot within this timeout then we resume our >>>>>>> >> >> >>> normal operation informing group client of the fact. >>>>>>> >> >> >>> >>>>>>> >> >> >>> Which of these options do you prefer? Probably there are other options >>>>>>> >> >> >>> that I miss. >>>>>>> >> >> >>> >>>>>>> >> >> >>> Thanks, >>>>>>> >> >> >>> Victor. >>>>>>> >> >> >> >>>>>>> >> >> >> >>>>>>> >> >> >> >>>>>>> >> >> >> -- >>>>>>> >> >> >> Jason >>>>>>> >> >> > >>>>>>> >> >> > >>>>>>> >> >> > >>>>>>> >> >> > -- >>>>>>> >> >> > Jason >>>>>>> >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> >> > -- >>>>>>> >> > Jason >>>>>>> >> > >>>>>>> >> -- >>>>>>> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>> >> the body of a message to majordomo@vger.kernel.org >>>>>>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>> -- >>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>> >>>> >>>> -- >>>> Jason > > > > -- > Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
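[Editor's note] The six-step snapshot-create state machine quoted in the message above can be sketched roughly as below. All type and function names here are hypothetical stand-ins for the real librbd state machines; lock acquisition and parallel snapshot creation are reduced to simple loops for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum class GroupState { READY, CREATING_SNAPSHOT };

struct Image {
  std::string name;
  bool lock_owned = false;
  std::set<std::string> snaps;
};

struct Group {
  GroupState state = GroupState::READY;
  std::vector<Image> images;
  std::set<std::string> snap_records;
};

// Membership changes are only allowed while the group is READY;
// this is what step (2) below protects against.
inline bool can_modify_membership(const Group &group) {
  return group.state == GroupState::READY;
}

// The six steps from the quoted message: (1) create the group snapshot
// record, (2) set the group to "creating snapshot" so images cannot be
// added or removed mid-operation, (3) acquire the exclusive lock on
// every member image, (4) create the per-image snapshots linked back to
// the group record (parallelizable in the real code), (5) release the
// locks, (6) reset the group to "ready".
inline int group_snap_create(Group &group, const std::string &guid,
                             const std::string &snap_id) {
  group.snap_records.insert(snap_id);           // (1)
  group.state = GroupState::CREATING_SNAPSHOT;  // (2)
  for (auto &image : group.images) {
    image.lock_owned = true;                    // (3)
  }
  for (auto &image : group.images) {
    image.snaps.insert(guid + "_" + snap_id);   // (4)
  }
  for (auto &image : group.images) {
    image.lock_owned = false;                   // (5)
  }
  group.state = GroupState::READY;              // (6)
  return 0;
}
```

A hard crash anywhere between steps (2) and (6) leaves the group in CREATING_SNAPSHOT with a partial set of image snapshots, which is exactly the state the "snap remove" cleanup path is meant to unwind.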
* Re: Snapshots of consistency groups 2016-09-11 1:46 ` Victor Denisov @ 2016-09-12 13:18 ` Jason Dillaman 2016-09-12 22:52 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2016-09-12 13:18 UTC (permalink / raw) To: Victor Denisov; +Cc: Mykola Golub, ceph-devel, Josh Durgin On Sat, Sep 10, 2016 at 9:46 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Another question. Maybe not really a question, but I would like to > verify if I understood what you wrote in the ether pad. > > You suggest to create image snapshots simultaneously. > If everything shuts down when we are making those individual > snapshots, then we end up with a SnapshotRecord in incomplete state > and images either with snapshots or without them. > Do I understand correctly that if the user wants to remove this > unfinished group snapshot then we have to: > - list all images in this group > - look for snapshots in those images with the guid as their name. > - delete those individual snapshots and ignore errors if those > snapshots don't exist. > - delete the entry. It would be the standard remove state machine, which is basically the steps you have above. Note that you would always need to handle the "-ENOENT" case since I could always associate an image with a group after a group snap was created (i.e. so the new image doesn't have a matching image snapshot for a group snapshot). > One thing that I don't understand in this case is, what if the user > decides to delete one of the images when there are dangling group > snapshots. Let's call this image A. > This dangling group snapshot could have successfully created a > snapshot of this image A. Let's call this snapshot A_snap. > Now if we remove image A from this group then once we try to cleanup > dangling group snapshot then A_snap snapshot will be overlooked, > because image A is not a member of the group any more. 
> And I don't understand how we can prevent this from happening in this > approach, except by disallowing to remove images if there are dangling > group snapshots. How is the snapshot dangling for image A? If it successfully created the snapshot on image A, it has a snapshot record that associates it with the group. Therefore, when the image is removed from the group, I would think you would automatically delete the group snapshots contained within the image. > You mentioned that we should call image's individual snapshots after > the groups guid. I assume we should name them something like > <guid>_<group_snap_id>. > If we named them only using guid, then we would be able to create only > one group snapshot. Yup -- that should be fine. > Thanks, > V. > > > On Sat, Sep 10, 2016 at 11:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >> Those are all internal classes -- the cls types are already >> dependencies within the librbd internals. Feel free to add the >> necessary include and use it directly from within librbd. >> >> On Fri, Sep 9, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> I have a question about where SnapshotNamespace type should be placed. >>> I placed it in cls/rbd/cls_rbd_types.h because cls client and cls >>> backend components should have access to this type. >>> Also this type is required in librbd/Operations.cc - because we want >>> to specify in what namespace Operations::snap_create should create >>> snapshots. >>> However Operations.cc doesn't import cls_rbd_types.h right now. If the >>> question was about public interface of librbd/librbd.cc, then I would >>> create a duplicate of SnapshotNamespace type in librbd layer without >>> hesitation. 
>>> >>> >>> On Mon, Aug 29, 2016 at 2:10 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> Right, I forgot about snaphot "namespaces". I'll add this part. >>>> I guess it makes sense to discuss the whole thing on the next CDM. >>>> >>>> On Sun, Aug 28, 2016 at 5:37 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>> I think the first step is to implement the concept of snapshot "namespaces". >>>>> >>>>> This could be implemented as an optional variant structure associated >>>>> with each snapshot at creation (see the ImageWatcher RPC messages or >>>>> journaling event type encoding for examples of this). For consistency >>>>> group snapshots, this structure would identify the snapshot as >>>>> belonging to the consistency group and have a unique id back to the >>>>> specific group snapshot. >>>>> >>>>> When creating a snapshot, the state machine would (1) create the group >>>>> snapshot record, (2) set the state of the group to "creating snapshot" >>>>> (to prevent new images from being added/removed from the group while >>>>> the op is in-progress), (3) acquire the lock for all images in the >>>>> group, (4) create the individual image snapshots with the linkage back >>>>> to the group snapshot record (can be performed in parallel up to max >>>>> concurrent ops), (5) release the exclusive locks, and (6) reset the >>>>> group status to "ready". >>>>> >>>>> If you have a hard crash/failure anywhere, a "snap remove" operation >>>>> should be designed to get the group back into consistent state (i.e. >>>>> remove any snapshots linked to the group and reset the group state >>>>> back to ready). >>>>> >>>>> On Fri, Aug 26, 2016 at 5:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>> Guys, >>>>>> >>>>>> I updated Snapshots section of this document: >>>>>> http://pad.ceph.com/p/consistency_groups, in accordance with my >>>>>> improved understanding of how it should be implemented. >>>>>> Please take a look and provide your comments. 
Some of my concerns >>>>>> regarding the implementation I highlighted in bold. >>>>>> >>>>>> Looking forward to your valuable remarks. >>>>>> Thanks in advance. >>>>>> V. >>>>>> >>>>>> >>>>>> On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>>>>> On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote: >>>>>>>> What if I'm holding this lock and somebody else is trying to reacquire the lock. >>>>>>>> How do I get notified about it? >>>>>>> >>>>>>> The image watcher is notified, which triggers its handler: >>>>>>> >>>>>>> ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx) >>>>>>> >>>>>>> The handler calls the current lock policy method `lock_requested()`, >>>>>>> which will define what to do with the lock request. The StandartPolicy >>>>>>> is to release the lock, so it may ping-ponging between the >>>>>>> clients. You may define a different policy -- rbd-mirror is an example >>>>>>> where it is used. >>>>>>> >>>>>>> Everywhere where an operation needs the exclusive lock, it is always >>>>>>> checked if we currently are a lock owner, i.e: >>>>>>> >>>>>>> ictx->exclusive_lock->is_lock_owner() >>>>>>> >>>>>>> and if it is false, the exlusive lock is requested. Before this check >>>>>>> you need to aquire ctx->owner_lock, and until you release owner_lock >>>>>>> you can be sure your exclusive lock will not leak to another >>>>>>> client. After releasing owner_lock, you will need to repeate the check >>>>>>> again when you need it. >>>>>>> >>>>>>> -- >>>>>>> Mykola Golub >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>>>>>> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote: >>>>>>>> >> Could you please point me to the place in source code where writer >>>>>>>> >> acquires an exclusive lock on the image. >>>>>>>> > >>>>>>>> > Grep for 'exclusive_lock->request_lock'. 
Basically, what you need >>>>>>>> > (after opening the image) is: >>>>>>>> > >>>>>>>> > ``` >>>>>>>> > C_SaferCond lock_ctx; >>>>>>>> > { >>>>>>>> > RWLock::WLocker l(ictx->owner_lock); >>>>>>>> > >>>>>>>> > if (ictx->exclusive_lock == nullptr) { >>>>>>>> > // exclusive-lock feature is not enabled >>>>>>>> > return -EINVAL; >>>>>>>> > } >>>>>>>> > >>>>>>>> > // Request the lock. If it is currently owned by another client, >>>>>>>> > // RPC message will be sent to the client to release the lock. >>>>>>>> > ictx->exclusive_lock->request_lock(&lock_ctx); >>>>>>>> > } // release owner_lock before waiting to avoid potential deadlock >>>>>>>> > >>>>>>>> > int r = lock_ctx.wait(); >>>>>>>> > if (r < 0) { >>>>>>>> > return r; >>>>>>>> > } >>>>>>>> > >>>>>>>> > RWLock::RLocker l(ictx->owner_lock); >>>>>>>> > if (ictx->exclusive_lock == nullptr || !ictx->exclusive_lock->is_lock_owner()) { >>>>>>>> > // failed to acquire exclusive lock >>>>>>>> > return -EROFS; >>>>>>>> > } >>>>>>>> > >>>>>>>> > // At this point lock is acquired >>>>>>>> > ... >>>>>>>> > >>>>>>>> > ``` >>>>>>>> > >>>>>>>> > You might want to look at this PR >>>>>>>> > >>>>>>>> > https://github.com/ceph/ceph/pull/9592 >>>>>>>> > >>>>>>>> > where we discuss adding API methods to directly acquire and release >>>>>>>> > the exclusive lock. You don't need the API, but will find examples in >>>>>>>> > the patch, and also useful comments from Jason. >>>>>>>> > >>>>>>>> > -- >>>>>>>> > Mykola Golub >>>>>>>> > >>>>>>>> >> I presume you were talking about the feature: >>>>>>>> >> exclusive_lock, shared_lock which can be used from command line using >>>>>>>> >> commands lock list, lock break. >>>>>>>> >> >>>>>>>> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>> >> > There is already a "request lock" RPC message and this is already handled >>>>>>>> >> > transparently within librbd when you attempt to acquire the lock and another >>>>>>>> >> > client owns it. 
>>>>>>>> >> > >>>>>>>> >> > >>>>>>>> >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>> >> >> >>>>>>>> >> >> If an image already has a writer who owns the lock, >>>>>>>> >> >> should I implement a notification that allows to ask the writer to >>>>>>>> >> >> release the lock, >>>>>>>> >> >> is there already a standard way to intercept the exclusive lock? >>>>>>>> >> >> >>>>>>>> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> >>>>>>>> >> >> wrote: >>>>>>>> >> >> > ... one more thing: >>>>>>>> >> >> > >>>>>>>> >> >> > I was also thinking that we need a new RBD feature bit to be used to >>>>>>>> >> >> > indicate that an image is part of a consistency group to prevent older >>>>>>>> >> >> > librbd clients from removing the image or group snapshots. This could >>>>>>>> >> >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can >>>>>>>> >> >> > still open the image R/O while its part of a group. >>>>>>>> >> >> > >>>>>>>> >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> >>>>>>>> >> >> > wrote: >>>>>>>> >> >> >> Way back in April when we had the CDM, I was originally thinking we >>>>>>>> >> >> >> should implement option 3. Essentially, you have a prepare group >>>>>>>> >> >> >> snapshot RPC message that extends a "paused IO" lease to the caller. >>>>>>>> >> >> >> When that lease expires, IO would automatically be resumed even if the >>>>>>>> >> >> >> group snapshot hasn't been created yet. This would also require >>>>>>>> >> >> >> commit/abort group snapshot RPC messages. >>>>>>>> >> >> >> >>>>>>>> >> >> >> However, thinking about this last night, here is another potential >>>>>>>> >> >> >> option: >>>>>>>> >> >> >> >>>>>>>> >> >> >> Option 4 - require images to have the exclusive lock feature before >>>>>>>> >> >> >> they can be added to a consistency group (and prevent disabling of >>>>>>>> >> >> >> exclusive-lock while they are part of a group). 
Then librbd, via the >>>>>>>> >> >> >> rbd CLI (or client application of the rbd consistency group snap >>>>>>>> >> >> >> create API), can co-operatively acquire the lock from all active image >>>>>>>> >> >> >> clients within the group (i.e. all IO has been flushed and paused) and >>>>>>>> >> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal >>>>>>>> >> >> >> exclusive lock handling process will automatically take care of >>>>>>>> >> >> >> re-acquiring the lock from the dead client and resuming IO. >>>>>>>> >> >> >> >>>>>>>> >> >> >> This option not only re-uses existing code, it would also eliminate >>>>>>>> >> >> >> the need to add/update the RPC messages for prepare/commit/abort >>>>>>>> >> >> >> snapshot creation to support group snapshots (since it could all be >>>>>>>> >> >> >> handled internally). >>>>>>>> >> >> >> >>>>>>>> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> >>>>>>>> >> >> >> wrote: >>>>>>>> >> >> >>> Gentlemen, >>>>>>>> >> >> >>> >>>>>>>> >> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes. >>>>>>>> >> >> >>> >>>>>>>> >> >> >>> Here is the situation. In order to take snapshots of all images in a >>>>>>>> >> >> >>> consistency group, >>>>>>>> >> >> >>> we first need to quiesce all the image writers in the consistency >>>>>>>> >> >> >>> group. >>>>>>>> >> >> >>> Let me call >>>>>>>> >> >> >>> group client - a client which requests a consistency group to take a >>>>>>>> >> >> >>> snapshot. >>>>>>>> >> >> >>> Image client - the client that writes to an image. >>>>>>>> >> >> >>> Let's say group client starts sending notify_quiesce to all image >>>>>>>> >> >> >>> clients that write to the images in the group. After quiescing half of >>>>>>>> >> >> >>> the image clients the group client can die. >>>>>>>> >> >> >>> >>>>>>>> >> >> >>> It presents us with a dilemma - what should we do with those quiesced >>>>>>>> >> >> >>> image clients. 
>>>>>>>> >> >> >>> >>>>>>>> >> >> >>> Option 1 - is to wait till someone manually runs recover for that >>>>>>>> >> >> >>> consistency group. >>>>>>>> >> >> >>> We can show warning next to those unfinished groups when user runs >>>>>>>> >> >> >>> group list command. >>>>>>>> >> >> >>> There will be a command like group recover, which allows users to >>>>>>>> >> >> >>> rollback unsuccessful snapshots >>>>>>>> >> >> >>> or continue them using create snapshot command. >>>>>>>> >> >> >>> >>>>>>>> >> >> >>> Option 2 - is to establish some heart beats between group client and >>>>>>>> >> >> >>> image client. If group client fails to heart beat then image client >>>>>>>> >> >> >>> unquiesces itself and continues normal operation. >>>>>>>> >> >> >>> >>>>>>>> >> >> >>> Option 3 - is to have a timeout for each image client. If group client >>>>>>>> >> >> >>> fails to make a group snapshot within this timeout then we resume our >>>>>>>> >> >> >>> normal operation informing group client of the fact. >>>>>>>> >> >> >>> >>>>>>>> >> >> >>> Which of these options do you prefer? Probably there are other options >>>>>>>> >> >> >>> that I miss. >>>>>>>> >> >> >>> >>>>>>>> >> >> >>> Thanks, >>>>>>>> >> >> >>> Victor. 
>>>>>>>> >> >> >> >>>>>>>> >> >> >> >>>>>>>> >> >> >> >>>>>>>> >> >> >> -- >>>>>>>> >> >> >> Jason >>>>>>>> >> >> > >>>>>>>> >> >> > >>>>>>>> >> >> > >>>>>>>> >> >> > -- >>>>>>>> >> >> > Jason >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> >> > -- >>>>>>>> >> > Jason >>>>>>>> >> > >>>>>>>> >> -- >>>>>>>> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>> >> the body of a message to majordomo@vger.kernel.org >>>>>>>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>> -- >>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>>> >>>>> >>>>> -- >>>>> Jason >> >> >> >> -- >> Jason -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
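[Editor's note] Jason's answer above -- that removing an image from a group should also delete that image's group snapshots, so nothing is left dangling -- can be sketched as follows. The `Image` type and the `<guid>_<group_snap_id>` naming convention are assumptions for illustration, not the real librbd API.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for an image and its per-image snapshots.
struct Image {
  std::string name;
  std::set<std::string> snaps;
};

// When an image leaves the group, delete any of its snapshots that
// belong to the group's snapshots. Absence is tolerated (the -ENOENT
// case: the image may have joined the group after a given group
// snapshot was created). User-created snapshots are untouched.
inline void remove_image_from_group(
    Image &image, const std::string &guid,
    const std::set<std::string> &group_snap_ids) {
  for (const auto &id : group_snap_ids) {
    image.snaps.erase(guid + "_" + id);  // no-op if the snap is absent
  }
  // ...then drop the image from the group's membership list (omitted).
}
```

Because the group-owned snapshots are removed at the moment the image leaves, a later cleanup of a dangling group snapshot never needs to revisit former member images, which resolves the A_snap concern raised earlier in the thread.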
* Re: Snapshots of consistency groups 2016-09-12 13:18 ` Jason Dillaman @ 2016-09-12 22:52 ` Victor Denisov 2016-09-12 23:06 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-09-12 22:52 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin Understood. Thank you, Jason. On Mon, Sep 12, 2016 at 6:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: > On Sat, Sep 10, 2016 at 9:46 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> Another question. Maybe not really a question, but I would like to >> verify if I understood what you wrote in the ether pad. >> >> You suggest to create image snapshots simultaneously. >> If everything shuts down when we are making those individual >> snapshots, then we end up with a SnapshotRecord in incomplete state >> and images either with snapshots or without them. >> Do I understand correctly that if the user wants to remove this >> unfinished group snapshot then we have to: >> - list all images in this group >> - look for snapshots in those images with the guid as their name. >> - delete those individual snapshots and ignore errors if those >> snapshots don't exist. >> - delete then entry. > > It would be the standard remove state machine, which is basically the > steps you have above. Note that you would always need to handle the > "-ENOENT" case since I could always associate an image to a group > after a group snap was created (i.e. so the new image doesn't have a > matching image snapshot for a group snapshot). > >> One thing that I don't understand in this case is, what if the user >> decides to delete one of the images when there are dangling group >> snapshots. Let's call this image A. >> This dangling group snapshot could have successfully created a >> snapshot of this image A. Let's call this snapshot A_snap. 
>> Now if we remove image A from this group then once we try to cleanup >> dangling group snapshot then A_snap shapshot will be overlooked, >> because image A is not a member of the group any more. >> And I don't understand how we can prevent this from happening in this >> approach, except by disallowing to remove images if there are dandling >> group snapshots. > > How is the snapshot dangling for image A? If it successfully created > the snapshot on image A, it has a snapshot record that associates it > to the group. Therefore, when the image is removed from the group, I > would think you would automatically delete the the group snapshots > contained within the image. > >> You mentioned that we should call image's individual snapshots after >> the groups guid. I assume we should name them something like >> <guid>_<group_snap_id>. >> If we named them only using guid, then we would be able to create only >> one group snapshot. > > Yup -- that should be fine. > >> Thanks, >> V. >> >> >> On Sat, Sep 10, 2016 at 11:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>> Those are all internal classes -- the cls types are already >>> dependencies within the librbd internals. Feel free to add the >>> necessary include and use it directly from within librbd. >>> >>> On Fri, Sep 9, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> I have a question about where SnapshotNamespace type should be placed. >>>> I placed it in cls/rbd/cls_rbd_types.h because cls client and cls >>>> backend components should have access to this type. >>>> Also this type is required in librbd/Operations.cc - because we want >>>> to specify in what namespace Operations::snap_create should create >>>> snapshots. >>>> However Operations.cc doesn't import cls_rbd_types.h right now. If the >>>> question was about public interface of librbd/librbd.cc, then I would >>>> create a duplicate of SnapshotNamespace type in librbd layer without >>>> hesitation. 
>>>> But these functions are internal, so, my question is whether it's >>>> really feasible to create another type for SnapshotNamespace in librbd >>>> layer. >>>> >>>> >>>> On Mon, Aug 29, 2016 at 2:10 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>> Right, I forgot about snaphot "namespaces". I'll add this part. >>>>> I guess it makes sense to discuss the whole thing on the next CDM. >>>>> >>>>> On Sun, Aug 28, 2016 at 5:37 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>> I think the first step is to implement the concept of snapshot "namespaces". >>>>>> >>>>>> This could be implemented as an optional variant structure associated >>>>>> with each snapshot at creation (see the ImageWatcher RPC messages or >>>>>> journaling event type encoding for examples of this). For consistency >>>>>> group snapshots, this structure would identify the snapshot as >>>>>> belonging to the consistency group and have a unique id back to the >>>>>> specific group snapshot. >>>>>> >>>>>> When creating a snapshot, the state machine would (1) create the group >>>>>> snapshot record, (2) set the state of the group to "creating snapshot" >>>>>> (to prevent new images from being added/removed from the group while >>>>>> the op is in-progress), (3) acquire the lock for all images in the >>>>>> group, (4) create the individual image snapshots with the linkage back >>>>>> to the group snapshot record (can be performed in parallel up to max >>>>>> concurrent ops), (5) release the exclusive locks, and (6) reset the >>>>>> group status to "ready". >>>>>> >>>>>> If you have a hard crash/failure anywhere, a "snap remove" operation >>>>>> should be designed to get the group back into consistent state (i.e. >>>>>> remove any snapshots linked to the group and reset the group state >>>>>> back to ready). 
>>>>>> >>>>>> On Fri, Aug 26, 2016 at 5:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>> Guys, >>>>>>> >>>>>>> I updated Snapshots section of this document: >>>>>>> http://pad.ceph.com/p/consistency_groups, in accordance with my >>>>>>> improved understanding of how it should be implemented. >>>>>>> Please take a look and provide your comments. Some of my concerns >>>>>>> regarding the implementation I highlighted in bold. >>>>>>> >>>>>>> Looking forward to your valuable remarks. >>>>>>> Thanks in advance. >>>>>>> V. >>>>>>> >>>>>>> >>>>>>> On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>>>>>> On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote: >>>>>>>>> What if I'm holding this lock and somebody else is trying to reacquire the lock. >>>>>>>>> How do I get notified about it? >>>>>>>> >>>>>>>> The image watcher is notified, which triggers its handler: >>>>>>>> >>>>>>>> ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx) >>>>>>>> >>>>>>>> The handler calls the current lock policy method `lock_requested()`, >>>>>>>> which will define what to do with the lock request. The StandardPolicy >>>>>>>> is to release the lock, so it may ping-pong between the >>>>>>>> clients. You may define a different policy -- rbd-mirror is an example >>>>>>>> where it is used. >>>>>>>> >>>>>>>> Everywhere where an operation needs the exclusive lock, it is always >>>>>>>> checked if we currently are a lock owner, i.e: >>>>>>>> >>>>>>>> ictx->exclusive_lock->is_lock_owner() >>>>>>>> >>>>>>>> and if it is false, the exclusive lock is requested. Before this check >>>>>>>> you need to acquire ictx->owner_lock, and until you release owner_lock >>>>>>>> you can be sure your exclusive lock will not leak to another >>>>>>>> client. After releasing owner_lock, you will need to repeat the check >>>>>>>> again when you need it. 
>>>>>>>> >>>>>>>> -- >>>>>>>> Mykola Golub >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>>>>>>> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote: >>>>>>>>> >> Could you please point me to the place in source code where writer >>>>>>>>> >> acquires an exclusive lock on the image. >>>>>>>>> > >>>>>>>>> > Grep for 'exclusive_lock->request_lock'. Basically, what you need >>>>>>>>> > (after opening the image) is: >>>>>>>>> > >>>>>>>>> > ``` >>>>>>>>> > C_SaferCond lock_ctx; >>>>>>>>> > { >>>>>>>>> > RWLock::WLocker l(ictx->owner_lock); >>>>>>>>> > >>>>>>>>> > if (ictx->exclusive_lock == nullptr) { >>>>>>>>> > // exclusive-lock feature is not enabled >>>>>>>>> > return -EINVAL; >>>>>>>>> > } >>>>>>>>> > >>>>>>>>> > // Request the lock. If it is currently owned by another client, >>>>>>>>> > // RPC message will be sent to the client to release the lock. >>>>>>>>> > ictx->exclusive_lock->request_lock(&lock_ctx); >>>>>>>>> > } // release owner_lock before waiting to avoid potential deadlock >>>>>>>>> > >>>>>>>>> > int r = lock_ctx.wait(); >>>>>>>>> > if (r < 0) { >>>>>>>>> > return r; >>>>>>>>> > } >>>>>>>>> > >>>>>>>>> > RWLock::RLocker l(ictx->owner_lock); >>>>>>>>> > if (ictx->exclusive_lock == nullptr || !ictx->exclusive_lock->is_lock_owner()) { >>>>>>>>> > // failed to acquire exclusive lock >>>>>>>>> > return -EROFS; >>>>>>>>> > } >>>>>>>>> > >>>>>>>>> > // At this point lock is acquired >>>>>>>>> > ... >>>>>>>>> > >>>>>>>>> > ``` >>>>>>>>> > >>>>>>>>> > You might want to look at this PR >>>>>>>>> > >>>>>>>>> > https://github.com/ceph/ceph/pull/9592 >>>>>>>>> > >>>>>>>>> > where we discuss adding API methods to directly acquire and release >>>>>>>>> > the exclusive lock. You don't need the API, but will find examples in >>>>>>>>> > the patch, and also useful comments from Jason. 
>>>>>>>>> > >>>>>>>>> > -- >>>>>>>>> > Mykola Golub >>>>>>>>> > >>>>>>>>> >> I presume you were talking about the feature: >>>>>>>>> >> exclusive_lock, shared_lock which can be used from command line using >>>>>>>>> >> commands lock list, lock break. >>>>>>>>> >> >>>>>>>>> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>> >> > There is already a "request lock" RPC message and this is already handled >>>>>>>>> >> > transparently within librbd when you attempt to acquire the lock and another >>>>>>>>> >> > client owns it. >>>>>>>>> >> > >>>>>>>>> >> > >>>>>>>>> >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>> >> >> >>>>>>>>> >> >> If an image already has a writer who owns the lock, >>>>>>>>> >> >> should I implement a notification that allows to ask the writer to >>>>>>>>> >> >> release the lock, >>>>>>>>> >> >> is there already a standard way to intercept the exclusive lock? >>>>>>>>> >> >> >>>>>>>>> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> >>>>>>>>> >> >> wrote: >>>>>>>>> >> >> > ... one more thing: >>>>>>>>> >> >> > >>>>>>>>> >> >> > I was also thinking that we need a new RBD feature bit to be used to >>>>>>>>> >> >> > indicate that an image is part of a consistency group to prevent older >>>>>>>>> >> >> > librbd clients from removing the image or group snapshots. This could >>>>>>>>> >> >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can >>>>>>>>> >> >> > still open the image R/O while its part of a group. >>>>>>>>> >> >> > >>>>>>>>> >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> >>>>>>>>> >> >> > wrote: >>>>>>>>> >> >> >> Way back in April when we had the CDM, I was originally thinking we >>>>>>>>> >> >> >> should implement option 3. Essentially, you have a prepare group >>>>>>>>> >> >> >> snapshot RPC message that extends a "paused IO" lease to the caller. 
>>>>>>>>> >> >> >> When that lease expires, IO would automatically be resumed even if the >>>>>>>>> >> >> >> group snapshot hasn't been created yet. This would also require >>>>>>>>> >> >> >> commit/abort group snapshot RPC messages. >>>>>>>>> >> >> >> >>>>>>>>> >> >> >> However, thinking about this last night, here is another potential >>>>>>>>> >> >> >> option: >>>>>>>>> >> >> >> >>>>>>>>> >> >> >> Option 4 - require images to have the exclusive lock feature before >>>>>>>>> >> >> >> they can be added to a consistency group (and prevent disabling of >>>>>>>>> >> >> >> exclusive-lock while they are part of a group). Then librbd, via the >>>>>>>>> >> >> >> rbd CLI (or client application of the rbd consistency group snap >>>>>>>>> >> >> >> create API), can co-operatively acquire the lock from all active image >>>>>>>>> >> >> >> clients within the group (i.e. all IO has been flushed and paused) and >>>>>>>>> >> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal >>>>>>>>> >> >> >> exclusive lock handling process will automatically take care of >>>>>>>>> >> >> >> re-acquiring the lock from the dead client and resuming IO. >>>>>>>>> >> >> >> >>>>>>>>> >> >> >> This option not only re-uses existing code, it would also eliminate >>>>>>>>> >> >> >> the need to add/update the RPC messages for prepare/commit/abort >>>>>>>>> >> >> >> snapshot creation to support group snapshots (since it could all be >>>>>>>>> >> >> >> handled internally). >>>>>>>>> >> >> >> >>>>>>>>> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> >>>>>>>>> >> >> >> wrote: >>>>>>>>> >> >> >>> Gentlemen, >>>>>>>>> >> >> >>> >>>>>>>>> >> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes. >>>>>>>>> >> >> >>> >>>>>>>>> >> >> >>> Here is the situation. 
In order to take snapshots of all images in a >>>>>>>>> >> >> >>> consistency group, >>>>>>>>> >> >> >>> we first need to quiesce all the image writers in the consistency >>>>>>>>> >> >> >>> group. >>>>>>>>> >> >> >>> Let me call >>>>>>>>> >> >> >>> group client - a client which requests a consistency group to take a >>>>>>>>> >> >> >>> snapshot. >>>>>>>>> >> >> >>> Image client - the client that writes to an image. >>>>>>>>> >> >> >>> Let's say group client starts sending notify_quiesce to all image >>>>>>>>> >> >> >>> clients that write to the images in the group. After quiescing half of >>>>>>>>> >> >> >>> the image clients the group client can die. >>>>>>>>> >> >> >>> >>>>>>>>> >> >> >>> It presents us with a dilemma - what should we do with those quiesced >>>>>>>>> >> >> >>> image clients. >>>>>>>>> >> >> >>> >>>>>>>>> >> >> >>> Option 1 - is to wait till someone manually runs recover for that >>>>>>>>> >> >> >>> consistency group. >>>>>>>>> >> >> >>> We can show warning next to those unfinished groups when user runs >>>>>>>>> >> >> >>> group list command. >>>>>>>>> >> >> >>> There will be a command like group recover, which allows users to >>>>>>>>> >> >> >>> rollback unsuccessful snapshots >>>>>>>>> >> >> >>> or continue them using create snapshot command. >>>>>>>>> >> >> >>> >>>>>>>>> >> >> >>> Option 2 - is to establish some heart beats between group client and >>>>>>>>> >> >> >>> image client. If group client fails to heart beat then image client >>>>>>>>> >> >> >>> unquiesces itself and continues normal operation. >>>>>>>>> >> >> >>> >>>>>>>>> >> >> >>> Option 3 - is to have a timeout for each image client. If group client >>>>>>>>> >> >> >>> fails to make a group snapshot within this timeout then we resume our >>>>>>>>> >> >> >>> normal operation informing group client of the fact. >>>>>>>>> >> >> >>> >>>>>>>>> >> >> >>> Which of these options do you prefer? Probably there are other options >>>>>>>>> >> >> >>> that I miss. 
>>>>>>>>> >> >> >>> >>>>>>>>> >> >> >>> Thanks, >>>>>>>>> >> >> >>> Victor. >>>>>>>>> >> >> >> >>>>>>>>> >> >> >> >>>>>>>>> >> >> >> >>>>>>>>> >> >> >> -- >>>>>>>>> >> >> >> Jason >>>>>>>>> >> >> > >>>>>>>>> >> >> > >>>>>>>>> >> >> > >>>>>>>>> >> >> > -- >>>>>>>>> >> >> > Jason >>>>>>>>> >> > >>>>>>>>> >> > >>>>>>>>> >> > >>>>>>>>> >> > -- >>>>>>>>> >> > Jason >>>>>>>>> >> > >>>>>>>>> >> -- >>>>>>>>> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>> >> the body of a message to majordomo@vger.kernel.org >>>>>>>>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>> -- >>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Jason >>> >>> >>> >>> -- >>> Jason > > > > -- > Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-09-12 22:52 ` Victor Denisov @ 2016-09-12 23:06 ` Victor Denisov 2016-09-12 23:09 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-09-12 23:06 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin Another quick question. Do you think it makes sense to introduce snapshot namespaces in a pull request and review it first? It looks like a self-sufficient change that we can merge before introducing snapshots. On Mon, Sep 12, 2016 at 3:52 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Understood. Thank you, Jason. > > On Mon, Sep 12, 2016 at 6:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >> On Sat, Sep 10, 2016 at 9:46 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> Another question. Maybe not really a question, but I would like to >>> verify if I understood what you wrote in the ether pad. >>> >>> You suggest to create image snapshots simultaneously. >>> If everything shuts down when we are making those individual >>> snapshots, then we end up with a SnapshotRecord in incomplete state >>> and images either with snapshots or without them. >>> Do I understand correctly that if the user wants to remove this >>> unfinished group snapshot then we have to: >>> - list all images in this group >>> - look for snapshots in those images with the guid as their name. >>> - delete those individual snapshots and ignore errors if those >>> snapshots don't exist. >>> - delete the entry. >> >> It would be the standard remove state machine, which is basically the >> steps you have above. Note that you would always need to handle the >> "-ENOENT" case since I could always associate an image to a group >> after a group snap was created (i.e. so the new image doesn't have a >> matching image snapshot for a group snapshot). 
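[The remove steps listed above can be sketched with hypothetical in-memory types standing in for the cls_rbd calls. The key point from the reply is reflected here: a missing per-image snapshot, the -ENOENT case, is silently ignored, since an image may have joined the group only after the group snapshot was taken.]

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

struct Image {
  std::set<std::string> snaps;  // per-image snapshot names
};

// Walk the member images, erase the <guid>_<group_snap_id> snapshot from
// each, then drop the group's snapshot record.
void remove_group_snapshot(std::map<std::string, Image> &images,
                           const std::vector<std::string> &members,
                           std::set<std::string> &group_snap_records,
                           const std::string &guid,
                           const std::string &group_snap_id) {
  const std::string snap_name = guid + "_" + group_snap_id;
  for (const auto &id : members) {
    // erase() returning 0 is this model's analogue of -ENOENT: not an error.
    images[id].snaps.erase(snap_name);
  }
  group_snap_records.erase(group_snap_id);  // finally delete the group record
}
```

[This toy version tolerates partial creation: images that never got their snapshot, or images added to the group afterwards, simply have nothing to erase.]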
>> >>> One thing that I don't understand in this case is, what if the user >>> decides to delete one of the images when there are dangling group >>> snapshots. Let's call this image A. >>> This dangling group snapshot could have successfully created a >>> snapshot of this image A. Let's call this snapshot A_snap. >>> Now if we remove image A from this group then once we try to cleanup >>> dangling group snapshot then A_snap snapshot will be overlooked, >>> because image A is not a member of the group any more. >>> And I don't understand how we can prevent this from happening in this >>> approach, except by disallowing to remove images if there are dangling >>> group snapshots. >> >> How is the snapshot dangling for image A? If it successfully created >> the snapshot on image A, it has a snapshot record that associates it >> to the group. Therefore, when the image is removed from the group, I >> would think you would automatically delete the group snapshots >> contained within the image. >> >>> You mentioned that we should call image's individual snapshots after >>> the group's guid. I assume we should name them something like >>> <guid>_<group_snap_id>. >>> If we named them only using guid, then we would be able to create only >>> one group snapshot. >> >> Yup -- that should be fine. >> >>> Thanks, >>> V. >>> >>> >>> On Sat, Sep 10, 2016 at 11:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>> Those are all internal classes -- the cls types are already >>>> dependencies within the librbd internals. Feel free to add the >>>> necessary include and use it directly from within librbd. >>>> >>>> On Fri, Sep 9, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>> I have a question about where SnapshotNamespace type should be placed. >>>>> I placed it in cls/rbd/cls_rbd_types.h because cls client and cls >>>>> backend components should have access to this type. 
>>>>> Also this type is required in librbd/Operations.cc - because we want >>>>> to specify in what namespace Operations::snap_create should create >>>>> snapshots. >>>>> However Operations.cc doesn't import cls_rbd_types.h right now. If the >>>>> question was about public interface of librbd/librbd.cc, then I would >>>>> create a duplicate of SnapshotNamespace type in librbd layer without >>>>> hesitation. >>>>> But these functions are internal, so, my question is whether it's >>>>> really feasible to create another type for SnapshotNamespace in librbd >>>>> layer. >>>>> >>>>> >>>>> On Mon, Aug 29, 2016 at 2:10 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>> Right, I forgot about snapshot "namespaces". I'll add this part. >>>>>> I guess it makes sense to discuss the whole thing on the next CDM. >>>>>> >>>>>> On Sun, Aug 28, 2016 at 5:37 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>> I think the first step is to implement the concept of snapshot "namespaces". >>>>>>> >>>>>>> This could be implemented as an optional variant structure associated >>>>>>> with each snapshot at creation (see the ImageWatcher RPC messages or >>>>>>> journaling event type encoding for examples of this). For consistency >>>>>>> group snapshots, this structure would identify the snapshot as >>>>>>> belonging to the consistency group and have a unique id back to the >>>>>>> specific group snapshot. >>>>>>> >>>>>>> When creating a snapshot, the state machine would (1) create the group >>>>>>> snapshot record, (2) set the state of the group to "creating snapshot" >>>>>>> (to prevent new images from being added/removed from the group while >>>>>>> the op is in-progress), (3) acquire the lock for all images in the >>>>>>> group, (4) create the individual image snapshots with the linkage back >>>>>>> to the group snapshot record (can be performed in parallel up to max >>>>>>> concurrent ops), (5) release the exclusive locks, and (6) reset the >>>>>>> group status to "ready". 
>>>>>>> >>>>>>> If you have a hard crash/failure anywhere, a "snap remove" operation >>>>>>> should be designed to get the group back into a consistent state (i.e. >>>>>>> remove any snapshots linked to the group and reset the group state >>>>>>> back to ready). >>>>>>> >>>>>>> On Fri, Aug 26, 2016 at 5:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>> Guys, >>>>>>>> >>>>>>>> I updated Snapshots section of this document: >>>>>>>> http://pad.ceph.com/p/consistency_groups, in accordance with my >>>>>>>> improved understanding of how it should be implemented. >>>>>>>> Please take a look and provide your comments. Some of my concerns >>>>>>>> regarding the implementation I highlighted in bold. >>>>>>>> >>>>>>>> Looking forward to your valuable remarks. >>>>>>>> Thanks in advance. >>>>>>>> V. >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>>>>>>> On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote: >>>>>>>>>> What if I'm holding this lock and somebody else is trying to reacquire the lock. >>>>>>>>>> How do I get notified about it? >>>>>>>>> >>>>>>>>> The image watcher is notified, which triggers its handler: >>>>>>>>> >>>>>>>>> ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx) >>>>>>>>> >>>>>>>>> The handler calls the current lock policy method `lock_requested()`, >>>>>>>>> which will define what to do with the lock request. The StandardPolicy >>>>>>>>> is to release the lock, so it may ping-pong between the >>>>>>>>> clients. You may define a different policy -- rbd-mirror is an example >>>>>>>>> where it is used. >>>>>>>>> >>>>>>>>> Everywhere where an operation needs the exclusive lock, it is always >>>>>>>>> checked if we currently are a lock owner, i.e: >>>>>>>>> >>>>>>>>> ictx->exclusive_lock->is_lock_owner() >>>>>>>>> >>>>>>>>> and if it is false, the exclusive lock is requested. 
Before this check >>>>>>>>> you need to acquire ictx->owner_lock, and until you release owner_lock >>>>>>>>> you can be sure your exclusive lock will not leak to another >>>>>>>>> client. After releasing owner_lock, you will need to repeat the check >>>>>>>>> again when you need it. >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Mykola Golub >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>>>>>>>> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote: >>>>>>>>>> >> Could you please point me to the place in source code where writer >>>>>>>>>> >> acquires an exclusive lock on the image. >>>>>>>>>> > >>>>>>>>>> > Grep for 'exclusive_lock->request_lock'. Basically, what you need >>>>>>>>>> > (after opening the image) is: >>>>>>>>>> > >>>>>>>>>> > ``` >>>>>>>>>> > C_SaferCond lock_ctx; >>>>>>>>>> > { >>>>>>>>>> > RWLock::WLocker l(ictx->owner_lock); >>>>>>>>>> > >>>>>>>>>> > if (ictx->exclusive_lock == nullptr) { >>>>>>>>>> > // exclusive-lock feature is not enabled >>>>>>>>>> > return -EINVAL; >>>>>>>>>> > } >>>>>>>>>> > >>>>>>>>>> > // Request the lock. If it is currently owned by another client, >>>>>>>>>> > // RPC message will be sent to the client to release the lock. >>>>>>>>>> > ictx->exclusive_lock->request_lock(&lock_ctx); >>>>>>>>>> > } // release owner_lock before waiting to avoid potential deadlock >>>>>>>>>> > >>>>>>>>>> > int r = lock_ctx.wait(); >>>>>>>>>> > if (r < 0) { >>>>>>>>>> > return r; >>>>>>>>>> > } >>>>>>>>>> > >>>>>>>>>> > RWLock::RLocker l(ictx->owner_lock); >>>>>>>>>> > if (ictx->exclusive_lock == nullptr || !ictx->exclusive_lock->is_lock_owner()) { >>>>>>>>>> > // failed to acquire exclusive lock >>>>>>>>>> > return -EROFS; >>>>>>>>>> > } >>>>>>>>>> > >>>>>>>>>> > // At this point lock is acquired >>>>>>>>>> > ... 
>>>>>>>>>> > >>>>>>>>>> > ``` >>>>>>>>>> > >>>>>>>>>> > You might want to look at this PR >>>>>>>>>> > >>>>>>>>>> > https://github.com/ceph/ceph/pull/9592 >>>>>>>>>> > >>>>>>>>>> > where we discuss adding API methods to directly acquire and release >>>>>>>>>> > the exclusive lock. You don't need the API, but will find examples in >>>>>>>>>> > the patch, and also useful comments from Jason. >>>>>>>>>> > >>>>>>>>>> > -- >>>>>>>>>> > Mykola Golub >>>>>>>>>> > >>>>>>>>>> >> I presume you were talking about the feature: >>>>>>>>>> >> exclusive_lock, shared_lock which can be used from command line using >>>>>>>>>> >> commands lock list, lock break. >>>>>>>>>> >> >>>>>>>>>> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>> >> > There is already a "request lock" RPC message and this is already handled >>>>>>>>>> >> > transparently within librbd when you attempt to acquire the lock and another >>>>>>>>>> >> > client owns it. >>>>>>>>>> >> > >>>>>>>>>> >> > >>>>>>>>>> >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>> >> >> >>>>>>>>>> >> >> If an image already has a writer who owns the lock, >>>>>>>>>> >> >> should I implement a notification that allows to ask the writer to >>>>>>>>>> >> >> release the lock, >>>>>>>>>> >> >> is there already a standard way to intercept the exclusive lock? >>>>>>>>>> >> >> >>>>>>>>>> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> >>>>>>>>>> >> >> wrote: >>>>>>>>>> >> >> > ... one more thing: >>>>>>>>>> >> >> > >>>>>>>>>> >> >> > I was also thinking that we need a new RBD feature bit to be used to >>>>>>>>>> >> >> > indicate that an image is part of a consistency group to prevent older >>>>>>>>>> >> >> > librbd clients from removing the image or group snapshots. This could >>>>>>>>>> >> >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can >>>>>>>>>> >> >> > still open the image R/O while its part of a group. 
>>>>>>>>>> >> >> > >>>>>>>>>> >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> >>>>>>>>>> >> >> > wrote: >>>>>>>>>> >> >> >> Way back in April when we had the CDM, I was originally thinking we >>>>>>>>>> >> >> >> should implement option 3. Essentially, you have a prepare group >>>>>>>>>> >> >> >> snapshot RPC message that extends a "paused IO" lease to the caller. >>>>>>>>>> >> >> >> When that lease expires, IO would automatically be resumed even if the >>>>>>>>>> >> >> >> group snapshot hasn't been created yet. This would also require >>>>>>>>>> >> >> >> commit/abort group snapshot RPC messages. >>>>>>>>>> >> >> >> >>>>>>>>>> >> >> >> However, thinking about this last night, here is another potential >>>>>>>>>> >> >> >> option: >>>>>>>>>> >> >> >> >>>>>>>>>> >> >> >> Option 4 - require images to have the exclusive lock feature before >>>>>>>>>> >> >> >> they can be added to a consistency group (and prevent disabling of >>>>>>>>>> >> >> >> exclusive-lock while they are part of a group). Then librbd, via the >>>>>>>>>> >> >> >> rbd CLI (or client application of the rbd consistency group snap >>>>>>>>>> >> >> >> create API), can co-operatively acquire the lock from all active image >>>>>>>>>> >> >> >> clients within the group (i.e. all IO has been flushed and paused) and >>>>>>>>>> >> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal >>>>>>>>>> >> >> >> exclusive lock handling process will automatically take care of >>>>>>>>>> >> >> >> re-acquiring the lock from the dead client and resuming IO. >>>>>>>>>> >> >> >> >>>>>>>>>> >> >> >> This option not only re-uses existing code, it would also eliminate >>>>>>>>>> >> >> >> the need to add/update the RPC messages for prepare/commit/abort >>>>>>>>>> >> >> >> snapshot creation to support group snapshots (since it could all be >>>>>>>>>> >> >> >> handled internally). 
>>>>>>>>>> >> >> >> >>>>>>>>>> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> >>>>>>>>>> >> >> >> wrote: >>>>>>>>>> >> >> >>> Gentlemen, >>>>>>>>>> >> >> >>> >>>>>>>>>> >> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes. >>>>>>>>>> >> >> >>> >>>>>>>>>> >> >> >>> Here is the situation. In order to take snapshots of all images in a >>>>>>>>>> >> >> >>> consistency group, >>>>>>>>>> >> >> >>> we first need to quiesce all the image writers in the consistency >>>>>>>>>> >> >> >>> group. >>>>>>>>>> >> >> >>> Let me call >>>>>>>>>> >> >> >>> group client - a client which requests a consistency group to take a >>>>>>>>>> >> >> >>> snapshot. >>>>>>>>>> >> >> >>> Image client - the client that writes to an image. >>>>>>>>>> >> >> >>> Let's say group client starts sending notify_quiesce to all image >>>>>>>>>> >> >> >>> clients that write to the images in the group. After quiescing half of >>>>>>>>>> >> >> >>> the image clients the group client can die. >>>>>>>>>> >> >> >>> >>>>>>>>>> >> >> >>> It presents us with a dilemma - what should we do with those quiesced >>>>>>>>>> >> >> >>> image clients. >>>>>>>>>> >> >> >>> >>>>>>>>>> >> >> >>> Option 1 - is to wait till someone manually runs recover for that >>>>>>>>>> >> >> >>> consistency group. >>>>>>>>>> >> >> >>> We can show warning next to those unfinished groups when user runs >>>>>>>>>> >> >> >>> group list command. >>>>>>>>>> >> >> >>> There will be a command like group recover, which allows users to >>>>>>>>>> >> >> >>> rollback unsuccessful snapshots >>>>>>>>>> >> >> >>> or continue them using create snapshot command. >>>>>>>>>> >> >> >>> >>>>>>>>>> >> >> >>> Option 2 - is to establish some heart beats between group client and >>>>>>>>>> >> >> >>> image client. If group client fails to heart beat then image client >>>>>>>>>> >> >> >>> unquiesces itself and continues normal operation. 
>>>>>>>>>> >> >> >>> >>>>>>>>>> >> >> >>> Option 3 - is to have a timeout for each image client. If group client >>>>>>>>>> >> >> >>> fails to make a group snapshot within this timeout then we resume our >>>>>>>>>> >> >> >>> normal operation informing group client of the fact. >>>>>>>>>> >> >> >>> >>>>>>>>>> >> >> >>> Which of these options do you prefer? Probably there are other options >>>>>>>>>> >> >> >>> that I miss. >>>>>>>>>> >> >> >>> >>>>>>>>>> >> >> >>> Thanks, >>>>>>>>>> >> >> >>> Victor. >>>>>>>>>> >> >> >> >>>>>>>>>> >> >> >> >>>>>>>>>> >> >> >> >>>>>>>>>> >> >> >> -- >>>>>>>>>> >> >> >> Jason >>>>>>>>>> >> >> > >>>>>>>>>> >> >> > >>>>>>>>>> >> >> > >>>>>>>>>> >> >> > -- >>>>>>>>>> >> >> > Jason >>>>>>>>>> >> > >>>>>>>>>> >> > >>>>>>>>>> >> > >>>>>>>>>> >> > -- >>>>>>>>>> >> > Jason >>>>>>>>>> >> > >>>>>>>>>> >> -- >>>>>>>>>> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>> >> the body of a message to majordomo@vger.kernel.org >>>>>>>>>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>> -- >>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Jason >>>> >>>> >>>> >>>> -- >>>> Jason >> >> >> >> -- >> Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-09-12 23:06 ` Victor Denisov @ 2016-09-12 23:09 ` Victor Denisov 2016-09-13 12:33 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-09-12 23:09 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin I also think, probably, moving existing group api to Group.cc can be done in a separate pull request. On Mon, Sep 12, 2016 at 4:06 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Another quick question. > Do you think it makes sense to introduce snapshot namespaces in a pull > request and review it first? > It looks like a self-sufficient change that we can merge before > introducing snapshots. > > On Mon, Sep 12, 2016 at 3:52 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> Understood. Thank you, Jason. >> >> On Mon, Sep 12, 2016 at 6:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>> On Sat, Sep 10, 2016 at 9:46 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> Another question. Maybe not really a question, but I would like to >>>> verify if I understood what you wrote in the ether pad. >>>> >>>> You suggest to create image snapshots simultaneously. >>>> If everything shuts down when we are making those individual >>>> snapshots, then we end up with a SnapshotRecord in incomplete state >>>> and images either with snapshots or without them. >>>> Do I understand correctly that if the user wants to remove this >>>> unfinished group snapshot then we have to: >>>> - list all images in this group >>>> - look for snapshots in those images with the guid as their name. >>>> - delete those individual snapshots and ignore errors if those >>>> snapshots don't exist. >>>> - delete the entry. >>> >>> It would be the standard remove state machine, which is basically the >>> steps you have above. Note that you would always need to handle the >>> "-ENOENT" case since I could always associate an image to a group >>> after a group snap was created (i.e. 
so the new image doesn't have a >>> matching image snapshot for a group snapshot). >>> >>>> One thing that I don't understand in this case is, what if the user >>>> decides to delete one of the images when there are dangling group >>>> snapshots. Let's call this image A. >>>> This dangling group snapshot could have successfully created a >>>> snapshot of this image A. Let's call this snapshot A_snap. >>>> Now if we remove image A from this group then once we try to cleanup >>>> dangling group snapshot then A_snap snapshot will be overlooked, >>>> because image A is not a member of the group any more. >>>> And I don't understand how we can prevent this from happening in this >>>> approach, except by disallowing to remove images if there are dangling >>>> group snapshots. >>> >>> How is the snapshot dangling for image A? If it successfully created >>> the snapshot on image A, it has a snapshot record that associates it >>> to the group. Therefore, when the image is removed from the group, I >>> would think you would automatically delete the group snapshots >>> contained within the image. >>> >>>> You mentioned that we should call image's individual snapshots after >>>> the group's guid. I assume we should name them something like >>>> <guid>_<group_snap_id>. >>>> If we named them only using guid, then we would be able to create only >>>> one group snapshot. >>> >>> Yup -- that should be fine. >>> >>>> Thanks, >>>> V. >>>> >>>> >>>> On Sat, Sep 10, 2016 at 11:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>> Those are all internal classes -- the cls types are already >>>>> dependencies within the librbd internals. Feel free to add the >>>>> necessary include and use it directly from within librbd. >>>>> >>>>> On Fri, Sep 9, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>> I have a question about where SnapshotNamespace type should be placed. 
>>>>>> I placed it in cls/rbd/cls_rbd_types.h because cls client and cls >>>>>> backend components should have access to this type. >>>>>> Also this type is required in librbd/Operations.cc - because we want >>>>>> to specify in what namespace Operations::snap_create should create >>>>>> snapshots. >>>>>> However Operations.cc doesn't import cls_rbd_types.h right now. If the >>>>>> question was about public interface of librbd/librbd.cc, then I would >>>>>> create a duplicate of SnapshotNamespace type in librbd layer without >>>>>> hesitation. >>>>>> But these functions are internal, so, my question is whether it's >>>>>> really feasible to create another type for SnapshotNamespace in librbd >>>>>> layer. >>>>>> >>>>>> >>>>>> On Mon, Aug 29, 2016 at 2:10 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>> Right, I forgot about snaphot "namespaces". I'll add this part. >>>>>>> I guess it makes sense to discuss the whole thing on the next CDM. >>>>>>> >>>>>>> On Sun, Aug 28, 2016 at 5:37 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>> I think the first step is to implement the concept of snapshot "namespaces". >>>>>>>> >>>>>>>> This could be implemented as an optional variant structure associated >>>>>>>> with each snapshot at creation (see the ImageWatcher RPC messages or >>>>>>>> journaling event type encoding for examples of this). For consistency >>>>>>>> group snapshots, this structure would identify the snapshot as >>>>>>>> belonging to the consistency group and have a unique id back to the >>>>>>>> specific group snapshot. 
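[One possible shape for the optional variant structure described above. This is an illustrative sketch only -- the actual cls_rbd_types definitions used boost::variant and different field names; std::variant is used here purely to keep the example self-contained.]

```cpp
#include <string>
#include <variant>

// A plain user snapshot carries no extra data; a group snapshot links
// back to its consistency group and to the specific group snapshot.
struct UserSnapshotNamespace {};

struct GroupSnapshotNamespace {
  std::string group_id;       // id of the owning consistency group
  std::string group_snap_id;  // which group snapshot this image snap is part of
};

using SnapshotNamespace =
    std::variant<UserSnapshotNamespace, GroupSnapshotNamespace>;

// True if the snapshot belongs to the given consistency group.
inline bool belongs_to_group(const SnapshotNamespace &ns,
                             const std::string &group_id) {
  if (const auto *g = std::get_if<GroupSnapshotNamespace>(&ns))
    return g->group_id == group_id;
  return false;
}
```

[A variant like this lets cleanup code distinguish group-owned snapshots from user snapshots without relying on the name convention alone.]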
>>>>>>>> >>>>>>>> When creating a snapshot, the state machine would (1) create the group >>>>>>>> snapshot record, (2) set the state of the group to "creating snapshot" >>>>>>>> (to prevent new images from being added/removed from the group while >>>>>>>> the op is in-progress), (3) acquire the lock for all images in the >>>>>>>> group, (4) create the individual image snapshots with the linkage back >>>>>>>> to the group snapshot record (can be performed in parallel up to max >>>>>>>> concurrent ops), (5) release the exclusive locks, and (6) reset the >>>>>>>> group status to "ready". >>>>>>>> >>>>>>>> If you have a hard crash/failure anywhere, a "snap remove" operation >>>>>>>> should be designed to get the group back into consistent state (i.e. >>>>>>>> remove any snapshots linked to the group and reset the group state >>>>>>>> back to ready). >>>>>>>> >>>>>>>> On Fri, Aug 26, 2016 at 5:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>> Guys, >>>>>>>>> >>>>>>>>> I updated Snapshots section of this document: >>>>>>>>> http://pad.ceph.com/p/consistency_groups, in accordance with my >>>>>>>>> improved understanding of how it should be implemented. >>>>>>>>> Please take a look and provide your comments. Some of my concerns >>>>>>>>> regarding the implementation I highlighted in bold. >>>>>>>>> >>>>>>>>> Looking forward to your valuable remarks. >>>>>>>>> Thanks in advance. >>>>>>>>> V. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sat, Aug 20, 2016 at 9:27 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>>>>>>>> On Fri, Aug 19, 2016 at 05:36:56PM -0700, Victor Denisov wrote: >>>>>>>>>>> What if I'm holding this lock and somebody else is trying to reacquire the lock. >>>>>>>>>>> How do I get notified about it? 
>>>>>>>>>> >>>>>>>>>> The image watcher is notified, which triggers its handler: >>>>>>>>>> >>>>>>>>>> ImageWatcher<I>::handle_payload(const RequestLockPayload, *ack_ctx) >>>>>>>>>> >>>>>>>>>> The handler calls the current lock policy method `lock_requested()`, >>>>>>>>>> which will define what to do with the lock request. The StandardPolicy >>>>>>>>>> is to release the lock, so it may ping-pong between the >>>>>>>>>> clients. You may define a different policy -- rbd-mirror is an example >>>>>>>>>> where it is used. >>>>>>>>>> >>>>>>>>>> Wherever an operation needs the exclusive lock, it is always >>>>>>>>>> checked whether we are currently the lock owner, i.e.: >>>>>>>>>> >>>>>>>>>> ictx->exclusive_lock->is_lock_owner() >>>>>>>>>> >>>>>>>>>> and if it is false, the exclusive lock is requested. Before this check >>>>>>>>>> you need to acquire ictx->owner_lock, and until you release owner_lock >>>>>>>>>> you can be sure your exclusive lock will not leak to another >>>>>>>>>> client. After releasing owner_lock, you will need to repeat the check >>>>>>>>>> again when you need it. >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Mykola Golub >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Aug 19, 2016 at 5:48 AM, Mykola Golub <mgolub@mirantis.com> wrote: >>>>>>>>>>> > On Thu, Aug 18, 2016 at 09:20:02PM -0700, Victor Denisov wrote: >>>>>>>>>>> >> Could you please point me to the place in source code where writer >>>>>>>>>>> >> acquires an exclusive lock on the image. >>>>>>>>>>> > >>>>>>>>>>> > Grep for 'exclusive_lock->request_lock'. Basically, what you need >>>>>>>>>>> > (after opening the image) is: >>>>>>>>>>> > >>>>>>>>>>> > ``` >>>>>>>>>>> > C_SaferCond lock_ctx; >>>>>>>>>>> > { >>>>>>>>>>> > RWLock::WLocker l(ictx->owner_lock); >>>>>>>>>>> > >>>>>>>>>>> > if (ictx->exclusive_lock == nullptr) { >>>>>>>>>>> > // exclusive-lock feature is not enabled >>>>>>>>>>> > return -EINVAL; >>>>>>>>>>> > } >>>>>>>>>>> > >>>>>>>>>>> > // Request the lock.
If it is currently owned by another client, >>>>>>>>>>> > // RPC message will be sent to the client to release the lock. >>>>>>>>>>> > ictx->exclusive_lock->request_lock(&lock_ctx); >>>>>>>>>>> > } // release owner_lock before waiting to avoid potential deadlock >>>>>>>>>>> > >>>>>>>>>>> > int r = lock_ctx.wait(); >>>>>>>>>>> > if (r < 0) { >>>>>>>>>>> > return r; >>>>>>>>>>> > } >>>>>>>>>>> > >>>>>>>>>>> > RWLock::RLocker l(ictx->owner_lock); >>>>>>>>>>> > if (ictx->exclusive_lock == nullptr || !ictx->exclusive_lock->is_lock_owner()) { >>>>>>>>>>> > // failed to acquire exclusive lock >>>>>>>>>>> > return -EROFS; >>>>>>>>>>> > } >>>>>>>>>>> > >>>>>>>>>>> > // At this point lock is acquired >>>>>>>>>>> > ... >>>>>>>>>>> > >>>>>>>>>>> > ``` >>>>>>>>>>> > >>>>>>>>>>> > You might want to look at this PR >>>>>>>>>>> > >>>>>>>>>>> > https://github.com/ceph/ceph/pull/9592 >>>>>>>>>>> > >>>>>>>>>>> > where we discuss adding API methods to directly acquire and release >>>>>>>>>>> > the exclusive lock. You don't need the API, but will find examples in >>>>>>>>>>> > the patch, and also useful comments from Jason. >>>>>>>>>>> > >>>>>>>>>>> > -- >>>>>>>>>>> > Mykola Golub >>>>>>>>>>> > >>>>>>>>>>> >> I presume you were talking about the feature: >>>>>>>>>>> >> exclusive_lock, shared_lock which can be used from command line using >>>>>>>>>>> >> commands lock list, lock break. >>>>>>>>>>> >> >>>>>>>>>>> >> On Thu, Aug 18, 2016 at 5:47 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>> >> > There is already a "request lock" RPC message and this is already handled >>>>>>>>>>> >> > transparently within librbd when you attempt to acquire the lock and another >>>>>>>>>>> >> > client owns it. 
>>>>>>>>>>> >> > >>>>>>>>>>> >> > >>>>>>>>>>> >> > On Thursday, August 18, 2016, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>> >> >> >>>>>>>>>>> >> >> If an image already has a writer who owns the lock, >>>>>>>>>>> >> >> should I implement a notification that allows to ask the writer to >>>>>>>>>>> >> >> release the lock, >>>>>>>>>>> >> >> is there already a standard way to intercept the exclusive lock? >>>>>>>>>>> >> >> >>>>>>>>>>> >> >> On Tue, Aug 16, 2016 at 6:29 AM, Jason Dillaman <jdillama@redhat.com> >>>>>>>>>>> >> >> wrote: >>>>>>>>>>> >> >> > ... one more thing: >>>>>>>>>>> >> >> > >>>>>>>>>>> >> >> > I was also thinking that we need a new RBD feature bit to be used to >>>>>>>>>>> >> >> > indicate that an image is part of a consistency group to prevent older >>>>>>>>>>> >> >> > librbd clients from removing the image or group snapshots. This could >>>>>>>>>>> >> >> > be a RBD_FEATURES_RW_INCOMPATIBLE feature bit so older clients can >>>>>>>>>>> >> >> > still open the image R/O while its part of a group. >>>>>>>>>>> >> >> > >>>>>>>>>>> >> >> > On Tue, Aug 16, 2016 at 9:26 AM, Jason Dillaman <jdillama@redhat.com> >>>>>>>>>>> >> >> > wrote: >>>>>>>>>>> >> >> >> Way back in April when we had the CDM, I was originally thinking we >>>>>>>>>>> >> >> >> should implement option 3. Essentially, you have a prepare group >>>>>>>>>>> >> >> >> snapshot RPC message that extends a "paused IO" lease to the caller. >>>>>>>>>>> >> >> >> When that lease expires, IO would automatically be resumed even if the >>>>>>>>>>> >> >> >> group snapshot hasn't been created yet. This would also require >>>>>>>>>>> >> >> >> commit/abort group snapshot RPC messages. 
>>>>>>>>>>> >> >> >> >>>>>>>>>>> >> >> >> However, thinking about this last night, here is another potential >>>>>>>>>>> >> >> >> option: >>>>>>>>>>> >> >> >> >>>>>>>>>>> >> >> >> Option 4 - require images to have the exclusive lock feature before >>>>>>>>>>> >> >> >> they can be added to a consistency group (and prevent disabling of >>>>>>>>>>> >> >> >> exclusive-lock while they are part of a group). Then librbd, via the >>>>>>>>>>> >> >> >> rbd CLI (or client application of the rbd consistency group snap >>>>>>>>>>> >> >> >> create API), can co-operatively acquire the lock from all active image >>>>>>>>>>> >> >> >> clients within the group (i.e. all IO has been flushed and paused) and >>>>>>>>>>> >> >> >> can proceed with snapshot creation. If the rbd CLI dies, the normal >>>>>>>>>>> >> >> >> exclusive lock handling process will automatically take care of >>>>>>>>>>> >> >> >> re-acquiring the lock from the dead client and resuming IO. >>>>>>>>>>> >> >> >> >>>>>>>>>>> >> >> >> This option not only re-uses existing code, it would also eliminate >>>>>>>>>>> >> >> >> the need to add/update the RPC messages for prepare/commit/abort >>>>>>>>>>> >> >> >> snapshot creation to support group snapshots (since it could all be >>>>>>>>>>> >> >> >> handled internally). >>>>>>>>>>> >> >> >> >>>>>>>>>>> >> >> >> On Mon, Aug 15, 2016 at 7:46 PM, Victor Denisov <vdenisov@mirantis.com> >>>>>>>>>>> >> >> >> wrote: >>>>>>>>>>> >> >> >>> Gentlemen, >>>>>>>>>>> >> >> >>> >>>>>>>>>>> >> >> >>> I'm writing to you to ask for your opinion regarding quiescing writes. >>>>>>>>>>> >> >> >>> >>>>>>>>>>> >> >> >>> Here is the situation. In order to take snapshots of all images in a >>>>>>>>>>> >> >> >>> consistency group, >>>>>>>>>>> >> >> >>> we first need to quiesce all the image writers in the consistency >>>>>>>>>>> >> >> >>> group. >>>>>>>>>>> >> >> >>> Let me call >>>>>>>>>>> >> >> >>> group client - a client which requests a consistency group to take a >>>>>>>>>>> >> >> >>> snapshot. 
>>>>>>>>>>> >> >> >>> Image client - the client that writes to an image. >>>>>>>>>>> >> >> >>> Let's say group client starts sending notify_quiesce to all image >>>>>>>>>>> >> >> >>> clients that write to the images in the group. After quiescing half of >>>>>>>>>>> >> >> >>> the image clients the group client can die. >>>>>>>>>>> >> >> >>> >>>>>>>>>>> >> >> >>> It presents us with a dilemma - what should we do with those quiesced >>>>>>>>>>> >> >> >>> image clients. >>>>>>>>>>> >> >> >>> >>>>>>>>>>> >> >> >>> Option 1 - is to wait till someone manually runs recover for that >>>>>>>>>>> >> >> >>> consistency group. >>>>>>>>>>> >> >> >>> We can show warning next to those unfinished groups when user runs >>>>>>>>>>> >> >> >>> group list command. >>>>>>>>>>> >> >> >>> There will be a command like group recover, which allows users to >>>>>>>>>>> >> >> >>> rollback unsuccessful snapshots >>>>>>>>>>> >> >> >>> or continue them using create snapshot command. >>>>>>>>>>> >> >> >>> >>>>>>>>>>> >> >> >>> Option 2 - is to establish some heart beats between group client and >>>>>>>>>>> >> >> >>> image client. If group client fails to heart beat then image client >>>>>>>>>>> >> >> >>> unquiesces itself and continues normal operation. >>>>>>>>>>> >> >> >>> >>>>>>>>>>> >> >> >>> Option 3 - is to have a timeout for each image client. If group client >>>>>>>>>>> >> >> >>> fails to make a group snapshot within this timeout then we resume our >>>>>>>>>>> >> >> >>> normal operation informing group client of the fact. >>>>>>>>>>> >> >> >>> >>>>>>>>>>> >> >> >>> Which of these options do you prefer? Probably there are other options >>>>>>>>>>> >> >> >>> that I miss. >>>>>>>>>>> >> >> >>> >>>>>>>>>>> >> >> >>> Thanks, >>>>>>>>>>> >> >> >>> Victor. 
>>>>>>>>>>> >> >> >> >>>>>>>>>>> >> >> >> >>>>>>>>>>> >> >> >> >>>>>>>>>>> >> >> >> -- >>>>>>>>>>> >> >> >> Jason >>>>>>>>>>> >> >> > >>>>>>>>>>> >> >> > >>>>>>>>>>> >> >> > >>>>>>>>>>> >> >> > -- >>>>>>>>>>> >> >> > Jason >>>>>>>>>>> >> > >>>>>>>>>>> >> > >>>>>>>>>>> >> > >>>>>>>>>>> >> > -- >>>>>>>>>>> >> > Jason >>>>>>>>>>> >> > >>>>>>>>>>> >> -- >>>>>>>>>>> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>> >> the body of a message to majordomo@vger.kernel.org >>>>>>>>>>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>>>> -- >>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Jason >>>>> >>>>> >>>>> >>>>> -- >>>>> Jason >>> >>> >>> >>> -- >>> Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
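[Editor's note] The six-step create-snapshot state machine described in the thread above (create the group snapshot record, freeze membership, acquire the exclusive locks, snapshot each image with linkage back to the group, release the locks, reset the group state) can be sketched roughly as follows. This is an illustration only: every type and helper name here is hypothetical, not actual librbd/cls_rbd code.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch of the group-snapshot state machine; none of
// these names are real librbd or cls_rbd APIs.
enum class GroupState { READY, CREATING_SNAPSHOT };

struct Image {
  std::string id;
  bool locked = false;
  std::vector<std::string> snaps;  // image snapshots linked to the group
};

struct Group {
  std::string guid;
  GroupState state = GroupState::READY;
  std::vector<uint64_t> snap_records;  // group snapshot records
  std::vector<Image> images;
};

bool create_group_snapshot(Group &g, uint64_t group_snap_id) {
  // (1) create the group snapshot record
  g.snap_records.push_back(group_snap_id);
  // (2) block add/remove of member images while the op is in progress
  g.state = GroupState::CREATING_SNAPSHOT;
  // (3) acquire the exclusive lock for every image in the group
  for (auto &img : g.images)
    img.locked = true;
  // (4) create the individual image snapshots, linked back to the
  //     group record via a "<group guid>_<group snap id>" style name
  for (auto &img : g.images)
    img.snaps.push_back(g.guid + "_" + std::to_string(group_snap_id));
  // (5) release the exclusive locks
  for (auto &img : g.images)
    img.locked = false;
  // (6) reset the group status to "ready"
  g.state = GroupState::READY;
  return true;
}
```

A hard crash between steps (2) and (6) leaves the group in CREATING_SNAPSHOT with partial image snapshots; per Jason's note, a "snap remove" operation should then delete any image snapshots linked to the group and reset the state back to ready.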
* Re: Snapshots of consistency groups 2016-09-12 23:09 ` Victor Denisov @ 2016-09-13 12:33 ` Jason Dillaman 2016-09-13 23:41 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2016-09-13 12:33 UTC (permalink / raw) To: Victor Denisov; +Cc: Mykola Golub, ceph-devel, Josh Durgin On Mon, Sep 12, 2016 at 7:09 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > I also think, probably, moving existing group api to Group.cc can be > done in a separate pull request. Agreed -- no worries. > On Mon, Sep 12, 2016 at 4:06 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> Another quick question. >> Do you think it makes sense to introduce snapshot namespaces in a pull >> request and review it first? >> It looks like a self-sufficient change that we can merge before >> introducing snapshots. Yes, I think that would be a good idea to simplify the PR and get it merged potentially quicker. -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-09-13 12:33 ` Jason Dillaman @ 2016-09-13 23:41 ` Victor Denisov 2016-09-15 23:17 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-09-13 23:41 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin Published a pull request for extracting Group.cc https://github.com/ceph/ceph/pull/11070 Please review. On Tue, Sep 13, 2016 at 5:33 AM, Jason Dillaman <jdillama@redhat.com> wrote: > On Mon, Sep 12, 2016 at 7:09 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> I also think, probably, moving existing group api to Group.cc can be >> done in a separate pull request. > > Agreed -- no worries. > >> On Mon, Sep 12, 2016 at 4:06 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> Another quick question. >>> Do you think it makes sense to introduce snapshot namespaces in a pull >>> request and review it first? >>> It looks like a self sufficient change that we can merge before >>> introducing snapshots. > > Yes, I think that would be a good idea to simplify the PR and get it > merged potentially quicker. > > -- > Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-09-13 23:41 ` Victor Denisov @ 2016-09-15 23:17 ` Victor Denisov 2016-09-16 17:37 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-09-15 23:17 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin

Hi Jason,

I have a question about the implementation of snapshot namespaces in cls_rbd. What if I implement encode and decode functions for the SnapshotNamespace types as follows:

inline void encode(const SnapshotNamespace &c, bufferlist &bl, uint64_t features=0) {
  ENCODE_DUMP_PRE();
  boost::apply_visitor(EncodeSnapshotTypeVisitor(bl), c);
  ENCODE_DUMP_POST(SnapshotNamespace);
}

inline void decode(SnapshotNamespace &c, bufferlist::iterator &p) {
  uint32_t snap_type;
  ::decode(snap_type, p);
  switch (snap_type) {
    case cls::rbd::SNAPSHOT_NAMESPACE_TYPE_USER:
      c = UserSnapshotNamespace();
      break;
    case cls::rbd::SNAPSHOT_NAMESPACE_TYPE_GROUP:
      c = GroupSnapshotNamespace();
      break;
    default:
      c = UnknownSnapshotNamespace();
      break;
  }
  boost::apply_visitor(DecodeSnapshotTypeVisitor(p), c);
}

https://github.com/VictorDenisov/ceph/blob/consistency_groups_namespaces/src/cls/rbd/cls_rbd_types.h#L314

and then instead of

if (struct_v >= 5) {
  uint32_t snap_type;
  ::decode(snap_type, p);
  snapshot_namespace_type = static_cast<cls::rbd::SnapshotNamespaceType>(snap_type);
  switch (snap_type) {
    case cls::rbd::SNAPSHOT_NAMESPACE_TYPE_USER:
      snapshot_namespace = cls::rbd::UserSnapshotNamespace();
      break;
    case cls::rbd::SNAPSHOT_NAMESPACE_TYPE_GROUP:
      snapshot_namespace = cls::rbd::GroupSnapshotNamespace();
      break;
    default:
      snapshot_namespace = cls::rbd::UnknownSnapshotNamespace();
      break;
  }
  boost::apply_visitor(cls::rbd::DecodeSnapshotTypeVisitor(p), snapshot_namespace);
} else {
  snapshot_namespace = cls::rbd::UserSnapshotNamespace();
}

in cls_rbd_snap decoding I would write just:

if (struct_v >= 5) {
  ::decode(snapshot_namespace, p);
} else {
  snapshot_namespace = cls::rbd::UserSnapshotNamespace();
}

then
code for the ::encode function of cls_rbd_snap would change accordingly: instead of

boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), snapshot_namespace);

I would do:

::encode(snapshot_namespace, bl);

The reason I would like to have encode and decode for SnapshotNamespace implemented like this is that when I implement the get_snap_namespace method in cls_rbd.cc it's very convenient to have conventional encode and decode functions. get_snap_namespace has to serialize and deserialize results. https://github.com/VictorDenisov/ceph/blob/consistency_groups_namespaces/src/cls/rbd/cls_rbd.cc#L1492 What do you think? Thanks, V. On Tue, Sep 13, 2016 at 4:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Published a pull request for extracting Group.cc > > https://github.com/ceph/ceph/pull/11070 > > Please review. > > > On Tue, Sep 13, 2016 at 5:33 AM, Jason Dillaman <jdillama@redhat.com> wrote: >> On Mon, Sep 12, 2016 at 7:09 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> I also think, probably, moving existing group api to Group.cc can be >>> done in a separate pull request. >> >> Agreed -- no worries. >> >>> On Mon, Sep 12, 2016 at 4:06 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> Another quick question. >>>> Do you think it makes sense to introduce snapshot namespaces in a pull >>>> request and review it first? >>>> It looks like a self sufficient change that we can merge before >>>> introducing snapshots. >> >> Yes, I think that would be a good idea to simplify the PR and get it >> merged potentially quicker. >> >> -- >> Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
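[Editor's note] The tag-then-payload pattern Victor proposes — encode a type id first, then construct the matching alternative before visiting it for its payload — can be illustrated with std::variant standing in for boost::variant and a plain byte vector standing in for a bufferlist. All names here are simplified stand-ins, not the real cls::rbd types.

```cpp
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// Simplified stand-ins for the cls::rbd snapshot namespace types.
struct UserSnapshotNamespace {};
struct GroupSnapshotNamespace { uint8_t group_snap_id = 0; };
struct UnknownSnapshotNamespace {};

using SnapshotNamespace = std::variant<UserSnapshotNamespace,
                                       GroupSnapshotNamespace,
                                       UnknownSnapshotNamespace>;

// Encode: write a type tag first, then any alternative-specific payload.
// (Here the tag is simply the variant index; the real code would use
// explicit SNAPSHOT_NAMESPACE_TYPE_* constants.)
void encode(const SnapshotNamespace &ns, std::vector<uint8_t> &bl) {
  bl.push_back(static_cast<uint8_t>(ns.index()));
  if (const auto *g = std::get_if<GroupSnapshotNamespace>(&ns))
    bl.push_back(g->group_snap_id);
}

// Decode: read the tag, construct the matching alternative, then decode
// its payload; unknown tags fall back to a sentinel type so data written
// by newer code stays readable.
void decode(SnapshotNamespace &ns, const std::vector<uint8_t> &bl,
            std::size_t &p) {
  switch (bl[p++]) {
  case 0:
    ns = UserSnapshotNamespace{};
    break;
  case 1:
    ns = GroupSnapshotNamespace{bl[p++]};
    break;
  default:
    ns = UnknownSnapshotNamespace{};
    break;
  }
}
```

With free encode/decode overloads like these, the caller's decode path collapses to a single `::decode(snapshot_namespace, p)` call, which is exactly the simplification proposed for cls_rbd_snap above.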
* Re: Snapshots of consistency groups 2016-09-15 23:17 ` Victor Denisov @ 2016-09-16 17:37 ` Jason Dillaman 2016-10-07 18:26 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2016-09-16 17:37 UTC (permalink / raw) To: Victor Denisov; +Cc: Mykola Golub, ceph-devel, Josh Durgin On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > if (struct_v >= 5) { > ::decode(snapshot_namespace, p); > } else { > snapshot_namespace = cls::rbd::UserSnapshotNamespace(); > } > > then code for ::encode function of cls_rbd_snap would change accordingly: > > instead of > > boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), > snapshot_namespace); > > I would do: > ::encode(snapshot_namespace, bl); +1 -- looks good to me -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-09-16 17:37 ` Jason Dillaman @ 2016-10-07 18:26 ` Victor Denisov 2016-10-10 14:18 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-10-07 18:26 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin Are any exceptions used in librbd code? Should the code be exception safe? Thanks, V. On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: > On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> if (struct_v >= 5) { >> ::decode(snapshot_namespace, p); >> } else { >> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >> } >> >> then code for ::encode function of cls_rbd_snap would change accordingly: >> >> instead of >> >> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >> snapshot_namespace); >> >> I would do: >> ::encode(snapshot_namespace, bl); > > > +1 -- looks good to me > > -- > Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-10-07 18:26 ` Victor Denisov @ 2016-10-10 14:18 ` Jason Dillaman 2016-10-10 22:14 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2016-10-10 14:18 UTC (permalink / raw) To: Victor Denisov; +Cc: Mykola Golub, ceph-devel, Josh Durgin The only place exceptions are routinely used is within the "::decode" functions. I would prefer to see the code not throwing new exceptions on purpose. On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Are any exceptions used in librbd code? Should the code be exception safe? > > Thanks, > V. > > On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> if (struct_v >= 5) { >>> ::decode(snapshot_namespace, p); >>> } else { >>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>> } >>> >>> then code for ::encode function of cls_rbd_snap would change accordingly: >>> >>> instead of >>> >>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>> snapshot_namespace); >>> >>> I would do: >>> ::encode(snapshot_namespace, bl); >> >> >> +1 -- looks good to me >> >> -- >> Jason -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-10-10 14:18 ` Jason Dillaman @ 2016-10-10 22:14 ` Victor Denisov 2016-10-26 3:07 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-10-10 22:14 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin Ok. I didn't have any intention to throw exceptions. I was more concerned about whether it's ok to allocate and delete objects or I should use smart pointers. On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: > The only place exceptions are routinely used is within the "::decode" > functions. I would prefer to see the code not throwing new exceptions > on purpose. > > On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> Are any exceptions used in librbd code? Should the code be exception safe? >> >> Thanks, >> V. >> >> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> if (struct_v >= 5) { >>>> ::decode(snapshot_namespace, p); >>>> } else { >>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>> } >>>> >>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>> >>>> instead of >>>> >>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>> snapshot_namespace); >>>> >>>> I would do: >>>> ::encode(snapshot_namespace, bl); >>> >>> >>> +1 -- looks good to me >>> >>> -- >>> Jason > > > > -- > Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-10-10 22:14 ` Victor Denisov @ 2016-10-26 3:07 ` Victor Denisov 2016-10-26 13:16 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-10-26 3:07 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin Question. When we print out snapshots of an image, should the group snapshots be listed, or should they be marked as special snapshots? Thanks, V. On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Ok. I didn't have any intention to throw exceptions. > I was more concerned about whether it's ok to allocate and delete > objects or I should use smart pointers. > > On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >> The only place exceptions are routinely used is within the "::decode" >> functions. I would prefer to see the code not throwing new exceptions >> on purpose. >> >> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> Are any exceptions used in librbd code? Should the code be exception safe? >>> >>> Thanks, >>> V. >>> >>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>> if (struct_v >= 5) { >>>>> ::decode(snapshot_namespace, p); >>>>> } else { >>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>> } >>>>> >>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>> >>>>> instead of >>>>> >>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>> snapshot_namespace); >>>>> >>>>> I would do: >>>>> ::encode(snapshot_namespace, bl); >>>> >>>> >>>> +1 -- looks good to me >>>> >>>> -- >>>> Jason >> >> >> >> -- >> Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-10-26 3:07 ` Victor Denisov @ 2016-10-26 13:16 ` Jason Dillaman 2016-10-28 22:41 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2016-10-26 13:16 UTC (permalink / raw) To: Victor Denisov; +Cc: Mykola Golub, ceph-devel, Josh Durgin In a perfect world, it would be nice to add a new option to "rbd snap ls" to show all snapshots (with a new column to indicate the associated namespace). On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Question. When we print out snapshots of an image, should the group > snapshots be listed, or should they be marked as special snapshots? > > Thanks, > V. > > On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> Ok. I didn't have any intention to throw exceptions. >> I was more concerned about whether it's ok to allocate and delete >> objects or I should use smart pointers. >> >> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>> The only place exceptions are routinely used is within the "::decode" >>> functions. I would prefer to see the code not throwing new exceptions >>> on purpose. >>> >>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> Are any exceptions used in librbd code? Should the code be exception safe? >>>> >>>> Thanks, >>>> V.
>>>> >>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>> if (struct_v >= 5) { >>>>>> ::decode(snapshot_namespace, p); >>>>>> } else { >>>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>>> } >>>>>> >>>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>>> >>>>>> instead of >>>>>> >>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>>> snapshot_namespace); >>>>>> >>>>>> I would do: >>>>>> ::encode(snapshot_namespace, bl); >>>>> >>>>> >>>>> +1 -- looks good to me >>>>> >>>>> -- >>>>> Jason >>> >>> >>> >>> -- >>> Jason -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
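[Editor's note] The listing Jason describes — every snapshot shown with a column for its namespace — might be modeled along these lines. This is purely illustrative: the row layout and field names are assumptions, not the actual rbd CLI output.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical row model for an "all namespaces" snapshot listing,
// sketching the extra NAMESPACE column discussed above.
struct SnapRow {
  uint64_t id;
  std::string name;
  std::string ns;  // e.g. "user" or "group"
};

std::string format_snap_ls(const std::vector<SnapRow> &rows) {
  std::ostringstream out;
  out << "SNAPID NAME NAMESPACE\n";
  for (const auto &r : rows)
    out << r.id << ' ' << r.name << ' ' << r.ns << '\n';
  return out.str();
}
```

A group snapshot row would carry a "group" namespace entry (ideally with the group and group-snapshot ids), while plain user snapshots keep today's appearance.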
* Re: Snapshots of consistency groups 2016-10-26 13:16 ` Jason Dillaman @ 2016-10-28 22:41 ` Victor Denisov 2016-10-31 13:07 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-10-28 22:41 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel, Josh Durgin Another thing that bothers me: when we remove an image from a consistency group, should we remove all snapshots of this image that were created as part of a consistency group snapshot? The easiest solution would be to remove all snapshots that are in GroupSnapshotNamespace and reference this consistency group. I looked into the cinder docs for this feature: http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html But it's not clear to me which behavior cinder expects. Thanks, V. On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote: > In a perfect world, it would be nice to add a new optional to "rbd > snap ls" to show all snapshots (with a new column to indicate the > associated namespace). > > On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> Question. When we print out snapshots of an image, should the group >> snapshots be listed, or should they be marked as special snapshots? >> >> Thanks, >> V. >> >> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> Ok. I didn't have any intention to throw exceptions. >>> I was more concerned about whether it's ok to allocate and delete >>> objects or I should use smart pointers. >>> >>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>> The only place exceptions are routinely used is within the "::decode" >>>> functions. I would prefer to see the code not throwing new exceptions >>>> on purpose. >>>> >>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>> Are any exceptions used in librbd code? Should the code be exception safe? >>>>> >>>>> Thanks, >>>>> V.
>>>>> >>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>> if (struct_v >= 5) { >>>>>>> ::decode(snapshot_namespace, p); >>>>>>> } else { >>>>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>>>> } >>>>>>> >>>>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>>>> >>>>>>> instead of >>>>>>> >>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>>>> snapshot_namespace); >>>>>>> >>>>>>> I would do: >>>>>>> ::encode(snapshot_namespace, bl); >>>>>> >>>>>> >>>>>> +1 -- looks good to me >>>>>> >>>>>> -- >>>>>> Jason >>>> >>>> >>>> >>>> -- >>>> Jason > > > > -- > Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-10-28 22:41 ` Victor Denisov @ 2016-10-31 13:07 ` Jason Dillaman 2016-11-01 15:01 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2016-10-31 13:07 UTC (permalink / raw) To: Victor Denisov; +Cc: Mykola Golub, ceph-devel Looking at the Cinder codebase, I don't see any such restriction that would prevent you from removing a volume from a consistency group that has associated snapshots. I would double-check on the OpenStack development mailing list if this is correct and is the intent. Worst case, the RBD driver could raise an exception if there are still consistency group snapshots associated to the image. On Fri, Oct 28, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Another thing that bothers me. When we remove an image from a consistency group. > Should we remove all snapshots of this image that were created as part > of a consistency group snapshot? > > The easiest solution would be to remove all snapshots that are in > GroupSnapshotNamespace and reference this consistency group. > I looked into cinder docs for this feature: > http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html > > But it's not clear to me which behavior cinder expects. > > Thanks, > V. > > On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote: >> In a perfect world, it would be nice to add a new optional to "rbd >> snap ls" to show all snapshots (with a new column to indicate the >> associated namespace). >> >> On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> Question. When we print out snapshots of an image, should the group >>> snapshots be listed, or should they be marked as special snapshots? >>> >>> Thanks, >>> V. >>> >>> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> Ok. I didn't have any intention to throw exceptions. 
>>>> I was more concerned about whether it's ok to allocate and delete >>>> objects or I should use smart pointers. >>>> >>>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>> The only place exceptions are routinely used is within the "::decode" >>>>> functions. I would prefer to see the code not throwing new exceptions >>>>> on purpose. >>>>> >>>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>> Are any exceptions used in librbd code? Should the code be exception safe? >>>>>> >>>>>> Thanks, >>>>>> V. >>>>>> >>>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>> if (struct_v >= 5) { >>>>>>>> ::decode(snapshot_namespace, p); >>>>>>>> } else { >>>>>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>>>>> } >>>>>>>> >>>>>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>>>>> >>>>>>>> instead of >>>>>>>> >>>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>>>>> snapshot_namespace); >>>>>>>> >>>>>>>> I would do: >>>>>>>> ::encode(snapshot_namespace, bl); >>>>>>> >>>>>>> >>>>>>> +1 -- looks good to me >>>>>>> >>>>>>> -- >>>>>>> Jason >>>>> >>>>> >>>>> >>>>> -- >>>>> Jason >> >> >> >> -- >> Jason -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-10-31 13:07 ` Jason Dillaman @ 2016-11-01 15:01 ` Jason Dillaman 2016-12-14 1:03 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2016-11-01 15:01 UTC (permalink / raw) To: dillaman; +Cc: Victor Denisov, Mykola Golub, ceph-devel I chatted with Xing on IRC this morning re: Cinder generic groups. It sounds like RBD will need to support preserving the image's consistency group snapshots even if the image is removed from the group. In the OpenStack case, you won't have to worry about the image being deleted while it still has associated group snapshots. We will also want to support being able to clone child images from a group snapshot to ensure that we can thin-provision new group volumes when creating a new group from a group snapshot. This means that the group snapshots should be able to be protected/unprotected just like standard user snapshots. On Mon, Oct 31, 2016 at 9:07 AM, Jason Dillaman <jdillama@redhat.com> wrote: > Looking at the Cinder codebase, I don't see any such restriction that > would prevent you from removing a volume from a consistency group that > has associated snapshots. I would double-check on the OpenStack > development mailing list if this is correct and is the intent. Worst > case, the RBD driver could raise an exception if there are still > consistency group snapshots associated to the image. > > > On Fri, Oct 28, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> Another thing that bothers me. When we remove an image from a consistency group. >> Should we remove all snapshots of this image that were created as part >> of a consistency group snapshot? >> >> The easiest solution would be to remove all snapshots that are in >> GroupSnapshotNamespace and reference this consistency group. 
>> I looked into cinder docs for this feature: >> http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html >> >> But it's not clear to me which behavior cinder expects. >> >> Thanks, >> V. >> >> On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>> In a perfect world, it would be nice to add a new optional to "rbd >>> snap ls" to show all snapshots (with a new column to indicate the >>> associated namespace). >>> >>> On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> Question. When we print out snapshots of an image, should the group >>>> snapshots be listed, or should they be marked as special snapshots? >>>> >>>> Thanks, >>>> V. >>>> >>>> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>> Ok. I didn't have any intention to throw exceptions. >>>>> I was more concerned about whether it's ok to allocate and delete >>>>> objects or I should use smart pointers. >>>>> >>>>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>> The only place exceptions are routinely used is within the "::decode" >>>>>> functions. I would prefer to see the code not throwing new exceptions >>>>>> on purpose. >>>>>> >>>>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>> Are any exceptions used in librbd code? Should the code be exception safe? >>>>>>> >>>>>>> Thanks, >>>>>>> V. 
>>>>>>> >>>>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>> if (struct_v >= 5) { >>>>>>>>> ::decode(snapshot_namespace, p); >>>>>>>>> } else { >>>>>>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>>>>>> } >>>>>>>>> >>>>>>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>>>>>> >>>>>>>>> instead of >>>>>>>>> >>>>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>>>>>> snapshot_namespace); >>>>>>>>> >>>>>>>>> I would do: >>>>>>>>> ::encode(snapshot_namespace, bl); >>>>>>>> >>>>>>>> >>>>>>>> +1 -- looks good to me >>>>>>>> >>>>>>>> -- >>>>>>>> Jason >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Jason >>> >>> >>> >>> -- >>> Jason > > > > -- > Jason -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-11-01 15:01 ` Jason Dillaman @ 2016-12-14 1:03 ` Victor Denisov 2016-12-15 15:10 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-12-14 1:03 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel Jason, My current implementation of the consistency group snapshot feature names image snapshots like this: <group_pool>_<group_id>_<group_snap_id> I rely on this when I need to remove a consistency group. It's necessary because if some of the image snapshots were created but the whole group snapshot operation failed, the only way to find those dangling image snapshots is by this name. It means that we should forbid renaming snapshots in the ConsistencyGroupSnapshot namespace. Another option is to allocate image snapshot ids during the creation of the group snapshot, but this requires a major rewrite of the whole process of snapshot creation for images. What is your opinion on this? Thanks, V. On Tue, Nov 1, 2016 at 8:01 AM, Jason Dillaman <jdillama@redhat.com> wrote: > I chatted with Xing on IRC this morning re: Cinder generic groups. It > sounds like RBD will need to support preserving the image's > consistency group snapshots even if the image is removed from the > group. In the OpenStack case, you won't have to worry about the image > being deleted while it still has associated group snapshots. > > We will also want to support being able to clone child images from a > group snapshot to ensure that we can thin provision new groups volumes > when creating a new group from a group snapshot. This means that the > group snapshots should be able to be protected/unprotected just like > standard user snapshots. > > On Mon, Oct 31, 2016 at 9:07 AM, Jason Dillaman <jdillama@redhat.com> wrote: >> Looking at the Cinder codebase, I don't see any such restriction that >> would prevent you from removing a volume from a consistency group that >> has associated snapshots. 
I would double-check on the OpenStack >> development mailing list if this is correct and is the intent. Worst >> case, the RBD driver could raise an exception if there are still >> consistency group snapshots associated to the image. >> >> >> On Fri, Oct 28, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> Another thing that bothers me. When we remove an image from a consistency group. >>> Should we remove all snapshots of this image that were created as part >>> of a consistency group snapshot? >>> >>> The easiest solution would be to remove all snapshots that are in >>> GroupSnapshotNamespace and reference this consistency group. >>> I looked into cinder docs for this feature: >>> http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html >>> >>> But it's not clear to me which behavior cinder expects. >>> >>> Thanks, >>> V. >>> >>> On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>> In a perfect world, it would be nice to add a new optional to "rbd >>>> snap ls" to show all snapshots (with a new column to indicate the >>>> associated namespace). >>>> >>>> On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>> Question. When we print out snapshots of an image, should the group >>>>> snapshots be listed, or should they be marked as special snapshots? >>>>> >>>>> Thanks, >>>>> V. >>>>> >>>>> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>> Ok. I didn't have any intention to throw exceptions. >>>>>> I was more concerned about whether it's ok to allocate and delete >>>>>> objects or I should use smart pointers. >>>>>> >>>>>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>> The only place exceptions are routinely used is within the "::decode" >>>>>>> functions. I would prefer to see the code not throwing new exceptions >>>>>>> on purpose. 
>>>>>>> >>>>>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>> Are any exceptions used in librbd code? Should the code be exception safe? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> V. >>>>>>>> >>>>>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>> if (struct_v >= 5) { >>>>>>>>>> ::decode(snapshot_namespace, p); >>>>>>>>>> } else { >>>>>>>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>>>>>>> >>>>>>>>>> instead of >>>>>>>>>> >>>>>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>>>>>>> snapshot_namespace); >>>>>>>>>> >>>>>>>>>> I would do: >>>>>>>>>> ::encode(snapshot_namespace, bl); >>>>>>>>> >>>>>>>>> >>>>>>>>> +1 -- looks good to me >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Jason >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Jason >>>> >>>> >>>> >>>> -- >>>> Jason >> >> >> >> -- >> Jason > > > > -- > Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-12-14 1:03 ` Victor Denisov @ 2016-12-15 15:10 ` Jason Dillaman 2016-12-16 0:05 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2016-12-15 15:10 UTC (permalink / raw) To: Victor Denisov; +Cc: Mykola Golub, ceph-devel I think I might be confused. When creating a group snapshot, we have the ConsistencyGroupSnapshot that allows you to store the necessary linkage between the image's snapshot and its associated group snapshot [1]. Why not just name the image's snapshots to the same name as the parent group snapshot name and search the snapshot's metadata to get the linkage? [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L255 On Tue, Dec 13, 2016 at 8:03 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Jason, > > My current implementation of consistency group snapshot feature names > image snapshots like: <group_pool>_<group_id>_<group_snap_id> > I rely on this fact when I need to remove a consistency group. It's > necessary because if some of image snapshots were created, but the > whole group snapshot operation failed, > then the only way to find those dangling image snapshots is by this name. > > It means that we should forbid renaming snapshots from > ConsistencyGroupSnapshot namespace. > > Another option is to allocate image snapshot ids during the creation > of group snapshot, but this requires a major rewrite of the whole > process of snapshot creation for images. > > What is your opinion on this? > > Thanks, > V. > > > On Tue, Nov 1, 2016 at 8:01 AM, Jason Dillaman <jdillama@redhat.com> wrote: >> I chatted with Xing on IRC this morning re: Cinder generic groups. It >> sounds like RBD will need to support preserving the image's >> consistency group snapshots even if the image is removed from the >> group. In the OpenStack case, you won't have to worry about the image >> being deleted while it still has associated group snapshots. 
>> >> We will also want to support being able to clone child images from a >> group snapshot to ensure that we can thin provision new groups volumes >> when creating a new group from a group snapshot. This means that the >> group snapshots should be able to be protected/unprotected just like >> standard user snapshots. >> >> On Mon, Oct 31, 2016 at 9:07 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>> Looking at the Cinder codebase, I don't see any such restriction that >>> would prevent you from removing a volume from a consistency group that >>> has associated snapshots. I would double-check on the OpenStack >>> development mailing list if this is correct and is the intent. Worst >>> case, the RBD driver could raise an exception if there are still >>> consistency group snapshots associated to the image. >>> >>> >>> On Fri, Oct 28, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> Another thing that bothers me. When we remove an image from a consistency group. >>>> Should we remove all snapshots of this image that were created as part >>>> of a consistency group snapshot? >>>> >>>> The easiest solution would be to remove all snapshots that are in >>>> GroupSnapshotNamespace and reference this consistency group. >>>> I looked into cinder docs for this feature: >>>> http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html >>>> >>>> But it's not clear to me which behavior cinder expects. >>>> >>>> Thanks, >>>> V. >>>> >>>> On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>> In a perfect world, it would be nice to add a new optional to "rbd >>>>> snap ls" to show all snapshots (with a new column to indicate the >>>>> associated namespace). >>>>> >>>>> On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>> Question. When we print out snapshots of an image, should the group >>>>>> snapshots be listed, or should they be marked as special snapshots? 
>>>>>> >>>>>> Thanks, >>>>>> V. >>>>>> >>>>>> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>> Ok. I didn't have any intention to throw exceptions. >>>>>>> I was more concerned about whether it's ok to allocate and delete >>>>>>> objects or I should use smart pointers. >>>>>>> >>>>>>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>> The only place exceptions are routinely used is within the "::decode" >>>>>>>> functions. I would prefer to see the code not throwing new exceptions >>>>>>>> on purpose. >>>>>>>> >>>>>>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>> Are any exceptions used in librbd code? Should the code be exception safe? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> V. >>>>>>>>> >>>>>>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>> if (struct_v >= 5) { >>>>>>>>>>> ::decode(snapshot_namespace, p); >>>>>>>>>>> } else { >>>>>>>>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>>>>>>>> >>>>>>>>>>> instead of >>>>>>>>>>> >>>>>>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>>>>>>>> snapshot_namespace); >>>>>>>>>>> >>>>>>>>>>> I would do: >>>>>>>>>>> ::encode(snapshot_namespace, bl); >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> +1 -- looks good to me >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Jason >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Jason >>>>> >>>>> >>>>> >>>>> -- >>>>> Jason >>> >>> >>> >>> -- >>> Jason >> >> >> >> -- >> Jason -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-12-15 15:10 ` Jason Dillaman @ 2016-12-16 0:05 ` Victor Denisov 2016-12-16 14:53 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2016-12-16 0:05 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel Yes, but if an image's snapshot is renamed, then I'm not able to find that snapshot given only the group's snapshot, for example when the group snapshot is left in an inconsistent state. V. On Thu, Dec 15, 2016 at 7:10 AM, Jason Dillaman <jdillama@redhat.com> wrote: > I think I might be confused. When creating a group snapshot, we have > the ConsistencyGroupSnapshot that allows you to store the necessary > linkage between the image's snapshot and its associated group snapshot > [1]. Why not just name the image's snapshots to the same name as the > parent group snapshot name and search the snapshot's metadata to get > the linkage? > > [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L255 > > On Tue, Dec 13, 2016 at 8:03 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> Jason, >> >> My current implementation of consistency group snapshot feature names >> image snapshots like: <group_pool>_<group_id>_<group_snap_id> >> I rely on this fact when I need to remove a consistency group. It's >> necessary because if some of image snapshots were created, but the >> whole group snapshot operation failed, >> then the only way to find those dangling image snapshots is by this name. >> >> It means that we should forbid renaming snapshots from >> ConsistencyGroupSnapshot namespace. >> >> Another option is to allocate image snapshot ids during the creation >> of group snapshot, but this requires a major rewrite of the whole >> process of snapshot creation for images. >> >> What is your opinion on this? >> >> Thanks, >> V. >> >> >> On Tue, Nov 1, 2016 at 8:01 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>> I chatted with Xing on IRC this morning re: Cinder generic groups. 
It >>> sounds like RBD will need to support preserving the image's >>> consistency group snapshots even if the image is removed from the >>> group. In the OpenStack case, you won't have to worry about the image >>> being deleted while it still has associated group snapshots. >>> >>> We will also want to support being able to clone child images from a >>> group snapshot to ensure that we can thin provision new groups volumes >>> when creating a new group from a group snapshot. This means that the >>> group snapshots should be able to be protected/unprotected just like >>> standard user snapshots. >>> >>> On Mon, Oct 31, 2016 at 9:07 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>> Looking at the Cinder codebase, I don't see any such restriction that >>>> would prevent you from removing a volume from a consistency group that >>>> has associated snapshots. I would double-check on the OpenStack >>>> development mailing list if this is correct and is the intent. Worst >>>> case, the RBD driver could raise an exception if there are still >>>> consistency group snapshots associated to the image. >>>> >>>> >>>> On Fri, Oct 28, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>> Another thing that bothers me. When we remove an image from a consistency group. >>>>> Should we remove all snapshots of this image that were created as part >>>>> of a consistency group snapshot? >>>>> >>>>> The easiest solution would be to remove all snapshots that are in >>>>> GroupSnapshotNamespace and reference this consistency group. >>>>> I looked into cinder docs for this feature: >>>>> http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html >>>>> >>>>> But it's not clear to me which behavior cinder expects. >>>>> >>>>> Thanks, >>>>> V. 
>>>>> >>>>> On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>> In a perfect world, it would be nice to add a new optional to "rbd >>>>>> snap ls" to show all snapshots (with a new column to indicate the >>>>>> associated namespace). >>>>>> >>>>>> On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>> Question. When we print out snapshots of an image, should the group >>>>>>> snapshots be listed, or should they be marked as special snapshots? >>>>>>> >>>>>>> Thanks, >>>>>>> V. >>>>>>> >>>>>>> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>> Ok. I didn't have any intention to throw exceptions. >>>>>>>> I was more concerned about whether it's ok to allocate and delete >>>>>>>> objects or I should use smart pointers. >>>>>>>> >>>>>>>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>> The only place exceptions are routinely used is within the "::decode" >>>>>>>>> functions. I would prefer to see the code not throwing new exceptions >>>>>>>>> on purpose. >>>>>>>>> >>>>>>>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>> Are any exceptions used in librbd code? Should the code be exception safe? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> V. 
>>>>>>>>>> >>>>>>>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>> if (struct_v >= 5) { >>>>>>>>>>>> ::decode(snapshot_namespace, p); >>>>>>>>>>>> } else { >>>>>>>>>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>>>>>>>>> >>>>>>>>>>>> instead of >>>>>>>>>>>> >>>>>>>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>>>>>>>>> snapshot_namespace); >>>>>>>>>>>> >>>>>>>>>>>> I would do: >>>>>>>>>>>> ::encode(snapshot_namespace, bl); >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> +1 -- looks good to me >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Jason >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Jason >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Jason >>>> >>>> >>>> >>>> -- >>>> Jason >>> >>> >>> >>> -- >>> Jason > > > > -- > Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-12-16 0:05 ` Victor Denisov @ 2016-12-16 14:53 ` Jason Dillaman 2017-01-04 0:40 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2016-12-16 14:53 UTC (permalink / raw) To: Victor Denisov; +Cc: Mykola Golub, ceph-devel Can you give a little background on this specific inconsistent case you are referring to? On Thu, Dec 15, 2016 at 7:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Yes, but if image's snapshot is renamed then I'm not able to find this > snapshot having only group's snapshot in an inconsistent state for > example. > > V. > > On Thu, Dec 15, 2016 at 7:10 AM, Jason Dillaman <jdillama@redhat.com> wrote: >> I think I might be confused. When creating a group snapshot, we have >> the ConsistencyGroupSnapshot that allows you to store the necessary >> linkage between the image's snapshot and its associated group snapshot >> [1]. Why not just name the image's snapshots to the same name as the >> parent group snapshot name and search the snapshot's metadata to get >> the linkage? >> >> [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L255 >> >> On Tue, Dec 13, 2016 at 8:03 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> Jason, >>> >>> My current implementation of consistency group snapshot feature names >>> image snapshots like: <group_pool>_<group_id>_<group_snap_id> >>> I rely on this fact when I need to remove a consistency group. It's >>> necessary because if some of image snapshots were created, but the >>> whole group snapshot operation failed, >>> then the only way to find those dangling image snapshots is by this name. >>> >>> It means that we should forbid renaming snapshots from >>> ConsistencyGroupSnapshot namespace. >>> >>> Another option is to allocate image snapshot ids during the creation >>> of group snapshot, but this requires a major rewrite of the whole >>> process of snapshot creation for images. 
>>> >>> What is your opinion on this? >>> >>> Thanks, >>> V. >>> >>> >>> On Tue, Nov 1, 2016 at 8:01 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>> I chatted with Xing on IRC this morning re: Cinder generic groups. It >>>> sounds like RBD will need to support preserving the image's >>>> consistency group snapshots even if the image is removed from the >>>> group. In the OpenStack case, you won't have to worry about the image >>>> being deleted while it still has associated group snapshots. >>>> >>>> We will also want to support being able to clone child images from a >>>> group snapshot to ensure that we can thin provision new groups volumes >>>> when creating a new group from a group snapshot. This means that the >>>> group snapshots should be able to be protected/unprotected just like >>>> standard user snapshots. >>>> >>>> On Mon, Oct 31, 2016 at 9:07 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>> Looking at the Cinder codebase, I don't see any such restriction that >>>>> would prevent you from removing a volume from a consistency group that >>>>> has associated snapshots. I would double-check on the OpenStack >>>>> development mailing list if this is correct and is the intent. Worst >>>>> case, the RBD driver could raise an exception if there are still >>>>> consistency group snapshots associated to the image. >>>>> >>>>> >>>>> On Fri, Oct 28, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>> Another thing that bothers me. When we remove an image from a consistency group. >>>>>> Should we remove all snapshots of this image that were created as part >>>>>> of a consistency group snapshot? >>>>>> >>>>>> The easiest solution would be to remove all snapshots that are in >>>>>> GroupSnapshotNamespace and reference this consistency group. 
>>>>>> I looked into cinder docs for this feature: >>>>>> http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html >>>>>> >>>>>> But it's not clear to me which behavior cinder expects. >>>>>> >>>>>> Thanks, >>>>>> V. >>>>>> >>>>>> On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>> In a perfect world, it would be nice to add a new optional to "rbd >>>>>>> snap ls" to show all snapshots (with a new column to indicate the >>>>>>> associated namespace). >>>>>>> >>>>>>> On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>> Question. When we print out snapshots of an image, should the group >>>>>>>> snapshots be listed, or should they be marked as special snapshots? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> V. >>>>>>>> >>>>>>>> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>> Ok. I didn't have any intention to throw exceptions. >>>>>>>>> I was more concerned about whether it's ok to allocate and delete >>>>>>>>> objects or I should use smart pointers. >>>>>>>>> >>>>>>>>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>> The only place exceptions are routinely used is within the "::decode" >>>>>>>>>> functions. I would prefer to see the code not throwing new exceptions >>>>>>>>>> on purpose. >>>>>>>>>> >>>>>>>>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>> Are any exceptions used in librbd code? Should the code be exception safe? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> V. 
>>>>>>>>>>> >>>>>>>>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>> if (struct_v >= 5) { >>>>>>>>>>>>> ::decode(snapshot_namespace, p); >>>>>>>>>>>>> } else { >>>>>>>>>>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>>>>>>>>>> >>>>>>>>>>>>> instead of >>>>>>>>>>>>> >>>>>>>>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>>>>>>>>>> snapshot_namespace); >>>>>>>>>>>>> >>>>>>>>>>>>> I would do: >>>>>>>>>>>>> ::encode(snapshot_namespace, bl); >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> +1 -- looks good to me >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Jason >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Jason >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Jason >>>>> >>>>> >>>>> >>>>> -- >>>>> Jason >>>> >>>> >>>> >>>> -- >>>> Jason >> >> >> >> -- >> Jason -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2016-12-16 14:53 ` Jason Dillaman @ 2017-01-04 0:40 ` Victor Denisov 2017-01-04 2:56 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2017-01-04 0:40 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel Let's say we start creating a group snapshot. We invoke the async snap_create method in the Operations class. When we invoke this method, we provide it with the snapshot name. While we are waiting for the response, we can be aborted. As a result, we will be able to find the exact image snapshot using only its name, as this was the only information we had at the time of running the snap_create method. If snap_create was successful, we will be able to find the snapshot; otherwise we will not. However, if we allow renaming snapshots in the GroupSnapshotNamespace, then we may not find the snapshot even if it was created successfully. On Fri, Dec 16, 2016 at 6:53 AM, Jason Dillaman <jdillama@redhat.com> wrote: > Can you give a little background on this specific inconsistent case > you are referring to? > > On Thu, Dec 15, 2016 at 7:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> Yes, but if image's snapshot is renamed then I'm not able to find this >> snapshot having only group's snapshot in an inconsistent state for >> example. >> >> V. >> >> On Thu, Dec 15, 2016 at 7:10 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>> I think I might be confused. When creating a group snapshot, we have >>> the ConsistencyGroupSnapshot that allows you to store the necessary >>> linkage between the image's snapshot and its associated group snapshot >>> [1]. Why not just name the image's snapshots to the same name as the >>> parent group snapshot name and search the snapshot's metadata to get >>> the linkage? 
>>> >>> [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L255 >>> >>> On Tue, Dec 13, 2016 at 8:03 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> Jason, >>>> >>>> My current implementation of consistency group snapshot feature names >>>> image snapshots like: <group_pool>_<group_id>_<group_snap_id> >>>> I rely on this fact when I need to remove a consistency group. It's >>>> necessary because if some of image snapshots were created, but the >>>> whole group snapshot operation failed, >>>> then the only way to find those dangling image snapshots is by this name. >>>> >>>> It means that we should forbid renaming snapshots from >>>> ConsistencyGroupSnapshot namespace. >>>> >>>> Another option is to allocate image snapshot ids during the creation >>>> of group snapshot, but this requires a major rewrite of the whole >>>> process of snapshot creation for images. >>>> >>>> What is your opinion on this? >>>> >>>> Thanks, >>>> V. >>>> >>>> >>>> On Tue, Nov 1, 2016 at 8:01 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>> I chatted with Xing on IRC this morning re: Cinder generic groups. It >>>>> sounds like RBD will need to support preserving the image's >>>>> consistency group snapshots even if the image is removed from the >>>>> group. In the OpenStack case, you won't have to worry about the image >>>>> being deleted while it still has associated group snapshots. >>>>> >>>>> We will also want to support being able to clone child images from a >>>>> group snapshot to ensure that we can thin provision new groups volumes >>>>> when creating a new group from a group snapshot. This means that the >>>>> group snapshots should be able to be protected/unprotected just like >>>>> standard user snapshots. 
>>>>> >>>>> On Mon, Oct 31, 2016 at 9:07 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>> Looking at the Cinder codebase, I don't see any such restriction that >>>>>> would prevent you from removing a volume from a consistency group that >>>>>> has associated snapshots. I would double-check on the OpenStack >>>>>> development mailing list if this is correct and is the intent. Worst >>>>>> case, the RBD driver could raise an exception if there are still >>>>>> consistency group snapshots associated to the image. >>>>>> >>>>>> >>>>>> On Fri, Oct 28, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>> Another thing that bothers me. When we remove an image from a consistency group. >>>>>>> Should we remove all snapshots of this image that were created as part >>>>>>> of a consistency group snapshot? >>>>>>> >>>>>>> The easiest solution would be to remove all snapshots that are in >>>>>>> GroupSnapshotNamespace and reference this consistency group. >>>>>>> I looked into cinder docs for this feature: >>>>>>> http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html >>>>>>> >>>>>>> But it's not clear to me which behavior cinder expects. >>>>>>> >>>>>>> Thanks, >>>>>>> V. >>>>>>> >>>>>>> On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>> In a perfect world, it would be nice to add a new optional to "rbd >>>>>>>> snap ls" to show all snapshots (with a new column to indicate the >>>>>>>> associated namespace). >>>>>>>> >>>>>>>> On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>> Question. When we print out snapshots of an image, should the group >>>>>>>>> snapshots be listed, or should they be marked as special snapshots? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> V. >>>>>>>>> >>>>>>>>> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>> Ok. I didn't have any intention to throw exceptions. 
>>>>>>>>>> I was more concerned about whether it's ok to allocate and delete >>>>>>>>>> objects or I should use smart pointers. >>>>>>>>>> >>>>>>>>>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>> The only place exceptions are routinely used is within the "::decode" >>>>>>>>>>> functions. I would prefer to see the code not throwing new exceptions >>>>>>>>>>> on purpose. >>>>>>>>>>> >>>>>>>>>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>> Are any exceptions used in librbd code? Should the code be exception safe? >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> V. >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>> if (struct_v >= 5) { >>>>>>>>>>>>>> ::decode(snapshot_namespace, p); >>>>>>>>>>>>>> } else { >>>>>>>>>>>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>>>>>>>>>>> >>>>>>>>>>>>>> instead of >>>>>>>>>>>>>> >>>>>>>>>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>>>>>>>>>>> snapshot_namespace); >>>>>>>>>>>>>> >>>>>>>>>>>>>> I would do: >>>>>>>>>>>>>> ::encode(snapshot_namespace, bl); >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> +1 -- looks good to me >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Jason >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Jason >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Jason >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Jason >>>>> >>>>> >>>>> >>>>> -- >>>>> Jason >>> >>> >>> >>> -- >>> Jason > > > > -- > Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
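The version-gated `::decode` pattern quoted in the exchange above (defaulting old records to the user namespace when `struct_v < 5`) can be modeled with a small sketch. This is a hypothetical Python model, not the actual cls_rbd C++ code; all names here are illustrative.

```python
# Toy model of the struct_v-gated decode discussed above: records written
# before version 5 carry no namespace field, so they decode with a default
# UserSnapshotNamespace. Hypothetical names; not the real cls_rbd code.

class UserSnapshotNamespace:
    def __eq__(self, other):
        return isinstance(other, UserSnapshotNamespace)

class GroupSnapshotNamespace:
    def __init__(self, group_pool, group_id, group_snapshot_id):
        self.group_pool = group_pool
        self.group_id = group_id
        self.group_snapshot_id = group_snapshot_id

def decode_snap_record(struct_v, payload):
    """Decode a snapshot record; pre-v5 records default to the user namespace."""
    record = {"name": payload["name"]}
    if struct_v >= 5:
        record["snapshot_namespace"] = payload["snapshot_namespace"]
    else:
        record["snapshot_namespace"] = UserSnapshotNamespace()
    return record

# An old (v4) record decodes with the default user namespace ...
old = decode_snap_record(4, {"name": "snap1"})
assert old["snapshot_namespace"] == UserSnapshotNamespace()

# ... while a v5 record keeps whatever namespace was encoded.
ns = GroupSnapshotNamespace("rbd", "grp1", "gsnap1")
new = decode_snap_record(5, {"name": "snap2", "snapshot_namespace": ns})
assert new["snapshot_namespace"] is ns
```

The symmetric encode side then becomes a plain `::encode(snapshot_namespace, bl)`, as agreed in the quoted reply, rather than an explicit visitor dispatch.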
* Re: Snapshots of consistency groups 2017-01-04 0:40 ` Victor Denisov @ 2017-01-04 2:56 ` Jason Dillaman 2017-01-04 18:29 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2017-01-04 2:56 UTC (permalink / raw) To: Victor Denisov; +Cc: Mykola Golub, ceph-devel After starting the process of creating a group snapshot, you will already have all the necessary data for the group snapshot namespace [1] (group pool, group id, and group snapshot id) and the group snapshot should be persistently recorded to disk as GROUP_SNAPSHOT_STATE_PENDING. Looking at the snapshot create state machine [2], I don't see any place that a crash (or similar failure) would matter before the actual image snapshot record is created atomically. You would pass the fully populated GroupSnapshotNamespace to snap_create, and if the snapshot is created, it's linked to the group via that namespace and any failures afterwards don't matter since they are linked -- if the snapshot fails to be created, it isn't linked to the group but the snapshot doesn't exist either so there isn't anything to clean up. Since you know which images are linked to the group and you know which snapshots are in the group and which group snapshots are in the image, you can reconcile any issues using the details in the GroupSnapshotNamespace -- there shouldn't be any need to depend on the actual snapshot name (it could technically just be a randomly assigned UUID). Perhaps we could talk about this at a future RBD standup meeting that you are able to join (or the next CDM). [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L249 [2] https://github.com/ceph/ceph/blob/master/src/librbd/operation/SnapshotCreateRequest.h#L28 On Tue, Jan 3, 2017 at 7:40 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Let's say we start creating a group snapshot. > We invoke async snap_create method in Operations class. > When we invoke this method we provide it with the snapshot name. 
> > While we are wating for the response we can be aborted. > As a result we will be able to find the exact image snapshot using only its name > as this was the only information we had at the time of running > snap_create method. > > If snap_create was successful we will be able to find the snapshot > otherwise we will not. > However if we allow renaming snapshots from GroupSnapshotNamespace > then we may not find the snapshot even if it > was created successfully. > > On Fri, Dec 16, 2016 at 6:53 AM, Jason Dillaman <jdillama@redhat.com> wrote: >> Can you give a little background on this specific inconsistent case >> you are referring to? >> >> On Thu, Dec 15, 2016 at 7:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> Yes, but if image's snapshot is renamed then I'm not able to find this >>> snapshot having only group's snapshot in an inconsistent state for >>> example. >>> >>> V. >>> >>> On Thu, Dec 15, 2016 at 7:10 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>> I think I might be confused. When creating a group snapshot, we have >>>> the ConsistencyGroupSnapshot that allows you to store the necessary >>>> linkage between the image's snapshot and its associated group snapshot >>>> [1]. Why not just name the image's snapshots to the same name as the >>>> parent group snapshot name and search the snapshot's metadata to get >>>> the linkage? >>>> >>>> [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L255 >>>> >>>> On Tue, Dec 13, 2016 at 8:03 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>> Jason, >>>>> >>>>> My current implementation of consistency group snapshot feature names >>>>> image snapshots like: <group_pool>_<group_id>_<group_snap_id> >>>>> I rely on this fact when I need to remove a consistency group. It's >>>>> necessary because if some of image snapshots were created, but the >>>>> whole group snapshot operation failed, >>>>> then the only way to find those dangling image snapshots is by this name. 
>>>>> >>>>> It means that we should forbid renaming snapshots from >>>>> ConsistencyGroupSnapshot namespace. >>>>> >>>>> Another option is to allocate image snapshot ids during the creation >>>>> of group snapshot, but this requires a major rewrite of the whole >>>>> process of snapshot creation for images. >>>>> >>>>> What is your opinion on this? >>>>> >>>>> Thanks, >>>>> V. >>>>> >>>>> >>>>> On Tue, Nov 1, 2016 at 8:01 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>> I chatted with Xing on IRC this morning re: Cinder generic groups. It >>>>>> sounds like RBD will need to support preserving the image's >>>>>> consistency group snapshots even if the image is removed from the >>>>>> group. In the OpenStack case, you won't have to worry about the image >>>>>> being deleted while it still has associated group snapshots. >>>>>> >>>>>> We will also want to support being able to clone child images from a >>>>>> group snapshot to ensure that we can thin provision new groups volumes >>>>>> when creating a new group from a group snapshot. This means that the >>>>>> group snapshots should be able to be protected/unprotected just like >>>>>> standard user snapshots. >>>>>> >>>>>> On Mon, Oct 31, 2016 at 9:07 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>> Looking at the Cinder codebase, I don't see any such restriction that >>>>>>> would prevent you from removing a volume from a consistency group that >>>>>>> has associated snapshots. I would double-check on the OpenStack >>>>>>> development mailing list if this is correct and is the intent. Worst >>>>>>> case, the RBD driver could raise an exception if there are still >>>>>>> consistency group snapshots associated to the image. >>>>>>> >>>>>>> >>>>>>> On Fri, Oct 28, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>> Another thing that bothers me. When we remove an image from a consistency group. 
>>>>>>>> Should we remove all snapshots of this image that were created as part >>>>>>>> of a consistency group snapshot? >>>>>>>> >>>>>>>> The easiest solution would be to remove all snapshots that are in >>>>>>>> GroupSnapshotNamespace and reference this consistency group. >>>>>>>> I looked into cinder docs for this feature: >>>>>>>> http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html >>>>>>>> >>>>>>>> But it's not clear to me which behavior cinder expects. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> V. >>>>>>>> >>>>>>>> On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>> In a perfect world, it would be nice to add a new optional to "rbd >>>>>>>>> snap ls" to show all snapshots (with a new column to indicate the >>>>>>>>> associated namespace). >>>>>>>>> >>>>>>>>> On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>> Question. When we print out snapshots of an image, should the group >>>>>>>>>> snapshots be listed, or should they be marked as special snapshots? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> V. >>>>>>>>>> >>>>>>>>>> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>> Ok. I didn't have any intention to throw exceptions. >>>>>>>>>>> I was more concerned about whether it's ok to allocate and delete >>>>>>>>>>> objects or I should use smart pointers. >>>>>>>>>>> >>>>>>>>>>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>> The only place exceptions are routinely used is within the "::decode" >>>>>>>>>>>> functions. I would prefer to see the code not throwing new exceptions >>>>>>>>>>>> on purpose. >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>> Are any exceptions used in librbd code? Should the code be exception safe? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> V. 
>>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>>> if (struct_v >= 5) { >>>>>>>>>>>>>>> ::decode(snapshot_namespace, p); >>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> instead of >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>>>>>>>>>>>> snapshot_namespace); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I would do: >>>>>>>>>>>>>>> ::encode(snapshot_namespace, bl); >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> +1 -- looks good to me >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Jason >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Jason >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Jason >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Jason >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Jason >>>> >>>> >>>> >>>> -- >>>> Jason >> >> >> >> -- >> Jason -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2017-01-04 2:56 ` Jason Dillaman @ 2017-01-04 18:29 ` Victor Denisov 2017-01-04 18:34 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2017-01-04 18:29 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel It looks like the next CDM is not until next month. Let's try to figure it out over email.

> Since you know which images are linked to the group and you know which
> snapshots are in the group and which group snapshots are in the image,
> you can reconcile any issues using the details in the
> GroupSnapshotNamespace -- there shouldn't be any need to depend on the
> actual snapshot name (it could technically just be a randomly assigned
> UUID).

Let's say I have a consistency group CG and a snapshot of this group, CGSNAP. I'll refer to the images in the group as:

CG.images[0] - image1
CG.images[1] - image2
CG.images[2] - image3

The image snapshots in the group snapshot will be:

CG.CGSNAP.snaps[0] - reference to the snapshot of image1
CG.CGSNAP.snaps[1] - reference to the snapshot of image2

Now imagine that this group snapshot was created but never finalized:

CG.CGSNAP.state == PENDING
CG.CGSNAP.snaps.length == 0

I'll be writing in pseudocode. Let's say we want to remove this pending CGSNAP. This is how it's currently implemented:

for (image: CG.images) {
  // This name is unique because the tuple (image.id, CG.CGSNAP.id, CG.id) is unique.
  snap_name = image.id + "_" + CG.CGSNAP.id + "_" + CG.id;
  remove_image_snapshot(snap_name);
}
remove_cg_snap(CGSNAP);

However, if we don't rely on the name, this is how I envision the code:

for (image: CG.images) {
  for (snap: image.snaps) {
    if (snap.namespace.cg_id == CG.id &&
        snap.namespace.cg_snap_id == CG.CGSNAP.id) { // this is our snapshot
      remove_image_snapshot(snap.name);
    }
  }
}
remove_cg_snap(CGSNAP);

What I don't like about this solution is the inner loop. What do you think?

Thanks,
Victor.
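The namespace-based variant of the pseudocode can be modeled concretely. Below is a minimal Python sketch (hypothetical names; the real librbd implementation is C++ and differs in detail) showing that the cleanup removes exactly the image snapshots whose GroupSnapshotNamespace references the pending group snapshot, with no dependence on snapshot names:

```python
# Hypothetical Python model of the namespace-based cleanup pseudocode
# above; the real librbd code is C++ and the names here are illustrative.

class GroupSnapshotNamespace:
    def __init__(self, cg_id, cg_snap_id):
        self.cg_id = cg_id
        self.cg_snap_id = cg_snap_id

class Snap:
    def __init__(self, name, namespace):
        self.name = name
        self.namespace = namespace  # None stands in for a user snapshot

class Image:
    def __init__(self, snaps):
        self.snaps = snaps

def remove_pending_group_snap(images, cg_id, cg_snap_id):
    """Remove every image snapshot whose namespace references the pending
    group snapshot (cg_id, cg_snap_id); snapshot names are irrelevant."""
    removed = []
    for image in images:
        for snap in list(image.snaps):  # copy: we mutate while iterating
            ns = snap.namespace
            if (isinstance(ns, GroupSnapshotNamespace)
                    and ns.cg_id == cg_id
                    and ns.cg_snap_id == cg_snap_id):
                image.snaps.remove(snap)  # models remove_image_snapshot()
                removed.append(snap.name)
    return removed

images = [
    Image([Snap("a", GroupSnapshotNamespace("cg1", "gs1")),
           Snap("user-snap", None)]),                       # user snapshot: untouched
    Image([Snap("b", GroupSnapshotNamespace("cg1", "gs2"))]),  # other group snapshot
    Image([Snap("c", GroupSnapshotNamespace("cg1", "gs1"))]),
]
assert remove_pending_group_snap(images, "cg1", "gs1") == ["a", "c"]
assert [s.name for s in images[0].snaps] == ["user-snap"]
```

The inner loop here is the one Victor objects to: it scans every snapshot of every image, where a table keyed by namespace would allow a direct lookup.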
On Tue, Jan 3, 2017 at 6:56 PM, Jason Dillaman <jdillama@redhat.com> wrote: > After starting the process of creating a group snapshot, you will > already have all the necessary data for the group snapshot namespace > [1] (group pool, group id, and group snapshot id) and the group > snapshot should be persistently recorded to disk as > GROUP_SNAPSHOT_STATE_PENDING. > > Looking at the snapshot create state machine [2], I don't see any > place that a crash (or similar failure) would matter before the actual > image snapshot record is created atomically. You would pass the fully > populated GroupSnapshotNamespace to snap_create, and if the snapshot > is created, it's linked to the group via that namespace and any > failures afterwards don't matter since they are linked -- if the > snapshot fails to be created, it isn't linked to the group but the > snapshot doesn't exist either so there isn't anything to clean up. > > Since you know which images are linked to the group and you know which > snapshots are in the group and which group snapshots are in the image, > you can reconcile any issues using the details in the > GroupSnapshotNamespace -- there shouldn't be any need to depend on the > actual snapshot name (it could technically just be a randomly assigned > UUID). > > Perhaps we could talk about this at a future RBD standup meeting that > you are able to join (or the next CDM). > > [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L249 > [2] https://github.com/ceph/ceph/blob/master/src/librbd/operation/SnapshotCreateRequest.h#L28 > > On Tue, Jan 3, 2017 at 7:40 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> Let's say we start creating a group snapshot. >> We invoke async snap_create method in Operations class. >> When we invoke this method we provide it with the snapshot name. >> >> While we are wating for the response we can be aborted. 
>> As a result we will be able to find the exact image snapshot using only its name >> as this was the only information we had at the time of running >> snap_create method. >> >> If snap_create was successful we will be able to find the snapshot >> otherwise we will not. >> However if we allow renaming snapshots from GroupSnapshotNamespace >> then we may not find the snapshot even if it >> was created successfully. >> >> On Fri, Dec 16, 2016 at 6:53 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>> Can you give a little background on this specific inconsistent case >>> you are referring to? >>> >>> On Thu, Dec 15, 2016 at 7:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> Yes, but if image's snapshot is renamed then I'm not able to find this >>>> snapshot having only group's snapshot in an inconsistent state for >>>> example. >>>> >>>> V. >>>> >>>> On Thu, Dec 15, 2016 at 7:10 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>> I think I might be confused. When creating a group snapshot, we have >>>>> the ConsistencyGroupSnapshot that allows you to store the necessary >>>>> linkage between the image's snapshot and its associated group snapshot >>>>> [1]. Why not just name the image's snapshots to the same name as the >>>>> parent group snapshot name and search the snapshot's metadata to get >>>>> the linkage? >>>>> >>>>> [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L255 >>>>> >>>>> On Tue, Dec 13, 2016 at 8:03 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>> Jason, >>>>>> >>>>>> My current implementation of consistency group snapshot feature names >>>>>> image snapshots like: <group_pool>_<group_id>_<group_snap_id> >>>>>> I rely on this fact when I need to remove a consistency group. It's >>>>>> necessary because if some of image snapshots were created, but the >>>>>> whole group snapshot operation failed, >>>>>> then the only way to find those dangling image snapshots is by this name. 
>>>>>> >>>>>> It means that we should forbid renaming snapshots from >>>>>> ConsistencyGroupSnapshot namespace. >>>>>> >>>>>> Another option is to allocate image snapshot ids during the creation >>>>>> of group snapshot, but this requires a major rewrite of the whole >>>>>> process of snapshot creation for images. >>>>>> >>>>>> What is your opinion on this? >>>>>> >>>>>> Thanks, >>>>>> V. >>>>>> >>>>>> >>>>>> On Tue, Nov 1, 2016 at 8:01 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>> I chatted with Xing on IRC this morning re: Cinder generic groups. It >>>>>>> sounds like RBD will need to support preserving the image's >>>>>>> consistency group snapshots even if the image is removed from the >>>>>>> group. In the OpenStack case, you won't have to worry about the image >>>>>>> being deleted while it still has associated group snapshots. >>>>>>> >>>>>>> We will also want to support being able to clone child images from a >>>>>>> group snapshot to ensure that we can thin provision new groups volumes >>>>>>> when creating a new group from a group snapshot. This means that the >>>>>>> group snapshots should be able to be protected/unprotected just like >>>>>>> standard user snapshots. >>>>>>> >>>>>>> On Mon, Oct 31, 2016 at 9:07 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>> Looking at the Cinder codebase, I don't see any such restriction that >>>>>>>> would prevent you from removing a volume from a consistency group that >>>>>>>> has associated snapshots. I would double-check on the OpenStack >>>>>>>> development mailing list if this is correct and is the intent. Worst >>>>>>>> case, the RBD driver could raise an exception if there are still >>>>>>>> consistency group snapshots associated to the image. >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Oct 28, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>> Another thing that bothers me. When we remove an image from a consistency group. 
>>>>>>>>> Should we remove all snapshots of this image that were created as part >>>>>>>>> of a consistency group snapshot? >>>>>>>>> >>>>>>>>> The easiest solution would be to remove all snapshots that are in >>>>>>>>> GroupSnapshotNamespace and reference this consistency group. >>>>>>>>> I looked into cinder docs for this feature: >>>>>>>>> http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html >>>>>>>>> >>>>>>>>> But it's not clear to me which behavior cinder expects. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> V. >>>>>>>>> >>>>>>>>> On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>> In a perfect world, it would be nice to add a new optional to "rbd >>>>>>>>>> snap ls" to show all snapshots (with a new column to indicate the >>>>>>>>>> associated namespace). >>>>>>>>>> >>>>>>>>>> On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>> Question. When we print out snapshots of an image, should the group >>>>>>>>>>> snapshots be listed, or should they be marked as special snapshots? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> V. >>>>>>>>>>> >>>>>>>>>>> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>> Ok. I didn't have any intention to throw exceptions. >>>>>>>>>>>> I was more concerned about whether it's ok to allocate and delete >>>>>>>>>>>> objects or I should use smart pointers. >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>> The only place exceptions are routinely used is within the "::decode" >>>>>>>>>>>>> functions. I would prefer to see the code not throwing new exceptions >>>>>>>>>>>>> on purpose. >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>> Are any exceptions used in librbd code? Should the code be exception safe? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> V. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>>>> if (struct_v >= 5) { >>>>>>>>>>>>>>>> ::decode(snapshot_namespace, p); >>>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> instead of >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>>>>>>>>>>>>> snapshot_namespace); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I would do: >>>>>>>>>>>>>>>> ::encode(snapshot_namespace, bl); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +1 -- looks good to me >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Jason >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Jason >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Jason >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Jason >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Jason >>>>> >>>>> >>>>> >>>>> -- >>>>> Jason >>> >>> >>> >>> -- >>> Jason > > > > -- > Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Snapshots of consistency groups 2017-01-04 18:29 ` Victor Denisov @ 2017-01-04 18:34 ` Jason Dillaman 2017-01-04 18:39 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2017-01-04 18:34 UTC (permalink / raw) To: Victor Denisov; +Cc: Mykola Golub, ceph-devel The loop isn't that big of a deal -- but you could eliminate it entirely if you just index the in-memory snapshot table via the SnapshotNamespace variant instead of just indexing snapshots by name (e.g. ImageCtx::snap_ids key switches from a string to a namespace). This would be required anyway since you might otherwise have duplicate names between namespaces. On Wed, Jan 4, 2017 at 1:29 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > It looks like next CDM is only next month. Let's try to figure it out in email. > >> Since you know which images are linked to the group and you know which >> snapshots are in the group and which group snapshots are in the image, >> you can reconcile any issues using the details in the >> GroupSnapshotNamespace -- there shouldn't be any need to depend on the >> actual snapshot name (it could technically just be a randomly assigned >> UUID). > > Let's say I have a consistency group and a snapshot of this group: CG > and CGSNAP. > Images in this snapshot I'll define as: > CG.images[0] - image1 > CG.images[1] - image2 > CG.images[2] - image3 > > Image snapshots in cg snapshot will be: > CG.CGSNAP.snaps[0] - reference to snapshot of image 1 > CG.CGSNAP.snaps[1] - reference to snapshot of image 2 > > Imagine that this snapshot was created, but wasn't finalized. > CG.CGSNAP.state == PENDING. > CG.CGSNAP.snaps.length == 0; > I'll be writing in pseudo code. > > Now, let's say we want to remove this pending CGSNAP. 
This is the code > how it's currently implemented: > > for (image: CG.images) { > snap_name = image.id + "_" + CG.CGSNAP.id + "_" + CG.id // This name > is unique because of uniqueness of the tuple (image.id, CG.CGSNAP.id, > CG.id) > remove_image_snapshot(snap_name); > } > remove_cg_snap(CGSNAP); > > However, if we don't rely on the name then this is how I envision the code: > > for (image: CG.images) { > for (snap: image.snaps) { > if (snap.namespace.cg_id == CG.id && snap.namespace.cg_snap_id == > CG.CGSNAP.id) { // this is our snapshot > remove_image_snapshot(snap.name); > } > } > } > remove_cg_snap(CGSNAP); > > In this solution I don't like the internal loop. > What do you think? > > Thanks, > Victor. > > > On Tue, Jan 3, 2017 at 6:56 PM, Jason Dillaman <jdillama@redhat.com> wrote: >> After starting the process of creating a group snapshot, you will >> already have all the necessary data for the group snapshot namespace >> [1] (group pool, group id, and group snapshot id) and the group >> snapshot should be persistently recorded to disk as >> GROUP_SNAPSHOT_STATE_PENDING. >> >> Looking at the snapshot create state machine [2], I don't see any >> place that a crash (or similar failure) would matter before the actual >> image snapshot record is created atomically. You would pass the fully >> populated GroupSnapshotNamespace to snap_create, and if the snapshot >> is created, it's linked to the group via that namespace and any >> failures afterwards don't matter since they are linked -- if the >> snapshot fails to be created, it isn't linked to the group but the >> snapshot doesn't exist either so there isn't anything to clean up. 
>> >> Since you know which images are linked to the group and you know which >> snapshots are in the group and which group snapshots are in the image, >> you can reconcile any issues using the details in the >> GroupSnapshotNamespace -- there shouldn't be any need to depend on the >> actual snapshot name (it could technically just be a randomly assigned >> UUID). >> >> Perhaps we could talk about this at a future RBD standup meeting that >> you are able to join (or the next CDM). >> >> [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L249 >> [2] https://github.com/ceph/ceph/blob/master/src/librbd/operation/SnapshotCreateRequest.h#L28 >> >> On Tue, Jan 3, 2017 at 7:40 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> Let's say we start creating a group snapshot. >>> We invoke async snap_create method in Operations class. >>> When we invoke this method we provide it with the snapshot name. >>> >>> While we are wating for the response we can be aborted. >>> As a result we will be able to find the exact image snapshot using only its name >>> as this was the only information we had at the time of running >>> snap_create method. >>> >>> If snap_create was successful we will be able to find the snapshot >>> otherwise we will not. >>> However if we allow renaming snapshots from GroupSnapshotNamespace >>> then we may not find the snapshot even if it >>> was created successfully. >>> >>> On Fri, Dec 16, 2016 at 6:53 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>> Can you give a little background on this specific inconsistent case >>>> you are referring to? >>>> >>>> On Thu, Dec 15, 2016 at 7:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>> Yes, but if image's snapshot is renamed then I'm not able to find this >>>>> snapshot having only group's snapshot in an inconsistent state for >>>>> example. >>>>> >>>>> V. >>>>> >>>>> On Thu, Dec 15, 2016 at 7:10 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>> I think I might be confused. 
When creating a group snapshot, we have >>>>>> the ConsistencyGroupSnapshot that allows you to store the necessary >>>>>> linkage between the image's snapshot and its associated group snapshot >>>>>> [1]. Why not just name the image's snapshots to the same name as the >>>>>> parent group snapshot name and search the snapshot's metadata to get >>>>>> the linkage? >>>>>> >>>>>> [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L255 >>>>>> >>>>>> On Tue, Dec 13, 2016 at 8:03 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>> Jason, >>>>>>> >>>>>>> My current implementation of consistency group snapshot feature names >>>>>>> image snapshots like: <group_pool>_<group_id>_<group_snap_id> >>>>>>> I rely on this fact when I need to remove a consistency group. It's >>>>>>> necessary because if some of image snapshots were created, but the >>>>>>> whole group snapshot operation failed, >>>>>>> then the only way to find those dangling image snapshots is by this name. >>>>>>> >>>>>>> It means that we should forbid renaming snapshots from >>>>>>> ConsistencyGroupSnapshot namespace. >>>>>>> >>>>>>> Another option is to allocate image snapshot ids during the creation >>>>>>> of group snapshot, but this requires a major rewrite of the whole >>>>>>> process of snapshot creation for images. >>>>>>> >>>>>>> What is your opinion on this? >>>>>>> >>>>>>> Thanks, >>>>>>> V. >>>>>>> >>>>>>> >>>>>>> On Tue, Nov 1, 2016 at 8:01 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>> I chatted with Xing on IRC this morning re: Cinder generic groups. It >>>>>>>> sounds like RBD will need to support preserving the image's >>>>>>>> consistency group snapshots even if the image is removed from the >>>>>>>> group. In the OpenStack case, you won't have to worry about the image >>>>>>>> being deleted while it still has associated group snapshots. 
>>>>>>>> >>>>>>>> We will also want to support being able to clone child images from a >>>>>>>> group snapshot to ensure that we can thin provision new groups volumes >>>>>>>> when creating a new group from a group snapshot. This means that the >>>>>>>> group snapshots should be able to be protected/unprotected just like >>>>>>>> standard user snapshots. >>>>>>>> >>>>>>>> On Mon, Oct 31, 2016 at 9:07 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>> Looking at the Cinder codebase, I don't see any such restriction that >>>>>>>>> would prevent you from removing a volume from a consistency group that >>>>>>>>> has associated snapshots. I would double-check on the OpenStack >>>>>>>>> development mailing list if this is correct and is the intent. Worst >>>>>>>>> case, the RBD driver could raise an exception if there are still >>>>>>>>> consistency group snapshots associated to the image. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Oct 28, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>> Another thing that bothers me. When we remove an image from a consistency group. >>>>>>>>>> Should we remove all snapshots of this image that were created as part >>>>>>>>>> of a consistency group snapshot? >>>>>>>>>> >>>>>>>>>> The easiest solution would be to remove all snapshots that are in >>>>>>>>>> GroupSnapshotNamespace and reference this consistency group. >>>>>>>>>> I looked into cinder docs for this feature: >>>>>>>>>> http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html >>>>>>>>>> >>>>>>>>>> But it's not clear to me which behavior cinder expects. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> V. >>>>>>>>>> >>>>>>>>>> On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>> In a perfect world, it would be nice to add a new optional to "rbd >>>>>>>>>>> snap ls" to show all snapshots (with a new column to indicate the >>>>>>>>>>> associated namespace). 
>>>>>>>>>>> >>>>>>>>>>> On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>> Question. When we print out snapshots of an image, should the group >>>>>>>>>>>> snapshots be listed, or should they be marked as special snapshots? >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> V. >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>> Ok. I didn't have any intention to throw exceptions. >>>>>>>>>>>>> I was more concerned about whether it's ok to allocate and delete >>>>>>>>>>>>> objects or I should use smart pointers. >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>>> The only place exceptions are routinely used is within the "::decode" >>>>>>>>>>>>>> functions. I would prefer to see the code not throwing new exceptions >>>>>>>>>>>>>> on purpose. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>>> Are any exceptions used in librbd code? Should the code be exception safe? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> V. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>>>>> if (struct_v >= 5) { >>>>>>>>>>>>>>>>> ::decode(snapshot_namespace, p); >>>>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> instead of >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>>>>>>>>>>>>>> snapshot_namespace); >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I would do: >>>>>>>>>>>>>>>>> ::encode(snapshot_namespace, bl); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> +1 -- looks good to me >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Jason >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Jason >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Jason >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Jason >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Jason >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Jason >>>> >>>> >>>> >>>> -- >>>> Jason >> >> >> >> -- >> Jason -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
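[Editor's sketch] The versioned decode agreed on above (read the snapshot namespace only when struct_v >= 5, otherwise fall back to the user namespace) can be illustrated in isolation as follows. The types and byte layout are simplified stand-ins, not the real cls_rbd_snap encoding, which goes through Ceph's bufferlist ::encode/::decode machinery:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-ins for the snapshot namespace types discussed in the thread.
// (Hypothetical names; the real types live in cls/rbd/cls_rbd_types.h.)
enum class SnapNamespace : uint8_t { User = 0, Group = 1 };

struct SnapRecord {
    std::string name;
    SnapNamespace ns = SnapNamespace::User;
};

// Encode always writes the current struct version (5) followed by the
// namespace byte, mirroring the unconditional ::encode in the proposal.
std::vector<uint8_t> encode_snap(const SnapRecord& s) {
    std::vector<uint8_t> bl;
    bl.push_back(5);                                // struct_v
    bl.push_back(static_cast<uint8_t>(s.name.size()));
    bl.insert(bl.end(), s.name.begin(), s.name.end());
    bl.push_back(static_cast<uint8_t>(s.ns));       // only present since v5
    return bl;
}

// Decode tolerates records written by older code: if struct_v < 5 the
// namespace field is absent and defaults to the user namespace.
SnapRecord decode_snap(const std::vector<uint8_t>& bl) {
    SnapRecord s;
    size_t p = 0;
    uint8_t struct_v = bl[p++];
    uint8_t len = bl[p++];
    s.name.assign(bl.begin() + p, bl.begin() + p + len);
    p += len;
    if (struct_v >= 5) {
        s.ns = static_cast<SnapNamespace>(bl[p++]);
    } else {
        s.ns = SnapNamespace::User;                 // legacy record
    }
    return s;
}
```

The key property is the one discussed above: old on-disk records keep decoding unchanged, and every pre-v5 snapshot is treated as a user snapshot.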
* Re: Snapshots of consistency groups 2017-01-04 18:34 ` Jason Dillaman @ 2017-01-04 18:39 ` Victor Denisov 2017-01-12 0:14 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2017-01-04 18:39 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel I understood. Thanks! On Wed, Jan 4, 2017 at 10:34 AM, Jason Dillaman <jdillama@redhat.com> wrote: > The loop isn't that big of a deal -- but you could eliminate it > entirely if you just index the in-memory snapshot table via the > SnapshotNamespace variant instead of just indexing snapshots by name > (e.g. ImageCtx::snap_ids key switches from a string to a namespace). > This would be required anyway since you might otherwise have duplicate > names between namespaces. > > On Wed, Jan 4, 2017 at 1:29 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> It looks like next CDM is only next month. Let's try to figure it out in email. >> >>> Since you know which images are linked to the group and you know which >>> snapshots are in the group and which group snapshots are in the image, >>> you can reconcile any issues using the details in the >>> GroupSnapshotNamespace -- there shouldn't be any need to depend on the >>> actual snapshot name (it could technically just be a randomly assigned >>> UUID). >> >> Let's say I have a consistency group and a snapshot of this group: CG >> and CGSNAP. >> Images in this snapshot I'll define as: >> CG.images[0] - image1 >> CG.images[1] - image2 >> CG.images[2] - image3 >> >> Image snapshots in cg snapshot will be: >> CG.CGSNAP.snaps[0] - reference to snapshot of image 1 >> CG.CGSNAP.snaps[1] - reference to snapshot of image 2 >> >> Imagine that this snapshot was created, but wasn't finalized. >> CG.CGSNAP.state == PENDING. >> CG.CGSNAP.snaps.length == 0; >> I'll be writing in pseudo code. >> >> Now, let's say we want to remove this pending CGSNAP. 
This is the code >> how it's currently implemented: >> >> for (image: CG.images) { >> snap_name = image.id + "_" + CG.CGSNAP.id + "_" + CG.id // This name >> is unique because of uniqueness of the tuple (image.id, CG.CGSNAP.id, >> CG.id) >> remove_image_snapshot(snap_name); >> } >> remove_cg_snap(CGSNAP); >> >> However, if we don't rely on the name then this is how I envision the code: >> >> for (image: CG.images) { >> for (snap: image.snaps) { >> if (snap.namespace.cg_id == CG.id && snap.namespace.cg_snap_id == >> CG.CGSNAP.id) { // this is our snapshot >> remove_image_snapshot(snap.name); >> } >> } >> } >> remove_cg_snap(CGSNAP); >> >> In this solution I don't like the internal loop. >> What do you think? >> >> Thanks, >> Victor. >> >> >> On Tue, Jan 3, 2017 at 6:56 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>> After starting the process of creating a group snapshot, you will >>> already have all the necessary data for the group snapshot namespace >>> [1] (group pool, group id, and group snapshot id) and the group >>> snapshot should be persistently recorded to disk as >>> GROUP_SNAPSHOT_STATE_PENDING. >>> >>> Looking at the snapshot create state machine [2], I don't see any >>> place that a crash (or similar failure) would matter before the actual >>> image snapshot record is created atomically. You would pass the fully >>> populated GroupSnapshotNamespace to snap_create, and if the snapshot >>> is created, it's linked to the group via that namespace and any >>> failures afterwards don't matter since they are linked -- if the >>> snapshot fails to be created, it isn't linked to the group but the >>> snapshot doesn't exist either so there isn't anything to clean up. 
>>> >>> Since you know which images are linked to the group and you know which >>> snapshots are in the group and which group snapshots are in the image, >>> you can reconcile any issues using the details in the >>> GroupSnapshotNamespace -- there shouldn't be any need to depend on the >>> actual snapshot name (it could technically just be a randomly assigned >>> UUID). >>> >>> Perhaps we could talk about this at a future RBD standup meeting that >>> you are able to join (or the next CDM). >>> >>> [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L249 >>> [2] https://github.com/ceph/ceph/blob/master/src/librbd/operation/SnapshotCreateRequest.h#L28 >>> >>> On Tue, Jan 3, 2017 at 7:40 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> Let's say we start creating a group snapshot. >>>> We invoke async snap_create method in Operations class. >>>> When we invoke this method we provide it with the snapshot name. >>>> >>>> While we are wating for the response we can be aborted. >>>> As a result we will be able to find the exact image snapshot using only its name >>>> as this was the only information we had at the time of running >>>> snap_create method. >>>> >>>> If snap_create was successful we will be able to find the snapshot >>>> otherwise we will not. >>>> However if we allow renaming snapshots from GroupSnapshotNamespace >>>> then we may not find the snapshot even if it >>>> was created successfully. >>>> >>>> On Fri, Dec 16, 2016 at 6:53 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>> Can you give a little background on this specific inconsistent case >>>>> you are referring to? >>>>> >>>>> On Thu, Dec 15, 2016 at 7:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>> Yes, but if image's snapshot is renamed then I'm not able to find this >>>>>> snapshot having only group's snapshot in an inconsistent state for >>>>>> example. >>>>>> >>>>>> V. 
>>>>>> >>>>>> On Thu, Dec 15, 2016 at 7:10 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>> I think I might be confused. When creating a group snapshot, we have >>>>>>> the ConsistencyGroupSnapshot that allows you to store the necessary >>>>>>> linkage between the image's snapshot and its associated group snapshot >>>>>>> [1]. Why not just name the image's snapshots to the same name as the >>>>>>> parent group snapshot name and search the snapshot's metadata to get >>>>>>> the linkage? >>>>>>> >>>>>>> [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L255 >>>>>>> >>>>>>> On Tue, Dec 13, 2016 at 8:03 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>> Jason, >>>>>>>> >>>>>>>> My current implementation of consistency group snapshot feature names >>>>>>>> image snapshots like: <group_pool>_<group_id>_<group_snap_id> >>>>>>>> I rely on this fact when I need to remove a consistency group. It's >>>>>>>> necessary because if some of image snapshots were created, but the >>>>>>>> whole group snapshot operation failed, >>>>>>>> then the only way to find those dangling image snapshots is by this name. >>>>>>>> >>>>>>>> It means that we should forbid renaming snapshots from >>>>>>>> ConsistencyGroupSnapshot namespace. >>>>>>>> >>>>>>>> Another option is to allocate image snapshot ids during the creation >>>>>>>> of group snapshot, but this requires a major rewrite of the whole >>>>>>>> process of snapshot creation for images. >>>>>>>> >>>>>>>> What is your opinion on this? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> V. >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Nov 1, 2016 at 8:01 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>> I chatted with Xing on IRC this morning re: Cinder generic groups. It >>>>>>>>> sounds like RBD will need to support preserving the image's >>>>>>>>> consistency group snapshots even if the image is removed from the >>>>>>>>> group. 
>>>>>>>>> [snip]

^ permalink raw reply [flat|nested] 43+ messages in thread
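[Editor's sketch] Jason's suggestion above (index the in-memory snapshot table by namespace rather than by name alone) might look roughly like this. SnapKey and find_group_snap are hypothetical names; the real ImageCtx uses a boost::variant-based SnapshotNamespace as part of the key. With the namespace in the key, locating the image snapshot that belongs to a pending group snapshot becomes a bounded map lookup instead of the inner loop Victor was unhappy with:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical composite key: which group snapshot (if any) an image
// snapshot belongs to. A plain user snapshot uses cg_id/cg_snap_id == 0.
struct SnapKey {
    uint64_t cg_id;
    uint64_t cg_snap_id;
    std::string name;   // names may repeat across namespaces
    bool operator<(const SnapKey& o) const {
        if (cg_id != o.cg_id) return cg_id < o.cg_id;
        if (cg_snap_id != o.cg_snap_id) return cg_snap_id < o.cg_snap_id;
        return name < o.name;
    }
};

using SnapIds = std::map<SnapKey, uint64_t>;  // key -> snap_id

// Finding the image snapshot for a given (group, group snapshot) pair is
// now a range lookup at the start of that namespace's key range, not a
// full pass over every snapshot of the image.
bool find_group_snap(const SnapIds& snaps, uint64_t cg_id,
                     uint64_t cg_snap_id, uint64_t* snap_id) {
    auto it = snaps.lower_bound(SnapKey{cg_id, cg_snap_id, ""});
    if (it == snaps.end() || it->first.cg_id != cg_id ||
        it->first.cg_snap_id != cg_snap_id) {
        return false;
    }
    *snap_id = it->second;
    return true;
}
```

This also gives the property Jason notes: two snapshots may share a name as long as they live in different namespaces, since the namespace is part of the key.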
* Re: Snapshots of consistency groups 2017-01-04 18:39 ` Victor Denisov @ 2017-01-12 0:14 ` Victor Denisov 2017-01-12 0:18 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2017-01-12 0:14 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel Can I open an image and then redirect it to one of its snapshots? I see that there is a snap_set method. Do I understand correctly that I need to: open the image invoke snap_set method invoke refresh method Thanks in advance. V. On Wed, Jan 4, 2017 at 10:39 AM, Victor Denisov <vdenisov@mirantis.com> wrote: > I understood. Thanks! > > On Wed, Jan 4, 2017 at 10:34 AM, Jason Dillaman <jdillama@redhat.com> wrote: >> The loop isn't that big of a deal -- but you could eliminate it >> entirely if you just index the in-memory snapshot table via the >> SnapshotNamespace variant instead of just indexing snapshots by name >> (e.g. ImageCtx::snap_ids key switches from a string to a namespace). >> This would be required anyway since you might otherwise have duplicate >> names between namespaces. >> >> On Wed, Jan 4, 2017 at 1:29 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> It looks like next CDM is only next month. Let's try to figure it out in email. >>> >>>> Since you know which images are linked to the group and you know which >>>> snapshots are in the group and which group snapshots are in the image, >>>> you can reconcile any issues using the details in the >>>> GroupSnapshotNamespace -- there shouldn't be any need to depend on the >>>> actual snapshot name (it could technically just be a randomly assigned >>>> UUID). >>> >>> Let's say I have a consistency group and a snapshot of this group: CG >>> and CGSNAP. 
>>> [snip]

^ permalink raw reply [flat|nested] 43+ messages in thread
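[Editor's sketch] For comparison, the name-based scheme the thread moves away from (encoding the image id, group snapshot id, and group id tuple directly into the snapshot name) can be sketched as below. The helper names are hypothetical; uniqueness follows from the uniqueness of the id triple, assuming numeric ids with no '_' inside them. The sketch also makes the fragility concrete: the parse fails as soon as a snapshot is renamed, which is exactly why the namespace-based linkage is preferred:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Compose "<image_id>_<cg_snap_id>_<cg_id>", the format used in the
// earlier name-based cleanup pseudocode.
std::string group_snap_name(uint64_t image_id, uint64_t cg_snap_id,
                            uint64_t cg_id) {
    std::ostringstream os;
    os << image_id << '_' << cg_snap_id << '_' << cg_id;
    return os.str();
}

// Recover the tuple from the name; returns false when the name does not
// follow the scheme (e.g. after a rename).
bool parse_group_snap_name(const std::string& name, uint64_t* image_id,
                           uint64_t* cg_snap_id, uint64_t* cg_id) {
    std::istringstream is(name);
    char s1 = 0, s2 = 0;
    if (!(is >> *image_id >> s1 >> *cg_snap_id >> s2 >> *cg_id)) {
        return false;
    }
    return s1 == '_' && s2 == '_';
}
```

Under the namespace-based design, this parse is unnecessary: the GroupSnapshotNamespace metadata carries the linkage and the visible name can be anything, even a random UUID.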
* Re: Snapshots of consistency groups 2017-01-12 0:14 ` Victor Denisov @ 2017-01-12 0:18 ` Jason Dillaman 2017-01-12 1:10 ` Victor Denisov 0 siblings, 1 reply; 43+ messages in thread From: Jason Dillaman @ 2017-01-12 0:18 UTC (permalink / raw) To: Victor Denisov; +Cc: Mykola Golub, ceph-devel You can specify the snapshot at image open as well. There is no need to refresh. On Wed, Jan 11, 2017 at 7:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > Can I open an image and then redirect it to one of its snapshots? > > I see that there is a snap_set method. > Do I understand correctly that I need to: > open the image > invoke snap_set method > invoke refresh method > > Thanks in advance. > V. > > On Wed, Jan 4, 2017 at 10:39 AM, Victor Denisov <vdenisov@mirantis.com> wrote: >> I understood. Thanks! >> >> On Wed, Jan 4, 2017 at 10:34 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>> The loop isn't that big of a deal -- but you could eliminate it >>> entirely if you just index the in-memory snapshot table via the >>> SnapshotNamespace variant instead of just indexing snapshots by name >>> (e.g. ImageCtx::snap_ids key switches from a string to a namespace). >>> This would be required anyway since you might otherwise have duplicate >>> names between namespaces. >>> >>> On Wed, Jan 4, 2017 at 1:29 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> It looks like next CDM is only next month. Let's try to figure it out in email. >>>> >>>>> Since you know which images are linked to the group and you know which >>>>> snapshots are in the group and which group snapshots are in the image, >>>>> you can reconcile any issues using the details in the >>>>> GroupSnapshotNamespace -- there shouldn't be any need to depend on the >>>>> actual snapshot name (it could technically just be a randomly assigned >>>>> UUID). >>>> >>>> Let's say I have a consistency group and a snapshot of this group: CG >>>> and CGSNAP. 
>>>> [snip]
>>>>>>>>>> >>>>>>>>>> Another option is to allocate image snapshot ids during the creation >>>>>>>>>> of group snapshot, but this requires a major rewrite of the whole >>>>>>>>>> process of snapshot creation for images. >>>>>>>>>> >>>>>>>>>> What is your opinion on this? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> V. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Nov 1, 2016 at 8:01 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>> I chatted with Xing on IRC this morning re: Cinder generic groups. It >>>>>>>>>>> sounds like RBD will need to support preserving the image's >>>>>>>>>>> consistency group snapshots even if the image is removed from the >>>>>>>>>>> group. In the OpenStack case, you won't have to worry about the image >>>>>>>>>>> being deleted while it still has associated group snapshots. >>>>>>>>>>> >>>>>>>>>>> We will also want to support being able to clone child images from a >>>>>>>>>>> group snapshot to ensure that we can thin provision new groups volumes >>>>>>>>>>> when creating a new group from a group snapshot. This means that the >>>>>>>>>>> group snapshots should be able to be protected/unprotected just like >>>>>>>>>>> standard user snapshots. >>>>>>>>>>> >>>>>>>>>>> On Mon, Oct 31, 2016 at 9:07 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>> Looking at the Cinder codebase, I don't see any such restriction that >>>>>>>>>>>> would prevent you from removing a volume from a consistency group that >>>>>>>>>>>> has associated snapshots. I would double-check on the OpenStack >>>>>>>>>>>> development mailing list if this is correct and is the intent. Worst >>>>>>>>>>>> case, the RBD driver could raise an exception if there are still >>>>>>>>>>>> consistency group snapshots associated to the image. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Oct 28, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>> Another thing that bothers me. When we remove an image from a consistency group. 
>>>>>>>>>>>>> Should we remove all snapshots of this image that were created as part >>>>>>>>>>>>> of a consistency group snapshot? >>>>>>>>>>>>> >>>>>>>>>>>>> The easiest solution would be to remove all snapshots that are in >>>>>>>>>>>>> GroupSnapshotNamespace and reference this consistency group. >>>>>>>>>>>>> I looked into cinder docs for this feature: >>>>>>>>>>>>> http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html >>>>>>>>>>>>> >>>>>>>>>>>>> But it's not clear to me which behavior cinder expects. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> V. >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>>> In a perfect world, it would be nice to add a new option to "rbd >>>>>>>>>>>>>> snap ls" to show all snapshots (with a new column to indicate the >>>>>>>>>>>>>> associated namespace). >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>>> Question. When we print out snapshots of an image, should the group >>>>>>>>>>>>>>> snapshots be listed, or should they be marked as special snapshots? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> V. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>>>> Ok. I didn't have any intention to throw exceptions. >>>>>>>>>>>>>>>> I was more concerned about whether it's ok to allocate and delete >>>>>>>>>>>>>>>> objects or I should use smart pointers. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>>>>>> The only place exceptions are routinely used is within the "::decode" >>>>>>>>>>>>>>>>> functions. I would prefer to see the code not throwing new exceptions >>>>>>>>>>>>>>>>> on purpose. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>>>>>> Are any exceptions used in librbd code? Should the code be exception safe? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> V. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>>>>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>>>>>>>> if (struct_v >= 5) { >>>>>>>>>>>>>>>>>>>> ::decode(snapshot_namespace, p); >>>>>>>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>>>>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> instead of >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>>>>>>>>>>>>>>>>> snapshot_namespace); >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I would do: >>>>>>>>>>>>>>>>>>>> ::encode(snapshot_namespace, bl); >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> +1 -- looks good to me >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> Jason >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> Jason >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Jason >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Jason >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Jason >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Jason >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Jason >>>>> >>>>> >>>>> >>>>> -- >>>>> Jason >>> >>> >>> >>> -- >>> Jason -- Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
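The version-gated "::decode" fallback discussed at the end of the message above can be sketched as follows. This is an illustrative Python rendering of the C++ cls_rbd logic; the dict-based record and its field names are assumptions for the sketch, not the actual on-disk format:

```python
def decode_cls_rbd_snap(fields, struct_v):
    """Decode a snapshot record. Records written before struct_v 5 carry
    no snapshot namespace, so default them to the user namespace -- the
    same fallback as the quoted C++ snippet."""
    snap = {"id": fields["id"], "name": fields["name"]}
    if struct_v >= 5:
        snap["namespace"] = fields["namespace"]
    else:
        # Pre-v5 record: behave as cls::rbd::UserSnapshotNamespace()
        snap["namespace"] = {"type": "user"}
    return snap
```

Encoding then becomes symmetric: once the namespace is always written for struct_v >= 5, the boost::apply_visitor call collapses into a plain ::encode of the namespace variant, which is what the thread settles on.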
* Re: Snapshots of consistency groups 2017-01-12 0:18 ` Jason Dillaman @ 2017-01-12 1:10 ` Victor Denisov 2017-01-12 1:36 ` Jason Dillaman 0 siblings, 1 reply; 43+ messages in thread From: Victor Denisov @ 2017-01-12 1:10 UTC (permalink / raw) To: Jason Dillaman; +Cc: Mykola Golub, ceph-devel If I understand correctly, I can't open the required snapshot immediately if I know only its namespace. I need to get the whole list of snapshots first and then find mine using its namespace. After that I want to redirect the opened image to my snapshot. On Wed, Jan 11, 2017 at 4:18 PM, Jason Dillaman <jdillama@redhat.com> wrote: > You can specify the snapshot at image open as well. There is no need to refresh. > > On Wed, Jan 11, 2017 at 7:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >> Can I open an image and then redirect it to one of its snapshots? >> >> I see that there is a snap_set method. >> Do I understand correctly that I need to: >> open the image >> invoke snap_set method >> invoke refresh method >> >> Thanks in advance. >> V. >> >> On Wed, Jan 4, 2017 at 10:39 AM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> I understood. Thanks! >>> >>> On Wed, Jan 4, 2017 at 10:34 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>> The loop isn't that big of a deal -- but you could eliminate it >>>> entirely if you just index the in-memory snapshot table via the >>>> SnapshotNamespace variant instead of just indexing snapshots by name >>>> (e.g. ImageCtx::snap_ids key switches from a string to a namespace). >>>> This would be required anyway since you might otherwise have duplicate >>>> names between namespaces. 
>>>>> >>>>>> Since you know which images are linked to the group and you know which >>>>>> snapshots are in the group and which group snapshots are in the image, >>>>>> you can reconcile any issues using the details in the >>>>>> GroupSnapshotNamespace -- there shouldn't be any need to depend on the >>>>>> actual snapshot name (it could technically just be a randomly assigned >>>>>> UUID). >>>>> >>>>> Let's say I have a consistency group and a snapshot of this group: CG >>>>> and CGSNAP. >>>>> Images in this snapshot I'll define as: >>>>> CG.images[0] - image1 >>>>> CG.images[1] - image2 >>>>> CG.images[2] - image3 >>>>> >>>>> Image snapshots in cg snapshot will be: >>>>> CG.CGSNAP.snaps[0] - reference to snapshot of image 1 >>>>> CG.CGSNAP.snaps[1] - reference to snapshot of image 2 >>>>> >>>>> Imagine that this snapshot was created, but wasn't finalized. >>>>> CG.CGSNAP.state == PENDING. >>>>> CG.CGSNAP.snaps.length == 0; >>>>> I'll be writing in pseudo code. >>>>> >>>>> Now, let's say we want to remove this pending CGSNAP. This is the code >>>>> how it's currently implemented: >>>>> >>>>> for (image: CG.images) { >>>>> snap_name = image.id + "_" + CG.CGSNAP.id + "_" + CG.id // This name >>>>> is unique because of uniqueness of the tuple (image.id, CG.CGSNAP.id, >>>>> CG.id) >>>>> remove_image_snapshot(snap_name); >>>>> } >>>>> remove_cg_snap(CGSNAP); >>>>> >>>>> However, if we don't rely on the name then this is how I envision the code: >>>>> >>>>> for (image: CG.images) { >>>>> for (snap: image.snaps) { >>>>> if (snap.namespace.cg_id == CG.id && snap.namespace.cg_snap_id == >>>>> CG.CGSNAP.id) { // this is our snapshot >>>>> remove_image_snapshot(snap.name); >>>>> } >>>>> } >>>>> } >>>>> remove_cg_snap(CGSNAP); >>>>> >>>>> In this solution I don't like the internal loop. >>>>> What do you think? >>>>> >>>>> Thanks, >>>>> Victor. 
>>>>> >>>>> >>>>> On Tue, Jan 3, 2017 at 6:56 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>> After starting the process of creating a group snapshot, you will >>>>>> already have all the necessary data for the group snapshot namespace >>>>>> [1] (group pool, group id, and group snapshot id) and the group >>>>>> snapshot should be persistently recorded to disk as >>>>>> GROUP_SNAPSHOT_STATE_PENDING. >>>>>> >>>>>> Looking at the snapshot create state machine [2], I don't see any >>>>>> place that a crash (or similar failure) would matter before the actual >>>>>> image snapshot record is created atomically. You would pass the fully >>>>>> populated GroupSnapshotNamespace to snap_create, and if the snapshot >>>>>> is created, it's linked to the group via that namespace and any >>>>>> failures afterwards don't matter since they are linked -- if the >>>>>> snapshot fails to be created, it isn't linked to the group but the >>>>>> snapshot doesn't exist either so there isn't anything to clean up. >>>>>> >>>>>> Since you know which images are linked to the group and you know which >>>>>> snapshots are in the group and which group snapshots are in the image, >>>>>> you can reconcile any issues using the details in the >>>>>> GroupSnapshotNamespace -- there shouldn't be any need to depend on the >>>>>> actual snapshot name (it could technically just be a randomly assigned >>>>>> UUID). >>>>>> >>>>>> Perhaps we could talk about this at a future RBD standup meeting that >>>>>> you are able to join (or the next CDM). >>>>>> >>>>>> [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L249 >>>>>> [2] https://github.com/ceph/ceph/blob/master/src/librbd/operation/SnapshotCreateRequest.h#L28 >>>>>> >>>>>> On Tue, Jan 3, 2017 at 7:40 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>> Let's say we start creating a group snapshot. >>>>>>> We invoke async snap_create method in Operations class. 
>>>>>>> When we invoke this method we provide it with the snapshot name. >>>>>>> >>>>>>> While we are wating for the response we can be aborted. >>>>>>> As a result we will be able to find the exact image snapshot using only its name >>>>>>> as this was the only information we had at the time of running >>>>>>> snap_create method. >>>>>>> >>>>>>> If snap_create was successful we will be able to find the snapshot >>>>>>> otherwise we will not. >>>>>>> However if we allow renaming snapshots from GroupSnapshotNamespace >>>>>>> then we may not find the snapshot even if it >>>>>>> was created successfully. >>>>>>> >>>>>>> On Fri, Dec 16, 2016 at 6:53 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>> Can you give a little background on this specific inconsistent case >>>>>>>> you are referring to? >>>>>>>> >>>>>>>> On Thu, Dec 15, 2016 at 7:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>> Yes, but if image's snapshot is renamed then I'm not able to find this >>>>>>>>> snapshot having only group's snapshot in an inconsistent state for >>>>>>>>> example. >>>>>>>>> >>>>>>>>> V. >>>>>>>>> >>>>>>>>> On Thu, Dec 15, 2016 at 7:10 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>> I think I might be confused. When creating a group snapshot, we have >>>>>>>>>> the ConsistencyGroupSnapshot that allows you to store the necessary >>>>>>>>>> linkage between the image's snapshot and its associated group snapshot >>>>>>>>>> [1]. Why not just name the image's snapshots to the same name as the >>>>>>>>>> parent group snapshot name and search the snapshot's metadata to get >>>>>>>>>> the linkage? 
>>>>>>>>>> >>>>>>>>>> [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L255 >>>>>>>>>> >>>>>>>>>> On Tue, Dec 13, 2016 at 8:03 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>> Jason, >>>>>>>>>>> >>>>>>>>>>> My current implementation of consistency group snapshot feature names >>>>>>>>>>> image snapshots like: <group_pool>_<group_id>_<group_snap_id> >>>>>>>>>>> I rely on this fact when I need to remove a consistency group. It's >>>>>>>>>>> necessary because if some of image snapshots were created, but the >>>>>>>>>>> whole group snapshot operation failed, >>>>>>>>>>> then the only way to find those dangling image snapshots is by this name. >>>>>>>>>>> >>>>>>>>>>> It means that we should forbid renaming snapshots from >>>>>>>>>>> ConsistencyGroupSnapshot namespace. >>>>>>>>>>> >>>>>>>>>>> Another option is to allocate image snapshot ids during the creation >>>>>>>>>>> of group snapshot, but this requires a major rewrite of the whole >>>>>>>>>>> process of snapshot creation for images. >>>>>>>>>>> >>>>>>>>>>> What is your opinion on this? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> V. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Nov 1, 2016 at 8:01 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>> I chatted with Xing on IRC this morning re: Cinder generic groups. It >>>>>>>>>>>> sounds like RBD will need to support preserving the image's >>>>>>>>>>>> consistency group snapshots even if the image is removed from the >>>>>>>>>>>> group. In the OpenStack case, you won't have to worry about the image >>>>>>>>>>>> being deleted while it still has associated group snapshots. >>>>>>>>>>>> >>>>>>>>>>>> We will also want to support being able to clone child images from a >>>>>>>>>>>> group snapshot to ensure that we can thin provision new groups volumes >>>>>>>>>>>> when creating a new group from a group snapshot. 
This means that the >>>>>>>>>>>> group snapshots should be able to be protected/unprotected just like >>>>>>>>>>>> standard user snapshots. >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Oct 31, 2016 at 9:07 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>> Looking at the Cinder codebase, I don't see any such restriction that >>>>>>>>>>>>> would prevent you from removing a volume from a consistency group that >>>>>>>>>>>>> has associated snapshots. I would double-check on the OpenStack >>>>>>>>>>>>> development mailing list if this is correct and is the intent. Worst >>>>>>>>>>>>> case, the RBD driver could raise an exception if there are still >>>>>>>>>>>>> consistency group snapshots associated to the image. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Oct 28, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>> Another thing that bothers me. When we remove an image from a consistency group. >>>>>>>>>>>>>> Should we remove all snapshots of this image that were created as part >>>>>>>>>>>>>> of a consistency group snapshot? >>>>>>>>>>>>>> >>>>>>>>>>>>>> The easiest solution would be to remove all snapshots that are in >>>>>>>>>>>>>> GroupSnapshotNamespace and reference this consistency group. >>>>>>>>>>>>>> I looked into cinder docs for this feature: >>>>>>>>>>>>>> http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html >>>>>>>>>>>>>> >>>>>>>>>>>>>> But it's not clear to me which behavior cinder expects. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> V. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>>>> In a perfect world, it would be nice to add a new optional to "rbd >>>>>>>>>>>>>>> snap ls" to show all snapshots (with a new column to indicate the >>>>>>>>>>>>>>> associated namespace). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>>>> Question. 
When we print out snapshots of an image, should the group >>>>>>>>>>>>>>>> snapshots be listed, or should they be marked as special snapshots? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> V. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>>>>> Ok. I didn't have any intention to throw exceptions. >>>>>>>>>>>>>>>>> I was more concerned about whether it's ok to allocate and delete >>>>>>>>>>>>>>>>> objects or I should use smart pointers. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>>>>>>> The only place exceptions are routinely used is within the "::decode" >>>>>>>>>>>>>>>>>> functions. I would prefer to see the code not throwing new exceptions >>>>>>>>>>>>>>>>>> on purpose. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>>>>>>> Are any exceptions used in librbd code? Should the code be exception safe? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> V. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>>>>>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>>>>>>>>> if (struct_v >= 5) { >>>>>>>>>>>>>>>>>>>>> ::decode(snapshot_namespace, p); >>>>>>>>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>>>>>>>> snapshot_namespace = cls::rbd::UserSnapshotNamespace(); >>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> then code for ::encode function of cls_rbd_snap would change accordingly: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> instead of >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl), >>>>>>>>>>>>>>>>>>>>> snapshot_namespace); >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I would do: >>>>>>>>>>>>>>>>>>>>> ::encode(snapshot_namespace, bl); >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> +1 -- looks good to me >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>> Jason >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> Jason >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Jason >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Jason >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Jason >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Jason >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Jason >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Jason >>>> >>>> >>>> >>>> -- >>>> Jason > > > > -- > Jason ^ permalink raw reply [flat|nested] 43+ messages in thread
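The two cleanup strategies in the pseudocode quoted in the message above (name-based vs. namespace-based removal of a PENDING group snapshot) can be made concrete. Below is a minimal runnable sketch of the namespace-based variant; the classes are illustrative stand-ins, not the real librbd/cls_rbd types:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(frozen=True)
class GroupSnapshotNamespace:
    """Back-reference stored on an image snapshot (cf. cls_rbd_types.h)."""
    cg_id: str
    cg_snap_id: str

@dataclass
class Snapshot:
    name: str
    namespace: Optional[GroupSnapshotNamespace] = None  # None == user snapshot

@dataclass
class Image:
    name: str
    snaps: List[Snapshot] = field(default_factory=list)

def remove_pending_cg_snap(cg_id, cg_snap_id, images):
    """Remove the image snapshots belonging to a pending group snapshot by
    matching the GroupSnapshotNamespace back-reference, instead of relying
    on a synthesized <image.id>_<cgsnap.id>_<cg.id> snapshot name."""
    for image in images:
        image.snaps = [
            s for s in image.snaps
            if not (s.namespace is not None
                    and s.namespace.cg_id == cg_id
                    and s.namespace.cg_snap_id == cg_snap_id)
        ]
```

Note how renaming a snapshot is harmless here: the lookup key is the immutable namespace back-reference, which is exactly why the name-based scheme (and its rename prohibition) becomes unnecessary.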
* Re: Snapshots of consistency groups 2017-01-12 1:10 ` Victor Denisov @ 2017-01-12 1:36 ` Jason Dillaman 0 siblings, 0 replies; 43+ messages in thread From: Jason Dillaman @ 2017-01-12 1:36 UTC (permalink / raw) To: Victor Denisov; +Cc: Mykola Golub, ceph-devel Given that specific use-case, yes, you can switch into and out of a snapshot whenever you please using the snap_set API. Of course, the snapshot API methods will need to be expanded to support namespaces. On Wed, Jan 11, 2017 at 8:10 PM, Victor Denisov <vdenisov@mirantis.com> wrote: > If I understand correctly I can't open required snapshot immediately > if I know only its namespace. > > I need to get the whole list of snapshots first and then find mine > using its namespace. > After that I want to redirect the opened image to my snapshot. > > On Wed, Jan 11, 2017 at 4:18 PM, Jason Dillaman <jdillama@redhat.com> wrote: >> You can specify the snapshot at image open as well. There is no need to refresh. >> >> On Wed, Jan 11, 2017 at 7:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>> Can I open an image and then redirect it to one of its snapshots? >>> >>> I see that there is a snap_set method. >>> Do I understand correctly that I need to: >>> open the image >>> invoke snap_set method >>> invoke refresh method >>> >>> Thanks in advance. >>> V. >>> >>> On Wed, Jan 4, 2017 at 10:39 AM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>> I understood. Thanks! >>>> >>>> On Wed, Jan 4, 2017 at 10:34 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>> The loop isn't that big of a deal -- but you could eliminate it >>>>> entirely if you just index the in-memory snapshot table via the >>>>> SnapshotNamespace variant instead of just indexing snapshots by name >>>>> (e.g. ImageCtx::snap_ids key switches from a string to a namespace). >>>>> This would be required anyway since you might otherwise have duplicate >>>>> names between namespaces. 
>>>>> >>>>> On Wed, Jan 4, 2017 at 1:29 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>> It looks like next CDM is only next month. Let's try to figure it out in email. >>>>>> >>>>>>> Since you know which images are linked to the group and you know which >>>>>>> snapshots are in the group and which group snapshots are in the image, >>>>>>> you can reconcile any issues using the details in the >>>>>>> GroupSnapshotNamespace -- there shouldn't be any need to depend on the >>>>>>> actual snapshot name (it could technically just be a randomly assigned >>>>>>> UUID). >>>>>> >>>>>> Let's say I have a consistency group and a snapshot of this group: CG >>>>>> and CGSNAP. >>>>>> Images in this snapshot I'll define as: >>>>>> CG.images[0] - image1 >>>>>> CG.images[1] - image2 >>>>>> CG.images[2] - image3 >>>>>> >>>>>> Image snapshots in cg snapshot will be: >>>>>> CG.CGSNAP.snaps[0] - reference to snapshot of image 1 >>>>>> CG.CGSNAP.snaps[1] - reference to snapshot of image 2 >>>>>> >>>>>> Imagine that this snapshot was created, but wasn't finalized. >>>>>> CG.CGSNAP.state == PENDING. >>>>>> CG.CGSNAP.snaps.length == 0; >>>>>> I'll be writing in pseudo code. >>>>>> >>>>>> Now, let's say we want to remove this pending CGSNAP. 
This is the code >>>>>> how it's currently implemented: >>>>>> >>>>>> for (image: CG.images) { >>>>>> snap_name = image.id + "_" + CG.CGSNAP.id + "_" + CG.id // This name >>>>>> is unique because of uniqueness of the tuple (image.id, CG.CGSNAP.id, >>>>>> CG.id) >>>>>> remove_image_snapshot(snap_name); >>>>>> } >>>>>> remove_cg_snap(CGSNAP); >>>>>> >>>>>> However, if we don't rely on the name then this is how I envision the code: >>>>>> >>>>>> for (image: CG.images) { >>>>>> for (snap: image.snaps) { >>>>>> if (snap.namespace.cg_id == CG.id && snap.namespace.cg_snap_id == >>>>>> CG.CGSNAP.id) { // this is our snapshot >>>>>> remove_image_snapshot(snap.name); >>>>>> } >>>>>> } >>>>>> } >>>>>> remove_cg_snap(CGSNAP); >>>>>> >>>>>> In this solution I don't like the internal loop. >>>>>> What do you think? >>>>>> >>>>>> Thanks, >>>>>> Victor. >>>>>> >>>>>> >>>>>> On Tue, Jan 3, 2017 at 6:56 PM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>> After starting the process of creating a group snapshot, you will >>>>>>> already have all the necessary data for the group snapshot namespace >>>>>>> [1] (group pool, group id, and group snapshot id) and the group >>>>>>> snapshot should be persistently recorded to disk as >>>>>>> GROUP_SNAPSHOT_STATE_PENDING. >>>>>>> >>>>>>> Looking at the snapshot create state machine [2], I don't see any >>>>>>> place that a crash (or similar failure) would matter before the actual >>>>>>> image snapshot record is created atomically. You would pass the fully >>>>>>> populated GroupSnapshotNamespace to snap_create, and if the snapshot >>>>>>> is created, it's linked to the group via that namespace and any >>>>>>> failures afterwards don't matter since they are linked -- if the >>>>>>> snapshot fails to be created, it isn't linked to the group but the >>>>>>> snapshot doesn't exist either so there isn't anything to clean up. 
>>>>>>> >>>>>>> Since you know which images are linked to the group and you know which >>>>>>> snapshots are in the group and which group snapshots are in the image, >>>>>>> you can reconcile any issues using the details in the >>>>>>> GroupSnapshotNamespace -- there shouldn't be any need to depend on the >>>>>>> actual snapshot name (it could technically just be a randomly assigned >>>>>>> UUID). >>>>>>> >>>>>>> Perhaps we could talk about this at a future RBD standup meeting that >>>>>>> you are able to join (or the next CDM). >>>>>>> >>>>>>> [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L249 >>>>>>> [2] https://github.com/ceph/ceph/blob/master/src/librbd/operation/SnapshotCreateRequest.h#L28 >>>>>>> >>>>>>> On Tue, Jan 3, 2017 at 7:40 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>> Let's say we start creating a group snapshot. >>>>>>>> We invoke async snap_create method in Operations class. >>>>>>>> When we invoke this method we provide it with the snapshot name. >>>>>>>> >>>>>>>> While we are wating for the response we can be aborted. >>>>>>>> As a result we will be able to find the exact image snapshot using only its name >>>>>>>> as this was the only information we had at the time of running >>>>>>>> snap_create method. >>>>>>>> >>>>>>>> If snap_create was successful we will be able to find the snapshot >>>>>>>> otherwise we will not. >>>>>>>> However if we allow renaming snapshots from GroupSnapshotNamespace >>>>>>>> then we may not find the snapshot even if it >>>>>>>> was created successfully. >>>>>>>> >>>>>>>> On Fri, Dec 16, 2016 at 6:53 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>> Can you give a little background on this specific inconsistent case >>>>>>>>> you are referring to? 
>>>>>>>>> >>>>>>>>> On Thu, Dec 15, 2016 at 7:05 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>> Yes, but if image's snapshot is renamed then I'm not able to find this >>>>>>>>>> snapshot having only group's snapshot in an inconsistent state for >>>>>>>>>> example. >>>>>>>>>> >>>>>>>>>> V. >>>>>>>>>> >>>>>>>>>> On Thu, Dec 15, 2016 at 7:10 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>> I think I might be confused. When creating a group snapshot, we have >>>>>>>>>>> the ConsistencyGroupSnapshot that allows you to store the necessary >>>>>>>>>>> linkage between the image's snapshot and its associated group snapshot >>>>>>>>>>> [1]. Why not just name the image's snapshots to the same name as the >>>>>>>>>>> parent group snapshot name and search the snapshot's metadata to get >>>>>>>>>>> the linkage? >>>>>>>>>>> >>>>>>>>>>> [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd_types.h#L255 >>>>>>>>>>> >>>>>>>>>>> On Tue, Dec 13, 2016 at 8:03 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>> Jason, >>>>>>>>>>>> >>>>>>>>>>>> My current implementation of consistency group snapshot feature names >>>>>>>>>>>> image snapshots like: <group_pool>_<group_id>_<group_snap_id> >>>>>>>>>>>> I rely on this fact when I need to remove a consistency group. It's >>>>>>>>>>>> necessary because if some of image snapshots were created, but the >>>>>>>>>>>> whole group snapshot operation failed, >>>>>>>>>>>> then the only way to find those dangling image snapshots is by this name. >>>>>>>>>>>> >>>>>>>>>>>> It means that we should forbid renaming snapshots from >>>>>>>>>>>> ConsistencyGroupSnapshot namespace. >>>>>>>>>>>> >>>>>>>>>>>> Another option is to allocate image snapshot ids during the creation >>>>>>>>>>>> of group snapshot, but this requires a major rewrite of the whole >>>>>>>>>>>> process of snapshot creation for images. >>>>>>>>>>>> >>>>>>>>>>>> What is your opinion on this? >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> V. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Nov 1, 2016 at 8:01 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>> I chatted with Xing on IRC this morning re: Cinder generic groups. It >>>>>>>>>>>>> sounds like RBD will need to support preserving the image's >>>>>>>>>>>>> consistency group snapshots even if the image is removed from the >>>>>>>>>>>>> group. In the OpenStack case, you won't have to worry about the image >>>>>>>>>>>>> being deleted while it still has associated group snapshots. >>>>>>>>>>>>> >>>>>>>>>>>>> We will also want to support being able to clone child images from a >>>>>>>>>>>>> group snapshot to ensure that we can thin provision new groups volumes >>>>>>>>>>>>> when creating a new group from a group snapshot. This means that the >>>>>>>>>>>>> group snapshots should be able to be protected/unprotected just like >>>>>>>>>>>>> standard user snapshots. >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Oct 31, 2016 at 9:07 AM, Jason Dillaman <jdillama@redhat.com> wrote: >>>>>>>>>>>>>> Looking at the Cinder codebase, I don't see any such restriction that >>>>>>>>>>>>>> would prevent you from removing a volume from a consistency group that >>>>>>>>>>>>>> has associated snapshots. I would double-check on the OpenStack >>>>>>>>>>>>>> development mailing list if this is correct and is the intent. Worst >>>>>>>>>>>>>> case, the RBD driver could raise an exception if there are still >>>>>>>>>>>>>> consistency group snapshots associated to the image. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Oct 28, 2016 at 6:41 PM, Victor Denisov <vdenisov@mirantis.com> wrote: >>>>>>>>>>>>>>> Another thing that bothers me. When we remove an image from a consistency group. >>>>>>>>>>>>>>> Should we remove all snapshots of this image that were created as part >>>>>>>>>>>>>>> of a consistency group snapshot? 
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The easiest solution would be to remove all snapshots that are in
>>>>>>>>>>>>>>> the GroupSnapshotNamespace and reference this consistency group.
>>>>>>>>>>>>>>> I looked into the Cinder docs for this feature:
>>>>>>>>>>>>>>> http://docs.openstack.org/admin-guide/blockstorage-consistency-groups.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But it's not clear to me which behavior Cinder expects.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> V.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 26, 2016 at 6:16 AM, Jason Dillaman <jdillama@redhat.com> wrote:
>>>>>>>>>>>>>>>> In a perfect world, it would be nice to add a new option to "rbd
>>>>>>>>>>>>>>>> snap ls" to show all snapshots (with a new column to indicate
>>>>>>>>>>>>>>>> the associated namespace).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 25, 2016 at 11:07 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>>>>>>>>>>>>>>>>> Question: when we print out the snapshots of an image, should
>>>>>>>>>>>>>>>>> the group snapshots be listed, or should they be marked as
>>>>>>>>>>>>>>>>> special snapshots?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> V.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Oct 10, 2016 at 3:14 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>>>>>>>>>>>>>>>>>> Ok. I didn't have any intention to throw exceptions.
>>>>>>>>>>>>>>>>>> I was more concerned about whether it's ok to allocate and
>>>>>>>>>>>>>>>>>> delete objects directly, or whether I should use smart
>>>>>>>>>>>>>>>>>> pointers.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Oct 10, 2016 at 7:18 AM, Jason Dillaman <jdillama@redhat.com> wrote:
>>>>>>>>>>>>>>>>>>> The only place exceptions are routinely used is within the
>>>>>>>>>>>>>>>>>>> "::decode" functions. I would prefer to see the code not
>>>>>>>>>>>>>>>>>>> throwing new exceptions on purpose.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Oct 7, 2016 at 2:26 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>>>>>>>>>>>>>>>>>>>> Are any exceptions used in the librbd code?
>>>>>>>>>>>>>>>>>>>> Should the code be exception-safe?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> V.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Sep 16, 2016 at 10:37 AM, Jason Dillaman <jdillama@redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 15, 2016 at 7:17 PM, Victor Denisov <vdenisov@mirantis.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> if (struct_v >= 5) {
>>>>>>>>>>>>>>>>>>>>>>   ::decode(snapshot_namespace, p);
>>>>>>>>>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>>>>>>>>>   snapshot_namespace = cls::rbd::UserSnapshotNamespace();
>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Then the code for the ::encode function of cls_rbd_snap
>>>>>>>>>>>>>>>>>>>>>> would change accordingly: instead of
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> boost::apply_visitor(cls::rbd::EncodeSnapshotTypeVisitor(bl),
>>>>>>>>>>>>>>>>>>>>>>                      snapshot_namespace);
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I would do:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ::encode(snapshot_namespace, bl);
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> +1 -- looks good to me
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Jason

--
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread
end of thread, other threads:[~2017-01-12  1:36 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-15 23:46 Snapshots of consistency groups Victor Denisov
2016-08-16 13:26 ` Jason Dillaman
2016-08-16 13:29   ` Jason Dillaman
2016-08-18 23:26     ` Victor Denisov
     [not found]       ` <CA+aFP1BMK1=daZpUFdNDJ+i8+nfQt42ahCXVaxy9fWdAY-kXwA@mail.gmail.com>
2016-08-19  4:20         ` Victor Denisov
2016-08-19 12:48           ` Mykola Golub
2016-08-20  0:36             ` Victor Denisov
2016-08-20 16:27               ` Mykola Golub
2016-08-26 21:05                 ` Victor Denisov
2016-08-29  0:37                   ` Jason Dillaman
2016-08-29 21:10                     ` Victor Denisov
2016-09-09 22:41                       ` Victor Denisov
2016-09-10 18:37                         ` Jason Dillaman
2016-09-11  1:46                           ` Victor Denisov
2016-09-12 13:18                             ` Jason Dillaman
2016-09-12 22:52                               ` Victor Denisov
2016-09-12 23:06                                 ` Victor Denisov
2016-09-12 23:09                                   ` Victor Denisov
2016-09-13 12:33                                     ` Jason Dillaman
2016-09-13 23:41                                       ` Victor Denisov
2016-09-15 23:17                                         ` Victor Denisov
2016-09-16 17:37                                           ` Jason Dillaman
2016-10-07 18:26                                             ` Victor Denisov
2016-10-10 14:18                                               ` Jason Dillaman
2016-10-10 22:14                                                 ` Victor Denisov
2016-10-26  3:07                                                   ` Victor Denisov
2016-10-26 13:16                                                     ` Jason Dillaman
2016-10-28 22:41                                                       ` Victor Denisov
2016-10-31 13:07                                                         ` Jason Dillaman
2016-11-01 15:01                                                           ` Jason Dillaman
2016-12-14  1:03                                                             ` Victor Denisov
2016-12-15 15:10                                                               ` Jason Dillaman
2016-12-16  0:05                                                                 ` Victor Denisov
2016-12-16 14:53                                                                   ` Jason Dillaman
2017-01-04  0:40                                                                     ` Victor Denisov
2017-01-04  2:56                                                                       ` Jason Dillaman
2017-01-04 18:29                                                                         ` Victor Denisov
2017-01-04 18:34                                                                           ` Jason Dillaman
2017-01-04 18:39                                                                             ` Victor Denisov
2017-01-12  0:14                                                                               ` Victor Denisov
2017-01-12  0:18                                                                                 ` Jason Dillaman
2017-01-12  1:10                                                                                   ` Victor Denisov
2017-01-12  1:36                                                                                     ` Jason Dillaman