* Re: [PATCH] xen-blkback: Switch to closed state after releasing the backing device
@ 2018-09-05 16:27 ` Valentin Vidic
0 siblings, 0 replies; 80+ messages in thread
From: Valentin Vidic @ 2018-09-05 16:27 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Konrad Rzeszutek Wilk, Jens Axboe, xen-devel, linux-block,
linux-kernel, stable, drbd-user
On Wed, Sep 05, 2018 at 12:36:49PM +0200, Roger Pau Monné wrote:
> On Wed, Aug 29, 2018 at 08:52:14AM +0200, Valentin Vidic wrote:
> > Switching to closed state earlier can cause the block-drbd
> > script to fail with 'Device is held open by someone':
> >
> > root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
> > kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
> > kernel: [ 2222.278304] block drbd6: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> > kernel: [ 2222.278340] block drbd6: wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
> > root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
> > root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
> >
> > Signed-off-by: Valentin Vidic <Valentin.Vidic@CARNet.hr>
> > Cc: stable@vger.kernel.org
> > ---
> > drivers/block/xen-blkback/xenbus.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> > index a4bc74e72c39..43bddc996709 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -323,6 +323,7 @@ static void xen_blkif_free(struct xen_blkif *blkif)
> > {
> > WARN_ON(xen_blkif_disconnect(blkif));
> > xen_vbd_free(&blkif->vbd);
> > + xenbus_switch_state(blkif->be->dev, XenbusStateClosed);
> > kfree(blkif->be->mode);
> > kfree(blkif->be);
> >
> > @@ -814,7 +815,6 @@ static void frontend_changed(struct xenbus_device *dev,
> >
> > case XenbusStateClosed:
> > xen_blkif_disconnect(be->blkif);
> > - xenbus_switch_state(dev, XenbusStateClosed);
> > if (xenbus_dev_is_online(dev))
> > break;
>
> AFAICT, this will cause the backend to never switch to 'Closed' state
> until the toolstack sets online to 0, which is not good IMO.
>
> If for example a frontend decides to close a device, the backend will
> stay in state 'Closing' until the toolstack actually removes the disk
> by setting online to 0.
>
> This will prevent resetting blk connections, as blkback will refuse to
> switch to state XenbusStateInitWait unless it's at XenbusStateClosed
> (see the XenbusStateInitialising case in frontend_changed), which will
> never be reached with your patch.
Ok, is it possible to test this somehow?
> Maybe the easiest solution would be to wait in the block-drbd script
> until the device is released? Maybe using fstat in a loop or one of
> the drbd tools?
That is an option, but I don't know if it is possible to check device
release like that. Including drbd-user in CC, maybe they have some
idea how to solve this.
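A possible shape for that check, in the spirit of Roger's "fstat in a loop" suggestion, is sketched below as a shell function. This is not taken from the actual block-drbd script; the function name and the polling strategy are assumptions.

```shell
# Hypothetical helper, not part of block-drbd: list PIDs that still
# have the given path open by scanning /proc/<pid>/fd.  A hotplug
# script could poll this in a loop (with a timeout) before demoting
# the DRBD device.
device_openers() {
    for fd in /proc/[0-9]*/fd/*; do
        # readlink fails silently for processes we cannot inspect
        [ "$(readlink "$fd" 2>/dev/null)" = "$1" ] &&
            echo "$fd" | cut -d/ -f3
    done | sort -u
}

# Example polling loop (device path from the log above, as illustration):
# while [ -n "$(device_openers /dev/drbd6)" ]; do sleep 1; done
```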
--
Valentin
^ permalink raw reply [flat|nested] 80+ messages in thread
* Re: [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-05 16:27 ` Valentin Vidic
@ 2018-09-06 16:14 ` Roger Pau Monné
-1 siblings, 0 replies; 80+ messages in thread
From: Roger Pau Monné @ 2018-09-06 16:14 UTC (permalink / raw)
To: Valentin Vidic
Cc: Jens Axboe, linux-kernel, stable, linux-block, xen-devel, drbd-user
On Wed, Sep 05, 2018 at 06:27:56PM +0200, Valentin Vidic wrote:
> On Wed, Sep 05, 2018 at 12:36:49PM +0200, Roger Pau Monné wrote:
> > On Wed, Aug 29, 2018 at 08:52:14AM +0200, Valentin Vidic wrote:
> > > Switching to closed state earlier can cause the block-drbd
> > > script to fail with 'Device is held open by someone':
> > >
> > > root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
> > > kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
> > > kernel: [ 2222.278304] block drbd6: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> > > kernel: [ 2222.278340] block drbd6: wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
> > > root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
> > > root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
> > >
> > > Signed-off-by: Valentin Vidic <Valentin.Vidic@CARNet.hr>
> > > Cc: stable@vger.kernel.org
> > > ---
> > > drivers/block/xen-blkback/xenbus.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> > > index a4bc74e72c39..43bddc996709 100644
> > > --- a/drivers/block/xen-blkback/xenbus.c
> > > +++ b/drivers/block/xen-blkback/xenbus.c
> > > @@ -323,6 +323,7 @@ static void xen_blkif_free(struct xen_blkif *blkif)
> > > {
> > > WARN_ON(xen_blkif_disconnect(blkif));
> > > xen_vbd_free(&blkif->vbd);
> > > + xenbus_switch_state(blkif->be->dev, XenbusStateClosed);
> > > kfree(blkif->be->mode);
> > > kfree(blkif->be);
> > >
> > > @@ -814,7 +815,6 @@ static void frontend_changed(struct xenbus_device *dev,
> > >
> > > case XenbusStateClosed:
> > > xen_blkif_disconnect(be->blkif);
> > > - xenbus_switch_state(dev, XenbusStateClosed);
> > > if (xenbus_dev_is_online(dev))
> > > break;
> >
> > AFAICT, this will cause the backend to never switch to 'Closed' state
> > until the toolstack sets online to 0, which is not good IMO.
> >
> > If for example a frontend decides to close a device, the backend will
> > stay in state 'Closing' until the toolstack actually removes the disk
> > by setting online to 0.
> >
> > This will prevent resetting blk connections, as blkback will refuse to
> > switch to state XenbusStateInitWait unless it's at XenbusStateClosed
> > (see the XenbusStateInitialising case in frontend_changed), which will
> > never be reached with your patch.
>
> Ok, is it possible to test this somehow?
Yes, you can try booting a PV guest with pvgrub: that will cause
pvgrub to open a blk connection and then close it; afterwards the OS
kernel will start a new connection.
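For reference, a pvgrub test domain could be configured roughly like this. The grub binary path and the DRBD disk syntax are assumptions and vary by distribution and toolstack version:

```
name   = "pvgrub-test"
type   = "pv"
kernel = "/usr/lib/grub-xen/grub-x86_64-xen.bin"
memory = 512
disk   = [ "vdev=xvda,access=rw,script=block-drbd,target=r0" ]
```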
Thanks, Roger.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-06 16:14 ` Roger Pau Monné
@ 2018-09-06 22:03 ` Valentin Vidic
-1 siblings, 0 replies; 80+ messages in thread
From: Valentin Vidic @ 2018-09-06 22:03 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Konrad Rzeszutek Wilk, Jens Axboe, xen-devel, linux-block,
linux-kernel, stable, drbd-user
On Thu, Sep 06, 2018 at 06:14:21PM +0200, Roger Pau Monné wrote:
> On Wed, Sep 05, 2018 at 06:27:56PM +0200, Valentin Vidic wrote:
> > On Wed, Sep 05, 2018 at 12:36:49PM +0200, Roger Pau Monné wrote:
> > > On Wed, Aug 29, 2018 at 08:52:14AM +0200, Valentin Vidic wrote:
> > > > Switching to closed state earlier can cause the block-drbd
> > > > script to fail with 'Device is held open by someone':
> > > >
> > > > root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
> > > > kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
> > > > kernel: [ 2222.278304] block drbd6: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> > > > kernel: [ 2222.278340] block drbd6: wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
> > > > root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
> > > > root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
> > > >
> > > > Signed-off-by: Valentin Vidic <Valentin.Vidic@CARNet.hr>
> > > > Cc: stable@vger.kernel.org
> > > > ---
> > > > drivers/block/xen-blkback/xenbus.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> > > > index a4bc74e72c39..43bddc996709 100644
> > > > --- a/drivers/block/xen-blkback/xenbus.c
> > > > +++ b/drivers/block/xen-blkback/xenbus.c
> > > > @@ -323,6 +323,7 @@ static void xen_blkif_free(struct xen_blkif *blkif)
> > > > {
> > > > WARN_ON(xen_blkif_disconnect(blkif));
> > > > xen_vbd_free(&blkif->vbd);
> > > > + xenbus_switch_state(blkif->be->dev, XenbusStateClosed);
> > > > kfree(blkif->be->mode);
> > > > kfree(blkif->be);
> > > >
> > > > @@ -814,7 +815,6 @@ static void frontend_changed(struct xenbus_device *dev,
> > > >
> > > > case XenbusStateClosed:
> > > > xen_blkif_disconnect(be->blkif);
> > > > - xenbus_switch_state(dev, XenbusStateClosed);
> > > > if (xenbus_dev_is_online(dev))
> > > > break;
> > >
> > > AFAICT, this will cause the backend to never switch to 'Closed' state
> > > until the toolstack sets online to 0, which is not good IMO.
> > >
> > > If for example a frontend decides to close a device, the backend will
> > > stay in state 'Closing' until the toolstack actually removes the disk
> > > by setting online to 0.
> > >
> > > This will prevent resetting blk connections, as blkback will refuse to
> > > switch to state XenbusStateInitWait unless it's at XenbusStateClosed
> > > (see the XenbusStateInitialising case in frontend_changed), which will
> > > never be reached with your patch.
> >
> > Ok, is it possible to test this somehow?
>
> Yes, you can try booting a PV guest with pvgrub, that will cause
> pvgrub to open a blk connection, then close it and afterwards the OS
> kernel will start a new connection.
Indeed the boot hangs after pvgrub loads the kernel and initrd, so it
seems this does not work as expected.
--
Valentin
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-05 16:27 ` Valentin Vidic
@ 2018-09-07 12:03 ` Lars Ellenberg
-1 siblings, 0 replies; 80+ messages in thread
From: Lars Ellenberg @ 2018-09-07 12:03 UTC (permalink / raw)
To: Valentin Vidic
Cc: Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel, stable,
linux-block, xen-devel, drbd-user, Roger Pau Monné
On Wed, Sep 05, 2018 at 06:27:56PM +0200, Valentin Vidic wrote:
> On Wed, Sep 05, 2018 at 12:36:49PM +0200, Roger Pau Monné wrote:
> > On Wed, Aug 29, 2018 at 08:52:14AM +0200, Valentin Vidic wrote:
> > > Switching to closed state earlier can cause the block-drbd
> > > script to fail with 'Device is held open by someone':
> > >
> > > root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
> > > kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
> > > kernel: [ 2222.278304] block drbd6: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> > > kernel: [ 2222.278340] block drbd6: wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
> > > root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
> > > root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
Very frequently it is *NOT* the "original user" that "still" holds it
open, but udev, or something triggered by udev.
So double-checking the udev rules,
or the "lvm global_filter" settings may help.
You could instrument DRBD to log current->{pid,comm} on open and close,
so you can better detect who the "someone" is in the message above.
Adding a small retry loop in the script may help as well.
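That retry loop might look like the following sketch; the helper name, the demotion command, and the retry count are assumptions, not code from the thread:

```shell
# Hypothetical retry wrapper for a hotplug script: retry a demotion
# command a few times, since a transient opener such as udev typically
# releases the device within a second or two.
demote_with_retry() {
    cmd=$1       # command to run, e.g. "drbdadm secondary r0" (illustrative)
    attempts=$2  # how many times to try before giving up
    while [ "$attempts" -gt 0 ]; do
        if $cmd; then
            return 0                       # demotion succeeded
        fi
        attempts=$((attempts - 1))
        [ "$attempts" -gt 0 ] && sleep 1   # give the opener time to close
    done
    return 1                               # still held open after all retries
}

# Usage in the script might be:
# demote_with_retry "drbdadm secondary r0" 10 || exit 1
```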
As a side note,
with DRBD 9 (which still moves too fast to be "in tree")
we have the "auto-promote" feature, where no explicit promotion/demotion
is necessary anymore, but DRBD (tries to) promote itself with the
first "RW" opener, and demote itself once the last opener is gone.
Hth,
Lars
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-05 16:27 ` Valentin Vidic
@ 2018-09-07 12:03 ` Lars Ellenberg
-1 siblings, 0 replies; 80+ messages in thread
From: Lars Ellenberg @ 2018-09-07 12:03 UTC (permalink / raw)
To: Valentin Vidic
Cc: Roger Pau Monné,
Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel, stable,
linux-block, xen-devel, drbd-user
On Wed, Sep 05, 2018 at 06:27:56PM +0200, Valentin Vidic wrote:
> On Wed, Sep 05, 2018 at 12:36:49PM +0200, Roger Pau Monn� wrote:
> > On Wed, Aug 29, 2018 at 08:52:14AM +0200, Valentin Vidic wrote:
> > > Switching to closed state earlier can cause the block-drbd
> > > script to fail with 'Device is held open by someone':
> > >
> > > root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
> > > kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
> > > kernel: [ 2222.278304] block drbd6: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> > > kernel: [ 2222.278340] block drbd6: wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
> > > root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
> > > root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
Very frequently it is *NOT* the "original user", that "still" holds it
open, but udev, or something triggered-by-udev.
So double-checking the udev rules,
or the "lvm global_filter" settings may help.
You could instrument DRBD to log current->{pid,comm} on open and close,
so you can better detect who the "someone" is in the message above.
Adding a small retry loop in the script may help as well.
As a side note,
with DRBD 9 (which still moves too fast to be "in tree")
we have the "auto-promote" feature, where no explicit promotion/demotion
is necessary anymore, but DRBD (tries to) promote itself with the
first "RW" opener, and demote itself once the last opener is gone.
Hth,
Lars
^ permalink raw reply [flat|nested] 80+ messages in thread
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
@ 2018-09-07 12:03 ` Lars Ellenberg
0 siblings, 0 replies; 80+ messages in thread
From: Lars Ellenberg @ 2018-09-07 12:03 UTC (permalink / raw)
To: Valentin Vidic
Cc: Roger Pau Monné,
Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel, stable,
linux-block, xen-devel, drbd-user
On Wed, Sep 05, 2018 at 06:27:56PM +0200, Valentin Vidic wrote:
> On Wed, Sep 05, 2018 at 12:36:49PM +0200, Roger Pau Monné wrote:
> > On Wed, Aug 29, 2018 at 08:52:14AM +0200, Valentin Vidic wrote:
> > > Switching to closed state earlier can cause the block-drbd
> > > script to fail with 'Device is held open by someone':
> > >
> > > root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
> > > kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
> > > kernel: [ 2222.278304] block drbd6: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> > > kernel: [ 2222.278340] block drbd6: wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
> > > root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
> > > root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
Very frequently it is *NOT* the "original user" that "still" holds it
open, but udev, or something triggered by udev.
So double-checking the udev rules,
or the "lvm global_filter" settings may help.
You could instrument DRBD to log current->{pid,comm} on open and close,
so you can better detect who the "someone" is in the message above.
Adding a small retry loop in the script may help as well.
As a side note,
with DRBD 9 (which still moves too fast to be "in tree")
we have the "auto-promote" feature, where no explicit promotion/demotion
is necessary anymore, but DRBD (tries to) promote itself with the
first "RW" opener, and demote itself once the last opener is gone.
Hth,
Lars
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-07 12:03 ` Lars Ellenberg
(?)
@ 2018-09-07 12:13 ` Valentin Vidic
-1 siblings, 0 replies; 80+ messages in thread
From: Valentin Vidic @ 2018-09-07 12:13 UTC (permalink / raw)
To: drbd-user
Cc: Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel, stable,
linux-block, xen-devel, Roger Pau Monné
On Fri, Sep 07, 2018 at 02:03:37PM +0200, Lars Ellenberg wrote:
> Very frequently it is *NOT* the "original user", that "still" holds it
> open, but udev, or something triggered-by-udev.
>
> So double-checking the udev rules,
> or the "lvm global_filter" settings may help.
> You could instrument DRBD to log current->{pid,comm} on open and close,
> so you can better detect who the "someone" is in the message above.
I don't think there is anything else holding the device open, because it
is possible to change the state to Secondary a few seconds later. But I
will try to print those values in case anything interesting comes up.
> Adding a small retry loop in the script may help as well.
Yes, that is an option, but it would still leave those nasty "State
change failed" messages in the log. I guess there is no way to check
the value of DRBD device->open_cnt from userspace?
--
Valentin
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-07 12:13 ` Valentin Vidic
@ 2018-09-07 13:28 ` Lars Ellenberg
2018-09-07 13:28 ` Lars Ellenberg
1 sibling, 0 replies; 80+ messages in thread
From: Lars Ellenberg @ 2018-09-07 13:28 UTC (permalink / raw)
To: Valentin Vidic
Cc: Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel, stable,
linux-block, xen-devel, drbd-user, Roger Pau Monné
On Fri, Sep 07, 2018 at 02:13:48PM +0200, Valentin Vidic wrote:
> On Fri, Sep 07, 2018 at 02:03:37PM +0200, Lars Ellenberg wrote:
> > Very frequently it is *NOT* the "original user", that "still" holds it
> > open, but udev, or something triggered-by-udev.
> >
> > So double-checking the udev rules,
> > or the "lvm global_filter" settings may help.
> > You could instrument DRBD to log current->{pid,comm} on open and close,
> > so you can better detect who the "someone" is in the message above.
>
> Don't think there is anything else holding the device open, because it
> is possible to change state to Secondary a few seconds later. But I will
> try to print those values in case anything interesting comes up.
>
> > Adding a small retry loop in the script may help as well.
>
> Yes, that is an option, but it would still leave those nasty "State
> change failed" messages in the log. I guess there is no way to check
> the value of DRBD device->open_cnt from userspace?
We don't expose that, no.
But even if we did, that would not be race-free :-)
The last (or even: any?) "close" of a block device that used to be open
for WRITE triggers a udev "change" event, thus a udev run,
and the minimal action will be some read-only open and ioctl from
(systemd-)udev itself, more likely there also will be blkid and possibly
pvscan and similar actions. All of them should be read-only openers,
and all of them should be "short".
But they will race with the drbd demotion.
You can watch that happen with
# udevadm monitor &
then open and close for write:
# : > /dev/sda
(or any other block device ...)
To exercise the drbd vs udev race,
(you can leave the udevadm monitor running if you want):
## clear dmesg
# dmesg -c > /dev/null
## promote
# drbdadm primary s0 ;
## open/close for write
# : > /dev/drbd0 ;
## the close has triggered a udev run,
## which will now race with demotion,
## which will "sometimes" fail:
# drbdadm secondary s0 ;
## wait a bit, and retry
# sleep 1; drbdadm secondary s0 ; dmesg
### --- example output on some test box right now: ----------------
root@ava:~# udevadm monitor &
...
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent
...
root@ava:~# dmesg -c > /dev/null; drbdadm primary s0 ; : > /dev/drbd0 ; drbdadm secondary s0 ; sleep 1; drbdadm secondary s0 ; dmesg
KERNEL[609638.990320] change /devices/virtual/block/drbd0 (block)
KERNEL[609638.991364] change /devices/virtual/block/drbd0 (block)
UDEV [609639.008879] change /devices/virtual/block/drbd0 (block)
0: State change failed: (-12) Device is held open by someone
Command 'drbdsetup-84 secondary 0' terminated with exit code 11
UDEV [609639.011652] change /devices/virtual/block/drbd0 (block)
KERNEL[609640.017356] change /devices/virtual/block/drbd0 (block)
KERNEL[609640.018074] change /devices/virtual/block/drbd0 (block)
[609613.882751] block drbd0: role( Secondary -> Primary )
[609613.889998] block drbd0: State change failed: Device is held open by someone
[609613.894280] block drbd0: state = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/Outdated r----- }
[609613.897609] block drbd0: wanted = { cs:WFConnection ro:Secondary/Unknown ds:UpToDate/Outdated r----- }
[609614.909537] block drbd0: role( Primary -> Secondary )
[609614.909662] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
UDEV [609640.024588] change /devices/virtual/block/drbd0 (block)
UDEV [609640.028731] change /devices/virtual/block/drbd0 (block)
root@ava:~#
### --------------------------------------------------------------
Obviously change s0 and drbd0 appropriately; hitting exactly the time
window where udev has the device open may be tricky, and on a busy system
even the sleep 1 may not be enough (so even the second invocation may
still fail), while on an idle system udev may already be done, or may not
even have started yet when the first "secondary" runs.
Your best bet is to review your udev rules,
and make sure "drbd*" will not be looked at
by blkid or pvscan or similar.
See also: https://github.com/systemd/systemd/commit/fee854ee8ccde
respectively https://github.com/systemd/systemd/issues/9371
to get rid even of the unconditional open by systemd-udevd itself.
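For illustration only (the file names and the udev flag used here are
assumptions, not something from this thread; merge with whatever filter
entries you already have), the filter advice above could look roughly like:

```
# /etc/lvm/lvm.conf fragment: keep lvm (pvscan) from ever opening
# drbd minors.
devices {
    global_filter = [ "r|^/dev/drbd.*|" ]
}

# Hypothetical /etc/udev/rules.d/60-no-scan-drbd.rules: ask udev's
# persistent-storage rules (blkid etc.) to skip drbd minors.
KERNEL=="drbd*", ENV{UDEV_DISABLE_PERSISTENT_STORAGE_RULES_FLAG}="1"
```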
Note,
I'm not saying blkback has no issue; I don't know.
I'm just pointing out that there are other things
that may cause the same effects.
Lars
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-07 13:28 ` Lars Ellenberg
@ 2018-09-07 16:45 ` Valentin Vidic
2018-09-07 16:45 ` Valentin Vidic
1 sibling, 0 replies; 80+ messages in thread
From: Valentin Vidic @ 2018-09-07 16:45 UTC (permalink / raw)
To: drbd-user
Cc: Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel, stable,
linux-block, xen-devel, Roger Pau Monné
On Fri, Sep 07, 2018 at 03:28:28PM +0200, Lars Ellenberg wrote:
> We don't expose that, no.
> But even if we did, that would not be racefree :-)
>
> The last (or even: any?) "close" of a block device that used to be open
> for WRITE triggeres a udev "change" event, thus a udev run,
> and the minimal action will be some read-only open and ioctl from
> (systemd-)udev itself, more likely there also will be blkid and possibly
> pvscan and similar actions. All of them should be read-only openers,
> and all of them should be "short".
> But they will race with the drbd demotion.
True, but I did not find any strange interaction with udev during VM shutdown.
This is what udevadm monitor reports:
KERNEL[1174.220256] remove /devices/vbd-10-51712 (xen-backend)
UDEV [1174.222484] remove /devices/vbd-10-51712 (xen-backend)
KERNEL[1174.224405] remove /devices/console-10-0 (xen-backend)
UDEV [1174.226964] remove /devices/console-10-0 (xen-backend)
KERNEL[1174.287215] change /devices/virtual/block/drbd0 (block)
KERNEL[1174.287267] change /devices/virtual/block/drbd0 (block)
UDEV [1174.295811] change /devices/virtual/block/drbd0 (block)
UDEV [1174.301983] change /devices/virtual/block/drbd0 (block)
Strace on the udev daemon gives only these:
[pid 7416] execve("/sbin/drbdadm", ["/sbin/drbdadm", "sh-udev", "minor-0"], [/* 10 vars */]) = 0
[pid 7416] execve("/lib/drbd/drbdadm-84", ["drbdadm", "sh-udev", "minor-0"], [/* 12 vars */]) = 0
[pid 7418] execve("/sbin/drbdadm", ["/sbin/drbdadm", "sh-udev", "minor-0"], [/* 9 vars */]) = 0
[pid 7418] execve("/lib/drbd/drbdadm-84", ["drbdadm", "sh-udev", "minor-0"], [/* 11 vars */]) = 0
and this should be 65-drbd.rules adding/removing the symlinks in /dev.
Adding a dump_stack() in drbd_release() shows two possible code paths,
both from xen_blkback, with the first one (run from the workqueue) being
the problematic one:
[ 530.698782] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G O 4.9.0-8-amd64 #1 Debian 4.9.110-3+deb9u4
[ 530.698783] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 09/12/2016
[ 530.698784] Workqueue: events xen_blkif_deferred_free [xen_blkback]
[ 530.698785] 0000000000000000 ffffffff81331e54 ffff883f82143800 ffff883f741b5660
[ 530.698787] ffffffffc03a309e ffff883f741b5580 ffffffff81245ca8 0000000281a186b5
[ 530.698789] 000000008935f100 8412ffa31a64cc4c 000000000000020a ffff883fa6218280
[ 530.698791] Call Trace:
[ 530.698792] [<ffffffff81331e54>] ? dump_stack+0x5c/0x78
[ 530.698805] [<ffffffffc03a309e>] ? drbd_release+0x1e/0x40 [drbd]
[ 530.698810] [<ffffffff81245ca8>] ? __blkdev_put+0x1e8/0x2a0
[ 530.698813] [<ffffffffc055d6b8>] ? xen_vbd_free.isra.9+0x48/0x60 [xen_blkback]
[ 530.698814] [<ffffffffc055d6f7>] ? xen_blkif_deferred_free+0x27/0x70 [xen_blkback]
[ 530.698816] [<ffffffff81092fea>] ? process_one_work+0x18a/0x420
[ 530.698817] [<ffffffff810932cd>] ? worker_thread+0x4d/0x490
[ 530.698818] [<ffffffff81093280>] ? process_one_work+0x420/0x420
[ 530.698820] [<ffffffff81099329>] ? kthread+0xd9/0xf0
[ 530.698822] [<ffffffff81099250>] ? kthread_park+0x60/0x60
[ 530.698823] [<ffffffff81615df7>] ? ret_from_fork+0x57/0x70
[ 1216.251924] CPU: 14 PID: 297 Comm: xenwatch Tainted: G O 4.9.0-8-amd64 #1 Debian 4.9.110-3+deb9u4
[ 1216.251925] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 09/12/2016
[ 1216.251926] 0000000000000000 ffffffff81331e54 ffff883f82143800 ffff883f741b5660
[ 1216.251928] ffffffffc03a309e ffff883f741b5580 ffffffff81245ca8 0000000281a186b5
[ 1216.251930] ffffffffc0562360 97e48f39d448f082 000000000000020a 0000000000000006
[ 1216.251933] Call Trace:
[ 1216.251935] [<ffffffff81331e54>] ? dump_stack+0x5c/0x78
[ 1216.251947] [<ffffffffc03a309e>] ? drbd_release+0x1e/0x40 [drbd]
[ 1216.251951] [<ffffffff81245ca8>] ? __blkdev_put+0x1e8/0x2a0
[ 1216.251954] [<ffffffff81414910>] ? register_xenbus_watch+0xe0/0xe0
[ 1216.251956] [<ffffffffc055d6b8>] ? xen_vbd_free.isra.9+0x48/0x60 [xen_blkback]
[ 1216.251959] [<ffffffffc055e9fe>] ? frontend_changed+0x9e/0x660 [xen_blkback]
[ 1216.251961] [<ffffffff814120d9>] ? xenbus_read_driver_state+0x39/0x60
[ 1216.251962] [<ffffffff8141542c>] ? xenbus_otherend_changed+0x8c/0x120
[ 1216.251964] [<ffffffff81414910>] ? register_xenbus_watch+0xe0/0xe0
[ 1216.251967] [<ffffffff81414995>] ? xenwatch_thread+0x85/0x120
[ 1216.251969] [<ffffffff810bbfb0>] ? prepare_to_wait_event+0xf0/0xf0
[ 1216.251971] [<ffffffff81099329>] ? kthread+0xd9/0xf0
[ 1216.251973] [<ffffffff81099250>] ? kthread_park+0x60/0x60
[ 1216.251975] [<ffffffff81615df7>] ? ret_from_fork+0x57/0x70
--
Valentin
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-07 16:45 ` Valentin Vidic
@ 2018-09-07 17:14 ` Valentin Vidic
2018-09-07 17:14 ` Valentin Vidic
1 sibling, 0 replies; 80+ messages in thread
From: Valentin Vidic @ 2018-09-07 17:14 UTC (permalink / raw)
To: drbd-user
Cc: Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel, stable,
linux-block, xen-devel, Roger Pau Monné
On Fri, Sep 07, 2018 at 06:45:00PM +0200, Valentin Vidic wrote:
> Adding a dump_stack in drbd_release gives two possible code paths,
> both from xen_blkback and the first one from workqueue being the
> problematic one:
In fact the first one is the original code path, before I modified
blkback. The problem is that it gets executed asynchronously from the
workqueue, so it might not always run before the call to drbdadm
secondary.
> [ 530.698782] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G O 4.9.0-8-amd64 #1 Debian 4.9.110-3+deb9u4
> [ 530.698783] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 09/12/2016
> [ 530.698784] Workqueue: events xen_blkif_deferred_free [xen_blkback]
> [ 530.698785] 0000000000000000 ffffffff81331e54 ffff883f82143800 ffff883f741b5660
> [ 530.698787] ffffffffc03a309e ffff883f741b5580 ffffffff81245ca8 0000000281a186b5
> [ 530.698789] 000000008935f100 8412ffa31a64cc4c 000000000000020a ffff883fa6218280
> [ 530.698791] Call Trace:
> [ 530.698792] [<ffffffff81331e54>] ? dump_stack+0x5c/0x78
> [ 530.698805] [<ffffffffc03a309e>] ? drbd_release+0x1e/0x40 [drbd]
> [ 530.698810] [<ffffffff81245ca8>] ? __blkdev_put+0x1e8/0x2a0
> [ 530.698813] [<ffffffffc055d6b8>] ? xen_vbd_free.isra.9+0x48/0x60 [xen_blkback]
> [ 530.698814] [<ffffffffc055d6f7>] ? xen_blkif_deferred_free+0x27/0x70 [xen_blkback]
> [ 530.698816] [<ffffffff81092fea>] ? process_one_work+0x18a/0x420
> [ 530.698817] [<ffffffff810932cd>] ? worker_thread+0x4d/0x490
> [ 530.698818] [<ffffffff81093280>] ? process_one_work+0x420/0x420
> [ 530.698820] [<ffffffff81099329>] ? kthread+0xd9/0xf0
> [ 530.698822] [<ffffffff81099250>] ? kthread_park+0x60/0x60
> [ 530.698823] [<ffffffff81615df7>] ? ret_from_fork+0x57/0x70
--
Valentin
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-07 17:14 ` Valentin Vidic
@ 2018-09-08 7:34 ` Valentin Vidic
2018-09-10 12:45 ` Lars Ellenberg
2018-09-10 12:45 ` Lars Ellenberg
2018-09-08 7:34 ` Valentin Vidic
1 sibling, 2 replies; 80+ messages in thread
From: Valentin Vidic @ 2018-09-08 7:34 UTC (permalink / raw)
To: drbd-user
Cc: Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel, stable,
linux-block, xen-devel, Roger Pau Monné
On Fri, Sep 07, 2018 at 07:14:59PM +0200, Valentin Vidic wrote:
> In fact the first one is the original code path before I modified
> blkback. The problem is it gets executed async from workqueue so
> it might not always run before the call to drbdadm secondary.
As the DRBD device gets released only when the last IO request
has finished, I found a way to check and wait for this in the
block-drbd script:
--- block-drbd.orig 2018-09-08 09:07:23.499648515 +0200
+++ block-drbd 2018-09-08 09:28:12.892193649 +0200
@@ -230,6 +230,24 @@
and so cannot be mounted ${m2}${when}."
}
+wait_for_inflight()
+{
+ local dev="$1"
+ local inflight="/sys/block/${dev#/dev/}/inflight"
+ local rd wr
+
+ if ! [ -f "$inflight" ]; then
+ return
+ fi
+
+ while true; do
+ read rd wr < $inflight
+ if [ "$rd" = "0" -a "$wr" = "0" ]; then
+ return
+ fi
+ sleep 1
+ done
+}
t=$(xenstore_read_default "$XENBUS_PATH/type" 'MISSING')
@@ -285,6 +303,8 @@
drbd_lrole="${drbd_role%%/*}"
drbd_dev="$(drbdadm sh-dev $drbd_resource)"
+ wait_for_inflight $drbd_dev
+
if [ "$drbd_lrole" != 'Secondary' ]; then
drbdadm secondary $drbd_resource
fi
--
Valentin
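Standalone, the polling loop from the patch above can be sketched like
this (a hypothetical helper: the sysfs path is a parameter so the loop
can be exercised against any file, and the iteration cap is an addition
for illustration, not something the patch has):

```shell
# Sketch of the wait_for_inflight idea, made standalone and bounded.
# $1: path to an "inflight"-style file containing "<reads> <writes>";
# $2: max polls before giving up (the cap is an assumption, not in the patch).
wait_for_inflight_file() {
    file="$1" max="${2:-60}" n=0
    [ -f "$file" ] || return 0          # no such device: nothing to wait for
    while [ "$n" -lt "$max" ]; do
        read rd wr < "$file"
        # no requests in flight: done waiting
        [ "$rd" = 0 ] && [ "$wr" = 0 ] && return 0
        n=$((n + 1))
        sleep 1                         # poll again, as in the patch
    done
    return 1                            # still busy after $max polls
}
```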
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-08 7:34 ` Valentin Vidic
@ 2018-09-10 12:45 ` Lars Ellenberg
2018-09-10 12:45 ` Lars Ellenberg
1 sibling, 0 replies; 80+ messages in thread
From: Lars Ellenberg @ 2018-09-10 12:45 UTC (permalink / raw)
To: Valentin Vidic
Cc: Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel, stable,
linux-block, xen-devel, drbd-user, Roger Pau Monné
On Sat, Sep 08, 2018 at 09:34:32AM +0200, Valentin Vidic wrote:
> On Fri, Sep 07, 2018 at 07:14:59PM +0200, Valentin Vidic wrote:
> > In fact the first one is the original code path before I modified
> > blkback. The problem is it gets executed async from workqueue so
> > it might not always run before the call to drbdadm secondary.
>
> As the DRBD device gets released only when the last IO request
> has finished, I found a way to check and wait for this in the
> block-drbd script:
> --- block-drbd.orig 2018-09-08 09:07:23.499648515 +0200
> +++ block-drbd 2018-09-08 09:28:12.892193649 +0200
> @@ -230,6 +230,24 @@
> and so cannot be mounted ${m2}${when}."
> }
>
> +wait_for_inflight()
> +{
> + local dev="$1"
> + local inflight="/sys/block/${dev#/dev/}/inflight"
> + local rd wr
> +
> + if ! [ -f "$inflight" ]; then
> + return
> + fi
> +
> + while true; do
> + read rd wr < $inflight
> + if [ "$rd" = "0" -a "$wr" = "0" ]; then
If it is "idle" now, but still "open",
this will not sleep, and still fail the demotion below.
> + return
> + fi
> + sleep 1
> + done
> +}
>
> t=$(xenstore_read_default "$XENBUS_PATH/type" 'MISSING')
>
> @@ -285,6 +303,8 @@
> drbd_lrole="${drbd_role%%/*}"
> drbd_dev="$(drbdadm sh-dev $drbd_resource)"
>
> + wait_for_inflight $drbd_dev
> +
> if [ "$drbd_lrole" != 'Secondary' ]; then
> drbdadm secondary $drbd_resource
You try to help it by "waiting forever until it appears to be idle".
I suggest to at least limit the retries by iteration or time.
And also (or, instead; but you'd potentially get a number of
"scary messages" in the logs) add something like:
for i in 1 2 3 5 7 x; do
    drbdadm secondary $drbd_resource && exit 0
    if [ $i = x ]; then
        # ... "appears to still be in use, maybe by" ...
        fuser -v $drbd_dev
        exit 1
    # else ... "will retry in $i seconds" ...
    fi
    sleep $i
done
...
Or, well, yes, fix blkback to not "defer" the final close "too long",
if at all possible.
Lars
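[Editorial note: Lars's loop above is a sketch. A runnable version of the same bounded-backoff idea follows; the "$cmd" parameter is illustrative, standing in for "drbdadm secondary $drbd_resource" so the helper can be exercised without DRBD.]

```shell
#!/bin/sh
# Bounded retry with increasing backoff, in the spirit of Lars's sketch.
# "$cmd" stands in for "drbdadm secondary $drbd_resource" (illustrative).
demote_with_backoff()
{
    cmd="$1"
    dev="$2"
    for i in 1 2 3 5 7 x; do
        if $cmd; then
            return 0                    # demotion succeeded
        fi
        if [ "$i" = "x" ]; then
            echo "$dev appears to still be in use, maybe by:" >&2
            fuser -v "$dev" >&2 || true # best-effort diagnostics
            return 1
        fi
        echo "demotion failed, will retry in $i seconds" >&2
        sleep "$i"
    done
}
```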
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 80+ messages in thread
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-10 12:45 ` Lars Ellenberg
@ 2018-09-10 13:22 ` Valentin Vidic
2018-09-10 13:22 ` Valentin Vidic
1 sibling, 0 replies; 80+ messages in thread
From: Valentin Vidic @ 2018-09-10 13:22 UTC (permalink / raw)
To: drbd-user
Cc: Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel, stable,
linux-block, xen-devel, Roger Pau Monné
On Mon, Sep 10, 2018 at 02:45:31PM +0200, Lars Ellenberg wrote:
> On Sat, Sep 08, 2018 at 09:34:32AM +0200, Valentin Vidic wrote:
> > On Fri, Sep 07, 2018 at 07:14:59PM +0200, Valentin Vidic wrote:
> > > In fact the first one is the original code path before I modified
> > > blkback. The problem is it gets executed async from workqueue so
> > > it might not always run before the call to drbdadm secondary.
> >
> > As the DRBD device gets released only when the last IO request
> > has finished, I found a way to check and wait for this in the
> > block-drbd script:
>
> > --- block-drbd.orig 2018-09-08 09:07:23.499648515 +0200
> > +++ block-drbd 2018-09-08 09:28:12.892193649 +0200
> > @@ -230,6 +230,24 @@
> > and so cannot be mounted ${m2}${when}."
> > }
> >
> > +wait_for_inflight()
> > +{
> > + local dev="$1"
> > + local inflight="/sys/block/${dev#/dev/}/inflight"
> > + local rd wr
> > +
> > + if ! [ -f "$inflight" ]; then
> > + return
> > + fi
> > +
> > + while true; do
> > + read rd wr < $inflight
> > + if [ "$rd" = "0" -a "$wr" = "0" ]; then
>
> If it is "idle" now, but still "open",
> this will not sleep, and still fail the demotion below.
True, but in this case blkback is holding it open until all
the writes have finished and the last write closes the device.
Since fuser can't check blkback this is an approximation that
seems to work because I don't get any failed drbdadm calls now.
> You try to help it by "waiting forever until it appears to be idle".
> I suggest to at least limit the retries by iteration or time.
> And also (or, instead; but you'd potentially get a number of
> "scary messages" in the logs) add something like:
Ok, should I open a PR to discuss this change further?
> Or, well, yes, fix blkback to not "defer" the final close "too long",
> if at all possible.
blkback needs to finish the writes on shutdown or I get fsck errors
on next boot. Ideally XenbusStateClosed should be delayed until the
device release, but currently that does not seem possible without
breaking other things.
--
Valentin
^ permalink raw reply [flat|nested] 80+ messages in thread
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-10 13:22 ` Valentin Vidic
@ 2018-09-10 15:00 ` Roger Pau Monné
2018-09-10 15:00 ` Roger Pau Monné
1 sibling, 0 replies; 80+ messages in thread
From: Roger Pau Monné @ 2018-09-10 15:00 UTC (permalink / raw)
To: Valentin Vidic
Cc: Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel, stable,
linux-block, xen-devel, drbd-user
On Mon, Sep 10, 2018 at 03:22:52PM +0200, Valentin Vidic wrote:
> On Mon, Sep 10, 2018 at 02:45:31PM +0200, Lars Ellenberg wrote:
> > On Sat, Sep 08, 2018 at 09:34:32AM +0200, Valentin Vidic wrote:
> > > On Fri, Sep 07, 2018 at 07:14:59PM +0200, Valentin Vidic wrote:
> > > > In fact the first one is the original code path before I modified
> > > > blkback. The problem is it gets executed async from workqueue so
> > > > it might not always run before the call to drbdadm secondary.
> > >
> > > As the DRBD device gets released only when the last IO request
> > > has finished, I found a way to check and wait for this in the
> > > block-drbd script:
> >
> > > --- block-drbd.orig 2018-09-08 09:07:23.499648515 +0200
> > > +++ block-drbd 2018-09-08 09:28:12.892193649 +0200
> > > @@ -230,6 +230,24 @@
> > > and so cannot be mounted ${m2}${when}."
> > > }
> > >
> > > +wait_for_inflight()
> > > +{
> > > + local dev="$1"
> > > + local inflight="/sys/block/${dev#/dev/}/inflight"
> > > + local rd wr
> > > +
> > > + if ! [ -f "$inflight" ]; then
> > > + return
> > > + fi
> > > +
> > > + while true; do
> > > + read rd wr < $inflight
> > > + if [ "$rd" = "0" -a "$wr" = "0" ]; then
> >
> > If it is "idle" now, but still "open",
> > this will not sleep, and still fail the demotion below.
>
> True, but in this case blkback is holding it open until all
> the writes have finished and the last write closes the device.
> Since fuser can't check blkback this is an approximation that
> seems to work because I don't get any failed drbdadm calls now.
>
> > You try to help it by "waiting forever until it appears to be idle".
> > I suggest to at least limit the retries by iteration or time.
> > And also (or, instead; but you'd potentially get a number of
> > "scary messages" in the logs) add something like:
>
> Ok, should I open a PR to discuss this change further?
>
> > Or, well, yes, fix blkback to not "defer" the final close "too long",
> > if at all possible.
>
> blkback needs to finish the writes on shutdown or I get a fsck errors
> on next boot. Ideally XenbusStateClosed should be delayed until the
> device release but currently it does not seem possible without breaking
> other things.
I can try to take a look at this and attempt to make sure the state is
only changed to closed in blkback _after_ the device has been
released, but it might take me a couple of days to get you a patch.
I'm afraid that other hotplug scripts will also have issues with such
behavior, and we shouldn't force all users of hotplug scripts to add
such workarounds.
Roger.
^ permalink raw reply [flat|nested] 80+ messages in thread
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-10 15:00 ` Roger Pau Monné
@ 2018-09-10 16:18 ` Valentin Vidic
2018-09-10 16:18 ` Valentin Vidic
1 sibling, 0 replies; 80+ messages in thread
From: Valentin Vidic @ 2018-09-10 16:18 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel, stable,
linux-block, xen-devel, drbd-user
On Mon, Sep 10, 2018 at 05:00:58PM +0200, Roger Pau Monné wrote:
> I can try to take a look at this and attempt to make sure the state is
> only changed to closed in blkback _after_ the device has been
> released, but it might take me a couple of days to get you a patch.
Thanks, I have two test nodes now where I can try different approaches.
> I'm afraid that other hotplug scripts will also have issues with such
> behavior, and we shouldn't force all users of hotplug scripts to add
> such workarounds.
True, iSCSI and other "network" disks might have similar problems.
--
Valentin
^ permalink raw reply [flat|nested] 80+ messages in thread
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-10 16:18 ` Valentin Vidic
(?)
@ 2018-09-13 15:08 ` Roger Pau Monné
-1 siblings, 0 replies; 80+ messages in thread
From: Roger Pau Monné @ 2018-09-13 15:08 UTC (permalink / raw)
To: Valentin Vidic
Cc: Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel, stable,
linux-block, xen-devel, drbd-user
On Mon, Sep 10, 2018 at 06:18:31PM +0200, Valentin Vidic wrote:
> On Mon, Sep 10, 2018 at 05:00:58PM +0200, Roger Pau Monné wrote:
> > I can try to take a look at this and attempt to make sure the state is
> > only changed to closed in blkback _after_ the device has been
> > released, but it might take me a couple of days to get you a patch.
>
> Thanks, I have two test nodes now where I can try different approaches.
>
> > I'm afraid that other hotplug scripts will also have issues with such
> > behavior, and we shouldn't force all users of hotplug scripts to add
> > such workarounds.
>
> True, iSCSI and other "network" disks might have similar problems.
So I have the following patch which I think might solve your issues
while keeping the reset logic working. Would you mind giving it a try
with your use case and pvgrub?
Thanks, Roger.
---8<---
From c6a76a6265f29440c886b422c9d1b8549ffd6ed8 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Thu, 13 Sep 2018 11:27:57 +0200
Subject: [PATCH] xen-blkback: only set Closed state when resources have been
released
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Don't set the backend to Closed state until all the resources have
been released and the blkback instance is gone. Setting the backend
state to Closed will be done by xenbus_dev_remove after removing the
device instance.
Note that this requires disconnecting and allowing reconnection with
state Closing, since the Closed state will only be reached when
detaching the device.
Reported-by: Valentin Vidic <Valentin.Vidic@carnet.hr>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
drivers/block/xen-blkback/xenbus.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index a4bc74e72c39..b0629f1b0511 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -770,7 +770,8 @@ static void frontend_changed(struct xenbus_device *dev,
switch (frontend_state) {
case XenbusStateInitialising:
- if (dev->state == XenbusStateClosed) {
+ if (dev->state == XenbusStateClosed ||
+ dev->state == XenbusStateClosing) {
pr_info("%s: prepare for reconnect\n", dev->nodename);
xenbus_switch_state(dev, XenbusStateInitWait);
}
@@ -809,12 +810,12 @@ static void frontend_changed(struct xenbus_device *dev,
break;
case XenbusStateClosing:
+ xen_blkif_disconnect(be->blkif);
xenbus_switch_state(dev, XenbusStateClosing);
break;
case XenbusStateClosed:
xen_blkif_disconnect(be->blkif);
- xenbus_switch_state(dev, XenbusStateClosed);
if (xenbus_dev_is_online(dev))
break;
/* fall through */
--
2.18.0
^ permalink raw reply related [flat|nested] 80+ messages in thread
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-13 15:08 ` Roger Pau Monné
@ 2018-09-14 11:49 ` Valentin Vidic
-1 siblings, 0 replies; 80+ messages in thread
From: Valentin Vidic @ 2018-09-14 11:49 UTC (permalink / raw)
To: Roger Pau Monné
Cc: drbd-user, Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel,
stable, linux-block, xen-devel
On Thu, Sep 13, 2018 at 05:08:00PM +0200, Roger Pau Monné wrote:
> So I have the following patch which I think might solve your issues
> while keeping the reset logic working. Would you mind giving it a try
> with your use case and pvgrub?
Thanks for the patch. It seems to be having some problems with pvgrub:
machines don't boot past pvgrub and they are using 100% CPU. Also in
dom0 xenstored is using 100% CPU - strace reports it is looping on this:
read(26, "\\\240\0\0", 4) = 4
read(26, "\0\0\0\0L\0\0\0)\0\0\0\35\0\0\0+\\\232`\231\31\1&", 24) = 24
read(26, "/local/domain/0/backend/vbd/1/51712/state", 41) = 41
read(26, "\2\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0004", 29) = 29
read(26, "\0\330\0\0", 4) = 4
read(26, "\0\0\0\0L\0\0\0)\0\0\0\35\0\0\0+\\JG\231\31\1&", 24) = 24
read(26, "/local/domain/0/backend/vbd/2/51712/state", 41) = 41
read(26, "\2\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0\0\1\0\0\0004", 29) = 29
read(26, "\0\200\1\0", 4) = 4
read(26, "\0\0\0\0L\0\0\0)\0\0\0\35\0\0\0+\\Z\373\231\31\1&", 24) = 24
read(26, "/local/domain/0/backend/vbd/5/51712/state", 41) = 41
read(26, "\2\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\1\0\0\0004", 29) = 29
read(12, "\321\1\0\0", 4) = 4
write(12, "\321\1\0\0", 4) = 4
--
Valentin
^ permalink raw reply [flat|nested] 80+ messages in thread
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-14 11:49 ` Valentin Vidic
@ 2018-09-14 16:18 ` Roger Pau Monné
-1 siblings, 0 replies; 80+ messages in thread
From: Roger Pau Monné @ 2018-09-14 16:18 UTC (permalink / raw)
To: Valentin Vidic
Cc: drbd-user, Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel,
stable, linux-block, xen-devel
On Fri, Sep 14, 2018 at 01:49:51PM +0200, Valentin Vidic wrote:
> On Thu, Sep 13, 2018 at 05:08:00PM +0200, Roger Pau Monné wrote:
> > So I have the following patch which I think might solve your issues
> > while keeping the reset logic working. Would you mind giving it a try
> > with your use case and pvgrub?
>
> Thanks for the patch. It seems to have some problems with pvgrub, though:
> machines don't boot past pvgrub and they are using 100% CPU. Also, in
> dom0 xenstored is using 100% CPU - strace reports it is looping on this:
>
> read(26, "\\\240\0\0", 4) = 4
> read(26, "\0\0\0\0L\0\0\0)\0\0\0\35\0\0\0+\\\232`\231\31\1&", 24) = 24
> read(26, "/local/domain/0/backend/vbd/1/51712/state", 41) = 41
> read(26, "\2\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0004", 29) = 29
> read(26, "\0\330\0\0", 4) = 4
> read(26, "\0\0\0\0L\0\0\0)\0\0\0\35\0\0\0+\\JG\231\31\1&", 24) = 24
> read(26, "/local/domain/0/backend/vbd/2/51712/state", 41) = 41
> read(26, "\2\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0\0\1\0\0\0004", 29) = 29
> read(26, "\0\200\1\0", 4) = 4
> read(26, "\0\0\0\0L\0\0\0)\0\0\0\35\0\0\0+\\Z\373\231\31\1&", 24) = 24
> read(26, "/local/domain/0/backend/vbd/5/51712/state", 41) = 41
> read(26, "\2\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\1\0\0\0004", 29) = 29
> read(12, "\321\1\0\0", 4) = 4
> write(12, "\321\1\0\0", 4) = 4
I see the issue: blkfront in pvgrub is waiting for the backend to
switch to the Closed state before booting the OS kernel, so my fix is not
going to work. I guess I will have to find a proper way to switch to
the Closed state.
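The wait described here boils down to blocking until the backend state node reads XenbusStateClosed (state 6). A rough shell sketch of that condition follows; note the real blkfront blocks on a xenstore watch rather than polling, and `XENSTORE_READ` is an illustrative hook (defaulting to the real `xenstore-read` tool) so the loop can be exercised without a live xenstore:

```shell
# Poll a backend state node until it reports XenbusStateClosed (6).
# XENSTORE_READ is an illustrative indirection for testing; pvgrub's
# blkfront blocks on a xenstore watch instead of polling like this.
: "${XENSTORE_READ:=xenstore-read}"

wait_backend_closed() {
    # $1 = backend state node, e.g. backend/vbd/5/51712/state
    local state
    while state=$($XENSTORE_READ "$1" 2>/dev/null); do
        [ "$state" = "6" ] && return 0   # 6 == XenbusStateClosed
        sleep 1
    done
    return 1   # node disappeared before reaching Closed
}
```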
Thanks, Roger.
* Re: [DRBD-user] [PATCH] xen-blkback: Switch to closed state after releasing the backing device
2018-09-07 17:14 ` Valentin Vidic
2018-09-08 7:34 ` Valentin Vidic
@ 2018-09-08 7:34 ` Valentin Vidic
1 sibling, 0 replies; 80+ messages in thread
From: Valentin Vidic @ 2018-09-08 7:34 UTC (permalink / raw)
To: drbd-user
Cc: Jens Axboe, Konrad Rzeszutek Wilk, linux-kernel, stable,
linux-block, xen-devel, Roger Pau Monné
On Fri, Sep 07, 2018 at 07:14:59PM +0200, Valentin Vidic wrote:
> In fact the first one is the original code path before I modified
> blkback. The problem is it gets executed async from workqueue so
> it might not always run before the call to drbdadm secondary.
As the DRBD device gets released only when the last IO request
has finished, I found a way to check and wait for this in the
block-drbd script:
--- block-drbd.orig 2018-09-08 09:07:23.499648515 +0200
+++ block-drbd 2018-09-08 09:28:12.892193649 +0200
@@ -230,6 +230,24 @@
and so cannot be mounted ${m2}${when}."
}
+wait_for_inflight()
+{
+ local dev="$1"
+ local inflight="/sys/block/${dev#/dev/}/inflight"
+ local rd wr
+
+ if ! [ -f "$inflight" ]; then
+ return
+ fi
+
+ while true; do
+ read rd wr < $inflight
+ if [ "$rd" = "0" -a "$wr" = "0" ]; then
+ return
+ fi
+ sleep 1
+ done
+}
t=$(xenstore_read_default "$XENBUS_PATH/type" 'MISSING')
@@ -285,6 +303,8 @@
drbd_lrole="${drbd_role%%/*}"
drbd_dev="$(drbdadm sh-dev $drbd_resource)"
+ wait_for_inflight $drbd_dev
+
if [ "$drbd_lrole" != 'Secondary' ]; then
drbdadm secondary $drbd_resource
fi
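The `/sys/block/<dev>/inflight` attribute read by the helper above holds two counters: reads in flight and writes in flight. For illustration, here is the same wait logic lifted out of the hotplug script so it takes the inflight file path directly (the function name is mine; the behavior mirrors the patch):

```shell
# Standalone variant of the wait_for_inflight helper from the patch
# above: it takes the inflight file path directly instead of deriving
# it from a /dev node.  The file holds two counters:
# "<reads-in-flight> <writes-in-flight>".
wait_for_inflight_file() {
    local inflight="$1" rd wr

    # Devices without an inflight attribute: nothing to wait for.
    [ -f "$inflight" ] || return 0

    while true; do
        read rd wr < "$inflight"
        [ "$rd" = "0" ] && [ "$wr" = "0" ] && return 0
        sleep 1
    done
}
```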
--
Valentin