linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] NBD: replace kill_bdev() with __invalidate_device()
       [not found]   ` <56F34412.1050707@gmail.com>
@ 2016-04-20 11:06     ` Markus Pargmann
  2016-04-23  2:17       ` Ratna Manoj
  0 siblings, 1 reply; 10+ messages in thread
From: Markus Pargmann @ 2016-04-20 11:06 UTC (permalink / raw)
  To: Ratna Manoj; +Cc: pbonzini, jack, grao, jv, nbd-general, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3706 bytes --]

Hi,

On Thursday 24 March 2016 07:04:10 Ratna Manoj wrote:
> From: Ratna Manoj Bolla <manoj.br@gmail.com>
> 
> When a filesystem is mounted on a nbd device and on a disconnect, because 
> of kill_bdev(), and resetting bdev size to zero, buffer_head mappings are 
> getting destroyed under mounted filesystem.
> 
> After a bdev size reset(i.e bdev->bd_inode->i_size = 0) on a disconnect,
> followed by a sys_umount(),
>         generic_shutdown_super()->...
>         ->__sync_blockdev()->...
>         -> blkdev_writepages()->...
>         ->do_invalidatepage()->...
>         -> discard_buffer()   is discarding superblock buffer_head assumed
> to be in mapped state by ext4_commit_super().
> 
> 
> 
> Signed-off-by: Ratna Manoj Bolla <manoj.br@gmail.com>
> ---
> This script reproduces both the kernel panic scenarios:
>  
> $ qemu-img create -f qcow2 f.img 1G
> $ mkfs.ext4 f.img
> $ qemu-nbd -c /dev/nbd0 f.img
> $ mount /dev/nbd0 dir
> $ killall -KILL qemu-nbd
> $ sleep 1
> $ ls dir
> $ umount dir
> 
> Bug reports:
> http://www.kernelhub.org/?p=2&msg=361407
> https://www.mail-archive.com/nbd-general@lists.sourceforge.net/msg02388.html

Thanks, please CC nbd-general@lists.sourceforge.net,
linux-kernel@vger.kernel.org as well.

So this patch simply does not cleanup the blockdevice to avoid any
errors on the filesystem side. The userspace thread that called
NBD_DO_IT will exit immediately before the filesystem decided to release
the blockdevice. The nbd driver assumes that the shutdown was done and
accepts new clients setting up sockets and so on. Couldn't this lead to
a lot of problems?

Currently NBD_DO_IT returns when it is save to use the NBD device again.
This patch changes this as the blockdevice may still be in use when
NBD_DO_IT returns. I think it would be better to delay NBD_DO_IT until
everything is cleaned up and all filesystems are closed.

Best Regards,

Markus

> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index f6b51d7..6e77b3a 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -119,7 +119,8 @@ static const char *nbdcmd_to_ascii(int cmd)
>  
>  static int nbd_size_clear(struct nbd_device *nbd, struct block_device *bdev)
>  {
> -	bdev->bd_inode->i_size = 0;
> +	if (bdev->bd_openers <= 1)
> +		bdev->bd_inode->i_size = 0;
>  	set_capacity(nbd->disk, 0);
>  	kobject_uevent(&nbd_to_dev(nbd)->kobj, KOBJ_CHANGE);
>  
> @@ -678,6 +679,9 @@ static void nbd_reset(struct nbd_device *nbd)
>  
>  static void nbd_bdev_reset(struct block_device *bdev)
>  {
> +	if (bdev->bd_openers > 1)
> +		return;
> +
>  	set_device_ro(bdev, false);
>  	bdev->bd_inode->i_size = 0;
>  	if (max_part > 0) {
> @@ -735,7 +739,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
>  		nbd_clear_que(nbd);
>  		BUG_ON(!list_empty(&nbd->queue_head));
>  		BUG_ON(!list_empty(&nbd->waiting_queue));
> -		kill_bdev(bdev);
> +		__invalidate_device(bdev, true);
>  		return 0;
>  
>  	case NBD_SET_SOCK: {
> @@ -809,7 +813,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
>  
>  		sock_shutdown(nbd);
>  		nbd_clear_que(nbd);
> -		kill_bdev(bdev);
> +		__invalidate_device(bdev, true);
>  		nbd_bdev_reset(bdev);
>  
>  		if (nbd->disconnect) /* user requested, ignore socket errors */
> 
> 
> 

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] NBD: replace kill_bdev() with __invalidate_device()
  2016-04-20 11:06     ` [PATCH] NBD: replace kill_bdev() with __invalidate_device() Markus Pargmann
@ 2016-04-23  2:17       ` Ratna Manoj
  2016-04-28  9:00         ` Markus Pargmann
  2016-05-10  9:24         ` Paolo Bonzini
  0 siblings, 2 replies; 10+ messages in thread
From: Ratna Manoj @ 2016-04-23  2:17 UTC (permalink / raw)
  To: Markus Pargmann
  Cc: pbonzini, jack, Gou Rao, Vinod Jayaraman, nbd-general, linux-kernel

Thanks for the review. 

Atleast for ext4 this crash happens on a sys_umount() call, timing of
which is not in control of block driver. Block driver cannot force the
filesystems to be unmounted, and the file system does not expect 
buffers to get unmapped under it.

Ext4 can be fixed with the this patch:
http://www.spinics.net/lists/linux-ext4/msg51112.html 
It did not make to the kernel. It checks the state of the buffer head
before committing.

When we consider diskett/CD as user space thread that called NBD_DO_IT,
this problem is analogous to changing disk with another or the same
disk suddenly when the file system is still mounted. 

If we completely kill the block device we would loss some writes when
same thread is reconnected.

if we do not completely kill or if we only invalidate clean buffers, 
we will have inconsistency on re-attach with a different thread
(analogous to replacing disk with different disk suddenly). 

Ratna.    
   

On Wed, Apr 20, 2016 at 4:36 PM, Markus Pargmann <mpa@pengutronix.de> wrote:

> Hi,
>
> On Thursday 24 March 2016 07:04:10 Ratna Manoj wrote:
> > From: Ratna Manoj Bolla <manoj.br@gmail.com>
> >
> > When a filesystem is mounted on a nbd device and on a disconnect, because
> > of kill_bdev(), and resetting bdev size to zero, buffer_head mappings are
> > getting destroyed under mounted filesystem.
> >
> > After a bdev size reset(i.e bdev->bd_inode->i_size = 0) on a disconnect,
> > followed by a sys_umount(),
> >         generic_shutdown_super()->...
> >         ->__sync_blockdev()->...
> >         -> blkdev_writepages()->...
> >         ->do_invalidatepage()->...
> >         -> discard_buffer()   is discarding superblock buffer_head 
> assumed
> > to be in mapped state by ext4_commit_super().
> >
> >
> >
> > Signed-off-by: Ratna Manoj Bolla <manoj.br@gmail.com>
> > ---
> > This script reproduces both the kernel panic scenarios:
> >
> > $ qemu-img create -f qcow2 f.img 1G
> > $ mkfs.ext4 f.img
> > $ qemu-nbd -c /dev/nbd0 f.img
> > $ mount /dev/nbd0 dir
> > $ killall -KILL qemu-nbd
> > $ sleep 1
> > $ ls dir
> > $ umount dir
> >
> > Bug reports:
> > http://www.kernelhub.org/?p=2&msg=361407
> > 
> https://www.mail-archive.com/nbd-general@lists.sourceforge.net/msg02388.html
>
> Thanks, please CC nbd-general@lists.sourceforge.net,
> linux-kernel@vger.kernel.org as well.
>
> So this patch simply does not cleanup the blockdevice to avoid any
> errors on the filesystem side. The userspace thread that called
> NBD_DO_IT will exit immediately before the filesystem decided to release
> the blockdevice. The nbd driver assumes that the shutdown was done and
> accepts new clients setting up sockets and so on. Couldn't this lead to
> a lot of problems?
>
> Currently NBD_DO_IT returns when it is save to use the NBD device again.
> This patch changes this as the blockdevice may still be in use when
> NBD_DO_IT returns. I think it would be better to delay NBD_DO_IT until
> everything is cleaned up and all filesystems are closed.
>
> Best Regards,
>
> Markus
>
> >
> > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> > index f6b51d7..6e77b3a 100644
> > --- a/drivers/block/nbd.c
> > +++ b/drivers/block/nbd.c
> > @@ -119,7 +119,8 @@ static const char *nbdcmd_to_ascii(int cmd)
> >
> >  static int nbd_size_clear(struct nbd_device *nbd, struct block_device 
> *bdev)
> >  {
> > -     bdev->bd_inode->i_size = 0;
> > +     if (bdev->bd_openers <= 1)
> > +             bdev->bd_inode->i_size = 0;
> >       set_capacity(nbd->disk, 0);
> >       kobject_uevent(&nbd_to_dev(nbd)->kobj, KOBJ_CHANGE);
> >
> > @@ -678,6 +679,9 @@ static void nbd_reset(struct nbd_device *nbd)
> >
> >  static void nbd_bdev_reset(struct block_device *bdev)
> >  {
> > +     if (bdev->bd_openers > 1)
> > +             return;
> > +
> >       set_device_ro(bdev, false);
> >       bdev->bd_inode->i_size = 0;
> >       if (max_part > 0) {
> > @@ -735,7 +739,7 @@ static int __nbd_ioctl(struct block_device *bdev, 
> struct nbd_device *nbd,
> >               nbd_clear_que(nbd);
> >               BUG_ON(!list_empty(&nbd->queue_head));
> >               BUG_ON(!list_empty(&nbd->waiting_queue));
> > -             kill_bdev(bdev);
> > +             __invalidate_device(bdev, true);
> >               return 0;
> >
> >       case NBD_SET_SOCK: {
> > @@ -809,7 +813,7 @@ static int __nbd_ioctl(struct block_device *bdev, 
> struct nbd_device *nbd,
> >
> >               sock_shutdown(nbd);
> >               nbd_clear_que(nbd);
> > -             kill_bdev(bdev);
> > +             __invalidate_device(bdev, true);
> >               nbd_bdev_reset(bdev);
> >
> >               if (nbd->disconnect) /* user requested, ignore socket 
> errors */
> >
> >
> >
>
> --
> Pengutronix e.K.                           |                             |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] NBD: replace kill_bdev() with __invalidate_device()
  2016-04-23  2:17       ` Ratna Manoj
@ 2016-04-28  9:00         ` Markus Pargmann
  2016-04-28 16:27           ` [Nbd] " Wouter Verhelst
  2016-05-10  9:24         ` Paolo Bonzini
  1 sibling, 1 reply; 10+ messages in thread
From: Markus Pargmann @ 2016-04-28  9:00 UTC (permalink / raw)
  To: Ratna Manoj
  Cc: pbonzini, jack, Gou Rao, Vinod Jayaraman, nbd-general, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6370 bytes --]

Hi,

On Saturday 23 April 2016 07:47:21 Ratna Manoj wrote:
> Thanks for the review. 
> 
> Atleast for ext4 this crash happens on a sys_umount() call, timing of
> which is not in control of block driver. Block driver cannot force the
> filesystems to be unmounted, and the file system does not expect 
> buffers to get unmapped under it.

Yes the block driver can't force a clean umount.

> 
> Ext4 can be fixed with the this patch:
> http://www.spinics.net/lists/linux-ext4/msg51112.html 
> It did not make to the kernel. It checks the state of the buffer head
> before committing.
> 
> When we consider diskett/CD as user space thread that called NBD_DO_IT,
> this problem is analogous to changing disk with another or the same
> disk suddenly when the file system is still mounted. 
> 
> If we completely kill the block device we would loss some writes when
> same thread is reconnected.

I am not so sure about your exact use-case here.

If the NBD_DO_IT thread returns I am considering the connection and
block device as dead and disconnected. Securing any data afterwards with
a new connection is potentially dangerous as it may be a different
server.

> 
> if we do not completely kill or if we only invalidate clean buffers, 
> we will have inconsistency on re-attach with a different thread
> (analogous to replacing disk with different disk suddenly). 

Yes exactly. That's why I suggested that NBD_DO_IT waits until all
blockdevice users are gone. This would avoid any issues with
writing/reading data to a wrong server.

Best Regards,

Markus

> 
> Ratna.    
>    
> 
> On Wed, Apr 20, 2016 at 4:36 PM, Markus Pargmann <mpa@pengutronix.de> wrote:
> 
> > Hi,
> >
> > On Thursday 24 March 2016 07:04:10 Ratna Manoj wrote:
> > > From: Ratna Manoj Bolla <manoj.br@gmail.com>
> > >
> > > When a filesystem is mounted on a nbd device and on a disconnect, because
> > > of kill_bdev(), and resetting bdev size to zero, buffer_head mappings are
> > > getting destroyed under mounted filesystem.
> > >
> > > After a bdev size reset(i.e bdev->bd_inode->i_size = 0) on a disconnect,
> > > followed by a sys_umount(),
> > >         generic_shutdown_super()->...
> > >         ->__sync_blockdev()->...
> > >         -> blkdev_writepages()->...
> > >         ->do_invalidatepage()->...
> > >         -> discard_buffer()   is discarding superblock buffer_head 
> > assumed
> > > to be in mapped state by ext4_commit_super().
> > >
> > >
> > >
> > > Signed-off-by: Ratna Manoj Bolla <manoj.br@gmail.com>
> > > ---
> > > This script reproduces both the kernel panic scenarios:
> > >
> > > $ qemu-img create -f qcow2 f.img 1G
> > > $ mkfs.ext4 f.img
> > > $ qemu-nbd -c /dev/nbd0 f.img
> > > $ mount /dev/nbd0 dir
> > > $ killall -KILL qemu-nbd
> > > $ sleep 1
> > > $ ls dir
> > > $ umount dir
> > >
> > > Bug reports:
> > > http://www.kernelhub.org/?p=2&msg=361407
> > > 
> > https://www.mail-archive.com/nbd-general@lists.sourceforge.net/msg02388.html
> >
> > Thanks, please CC nbd-general@lists.sourceforge.net,
> > linux-kernel@vger.kernel.org as well.
> >
> > So this patch simply does not cleanup the blockdevice to avoid any
> > errors on the filesystem side. The userspace thread that called
> > NBD_DO_IT will exit immediately before the filesystem decided to release
> > the blockdevice. The nbd driver assumes that the shutdown was done and
> > accepts new clients setting up sockets and so on. Couldn't this lead to
> > a lot of problems?
> >
> > Currently NBD_DO_IT returns when it is save to use the NBD device again.
> > This patch changes this as the blockdevice may still be in use when
> > NBD_DO_IT returns. I think it would be better to delay NBD_DO_IT until
> > everything is cleaned up and all filesystems are closed.
> >
> > Best Regards,
> >
> > Markus
> >
> > >
> > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> > > index f6b51d7..6e77b3a 100644
> > > --- a/drivers/block/nbd.c
> > > +++ b/drivers/block/nbd.c
> > > @@ -119,7 +119,8 @@ static const char *nbdcmd_to_ascii(int cmd)
> > >
> > >  static int nbd_size_clear(struct nbd_device *nbd, struct block_device 
> > *bdev)
> > >  {
> > > -     bdev->bd_inode->i_size = 0;
> > > +     if (bdev->bd_openers <= 1)
> > > +             bdev->bd_inode->i_size = 0;
> > >       set_capacity(nbd->disk, 0);
> > >       kobject_uevent(&nbd_to_dev(nbd)->kobj, KOBJ_CHANGE);
> > >
> > > @@ -678,6 +679,9 @@ static void nbd_reset(struct nbd_device *nbd)
> > >
> > >  static void nbd_bdev_reset(struct block_device *bdev)
> > >  {
> > > +     if (bdev->bd_openers > 1)
> > > +             return;
> > > +
> > >       set_device_ro(bdev, false);
> > >       bdev->bd_inode->i_size = 0;
> > >       if (max_part > 0) {
> > > @@ -735,7 +739,7 @@ static int __nbd_ioctl(struct block_device *bdev, 
> > struct nbd_device *nbd,
> > >               nbd_clear_que(nbd);
> > >               BUG_ON(!list_empty(&nbd->queue_head));
> > >               BUG_ON(!list_empty(&nbd->waiting_queue));
> > > -             kill_bdev(bdev);
> > > +             __invalidate_device(bdev, true);
> > >               return 0;
> > >
> > >       case NBD_SET_SOCK: {
> > > @@ -809,7 +813,7 @@ static int __nbd_ioctl(struct block_device *bdev, 
> > struct nbd_device *nbd,
> > >
> > >               sock_shutdown(nbd);
> > >               nbd_clear_que(nbd);
> > > -             kill_bdev(bdev);
> > > +             __invalidate_device(bdev, true);
> > >               nbd_bdev_reset(bdev);
> > >
> > >               if (nbd->disconnect) /* user requested, ignore socket 
> > errors */
> > >
> > >
> > >
> >
> > --
> > Pengutronix e.K.                           |                             |
> > Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> > Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
> > Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
> >
> 
> 
> 

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Nbd] [PATCH] NBD: replace kill_bdev() with __invalidate_device()
  2016-04-28  9:00         ` Markus Pargmann
@ 2016-04-28 16:27           ` Wouter Verhelst
  2016-04-28 18:43             ` Ratna Manoj
  2016-05-12  9:53             ` Markus Pargmann
  0 siblings, 2 replies; 10+ messages in thread
From: Wouter Verhelst @ 2016-04-28 16:27 UTC (permalink / raw)
  To: Markus Pargmann
  Cc: Ratna Manoj, nbd-general, Vinod Jayaraman, jack, linux-kernel,
	Gou Rao, pbonzini

On Thu, Apr 28, 2016 at 11:00:20AM +0200, Markus Pargmann wrote:
> Hi,
> 
> On Saturday 23 April 2016 07:47:21 Ratna Manoj wrote:
> > Thanks for the review. 
> > 
> > Atleast for ext4 this crash happens on a sys_umount() call, timing of
> > which is not in control of block driver. Block driver cannot force the
> > filesystems to be unmounted, and the file system does not expect 
> > buffers to get unmapped under it.
> 
> Yes the block driver can't force a clean umount.
> 
> > 
> > Ext4 can be fixed with the this patch:
> > http://www.spinics.net/lists/linux-ext4/msg51112.html 
> > It did not make to the kernel. It checks the state of the buffer head
> > before committing.
> > 
> > When we consider diskett/CD as user space thread that called NBD_DO_IT,
> > this problem is analogous to changing disk with another or the same
> > disk suddenly when the file system is still mounted. 
> > 
> > If we completely kill the block device we would loss some writes when
> > same thread is reconnected.
> 
> I am not so sure about your exact use-case here.
> 
> If the NBD_DO_IT thread returns I am considering the connection and
> block device as dead and disconnected. Securing any data afterwards with
> a new connection is potentially dangerous as it may be a different
> server.

Yes, from the kernel POV you're right.

However, at some point I agreed with Paul (your predecessor) that when
this happens due to an error condition (as opposed to it being due to an
explicit disconnect), the kernel would block all reads from or writes to
the device, and the client may try to reconnect *from the same
PID* (i.e., it may not fork()). If that succeeds, the next NBD_DO_IT is
assumed to be connected to the same server; if instead the process
exits, then the block device is assumed to be dead, will be reset, and
all pending reads or writes would error.

In principle, this allows for a proper reconnect from userspace if it
can be done. However, I'm not sure whether this ever worked well or
whether it was documented, so it's probably fine if you think it should
be replaced with something else.

(obviously, userspace reconnecting the device to a different device is
wrong and should not be done, but that's a case of "if you break it, you
get to keep both pieces)

At any rate, I think it makes sense for userspace to be given a chance
to *attempt* to reconnect a device when the connection drops
unexpectedly.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Nbd] [PATCH] NBD: replace kill_bdev() with __invalidate_device()
  2016-04-28 16:27           ` [Nbd] " Wouter Verhelst
@ 2016-04-28 18:43             ` Ratna Manoj
  2016-05-12  9:53             ` Markus Pargmann
  1 sibling, 0 replies; 10+ messages in thread
From: Ratna Manoj @ 2016-04-28 18:43 UTC (permalink / raw)
  To: Wouter Verhelst, Markus Pargmann
  Cc: nbd-general, Vinod Jayaraman, jack, linux-kernel, Gou Rao, pbonzini


Hi,

On Thursday 28 April 2016 09:57 PM, Wouter Verhelst wrote:
> On Thu, Apr 28, 2016 at 11:00:20AM +0200, Markus Pargmann wrote:
>> Hi,
>>
>> On Saturday 23 April 2016 07:47:21 Ratna Manoj wrote:
>>> Thanks for the review. 
>>>
>>> Atleast for ext4 this crash happens on a sys_umount() call, timing of
>>> which is not in control of block driver. Block driver cannot force the
>>> filesystems to be unmounted, and the file system does not expect 
>>> buffers to get unmapped under it.
>> Yes the block driver can't force a clean umount.
>>
>>> Ext4 can be fixed with the this patch:
>>> http://www.spinics.net/lists/linux-ext4/msg51112.html 
>>> It did not make to the kernel. It checks the state of the buffer head
>>> before committing.
>>>
>>> When we consider diskett/CD as user space thread that called NBD_DO_IT,
>>> this problem is analogous to changing disk with another or the same
>>> disk suddenly when the file system is still mounted. 
>>>
>>> If we completely kill the block device we would loss some writes when
>>> same thread is reconnected.
>> I am not so sure about your exact use-case here.
>>
>> If the NBD_DO_IT thread returns I am considering the connection and
>> block device as dead and disconnected. Securing any data afterwards with
>> a new connection is potentially dangerous as it may be a different
>> server.
> Yes, from the kernel POV you're right.
>
> However, at some point I agreed with Paul (your predecessor) that when
> this happens due to an error condition (as opposed to it being due to an
> explicit disconnect), the kernel would block all reads from or writes to
> the device, and the client may try to reconnect *from the same
> PID* (i.e., it may not fork()). If that succeeds, the next NBD_DO_IT is
> assumed to be connected to the same server; if instead the process
> exits, then the block device is assumed to be dead, will be reset, and
> all pending reads or writes would error.
>
> In principle, this allows for a proper reconnect from userspace if it
> can be done. However, I'm not sure whether this ever worked well or
> whether it was documented, so it's probably fine if you think it should
> be replaced with something else.
>
> (obviously, userspace reconnecting the device to a different device is
> wrong and should not be done, but that's a case of "if you break it, you
> get to keep both pieces)
>
> At any rate, I think it makes sense for userspace to be given a chance
> to *attempt* to reconnect a device when the connection drops
> unexpectedly.

To summarize different options we have:

1. Wait for all the buffers to get cleaned:
   a) ... at the time of disconnect:  
      Not possible, because we can neither force upper layers to do the required
      (ext4 umount crash scenario), nor kill mapped buffers unconditionally.
   b) ... after disconnect, fail next reconnect with EBUSY if bd_openers > 1:
      This would prevent even harmless reconnects from same device along with
      problematic different device reconnects.

2. Do not wait for all buffers to get cleaned:
   a) This particular patch does this. Inconsistency with a different device 
      reconnect is its disadvantage. 

We may want to give preference same device reconnects as it is not rare. 

(For floppy/cdrom disk drivers also, changing disk with different disk in the 
 middle results in inconsistency, and they left this case as user responsibility
 to not do it.)

-Ratna

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] NBD: replace kill_bdev() with __invalidate_device()
  2016-04-23  2:17       ` Ratna Manoj
  2016-04-28  9:00         ` Markus Pargmann
@ 2016-05-10  9:24         ` Paolo Bonzini
  1 sibling, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2016-05-10  9:24 UTC (permalink / raw)
  To: Ratna Manoj, Markus Pargmann
  Cc: jack, Gou Rao, Vinod Jayaraman, nbd-general, linux-kernel



On 23/04/2016 04:17, Ratna Manoj wrote:
> Ext4 can be fixed with the this patch:
> http://www.spinics.net/lists/linux-ext4/msg51112.html 
> It did not make to the kernel. It checks the state of the buffer head
> before committing.

The patch did not make it, but Ted Ts'o proposed an alternative
(modifying __sync_dirty_buffer).

Paolo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Nbd] [PATCH] NBD: replace kill_bdev() with __invalidate_device()
  2016-04-28 16:27           ` [Nbd] " Wouter Verhelst
  2016-04-28 18:43             ` Ratna Manoj
@ 2016-05-12  9:53             ` Markus Pargmann
  2016-05-15 12:55               ` Wouter Verhelst
  1 sibling, 1 reply; 10+ messages in thread
From: Markus Pargmann @ 2016-05-12  9:53 UTC (permalink / raw)
  To: Wouter Verhelst
  Cc: Ratna Manoj, nbd-general, Vinod Jayaraman, jack, linux-kernel,
	Gou Rao, pbonzini

[-- Attachment #1: Type: text/plain, Size: 3734 bytes --]

Hi,

On Thursday 28 April 2016 18:27:34 Wouter Verhelst wrote:
> On Thu, Apr 28, 2016 at 11:00:20AM +0200, Markus Pargmann wrote:
> > Hi,
> > 
> > On Saturday 23 April 2016 07:47:21 Ratna Manoj wrote:
> > > Thanks for the review. 
> > > 
> > > Atleast for ext4 this crash happens on a sys_umount() call, timing of
> > > which is not in control of block driver. Block driver cannot force the
> > > filesystems to be unmounted, and the file system does not expect 
> > > buffers to get unmapped under it.
> > 
> > Yes the block driver can't force a clean umount.
> > 
> > > 
> > > Ext4 can be fixed with the this patch:
> > > http://www.spinics.net/lists/linux-ext4/msg51112.html 
> > > It did not make to the kernel. It checks the state of the buffer head
> > > before committing.
> > > 
> > > When we consider diskett/CD as user space thread that called NBD_DO_IT,
> > > this problem is analogous to changing disk with another or the same
> > > disk suddenly when the file system is still mounted. 
> > > 
> > > If we completely kill the block device we would loss some writes when
> > > same thread is reconnected.
> > 
> > I am not so sure about your exact use-case here.
> > 
> > If the NBD_DO_IT thread returns I am considering the connection and
> > block device as dead and disconnected. Securing any data afterwards with
> > a new connection is potentially dangerous as it may be a different
> > server.
> 
> Yes, from the kernel POV you're right.
> 
> However, at some point I agreed with Paul (your predecessor) that when
> this happens due to an error condition (as opposed to it being due to an
> explicit disconnect), the kernel would block all reads from or writes to
> the device, and the client may try to reconnect *from the same
> PID* (i.e., it may not fork()). If that succeeds, the next NBD_DO_IT is
> assumed to be connected to the same server; if instead the process
> exits, then the block device is assumed to be dead, will be reset, and
> all pending reads or writes would error.
> 
> In principle, this allows for a proper reconnect from userspace if it
> can be done. However, I'm not sure whether this ever worked well or
> whether it was documented, so it's probably fine if you think it should
> be replaced with something else.

At least I was not aware of this possibility. As far as I know the
previous code even had issues with the signals used to kill on timeouts
which also killed the userspace program sometimes.

Currently I can't see a code path that supports reconnects. But I may
have removed that accidently in the past.

> 
> (obviously, userspace reconnecting the device to a different device is
> wrong and should not be done, but that's a case of "if you break it, you
> get to keep both pieces)
> 
> At any rate, I think it makes sense for userspace to be given a chance
> to *attempt* to reconnect a device when the connection drops
> unexpectedly.

Perhaps it would be better to setup the kernel driver explicitly for
that. Perhaps some flag to let the kernel driver know that the client
would like to keep the block device open? In that case the client could
excplicitly use NBD_CLEAR_SOCK to cleanup everything.

Or perhaps a completely new ioctl that can transmit back some more
information about what failures were seen and whether the blockdevice
was closed or not?

Best Regards,

Markus

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Nbd] [PATCH] NBD: replace kill_bdev() with __invalidate_device()
  2016-05-12  9:53             ` Markus Pargmann
@ 2016-05-15 12:55               ` Wouter Verhelst
  2016-05-19  6:35                 ` Markus Pargmann
  0 siblings, 1 reply; 10+ messages in thread
From: Wouter Verhelst @ 2016-05-15 12:55 UTC (permalink / raw)
  To: Markus Pargmann
  Cc: Ratna Manoj, nbd-general, Vinod Jayaraman, jack, linux-kernel,
	Gou Rao, pbonzini

Hi Markus,

On Thu, May 12, 2016 at 11:53:01AM +0200, Markus Pargmann wrote:
> On Thursday 28 April 2016 18:27:34 Wouter Verhelst wrote:
> > However, at some point I agreed with Paul (your predecessor) that when
> > this happens due to an error condition (as opposed to it being due to an
> > explicit disconnect), the kernel would block all reads from or writes to
> > the device, and the client may try to reconnect *from the same
> > PID* (i.e., it may not fork()). If that succeeds, the next NBD_DO_IT is
> > assumed to be connected to the same server; if instead the process
> > exits, then the block device is assumed to be dead, will be reset, and
> > all pending reads or writes would error.
> > 
> > In principle, this allows for a proper reconnect from userspace if it
> > can be done. However, I'm not sure whether this ever worked well or
> > whether it was documented, so it's probably fine if you think it should
> > be replaced with something else.
> 
> At least I was not aware of this possibility. As far as I know the
> previous code even had issues with the signals used to kill on timeouts
> which also killed the userspace program sometimes.
> 
> Currently I can't see a code path that supports reconnects. But I may
> have removed that accidently in the past.

Right. Like I said, I'm not sure if it ever worked well. The user space
client has a -persist option that tries to implement it, but I've been
getting some bug reports from people who've tried it (although that may
have been my fault rather than the kernel's).

> > (obviously, userspace reconnecting the device to a different device is
> > wrong and should not be done, but that's a case of "if you break it, you
> > get to keep both pieces)
> > 
> > At any rate, I think it makes sense for userspace to be given a chance
> > to *attempt* to reconnect a device when the connection drops
> > unexpectedly.
> 
> Perhaps it would be better to setup the kernel driver explicitly for
> that. Perhaps some flag to let the kernel driver know that the client
> would like to keep the block device open? In that case the client could
> excplicitly use NBD_CLEAR_SOCK to cleanup everything.

I'm not sure what you mean by this. Can you clarify?

> Or perhaps a completely new ioctl that can transmit back some more
> information about what failures were seen and whether the blockdevice
> was closed or not?

The intent was that ioctl(NBD_DO_IT) would return an error when the
disconnect was not requested, and would return 0 when the connection
dropped due to userspace doing ioctl(NBD_DISCONNECT), since dropping the
connection when userspace explicitly asks for it is not an error.

drivers/block/nbd.c contains the following:

static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
                       unsigned int cmd, unsigned long arg)
{
[...]
	case NBD_DO_IT: {
[...]
                if (nbd->disconnect) /* user requested, ignore socket errors */
                        return 0;
                return error;
	}
[...]

so the signalling part of it is at least still there. Whether it works,
I haven't tested.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Nbd] [PATCH] NBD: replace kill_bdev() with __invalidate_device()
  2016-05-15 12:55               ` Wouter Verhelst
@ 2016-05-19  6:35                 ` Markus Pargmann
  2016-05-24 15:17                   ` Wouter Verhelst
  0 siblings, 1 reply; 10+ messages in thread
From: Markus Pargmann @ 2016-05-19  6:35 UTC (permalink / raw)
  To: Wouter Verhelst
  Cc: Ratna Manoj, nbd-general, Vinod Jayaraman, jack, linux-kernel,
	Gou Rao, pbonzini

[-- Attachment #1: Type: text/plain, Size: 4716 bytes --]

Hi Wouter,

On Sun, May 15, 2016 at 02:55:39PM +0200, Wouter Verhelst wrote:
> Hi Markus,
> 
> On Thu, May 12, 2016 at 11:53:01AM +0200, Markus Pargmann wrote:
> > On Thursday 28 April 2016 18:27:34 Wouter Verhelst wrote:
> > > However, at some point I agreed with Paul (your predecessor) that when
> > > this happens due to an error condition (as opposed to it being due to an
> > > explicit disconnect), the kernel would block all reads from or writes to
> > > the device, and the client may try to reconnect *from the same
> > > PID* (i.e., it may not fork()). If that succeeds, the next NBD_DO_IT is
> > > assumed to be connected to the same server; if instead the process
> > > exits, then the block device is assumed to be dead, will be reset, and
> > > all pending reads or writes would error.
> > > 
> > > In principle, this allows for a proper reconnect from userspace if it
> > > can be done. However, I'm not sure whether this ever worked well or
> > > whether it was documented, so it's probably fine if you think it should
> > > be replaced with something else.
> > 
> > At least I was not aware of this possibility. As far as I know the
> > previous code even had issues with the signals used to kill on timeouts
> > which also killed the userspace program sometimes.
> > 
> > Currently I can't see a code path that supports reconnects. But I may
> > have removed that accidently in the past.
> 
> Right. Like I said, I'm not sure if it ever worked well. The user space
> client has a -persist option that tries to implement it, but I've been
> getting some bug reports from people who've tried it (although that may
> have been my fault rather than the kernel's).
> 
> > > (obviously, userspace reconnecting the device to a different device is
> > > wrong and should not be done, but that's a case of "if you break it, you
> > > get to keep both pieces)
> > > 
> > > At any rate, I think it makes sense for userspace to be given a chance
> > > to *attempt* to reconnect a device when the connection drops
> > > unexpectedly.
> > 
> > Perhaps it would be better to setup the kernel driver explicitly for
> > that. Perhaps some flag to let the kernel driver know that the client
> > would like to keep the block device open? In that case the client could
> > excplicitly use NBD_CLEAR_SOCK to cleanup everything.
> 
> I'm not sure what you mean by this. Can you clarify?

I meant that it might be better to have a separate way for NBD_DO_IT.
Something where the client software can directly instruct the kernel to
keep everything opened in case of an error so that the client may
reconnect afterwards.

This could be a new ioctl that sets it up, for example 'NBD_PERSISTENT'.
The NBD_DO_IT afterwards would keep everything up and running in case of
a connection error so that the client could set a new socket using
NBD_SET_SOCK and reenter using NBD_DO_IT.

For all clients that are not capable of this mechanism or don't use it,
NBD_DO_IT would clean up everything properly on any error.


> 
> > Or perhaps a completely new ioctl that can transmit back some more
> > information about what failures were seen and whether the blockdevice
> > was closed or not?
> 
> The intent was that ioctl(NBD_DO_IT) would return an error when the
> disconnect was not requested, and would return 0 when the connection
> dropped due to userspace doing ioctl(NBD_DISCONNECT), since dropping the
> connection when userspace explicitly asks for it is not an error.
> 
> drivers/block/nbd.c contains the following:
> 
> static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
>                        unsigned int cmd, unsigned long arg)
> {
> [...]
> 	case NBD_DO_IT: {
> [...]
>                 if (nbd->disconnect) /* user requested, ignore socket errors */
>                         return 0;
>                 return error;
> 	}
> [...]
> 
> so the signalling part of it is at least still there. Whether it works,
> I haven't tested.

I just looked up the kernel code from 4.0. This code was there as
well. But the socket and blockdevice were both destroyed before leaving
the NBD_DO_IT ioctl. So it seems to have never been really persistent.
Filesystems would have still been killed.

So for a persistent nbd device there is some more code necessary to do
it.

Best Regards,

Markus

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Nbd] [PATCH] NBD: replace kill_bdev() with __invalidate_device()
  2016-05-19  6:35                 ` Markus Pargmann
@ 2016-05-24 15:17                   ` Wouter Verhelst
  0 siblings, 0 replies; 10+ messages in thread
From: Wouter Verhelst @ 2016-05-24 15:17 UTC (permalink / raw)
  To: Markus Pargmann
  Cc: nbd-general, Vinod Jayaraman, jack, linux-kernel, Gou Rao, pbonzini

On Thu, May 19, 2016 at 08:35:03AM +0200, Markus Pargmann wrote:
> Hi Wouter,
> 
> On Sun, May 15, 2016 at 02:55:39PM +0200, Wouter Verhelst wrote:
> > Hi Markus,
> > 
> > On Thu, May 12, 2016 at 11:53:01AM +0200, Markus Pargmann wrote:
> > > On Thursday 28 April 2016 18:27:34 Wouter Verhelst wrote:
> > > > However, at some point I agreed with Paul (your predecessor) that when
> > > > this happens due to an error condition (as opposed to it being due to an
> > > > explicit disconnect), the kernel would block all reads from or writes to
> > > > the device, and the client may try to reconnect *from the same
> > > > PID* (i.e., it may not fork()). If that succeeds, the next NBD_DO_IT is
> > > > assumed to be connected to the same server; if instead the process
> > > > exits, then the block device is assumed to be dead, will be reset, and
> > > > all pending reads or writes would error.
> > > > 
> > > > In principle, this allows for a proper reconnect from userspace if it
> > > > can be done. However, I'm not sure whether this ever worked well or
> > > > whether it was documented, so it's probably fine if you think it should
> > > > be replaced with something else.
> > > 
> > > At least I was not aware of this possibility. As far as I know the
> > > previous code even had issues with the signals used to kill on timeouts
> > > which also killed the userspace program sometimes.
> > > 
> > > Currently I can't see a code path that supports reconnects. But I may
> > > have removed that accidently in the past.
> > 
> > Right. Like I said, I'm not sure if it ever worked well. The user space
> > client has a -persist option that tries to implement it, but I've been
> > getting some bug reports from people who've tried it (although that may
> > have been my fault rather than the kernel's).
> > 
> > > > (obviously, userspace reconnecting the device to a different device is
> > > > wrong and should not be done, but that's a case of "if you break it, you
> > > > get to keep both pieces)
> > > > 
> > > > At any rate, I think it makes sense for userspace to be given a chance
> > > > to *attempt* to reconnect a device when the connection drops
> > > > unexpectedly.
> > > 
> > > Perhaps it would be better to setup the kernel driver explicitly for
> > > that. Perhaps some flag to let the kernel driver know that the client
> > > would like to keep the block device open? In that case the client could
> > > excplicitly use NBD_CLEAR_SOCK to cleanup everything.
> > 
> > I'm not sure what you mean by this. Can you clarify?
> 
> I meant that it might be better to have a separate way for NBD_DO_IT.
> Something where the client software can directly instruct the kernel to
> keep everything opened in case of an error so that the client may
> reconnect afterwards.
>
> This could be a new ioctl that sets it up, for example 'NBD_PERSISTENT'.

If we're going to add a new ioctl, I think it might be useful to have a
second "flags" type ioctl; one where the client can set options, and
where the return value of the ioctl indicates which options the kernel
understands and will/can honour.

This could then also be used for things like checking whether the kernel
supports structured writes, etc.

> The NBD_DO_IT afterwards would keep everything up and running in case of
> a connection error so that the client could set a new socket using
> NBD_SET_SOCK and reenter using NBD_DO_IT.
> 
> For all clients that are not capable of this mechanism or don't use it,
> NBD_DO_IT would clean up everything properly on any error.

Right.

> > > Or perhaps a completely new ioctl that can transmit back some more
> > > information about what failures were seen and whether the blockdevice
> > > was closed or not?
> > 
> > The intent was that ioctl(NBD_DO_IT) would return an error when the
> > disconnect was not requested, and would return 0 when the connection
> > dropped due to userspace doing ioctl(NBD_DISCONNECT), since dropping the
> > connection when userspace explicitly asks for it is not an error.
> > 
> > drivers/block/nbd.c contains the following:
> > 
> > static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
> >                        unsigned int cmd, unsigned long arg)
> > {
> > [...]
> > 	case NBD_DO_IT: {
> > [...]
> >                 if (nbd->disconnect) /* user requested, ignore socket errors */
> >                         return 0;
> >                 return error;
> > 	}
> > [...]
> > 
> > so the signalling part of it is at least still there. Whether it works,
> > I haven't tested.
> 
> I just looked up the kernel code from 4.0. This code was there as
> well. But the socket and blockdevice were both destroyed before leaving
> the NBD_DO_IT ioctl. So it seems to have never been really persistent.
> Filesystems would have still been killed.
> 
> So for a persistent nbd device there is some more code necessary to do
> it.

Okay.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-05-24 15:17 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <56E711D1.9090708@gmail.com>
     [not found] ` <2358965.KUbuAa1sts@adelgunde>
     [not found]   ` <56F34412.1050707@gmail.com>
2016-04-20 11:06     ` [PATCH] NBD: replace kill_bdev() with __invalidate_device() Markus Pargmann
2016-04-23  2:17       ` Ratna Manoj
2016-04-28  9:00         ` Markus Pargmann
2016-04-28 16:27           ` [Nbd] " Wouter Verhelst
2016-04-28 18:43             ` Ratna Manoj
2016-05-12  9:53             ` Markus Pargmann
2016-05-15 12:55               ` Wouter Verhelst
2016-05-19  6:35                 ` Markus Pargmann
2016-05-24 15:17                   ` Wouter Verhelst
2016-05-10  9:24         ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).