All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] sd async probing
@ 2017-12-12  8:57 Hannes Reinecke
  2017-12-12  8:57 ` [PATCH 1/4] block: check for GENHD_FL_UP in del_gendisk() Hannes Reinecke
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Hannes Reinecke @ 2017-12-12  8:57 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Bart van Assche, linux-scsi,
	Hannes Reinecke

Hi all,

here's a patchset to facilitate full async scsi disk probing.
The current approach has the problem of calling
async_synchronize_full_domain() during sd_remove(), which is required
to wait for _all_ current asynchronous probes to complete.
When running under iSER sd_remove() is called from the RX thread directly,
ie sd_remove() has to complete before RX processing can continue.
But as probing has to issue I/O any response will be stuck in the RX queue,
resulting in a deadlock.

This patchset breaks up the synchronization point into a per-device cookie,
thus removing the global synchronisation and breaking the deadlock.
But during testing some more deficiencies up and down the stack had been
revealed, so we'll need the entire patchset to make it work.

As usual, comments and reviews are welcome.

Hannes Reinecke (4):
  block: check for GENHD_FL_UP in del_gendisk()
  sd: Check if parent is still present before proceeding with probing
  scsi: add missing get_device() return value checks
  sd: use async_probe cookie to avoid deadlocks

 block/genhd.c            |  3 +++
 drivers/scsi/scsi_scan.c | 14 ++++++++++++--
 drivers/scsi/sd.c        | 19 +++++++++++++------
 drivers/scsi/sd.h        |  3 +++
 4 files changed, 31 insertions(+), 8 deletions(-)

-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] block: check for GENHD_FL_UP in del_gendisk()
  2017-12-12  8:57 [PATCH 0/4] sd async probing Hannes Reinecke
@ 2017-12-12  8:57 ` Hannes Reinecke
  2017-12-12 16:57   ` Bart Van Assche
  2017-12-12  8:57 ` [PATCH 2/4] sd: Check if parent is still present before proceeding with probing Hannes Reinecke
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2017-12-12  8:57 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Bart van Assche, linux-scsi,
	Hannes Reinecke

From: Hannes Reinecke <hare@suse.com>

When a device is probed asynchronously del_gendisk() might be called
before the async probing was run, causing del_gendisk() to crash
due to uninitialized sysfs objects.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 block/genhd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index c2223f1..cc40d95 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -697,6 +697,9 @@ void del_gendisk(struct gendisk *disk)
 	struct disk_part_iter piter;
 	struct hd_struct *part;
 
+	if (!(disk->flags & GENHD_FL_UP))
+		return;
+
 	blk_integrity_del(disk);
 	disk_del_events(disk);
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] sd: Check if parent is still present before proceeding with probing
  2017-12-12  8:57 [PATCH 0/4] sd async probing Hannes Reinecke
  2017-12-12  8:57 ` [PATCH 1/4] block: check for GENHD_FL_UP in del_gendisk() Hannes Reinecke
@ 2017-12-12  8:57 ` Hannes Reinecke
  2017-12-14 22:02   ` Bart Van Assche
  2017-12-12  8:57 ` [PATCH 3/4] scsi: add missing get_device() return value checks Hannes Reinecke
  2017-12-12  8:57 ` [PATCH 4/4] sd: use async_probe cookie to avoid deadlocks Hannes Reinecke
  3 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2017-12-12  8:57 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Bart van Assche, linux-scsi,
	Hannes Reinecke, Hannes Reinecke

When a new SCSI disk is created we should be checking if the parent
device is actually present before proceeding with probing.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/sd.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index ab75ebd..228b0b62 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3399,6 +3399,10 @@ static int sd_probe(struct device *dev)
 	}
 
 	device_initialize(&sdkp->dev);
+
+	if (!get_device(dev))
+		goto out_free_index;
+
 	sdkp->dev.parent = dev;
 	sdkp->dev.class = &sd_disk_class;
 	dev_set_name(&sdkp->dev, "%s", dev_name(dev));
@@ -3407,7 +3411,6 @@ static int sd_probe(struct device *dev)
 	if (error)
 		goto out_free_index;
 
-	get_device(dev);
 	dev_set_drvdata(dev, sdkp);
 
 	get_device(&sdkp->dev);	/* prevent release before async_schedule */
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] scsi: add missing get_device() return value checks
  2017-12-12  8:57 [PATCH 0/4] sd async probing Hannes Reinecke
  2017-12-12  8:57 ` [PATCH 1/4] block: check for GENHD_FL_UP in del_gendisk() Hannes Reinecke
  2017-12-12  8:57 ` [PATCH 2/4] sd: Check if parent is still present before proceeding with probing Hannes Reinecke
@ 2017-12-12  8:57 ` Hannes Reinecke
  2017-12-14 22:03   ` Bart Van Assche
  2017-12-12  8:57 ` [PATCH 4/4] sd: use async_probe cookie to avoid deadlocks Hannes Reinecke
  3 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2017-12-12  8:57 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Bart van Assche, linux-scsi,
	Hannes Reinecke, Hannes Reinecke

We need to validate that get_device() succeeded, otherwise
we'll end up working with invalid devices.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/scsi_scan.c | 14 ++++++++++++--
 drivers/scsi/sd.c        |  8 +++++---
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index be5e919..18edfd7 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -245,6 +245,10 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
 	INIT_WORK(&sdev->requeue_work, scsi_requeue_run_queue);
 
 	sdev->sdev_gendev.parent = get_device(&starget->dev);
+	if (!sdev->sdev_gendev.parent) {
+		kfree(sdev);
+		return NULL;
+	}
 	sdev->sdev_target = starget;
 
 	/* usually NULL and set by ->slave_alloc instead */
@@ -366,7 +370,8 @@ static struct scsi_target *__scsi_find_target(struct device *parent,
 		}
 	}
 	if (found_starget)
-		get_device(&found_starget->dev);
+		if (!get_device(&found_starget->dev))
+			found_starget = NULL;
 
 	return found_starget;
 }
@@ -436,6 +441,10 @@ static struct scsi_target *scsi_alloc_target(struct device *parent,
 	device_initialize(dev);
 	kref_init(&starget->reap_ref);
 	dev->parent = get_device(parent);
+	if (!dev->parent) {
+		kfree(starget);
+		return NULL;
+	}
 	dev_set_name(dev, "target%d:%d:%d", shost->host_no, channel, id);
 	dev->bus = &scsi_bus_type;
 	dev->type = &scsi_target_type;
@@ -469,7 +478,8 @@ static struct scsi_target *scsi_alloc_target(struct device *parent,
 			return NULL;
 		}
 	}
-	get_device(dev);
+	/* No good way to recover here; keep fingers crossed */
+	WARN_ON(!get_device(dev));
 
 	return starget;
 
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 228b0b62..abbab17 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -619,10 +619,12 @@ static struct scsi_disk *scsi_disk_get(struct gendisk *disk)
 
 	if (disk->private_data) {
 		sdkp = scsi_disk(disk);
-		if (scsi_device_get(sdkp->device) == 0)
-			get_device(&sdkp->dev);
-		else
+		if (scsi_device_get(sdkp->device))
+			sdkp = NULL;
+		else if (!get_device(&sdkp->dev)) {
+			scsi_device_put(sdkp->device);
 			sdkp = NULL;
+		}
 	}
 	mutex_unlock(&sd_ref_mutex);
 	return sdkp;
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] sd: use async_probe cookie to avoid deadlocks
  2017-12-12  8:57 [PATCH 0/4] sd async probing Hannes Reinecke
                   ` (2 preceding siblings ...)
  2017-12-12  8:57 ` [PATCH 3/4] scsi: add missing get_device() return value checks Hannes Reinecke
@ 2017-12-12  8:57 ` Hannes Reinecke
  2017-12-14 22:13   ` Bart Van Assche
  3 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2017-12-12  8:57 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Bart van Assche, linux-scsi,
	Hannes Reinecke, Hannes Reinecke

With the current design we're waiting for all async probes to
finish when removing any sd device.
This might lead to a livelock where the 'remove' call is blocking
for any probe calls to finish, and the probe calls are waiting for
a response, which will never be processes as the thread handling
the responses is waiting for the remove call to finish.
Which is completely pointless as we only _really_ care for the
probe on _this_ device to be completed; any other probing can
happily continue for all we care.
So save the async probing cookie in the structure and only wait
if this specific probe is still active.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/sd.c | 6 ++++--
 drivers/scsi/sd.h | 3 +++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index abbab17..7bf20ca 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3416,7 +3416,8 @@ static int sd_probe(struct device *dev)
 	dev_set_drvdata(dev, sdkp);
 
 	get_device(&sdkp->dev);	/* prevent release before async_schedule */
-	async_schedule_domain(sd_probe_async, sdkp, &scsi_sd_probe_domain);
+	sdkp->async_probe = async_schedule_domain(sd_probe_async, sdkp,
+						  &scsi_sd_probe_domain);
 
 	return 0;
 
@@ -3454,7 +3455,8 @@ static int sd_remove(struct device *dev)
 	scsi_autopm_get_device(sdkp->device);
 
 	async_synchronize_full_domain(&scsi_sd_pm_domain);
-	async_synchronize_full_domain(&scsi_sd_probe_domain);
+	async_synchronize_cookie_domain(sdkp->async_probe,
+					&scsi_sd_probe_domain);
 	device_del(&sdkp->dev);
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 320de75..d8aff29 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -2,6 +2,8 @@
 #ifndef _SCSI_DISK_H
 #define _SCSI_DISK_H
 
+#include <linux/async.h>
+
 /*
  * More than enough for everybody ;)  The huge number of majors
  * is a leftover from 16bit dev_t days, we don't really need that
@@ -73,6 +75,7 @@ struct scsi_disk {
 	struct device	dev;
 	struct gendisk	*disk;
 	struct opal_dev *opal_dev;
+	async_cookie_t  async_probe;
 #ifdef CONFIG_BLK_DEV_ZONED
 	unsigned int	nr_zones;
 	unsigned int	zone_blocks;
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] block: check for GENHD_FL_UP in del_gendisk()
  2017-12-12  8:57 ` [PATCH 1/4] block: check for GENHD_FL_UP in del_gendisk() Hannes Reinecke
@ 2017-12-12 16:57   ` Bart Van Assche
  2017-12-13  7:36     ` Hannes Reinecke
  0 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2017-12-12 16:57 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, hare

On Tue, 2017-12-12 at 09:57 +0100, Hannes Reinecke wrote:
> From: Hannes Reinecke <hare@suse.com>
> 
> When a device is probed asynchronously del_gendisk() might be called
> before the async probing was run, causing del_gendisk() to crash
> due to uninitialized sysfs objects.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  block/genhd.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index c2223f1..cc40d95 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -697,6 +697,9 @@ void del_gendisk(struct gendisk *disk)
>  	struct disk_part_iter piter;
>  	struct hd_struct *part;
>  
> +	if (!(disk->flags & GENHD_FL_UP))
> +		return;
> +
>  	blk_integrity_del(disk);
>  	disk_del_events(disk);

Hello Hannes,

Thank you for having published your approach for increasing disk probing
concurrency. Your approach looks interesting to me. However, I don't think
that patches 1/4..3/4 are sufficient to avoid races between e.g.
device_add_disk() and del_gendisk(). As far as I know no locks are held
around the device_add_disk() and del_gendisk() calls. Does that mean that
del_gendisk() can call e.g. blk_integrity_del() before device_add_disk() has
called blk_integrity_add()?

Bart.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] block: check for GENHD_FL_UP in del_gendisk()
  2017-12-12 16:57   ` Bart Van Assche
@ 2017-12-13  7:36     ` Hannes Reinecke
  2017-12-14 21:47       ` Bart Van Assche
  0 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2017-12-13  7:36 UTC (permalink / raw)
  To: Bart Van Assche, hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi

On 12/12/2017 05:57 PM, Bart Van Assche wrote:
> On Tue, 2017-12-12 at 09:57 +0100, Hannes Reinecke wrote:
>> From: Hannes Reinecke <hare@suse.com>
>>
>> When a device is probed asynchronously del_gendisk() might be called
>> before the async probing was run, causing del_gendisk() to crash
>> due to uninitialized sysfs objects.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.com>
>> ---
>>  block/genhd.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/block/genhd.c b/block/genhd.c
>> index c2223f1..cc40d95 100644
>> --- a/block/genhd.c
>> +++ b/block/genhd.c
>> @@ -697,6 +697,9 @@ void del_gendisk(struct gendisk *disk)
>>  	struct disk_part_iter piter;
>>  	struct hd_struct *part;
>>  
>> +	if (!(disk->flags & GENHD_FL_UP))
>> +		return;
>> +
>>  	blk_integrity_del(disk);
>>  	disk_del_events(disk);
> 
> Hello Hannes,
> 
> Thank you for having published your approach for increasing disk probing
> concurrency. Your approach looks interesting to me. However, I don't think
> that patches 1/4..3/4 are sufficient to avoid races between e.g.
> device_add_disk() and del_gendisk(). As far as I know no locks are held
> around the device_add_disk() and del_gendisk() calls. Does that mean that
> del_gendisk() can call e.g. blk_integrity_del() before device_add_disk() has
> called blk_integrity_add()?
> 
In principle, yes.
However, the overall idea here is that  device_add_disk() and
del_gendisk() are enclosed within upper layer procedures, which
themselves provide additional locking.
In our case the sd driver provided synchronisation guarantees ensuring
that device_add_disk() and del_gendisk() doesn't run concurrently.

if one is really concerned we could convert disk->flags to a bitmask,
and use atomic bitmask modification; that should avoid any concurrency
issues.
However, I'm not sure if it's worth it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.com			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] block: check for GENHD_FL_UP in del_gendisk()
  2017-12-13  7:36     ` Hannes Reinecke
@ 2017-12-14 21:47       ` Bart Van Assche
  0 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2017-12-14 21:47 UTC (permalink / raw)
  To: hare, hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi

On Wed, 2017-12-13 at 08:36 +0100, Hannes Reinecke wrote:
> On 12/12/2017 05:57 PM, Bart Van Assche wrote:
> > On Tue, 2017-12-12 at 09:57 +0100, Hannes Reinecke wrote:
> > > From: Hannes Reinecke <hare@suse.com>
> > > 
> > > When a device is probed asynchronously del_gendisk() might be called
> > > before the async probing was run, causing del_gendisk() to crash
> > > due to uninitialized sysfs objects.
> > > 
> > > Signed-off-by: Hannes Reinecke <hare@suse.com>
> > > ---
> > >  block/genhd.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/block/genhd.c b/block/genhd.c
> > > index c2223f1..cc40d95 100644
> > > --- a/block/genhd.c
> > > +++ b/block/genhd.c
> > > @@ -697,6 +697,9 @@ void del_gendisk(struct gendisk *disk)
> > >  	struct disk_part_iter piter;
> > >  	struct hd_struct *part;
> > >  
> > > +	if (!(disk->flags & GENHD_FL_UP))
> > > +		return;
> > > +
> > >  	blk_integrity_del(disk);
> > >  	disk_del_events(disk);
> > 
> > Hello Hannes,
> > 
> > Thank you for having published your approach for increasing disk probing
> > concurrency. Your approach looks interesting to me. However, I don't think
> > that patches 1/4..3/4 are sufficient to avoid races between e.g.
> > device_add_disk() and del_gendisk(). As far as I know no locks are held
> > around the device_add_disk() and del_gendisk() calls. Does that mean that
> > del_gendisk() can call e.g. blk_integrity_del() before device_add_disk() has
> > called blk_integrity_add()?
> 
> In principle, yes. However, the overall idea here is that device_add_disk()
> and del_gendisk() are enclosed within upper layer procedures, which themselves
> provide additional locking. In our case the sd driver provided synchronisation
> guarantees ensuring that device_add_disk() and del_gendisk() doesn't run
> concurrently.
> 
> if one is really concerned we could convert disk->flags to a bitmask, and use
> atomic bitmask modification; that should avoid any concurrency issues.

Hello Hannes,

Regarding the scenario explained in a previous e-mail: what guarantees that the
device_add_disk() call in sd_probe_async() does not happen concurrently with the
device_unregister() call from __scsi_remove_device()?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] sd: Check if parent is still present before proceeding with probing
  2017-12-12  8:57 ` [PATCH 2/4] sd: Check if parent is still present before proceeding with probing Hannes Reinecke
@ 2017-12-14 22:02   ` Bart Van Assche
  0 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2017-12-14 22:02 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, hare

On Tue, 2017-12-12 at 09:57 +0100, Hannes Reinecke wrote:
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index ab75ebd..228b0b62 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3399,6 +3399,10 @@ static int sd_probe(struct device *dev)
>  	}
>  
>  	device_initialize(&sdkp->dev);
> +
> +	if (!get_device(dev))
> +		goto out_free_index;

get_device(dev) returns the value of the 'dev' argument. Are you sure the
if-test is useful? Did you perhaps want to check sdp->sdev_state like
scsi_device_get() does?

>  	sdkp->dev.parent = dev;
>  	sdkp->dev.class = &sd_disk_class;
>  	dev_set_name(&sdkp->dev, "%s", dev_name(dev));
> @@ -3407,7 +3411,6 @@ static int sd_probe(struct device *dev)
>  	if (error)
>  		goto out_free_index;
>  
> -	get_device(dev);
>  	dev_set_drvdata(dev, sdkp);
>  
>  	get_device(&sdkp->dev);	/* prevent release before async_schedule */

Since the get_device() call has been moved up, are there any error paths that
have to be adjusted?

Bart.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] scsi: add missing get_device() return value checks
  2017-12-12  8:57 ` [PATCH 3/4] scsi: add missing get_device() return value checks Hannes Reinecke
@ 2017-12-14 22:03   ` Bart Van Assche
  2017-12-15 12:08     ` Hannes Reinecke
  0 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2017-12-14 22:03 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, hare

On Tue, 2017-12-12 at 09:57 +0100, Hannes Reinecke wrote:
>  	sdev->sdev_gendev.parent = get_device(&starget->dev);
> +	if (!sdev->sdev_gendev.parent) {
> +		kfree(sdev);
> +		return NULL;
> +	}

Are you sure that get_device() can return NULL?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] sd: use async_probe cookie to avoid deadlocks
  2017-12-12  8:57 ` [PATCH 4/4] sd: use async_probe cookie to avoid deadlocks Hannes Reinecke
@ 2017-12-14 22:13   ` Bart Van Assche
  2017-12-15 14:08     ` Hannes Reinecke
  0 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2017-12-14 22:13 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, hare

On Tue, 2017-12-12 at 09:57 +0100, Hannes Reinecke wrote:
> With the current design we're waiting for all async probes to
> finish when removing any sd device.
> This might lead to a livelock where the 'remove' call is blocking
> for any probe calls to finish, and the probe calls are waiting for
> a response, which will never be processes as the thread handling
> the responses is waiting for the remove call to finish.
> Which is completely pointless as we only _really_ care for the
> probe on _this_ device to be completed; any other probing can
> happily continue for all we care.
> So save the async probing cookie in the structure and only wait
> if this specific probe is still active.

From async_synchronize_cookie_domain():

	wait_event(async_done, lowest_in_progress(domain) >= cookie);

So async_synchronize_cookie_domain() also waits for multiple asynchronous
probes to finish. Does this patch have any advantages over the patch I
posted (https://marc.info/?l=linux-scsi&m=151275368714540)?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] scsi: add missing get_device() return value checks
  2017-12-14 22:03   ` Bart Van Assche
@ 2017-12-15 12:08     ` Hannes Reinecke
  0 siblings, 0 replies; 13+ messages in thread
From: Hannes Reinecke @ 2017-12-15 12:08 UTC (permalink / raw)
  To: Bart Van Assche, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, hare

On 12/14/2017 11:03 PM, Bart Van Assche wrote:
> On Tue, 2017-12-12 at 09:57 +0100, Hannes Reinecke wrote:
>>  	sdev->sdev_gendev.parent = get_device(&starget->dev);
>> +	if (!sdev->sdev_gendev.parent) {
>> +		kfree(sdev);
>> +		return NULL;
>> +	}
> 
> Are you sure that get_device() can return NULL?
> 
If it can't return NULL, why bother having a return value at all?
And this is not the only place where get_device() is checked for a NULL
return value.

We _really_ need to come up with a consistent view here.
If get_device() can never return NULL I would strongly argue for making
it a void function.
If it _can_ return NULL we need to check for it.

Either way I don't mind; but please stay consistent.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] sd: use async_probe cookie to avoid deadlocks
  2017-12-14 22:13   ` Bart Van Assche
@ 2017-12-15 14:08     ` Hannes Reinecke
  0 siblings, 0 replies; 13+ messages in thread
From: Hannes Reinecke @ 2017-12-15 14:08 UTC (permalink / raw)
  To: Bart Van Assche, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, hare

On 12/14/2017 11:13 PM, Bart Van Assche wrote:
> On Tue, 2017-12-12 at 09:57 +0100, Hannes Reinecke wrote:
>> With the current design we're waiting for all async probes to
>> finish when removing any sd device.
>> This might lead to a livelock where the 'remove' call is blocking
>> for any probe calls to finish, and the probe calls are waiting for
>> a response, which will never be processes as the thread handling
>> the responses is waiting for the remove call to finish.
>> Which is completely pointless as we only _really_ care for the
>> probe on _this_ device to be completed; any other probing can
>> happily continue for all we care.
>> So save the async probing cookie in the structure and only wait
>> if this specific probe is still active.
> 
> From async_synchronize_cookie_domain():
> 
> 	wait_event(async_done, lowest_in_progress(domain) >= cookie);
> 
> So async_synchronize_cookie_domain() also waits for multiple asynchronous
> probes to finish. Does this patch have any advantages over the patch I
> posted (https://marc.info/?l=linux-scsi&m=151275368714540)?
> 
Correct, it waits for all _previous_ entries to complete.
But this was precisely the point; previous entries (should) have all
necessary information to complete (they probably only have been
scheduled out for some reason), so a

The main advantage is that the change to make it work is relatively simple.
(And doesn't change the interface; something we as poor distribution
developer have to worry about...)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-12-15 14:08 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-12  8:57 [PATCH 0/4] sd async probing Hannes Reinecke
2017-12-12  8:57 ` [PATCH 1/4] block: check for GENHD_FL_UP in del_gendisk() Hannes Reinecke
2017-12-12 16:57   ` Bart Van Assche
2017-12-13  7:36     ` Hannes Reinecke
2017-12-14 21:47       ` Bart Van Assche
2017-12-12  8:57 ` [PATCH 2/4] sd: Check if parent is still present before proceeding with probing Hannes Reinecke
2017-12-14 22:02   ` Bart Van Assche
2017-12-12  8:57 ` [PATCH 3/4] scsi: add missing get_device() return value checks Hannes Reinecke
2017-12-14 22:03   ` Bart Van Assche
2017-12-15 12:08     ` Hannes Reinecke
2017-12-12  8:57 ` [PATCH 4/4] sd: use async_probe cookie to avoid deadlocks Hannes Reinecke
2017-12-14 22:13   ` Bart Van Assche
2017-12-15 14:08     ` Hannes Reinecke

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.