All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] libfc: unsafe refcounting in fc_rport_work()
@ 2016-04-20 13:24 Hannes Reinecke
  2016-04-20 14:17 ` Johannes Thumshirn
  2016-04-20 19:03 ` James Bottomley
  0 siblings, 2 replies; 8+ messages in thread
From: Hannes Reinecke @ 2016-04-20 13:24 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Ewan Milne, James Bottomley, linux-scsi,
	Hannes Reinecke

When pushing items on a workqueue we cannot take reference
when the workqueue item is executed, as the structure might
already been freed at that time.
So instead we need to take a reference before adding it
to the workqueue, thereby ensuring that the workqueue item
will always be valid.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/libfc/fc_rport.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/libfc/fc_rport.c b/drivers/scsi/libfc/fc_rport.c
index 589ff9a..8b08263f 100644
--- a/drivers/scsi/libfc/fc_rport.c
+++ b/drivers/scsi/libfc/fc_rport.c
@@ -263,7 +263,6 @@ static void fc_rport_work(struct work_struct *work)
 		ids = rdata->ids;
 		rdata->event = RPORT_EV_NONE;
 		rdata->major_retries = 0;
-		kref_get(&rdata->kref);
 		mutex_unlock(&rdata->rp_mutex);
 
 		if (!rport)
@@ -297,7 +296,6 @@ static void fc_rport_work(struct work_struct *work)
 			FC_RPORT_DBG(rdata, "lld callback ev %d\n", event);
 			rdata->lld_event_callback(lport, rdata, event);
 		}
-		kref_put(&rdata->kref, lport->tt.rport_destroy);
 		break;
 
 	case RPORT_EV_FAILED:
@@ -377,6 +375,7 @@ static void fc_rport_work(struct work_struct *work)
 		mutex_unlock(&rdata->rp_mutex);
 		break;
 	}
+	kref_put(&rdata->kref, lport->tt.rport_destroy);
 }
 
 /**
@@ -438,8 +437,15 @@ static void fc_rport_enter_delete(struct fc_rport_priv *rdata,
 
 	fc_rport_state_enter(rdata, RPORT_ST_DELETE);
 
-	if (rdata->event == RPORT_EV_NONE)
-		queue_work(rport_event_queue, &rdata->event_work);
+	if (rdata->event == RPORT_EV_NONE) {
+		if (!kref_get_unless_zero(&rdata->kref)) {
+			FC_RPORT_DBG(rdata, "port already deleted\n");
+			return;
+		}
+		if (!queue_work(rport_event_queue, &rdata->event_work))
+			kref_put(&rdata->kref,
+				 rdata->local_port->tt.rport_destroy);
+	}
 	rdata->event = event;
 }
 
@@ -487,8 +493,15 @@ static void fc_rport_enter_ready(struct fc_rport_priv *rdata)
 
 	FC_RPORT_DBG(rdata, "Port is Ready\n");
 
-	if (rdata->event == RPORT_EV_NONE)
-		queue_work(rport_event_queue, &rdata->event_work);
+	if (rdata->event == RPORT_EV_NONE) {
+		if (!kref_get_unless_zero(&rdata->kref)) {
+			FC_RPORT_DBG(rdata, "port already deleted\n");
+			return;
+		}
+		if (!queue_work(rport_event_queue, &rdata->event_work))
+			kref_put(&rdata->kref,
+				 rdata->local_port->tt.rport_destroy);
+	}
 	rdata->event = RPORT_EV_READY;
 }
 
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] libfc: unsafe refcounting in fc_rport_work()
  2016-04-20 13:24 [PATCH] libfc: unsafe refcounting in fc_rport_work() Hannes Reinecke
@ 2016-04-20 14:17 ` Johannes Thumshirn
  2016-04-20 19:03 ` James Bottomley
  1 sibling, 0 replies; 8+ messages in thread
From: Johannes Thumshirn @ 2016-04-20 14:17 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, Ewan Milne,
	James Bottomley, linux-scsi

On Wed, Apr 20, 2016 at 03:24:21PM +0200, Hannes Reinecke wrote:
> When pushing items on a workqueue we cannot take reference
> when the workqueue item is executed, as the structure might
> already been freed at that time.
> So instead we need to take a reference before adding it
> to the workqueue, thereby ensuring that the workqueue item
> will always be valid.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] libfc: unsafe refcounting in fc_rport_work()
  2016-04-20 13:24 [PATCH] libfc: unsafe refcounting in fc_rport_work() Hannes Reinecke
  2016-04-20 14:17 ` Johannes Thumshirn
@ 2016-04-20 19:03 ` James Bottomley
  2016-04-20 19:19   ` Christoph Hellwig
                     ` (3 more replies)
  1 sibling, 4 replies; 8+ messages in thread
From: James Bottomley @ 2016-04-20 19:03 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, Ewan Milne, linux-scsi

On Wed, 2016-04-20 at 15:24 +0200, Hannes Reinecke wrote:
> When pushing items on a workqueue we cannot take reference
> when the workqueue item is executed, as the structure might
> already been freed at that time.
> So instead we need to take a reference before adding it
> to the workqueue, thereby ensuring that the workqueue item
> will always be valid.

Have you actually seen this happen?  The rdata structure is fully ref
counted, so if it's done a final put, then something should see
unreferenced memory.  It looks like the model is that the final put is
done from the queue, so I don't quite see how you can lose the final
reference in either of the places you alter.

Plus, kref_get_unless_zero() should not be used.  At that point, the
structure would be freed, so there's no point looking for it. 
 kref_get_unless_zero is for refcounts that don't necessarily free the
structure (embedded ones).

James


> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/libfc/fc_rport.c | 25 +++++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/libfc/fc_rport.c
> b/drivers/scsi/libfc/fc_rport.c
> index 589ff9a..8b08263f 100644
> --- a/drivers/scsi/libfc/fc_rport.c
> +++ b/drivers/scsi/libfc/fc_rport.c
> @@ -263,7 +263,6 @@ static void fc_rport_work(struct work_struct
> *work)
>  		ids = rdata->ids;
>  		rdata->event = RPORT_EV_NONE;
>  		rdata->major_retries = 0;
> -		kref_get(&rdata->kref);
>  		mutex_unlock(&rdata->rp_mutex);
>  
>  		if (!rport)
> @@ -297,7 +296,6 @@ static void fc_rport_work(struct work_struct
> *work)
>  			FC_RPORT_DBG(rdata, "lld callback ev %d\n",
> event);
>  			rdata->lld_event_callback(lport, rdata,
> event);
>  		}
> -		kref_put(&rdata->kref, lport->tt.rport_destroy);
>  		break;
>  
>  	case RPORT_EV_FAILED:
> @@ -377,6 +375,7 @@ static void fc_rport_work(struct work_struct
> *work)
>  		mutex_unlock(&rdata->rp_mutex);
>  		break;
>  	}
> +	kref_put(&rdata->kref, lport->tt.rport_destroy);
>  }
>  
>  /**
> @@ -438,8 +437,15 @@ static void fc_rport_enter_delete(struct
> fc_rport_priv *rdata,
>  
>  	fc_rport_state_enter(rdata, RPORT_ST_DELETE);
>  
> -	if (rdata->event == RPORT_EV_NONE)
> -		queue_work(rport_event_queue, &rdata->event_work);
> +	if (rdata->event == RPORT_EV_NONE) {
> +		if (!kref_get_unless_zero(&rdata->kref)) {
> +			FC_RPORT_DBG(rdata, "port already
> deleted\n");
> +			return;
> +		}
> +		if (!queue_work(rport_event_queue, &rdata
> ->event_work))
> +			kref_put(&rdata->kref,
> +				 rdata->local_port
> ->tt.rport_destroy);
> +	}
>  	rdata->event = event;
>  }
>  
> @@ -487,8 +493,15 @@ static void fc_rport_enter_ready(struct
> fc_rport_priv *rdata)
>  
>  	FC_RPORT_DBG(rdata, "Port is Ready\n");
>  
> -	if (rdata->event == RPORT_EV_NONE)
> -		queue_work(rport_event_queue, &rdata->event_work);
> +	if (rdata->event == RPORT_EV_NONE) {
> +		if (!kref_get_unless_zero(&rdata->kref)) {
> +			FC_RPORT_DBG(rdata, "port already
> deleted\n");
> +			return;
> +		}
> +		if (!queue_work(rport_event_queue, &rdata
> ->event_work))
> +			kref_put(&rdata->kref,
> +				 rdata->local_port
> ->tt.rport_destroy);
> +	}
>  	rdata->event = RPORT_EV_READY;
>  }
>  


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] libfc: unsafe refcounting in fc_rport_work()
  2016-04-20 19:03 ` James Bottomley
@ 2016-04-20 19:19   ` Christoph Hellwig
  2016-04-21  2:25   ` Ewan Milne
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2016-04-20 19:19 UTC (permalink / raw)
  To: James Bottomley
  Cc: Hannes Reinecke, Martin K. Petersen, Christoph Hellwig,
	Ewan Milne, linux-scsi

On Wed, Apr 20, 2016 at 03:03:12PM -0400, James Bottomley wrote:
> Plus, kref_get_unless_zero() should not be used.  At that point, the
> structure would be freed, so there's no point looking for it. 

Agreed for this particular case.  Note that the whole code seems rather
whckay as it uses a rport_destroy method that just has one instance,
so I'd really like to see things cleaned up to remove this method
and add a warapper for the kref_put before doing further changes.

>  kref_get_unless_zero is for refcounts that don't necessarily free the
> structure (embedded ones).

The main case actually is for embedded krefs.  The important bit is that
the structure is on some lookup data structure (typically list), and
kref_get_unless_zero protects against the window from between dropping
the last references and doing the list removal in the destructor.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] libfc: unsafe refcounting in fc_rport_work()
  2016-04-20 19:03 ` James Bottomley
  2016-04-20 19:19   ` Christoph Hellwig
@ 2016-04-21  2:25   ` Ewan Milne
  2016-04-21 20:11     ` James Bottomley
  2016-04-21 12:39   ` Johannes Thumshirn
  2016-04-25  8:01   ` Hannes Reinecke
  3 siblings, 1 reply; 8+ messages in thread
From: Ewan Milne @ 2016-04-21  2:25 UTC (permalink / raw)
  To: James Bottomley
  Cc: Hannes Reinecke, Martin K. Petersen, Christoph Hellwig, linux-scsi


----- Original Message -----
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Hannes Reinecke <hare@suse.de>, Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>, Ewan Milne <emilne@redhat.com>, linux-scsi@vger.kernel.org
Sent: Wed, 20 Apr 2016 15:03:12 -0400 (EDT)
Subject: Re: [PATCH] libfc: unsafe refcounting in fc_rport_work()

On Wed, 2016-04-20 at 15:24 +0200, Hannes Reinecke wrote:
> When pushing items on a workqueue we cannot take reference
> when the workqueue item is executed, as the structure might
> already been freed at that time.
> So instead we need to take a reference before adding it
> to the workqueue, thereby ensuring that the workqueue item
> will always be valid.

>Have you actually seen this happen?  The rdata structure is fully ref
>counted, so if it's done a final put, then something should see
>unreferenced memory.  It looks like the model is that the final put is
>done from the queue, so I don't quite see how you can lose the final
>reference in either of the places you alter.

I have two dumps with freed rport structures that have delayed_work
items that are still on the active timer list.  Both are with FCoE, one
using bnx and the other using ixgbe.  Something is wrong.  I'm not
sure why we don't see anything on regular FC.

It seems to me that we should be holding a kref on the rport while
a work item is pending, but I'm not sure releasing it at the end of the
work function is OK, either.  The workqueue code might still reference
the embedded work or delayed_work on the way out, no?

I agree with Christoph, this whole thing needs looking at.  All the
delayed_work stuff seems problematic to me.  James S fixed a 3-way
deadlock involving this code a while back, for instance.

-Ewan


> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/libfc/fc_rport.c | 25 +++++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/libfc/fc_rport.c
> b/drivers/scsi/libfc/fc_rport.c
> index 589ff9a..8b08263f 100644
> --- a/drivers/scsi/libfc/fc_rport.c
> +++ b/drivers/scsi/libfc/fc_rport.c
> @@ -263,7 +263,6 @@ static void fc_rport_work(struct work_struct
> *work)
>  		ids = rdata->ids;
>  		rdata->event = RPORT_EV_NONE;
>  		rdata->major_retries = 0;
> -		kref_get(&rdata->kref);
>  		mutex_unlock(&rdata->rp_mutex);
>  
>  		if (!rport)
> @@ -297,7 +296,6 @@ static void fc_rport_work(struct work_struct
> *work)
>  			FC_RPORT_DBG(rdata, "lld callback ev %d\n",
> event);
>  			rdata->lld_event_callback(lport, rdata,
> event);
>  		}
> -		kref_put(&rdata->kref, lport->tt.rport_destroy);
>  		break;
>  
>  	case RPORT_EV_FAILED:
> @@ -377,6 +375,7 @@ static void fc_rport_work(struct work_struct
> *work)
>  		mutex_unlock(&rdata->rp_mutex);
>  		break;
>  	}
> +	kref_put(&rdata->kref, lport->tt.rport_destroy);
>  }
>  
>  /**
> @@ -438,8 +437,15 @@ static void fc_rport_enter_delete(struct
> fc_rport_priv *rdata,
>  
>  	fc_rport_state_enter(rdata, RPORT_ST_DELETE);
>  
> -	if (rdata->event == RPORT_EV_NONE)
> -		queue_work(rport_event_queue, &rdata->event_work);
> +	if (rdata->event == RPORT_EV_NONE) {
> +		if (!kref_get_unless_zero(&rdata->kref)) {
> +			FC_RPORT_DBG(rdata, "port already
> deleted\n");
> +			return;
> +		}
> +		if (!queue_work(rport_event_queue, &rdata
> ->event_work))
> +			kref_put(&rdata->kref,
> +				 rdata->local_port
> ->tt.rport_destroy);
> +	}
>  	rdata->event = event;
>  }
>  
> @@ -487,8 +493,15 @@ static void fc_rport_enter_ready(struct
> fc_rport_priv *rdata)
>  
>  	FC_RPORT_DBG(rdata, "Port is Ready\n");
>  
> -	if (rdata->event == RPORT_EV_NONE)
> -		queue_work(rport_event_queue, &rdata->event_work);
> +	if (rdata->event == RPORT_EV_NONE) {
> +		if (!kref_get_unless_zero(&rdata->kref)) {
> +			FC_RPORT_DBG(rdata, "port already
> deleted\n");
> +			return;
> +		}
> +		if (!queue_work(rport_event_queue, &rdata
> ->event_work))
> +			kref_put(&rdata->kref,
> +				 rdata->local_port
> ->tt.rport_destroy);
> +	}
>  	rdata->event = RPORT_EV_READY;
>  }
>  



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] libfc: unsafe refcounting in fc_rport_work()
  2016-04-20 19:03 ` James Bottomley
  2016-04-20 19:19   ` Christoph Hellwig
  2016-04-21  2:25   ` Ewan Milne
@ 2016-04-21 12:39   ` Johannes Thumshirn
  2016-04-25  8:01   ` Hannes Reinecke
  3 siblings, 0 replies; 8+ messages in thread
From: Johannes Thumshirn @ 2016-04-21 12:39 UTC (permalink / raw)
  To: James Bottomley
  Cc: Hannes Reinecke, Martin K. Petersen, Christoph Hellwig,
	Ewan Milne, linux-scsi

On Wed, Apr 20, 2016 at 03:03:12PM -0400, James Bottomley wrote:
> On Wed, 2016-04-20 at 15:24 +0200, Hannes Reinecke wrote:
> > When pushing items on a workqueue we cannot take reference
> > when the workqueue item is executed, as the structure might
> > already been freed at that time.
> > So instead we need to take a reference before adding it
> > to the workqueue, thereby ensuring that the workqueue item
> > will always be valid.
> 
> Have you actually seen this happen?  The rdata structure is fully ref
> counted, so if it's done a final put, then something should see
> unreferenced memory.  It looks like the model is that the final put is
> done from the queue, so I don't quite see how you can lose the final
> reference in either of the places you alter.
> 

I _think_ I have seen this, however I'm not 100% certain. What I can say
is, I have crash dumps of lpfc, fnic and bnx2fc with either no longer
vaild pointers (use after free) or overwritten pointers (one has ASCII
'O' 'W' 'N' written to it), but I never have had a chance to reproduce
them in a test environment. It's even possible that these are all totally
unrelated issues, again, I'm not certain at all.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] libfc: unsafe refcounting in fc_rport_work()
  2016-04-21  2:25   ` Ewan Milne
@ 2016-04-21 20:11     ` James Bottomley
  0 siblings, 0 replies; 8+ messages in thread
From: James Bottomley @ 2016-04-21 20:11 UTC (permalink / raw)
  To: Ewan Milne
  Cc: Hannes Reinecke, Martin K. Petersen, Christoph Hellwig, linux-scsi

On Wed, 2016-04-20 at 22:25 -0400, Ewan Milne wrote:
> ----- Original Message -----
> From: James Bottomley <James.Bottomley@HansenPartnership.com>
> To: Hannes Reinecke <hare@suse.de>, Martin K. Petersen <
> martin.petersen@oracle.com>
> Cc: Christoph Hellwig <hch@lst.de>, Ewan Milne <emilne@redhat.com>, 
> linux-scsi@vger.kernel.org
> Sent: Wed, 20 Apr 2016 15:03:12 -0400 (EDT)
> Subject: Re: [PATCH] libfc: unsafe refcounting in fc_rport_work()
> 
> On Wed, 2016-04-20 at 15:24 +0200, Hannes Reinecke wrote:
> > When pushing items on a workqueue we cannot take reference
> > when the workqueue item is executed, as the structure might
> > already been freed at that time.
> > So instead we need to take a reference before adding it
> > to the workqueue, thereby ensuring that the workqueue item
> > will always be valid.
> 
> > Have you actually seen this happen?  The rdata structure is fully
> > ref
> > counted, so if it's done a final put, then something should see
> > unreferenced memory.  It looks like the model is that the final put
> > is
> > done from the queue, so I don't quite see how you can lose the
> > final
> > reference in either of the places you alter.
> 
> I have two dumps with freed rport structures that have delayed_work
> items that are still on the active timer list.  Both are with FCoE, 
> one using bnx and the other using ixgbe.  Something is wrong.  I'm 
> not sure why we don't see anything on regular FC.
> 
> It seems to me that we should be holding a kref on the rport while
> a work item is pending, but I'm not sure releasing it at the end of 
> the work function is OK, either.  The workqueue code might still
> reference the embedded work or delayed_work on the way out, no?

OK, so I'm by no means a libfc expert, but when I took a cursory look
at the code it seemed to me that the final kref_put is actually in the
workqueue, here:

	case RPORT_EV_FAILED:
	case RPORT_EV_LOGO:
	case
RPORT_EV_STOP:

[...]
		if (rdata->rp_state == RPORT_ST_DELETE) {
			
if (port_id == FC_FID_DIR_SERV) {
				rdata->event = RPORT_EV_NONE;
				
mutex_unlock(&rdata->rp_mutex);
				kref_put(&rdata->kref, lport
->tt.rport_destroy);
			} else if ((rdata->flags & FC_RP_STARTED)
&&
				   rdata->major_retries <
				   lport
->max_rport_retry_count) {
				rdata->major_retries++;
				rdata
->event = RPORT_EV_NONE;
				FC_RPORT_DBG(rdata, "work restart\n");
				
fc_rport_enter_flogi(rdata);
				mutex_unlock(&rdata->rp_mutex);
			
} else {
				FC_RPORT_DBG(rdata, "work delete\n");
				list_del_r
cu(&rdata->peers);
				mutex_unlock(&rdata->rp_mutex);
				kref_p
ut(&rdata->kref, lport->tt.rport_destroy);
			}

So the question I have is: what does this patch fix, because the rdata
structure can only be killed in the workqueue and therefore you should
never need an additional kref for the workqueue call itself?  If it
does fix something, it would indicate we have some sort of other race
in the code, and we should find and fix that.

> I agree with Christoph, this whole thing needs looking at.  All the
> delayed_work stuff seems problematic to me.  James S fixed a 3-way
> deadlock involving this code a while back, for instance.

I'm not saying there isn't a problem in libfc, I'd just like someone to
propose something better than what looks like a random band aid. 
 Ideally I'd like to understand what the problem is so I know we
actually fixed it.

James

> -Ewan
> 
> 
> > Signed-off-by: Hannes Reinecke <hare@suse.de>
> > ---
> >  drivers/scsi/libfc/fc_rport.c | 25 +++++++++++++++++++------
> >  1 file changed, 19 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/scsi/libfc/fc_rport.c
> > b/drivers/scsi/libfc/fc_rport.c
> > index 589ff9a..8b08263f 100644
> > --- a/drivers/scsi/libfc/fc_rport.c
> > +++ b/drivers/scsi/libfc/fc_rport.c
> > @@ -263,7 +263,6 @@ static void fc_rport_work(struct work_struct
> > *work)
> >  		ids = rdata->ids;
> >  		rdata->event = RPORT_EV_NONE;
> >  		rdata->major_retries = 0;
> > -		kref_get(&rdata->kref);
> >  		mutex_unlock(&rdata->rp_mutex);
> >  
> >  		if (!rport)
> > @@ -297,7 +296,6 @@ static void fc_rport_work(struct work_struct
> > *work)
> >  			FC_RPORT_DBG(rdata, "lld callback ev
> > %d\n",
> > event);
> >  			rdata->lld_event_callback(lport, rdata,
> > event);
> >  		}
> > -		kref_put(&rdata->kref, lport->tt.rport_destroy);
> >  		break;
> >  
> >  	case RPORT_EV_FAILED:
> > @@ -377,6 +375,7 @@ static void fc_rport_work(struct work_struct
> > *work)
> >  		mutex_unlock(&rdata->rp_mutex);
> >  		break;
> >  	}
> > +	kref_put(&rdata->kref, lport->tt.rport_destroy);
> >  }
> >  
> >  /**
> > @@ -438,8 +437,15 @@ static void fc_rport_enter_delete(struct
> > fc_rport_priv *rdata,
> >  
> >  	fc_rport_state_enter(rdata, RPORT_ST_DELETE);
> >  
> > -	if (rdata->event == RPORT_EV_NONE)
> > -		queue_work(rport_event_queue, &rdata->event_work);
> > +	if (rdata->event == RPORT_EV_NONE) {
> > +		if (!kref_get_unless_zero(&rdata->kref)) {
> > +			FC_RPORT_DBG(rdata, "port already
> > deleted\n");
> > +			return;
> > +		}
> > +		if (!queue_work(rport_event_queue, &rdata
> > ->event_work))
> > +			kref_put(&rdata->kref,
> > +				 rdata->local_port
> > ->tt.rport_destroy);
> > +	}
> >  	rdata->event = event;
> >  }
> >  
> > @@ -487,8 +493,15 @@ static void fc_rport_enter_ready(struct
> > fc_rport_priv *rdata)
> >  
> >  	FC_RPORT_DBG(rdata, "Port is Ready\n");
> >  
> > -	if (rdata->event == RPORT_EV_NONE)
> > -		queue_work(rport_event_queue, &rdata->event_work);
> > +	if (rdata->event == RPORT_EV_NONE) {
> > +		if (!kref_get_unless_zero(&rdata->kref)) {
> > +			FC_RPORT_DBG(rdata, "port already
> > deleted\n");
> > +			return;
> > +		}
> > +		if (!queue_work(rport_event_queue, &rdata
> > ->event_work))
> > +			kref_put(&rdata->kref,
> > +				 rdata->local_port
> > ->tt.rport_destroy);
> > +	}
> >  	rdata->event = RPORT_EV_READY;
> >  }
> >  
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] libfc: unsafe refcounting in fc_rport_work()
  2016-04-20 19:03 ` James Bottomley
                     ` (2 preceding siblings ...)
  2016-04-21 12:39   ` Johannes Thumshirn
@ 2016-04-25  8:01   ` Hannes Reinecke
  3 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2016-04-25  8:01 UTC (permalink / raw)
  To: James Bottomley, Martin K. Petersen
  Cc: Christoph Hellwig, Ewan Milne, linux-scsi

On 04/20/2016 09:03 PM, James Bottomley wrote:
> On Wed, 2016-04-20 at 15:24 +0200, Hannes Reinecke wrote:
>> When pushing items on a workqueue we cannot take reference
>> when the workqueue item is executed, as the structure might
>> already been freed at that time.
>> So instead we need to take a reference before adding it
>> to the workqueue, thereby ensuring that the workqueue item
>> will always be valid.
> 
> Have you actually seen this happen?  The rdata structure is fully ref
> counted, so if it's done a final put, then something should see
> unreferenced memory.  It looks like the model is that the final put is
> done from the queue, so I don't quite see how you can lose the final
> reference in either of the places you alter.
> 
Yes, I _did_ see this happen; a customer was complaining about a
soft lockup happening in fc_rport_timeout every 30 seconds.

> Plus, kref_get_unless_zero() should not be used.  At that point, the
> structure would be freed, so there's no point looking for it. 
>  kref_get_unless_zero is for refcounts that don't necessarily free the
> structure (embedded ones).
> 
Yes, you are right; turns out to be a problem with mutexes and krefs
in general (cf https://lkml.org/lkml/2015/2/11/245).

I'll be sending a new patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-04-25  8:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-20 13:24 [PATCH] libfc: unsafe refcounting in fc_rport_work() Hannes Reinecke
2016-04-20 14:17 ` Johannes Thumshirn
2016-04-20 19:03 ` James Bottomley
2016-04-20 19:19   ` Christoph Hellwig
2016-04-21  2:25   ` Ewan Milne
2016-04-21 20:11     ` James Bottomley
2016-04-21 12:39   ` Johannes Thumshirn
2016-04-25  8:01   ` Hannes Reinecke

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.