* Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown
2017-03-29 2:29 [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown Junichi Nomura
@ 2017-03-29 11:17 ` Johannes Thumshirn
2017-03-29 23:26 ` Junichi Nomura
2017-04-03 21:51 ` [PATCH] lpfc: fix double free of bound CQ/WQ ring pointer Mauricio Faria de Oliveira
2017-04-03 21:53 ` [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown Mauricio Faria de Oliveira
2 siblings, 1 reply; 7+ messages in thread
From: Johannes Thumshirn @ 2017-03-29 11:17 UTC (permalink / raw)
To: Junichi Nomura
Cc: linux-scsi, dick.kennedy, james.smart, anton, martin.petersen
On Wed, Mar 29, 2017 at 02:29:45AM +0000, Junichi Nomura wrote:
> Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"),
> "rmmod lpfc" starting to cause panic or corruption due to double free.
>
> The double-free occurs as followings:
> - During initialization, lpfc_create_wq_cq() binds cq and wq to
> the same ring in the way that both cq->pring and wq->pring point
> to the same object.
> - Upon removal, lpfc_sli4_queue_destroy() ends up calling
> lpfc_sli4_queue_free() for both wqs and cqs
> and kfree(queue->pring) is done twice.
>
> The problem became more visible in v4.11-rc3 because commit 85e8a23936ab
> ("scsi: lpfc: Add shutdown method for kexec") made lpfc_pci_remove_one()
> called during driver shutdown.
Well the obvious band-aid would be setting the pointers to NULL after freeing
them. lpfc_sli4_queue_free() checks for queue's precense and doesn't use
queue->pring prior to freeing it, so the following _should_ to the trick:
>From befa936d8935a1bed01df65b376f515fa42c99da Mon Sep 17 00:00:00 2001
From: Johannes Thumshirn <jthumshirn@suse.de>
Date: Wed, 29 Mar 2017 13:08:55 +0200
Subject: [PATCH] lpfc: prevent double free of lpfc queue ring pointer
Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications")
rmoving the lpfc module causes a double free in lpfc_sli4_queue_free().
This can be prevented by setting the queue->pring and queue pointers to NULL,
so kfree() will simply ignore the pointers on a second call.
Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Fixes: 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications")
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
drivers/scsi/lpfc/lpfc_sli.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 1c9fa45..86e1529 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -13759,7 +13759,9 @@ lpfc_sli4_queue_free(struct lpfc_queue *queue)
kfree(queue->rqbp);
}
kfree(queue->pring);
+ queue->pring = NULL;
kfree(queue);
+ queue = NULL;
return;
}
--
2.10.2
I'll have a look if we at the callers of lpfc_sli4_queue_free() as well
and check if there's a better (a.k.a more correct) way to fix this.
Byte,
Johannes
--
Johannes Thumshirn Storage
jthumshirn@suse.de +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown
2017-03-29 11:17 ` Johannes Thumshirn
@ 2017-03-29 23:26 ` Junichi Nomura
0 siblings, 0 replies; 7+ messages in thread
From: Junichi Nomura @ 2017-03-29 23:26 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: linux-scsi, dick.kennedy, james.smart, anton, martin.petersen
On 03/29/17 20:17, Johannes Thumshirn wrote:
> On Wed, Mar 29, 2017 at 02:29:45AM +0000, Junichi Nomura wrote:
>> The double-free occurs as followings:
>> - During initialization, lpfc_create_wq_cq() binds cq and wq to
>> the same ring in the way that both cq->pring and wq->pring point
>> to the same object.
>> - Upon removal, lpfc_sli4_queue_destroy() ends up calling
>> lpfc_sli4_queue_free() for both wqs and cqs
>> and kfree(queue->pring) is done twice.
>>
>> The problem became more visible in v4.11-rc3 because commit 85e8a23936ab
>> ("scsi: lpfc: Add shutdown method for kexec") made lpfc_pci_remove_one()
>> called during driver shutdown.
>
> Well the obvious band-aid would be setting the pointers to NULL after freeing
> them. lpfc_sli4_queue_free() checks for queue's precense and doesn't use
> queue->pring prior to freeing it, so the following _should_ to the trick:
>
> From befa936d8935a1bed01df65b376f515fa42c99da Mon Sep 17 00:00:00 2001
> From: Johannes Thumshirn <jthumshirn@suse.de>
> Date: Wed, 29 Mar 2017 13:08:55 +0200
> Subject: [PATCH] lpfc: prevent double free of lpfc queue ring pointer
>
> Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications")
> rmoving the lpfc module causes a double free in lpfc_sli4_queue_free().
>
> This can be prevented by setting the queue->pring and queue pointers to NULL,
> so kfree() will simply ignore the pointers on a second call.
No, it doesn't work.
Even if lpfc_sli4_queue_free(wq) sets wq->pring to NULL, cq->pring still
holds bogus pointer and lpfc_sli4_queue_free(cq) will call kfree(cq->pring)
and cause double-free.
--
Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] lpfc: fix double free of bound CQ/WQ ring pointer
2017-03-29 2:29 [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown Junichi Nomura
2017-03-29 11:17 ` Johannes Thumshirn
@ 2017-04-03 21:51 ` Mauricio Faria de Oliveira
2017-04-03 21:53 ` [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown Mauricio Faria de Oliveira
2 siblings, 0 replies; 7+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-04-03 21:51 UTC (permalink / raw)
To: jthumshirn; +Cc: linux-scsi, dick.kennedy, james.smart, anton, martin.petersen
commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications")
binds the CQs and WQs ring pointer (sets it to same address on both).
lpfc_create_wq_cq():
...
rc = lpfc_cq_create(phba, cq, eq, <...>)
...
rc = lpfc_wq_create(phba, wq, cq, qtype);
...
/* Bind this CQ/WQ to the NVME ring */
pring = wq->pring;
...
cq->pring = pring;
...
The commit frees both CQ & WQ for FCP/NVME on lpfc_sli4_queue_destroy(),
which causes a double free (potential corruption or panic) on freeing
the ring pointer of the second entity (CQ is first, WQ is second):
lpfc_pci_remove_one() # that is, .remove / .shutdown
-> lpfc_pci_remove_one_s4()
-> lpfc_sli4_hba_unset()
-> lpfc_sli4_queue_destroy()
-> lpfc_sli4_release_queues() # Release FCP/NVME cqs
-> __lpfc_sli4_release_queue()
-> lpfc_sli4_queue_free()
-> kfree(queue->pring) # first free
-> lpfc_sli4_release_queues() # Release FCP/NVME wqs
-> __lpfc_sli4_release_queue()
-> lpfc_sli4_queue_free()
-> kfree(queue->pring) # second free
So, check for WQs in lpfc_sli4_queue_free() and do not free the pring,
as it is freed before in the bound CQ. [the WQs are created only via
lpfc_wq_create(), which sets struct lpfc_queue::type == LPFC_WQ. And
that happens in 2 sites (lpfc_create_wq_cq() & lpfc_fof_queue_setup()),
both of which bind the CQs & WQs. Thus, checking for the LPFC_WQ type
correlates to whether the WQ is bound to a CQ, which is freed first.]
Additional details:
For reference, that binding also occurs on one other function:
lpfc_fof_queue_setup():
...
rc = lpfc_cq_create(phba, phba->sli4_hba.oas_cq, <...>)
...
rc = lpfc_wq_create(phba, phba->sli4_hba.oas_wq, <...>)
...
/* Bind this CQ/WQ to the NVME ring */
pring = phba->sli4_hba.oas_wq->pring;
...
phba->sli4_hba.oas_cq->pring = pring;
And used to occur similarly on lpfc_sli4_queue_setup(), but was changed
by that commit; although the problem is more related to the new freeing
pattern introduced in lpfc_sli4_queue_destroy() plus the bound CQs/WQs.
- /* Bind this WQ to the next FCP ring */
- pring = &psli->ring[MAX_SLI3_CONFIGURED_RINGS + fcp_wqidx];
...
- phba->sli4_hba.fcp_cq[fcp_wqidx]->pring = pring;
commit 85e8a23936ab ("scsi: lpfc: Add shutdown method for kexec") made
this more likely as lpfc_pci_remove_one() is called on driver shutdown
(e.g., modprobe -r / rmmod).
(this patch is partially based on a different patch suggested by Johannes,
thus adding a Suggested-by tag for due credit.)
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Suggested-by: Johannes Thumshirn <jthumshirn@suse.de>
---
drivers/scsi/lpfc/lpfc_sli.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 1c9fa45df7eb..8befe841adaa 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -13758,7 +13758,14 @@ void lpfc_sli4_els_xri_abort_event_proc(struct lpfc_hba *phba)
lpfc_free_rq_buffer(queue->phba, queue);
kfree(queue->rqbp);
}
- kfree(queue->pring);
+
+ /*
+ * The WQs/CQs' pring is bound (same pointer).
+ * So free it only once, and not again on WQ.
+ */
+ if (queue->type != LPFC_WQ)
+ kfree(queue->pring);
+
kfree(queue);
return;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown
2017-03-29 2:29 [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown Junichi Nomura
2017-03-29 11:17 ` Johannes Thumshirn
2017-04-03 21:51 ` [PATCH] lpfc: fix double free of bound CQ/WQ ring pointer Mauricio Faria de Oliveira
@ 2017-04-03 21:53 ` Mauricio Faria de Oliveira
2017-04-04 2:10 ` Junichi Nomura
2 siblings, 1 reply; 7+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-04-03 21:53 UTC (permalink / raw)
To: Junichi Nomura, linux-scsi, dick.kennedy, james.smart
Cc: anton, martin.petersen
Hi Junichi,
On 03/28/2017 11:29 PM, Junichi Nomura wrote:
> Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"),
> "rmmod lpfc" starting to cause panic or corruption due to double free.
Thanks for the report. Can you please check whether the patch just sent
([PATCH] lpfc: fix double free of bound CQ/WQ ring pointer) resolves it?
I don't have a setup to test it handy right now.
cheers,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown
2017-04-03 21:53 ` [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown Mauricio Faria de Oliveira
@ 2017-04-04 2:10 ` Junichi Nomura
2017-04-04 12:07 ` Mauricio Faria de Oliveira
0 siblings, 1 reply; 7+ messages in thread
From: Junichi Nomura @ 2017-04-04 2:10 UTC (permalink / raw)
To: Mauricio Faria de Oliveira, linux-scsi, dick.kennedy, james.smart
Cc: anton, martin.petersen
On 04/04/17 06:53, Mauricio Faria de Oliveira wrote:
> Hi Junichi,
>
> On 03/28/2017 11:29 PM, Junichi Nomura wrote:
>> Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"),
>> "rmmod lpfc" starting to cause panic or corruption due to double free.
>
> Thanks for the report. Can you please check whether the patch just sent
> ([PATCH] lpfc: fix double free of bound CQ/WQ ring pointer) resolves it?
It works for me. Thank you!
Considering future maintenance, it might be a bit fragile to just depend
on the code comment about representing the relation between cq/wq and
shared pring but it's maintainers' choice.
--
Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [REGRESSION] v4.11-rc3: lpfc: panic during module removal / shutdown
2017-04-04 2:10 ` Junichi Nomura
@ 2017-04-04 12:07 ` Mauricio Faria de Oliveira
0 siblings, 0 replies; 7+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-04-04 12:07 UTC (permalink / raw)
To: Junichi Nomura, james.smart, martin.petersen
Cc: linux-scsi, dick.kennedy, anton
Hi Martin and Junichi,
On 04/03/2017 11:10 PM, Junichi Nomura wrote:
> On 04/04/17 06:53, Mauricio Faria de Oliveira wrote:
>> On 03/28/2017 11:29 PM, Junichi Nomura wrote:
>>> Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"),
>>> "rmmod lpfc" starting to cause panic or corruption due to double free.
>> Thanks for the report. Can you please check whether the patch just sent
>> ([PATCH] lpfc: fix double free of bound CQ/WQ ring pointer) resolves it?
> It works for me. Thank you!
Excellent, thanks!
Martin, can you review/consider it for 4.11-rc6, please?
> Considering future maintenance, it might be a bit fragile to just depend
> on the code comment about representing the relation between cq/wq and
> shared pring but it's maintainers' choice.
I agree -- there should be a better way of identifying a bound WQ/CQ.
Perhaps there is, but I couldn't find it currently.
For now, as far as I could grep and examine the code (detailed in commit
message), a WQ is always bound to a CQ, so to check for WQ and not free
its ring pointer seems to be sufficient (as the CQ ring pointer is freed
first).
If that changes, probably some form of flagging and/or queue type
determination would be better/necessary.
cheers,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 7+ messages in thread