* [PATCH] pci: aer: wait till the workqueue completes before free memory
@ 2015-12-17 14:32 Sebastian Andrzej Siewior
  2016-01-06 23:27 ` Bjorn Helgaas
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-12-17 14:32 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci

I start a binary which should flash the FPGA and re-enumerate the PCI bus
and find a new device. It works most of the time. With SLUB debug it
crashes on each iteration with something like this (compressed output):

| pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
| Unable to handle kernel paging request for data at address 0x27ef9e3e
| Faulting instruction address: 0x602f5328
| Oops: Kernel access of bad area, sig: 11 [#1]
| Workqueue: events aer_isr
| GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
| NIP [602f5328] pci_walk_bus+0xd4/0x104

Register 25 has the use-after-free magic. As it turns out, the old PCIe
device is leaving and generates an error before it is gone, so aer_irq()
fires and schedules a work item. What happens now is that free_irq() is
invoked and all resources are freed *before* the aer_isr() work item has
completed.
So to fix this, I flush the work item to ensure that there is no more
work pending.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
Bjorn, this could deserve a stable tag. However it seems to have been
like that even in v2.6.20.

 drivers/pci/pcie/aer/aerdrv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
index 0bf82a20a0fb..7acd27348098 100644
--- a/drivers/pci/pcie/aer/aerdrv.c
+++ b/drivers/pci/pcie/aer/aerdrv.c
@@ -282,8 +282,10 @@ static void aer_remove(struct pcie_device *dev)
 
 	if (rpc) {
 		/* If register interrupt service, it must be free. */
-		if (rpc->isr)
+		if (rpc->isr) {
 			free_irq(dev->irq, dev);
+			flush_work(&rpc->dpc_handler);
+		}
 
 		wait_event(rpc->wait_release, rpc->prod_idx == rpc->cons_idx);
 
-- 
2.6.4
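
For readers following the oops in this message: the GPR24 line lists registers 24 through 31, so its second word (6b6b6b6b) is GPR25. 0x6b is the byte that SLUB debug writes over freed objects (POISON_FREE in the kernel). A minimal userspace sketch of that check follows; the helper names are hypothetical and are not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte that SLUB debug writes over freed objects (POISON_FREE in the kernel). */
#define POISON_FREE_BYTE 0x6bu

/* Return 1 if every byte of the 32-bit word is the free-poison byte. */
int is_free_poison(uint32_t word)
{
	return word == 0x01010101u * POISON_FREE_BYTE;
}

/*
 * Scan a register dump whose first entry is register number `first`.
 * Return the number of the first register holding the poison pattern,
 * or -1 if none does.
 */
int find_poisoned_reg(const uint32_t *regs, size_t n, int first)
{
	for (size_t i = 0; i < n; i++)
		if (is_free_poison(regs[i]))
			return first + (int)i;
	return -1;
}
```

Feeding it the eight words from the GPR24 line with `first` set to 24 identifies register 25, which matches the reading given in the message.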



* Re: [PATCH] pci: aer: wait till the workqueue completes before free memory
  2015-12-17 14:32 [PATCH] pci: aer: wait till the workqueue completes before free memory Sebastian Andrzej Siewior
@ 2016-01-06 23:27 ` Bjorn Helgaas
  2016-01-15 18:03   ` Sebastian Andrzej Siewior
  2016-01-15 18:36   ` [PATCH v2] " Sebastian Andrzej Siewior
  0 siblings, 2 replies; 7+ messages in thread
From: Bjorn Helgaas @ 2016-01-06 23:27 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Bjorn Helgaas, linux-pci

Hi Sebastian,

On Thu, Dec 17, 2015 at 03:32:43PM +0100, Sebastian Andrzej Siewior wrote:
> I start a binary which should flash the FPGA and re-enumerate the PCI bus
> and find a new device. It works most of the time. With SLUB debug it
> crashes on each iteration with something like this (compressed output):
> 
> | pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
> | Unable to handle kernel paging request for data at address 0x27ef9e3e
> | Faulting instruction address: 0x602f5328
> | Oops: Kernel access of bad area, sig: 11 [#1]
> | Workqueue: events aer_isr
> | GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
> | NIP [602f5328] pci_walk_bus+0xd4/0x104
> 
> Register 25 has the use-after-free magic. As it turns out, the old PCIe
> device is leaving and generates an error before it is gone, so aer_irq()
> fires and schedules a work item. What happens now is that free_irq() is
> invoked and all resources are freed *before* the aer_isr() work item has
> completed.
> So to fix this, I flush the work item to ensure that there is no more
> work pending.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> Bjorn, this could deserve a stable tag. However it seems to have been
> like that even in v2.6.20.
> 
>  drivers/pci/pcie/aer/aerdrv.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
> index 0bf82a20a0fb..7acd27348098 100644
> --- a/drivers/pci/pcie/aer/aerdrv.c
> +++ b/drivers/pci/pcie/aer/aerdrv.c
> @@ -282,8 +282,10 @@ static void aer_remove(struct pcie_device *dev)
>  
>  	if (rpc) {
>  		/* If register interrupt service, it must be free. */
> -		if (rpc->isr)
> +		if (rpc->isr) {
>  			free_irq(dev->irq, dev);
> +			flush_work(&rpc->dpc_handler);
> +		}
>  
>  		wait_event(rpc->wait_release, rpc->prod_idx == rpc->cons_idx);

Your change looks reasonable.  But I'm curious about the wait_event()
just below it.  That *looks* like it's intended to do the same thing
as your flush_work().

Can you explain why the wait_event() isn't working?  If we add the
flush_work(), can we remove the wait_event() stuff?

Bjorn


* Re: [PATCH] pci: aer: wait till the workqueue completes before free memory
  2016-01-06 23:27 ` Bjorn Helgaas
@ 2016-01-15 18:03   ` Sebastian Andrzej Siewior
  2016-01-15 18:36   ` [PATCH v2] " Sebastian Andrzej Siewior
  1 sibling, 0 replies; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-01-15 18:03 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Bjorn Helgaas, linux-pci

* Bjorn Helgaas | 2016-01-06 17:27:58 [-0600]:

>Hi Sebastian,
Hi Bjorn,

>Your change looks reasonable.  But I'm curious about the wait_event()
>just below it.  That *looks* like it's intended to do the same thing
>as your flush_work().
Indeed.

>Can you explain why the wait_event() isn't working?  If we add the

aer_isr() invokes get_e_source(), which increments rpc->cons_idx. So
the condition becomes true after that, but the function has not terminated
yet: it still invokes aer_isr_one_error().
That means if we have one CPU doing the ISR + workqueue task and another
CPU doing the aer_remove() removal thingy then the latter CPU evaluates
the condition to true and continues cleanup while the former is still in
aer_isr_one_error() wondering where the memory went.
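
The window described above can be shown with a small single-threaded model. This is only a sketch under stated assumptions: the struct, the `model_*` function names, the `in_flight` flag and the queue length are hypothetical stand-ins, not the driver's code. It replays the steps in order and checks the wait condition at the point between the dequeue and the end of error handling.

```c
#include <stdbool.h>

#define MODEL_QUEUE_LEN 8	/* hypothetical ring size for this model */

/* Stripped-down stand-in for struct aer_rpc: just the two ring indices. */
struct rpc_idx_model {
	unsigned int prod_idx;	/* advanced by the IRQ handler (producer) */
	unsigned int cons_idx;	/* advanced by the work item (consumer) */
	bool in_flight;		/* true while a dequeued error is being handled */
};

/* Models aer_irq(): enqueue one error. */
void model_irq(struct rpc_idx_model *rpc)
{
	rpc->prod_idx = (rpc->prod_idx + 1) % MODEL_QUEUE_LEN;
}

/* Models get_e_source(): dequeue one error; false if the queue is empty. */
bool model_get_e_source(struct rpc_idx_model *rpc)
{
	if (rpc->prod_idx == rpc->cons_idx)
		return false;
	rpc->cons_idx = (rpc->cons_idx + 1) % MODEL_QUEUE_LEN;
	rpc->in_flight = true;	/* aer_isr_one_error() is about to run */
	return true;
}

/* Models the end of aer_isr_one_error(). */
void model_one_error_done(struct rpc_idx_model *rpc)
{
	rpc->in_flight = false;
}

/* The condition that aer_remove() waited on. */
bool model_remove_may_proceed(const struct rpc_idx_model *rpc)
{
	return rpc->prod_idx == rpc->cons_idx;
}

/*
 * Returns true if the wait condition already holds while the dequeued
 * error is still being handled, i.e. the race window exists.
 */
bool model_race_window(void)
{
	struct rpc_idx_model rpc = { 0 };
	bool window;

	model_irq(&rpc);
	model_get_e_source(&rpc);
	window = model_remove_may_proceed(&rpc) && rpc.in_flight;
	model_one_error_done(&rpc);
	return window;
}
```

In the model, `model_race_window()` reports that removal would be allowed to proceed while handling is still in progress, which is the point at which a second CPU could free the memory.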

>flush_work(), can we remove the wait_event() stuff?

I think so since its only purpose is to sync against removal which does
not work on SMP. So let me remove this and the wait_release member.

>Bjorn

Sebastian


* [PATCH v2] pci: aer: wait till the workqueue completes before free memory
  2016-01-06 23:27 ` Bjorn Helgaas
  2016-01-15 18:03   ` Sebastian Andrzej Siewior
@ 2016-01-15 18:36   ` Sebastian Andrzej Siewior
  2016-01-21 20:57     ` Bjorn Helgaas
  1 sibling, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-01-15 18:36 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Bjorn Helgaas, linux-pci

I start a binary which should flash the FPGA and re-enumerate the PCI bus
and find a new device. It works most of the time. With SLUB debug it
crashes on each iteration with something like this (compressed output):

| pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
| Unable to handle kernel paging request for data at address 0x27ef9e3e
| Faulting instruction address: 0x602f5328
| Oops: Kernel access of bad area, sig: 11 [#1]
| Workqueue: events aer_isr
| GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
| NIP [602f5328] pci_walk_bus+0xd4/0x104

Register 25 has the use-after-free magic. As it turns out, the old PCIe
device is leaving and generates an error before it is gone, so aer_irq()
fires and schedules a work item. What happens now is that free_irq() is
invoked and all resources are freed *before* the aer_isr() work item has
completed.
So to fix this, I flush the work item to ensure that there is no more
work pending.
The wait_event() on wait_release should actually synchronize against
removal. However the condition (->prod_idx == ->cons_idx) becomes true
before the function completes (aer_isr_one_error() is invoked right
after that), so it does not fulfill its purpose. Therefore I remove it.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2:
    - remove wait_release since it is broken on SMP
    - flush the work item unconditionally, not only if ->isr is set,
      because the work item could also be scheduled via the inject module.

*compile* tested only because I don't have the HW at the moment.

Bjorn, this could deserve a stable tag. However it seems to have been
like that even in v2.6.20.

 drivers/pci/pcie/aer/aerdrv.c      | 4 +---
 drivers/pci/pcie/aer/aerdrv.h      | 1 -
 drivers/pci/pcie/aer/aerdrv_core.c | 2 --
 3 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
index 0bf82a20a0fb..48d21e0edd56 100644
--- a/drivers/pci/pcie/aer/aerdrv.c
+++ b/drivers/pci/pcie/aer/aerdrv.c
@@ -262,7 +262,6 @@ static struct aer_rpc *aer_alloc_rpc(struct pcie_device *dev)
 	rpc->rpd = dev;
 	INIT_WORK(&rpc->dpc_handler, aer_isr);
 	mutex_init(&rpc->rpc_mutex);
-	init_waitqueue_head(&rpc->wait_release);
 
 	/* Use PCIe bus function to store rpc into PCIe device */
 	set_service_data(dev, rpc);
@@ -285,8 +284,7 @@ static void aer_remove(struct pcie_device *dev)
 		if (rpc->isr)
 			free_irq(dev->irq, dev);
 
-		wait_event(rpc->wait_release, rpc->prod_idx == rpc->cons_idx);
-
+		flush_work(&rpc->dpc_handler);
 		aer_disable_rootport(rpc);
 		kfree(rpc);
 		set_service_data(dev, NULL);
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index 84420b7c9456..945c939a86c5 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -72,7 +72,6 @@ struct aer_rpc {
 					 * recovery on the same
 					 * root port hierarchy
 					 */
-	wait_queue_head_t wait_release;
 };
 
 struct aer_broadcast_data {
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index fba785e9df75..4e14de0f0f98 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -811,8 +811,6 @@ void aer_isr(struct work_struct *work)
 	while (get_e_source(rpc, &e_src))
 		aer_isr_one_error(p_device, &e_src);
 	mutex_unlock(&rpc->rpc_mutex);
-
-	wake_up(&rpc->wait_release);
 }
 
 /**
-- 
2.7.0.rc3



* Re: [PATCH v2] pci: aer: wait till the workqueue completes before free memory
  2016-01-15 18:36   ` [PATCH v2] " Sebastian Andrzej Siewior
@ 2016-01-21 20:57     ` Bjorn Helgaas
  2016-01-23 20:09       ` Sebastian Andrzej Siewior
  2016-01-25 16:22       ` Bjorn Helgaas
  0 siblings, 2 replies; 7+ messages in thread
From: Bjorn Helgaas @ 2016-01-21 20:57 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Bjorn Helgaas, linux-pci

Hi Sebastian,

On Fri, Jan 15, 2016 at 07:36:25PM +0100, Sebastian Andrzej Siewior wrote:
> I start a binary which should flash the FPGA and re-enumerate the PCI bus
> and find a new device. It works most of the time. With SLUB debug it
> crashes on each iteration with something like this (compressed output):
> 
> | pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
> | Unable to handle kernel paging request for data at address 0x27ef9e3e
> | Faulting instruction address: 0x602f5328
> | Oops: Kernel access of bad area, sig: 11 [#1]
> | Workqueue: events aer_isr
> | GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
> | NIP [602f5328] pci_walk_bus+0xd4/0x104
> 
> Register 25 has the use-after-free magic. As it turns out, the old PCIe
> device is leaving and generates an error before it is gone, so aer_irq()
> fires and schedules a work item. What happens now is that free_irq() is
> invoked and all resources are freed *before* the aer_isr() work item has
> completed.
> So to fix this, I flush the work item to ensure that there is no more
> work pending.
> The wait_event() on wait_release should actually synchronize against
> removal. However the condition (->prod_idx == ->cons_idx) becomes true
> before the function completes (aer_isr_one_error() is invoked right
> after that), so it does not fulfill its purpose. Therefore I remove it.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

I propose to merge this patch unchanged, but with the following
changelog.  I want to add a bit more detail about the concurrency
problem and remove a bit of the specific detail about your FPGA:


commit 9963c9487f733ef8fe3a06ce3398072a40f955bf
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Fri Jan 15 19:36:25 2016 +0100

PCI/AER: Flush workqueue on device remove to avoid use-after-free

A Root Port's AER structure (rpc) contains a queue of events.  aer_irq()
enqueues AER status information and schedules aer_isr() to dequeue and
process it.  When we remove a device, aer_remove() waits for the queue to
be empty, then frees the rpc struct.

But aer_isr() references the rpc struct after dequeueing and possibly
emptying the queue, which can cause a use-after-free error as in the
following scenario with two threads, aer_isr() on the left and a
concurrent aer_remove() on the right:

  Thread A                      Thread B
  --------                      --------
  aer_irq():
    rpc->prod_idx++
				aer_remove():
				  wait_event(rpc->prod_idx == rpc->cons_idx)
				  # now blocked until queue becomes empty
  aer_isr():                      # ...
    rpc->cons_idx++               # unblocked because queue is now empty
    ...                           kfree(rpc)
    mutex_unlock(&rpc->rpc_mutex)

Wait until the last scheduled instance of aer_isr() has completed before
freeing the rpc struct by using flush_work() in aer_remove().

I reproduced this use-after-free by flashing a device FPGA and
re-enumerating the bus to find the new device.  With SLUB debug, this
crashes with 0x6b bytes (POISON_FREE, the use-after-free magic number) in
GPR25:

  pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
  Unable to handle kernel paging request for data at address 0x27ef9e3e
  Workqueue: events aer_isr
  GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
  NIP [602f5328] pci_walk_bus+0xd4/0x104

[bhelgaas: changelog]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
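
The ordering that the changelog above asks for (wait for the work item itself, then free) can be sketched in userspace with POSIX threads. This is a model under stated assumptions, not the driver: `struct rpc_flush_model`, `model_isr()` and `model_remove_after_flush()` are hypothetical names, and `pthread_join()` stands in for `flush_work()`.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct aer_rpc. */
struct rpc_flush_model {
	int pending;		/* errors queued by the "IRQ handler" */
	int handled;		/* errors the work item finished handling */
	pthread_t worker;	/* stands in for rpc->dpc_handler */
};

/* Models aer_isr(): dequeue, then keep touching rpc while processing. */
static void *model_isr(void *arg)
{
	struct rpc_flush_model *rpc = arg;

	while (rpc->pending > 0) {
		rpc->pending--;	/* the queue may be empty from here on... */
		rpc->handled++;	/* ...but rpc is still being used */
	}
	return NULL;
}

/*
 * Models aer_irq() followed by the fixed aer_remove(): queue n errors,
 * start the work, wait for the work item to finish, and only then free.
 * Returns the number of errors handled, or -1 on setup failure.
 */
int model_remove_after_flush(int n)
{
	struct rpc_flush_model *rpc = calloc(1, sizeof(*rpc));
	int handled;

	if (!rpc)
		return -1;
	rpc->pending = n;
	if (pthread_create(&rpc->worker, NULL, model_isr, rpc)) {
		free(rpc);
		return -1;
	}
	pthread_join(rpc->worker, NULL);	/* flush_work() analogue */
	handled = rpc->handled;
	free(rpc);				/* safe: the worker has returned */
	return handled;
}
```

The design point is that the wait is tied to the completion of the worker rather than to queue state, so an empty queue can no longer be mistaken for a finished handler.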


* Re: [PATCH v2] pci: aer: wait till the workqueue completes before free memory
  2016-01-21 20:57     ` Bjorn Helgaas
@ 2016-01-23 20:09       ` Sebastian Andrzej Siewior
  2016-01-25 16:22       ` Bjorn Helgaas
  1 sibling, 0 replies; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-01-23 20:09 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Bjorn Helgaas, linux-pci

On 01/21/2016 09:57 PM, Bjorn Helgaas wrote:
> Hi Sebastian,

Hi Bjorn,

> I propose to merge this patch unchanged, but with the following
> changelog.  I want to add a bit more detail about the concurrency
> problem and remove a bit of the specific detail about your FPGA:

perfect, thanks.

Sebastian


* Re: [PATCH v2] pci: aer: wait till the workqueue completes before free memory
  2016-01-21 20:57     ` Bjorn Helgaas
  2016-01-23 20:09       ` Sebastian Andrzej Siewior
@ 2016-01-25 16:22       ` Bjorn Helgaas
  1 sibling, 0 replies; 7+ messages in thread
From: Bjorn Helgaas @ 2016-01-25 16:22 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Bjorn Helgaas, linux-pci

On Thu, Jan 21, 2016 at 02:57:17PM -0600, Bjorn Helgaas wrote:
> Hi Sebastian,
> 
> On Fri, Jan 15, 2016 at 07:36:25PM +0100, Sebastian Andrzej Siewior wrote:
> > I start a binary which should flash the FPGA and re-enumerate the PCI bus
> > and find a new device. It works most of the time. With SLUB debug it
> > crashes on each iteration with something like this (compressed output):
> > 
> > | pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
> > | Unable to handle kernel paging request for data at address 0x27ef9e3e
> > | Faulting instruction address: 0x602f5328
> > | Oops: Kernel access of bad area, sig: 11 [#1]
> > | Workqueue: events aer_isr
> > | GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
> > | NIP [602f5328] pci_walk_bus+0xd4/0x104
> > 
> > Register 25 has the use-after-free magic. As it turns out, the old PCIe
> > device is leaving and generates an error before it is gone, so aer_irq()
> > fires and schedules a work item. What happens now is that free_irq() is
> > invoked and all resources are freed *before* the aer_isr() work item has
> > completed.
> > So to fix this, I flush the work item to ensure that there is no more
> > work pending.
> > The wait_event() on wait_release should actually synchronize against
> > removal. However the condition (->prod_idx == ->cons_idx) becomes true
> > before the function completes (aer_isr_one_error() is invoked right
> > after that), so it does not fulfill its purpose. Therefore I remove it.
> > 
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> I propose to merge this patch unchanged, but with the following
> changelog.  I want to add a bit more detail about the concurrency
> problem and remove a bit of the specific detail about your FPGA:
> 
> 
> commit 9963c9487f733ef8fe3a06ce3398072a40f955bf
> Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Date:   Fri Jan 15 19:36:25 2016 +0100
> 
> PCI/AER: Flush workqueue on device remove to avoid use-after-free
> 
> A Root Port's AER structure (rpc) contains a queue of events.  aer_irq()
> enqueues AER status information and schedules aer_isr() to dequeue and
> process it.  When we remove a device, aer_remove() waits for the queue to
> be empty, then frees the rpc struct.
> 
> But aer_isr() references the rpc struct after dequeueing and possibly
> emptying the queue, which can cause a use-after-free error as in the
> following scenario with two threads, aer_isr() on the left and a
> concurrent aer_remove() on the right:
> 
>   Thread A                      Thread B
>   --------                      --------
>   aer_irq():
>     rpc->prod_idx++
> 				aer_remove():
> 				  wait_event(rpc->prod_idx == rpc->cons_idx)
> 				  # now blocked until queue becomes empty
>   aer_isr():                      # ...
>     rpc->cons_idx++               # unblocked because queue is now empty
>     ...                           kfree(rpc)
>     mutex_unlock(&rpc->rpc_mutex)
> 
> Wait until the last scheduled instance of aer_isr() has completed before
> freeing the rpc struct by using flush_work() in aer_remove().
> 
> I reproduced this use-after-free by flashing a device FPGA and
> re-enumerating the bus to find the new device.  With SLUB debug, this
> crashes with 0x6b bytes (POISON_FREE, the use-after-free magic number) in
> GPR25:
> 
>   pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
>   Unable to handle kernel paging request for data at address 0x27ef9e3e
>   Workqueue: events aer_isr
>   GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
>   NIP [602f5328] pci_walk_bus+0xd4/0x104
> 
> [bhelgaas: changelog]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Applied to for-linus for v4.5 with stable tag, thanks!

