linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ocxl: Fix page fault handler in case of fault on dying process
@ 2018-06-18 12:14 Frederic Barrat
  2018-06-20  1:39 ` Alastair D'Silva
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Frederic Barrat @ 2018-06-18 12:14 UTC (permalink / raw)
  To: andrew.donnellan, alastair, linuxppc-dev; +Cc: clombard, vaibhav

If a process exits without doing proper cleanup, there's a window
where an opencapi device can try to access the memory of the dying
process and may trigger a page fault. That's an expected scenario and
the ocxl driver holds a reference on the mm_struct of the process
until the opencapi device is notified of the process exiting.
However, if mm_users is already at 0, i.e. the address space of the
process has already been destroyed, the driver shouldn't try resolving
the page fault, as it will fail, but it can also try accessing already
freed data.

It is fixed by only calling the bottom half of the page fault handler
if mm_users is greater than 0 and get a reference on mm_users instead
of mm_count. Otherwise, we can safely return a translation fault to
the device, as its associated memory context is being removed. The
opencapi device will be properly cleaned up shortly after when closing
the file descriptors.

Fixes: 5ef3166e8a32 ("ocxl: Driver code for 'generic' opencapi devices")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---
 drivers/misc/ocxl/link.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index f30790582dc0..2aeaf34e7eda 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -136,7 +136,7 @@ static void xsl_fault_handler_bh(struct work_struct *fault_work)
 	int rc;
 
 	/*
-	 * We need to release a reference on the mm whenever exiting this
+	 * We must release a reference on mm_users whenever exiting this
 	 * function (taken in the memory fault interrupt handler)
 	 */
 	rc = copro_handle_mm_fault(fault->pe_data.mm, fault->dar, fault->dsisr,
@@ -172,7 +172,7 @@ static void xsl_fault_handler_bh(struct work_struct *fault_work)
 	}
 	r = RESTART;
 ack:
-	mmdrop(fault->pe_data.mm);
+	mmput(fault->pe_data.mm);
 	ack_irq(spa, r);
 }
 
@@ -184,6 +184,7 @@ static irqreturn_t xsl_fault_handler(int irq, void *data)
 	struct pe_data *pe_data;
 	struct ocxl_process_element *pe;
 	int lpid, pid, tid;
+	bool schedule = false;
 
 	read_irq(spa, &dsisr, &dar, &pe_handle);
 	trace_ocxl_fault(spa->spa_mem, pe_handle, dsisr, dar, -1);
@@ -226,14 +227,19 @@ static irqreturn_t xsl_fault_handler(int irq, void *data)
 	}
 	WARN_ON(pe_data->mm->context.id != pid);
 
-	spa->xsl_fault.pe = pe_handle;
-	spa->xsl_fault.dar = dar;
-	spa->xsl_fault.dsisr = dsisr;
-	spa->xsl_fault.pe_data = *pe_data;
-	mmgrab(pe_data->mm); /* mm count is released by bottom half */
-
+	if (mmget_not_zero(pe_data->mm)) {
+			spa->xsl_fault.pe = pe_handle;
+			spa->xsl_fault.dar = dar;
+			spa->xsl_fault.dsisr = dsisr;
+			spa->xsl_fault.pe_data = *pe_data;
+			schedule = true;
+			/* mm_users count released by bottom half */
+	}
 	rcu_read_unlock();
-	schedule_work(&spa->xsl_fault.fault_work);
+	if (schedule)
+		schedule_work(&spa->xsl_fault.fault_work);
+	else
+		ack_irq(spa, ADDRESS_ERROR);
 	return IRQ_HANDLED;
 }
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] ocxl: Fix page fault handler in case of fault on dying process
  2018-06-18 12:14 [PATCH] ocxl: Fix page fault handler in case of fault on dying process Frederic Barrat
@ 2018-06-20  1:39 ` Alastair D'Silva
  2018-06-21  8:07 ` Andrew Donnellan
  2018-07-11 13:24 ` Michael Ellerman
  2 siblings, 0 replies; 4+ messages in thread
From: Alastair D'Silva @ 2018-06-20  1:39 UTC (permalink / raw)
  To: Frederic Barrat, andrew.donnellan, linuxppc-dev; +Cc: clombard, vaibhav

On Mon, 2018-06-18 at 14:14 +0200, Frederic Barrat wrote:
> If a process exits without doing proper cleanup, there's a window
> where an opencapi device can try to access the memory of the dying
> process and may trigger a page fault. That's an expected scenario and
> the ocxl driver holds a reference on the mm_struct of the process
> until the opencapi device is notified of the process exiting.
> However, if mm_users is already at 0, i.e. the address space of the
> process has already been destroyed, the driver shouldn't try
> resolving
> the page fault, as it will fail, but it can also try accessing
> already
> freed data.
> 
> It is fixed by only calling the bottom half of the page fault handler
> if mm_users is greater than 0 and get a reference on mm_users instead
> of mm_count. Otherwise, we can safely return a translation fault to
> the device, as its associated memory context is being removed. The
> opencapi device will be properly cleaned up shortly after when
> closing
> the file descriptors.
> 
> Fixes: 5ef3166e8a32 ("ocxl: Driver code for 'generic' opencapi
> devices")
> Cc: stable@vger.kernel.org # v4.16+
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
>  drivers/misc/ocxl/link.c | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index f30790582dc0..2aeaf34e7eda 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -136,7 +136,7 @@ static void xsl_fault_handler_bh(struct
> work_struct *fault_work)
>  	int rc;
>  
>  	/*
> -	 * We need to release a reference on the mm whenever exiting
> this
> +	 * We must release a reference on mm_users whenever exiting
> this
>  	 * function (taken in the memory fault interrupt handler)
>  	 */
>  	rc = copro_handle_mm_fault(fault->pe_data.mm, fault->dar,
> fault->dsisr,
> @@ -172,7 +172,7 @@ static void xsl_fault_handler_bh(struct
> work_struct *fault_work)
>  	}
>  	r = RESTART;
>  ack:
> -	mmdrop(fault->pe_data.mm);
> +	mmput(fault->pe_data.mm);
>  	ack_irq(spa, r);
>  }
>  
> @@ -184,6 +184,7 @@ static irqreturn_t xsl_fault_handler(int irq,
> void *data)
>  	struct pe_data *pe_data;
>  	struct ocxl_process_element *pe;
>  	int lpid, pid, tid;
> +	bool schedule = false;
>  
>  	read_irq(spa, &dsisr, &dar, &pe_handle);
>  	trace_ocxl_fault(spa->spa_mem, pe_handle, dsisr, dar, -1);
> @@ -226,14 +227,19 @@ static irqreturn_t xsl_fault_handler(int irq,
> void *data)
>  	}
>  	WARN_ON(pe_data->mm->context.id != pid);
>  
> -	spa->xsl_fault.pe = pe_handle;
> -	spa->xsl_fault.dar = dar;
> -	spa->xsl_fault.dsisr = dsisr;
> -	spa->xsl_fault.pe_data = *pe_data;
> -	mmgrab(pe_data->mm); /* mm count is released by bottom half
> */
> -
> +	if (mmget_not_zero(pe_data->mm)) {
> +			spa->xsl_fault.pe = pe_handle;
> +			spa->xsl_fault.dar = dar;
> +			spa->xsl_fault.dsisr = dsisr;
> +			spa->xsl_fault.pe_data = *pe_data;
> +			schedule = true;
> +			/* mm_users count released by bottom half */
> +	}
>  	rcu_read_unlock();
> -	schedule_work(&spa->xsl_fault.fault_work);
> +	if (schedule)
> +		schedule_work(&spa->xsl_fault.fault_work);
> +	else
> +		ack_irq(spa, ADDRESS_ERROR);
>  	return IRQ_HANDLED;
>  }
>  

Reviewed-By: Alastair D'Silva <alastair@d-silva.org>

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] ocxl: Fix page fault handler in case of fault on dying process
  2018-06-18 12:14 [PATCH] ocxl: Fix page fault handler in case of fault on dying process Frederic Barrat
  2018-06-20  1:39 ` Alastair D'Silva
@ 2018-06-21  8:07 ` Andrew Donnellan
  2018-07-11 13:24 ` Michael Ellerman
  2 siblings, 0 replies; 4+ messages in thread
From: Andrew Donnellan @ 2018-06-21  8:07 UTC (permalink / raw)
  To: Frederic Barrat, alastair, linuxppc-dev; +Cc: clombard, vaibhav

On 18/06/18 22:14, Frederic Barrat wrote:
> If a process exits without doing proper cleanup, there's a window
> where an opencapi device can try to access the memory of the dying
> process and may trigger a page fault. That's an expected scenario and
> the ocxl driver holds a reference on the mm_struct of the process
> until the opencapi device is notified of the process exiting.
> However, if mm_users is already at 0, i.e. the address space of the
> process has already been destroyed, the driver shouldn't try resolving
> the page fault, as it will fail, but it can also try accessing already
> freed data.
> 
> It is fixed by only calling the bottom half of the page fault handler
> if mm_users is greater than 0 and get a reference on mm_users instead
> of mm_count. Otherwise, we can safely return a translation fault to
> the device, as its associated memory context is being removed. The
> opencapi device will be properly cleaned up shortly after when closing
> the file descriptors.
> 
> Fixes: 5ef3166e8a32 ("ocxl: Driver code for 'generic' opencapi devices")
> Cc: stable@vger.kernel.org # v4.16+
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>

I think I understand what's going on here and it looks right.

Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: ocxl: Fix page fault handler in case of fault on dying process
  2018-06-18 12:14 [PATCH] ocxl: Fix page fault handler in case of fault on dying process Frederic Barrat
  2018-06-20  1:39 ` Alastair D'Silva
  2018-06-21  8:07 ` Andrew Donnellan
@ 2018-07-11 13:24 ` Michael Ellerman
  2 siblings, 0 replies; 4+ messages in thread
From: Michael Ellerman @ 2018-07-11 13:24 UTC (permalink / raw)
  To: Frederic Barrat, andrew.donnellan, alastair, linuxppc-dev
  Cc: clombard, vaibhav

On Mon, 2018-06-18 at 12:14:36 UTC, Frederic Barrat wrote:
> If a process exits without doing proper cleanup, there's a window
> where an opencapi device can try to access the memory of the dying
> process and may trigger a page fault. That's an expected scenario and
> the ocxl driver holds a reference on the mm_struct of the process
> until the opencapi device is notified of the process exiting.
> However, if mm_users is already at 0, i.e. the address space of the
> process has already been destroyed, the driver shouldn't try resolving
> the page fault, as it will fail, but it can also try accessing already
> freed data.
> 
> It is fixed by only calling the bottom half of the page fault handler
> if mm_users is greater than 0 and get a reference on mm_users instead
> of mm_count. Otherwise, we can safely return a translation fault to
> the device, as its associated memory context is being removed. The
> opencapi device will be properly cleaned up shortly after when closing
> the file descriptors.
> 
> Fixes: 5ef3166e8a32 ("ocxl: Driver code for 'generic' opencapi devices")
> Cc: stable@vger.kernel.org # v4.16+
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> Reviewed-By: Alastair D'Silva <alastair@d-silva.org>
> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d497ebf5fb3a026c0817f8c96cde57

cheers

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-07-11 13:24 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-18 12:14 [PATCH] ocxl: Fix page fault handler in case of fault on dying process Frederic Barrat
2018-06-20  1:39 ` Alastair D'Silva
2018-06-21  8:07 ` Andrew Donnellan
2018-07-11 13:24 ` Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).