All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] cxl: Fix possible deadlock when processing page faults from cxllib
@ 2018-04-03  9:43 Frederic Barrat
  2018-04-03  9:56 ` Laurent Dufour
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Frederic Barrat @ 2018-04-03  9:43 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: ldufour, clombard, andrew.donnellan, vaibhav

cxllib_handle_fault() is called by an external driver when it needs to
have the host process page faults for a buffer which may cover several
pages. Currently the function holds the mm->mmap_sem semaphore with
read access while iterating over the buffer, since it could spawn
several VMAs. When calling a lower-level function to handle the page
fault for a single page, the semaphore is accessed again in read
mode. That is wrong and can lead to deadlocks if a writer tries to
sneak in while a buffer of several pages is being processed.

The fix is to release the semaphore once cxllib_handle_fault() got the
information it needs from the current vma. The address space/VMAs
could evolve while we iterate over the full buffer, but in the
unlikely case where we miss a page, the driver will raise a new page
fault when retrying.

Fixes: 3ced8d730063 ("cxl: Export library to support IBM XSL")
Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
---
 drivers/misc/cxl/cxllib.c | 85 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 55 insertions(+), 30 deletions(-)

diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c
index 30ccba436b3b..55cd35d1a9cc 100644
--- a/drivers/misc/cxl/cxllib.c
+++ b/drivers/misc/cxl/cxllib.c
@@ -208,49 +208,74 @@ int cxllib_get_PE_attributes(struct task_struct *task,
 }
 EXPORT_SYMBOL_GPL(cxllib_get_PE_attributes);
 
-int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags)
+static int get_vma_info(struct mm_struct *mm, u64 addr,
+			u64 *vma_start, u64 *vma_end,
+			unsigned long *page_size)
 {
-	int rc;
-	u64 dar;
 	struct vm_area_struct *vma = NULL;
-	unsigned long page_size;
-
-	if (mm == NULL)
-		return -EFAULT;
+	int rc = 0;
 
 	down_read(&mm->mmap_sem);
 
 	vma = find_vma(mm, addr);
 	if (!vma) {
-		pr_err("Can't find vma for addr %016llx\n", addr);
 		rc = -EFAULT;
 		goto out;
 	}
-	/* get the size of the pages allocated */
-	page_size = vma_kernel_pagesize(vma);
-
-	for (dar = (addr & ~(page_size - 1)); dar < (addr + size); dar += page_size) {
-		if (dar < vma->vm_start || dar >= vma->vm_end) {
-			vma = find_vma(mm, addr);
-			if (!vma) {
-				pr_err("Can't find vma for addr %016llx\n", addr);
-				rc = -EFAULT;
-				goto out;
-			}
-			/* get the size of the pages allocated */
-			page_size = vma_kernel_pagesize(vma);
+	*page_size = vma_kernel_pagesize(vma);
+	*vma_start = vma->vm_start;
+	*vma_end = vma->vm_end;
+out:
+	up_read(&mm->mmap_sem);
+	return rc;
+}
+
+int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags)
+{
+	int rc;
+	u64 dar, vma_start, vma_end;
+	unsigned long page_size;
+
+	if (mm == NULL)
+		return -EFAULT;
+
+	/*
+	 * The buffer we have to process can extend over several pages
+	 * and may also cover several VMAs.
+	 * We iterate over all the pages. The page size could vary
+	 * between VMAs.
+	 */
+	rc = get_vma_info(mm, addr, &vma_start, &vma_end, &page_size);
+	if (rc)
+		return rc;
+
+	for (dar = (addr & ~(page_size - 1)); dar < (addr + size);
+	     dar += page_size) {
+		if (dar < vma_start || dar >= vma_end) {
+			/*
+			 * We don't hold the mm->mmap_sem semaphore
+			 * while iterating, since the semaphore is
+			 * required by one of the lower-level page
+			 * fault processing functions and it could
+			 * create a deadlock.
+			 *
+			 * It means the VMAs can be altered between 2
+			 * loop iterations and we could theoretically
+			 * miss a page (however unlikely). But that's
+			 * not really a problem, as the driver will
+			 * retry access, get another page fault on the
+			 * missing page and call us again.
+			 */
+			rc = get_vma_info(mm, dar, &vma_start, &vma_end,
+					&page_size);
+			if (rc)
+				return rc;
 		}
 
 		rc = cxl_handle_mm_fault(mm, flags, dar);
-		if (rc) {
-			pr_err("cxl_handle_mm_fault failed %d", rc);
-			rc = -EFAULT;
-			goto out;
-		}
+		if (rc)
+			return -EFAULT;
 	}
-	rc = 0;
-out:
-	up_read(&mm->mmap_sem);
-	return rc;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(cxllib_handle_fault);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] cxl: Fix possible deadlock when processing page faults from cxllib
  2018-04-03  9:43 [PATCH] cxl: Fix possible deadlock when processing page faults from cxllib Frederic Barrat
@ 2018-04-03  9:56 ` Laurent Dufour
  2018-04-03 11:43 ` Vaibhav Jain
  2018-04-03 14:40 ` Aneesh Kumar K.V
  2 siblings, 0 replies; 7+ messages in thread
From: Laurent Dufour @ 2018-04-03  9:56 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev, mpe; +Cc: clombard, andrew.donnellan, vaibhav



On 03/04/2018 11:43, Frederic Barrat wrote:
> cxllib_handle_fault() is called by an external driver when it needs to
> have the host process page faults for a buffer which may cover several
> pages. Currently the function holds the mm->mmap_sem semaphore with
> read access while iterating over the buffer, since it could spawn
> several VMAs. When calling a lower-level function to handle the page
> fault for a single page, the semaphore is accessed again in read
> mode. That is wrong and can lead to deadlocks if a writer tries to
> sneak in while a buffer of several pages is being processed.
> 
> The fix is to release the semaphore once cxllib_handle_fault() got the
> information it needs from the current vma. The address space/VMAs
> could evolve while we iterate over the full buffer, but in the
> unlikely case where we miss a page, the driver will raise a new page
> fault when retrying.
> 
> Fixes: 3ced8d730063 ("cxl: Export library to support IBM XSL")
> Cc: stable@vger.kernel.org # 4.13+
> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

FWIW,
Reviewed-by : Laurent Dufour <ldufour@linux.vnet.ibm.com>

> ---
>  drivers/misc/cxl/cxllib.c | 85 ++++++++++++++++++++++++++++++-----------------
>  1 file changed, 55 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c
> index 30ccba436b3b..55cd35d1a9cc 100644
> --- a/drivers/misc/cxl/cxllib.c
> +++ b/drivers/misc/cxl/cxllib.c
> @@ -208,49 +208,74 @@ int cxllib_get_PE_attributes(struct task_struct *task,
>  }
>  EXPORT_SYMBOL_GPL(cxllib_get_PE_attributes);
> 
> -int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags)
> +static int get_vma_info(struct mm_struct *mm, u64 addr,
> +			u64 *vma_start, u64 *vma_end,
> +			unsigned long *page_size)
>  {
> -	int rc;
> -	u64 dar;
>  	struct vm_area_struct *vma = NULL;
> -	unsigned long page_size;
> -
> -	if (mm == NULL)
> -		return -EFAULT;
> +	int rc = 0;
> 
>  	down_read(&mm->mmap_sem);
> 
>  	vma = find_vma(mm, addr);
>  	if (!vma) {
> -		pr_err("Can't find vma for addr %016llx\n", addr);
>  		rc = -EFAULT;
>  		goto out;
>  	}
> -	/* get the size of the pages allocated */
> -	page_size = vma_kernel_pagesize(vma);
> -
> -	for (dar = (addr & ~(page_size - 1)); dar < (addr + size); dar += page_size) {
> -		if (dar < vma->vm_start || dar >= vma->vm_end) {
> -			vma = find_vma(mm, addr);
> -			if (!vma) {
> -				pr_err("Can't find vma for addr %016llx\n", addr);
> -				rc = -EFAULT;
> -				goto out;
> -			}
> -			/* get the size of the pages allocated */
> -			page_size = vma_kernel_pagesize(vma);
> +	*page_size = vma_kernel_pagesize(vma);
> +	*vma_start = vma->vm_start;
> +	*vma_end = vma->vm_end;
> +out:
> +	up_read(&mm->mmap_sem);
> +	return rc;
> +}
> +
> +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags)
> +{
> +	int rc;
> +	u64 dar, vma_start, vma_end;
> +	unsigned long page_size;
> +
> +	if (mm == NULL)
> +		return -EFAULT;
> +
> +	/*
> +	 * The buffer we have to process can extend over several pages
> +	 * and may also cover several VMAs.
> +	 * We iterate over all the pages. The page size could vary
> +	 * between VMAs.
> +	 */
> +	rc = get_vma_info(mm, addr, &vma_start, &vma_end, &page_size);
> +	if (rc)
> +		return rc;
> +
> +	for (dar = (addr & ~(page_size - 1)); dar < (addr + size);
> +	     dar += page_size) {
> +		if (dar < vma_start || dar >= vma_end) {
> +			/*
> +			 * We don't hold the mm->mmap_sem semaphore
> +			 * while iterating, since the semaphore is
> +			 * required by one of the lower-level page
> +			 * fault processing functions and it could
> +			 * create a deadlock.
> +			 *
> +			 * It means the VMAs can be altered between 2
> +			 * loop iterations and we could theoretically
> +			 * miss a page (however unlikely). But that's
> +			 * not really a problem, as the driver will
> +			 * retry access, get another page fault on the
> +			 * missing page and call us again.
> +			 */
> +			rc = get_vma_info(mm, dar, &vma_start, &vma_end,
> +					&page_size);
> +			if (rc)
> +				return rc;
>  		}
> 
>  		rc = cxl_handle_mm_fault(mm, flags, dar);
> -		if (rc) {
> -			pr_err("cxl_handle_mm_fault failed %d", rc);
> -			rc = -EFAULT;
> -			goto out;
> -		}
> +		if (rc)
> +			return -EFAULT;
>  	}
> -	rc = 0;
> -out:
> -	up_read(&mm->mmap_sem);
> -	return rc;
> +	return 0;
>  }
>  EXPORT_SYMBOL_GPL(cxllib_handle_fault);
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cxl: Fix possible deadlock when processing page faults from cxllib
  2018-04-03  9:43 [PATCH] cxl: Fix possible deadlock when processing page faults from cxllib Frederic Barrat
  2018-04-03  9:56 ` Laurent Dufour
@ 2018-04-03 11:43 ` Vaibhav Jain
  2018-04-03 14:40 ` Aneesh Kumar K.V
  2 siblings, 0 replies; 7+ messages in thread
From: Vaibhav Jain @ 2018-04-03 11:43 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev, mpe; +Cc: ldufour, clombard, andrew.donnellan

Frederic Barrat <fbarrat@linux.vnet.ibm.com> writes:

> cxllib_handle_fault() is called by an external driver when it needs to
> have the host process page faults for a buffer which may cover several
> pages. Currently the function holds the mm->mmap_sem semaphore with
> read access while iterating over the buffer, since it could spawn
> several VMAs. When calling a lower-level function to handle the page
> fault for a single page, the semaphore is accessed again in read
> mode. That is wrong and can lead to deadlocks if a writer tries to
> sneak in while a buffer of several pages is being processed.
>
> The fix is to release the semaphore once cxllib_handle_fault() got the
> information it needs from the current vma. The address space/VMAs
> could evolve while we iterate over the full buffer, but in the
> unlikely case where we miss a page, the driver will raise a new page
> fault when retrying.
>
> Fixes: 3ced8d730063 ("cxl: Export library to support IBM XSL")
> Cc: stable@vger.kernel.org # 4.13+
> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

Reviewed-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cxl: Fix possible deadlock when processing page faults from cxllib
  2018-04-03  9:43 [PATCH] cxl: Fix possible deadlock when processing page faults from cxllib Frederic Barrat
  2018-04-03  9:56 ` Laurent Dufour
  2018-04-03 11:43 ` Vaibhav Jain
@ 2018-04-03 14:40 ` Aneesh Kumar K.V
  2018-04-03 15:31   ` Aneesh Kumar K.V
  2018-04-03 16:40   ` Frederic Barrat
  2 siblings, 2 replies; 7+ messages in thread
From: Aneesh Kumar K.V @ 2018-04-03 14:40 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev, mpe
  Cc: ldufour, clombard, andrew.donnellan, vaibhav

On 04/03/2018 03:13 PM, Frederic Barrat wrote:
> cxllib_handle_fault() is called by an external driver when it needs to
> have the host process page faults for a buffer which may cover several
> pages. Currently the function holds the mm->mmap_sem semaphore with
> read access while iterating over the buffer, since it could spawn
> several VMAs. When calling a lower-level function to handle the page
> fault for a single page, the semaphore is accessed again in read
> mode. That is wrong and can lead to deadlocks if a writer tries to
> sneak in while a buffer of several pages is being processed.
> 
> The fix is to release the semaphore once cxllib_handle_fault() got the
> information it needs from the current vma. The address space/VMAs
> could evolve while we iterate over the full buffer, but in the
> unlikely case where we miss a page, the driver will raise a new page
> fault when retrying.
> 
> Fixes: 3ced8d730063 ("cxl: Export library to support IBM XSL")
> Cc: stable@vger.kernel.org # 4.13+
> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
> ---
>   drivers/misc/cxl/cxllib.c | 85 ++++++++++++++++++++++++++++++-----------------
>   1 file changed, 55 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c
> index 30ccba436b3b..55cd35d1a9cc 100644
> --- a/drivers/misc/cxl/cxllib.c
> +++ b/drivers/misc/cxl/cxllib.c
> @@ -208,49 +208,74 @@ int cxllib_get_PE_attributes(struct task_struct *task,
>   }
>   EXPORT_SYMBOL_GPL(cxllib_get_PE_attributes);
>   
> -int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags)
> +static int get_vma_info(struct mm_struct *mm, u64 addr,
> +			u64 *vma_start, u64 *vma_end,
> +			unsigned long *page_size)
>   {
> -	int rc;
> -	u64 dar;
>   	struct vm_area_struct *vma = NULL;
> -	unsigned long page_size;
> -
> -	if (mm == NULL)
> -		return -EFAULT;
> +	int rc = 0;
>   
>   	down_read(&mm->mmap_sem);
>   
>   	vma = find_vma(mm, addr);
>   	if (!vma) {
> -		pr_err("Can't find vma for addr %016llx\n", addr);
>   		rc = -EFAULT;
>   		goto out;
>   	}
> -	/* get the size of the pages allocated */
> -	page_size = vma_kernel_pagesize(vma);
> -
> -	for (dar = (addr & ~(page_size - 1)); dar < (addr + size); dar += page_size) {
> -		if (dar < vma->vm_start || dar >= vma->vm_end) {
> -			vma = find_vma(mm, addr);
> -			if (!vma) {
> -				pr_err("Can't find vma for addr %016llx\n", addr);
> -				rc = -EFAULT;
> -				goto out;
> -			}
> -			/* get the size of the pages allocated */
> -			page_size = vma_kernel_pagesize(vma);
> +	*page_size = vma_kernel_pagesize(vma);
> +	*vma_start = vma->vm_start;
> +	*vma_end = vma->vm_end;
> +out:
> +	up_read(&mm->mmap_sem);
> +	return rc;
> +}
> +
> +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags)
> +{
> +	int rc;
> +	u64 dar, vma_start, vma_end;
> +	unsigned long page_size;
> +
> +	if (mm == NULL)
> +		return -EFAULT;
> +
> +	/*
> +	 * The buffer we have to process can extend over several pages
> +	 * and may also cover several VMAs.
> +	 * We iterate over all the pages. The page size could vary
> +	 * between VMAs.
> +	 */
> +	rc = get_vma_info(mm, addr, &vma_start, &vma_end, &page_size);
> +	if (rc)
> +		return rc;
> +
> +	for (dar = (addr & ~(page_size - 1)); dar < (addr + size);
> +	     dar += page_size) {
> +		if (dar < vma_start || dar >= vma_end) {


IIUC, we are fetching the vma to get just the page_size with which it is 
mapped? Can't we iterate with PAGE_SIZE? Considering hugetlb page size 
will be larger than PAGE_SIZE, we might call into cxl_handle_mm_fault 
multiple times for a hugetlb page. Does that cause any issue? Also can 
cxl be used with hugetlb mappings?

> +			/*
> +			 * We don't hold the mm->mmap_sem semaphore
> +			 * while iterating, since the semaphore is
> +			 * required by one of the lower-level page
> +			 * fault processing functions and it could
> +			 * create a deadlock.
> +			 *
> +			 * It means the VMAs can be altered between 2
> +			 * loop iterations and we could theoretically
> +			 * miss a page (however unlikely). But that's
> +			 * not really a problem, as the driver will
> +			 * retry access, get another page fault on the
> +			 * missing page and call us again.
> +			 */
> +			rc = get_vma_info(mm, dar, &vma_start, &vma_end,
> +					&page_size);
> +			if (rc)
> +				return rc;
>   		}
>   
>   		rc = cxl_handle_mm_fault(mm, flags, dar);
> -		if (rc) {
> -			pr_err("cxl_handle_mm_fault failed %d", rc);
> -			rc = -EFAULT;
> -			goto out;
> -		}
> +		if (rc)
> +			return -EFAULT;
>   	}
> -	rc = 0;
> -out:
> -	up_read(&mm->mmap_sem);
> -	return rc;
> +	return 0;
>   }
>   EXPORT_SYMBOL_GPL(cxllib_handle_fault);
> 

-aneesh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cxl: Fix possible deadlock when processing page faults from cxllib
  2018-04-03 14:40 ` Aneesh Kumar K.V
@ 2018-04-03 15:31   ` Aneesh Kumar K.V
  2018-04-04 10:58     ` Frederic Barrat
  2018-04-03 16:40   ` Frederic Barrat
  1 sibling, 1 reply; 7+ messages in thread
From: Aneesh Kumar K.V @ 2018-04-03 15:31 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev, mpe
  Cc: ldufour, clombard, andrew.donnellan, vaibhav

On 04/03/2018 08:10 PM, Aneesh Kumar K.V wrote:
> On 04/03/2018 03:13 PM, Frederic Barrat wrote:
>> cxllib_handle_fault() is called by an external driver when it needs to
>> have the host process page faults for a buffer which may cover several
>> pages. Currently the function holds the mm->mmap_sem semaphore with
>> read access while iterating over the buffer, since it could spawn
>> several VMAs. When calling a lower-level function to handle the page
>> fault for a single page, the semaphore is accessed again in read
>> mode. That is wrong and can lead to deadlocks if a writer tries to
>> sneak in while a buffer of several pages is being processed.
>>
>> The fix is to release the semaphore once cxllib_handle_fault() got the
>> information it needs from the current vma. The address space/VMAs
>> could evolve while we iterate over the full buffer, but in the
>> unlikely case where we miss a page, the driver will raise a new page
>> fault when retrying.
>>
>> Fixes: 3ced8d730063 ("cxl: Export library to support IBM XSL")
>> Cc: stable@vger.kernel.org # 4.13+
>> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
>> ---
>>   drivers/misc/cxl/cxllib.c | 85 
>> ++++++++++++++++++++++++++++++-----------------
>>   1 file changed, 55 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c
>> index 30ccba436b3b..55cd35d1a9cc 100644
>> --- a/drivers/misc/cxl/cxllib.c
>> +++ b/drivers/misc/cxl/cxllib.c
>> @@ -208,49 +208,74 @@ int cxllib_get_PE_attributes(struct task_struct 
>> *task,
>>   }
>>   EXPORT_SYMBOL_GPL(cxllib_get_PE_attributes);
>> -int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 
>> flags)
>> +static int get_vma_info(struct mm_struct *mm, u64 addr,
>> +            u64 *vma_start, u64 *vma_end,
>> +            unsigned long *page_size)
>>   {
>> -    int rc;
>> -    u64 dar;
>>       struct vm_area_struct *vma = NULL;
>> -    unsigned long page_size;
>> -
>> -    if (mm == NULL)
>> -        return -EFAULT;
>> +    int rc = 0;
>>       down_read(&mm->mmap_sem);
>>       vma = find_vma(mm, addr);
>>       if (!vma) {
>> -        pr_err("Can't find vma for addr %016llx\n", addr);
>>           rc = -EFAULT;
>>           goto out;
>>       }
>> -    /* get the size of the pages allocated */
>> -    page_size = vma_kernel_pagesize(vma);
>> -
>> -    for (dar = (addr & ~(page_size - 1)); dar < (addr + size); dar += 
>> page_size) {
>> -        if (dar < vma->vm_start || dar >= vma->vm_end) {
>> -            vma = find_vma(mm, addr);
>> -            if (!vma) {
>> -                pr_err("Can't find vma for addr %016llx\n", addr);
>> -                rc = -EFAULT;
>> -                goto out;
>> -            }
>> -            /* get the size of the pages allocated */
>> -            page_size = vma_kernel_pagesize(vma);
>> +    *page_size = vma_kernel_pagesize(vma);
>> +    *vma_start = vma->vm_start;
>> +    *vma_end = vma->vm_end;
>> +out:
>> +    up_read(&mm->mmap_sem);
>> +    return rc;
>> +}
>> +
>> +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 
>> flags)
>> +{
>> +    int rc;
>> +    u64 dar, vma_start, vma_end;
>> +    unsigned long page_size;
>> +
>> +    if (mm == NULL)
>> +        return -EFAULT;
>> +
>> +    /*
>> +     * The buffer we have to process can extend over several pages
>> +     * and may also cover several VMAs.
>> +     * We iterate over all the pages. The page size could vary
>> +     * between VMAs.
>> +     */
>> +    rc = get_vma_info(mm, addr, &vma_start, &vma_end, &page_size);
>> +    if (rc)
>> +        return rc;
>> +
>> +    for (dar = (addr & ~(page_size - 1)); dar < (addr + size);
>> +         dar += page_size) {
>> +        if (dar < vma_start || dar >= vma_end) {
> 
> 
> IIUC, we are fetching the vma to get just the page_size with which it is 
> mapped? Can't we iterate with PAGE_SIZE? Considering hugetlb page size 
> will be larger than PAGE_SIZE, we might call into cxl_handle_mm_fault 
> multiple times for a hugetlb page. Does that cause any issue? Also can 
> cxl be used with hugetlb mappings?
> 
Can you also try to use a helper like below. That will clarify the need 
of find_vma there.

static int __cxllib_handle_fault(struct mm_struct *mm, unsigned long 
start, unsigned long end,
				 unsigned long mapping_psize, u64 flags)
{
	int rc;
	unsigned long dar;

	for (dar = start; dar < end; dar += mapping_psize) {
		rc = cxl_handle_mm_fault(mm, flags, dar);
		if (rc) {
			rc = -EFAULT;
			goto out;
		}
	}
	rc = 0;
out:
	return rc;
}

-aneesh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cxl: Fix possible deadlock when processing page faults from cxllib
  2018-04-03 14:40 ` Aneesh Kumar K.V
  2018-04-03 15:31   ` Aneesh Kumar K.V
@ 2018-04-03 16:40   ` Frederic Barrat
  1 sibling, 0 replies; 7+ messages in thread
From: Frederic Barrat @ 2018-04-03 16:40 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Frederic Barrat, linuxppc-dev, mpe
  Cc: ldufour, clombard, andrew.donnellan, vaibhav



Le 03/04/2018 à 16:40, Aneesh Kumar K.V a écrit :
> On 04/03/2018 03:13 PM, Frederic Barrat wrote:
>> cxllib_handle_fault() is called by an external driver when it needs to
>> have the host process page faults for a buffer which may cover several
>> pages. Currently the function holds the mm->mmap_sem semaphore with
>> read access while iterating over the buffer, since it could spawn
>> several VMAs. When calling a lower-level function to handle the page
>> fault for a single page, the semaphore is accessed again in read
>> mode. That is wrong and can lead to deadlocks if a writer tries to
>> sneak in while a buffer of several pages is being processed.
>>
>> The fix is to release the semaphore once cxllib_handle_fault() got the
>> information it needs from the current vma. The address space/VMAs
>> could evolve while we iterate over the full buffer, but in the
>> unlikely case where we miss a page, the driver will raise a new page
>> fault when retrying.
>>
>> Fixes: 3ced8d730063 ("cxl: Export library to support IBM XSL")
>> Cc: stable@vger.kernel.org # 4.13+
>> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
>> ---
>>   drivers/misc/cxl/cxllib.c | 85 
>> ++++++++++++++++++++++++++++++-----------------
>>   1 file changed, 55 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c
>> index 30ccba436b3b..55cd35d1a9cc 100644
>> --- a/drivers/misc/cxl/cxllib.c
>> +++ b/drivers/misc/cxl/cxllib.c
>> @@ -208,49 +208,74 @@ int cxllib_get_PE_attributes(struct task_struct 
>> *task,
>>   }
>>   EXPORT_SYMBOL_GPL(cxllib_get_PE_attributes);
>> -int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 
>> flags)
>> +static int get_vma_info(struct mm_struct *mm, u64 addr,
>> +            u64 *vma_start, u64 *vma_end,
>> +            unsigned long *page_size)
>>   {
>> -    int rc;
>> -    u64 dar;
>>       struct vm_area_struct *vma = NULL;
>> -    unsigned long page_size;
>> -
>> -    if (mm == NULL)
>> -        return -EFAULT;
>> +    int rc = 0;
>>       down_read(&mm->mmap_sem);
>>       vma = find_vma(mm, addr);
>>       if (!vma) {
>> -        pr_err("Can't find vma for addr %016llx\n", addr);
>>           rc = -EFAULT;
>>           goto out;
>>       }
>> -    /* get the size of the pages allocated */
>> -    page_size = vma_kernel_pagesize(vma);
>> -
>> -    for (dar = (addr & ~(page_size - 1)); dar < (addr + size); dar += 
>> page_size) {
>> -        if (dar < vma->vm_start || dar >= vma->vm_end) {
>> -            vma = find_vma(mm, addr);
>> -            if (!vma) {
>> -                pr_err("Can't find vma for addr %016llx\n", addr);
>> -                rc = -EFAULT;
>> -                goto out;
>> -            }
>> -            /* get the size of the pages allocated */
>> -            page_size = vma_kernel_pagesize(vma);
>> +    *page_size = vma_kernel_pagesize(vma);
>> +    *vma_start = vma->vm_start;
>> +    *vma_end = vma->vm_end;
>> +out:
>> +    up_read(&mm->mmap_sem);
>> +    return rc;
>> +}
>> +
>> +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 
>> flags)
>> +{
>> +    int rc;
>> +    u64 dar, vma_start, vma_end;
>> +    unsigned long page_size;
>> +
>> +    if (mm == NULL)
>> +        return -EFAULT;
>> +
>> +    /*
>> +     * The buffer we have to process can extend over several pages
>> +     * and may also cover several VMAs.
>> +     * We iterate over all the pages. The page size could vary
>> +     * between VMAs.
>> +     */
>> +    rc = get_vma_info(mm, addr, &vma_start, &vma_end, &page_size);
>> +    if (rc)
>> +        return rc;
>> +
>> +    for (dar = (addr & ~(page_size - 1)); dar < (addr + size);
>> +         dar += page_size) {
>> +        if (dar < vma_start || dar >= vma_end) {
> 
> 
> IIUC, we are fetching the vma to get just the page_size with which it is 
> mapped? Can't we iterate with PAGE_SIZE? Considering hugetlb page size 
> will be larger than PAGE_SIZE, we might call into cxl_handle_mm_fault 
> multiple times for a hugetlb page. Does that cause any issue? Also can 
> cxl be used with hugetlb mappings?

I discussed it with Aneesh, but for the record:
- huge pages could be used, cxl has no control over it.
- incrementing the loop with PAGE_SIZE if it's a huge page would be a 
waste, as only the first call to cxl_handle_mm_fault() would be useful.
- having to account for several VMAs and potentially page sizes make it 
more complicated. An idea is to check with Mellanox if we can reduce the 
scope, in case the caller can rule out some cases. It's too late for 
coral, but something we can look into for the future/upstream.

   Fred



>> +            /*
>> +             * We don't hold the mm->mmap_sem semaphore
>> +             * while iterating, since the semaphore is
>> +             * required by one of the lower-level page
>> +             * fault processing functions and it could
>> +             * create a deadlock.
>> +             *
>> +             * It means the VMAs can be altered between 2
>> +             * loop iterations and we could theoretically
>> +             * miss a page (however unlikely). But that's
>> +             * not really a problem, as the driver will
>> +             * retry access, get another page fault on the
>> +             * missing page and call us again.
>> +             */
>> +            rc = get_vma_info(mm, dar, &vma_start, &vma_end,
>> +                    &page_size);
>> +            if (rc)
>> +                return rc;
>>           }
>>           rc = cxl_handle_mm_fault(mm, flags, dar);
>> -        if (rc) {
>> -            pr_err("cxl_handle_mm_fault failed %d", rc);
>> -            rc = -EFAULT;
>> -            goto out;
>> -        }
>> +        if (rc)
>> +            return -EFAULT;
>>       }
>> -    rc = 0;
>> -out:
>> -    up_read(&mm->mmap_sem);
>> -    return rc;
>> +    return 0;
>>   }
>>   EXPORT_SYMBOL_GPL(cxllib_handle_fault);
>>
> 
> -aneesh
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cxl: Fix possible deadlock when processing page faults from cxllib
  2018-04-03 15:31   ` Aneesh Kumar K.V
@ 2018-04-04 10:58     ` Frederic Barrat
  0 siblings, 0 replies; 7+ messages in thread
From: Frederic Barrat @ 2018-04-04 10:58 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Frederic Barrat, linuxppc-dev, mpe
  Cc: ldufour, clombard, andrew.donnellan, vaibhav



Le 03/04/2018 à 17:31, Aneesh Kumar K.V a écrit :
> On 04/03/2018 08:10 PM, Aneesh Kumar K.V wrote:
>> On 04/03/2018 03:13 PM, Frederic Barrat wrote:
>>> cxllib_handle_fault() is called by an external driver when it needs to
>>> have the host process page faults for a buffer which may cover several
>>> pages. Currently the function holds the mm->mmap_sem semaphore with
>>> read access while iterating over the buffer, since it could spawn
>>> several VMAs. When calling a lower-level function to handle the page
>>> fault for a single page, the semaphore is accessed again in read
>>> mode. That is wrong and can lead to deadlocks if a writer tries to
>>> sneak in while a buffer of several pages is being processed.
>>>
>>> The fix is to release the semaphore once cxllib_handle_fault() got the
>>> information it needs from the current vma. The address space/VMAs
>>> could evolve while we iterate over the full buffer, but in the
>>> unlikely case where we miss a page, the driver will raise a new page
>>> fault when retrying.
>>>
>>> Fixes: 3ced8d730063 ("cxl: Export library to support IBM XSL")
>>> Cc: stable@vger.kernel.org # 4.13+
>>> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
>>> ---
>>>   drivers/misc/cxl/cxllib.c | 85 
>>> ++++++++++++++++++++++++++++++-----------------
>>>   1 file changed, 55 insertions(+), 30 deletions(-)
>>>
>>> diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c
>>> index 30ccba436b3b..55cd35d1a9cc 100644
>>> --- a/drivers/misc/cxl/cxllib.c
>>> +++ b/drivers/misc/cxl/cxllib.c
>>> @@ -208,49 +208,74 @@ int cxllib_get_PE_attributes(struct task_struct 
>>> *task,
>>>   }
>>>   EXPORT_SYMBOL_GPL(cxllib_get_PE_attributes);
>>> -int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, 
>>> u64 flags)
>>> +static int get_vma_info(struct mm_struct *mm, u64 addr,
>>> +            u64 *vma_start, u64 *vma_end,
>>> +            unsigned long *page_size)
>>>   {
>>> -    int rc;
>>> -    u64 dar;
>>>       struct vm_area_struct *vma = NULL;
>>> -    unsigned long page_size;
>>> -
>>> -    if (mm == NULL)
>>> -        return -EFAULT;
>>> +    int rc = 0;
>>>       down_read(&mm->mmap_sem);
>>>       vma = find_vma(mm, addr);
>>>       if (!vma) {
>>> -        pr_err("Can't find vma for addr %016llx\n", addr);
>>>           rc = -EFAULT;
>>>           goto out;
>>>       }
>>> -    /* get the size of the pages allocated */
>>> -    page_size = vma_kernel_pagesize(vma);
>>> -
>>> -    for (dar = (addr & ~(page_size - 1)); dar < (addr + size); dar 
>>> += page_size) {
>>> -        if (dar < vma->vm_start || dar >= vma->vm_end) {
>>> -            vma = find_vma(mm, addr);
>>> -            if (!vma) {
>>> -                pr_err("Can't find vma for addr %016llx\n", addr);
>>> -                rc = -EFAULT;
>>> -                goto out;
>>> -            }
>>> -            /* get the size of the pages allocated */
>>> -            page_size = vma_kernel_pagesize(vma);
>>> +    *page_size = vma_kernel_pagesize(vma);
>>> +    *vma_start = vma->vm_start;
>>> +    *vma_end = vma->vm_end;
>>> +out:
>>> +    up_read(&mm->mmap_sem);
>>> +    return rc;
>>> +}
>>> +
>>> +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, 
>>> u64 flags)
>>> +{
>>> +    int rc;
>>> +    u64 dar, vma_start, vma_end;
>>> +    unsigned long page_size;
>>> +
>>> +    if (mm == NULL)
>>> +        return -EFAULT;
>>> +
>>> +    /*
>>> +     * The buffer we have to process can extend over several pages
>>> +     * and may also cover several VMAs.
>>> +     * We iterate over all the pages. The page size could vary
>>> +     * between VMAs.
>>> +     */
>>> +    rc = get_vma_info(mm, addr, &vma_start, &vma_end, &page_size);
>>> +    if (rc)
>>> +        return rc;
>>> +
>>> +    for (dar = (addr & ~(page_size - 1)); dar < (addr + size);
>>> +         dar += page_size) {
>>> +        if (dar < vma_start || dar >= vma_end) {
>>
>>
>> IIUC, we are fetching the vma to get just the page_size with which it 
>> is mapped? Can't we iterate with PAGE_SIZE? Considering hugetlb page 
>> size will be larger than PAGE_SIZE, we might call into 
>> cxl_handle_mm_fault multiple times for a hugetlb page. Does that cause 
>> any issue? Also can cxl be used with hugetlb mappings?
>>
> Can you also try to use a helper like below. That will clarify the need 
> of find_vma there.
> 
> static int __cxllib_handle_fault(struct mm_struct *mm, unsigned long 
> start, unsigned long end,
>                   unsigned long mapping_psize, u64 flags)
> {
>      int rc;
>      unsigned long dar;
> 
>      for (dar = start; dar < end; dar += mapping_psize) {
>          rc = cxl_handle_mm_fault(mm, flags, dar);
>          if (rc) {
>              rc = -EFAULT;
>              goto out;
>          }
>      }
>      rc = 0;
> out:
>      return rc;
> }


I'm struggling to make good use of it. I see your point, it makes 
touching all the pages for a given VMA easy (same page size).
But I'm given a buffer, which can span several VMAs, and we can have a 
varying page size. So I now need an outside loop iterating over all the 
VMAs and call the helper for a subset of the VMA. IMHO, it's not making 
the code any easier, i.e what I gain in the helper is lost in the 
outside VMA loop.
The current code, with a single loop and a varying increment based on 
the page size, doesn't look that bad to me.

   Fred

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-04-04 13:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-03  9:43 [PATCH] cxl: Fix possible deadlock when processing page faults from cxllib Frederic Barrat
2018-04-03  9:56 ` Laurent Dufour
2018-04-03 11:43 ` Vaibhav Jain
2018-04-03 14:40 ` Aneesh Kumar K.V
2018-04-03 15:31   ` Aneesh Kumar K.V
2018-04-04 10:58     ` Frederic Barrat
2018-04-03 16:40   ` Frederic Barrat

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.