linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 (re-send)] xen/gntdev: Use mempolicy instead of VM_IO flag to avoid NUMA balancing
@ 2016-11-17 20:10 Boris Ostrovsky
  2016-11-18  7:59 ` Olaf Hering
  2016-11-18 21:51 ` Hugh Dickins
  0 siblings, 2 replies; 7+ messages in thread
From: Boris Ostrovsky @ 2016-11-17 20:10 UTC (permalink / raw)
  To: david.vrabel, jgross
  Cc: xen-devel, linux-kernel, olaf, Boris Ostrovsky, stable

Commit 9c17d96500f7 ("xen/gntdev: Grant maps should not be subject to
NUMA balancing") set VM_IO flag to prevent grant maps from being
subjected to NUMA balancing.

It was discovered recently that this flag causes get_user_pages() to
always fail with -EFAULT.

check_vma_flags
__get_user_pages
__get_user_pages_locked
__get_user_pages_unlocked
get_user_pages_fast
iov_iter_get_pages
dio_refill_pages
do_direct_IO
do_blockdev_direct_IO
do_blockdev_direct_IO
ext4_direct_IO_read
generic_file_read_iter
aio_run_iocb

(which can happen if guest's vdisk has direct-io-safe option).

To avoid this don't use vm_flags. Instead, use mempolicy that prohibits
page migration (i.e. clear MPOL_F_MOF|MPOL_F_MORON) and make sure we
don't consult task's policy (which may include those flags) if vma
doesn't have one.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: stable@vger.kernel.org
---

Mis-spelled David's address.

Changes in v3:
* Don't use __mpol_dup() and get_task_policy() which are not exported
  for use by drivers. Add vm_operations_struct.get_policy().
* Copy to stable


 drivers/xen/gntdev.c |   27 ++++++++++++++++++++++++++-
 1 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index bb95212..632edd4 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -35,6 +35,7 @@
 #include <linux/spinlock.h>
 #include <linux/slab.h>
 #include <linux/highmem.h>
+#include <linux/mempolicy.h>
 
 #include <xen/xen.h>
 #include <xen/grant_table.h>
@@ -433,10 +434,28 @@ static void gntdev_vma_close(struct vm_area_struct *vma)
 	return map->pages[(addr - map->pages_vm_start) >> PAGE_SHIFT];
 }
 
+#ifdef CONFIG_NUMA
+/*
+ * We have this op to make sure callers (such as vma_policy_mof()) don't
+ * check current task's policy which may include migrate flags (MPOL_F_MOF
+ * or MPOL_F_MORON)
+ */
+static struct mempolicy *gntdev_vma_get_policy(struct vm_area_struct *vma,
+					       unsigned long addr)
+{
+	if (mpol_needs_cond_ref(vma->vm_policy))
+		mpol_get(vma->vm_policy);
+	return vma->vm_policy;
+}
+#endif
+
 static const struct vm_operations_struct gntdev_vmops = {
 	.open = gntdev_vma_open,
 	.close = gntdev_vma_close,
 	.find_special_page = gntdev_vma_find_special_page,
+#ifdef CONFIG_NUMA
+	.get_policy = gntdev_vma_get_policy,
+#endif
 };
 
 /* ------------------------------------------------------------------ */
@@ -1007,7 +1026,13 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 
 	vma->vm_ops = &gntdev_vmops;
 
-	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
+	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+
+#ifdef CONFIG_NUMA
+	/* Prevent NUMA balancing */
+	if (vma->vm_policy)
+		vma->vm_policy->flags &= ~(MPOL_F_MOF | MPOL_F_MORON);
+#endif
 
 	if (use_ptemod)
 		vma->vm_flags |= VM_DONTCOPY;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 (re-send)] xen/gntdev: Use mempolicy instead of VM_IO flag to avoid NUMA balancing
  2016-11-17 20:10 [PATCH v3 (re-send)] xen/gntdev: Use mempolicy instead of VM_IO flag to avoid NUMA balancing Boris Ostrovsky
@ 2016-11-18  7:59 ` Olaf Hering
  2016-11-18 21:51 ` Hugh Dickins
  1 sibling, 0 replies; 7+ messages in thread
From: Olaf Hering @ 2016-11-18  7:59 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: david.vrabel, jgross, xen-devel, linux-kernel, stable

[-- Attachment #1: Type: text/plain, Size: 292 bytes --]

On Thu, Nov 17, Boris Ostrovsky wrote:

> Commit 9c17d96500f7 ("xen/gntdev: Grant maps should not be subject to
> NUMA balancing") set VM_IO flag to prevent grant maps from being
> subjected to NUMA balancing.

Thanks, this works for me in 4.4:

Tested-by: Olaf Hering <olaf@aepfle.de>

Olaf

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 163 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 (re-send)] xen/gntdev: Use mempolicy instead of VM_IO flag to avoid NUMA balancing
  2016-11-17 20:10 [PATCH v3 (re-send)] xen/gntdev: Use mempolicy instead of VM_IO flag to avoid NUMA balancing Boris Ostrovsky
  2016-11-18  7:59 ` Olaf Hering
@ 2016-11-18 21:51 ` Hugh Dickins
  2016-11-18 22:14   ` Boris Ostrovsky
  1 sibling, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2016-11-18 21:51 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Mel Gorman, david.vrabel, jgross, xen-devel, linux-kernel,
	linux-mm, olaf

On Thu, 17 Nov 2016, Boris Ostrovsky wrote:

> Commit 9c17d96500f7 ("xen/gntdev: Grant maps should not be subject to
> NUMA balancing") set VM_IO flag to prevent grant maps from being
> subjected to NUMA balancing.
> 
> It was discovered recently that this flag causes get_user_pages() to
> always fail with -EFAULT.
> 
> check_vma_flags
> __get_user_pages
> __get_user_pages_locked
> __get_user_pages_unlocked
> get_user_pages_fast
> iov_iter_get_pages
> dio_refill_pages
> do_direct_IO
> do_blockdev_direct_IO
> do_blockdev_direct_IO
> ext4_direct_IO_read
> generic_file_read_iter
> aio_run_iocb
> 
> (which can happen if guest's vdisk has direct-io-safe option).
> 
> To avoid this don't use vm_flags. Instead, use mempolicy that prohibits
> page migration (i.e. clear MPOL_F_MOF|MPOL_F_MORON) and make sure we
> don't consult task's policy (which may include those flags) if vma
> doesn't have one.
> 
> Reported-by: Olaf Hering <olaf@aepfle.de>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: stable@vger.kernel.org

Hmm, sorry, but this seems overcomplicated to me: ingenious, but an
unusual use of the ->get_policy method, which is a little worrying,
since it has only been used for shmem (+ shm and kernfs) until now.

Maybe I'm wrong, but wouldn't substituting VM_MIXEDMAP for VM_IO
solve the problem more simply?

Hugh

> ---
> 
> Mis-spelled David's address.
> 
> Changes in v3:
> * Don't use __mpol_dup() and get_task_policy() which are not exported
>   for use by drivers. Add vm_operations_struct.get_policy().
> * Copy to stable
> 
> 
>  drivers/xen/gntdev.c |   27 ++++++++++++++++++++++++++-
>  1 files changed, 26 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index bb95212..632edd4 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -35,6 +35,7 @@
>  #include <linux/spinlock.h>
>  #include <linux/slab.h>
>  #include <linux/highmem.h>
> +#include <linux/mempolicy.h>
>  
>  #include <xen/xen.h>
>  #include <xen/grant_table.h>
> @@ -433,10 +434,28 @@ static void gntdev_vma_close(struct vm_area_struct *vma)
>  	return map->pages[(addr - map->pages_vm_start) >> PAGE_SHIFT];
>  }
>  
> +#ifdef CONFIG_NUMA
> +/*
> + * We have this op to make sure callers (such as vma_policy_mof()) don't
> + * check current task's policy which may include migrate flags (MPOL_F_MOF
> + * or MPOL_F_MORON)
> + */
> +static struct mempolicy *gntdev_vma_get_policy(struct vm_area_struct *vma,
> +					       unsigned long addr)
> +{
> +	if (mpol_needs_cond_ref(vma->vm_policy))
> +		mpol_get(vma->vm_policy);
> +	return vma->vm_policy;
> +}
> +#endif
> +
>  static const struct vm_operations_struct gntdev_vmops = {
>  	.open = gntdev_vma_open,
>  	.close = gntdev_vma_close,
>  	.find_special_page = gntdev_vma_find_special_page,
> +#ifdef CONFIG_NUMA
> +	.get_policy = gntdev_vma_get_policy,
> +#endif
>  };
>  
>  /* ------------------------------------------------------------------ */
> @@ -1007,7 +1026,13 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  
>  	vma->vm_ops = &gntdev_vmops;
>  
> -	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
> +	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
> +
> +#ifdef CONFIG_NUMA
> +	/* Prevent NUMA balancing */
> +	if (vma->vm_policy)
> +		vma->vm_policy->flags &= ~(MPOL_F_MOF | MPOL_F_MORON);
> +#endif
>  
>  	if (use_ptemod)
>  		vma->vm_flags |= VM_DONTCOPY;
> -- 
> 1.7.1
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 (re-send)] xen/gntdev: Use mempolicy instead of VM_IO flag to avoid NUMA balancing
  2016-11-18 21:51 ` Hugh Dickins
@ 2016-11-18 22:14   ` Boris Ostrovsky
  2016-11-18 22:27     ` Hugh Dickins
  0 siblings, 1 reply; 7+ messages in thread
From: Boris Ostrovsky @ 2016-11-18 22:14 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Mel Gorman, david.vrabel, jgross, xen-devel, linux-kernel,
	linux-mm, olaf

On 11/18/2016 04:51 PM, Hugh Dickins wrote:
> On Thu, 17 Nov 2016, Boris Ostrovsky wrote:
>
>> Commit 9c17d96500f7 ("xen/gntdev: Grant maps should not be subject to
>> NUMA balancing") set VM_IO flag to prevent grant maps from being
>> subjected to NUMA balancing.
>>
>> It was discovered recently that this flag causes get_user_pages() to
>> always fail with -EFAULT.
>>
>> check_vma_flags
>> __get_user_pages
>> __get_user_pages_locked
>> __get_user_pages_unlocked
>> get_user_pages_fast
>> iov_iter_get_pages
>> dio_refill_pages
>> do_direct_IO
>> do_blockdev_direct_IO
>> do_blockdev_direct_IO
>> ext4_direct_IO_read
>> generic_file_read_iter
>> aio_run_iocb
>>
>> (which can happen if guest's vdisk has direct-io-safe option).
>>
>> To avoid this don't use vm_flags. Instead, use mempolicy that prohibits
>> page migration (i.e. clear MPOL_F_MOF|MPOL_F_MORON) and make sure we
>> don't consult task's policy (which may include those flags) if vma
>> doesn't have one.
>>
>> Reported-by: Olaf Hering <olaf@aepfle.de>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Cc: stable@vger.kernel.org
> Hmm, sorry, but this seems overcomplicated to me: ingenious, but an
> unusual use of the ->get_policy method, which is a little worrying,
> since it has only been used for shmem (+ shm and kernfs) until now.
>
> Maybe I'm wrong, but wouldn't substituting VM_MIXEDMAP for VM_IO
> solve the problem more simply?

It would indeed. I didn't want to use it because it has specific meaning
("Can contain "struct page" and pure PFN pages") and that didn't seem
like the right flag to describe this vma.


-boris


>
> Hugh
>
>> ---
>>
>> Mis-spelled David's address.
>>
>> Changes in v3:
>> * Don't use __mpol_dup() and get_task_policy() which are not exported
>>   for use by drivers. Add vm_operations_struct.get_policy().
>> * Copy to stable
>>
>>
>>  drivers/xen/gntdev.c |   27 ++++++++++++++++++++++++++-
>>  1 files changed, 26 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>> index bb95212..632edd4 100644
>> --- a/drivers/xen/gntdev.c
>> +++ b/drivers/xen/gntdev.c
>> @@ -35,6 +35,7 @@
>>  #include <linux/spinlock.h>
>>  #include <linux/slab.h>
>>  #include <linux/highmem.h>
>> +#include <linux/mempolicy.h>
>>  
>>  #include <xen/xen.h>
>>  #include <xen/grant_table.h>
>> @@ -433,10 +434,28 @@ static void gntdev_vma_close(struct vm_area_struct *vma)
>>  	return map->pages[(addr - map->pages_vm_start) >> PAGE_SHIFT];
>>  }
>>  
>> +#ifdef CONFIG_NUMA
>> +/*
>> + * We have this op to make sure callers (such as vma_policy_mof()) don't
>> + * check current task's policy which may include migrate flags (MPOL_F_MOF
>> + * or MPOL_F_MORON)
>> + */
>> +static struct mempolicy *gntdev_vma_get_policy(struct vm_area_struct *vma,
>> +					       unsigned long addr)
>> +{
>> +	if (mpol_needs_cond_ref(vma->vm_policy))
>> +		mpol_get(vma->vm_policy);
>> +	return vma->vm_policy;
>> +}
>> +#endif
>> +
>>  static const struct vm_operations_struct gntdev_vmops = {
>>  	.open = gntdev_vma_open,
>>  	.close = gntdev_vma_close,
>>  	.find_special_page = gntdev_vma_find_special_page,
>> +#ifdef CONFIG_NUMA
>> +	.get_policy = gntdev_vma_get_policy,
>> +#endif
>>  };
>>  
>>  /* ------------------------------------------------------------------ */
>> @@ -1007,7 +1026,13 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>>  
>>  	vma->vm_ops = &gntdev_vmops;
>>  
>> -	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
>> +	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
>> +
>> +#ifdef CONFIG_NUMA
>> +	/* Prevent NUMA balancing */
>> +	if (vma->vm_policy)
>> +		vma->vm_policy->flags &= ~(MPOL_F_MOF | MPOL_F_MORON);
>> +#endif
>>  
>>  	if (use_ptemod)
>>  		vma->vm_flags |= VM_DONTCOPY;
>> -- 
>> 1.7.1
>>
>>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 (re-send)] xen/gntdev: Use mempolicy instead of VM_IO flag to avoid NUMA balancing
  2016-11-18 22:14   ` Boris Ostrovsky
@ 2016-11-18 22:27     ` Hugh Dickins
  2016-11-18 22:49       ` Boris Ostrovsky
  0 siblings, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2016-11-18 22:27 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Hugh Dickins, Mel Gorman, david.vrabel, jgross, xen-devel,
	linux-kernel, linux-mm, olaf

On Fri, 18 Nov 2016, Boris Ostrovsky wrote:
> On 11/18/2016 04:51 PM, Hugh Dickins wrote:
> > Hmm, sorry, but this seems overcomplicated to me: ingenious, but an
> > unusual use of the ->get_policy method, which is a little worrying,
> > since it has only been used for shmem (+ shm and kernfs) until now.
> >
> > Maybe I'm wrong, but wouldn't substituting VM_MIXEDMAP for VM_IO
> > solve the problem more simply?
> 
> It would indeed. I didn't want to use it because it has specific meaning
> ("Can contain "struct page" and pure PFN pages") and that didn't seem
> like the right flag to describe this vma.

It is okay if it contains 0 pure PFN pages; and no worse than VM_IO was.
A comment on why VM_MIXEDMAP is being used there would certainly be good.
But I do find its use preferable to enlisting an unusual ->get_policy.

Hugh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 (re-send)] xen/gntdev: Use mempolicy instead of VM_IO flag to avoid NUMA balancing
  2016-11-18 22:27     ` Hugh Dickins
@ 2016-11-18 22:49       ` Boris Ostrovsky
  2016-11-18 23:27         ` Hugh Dickins
  0 siblings, 1 reply; 7+ messages in thread
From: Boris Ostrovsky @ 2016-11-18 22:49 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Mel Gorman, david.vrabel, jgross, xen-devel, linux-kernel,
	linux-mm, olaf

On 11/18/2016 05:27 PM, Hugh Dickins wrote:
> On Fri, 18 Nov 2016, Boris Ostrovsky wrote:
>> On 11/18/2016 04:51 PM, Hugh Dickins wrote:
>>> Hmm, sorry, but this seems overcomplicated to me: ingenious, but an
>>> unusual use of the ->get_policy method, which is a little worrying,
>>> since it has only been used for shmem (+ shm and kernfs) until now.
>>>
>>> Maybe I'm wrong, but wouldn't substituting VM_MIXEDMAP for VM_IO
>>> solve the problem more simply?
>> It would indeed. I didn't want to use it because it has specific meaning
>> ("Can contain "struct page" and pure PFN pages") and that didn't seem
>> like the right flag to describe this vma.
> It is okay if it contains 0 pure PFN pages; and no worse than VM_IO was.
> A comment on why VM_MIXEDMAP is being used there would certainly be good.
> But I do find its use preferable to enlisting an unusual ->get_policy.

OK, I'll set VM_MIXEDMAP then.

I am still curious though why you feel get_policy is not appropriate
here (beside the fact that so far it had limited use). It is essentially
trying to say that the only policy to be consulted (in vma_policy_mof())
is of the vma itself and not of the task.

-boris

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 (re-send)] xen/gntdev: Use mempolicy instead of VM_IO flag to avoid NUMA balancing
  2016-11-18 22:49       ` Boris Ostrovsky
@ 2016-11-18 23:27         ` Hugh Dickins
  0 siblings, 0 replies; 7+ messages in thread
From: Hugh Dickins @ 2016-11-18 23:27 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Hugh Dickins, Mel Gorman, david.vrabel, jgross, xen-devel,
	linux-kernel, linux-mm, olaf

On Fri, 18 Nov 2016, Boris Ostrovsky wrote:
> On 11/18/2016 05:27 PM, Hugh Dickins wrote:
> > On Fri, 18 Nov 2016, Boris Ostrovsky wrote:
> >> On 11/18/2016 04:51 PM, Hugh Dickins wrote:
> >>> Hmm, sorry, but this seems overcomplicated to me: ingenious, but an
> >>> unusual use of the ->get_policy method, which is a little worrying,
> >>> since it has only been used for shmem (+ shm and kernfs) until now.
> >>>
> >>> Maybe I'm wrong, but wouldn't substituting VM_MIXEDMAP for VM_IO
> >>> solve the problem more simply?
> >> It would indeed. I didn't want to use it because it has specific meaning
> >> ("Can contain "struct page" and pure PFN pages") and that didn't seem
> >> like the right flag to describe this vma.
> > It is okay if it contains 0 pure PFN pages; and no worse than VM_IO was.
> > A comment on why VM_MIXEDMAP is being used there would certainly be good.
> > But I do find its use preferable to enlisting an unusual ->get_policy.
> 
> OK, I'll set VM_MIXEDMAP then.

Thanks, if it accomplishes what you need, then please do use it.

> 
> I am still curious though why you feel get_policy is not appropriate
> here (beside the fact that so far it had limited use). It is essentially
> trying to say that the only policy to be consulted (in vma_policy_mof())
> is of the vma itself and not of the task.

I agree that get_policy is explicitly about NUMA, and so relevant to the
matter of (discouraging) NUMA balancing, without any apology needed.

But there are no other examples of its use that way, it's been something
private to shmem (hence shm and kernfs) up until now: the complement of
set_policy, which implements the mbind() syscall on shmem objects.

Introduce an exceptional new usage, and we're likely to introduce bugs
(not to mention the long history of bugs in mpol_dup() that you also use).
Perhaps I'd find one already if I took the time to study your patch.

Full disclosure: I'm also contemplating a change to its interface,
to handle a possible NUMA interleave issue, so I do need to keep
an eye on all its callers.

If we have to choose between two less-than-ideal solutions,
please let's choose the simplest.

Hugh

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-11-18 23:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-17 20:10 [PATCH v3 (re-send)] xen/gntdev: Use mempolicy instead of VM_IO flag to avoid NUMA balancing Boris Ostrovsky
2016-11-18  7:59 ` Olaf Hering
2016-11-18 21:51 ` Hugh Dickins
2016-11-18 22:14   ` Boris Ostrovsky
2016-11-18 22:27     ` Hugh Dickins
2016-11-18 22:49       ` Boris Ostrovsky
2016-11-18 23:27         ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).