linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
@ 2017-09-22 20:13 Reza Arbab
  2017-09-22 20:31 ` Reza Arbab
  2017-09-26 13:37 ` Michal Hocko
  0 siblings, 2 replies; 8+ messages in thread
From: Reza Arbab @ 2017-09-22 20:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Jan Kara, Ross Zwisler, Kirill A. Shutemov,
	Lorenzo Stoakes, Dave Jiang, Jérôme Glisse,
	Matthew Wilcox, Hugh Dickins, Huang Ying, Ingo Molnar,
	Aneesh Kumar K.V, James Morse, Naoya Horiguchi, Minchan Kim,
	Johannes Weiner, Will Deacon, linux-mm, linux-kernel

The move_pages() syscall can be used to find the numa node where a page
currently resides. This is not working for device public memory pages,
which erroneously report -EFAULT (unmapped or zero page).

Enable by adding a FOLL_DEVICE flag for follow_page(), which
move_pages() will use. This could be done unconditionally, but adding a
flag seems like a safer change.

Cc: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 include/linux/mm.h | 1 +
 mm/gup.c           | 2 +-
 mm/migrate.c       | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f8c10d3..783cb57 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2368,6 +2368,7 @@ static inline struct page *follow_page(struct vm_area_struct *vma,
 #define FOLL_MLOCK	0x1000	/* lock present pages */
 #define FOLL_REMOTE	0x2000	/* we are working on non-current tsk/mm */
 #define FOLL_COW	0x4000	/* internal GUP flag */
+#define FOLL_DEVICE	0x8000	/* return device pages */
 
 static inline int vm_fault_to_errno(int vm_fault, int foll_flags)
 {
diff --git a/mm/gup.c b/mm/gup.c
index b2b4d42..6fbad70 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -110,7 +110,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		return NULL;
 	}
 
-	page = vm_normal_page(vma, address, pte);
+	page = _vm_normal_page(vma, address, pte, flags & FOLL_DEVICE);
 	if (!page && pte_devmap(pte) && (flags & FOLL_GET)) {
 		/*
 		 * Only return device mapping pages in the FOLL_GET case since
diff --git a/mm/migrate.c b/mm/migrate.c
index 6954c14..dea0ceb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1690,7 +1690,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 			goto set_status;
 
 		/* FOLL_DUMP to ignore special (like zero) pages */
-		page = follow_page(vma, addr, FOLL_DUMP);
+		page = follow_page(vma, addr, FOLL_DUMP | FOLL_DEVICE);
 
 		err = PTR_ERR(page);
 		if (IS_ERR(page))
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
  2017-09-22 20:13 [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory Reza Arbab
@ 2017-09-22 20:31 ` Reza Arbab
  2017-09-22 21:01   ` Reza Arbab
  2017-09-26 13:37 ` Michal Hocko
  1 sibling, 1 reply; 8+ messages in thread
From: Reza Arbab @ 2017-09-22 20:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Jan Kara, Ross Zwisler, Kirill A. Shutemov,
	Lorenzo Stoakes, Dave Jiang, Jérôme Glisse,
	Matthew Wilcox, Hugh Dickins, Huang Ying, Ingo Molnar,
	Aneesh Kumar K.V, James Morse, Naoya Horiguchi, Minchan Kim,
	Johannes Weiner, Will Deacon, linux-mm, linux-kernel

On Fri, Sep 22, 2017 at 08:13:56PM +0000, Reza Arbab wrote:
>The move_pages() syscall can be used to find the numa node where a page
>currently resides. This is not working for device public memory pages,
>which erroneously report -EFAULT (unmapped or zero page).

Argh. Please disregard this patch.

My test setup has a chunk of system memory carved out as pretend device 
public memory, to experiment with. Of course the real thing has no numa 
node!

Apologies all, it's been a long day.

-- 
Reza Arbab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
  2017-09-22 20:31 ` Reza Arbab
@ 2017-09-22 21:01   ` Reza Arbab
  0 siblings, 0 replies; 8+ messages in thread
From: Reza Arbab @ 2017-09-22 21:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Jan Kara, Ross Zwisler, Kirill A. Shutemov,
	Lorenzo Stoakes, Dave Jiang, Jérôme Glisse,
	Matthew Wilcox, Hugh Dickins, Huang Ying, Ingo Molnar,
	Aneesh Kumar K.V, James Morse, Naoya Horiguchi, Minchan Kim,
	Johannes Weiner, Will Deacon, linux-mm, linux-kernel

On Fri, Sep 22, 2017 at 08:31:57PM +0000, Reza Arbab wrote:
>On Fri, Sep 22, 2017 at 08:13:56PM +0000, Reza Arbab wrote:
>>The move_pages() syscall can be used to find the numa node where a page
>>currently resides. This is not working for device public memory pages,
>>which erroneously report -EFAULT (unmapped or zero page).
>
>Argh. Please disregard this patch.
>
>My test setup has a chunk of system memory carved out as pretend 
>device public memory, to experiment with. Of course the real thing has 
>no numa node!

On third thought, yes it does! 

static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
{
	:
	nid = dev_to_node(device);
	if (nid < 0)
		nid = numa_mem_id();
	:
	if (devmem->pagemap.type == MEMORY_DEVICE_PUBLIC)
		ret = arch_add_memory(nid, align_start, align_size, false);
	:
}

So now I think the patch may be right after all. Please un-disregard it.  
Regard it? Whatever.

-- 
Reza Arbab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
  2017-09-22 20:13 [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory Reza Arbab
  2017-09-22 20:31 ` Reza Arbab
@ 2017-09-26 13:37 ` Michal Hocko
  2017-09-26 14:47   ` Reza Arbab
  1 sibling, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2017-09-26 13:37 UTC (permalink / raw)
  To: Reza Arbab
  Cc: Andrew Morton, Jan Kara, Ross Zwisler, Kirill A. Shutemov,
	Lorenzo Stoakes, Dave Jiang, Jérôme Glisse,
	Matthew Wilcox, Hugh Dickins, Huang Ying, Ingo Molnar,
	Aneesh Kumar K.V, James Morse, Naoya Horiguchi, Minchan Kim,
	Johannes Weiner, Will Deacon, linux-mm, linux-kernel

On Fri 22-09-17 15:13:56, Reza Arbab wrote:
> The move_pages() syscall can be used to find the numa node where a page
> currently resides. This is not working for device public memory pages,
> which erroneously report -EFAULT (unmapped or zero page).
> 
> Enable by adding a FOLL_DEVICE flag for follow_page(), which
> move_pages() will use. This could be done unconditionally, but adding a
> flag seems like a safer change.

I do not understand purpose of this patch. What is the numa node of a
device memory?

> Cc: Jerome Glisse <jglisse@redhat.com>
> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
>  include/linux/mm.h | 1 +
>  mm/gup.c           | 2 +-
>  mm/migrate.c       | 2 +-
>  3 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f8c10d3..783cb57 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2368,6 +2368,7 @@ static inline struct page *follow_page(struct vm_area_struct *vma,
>  #define FOLL_MLOCK	0x1000	/* lock present pages */
>  #define FOLL_REMOTE	0x2000	/* we are working on non-current tsk/mm */
>  #define FOLL_COW	0x4000	/* internal GUP flag */
> +#define FOLL_DEVICE	0x8000	/* return device pages */
>  
>  static inline int vm_fault_to_errno(int vm_fault, int foll_flags)
>  {
> diff --git a/mm/gup.c b/mm/gup.c
> index b2b4d42..6fbad70 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -110,7 +110,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  		return NULL;
>  	}
>  
> -	page = vm_normal_page(vma, address, pte);
> +	page = _vm_normal_page(vma, address, pte, flags & FOLL_DEVICE);
>  	if (!page && pte_devmap(pte) && (flags & FOLL_GET)) {
>  		/*
>  		 * Only return device mapping pages in the FOLL_GET case since
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 6954c14..dea0ceb 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1690,7 +1690,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
>  			goto set_status;
>  
>  		/* FOLL_DUMP to ignore special (like zero) pages */
> -		page = follow_page(vma, addr, FOLL_DUMP);
> +		page = follow_page(vma, addr, FOLL_DUMP | FOLL_DEVICE);
>  
>  		err = PTR_ERR(page);
>  		if (IS_ERR(page))
> -- 
> 1.8.3.1
> 

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
  2017-09-26 13:37 ` Michal Hocko
@ 2017-09-26 14:47   ` Reza Arbab
  2017-09-26 16:19     ` Jerome Glisse
  2017-09-26 16:32     ` Michal Hocko
  0 siblings, 2 replies; 8+ messages in thread
From: Reza Arbab @ 2017-09-26 14:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Jan Kara, Ross Zwisler, Kirill A. Shutemov,
	Lorenzo Stoakes, Dave Jiang, Jérôme Glisse,
	Matthew Wilcox, Hugh Dickins, Huang Ying, Ingo Molnar,
	Aneesh Kumar K.V, James Morse, Naoya Horiguchi, Minchan Kim,
	Johannes Weiner, Will Deacon, linux-mm, linux-kernel

On Tue, Sep 26, 2017 at 01:37:07PM +0000, Michal Hocko wrote:
>On Fri 22-09-17 15:13:56, Reza Arbab wrote:
>> The move_pages() syscall can be used to find the numa node where a page
>> currently resides. This is not working for device public memory pages,
>> which erroneously report -EFAULT (unmapped or zero page).
>>
>> Enable by adding a FOLL_DEVICE flag for follow_page(), which
>> move_pages() will use. This could be done unconditionally, but adding a
>> flag seems like a safer change.
>
>I do not understand purpose of this patch. What is the numa node of a
>device memory?

Well, using hmm_devmem_pages_create() it is added to this node:

	nid = dev_to_node(device);
	if (nid < 0)
		nid = numa_mem_id();

I understand it's minimally useful information to userspace, but the 
memory does have a nid and move_pages() is supposed to be able to return 
what that is. I ran into this using a testcase which tries to verify 
that user addresses were correctly migrated to coherent device memory.

That said, I'm okay with dropping this if you don't think it's 
worthwhile.

-- 
Reza Arbab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
  2017-09-26 14:47   ` Reza Arbab
@ 2017-09-26 16:19     ` Jerome Glisse
  2017-09-26 16:32     ` Michal Hocko
  1 sibling, 0 replies; 8+ messages in thread
From: Jerome Glisse @ 2017-09-26 16:19 UTC (permalink / raw)
  To: Reza Arbab
  Cc: Michal Hocko, Andrew Morton, Jan Kara, Ross Zwisler,
	Kirill A. Shutemov, Lorenzo Stoakes, Dave Jiang, Matthew Wilcox,
	Hugh Dickins, Huang Ying, Ingo Molnar, Aneesh Kumar K.V,
	James Morse, Naoya Horiguchi, Minchan Kim, Johannes Weiner,
	Will Deacon, linux-mm, linux-kernel

On Tue, Sep 26, 2017 at 09:47:10AM -0500, Reza Arbab wrote:
> On Tue, Sep 26, 2017 at 01:37:07PM +0000, Michal Hocko wrote:
> > On Fri 22-09-17 15:13:56, Reza Arbab wrote:
> > > The move_pages() syscall can be used to find the numa node where a page
> > > currently resides. This is not working for device public memory pages,
> > > which erroneously report -EFAULT (unmapped or zero page).
> > > 
> > > Enable by adding a FOLL_DEVICE flag for follow_page(), which
> > > move_pages() will use. This could be done unconditionally, but adding a
> > > flag seems like a safer change.
> > 
> > I do not understand purpose of this patch. What is the numa node of a
> > device memory?
> 
> Well, using hmm_devmem_pages_create() it is added to this node:
> 
> 	nid = dev_to_node(device);
> 	if (nid < 0)
> 		nid = numa_mem_id();
> 
> I understand it's minimally useful information to userspace, but the memory
> does have a nid and move_pages() is supposed to be able to return what that
> is. I ran into this using a testcase which tries to verify that user
> addresses were correctly migrated to coherent device memory.
> 
> That said, I'm okay with dropping this if you don't think it's worthwhile.

Just to add a data point, PCIE devices are tie to one CPU (architecturaly PCIE
lane are connected to CPU at least on x86/ppc AFAIK) and thus to one numa node.


Right now i am traveling but i want to check that this patch does not allow
user to inadvertaly pin device memory page. I will look into it once i am
back.

Cheers,
Jerome

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
  2017-09-26 14:47   ` Reza Arbab
  2017-09-26 16:19     ` Jerome Glisse
@ 2017-09-26 16:32     ` Michal Hocko
  2017-09-26 18:35       ` Reza Arbab
  1 sibling, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2017-09-26 16:32 UTC (permalink / raw)
  To: Reza Arbab
  Cc: Andrew Morton, Jan Kara, Ross Zwisler, Kirill A. Shutemov,
	Lorenzo Stoakes, Dave Jiang, Jérôme Glisse,
	Matthew Wilcox, Hugh Dickins, Huang Ying, Ingo Molnar,
	Aneesh Kumar K.V, James Morse, Naoya Horiguchi, Minchan Kim,
	Johannes Weiner, Will Deacon, linux-mm, linux-kernel

On Tue 26-09-17 09:47:10, Reza Arbab wrote:
> On Tue, Sep 26, 2017 at 01:37:07PM +0000, Michal Hocko wrote:
> > On Fri 22-09-17 15:13:56, Reza Arbab wrote:
> > > The move_pages() syscall can be used to find the numa node where a page
> > > currently resides. This is not working for device public memory pages,
> > > which erroneously report -EFAULT (unmapped or zero page).
> > > 
> > > Enable by adding a FOLL_DEVICE flag for follow_page(), which
> > > move_pages() will use. This could be done unconditionally, but adding a
> > > flag seems like a safer change.
> > 
> > I do not understand purpose of this patch. What is the numa node of a
> > device memory?
> 
> Well, using hmm_devmem_pages_create() it is added to this node:
> 
> 	nid = dev_to_node(device);
> 	if (nid < 0)
> 		nid = numa_mem_id();

OK, but do all the HMM devices have concept of NUMA affinity? From the
code you are pasting they do not have to...
 
> I understand it's minimally useful information to userspace, but the memory
> does have a nid and move_pages() is supposed to be able to return what that
> is. I ran into this using a testcase which tries to verify that user
> addresses were correctly migrated to coherent device memory.
> 
> That said, I'm okay with dropping this if you don't think it's worthwhile.

I am just worried that we allow information which is not generally
sensible and I am also not sure what the userspace can actually do with
that information.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
  2017-09-26 16:32     ` Michal Hocko
@ 2017-09-26 18:35       ` Reza Arbab
  0 siblings, 0 replies; 8+ messages in thread
From: Reza Arbab @ 2017-09-26 18:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Jan Kara, Ross Zwisler, Kirill A. Shutemov,
	Lorenzo Stoakes, Dave Jiang, Jérôme Glisse,
	Matthew Wilcox, Hugh Dickins, Huang Ying, Ingo Molnar,
	Aneesh Kumar K.V, James Morse, Naoya Horiguchi, Minchan Kim,
	Johannes Weiner, Will Deacon, linux-mm, linux-kernel

On Tue, Sep 26, 2017 at 04:32:41PM +0000, Michal Hocko wrote:
>On Tue 26-09-17 09:47:10, Reza Arbab wrote:
>> On Tue, Sep 26, 2017 at 01:37:07PM +0000, Michal Hocko wrote:
>> > On Fri 22-09-17 15:13:56, Reza Arbab wrote:
>> > > The move_pages() syscall can be used to find the numa node where a page
>> > > currently resides. This is not working for device public memory pages,
>> > > which erroneously report -EFAULT (unmapped or zero page).
>> > >
>> > > Enable by adding a FOLL_DEVICE flag for follow_page(), which
>> > > move_pages() will use. This could be done unconditionally, but adding a
>> > > flag seems like a safer change.
>> >
>> > I do not understand purpose of this patch. What is the numa node of a
>> > device memory?
>>
>> Well, using hmm_devmem_pages_create() it is added to this node:
>>
>> 	nid = dev_to_node(device);
>> 	if (nid < 0)
>> 		nid = numa_mem_id();
>
>OK, but do all the HMM devices have concept of NUMA affinity? From the
>code you are pasting they do not have to...

I don't know the definitive answer here, but as Jerome said PCIE devices 
should, and we are heading that way with NVLink/CAPI as well. It seems 
the default is just the nearest node.

>> I understand it's minimally useful information to userspace, but the memory
>> does have a nid and move_pages() is supposed to be able to return what that
>> is. I ran into this using a testcase which tries to verify that user
>> addresses were correctly migrated to coherent device memory.
>>
>> That said, I'm okay with dropping this if you don't think it's worthwhile.
>
>I am just worried that we allow information which is not generally
>sensible and I am also not sure what the userspace can actually do with
>that information.

As mentioned, it is minimally useful, e.g. for verifying migration, so 
returning the nid seems sensible to me. Alternatively, we might at least 
change the documentation to say 

-EFAULT
    This is a zero page, a device page, or the memory area is not mapped by the process.
			 ^^^^^^^^^^^^^

-- 
Reza Arbab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-09-26 18:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-22 20:13 [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory Reza Arbab
2017-09-22 20:31 ` Reza Arbab
2017-09-22 21:01   ` Reza Arbab
2017-09-26 13:37 ` Michal Hocko
2017-09-26 14:47   ` Reza Arbab
2017-09-26 16:19     ` Jerome Glisse
2017-09-26 16:32     ` Michal Hocko
2017-09-26 18:35       ` Reza Arbab

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).