* [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
@ 2017-09-22 20:13 Reza Arbab
2017-09-22 20:31 ` Reza Arbab
2017-09-26 13:37 ` Michal Hocko
0 siblings, 2 replies; 8+ messages in thread
From: Reza Arbab @ 2017-09-22 20:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Hocko, Jan Kara, Ross Zwisler, Kirill A. Shutemov,
Lorenzo Stoakes, Dave Jiang, Jérôme Glisse,
Matthew Wilcox, Hugh Dickins, Huang Ying, Ingo Molnar,
Aneesh Kumar K.V, James Morse, Naoya Horiguchi, Minchan Kim,
Johannes Weiner, Will Deacon, linux-mm, linux-kernel
The move_pages() syscall can be used to find the numa node where a page
currently resides. This is not working for device public memory pages,
which erroneously report -EFAULT (unmapped or zero page).
Enable by adding a FOLL_DEVICE flag for follow_page(), which
move_pages() will use. This could be done unconditionally, but adding a
flag seems like a safer change.
Cc: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
---
include/linux/mm.h | 1 +
mm/gup.c | 2 +-
mm/migrate.c | 2 +-
3 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f8c10d3..783cb57 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2368,6 +2368,7 @@ static inline struct page *follow_page(struct vm_area_struct *vma,
#define FOLL_MLOCK 0x1000 /* lock present pages */
#define FOLL_REMOTE 0x2000 /* we are working on non-current tsk/mm */
#define FOLL_COW 0x4000 /* internal GUP flag */
+#define FOLL_DEVICE 0x8000 /* return device pages */
static inline int vm_fault_to_errno(int vm_fault, int foll_flags)
{
diff --git a/mm/gup.c b/mm/gup.c
index b2b4d42..6fbad70 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -110,7 +110,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
return NULL;
}
- page = vm_normal_page(vma, address, pte);
+ page = _vm_normal_page(vma, address, pte, flags & FOLL_DEVICE);
if (!page && pte_devmap(pte) && (flags & FOLL_GET)) {
/*
* Only return device mapping pages in the FOLL_GET case since
diff --git a/mm/migrate.c b/mm/migrate.c
index 6954c14..dea0ceb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1690,7 +1690,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
goto set_status;
/* FOLL_DUMP to ignore special (like zero) pages */
- page = follow_page(vma, addr, FOLL_DUMP);
+ page = follow_page(vma, addr, FOLL_DUMP | FOLL_DEVICE);
err = PTR_ERR(page);
if (IS_ERR(page))
--
1.8.3.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
2017-09-22 20:13 [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory Reza Arbab
@ 2017-09-22 20:31 ` Reza Arbab
2017-09-22 21:01 ` Reza Arbab
2017-09-26 13:37 ` Michal Hocko
1 sibling, 1 reply; 8+ messages in thread
From: Reza Arbab @ 2017-09-22 20:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Hocko, Jan Kara, Ross Zwisler, Kirill A. Shutemov,
Lorenzo Stoakes, Dave Jiang, Jérôme Glisse,
Matthew Wilcox, Hugh Dickins, Huang Ying, Ingo Molnar,
Aneesh Kumar K.V, James Morse, Naoya Horiguchi, Minchan Kim,
Johannes Weiner, Will Deacon, linux-mm, linux-kernel
On Fri, Sep 22, 2017 at 08:13:56PM +0000, Reza Arbab wrote:
>The move_pages() syscall can be used to find the numa node where a page
>currently resides. This is not working for device public memory pages,
>which erroneously report -EFAULT (unmapped or zero page).
Argh. Please disregard this patch.
My test setup has a chunk of system memory carved out as pretend device
public memory, to experiment with. Of course the real thing has no numa
node!
Apologies all, it's been a long day.
--
Reza Arbab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
2017-09-22 20:31 ` Reza Arbab
@ 2017-09-22 21:01 ` Reza Arbab
0 siblings, 0 replies; 8+ messages in thread
From: Reza Arbab @ 2017-09-22 21:01 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Hocko, Jan Kara, Ross Zwisler, Kirill A. Shutemov,
Lorenzo Stoakes, Dave Jiang, Jérôme Glisse,
Matthew Wilcox, Hugh Dickins, Huang Ying, Ingo Molnar,
Aneesh Kumar K.V, James Morse, Naoya Horiguchi, Minchan Kim,
Johannes Weiner, Will Deacon, linux-mm, linux-kernel
On Fri, Sep 22, 2017 at 08:31:57PM +0000, Reza Arbab wrote:
>On Fri, Sep 22, 2017 at 08:13:56PM +0000, Reza Arbab wrote:
>>The move_pages() syscall can be used to find the numa node where a page
>>currently resides. This is not working for device public memory pages,
>>which erroneously report -EFAULT (unmapped or zero page).
>
>Argh. Please disregard this patch.
>
>My test setup has a chunk of system memory carved out as pretend
>device public memory, to experiment with. Of course the real thing has
>no numa node!
On third thought, yes it does!
static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
{
:
nid = dev_to_node(device);
if (nid < 0)
nid = numa_mem_id();
:
if (devmem->pagemap.type == MEMORY_DEVICE_PUBLIC)
ret = arch_add_memory(nid, align_start, align_size, false);
:
}
So now I think the patch may be right after all. Please un-disregard it.
Regard it? Whatever.
--
Reza Arbab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
2017-09-22 20:13 [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory Reza Arbab
2017-09-22 20:31 ` Reza Arbab
@ 2017-09-26 13:37 ` Michal Hocko
2017-09-26 14:47 ` Reza Arbab
1 sibling, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2017-09-26 13:37 UTC (permalink / raw)
To: Reza Arbab
Cc: Andrew Morton, Jan Kara, Ross Zwisler, Kirill A. Shutemov,
Lorenzo Stoakes, Dave Jiang, Jérôme Glisse,
Matthew Wilcox, Hugh Dickins, Huang Ying, Ingo Molnar,
Aneesh Kumar K.V, James Morse, Naoya Horiguchi, Minchan Kim,
Johannes Weiner, Will Deacon, linux-mm, linux-kernel
On Fri 22-09-17 15:13:56, Reza Arbab wrote:
> The move_pages() syscall can be used to find the numa node where a page
> currently resides. This is not working for device public memory pages,
> which erroneously report -EFAULT (unmapped or zero page).
>
> Enable by adding a FOLL_DEVICE flag for follow_page(), which
> move_pages() will use. This could be done unconditionally, but adding a
> flag seems like a safer change.
I do not understand purpose of this patch. What is the numa node of a
device memory?
> Cc: Jerome Glisse <jglisse@redhat.com>
> Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
> ---
> include/linux/mm.h | 1 +
> mm/gup.c | 2 +-
> mm/migrate.c | 2 +-
> 3 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f8c10d3..783cb57 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2368,6 +2368,7 @@ static inline struct page *follow_page(struct vm_area_struct *vma,
> #define FOLL_MLOCK 0x1000 /* lock present pages */
> #define FOLL_REMOTE 0x2000 /* we are working on non-current tsk/mm */
> #define FOLL_COW 0x4000 /* internal GUP flag */
> +#define FOLL_DEVICE 0x8000 /* return device pages */
>
> static inline int vm_fault_to_errno(int vm_fault, int foll_flags)
> {
> diff --git a/mm/gup.c b/mm/gup.c
> index b2b4d42..6fbad70 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -110,7 +110,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
> return NULL;
> }
>
> - page = vm_normal_page(vma, address, pte);
> + page = _vm_normal_page(vma, address, pte, flags & FOLL_DEVICE);
> if (!page && pte_devmap(pte) && (flags & FOLL_GET)) {
> /*
> * Only return device mapping pages in the FOLL_GET case since
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 6954c14..dea0ceb 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1690,7 +1690,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
> goto set_status;
>
> /* FOLL_DUMP to ignore special (like zero) pages */
> - page = follow_page(vma, addr, FOLL_DUMP);
> + page = follow_page(vma, addr, FOLL_DUMP | FOLL_DEVICE);
>
> err = PTR_ERR(page);
> if (IS_ERR(page))
> --
> 1.8.3.1
>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
2017-09-26 13:37 ` Michal Hocko
@ 2017-09-26 14:47 ` Reza Arbab
2017-09-26 16:19 ` Jerome Glisse
2017-09-26 16:32 ` Michal Hocko
0 siblings, 2 replies; 8+ messages in thread
From: Reza Arbab @ 2017-09-26 14:47 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Jan Kara, Ross Zwisler, Kirill A. Shutemov,
Lorenzo Stoakes, Dave Jiang, Jérôme Glisse,
Matthew Wilcox, Hugh Dickins, Huang Ying, Ingo Molnar,
Aneesh Kumar K.V, James Morse, Naoya Horiguchi, Minchan Kim,
Johannes Weiner, Will Deacon, linux-mm, linux-kernel
On Tue, Sep 26, 2017 at 01:37:07PM +0000, Michal Hocko wrote:
>On Fri 22-09-17 15:13:56, Reza Arbab wrote:
>> The move_pages() syscall can be used to find the numa node where a page
>> currently resides. This is not working for device public memory pages,
>> which erroneously report -EFAULT (unmapped or zero page).
>>
>> Enable by adding a FOLL_DEVICE flag for follow_page(), which
>> move_pages() will use. This could be done unconditionally, but adding a
>> flag seems like a safer change.
>
>I do not understand purpose of this patch. What is the numa node of a
>device memory?
Well, using hmm_devmem_pages_create() it is added to this node:
nid = dev_to_node(device);
if (nid < 0)
nid = numa_mem_id();
I understand it's minimally useful information to userspace, but the
memory does have a nid and move_pages() is supposed to be able to return
what that is. I ran into this using a testcase which tries to verify
that user addresses were correctly migrated to coherent device memory.
That said, I'm okay with dropping this if you don't think it's
worthwhile.
--
Reza Arbab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
2017-09-26 14:47 ` Reza Arbab
@ 2017-09-26 16:19 ` Jerome Glisse
2017-09-26 16:32 ` Michal Hocko
1 sibling, 0 replies; 8+ messages in thread
From: Jerome Glisse @ 2017-09-26 16:19 UTC (permalink / raw)
To: Reza Arbab
Cc: Michal Hocko, Andrew Morton, Jan Kara, Ross Zwisler,
Kirill A. Shutemov, Lorenzo Stoakes, Dave Jiang, Matthew Wilcox,
Hugh Dickins, Huang Ying, Ingo Molnar, Aneesh Kumar K.V,
James Morse, Naoya Horiguchi, Minchan Kim, Johannes Weiner,
Will Deacon, linux-mm, linux-kernel
On Tue, Sep 26, 2017 at 09:47:10AM -0500, Reza Arbab wrote:
> On Tue, Sep 26, 2017 at 01:37:07PM +0000, Michal Hocko wrote:
> > On Fri 22-09-17 15:13:56, Reza Arbab wrote:
> > > The move_pages() syscall can be used to find the numa node where a page
> > > currently resides. This is not working for device public memory pages,
> > > which erroneously report -EFAULT (unmapped or zero page).
> > >
> > > Enable by adding a FOLL_DEVICE flag for follow_page(), which
> > > move_pages() will use. This could be done unconditionally, but adding a
> > > flag seems like a safer change.
> >
> > I do not understand purpose of this patch. What is the numa node of a
> > device memory?
>
> Well, using hmm_devmem_pages_create() it is added to this node:
>
> nid = dev_to_node(device);
> if (nid < 0)
> nid = numa_mem_id();
>
> I understand it's minimally useful information to userspace, but the memory
> does have a nid and move_pages() is supposed to be able to return what that
> is. I ran into this using a testcase which tries to verify that user
> addresses were correctly migrated to coherent device memory.
>
> That said, I'm okay with dropping this if you don't think it's worthwhile.
Just to add a data point, PCIE devices are tie to one CPU (architecturaly PCIE
lane are connected to CPU at least on x86/ppc AFAIK) and thus to one numa node.
Right now i am traveling but i want to check that this patch does not allow
user to inadvertaly pin device memory page. I will look into it once i am
back.
Cheers,
Jerome
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
2017-09-26 14:47 ` Reza Arbab
2017-09-26 16:19 ` Jerome Glisse
@ 2017-09-26 16:32 ` Michal Hocko
2017-09-26 18:35 ` Reza Arbab
1 sibling, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2017-09-26 16:32 UTC (permalink / raw)
To: Reza Arbab
Cc: Andrew Morton, Jan Kara, Ross Zwisler, Kirill A. Shutemov,
Lorenzo Stoakes, Dave Jiang, Jérôme Glisse,
Matthew Wilcox, Hugh Dickins, Huang Ying, Ingo Molnar,
Aneesh Kumar K.V, James Morse, Naoya Horiguchi, Minchan Kim,
Johannes Weiner, Will Deacon, linux-mm, linux-kernel
On Tue 26-09-17 09:47:10, Reza Arbab wrote:
> On Tue, Sep 26, 2017 at 01:37:07PM +0000, Michal Hocko wrote:
> > On Fri 22-09-17 15:13:56, Reza Arbab wrote:
> > > The move_pages() syscall can be used to find the numa node where a page
> > > currently resides. This is not working for device public memory pages,
> > > which erroneously report -EFAULT (unmapped or zero page).
> > >
> > > Enable by adding a FOLL_DEVICE flag for follow_page(), which
> > > move_pages() will use. This could be done unconditionally, but adding a
> > > flag seems like a safer change.
> >
> > I do not understand purpose of this patch. What is the numa node of a
> > device memory?
>
> Well, using hmm_devmem_pages_create() it is added to this node:
>
> nid = dev_to_node(device);
> if (nid < 0)
> nid = numa_mem_id();
OK, but do all the HMM devices have concept of NUMA affinity? From the
code you are pasting they do not have to...
> I understand it's minimally useful information to userspace, but the memory
> does have a nid and move_pages() is supposed to be able to return what that
> is. I ran into this using a testcase which tries to verify that user
> addresses were correctly migrated to coherent device memory.
>
> That said, I'm okay with dropping this if you don't think it's worthwhile.
I am just worried that we allow information which is not generally
sensible and I am also not sure what the userspace can actually do with
that information.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory
2017-09-26 16:32 ` Michal Hocko
@ 2017-09-26 18:35 ` Reza Arbab
0 siblings, 0 replies; 8+ messages in thread
From: Reza Arbab @ 2017-09-26 18:35 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Jan Kara, Ross Zwisler, Kirill A. Shutemov,
Lorenzo Stoakes, Dave Jiang, Jérôme Glisse,
Matthew Wilcox, Hugh Dickins, Huang Ying, Ingo Molnar,
Aneesh Kumar K.V, James Morse, Naoya Horiguchi, Minchan Kim,
Johannes Weiner, Will Deacon, linux-mm, linux-kernel
On Tue, Sep 26, 2017 at 04:32:41PM +0000, Michal Hocko wrote:
>On Tue 26-09-17 09:47:10, Reza Arbab wrote:
>> On Tue, Sep 26, 2017 at 01:37:07PM +0000, Michal Hocko wrote:
>> > On Fri 22-09-17 15:13:56, Reza Arbab wrote:
>> > > The move_pages() syscall can be used to find the numa node where a page
>> > > currently resides. This is not working for device public memory pages,
>> > > which erroneously report -EFAULT (unmapped or zero page).
>> > >
>> > > Enable by adding a FOLL_DEVICE flag for follow_page(), which
>> > > move_pages() will use. This could be done unconditionally, but adding a
>> > > flag seems like a safer change.
>> >
>> > I do not understand purpose of this patch. What is the numa node of a
>> > device memory?
>>
>> Well, using hmm_devmem_pages_create() it is added to this node:
>>
>> nid = dev_to_node(device);
>> if (nid < 0)
>> nid = numa_mem_id();
>
>OK, but do all the HMM devices have concept of NUMA affinity? From the
>code you are pasting they do not have to...
I don't know the definitive answer here, but as Jerome said PCIE devices
should, and we are heading that way with NVLink/CAPI as well. It seems
the default is just the nearest node.
>> I understand it's minimally useful information to userspace, but the memory
>> does have a nid and move_pages() is supposed to be able to return what that
>> is. I ran into this using a testcase which tries to verify that user
>> addresses were correctly migrated to coherent device memory.
>>
>> That said, I'm okay with dropping this if you don't think it's worthwhile.
>
>I am just worried that we allow information which is not generally
>sensible and I am also not sure what the userspace can actually do with
>that information.
As mentioned, it is minimally useful, e.g. for verifying migration, so
returning the nid seems sensible to me. Alternatively, we might at least
change the documentation to say
-EFAULT
This is a zero page, a device page, or the memory area is not mapped by the process.
^^^^^^^^^^^^^
--
Reza Arbab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-09-26 18:35 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-22 20:13 [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory Reza Arbab
2017-09-22 20:31 ` Reza Arbab
2017-09-22 21:01 ` Reza Arbab
2017-09-26 13:37 ` Michal Hocko
2017-09-26 14:47 ` Reza Arbab
2017-09-26 16:19 ` Jerome Glisse
2017-09-26 16:32 ` Michal Hocko
2017-09-26 18:35 ` Reza Arbab
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).