* [PATCH v6 01/10] memory-failure: fetch compound_head after pgmap_pfn_valid()
2021-11-24 19:09 [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap Joao Martins
@ 2021-11-24 19:09 ` Joao Martins
2021-11-24 19:09 ` [PATCH v6 02/10] mm/page_alloc: split prep_compound_page into head and tail subparts Joao Martins
` (9 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2021-11-24 19:09 UTC (permalink / raw)
To: linux-mm
Cc: Dan Williams, Vishal Verma, Dave Jiang, Naoya Horiguchi,
Matthew Wilcox, Jason Gunthorpe, John Hubbard, Jane Chu,
Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet,
Christoph Hellwig, nvdimm, linux-doc, Joao Martins
memory_failure_dev_pagemap() at the moment assumes base pages (e.g.
dax_lock_page()). For a devmap with compound pages, fetch the
compound_head in case a tail page memory failure is being handled.
Currently this is a nop, but with the advent of compound pages in
dev_pagemap it allows memory_failure_dev_pagemap() to keep working.
Reported-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
---
mm/memory-failure.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 8f0ee5b08696..f5749db8fad3 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1600,6 +1600,12 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
goto out;
}
+ /*
+ * Pages instantiated by device-dax (not filesystem-dax)
+ * may be compound pages.
+ */
+ page = compound_head(page);
+
/*
* Prevent the inode from being freed while we are interrogating
* the address_space, typically this would be handled by
--
2.17.2
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v6 02/10] mm/page_alloc: split prep_compound_page into head and tail subparts
2021-11-24 19:09 [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap Joao Martins
2021-11-24 19:09 ` [PATCH v6 01/10] memory-failure: fetch compound_head after pgmap_pfn_valid() Joao Martins
@ 2021-11-24 19:09 ` Joao Martins
2021-11-24 19:09 ` [PATCH v6 03/10] mm/page_alloc: refactor memmap_init_zone_device() page init Joao Martins
` (8 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2021-11-24 19:09 UTC (permalink / raw)
To: linux-mm
Cc: Dan Williams, Vishal Verma, Dave Jiang, Naoya Horiguchi,
Matthew Wilcox, Jason Gunthorpe, John Hubbard, Jane Chu,
Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet,
Christoph Hellwig, nvdimm, linux-doc, Joao Martins
Split the utility function prep_compound_page() into head and tail
counterparts, and use them accordingly.
This is in preparation for sharing the storage for compound page
metadata.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
---
mm/page_alloc.c | 30 ++++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 58490fa8948d..ba096f731e36 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -727,23 +727,33 @@ void free_compound_page(struct page *page)
free_the_page(page, compound_order(page));
}
+static void prep_compound_head(struct page *page, unsigned int order)
+{
+ set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
+ set_compound_order(page, order);
+ atomic_set(compound_mapcount_ptr(page), -1);
+ if (hpage_pincount_available(page))
+ atomic_set(compound_pincount_ptr(page), 0);
+}
+
+static void prep_compound_tail(struct page *head, int tail_idx)
+{
+ struct page *p = head + tail_idx;
+
+ p->mapping = TAIL_MAPPING;
+ set_compound_head(p, head);
+}
+
void prep_compound_page(struct page *page, unsigned int order)
{
int i;
int nr_pages = 1 << order;
__SetPageHead(page);
- for (i = 1; i < nr_pages; i++) {
- struct page *p = page + i;
- p->mapping = TAIL_MAPPING;
- set_compound_head(p, page);
- }
+ for (i = 1; i < nr_pages; i++)
+ prep_compound_tail(page, i);
- set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
- set_compound_order(page, order);
- atomic_set(compound_mapcount_ptr(page), -1);
- if (hpage_pincount_available(page))
- atomic_set(compound_pincount_ptr(page), 0);
+ prep_compound_head(page, order);
}
#ifdef CONFIG_DEBUG_PAGEALLOC
--
2.17.2
* [PATCH v6 03/10] mm/page_alloc: refactor memmap_init_zone_device() page init
2021-11-24 19:09 [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap Joao Martins
2021-11-24 19:09 ` [PATCH v6 01/10] memory-failure: fetch compound_head after pgmap_pfn_valid() Joao Martins
2021-11-24 19:09 ` [PATCH v6 02/10] mm/page_alloc: split prep_compound_page into head and tail subparts Joao Martins
@ 2021-11-24 19:09 ` Joao Martins
2021-11-24 19:09 ` [PATCH v6 04/10] mm/memremap: add ZONE_DEVICE support for compound pages Joao Martins
` (7 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2021-11-24 19:09 UTC (permalink / raw)
To: linux-mm
Cc: Dan Williams, Vishal Verma, Dave Jiang, Naoya Horiguchi,
Matthew Wilcox, Jason Gunthorpe, John Hubbard, Jane Chu,
Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet,
Christoph Hellwig, nvdimm, linux-doc, Joao Martins
Move struct page init to a helper function, __init_zone_device_page().
This is in preparation for sharing the storage for compound page
metadata.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
mm/page_alloc.c | 74 +++++++++++++++++++++++++++----------------------
1 file changed, 41 insertions(+), 33 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ba096f731e36..f7f33c83222f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6565,6 +6565,46 @@ void __meminit memmap_init_range(unsigned long size, int nid, unsigned long zone
}
#ifdef CONFIG_ZONE_DEVICE
+static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
+ unsigned long zone_idx, int nid,
+ struct dev_pagemap *pgmap)
+{
+
+ __init_single_page(page, pfn, zone_idx, nid);
+
+ /*
+ * Mark page reserved as it will need to wait for onlining
+ * phase for it to be fully associated with a zone.
+ *
+ * We can use the non-atomic __set_bit operation for setting
+ * the flag as we are still initializing the pages.
+ */
+ __SetPageReserved(page);
+
+ /*
+ * ZONE_DEVICE pages union ->lru with a ->pgmap back pointer
+ * and zone_device_data. It is a bug if a ZONE_DEVICE page is
+ * ever freed or placed on a driver-private list.
+ */
+ page->pgmap = pgmap;
+ page->zone_device_data = NULL;
+
+ /*
+ * Mark the block movable so that blocks are reserved for
+ * movable at startup. This will force kernel allocations
+ * to reserve their blocks rather than leaking throughout
+ * the address space during boot when many long-lived
+ * kernel allocations are made.
+ *
+ * Please note that MEMINIT_HOTPLUG path doesn't clear memmap
+ * because this is done early in section_activate()
+ */
+ if (IS_ALIGNED(pfn, pageblock_nr_pages)) {
+ set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+ cond_resched();
+ }
+}
+
void __ref memmap_init_zone_device(struct zone *zone,
unsigned long start_pfn,
unsigned long nr_pages,
@@ -6593,39 +6633,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
struct page *page = pfn_to_page(pfn);
- __init_single_page(page, pfn, zone_idx, nid);
-
- /*
- * Mark page reserved as it will need to wait for onlining
- * phase for it to be fully associated with a zone.
- *
- * We can use the non-atomic __set_bit operation for setting
- * the flag as we are still initializing the pages.
- */
- __SetPageReserved(page);
-
- /*
- * ZONE_DEVICE pages union ->lru with a ->pgmap back pointer
- * and zone_device_data. It is a bug if a ZONE_DEVICE page is
- * ever freed or placed on a driver-private list.
- */
- page->pgmap = pgmap;
- page->zone_device_data = NULL;
-
- /*
- * Mark the block movable so that blocks are reserved for
- * movable at startup. This will force kernel allocations
- * to reserve their blocks rather than leaking throughout
- * the address space during boot when many long-lived
- * kernel allocations are made.
- *
- * Please note that MEMINIT_HOTPLUG path doesn't clear memmap
- * because this is done early in section_activate()
- */
- if (IS_ALIGNED(pfn, pageblock_nr_pages)) {
- set_pageblock_migratetype(page, MIGRATE_MOVABLE);
- cond_resched();
- }
+ __init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
}
pr_info("%s initialised %lu pages in %ums\n", __func__,
--
2.17.2
* [PATCH v6 04/10] mm/memremap: add ZONE_DEVICE support for compound pages
2021-11-24 19:09 [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap Joao Martins
` (2 preceding siblings ...)
2021-11-24 19:09 ` [PATCH v6 03/10] mm/page_alloc: refactor memmap_init_zone_device() page init Joao Martins
@ 2021-11-24 19:09 ` Joao Martins
2021-11-25 6:11 ` Christoph Hellwig
2021-11-24 19:10 ` [PATCH v6 05/10] device-dax: use ALIGN() for determining pgoff Joao Martins
` (6 subsequent siblings)
10 siblings, 1 reply; 22+ messages in thread
From: Joao Martins @ 2021-11-24 19:09 UTC (permalink / raw)
To: linux-mm
Cc: Dan Williams, Vishal Verma, Dave Jiang, Naoya Horiguchi,
Matthew Wilcox, Jason Gunthorpe, John Hubbard, Jane Chu,
Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet,
Christoph Hellwig, nvdimm, linux-doc, Joao Martins
Add a new @vmemmap_shift property for struct dev_pagemap which specifies that a
devmap is composed of a set of compound pages of order @vmemmap_shift, instead of
base pages. When a compound page devmap is requested, all but the first
page are initialised as tail pages instead of order-0 pages.
For certain ZONE_DEVICE users like device-dax which have a fixed page size,
this creates an opportunity to optimize GUP and GUP-fast walkers, treating
it the same way as THP or hugetlb pages.
Additionally, commit 7118fc2906e2 ("hugetlb: address ref count racing in
prep_compound_gigantic_page") removed set_page_count() because setting
the page ref count to zero was redundant. Devmap pages, though, don't
come from the page allocator, and only the head page refcount is used
for compound pages; hence initialize the tail page count to zero.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
include/linux/memremap.h | 11 +++++++++++
mm/memremap.c | 12 ++++++------
mm/page_alloc.c | 38 +++++++++++++++++++++++++++++++++++++-
3 files changed, 54 insertions(+), 7 deletions(-)
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 119f130ef8f1..aaf85bda093b 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -99,6 +99,11 @@ struct dev_pagemap_ops {
* @done: completion for @internal_ref
* @type: memory type: see MEMORY_* in memory_hotplug.h
* @flags: PGMAP_* flags to specify defailed behavior
+ * @vmemmap_shift: structural definition of how the vmemmap page metadata
+ * is populated, specifically the metadata page order.
+ * A zero value (default) uses base pages as the vmemmap metadata
+ * representation. A bigger value will set up compound struct pages
+ * of the requested order value.
* @ops: method table
* @owner: an opaque pointer identifying the entity that manages this
* instance. Used by various helpers to make sure that no
@@ -114,6 +119,7 @@ struct dev_pagemap {
struct completion done;
enum memory_type type;
unsigned int flags;
+ unsigned long vmemmap_shift;
const struct dev_pagemap_ops *ops;
void *owner;
int nr_range;
@@ -130,6 +136,11 @@ static inline struct vmem_altmap *pgmap_altmap(struct dev_pagemap *pgmap)
return NULL;
}
+static inline unsigned long pgmap_vmemmap_nr(struct dev_pagemap *pgmap)
+{
+ return 1 << pgmap->vmemmap_shift;
+}
+
#ifdef CONFIG_ZONE_DEVICE
bool pfn_zone_device_reserved(unsigned long pfn);
void *memremap_pages(struct dev_pagemap *pgmap, int nid);
diff --git a/mm/memremap.c b/mm/memremap.c
index 84de22c14567..3afa246eb1ab 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -102,11 +102,11 @@ static unsigned long pfn_end(struct dev_pagemap *pgmap, int range_id)
return (range->start + range_len(range)) >> PAGE_SHIFT;
}
-static unsigned long pfn_next(unsigned long pfn)
+static unsigned long pfn_next(struct dev_pagemap *pgmap, unsigned long pfn)
{
- if (pfn % 1024 == 0)
+ if (pfn % (1024 << pgmap->vmemmap_shift))
cond_resched();
- return pfn + 1;
+ return pfn + pgmap_vmemmap_nr(pgmap);
}
/*
@@ -130,7 +130,7 @@ bool pfn_zone_device_reserved(unsigned long pfn)
}
#define for_each_device_pfn(pfn, map, i) \
- for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(pfn))
+ for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(map, pfn))
static void dev_pagemap_kill(struct dev_pagemap *pgmap)
{
@@ -315,8 +315,8 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
PHYS_PFN(range->start),
PHYS_PFN(range_len(range)), pgmap);
- percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
- - pfn_first(pgmap, range_id));
+ percpu_ref_get_many(pgmap->ref, (pfn_end(pgmap, range_id)
+ - pfn_first(pgmap, range_id)) >> pgmap->vmemmap_shift);
return 0;
err_add_memory:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f7f33c83222f..ea537839816e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6605,6 +6605,35 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
}
}
+static void __ref memmap_init_compound(struct page *head,
+ unsigned long head_pfn,
+ unsigned long zone_idx, int nid,
+ struct dev_pagemap *pgmap,
+ unsigned long nr_pages)
+{
+ unsigned long pfn, end_pfn = head_pfn + nr_pages;
+ unsigned int order = pgmap->vmemmap_shift;
+
+ __SetPageHead(head);
+ for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) {
+ struct page *page = pfn_to_page(pfn);
+
+ __init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
+ prep_compound_tail(head, pfn - head_pfn);
+ set_page_count(page, 0);
+
+ /*
+ * The first tail page stores compound_mapcount_ptr() and
+ * compound_order() and the second tail page stores
+ * compound_pincount_ptr(). Call prep_compound_head() after
+ * the first and second tail pages have been initialized to
+ * not have the data overwritten.
+ */
+ if (pfn == head_pfn + 2)
+ prep_compound_head(head, order);
+ }
+}
+
void __ref memmap_init_zone_device(struct zone *zone,
unsigned long start_pfn,
unsigned long nr_pages,
@@ -6613,6 +6642,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
unsigned long pfn, end_pfn = start_pfn + nr_pages;
struct pglist_data *pgdat = zone->zone_pgdat;
struct vmem_altmap *altmap = pgmap_altmap(pgmap);
+ unsigned int pfns_per_compound = pgmap_vmemmap_nr(pgmap);
unsigned long zone_idx = zone_idx(zone);
unsigned long start = jiffies;
int nid = pgdat->node_id;
@@ -6630,10 +6660,16 @@ void __ref memmap_init_zone_device(struct zone *zone,
nr_pages = end_pfn - start_pfn;
}
- for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+ for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) {
struct page *page = pfn_to_page(pfn);
__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
+
+ if (pfns_per_compound == 1)
+ continue;
+
+ memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
+ pfns_per_compound);
}
pr_info("%s initialised %lu pages in %ums\n", __func__,
--
2.17.2
* Re: [PATCH v6 04/10] mm/memremap: add ZONE_DEVICE support for compound pages
2021-11-24 19:09 ` [PATCH v6 04/10] mm/memremap: add ZONE_DEVICE support for compound pages Joao Martins
@ 2021-11-25 6:11 ` Christoph Hellwig
2021-11-25 11:35 ` Joao Martins
0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2021-11-25 6:11 UTC (permalink / raw)
To: Joao Martins
Cc: linux-mm, Dan Williams, Vishal Verma, Dave Jiang,
Naoya Horiguchi, Matthew Wilcox, Jason Gunthorpe, John Hubbard,
Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton,
Jonathan Corbet, Christoph Hellwig, nvdimm, linux-doc
On Wed, Nov 24, 2021 at 07:09:59PM +0000, Joao Martins wrote:
> Add a new @vmemmap_shift property for struct dev_pagemap which specifies that a
> devmap is composed of a set of compound pages of order @vmemmap_shift, instead of
> base pages. When a compound page devmap is requested, all but the first
> page are initialised as tail pages instead of order-0 pages.
Please wrap commit log lines after 73 characters.
> #define for_each_device_pfn(pfn, map, i) \
> - for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(pfn))
> + for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(map, pfn))
It would be nice to fix up this long line while you're at it.
> static void dev_pagemap_kill(struct dev_pagemap *pgmap)
> {
> @@ -315,8 +315,8 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
> memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
> PHYS_PFN(range->start),
> PHYS_PFN(range_len(range)), pgmap);
> - percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
> - - pfn_first(pgmap, range_id));
> + percpu_ref_get_many(pgmap->ref, (pfn_end(pgmap, range_id)
> + - pfn_first(pgmap, range_id)) >> pgmap->vmemmap_shift);
In the Linux coding style the - goes ointo the first line.
But it would be really nice to clean this up with a helper ala pfn_len
anyway:
percpu_ref_get_many(pgmap->ref,
pfn_len(pgmap, range_id) >> pgmap->vmemmap_shift);
* Re: [PATCH v6 04/10] mm/memremap: add ZONE_DEVICE support for compound pages
2021-11-25 6:11 ` Christoph Hellwig
@ 2021-11-25 11:35 ` Joao Martins
0 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2021-11-25 11:35 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Dan Williams, Vishal Verma, Dave Jiang,
Naoya Horiguchi, Matthew Wilcox, Jason Gunthorpe, John Hubbard,
Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton,
Jonathan Corbet, nvdimm, linux-doc
On 11/25/21 06:11, Christoph Hellwig wrote:
> On Wed, Nov 24, 2021 at 07:09:59PM +0000, Joao Martins wrote:
>> Add a new @vmemmap_shift property for struct dev_pagemap which specifies that a
>> devmap is composed of a set of compound pages of order @vmemmap_shift, instead of
>> base pages. When a compound page devmap is requested, all but the first
>> page are initialised as tail pages instead of order-0 pages.
>
> Please wrap commit log lines after 73 characters.
>
Fixed.
>> #define for_each_device_pfn(pfn, map, i) \
>> - for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(pfn))
>> + for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(map, pfn))
>
> It would be nice to fix up this long line while you're at it.
>
OK -- I am gonna assume that it's enough to move pfn = pfn_next(...)
clause into the next line.
>> static void dev_pagemap_kill(struct dev_pagemap *pgmap)
>> {
>> @@ -315,8 +315,8 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
>> memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
>> PHYS_PFN(range->start),
>> PHYS_PFN(range_len(range)), pgmap);
>> - percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
>> - - pfn_first(pgmap, range_id));
>> + percpu_ref_get_many(pgmap->ref, (pfn_end(pgmap, range_id)
>> + - pfn_first(pgmap, range_id)) >> pgmap->vmemmap_shift);
>
> In the Linux coding style the - goes ointo the first line.
>
> But it would be really nice to clean this up with a helper ala pfn_len
> anyway:
>
> percpu_ref_get_many(pgmap->ref,
> pfn_len(pgmap, range_id) >> pgmap->vmemmap_shift);
>
OK, I moved the computation to a helper.
I've staged your comments (see below diff for this patch), plus wrapping the commit
message to 73 columns (I've also double-checked and this one seems to be the only one
making that mistake).
I'll wait a couple days to follow up v7 should you have further comments
in other patches.
diff --git a/mm/memremap.c b/mm/memremap.c
index 3afa246eb1ab..d591f3aa8884 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -109,6 +109,12 @@ static unsigned long pfn_next(struct dev_pagemap *pgmap, unsigned
long pfn)
return pfn + pgmap_vmemmap_nr(pgmap);
}
+static unsigned long pfn_len(struct dev_pagemap *pgmap, unsigned long range_id)
+{
+ return (pfn_end(pgmap, range_id) -
+ pfn_first(pgmap, range_id)) >> pgmap->vmemmap_shift;
+}
+
/*
* This returns true if the page is reserved by ZONE_DEVICE driver.
*/
@@ -130,7 +136,8 @@ bool pfn_zone_device_reserved(unsigned long pfn)
}
#define for_each_device_pfn(pfn, map, i) \
- for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(map, pfn))
+ for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); \
+ pfn = pfn_next(map, pfn))
static void dev_pagemap_kill(struct dev_pagemap *pgmap)
{
@@ -315,8 +322,7 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params
*params,
memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
PHYS_PFN(range->start),
PHYS_PFN(range_len(range)), pgmap);
- percpu_ref_get_many(pgmap->ref, (pfn_end(pgmap, range_id)
- - pfn_first(pgmap, range_id)) >> pgmap->vmemmap_shift);
+ percpu_ref_get_many(pgmap->ref, pfn_len(pgmap, range_id));
return 0;
err_add_memory:
* [PATCH v6 05/10] device-dax: use ALIGN() for determining pgoff
2021-11-24 19:09 [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap Joao Martins
` (3 preceding siblings ...)
2021-11-24 19:09 ` [PATCH v6 04/10] mm/memremap: add ZONE_DEVICE support for compound pages Joao Martins
@ 2021-11-24 19:10 ` Joao Martins
2021-11-24 19:10 ` [PATCH v6 06/10] device-dax: use struct_size() Joao Martins
` (5 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2021-11-24 19:10 UTC (permalink / raw)
To: linux-mm
Cc: Dan Williams, Vishal Verma, Dave Jiang, Naoya Horiguchi,
Matthew Wilcox, Jason Gunthorpe, John Hubbard, Jane Chu,
Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet,
Christoph Hellwig, nvdimm, linux-doc, Joao Martins
Rather than calculating @pgoff manually, switch to ALIGN() instead.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/dax/device.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index dd8222a42808..0b82159b3564 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -234,8 +234,8 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
* mapped. No need to consider the zero page, or racing
* conflicting mappings.
*/
- pgoff = linear_page_index(vmf->vma, vmf->address
- & ~(fault_size - 1));
+ pgoff = linear_page_index(vmf->vma,
+ ALIGN(vmf->address, fault_size));
for (i = 0; i < fault_size / PAGE_SIZE; i++) {
struct page *page;
--
2.17.2
* [PATCH v6 06/10] device-dax: use struct_size()
2021-11-24 19:09 [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap Joao Martins
` (4 preceding siblings ...)
2021-11-24 19:10 ` [PATCH v6 05/10] device-dax: use ALIGN() for determining pgoff Joao Martins
@ 2021-11-24 19:10 ` Joao Martins
2021-11-24 19:10 ` [PATCH v6 07/10] device-dax: ensure dev_dax->pgmap is valid for dynamic devices Joao Martins
` (4 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2021-11-24 19:10 UTC (permalink / raw)
To: linux-mm
Cc: Dan Williams, Vishal Verma, Dave Jiang, Naoya Horiguchi,
Matthew Wilcox, Jason Gunthorpe, John Hubbard, Jane Chu,
Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet,
Christoph Hellwig, nvdimm, linux-doc, Joao Martins
Use the struct_size() helper to compute the size of a struct with a
variable array member at the end, rather than calculating it manually.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/dax/device.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 0b82159b3564..038816b91af6 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -404,8 +404,9 @@ int dev_dax_probe(struct dev_dax *dev_dax)
return -EINVAL;
if (!pgmap) {
- pgmap = devm_kzalloc(dev, sizeof(*pgmap) + sizeof(struct range)
- * (dev_dax->nr_range - 1), GFP_KERNEL);
+ pgmap = devm_kzalloc(dev,
+ struct_size(pgmap, ranges, dev_dax->nr_range - 1),
+ GFP_KERNEL);
if (!pgmap)
return -ENOMEM;
pgmap->nr_range = dev_dax->nr_range;
--
2.17.2
* [PATCH v6 07/10] device-dax: ensure dev_dax->pgmap is valid for dynamic devices
2021-11-24 19:09 [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap Joao Martins
` (5 preceding siblings ...)
2021-11-24 19:10 ` [PATCH v6 06/10] device-dax: use struct_size() Joao Martins
@ 2021-11-24 19:10 ` Joao Martins
2021-11-24 19:10 ` [PATCH v6 08/10] device-dax: factor out page mapping initialization Joao Martins
` (3 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2021-11-24 19:10 UTC (permalink / raw)
To: linux-mm
Cc: Dan Williams, Vishal Verma, Dave Jiang, Naoya Horiguchi,
Matthew Wilcox, Jason Gunthorpe, John Hubbard, Jane Chu,
Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet,
Christoph Hellwig, nvdimm, linux-doc, Joao Martins
Right now, only static dax regions have a valid @pgmap pointer in their
struct dev_dax. The dynamic dax case, however, does not.
In preparation for device-dax compound devmap support, make sure that
dev_dax pgmap field is set after it has been allocated and initialized.
A dynamic dax device has its @pgmap allocated at probe() and managed by
devm (in contrast to a static dax region, where a pgmap is provided and
dax core kfrees it). So in addition to ensuring a valid @pgmap, clear
the pgmap when the dynamic dax device is released, to avoid the same
pgmap ranges being re-requested across multiple region device reconfigs.
Add a static_dev_dax() and use that helper in dev_dax_probe() to ensure
the initialization differences between dynamic and static regions are
more explicit. While at it, consolidate the ranges initialization when we
allocate the @pgmap for the dynamic dax region case. Also take the
opportunity to document the differences between static and dynamic dax
regions.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/dax/bus.c | 32 ++++++++++++++++++++++++++++++++
drivers/dax/bus.h | 1 +
drivers/dax/device.c | 29 +++++++++++++++++++++--------
3 files changed, 54 insertions(+), 8 deletions(-)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 6cc4da4c713d..a22350e822fa 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -129,11 +129,35 @@ ATTRIBUTE_GROUPS(dax_drv);
static int dax_bus_match(struct device *dev, struct device_driver *drv);
+/*
+ * Static dax regions are regions created by an external subsystem
+ * nvdimm where a single range is assigned. Its boundaries are by the external
+ * subsystem and are usually limited to one physical memory range. For example,
+ * for PMEM it is usually defined by NVDIMM Namespace boundaries (i.e. a
+ * single contiguous range)
+ *
+ * On dynamic dax regions, the assigned region can be partitioned by dax core
+ * into multiple subdivisions. A subdivision is represented into one
+ * /dev/daxN.M device composed by one or more potentially discontiguous ranges.
+ *
+ * When allocating a dax region, drivers must set whether it's static
+ * (IORESOURCE_DAX_STATIC). On static dax devices, the @pgmap is pre-assigned
+ * to dax core when calling devm_create_dev_dax(), whereas in dynamic dax
+ * devices it is NULL but afterwards allocated by dax core on device ->probe().
+ * Care is needed to make sure that dynamic dax devices are torn down with a
+ * cleared @pgmap field (see kill_dev_dax()).
+ */
static bool is_static(struct dax_region *dax_region)
{
return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0;
}
+bool static_dev_dax(struct dev_dax *dev_dax)
+{
+ return is_static(dev_dax->region);
+}
+EXPORT_SYMBOL_GPL(static_dev_dax);
+
static u64 dev_dax_size(struct dev_dax *dev_dax)
{
u64 size = 0;
@@ -363,6 +387,14 @@ void kill_dev_dax(struct dev_dax *dev_dax)
kill_dax(dax_dev);
unmap_mapping_range(inode->i_mapping, 0, 0, 1);
+
+ /*
+ * Dynamic dax region have the pgmap allocated via dev_kzalloc()
+ * and thus freed by devm. Clear the pgmap to not have stale pgmap
+ * ranges on probe() from previous reconfigurations of region devices.
+ */
+ if (!static_dev_dax(dev_dax))
+ dev_dax->pgmap = NULL;
}
EXPORT_SYMBOL_GPL(kill_dev_dax);
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index 1e946ad7780a..4acdfee7dd59 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -48,6 +48,7 @@ int __dax_driver_register(struct dax_device_driver *dax_drv,
__dax_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
void dax_driver_unregister(struct dax_device_driver *dax_drv);
void kill_dev_dax(struct dev_dax *dev_dax);
+bool static_dev_dax(struct dev_dax *dev_dax);
#if IS_ENABLED(CONFIG_DEV_DAX_PMEM_COMPAT)
int dev_dax_probe(struct dev_dax *dev_dax);
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 038816b91af6..630de5a795b0 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -398,18 +398,34 @@ int dev_dax_probe(struct dev_dax *dev_dax)
void *addr;
int rc, i;
- pgmap = dev_dax->pgmap;
- if (dev_WARN_ONCE(dev, pgmap && dev_dax->nr_range > 1,
- "static pgmap / multi-range device conflict\n"))
- return -EINVAL;
+ if (static_dev_dax(dev_dax)) {
+ if (dev_dax->nr_range > 1) {
+ dev_warn(dev,
+ "static pgmap / multi-range device conflict\n");
+ return -EINVAL;
+ }
+
+ pgmap = dev_dax->pgmap;
+ } else {
+ if (dev_dax->pgmap) {
+ dev_warn(dev,
+ "dynamic-dax with pre-populated page map\n");
+ return -EINVAL;
+ }
- if (!pgmap) {
pgmap = devm_kzalloc(dev,
struct_size(pgmap, ranges, dev_dax->nr_range - 1),
GFP_KERNEL);
if (!pgmap)
return -ENOMEM;
+
pgmap->nr_range = dev_dax->nr_range;
+ dev_dax->pgmap = pgmap;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct range *range = &dev_dax->ranges[i].range;
+ pgmap->ranges[i] = *range;
+ }
}
for (i = 0; i < dev_dax->nr_range; i++) {
@@ -421,9 +437,6 @@ int dev_dax_probe(struct dev_dax *dev_dax)
i, range->start, range->end);
return -EBUSY;
}
- /* don't update the range for static pgmap */
- if (!dev_dax->pgmap)
- pgmap->ranges[i] = *range;
}
pgmap->type = MEMORY_DEVICE_GENERIC;
--
2.17.2
* [PATCH v6 08/10] device-dax: factor out page mapping initialization
2021-11-24 19:09 [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap Joao Martins
` (6 preceding siblings ...)
2021-11-24 19:10 ` [PATCH v6 07/10] device-dax: ensure dev_dax->pgmap is valid for dynamic devices Joao Martins
@ 2021-11-24 19:10 ` Joao Martins
2021-11-24 19:10 ` [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}() Joao Martins
` (2 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2021-11-24 19:10 UTC (permalink / raw)
To: linux-mm
Cc: Dan Williams, Vishal Verma, Dave Jiang, Naoya Horiguchi,
Matthew Wilcox, Jason Gunthorpe, John Hubbard, Jane Chu,
Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet,
Christoph Hellwig, nvdimm, linux-doc, Joao Martins
Move initialization of page->mapping into a separate helper.
This is in preparation for moving the mapping set-up to before
inserting the page table entry, and also for tidying up compound
page handling into one helper.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/dax/device.c | 45 ++++++++++++++++++++++----------------------
1 file changed, 23 insertions(+), 22 deletions(-)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 630de5a795b0..9c87927d4bc2 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -73,6 +73,27 @@ __weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
return -1;
}
+static void dax_set_mapping(struct vm_fault *vmf, pfn_t pfn,
+ unsigned long fault_size)
+{
+ unsigned long i, nr_pages = fault_size / PAGE_SIZE;
+ struct file *filp = vmf->vma->vm_file;
+ pgoff_t pgoff;
+
+ pgoff = linear_page_index(vmf->vma,
+ ALIGN(vmf->address, fault_size));
+
+ for (i = 0; i < nr_pages; i++) {
+ struct page *page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
+
+ if (page->mapping)
+ continue;
+
+ page->mapping = filp->f_mapping;
+ page->index = pgoff + i;
+ }
+}
+
static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
struct vm_fault *vmf, pfn_t *pfn)
{
@@ -224,28 +245,8 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
rc = VM_FAULT_SIGBUS;
}
- if (rc == VM_FAULT_NOPAGE) {
- unsigned long i;
- pgoff_t pgoff;
-
- /*
- * In the device-dax case the only possibility for a
- * VM_FAULT_NOPAGE result is when device-dax capacity is
- * mapped. No need to consider the zero page, or racing
- * conflicting mappings.
- */
- pgoff = linear_page_index(vmf->vma,
- ALIGN(vmf->address, fault_size));
- for (i = 0; i < fault_size / PAGE_SIZE; i++) {
- struct page *page;
-
- page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
- if (page->mapping)
- continue;
- page->mapping = filp->f_mapping;
- page->index = pgoff + i;
- }
- }
+ if (rc == VM_FAULT_NOPAGE)
+ dax_set_mapping(vmf, pfn, fault_size);
dax_read_unlock(id);
return rc;
--
2.17.2
* [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}()
2021-11-24 19:09 [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap Joao Martins
` (7 preceding siblings ...)
2021-11-24 19:10 ` [PATCH v6 08/10] device-dax: factor out page mapping initialization Joao Martins
@ 2021-11-24 19:10 ` Joao Martins
2021-11-25 11:42 ` Joao Martins
2021-11-25 18:02 ` [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{, _pmd, pud}() kernel test robot
2021-11-24 19:10 ` [PATCH v6 10/10] device-dax: compound devmap support Joao Martins
2021-11-24 22:30 ` [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap Dan Williams
10 siblings, 2 replies; 22+ messages in thread
From: Joao Martins @ 2021-11-24 19:10 UTC (permalink / raw)
To: linux-mm
Cc: Dan Williams, Vishal Verma, Dave Jiang, Naoya Horiguchi,
Matthew Wilcox, Jason Gunthorpe, John Hubbard, Jane Chu,
Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet,
Christoph Hellwig, nvdimm, linux-doc, Joao Martins
Normally, the @page mapping is set prior to inserting the page into a
page table entry. Make device-dax adhere to the same ordering, rather
than setting mapping after the PTE is inserted.
The address_space never changes and it is always associated with the
same inode and underlying pages. So, the page mapping is set once but
cleared when the struct pages are removed/freed (i.e. after
{devm_}memunmap_pages()).
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/dax/device.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 9c87927d4bc2..0ef9fecec005 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -121,6 +121,8 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
*pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
+ dax_set_mapping(vmf, *pfn, fault_size);
+
return vmf_insert_mixed(vmf->vma, vmf->address, *pfn);
}
@@ -161,6 +163,8 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
*pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
+ dax_set_mapping(vmf, *pfn, fault_size);
+
return vmf_insert_pfn_pmd(vmf, *pfn, vmf->flags & FAULT_FLAG_WRITE);
}
@@ -203,6 +207,8 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
*pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
+ dax_set_mapping(vmf, *pfn, fault_size);
+
return vmf_insert_pfn_pud(vmf, *pfn, vmf->flags & FAULT_FLAG_WRITE);
}
#else
@@ -245,8 +251,6 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
rc = VM_FAULT_SIGBUS;
}
- if (rc == VM_FAULT_NOPAGE)
- dax_set_mapping(vmf, pfn, fault_size);
dax_read_unlock(id);
return rc;
--
2.17.2
* Re: [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}()
2021-11-24 19:10 ` [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}() Joao Martins
@ 2021-11-25 11:42 ` Joao Martins
2021-11-26 18:39 ` Joao Martins
2021-11-25 18:02 ` [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{, _pmd, pud}() kernel test robot
1 sibling, 1 reply; 22+ messages in thread
From: Joao Martins @ 2021-11-25 11:42 UTC (permalink / raw)
To: linux-mm
Cc: Dan Williams, Vishal Verma, Dave Jiang, Naoya Horiguchi,
Matthew Wilcox, Jason Gunthorpe, John Hubbard, Jane Chu,
Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet,
Christoph Hellwig, nvdimm, linux-doc
On 11/24/21 19:10, Joao Martins wrote:
> Normally, the @page mapping is set prior to inserting the page into a
> page table entry. Make device-dax adhere to the same ordering, rather
> than setting mapping after the PTE is inserted.
>
> The address_space never changes and it is always associated with the
> same inode and underlying pages. So, the page mapping is set once but
> cleared when the struct pages are removed/freed (i.e. after
> {devm_}memunmap_pages()).
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> drivers/dax/device.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> index 9c87927d4bc2..0ef9fecec005 100644
> --- a/drivers/dax/device.c
> +++ b/drivers/dax/device.c
> @@ -121,6 +121,8 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
>
> *pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
>
> + dax_set_mapping(vmf, *pfn, fault_size);
> +
> return vmf_insert_mixed(vmf->vma, vmf->address, *pfn);
> }
>
> @@ -161,6 +163,8 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
>
> *pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
>
> + dax_set_mapping(vmf, *pfn, fault_size);
> +
> return vmf_insert_pfn_pmd(vmf, *pfn, vmf->flags & FAULT_FLAG_WRITE);
> }
>
> @@ -203,6 +207,8 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
>
> *pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
>
> + dax_set_mapping(vmf, *pfn, fault_size);
> +
> return vmf_insert_pfn_pud(vmf, *pfn, vmf->flags & FAULT_FLAG_WRITE);
> }
> #else
> @@ -245,8 +251,6 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
> rc = VM_FAULT_SIGBUS;
> }
>
> - if (rc == VM_FAULT_NOPAGE)
> - dax_set_mapping(vmf, pfn, fault_size);
> dax_read_unlock(id);
>
> return rc;
>
This last chunk is going to trigger a new warning, because @fault_size in
dev_dax_huge_fault() stops being used after this patch.
I've added the chunk below for the next version (in addition to Christoph's
comments on patch 4):
@@ -217,7 +223,6 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
enum page_entry_size pe_size)
{
struct file *filp = vmf->vma->vm_file;
- unsigned long fault_size;
vm_fault_t rc = VM_FAULT_SIGBUS;
int id;
pfn_t pfn;
@@ -230,23 +235,18 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
id = dax_read_lock();
switch (pe_size) {
case PE_SIZE_PTE:
- fault_size = PAGE_SIZE;
rc = __dev_dax_pte_fault(dev_dax, vmf, &pfn);
break;
case PE_SIZE_PMD:
- fault_size = PMD_SIZE;
rc = __dev_dax_pmd_fault(dev_dax, vmf, &pfn);
break;
case PE_SIZE_PUD:
- fault_size = PUD_SIZE;
rc = __dev_dax_pud_fault(dev_dax, vmf, &pfn);
break;
default:
rc = VM_FAULT_SIGBUS;
}
* Re: [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}()
2021-11-25 11:42 ` Joao Martins
@ 2021-11-26 18:39 ` Joao Martins
2021-11-29 7:32 ` Christoph Hellwig
0 siblings, 1 reply; 22+ messages in thread
From: Joao Martins @ 2021-11-26 18:39 UTC (permalink / raw)
To: linux-mm, Andrew Morton
Cc: Dan Williams, Vishal Verma, Dave Jiang, Naoya Horiguchi,
Matthew Wilcox, Jason Gunthorpe, John Hubbard, Jane Chu,
Muchun Song, Mike Kravetz, Jonathan Corbet, Christoph Hellwig,
nvdimm, linux-doc
On 11/25/21 11:42, Joao Martins wrote:
> On 11/24/21 19:10, Joao Martins wrote:
>> @@ -245,8 +251,6 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
>> rc = VM_FAULT_SIGBUS;
>> }
>>
>> - if (rc == VM_FAULT_NOPAGE)
>> - dax_set_mapping(vmf, pfn, fault_size);
>> dax_read_unlock(id);
>>
>> return rc;
>>
> This last chunk is going to trigger a new warning, because @fault_size in
> dev_dax_huge_fault() stops being used after this patch.
> I've added the chunk below for the next version (in addition to Christoph's
> comments on patch 4):
>
Re-attached as a replacement patch below the scissors line.
As mentioned earlier, I'll be respinning the v7 series with the comments I got on
patch 4 and this replacement below. But given the build warnings yesterday and
today, I figured I'd preemptively attach a replacement for it.
----->8-----
From 2f4cb25c0a6546a27ced4981f0963546f386caec Mon Sep 17 00:00:00 2001
From: Joao Martins <joao.m.martins@oracle.com>
Date: Tue, 23 Nov 2021 06:00:38 -0500
Subject: [PATCH] device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}()
Normally, the @page mapping is set prior to inserting the page into a
page table entry. Make device-dax adhere to the same ordering, rather
than setting mapping after the PTE is inserted.
The address_space never changes and it is always associated with the
same inode and underlying pages. So, the page mapping is set once but
cleared when the struct pages are removed/freed (i.e. after
{devm_}memunmap_pages()).
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/dax/device.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 9c87927d4bc2..19a6b86486ce 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -121,6 +121,8 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
*pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
+ dax_set_mapping(vmf, *pfn, fault_size);
+
return vmf_insert_mixed(vmf->vma, vmf->address, *pfn);
}
@@ -161,6 +163,8 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
*pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
+ dax_set_mapping(vmf, *pfn, fault_size);
+
return vmf_insert_pfn_pmd(vmf, *pfn, vmf->flags & FAULT_FLAG_WRITE);
}
@@ -203,6 +207,8 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
*pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
+ dax_set_mapping(vmf, *pfn, fault_size);
+
return vmf_insert_pfn_pud(vmf, *pfn, vmf->flags & FAULT_FLAG_WRITE);
}
#else
@@ -217,7 +223,6 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
enum page_entry_size pe_size)
{
struct file *filp = vmf->vma->vm_file;
- unsigned long fault_size;
vm_fault_t rc = VM_FAULT_SIGBUS;
int id;
pfn_t pfn;
@@ -230,23 +235,18 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
id = dax_read_lock();
switch (pe_size) {
case PE_SIZE_PTE:
- fault_size = PAGE_SIZE;
rc = __dev_dax_pte_fault(dev_dax, vmf, &pfn);
break;
case PE_SIZE_PMD:
- fault_size = PMD_SIZE;
rc = __dev_dax_pmd_fault(dev_dax, vmf, &pfn);
break;
case PE_SIZE_PUD:
- fault_size = PUD_SIZE;
rc = __dev_dax_pud_fault(dev_dax, vmf, &pfn);
break;
default:
rc = VM_FAULT_SIGBUS;
}
- if (rc == VM_FAULT_NOPAGE)
- dax_set_mapping(vmf, pfn, fault_size);
dax_read_unlock(id);
return rc;
--
2.17.2
* Re: [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}()
2021-11-26 18:39 ` Joao Martins
@ 2021-11-29 7:32 ` Christoph Hellwig
2021-11-29 15:49 ` Joao Martins
0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2021-11-29 7:32 UTC (permalink / raw)
To: Joao Martins
Cc: linux-mm, Andrew Morton, Dan Williams, Vishal Verma, Dave Jiang,
Naoya Horiguchi, Matthew Wilcox, Jason Gunthorpe, John Hubbard,
Jane Chu, Muchun Song, Mike Kravetz, Jonathan Corbet,
Christoph Hellwig, nvdimm, linux-doc
On Fri, Nov 26, 2021 at 06:39:39PM +0000, Joao Martins wrote:
> @@ -230,23 +235,18 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
> id = dax_read_lock();
> switch (pe_size) {
> case PE_SIZE_PTE:
> - fault_size = PAGE_SIZE;
> rc = __dev_dax_pte_fault(dev_dax, vmf, &pfn);
> break;
> case PE_SIZE_PMD:
> - fault_size = PMD_SIZE;
> rc = __dev_dax_pmd_fault(dev_dax, vmf, &pfn);
> break;
> case PE_SIZE_PUD:
> - fault_size = PUD_SIZE;
> rc = __dev_dax_pud_fault(dev_dax, vmf, &pfn);
> break;
> default:
> rc = VM_FAULT_SIGBUS;
> }
>
> dax_read_unlock(id);
I wonder if it would make sense to move dax_read_lock / dax_read_unlock
into the individual helpers as well now. That way you could directly
return from the switch. Also, it seems like pfn is only an input
parameter now and doesn't need to be passed by reference.
* Re: [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}()
2021-11-29 7:32 ` Christoph Hellwig
@ 2021-11-29 15:49 ` Joao Martins
2021-11-29 16:48 ` Christoph Hellwig
2021-11-29 17:20 ` Joao Martins
0 siblings, 2 replies; 22+ messages in thread
From: Joao Martins @ 2021-11-29 15:49 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Andrew Morton, Dan Williams, Vishal Verma, Dave Jiang,
Naoya Horiguchi, Matthew Wilcox, Jason Gunthorpe, John Hubbard,
Jane Chu, Muchun Song, Mike Kravetz, Jonathan Corbet, nvdimm,
linux-doc
On 11/29/21 07:32, Christoph Hellwig wrote:
> On Fri, Nov 26, 2021 at 06:39:39PM +0000, Joao Martins wrote:
>> @@ -230,23 +235,18 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
>> id = dax_read_lock();
>> switch (pe_size) {
>> case PE_SIZE_PTE:
>> - fault_size = PAGE_SIZE;
>> rc = __dev_dax_pte_fault(dev_dax, vmf, &pfn);
>> break;
>> case PE_SIZE_PMD:
>> - fault_size = PMD_SIZE;
>> rc = __dev_dax_pmd_fault(dev_dax, vmf, &pfn);
>> break;
>> case PE_SIZE_PUD:
>> - fault_size = PUD_SIZE;
>> rc = __dev_dax_pud_fault(dev_dax, vmf, &pfn);
>> break;
>> default:
>> rc = VM_FAULT_SIGBUS;
>> }
>>
>> dax_read_unlock(id);
>
> I wonder if it would make sense to move dax_read_lock / dax_read_unlock
> into the individual helpers as well now. That way you could directly
> return from the switch.
Hmmm -- if by individual helpers you mean moving it into
__dev_dax_{pte,pmd,pud}_fault(), that would be slightly less
straightforward. Unless you mean moving it into check_vma() (around the
dax_alive() check), which might actually remove the open-coding of
dax_read_lock() in dax_mmap() as well.
I would prefer that this cleanup around dax_read_{un,}lock() be a
separate patch outside this series, unless you feel strongly that
it needs to be part of this set.
> Also, it seems like pfn is only an input
> parameter now and doesn't need to be passed by reference.
>
It's actually just an output parameter (that dax_set_mapping() would then use).
The fault handlers in device-dax use vmf->address to calculate the pfn that they
insert in the page table entry. After this patch we can actually just remove
the @pfn argument.
Joao
* Re: [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}()
2021-11-29 15:49 ` Joao Martins
@ 2021-11-29 16:48 ` Christoph Hellwig
2021-11-29 17:20 ` Joao Martins
1 sibling, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2021-11-29 16:48 UTC (permalink / raw)
To: Joao Martins
Cc: Christoph Hellwig, linux-mm, Andrew Morton, Dan Williams,
Vishal Verma, Dave Jiang, Naoya Horiguchi, Matthew Wilcox,
Jason Gunthorpe, John Hubbard, Jane Chu, Muchun Song,
Mike Kravetz, Jonathan Corbet, nvdimm, linux-doc
On Mon, Nov 29, 2021 at 03:49:46PM +0000, Joao Martins wrote:
> Hmmm -- if by individual helpers you mean moving it into
> __dev_dax_{pte,pmd,pud}_fault(), that would be slightly less
> straightforward. Unless you mean moving it into check_vma() (around the
> dax_alive() check), which might actually remove the open-coding of
> dax_read_lock() in dax_mmap() as well.
>
> I would prefer that this cleanup around dax_read_{un,}lock() be a
> separate patch outside this series, unless you feel strongly that
> it needs to be part of this set.
Feel free to keep it as-is.
* Re: [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}()
2021-11-29 15:49 ` Joao Martins
2021-11-29 16:48 ` Christoph Hellwig
@ 2021-11-29 17:20 ` Joao Martins
1 sibling, 0 replies; 22+ messages in thread
From: Joao Martins @ 2021-11-29 17:20 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-mm, Andrew Morton, Dan Williams, Vishal Verma, Dave Jiang,
Naoya Horiguchi, Matthew Wilcox, Jason Gunthorpe, John Hubbard,
Jane Chu, Muchun Song, Mike Kravetz, Jonathan Corbet, nvdimm,
linux-doc
On 11/29/21 15:49, Joao Martins wrote:
> On 11/29/21 07:32, Christoph Hellwig wrote:
>> On Fri, Nov 26, 2021 at 06:39:39PM +0000, Joao Martins wrote:
>> Also, it seems like pfn is only an input
>> parameter now and doesn't need to be passed by reference.
>>
> It's actually just an output parameter (that dax_set_mapping() would then use).
>
> The fault handlers in device-dax use vmf->address to calculate the pfn that they
> insert in the page table entry. After this patch we can actually just remove
> the @pfn argument.
I've added your suggestion as a cleanup patch between 9 and current 10 (11 in v7):
---->8----
From 999cec9efa757b82f435124518b042caeb51bde6 Mon Sep 17 00:00:00 2001
From: Joao Martins <joao.m.martins@oracle.com>
Date: Mon, 29 Nov 2021 11:12:00 -0500
Subject: [PATCH] device-dax: remove pfn from __dev_dax_{pte,pmd,pud}_fault()
After moving the page mapping to be set prior to pte insertion, the pfn
in dev_dax_huge_fault() no longer is necessary. Remove it, as well as
the @pfn argument passed to the internal fault handler helpers.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/dax/device.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 19a6b86486ce..914368164e05 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -95,10 +95,11 @@ static void dax_set_mapping(struct vm_fault *vmf, pfn_t pfn,
}
static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
- struct vm_fault *vmf, pfn_t *pfn)
+ struct vm_fault *vmf)
{
struct device *dev = &dev_dax->dev;
phys_addr_t phys;
+ pfn_t pfn;
unsigned int fault_size = PAGE_SIZE;
if (check_vma(dev_dax, vmf->vma, __func__))
@@ -119,20 +120,21 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
return VM_FAULT_SIGBUS;
}
- *pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
+ pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
- dax_set_mapping(vmf, *pfn, fault_size);
+ dax_set_mapping(vmf, pfn, fault_size);
- return vmf_insert_mixed(vmf->vma, vmf->address, *pfn);
+ return vmf_insert_mixed(vmf->vma, vmf->address, pfn);
}
static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
- struct vm_fault *vmf, pfn_t *pfn)
+ struct vm_fault *vmf)
{
unsigned long pmd_addr = vmf->address & PMD_MASK;
struct device *dev = &dev_dax->dev;
phys_addr_t phys;
pgoff_t pgoff;
+ pfn_t pfn;
unsigned int fault_size = PMD_SIZE;
if (check_vma(dev_dax, vmf->vma, __func__))
@@ -161,21 +163,22 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
return VM_FAULT_SIGBUS;
}
- *pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
+ pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
- dax_set_mapping(vmf, *pfn, fault_size);
+ dax_set_mapping(vmf, pfn, fault_size);
- return vmf_insert_pfn_pmd(vmf, *pfn, vmf->flags & FAULT_FLAG_WRITE);
+ return vmf_insert_pfn_pmd(vmf, pfn, vmf->flags & FAULT_FLAG_WRITE);
}
#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
- struct vm_fault *vmf, pfn_t *pfn)
+ struct vm_fault *vmf)
{
unsigned long pud_addr = vmf->address & PUD_MASK;
struct device *dev = &dev_dax->dev;
phys_addr_t phys;
pgoff_t pgoff;
+ pfn_t pfn;
unsigned int fault_size = PUD_SIZE;
@@ -205,11 +208,11 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
return VM_FAULT_SIGBUS;
}
- *pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
+ pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
- dax_set_mapping(vmf, *pfn, fault_size);
+ dax_set_mapping(vmf, pfn, fault_size);
- return vmf_insert_pfn_pud(vmf, *pfn, vmf->flags & FAULT_FLAG_WRITE);
+ return vmf_insert_pfn_pud(vmf, pfn, vmf->flags & FAULT_FLAG_WRITE);
}
#else
static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
@@ -225,7 +228,6 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
struct file *filp = vmf->vma->vm_file;
vm_fault_t rc = VM_FAULT_SIGBUS;
int id;
- pfn_t pfn;
struct dev_dax *dev_dax = filp->private_data;
dev_dbg(&dev_dax->dev, "%s: %s (%#lx - %#lx) size = %d\n", current->comm,
@@ -235,13 +237,13 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
id = dax_read_lock();
switch (pe_size) {
case PE_SIZE_PTE:
- rc = __dev_dax_pte_fault(dev_dax, vmf, &pfn);
+ rc = __dev_dax_pte_fault(dev_dax, vmf);
break;
case PE_SIZE_PMD:
- rc = __dev_dax_pmd_fault(dev_dax, vmf, &pfn);
+ rc = __dev_dax_pmd_fault(dev_dax, vmf);
break;
case PE_SIZE_PUD:
- rc = __dev_dax_pud_fault(dev_dax, vmf, &pfn);
+ rc = __dev_dax_pud_fault(dev_dax, vmf);
break;
default:
rc = VM_FAULT_SIGBUS;
--
2.17.2
* Re: [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{, _pmd, pud}()
2021-11-24 19:10 ` [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}() Joao Martins
2021-11-25 11:42 ` Joao Martins
@ 2021-11-25 18:02 ` kernel test robot
1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2021-11-25 18:02 UTC (permalink / raw)
To: kbuild-all
Hi Joao,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on hnaz-mm/master]
url: https://github.com/0day-ci/linux/commits/Joao-Martins/mm-device-dax-Introduce-compound-pages-in-devmap/20211125-031335
base: https://github.com/hnaz/linux-mm master
config: arc-allyesconfig (https://download.01.org/0day-ci/archive/20211126/202111260142.vX4mNkxP-lkp(a)intel.com/config)
compiler: arceb-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/44ce2eab00a4dc12253e01dd01be15b7e4d7d1ea
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Joao-Martins/mm-device-dax-Introduce-compound-pages-in-devmap/20211125-031335
git checkout 44ce2eab00a4dc12253e01dd01be15b7e4d7d1ea
# save the config file to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arc
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
drivers/dax/device.c: In function 'dev_dax_huge_fault':
>> drivers/dax/device.c:226:23: warning: variable 'fault_size' set but not used [-Wunused-but-set-variable]
226 | unsigned long fault_size;
| ^~~~~~~~~~
drivers/dax/device.c: At top level:
drivers/dax/device.c:396:5: warning: no previous prototype for 'dev_dax_probe' [-Wmissing-prototypes]
396 | int dev_dax_probe(struct dev_dax *dev_dax)
| ^~~~~~~~~~~~~
vim +/fault_size +226 drivers/dax/device.c
9557feee39b75ce drivers/dax/dax.c Dave Jiang 2017-02-24 221
226ab561075f6f8 drivers/dax/device.c Dan Williams 2018-07-13 222 static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
c791ace1e747371 drivers/dax/dax.c Dave Jiang 2017-02-24 223 enum page_entry_size pe_size)
dee410792419aaa drivers/dax/dax.c Dan Williams 2016-05-14 224 {
f42003917b4569a drivers/dax/dax.c Dave Jiang 2017-02-22 225 struct file *filp = vmf->vma->vm_file;
2232c6382a453db drivers/dax/device.c Dan Williams 2018-07-13 @226 unsigned long fault_size;
36bdac1e674debd drivers/dax/device.c Souptick Joarder 2018-09-04 227 vm_fault_t rc = VM_FAULT_SIGBUS;
36bdac1e674debd drivers/dax/device.c Souptick Joarder 2018-09-04 228 int id;
2232c6382a453db drivers/dax/device.c Dan Williams 2018-07-13 229 pfn_t pfn;
5f0694b300b9fb8 drivers/dax/dax.c Dan Williams 2017-01-30 230 struct dev_dax *dev_dax = filp->private_data;
dee410792419aaa drivers/dax/dax.c Dan Williams 2016-05-14 231
6daaca522ab464d drivers/dax/device.c Dan Williams 2018-03-05 232 dev_dbg(&dev_dax->dev, "%s: %s (%#lx - %#lx) size = %d\n", current->comm,
6daaca522ab464d drivers/dax/device.c Dan Williams 2018-03-05 233 (vmf->flags & FAULT_FLAG_WRITE) ? "write" : "read",
762026203c0b87b drivers/dax/dax.c Oliver O'Halloran 2017-04-12 234 vmf->vma->vm_start, vmf->vma->vm_end, pe_size);
dee410792419aaa drivers/dax/dax.c Dan Williams 2016-05-14 235
7b6be8444e0f0dd drivers/dax/device.c Dan Williams 2017-04-11 236 id = dax_read_lock();
c791ace1e747371 drivers/dax/dax.c Dave Jiang 2017-02-24 237 switch (pe_size) {
c791ace1e747371 drivers/dax/dax.c Dave Jiang 2017-02-24 238 case PE_SIZE_PTE:
2232c6382a453db drivers/dax/device.c Dan Williams 2018-07-13 239 fault_size = PAGE_SIZE;
2232c6382a453db drivers/dax/device.c Dan Williams 2018-07-13 240 rc = __dev_dax_pte_fault(dev_dax, vmf, &pfn);
a2d581675d485eb drivers/dax/dax.c Dave Jiang 2017-02-24 241 break;
c791ace1e747371 drivers/dax/dax.c Dave Jiang 2017-02-24 242 case PE_SIZE_PMD:
2232c6382a453db drivers/dax/device.c Dan Williams 2018-07-13 243 fault_size = PMD_SIZE;
2232c6382a453db drivers/dax/device.c Dan Williams 2018-07-13 244 rc = __dev_dax_pmd_fault(dev_dax, vmf, &pfn);
9557feee39b75ce drivers/dax/dax.c Dave Jiang 2017-02-24 245 break;
c791ace1e747371 drivers/dax/dax.c Dave Jiang 2017-02-24 246 case PE_SIZE_PUD:
2232c6382a453db drivers/dax/device.c Dan Williams 2018-07-13 247 fault_size = PUD_SIZE;
2232c6382a453db drivers/dax/device.c Dan Williams 2018-07-13 248 rc = __dev_dax_pud_fault(dev_dax, vmf, &pfn);
a2d581675d485eb drivers/dax/dax.c Dave Jiang 2017-02-24 249 break;
a2d581675d485eb drivers/dax/dax.c Dave Jiang 2017-02-24 250 default:
54eafcc9e339aff drivers/dax/dax.c Pushkar Jambhlekar 2017-04-11 251 rc = VM_FAULT_SIGBUS;
a2d581675d485eb drivers/dax/dax.c Dave Jiang 2017-02-24 252 }
2232c6382a453db drivers/dax/device.c Dan Williams 2018-07-13 253
7b6be8444e0f0dd drivers/dax/device.c Dan Williams 2017-04-11 254 dax_read_unlock(id);
dee410792419aaa drivers/dax/dax.c Dan Williams 2016-05-14 255
dee410792419aaa drivers/dax/dax.c Dan Williams 2016-05-14 256 return rc;
dee410792419aaa drivers/dax/dax.c Dan Williams 2016-05-14 257 }
dee410792419aaa drivers/dax/dax.c Dan Williams 2016-05-14 258
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
* [PATCH v6 10/10] device-dax: compound devmap support
2021-11-24 19:09 [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap Joao Martins
` (8 preceding siblings ...)
2021-11-24 19:10 ` [PATCH v6 09/10] device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}() Joao Martins
@ 2021-11-24 19:10 ` Joao Martins
2021-11-24 22:30 ` [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap Dan Williams
10 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2021-11-24 19:10 UTC (permalink / raw)
To: linux-mm
Cc: Dan Williams, Vishal Verma, Dave Jiang, Naoya Horiguchi,
Matthew Wilcox, Jason Gunthorpe, John Hubbard, Jane Chu,
Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet,
Christoph Hellwig, nvdimm, linux-doc, Joao Martins
Use the newly added compound devmap facility which maps the assigned dax
ranges as compound pages at a page size of @align.
dax devices are created with a fixed @align (huge page size), which is
also enforced at mmap() of the device. Faults consequently happen at the
@align specified at creation time, and that doesn't change throughout the
dax device lifetime. MCEs unmap a whole dax huge page, and splits occur
at the configured page size.
Performance measured by gup_test improves considerably for
unpin_user_pages() and altmap with NVDIMMs:
$ gup_test -f /dev/dax1.0 -m 16384 -r 10 -S -a -n 512 -w
(pin_user_pages_fast 2M pages) put:~71 ms -> put:~22 ms
[altmap]
(pin_user_pages_fast 2M pages) get:~524ms put:~525 ms -> get: ~127ms put:~71ms
$ gup_test -f /dev/dax1.0 -m 129022 -r 10 -S -a -n 512 -w
(pin_user_pages_fast 2M pages) put:~513 ms -> put:~188 ms
[altmap with -m 127004]
(pin_user_pages_fast 2M pages) get:~4.1 secs put:~4.12 secs -> get:~1sec put:~563ms
.. with unpin_user_page_range_dirty_lock() being just as effective as it
is for THP/hugetlb pages[0].
[0] https://lore.kernel.org/linux-mm/20210212130843.13865-5-joao.m.martins@oracle.com/
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/dax/device.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 0ef9fecec005..9b51108aea91 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -78,14 +78,20 @@ static void dax_set_mapping(struct vm_fault *vmf, pfn_t pfn,
{
unsigned long i, nr_pages = fault_size / PAGE_SIZE;
struct file *filp = vmf->vma->vm_file;
+ struct dev_dax *dev_dax = filp->private_data;
pgoff_t pgoff;
+ /* mapping is only set on the head */
+ if (dev_dax->pgmap->vmemmap_shift)
+ nr_pages = 1;
+
pgoff = linear_page_index(vmf->vma,
ALIGN(vmf->address, fault_size));
for (i = 0; i < nr_pages; i++) {
struct page *page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
+ page = compound_head(page);
if (page->mapping)
continue;
@@ -445,6 +451,9 @@ int dev_dax_probe(struct dev_dax *dev_dax)
}
pgmap->type = MEMORY_DEVICE_GENERIC;
+ if (dev_dax->align > PAGE_SIZE)
+ pgmap->vmemmap_shift =
+ order_base_2(dev_dax->align >> PAGE_SHIFT);
addr = devm_memremap_pages(dev, pgmap);
if (IS_ERR(addr))
return PTR_ERR(addr);
--
2.17.2
* Re: [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap
2021-11-24 19:09 [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap Joao Martins
` (9 preceding siblings ...)
2021-11-24 19:10 ` [PATCH v6 10/10] device-dax: compound devmap support Joao Martins
@ 2021-11-24 22:30 ` Dan Williams
2021-11-24 22:41 ` Andrew Morton
10 siblings, 1 reply; 22+ messages in thread
From: Dan Williams @ 2021-11-24 22:30 UTC (permalink / raw)
To: Joao Martins
Cc: Linux MM, Vishal Verma, Dave Jiang, Naoya Horiguchi,
Matthew Wilcox, Jason Gunthorpe, John Hubbard, Jane Chu,
Muchun Song, Mike Kravetz, Andrew Morton, Jonathan Corbet,
Christoph Hellwig, Linux NVDIMM, Linux Doc Mailing List
On Wed, Nov 24, 2021 at 11:10 AM Joao Martins <joao.m.martins@oracle.com> wrote:
>
> Changes since v5[9]:
>
> * Keep @dev on the previous line to improve readability on
> patch 5 (Christoph Hellwig)
> * Document is_static() function to clarify what are static and
> dynamic dax regions in patch 7 (Christoph Hellwig)
> * Deduce @f_mapping and @pgmap from vmf->vma->vm_file to reduce
> the number of arguments of set_{page,compound}_mapping() in last
> patch (Christoph Hellwig)
> * Factor out @mapping initialization to a separate helper ([new] patch 8)
> and rename set_page_mapping() to dax_set_mapping() in the process.
> * Remove set_compound_mapping() and instead adjust dax_set_mapping()
> to handle the @vmemmap_shift case in the last patch. This greatly
> simplifies the last patch and addresses a similar comment from
> Christoph about returning earlier. There is no functional change in
> dax_set_mapping() compared to its earlier version, so I retained
> Dan's Rb on the last patch.
> * Initialize the mapping prior to inserting the PTE/PMD/PUD as opposed
> to after the fact. ([new] patch 9, Jason Gunthorpe)
>
Looks good, Joao. I was about to ping Christoph and Jason to make sure
their review comments are fully addressed before pulling this into my
dax tree, but I see Andrew has already picked this up. I'm ok for this
to go through -mm.
It might end up colliding with some of the DAX cleanups that are
brewing, but if that happens I might apply them to resolve conflicts
and ask Andrew to drop them out of -mm. We can cross that bridge
later.
Thanks for all the effort on this Joao, it's a welcome improvement.
* Re: [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap
2021-11-24 22:30 ` [PATCH v6 00/10] mm, device-dax: Introduce compound pages in devmap Dan Williams
@ 2021-11-24 22:41 ` Andrew Morton
0 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2021-11-24 22:41 UTC (permalink / raw)
To: Dan Williams
Cc: Joao Martins, Linux MM, Vishal Verma, Dave Jiang,
Naoya Horiguchi, Matthew Wilcox, Jason Gunthorpe, John Hubbard,
Jane Chu, Muchun Song, Mike Kravetz, Jonathan Corbet,
Christoph Hellwig, Linux NVDIMM, Linux Doc Mailing List
On Wed, 24 Nov 2021 14:30:56 -0800 Dan Williams <dan.j.williams@intel.com> wrote:
> It might end up colliding with some of the DAX cleanups that are
> brewing, but if that happens I might apply them to resolve conflicts
> and ask Andrew to drop them out of -mm. We can cross that bridge
> later.
Yep, not a problem.