* [PATCH v2 0/6] shmem: high order folios support in write path
@ 2023-09-19 13:55 ` Daniel Gomez
` (6 more replies)
0 siblings, 7 replies; 36+ messages in thread
From: Daniel Gomez @ 2023-09-19 13:55 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
This series adds support for high order folios in the shmem write
path when swap is disabled (noswap option). This is part of the Large
Block Size (LBS) effort [1][2] and a continuation of the shmem work
from Luis here [3] following Matthew Wilcox's suggestion [4] regarding
the path to take for the folio allocation order calculation.
[1] https://kernelnewbies.org/KernelProjects/large-block-size
[2] https://docs.google.com/spreadsheets/d/e/2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8/pubhtml#
[3] RFC v2 add support for blocksize > PAGE_SIZE
https://lore.kernel.org/all/ZHBowMEDfyrAAOWH@bombadil.infradead.org/T/#md3e93ab46ce2ad9254e1eb54ffe71211988b5632
[4] https://lore.kernel.org/all/ZHD9zmIeNXICDaRJ@casper.infradead.org/
fsx and fstests have been run on tmpfs with noswap, with the
following results:
V2:
- fsx: 4,9B
- fstests: Same result as baseline for next-230918.
V1:
- fsx: 2d test, 21,5B
- fstests: Same result as baseline for next-230911 [3][4][5]
Patches have been tested and sent from next-230918.
[3] Baseline next-230911 failures are: generic/080 generic/126
generic/193 generic/633 generic/689
[4] fstests logs baseline: https://gitlab.com/-/snippets/3598621
[5] fstests logs patches: https://gitlab.com/-/snippets/3598628
Note: because of next-230918 regression in rmap, patch [8] applied.
[8] 20230918151729.5A1F4C32796@smtp.kernel.org
Daniel
Changes since v1
* Order handling code simplified in shmem_get_folio_gfp after Matthew Wilcox's
review.
* Drop patch 1/6 [6] and merge mapping_size_order code directly in shmem.
* Added MAX_SHMEM_ORDER to make it explicit we don't have the same max order as
in pagecache (MAX_PAGECACHE_ORDER).
* Use HPAGE_PMD_ORDER-1 as MAX_SHMEM_ORDER to respect huge mount option.
* Update cover letter: drop huge strategy question and add more context regarding
LBS project. Add fsx and fstests summary with new baseline.
* Add fixes found by Matthew in patch 3/6 [7].
* Fix the length passed to shmem_get_folio_gfp in shmem_fault and
shmem_read_folio_gfp (i_size_read -> PAGE_SIZE).
* Add patch as suggested by Matthew to return the number of pages freed in
shmem_free_swap (instead of errno). When no pages are freed, return 0 (pages).
Note: As an alternative, we can embed -ENOENT and make use of IS_ERR_VALUE.
This approach was discarded because it added little value. If this method is
preferred, let's discuss it.
[6] filemap: make the folio order calculation shareable
[7] shmem: account for large order folios
Daniel Gomez (5):
shmem: drop BLOCKS_PER_PAGE macro
shmem: return freed pages in shmem_free_swap
shmem: add order parameter support to shmem_alloc_folio
shmem: add file length in shmem_get_folio path
shmem: add large folios support to the write path
Luis Chamberlain (1):
shmem: account for large order folios
include/linux/shmem_fs.h | 2 +-
mm/khugepaged.c | 2 +-
mm/shmem.c | 141 ++++++++++++++++++++++++++-------------
3 files changed, 97 insertions(+), 48 deletions(-)
--
2.39.2
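For readers weighing the alternative mentioned in the changelog (embedding
-ENOENT and checking with IS_ERR_VALUE), here is a minimal userspace sketch.
It is not code from this series: the SKETCH_ macros only mirror the kernel's
MAX_ERRNO and IS_ERR_VALUE() definitions, and sketch_free_swap() and
sketch_account() are hypothetical stand-ins for shmem_free_swap() and its
callers' accounting.

```c
#include <errno.h>

/* Userspace mirror of the kernel's MAX_ERRNO / IS_ERR_VALUE() (assumption). */
#define SKETCH_MAX_ERRNO 4095UL
#define SKETCH_IS_ERR_VALUE(x) \
	((unsigned long)(x) >= (unsigned long)-SKETCH_MAX_ERRNO)

/*
 * Hypothetical stand-in for shmem_free_swap(): returns the number of pages
 * freed, or -ENOENT when the entry had already been replaced.
 */
static long sketch_free_swap(int entry_present, long nr_pages)
{
	if (!entry_present)
		return -ENOENT;
	return nr_pages;
}

/* Caller-side accounting: only successful results are added to the counter. */
static long sketch_account(long nr_swaps_freed, long ret)
{
	if (SKETCH_IS_ERR_VALUE(ret))
		return nr_swaps_freed;
	return nr_swaps_freed + ret;
}
```

With this shape every caller has to filter the error before accumulating,
which is the extra error handling that returning 0 pages avoids.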
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v2 1/6] shmem: drop BLOCKS_PER_PAGE macro
@ 2023-09-19 13:55 ` Daniel Gomez
0 siblings, 0 replies; 36+ messages in thread
From: Daniel Gomez @ 2023-09-19 13:55 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
Commit [1] replaced all uses of BLOCKS_PER_PAGE with the generic
PAGE_SECTORS, but the definition was not removed. Drop it as an
unused macro.
[1] e09764cff44b5 ("shmem: quota support").
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
---
mm/shmem.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index be050efe18cb..de0d0fa0349e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -84,7 +84,6 @@ static struct vfsmount *shm_mnt;
#include "internal.h"
-#define BLOCKS_PER_PAGE (PAGE_SIZE/512)
#define VM_ACCT(size) (PAGE_ALIGN(size) >> PAGE_SHIFT)
/* Pretend that each entry is of this size in directory's i_size */
--
2.39.2
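As a quick sanity check of why the macro is redundant, the sketch below
compares the removed definition with the PAGE_SECTORS form. The SKETCH_ names
are illustrative, and the 4 KiB page size is an assumption; the equality holds
for any page size of at least 512 bytes.

```c
#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_SECTOR_SHIFT	9	/* 512-byte sectors */

/* The definition dropped by this patch. */
#define SKETCH_BLOCKS_PER_PAGE	(SKETCH_PAGE_SIZE / 512)

/* The generic form: 1 << (PAGE_SHIFT - SECTOR_SHIFT). */
#define SKETCH_PAGE_SECTORS	(1UL << (SKETCH_PAGE_SHIFT - SKETCH_SECTOR_SHIFT))

/* Both expressions count 512-byte sectors per page. */
_Static_assert(SKETCH_BLOCKS_PER_PAGE == SKETCH_PAGE_SECTORS,
	       "both macros must describe the same number of sectors");
```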
* [PATCH v2 2/6] shmem: return freed pages in shmem_free_swap
@ 2023-09-19 13:55 ` Daniel Gomez
2023-09-19 14:56 ` Matthew Wilcox
0 siblings, 1 reply; 36+ messages in thread
From: Daniel Gomez @ 2023-09-19 13:55 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
Both shmem_free_swap callers need the number of pages in the folio
after calling shmem_free_swap. Make shmem_free_swap return that value
directly, and return 0 when no pages are freed, to avoid error
handling in the callers' accounting.
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
mm/shmem.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index de0d0fa0349e..5c9e80207cbf 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -846,16 +846,18 @@ static void shmem_delete_from_page_cache(struct folio *folio, void *radswap)
/*
* Remove swap entry from page cache, free the swap and its page cache.
*/
-static int shmem_free_swap(struct address_space *mapping,
+static long shmem_free_swap(struct address_space *mapping,
pgoff_t index, void *radswap)
{
void *old;
old = xa_cmpxchg_irq(&mapping->i_pages, index, radswap, NULL, 0);
if (old != radswap)
- return -ENOENT;
+ return 0;
+
free_swap_and_cache(radix_to_swp_entry(radswap));
- return 0;
+
+ return folio_nr_pages((struct folio *)radswap);
}
/*
@@ -1008,7 +1010,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
if (xa_is_value(folio)) {
if (unfalloc)
continue;
- nr_swaps_freed += !shmem_free_swap(mapping,
+ nr_swaps_freed += shmem_free_swap(mapping,
indices[i], folio);
continue;
}
@@ -1077,12 +1079,12 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
if (xa_is_value(folio)) {
if (unfalloc)
continue;
- if (shmem_free_swap(mapping, indices[i], folio)) {
+ nr_swaps_freed += shmem_free_swap(mapping, indices[i], folio);
+ if (!nr_swaps_freed) {
/* Swap was replaced by page: retry */
index = indices[i];
break;
}
- nr_swaps_freed++;
continue;
}
--
2.39.2
* [PATCH v2 3/6] shmem: account for large order folios
@ 2023-09-19 13:55 ` Daniel Gomez
0 siblings, 0 replies; 36+ messages in thread
From: Daniel Gomez @ 2023-09-19 13:55 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
From: Luis Chamberlain <mcgrof@kernel.org>
shmem uses the shmem_inode_info alloced and swapped fields to account
for allocated pages and swapped pages. In preparation for large
order folios, adjust the accounting to use folio_nr_pages().
This should produce no functional change yet, as larger order
folios are not yet used or supported in shmem.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
mm/shmem.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 5c9e80207cbf..d41ee5983fd4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -871,16 +871,16 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
pgoff_t start, pgoff_t end)
{
XA_STATE(xas, &mapping->i_pages, start);
- struct page *page;
+ struct folio *folio;
unsigned long swapped = 0;
unsigned long max = end - 1;
rcu_read_lock();
- xas_for_each(&xas, page, max) {
- if (xas_retry(&xas, page))
+ xas_for_each(&xas, folio, max) {
+ if (xas_retry(&xas, folio))
continue;
- if (xa_is_value(page))
- swapped++;
+ if (xa_is_value(folio))
+ swapped += folio_nr_pages(folio);
if (xas.xa_index == max)
break;
if (need_resched()) {
@@ -1530,7 +1530,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
if (add_to_swap_cache(folio, swap,
__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
NULL) == 0) {
- shmem_recalc_inode(inode, 0, 1);
+ shmem_recalc_inode(inode, 0, folio_nr_pages(folio));
swap_shmem_alloc(swap);
shmem_delete_from_page_cache(folio, swp_to_radix_entry(swap));
@@ -1803,6 +1803,7 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
struct address_space *mapping = inode->i_mapping;
swp_entry_t swapin_error;
void *old;
+ long num_swap_pages;
swapin_error = make_poisoned_swp_entry();
old = xa_cmpxchg_irq(&mapping->i_pages, index,
@@ -1812,13 +1813,14 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
return;
folio_wait_writeback(folio);
+ num_swap_pages = folio_nr_pages(folio);
delete_from_swap_cache(folio);
/*
* Don't treat swapin error folio as alloced. Otherwise inode->i_blocks
* won't be 0 when inode is released and thus trigger WARN_ON(i_blocks)
* in shmem_evict_inode().
*/
- shmem_recalc_inode(inode, -1, -1);
+ shmem_recalc_inode(inode, -num_swap_pages, -num_swap_pages);
swap_free(swap);
}
@@ -1905,7 +1907,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
if (error)
goto failed;
- shmem_recalc_inode(inode, 0, -1);
+ shmem_recalc_inode(inode, 0, -folio_nr_pages(folio));
if (sgp == SGP_WRITE)
folio_mark_accessed(folio);
@@ -2665,7 +2667,7 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
if (ret)
goto out_delete_from_cache;
- shmem_recalc_inode(inode, 1, 0);
+ shmem_recalc_inode(inode, folio_nr_pages(folio), 0);
folio_unlock(folio);
return 0;
out_delete_from_cache:
--
2.39.2
* [PATCH v2 4/6] shmem: add order parameter support to shmem_alloc_folio
@ 2023-09-19 13:55 ` Daniel Gomez
0 siblings, 0 replies; 36+ messages in thread
From: Daniel Gomez @ 2023-09-19 13:55 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
In preparation for high order folio support in the write path, add an
order parameter when allocating a folio. It is used on the write path
when huge support is not enabled; when huge support is enabled but the
huge page allocation fails, the fallback takes advantage of it too.
Use order 0 for the non-write paths, such as reads or swap-in, as these
currently lack high order folio support.
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
mm/shmem.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index d41ee5983fd4..66d94207b40c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1667,20 +1667,21 @@ static struct folio *shmem_alloc_hugefolio(gfp_t gfp,
}
static struct folio *shmem_alloc_folio(gfp_t gfp,
- struct shmem_inode_info *info, pgoff_t index)
+ struct shmem_inode_info *info, pgoff_t index,
+ unsigned int order)
{
struct vm_area_struct pvma;
struct folio *folio;
shmem_pseudo_vma_init(&pvma, info, index);
- folio = vma_alloc_folio(gfp, 0, &pvma, 0, false);
+ folio = vma_alloc_folio(gfp, order, &pvma, 0, false);
shmem_pseudo_vma_destroy(&pvma);
return folio;
}
static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
- pgoff_t index, bool huge)
+ pgoff_t index, bool huge, unsigned int *order)
{
struct shmem_inode_info *info = SHMEM_I(inode);
struct folio *folio;
@@ -1689,7 +1690,7 @@ static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
huge = false;
- nr = huge ? HPAGE_PMD_NR : 1;
+ nr = huge ? HPAGE_PMD_NR : 1U << *order;
err = shmem_inode_acct_block(inode, nr);
if (err)
@@ -1698,7 +1699,7 @@ static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
if (huge)
folio = shmem_alloc_hugefolio(gfp, info, index);
else
- folio = shmem_alloc_folio(gfp, info, index);
+ folio = shmem_alloc_folio(gfp, info, index, *order);
if (folio) {
__folio_set_locked(folio);
__folio_set_swapbacked(folio);
@@ -1748,7 +1749,7 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
*/
gfp &= ~GFP_CONSTRAINT_MASK;
VM_BUG_ON_FOLIO(folio_test_large(old), old);
- new = shmem_alloc_folio(gfp, info, index);
+ new = shmem_alloc_folio(gfp, info, index, 0);
if (!new)
return -ENOMEM;
@@ -1959,6 +1960,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
int error;
int once = 0;
int alloced = 0;
+ unsigned int order = 0;
if (index > (MAX_LFS_FILESIZE >> PAGE_SHIFT))
return -EFBIG;
@@ -2034,10 +2036,12 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
huge_gfp = vma_thp_gfp_mask(vma);
huge_gfp = limit_gfp_mask(huge_gfp, gfp);
- folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true);
+ folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true,
+ &order);
if (IS_ERR(folio)) {
alloc_nohuge:
- folio = shmem_alloc_and_acct_folio(gfp, inode, index, false);
+ folio = shmem_alloc_and_acct_folio(gfp, inode, index, false,
+ &order);
}
if (IS_ERR(folio)) {
int retry = 5;
@@ -2600,7 +2604,7 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
if (!*foliop) {
ret = -ENOMEM;
- folio = shmem_alloc_folio(gfp, info, pgoff);
+ folio = shmem_alloc_folio(gfp, info, pgoff, 0);
if (!folio)
goto out_unacct_blocks;
--
2.39.2
* [PATCH v2 5/6] shmem: add file length in shmem_get_folio path
@ 2023-09-19 13:55 ` Daniel Gomez
2023-09-20 18:03 ` kernel test robot
0 siblings, 1 reply; 36+ messages in thread
From: Daniel Gomez @ 2023-09-19 13:55 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
Add a length parameter to the shmem_get_folio path to be able to
calculate the folio order based on the file size when allocation occurs
on the write path. Use length 0 for non-write paths and PAGE_SIZE for
pagecache read and vm fault.
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
include/linux/shmem_fs.h | 2 +-
mm/khugepaged.c | 2 +-
mm/shmem.c | 32 ++++++++++++++++++--------------
3 files changed, 20 insertions(+), 16 deletions(-)
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 6b0c626620f5..b3509e7f1054 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -133,7 +133,7 @@ enum sgp_type {
};
int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
- enum sgp_type sgp);
+ enum sgp_type sgp, size_t len);
struct folio *shmem_read_folio_gfp(struct address_space *mapping,
pgoff_t index, gfp_t gfp);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 88433cc25d8a..e5d3feff6de6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1856,7 +1856,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
xas_unlock_irq(&xas);
/* swap in or instantiate fallocated page */
if (shmem_get_folio(mapping->host, index,
- &folio, SGP_NOALLOC)) {
+ &folio, SGP_NOALLOC, 0)) {
result = SCAN_FAIL;
goto xa_unlocked;
}
diff --git a/mm/shmem.c b/mm/shmem.c
index 66d94207b40c..38aafa0b0845 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -971,7 +971,7 @@ static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
* (although in some cases this is just a waste of time).
*/
folio = NULL;
- shmem_get_folio(inode, index, &folio, SGP_READ);
+ shmem_get_folio(inode, index, &folio, SGP_READ, 0);
return folio;
}
@@ -1948,7 +1948,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
struct folio **foliop, enum sgp_type sgp, gfp_t gfp,
struct vm_area_struct *vma, struct vm_fault *vmf,
- vm_fault_t *fault_type)
+ vm_fault_t *fault_type, size_t len)
{
struct address_space *mapping = inode->i_mapping;
struct shmem_inode_info *info = SHMEM_I(inode);
@@ -2162,10 +2162,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
}
int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
- enum sgp_type sgp)
+ enum sgp_type sgp, size_t len)
{
return shmem_get_folio_gfp(inode, index, foliop, sgp,
- mapping_gfp_mask(inode->i_mapping), NULL, NULL, NULL);
+ mapping_gfp_mask(inode->i_mapping),
+ NULL, NULL, NULL, len);
}
/*
@@ -2248,8 +2249,8 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
spin_unlock(&inode->i_lock);
}
- err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE,
- gfp, vma, vmf, &ret);
+ err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE, gfp,
+ vma, vmf, &ret, PAGE_SIZE);
if (err)
return vmf_error(err);
if (folio)
@@ -2700,6 +2701,9 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
struct folio *folio;
int ret = 0;
+ if (!mapping_large_folio_support(mapping))
+ len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
+
/* i_rwsem is held by caller */
if (unlikely(info->seals & (F_SEAL_GROW |
F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))) {
@@ -2709,7 +2713,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
return -EPERM;
}
- ret = shmem_get_folio(inode, index, &folio, SGP_WRITE);
+ ret = shmem_get_folio(inode, index, &folio, SGP_WRITE, len);
if (ret)
return ret;
@@ -2781,7 +2785,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
break;
}
- error = shmem_get_folio(inode, index, &folio, SGP_READ);
+ error = shmem_get_folio(inode, index, &folio, SGP_READ, 0);
if (error) {
if (error == -EINVAL)
error = 0;
@@ -2958,7 +2962,7 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
break;
error = shmem_get_folio(inode, *ppos / PAGE_SIZE, &folio,
- SGP_READ);
+ SGP_READ, 0);
if (error) {
if (error == -EINVAL)
error = 0;
@@ -3145,7 +3149,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
error = -ENOMEM;
else
error = shmem_get_folio(inode, index, &folio,
- SGP_FALLOC);
+ SGP_FALLOC, 0);
if (error) {
info->fallocend = undo_fallocend;
/* Remove the !uptodate folios we added */
@@ -3500,7 +3504,7 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
inode->i_op = &shmem_short_symlink_operations;
} else {
inode_nohighmem(inode);
- error = shmem_get_folio(inode, 0, &folio, SGP_WRITE);
+ error = shmem_get_folio(inode, 0, &folio, SGP_WRITE, 0);
if (error)
goto out_remove_offset;
inode->i_mapping->a_ops = &shmem_aops;
@@ -3548,7 +3552,7 @@ static const char *shmem_get_link(struct dentry *dentry,
return ERR_PTR(-ECHILD);
}
} else {
- error = shmem_get_folio(inode, 0, &folio, SGP_READ);
+ error = shmem_get_folio(inode, 0, &folio, SGP_READ, 0);
if (error)
return ERR_PTR(error);
if (!folio)
@@ -4916,8 +4920,8 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
int error;
BUG_ON(!shmem_mapping(mapping));
- error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE,
- gfp, NULL, NULL, NULL);
+ error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE, gfp, NULL,
+ NULL, NULL, PAGE_SIZE);
if (error)
return ERR_PTR(error);
--
2.39.2
* [PATCH v2 6/6] shmem: add large folios support to the write path
@ 2023-09-19 13:55 ` Daniel Gomez
2023-09-19 15:01 ` Matthew Wilcox
` (2 more replies)
0 siblings, 3 replies; 36+ messages in thread
From: Daniel Gomez @ 2023-09-19 13:55 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
Add large folio support for shmem write path matching the same high
order preference mechanism used for iomap buffered IO path as used in
__filemap_get_folio() with a difference on the max order permitted
(being PMD_ORDER-1) to respect the huge mount option when large folio
is supported.
Use shmem_mapping_size_order() to get a hint for the order of the folio
based on the file size, which takes care of the mapping requirements.
Swap does not support high order folios for now, so use order 0 when
swap is enabled.
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
mm/shmem.c | 66 ++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 52 insertions(+), 14 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 38aafa0b0845..96c74c96c0d9 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -95,6 +95,9 @@ static struct vfsmount *shm_mnt;
/* Symlink up to this size is kmalloc'ed instead of using a swappable page */
#define SHORT_SYMLINK_LEN 128
+/* Like MAX_PAGECACHE_ORDER but respecting huge option */
+#define MAX_SHMEM_ORDER (HPAGE_PMD_ORDER - 1)
+
/*
* shmem_fallocate communicates with shmem_fault or shmem_writepage via
* inode->i_private (with i_rwsem making sure that it has only one user at
@@ -1680,26 +1683,58 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
return folio;
}
+/**
+ * shmem_mapping_size_order - Get maximum folio order for the given file size.
+ * @mapping: Target address_space.
+ * @index: The page index.
+ * @size: The suggested size of the folio to create.
+ *
+ * This returns a high order for folios (when supported) based on the file size
+ * which the mapping currently allows at the given index. The index is relevant
+ * due to alignment considerations the mapping might have. The returned order
+ * may be less than the size passed.
+ *
+ * Like __filemap_get_folio order calculation.
+ *
+ * Return: The order.
+ */
+static inline unsigned int
+shmem_mapping_size_order(struct address_space *mapping, pgoff_t index,
+ size_t size, struct shmem_sb_info *sbinfo)
+{
+ unsigned int order = ilog2(size);
+
+ if ((order <= PAGE_SHIFT) ||
+ (!mapping_large_folio_support(mapping) || !sbinfo->noswap))
+ return 0;
+ else
+ order = order - PAGE_SHIFT;
+
+ /* If we're not aligned, allocate a smaller folio */
+ if (index & ((1UL << order) - 1))
+ order = __ffs(index);
+
+ order = min_t(size_t, order, MAX_SHMEM_ORDER);
+
+ /* Order-1 not supported due to THP dependency */
+ return (order == 1) ? 0 : order;
+}
+
static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
- pgoff_t index, bool huge, unsigned int *order)
+ pgoff_t index, unsigned int order)
{
struct shmem_inode_info *info = SHMEM_I(inode);
struct folio *folio;
- int nr;
- int err;
-
- if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
- huge = false;
- nr = huge ? HPAGE_PMD_NR : 1U << *order;
+ int nr = 1U << order;
+ int err = shmem_inode_acct_block(inode, nr);
- err = shmem_inode_acct_block(inode, nr);
if (err)
goto failed;
- if (huge)
+ if (order == HPAGE_PMD_ORDER)
folio = shmem_alloc_hugefolio(gfp, info, index);
else
- folio = shmem_alloc_folio(gfp, info, index, *order);
+ folio = shmem_alloc_folio(gfp, info, index, order);
if (folio) {
__folio_set_locked(folio);
__folio_set_swapbacked(folio);
@@ -2030,18 +2065,19 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
return 0;
}
+ order = shmem_mapping_size_order(inode->i_mapping, index, len, sbinfo);
+
if (!shmem_is_huge(inode, index, false,
vma ? vma->vm_mm : NULL, vma ? vma->vm_flags : 0))
goto alloc_nohuge;
huge_gfp = vma_thp_gfp_mask(vma);
huge_gfp = limit_gfp_mask(huge_gfp, gfp);
- folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true,
- &order);
+ folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index,
+ HPAGE_PMD_ORDER);
if (IS_ERR(folio)) {
alloc_nohuge:
- folio = shmem_alloc_and_acct_folio(gfp, inode, index, false,
- &order);
+ folio = shmem_alloc_and_acct_folio(gfp, inode, index, order);
}
if (IS_ERR(folio)) {
int retry = 5;
@@ -2145,6 +2181,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
if (folio_test_large(folio)) {
folio_unlock(folio);
folio_put(folio);
+ if (--order == 1)
+ order = 0;
goto alloc_nohuge;
}
unlock:
--
2.39.2
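To make the order selection in shmem_mapping_size_order() easier to follow,
here is a userspace sketch of the same steps: log2 of the length, subtract
PAGE_SHIFT, fall back to the index alignment, clamp to the maximum, and skip
order 1. All sketch_/SKETCH_ names are illustrative. The 4 KiB page size and
PMD order 9 are assumptions, the two int flags stand in for
mapping_large_folio_support() and sbinfo->noswap, and returning 0 for a zero
length is a choice made by this sketch only.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define SKETCH_HPAGE_PMD_ORDER	9	/* assumption: 2 MiB PMD huge pages */
#define SKETCH_MAX_SHMEM_ORDER	(SKETCH_HPAGE_PMD_ORDER - 1)

/* floor(log2(v)) for v > 0. */
static unsigned int sketch_ilog2(size_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Position of the lowest set bit; v must be non-zero. */
static unsigned int sketch_ffs(unsigned long v)
{
	unsigned int r = 0;

	while (!(v & 1UL)) {
		v >>= 1;
		r++;
	}
	return r;
}

static unsigned int sketch_size_order(unsigned long index, size_t size,
				      int large_folios, int noswap)
{
	unsigned int order;

	if (size == 0)		/* sketch-only guard, see lead-in */
		return 0;

	order = sketch_ilog2(size);
	if (order <= SKETCH_PAGE_SHIFT || !large_folios || !noswap)
		return 0;
	order -= SKETCH_PAGE_SHIFT;

	/* If the index is not aligned to the order, use a smaller folio. */
	if (index & ((1UL << order) - 1))
		order = sketch_ffs(index);

	if (order > SKETCH_MAX_SHMEM_ORDER)
		order = SKETCH_MAX_SHMEM_ORDER;

	/* Order 1 is not supported, so fall back to a single page. */
	return order == 1 ? 0 : order;
}
```

Under these assumptions a 64 KiB write at index 0 yields order 4, the same
write at index 4 yields order 2, and a 4 MiB write at index 0 is clamped to
order 8.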
* Re: [PATCH v2 2/6] shmem: return freed pages in shmem_free_swap
2023-09-19 13:55 ` [PATCH v2 2/6] shmem: return freed pages in shmem_free_swap Daniel Gomez
@ 2023-09-19 14:56 ` Matthew Wilcox
0 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2023-09-19 14:56 UTC (permalink / raw)
To: Daniel Gomez
Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
gost.dev, Pankaj Raghav
On Tue, Sep 19, 2023 at 01:55:47PM +0000, Daniel Gomez wrote:
> +++ b/mm/shmem.c
> @@ -846,16 +846,18 @@ static void shmem_delete_from_page_cache(struct folio *folio, void *radswap)
> /*
> * Remove swap entry from page cache, free the swap and its page cache.
> */
> -static int shmem_free_swap(struct address_space *mapping,
> +static long shmem_free_swap(struct address_space *mapping,
> pgoff_t index, void *radswap)
> {
> void *old;
>
> old = xa_cmpxchg_irq(&mapping->i_pages, index, radswap, NULL, 0);
> if (old != radswap)
> - return -ENOENT;
> + return 0;
> +
> free_swap_and_cache(radix_to_swp_entry(radswap));
> - return 0;
> +
> + return folio_nr_pages((struct folio *)radswap);
> }
Oh my goodness. I have led you astray; my apologies.
shmem_free_swap() is called when the 'folio' is NOT actually a folio.
It's an 'exceptional' / 'value' entry. We can't do this.
Do we encode the size of the swap entry in the swp_entry_t or do
we have to get that information from the XArray (which no longer
knows it after we've stored a NULL there)?
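The point about value entries can be illustrated with a userspace sketch of
the XArray value encoding, in which a value entry is an integer shifted left
by one with bit 0 set rather than a pointer to an object. The sketch_ helpers
below only mirror the shape of xa_mk_value(), xa_to_value() and xa_is_value();
they are illustrative and not the kernel implementation.

```c
#include <stdint.h>

/* Encode an integer as a tagged "value entry": shift left, set bit 0. */
static void *sketch_xa_mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1);
}

/* Recover the integer stored in a value entry. */
static unsigned long sketch_xa_to_value(const void *entry)
{
	return (unsigned long)(uintptr_t)entry >> 1;
}

/* Bit 0 set means "this is a value, not a pointer to an object". */
static int sketch_xa_is_value(const void *entry)
{
	return (int)((uintptr_t)entry & 1);
}
```

An entry for which sketch_xa_is_value() is true carries only the encoded
integer, so casting it to a struct folio pointer and reading a page count
from it would dereference an address that is not an object.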
* Re: [PATCH v2 6/6] shmem: add large folios support to the write path
2023-09-19 13:55 ` [PATCH v2 6/6] shmem: add large folios support to the write path Daniel Gomez
@ 2023-09-19 15:01 ` Matthew Wilcox
2023-09-19 16:28 ` Daniel Gomez
2023-09-20 17:41 ` kernel test robot
2023-09-25 20:39 ` kernel test robot
2 siblings, 1 reply; 36+ messages in thread
From: Matthew Wilcox @ 2023-09-19 15:01 UTC (permalink / raw)
To: Daniel Gomez
Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
gost.dev, Pankaj Raghav
On Tue, Sep 19, 2023 at 01:55:54PM +0000, Daniel Gomez wrote:
> Add large folio support for shmem write path matching the same high
> order preference mechanism used for iomap buffered IO path as used in
> __filemap_get_folio() with a difference on the max order permitted
> (being PMD_ORDER-1) to respect the huge mount option when large folio
> is supported.
I'm strongly opposed to "respecting the huge mount option". We're
determining the best order to use for the folios. Artificially limiting
the size because the sysadmin read an article from 2005 that said to
use this option is STUPID.
> else
> - folio = shmem_alloc_folio(gfp, info, index, *order);
> + folio = shmem_alloc_folio(gfp, info, index, order);
Why did you introduce it as *order, only to change it back to order
in this patch? It feels like you just fixed up patch 6 rather than
percolating the changes all the way back to where they should have
been done. This makes the reviewer's life hard.
* Re: [PATCH v2 6/6] shmem: add large folios support to the write path
2023-09-19 15:01 ` Matthew Wilcox
@ 2023-09-19 16:28 ` Daniel Gomez
0 siblings, 0 replies; 36+ messages in thread
From: Daniel Gomez @ 2023-09-19 16:28 UTC (permalink / raw)
To: Matthew Wilcox
Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
gost.dev, Pankaj Raghav
On Tue, Sep 19, 2023 at 04:01:19PM +0100, Matthew Wilcox wrote:
> On Tue, Sep 19, 2023 at 01:55:54PM +0000, Daniel Gomez wrote:
> > Add large folio support for shmem write path matching the same high
> > order preference mechanism used for iomap buffered IO path as used in
> > __filemap_get_folio() with a difference on the max order permitted
> > (being PMD_ORDER-1) to respect the huge mount option when large folio
> > is supported.
>
> I'm strongly opposed to "respecting the huge mount option". We're
> determining the best order to use for the folios. Artificially limiting
> the size because the sysadmin read an article from 2005 that said to
> use this option is STUPID.
Then I would still have a conflict about what to do when the order is
the same as huge. I guess huge does not make sense in this new scenario,
unless we add large folio controls as proposed in the linux-MM meeting
notes [1]. But I'm missing a bit of context, so it's not clear to me
what to do next.
[1] https://lore.kernel.org/all/4966f496-9f71-460c-b2ab-8661384ce626@arm.com/T/#u
In that sense, I wanted to get a big picture of what this new
strategy implies in terms of folio order when adding to the page cache,
so I added tracing for it (same as in readahead). With bpftrace I
can see the following (notes added to explain each field) after running
fsx up to 119M:
@c: 363049108 /* total number of folios traced */
@order[8]: 2 /* order 8 being used 2 times (add_to_page_cache) */
@order[5]: 3249587 /* order 5 being used 3249587 times (add_to_page_cache) */
@order[4]: 5972205
@order[3]: 8890418
@order[2]: 10380055
@order[0]: 334556841
@order_2: /* linear histogram of folio order */
[0, 1) 334556841 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1, 2) 0 | |
[2, 3) 10380055 |@ |
[3, 4) 8890418 |@ |
[4, 5) 5972205 | |
[5, 6) 3249587 | |
[6, 7) 0 | |
[7, 8) 0 | |
[8, 9) 2 | |
I guess that's not the best workload to see this, but would tracing
also be interesting to add to the series?
>
> > else
> > - folio = shmem_alloc_folio(gfp, info, index, *order);
> > + folio = shmem_alloc_folio(gfp, info, index, order);
>
> Why did you introduce it as *order, only to change it back to order
> in this patch? It feels like you just fixed up patch 6 rather than
> percolating the changes all the way back to where they should have
> been done. This makes the reviewer's life hard.
>
Sorry about that. I missed it in my changes.
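For anyone re-reading the bpftrace numbers above, the following sketch
re-adds the per-order counters and weights them by folio size. The counts are
copied from the trace output. The sketch_ names are illustrative, and the
pages-per-folio figure (1 << order) assumes every traced folio is fully
populated.

```c
#include <stddef.h>

struct sketch_bucket {
	unsigned int order;
	unsigned long long count;
};

/* Per-order counters copied from the trace output above. */
static const struct sketch_bucket sketch_hist[] = {
	{ 0, 334556841ULL },
	{ 2, 10380055ULL },
	{ 3, 8890418ULL },
	{ 4, 5972205ULL },
	{ 5, 3249587ULL },
	{ 8, 2ULL },
};

#define SKETCH_NR_BUCKETS (sizeof(sketch_hist) / sizeof(sketch_hist[0]))

/* Sum of all counters; this should match the @c total. */
static unsigned long long sketch_total_folios(void)
{
	unsigned long long total = 0;
	size_t i;

	for (i = 0; i < SKETCH_NR_BUCKETS; i++)
		total += sketch_hist[i].count;
	return total;
}

/* Pages covered, weighting each folio by 1 << order. */
static unsigned long long sketch_total_pages(void)
{
	unsigned long long pages = 0;
	size_t i;

	for (i = 0; i < SKETCH_NR_BUCKETS; i++)
		pages += sketch_hist[i].count << sketch_hist[i].order;
	return pages;
}
```

The folio total matches @c, and no order-1 folios appear, which is consistent
with the order-1 fallback in the patch.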
* Re: [PATCH v2 6/6] shmem: add large folios support to the write path
2023-09-19 13:55 ` [PATCH v2 6/6] shmem: add large folios support to the write path Daniel Gomez
2023-09-19 15:01 ` Matthew Wilcox
@ 2023-09-20 17:41 ` kernel test robot
2023-09-25 20:39 ` kernel test robot
2 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2023-09-20 17:41 UTC (permalink / raw)
To: Daniel Gomez; +Cc: oe-kbuild-all
Hi Daniel,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.6-rc2 next-20230920]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Daniel-Gomez/shmem-drop-BLOCKS_PER_PAGE-macro/20230920-005146
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230919135536.2165715-7-da.gomez%40samsung.com
patch subject: [PATCH v2 6/6] shmem: add large folios support to the write path
config: nios2-allyesconfig (https://download.01.org/0day-ci/archive/20230921/202309210127.kmPiVzus-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230921/202309210127.kmPiVzus-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309210127.kmPiVzus-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from <command-line>:
In function 'shmem_alloc_and_acct_folio',
inlined from 'shmem_get_folio_gfp.isra' at mm/shmem.c:2080:11:
>> include/linux/compiler_types.h:425:45: error: call to '__compiletime_assert_359' declared with attribute error: BUILD_BUG failed
425 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:406:25: note: in definition of macro '__compiletime_assert'
406 | prefix ## suffix(); \
| ^~~~~~
include/linux/compiler_types.h:425:9: note: in expansion of macro '_compiletime_assert'
425 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
| ^~~~~~~~~~~~~~~~
include/linux/huge_mm.h:257:28: note: in expansion of macro 'BUILD_BUG'
257 | #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
| ^~~~~~~~~
include/linux/huge_mm.h:67:26: note: in expansion of macro 'HPAGE_PMD_SHIFT'
67 | #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
| ^~~~~~~~~~~~~~~
mm/shmem.c:1734:22: note: in expansion of macro 'HPAGE_PMD_ORDER'
1734 | if (order == HPAGE_PMD_ORDER)
| ^~~~~~~~~~~~~~~
vim +/__compiletime_assert_359 +425 include/linux/compiler_types.h
eb5c2d4b45e3d2 Will Deacon 2020-07-21 411
eb5c2d4b45e3d2 Will Deacon 2020-07-21 412 #define _compiletime_assert(condition, msg, prefix, suffix) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 413 __compiletime_assert(condition, msg, prefix, suffix)
eb5c2d4b45e3d2 Will Deacon 2020-07-21 414
eb5c2d4b45e3d2 Will Deacon 2020-07-21 415 /**
eb5c2d4b45e3d2 Will Deacon 2020-07-21 416 * compiletime_assert - break build and emit msg if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21 417 * @condition: a compile-time constant condition to check
eb5c2d4b45e3d2 Will Deacon 2020-07-21 418 * @msg: a message to emit if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21 419 *
eb5c2d4b45e3d2 Will Deacon 2020-07-21 420 * In tradition of POSIX assert, this macro will break the build if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21 421 * supplied condition is *false*, emitting the supplied error message if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21 422 * compiler has support to do so.
eb5c2d4b45e3d2 Will Deacon 2020-07-21 423 */
eb5c2d4b45e3d2 Will Deacon 2020-07-21 424 #define compiletime_assert(condition, msg) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 @425 _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
eb5c2d4b45e3d2 Will Deacon 2020-07-21 426
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 5/6] shmem: add file length in shmem_get_folio path
2023-09-19 13:55 ` [PATCH v2 5/6] shmem: add file length in shmem_get_folio path Daniel Gomez
@ 2023-09-20 18:03 ` kernel test robot
0 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2023-09-20 18:03 UTC (permalink / raw)
To: Daniel Gomez; +Cc: oe-kbuild-all
Hi Daniel,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.6-rc2 next-20230920]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Daniel-Gomez/shmem-drop-BLOCKS_PER_PAGE-macro/20230920-005146
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230919135536.2165715-6-da.gomez%40samsung.com
patch subject: [PATCH v2 5/6] shmem: add file length in shmem_get_folio path
config: sparc-randconfig-001-20230920 (https://download.01.org/0day-ci/archive/20230921/202309210143.LTEqtr2H-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230921/202309210143.LTEqtr2H-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309210143.LTEqtr2H-lkp@intel.com/
All errors (new ones prefixed by >>):
mm/userfaultfd.c: In function 'mfill_atomic_pte_continue':
>> mm/userfaultfd.c:259:15: error: too few arguments to function 'shmem_get_folio'
259 | ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC);
| ^~~~~~~~~~~~~~~
In file included from mm/userfaultfd.c:17:
include/linux/shmem_fs.h:135:5: note: declared here
135 | int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
| ^~~~~~~~~~~~~~~
vim +/shmem_get_folio +259 mm/userfaultfd.c
c1a4de99fada21 Andrea Arcangeli 2015-09-04 246
153132571f0204 Axel Rasmussen 2021-06-30 247 /* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
61c5004022f56c Axel Rasmussen 2023-03-14 248 static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
153132571f0204 Axel Rasmussen 2021-06-30 249 struct vm_area_struct *dst_vma,
153132571f0204 Axel Rasmussen 2021-06-30 250 unsigned long dst_addr,
d9712937037e0c Axel Rasmussen 2023-03-14 251 uffd_flags_t flags)
153132571f0204 Axel Rasmussen 2021-06-30 252 {
153132571f0204 Axel Rasmussen 2021-06-30 253 struct inode *inode = file_inode(dst_vma->vm_file);
153132571f0204 Axel Rasmussen 2021-06-30 254 pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
12acf4fbc4f78b Matthew Wilcox (Oracle 2022-09-02 255) struct folio *folio;
153132571f0204 Axel Rasmussen 2021-06-30 256 struct page *page;
153132571f0204 Axel Rasmussen 2021-06-30 257 int ret;
153132571f0204 Axel Rasmussen 2021-06-30 258
12acf4fbc4f78b Matthew Wilcox (Oracle 2022-09-02 @259) ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC);
12acf4fbc4f78b Matthew Wilcox (Oracle 2022-09-02 260) /* Our caller expects us to return -EFAULT if we failed to find folio */
73f37dbcfe1763 Axel Rasmussen 2022-06-10 261 if (ret == -ENOENT)
73f37dbcfe1763 Axel Rasmussen 2022-06-10 262 ret = -EFAULT;
153132571f0204 Axel Rasmussen 2021-06-30 263 if (ret)
153132571f0204 Axel Rasmussen 2021-06-30 264 goto out;
12acf4fbc4f78b Matthew Wilcox (Oracle 2022-09-02 265) if (!folio) {
153132571f0204 Axel Rasmussen 2021-06-30 266 ret = -EFAULT;
153132571f0204 Axel Rasmussen 2021-06-30 267 goto out;
153132571f0204 Axel Rasmussen 2021-06-30 268 }
153132571f0204 Axel Rasmussen 2021-06-30 269
12acf4fbc4f78b Matthew Wilcox (Oracle 2022-09-02 270) page = folio_file_page(folio, pgoff);
a7605426666196 Yang Shi 2022-01-14 271 if (PageHWPoison(page)) {
a7605426666196 Yang Shi 2022-01-14 272 ret = -EIO;
a7605426666196 Yang Shi 2022-01-14 273 goto out_release;
a7605426666196 Yang Shi 2022-01-14 274 }
a7605426666196 Yang Shi 2022-01-14 275
61c5004022f56c Axel Rasmussen 2023-03-14 276 ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
d9712937037e0c Axel Rasmussen 2023-03-14 277 page, false, flags);
153132571f0204 Axel Rasmussen 2021-06-30 278 if (ret)
153132571f0204 Axel Rasmussen 2021-06-30 279 goto out_release;
153132571f0204 Axel Rasmussen 2021-06-30 280
12acf4fbc4f78b Matthew Wilcox (Oracle 2022-09-02 281) folio_unlock(folio);
153132571f0204 Axel Rasmussen 2021-06-30 282 ret = 0;
153132571f0204 Axel Rasmussen 2021-06-30 283 out:
153132571f0204 Axel Rasmussen 2021-06-30 284 return ret;
153132571f0204 Axel Rasmussen 2021-06-30 285 out_release:
12acf4fbc4f78b Matthew Wilcox (Oracle 2022-09-02 286) folio_unlock(folio);
12acf4fbc4f78b Matthew Wilcox (Oracle 2022-09-02 287) folio_put(folio);
153132571f0204 Axel Rasmussen 2021-06-30 288 goto out;
153132571f0204 Axel Rasmussen 2021-06-30 289 }
153132571f0204 Axel Rasmussen 2021-06-30 290
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 6/6] shmem: add large folios support to the write path
2023-09-19 13:55 ` [PATCH v2 6/6] shmem: add large folios support to the write path Daniel Gomez
2023-09-19 15:01 ` Matthew Wilcox
2023-09-20 17:41 ` kernel test robot
@ 2023-09-25 20:39 ` kernel test robot
2 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2023-09-25 20:39 UTC (permalink / raw)
To: Daniel Gomez; +Cc: llvm, oe-kbuild-all
Hi Daniel,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.6-rc3 next-20230925]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Daniel-Gomez/shmem-drop-BLOCKS_PER_PAGE-macro/20230920-005146
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230919135536.2165715-7-da.gomez%40samsung.com
patch subject: [PATCH v2 6/6] shmem: add large folios support to the write path
config: arm-spear3xx_defconfig (https://download.01.org/0day-ci/archive/20230926/202309260433.1Ai6wzDe-lkp@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230926/202309260433.1Ai6wzDe-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309260433.1Ai6wzDe-lkp@intel.com/
All errors (new ones prefixed by >>):
>> mm/shmem.c:1734:15: error: call to '__compiletime_assert_307' declared with 'error' attribute: BUILD_BUG failed
if (order == HPAGE_PMD_ORDER)
^
include/linux/huge_mm.h:67:26: note: expanded from macro 'HPAGE_PMD_ORDER'
#define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
^
include/linux/huge_mm.h:257:28: note: expanded from macro 'HPAGE_PMD_SHIFT'
#define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
^
include/linux/build_bug.h:59:21: note: expanded from macro 'BUILD_BUG'
#define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
^
note: (skipping 2 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
include/linux/compiler_types.h:413:2: note: expanded from macro '_compiletime_assert'
__compiletime_assert(condition, msg, prefix, suffix)
^
include/linux/compiler_types.h:406:4: note: expanded from macro '__compiletime_assert'
prefix ## suffix(); \
^
<scratch space>:52:1: note: expanded from here
__compiletime_assert_307
^
1 error generated.
vim +1734 mm/shmem.c
1722
1723 static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
1724 pgoff_t index, unsigned int order)
1725 {
1726 struct shmem_inode_info *info = SHMEM_I(inode);
1727 struct folio *folio;
1728 int nr = 1U << order;
1729 int err = shmem_inode_acct_block(inode, nr);
1730
1731 if (err)
1732 goto failed;
1733
> 1734 if (order == HPAGE_PMD_ORDER)
1735 folio = shmem_alloc_hugefolio(gfp, info, index);
1736 else
1737 folio = shmem_alloc_folio(gfp, info, index, order);
1738 if (folio) {
1739 __folio_set_locked(folio);
1740 __folio_set_swapbacked(folio);
1741 return folio;
1742 }
1743
1744 err = -ENOMEM;
1745 shmem_inode_unacct_blocks(inode, nr);
1746 failed:
1747 return ERR_PTR(err);
1748 }
1749
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 36+ messages in thread
* [RFC PATCH 00/11] shmem: high order folios support in write path
[not found] ` <CGME20231028211535eucas1p250e19444b8c973221b7cb9e8ab957da7@eucas1p2.samsung.com>
@ 2023-10-28 21:15 ` Daniel Gomez
[not found] ` <CGME20231028211538eucas1p186e33f92dbea7030f14f7f79aa1b8d54@eucas1p1.samsung.com>
` (11 more replies)
0 siblings, 12 replies; 36+ messages in thread
From: Daniel Gomez @ 2023-10-28 21:15 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
Hi,
This series tries to add support for high order folios in the shmem write and
fallocate paths when swap is disabled (noswap option). This is part of the
Large Block Size (LBS) effort [1][2] and a continuation of the shmem work from
Luis here [3], following Matthew Wilcox's suggestion [4] regarding the path to
take for the folio allocation order calculation.
[1] https://kernelnewbies.org/KernelProjects/large-block-size
[2] https://docs.google.com/spreadsheets/d/e/2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8/pubhtml#
[3] RFC v2 add support for blocksize > PAGE_SIZE
https://lore.kernel.org/all/ZHBowMEDfyrAAOWH@bombadil.infradead.org/T/#md3e93ab46ce2ad9254e1eb54ffe71211988b5632
[4] https://lore.kernel.org/all/ZHD9zmIeNXICDaRJ@casper.infradead.org/
I went from the latest v2 to an RFC because my current implementation breaks
the moment large folios are enabled in the fallocate path. So this is a
work-in-progress RFC patch series (and therefore incomplete).
The issue was identified when running this series on fstests for tmpfs: the
generic/285 and generic/436 tests (lseek DATA/HOLE) [5][6] fail with large
folios support in the fallocate path. To fix these regressions I tried adding
support for per-block tracking of the uptodate flag instead of doing it per
folio. I borrowed this implementation from iomap but I may not have integrated
it correctly.
I think this was introduced in iomap a few years back to address the case where
block size < page size (PS) [7], and it was recently optimized [8], which also
added the ability to track a per-block dirty flag. With large folios, per-block
(page) uptodate tracking is needed; otherwise the entire large folio is marked
as uptodate in shmem_write_end(), making the lseek HOLE and DATA tests above
fail.
This per-block uptodate tracking support [9] fixes the above-mentioned generic
tests but introduces new errors that are easily reproducible with fsx [10]. In
addition, this other thread [11] explains the performance problem with XFS for
high order folios in the iomap write path, but I think here we need at least
the uptodate tracking because of the above reasoning.
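To make the per-block idea above concrete, here is a minimal userspace sketch.
It is not the code from patch [9] and not iomap's implementation; it only
assumes one uptodate bit per block (page) of a folio. All sketch_* names and
the SKETCH_MAX_BLOCKS limit are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_BLOCKS 512 /* hypothetical cap on blocks per folio */

/* Hypothetical per-folio state: one uptodate bit per block. */
struct sketch_folio_state {
	size_t nr_blocks;
	uint64_t uptodate[SKETCH_MAX_BLOCKS / 64];
};

void sketch_init(struct sketch_folio_state *s, size_t nr_blocks)
{
	assert(nr_blocks > 0 && nr_blocks <= SKETCH_MAX_BLOCKS);
	memset(s, 0, sizeof(*s));
	s->nr_blocks = nr_blocks;
}

/* Mark blocks [first, first + count) uptodate, e.g. after a write. */
void sketch_set_range_uptodate(struct sketch_folio_state *s,
			       size_t first, size_t count)
{
	assert(first <= s->nr_blocks && count <= s->nr_blocks - first);
	for (size_t b = first; b < first + count; b++)
		s->uptodate[b / 64] |= UINT64_C(1) << (b % 64);
}

bool sketch_block_uptodate(const struct sketch_folio_state *s, size_t block)
{
	assert(block < s->nr_blocks);
	return (s->uptodate[block / 64] >> (block % 64)) & 1;
}

/* The folio as a whole is uptodate only when every block is. */
bool sketch_folio_uptodate(const struct sketch_folio_state *s)
{
	for (size_t b = 0; b < s->nr_blocks; b++)
		if (!sketch_block_uptodate(s, b))
			return false;
	return true;
}

/*
 * DATA/HOLE style scan: return the first block at or after 'start' whose
 * uptodate state equals 'want_data', or nr_blocks if there is none.
 */
size_t sketch_seek(const struct sketch_folio_state *s, size_t start,
		   bool want_data)
{
	for (size_t b = start; b < s->nr_blocks; b++)
		if (sketch_block_uptodate(s, b) == want_data)
			return b;
	return s->nr_blocks;
}
```

With a 16-block folio where only blocks 4 and 5 were written, the scan still
reports blocks 0-3 and 6-15 as holes. A single per-folio flag cannot express
that, which is the behaviour the lseek tests appear to trip over.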
Please find below the logs [5][6] for lseek DATA/HOLE before and after the
fixes, and the logs [10] for the fsx failure.
I'm looking forward to your comments for error correction and to determine the
overall validity of the approach.
Note:
In case people are interested in testing, we've added testing support for tmpfs
in kdevops using (x)fstests. Please find the link to the baseline results in
the below changes section. Available profiles are:
* default (no mount options)
* huge=always
* huge=within_size
* huge=advise
* noswap, huge=never
* noswap, huge=always
* noswap, huge=within_size
* noswap, huge=advise
Changes since v2
* Rebased onto next-20231027 including latest changes for shmem and mempolicy.
* Testing tmpfs using fstests with kdevops. Baseline results for different
linux-next tags can be found here:
https://github.com/linux-kdevops/kdevops/tree/master/workflows/fstests/expunges/6.6.0-rc6-next-20231019/tmpfs/unassigned
https://github.com/linux-kdevops/kdevops/tree/master/workflows/fstests/expunges/6.6.0-rc4-next-20231006/tmpfs/unassigned
https://github.com/linux-kdevops/kdevops/tree/master/workflows/fstests/expunges/6.6.0-rc4-next-20231004/tmpfs/unassigned
* Added XArray tests to prove order is not kept when replacing an entry with
NULL when using cmpxchg. Required for patch 'shmem: return number of pages
beeing freed in shmem_free_swap'
* Added XArray test for multi-index use.
* Drop huge argument in shmem_alloc_and_add_folio() and make use of VM_HUGEPAGE
instead.
* Increase max order from PMD_ORDER-1 to PMD_ORDER (MAX_PAGECACHE_ORDER).
* Add/fix shmem_free_swap conversion to return (properly) the number of pages
freed.
* Fix the order argument being changed again in a later patch. However, I still
initialize order = 0 in patch [patch-order] and then update it to the mapping
size in patch [patch-high-order]. I can merge both patches if necessary to
avoid this change in the series.
[patch-order]: shmem: add order arg to shmem_alloc_folio()
[patch-high-order]: shmem: add large folio support to the write path
* Folio order tracing when added to page cache.
* THP vs large folios in the write path: if the huge flag is passed and the
kernel has support for THP, the allocation uses the huge path; otherwise the
folio order is used, based on the file size and without using huge_gfp flags.
* Add patch to remove huge flag argument from shmem_alloc_and_add_folio. We can
check for the huge flag being set as part of gfp flags (VM_HUGEPAGE). Check
patch: 'shmem: remove huge arg from shmem_alloc_and_add_folio()'.
* Add high order folios in fallocate path.
* Add per-block uptodate tracking based on iomap implementation (work in
progress).
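As an illustration of the "folio order based on the file size" item in the
list above, the following userspace sketch derives an allocation order from a
length and a page index. It is loosely modeled on how the page cache picks an
order and is not the code from this series; the sketch_* names, the 4 KiB page
size and the order-9 cap are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12u /* assumption: 4 KiB pages */
#define SKETCH_MAX_ORDER 9u   /* assumption: PMD order with 4 KiB pages */

/* floor(log2(v)) for v > 0 */
unsigned int sketch_ilog2(size_t v)
{
	unsigned int r = 0;

	assert(v > 0);
	while (v >>= 1)
		r++;
	return r;
}

/*
 * Hypothetical helper: pick a folio order for a write of 'len' bytes that
 * starts at page index 'index'.
 */
unsigned int sketch_write_order(unsigned long index, size_t len)
{
	unsigned int order = 0;

	/* Largest power-of-two number of pages that fits in len. */
	if (len > 0) {
		unsigned int shift = sketch_ilog2(len);

		if (shift > SKETCH_PAGE_SHIFT)
			order = shift - SKETCH_PAGE_SHIFT;
	}

	if (order > SKETCH_MAX_ORDER)
		order = SKETCH_MAX_ORDER;

	/* An order-N folio must start at an index aligned to 2^N. */
	while (order > 0 && (index & ((1UL << order) - 1)))
		order--;

	return order;
}
```

For example, a 64 KiB write at index 0 yields order 4 here, while the same
write at index 4 is reduced to order 2 because of the alignment constraint.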
Changes since v1
* Order handling code simplified in shmem_get_folio_gfp after Matthew Wilcox's
review.
* Drop patch 1/6 [filemap] and merge mapping_size_order code directly in shmem.
[filemap] filemap: make the folio order calculation shareable
* Added MAX_SHMEM_ORDER to make it explicit we don't have the same max order as
in pagecache (MAX_PAGECACHE_ORDER).
* Use HPAGE_PMD_ORDER-1 as MAX_SHMEM_ORDER to respect huge mount option.
* Update cover letter: drop huge strategy question and add more context
regarding LBS project. Add fsx and fstests summary with new baseline.
* Add fixes found by Matthew in patch 3/6 [acct].
[acct] shmem: account for large order folios
* Fix the length passed to shmem_get_folio_gfp in shmem_fault and
shmem_read_folio_gfp (i_size_read -> PAGE_SIZE).
* Add patch as suggested by Matthew to return the number of pages freed in
shmem_free_swap (instead of errno). When no pages are freed, return 0
(pages). Note: As an alternative, we can embed -ENOENT and make use of
IS_ERR_VALUE. That approach was discarded because it added little value. If
this method is preferred, please let's discuss it.
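The change in return convention for shmem_free_swap described in the last item
above can be pictured with a small userspace model. This is only a sketch: the
sketch_* functions are hypothetical stand-ins and do not touch any kernel data
structure. The old convention is the 0 / -ENOENT pair described in the patch,
and the new one returns a page count.

```c
#include <assert.h>
#include <errno.h>

/* Old convention: 0 means one page was freed, -ENOENT means nothing was. */
int sketch_free_swap_old(int entry_present)
{
	return entry_present ? 0 : -ENOENT;
}

/* New convention: number of pages freed, 0 when nothing was freed. */
long sketch_free_swap_new(int entry_present, unsigned int order)
{
	return entry_present ? 1L << order : 0;
}

/* A caller under the old convention can only count one page per call. */
long sketch_count_old(const int *present, int n)
{
	long total = 0;

	for (int i = 0; i < n; i++)
		total += !sketch_free_swap_old(present[i]);
	return total;
}

/* A caller under the new convention accumulates the real page count. */
long sketch_count_new(const int *present, const unsigned int *order, int n)
{
	long total = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		total += sketch_free_swap_new(present[i], order[i]);
	return total;
}
```

With three entries where the first and last are present and the last one has
order 3, the old style accounting reports 2 pages while the new one reports 9.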
[5] (x)fstests regression with large folios in the fallocate path:
generic/285: src/seek_sanity_test/test09()
generic/436: src/seek_sanity_test/test13()
[6] (x)fstests, how to check/reproduce regressions:
```sh
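mkdir -p /mnt/test-tmpfs
# mount step assumed here, mirroring the fsx reproducer in [10]
mount -t tmpfs -o size=1G -o noswap tmpfs /mnt/test-tmpfs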
./src/seek_sanity_test -s 9 -e 9 /mnt/test-tmpfs/file
./src/seek_sanity_test -s 13 -e 13 /mnt/test-tmpfs/file
umount /mnt/test-tmpfs
```
[7] Per-block uptodate tracking in iomap:
9dc55f1389f9 iomap: add support for sub-pagesize buffered I/O without buffer heads
1cea335d1db1 iomap: fix sub-page uptodate handling
[8] Per-block dirty and uptodate flag optimizations in iomap:
4ce02c679722 iomap: Add per-block dirty state tracking to improve performance
35d30c9cf127 iomap: don't skip reading in !uptodate folios when unsharing a range
a01b8f225248 iomap: Allocate ifs in ->write_begin() early
7f79d85b525b iomap: Refactor iomap_write_delalloc_punch() function out
0af2b37d8e7a iomap: Use iomap_punch_t typedef
eee2d2e6ea55 iomap: Fix possible overflow condition in iomap_write_delalloc_scan
cc86181a3b76 iomap: Add some uptodate state handling helpers for ifs state bitmap
3ea5c76cadee iomap: Drop ifs argument from iomap_set_range_uptodate()
04f52c4e6f80 iomap: Rename iomap_page to iomap_folio_state and others
[9] Patch: shmem: add per-block uptodate tracking
[10] fsx up to 633 ops (or up to 1200 without -X).
```sh
mkdir -p /mnt/test-tmpfs
mount -t tmpfs -o size=1G -o noswap tmpfs /mnt/test-tmpfs
/root/xfstests-dev/ltp/fsx /mnt/test-tmpfs/file -d -N 1200 -X
umount /mnt/test-tmpfs
```
Logs:
```logs
READ BAD DATA: offset = 0x0, size = 0x3364c, fname = /mnt/test-tmpfs/file
OFFSET GOOD BAD RANGE
0x28000 0x79d0 0x0000 0x0
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x28001 0xd079 0x0000 0x1
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x28002 0x7914 0x0000 0x2
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x28003 0x1479 0x0000 0x3
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x28004 0x79ec 0x0000 0x4
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x28005 0xec79 0x0000 0x5
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x28006 0x7929 0x0000 0x6
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x28007 0x2979 0x0000 0x7
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x28008 0x7935 0x0000 0x8
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x28009 0x3579 0x0000 0x9
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x2800a 0x7968 0x0000 0xa
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x2800b 0x6879 0x0000 0xb
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x2800c 0x79d3 0x0000 0xc
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x2800d 0xd379 0x0000 0xd
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x2800e 0x79f2 0x0000 0xe
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x2800f 0xf279 0x0000 0xf
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
LOG DUMP (633 total operations):
1( 1 mod 256): SKIPPED (no operation)
2( 2 mod 256): TRUNCATE UP from 0x0 to 0x3aea7 ******WWWW
3( 3 mod 256): COPY 0x1a3d6 thru 0x26607 (0xc232 bytes) to 0x2ea8c thru 0x3acbd
4( 4 mod 256): READ 0x2f6d2 thru 0x3853e (0x8e6d bytes)
5( 5 mod 256): READ 0x2d6d3 thru 0x310f5 (0x3a23 bytes) ***RRRR***
6( 6 mod 256): SKIPPED (no operation)
7( 7 mod 256): WRITE 0x341a2 thru 0x3fafc (0xb95b bytes) EXTEND
...
625(113 mod 256): PUNCH 0x8f01 thru 0x9806 (0x906 bytes)
626(114 mod 256): MAPREAD 0xe517 thru 0x11396 (0x2e80 bytes)
627(115 mod 256): SKIPPED (no operation)
628(116 mod 256): SKIPPED (no operation)
629(117 mod 256): FALLOC 0x284fe thru 0x32480 (0x9f82 bytes) EXTENDING ******FFFF
630(118 mod 256): WRITE 0x333ac thru 0x3364b (0x2a0 bytes) HOLE
631(119 mod 256): SKIPPED (no operation)
632(120 mod 256): SKIPPED (no operation)
633(121 mod 256): WRITE 0x1f876 thru 0x2d86a (0xdff5 bytes) ***WWWW
```
Daniel Gomez (9):
XArray: add cmpxchg order test
shmem: drop BLOCKS_PER_PAGE macro
shmem: return number of pages beeing freed in shmem_free_swap
shmem: trace shmem_add_to_page_cache folio order
shmem: remove huge arg from shmem_alloc_and_add_folio()
shmem: add file length arg in shmem_get_folio() path
shmem: add order arg to shmem_alloc_folio()
shmem: add large folio support to the write path
shmem: add per-block uptodate tracking
Luis Chamberlain (2):
test_xarray: add tests for advanced multi-index use
shmem: account for large order folios
MAINTAINERS | 1 +
include/linux/shmem_fs.h | 2 +-
include/trace/events/shmem.h | 52 ++++++
lib/test_xarray.c | 155 +++++++++++++++++
mm/khugepaged.c | 3 +-
mm/shmem.c | 325 ++++++++++++++++++++++++++++-------
mm/userfaultfd.c | 2 +-
7 files changed, 472 insertions(+), 68 deletions(-)
create mode 100644 include/trace/events/shmem.h
--
2.39.2
^ permalink raw reply [flat|nested] 36+ messages in thread
* [RFC PATCH 01/11] XArray: add cmpxchg order test
[not found] ` <CGME20231028211538eucas1p186e33f92dbea7030f14f7f79aa1b8d54@eucas1p1.samsung.com>
@ 2023-10-28 21:15 ` Daniel Gomez
2023-10-29 20:11 ` Matthew Wilcox
0 siblings, 1 reply; 36+ messages in thread
From: Daniel Gomez @ 2023-10-28 21:15 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
XArray multi-index entries do not keep track of the order stored once
the entry is being marked as used (replaced with NULL). Add a test
to check the order is actually lost.
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
---
lib/test_xarray.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index e77d4856442c..6c22588963bc 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -423,6 +423,26 @@ static noinline void check_cmpxchg(struct xarray *xa)
XA_BUG_ON(xa, !xa_empty(xa));
}
+static noinline void check_cmpxchg_order(struct xarray *xa)
+{
+ void *FIVE = xa_mk_value(5);
+ unsigned int order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 15 : 1;
+ void *old;
+
+ XA_BUG_ON(xa, !xa_empty(xa));
+ XA_BUG_ON(xa, xa_store_index(xa, 5, GFP_KERNEL) != NULL);
+ XA_BUG_ON(xa, xa_insert(xa, 5, FIVE, GFP_KERNEL) != -EBUSY);
+ XA_BUG_ON(xa, xa_store_order(xa, 5, order, FIVE, GFP_KERNEL));
+ XA_BUG_ON(xa, xa_get_order(xa, 5) != order);
+ XA_BUG_ON(xa, xa_get_order(xa, xa_to_value(FIVE)) != order);
+ old = xa_cmpxchg(xa, 5, FIVE, NULL, GFP_KERNEL);
+ XA_BUG_ON(xa, old != FIVE);
+ XA_BUG_ON(xa, xa_get_order(xa, 5) != 0);
+ XA_BUG_ON(xa, xa_get_order(xa, xa_to_value(FIVE)) != 0);
+ XA_BUG_ON(xa, xa_get_order(xa, xa_to_value(old)) != 0);
+ XA_BUG_ON(xa, !xa_empty(xa));
+}
+
static noinline void check_reserve(struct xarray *xa)
{
void *entry;
@@ -1801,6 +1821,7 @@ static int xarray_checks(void)
check_xas_erase(&array);
check_insert(&array);
check_cmpxchg(&array);
+ check_cmpxchg_order(&array);
check_reserve(&array);
check_reserve(&xa0);
check_multi_store(&array);
--
2.39.2
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [RFC PATCH 02/11] test_xarray: add tests for advanced multi-index use
[not found] ` <CGME20231028211538eucas1p1456b4c759a9fed51a6a77fbf2c946011@eucas1p1.samsung.com>
@ 2023-10-28 21:15 ` Daniel Gomez
0 siblings, 0 replies; 36+ messages in thread
From: Daniel Gomez @ 2023-10-28 21:15 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
From: Luis Chamberlain <mcgrof@kernel.org>
The multi index selftests are great but they don't replicate
how we deal with the page cache exactly, which makes it a bit
hard to follow as the page cache uses the advanced API.
Add tests which use the advanced API, mimicking what we do in the
page cache. While at it, extend the example to do what is needed for
min order support.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: Daniel Gomez <da.gomez@samsung.com>
---
lib/test_xarray.c | 134 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 134 insertions(+)
diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index 6c22588963bc..22a687e33dc5 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -694,6 +694,139 @@ static noinline void check_multi_store(struct xarray *xa)
#endif
}
+#ifdef CONFIG_XARRAY_MULTI
+static noinline void check_xa_multi_store_adv_add(struct xarray *xa,
+ unsigned long index,
+ unsigned int order,
+ void *p)
+{
+ XA_STATE(xas, xa, index);
+
+ xas_set_order(&xas, index, order);
+
+ do {
+ xas_lock_irq(&xas);
+
+ xas_store(&xas, p);
+ XA_BUG_ON(xa, xas_error(&xas));
+ XA_BUG_ON(xa, xa_load(xa, index) != p);
+
+ xas_unlock_irq(&xas);
+ } while (xas_nomem(&xas, GFP_KERNEL));
+
+ XA_BUG_ON(xa, xas_error(&xas));
+}
+
+static noinline void check_xa_multi_store_adv_delete(struct xarray *xa,
+ unsigned long index,
+ unsigned int order)
+{
+ unsigned int nrpages = 1UL << order;
+ unsigned long base = round_down(index, nrpages);
+ XA_STATE(xas, xa, base);
+
+ xas_set_order(&xas, base, order);
+ xas_store(&xas, NULL);
+ xas_init_marks(&xas);
+}
+
+static unsigned long some_val = 0xdeadbeef;
+static unsigned long some_val_2 = 0xdeaddead;
+
+/* mimics the page cache */
+static noinline void check_xa_multi_store_adv(struct xarray *xa,
+ unsigned long pos,
+ unsigned int order)
+{
+ unsigned int nrpages = 1UL << order;
+ unsigned long index, base, next_index, next_next_index;
+ unsigned int i;
+
+ index = pos >> PAGE_SHIFT;
+ base = round_down(index, nrpages);
+ next_index = round_down(base + nrpages, nrpages);
+ next_next_index = round_down(next_index + nrpages, nrpages);
+
+ check_xa_multi_store_adv_add(xa, base, order, &some_val);
+
+ for (i = 0; i < nrpages; i++)
+ XA_BUG_ON(xa, xa_load(xa, base + i) != &some_val);
+
+ XA_BUG_ON(xa, xa_load(xa, next_index) != NULL);
+
+ /* Use order 0 for the next item */
+ check_xa_multi_store_adv_add(xa, next_index, 0, &some_val_2);
+ XA_BUG_ON(xa, xa_load(xa, next_index) != &some_val_2);
+
+ /* Remove the next item */
+ check_xa_multi_store_adv_delete(xa, next_index, 0);
+
+ /* Now use order for a new pointer */
+ check_xa_multi_store_adv_add(xa, next_index, order, &some_val_2);
+
+ for (i = 0; i < nrpages; i++)
+ XA_BUG_ON(xa, xa_load(xa, next_index + i) != &some_val_2);
+
+ check_xa_multi_store_adv_delete(xa, next_index, order);
+ check_xa_multi_store_adv_delete(xa, base, order);
+ XA_BUG_ON(xa, !xa_empty(xa));
+
+ /* starting fresh again */
+
+ /* let's test some holes now */
+
+ /* hole at base and next_next */
+ check_xa_multi_store_adv_add(xa, next_index, order, &some_val_2);
+
+ for (i = 0; i < nrpages; i++)
+ XA_BUG_ON(xa, xa_load(xa, base + i) != NULL);
+
+ for (i = 0; i < nrpages; i++)
+ XA_BUG_ON(xa, xa_load(xa, next_index + i) != &some_val_2);
+
+ for (i = 0; i < nrpages; i++)
+ XA_BUG_ON(xa, xa_load(xa, next_next_index + i) != NULL);
+
+ check_xa_multi_store_adv_delete(xa, next_index, order);
+ XA_BUG_ON(xa, !xa_empty(xa));
+
+ /* hole at base and next */
+
+ check_xa_multi_store_adv_add(xa, next_next_index, order, &some_val_2);
+
+ for (i = 0; i < nrpages; i++)
+ XA_BUG_ON(xa, xa_load(xa, base + i) != NULL);
+
+ for (i = 0; i < nrpages; i++)
+ XA_BUG_ON(xa, xa_load(xa, next_index + i) != NULL);
+
+ for (i = 0; i < nrpages; i++)
+ XA_BUG_ON(xa, xa_load(xa, next_next_index + i) != &some_val_2);
+
+ check_xa_multi_store_adv_delete(xa, next_next_index, order);
+ XA_BUG_ON(xa, !xa_empty(xa));
+}
+#endif
+
+static noinline void check_multi_store_advanced(struct xarray *xa)
+{
+#ifdef CONFIG_XARRAY_MULTI
+ unsigned int max_order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 20 : 1;
+ unsigned long end = ULONG_MAX/2;
+ unsigned long pos, i;
+
+ /*
+ * About 117 million tests below.
+ */
+ for (pos = 7; pos < end; pos = (pos * pos) + 564) {
+ for (i = 0; i < max_order; i++) {
+ check_xa_multi_store_adv(xa, pos, i);
+ check_xa_multi_store_adv(xa, pos + 157, i);
+ }
+ }
+#endif
+}
+
static noinline void check_xa_alloc_1(struct xarray *xa, unsigned int base)
{
int i;
@@ -1825,6 +1958,7 @@ static int xarray_checks(void)
check_reserve(&array);
check_reserve(&xa0);
check_multi_store(&array);
+ check_multi_store_advanced(&array);
check_get_order(&array);
check_xa_alloc();
check_find(&array);
--
2.39.2
^ permalink raw reply related [flat|nested] 36+ messages in thread
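The index arithmetic in check_xa_multi_store_adv() in the patch above can be
tried outside the kernel. The sketch below only reproduces how a byte position
and an order map to the three aligned slots the test stores into; it does not
use XArray at all. The sketch_* names and the fixed 4 KiB page shift are
assumptions made for the example.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical container for the three slots the test works with. */
struct sketch_slots {
	unsigned long base;      /* index rounded down to the order */
	unsigned long next;      /* first index of the following slot */
	unsigned long next_next; /* first index of the slot after that */
};

struct sketch_slots sketch_slots_for(unsigned long pos, unsigned int order)
{
	unsigned long nrpages = 1UL << order;
	unsigned long index = pos >> SKETCH_PAGE_SHIFT;
	struct sketch_slots s;

	/* nrpages is a power of two, so masking rounds down. */
	s.base = index & ~(nrpages - 1);
	s.next = s.base + nrpages;
	s.next_next = s.next + nrpages;

	assert(s.base <= index && index < s.next);
	return s;
}
```

For pos = 0x12345 and order 2 the page index is 18, so the slots start at
indices 16, 20 and 24.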
* [RFC PATCH 03/11] shmem: drop BLOCKS_PER_PAGE macro
[not found] ` <CGME20231028211540eucas1p1fe328f4dadd3645c2c086055efc872ad@eucas1p1.samsung.com>
@ 2023-10-28 21:15 ` Daniel Gomez
0 siblings, 0 replies; 36+ messages in thread
From: Daniel Gomez @ 2023-10-28 21:15 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
Commit [1] replaced all uses of BLOCKS_PER_PAGE with the generic
PAGE_SECTORS, but the definition was not removed. Drop the now-unused
macro.
[1] e09764cff44b5 ("shmem: quota support").
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
---
mm/shmem.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 0d1ce70bce38..a2ac425b97ea 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -84,7 +84,6 @@ static struct vfsmount *shm_mnt __ro_after_init;
#include "internal.h"
-#define BLOCKS_PER_PAGE (PAGE_SIZE/512)
#define VM_ACCT(size) (PAGE_ALIGN(size) >> PAGE_SHIFT)
/* Pretend that each entry is of this size in directory's i_size */
--
2.39.2
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [RFC PATCH 04/11] shmem: return number of pages being freed in shmem_free_swap
[not found] ` <CGME20231028211541eucas1p26663bd957cb449c7346b9dd00e33a20f@eucas1p2.samsung.com>
@ 2023-10-28 21:15 ` Daniel Gomez
0 siblings, 0 replies; 36+ messages in thread
From: Daniel Gomez @ 2023-10-28 21:15 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
Both shmem_free_swap() callers expect the number of pages being freed.
In the large folio context, this needs to support values other than
0 (used as 1 page being freed) and -ENOENT (used as 0 pages being
freed). In preparation for large folio adoption, make shmem_free_swap()
return the number of pages being freed, so that returning 0 now means
0 pages were freed.
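The return convention described above can be illustrated with a small
user-space model. This is only a sketch: struct demo_slot and
demo_free_swap() are hypothetical stand-ins, not kernel code, and the
compare-and-clear below is not atomic the way xa_cmpxchg_irq() is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one page cache slot holding a swap entry. */
struct demo_slot {
	const void *entry;	/* value currently stored at the index */
	unsigned int order;	/* order of the entry, 0 means one page */
};

/*
 * Model of the new convention: return the number of pages freed,
 * or 0 when the slot no longer holds the expected entry.
 */
static long demo_free_swap(struct demo_slot *slot, const void *expected)
{
	long nr_pages;

	assert(slot != NULL);
	nr_pages = 1L << slot->order;

	if (slot->entry != expected)
		return 0;		/* entry was replaced: nothing freed */

	slot->entry = NULL;		/* drop the entry */
	return nr_pages;
}
```

With this shape a caller can add the return value straight into its
counter, and treat 0 as "entry changed, retry", instead of negating an
error code.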
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
---
mm/shmem.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index a2ac425b97ea..9f4c9b9286e5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -827,18 +827,22 @@ static void shmem_delete_from_page_cache(struct folio *folio, void *radswap)
}
/*
- * Remove swap entry from page cache, free the swap and its page cache.
+ * Remove swap entry from page cache, free the swap and its page cache. Returns
+ * the number of pages being freed. 0 means entry not found in XArray (0 pages
+ * being freed).
*/
-static int shmem_free_swap(struct address_space *mapping,
+static long shmem_free_swap(struct address_space *mapping,
pgoff_t index, void *radswap)
{
void *old;
+ long swaps_freed = 1UL << xa_get_order(&mapping->i_pages, index);
old = xa_cmpxchg_irq(&mapping->i_pages, index, radswap, NULL, 0);
if (old != radswap)
- return -ENOENT;
+ return 0;
free_swap_and_cache(radix_to_swp_entry(radswap));
- return 0;
+
+ return swaps_freed;
}
/*
@@ -990,7 +994,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
if (xa_is_value(folio)) {
if (unfalloc)
continue;
- nr_swaps_freed += !shmem_free_swap(mapping,
+ nr_swaps_freed += shmem_free_swap(mapping,
indices[i], folio);
continue;
}
@@ -1057,14 +1061,17 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
folio = fbatch.folios[i];
if (xa_is_value(folio)) {
+ long swaps_freed;
+
if (unfalloc)
continue;
- if (shmem_free_swap(mapping, indices[i], folio)) {
+ swaps_freed = shmem_free_swap(mapping, indices[i], folio);
+ if (!swaps_freed) {
/* Swap was replaced by page: retry */
index = indices[i];
break;
}
- nr_swaps_freed++;
+ nr_swaps_freed += swaps_freed;
continue;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [RFC PATCH 05/11] shmem: account for large order folios
[not found] ` <CGME20231028211543eucas1p2c980dda91fdccaa0b5af3734c357b2f7@eucas1p2.samsung.com>
@ 2023-10-28 21:15 ` Daniel Gomez
2023-10-29 20:40 ` Matthew Wilcox
0 siblings, 1 reply; 36+ messages in thread
From: Daniel Gomez @ 2023-10-28 21:15 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
From: Luis Chamberlain <mcgrof@kernel.org>
shmem uses the alloced and swapped fields of shmem_inode_info to
account for allocated pages and swapped pages. In preparation for large
order folios, adjust the accounting to use folio_nr_pages().
This should produce no functional change yet, as large order
folios are not yet used or supported in shmem.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
mm/shmem.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 9f4c9b9286e5..ab31d2880e5d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -856,16 +856,16 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
pgoff_t start, pgoff_t end)
{
XA_STATE(xas, &mapping->i_pages, start);
- struct page *page;
+ struct folio *folio;
unsigned long swapped = 0;
unsigned long max = end - 1;
rcu_read_lock();
- xas_for_each(&xas, page, max) {
- if (xas_retry(&xas, page))
+ xas_for_each(&xas, folio, max) {
+ if (xas_retry(&xas, folio))
continue;
- if (xa_is_value(page))
- swapped++;
+ if (xa_is_value(folio))
+ swapped += folio_nr_pages(folio);
if (xas.xa_index == max)
break;
if (need_resched()) {
@@ -1514,7 +1514,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
if (add_to_swap_cache(folio, swap,
__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
NULL) == 0) {
- shmem_recalc_inode(inode, 0, 1);
+ shmem_recalc_inode(inode, 0, folio_nr_pages(folio));
swap_shmem_alloc(swap);
shmem_delete_from_page_cache(folio, swp_to_radix_entry(swap));
@@ -1828,6 +1828,7 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
struct address_space *mapping = inode->i_mapping;
swp_entry_t swapin_error;
void *old;
+ long num_swap_pages;
swapin_error = make_poisoned_swp_entry();
old = xa_cmpxchg_irq(&mapping->i_pages, index,
@@ -1837,13 +1838,14 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
return;
folio_wait_writeback(folio);
+ num_swap_pages = folio_nr_pages(folio);
delete_from_swap_cache(folio);
/*
* Don't treat swapin error folio as alloced. Otherwise inode->i_blocks
* won't be 0 when inode is released and thus trigger WARN_ON(i_blocks)
* in shmem_evict_inode().
*/
- shmem_recalc_inode(inode, -1, -1);
+ shmem_recalc_inode(inode, -num_swap_pages, -num_swap_pages);
swap_free(swap);
}
@@ -1928,7 +1930,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
if (error)
goto failed;
- shmem_recalc_inode(inode, 0, -1);
+ shmem_recalc_inode(inode, 0, -folio_nr_pages(folio));
if (sgp == SGP_WRITE)
folio_mark_accessed(folio);
@@ -2684,7 +2686,7 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
if (ret)
goto out_delete_from_cache;
- shmem_recalc_inode(inode, 1, 0);
+ shmem_recalc_inode(inode, folio_nr_pages(folio), 0);
folio_unlock(folio);
return 0;
out_delete_from_cache:
--
2.39.2
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [RFC PATCH 06/11] shmem: trace shmem_add_to_page_cache folio order
[not found] ` <CGME20231028211545eucas1p2da564864423007a5ab006cdd1ab4d4a1@eucas1p2.samsung.com>
@ 2023-10-28 21:15 ` Daniel Gomez
2023-10-29 23:14 ` Matthew Wilcox
0 siblings, 1 reply; 36+ messages in thread
From: Daniel Gomez @ 2023-10-28 21:15 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
Add a tracepoint to be able to trace and account for the order of the
folio. Based on include/trace/events/filemap.h.
Update MAINTAINERS file list for SHMEM.
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
MAINTAINERS | 1 +
include/trace/events/shmem.h | 52 ++++++++++++++++++++++++++++++++++++
mm/shmem.c | 4 +++
3 files changed, 57 insertions(+)
create mode 100644 include/trace/events/shmem.h
diff --git a/MAINTAINERS b/MAINTAINERS
index bdc4638b2df5..befa63e7cb28 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -21923,6 +21923,7 @@ M: Hugh Dickins <hughd@google.com>
L: linux-mm@kvack.org
S: Maintained
F: include/linux/shmem_fs.h
+F: include/trace/events/shmem.h
F: mm/shmem.c
TOMOYO SECURITY MODULE
diff --git a/include/trace/events/shmem.h b/include/trace/events/shmem.h
new file mode 100644
index 000000000000..223f78f11457
--- /dev/null
+++ b/include/trace/events/shmem.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM shmem
+
+#if !defined(_TRACE_SHMEM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_SHMEM_H
+
+#include <linux/types.h>
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(mm_shmem_op_page_cache,
+
+ TP_PROTO(struct folio *folio),
+
+ TP_ARGS(folio),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, pfn)
+ __field(unsigned long, i_ino)
+ __field(unsigned long, index)
+ __field(dev_t, s_dev)
+ __field(unsigned char, order)
+ ),
+
+ TP_fast_assign(
+ __entry->pfn = folio_pfn(folio);
+ __entry->i_ino = folio->mapping->host->i_ino;
+ __entry->index = folio->index;
+ if (folio->mapping->host->i_sb)
+ __entry->s_dev = folio->mapping->host->i_sb->s_dev;
+ else
+ __entry->s_dev = folio->mapping->host->i_rdev;
+ __entry->order = folio_order(folio);
+ ),
+
+ TP_printk("dev %d:%d ino %lx pfn=0x%lx ofs=%lu order=%u",
+ MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
+ __entry->i_ino,
+ __entry->pfn,
+ __entry->index << PAGE_SHIFT,
+ __entry->order)
+);
+
+DEFINE_EVENT(mm_shmem_op_page_cache, mm_shmem_add_to_page_cache,
+ TP_PROTO(struct folio *folio),
+ TP_ARGS(folio)
+ );
+
+#endif /* _TRACE_SHMEM_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/shmem.c b/mm/shmem.c
index ab31d2880e5d..e2893cf2287f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -84,6 +84,9 @@ static struct vfsmount *shm_mnt __ro_after_init;
#include "internal.h"
+#define CREATE_TRACE_POINTS
+#include <trace/events/shmem.h>
+
#define VM_ACCT(size) (PAGE_ALIGN(size) >> PAGE_SHIFT)
/* Pretend that each entry is of this size in directory's i_size */
@@ -1726,6 +1729,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
}
}
+ trace_mm_shmem_add_to_page_cache(folio);
shmem_recalc_inode(inode, pages, 0);
folio_add_lru(folio);
return folio;
--
2.39.2
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [RFC PATCH 07/11] shmem: remove huge arg from shmem_alloc_and_add_folio()
[not found] ` <CGME20231028211546eucas1p2147a423b26a6fa92be7e6c20df429da5@eucas1p2.samsung.com>
@ 2023-10-28 21:15 ` Daniel Gomez
2023-10-29 23:17 ` Matthew Wilcox
0 siblings, 1 reply; 36+ messages in thread
From: Daniel Gomez @ 2023-10-28 21:15 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
The huge flag is already part of the memory allocation flag (gfp_t).
Make use of the VM_HUGEPAGE bit set by vma_thp_gfp_mask() to know if
the allocation must be a huge page.
Drop CONFIG_TRANSPARENT_HUGEPAGE check in shmem_alloc_and_add_folio()
as VM_HUGEPAGE won't be set unless THP config is enabled.
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
mm/shmem.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index e2893cf2287f..9d68211373c4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1644,7 +1644,7 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
struct inode *inode, pgoff_t index,
- struct mm_struct *fault_mm, bool huge)
+ struct mm_struct *fault_mm)
{
struct address_space *mapping = inode->i_mapping;
struct shmem_inode_info *info = SHMEM_I(inode);
@@ -1652,10 +1652,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
long pages;
int error;
- if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
- huge = false;
-
- if (huge) {
+ if (gfp & VM_HUGEPAGE) {
pages = HPAGE_PMD_NR;
index = round_down(index, HPAGE_PMD_NR);
@@ -1690,7 +1687,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
if (xa_find(&mapping->i_pages, &index,
index + pages - 1, XA_PRESENT)) {
error = -EEXIST;
- } else if (huge) {
+ } else if (gfp & VM_HUGEPAGE) {
count_vm_event(THP_FILE_FALLBACK);
count_vm_event(THP_FILE_FALLBACK_CHARGE);
}
@@ -2054,7 +2051,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
huge_gfp = vma_thp_gfp_mask(vma);
huge_gfp = limit_gfp_mask(huge_gfp, gfp);
folio = shmem_alloc_and_add_folio(huge_gfp,
- inode, index, fault_mm, true);
+ inode, index, fault_mm);
if (!IS_ERR(folio)) {
count_vm_event(THP_FILE_ALLOC);
goto alloced;
@@ -2063,7 +2060,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
goto repeat;
}
- folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm, false);
+ folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm);
if (IS_ERR(folio)) {
error = PTR_ERR(folio);
if (error == -EEXIST)
--
2.39.2
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [RFC PATCH 08/11] shmem: add file length arg in shmem_get_folio() path
[not found] ` <CGME20231028211548eucas1p18d34af3d578966ba6778d4e60751789d@eucas1p1.samsung.com>
@ 2023-10-28 21:15 ` Daniel Gomez
0 siblings, 0 replies; 36+ messages in thread
From: Daniel Gomez @ 2023-10-28 21:15 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
In preparation for large folios in the write path, add a file length
argument to the shmem_get_folio() path to be able to calculate the folio
order based on the file size. Use order-0 (PAGE_SIZE) for non-write
paths such as read, page cache read, and VM fault.
This enables high order folios in the write and fallocate paths once the
folio order is calculated based on the length.
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
include/linux/shmem_fs.h | 2 +-
mm/khugepaged.c | 3 ++-
mm/shmem.c | 33 ++++++++++++++++++---------------
mm/userfaultfd.c | 2 +-
4 files changed, 22 insertions(+), 18 deletions(-)
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 2caa6b86106a..7138ea980884 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -137,7 +137,7 @@ enum sgp_type {
};
int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
- enum sgp_type sgp);
+ enum sgp_type sgp, size_t len);
struct folio *shmem_read_folio_gfp(struct address_space *mapping,
pgoff_t index, gfp_t gfp);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 064654717843..fcde8223b507 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1855,7 +1855,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
xas_unlock_irq(&xas);
/* swap in or instantiate fallocated page */
if (shmem_get_folio(mapping->host, index,
- &folio, SGP_NOALLOC)) {
+ &folio, SGP_NOALLOC,
+ PAGE_SIZE)) {
result = SCAN_FAIL;
goto xa_unlocked;
}
diff --git a/mm/shmem.c b/mm/shmem.c
index 9d68211373c4..d8dc2ceaba18 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -958,7 +958,7 @@ static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
* (although in some cases this is just a waste of time).
*/
folio = NULL;
- shmem_get_folio(inode, index, &folio, SGP_READ);
+ shmem_get_folio(inode, index, &folio, SGP_READ, PAGE_SIZE);
return folio;
}
@@ -1644,7 +1644,7 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
struct inode *inode, pgoff_t index,
- struct mm_struct *fault_mm)
+ struct mm_struct *fault_mm, size_t len)
{
struct address_space *mapping = inode->i_mapping;
struct shmem_inode_info *info = SHMEM_I(inode);
@@ -1969,7 +1969,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
*/
static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
struct folio **foliop, enum sgp_type sgp, gfp_t gfp,
- struct vm_fault *vmf, vm_fault_t *fault_type)
+ struct vm_fault *vmf, vm_fault_t *fault_type, size_t len)
{
struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
struct mm_struct *fault_mm;
@@ -2051,7 +2051,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
huge_gfp = vma_thp_gfp_mask(vma);
huge_gfp = limit_gfp_mask(huge_gfp, gfp);
folio = shmem_alloc_and_add_folio(huge_gfp,
- inode, index, fault_mm);
+ inode, index, fault_mm, len);
if (!IS_ERR(folio)) {
count_vm_event(THP_FILE_ALLOC);
goto alloced;
@@ -2060,7 +2060,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
goto repeat;
}
- folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm);
+ folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm, len);
if (IS_ERR(folio)) {
error = PTR_ERR(folio);
if (error == -EEXIST)
@@ -2140,10 +2140,10 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
}
int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
- enum sgp_type sgp)
+ enum sgp_type sgp, size_t len)
{
return shmem_get_folio_gfp(inode, index, foliop, sgp,
- mapping_gfp_mask(inode->i_mapping), NULL, NULL);
+ mapping_gfp_mask(inode->i_mapping), NULL, NULL, len);
}
/*
@@ -2237,7 +2237,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
WARN_ON_ONCE(vmf->page != NULL);
err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE,
- gfp, vmf, &ret);
+ gfp, vmf, &ret, PAGE_SIZE);
if (err)
return vmf_error(err);
if (folio) {
@@ -2716,6 +2716,9 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
struct folio *folio;
int ret = 0;
+ if (!mapping_large_folio_support(mapping))
+ len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
+
/* i_rwsem is held by caller */
if (unlikely(info->seals & (F_SEAL_GROW |
F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))) {
@@ -2725,7 +2728,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
return -EPERM;
}
- ret = shmem_get_folio(inode, index, &folio, SGP_WRITE);
+ ret = shmem_get_folio(inode, index, &folio, SGP_WRITE, len);
if (ret)
return ret;
@@ -2796,7 +2799,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
break;
}
- error = shmem_get_folio(inode, index, &folio, SGP_READ);
+ error = shmem_get_folio(inode, index, &folio, SGP_READ, PAGE_SIZE);
if (error) {
if (error == -EINVAL)
error = 0;
@@ -2973,7 +2976,7 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
break;
error = shmem_get_folio(inode, *ppos / PAGE_SIZE, &folio,
- SGP_READ);
+ SGP_READ, PAGE_SIZE);
if (error) {
if (error == -EINVAL)
error = 0;
@@ -3160,7 +3163,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
error = -ENOMEM;
else
error = shmem_get_folio(inode, index, &folio,
- SGP_FALLOC);
+ SGP_FALLOC, (end - index) << PAGE_SHIFT);
if (error) {
info->fallocend = undo_fallocend;
/* Remove the !uptodate folios we added */
@@ -3511,7 +3514,7 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
inode->i_op = &shmem_short_symlink_operations;
} else {
inode_nohighmem(inode);
- error = shmem_get_folio(inode, 0, &folio, SGP_WRITE);
+ error = shmem_get_folio(inode, 0, &folio, SGP_WRITE, PAGE_SIZE);
if (error)
goto out_remove_offset;
inode->i_mapping->a_ops = &shmem_aops;
@@ -3558,7 +3561,7 @@ static const char *shmem_get_link(struct dentry *dentry, struct inode *inode,
return ERR_PTR(-ECHILD);
}
} else {
- error = shmem_get_folio(inode, 0, &folio, SGP_READ);
+ error = shmem_get_folio(inode, 0, &folio, SGP_READ, PAGE_SIZE);
if (error)
return ERR_PTR(error);
if (!folio)
@@ -4923,7 +4926,7 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
BUG_ON(!shmem_mapping(mapping));
error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE,
- gfp, NULL, NULL);
+ gfp, NULL, NULL, PAGE_SIZE);
if (error)
return ERR_PTR(error);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 96d9eae5c7cc..aab8679b322a 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -256,7 +256,7 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
struct page *page;
int ret;
- ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC);
+ ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC, PAGE_SIZE);
/* Our caller expects us to return -EFAULT if we failed to find folio */
if (ret == -ENOENT)
ret = -EFAULT;
--
2.39.2
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [RFC PATCH 09/11] shmem: add order arg to shmem_alloc_folio()
[not found] ` <CGME20231028211550eucas1p1dc1d47e413de350deda962c3df5111ef@eucas1p1.samsung.com>
@ 2023-10-28 21:15 ` Daniel Gomez
2023-10-31 7:04 ` Hannes Reinecke
0 siblings, 1 reply; 36+ messages in thread
From: Daniel Gomez @ 2023-10-28 21:15 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
Add a folio order argument to shmem_alloc_folio() and merge it with
shmem_alloc_hugefolio(). The return value now goes through
page_rmappable_folio(), which supports both order-0 and high order
folios.
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
mm/shmem.c | 33 ++++++++++-----------------------
1 file changed, 10 insertions(+), 23 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index d8dc2ceaba18..fc7605da4316 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1614,40 +1614,27 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
return result;
}
-static struct folio *shmem_alloc_hugefolio(gfp_t gfp,
- struct shmem_inode_info *info, pgoff_t index)
+static struct folio *shmem_alloc_folio(gfp_t gfp, struct shmem_inode_info *info,
+ pgoff_t index, unsigned int order)
{
struct mempolicy *mpol;
pgoff_t ilx;
struct page *page;
- mpol = shmem_get_pgoff_policy(info, index, HPAGE_PMD_ORDER, &ilx);
- page = alloc_pages_mpol(gfp, HPAGE_PMD_ORDER, mpol, ilx, numa_node_id());
+ mpol = shmem_get_pgoff_policy(info, index, order, &ilx);
+ page = alloc_pages_mpol(gfp, order, mpol, ilx, numa_node_id());
mpol_cond_put(mpol);
return page_rmappable_folio(page);
}
-static struct folio *shmem_alloc_folio(gfp_t gfp,
- struct shmem_inode_info *info, pgoff_t index)
-{
- struct mempolicy *mpol;
- pgoff_t ilx;
- struct page *page;
-
- mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
- page = alloc_pages_mpol(gfp, 0, mpol, ilx, numa_node_id());
- mpol_cond_put(mpol);
-
- return (struct folio *)page;
-}
-
static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
struct inode *inode, pgoff_t index,
struct mm_struct *fault_mm, size_t len)
{
struct address_space *mapping = inode->i_mapping;
struct shmem_inode_info *info = SHMEM_I(inode);
+ unsigned int order = 0;
struct folio *folio;
long pages;
int error;
@@ -1668,12 +1655,12 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
index + HPAGE_PMD_NR - 1, XA_PRESENT))
return ERR_PTR(-E2BIG);
- folio = shmem_alloc_hugefolio(gfp, info, index);
+ folio = shmem_alloc_folio(gfp, info, index, HPAGE_PMD_ORDER);
if (!folio)
count_vm_event(THP_FILE_FALLBACK);
} else {
- pages = 1;
- folio = shmem_alloc_folio(gfp, info, index);
+ pages = 1UL << order;
+ folio = shmem_alloc_folio(gfp, info, index, order);
}
if (!folio)
return ERR_PTR(-ENOMEM);
@@ -1774,7 +1761,7 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
*/
gfp &= ~GFP_CONSTRAINT_MASK;
VM_BUG_ON_FOLIO(folio_test_large(old), old);
- new = shmem_alloc_folio(gfp, info, index);
+ new = shmem_alloc_folio(gfp, info, index, 0);
if (!new)
return -ENOMEM;
@@ -2618,7 +2605,7 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
if (!*foliop) {
ret = -ENOMEM;
- folio = shmem_alloc_folio(gfp, info, pgoff);
+ folio = shmem_alloc_folio(gfp, info, pgoff, 0);
if (!folio)
goto out_unacct_blocks;
--
2.39.2
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [RFC PATCH 10/11] shmem: add large folio support to the write path
[not found] ` <CGME20231028211551eucas1p1552b7695f12c27f4ea1b92ecb6259b31@eucas1p1.samsung.com>
@ 2023-10-28 21:15 ` Daniel Gomez
2023-10-28 23:51 ` kernel test robot
2023-10-29 23:32 ` Matthew Wilcox
0 siblings, 2 replies; 36+ messages in thread
From: Daniel Gomez @ 2023-10-28 21:15 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
Work in progress: large folios in the fallocate path cause regressions
in fstests generic/285 and generic/436.
Add large folio support to the shmem write path, matching the high
order preference mechanism used in the iomap buffered IO path, as in
__filemap_get_folio().
Add shmem_mapping_size_order() to get a hint for the folio order based
on the file size, taking the mapping requirements into account.
Swap does not support high order folios for now, so make it order 0 in
case swap is enabled.
Add the __GFP_COMP flag for high order folios except when huge is
enabled. This fixes a memory leak when allocating high order folios.
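The order hint described above can be modeled in user space as follows.
This is a sketch under stated assumptions: DEMO_PAGE_SHIFT (4 KiB pages)
and DEMO_MAX_ORDER are assumed values, the helper names are
hypothetical, and the large folio and noswap checks are folded into a
single large_ok flag.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define DEMO_MAX_ORDER	9	/* assumption: cap on the folio order */

/* Floor of log2(v), v must be non-zero. */
static unsigned int demo_ilog2(size_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Index of the lowest set bit, v must be non-zero. */
static unsigned int demo_ffs(unsigned long v)
{
	unsigned int r = 0;

	assert(v != 0);
	while (!(v & 1)) {
		v >>= 1;
		r++;
	}
	return r;
}

/* Order hint for a write of @size bytes starting at page @index. */
static unsigned int demo_size_order(unsigned long index, size_t size,
				    int large_ok)
{
	unsigned int order;

	assert(size > 0);
	order = demo_ilog2(size);
	if (order <= DEMO_PAGE_SHIFT || !large_ok)
		return 0;

	order -= DEMO_PAGE_SHIFT;

	/* If the index is not aligned, fall back to a smaller folio. */
	if (index & ((1UL << order) - 1))
		order = demo_ffs(index);

	if (order > DEMO_MAX_ORDER)
		order = DEMO_MAX_ORDER;

	/* Order-1 is not used, so round it down to order-0. */
	return (order == 1) ? 0 : order;
}
```

Under these assumptions a 64 KiB write at index 0 gives order 4, the
same write at index 4 gives order 2, and at index 2 the alignment
yields order 1, which is rounded down to 0.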
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
mm/shmem.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 48 insertions(+), 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index fc7605da4316..eb314927be78 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1621,6 +1621,9 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, struct shmem_inode_info *info,
pgoff_t ilx;
struct page *page;
+ if ((order != 0) && !(gfp & VM_HUGEPAGE))
+ gfp |= __GFP_COMP;
+
mpol = shmem_get_pgoff_policy(info, index, order, &ilx);
page = alloc_pages_mpol(gfp, order, mpol, ilx, numa_node_id());
mpol_cond_put(mpol);
@@ -1628,17 +1631,56 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, struct shmem_inode_info *info,
return page_rmappable_folio(page);
}
+/**
+ * shmem_mapping_size_order - Get maximum folio order for the given file size.
+ * @mapping: Target address_space.
+ * @index: The page index.
+ * @size: The suggested size of the folio to create.
+ *
+ * This returns a high order for folios (when supported) based on the file size
+ * which the mapping currently allows at the given index. The index is relevant
+ * due to alignment considerations the mapping might have. The returned order
+ * may be less than the size passed.
+ *
+ * Like __filemap_get_folio order calculation.
+ *
+ * Return: The order.
+ */
+static inline unsigned int
+shmem_mapping_size_order(struct address_space *mapping, pgoff_t index,
+ size_t size, struct shmem_sb_info *sbinfo)
+{
+ unsigned int order = ilog2(size);
+
+ if ((order <= PAGE_SHIFT) ||
+ (!mapping_large_folio_support(mapping) || !sbinfo->noswap))
+ return 0;
+
+ order -= PAGE_SHIFT;
+
+ /* If we're not aligned, allocate a smaller folio */
+ if (index & ((1UL << order) - 1))
+ order = __ffs(index);
+
+ order = min_t(size_t, order, MAX_PAGECACHE_ORDER);
+
+ /* Order-1 not supported due to THP dependency */
+ return (order == 1) ? 0 : order;
+}
+
static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
struct inode *inode, pgoff_t index,
struct mm_struct *fault_mm, size_t len)
{
struct address_space *mapping = inode->i_mapping;
struct shmem_inode_info *info = SHMEM_I(inode);
- unsigned int order = 0;
+ unsigned int order = shmem_mapping_size_order(mapping, index, len,
+ SHMEM_SB(inode->i_sb));
struct folio *folio;
long pages;
int error;
+neworder:
if (gfp & VM_HUGEPAGE) {
pages = HPAGE_PMD_NR;
index = round_down(index, HPAGE_PMD_NR);
@@ -1721,6 +1763,11 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
unlock:
folio_unlock(folio);
folio_put(folio);
+ if (order != 0) {
+ if (--order == 1)
+ order = 0;
+ goto neworder;
+ }
return ERR_PTR(error);
}
--
2.39.2
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [RFC PATCH 11/11] shmem: add per-block uptodate tracking
[not found] ` <CGME20231028211553eucas1p1a93637df6c46692531894e26023920d5@eucas1p1.samsung.com>
@ 2023-10-28 21:15 ` Daniel Gomez
2023-10-28 23:51 ` kernel test robot
2023-10-29 4:46 ` kernel test robot
0 siblings, 2 replies; 36+ messages in thread
From: Daniel Gomez @ 2023-10-28 21:15 UTC (permalink / raw)
To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav, Daniel Gomez
Work in progress due to an fsx regression (see below).
Based on the iomap per-block dirty and uptodate state tracking, add a
shmem_folio_state struct to track uptodate state per block when a folio
is larger than a block. In shmem, this is when large folios are used, as
one block is equal to one page in this context.
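The per-block tracking idea can be sketched in user space as below. The
names are hypothetical, a plain bool array stands in for the kernel
bitmap, locking is omitted, and DEMO_BLKBITS assumes 4 KiB blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BLKBITS	12	/* assumption: 4 KiB blocks */
#define DEMO_MAX_BLOCKS	64	/* assumption: upper bound for the demo */

/* Hypothetical model of the per-folio uptodate state. */
struct demo_folio_state {
	unsigned int nr_blocks;			/* blocks in this folio */
	bool blk_uptodate[DEMO_MAX_BLOCKS];	/* one flag per block */
	bool folio_uptodate;			/* set once all blocks are */
};

/*
 * Mark every block touched by [off, off + len) as uptodate, then
 * promote the whole folio once no block is left behind.
 */
static void demo_set_range_uptodate(struct demo_folio_state *st,
				    size_t off, size_t len)
{
	unsigned int first, last, i;

	assert(len > 0);
	first = off >> DEMO_BLKBITS;
	last = (off + len - 1) >> DEMO_BLKBITS;
	assert(last < st->nr_blocks);

	for (i = first; i <= last; i++)
		st->blk_uptodate[i] = true;

	for (i = 0; i < st->nr_blocks; i++)
		if (!st->blk_uptodate[i])
			return;

	st->folio_uptodate = true;
}
```

Note that a range covering only part of a block still marks the whole
block, so callers are expected to have initialized the rest of that
block before calling.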
Add support for invalidate_folio, release_folio and is_partially_uptodate
address space operations. The first two are needed to be able to free
the new shmem_folio_state struct. The last callback is required for
large folios when enabling per-block tracking.
The need for this was spotted when running fstests on tmpfs:
generic/285 and generic/436 [1] regress with large folio support in the
fallocate path when per-block uptodate tracking is missing.
[1] tests:
generic/285: src/seek_sanity_test/test09()
generic/436: src/seek_sanity_test/test13()
How to reproduce:
```sh
mkdir -p /mnt/test-tmpfs
./src/seek_sanity_test -s 9 -e 9 /mnt/test-tmpfs/file
./src/seek_sanity_test -s 13 -e 13 /mnt/test-tmpfs/file
umount /mnt/test-tmpfs
```
After per-block uptodate support is added, an fsx regression is found
when running the following:
```sh
mkdir -p /mnt/test-tmpfs
mount -t tmpfs -o size=1G -o noswap tmpfs /mnt/test-tmpfs
/root/xfstests-dev/ltp/fsx /mnt/test-tmpfs/file -d -N 1200 -X
umount /mnt/test-tmpfs
```
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
mm/shmem.c | 169 +++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 159 insertions(+), 10 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index eb314927be78..fa67594495d5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -132,6 +132,94 @@ struct shmem_options {
#define SHMEM_SEEN_QUOTA 32
};
+/*
+ * Structure allocated for each folio to track per-block uptodate state.
+ *
+ * Like buffered-io shmem_folio_state struct but only for uptodate.
+ */
+struct shmem_folio_state {
+ spinlock_t state_lock;
+ unsigned long state[];
+};
+
+static inline bool sfs_is_fully_uptodate(struct folio *folio,
+ struct shmem_folio_state *sfs)
+{
+ struct inode *inode = folio->mapping->host;
+
+ return bitmap_full(sfs->state, i_blocks_per_folio(inode, folio));
+}
+
+static inline bool sfs_block_is_uptodate(struct shmem_folio_state *sfs,
+ unsigned int block)
+{
+ return test_bit(block, sfs->state);
+}
+
+static void sfs_set_range_uptodate(struct folio *folio,
+ struct shmem_folio_state *sfs, size_t off,
+ size_t len)
+{
+ struct inode *inode = folio->mapping->host;
+ unsigned int first_blk = off >> inode->i_blkbits;
+ unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
+ unsigned int nr_blks = last_blk - first_blk + 1;
+ unsigned long flags;
+
+ spin_lock_irqsave(&sfs->state_lock, flags);
+ bitmap_set(sfs->state, first_blk, nr_blks);
+ if (sfs_is_fully_uptodate(folio, sfs))
+ folio_mark_uptodate(folio);
+ spin_unlock_irqrestore(&sfs->state_lock, flags);
+}
+
+static void shmem_set_range_uptodate(struct folio *folio, size_t off,
+ size_t len)
+{
+ struct shmem_folio_state *sfs = folio->private;
+
+ if (sfs)
+ sfs_set_range_uptodate(folio, sfs, off, len);
+ else
+ folio_mark_uptodate(folio);
+}
+
+static struct shmem_folio_state *sfs_alloc(struct inode *inode,
+ struct folio *folio, gfp_t gfp)
+{
+ struct shmem_folio_state *sfs = folio->private;
+ unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
+
+ if (sfs || nr_blocks <= 1)
+ return sfs;
+
+ /*
+ * sfs->state tracks uptodate flag when the block size is smaller
+ * than the folio size.
+ */
+ sfs = kzalloc(struct_size(sfs, state, BITS_TO_LONGS(nr_blocks)), gfp);
+ if (!sfs)
+ return sfs;
+
+ spin_lock_init(&sfs->state_lock);
+ if (folio_test_uptodate(folio))
+ bitmap_set(sfs->state, 0, nr_blocks);
+ folio_attach_private(folio, sfs);
+
+ return sfs;
+}
+
+static void sfs_free(struct folio *folio)
+{
+ struct shmem_folio_state *sfs = folio_detach_private(folio);
+
+ if (!sfs)
+ return;
+ WARN_ON_ONCE(sfs_is_fully_uptodate(folio, sfs) !=
+ folio_test_uptodate(folio));
+ kfree(sfs);
+}
+
#ifdef CONFIG_TMPFS
static unsigned long shmem_default_max_blocks(void)
{
@@ -1495,7 +1583,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
}
folio_zero_range(folio, 0, folio_size(folio));
flush_dcache_folio(folio);
- folio_mark_uptodate(folio);
+ shmem_set_range_uptodate(folio, 0, folio_size(folio));
}
swap = folio_alloc_swap(folio);
@@ -1676,6 +1764,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
struct shmem_inode_info *info = SHMEM_I(inode);
unsigned int order = shmem_mapping_size_order(mapping, index, len,
SHMEM_SB(inode->i_sb));
+ struct shmem_folio_state *sfs;
struct folio *folio;
long pages;
int error;
@@ -1755,6 +1844,10 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
}
}
+ sfs = sfs_alloc(inode, folio, gfp);
+ if (!sfs && i_blocks_per_folio(inode, folio) > 1)
+ goto unlock;
+
trace_mm_shmem_add_to_page_cache(folio);
shmem_recalc_inode(inode, pages, 0);
folio_add_lru(folio);
@@ -1818,7 +1911,7 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
__folio_set_locked(new);
__folio_set_swapbacked(new);
- folio_mark_uptodate(new);
+ shmem_set_range_uptodate(new, 0, folio_size(new));
new->swap = entry;
folio_set_swapcache(new);
@@ -2146,7 +2239,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
for (i = 0; i < n; i++)
clear_highpage(folio_page(folio, i));
flush_dcache_folio(folio);
- folio_mark_uptodate(folio);
+ shmem_set_range_uptodate(folio, 0, folio_size(folio));
}
/* Perhaps the file has been truncated since we checked */
@@ -2788,13 +2881,18 @@ shmem_write_end(struct file *file, struct address_space *mapping,
if (pos + copied > inode->i_size)
i_size_write(inode, pos + copied);
+ if (unlikely(copied < len && !folio_test_uptodate(folio)))
+ return 0;
+
if (!folio_test_uptodate(folio)) {
- if (copied < folio_size(folio)) {
- size_t from = offset_in_folio(folio, pos);
- folio_zero_segments(folio, 0, from,
- from + copied, folio_size(folio));
- }
- folio_mark_uptodate(folio);
+ size_t from = offset_in_folio(folio, pos);
+ if (!folio_test_large(folio) && copied < folio_size(folio))
+ folio_zero_segments(folio, 0, from, from + copied,
+ folio_size(folio));
+ if (folio_test_large(folio) && copied < PAGE_SIZE)
+ folio_zero_segments(folio, from, from, from + copied,
+ folio_size(folio));
+ shmem_set_range_uptodate(folio, from, len);
}
folio_mark_dirty(folio);
folio_unlock(folio);
@@ -2803,6 +2901,54 @@ shmem_write_end(struct file *file, struct address_space *mapping,
return copied;
}
+void shmem_invalidate_folio(struct folio *folio, size_t offset, size_t len)
+{
+ /*
+ * If we're invalidating the entire folio, clear the dirty state
+ * from it and release it to avoid unnecessary buildup of the LRU.
+ */
+ if (offset == 0 && len == folio_size(folio)) {
+ WARN_ON_ONCE(folio_test_writeback(folio));
+ folio_cancel_dirty(folio);
+ sfs_free(folio);
+ }
+}
+
+bool shmem_release_folio(struct folio *folio, gfp_t gfp_flags)
+{
+ sfs_free(folio);
+ return true;
+}
+
+/*
+ * shmem_is_partially_uptodate checks whether blocks within a folio are
+ * uptodate or not.
+ *
+ * Returns true if all blocks which correspond to the specified part
+ * of the folio are uptodate.
+ */
+bool shmem_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
+{
+ struct shmem_folio_state *sfs = folio->private;
+ struct inode *inode = folio->mapping->host;
+ unsigned first, last, i;
+
+ if (!sfs)
+ return false;
+
+ /* Caller's range may extend past the end of this folio */
+ count = min(folio_size(folio) - from, count);
+
+ /* First and last blocks in range within folio */
+ first = from >> inode->i_blkbits;
+ last = (from + count - 1) >> inode->i_blkbits;
+
+ for (i = first; i <= last; i++)
+ if (!sfs_block_is_uptodate(sfs, i))
+ return false;
+ return true;
+}
+
static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
struct file *file = iocb->ki_filp;
@@ -3554,7 +3700,7 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
inode->i_mapping->a_ops = &shmem_aops;
inode->i_op = &shmem_symlink_inode_operations;
memcpy(folio_address(folio), symname, len);
- folio_mark_uptodate(folio);
+ shmem_set_range_uptodate(folio, 0, folio_size(folio));
folio_mark_dirty(folio);
folio_unlock(folio);
folio_put(folio);
@@ -4524,6 +4670,9 @@ const struct address_space_operations shmem_aops = {
#ifdef CONFIG_MIGRATION
.migrate_folio = migrate_folio,
#endif
+ .invalidate_folio = shmem_invalidate_folio,
+ .release_folio = shmem_release_folio,
+ .is_partially_uptodate = shmem_is_partially_uptodate,
.error_remove_page = shmem_error_remove_page,
};
EXPORT_SYMBOL(shmem_aops);
--
2.39.2
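The per-block tracking added by the patch above can be followed more
easily with a small userspace model. This is only a sketch under stated
assumptions: struct model_folio_state and the model_* helpers are
hypothetical stand-ins, the bitmap is a single 64-bit word (so at most
64 blocks per folio), and there is no locking. It mirrors the logic of
sfs_set_range_uptodate() and shmem_is_partially_uptodate(), not their
kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct shmem_folio_state. */
struct model_folio_state {
	unsigned int nr_blocks;	/* blocks per folio, 1..64 in this model */
	unsigned int blkbits;	/* log2 of the block size, like i_blkbits */
	uint64_t state;		/* bit n set => block n is uptodate */
	bool folio_uptodate;	/* stands in for the folio uptodate flag */
};

static void model_init(struct model_folio_state *s, unsigned int nr_blocks,
		       unsigned int blkbits)
{
	assert(nr_blocks >= 1 && nr_blocks <= 64);
	s->nr_blocks = nr_blocks;
	s->blkbits = blkbits;
	s->state = 0;
	s->folio_uptodate = false;
}

/* True once every block bit is set (the bitmap_full() check). */
static bool model_is_fully_uptodate(const struct model_folio_state *s)
{
	uint64_t full = (s->nr_blocks == 64) ?
		UINT64_MAX : ((UINT64_C(1) << s->nr_blocks) - 1);

	return (s->state & full) == full;
}

/* Mark every block touched by [off, off + len) as uptodate. */
static void model_set_range_uptodate(struct model_folio_state *s,
				     size_t off, size_t len)
{
	unsigned int first, last, i;

	assert(len > 0);
	first = (unsigned int)(off >> s->blkbits);
	last = (unsigned int)((off + len - 1) >> s->blkbits);
	assert(last < s->nr_blocks);

	for (i = first; i <= last; i++)
		s->state |= UINT64_C(1) << i;
	if (model_is_fully_uptodate(s))
		s->folio_uptodate = true;
}

/* True if all blocks covering [from, from + count) are uptodate. */
static bool model_is_partially_uptodate(const struct model_folio_state *s,
					size_t from, size_t count)
{
	size_t folio_size = (size_t)s->nr_blocks << s->blkbits;
	unsigned int first, last, i;

	assert(from < folio_size && count > 0);
	/* The caller's range may extend past the end of the folio. */
	if (count > folio_size - from)
		count = folio_size - from;

	first = (unsigned int)(from >> s->blkbits);
	last = (unsigned int)((from + count - 1) >> s->blkbits);
	for (i = first; i <= last; i++)
		if (!((s->state >> i) & 1))
			return false;
	return true;
}
```

Note that, as in the patch, a block is marked uptodate as soon as the
range touches it, even if the range covers only part of that block.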
* Re: [RFC PATCH 10/11] shmem: add large folio support to the write path
2023-10-28 21:15 ` [RFC PATCH 10/11] shmem: add large folio support to the write path Daniel Gomez
@ 2023-10-28 23:51 ` kernel test robot
2023-10-29 23:32 ` Matthew Wilcox
1 sibling, 0 replies; 36+ messages in thread
From: kernel test robot @ 2023-10-28 23:51 UTC (permalink / raw)
To: Daniel Gomez; +Cc: oe-kbuild-all
Hi Daniel,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on next-20231027]
[cannot apply to linus/master v6.6-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Daniel-Gomez/XArray-add-cmpxchg-order-test/20231029-051730
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20231028211518.3424020-11-da.gomez%40samsung.com
patch subject: [RFC PATCH 10/11] shmem: add large folio support to the write path
config: loongarch-allyesconfig (https://download.01.org/0day-ci/archive/20231029/202310290726.C5ndh95F-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231029/202310290726.C5ndh95F-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310290726.C5ndh95F-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> mm/shmem.c:1652: warning: Function parameter or member 'sbinfo' not described in 'shmem_mapping_size_order'
vim +1652 mm/shmem.c
1633
1634 /**
1635 * shmem_mapping_size_order - Get maximum folio order for the given file size.
1636 * @mapping: Target address_space.
1637 * @index: The page index.
1638 * @size: The suggested size of the folio to create.
1639 *
1640 * This returns a high order for folios (when supported) based on the file size
1641 * which the mapping currently allows at the given index. The index is relevant
1642 * due to alignment considerations the mapping might have. The returned order
1643 * may be less than the size passed.
1644 *
1645 * Like __filemap_get_folio order calculation.
1646 *
1647 * Return: The order.
1648 */
1649 static inline unsigned int
1650 shmem_mapping_size_order(struct address_space *mapping, pgoff_t index,
1651 size_t size, struct shmem_sb_info *sbinfo)
> 1652 {
1653 unsigned int order = ilog2(size);
1654
1655 if ((order <= PAGE_SHIFT) ||
1656 (!mapping_large_folio_support(mapping) || !sbinfo->noswap))
1657 return 0;
1658
1659 order -= PAGE_SHIFT;
1660
1661 /* If we're not aligned, allocate a smaller folio */
1662 if (index & ((1UL << order) - 1))
1663 order = __ffs(index);
1664
1665 order = min_t(size_t, order, MAX_PAGECACHE_ORDER);
1666
1667 /* Order-1 not supported due to THP dependency */
1668 return (order == 1) ? 0 : order;
1669 }
1670
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
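The W=1 warning above only asks for a kernel-doc line describing the
'sbinfo' parameter. The sketch below is a userspace model of the quoted
order calculation with every parameter documented. Everything prefixed
model_/MODEL_ is hypothetical: the 4 KiB page size and the maximum
order of 9 are assumptions, and the two booleans stand in for
mapping_large_folio_support(mapping) and sbinfo->noswap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT		12u	/* assumption: 4 KiB pages */
#define MODEL_MAX_PAGECACHE_ORDER	9u	/* assumption for this model */

/* floor(log2(v)) for v > 0; stands in for the kernel's ilog2(). */
static unsigned int model_ilog2(size_t v)
{
	unsigned int r = 0;

	assert(v > 0);
	while (v >>= 1)
		r++;
	return r;
}

/* Index of the lowest set bit; stands in for __ffs(), undefined for 0. */
static unsigned int model_ffs(unsigned long v)
{
	unsigned int r = 0;

	assert(v != 0);
	while (!(v & 1)) {
		v >>= 1;
		r++;
	}
	return r;
}

/**
 * model_mapping_size_order - Get maximum folio order for the given size.
 * @large_folios: Whether the mapping supports large folios.
 * @noswap: Whether swap is disabled (the role sbinfo plays in the patch).
 * @index: The page index.
 * @size: The suggested size of the folio to create.
 *
 * Return: The order.
 */
static unsigned int model_mapping_size_order(bool large_folios, bool noswap,
					     unsigned long index, size_t size)
{
	unsigned int order = model_ilog2(size);

	if (order <= MODEL_PAGE_SHIFT || !large_folios || !noswap)
		return 0;

	order -= MODEL_PAGE_SHIFT;

	/* If the index is not aligned to the order, use a smaller folio. */
	if (index & ((1UL << order) - 1))
		order = model_ffs(index);

	if (order > MODEL_MAX_PAGECACHE_ORDER)
		order = MODEL_MAX_PAGECACHE_ORDER;

	/* Order-1 is not supported, as in the quoted function. */
	return (order == 1) ? 0 : order;
}
```

In the patch itself the equivalent fix would presumably be a single
"@sbinfo:" line in the existing comment block.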
* Re: [RFC PATCH 11/11] shmem: add per-block uptodate tracking
2023-10-28 21:15 ` [RFC PATCH 11/11] shmem: add per-block uptodate tracking Daniel Gomez
@ 2023-10-28 23:51 ` kernel test robot
2023-10-29 4:46 ` kernel test robot
1 sibling, 0 replies; 36+ messages in thread
From: kernel test robot @ 2023-10-28 23:51 UTC (permalink / raw)
To: Daniel Gomez; +Cc: oe-kbuild-all
Hi Daniel,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on next-20231027]
[cannot apply to linus/master v6.6-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Daniel-Gomez/XArray-add-cmpxchg-order-test/20231029-051730
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20231028211518.3424020-12-da.gomez%40samsung.com
patch subject: [RFC PATCH 11/11] shmem: add per-block uptodate tracking
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20231029/202310290754.qn6yKgu6-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231029/202310290754.qn6yKgu6-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310290754.qn6yKgu6-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> mm/shmem.c:2904:6: warning: no previous prototype for 'shmem_invalidate_folio' [-Wmissing-prototypes]
2904 | void shmem_invalidate_folio(struct folio *folio, size_t offset, size_t len)
| ^~~~~~~~~~~~~~~~~~~~~~
>> mm/shmem.c:2917:6: warning: no previous prototype for 'shmem_release_folio' [-Wmissing-prototypes]
2917 | bool shmem_release_folio(struct folio *folio, gfp_t gfp_flags)
| ^~~~~~~~~~~~~~~~~~~
>> mm/shmem.c:2930:6: warning: no previous prototype for 'shmem_is_partially_uptodate' [-Wmissing-prototypes]
2930 | bool shmem_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from <command-line>:
mm/shmem.c: In function 'shmem_alloc_and_add_folio.isra':
include/linux/compiler_types.h:425:45: error: call to '__compiletime_assert_360' declared with attribute error: BUILD_BUG failed
425 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:406:25: note: in definition of macro '__compiletime_assert'
406 | prefix ## suffix(); \
| ^~~~~~
include/linux/compiler_types.h:425:9: note: in expansion of macro '_compiletime_assert'
425 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
| ^~~~~~~~~~~~~~~~
include/linux/huge_mm.h:257:28: note: in expansion of macro 'BUILD_BUG'
257 | #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
| ^~~~~~~~~
include/linux/huge_mm.h:67:26: note: in expansion of macro 'HPAGE_PMD_SHIFT'
67 | #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
| ^~~~~~~~~~~~~~~
include/linux/huge_mm.h:68:26: note: in expansion of macro 'HPAGE_PMD_ORDER'
68 | #define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER)
| ^~~~~~~~~~~~~~~
mm/shmem.c:1774:25: note: in expansion of macro 'HPAGE_PMD_NR'
1774 | pages = HPAGE_PMD_NR;
| ^~~~~~~~~~~~
vim +/shmem_invalidate_folio +2904 mm/shmem.c
2903
> 2904 void shmem_invalidate_folio(struct folio *folio, size_t offset, size_t len)
2905 {
2906 /*
2907 * If we're invalidating the entire folio, clear the dirty state
2908 * from it and release it to avoid unnecessary buildup of the LRU.
2909 */
2910 if (offset == 0 && len == folio_size(folio)) {
2911 WARN_ON_ONCE(folio_test_writeback(folio));
2912 folio_cancel_dirty(folio);
2913 sfs_free(folio);
2914 }
2915 }
2916
> 2917 bool shmem_release_folio(struct folio *folio, gfp_t gfp_flags)
2918 {
2919 sfs_free(folio);
2920 return true;
2921 }
2922
2923 /*
2924 * shmem_is_partially_uptodate checks whether blocks within a folio are
2925 * uptodate or not.
2926 *
2927 * Returns true if all blocks which correspond to the specified part
2928 * of the folio are uptodate.
2929 */
> 2930 bool shmem_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
2931 {
2932 struct shmem_folio_state *sfs = folio->private;
2933 struct inode *inode = folio->mapping->host;
2934 unsigned first, last, i;
2935
2936 if (!sfs)
2937 return false;
2938
2939 /* Caller's range may extend past the end of this folio */
2940 count = min(folio_size(folio) - from, count);
2941
2942 /* First and last blocks in range within folio */
2943 first = from >> inode->i_blkbits;
2944 last = (from + count - 1) >> inode->i_blkbits;
2945
2946 for (i = first; i <= last; i++)
2947 if (!sfs_block_is_uptodate(sfs, i))
2948 return false;
2949 return true;
2950 }
2951
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
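-Wmissing-prototypes fires when a non-static function is defined with
no earlier declaration. The three hooks are only referenced from
shmem_aops in the same file, so giving them internal linkage is one way
to silence the warning. That is an assumption about the intended fix;
the report does not say. A minimal userspace sketch of the pattern,
with hypothetical model_* types in place of the kernel ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct folio. */
struct model_folio {
	size_t size;
	bool dirty;
	void *private_data;
};

/* Hypothetical stand-in for the address_space_operations hooks. */
struct model_aops {
	void (*invalidate_folio)(struct model_folio *folio, size_t offset,
				 size_t len);
	bool (*release_folio)(struct model_folio *folio);
};

/*
 * static gives the function internal linkage, so no prototype in a
 * header is needed and -Wmissing-prototypes has nothing to report.
 */
static void model_invalidate_folio(struct model_folio *folio, size_t offset,
				   size_t len)
{
	assert(offset + len <= folio->size);
	/* Only drop state when the whole folio is being invalidated. */
	if (offset == 0 && len == folio->size) {
		folio->dirty = false;
		folio->private_data = NULL;
	}
}

static bool model_release_folio(struct model_folio *folio)
{
	folio->private_data = NULL;
	return true;
}

/* The ops table is the only user, and it lives in the same file. */
static const struct model_aops model_ops = {
	.invalidate_folio = model_invalidate_folio,
	.release_folio = model_release_folio,
};
```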
* Re: [RFC PATCH 11/11] shmem: add per-block uptodate tracking
2023-10-28 21:15 ` [RFC PATCH 11/11] shmem: add per-block uptodate tracking Daniel Gomez
2023-10-28 23:51 ` kernel test robot
@ 2023-10-29 4:46 ` kernel test robot
1 sibling, 0 replies; 36+ messages in thread
From: kernel test robot @ 2023-10-29 4:46 UTC (permalink / raw)
To: Daniel Gomez; +Cc: oe-kbuild-all
Hi Daniel,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on next-20231027]
[cannot apply to linus/master v6.6-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Daniel-Gomez/XArray-add-cmpxchg-order-test/20231029-051730
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20231028211518.3424020-12-da.gomez%40samsung.com
patch subject: [RFC PATCH 11/11] shmem: add per-block uptodate tracking
config: powerpc64-randconfig-003-20231029 (https://download.01.org/0day-ci/archive/20231029/202310291218.fYLv1WWN-lkp@intel.com/config)
compiler: powerpc64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231029/202310291218.fYLv1WWN-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310291218.fYLv1WWN-lkp@intel.com/
All error/warnings (new ones prefixed by >>):
>> mm/shmem.c:4673:29: error: 'shmem_invalidate_folio' undeclared here (not in a function); did you mean 'shmem_replace_folio'?
4673 | .invalidate_folio = shmem_invalidate_folio,
| ^~~~~~~~~~~~~~~~~~~~~~
| shmem_replace_folio
>> mm/shmem.c:4674:27: error: 'shmem_release_folio' undeclared here (not in a function); did you mean 'shmem_replace_folio'?
4674 | .release_folio = shmem_release_folio,
| ^~~~~~~~~~~~~~~~~~~
| shmem_replace_folio
>> mm/shmem.c:4675:34: error: 'shmem_is_partially_uptodate' undeclared here (not in a function); did you mean 'shmem_set_range_uptodate'?
4675 | .is_partially_uptodate = shmem_is_partially_uptodate,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
| shmem_set_range_uptodate
>> mm/shmem.c:212:13: warning: 'sfs_free' defined but not used [-Wunused-function]
212 | static void sfs_free(struct folio *folio)
| ^~~~~~~~
vim +4673 mm/shmem.c
4662
4663 const struct address_space_operations shmem_aops = {
4664 .writepage = shmem_writepage,
4665 .dirty_folio = noop_dirty_folio,
4666 #ifdef CONFIG_TMPFS
4667 .write_begin = shmem_write_begin,
4668 .write_end = shmem_write_end,
4669 #endif
4670 #ifdef CONFIG_MIGRATION
4671 .migrate_folio = migrate_folio,
4672 #endif
> 4673 .invalidate_folio = shmem_invalidate_folio,
> 4674 .release_folio = shmem_release_folio,
> 4675 .is_partially_uptodate = shmem_is_partially_uptodate,
4676 .error_remove_page = shmem_error_remove_page,
4677 };
4678 EXPORT_SYMBOL(shmem_aops);
4679
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
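The quoted initializer guards .write_begin and .write_end with
CONFIG_TMPFS but references the three new hooks unconditionally. If the
hook definitions sit inside a CONFIG_TMPFS region (an assumption,
suggested by their position right after shmem_write_end() in the patch
and by the accompanying "sfs_free defined but not used" warning), a
randconfig without CONFIG_TMPFS would produce exactly these errors. The
sketch below shows the general rule with hypothetical names: whatever
an initializer references unconditionally must also be defined
unconditionally, or both sides need the same guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical switch standing in for CONFIG_TMPFS. It is left
 * undefined here to model the failing randconfig.
 */
/* #define MODEL_CONFIG_TMPFS 1 */

struct model_ops {
	int (*write_end)(int copied);
	bool (*release)(void);
};

#ifdef MODEL_CONFIG_TMPFS
/* Only built, and only referenced, when the option is enabled. */
static int model_write_end(int copied)
{
	return copied;
}
#endif

/* Defined outside the guard because the table uses it unconditionally. */
static bool model_release(void)
{
	return true;
}

static const struct model_ops model_ops_table = {
#ifdef MODEL_CONFIG_TMPFS
	.write_end = model_write_end,
#endif
	.release = model_release,
};

/* Hooks left out of the initializer are NULL, so check before calling. */
static bool model_call_release(const struct model_ops *ops)
{
	assert(ops->release != NULL);
	return ops->release();
}
```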
* Re: [RFC PATCH 01/11] XArray: add cmpxchg order test
2023-10-28 21:15 ` [RFC PATCH 01/11] XArray: add cmpxchg order test Daniel Gomez
@ 2023-10-29 20:11 ` Matthew Wilcox
2023-11-03 23:12 ` Daniel Gomez
0 siblings, 1 reply; 36+ messages in thread
From: Matthew Wilcox @ 2023-10-29 20:11 UTC (permalink / raw)
To: Daniel Gomez
Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
gost.dev, Pankaj Raghav
On Sat, Oct 28, 2023 at 09:15:35PM +0000, Daniel Gomez wrote:
> +static noinline void check_cmpxchg_order(struct xarray *xa)
> +{
> + void *FIVE = xa_mk_value(5);
> + unsigned int order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 15 : 1;
... have you tried this with CONFIG_XARRAY_MULTI deselected?
I suspect it will BUG() because orders greater than 0 are not allowed.
> + XA_BUG_ON(xa, !xa_empty(xa));
> + XA_BUG_ON(xa, xa_store_index(xa, 5, GFP_KERNEL) != NULL);
> + XA_BUG_ON(xa, xa_insert(xa, 5, FIVE, GFP_KERNEL) != -EBUSY);
> + XA_BUG_ON(xa, xa_store_order(xa, 5, order, FIVE, GFP_KERNEL));
> + XA_BUG_ON(xa, xa_get_order(xa, 5) != order);
> + XA_BUG_ON(xa, xa_get_order(xa, xa_to_value(FIVE)) != order);
> + old = xa_cmpxchg(xa, 5, FIVE, NULL, GFP_KERNEL);
> + XA_BUG_ON(xa, old != FIVE);
> + XA_BUG_ON(xa, xa_get_order(xa, 5) != 0);
> + XA_BUG_ON(xa, xa_get_order(xa, xa_to_value(FIVE)) != 0);
> + XA_BUG_ON(xa, xa_get_order(xa, xa_to_value(old)) != 0);
> + XA_BUG_ON(xa, !xa_empty(xa));
I'm not sure this is a great test. It definitely does do what you claim
it will, but for example, it's possible that we might keep that
information for other orders. So maybe we should have another entry at
(1 << order) that keeps the node around and could theoretically keep
the order information around for the now-NULL entry?
* Re: [RFC PATCH 05/11] shmem: account for large order folios
2023-10-28 21:15 ` [RFC PATCH 05/11] shmem: account for large order folios Daniel Gomez
@ 2023-10-29 20:40 ` Matthew Wilcox
0 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2023-10-29 20:40 UTC (permalink / raw)
To: Daniel Gomez
Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
gost.dev, Pankaj Raghav
On Sat, Oct 28, 2023 at 09:15:42PM +0000, Daniel Gomez wrote:
> @@ -856,16 +856,16 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
> pgoff_t start, pgoff_t end)
> {
> XA_STATE(xas, &mapping->i_pages, start);
> - struct page *page;
> + struct folio *folio;
> unsigned long swapped = 0;
> unsigned long max = end - 1;
>
> rcu_read_lock();
> - xas_for_each(&xas, page, max) {
> - if (xas_retry(&xas, page))
> + xas_for_each(&xas, folio, max) {
> + if (xas_retry(&xas, folio))
> continue;
> - if (xa_is_value(page))
> - swapped++;
> + if (xa_is_value(folio))
> + swapped += folio_nr_pages(folio);
... you can't call folio_nr_pages() if xa_is_value().
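The point is that a value entry is a tagged integer (here, a swap
entry), not a pointer to a struct folio, so folio_nr_pages() would
dereference something that is not a folio. A minimal userspace sketch,
assuming the value-entry encoding from include/linux/xarray.h (bit 0
set). The model_slot array and its 'order' field are hypothetical; in
the kernel the size of a swapped-out range would have to come from the
XArray itself, for example via xa_get_order(), which is a suggestion
and not something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same encoding as the XArray value entries: the low bit is the tag. */
static void *model_xa_mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1);
}

static bool model_xa_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

/* Hypothetical stand-in for struct folio. */
struct model_folio {
	unsigned long nr_pages;
};

/* One slot of a hypothetical flattened page cache. */
struct model_slot {
	void *entry;		/* folio pointer or value (swap) entry */
	unsigned int order;	/* order the entry was stored with */
};

/* Count swapped pages without ever treating a value entry as a folio. */
static unsigned long model_partial_swap_usage(const struct model_slot *slots,
					      size_t n)
{
	unsigned long swapped = 0;
	size_t i;

	assert(slots != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (model_xa_is_value(slots[i].entry))
			/* Not a pointer: take the size from the slot order. */
			swapped += 1UL << slots[i].order;
	}
	return swapped;
}
```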
* Re: [RFC PATCH 00/11] shmem: high order folios support in write path
2023-10-28 21:15 ` [RFC PATCH 00/11] shmem: high order folios support in " Daniel Gomez
` (10 preceding siblings ...)
[not found] ` <CGME20231028211553eucas1p1a93637df6c46692531894e26023920d5@eucas1p1.samsung.com>
@ 2023-10-29 20:43 ` Matthew Wilcox
11 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2023-10-29 20:43 UTC (permalink / raw)
To: Daniel Gomez
Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
gost.dev, Pankaj Raghav
On Sat, Oct 28, 2023 at 09:15:34PM +0000, Daniel Gomez wrote:
> This series tries to add support for high order folios in shmem write and
> fallocate paths when swap is disabled (noswap option). This is part of the
> Large Block Size (LBS) effort [1][2] and a continuation of the shmem work from
> Luis here [3] following Matthew Wilcox's suggestion [4] regarding the path to
> take for the folio allocation order calculation.
I don't see how this is part of the LBS effort. shmem doesn't use a
block device. swap might, but that's a separate problem, as you've
pointed out.
* Re: [RFC PATCH 06/11] shmem: trace shmem_add_to_page_cache folio order
2023-10-28 21:15 ` [RFC PATCH 06/11] shmem: trace shmem_add_to_page_cache folio order Daniel Gomez
@ 2023-10-29 23:14 ` Matthew Wilcox
0 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2023-10-29 23:14 UTC (permalink / raw)
To: Daniel Gomez
Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
gost.dev, Pankaj Raghav
On Sat, Oct 28, 2023 at 09:15:44PM +0000, Daniel Gomez wrote:
> To be able to trace and account for order of the folio.
>
> Based on include/trace/filemap.h.
Why is this better than using trace_mm_filemap_add_to_page_cache()?
It's basically the same thing.
* Re: [RFC PATCH 07/11] shmem: remove huge arg from shmem_alloc_and_add_folio()
2023-10-28 21:15 ` [RFC PATCH 07/11] shmem: remove huge arg from shmem_alloc_and_add_folio() Daniel Gomez
@ 2023-10-29 23:17 ` Matthew Wilcox
0 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2023-10-29 23:17 UTC (permalink / raw)
To: Daniel Gomez
Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
gost.dev, Pankaj Raghav
On Sat, Oct 28, 2023 at 09:15:45PM +0000, Daniel Gomez wrote:
> The huge flag is already part of the memory allocation flag (gfp_t).
> Make use of the VM_HUGEPAGE bit set by vma_thp_gfp_mask() to know if
> the allocation must be a huge page.
... what?
> + if (gfp & VM_HUGEPAGE) {
Does sparse not complain about this? VM_HUGEPAGE is never part of
the GFP flags and there's supposed to be annotations that make the
various checkers warn.
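The objection is about flag namespaces: VM_HUGEPAGE is a VMA flag, so
testing it against a gfp mask checks an unrelated bit. A minimal
userspace sketch of why that goes wrong, and of keeping the two
namespaces apart with distinct types. All names and bit values here are
hypothetical and chosen to collide on purpose; the kernel relies on
sparse annotations rather than wrapper structs.

```c
#include <assert.h>
#include <stdbool.h>

/* Distinct wrapper types so the compiler rejects mixing namespaces. */
typedef struct { unsigned int bits; } model_gfp_t;
typedef struct { unsigned long bits; } model_vm_flags_t;

/* Hypothetical values that happen to share the same bit position. */
#define MODEL_GFP_EXAMPLE	((model_gfp_t){ 0x20000000u })
#define MODEL_VM_HUGEPAGE	((model_vm_flags_t){ 0x20000000ul })

static bool model_gfp_has(model_gfp_t gfp, model_gfp_t flag)
{
	return (gfp.bits & flag.bits) != 0;
}

static bool model_vm_has(model_vm_flags_t vm, model_vm_flags_t flag)
{
	return (vm.bits & flag.bits) != 0;
}

/*
 * Take the huge-page decision from the VMA flags, where the hint lives
 * in this model, and never from the gfp mask.
 */
static unsigned int model_alloc_order(model_vm_flags_t vm,
				      unsigned int huge_order)
{
	assert(huge_order > 0);
	return model_vm_has(vm, MODEL_VM_HUGEPAGE) ? huge_order : 0;
}
```

With the wrapper types, model_gfp_has(gfp, MODEL_VM_HUGEPAGE) does not
compile, whereas the raw "gfp.bits & MODEL_VM_HUGEPAGE.bits" test
compiles and reports a flag that was never set.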
* Re: [RFC PATCH 10/11] shmem: add large folio support to the write path
2023-10-28 21:15 ` [RFC PATCH 10/11] shmem: add large folio support to the write path Daniel Gomez
2023-10-28 23:51 ` kernel test robot
@ 2023-10-29 23:32 ` Matthew Wilcox
1 sibling, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2023-10-29 23:32 UTC (permalink / raw)
To: Daniel Gomez
Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
gost.dev, Pankaj Raghav
On Sat, Oct 28, 2023 at 09:15:50PM +0000, Daniel Gomez wrote:
> +++ b/mm/shmem.c
> @@ -1621,6 +1621,9 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, struct shmem_inode_info *info,
> pgoff_t ilx;
> struct page *page;
>
> + if ((order != 0) && !(gfp & VM_HUGEPAGE))
> + gfp |= __GFP_COMP;
This is silly. Just set it unconditionally.
> +static inline unsigned int
> +shmem_mapping_size_order(struct address_space *mapping, pgoff_t index,
> + size_t size, struct shmem_sb_info *sbinfo)
> +{
> + unsigned int order = ilog2(size);
> +
> + if ((order <= PAGE_SHIFT) ||
> + (!mapping_large_folio_support(mapping) || !sbinfo->noswap))
> + return 0;
> +
> + order -= PAGE_SHIFT;
You know we have get_order(), right?
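One detail worth keeping in mind when comparing the two: get_order()
rounds up to the smallest order that covers the size, while
ilog2(size) - PAGE_SHIFT rounds down. A minimal userspace sketch of
both, assuming 4 KiB pages; the model_* helpers are hypothetical
reimplementations written for illustration, not the kernel macros.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT	12u	/* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE		((size_t)1 << MODEL_PAGE_SHIFT)

/* Round down: what ilog2(size) - PAGE_SHIFT yields in the quoted hunk. */
static unsigned int model_order_floor(size_t size)
{
	unsigned int log = 0;

	assert(size >= MODEL_PAGE_SIZE);
	while (size >>= 1)
		log++;
	return log - MODEL_PAGE_SHIFT;
}

/* Round up: smallest order whose folio covers size bytes. */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((MODEL_PAGE_SIZE << order) < size)
		order++;
	return order;
}
```

The two agree for power-of-two sizes and differ otherwise: a 12 KiB
request gives order 1 when rounding down and order 2 when rounding up,
so switching to get_order() would change the order chosen for such
sizes.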
* Re: [RFC PATCH 09/11] shmem: add order arg to shmem_alloc_folio()
2023-10-28 21:15 ` [RFC PATCH 09/11] shmem: add order arg to shmem_alloc_folio() Daniel Gomez
@ 2023-10-31 7:04 ` Hannes Reinecke
0 siblings, 0 replies; 36+ messages in thread
From: Hannes Reinecke @ 2023-10-31 7:04 UTC (permalink / raw)
To: Daniel Gomez, minchan, senozhatsky, axboe, djwong, willy, hughd,
akpm, mcgrof, linux-kernel, linux-block, linux-xfs,
linux-fsdevel, linux-mm
Cc: gost.dev, Pankaj Raghav
On 10/28/23 23:15, Daniel Gomez wrote:
> Add folio order argument to the shmem_alloc_folio() and merge it with
> the shmem_alloc_folio_huge(). Return will make use of the new
> page_rmappable_folio() where order-0 and high order folios are
> both supported.
>
> Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> ---
> mm/shmem.c | 33 ++++++++++-----------------------
> 1 file changed, 10 insertions(+), 23 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d8dc2ceaba18..fc7605da4316 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1614,40 +1614,27 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
> return result;
> }
>
> -static struct folio *shmem_alloc_hugefolio(gfp_t gfp,
> - struct shmem_inode_info *info, pgoff_t index)
> +static struct folio *shmem_alloc_folio(gfp_t gfp, struct shmem_inode_info *info,
> + pgoff_t index, unsigned int order)
> {
> struct mempolicy *mpol;
> pgoff_t ilx;
> struct page *page;
>
> - mpol = shmem_get_pgoff_policy(info, index, HPAGE_PMD_ORDER, &ilx);
> - page = alloc_pages_mpol(gfp, HPAGE_PMD_ORDER, mpol, ilx, numa_node_id());
> + mpol = shmem_get_pgoff_policy(info, index, order, &ilx);
> + page = alloc_pages_mpol(gfp, order, mpol, ilx, numa_node_id());
> mpol_cond_put(mpol);
>
> return page_rmappable_folio(page);
> }
>
> -static struct folio *shmem_alloc_folio(gfp_t gfp,
> - struct shmem_inode_info *info, pgoff_t index)
> -{
> - struct mempolicy *mpol;
> - pgoff_t ilx;
> - struct page *page;
> -
> - mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
> - page = alloc_pages_mpol(gfp, 0, mpol, ilx, numa_node_id());
> - mpol_cond_put(mpol);
> -
> - return (struct folio *)page;
> -}
> -
> static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
> struct inode *inode, pgoff_t index,
> struct mm_struct *fault_mm, size_t len)
> {
> struct address_space *mapping = inode->i_mapping;
> struct shmem_inode_info *info = SHMEM_I(inode);
> + unsigned int order = 0;
> struct folio *folio;
> long pages;
> int error;
> @@ -1668,12 +1655,12 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
> index + HPAGE_PMD_NR - 1, XA_PRESENT))
> return ERR_PTR(-E2BIG);
>
> - folio = shmem_alloc_hugefolio(gfp, info, index);
> + folio = shmem_alloc_folio(gfp, info, index, HPAGE_PMD_ORDER);
> if (!folio)
> count_vm_event(THP_FILE_FALLBACK);
> } else {
> - pages = 1;
> - folio = shmem_alloc_folio(gfp, info, index);
> + pages = 1UL << order;
> + folio = shmem_alloc_folio(gfp, info, index, order);
> }
> if (!folio)
> return ERR_PTR(-ENOMEM);
> @@ -1774,7 +1761,7 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
> */
> gfp &= ~GFP_CONSTRAINT_MASK;
> VM_BUG_ON_FOLIO(folio_test_large(old), old);
> - new = shmem_alloc_folio(gfp, info, index);
> + new = shmem_alloc_folio(gfp, info, index, 0);
Shouldn't you use folio_order(old) here?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman
* Re: [RFC PATCH 01/11] XArray: add cmpxchg order test
2023-10-29 20:11 ` Matthew Wilcox
@ 2023-11-03 23:12 ` Daniel Gomez
0 siblings, 0 replies; 36+ messages in thread
From: Daniel Gomez @ 2023-11-03 23:12 UTC (permalink / raw)
To: Matthew Wilcox
Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
gost.dev, Pankaj Raghav
On Sun, Oct 29, 2023 at 08:11:32PM +0000, Matthew Wilcox wrote:
> On Sat, Oct 28, 2023 at 09:15:35PM +0000, Daniel Gomez wrote:
> > +static noinline void check_cmpxchg_order(struct xarray *xa)
> > +{
> > + void *FIVE = xa_mk_value(5);
> > + unsigned int order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 15 : 1;
>
> ... have you tried this with CONFIG_XARRAY_MULTI deselected?
> I suspect it will BUG() because orders greater than 0 are not allowed.
>
> > + XA_BUG_ON(xa, !xa_empty(xa));
> > + XA_BUG_ON(xa, xa_store_index(xa, 5, GFP_KERNEL) != NULL);
> > + XA_BUG_ON(xa, xa_insert(xa, 5, FIVE, GFP_KERNEL) != -EBUSY);
> > + XA_BUG_ON(xa, xa_store_order(xa, 5, order, FIVE, GFP_KERNEL));
> > + XA_BUG_ON(xa, xa_get_order(xa, 5) != order);
> > + XA_BUG_ON(xa, xa_get_order(xa, xa_to_value(FIVE)) != order);
> > + old = xa_cmpxchg(xa, 5, FIVE, NULL, GFP_KERNEL);
> > + XA_BUG_ON(xa, old != FIVE);
> > + XA_BUG_ON(xa, xa_get_order(xa, 5) != 0);
> > + XA_BUG_ON(xa, xa_get_order(xa, xa_to_value(FIVE)) != 0);
> > + XA_BUG_ON(xa, xa_get_order(xa, xa_to_value(old)) != 0);
> > + XA_BUG_ON(xa, !xa_empty(xa));
>
> I'm not sure this is a great test. It definitely does do what you claim
> it will, but for example, it's possible that we might keep that
> information for other orders. So maybe we should have another entry at
> (1 << order) that keeps the node around and could theoretically keep
> the order information around for the now-NULL entry?
Thanks Matthew for the review. I'm sending a separate patch with the
fixes and improvements on the XArray cmpxchg test.
end of thread, other threads:[~2023-11-03 23:13 UTC | newest]
Thread overview: 36+ messages
2023-09-19 13:55 ` [PATCH v2 0/6] shmem: high order folios support in write path Daniel Gomez
2023-09-19 13:55 ` [PATCH v2 1/6] shmem: drop BLOCKS_PER_PAGE macro Daniel Gomez
2023-09-19 13:55 ` [PATCH v2 2/6] shmem: return freed pages in shmem_free_swap Daniel Gomez
2023-09-19 14:56 ` Matthew Wilcox
2023-09-19 13:55 ` [PATCH v2 3/6] shmem: account for large order folios Daniel Gomez
2023-09-19 13:55 ` [PATCH v2 4/6] shmem: add order parameter support to shmem_alloc_folio Daniel Gomez
2023-09-19 13:55 ` [PATCH v2 5/6] shmem: add file length in shmem_get_folio path Daniel Gomez
2023-09-20 18:03 ` kernel test robot
2023-09-19 13:55 ` [PATCH v2 6/6] shmem: add large folios support to the write path Daniel Gomez
2023-09-19 15:01 ` Matthew Wilcox
2023-09-19 16:28 ` Daniel Gomez
2023-09-20 17:41 ` kernel test robot
2023-09-25 20:39 ` kernel test robot
2023-10-28 21:15 ` [RFC PATCH 00/11] shmem: high order folios support in " Daniel Gomez
2023-10-28 21:15 ` [RFC PATCH 01/11] XArray: add cmpxchg order test Daniel Gomez
2023-10-29 20:11 ` Matthew Wilcox
2023-11-03 23:12 ` Daniel Gomez
2023-10-28 21:15 ` [RFC PATCH 02/11] test_xarray: add tests for advanced multi-index use Daniel Gomez
2023-10-28 21:15 ` [RFC PATCH 03/11] shmem: drop BLOCKS_PER_PAGE macro Daniel Gomez
2023-10-28 21:15 ` [RFC PATCH 04/11] shmem: return number of pages beeing freed in shmem_free_swap Daniel Gomez
2023-10-28 21:15 ` [RFC PATCH 05/11] shmem: account for large order folios Daniel Gomez
2023-10-29 20:40 ` Matthew Wilcox
2023-10-28 21:15 ` [RFC PATCH 06/11] shmem: trace shmem_add_to_page_cache folio order Daniel Gomez
2023-10-29 23:14 ` Matthew Wilcox
2023-10-28 21:15 ` [RFC PATCH 07/11] shmem: remove huge arg from shmem_alloc_and_add_folio() Daniel Gomez
2023-10-29 23:17 ` Matthew Wilcox
2023-10-28 21:15 ` [RFC PATCH 08/11] shmem: add file length arg in shmem_get_folio() path Daniel Gomez
2023-10-28 21:15 ` [RFC PATCH 09/11] shmem: add order arg to shmem_alloc_folio() Daniel Gomez
2023-10-31 7:04 ` Hannes Reinecke
2023-10-28 21:15 ` [RFC PATCH 10/11] shmem: add large folio support to the write path Daniel Gomez
2023-10-28 23:51 ` kernel test robot
2023-10-29 23:32 ` Matthew Wilcox
2023-10-28 21:15 ` [RFC PATCH 11/11] shmem: add per-block uptodate tracking Daniel Gomez
2023-10-28 23:51 ` kernel test robot
2023-10-29 4:46 ` kernel test robot
2023-10-29 20:43 ` [RFC PATCH 00/11] shmem: high order folios support in write path Matthew Wilcox