We actually use nrexceptional for very little these days. It's a minor
pain to keep in sync with nrpages, but the pain becomes much bigger
with the THP patches because we don't know how many indices a shadow
entry occupies. It's easier to just remove it than keep it accurate.

Also, we save 8 bytes per inode which is nothing to sneeze at; on my
laptop, it would improve shmem_inode_cache from 22 to 23 objects per
16kB, and inode_cache from 26 to 27 objects. Combined, that saves
a megabyte of memory from a combined usage of 25MB for both caches.
Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save
any memory for ext4.

Matthew Wilcox (Oracle) (4):
  mm: Introduce and use mapping_empty
  mm: Stop accounting shadow entries
  dax: Account DAX entries as nrpages
  mm: Remove nrexceptional from inode

 fs/block_dev.c          |  2 +-
 fs/dax.c                |  8 ++++----
 fs/gfs2/glock.c         |  3 +--
 fs/inode.c              |  2 +-
 include/linux/fs.h      |  2 --
 include/linux/pagemap.h |  5 +++++
 mm/filemap.c            | 16 ----------------
 mm/swap_state.c         |  4 ----
 mm/truncate.c           | 19 +++----------------
 mm/workingset.c         |  1 -
 10 files changed, 15 insertions(+), 47 deletions(-)

-- 
2.28.0
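
The "objects per 16kB" figures in the cover letter come down to integer division of the slab size by the object size. Here is a minimal userspace sketch of that arithmetic. The object sizes are hypothetical, picked only so that the counts match the quoted 22 to 23 and 26 to 27. Real sizes depend on the kernel configuration, and the real slab allocator also accounts for alignment and metadata.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of objects of obj_size bytes that fit in one slab of
 * slab_size bytes.  Simplified: ignores per-slab metadata,
 * alignment and debug padding.
 */
static size_t objs_per_slab(size_t slab_size, size_t obj_size)
{
	assert(obj_size > 0);
	return slab_size / obj_size;
}

/*
 * HYPOTHETICAL sizes, chosen only to reproduce the counts quoted in
 * the cover letter; they are not taken from any real kernel build.
 */
enum {
	SLAB_BYTES        = 16 * 1024,
	SHMEM_INODE_BYTES = 720,
	INODE_BYTES       = 608,
	SAVED_BYTES       = 8,	/* one unsigned long on a 64-bit build */
};
```

With these assumed sizes, `objs_per_slab(SLAB_BYTES, INODE_BYTES)` is 26 and `objs_per_slab(SLAB_BYTES, INODE_BYTES - SAVED_BYTES)` is 27. An object whose size is not within 8 bytes of such a boundary gains nothing, which is the ext4 case described above.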
Instead of checking the two counters (nrpages and nrexceptional), we
can just check whether i_pages is empty.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
---
 fs/block_dev.c          |  2 +-
 fs/dax.c                |  2 +-
 fs/gfs2/glock.c         |  3 +--
 include/linux/pagemap.h |  5 +++++
 mm/truncate.c           | 18 +++---------------
 5 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9e84b1928b94..34105f66e12f 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -79,7 +79,7 @@ static void kill_bdev(struct block_device *bdev)
 {
 	struct address_space *mapping = bdev->bd_inode->i_mapping;
 
-	if (mapping->nrpages == 0 && mapping->nrexceptional == 0)
+	if (mapping_empty(mapping))
 		return;
 
 	invalidate_bh_lrus();
diff --git a/fs/dax.c b/fs/dax.c
index 5b47834f2e1b..53ed0ab8c958 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -965,7 +965,7 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 	if (WARN_ON_ONCE(inode->i_blkbits != PAGE_SHIFT))
 		return -EIO;
 
-	if (!mapping->nrexceptional || wbc->sync_mode != WB_SYNC_ALL)
+	if (mapping_empty(mapping) || wbc->sync_mode != WB_SYNC_ALL)
 		return 0;
 
 	trace_dax_writeback_range(inode, xas.xa_index, end_index);
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 5441c17562c5..bfad01ce096d 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -273,8 +273,7 @@ static void __gfs2_glock_put(struct gfs2_glock *gl)
 	if (mapping) {
 		truncate_inode_pages_final(mapping);
 		if (!gfs2_withdrawn(sdp))
-			GLOCK_BUG_ON(gl, mapping->nrpages ||
-				     mapping->nrexceptional);
+			GLOCK_BUG_ON(gl, !mapping_empty(mapping));
 	}
 	trace_gfs2_glock_put(gl);
 	sdp->sd_lockstruct.ls_ops->lm_put_lock(gl);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index dc3390e6ee3e..86143d36d028 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -18,6 +18,11 @@
 
 struct pagevec;
 
+static inline bool mapping_empty(struct address_space *mapping)
+{
+	return xa_empty(&mapping->i_pages);
+}
+
 /*
  * Bits in mapping->flags.
  */
diff --git a/mm/truncate.c b/mm/truncate.c
index 11ef90d7e3af..58524aaf67e2 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -342,7 +342,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	struct page *	page;
 	bool		partial_end;
 
-	if (mapping->nrpages == 0 && mapping->nrexceptional == 0)
+	if (mapping_empty(mapping))
 		goto out;
 
 	/*
@@ -470,9 +470,6 @@ EXPORT_SYMBOL(truncate_inode_pages);
  */
 void truncate_inode_pages_final(struct address_space *mapping)
 {
-	unsigned long nrexceptional;
-	unsigned long nrpages;
-
 	/*
 	 * Page reclaim can not participate in regular inode lifetime
 	 * management (can't call iput()) and thus can race with the
@@ -482,16 +479,7 @@ void truncate_inode_pages_final(struct address_space *mapping)
 	 */
 	mapping_set_exiting(mapping);
 
-	/*
-	 * When reclaim installs eviction entries, it increases
-	 * nrexceptional first, then decreases nrpages. Make sure we see
-	 * this in the right order or we might miss an entry.
-	 */
-	nrpages = mapping->nrpages;
-	smp_rmb();
-	nrexceptional = mapping->nrexceptional;
-
-	if (nrpages || nrexceptional) {
+	if (!mapping_empty(mapping)) {
 		/*
 		 * As truncation uses a lockless tree lookup, cycle
 		 * the tree lock to make sure any ongoing tree
@@ -657,7 +645,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 	int ret2 = 0;
 	int did_range_unmap = 0;
 
-	if (mapping->nrpages == 0 && mapping->nrexceptional == 0)
+	if (mapping_empty(mapping))
 		goto out;
 
 	pagevec_init(&pvec);
-- 
2.28.0
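
To illustrate why one emptiness check can stand in for the two counters, here is a toy userspace model. It is a sketch, not kernel code: `struct toy_mapping`, `toy_add_page()`, `toy_evict()` and `toy_mapping_empty()` are all hypothetical names, and a fixed array stands in for the XArray. The real `xa_empty()` only tests the XArray head pointer, whereas this model scans its slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Toy stand-in for struct address_space; an array replaces i_pages. */
struct toy_mapping {
	void *slots[TOY_SLOTS];	/* page pointers or shadow entries */
	unsigned long nrpages;	/* counts real pages only */
};

/* True if no slot holds anything, page or shadow entry. */
static bool toy_mapping_empty(const struct toy_mapping *m)
{
	for (size_t i = 0; i < TOY_SLOTS; i++)
		if (m->slots[i])
			return false;
	return true;
}

/* Insert a page at index and account for it. */
static void toy_add_page(struct toy_mapping *m, size_t index, void *page)
{
	assert(index < TOY_SLOTS && page && !m->slots[index]);
	m->slots[index] = page;
	m->nrpages++;
}

/* Replace a page with a shadow entry, roughly as reclaim does. */
static void toy_evict(struct toy_mapping *m, size_t index, void *shadow)
{
	assert(index < TOY_SLOTS && shadow && m->slots[index]);
	m->slots[index] = shadow;
	m->nrpages--;
}
```

After `toy_evict()`, `nrpages` is back to zero but `toy_mapping_empty()` still returns false, so a caller that asks the container directly cannot miss the shadow entry and needs no second counter.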
We no longer need to keep track of how many shadow entries are
present in a mapping. This saves a few writes to the inode and
memory barriers.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
---
 mm/filemap.c    | 13 -------------
 mm/swap_state.c |  4 ----
 mm/truncate.c   |  1 -
 mm/workingset.c |  1 -
 4 files changed, 19 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index bd116f63263e..2e68116be4b0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -140,17 +140,6 @@ static void page_cache_delete(struct address_space *mapping,
 
 	page->mapping = NULL;
 	/* Leave page->index set: truncation lookup relies upon it */
-
-	if (shadow) {
-		mapping->nrexceptional += nr;
-		/*
-		 * Make sure the nrexceptional update is committed before
-		 * the nrpages update so that final truncate racing
-		 * with reclaim does not see both counters 0 at the
-		 * same time and miss a shadow entry.
-		 */
-		smp_wmb();
-	}
 	mapping->nrpages -= nr;
 }
 
@@ -883,8 +872,6 @@ noinline int __add_to_page_cache_locked(struct page *page,
 		if (xas_error(&xas))
 			goto unlock;
 
-		if (old)
-			mapping->nrexceptional--;
 		mapping->nrpages++;
 
 		/* hugetlb pages do not participate in page cache accounting */
diff --git a/mm/swap_state.c b/mm/swap_state.c
index ee465827420e..85aca8d63aeb 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -160,7 +160,6 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry,
 			xas_store(&xas, page);
 			xas_next(&xas);
 		}
-		address_space->nrexceptional -= nr_shadows;
 		address_space->nrpages += nr;
 		__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
 		ADD_CACHE_INFO(add_total, nr);
@@ -199,8 +198,6 @@ void __delete_from_swap_cache(struct page *page,
 		xas_next(&xas);
 	}
 	ClearPageSwapCache(page);
-	if (shadow)
-		address_space->nrexceptional += nr;
 	address_space->nrpages -= nr;
 	__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
 	ADD_CACHE_INFO(del_total, nr);
@@ -301,7 +298,6 @@ void clear_shadow_from_swap_cache(int type, unsigned long begin,
 			xas_store(&xas, NULL);
 			nr_shadows++;
 		}
-		address_space->nrexceptional -= nr_shadows;
 		xa_unlock_irq(&address_space->i_pages);
 
 		/* search the next swapcache until we meet end */
diff --git a/mm/truncate.c b/mm/truncate.c
index 58524aaf67e2..27cf411ae51f 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -40,7 +40,6 @@ static inline void __clear_shadow_entry(struct address_space *mapping,
 	if (xas_load(&xas) != entry)
 		return;
 	xas_store(&xas, NULL);
-	mapping->nrexceptional--;
 }
 
 static void clear_shadow_entry(struct address_space *mapping, pgoff_t index,
diff --git a/mm/workingset.c b/mm/workingset.c
index 975a4d2dd02e..74d5f460e446 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -557,7 +557,6 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 		goto out_invalid;
 	if (WARN_ON_ONCE(node->count != node->nr_values))
 		goto out_invalid;
-	mapping->nrexceptional -= node->nr_values;
 	xa_delete_node(node, workingset_update_node);
 	__inc_lruvec_slab_state(node, WORKINGSET_NODERECLAIM);
 
-- 
2.28.0
Simplify mapping_needs_writeback() by accounting DAX entries as
pages instead of exceptional entries.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
---
 fs/dax.c     | 6 +++---
 mm/filemap.c | 3 ---
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 53ed0ab8c958..a20f2342a9e4 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -525,7 +525,7 @@ static void *grab_mapping_entry(struct xa_state *xas,
 		dax_disassociate_entry(entry, mapping, false);
 		xas_store(xas, NULL);	/* undo the PMD join */
 		dax_wake_entry(xas, entry, true);
-		mapping->nrexceptional--;
+		mapping->nrpages -= PG_PMD_NR;
 		entry = NULL;
 		xas_set(xas, index);
 	}
@@ -541,7 +541,7 @@ static void *grab_mapping_entry(struct xa_state *xas,
 		dax_lock_entry(xas, entry);
 		if (xas_error(xas))
 			goto out_unlock;
-		mapping->nrexceptional++;
+		mapping->nrpages += 1UL << order;
 	}
 
 out_unlock:
@@ -661,7 +661,7 @@ static int __dax_invalidate_entry(struct address_space *mapping,
 		goto out;
 	dax_disassociate_entry(entry, mapping, trunc);
 	xas_store(&xas, NULL);
-	mapping->nrexceptional--;
+	mapping->nrpages -= 1UL << dax_entry_order(entry);
 	ret = 1;
 out:
 	put_unlocked_entry(&xas, entry);
diff --git a/mm/filemap.c b/mm/filemap.c
index 2e68116be4b0..2214a2c48dd1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -616,9 +616,6 @@ EXPORT_SYMBOL(filemap_fdatawait_keep_errors);
 /* Returns true if writeback might be needed or already in progress. */
 static bool mapping_needs_writeback(struct address_space *mapping)
 {
-	if (dax_mapping(mapping))
-		return mapping->nrexceptional;
-
 	return mapping->nrpages;
 }
 
-- 
2.28.0
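
The change above makes each DAX entry count as `1UL << order` pages rather than as one exceptional entry. Here is a minimal userspace sketch of that bookkeeping, assuming x86-64 with 4 KiB pages, so a PMD entry has order 9 and covers 512 pages. The `TOY_*` constants and both helpers are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* ASSUMED values for x86-64 with 4 KiB pages; other configs differ. */
#define TOY_PAGE_SHIFT	12
#define TOY_PMD_SHIFT	21
#define TOY_PMD_ORDER	(TOY_PMD_SHIFT - TOY_PAGE_SHIFT)
#define TOY_PG_PMD_NR	(1UL << TOY_PMD_ORDER)

/* Number of page-sized indices covered by an entry of the given order. */
static unsigned long entry_nr_pages(unsigned int order)
{
	assert(order < sizeof(unsigned long) * 8);
	return 1UL << order;
}

/* Mirrors the simplified mapping_needs_writeback(): a single counter. */
static bool toy_needs_writeback(unsigned long nrpages)
{
	return nrpages != 0;
}
```

Under these assumptions, adding one PTE-sized entry (order 0) and one PMD-sized entry brings the counter to 513. Removing the PMD entry subtracts `TOY_PG_PMD_NR`, leaving 1, and the writeback check depends only on whether the counter is non-zero.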
We no longer track anything in nrexceptional, so remove it, saving 8
bytes per inode.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
---
 fs/inode.c         | 2 +-
 include/linux/fs.h | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 9d78c37b00b8..4531358ae97b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -530,7 +530,7 @@ void clear_inode(struct inode *inode)
 	 */
 	xa_lock_irq(&inode->i_data.i_pages);
 	BUG_ON(inode->i_data.nrpages);
-	BUG_ON(inode->i_data.nrexceptional);
+	BUG_ON(!mapping_empty(&inode->i_data));
 	xa_unlock_irq(&inode->i_data.i_pages);
 	BUG_ON(!list_empty(&inode->i_data.private_list));
 	BUG_ON(!(inode->i_state & I_FREEING));
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0bd126418bb6..a5d801430040 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -439,7 +439,6 @@ int pagecache_write_end(struct file *, struct address_space *mapping,
  * @i_mmap: Tree of private and shared mappings.
  * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
  * @nrpages: Number of page entries, protected by the i_pages lock.
- * @nrexceptional: Shadow or DAX entries, protected by the i_pages lock.
  * @writeback_index: Writeback starts here.
  * @a_ops: Methods.
  * @flags: Error bits and flags (AS_*).
@@ -460,7 +459,6 @@ struct address_space {
 	struct rb_root_cached	i_mmap;
 	struct rw_semaphore	i_mmap_rwsem;
 	unsigned long		nrpages;
-	unsigned long		nrexceptional;
 	pgoff_t			writeback_index;
 	const struct address_space_operations *a_ops;
 	unsigned long		flags;
-- 
2.28.0
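
The "8 bytes per inode" figure is the size of one `unsigned long` on a 64-bit build. A small userspace sketch can check this with `sizeof`. The two structs below are hypothetical cut-down stand-ins that keep only the neighbouring counters; the real `struct address_space` has many more members.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the old layout (HYPOTHETICAL, not the real struct). */
struct toy_as_before {
	unsigned long nrpages;
	unsigned long nrexceptional;
	unsigned long writeback_index;	/* pgoff_t is unsigned long */
};

/* The same stand-in with nrexceptional removed. */
struct toy_as_after {
	unsigned long nrpages;
	unsigned long writeback_index;
};

/* Bytes saved per embedded mapping in this toy layout. */
static size_t toy_bytes_saved(void)
{
	return sizeof(struct toy_as_before) - sizeof(struct toy_as_after);
}
```

On an LP64 system `toy_bytes_saved()` evaluates to 8. On a 32-bit build it would be 4, so the saving quoted in the patch assumes a 64-bit kernel.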
Ping? These patches still apply to next-20210121.
On Mon, Oct 26, 2020 at 03:18:45PM +0000, Matthew Wilcox (Oracle) wrote:
> We actually use nrexceptional for very little these days. It's a minor
> pain to keep in sync with nrpages, but the pain becomes much bigger
> with the THP patches because we don't know how many indices a shadow
> entry occupies. It's easier to just remove it than keep it accurate.
>
> Also, we save 8 bytes per inode which is nothing to sneeze at; on my
> laptop, it would improve shmem_inode_cache from 22 to 23 objects per
> 16kB, and inode_cache from 26 to 27 objects. Combined, that saves
> a megabyte of memory from a combined usage of 25MB for both caches.
> Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save
> any memory for ext4.
>
> Matthew Wilcox (Oracle) (4):
> mm: Introduce and use mapping_empty
> mm: Stop accounting shadow entries
> dax: Account DAX entries as nrpages
> mm: Remove nrexceptional from inode
>
> fs/block_dev.c | 2 +-
> fs/dax.c | 8 ++++----
> fs/gfs2/glock.c | 3 +--
> fs/inode.c | 2 +-
> include/linux/fs.h | 2 --
> include/linux/pagemap.h | 5 +++++
> mm/filemap.c | 16 ----------------
> mm/swap_state.c | 4 ----
> mm/truncate.c | 19 +++----------------
> mm/workingset.c | 1 -
> 10 files changed, 15 insertions(+), 47 deletions(-)
>
> --
> 2.28.0
>
On Mon, Oct 26, 2020 at 03:18:46PM +0000, Matthew Wilcox (Oracle) wrote:
> Instead of checking the two counters (nrpages and nrexceptional), we
> can just check whether i_pages is empty.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Heh, I was looking for the fs/inode.c hunk here, because I remember
those BUG_ONs in the inode free path. Found it in the last patch - I
guess they escaped grep but the compiler let you know? :-)
On Mon, Oct 26, 2020 at 03:18:47PM +0000, Matthew Wilcox (Oracle) wrote:
> We no longer need to keep track of how many shadow entries are
> present in a mapping. This saves a few writes to the inode and
> memory barriers.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
On Mon, Oct 26, 2020 at 03:18:48PM +0000, Matthew Wilcox (Oracle) wrote:
> Simplify mapping_needs_writeback() by accounting DAX entries as
> pages instead of exceptional entries.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
On Mon, Oct 26, 2020 at 03:18:49PM +0000, Matthew Wilcox (Oracle) wrote:
> We no longer track anything in nrexceptional, so remove it, saving 8
> bytes per inode.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
On Thu, Jan 21, 2021 at 03:42:31PM -0500, Johannes Weiner wrote:
> On Mon, Oct 26, 2020 at 03:18:46PM +0000, Matthew Wilcox (Oracle) wrote:
> > Instead of checking the two counters (nrpages and nrexceptional), we
> > can just check whether i_pages is empty.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Tested-by: Vishal Verma <vishal.l.verma@intel.com>
>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>
> Heh, I was looking for the fs/inode.c hunk here, because I remember
> those BUG_ONs in the inode free path. Found it in the last patch - I
> guess they escaped grep but the compiler let you know? :-)
Heh, I forget now! I think I did it that way on purpose, but now I
forget what that purpose was!
> On Oct 26, 2020, at 9:18 AM, Matthew Wilcox (Oracle) <willy@infradead.org> wrote:
>
> We actually use nrexceptional for very little these days. It's a minor
> pain to keep in sync with nrpages, but the pain becomes much bigger
> with the THP patches because we don't know how many indices a shadow
> entry occupies. It's easier to just remove it than keep it accurate.
>
> Also, we save 8 bytes per inode which is nothing to sneeze at; on my
> laptop, it would improve shmem_inode_cache from 22 to 23 objects per
> 16kB, and inode_cache from 26 to 27 objects. Combined, that saves
> a megabyte of memory from a combined usage of 25MB for both caches.
> Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save
> any memory for ext4.
>
> Matthew Wilcox (Oracle) (4):
> mm: Introduce and use mapping_empty
> mm: Stop accounting shadow entries
> dax: Account DAX entries as nrpages
> mm: Remove nrexceptional from inode
>
> fs/block_dev.c | 2 +-
> fs/dax.c | 8 ++++----
> fs/gfs2/glock.c | 3 +--
> fs/inode.c | 2 +-
> include/linux/fs.h | 2 --
> include/linux/pagemap.h | 5 +++++
> mm/filemap.c | 16 ----------------
> mm/swap_state.c | 4 ----
> mm/truncate.c | 19 +++----------------
> mm/workingset.c | 1 -
> 10 files changed, 15 insertions(+), 47 deletions(-)
>
> --
> 2.28.0
Looks good to me.
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Ping?
On Thu, Jan 21, 2021 at 06:43:34PM +0000, Matthew Wilcox wrote:
> Ping? These patches still apply to next-20210121.