From: Jan Kara <jack@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Jan Kara <jack@suse.cz>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
kernel-team@fb.com
Subject: Re: [PATCH 3/6] lib: radix-tree: native accounting of exceptional entries
Date: Tue, 8 Nov 2016 11:08:41 +0100 [thread overview]
Message-ID: <20161108100841.GJ32353@quack2.suse.cz> (raw)
In-Reply-To: <20161107190741.3619-4-hannes@cmpxchg.org>
On Mon 07-11-16 14:07:38, Johannes Weiner wrote:
> The way the page cache is sneaking shadow entries of evicted pages
> into the radix tree past the node entry accounting and tracking them
> manually in the upper bits of node->count is fraught with problems.
>
> These shadow entries are marked in the tree as exceptional entries,
> which are a native concept to the radix tree. Maintain an explicit
> counter of exceptional entries in the radix tree node. Subsequent
> patches will switch shadow entry tracking over to that counter.
>
> DAX and shmem are the other users of exceptional entries. Since slot
> replacements that change the entry type from regular to exceptional
> must now be accounted, introduce a __radix_tree_replace() function
> that does replacement and accounting, and switch DAX and shmem over.
>
> The increase in radix tree node size is temporary. The last patch will
> transition from the shadow count in the upper bits of node->count to
> the new exceptional entry counter and shrink node->count to one byte.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
The patch looks good to me. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/dax.c | 5 +++--
> include/linux/radix-tree.h | 10 +++++++---
> lib/radix-tree.c | 50 +++++++++++++++++++++++++++++++++++++++++++---
> mm/shmem.c | 8 ++++----
> 4 files changed, 61 insertions(+), 12 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 014defd2e744..db78bae0dc0f 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -643,12 +643,13 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
> }
> mapping->nrexceptional++;
> } else {
> + struct radix_tree_node *node;
> void **slot;
> void *ret;
>
> - ret = __radix_tree_lookup(page_tree, index, NULL, &slot);
> + ret = __radix_tree_lookup(page_tree, index, &node, &slot);
> WARN_ON_ONCE(ret != entry);
> - radix_tree_replace_slot(slot, new_entry);
> + __radix_tree_replace(page_tree, node, slot, new_entry);
> }
> if (vmf->flags & FAULT_FLAG_WRITE)
> radix_tree_tag_set(page_tree, index, PAGECACHE_TAG_DIRTY);
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index af3581b8a451..7ced8a70cc8b 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -85,9 +85,10 @@ static inline bool radix_tree_is_internal_node(void *ptr)
> #define RADIX_TREE_COUNT_MASK ((1UL << RADIX_TREE_COUNT_SHIFT) - 1)
>
> struct radix_tree_node {
> - unsigned char shift; /* Bits remaining in each slot */
> - unsigned char offset; /* Slot offset in parent */
> - unsigned int count;
> + unsigned char shift; /* Bits remaining in each slot */
> + unsigned char offset; /* Slot offset in parent */
> + unsigned int count; /* Total entry count */
> + unsigned char exceptional; /* Exceptional entry count */
> union {
> struct {
> /* Used when ascending tree */
> @@ -276,6 +277,9 @@ void *__radix_tree_lookup(struct radix_tree_root *root, unsigned long index,
> struct radix_tree_node **nodep, void ***slotp);
> void *radix_tree_lookup(struct radix_tree_root *, unsigned long);
> void **radix_tree_lookup_slot(struct radix_tree_root *, unsigned long);
> +void __radix_tree_replace(struct radix_tree_root *root,
> + struct radix_tree_node *node,
> + void **slot, void *item);
> bool __radix_tree_delete_node(struct radix_tree_root *root,
> struct radix_tree_node *node);
> void *radix_tree_delete_item(struct radix_tree_root *, unsigned long, void *);
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 8e6d552c40dd..ddc6facbf4da 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -220,10 +220,10 @@ static void dump_node(struct radix_tree_node *node, unsigned long index)
> {
> unsigned long i;
>
> - pr_debug("radix node: %p offset %d tags %lx %lx %lx shift %d count %d parent %p\n",
> + pr_debug("radix node: %p offset %d tags %lx %lx %lx shift %d count %d exceptional %d parent %p\n",
> node, node->offset,
> node->tags[0][0], node->tags[1][0], node->tags[2][0],
> - node->shift, node->count, node->parent);
> + node->shift, node->count, node->exceptional, node->parent);
>
> for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) {
> unsigned long first = index | (i << node->shift);
> @@ -522,8 +522,13 @@ static int radix_tree_extend(struct radix_tree_root *root,
> node->offset = 0;
> node->count = 1;
> node->parent = NULL;
> - if (radix_tree_is_internal_node(slot))
> + if (radix_tree_is_internal_node(slot)) {
> entry_to_node(slot)->parent = node;
> + } else {
> + /* Moving an exceptional root->rnode to a node */
> + if (radix_tree_exceptional_entry(slot))
> + node->exceptional = 1;
> + }
> node->slots[0] = slot;
> slot = node_to_entry(node);
> rcu_assign_pointer(root->rnode, slot);
> @@ -649,6 +654,8 @@ int __radix_tree_insert(struct radix_tree_root *root, unsigned long index,
> if (node) {
> unsigned offset = get_slot_offset(node, slot);
> node->count++;
> + if (radix_tree_exceptional_entry(item))
> + node->exceptional++;
> BUG_ON(tag_get(node, 0, offset));
> BUG_ON(tag_get(node, 1, offset));
> BUG_ON(tag_get(node, 2, offset));
> @@ -747,6 +754,41 @@ void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index)
> EXPORT_SYMBOL(radix_tree_lookup);
>
> /**
> + * __radix_tree_replace - replace item in a slot
> + * @root: radix tree root
> + * @node: pointer to tree node
> + * @slot: pointer to slot in @node
> + * @item: new item to store in the slot.
> + *
> + * For use with __radix_tree_lookup(). Caller must hold tree write locked
> + * across slot lookup and replacement.
> + */
> +void __radix_tree_replace(struct radix_tree_root *root,
> + struct radix_tree_node *node,
> + void **slot, void *item)
> +{
> + void *old = rcu_dereference_raw(*slot);
> + int count, exceptional;
> +
> + WARN_ON_ONCE(radix_tree_is_internal_node(item));
> +
> + count = !!item - !!old;
> + exceptional = !!radix_tree_exceptional_entry(item) -
> + !!radix_tree_exceptional_entry(old);
> +
> + WARN_ONCE(!node && slot != (void **)&root->rnode &&
> + (count || exceptional),
> + "Unaccounted slot replacement: old=%p item=%p count=%d exceptional=%d\n",
> + old, item, count, exceptional);
> +
> + if (node) {
> + node->count += count;
> + node->exceptional += exceptional;
> + }
> + rcu_assign_pointer(*slot, item);
> +}
> +
> +/**
> * radix_tree_tag_set - set a tag on a radix tree node
> * @root: radix tree root
> * @index: index key
> @@ -1561,6 +1603,8 @@ void *radix_tree_delete_item(struct radix_tree_root *root,
> delete_sibling_entries(node, node_to_entry(slot), offset);
> node->slots[offset] = NULL;
> node->count--;
> + if (radix_tree_exceptional_entry(entry))
> + node->exceptional--;
>
> __radix_tree_delete_node(root, node);
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ad7813d73ea7..7f3a08df25c9 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -300,18 +300,18 @@ void shmem_uncharge(struct inode *inode, long pages)
> static int shmem_radix_tree_replace(struct address_space *mapping,
> pgoff_t index, void *expected, void *replacement)
> {
> + struct radix_tree_node *node;
> void **pslot;
> void *item;
>
> VM_BUG_ON(!expected);
> VM_BUG_ON(!replacement);
> - pslot = radix_tree_lookup_slot(&mapping->page_tree, index);
> - if (!pslot)
> + item = __radix_tree_lookup(&mapping->page_tree, index, &node, &pslot);
> + if (!item)
> return -ENOENT;
> - item = radix_tree_deref_slot_protected(pslot, &mapping->tree_lock);
> if (item != expected)
> return -ENOENT;
> - radix_tree_replace_slot(pslot, replacement);
> + __radix_tree_replace(&mapping->page_tree, node, pslot, replacement);
> return 0;
> }
>
> --
> 2.10.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2016-11-08 10:08 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-07 19:07 [PATCH 0/6] mm: workingset: radix tree subtleties & single-page file refaults Johannes Weiner
2016-11-07 19:07 ` [PATCH 1/6] mm: khugepaged: fix radix tree node leak in shmem collapse error path Johannes Weiner
2016-11-08 9:53 ` Jan Kara
2016-11-08 16:12 ` Johannes Weiner
2016-11-09 7:41 ` Jan Kara
2016-11-11 10:59 ` Kirill A. Shutemov
2016-11-11 12:22 ` Jan Kara
2016-11-11 16:37 ` Kirill A. Shutemov
2016-11-14 8:07 ` Jan Kara
2016-11-14 14:29 ` Kirill A. Shutemov
2016-11-14 15:52 ` Johannes Weiner
2016-11-14 16:48 ` Johannes Weiner
2016-11-14 19:40 ` Kirill A. Shutemov
2016-11-15 14:00 ` Johannes Weiner
2016-11-07 19:07 ` [PATCH 2/6] mm: workingset: turn shadow node shrinker bugs into warnings Johannes Weiner
2016-11-08 9:57 ` Jan Kara
2016-11-07 19:07 ` [PATCH 3/6] lib: radix-tree: native accounting of exceptional entries Johannes Weiner
2016-11-08 10:08 ` Jan Kara [this message]
2016-11-07 19:07 ` [PATCH 4/6] lib: radix-tree: check accounting of existing slot replacement users Johannes Weiner
2016-11-08 10:12 ` Jan Kara
2016-11-07 19:07 ` [PATCH 5/6] mm: workingset: switch shadow entry tracking to radix tree exceptional counting Johannes Weiner
2016-11-08 10:27 ` Jan Kara
2016-11-08 19:30 ` Johannes Weiner
2016-11-07 19:07 ` [PATCH 6/6] mm: workingset: restore refault tracking for single-page files Johannes Weiner
2016-11-08 10:31 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161108100841.GJ32353@quack2.suse.cz \
--to=jack@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@fb.com \
--cc=kirill@shutemov.name \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).