All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Stable tree <stable@vger.kernel.org>,
	Greg KH <gregkh@linuxfoundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jan Kara <jack@suse.cz>
Subject: Re: [PATCH stable 4.4 2/4] mm: filemap: don't plant shadow entries without radix tree node
Date: Wed, 26 Oct 2016 14:45:53 +0200	[thread overview]
Message-ID: <20161026124553.GA25683@dhcp22.suse.cz> (raw)
In-Reply-To: <20161025075148.31661-3-mhocko@kernel.org>

Greg,
I do not see this one in the 4.4 queue you have just sent today.

On Tue 25-10-16 09:51:46, Michal Hocko wrote:
> From: Johannes Weiner <hannes@cmpxchg.org>
> 
> commit d3798ae8c6f3767c726403c2ca6ecc317752c9dd upstream.
> 
> When the underflow checks were added to workingset_node_shadow_dec(),
> they triggered immediately:
> 
>   kernel BUG at ./include/linux/swap.h:276!
>   invalid opcode: 0000 [#1] SMP
>   Modules linked in: isofs usb_storage fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6
>    soundcore wmi acpi_als pinctrl_sunrisepoint kfifo_buf tpm_tis industrialio acpi_pad pinctrl_intel tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt
>   CPU: 0 PID: 20929 Comm: blkid Not tainted 4.8.0-rc8-00087-gbe67d60ba944 #1
>   Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016
>   task: ffff8faa93ecd940 task.stack: ffff8faa7f478000
>   RIP: page_cache_tree_insert+0xf1/0x100
>   Call Trace:
>     __add_to_page_cache_locked+0x12e/0x270
>     add_to_page_cache_lru+0x4e/0xe0
>     mpage_readpages+0x112/0x1d0
>     blkdev_readpages+0x1d/0x20
>     __do_page_cache_readahead+0x1ad/0x290
>     force_page_cache_readahead+0xaa/0x100
>     page_cache_sync_readahead+0x3f/0x50
>     generic_file_read_iter+0x5af/0x740
>     blkdev_read_iter+0x35/0x40
>     __vfs_read+0xe1/0x130
>     vfs_read+0x96/0x130
>     SyS_read+0x55/0xc0
>     entry_SYSCALL_64_fastpath+0x13/0x8f
>   Code: 03 00 48 8b 5d d8 65 48 33 1c 25 28 00 00 00 44 89 e8 75 19 48 83 c4 18 5b 41 5c 41 5d 41 5e 5d c3 0f 0b 41 bd ef ff ff ff eb d7 <0f> 0b e8 88 68 ef ff 0f 1f 84 00
>   RIP  page_cache_tree_insert+0xf1/0x100
> 
> This is a long-standing bug in the way shadow entries are accounted in
> the radix tree nodes. The shrinker needs to know when radix tree nodes
> contain only shadow entries, no pages, so node->count is split in half
> to count shadows in the upper bits and pages in the lower bits.
> 
> Unfortunately, the radix tree implementation doesn't know of this and
> assumes all entries are in node->count. When there is a shadow entry
> directly in root->rnode and the tree is later extended, the radix tree
> implementation will copy that entry into the new node and and bump its
> node->count, i.e. increases the page count bits. Once the shadow gets
> removed and we subtract from the upper counter, node->count underflows
> and triggers the warning. Afterwards, without node->count reaching 0
> again, the radix tree node is leaked.
> 
> Limit shadow entries to when we have actual radix tree nodes and can
> count them properly. That means we lose the ability to detect refaults
> from files that had only the first page faulted in at eviction time.
> 
> [hannes@cmpxchg.org: backport for 4.4 stable]
> Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/filemap.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 4cfe423d3e8a..7ad648c9780c 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -164,6 +164,14 @@ static void page_cache_tree_delete(struct address_space *mapping,
>  
>  	__radix_tree_lookup(&mapping->page_tree, page->index, &node, &slot);
>  
> +	if (!node) {
> +		/*
> +		 * We need a node to properly account shadow
> +		 * entries. Don't plant any without. XXX
> +		 */
> +		shadow = NULL;
> +	}
> +
>  	if (shadow) {
>  		mapping->nrshadows++;
>  		/*
> -- 
> 2.9.3
> 

-- 
Michal Hocko
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@kernel.org>
To: Stable tree <stable@vger.kernel.org>,
	Greg KH <gregkh@linuxfoundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jan Kara <jack@suse.cz>
Subject: Re: [PATCH stable 4.4 2/4] mm: filemap: don't plant shadow entries without radix tree node
Date: Wed, 26 Oct 2016 14:45:53 +0200	[thread overview]
Message-ID: <20161026124553.GA25683@dhcp22.suse.cz> (raw)
In-Reply-To: <20161025075148.31661-3-mhocko@kernel.org>

Greg,
I do not see this one in the 4.4 queue you have just sent today.

On Tue 25-10-16 09:51:46, Michal Hocko wrote:
> From: Johannes Weiner <hannes@cmpxchg.org>
> 
> commit d3798ae8c6f3767c726403c2ca6ecc317752c9dd upstream.
> 
> When the underflow checks were added to workingset_node_shadow_dec(),
> they triggered immediately:
> 
>   kernel BUG at ./include/linux/swap.h:276!
>   invalid opcode: 0000 [#1] SMP
>   Modules linked in: isofs usb_storage fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6
>    soundcore wmi acpi_als pinctrl_sunrisepoint kfifo_buf tpm_tis industrialio acpi_pad pinctrl_intel tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt
>   CPU: 0 PID: 20929 Comm: blkid Not tainted 4.8.0-rc8-00087-gbe67d60ba944 #1
>   Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016
>   task: ffff8faa93ecd940 task.stack: ffff8faa7f478000
>   RIP: page_cache_tree_insert+0xf1/0x100
>   Call Trace:
>     __add_to_page_cache_locked+0x12e/0x270
>     add_to_page_cache_lru+0x4e/0xe0
>     mpage_readpages+0x112/0x1d0
>     blkdev_readpages+0x1d/0x20
>     __do_page_cache_readahead+0x1ad/0x290
>     force_page_cache_readahead+0xaa/0x100
>     page_cache_sync_readahead+0x3f/0x50
>     generic_file_read_iter+0x5af/0x740
>     blkdev_read_iter+0x35/0x40
>     __vfs_read+0xe1/0x130
>     vfs_read+0x96/0x130
>     SyS_read+0x55/0xc0
>     entry_SYSCALL_64_fastpath+0x13/0x8f
>   Code: 03 00 48 8b 5d d8 65 48 33 1c 25 28 00 00 00 44 89 e8 75 19 48 83 c4 18 5b 41 5c 41 5d 41 5e 5d c3 0f 0b 41 bd ef ff ff ff eb d7 <0f> 0b e8 88 68 ef ff 0f 1f 84 00
>   RIP  page_cache_tree_insert+0xf1/0x100
> 
> This is a long-standing bug in the way shadow entries are accounted in
> the radix tree nodes. The shrinker needs to know when radix tree nodes
> contain only shadow entries, no pages, so node->count is split in half
> to count shadows in the upper bits and pages in the lower bits.
> 
> Unfortunately, the radix tree implementation doesn't know of this and
> assumes all entries are in node->count. When there is a shadow entry
> directly in root->rnode and the tree is later extended, the radix tree
> implementation will copy that entry into the new node and and bump its
> node->count, i.e. increases the page count bits. Once the shadow gets
> removed and we subtract from the upper counter, node->count underflows
> and triggers the warning. Afterwards, without node->count reaching 0
> again, the radix tree node is leaked.
> 
> Limit shadow entries to when we have actual radix tree nodes and can
> count them properly. That means we lose the ability to detect refaults
> from files that had only the first page faulted in at eviction time.
> 
> [hannes@cmpxchg.org: backport for 4.4 stable]
> Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/filemap.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 4cfe423d3e8a..7ad648c9780c 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -164,6 +164,14 @@ static void page_cache_tree_delete(struct address_space *mapping,
>  
>  	__radix_tree_lookup(&mapping->page_tree, page->index, &node, &slot);
>  
> +	if (!node) {
> +		/*
> +		 * We need a node to properly account shadow
> +		 * entries. Don't plant any without. XXX
> +		 */
> +		shadow = NULL;
> +	}
> +
>  	if (shadow) {
>  		mapping->nrshadows++;
>  		/*
> -- 
> 2.9.3
> 

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-10-26 12:46 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-25  7:51 [PATCH stable 4.4 0/4] mm: workingset backports Michal Hocko
2016-10-25  7:51 ` Michal Hocko
2016-10-25  7:51 ` Michal Hocko
2016-10-25  7:51 ` [PATCH stable 4.4 1/4] mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page() Michal Hocko
2016-10-25  7:51   ` Michal Hocko
2016-10-25  7:51   ` Michal Hocko
2016-10-25  7:51 ` [PATCH stable 4.4 2/4] mm: filemap: don't plant shadow entries without radix tree node Michal Hocko
2016-10-25  7:51   ` Michal Hocko
2016-10-25  7:51   ` Michal Hocko
2016-10-26 12:45   ` Michal Hocko [this message]
2016-10-26 12:45     ` Michal Hocko
2016-10-26 12:47     ` Michal Hocko
2016-10-26 12:47       ` Michal Hocko
2016-10-26 13:10       ` Greg KH
2016-10-26 13:10         ` Greg KH
2016-10-25  7:51 ` [PATCH stable 4.4 3/4] mm: filemap: fix mapping->nrpages double accounting in fuse Michal Hocko
2016-10-25  7:51   ` Michal Hocko
2016-10-25  7:51   ` Michal Hocko
2016-10-25  7:51 ` [PATCH stable 4.4 4/4] Using BUG_ON() as an assert() is _never_ acceptable Michal Hocko
2016-10-25  7:51   ` Michal Hocko
2016-10-25  7:51   ` Michal Hocko
2016-10-25  7:56 ` [PATCH stable 4.4 0/4] mm: workingset backports Michal Hocko
2016-10-25  7:56   ` Michal Hocko
2016-10-25 14:16 ` Johannes Weiner
2016-10-25 14:16   ` Johannes Weiner
2016-10-26  9:34 ` Greg KH
2016-10-26  9:34   ` Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161026124553.GA25683@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.