linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] Swap xarray workingset eviction warning.
@ 2018-07-01 23:09 Peter Geis
  2018-07-02  2:50 ` Matthew Wilcox
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Geis @ 2018-07-01 23:09 UTC (permalink / raw)
  To: willy, linux-fsdevel

Good Evening,

Recently some code changes were introduced in the Linux-Next tree that 
has led to a warning bug.
The merge appears to have occurred on June 16, 2018, when 
mm/workingset.c was converted over to use XArray.

The issue is occurring on both Tegra2 and Tegra3 processors.
On T2 processors it appears to just be a warning, but on T3 processors 
it leads to serious issues with hung tasks.

The warning is as follows:
[10409.408904] ------------[ cut here ]------------
[10409.408912] WARNING: CPU: 0 PID: 38 at ./include/linux/xarray.h:53 
workingset_eviction+0x14c/0x154
[10409.408915] Modules linked in: fuse cpufreq_userspace 
cpufreq_powersave bnep qmi_wwan usbnet mii brcmfmac cfg80211 brcmutil 
hci_uart btbcm st_gyro_spi st_sensors_spi cpcap_adc bluetooth 
st_gyro_i2c st_gyro cpcap_pwrbutton st_sensors_i2c st_sensors ak8975 
industrialio_triggered_buffer snd_soc_tegra20_i2s joydev 
snd_soc_tegra_pcm bmp280_spi snd_soc_core bmp280_i2c bmp280 
snd_pcm_dmaengine ac97_bus tegra_vde(C) snd_soc_tegra20_das 
uio_pdrv_genirq uio sch_fq_codel ip_tables x_tables phy_mapphone_mdm6600
[10409.409005] CPU: 0 PID: 38 Comm: kswapd0 Tainted: G        WC 
4.18.0-rc1-next-20180621-00001-gecc0b99b72bc-dirty #55
[10409.409009] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[10409.409021] [<c0115a2c>] (unwind_backtrace) from [<c010ee40>] 
(show_stack+0x20/0x24)
[10409.409032] [<c010ee40>] (show_stack) from [<c0a299f8>] 
(dump_stack+0xb4/0xe0)
[10409.409044] [<c0a299f8>] (dump_stack) from [<c0134ef8>] 
(__warn+0x104/0x11c)
[10409.409052] [<c0134ef8>] (__warn) from [<c0135048>] 
(warn_slowpath_null+0x50/0x58)
[10409.409060] [<c0135048>] (warn_slowpath_null) from [<c02e0aa0>] 
(workingset_eviction+0x14c/0x154)
[10409.409071] [<c02e0aa0>] (workingset_eviction) from [<c02c0cb8>] 
(__remove_mapping+0x154/0x1f0)
[10409.409079] [<c02c0cb8>] (__remove_mapping) from [<c02c37a4>] 
(shrink_page_list+0xa60/0x101c)
[10409.409087] [<c02c37a4>] (shrink_page_list) from [<c02c4558>] 
(shrink_inactive_list+0x258/0x6ac)
[10409.409095] [<c02c4558>] (shrink_inactive_list) from [<c02c5150>] 
(shrink_node_memcg+0x330/0x6a8)
[10409.409103] [<c02c5150>] (shrink_node_memcg) from [<c02c55c4>] 
(shrink_node+0xfc/0x51c)
[10409.409110] [<c02c55c4>] (shrink_node) from [<c02c69b8>] 
(kswapd+0x2f4/0x86c)
[10409.409118] [<c02c69b8>] (kswapd) from [<c01597d8>] (kthread+0x16c/0x174)
[10409.409125] [<c01597d8>] (kthread) from [<c01010b4>] 
(ret_from_fork+0x14/0x20)
[10409.409129] Exception stack(0xedc6dfb0 to 0xedc6dff8)
[10409.409134] dfa0:                                     00000000 
00000000 00000000 00000000
[10409.409140] dfc0: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000
[10409.409145] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[10409.409149] ---[ end trace 5258a39aad429bfe ]---

Very Respectfully,
Peter Geis

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] Swap xarray workingset eviction warning.
  2018-07-01 23:09 [BUG] Swap xarray workingset eviction warning Peter Geis
@ 2018-07-02  2:50 ` Matthew Wilcox
  2018-07-02  3:11   ` Gao Xiang
  2018-07-05 17:00   ` Johannes Weiner
  0 siblings, 2 replies; 6+ messages in thread
From: Matthew Wilcox @ 2018-07-02  2:50 UTC (permalink / raw)
  To: Peter Geis; +Cc: linux-fsdevel, Johannes Weiner, linux-mm

On Sun, Jul 01, 2018 at 07:09:41PM -0400, Peter Geis wrote:
> The warning is as follows:
> [10409.408904] ------------[ cut here ]------------
> [10409.408912] WARNING: CPU: 0 PID: 38 at ./include/linux/xarray.h:53
> workingset_eviction+0x14c/0x154

This is interesting.  Here's the code that leads to the warning:

static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
{
        eviction >>= bucket_order;
        eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
        eviction = (eviction << NODES_SHIFT) | pgdat->node_id;

        return xa_mk_value(eviction);
}

The warning itself comes from:

static inline void *xa_mk_value(unsigned long v)
{
        WARN_ON((long)v < 0);
        return (void *)((v << 1) | 1);
}

The fact that we haven't seen this on other architectures makes me wonder
if NODES_SHIFT or MEM_CGROUP_ID_SHIFT are messed up on Tegra?

Johannes, I wonder if you could help out here?  I'm not terribly familiar
with this part of the workingset code.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] Swap xarray workingset eviction warning.
  2018-07-02  2:50 ` Matthew Wilcox
@ 2018-07-02  3:11   ` Gao Xiang
  2018-07-05 17:00   ` Johannes Weiner
  1 sibling, 0 replies; 6+ messages in thread
From: Gao Xiang @ 2018-07-02  3:11 UTC (permalink / raw)
  To: Matthew Wilcox, Peter Geis; +Cc: linux-fsdevel, Johannes Weiner, linux-mm

Hi Matthew,

On 2018/7/2 10:50, Matthew Wilcox wrote:
> On Sun, Jul 01, 2018 at 07:09:41PM -0400, Peter Geis wrote:
>> The warning is as follows:
>> [10409.408904] ------------[ cut here ]------------
>> [10409.408912] WARNING: CPU: 0 PID: 38 at ./include/linux/xarray.h:53
>> workingset_eviction+0x14c/0x154
> This is interesting.  Here's the code that leads to the warning:
> 
> static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
> {
>         eviction >>= bucket_order;
>         eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
>         eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
> 
>         return xa_mk_value(eviction);
> }
> 
> The warning itself comes from:
> 
> static inline void *xa_mk_value(unsigned long v)
> {
>         WARN_ON((long)v < 0);
>         return (void *)((v << 1) | 1);
> }

Sorry for breaking in, how do you think about considering

[RFC PATCH v4] <linux/tagptr.h>: Introduce tagged pointer
https://marc.info/?l=linux-kernel&m=153035209012070&w=2

to replace these masks? It seems boths for the XArray or old radix trees has many hacked code...

or if you think this implmentation is not ok, could you please give some suggestions or alternatives on tagptr...

> 
> The fact that we haven't seen this on other architectures makes me wonder
> if NODES_SHIFT or MEM_CGROUP_ID_SHIFT are messed up on Tegra?
> 
> Johannes, I wonder if you could help out here?  I'm not terribly familiar
> with this part of the workingset code.
Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] Swap xarray workingset eviction warning.
  2018-07-02  2:50 ` Matthew Wilcox
  2018-07-02  3:11   ` Gao Xiang
@ 2018-07-05 17:00   ` Johannes Weiner
  2018-07-05 17:53     ` Matthew Wilcox
  1 sibling, 1 reply; 6+ messages in thread
From: Johannes Weiner @ 2018-07-05 17:00 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Peter Geis, linux-fsdevel, linux-mm

Hello,

On Sun, Jul 01, 2018 at 07:50:59PM -0700, Matthew Wilcox wrote:
> On Sun, Jul 01, 2018 at 07:09:41PM -0400, Peter Geis wrote:
> > The warning is as follows:
> > [10409.408904] ------------[ cut here ]------------
> > [10409.408912] WARNING: CPU: 0 PID: 38 at ./include/linux/xarray.h:53
> > workingset_eviction+0x14c/0x154
> 
> This is interesting.  Here's the code that leads to the warning:
> 
> static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
> {
>         eviction >>= bucket_order;
>         eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
>         eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
> 
>         return xa_mk_value(eviction);
> }
> 
> The warning itself comes from:
> 
> static inline void *xa_mk_value(unsigned long v)
> {
>         WARN_ON((long)v < 0);
>         return (void *)((v << 1) | 1);
> }
> 
> The fact that we haven't seen this on other architectures makes me wonder
> if NODES_SHIFT or MEM_CGROUP_ID_SHIFT are messed up on Tegra?
> 
> Johannes, I wonder if you could help out here?  I'm not terribly familiar
> with this part of the workingset code.

This could be a matter of uptime, but the warning triggers on a thing
that is supposed to happen everywhere eventually. Let's fix it.

The eviction timestamp is from a monotonically increasing counter that
will eventually reach values high enough that the left-shifts for
memcg id and node id will truncate the upper bits.

The bucketing logic isn't supposed to prevent this truncation, it's
just making sure that the namespace of the truncated timestamp field
is big enough to cover the full range of actionable refault pages.

xa_mk_value() doesn't understand that we're okay with it chopping off
our upper-most bit. We shouldn't make this an API behavior, either, so
let's fix the workingset code to always clear those bits before hand.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

diff --git a/mm/workingset.c b/mm/workingset.c
index a466e731231d..1da19c04b6f7 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -173,6 +173,7 @@ static unsigned int bucket_order __read_mostly;
 static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
 {
 	eviction >>= bucket_order;
+	eviction &= EVICTION_MASK;
 	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
 	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [BUG] Swap xarray workingset eviction warning.
  2018-07-05 17:00   ` Johannes Weiner
@ 2018-07-05 17:53     ` Matthew Wilcox
  2018-07-05 18:43       ` Johannes Weiner
  0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2018-07-05 17:53 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Peter Geis, linux-fsdevel, linux-mm

On Thu, Jul 05, 2018 at 01:00:19PM -0400, Johannes Weiner wrote:
> This could be a matter of uptime, but the warning triggers on a thing
> that is supposed to happen everywhere eventually. Let's fix it.

Ahh!  Thank you!

> xa_mk_value() doesn't understand that we're okay with it chopping off
> our upper-most bit. We shouldn't make this an API behavior, either, so
> let's fix the workingset code to always clear those bits before hand.

Makes sense.  I'll just fold this in, if that's OK with you?

> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
> 
> diff --git a/mm/workingset.c b/mm/workingset.c
> index a466e731231d..1da19c04b6f7 100644
> --- a/mm/workingset.c
> +++ b/mm/workingset.c
> @@ -173,6 +173,7 @@ static unsigned int bucket_order __read_mostly;
>  static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
>  {
>  	eviction >>= bucket_order;
> +	eviction &= EVICTION_MASK;
>  	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
>  	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
>  

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] Swap xarray workingset eviction warning.
  2018-07-05 17:53     ` Matthew Wilcox
@ 2018-07-05 18:43       ` Johannes Weiner
  0 siblings, 0 replies; 6+ messages in thread
From: Johannes Weiner @ 2018-07-05 18:43 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Peter Geis, linux-fsdevel, linux-mm

On Thu, Jul 05, 2018 at 10:53:52AM -0700, Matthew Wilcox wrote:
> On Thu, Jul 05, 2018 at 01:00:19PM -0400, Johannes Weiner wrote:
> > This could be a matter of uptime, but the warning triggers on a thing
> > that is supposed to happen everywhere eventually. Let's fix it.
> 
> Ahh!  Thank you!
> 
> > xa_mk_value() doesn't understand that we're okay with it chopping off
> > our upper-most bit. We shouldn't make this an API behavior, either, so
> > let's fix the workingset code to always clear those bits before hand.
> 
> Makes sense.  I'll just fold this in, if that's OK with you?

Sounds good to me, thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-07-05 18:43 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-01 23:09 [BUG] Swap xarray workingset eviction warning Peter Geis
2018-07-02  2:50 ` Matthew Wilcox
2018-07-02  3:11   ` Gao Xiang
2018-07-05 17:00   ` Johannes Weiner
2018-07-05 17:53     ` Matthew Wilcox
2018-07-05 18:43       ` Johannes Weiner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).