linux-mm.kvack.org archive mirror
From: Ira Weiny <ira.weiny@intel.com>
To: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	Christoph Hellwig <hch@infradead.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Sachin Sant <sachinp@linux.vnet.ibm.com>,
	linux-nvdimm@lists.01.org, Jason Gunthorpe <jgg@mellanox.com>
Subject: Re: [PATCH] mm/mremap_pages: Fix static key devmap_managed_key updates
Date: Thu, 22 Oct 2020 12:10:29 -0700	[thread overview]
Message-ID: <20201022191028.GA534324@iweiny-DESK2.sc.intel.com> (raw)
In-Reply-To: <d7540264-48f1-9fdc-0769-de68fdfc1c7b@nvidia.com>

On Thu, Oct 22, 2020 at 11:19:43AM -0700, Ralph Campbell wrote:
> 
> On 10/22/20 8:41 AM, Ira Weiny wrote:
> > On Thu, Oct 22, 2020 at 11:37:53AM +0530, Aneesh Kumar K.V wrote:
> > > commit 6f42193fd86e ("memremap: don't use a separate devm action for
> > > devmap_managed_enable_get") changed the static key updates such that we
> > > now call devmap_managed_enable_put() without doing the equivalent
> > > devmap_managed_enable_get().
> > > 
> > > devmap_managed_enable_get() is only called for MEMORY_DEVICE_PRIVATE and
> > > MEMORY_DEVICE_FS_DAX, but memunmap_pages() gets called for other pgmap
> > > types too. This results in the below warning when switching between
> > > system-ram and devdax mode for a devdax namespace.
> > > 
> > >   jump label: negative count!
> > >   WARNING: CPU: 52 PID: 1335 at kernel/jump_label.c:235 static_key_slow_try_dec+0x88/0xa0
> > >   Modules linked in:
> > >   ....
> > > 
> > >   NIP [c000000000433318] static_key_slow_try_dec+0x88/0xa0
> > >   LR [c000000000433314] static_key_slow_try_dec+0x84/0xa0
> > >   Call Trace:
> > >   [c000000025c1f660] [c000000000433314] static_key_slow_try_dec+0x84/0xa0 (unreliable)
> > >   [c000000025c1f6d0] [c000000000433664] __static_key_slow_dec_cpuslocked+0x34/0xd0
> > >   [c000000025c1f700] [c0000000004337a4] static_key_slow_dec+0x54/0xf0
> > >   [c000000025c1f770] [c00000000059c49c] memunmap_pages+0x36c/0x500
> > >   [c000000025c1f820] [c000000000d91d10] devm_action_release+0x30/0x50
> > >   [c000000025c1f840] [c000000000d92e34] release_nodes+0x2f4/0x3e0
> > >   [c000000025c1f8f0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
> > >   [c000000025c1f930] [c000000000d883a4] bus_remove_device+0x124/0x210
> > >   [c000000025c1f9b0] [c000000000d80ef4] device_del+0x1d4/0x530
> > >   [c000000025c1fa70] [c000000000e341e8] unregister_dev_dax+0x48/0xe0
> > >   [c000000025c1fae0] [c000000000d91d10] devm_action_release+0x30/0x50
> > >   [c000000025c1fb00] [c000000000d92e34] release_nodes+0x2f4/0x3e0
> > >   [c000000025c1fbb0] [c000000000d8b15c] device_release_driver_internal+0x17c/0x280
> > >   [c000000025c1fbf0] [c000000000d87000] unbind_store+0x130/0x170
> > >   [c000000025c1fc30] [c000000000d862a0] drv_attr_store+0x40/0x60
> > >   [c000000025c1fc50] [c0000000006d316c] sysfs_kf_write+0x6c/0xb0
> > >   [c000000025c1fc90] [c0000000006d2328] kernfs_fop_write+0x118/0x280
> > >   [c000000025c1fce0] [c0000000005a79f8] vfs_write+0xe8/0x2a0
> > >   [c000000025c1fd30] [c0000000005a7d94] ksys_write+0x84/0x140
> > >   [c000000025c1fd80] [c00000000003a430] system_call_exception+0x120/0x270
> > >   [c000000025c1fe20] [c00000000000c540] system_call_common+0xf0/0x27c
> > > 
> > > Cc: Christoph Hellwig <hch@infradead.org>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
> > > Cc: linux-nvdimm@lists.01.org
> > > Cc: Ira Weiny <ira.weiny@intel.com>
> > > Cc: Jason Gunthorpe <jgg@mellanox.com>
> > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > > ---
> > >   mm/memremap.c | 19 +++++++++++++++----
> > >   1 file changed, 15 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/mm/memremap.c b/mm/memremap.c
> > > index 73a206d0f645..d4402ff3e467 100644
> > > --- a/mm/memremap.c
> > > +++ b/mm/memremap.c
> > > @@ -158,6 +158,16 @@ void memunmap_pages(struct dev_pagemap *pgmap)
> > >   {
> > >   	unsigned long pfn;
> > >   	int i;
> > > +	bool need_devmap_managed = false;
> > > +
> > > +	switch (pgmap->type) {
> > > +	case MEMORY_DEVICE_PRIVATE:
> > > +	case MEMORY_DEVICE_FS_DAX:
> > > +		need_devmap_managed = true;
> > > +		break;
> > > +	default:
> > > +		break;
> > > +	}
> > 
> > Would it be overkill to avoid duplicating the switch logic from
> > page_is_devmap_managed() by adding a helper that both places could use?
> 
> Perhaps. I can imagine a helper defined in include/linux/mm.h which
> page_is_devmap_managed() could also call but that would impact a lot of
> places that include mm.h. Since memremap.c already has to have intimate
> knowledge of the pgmap->type, I think limiting the change to just what
> is needed is better for now. So the patch looks OK to me.
> 
> Looking at this some more, I would suggest changing devmap_managed_enable_get()
> and devmap_managed_enable_put() to do the special-case checking themselves,
> instead of doing it in memremap_pages() and memunmap_pages(). Then
> devmap_managed_enable_get() doesn't need to return an error when
> CONFIG_DEV_PAGEMAP_OPS isn't defined. I have only compile-tested the
> following.

This looks cleaner to me.  Aneesh?

FWIW:
Reviewed-by: Ira Weiny <ira.weiny@intel.com>

Ira
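For anyone reading along: the "jump label: negative count!" warning quoted above is just a static_branch_dec() without a matching static_branch_inc(). As a rough illustration only (plain user-space C, not kernel code; the static key and pgmap types are modeled here with a plain counter and a hypothetical enum), the broken teardown versus the symmetric helpers look like this:

```c
#include <assert.h>

/* Stand-in for the pgmap->type values involved. */
enum pgmap_type {
	MEMORY_DEVICE_PRIVATE,
	MEMORY_DEVICE_FS_DAX,
	MEMORY_DEVICE_GENERIC,
};

/* Stand-in for the devmap_managed_key static-branch count. */
static int devmap_managed_count;

/* Symmetric helpers: both sides check the type, so the count
 * only moves for the types that actually took a reference. */
static void devmap_managed_enable_get(enum pgmap_type type)
{
	if (type == MEMORY_DEVICE_PRIVATE || type == MEMORY_DEVICE_FS_DAX)
		devmap_managed_count++;
}

static void devmap_managed_enable_put(enum pgmap_type type)
{
	if (type == MEMORY_DEVICE_PRIVATE || type == MEMORY_DEVICE_FS_DAX)
		devmap_managed_count--;
}

/* The bug being fixed: teardown decremented unconditionally, so a
 * GENERIC mapping that never incremented drives the count negative. */
static void buggy_teardown(enum pgmap_type type)
{
	(void)type;		/* type was ignored on the put side */
	devmap_managed_count--;
}
```

With the symmetric helpers, the type check lives in one place and the callers stay unconditional, which is what makes Ralph's version read cleaner than gating the calls at each call site.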

> 
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> ---
>  mm/memremap.c | 39 ++++++++++++++++-----------------------
>  1 file changed, 16 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 73a206d0f645..16b2fb482da1 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -41,28 +41,24 @@ EXPORT_SYMBOL_GPL(memremap_compat_align);
>  DEFINE_STATIC_KEY_FALSE(devmap_managed_key);
>  EXPORT_SYMBOL(devmap_managed_key);
> -static void devmap_managed_enable_put(void)
> +static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
>  {
> -	static_branch_dec(&devmap_managed_key);
> +	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
> +	    pgmap->type == MEMORY_DEVICE_FS_DAX)
> +		static_branch_dec(&devmap_managed_key);
>  }
> -static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
> +static void devmap_managed_enable_get(struct dev_pagemap *pgmap)
>  {
> -	if (pgmap->type == MEMORY_DEVICE_PRIVATE &&
> -	    (!pgmap->ops || !pgmap->ops->page_free)) {
> -		WARN(1, "Missing page_free method\n");
> -		return -EINVAL;
> -	}
> -
> -	static_branch_inc(&devmap_managed_key);
> -	return 0;
> +	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
> +	    pgmap->type == MEMORY_DEVICE_FS_DAX)
> +		static_branch_inc(&devmap_managed_key);
>  }
>  #else
> -static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
> +static void devmap_managed_enable_get(struct dev_pagemap *pgmap)
>  {
> -	return -EINVAL;
>  }
> -static void devmap_managed_enable_put(void)
> +static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
>  {
>  }
>  #endif /* CONFIG_DEV_PAGEMAP_OPS */
> @@ -169,7 +165,7 @@ void memunmap_pages(struct dev_pagemap *pgmap)
>  		pageunmap_range(pgmap, i);
>  	WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
> -	devmap_managed_enable_put();
> +	devmap_managed_enable_put(pgmap);
>  }
>  EXPORT_SYMBOL_GPL(memunmap_pages);
> @@ -307,7 +303,6 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>  		.pgprot = PAGE_KERNEL,
>  	};
>  	const int nr_range = pgmap->nr_range;
> -	bool need_devmap_managed = true;
>  	int error, i;
>  	if (WARN_ONCE(!nr_range, "nr_range must be specified\n"))
> @@ -323,6 +318,10 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>  			WARN(1, "Missing migrate_to_ram method\n");
>  			return ERR_PTR(-EINVAL);
>  		}
> +		if (!pgmap->ops->page_free) {
> +			WARN(1, "Missing page_free method\n");
> +			return ERR_PTR(-EINVAL);
> +		}
>  		if (!pgmap->owner) {
>  			WARN(1, "Missing owner\n");
>  			return ERR_PTR(-EINVAL);
> @@ -336,11 +335,9 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>  		}
>  		break;
>  	case MEMORY_DEVICE_GENERIC:
> -		need_devmap_managed = false;
>  		break;
>  	case MEMORY_DEVICE_PCI_P2PDMA:
>  		params.pgprot = pgprot_noncached(params.pgprot);
> -		need_devmap_managed = false;
>  		break;
>  	default:
>  		WARN(1, "Invalid pgmap type %d\n", pgmap->type);
> @@ -364,11 +361,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>  		}
>  	}
> -	if (need_devmap_managed) {
> -		error = devmap_managed_enable_get(pgmap);
> -		if (error)
> -			return ERR_PTR(error);
> -	}
> +	devmap_managed_enable_get(pgmap);
>  	/*
>  	 * Clear the pgmap nr_range as it will be incremented for each
> -- 
> 2.20.1
> 
> > >   	dev_pagemap_kill(pgmap);
> > >   	for (i = 0; i < pgmap->nr_range; i++)
> > > @@ -169,7 +179,8 @@ void memunmap_pages(struct dev_pagemap *pgmap)
> > >   		pageunmap_range(pgmap, i);
> > >   	WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
> > > -	devmap_managed_enable_put();
> > > +	if (need_devmap_managed)
> > > +		devmap_managed_enable_put();
> > >   }
> > >   EXPORT_SYMBOL_GPL(memunmap_pages);
> > > @@ -307,7 +318,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
> > >   		.pgprot = PAGE_KERNEL,
> > >   	};
> > >   	const int nr_range = pgmap->nr_range;
> > > -	bool need_devmap_managed = true;
> > > +	bool need_devmap_managed = false;
> > 
> > I'm CC'ing Ralph Campbell because I think some of his work has proposed this
> > same change.
> > 
> > Ira
> 
> This part of the patch isn't strictly needed; it just reverses the default
> value of need_devmap_managed.
> 
> > >   	int error, i;
> > >   	if (WARN_ONCE(!nr_range, "nr_range must be specified\n"))
> > > @@ -327,6 +338,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
> > >   			WARN(1, "Missing owner\n");
> > >   			return ERR_PTR(-EINVAL);
> > >   		}
> > > +		need_devmap_managed = true;
> > >   		break;
> > >   	case MEMORY_DEVICE_FS_DAX:
> > >   		if (!IS_ENABLED(CONFIG_ZONE_DEVICE) ||
> > > @@ -334,13 +346,12 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
> > >   			WARN(1, "File system DAX not supported\n");
> > >   			return ERR_PTR(-EINVAL);
> > >   		}
> > > +		need_devmap_managed = true;
> > >   		break;
> > >   	case MEMORY_DEVICE_GENERIC:
> > > -		need_devmap_managed = false;
> > >   		break;
> > >   	case MEMORY_DEVICE_PCI_P2PDMA:
> > >   		params.pgprot = pgprot_noncached(params.pgprot);
> > > -		need_devmap_managed = false;
> > >   		break;
> > >   	default:
> > >   		WARN(1, "Invalid pgmap type %d\n", pgmap->type);
> > > -- 
> > > 2.26.2
> > > 


Thread overview: 12+ messages
2020-10-22  6:07 [PATCH] mm/mremap_pages: Fix static key devmap_managed_key updates Aneesh Kumar K.V
2020-10-22  8:34 ` Sachin Sant
2020-10-22 13:26 ` Christoph Hellwig
2020-10-22 15:41 ` Ira Weiny
2020-10-22 18:19   ` Ralph Campbell
2020-10-22 19:10     ` Ira Weiny [this message]
2020-10-23  2:52       ` Aneesh Kumar K.V
2020-10-23  6:38     ` Sachin Sant
2020-10-23  6:46     ` Christoph Hellwig
2020-10-23 17:29       ` Ralph Campbell
2020-10-23 18:32 Ralph Campbell
2020-10-24  8:19 ` Christoph Hellwig
