From: "Williams, Dan J" <dan.j.williams@intel.com>
To: "hch@lst.de" <hch@lst.de>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"ross.zwisler@linux.intel.com" <ross.zwisler@linux.intel.com>,
	"boaz@plexistor.com" <boaz@plexistor.com>,
	"david@fromorbit.com" <david@fromorbit.com>
Subject: Re: [PATCH v2 9/9] devm_memremap_pages: protect against pmem device unbind
Date: Wed, 26 Aug 2015 21:39:18 +0000
Message-ID: <1440625157.31365.21.camel@intel.com> (raw)
In-Reply-To: <20150826124649.GA8014@lst.de>

On Wed, 2015-08-26 at 14:46 +0200, Christoph Hellwig wrote:
> On Tue, Aug 25, 2015 at 09:28:13PM -0400, Dan Williams wrote:
> > Given that:
> >
> > 1/ device ->remove() can not be failed
> >
> > 2/ a pmem device may be unbound at any time
> >
> > 3/ we do not know what other parts of the kernel are actively using a
> > 'struct page' from devm_memremap_pages()
> >
> > ...provide a facility for active usages of device memory to block pmem
> > device unbind.  With a percpu_ref it should be feasible to take a
> > reference on a per-I/O or other high frequency basis.
>
> Without a caller of get_page_map this is just adding dead code.  I'd
> suggest to group it in a series with that caller.

Agreed, we can drop this until the first user arrives.

> Also if the page_map gets exposed in a header the name is a bit too
> generic.  memremap_map maybe?

Done, and in the patch below I hide the internal implementation details
of page_map in kernel/memremap.c and only expose the percpu_ref in the
public memremap_map.
8<----
Subject: devm_memremap_pages: protect against pmem device unbind

From: Dan Williams <dan.j.williams@intel.com>

Given that:

1/ device ->remove() can not be failed

2/ a pmem device may be unbound at any time

3/ we do not know what other parts of the kernel are actively using a
'struct page' from devm_memremap_pages()

...provide a facility for active usages of device memory to block pmem
device unbind.  With a percpu_ref it should be feasible to take a
reference on a per-I/O or other high frequency basis.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/io.h |   30 ++++++++++++++++
 kernel/memremap.c  |   96 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/include/linux/io.h b/include/linux/io.h
index de64c1e53612..9e696b114c6d 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -90,8 +90,25 @@ void devm_memunmap(struct device *dev, void *addr);
 void *__devm_memremap_pages(struct device *dev, struct resource *res);
 
 #ifdef CONFIG_ZONE_DEVICE
+#include <linux/percpu-refcount.h>
+
+struct memremap_map {
+	struct percpu_ref percpu_ref;
+};
+
 void *devm_memremap_pages(struct device *dev, struct resource *res);
+struct memremap_map * __must_check get_memremap_map(resource_size_t addr);
+static inline void ref_memremap_map(struct memremap_map *memremap_map)
+{
+	percpu_ref_get(&memremap_map->percpu_ref);
+}
+
+static inline void put_memremap_map(struct memremap_map *memremap_map)
+{
+	percpu_ref_put(&memremap_map->percpu_ref);
+}
 #else
+struct memremap_map;
 static inline void *devm_memremap_pages(struct device *dev, struct resource *res)
 {
 	/*
@@ -102,6 +119,19 @@ static inline void *devm_memremap_pages(struct device *dev, struct resource *res
 	WARN_ON_ONCE(1);
 	return ERR_PTR(-ENXIO);
 }
+
+static inline __must_check struct memremap_map *get_memremap_map(resource_size_t addr)
+{
+	return NULL;
+}
+
+static inline void ref_memremap_map(struct memremap_map *memremap_map)
+{
+}
+
+static inline void put_memremap_map(struct memremap_map *memremap_map)
+{
+}
 #endif
 
 /*
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 72b0c66628b6..5b9f04789b96 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -11,7 +11,11 @@
  * General Public License for more details.
  */
 #include <linux/device.h>
+#include <linux/ioport.h>
 #include <linux/types.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/list.h>
 #include <linux/io.h>
 #include <linux/mm.h>
 #include <linux/memory_hotplug.h>
@@ -138,14 +142,74 @@ void devm_memunmap(struct device *dev, void *addr)
 EXPORT_SYMBOL(devm_memunmap);
 
 #ifdef CONFIG_ZONE_DEVICE
+static DEFINE_MUTEX(page_map_lock);
+static DECLARE_WAIT_QUEUE_HEAD(page_map_wait);
+static LIST_HEAD(page_maps);
+
+enum {
+	PAGE_MAP_LIVE,
+	PAGE_MAP_CONFIRM,
+};
+
 struct page_map {
 	struct resource res;
+	struct list_head list;
+	unsigned long flags;
+	struct memremap_map map;
+	struct device *dev;
 };
 
+static struct page_map *to_page_map(struct percpu_ref *ref)
+{
+	return container_of(ref, struct page_map, map.percpu_ref);
+}
+
+static void page_map_release(struct percpu_ref *ref)
+{
+	struct page_map *page_map = to_page_map(ref);
+
+	/* signal page_map is idle (no more refs) */
+	clear_bit(PAGE_MAP_LIVE, &page_map->flags);
+	wake_up_all(&page_map_wait);
+}
+
+static void page_map_confirm(struct percpu_ref *ref)
+{
+	struct page_map *page_map = to_page_map(ref);
+
+	/* signal page_map is confirmed dead (slow path ref mode) */
+	set_bit(PAGE_MAP_CONFIRM, &page_map->flags);
+	wake_up_all(&page_map_wait);
+}
+
+static void page_map_destroy(struct page_map *page_map)
+{
+	long tmo;
+
+	/* flush new lookups */
+	mutex_lock(&page_map_lock);
+	list_del_rcu(&page_map->list);
+	mutex_unlock(&page_map_lock);
+	synchronize_rcu();
+
+	percpu_ref_kill_and_confirm(&page_map->map.percpu_ref, page_map_confirm);
+	do {
+		tmo = wait_event_interruptible_timeout(page_map_wait,
+				!test_bit(PAGE_MAP_LIVE, &page_map->flags)
+				&& test_bit(PAGE_MAP_CONFIRM, &page_map->flags), 5*HZ);
+		if (tmo <= 0)
+			dev_dbg(page_map->dev,
+				"page map active, continuing to wait...\n");
+	} while (tmo <= 0);
+}
+
 static void devm_memremap_pages_release(struct device *dev, void *res)
 {
 	struct page_map *page_map = res;
 
+	if (test_bit(PAGE_MAP_LIVE, &page_map->flags))
+		page_map_destroy(page_map);
+
 	/* pages are dead and unused, undo the arch mapping */
 	arch_remove_memory(page_map->res.start, resource_size(&page_map->res));
 }
@@ -155,7 +219,7 @@ void *devm_memremap_pages(struct device *dev, struct resource *res)
 	int is_ram = region_intersects(res->start, resource_size(res),
			"System RAM");
 	struct page_map *page_map;
-	int error, nid;
+	int error, nid, rc;
 
 	if (is_ram == REGION_MIXED) {
 		WARN_ONCE(1, "%s attempted on mixed region %pr\n",
@@ -172,6 +236,12 @@ void *devm_memremap_pages(struct device *dev, struct resource *res)
 		return ERR_PTR(-ENOMEM);
 
 	memcpy(&page_map->res, res, sizeof(*res));
+	INIT_LIST_HEAD(&page_map->list);
+	page_map->dev = dev;
+	rc = percpu_ref_init(&page_map->map.percpu_ref, page_map_release, 0,
+			GFP_KERNEL);
+	if (rc)
+		return ERR_PTR(rc);
 
 	nid = dev_to_node(dev);
 	if (nid < 0)
@@ -183,8 +253,32 @@ void *devm_memremap_pages(struct device *dev, struct resource *res)
 		return ERR_PTR(error);
 	}
 
+	set_bit(PAGE_MAP_LIVE, &page_map->flags);
+	mutex_lock(&page_map_lock);
+	list_add_rcu(&page_map->list, &page_maps);
+	mutex_unlock(&page_map_lock);
+
 	devres_add(dev, page_map);
 	return __va(res->start);
 }
 EXPORT_SYMBOL(devm_memremap_pages);
+
+struct memremap_map * __must_check get_memremap_map(resource_size_t addr)
+{
+	struct memremap_map *ret = NULL;
+	struct page_map *page_map;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(page_map, &page_maps, list) {
+		if (addr >= page_map->res.start && addr <= page_map->res.end) {
+			if (percpu_ref_tryget(&page_map->map.percpu_ref))
+				ret = &page_map->map;
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(get_memremap_map);
 #endif /* CONFIG_ZONE_DEVICE */