* [PATCH -mm 0/3] slab: cleanup mem hotplug synchronization
@ 2014-04-06 15:33 ` Vladimir Davydov
From: Vladimir Davydov @ 2014-04-06 15:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-mm, devel

Hi,

kmem_cache_{create,destroy,shrink} need to get a stable value of the
cpu/node online mask, because they init/destroy/access per-cpu/node
kmem_cache parts, which can be allocated or destroyed on cpu/mem
hotplug. To protect against cpu hotplug, these functions use
{get,put}_online_cpus. However, they do nothing to synchronize with
memory hotplug - taking the slab_mutex does not eliminate the
possibility of a race, as described in patch 3.
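
To make this concrete, the per-node part of cache creation is roughly
the following loop (a condensed sketch of slub's init_kmem_cache_nodes;
error handling omitted):

    for_each_node_state(node, N_NORMAL_MEMORY) {
            n = kmem_cache_alloc_node(kmem_cache_node, GFP_KERNEL, node);
            s->node[node] = n;
            init_kmem_cache_node(n);
    }

A node coming online concurrently can be missed both by this loop and by
the hotplug callback, which is the race described in patch 3.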

What we need there is something like get_online_cpus, but for memory. We
already have lock_memory_hotplug, which serves the purpose, but it's a
bit of a hammer right now, because it's backed by a mutex. As a result,
it imposes some undesirable limitations on locking order and can't be
used just like get_online_cpus. I propose to turn this mutex into an rw
semaphore, which will be taken for reading in lock_memory_hotplug and
for writing in the memory hotplug code (that's what patch 1 does).
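
In outline, the resulting scheme looks like this (condensed from the
hunks in patch 1):

    static DECLARE_RWSEM(mem_hotplug_sem);

    void lock_memory_hotplug(void)          /* readers: slab, vmscan, ... */
    {
            down_read(&mem_hotplug_sem);
    }

    void unlock_memory_hotplug(void)
    {
            up_read(&mem_hotplug_sem);
    }

    /* online_pages, add_memory, remove_memory, ... take it for writing: */
            down_write(&mem_hotplug_sem);
            ...
            up_write(&mem_hotplug_sem);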

When I tried to use this rw semaphore in the slab implementation, I came
across a problem with lockdep: rw_semaphore is not marked as
read-recursive although it is (at least, it looks that way to me), so
lockdep complains about wrong lock ordering against a sysfs-internal
mutex in the slub case, because, in contrast to a recursive read lock, a
non-recursive one must always be taken in the same order relative to a
mutex. That's why patch 2 marks the rw semaphore as read-recursive, just
like the rw spin lock.
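
A schematic example of the ordering lockdep objects to (the lock name
sysfs_internal_mutex is made up here for illustration; these are not the
exact call chains):

    /* path 1, e.g. cache creation: */
    down_read(&mem_hotplug_sem);
    mutex_lock(&sysfs_internal_mutex);

    /* path 2: */
    mutex_lock(&sysfs_internal_mutex);
    down_read(&mem_hotplug_sem);

Roughly, with the read acquisition treated as non-recursive, lockdep
requires a consistent order between the rwsem and the mutex and so flags
the two paths above; it would not do so for a recursive read
acquisition.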

Thanks,

Vladimir Davydov (3):
  mem-hotplug: turn mem_hotplug_mutex to rwsem
  lockdep: mark rwsem_acquire_read as recursive
  slab: lock_memory_hotplug for kmem_cache_{create,destroy,shrink}

 include/linux/lockdep.h |    2 +-
 include/linux/mmzone.h  |    7 ++---
 mm/memory_hotplug.c     |   70 +++++++++++++++++++++--------------------------
 mm/slab.c               |   26 ++----------------
 mm/slab.h               |    1 +
 mm/slab_common.c        |   35 ++++++++++++++++++++++--
 mm/slob.c               |    3 +-
 mm/slub.c               |    5 ++--
 mm/vmscan.c             |    2 +-
 9 files changed, 75 insertions(+), 76 deletions(-)

-- 
1.7.10.4



* [PATCH -mm 1/3] mem-hotplug: turn mem_hotplug_mutex to rwsem
  2014-04-06 15:33 ` Vladimir Davydov
@ 2014-04-06 15:33   ` Vladimir Davydov
From: Vladimir Davydov @ 2014-04-06 15:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-mm, devel

{un}lock_memory_hotplug is used to synchronize against memory hotplug.
Currently it is backed by a mutex, which makes it a bit of a hammer -
threads that only want to get a stable value of the online nodes mask
cannot proceed concurrently.

This patch fixes this by turning mem_hotplug_mutex into an rw semaphore
so that lock_memory_hotplug only takes it for reading while the memory
hotplug code locks it for writing. This allows code under
{un}lock_memory_hotplug to run concurrently while still guaranteeing
protection against memory hotplug.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 include/linux/mmzone.h |    7 +++--
 mm/memory_hotplug.c    |   70 +++++++++++++++++++++---------------------------
 mm/vmscan.c            |    2 +-
 3 files changed, 35 insertions(+), 44 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fac5509c18f0..586cc2d2c25e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -481,9 +481,8 @@ struct zone {
 	 * give them a chance of being in the same cacheline.
 	 *
 	 * Write access to present_pages at runtime should be protected by
-	 * lock_memory_hotplug()/unlock_memory_hotplug().  Any reader who can't
-	 * tolerant drift of present_pages should hold memory hotplug lock to
-	 * get a stable value.
+	 * mem_hotplug_sem. Any reader who can't tolerant drift of
+	 * present_pages should take it for reading to get a stable value.
 	 *
 	 * Read access to managed_pages should be safe because it's unsigned
 	 * long. Write access to zone->managed_pages and totalram_pages are
@@ -766,7 +765,7 @@ typedef struct pglist_data {
 	nodemask_t reclaim_nodes;	/* Nodes allowed to reclaim from */
 	wait_queue_head_t kswapd_wait;
 	wait_queue_head_t pfmemalloc_wait;
-	struct task_struct *kswapd;	/* Protected by lock_memory_hotplug() */
+	struct task_struct *kswapd;	/* Protected by mem_hotplug_sem */
 	int kswapd_max_order;
 	enum zone_type classzone_idx;
 #ifdef CONFIG_NUMA_BALANCING
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a650db29606f..052db2e72035 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -31,6 +31,7 @@
 #include <linux/stop_machine.h>
 #include <linux/hugetlb.h>
 #include <linux/memblock.h>
+#include <linux/rwsem.h>
 
 #include <asm/tlbflush.h>
 
@@ -47,16 +48,16 @@ static void generic_online_page(struct page *page);
 
 static online_page_callback_t online_page_callback = generic_online_page;
 
-DEFINE_MUTEX(mem_hotplug_mutex);
+static DECLARE_RWSEM(mem_hotplug_sem);
 
 void lock_memory_hotplug(void)
 {
-	mutex_lock(&mem_hotplug_mutex);
+	down_read(&mem_hotplug_sem);
 }
 
 void unlock_memory_hotplug(void)
 {
-	mutex_unlock(&mem_hotplug_mutex);
+	up_read(&mem_hotplug_sem);
 }
 
 
@@ -727,14 +728,14 @@ int set_online_page_callback(online_page_callback_t callback)
 {
 	int rc = -EINVAL;
 
-	lock_memory_hotplug();
+	down_write(&mem_hotplug_sem);
 
 	if (online_page_callback == generic_online_page) {
 		online_page_callback = callback;
 		rc = 0;
 	}
 
-	unlock_memory_hotplug();
+	up_write(&mem_hotplug_sem);
 
 	return rc;
 }
@@ -744,14 +745,14 @@ int restore_online_page_callback(online_page_callback_t callback)
 {
 	int rc = -EINVAL;
 
-	lock_memory_hotplug();
+	down_write(&mem_hotplug_sem);
 
 	if (online_page_callback == callback) {
 		online_page_callback = generic_online_page;
 		rc = 0;
 	}
 
-	unlock_memory_hotplug();
+	up_write(&mem_hotplug_sem);
 
 	return rc;
 }
@@ -899,7 +900,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	int ret;
 	struct memory_notify arg;
 
-	lock_memory_hotplug();
+	down_write(&mem_hotplug_sem);
 	/*
 	 * This doesn't need a lock to do pfn_to_page().
 	 * The section can't be removed here because of the
@@ -907,23 +908,18 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	 */
 	zone = page_zone(pfn_to_page(pfn));
 
+	ret = -EINVAL;
 	if ((zone_idx(zone) > ZONE_NORMAL || online_type == ONLINE_MOVABLE) &&
-	    !can_online_high_movable(zone)) {
-		unlock_memory_hotplug();
-		return -EINVAL;
-	}
+	    !can_online_high_movable(zone))
+		goto out;
 
 	if (online_type == ONLINE_KERNEL && zone_idx(zone) == ZONE_MOVABLE) {
-		if (move_pfn_range_left(zone - 1, zone, pfn, pfn + nr_pages)) {
-			unlock_memory_hotplug();
-			return -EINVAL;
-		}
+		if (move_pfn_range_left(zone - 1, zone, pfn, pfn + nr_pages))
+			goto out;
 	}
 	if (online_type == ONLINE_MOVABLE && zone_idx(zone) == ZONE_MOVABLE - 1) {
-		if (move_pfn_range_right(zone, zone + 1, pfn, pfn + nr_pages)) {
-			unlock_memory_hotplug();
-			return -EINVAL;
-		}
+		if (move_pfn_range_right(zone, zone + 1, pfn, pfn + nr_pages))
+			goto out;
 	}
 
 	/* Previous code may changed the zone of the pfn range */
@@ -939,8 +935,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	ret = notifier_to_errno(ret);
 	if (ret) {
 		memory_notify(MEM_CANCEL_ONLINE, &arg);
-		unlock_memory_hotplug();
-		return ret;
+		goto out;
 	}
 	/*
 	 * If this zone is not populated, then it is not in zonelist.
@@ -964,8 +959,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 		       (((unsigned long long) pfn + nr_pages)
 			    << PAGE_SHIFT) - 1);
 		memory_notify(MEM_CANCEL_ONLINE, &arg);
-		unlock_memory_hotplug();
-		return ret;
+		goto out;
 	}
 
 	zone->present_pages += onlined_pages;
@@ -995,9 +989,9 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 
 	if (onlined_pages)
 		memory_notify(MEM_ONLINE, &arg);
-	unlock_memory_hotplug();
-
-	return 0;
+out:
+	up_write(&mem_hotplug_sem);
+	return ret;
 }
 #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
 
@@ -1055,7 +1049,7 @@ int try_online_node(int nid)
 	if (node_online(nid))
 		return 0;
 
-	lock_memory_hotplug();
+	down_write(&mem_hotplug_sem);
 	pgdat = hotadd_new_pgdat(nid, 0);
 	if (!pgdat) {
 		pr_err("Cannot online node %d due to NULL pgdat\n", nid);
@@ -1073,7 +1067,7 @@ int try_online_node(int nid)
 	}
 
 out:
-	unlock_memory_hotplug();
+	up_write(&mem_hotplug_sem);
 	return ret;
 }
 
@@ -1117,7 +1111,7 @@ int __ref add_memory(int nid, u64 start, u64 size)
 		new_pgdat = !p;
 	}
 
-	lock_memory_hotplug();
+	down_write(&mem_hotplug_sem);
 
 	new_node = !node_online(nid);
 	if (new_node) {
@@ -1158,7 +1152,7 @@ error:
 	release_memory_resource(res);
 
 out:
-	unlock_memory_hotplug();
+	up_write(&mem_hotplug_sem);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(add_memory);
@@ -1565,7 +1559,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	if (!test_pages_in_a_zone(start_pfn, end_pfn))
 		return -EINVAL;
 
-	lock_memory_hotplug();
+	down_write(&mem_hotplug_sem);
 
 	zone = page_zone(pfn_to_page(start_pfn));
 	node = zone_to_nid(zone);
@@ -1672,7 +1666,7 @@ repeat:
 	writeback_set_ratelimit();
 
 	memory_notify(MEM_OFFLINE, &arg);
-	unlock_memory_hotplug();
+	up_write(&mem_hotplug_sem);
 	return 0;
 
 failed_removal:
@@ -1684,7 +1678,7 @@ failed_removal:
 	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
 
 out:
-	unlock_memory_hotplug();
+	up_write(&mem_hotplug_sem);
 	return ret;
 }
 
@@ -1888,7 +1882,7 @@ void __ref remove_memory(int nid, u64 start, u64 size)
 
 	BUG_ON(check_hotplug_memory_range(start, size));
 
-	lock_memory_hotplug();
+	down_write(&mem_hotplug_sem);
 
 	/*
 	 * All memory blocks must be offlined before removing memory.  Check
@@ -1897,10 +1891,8 @@ void __ref remove_memory(int nid, u64 start, u64 size)
 	 */
 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
 				check_memblock_offlined_cb);
-	if (ret) {
-		unlock_memory_hotplug();
+	if (ret)
 		BUG();
-	}
 
 	/* remove memmap entry */
 	firmware_map_remove(start, start + size, "System RAM");
@@ -1909,7 +1901,7 @@ void __ref remove_memory(int nid, u64 start, u64 size)
 
 	try_offline_node(nid);
 
-	unlock_memory_hotplug();
+	up_write(&mem_hotplug_sem);
 }
 EXPORT_SYMBOL_GPL(remove_memory);
 #endif /* CONFIG_MEMORY_HOTREMOVE */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 06879ead7380..b6fe28a5dcd1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3418,7 +3418,7 @@ int kswapd_run(int nid)
 
 /*
  * Called by memory hotplug when all memory in a node is offlined.  Caller must
- * hold lock_memory_hotplug().
+ * hold mem_hotplug_sem for writing.
  */
 void kswapd_stop(int nid)
 {
-- 
1.7.10.4



* [PATCH -mm 2/3] lockdep: mark rwsem_acquire_read as recursive
  2014-04-06 15:33 ` Vladimir Davydov
@ 2014-04-06 15:33   ` Vladimir Davydov
From: Vladimir Davydov @ 2014-04-06 15:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-mm, devel, Peter Zijlstra, Ingo Molnar

The rw_semaphore implementation allows recursive calls to down_read, but
lockdep assumes that it doesn't. As a result, it produces false-positive
complaints, e.g. if we do not observe some predefined locking order when
taking an rw semaphore for reading and a mutex.

This patch makes lockdep treat the rw semaphore as read-recursive, just
like the rw spin lock.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
---
 include/linux/lockdep.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 008388f920d7..4b95fe85375e 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -500,7 +500,7 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 
 #define rwsem_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
 #define rwsem_acquire_nest(l, s, t, n, i)	lock_acquire_exclusive(l, s, t, n, i)
-#define rwsem_acquire_read(l, s, t, i)		lock_acquire_shared(l, s, t, NULL, i)
+#define rwsem_acquire_read(l, s, t, i)		lock_acquire_shared_recursive(l, s, t, NULL, i)
 #define rwsem_release(l, n, i)			lock_release(l, n, i)
 
 #define lock_map_acquire(l)			lock_acquire_exclusive(l, 0, 0, NULL, _THIS_IP_)
-- 
1.7.10.4



* [PATCH -mm 3/3] slab: lock_memory_hotplug for kmem_cache_{create,destroy,shrink}
  2014-04-06 15:33 ` Vladimir Davydov
@ 2014-04-06 15:33   ` Vladimir Davydov
From: Vladimir Davydov @ 2014-04-06 15:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-mm, devel, Christoph Lameter, Pekka Enberg

When we create a sl[au]b cache, we allocate kmem_cache_node structures
for each online NUMA node. To handle nodes going online/offline, we
register a memory hotplug notifier and, for each kmem cache,
allocate/free the kmem_cache_node corresponding to the node that changes
its state.

To synchronize the two paths, we hold the slab_mutex both during the
cache creation/destruction path and while tuning the per-node parts of
kmem caches in the memory hotplug handler. That's not quite right,
though, because it does not guarantee that a newly created cache will
have all of its kmem_cache_nodes initialized if it races with memory
hotplug. For instance, in the case of slub:

    CPU0                            CPU1
    ----                            ----
    kmem_cache_create:              online_pages:
     __kmem_cache_create:            slab_memory_callback:
                                      slab_mem_going_online_callback:
                                       lock slab_mutex
                                       for each slab_caches list entry
                                           allocate kmem_cache node
                                       unlock slab_mutex
      lock slab_mutex
      init_kmem_cache_nodes:
       for_each_node_state(node, N_NORMAL_MEMORY)
           allocate kmem_cache node
      add kmem_cache to slab_caches list
      unlock slab_mutex
                                    online_pages (continued):
                                     node_states_set_node

As a result, we end up with a kmem cache that does not have all of its
kmem_cache_nodes allocated.

To avoid issues like this, we should hold the memory hotplug lock across
the whole kmem cache creation/destruction/shrink paths, just as we do
with cpu hotplug. This patch does exactly that.
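
With the patch, the affected paths nest their locks as follows (a sketch
matching the kmem_cache_create hunk below):

    get_online_cpus();
    lock_memory_hotplug();
    mutex_lock(&slab_mutex);

    /* create / destroy / shrink the cache */

    mutex_unlock(&slab_mutex);
    unlock_memory_hotplug();
    put_online_cpus();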

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
---
 mm/slab.c        |   26 ++------------------------
 mm/slab.h        |    1 +
 mm/slab_common.c |   35 +++++++++++++++++++++++++++++++++--
 mm/slob.c        |    3 +--
 mm/slub.c        |    5 ++---
 5 files changed, 39 insertions(+), 31 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index eebc619ae33c..b33aae64e5d7 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2442,8 +2442,7 @@ out:
 	return nr_freed;
 }
 
-/* Called with slab_mutex held to protect against cpu hotplug */
-static int __cache_shrink(struct kmem_cache *cachep)
+int __kmem_cache_shrink(struct kmem_cache *cachep)
 {
 	int ret = 0, i = 0;
 	struct kmem_cache_node *n;
@@ -2464,32 +2463,11 @@ static int __cache_shrink(struct kmem_cache *cachep)
 	return (ret ? 1 : 0);
 }
 
-/**
- * kmem_cache_shrink - Shrink a cache.
- * @cachep: The cache to shrink.
- *
- * Releases as many slabs as possible for a cache.
- * To help debugging, a zero exit status indicates all slabs were released.
- */
-int kmem_cache_shrink(struct kmem_cache *cachep)
-{
-	int ret;
-	BUG_ON(!cachep || in_interrupt());
-
-	get_online_cpus();
-	mutex_lock(&slab_mutex);
-	ret = __cache_shrink(cachep);
-	mutex_unlock(&slab_mutex);
-	put_online_cpus();
-	return ret;
-}
-EXPORT_SYMBOL(kmem_cache_shrink);
-
 int __kmem_cache_shutdown(struct kmem_cache *cachep)
 {
 	int i;
 	struct kmem_cache_node *n;
-	int rc = __cache_shrink(cachep);
+	int rc = __kmem_cache_shrink(cachep);
 
 	if (rc)
 		return rc;
diff --git a/mm/slab.h b/mm/slab.h
index 3045316b7c9d..00a47de7631f 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -91,6 +91,7 @@ __kmem_cache_alias(const char *name, size_t size, size_t align,
 #define CACHE_CREATE_MASK (SLAB_CORE_FLAGS | SLAB_DEBUG_FLAGS | SLAB_CACHE_FLAGS)
 
 int __kmem_cache_shutdown(struct kmem_cache *);
+int __kmem_cache_shrink(struct kmem_cache *);
 
 struct seq_file;
 struct file;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index f3cfccf76dda..321953cae074 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -205,6 +205,8 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 	int err;
 
 	get_online_cpus();
+	lock_memory_hotplug();
+
 	mutex_lock(&slab_mutex);
 
 	err = kmem_cache_sanity_check(name, size);
@@ -239,6 +241,8 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 
 out_unlock:
 	mutex_unlock(&slab_mutex);
+
+	unlock_memory_hotplug();
 	put_online_cpus();
 
 	if (err) {
@@ -272,6 +276,8 @@ void kmem_cache_create_memcg(struct mem_cgroup *memcg, struct kmem_cache *root_c
 	char *cache_name;
 
 	get_online_cpus();
+	lock_memory_hotplug();
+
 	mutex_lock(&slab_mutex);
 
 	/*
@@ -299,6 +305,8 @@ void kmem_cache_create_memcg(struct mem_cgroup *memcg, struct kmem_cache *root_c
 
 out_unlock:
 	mutex_unlock(&slab_mutex);
+
+	unlock_memory_hotplug();
 	put_online_cpus();
 }
 
@@ -326,6 +334,8 @@ static int kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 void kmem_cache_destroy(struct kmem_cache *s)
 {
 	get_online_cpus();
+	lock_memory_hotplug();
+
 	mutex_lock(&slab_mutex);
 
 	s->refcount--;
@@ -354,15 +364,36 @@ void kmem_cache_destroy(struct kmem_cache *s)
 	memcg_free_cache_params(s);
 	kfree(s->name);
 	kmem_cache_free(kmem_cache, s);
-	goto out_put_cpus;
+	goto out;
 
 out_unlock:
 	mutex_unlock(&slab_mutex);
-out_put_cpus:
+out:
+	unlock_memory_hotplug();
 	put_online_cpus();
 }
 EXPORT_SYMBOL(kmem_cache_destroy);
 
+/**
+ * kmem_cache_shrink - Shrink a cache.
+ * @cachep: The cache to shrink.
+ *
+ * Releases as many slabs as possible for a cache.
+ * To help debugging, a zero exit status indicates all slabs were released.
+ */
+int kmem_cache_shrink(struct kmem_cache *cachep)
+{
+	int ret;
+
+	get_online_cpus();
+	lock_memory_hotplug();
+	ret = __kmem_cache_shrink(cachep);
+	unlock_memory_hotplug();
+	put_online_cpus();
+	return ret;
+}
+EXPORT_SYMBOL(kmem_cache_shrink);
+
 int slab_is_available(void)
 {
 	return slab_state >= UP;
diff --git a/mm/slob.c b/mm/slob.c
index 730cad45d4be..21980e0f39a8 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -620,11 +620,10 @@ int __kmem_cache_shutdown(struct kmem_cache *c)
 	return 0;
 }
 
-int kmem_cache_shrink(struct kmem_cache *d)
+int __kmem_cache_shrink(struct kmem_cache *d)
 {
 	return 0;
 }
-EXPORT_SYMBOL(kmem_cache_shrink);
 
 struct kmem_cache kmem_cache_boot = {
 	.name = "kmem_cache",
diff --git a/mm/slub.c b/mm/slub.c
index 5e234f1f8853..26d185d8cd64 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3412,7 +3412,7 @@ EXPORT_SYMBOL(kfree);
  * being allocated from last increasing the chance that the last objects
  * are freed in them.
  */
-int kmem_cache_shrink(struct kmem_cache *s)
+int __kmem_cache_shrink(struct kmem_cache *s)
 {
 	int node;
 	int i;
@@ -3468,7 +3468,6 @@ int kmem_cache_shrink(struct kmem_cache *s)
 	kfree(slabs_by_inuse);
 	return 0;
 }
-EXPORT_SYMBOL(kmem_cache_shrink);
 
 static int slab_mem_going_offline_callback(void *arg)
 {
@@ -3476,7 +3475,7 @@ static int slab_mem_going_offline_callback(void *arg)
 
 	mutex_lock(&slab_mutex);
 	list_for_each_entry(s, &slab_caches, list)
-		kmem_cache_shrink(s);
+		__kmem_cache_shrink(s);
 	mutex_unlock(&slab_mutex);
 
 	return 0;
-- 
1.7.10.4



* Re: [PATCH -mm 0/3] slab: cleanup mem hotplug synchronization
  2014-04-06 15:33 ` Vladimir Davydov
@ 2014-04-06 17:46   ` Vladimir Davydov
From: Vladimir Davydov @ 2014-04-06 17:46 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, devel, Peter Zijlstra, Ingo Molnar,
	Christoph Lameter, Pekka Enberg

On 04/06/2014 07:33 PM, Vladimir Davydov wrote:
> kmem_cache_{create,destroy,shrink} need to get a stable value of the
> cpu/node online mask, because they init/destroy/access per-cpu/node
> kmem_cache parts, which can be allocated or destroyed on cpu/mem
> hotplug. To protect against cpu hotplug, these functions use
> {get,put}_online_cpus. However, they do nothing to synchronize with
> memory hotplug - taking the slab_mutex does not eliminate the
> possibility of a race, as described in patch 3.
>
> What we need there is something like get_online_cpus, but for memory. We
> already have lock_memory_hotplug, which serves the purpose, but it's a
> bit of a hammer right now, because it's backed by a mutex. As a result,
> it imposes some undesirable limitations on locking order and can't be
> used just like get_online_cpus. I propose to turn this mutex into an rw
> semaphore, which will be taken for reading in lock_memory_hotplug and
> for writing in the memory hotplug code (that's what patch 1 does).

This is absolutely wrong, because down_read cannot be nested inside a
down/up_write critical section. Although it would work now, it could
result in deadlocks in the future. Please ignore this set completely.

Actually we need to implement a recursive rw semaphore here, just like
cpu_hotplug_lock.
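
For illustration, a refcount-based lock in the style of cpu_hotplug_lock
could look roughly like this (a rough sketch only; the names
get_online_mems/put_online_mems/mem_hotplug_begin/done and all details
here are hypothetical, not an actual implementation):

    /* Hypothetical sketch modeled on cpu_hotplug_lock; not merged code. */
    static struct {
            struct task_struct *active_writer;  /* lets the writer re-enter the read side */
            struct mutex lock;                  /* protects refcount and active_writer */
            int refcount;                       /* readers currently inside the section */
    } mem_hotplug = {
            .active_writer = NULL,
            .lock = __MUTEX_INITIALIZER(mem_hotplug.lock),
            .refcount = 0,
    };

    void get_online_mems(void)                  /* read side, may nest */
    {
            might_sleep();
            if (mem_hotplug.active_writer == current)
                    return;
            mutex_lock(&mem_hotplug.lock);
            mem_hotplug.refcount++;
            mutex_unlock(&mem_hotplug.lock);
    }

    void put_online_mems(void)
    {
            if (mem_hotplug.active_writer == current)
                    return;
            mutex_lock(&mem_hotplug.lock);
            mem_hotplug.refcount--;
            mutex_unlock(&mem_hotplug.lock);
    }

    void mem_hotplug_begin(void)                /* write side: hotplug code */
    {
            mem_hotplug.active_writer = current;
            for (;;) {
                    mutex_lock(&mem_hotplug.lock);
                    if (!mem_hotplug.refcount)
                            break;              /* leave with the mutex held */
                    mutex_unlock(&mem_hotplug.lock);
                    schedule();                 /* crude polling; a real version would
                                                   sleep and be woken by the last reader */
            }
    }

    void mem_hotplug_done(void)
    {
            mem_hotplug.active_writer = NULL;
            mutex_unlock(&mem_hotplug.lock);
    }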

Sorry for the noise.

Thanks.


* Re: [PATCH -mm 2/3] lockdep: mark rwsem_acquire_read as recursive
  2014-04-06 15:33   ` Vladimir Davydov
@ 2014-04-07  8:13     ` Peter Zijlstra
From: Peter Zijlstra @ 2014-04-07  8:13 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: akpm, linux-kernel, linux-mm, devel, Ingo Molnar

On Sun, Apr 06, 2014 at 07:33:51PM +0400, Vladimir Davydov wrote:
> The rw_semaphore implementation allows recursive calls to down_read, but
> lockdep assumes that it doesn't. As a result, it produces false-positive
> complaints, e.g. if we do not observe some predefined locking order when
> taking an rw semaphore for reading and a mutex.
> 
> This patch makes lockdep treat the rw semaphore as read-recursive, just
> like the rw spin lock.

Uhm no rwsem isn't read recursive.


* Re: [PATCH -mm 2/3] lockdep: mark rwsem_acquire_read as recursive
  2014-04-07  8:13     ` Peter Zijlstra
@ 2014-04-07  8:26       ` Vladimir Davydov
From: Vladimir Davydov @ 2014-04-07  8:26 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: akpm, linux-kernel, linux-mm, devel, Ingo Molnar

On 04/07/2014 12:13 PM, Peter Zijlstra wrote:
> On Sun, Apr 06, 2014 at 07:33:51PM +0400, Vladimir Davydov wrote:
>> The rw_semaphore implementation allows recursive calls to down_read, but
>> lockdep assumes that it doesn't. As a result, it produces false-positive
>> complaints, e.g. if we do not observe some predefined locking order when
>> taking an rw semaphore for reading and a mutex.
>>
>> This patch makes lockdep treat the rw semaphore as read-recursive, just
>> like the rw spin lock.
> Uhm no rwsem isn't read recursive.

Yeah, I was mistaken and am already reworking my set. Sorry for the
noise.

Thanks.
