linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/2] mm: online/offline_pages called w.o. mem_hotplug_lock
@ 2018-08-17  7:58 David Hildenbrand
  2018-08-17  7:59 ` [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug David Hildenbrand
  2018-08-17  7:59 ` [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
  0 siblings, 2 replies; 15+ messages in thread
From: David Hildenbrand @ 2018-08-17  7:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-acpi, devel, linux-s390, xen-devel,
	Andrew Morton, Michal Hocko, Vlastimil Babka, Dan Williams,
	Pavel Tatashin, Oscar Salvador, Vitaly Kuznetsov,
	Martin Schwidefsky, Heiko Carstens, David Hildenbrand,
	K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Benjamin Herrenschmidt, Paul Mackerras, Rafael J . Wysocki,
	Len Brown, Greg Kroah-Hartman, David Rientjes

Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock. And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
	echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
	echo 1 > /sys/devices/system/memory/memory9/online
Will not take the mem_hotplug_lock. However the device_lock() and
device_hotplug_lock.

E.g. via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock. So we can
have concurrent callers in online_pages(). We e.g. touch in online_pages()
basically unprotected zone->present_pages then.

Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible. We
would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock(),
and the approach to fix it fixed one inversion, but dropped the
mem_hotplug_lock on another instance (.online).

As far as I understand from the code and from b93e0f329e24 ("mm,
memory_hotplug: get rid of zonelists_mutex"), mem_hotplug_lock is required
because we assume that
	"both memory online and offline are fully serialized."
and this is not the case if we only hold the device_lock().

I propose the general rules:

1. add_memory/add_memory_resource() must only be called with
   device_hotplug_lock. For now only done in ACPI code.
2. remove_memory() must only be called with device_hotplug_lock. This is
   already documented and true in ACPI code.
3. device_online()/device_offline() must only be called with
   device_hotplug_lock. This is already documented and true for now in core
   code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
   online_pages/offline_pages. For now this is only true for the first two
   instances.

To me, this looks way cleaner than what we have right now (and easier to
verify). And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural. Second patch could
maybe split up.

But let's first hear if this is actually a problem and if there migh be
alternatives (or cleanups). Only tested with DIMM-based hotplug.

David Hildenbrand (1):
  mm/memory_hotplug: fix online/offline_pages called w.o.
    mem_hotplug_lock

Vitaly Kuznetsov (1):
  drivers/base: export lock_device_hotplug/unlock_device_hotplug

 arch/powerpc/platforms/powernv/memtrace.c |  3 ++
 drivers/acpi/acpi_memhotplug.c            |  1 +
 drivers/base/core.c                       |  2 ++
 drivers/base/memory.c                     | 18 +++++-----
 drivers/hv/hv_balloon.c                   |  4 +++
 drivers/s390/char/sclp_cmd.c              |  3 ++
 drivers/xen/balloon.c                     |  3 ++
 mm/memory_hotplug.c                       | 42 ++++++++++++++++++-----
 8 files changed, 57 insertions(+), 19 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug
  2018-08-17  7:58 [PATCH RFC 0/2] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
@ 2018-08-17  7:59 ` David Hildenbrand
  2018-08-17  8:41   ` Greg Kroah-Hartman
  2018-08-17  7:59 ` [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
  1 sibling, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2018-08-17  7:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-acpi, devel, linux-s390, xen-devel,
	Andrew Morton, Michal Hocko, Vlastimil Babka, Dan Williams,
	Pavel Tatashin, Oscar Salvador, Vitaly Kuznetsov,
	Martin Schwidefsky, Heiko Carstens, David Hildenbrand,
	K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Benjamin Herrenschmidt, Paul Mackerras, Rafael J . Wysocki,
	Len Brown, Greg Kroah-Hartman, David Rientjes

From: Vitaly Kuznetsov <vkuznets@redhat.com>

Well require to call add_memory()/add_memory_resource() with
device_hotplug_lock held, to avoid a lock inversion. Allow external modules
(e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
lock device hotplug.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[modify patch description]
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 04bbcd779e11..9010b9e942b5 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -700,11 +700,13 @@ void lock_device_hotplug(void)
 {
 	mutex_lock(&device_hotplug_lock);
 }
+EXPORT_SYMBOL_GPL(lock_device_hotplug);
 
 void unlock_device_hotplug(void)
 {
 	mutex_unlock(&device_hotplug_lock);
 }
+EXPORT_SYMBOL_GPL(unlock_device_hotplug);
 
 int lock_device_hotplug_sysfs(void)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
  2018-08-17  7:58 [PATCH RFC 0/2] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
  2018-08-17  7:59 ` [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug David Hildenbrand
@ 2018-08-17  7:59 ` David Hildenbrand
  2018-08-17  8:20   ` Rafael J. Wysocki
  1 sibling, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2018-08-17  7:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-acpi, devel, linux-s390, xen-devel,
	Andrew Morton, Michal Hocko, Vlastimil Babka, Dan Williams,
	Pavel Tatashin, Oscar Salvador, Vitaly Kuznetsov,
	Martin Schwidefsky, Heiko Carstens, David Hildenbrand,
	K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Benjamin Herrenschmidt, Paul Mackerras, Rafael J . Wysocki,
	Len Brown, Greg Kroah-Hartman, David Rientjes

There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
fix concurrent memory hot-add deadlock"), which tried to fix a possible
lock inversion reported and discussed in [1] due to the two locks
	a) device_lock()
	b) mem_hotplug_lock

While add_memory() first takes b), followed by a) during
bus_probe_device(), onlining of memory from user space first took b),
followed by a), exposing a possible deadlock.

In [1], and it was decided to not make use of device_hotplug_lock, but
rather to enforce a locking order. Looking at 1., this order is not always
satisfied when calling device_online() - essentially we simply don't take
one of both locks anymore - and fixing this would require us to
take the mem_hotplug_lock in core driver code (online_store()), which
sounds wrong.

The problems I spotted related to this:

1. Memory block device attributes: While .state first calls
   mem_hotplug_begin() and the calls device_online() - which takes
   device_lock() - .online does no longer call mem_hotplug_begin(), so
   effectively calls online_pages() without mem_hotplug_lock. onlining/
   offlining of pages is no longer serialised across different devices.

2. device_online() should be called under device_hotplug_lock, however
   onlining memory during add_memory() does not take care of that. (I
   didn't follow how strictly this is needed, but there seems to be a
   reason because it is documented at device_online() and
   device_offline()).

In addition, I think there is also something wrong about the locking in

3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
   (and device_online()) without locks. This was introduced after
   30467e0b3be. And skimming over the code, I assume it could need some
   more care in regards to locking.

ACPI code already holds the device_hotplug_lock, and as we are
effectively hotplugging memory block devices, requiring to hold that
lock does not sound too wrong, although not chosen in [1], as
	"I don't think resolving a locking dependency is appropriate by
	 just serializing them with another lock."
I think this is the cleanest solution.

Requiring add_memory()/add_memory_resource() to be called under
device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
online_pages/offline_pages() fixes 1. and 3.

Fixup all callers of add_memory/add_memory_resource to hold the lock if
not already done.

So this is essentially a revert of 30467e0b3be, implementation of what
was suggested in [1] by Vitaly, applied to the current tree.

[1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
    2015-February/065324.html

This patch is partly based on a patch by Vitaly Kuznetsov.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/memtrace.c |  3 ++
 drivers/acpi/acpi_memhotplug.c            |  1 +
 drivers/base/memory.c                     | 18 +++++-----
 drivers/hv/hv_balloon.c                   |  4 +++
 drivers/s390/char/sclp_cmd.c              |  3 ++
 drivers/xen/balloon.c                     |  3 ++
 mm/memory_hotplug.c                       | 42 ++++++++++++++++++-----
 7 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 51dc398ae3f7..4c2737a33020 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -206,6 +206,8 @@ static int memtrace_online(void)
 	int i, ret = 0;
 	struct memtrace_entry *ent;
 
+	/* add_memory() requires device_hotplug_lock */
+	lock_device_hotplug();
 	for (i = memtrace_array_nr - 1; i >= 0; i--) {
 		ent = &memtrace_array[i];
 
@@ -244,6 +246,7 @@ static int memtrace_online(void)
 		pr_info("Added trace memory back to node %d\n", ent->nid);
 		ent->size = ent->start = ent->nid = -1;
 	}
+	unlock_device_hotplug();
 	if (ret)
 		return ret;
 
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..e7a4c7900967 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
+		/* we already hold the device_hotplug lock at this point */
 		result = add_memory(node, info->start_addr, info->length);
 
 		/*
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c8a1cb0b6136..f60507f994df 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -341,19 +341,11 @@ store_mem_state(struct device *dev,
 		goto err;
 	}
 
-	/*
-	 * Memory hotplug needs to hold mem_hotplug_begin() for probe to find
-	 * the correct memory block to online before doing device_online(dev),
-	 * which will take dev->mutex.  Take the lock early to prevent an
-	 * inversion, memory_subsys_online() callbacks will be implemented by
-	 * assuming it's already protected.
-	 */
-	mem_hotplug_begin();
-
 	switch (online_type) {
 	case MMOP_ONLINE_KERNEL:
 	case MMOP_ONLINE_MOVABLE:
 	case MMOP_ONLINE_KEEP:
+		/* mem->online_type is protected by device_hotplug_lock */
 		mem->online_type = online_type;
 		ret = device_online(&mem->dev);
 		break;
@@ -364,7 +356,6 @@ store_mem_state(struct device *dev,
 		ret = -EINVAL; /* should never happen */
 	}
 
-	mem_hotplug_done();
 err:
 	unlock_device_hotplug();
 
@@ -522,6 +513,12 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
 		return -EINVAL;
 
 	nid = memory_add_physaddr_to_nid(phys_addr);
+
+	/* add_memory() requires the device_hotplug_lock */
+	ret = lock_device_hotplug_sysfs();
+	if (ret)
+		return ret;
+
 	ret = add_memory(nid, phys_addr,
 			 MIN_MEMORY_BLOCK_SIZE * sections_per_block);
 
@@ -530,6 +527,7 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
 
 	ret = count;
 out:
+	unlock_device_hotplug();
 	return ret;
 }
 
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index b1b788082793..c6d0b654f109 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -735,8 +735,12 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		dm_device.ha_waiting = !memhp_auto_online;
 
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
+
+		/* add_memory() requires the device_hotplug lock */
+		lock_device_hotplug();
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
 				(HA_CHUNK << PAGE_SHIFT));
+		unlock_device_hotplug();
 
 		if (ret) {
 			pr_err("hot_add memory failed error is %d\n", ret);
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index d7686a68c093..cd0cdab8fcfb 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -405,8 +405,11 @@ static void __init add_memory_merged(u16 rn)
 	align_to_block_size(&start, &size, block_size);
 	if (!size)
 		goto skip_add;
+	/* add_memory() requires the device_hotplug lock */
+	lock_device_hotplug();
 	for (addr = start; addr < start + size; addr += block_size)
 		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size);
+	unlock_device_hotplug();
 skip_add:
 	first_rn = rn;
 	num = 1;
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 559e77a20a4d..df1ced4c0692 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -395,8 +395,11 @@ static enum bp_state reserve_additional_memory(void)
 	 * callers drop the mutex before trying again.
 	 */
 	mutex_unlock(&balloon_mutex);
+	/* add_memory_resource() requires the device_hotplug lock */
+	lock_device_hotplug();
 	rc = add_memory_resource(nid, resource->start, resource_size(resource),
 				 memhp_auto_online);
+	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
 	if (rc) {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6a2726920ed2..492e558f791b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -885,7 +885,6 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
 	return zone;
 }
 
-/* Must be protected by mem_hotplug_begin() or a device_lock */
 int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type)
 {
 	unsigned long flags;
@@ -897,6 +896,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	struct memory_notify arg;
 	struct memory_block *mem;
 
+	mem_hotplug_begin();
+
 	/*
 	 * We can't use pfn_to_nid() because nid might be stored in struct page
 	 * which is not yet initialized. Instead, we find nid from memory block.
@@ -961,6 +962,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 
 	if (onlined_pages)
 		memory_notify(MEM_ONLINE, &arg);
+	mem_hotplug_done();
 	return 0;
 
 failed_addition:
@@ -968,6 +970,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 		 (unsigned long long) pfn << PAGE_SHIFT,
 		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
 	memory_notify(MEM_CANCEL_ONLINE, &arg);
+	mem_hotplug_done();
 	return ret;
 }
 #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
@@ -1115,7 +1118,18 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
-/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
+/*
+ * Requires device_hotplug_lock:
+ *
+ * add_memory_resource() will first take the mem_hotplug_lock and then
+ * device_lock() the created memory block device (via bus_probe_device()).
+ * However, device_online() calls device_lock() and then takes the
+ * mem_hotplug lock (via online_pages()). The device_hotplug_lock makes
+ * sure this lock inversion can never happen - and also device_online()
+ * needs it.
+ *
+ * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
+ */
 int __ref add_memory_resource(int nid, u64 start, u64 size, bool online)
 {
 	bool new_node = false;
@@ -1163,25 +1177,26 @@ int __ref add_memory_resource(int nid, u64 start, u64 size, bool online)
 	/* create new memmap entry */
 	firmware_map_add_hotplug(start, start + size, "System RAM");
 
+	/* device_online() will take the lock when calling online_pages() */
+	mem_hotplug_done();
+
 	/* online pages if requested */
 	if (online)
 		walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1),
 				  NULL, online_memory_block);
 
-	goto out;
-
+	return ret;
 error:
 	/* rollback pgdat allocation and others */
 	if (new_node)
 		rollback_node_hotadd(nid);
 	memblock_remove(start, size);
-
-out:
 	mem_hotplug_done();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(add_memory_resource);
 
+/* requires device_hotplug_lock, see add_memory_resource() */
 int __ref add_memory(int nid, u64 start, u64 size)
 {
 	struct resource *res;
@@ -1605,10 +1620,16 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		return -EINVAL;
 	if (!IS_ALIGNED(end_pfn, pageblock_nr_pages))
 		return -EINVAL;
+
+	mem_hotplug_begin();
+
 	/* This makes hotplug much easier...and readable.
 	   we assume this for now. .*/
-	if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, &valid_end))
+	if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start,
+				  &valid_end)) {
+		mem_hotplug_done();
 		return -EINVAL;
+	}
 
 	zone = page_zone(pfn_to_page(valid_start));
 	node = zone_to_nid(zone);
@@ -1617,8 +1638,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
 				       MIGRATE_MOVABLE, true);
-	if (ret)
+	if (ret) {
+		mem_hotplug_done();
 		return ret;
+	}
 
 	arg.start_pfn = start_pfn;
 	arg.nr_pages = nr_pages;
@@ -1689,6 +1712,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	writeback_set_ratelimit();
 
 	memory_notify(MEM_OFFLINE, &arg);
+	mem_hotplug_done();
 	return 0;
 
 failed_removal:
@@ -1698,10 +1722,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	memory_notify(MEM_CANCEL_OFFLINE, &arg);
 	/* pushback to free area */
 	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	mem_hotplug_done();
 	return ret;
 }
 
-/* Must be protected by mem_hotplug_begin() or a device_lock */
 int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
 	return __offline_pages(start_pfn, start_pfn + nr_pages);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
  2018-08-17  7:59 ` [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
@ 2018-08-17  8:20   ` Rafael J. Wysocki
  2018-08-17  8:27     ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2018-08-17  8:20 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux Kernel Mailing List, Linux Memory Management List,
	linuxppc-dev, ACPI Devel Maling List, devel, linux-s390,
	xen-devel, Andrew Morton, Michal Hocko, Vlastimil Babka,
	Dan Williams, Pavel Tatashin, osalvador, Vitaly Kuznetsov,
	Martin Schwidefsky, Heiko Carstens, kys, haiyangz, sthemmin,
	Benjamin Herrenschmidt, Paul Mackerras, Rafael J. Wysocki,
	Len Brown, Greg Kroah-Hartman, David Rientjes

On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand <david@redhat.com> wrote:
>
> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
> fix concurrent memory hot-add deadlock"), which tried to fix a possible
> lock inversion reported and discussed in [1] due to the two locks
>         a) device_lock()
>         b) mem_hotplug_lock
>
> While add_memory() first takes b), followed by a) during
> bus_probe_device(), onlining of memory from user space first took b),
> followed by a), exposing a possible deadlock.
>
> In [1], and it was decided to not make use of device_hotplug_lock, but
> rather to enforce a locking order. Looking at 1., this order is not always
> satisfied when calling device_online() - essentially we simply don't take
> one of both locks anymore - and fixing this would require us to
> take the mem_hotplug_lock in core driver code (online_store()), which
> sounds wrong.
>
> The problems I spotted related to this:
>
> 1. Memory block device attributes: While .state first calls
>    mem_hotplug_begin() and the calls device_online() - which takes
>    device_lock() - .online does no longer call mem_hotplug_begin(), so
>    effectively calls online_pages() without mem_hotplug_lock. onlining/
>    offlining of pages is no longer serialised across different devices.
>
> 2. device_online() should be called under device_hotplug_lock, however
>    onlining memory during add_memory() does not take care of that. (I
>    didn't follow how strictly this is needed, but there seems to be a
>    reason because it is documented at device_online() and
>    device_offline()).
>
> In addition, I think there is also something wrong about the locking in
>
> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>    (and device_online()) without locks. This was introduced after
>    30467e0b3be. And skimming over the code, I assume it could need some
>    more care in regards to locking.
>
> ACPI code already holds the device_hotplug_lock, and as we are
> effectively hotplugging memory block devices, requiring to hold that
> lock does not sound too wrong, although not chosen in [1], as
>         "I don't think resolving a locking dependency is appropriate by
>          just serializing them with another lock."
> I think this is the cleanest solution.
>
> Requiring add_memory()/add_memory_resource() to be called under
> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
> online_pages/offline_pages() fixes 1. and 3.
>
> Fixup all callers of add_memory/add_memory_resource to hold the lock if
> not already done.
>
> So this is essentially a revert of 30467e0b3be, implementation of what
> was suggested in [1] by Vitaly, applied to the current tree.
>
> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
>     2015-February/065324.html
>
> This patch is partly based on a patch by Vitaly Kuznetsov.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/powerpc/platforms/powernv/memtrace.c |  3 ++
>  drivers/acpi/acpi_memhotplug.c            |  1 +
>  drivers/base/memory.c                     | 18 +++++-----
>  drivers/hv/hv_balloon.c                   |  4 +++
>  drivers/s390/char/sclp_cmd.c              |  3 ++
>  drivers/xen/balloon.c                     |  3 ++
>  mm/memory_hotplug.c                       | 42 ++++++++++++++++++-----
>  7 files changed, 55 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> index 51dc398ae3f7..4c2737a33020 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -206,6 +206,8 @@ static int memtrace_online(void)
>         int i, ret = 0;
>         struct memtrace_entry *ent;
>
> +       /* add_memory() requires device_hotplug_lock */
> +       lock_device_hotplug();
>         for (i = memtrace_array_nr - 1; i >= 0; i--) {
>                 ent = &memtrace_array[i];
>
> @@ -244,6 +246,7 @@ static int memtrace_online(void)
>                 pr_info("Added trace memory back to node %d\n", ent->nid);
>                 ent->size = ent->start = ent->nid = -1;
>         }
> +       unlock_device_hotplug();
>         if (ret)
>                 return ret;
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 6b0d3ef7309c..e7a4c7900967 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>                 if (node < 0)
>                         node = memory_add_physaddr_to_nid(info->start_addr);
>
> +               /* we already hold the device_hotplug lock at this point */
>                 result = add_memory(node, info->start_addr, info->length);
>
>                 /*

A very minor nit here: I would say "device_hotplug_lock is already
held at this point" in the comment (I sort of don't like to say "we"
in code comments as it is not particularly clear what group of people
is represented by that and the lock is actually called
device_hotplug_lock).

Otherwise the approach is fine by me.

BTW, the reason why device_hotplug_lock is acquired by the ACPI memory
hotplug is because it generally needs to be synchronized with respect
CPU hot-remove and similar.  I believe that this may be the case in
non-ACPI setups as well.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
  2018-08-17  8:20   ` Rafael J. Wysocki
@ 2018-08-17  8:27     ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2018-08-17  8:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Linux Memory Management List,
	linuxppc-dev, ACPI Devel Maling List, devel, linux-s390,
	xen-devel, Andrew Morton, Michal Hocko, Vlastimil Babka,
	Dan Williams, Pavel Tatashin, osalvador, Vitaly Kuznetsov,
	Martin Schwidefsky, Heiko Carstens, kys, haiyangz, sthemmin,
	Benjamin Herrenschmidt, Paul Mackerras, Rafael J. Wysocki,
	Len Brown, Greg Kroah-Hartman, David Rientjes

On 17.08.2018 10:20, Rafael J. Wysocki wrote:
> On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
>> fix concurrent memory hot-add deadlock"), which tried to fix a possible
>> lock inversion reported and discussed in [1] due to the two locks
>>         a) device_lock()
>>         b) mem_hotplug_lock
>>
>> While add_memory() first takes b), followed by a) during
>> bus_probe_device(), onlining of memory from user space first took b),
>> followed by a), exposing a possible deadlock.
>>
>> In [1], and it was decided to not make use of device_hotplug_lock, but
>> rather to enforce a locking order. Looking at 1., this order is not always
>> satisfied when calling device_online() - essentially we simply don't take
>> one of both locks anymore - and fixing this would require us to
>> take the mem_hotplug_lock in core driver code (online_store()), which
>> sounds wrong.
>>
>> The problems I spotted related to this:
>>
>> 1. Memory block device attributes: While .state first calls
>>    mem_hotplug_begin() and the calls device_online() - which takes
>>    device_lock() - .online does no longer call mem_hotplug_begin(), so
>>    effectively calls online_pages() without mem_hotplug_lock. onlining/
>>    offlining of pages is no longer serialised across different devices.
>>
>> 2. device_online() should be called under device_hotplug_lock, however
>>    onlining memory during add_memory() does not take care of that. (I
>>    didn't follow how strictly this is needed, but there seems to be a
>>    reason because it is documented at device_online() and
>>    device_offline()).
>>
>> In addition, I think there is also something wrong about the locking in
>>
>> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>>    (and device_online()) without locks. This was introduced after
>>    30467e0b3be. And skimming over the code, I assume it could need some
>>    more care in regards to locking.
>>
>> ACPI code already holds the device_hotplug_lock, and as we are
>> effectively hotplugging memory block devices, requiring to hold that
>> lock does not sound too wrong, although not chosen in [1], as
>>         "I don't think resolving a locking dependency is appropriate by
>>          just serializing them with another lock."
>> I think this is the cleanest solution.
>>
>> Requiring add_memory()/add_memory_resource() to be called under
>> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
>> online_pages/offline_pages() fixes 1. and 3.
>>
>> Fixup all callers of add_memory/add_memory_resource to hold the lock if
>> not already done.
>>
>> So this is essentially a revert of 30467e0b3be, implementation of what
>> was suggested in [1] by Vitaly, applied to the current tree.
>>
>> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
>>     2015-February/065324.html
>>
>> This patch is partly based on a patch by Vitaly Kuznetsov.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  arch/powerpc/platforms/powernv/memtrace.c |  3 ++
>>  drivers/acpi/acpi_memhotplug.c            |  1 +
>>  drivers/base/memory.c                     | 18 +++++-----
>>  drivers/hv/hv_balloon.c                   |  4 +++
>>  drivers/s390/char/sclp_cmd.c              |  3 ++
>>  drivers/xen/balloon.c                     |  3 ++
>>  mm/memory_hotplug.c                       | 42 ++++++++++++++++++-----
>>  7 files changed, 55 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
>> index 51dc398ae3f7..4c2737a33020 100644
>> --- a/arch/powerpc/platforms/powernv/memtrace.c
>> +++ b/arch/powerpc/platforms/powernv/memtrace.c
>> @@ -206,6 +206,8 @@ static int memtrace_online(void)
>>         int i, ret = 0;
>>         struct memtrace_entry *ent;
>>
>> +       /* add_memory() requires device_hotplug_lock */
>> +       lock_device_hotplug();
>>         for (i = memtrace_array_nr - 1; i >= 0; i--) {
>>                 ent = &memtrace_array[i];
>>
>> @@ -244,6 +246,7 @@ static int memtrace_online(void)
>>                 pr_info("Added trace memory back to node %d\n", ent->nid);
>>                 ent->size = ent->start = ent->nid = -1;
>>         }
>> +       unlock_device_hotplug();
>>         if (ret)
>>                 return ret;
>>
>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
>> index 6b0d3ef7309c..e7a4c7900967 100644
>> --- a/drivers/acpi/acpi_memhotplug.c
>> +++ b/drivers/acpi/acpi_memhotplug.c
>> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>>                 if (node < 0)
>>                         node = memory_add_physaddr_to_nid(info->start_addr);
>>
>> +               /* we already hold the device_hotplug lock at this point */
>>                 result = add_memory(node, info->start_addr, info->length);
>>
>>                 /*
> 
> A very minor nit here: I would say "device_hotplug_lock is already
> held at this point" in the comment (I sort of don't like to say "we"
> in code comments as it is not particularly clear what group of people
> is represented by that and the lock is actually called
> device_hotplug_lock).

Easy to fix, thanks!

> 
> Otherwise the approach is fine by me.
> 
> BTW, the reason why device_hotplug_lock is acquired by the ACPI memory
> hotplug is because it generally needs to be synchronized with respect
> CPU hot-remove and similar.  I believe that this may be the case in
> non-ACPI setups as well.

Yes, and that lock is the reason why we didn't have real problems with
ACPI memory hotplug in this respect so far. (as user triggered
online/offline also takes that lock already)

> 
> Thanks,
> Rafael
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug
  2018-08-17  7:59 ` [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug David Hildenbrand
@ 2018-08-17  8:41   ` Greg Kroah-Hartman
  2018-08-17  8:56     ` David Hildenbrand
  2018-08-17  8:57     ` Rafael J. Wysocki
  0 siblings, 2 replies; 15+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-17  8:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, linux-acpi, devel,
	linux-s390, xen-devel, Andrew Morton, Michal Hocko,
	Vlastimil Babka, Dan Williams, Pavel Tatashin, Oscar Salvador,
	Vitaly Kuznetsov, Martin Schwidefsky, Heiko Carstens,
	K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Benjamin Herrenschmidt, Paul Mackerras, Rafael J . Wysocki,
	Len Brown, David Rientjes

On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> Well require to call add_memory()/add_memory_resource() with
> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> lock device hotplug.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> [modify patch description]
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/core.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 04bbcd779e11..9010b9e942b5 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
>  {
>  	mutex_lock(&device_hotplug_lock);
>  }
> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
>  
>  void unlock_device_hotplug(void)
>  {
>  	mutex_unlock(&device_hotplug_lock);
>  }
> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);

If these are going to be "global" symbols, let's properly name them.
device_hotplug_lock/unlock would be better.  But I am _really_ nervous
about letting stuff outside of the driver core mess with this, as people
better know what they are doing.

Can't we just "lock" the memory stuff instead?  Why does the entirety of
the driver core need to be messed with here?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug
  2018-08-17  8:41   ` Greg Kroah-Hartman
@ 2018-08-17  8:56     ` David Hildenbrand
  2018-08-17  9:03       ` Rafael J. Wysocki
  2018-08-17  8:57     ` Rafael J. Wysocki
  1 sibling, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2018-08-17  8:56 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, linux-mm, linuxppc-dev, linux-acpi, devel,
	linux-s390, xen-devel, Andrew Morton, Michal Hocko,
	Vlastimil Babka, Dan Williams, Pavel Tatashin, Oscar Salvador,
	Vitaly Kuznetsov, Martin Schwidefsky, Heiko Carstens,
	K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Benjamin Herrenschmidt, Paul Mackerras, Rafael J . Wysocki,
	Len Brown, David Rientjes

On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
>> From: Vitaly Kuznetsov <vkuznets@redhat.com>
>>
>> Well require to call add_memory()/add_memory_resource() with
>> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
>> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
>> lock device hotplug.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> [modify patch description]
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  drivers/base/core.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index 04bbcd779e11..9010b9e942b5 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
>>  {
>>  	mutex_lock(&device_hotplug_lock);
>>  }
>> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
>>  
>>  void unlock_device_hotplug(void)
>>  {
>>  	mutex_unlock(&device_hotplug_lock);
>>  }
>> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
> 
> If these are going to be "global" symbols, let's properly name them.
> device_hotplug_lock/unlock would be better.  But I am _really_ nervous
> about letting stuff outside of the driver core mess with this, as people
> better know what they are doing.

The only "problem" is that we have kernel modules (for paravirtualized
devices) that call add_memory(). This is Hyper-V right now, but we might
have other ones in the future. Without them we would not have to export
it. We might also get kernel modules that want to call remove_memory() -
which will require the device_hotplug_lock as of now.

What we could do is

a) add_memory() -> _add_memory() and don't export it
b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
We export that one.
c) Use add_memory() in external modules only

Similar wrapper would be needed e.g. for remove_memory() later on.

> 
> Can't we just "lock" the memory stuff instead?  Why does the entirety of
> the driver core need to be messed with here?

The main problem is that add_memory() will first take the
mem_hotplug_lock, to later on create and attach the memory block devices
(to a bus without any drivers) via bus_probe_device(). Here, we will
take the device_lock() of these devices.

Setting a device online from user space (.online = true) will first take
the device_hotplug_lock, then the device_lock(), and _right now_ not the
mem_hotplug_lock (see cover letter/patch 2).

We have to take mem_hotplug_lock here, but doing it down in e.g.
online_pages() will right now create the possibility for a lock
inversion. So one alternative is to take the mem_hotplug_lock in core.c
before calling device_online()/device_offline(). But this feels very wrong.

Thanks!

> 
> thanks,
> 
> greg k-h
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug
  2018-08-17  8:41   ` Greg Kroah-Hartman
  2018-08-17  8:56     ` David Hildenbrand
@ 2018-08-17  8:57     ` Rafael J. Wysocki
  1 sibling, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2018-08-17  8:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: David Hildenbrand, Linux Kernel Mailing List,
	Linux Memory Management List, linuxppc-dev,
	ACPI Devel Maling List, devel, linux-s390, xen-devel,
	Andrew Morton, Michal Hocko, Vlastimil Babka, Dan Williams,
	Pavel Tatashin, osalvador, Vitaly Kuznetsov, Martin Schwidefsky,
	Heiko Carstens, kys, haiyangz, sthemmin, Benjamin Herrenschmidt,
	Paul Mackerras, Rafael J. Wysocki, Len Brown, David Rientjes

On Fri, Aug 17, 2018 at 10:41 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> > From: Vitaly Kuznetsov <vkuznets@redhat.com>
> >
> > Well require to call add_memory()/add_memory_resource() with
> > device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> > (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> > lock device hotplug.
> >
> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > [modify patch description]
> > Signed-off-by: David Hildenbrand <david@redhat.com>
> > ---
> >  drivers/base/core.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 04bbcd779e11..9010b9e942b5 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
> >  {
> >       mutex_lock(&device_hotplug_lock);
> >  }
> > +EXPORT_SYMBOL_GPL(lock_device_hotplug);
> >
> >  void unlock_device_hotplug(void)
> >  {
> >       mutex_unlock(&device_hotplug_lock);
> >  }
> > +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
>
> If these are going to be "global" symbols, let's properly name them.
> device_hotplug_lock/unlock would be better.

Well, device_hotplug_lock is the name of the lock itself. :-)

> But I am _really_ nervous about letting stuff outside of the driver core mess
> with this, as people better know what they are doing.
>
> Can't we just "lock" the memory stuff instead?  Why does the entirety of
> the driver core need to be messed with here?

Because, in general, memory hotplug and hotplug of devices are not
independent.  CPUs and memory may only be possible to take away
together and that may be the case for other devices too (say, it
wouldn't be a good idea to access a memory block that has just gone
away from a device, for DMA and the like).  That's why
device_hotplug_lock was introduced in the first place.

That said I agree that exporting this to drivers doesn't feel particularly safe.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug
  2018-08-17  8:56     ` David Hildenbrand
@ 2018-08-17  9:03       ` Rafael J. Wysocki
  2018-08-17  9:41         ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2018-08-17  9:03 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List,
	Linux Memory Management List, linuxppc-dev,
	ACPI Devel Maling List, devel, linux-s390, xen-devel,
	Andrew Morton, Michal Hocko, Vlastimil Babka, Dan Williams,
	Pavel Tatashin, osalvador, Vitaly Kuznetsov, Martin Schwidefsky,
	Heiko Carstens, kys, haiyangz, sthemmin, Benjamin Herrenschmidt,
	Paul Mackerras, Rafael J. Wysocki, Len Brown, David Rientjes

On Fri, Aug 17, 2018 at 10:56 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
> > On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> >> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> >>
> >> Well require to call add_memory()/add_memory_resource() with
> >> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> >> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> >> lock device hotplug.
> >>
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> [modify patch description]
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  drivers/base/core.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >> index 04bbcd779e11..9010b9e942b5 100644
> >> --- a/drivers/base/core.c
> >> +++ b/drivers/base/core.c
> >> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
> >>  {
> >>      mutex_lock(&device_hotplug_lock);
> >>  }
> >> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
> >>
> >>  void unlock_device_hotplug(void)
> >>  {
> >>      mutex_unlock(&device_hotplug_lock);
> >>  }
> >> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
> >
> > If these are going to be "global" symbols, let's properly name them.
> > device_hotplug_lock/unlock would be better.  But I am _really_ nervous
> > about letting stuff outside of the driver core mess with this, as people
> > better know what they are doing.
>
> The only "problem" is that we have kernel modules (for paravirtualized
> devices) that call add_memory(). This is Hyper-V right now, but we might
> have other ones in the future. Without them we would not have to export
> it. We might also get kernel modules that want to call remove_memory() -
> which will require the device_hotplug_lock as of now.
>
> What we could do is
>
> a) add_memory() -> _add_memory() and don't export it
> b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
> We export that one.
> c) Use add_memory() in external modules only
>
> Similar wrapper would be needed e.g. for remove_memory() later on.

That would be safer IMO, as it would prevent developers from using
add_memory() without the lock, say.

If the lock is always going to be required for add_memory(), make it
hard (or event impossible) to use the latter without it.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug
  2018-08-17  9:03       ` Rafael J. Wysocki
@ 2018-08-17  9:41         ` David Hildenbrand
  2018-08-17 10:06           ` Greg Kroah-Hartman
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2018-08-17  9:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List,
	Linux Memory Management List, linuxppc-dev,
	ACPI Devel Maling List, devel, linux-s390, xen-devel,
	Andrew Morton, Michal Hocko, Vlastimil Babka, Dan Williams,
	Pavel Tatashin, osalvador, Vitaly Kuznetsov, Martin Schwidefsky,
	Heiko Carstens, kys, haiyangz, sthemmin, Benjamin Herrenschmidt,
	Paul Mackerras, Rafael J. Wysocki, Len Brown, David Rientjes

On 17.08.2018 11:03, Rafael J. Wysocki wrote:
> On Fri, Aug 17, 2018 at 10:56 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
>>> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
>>>> From: Vitaly Kuznetsov <vkuznets@redhat.com>
>>>>
>>>> Well require to call add_memory()/add_memory_resource() with
>>>> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
>>>> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
>>>> lock device hotplug.
>>>>
>>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>>> [modify patch description]
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  drivers/base/core.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>>> index 04bbcd779e11..9010b9e942b5 100644
>>>> --- a/drivers/base/core.c
>>>> +++ b/drivers/base/core.c
>>>> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
>>>>  {
>>>>      mutex_lock(&device_hotplug_lock);
>>>>  }
>>>> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
>>>>
>>>>  void unlock_device_hotplug(void)
>>>>  {
>>>>      mutex_unlock(&device_hotplug_lock);
>>>>  }
>>>> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
>>>
>>> If these are going to be "global" symbols, let's properly name them.
>>> device_hotplug_lock/unlock would be better.  But I am _really_ nervous
>>> about letting stuff outside of the driver core mess with this, as people
>>> better know what they are doing.
>>
>> The only "problem" is that we have kernel modules (for paravirtualized
>> devices) that call add_memory(). This is Hyper-V right now, but we might
>> have other ones in the future. Without them we would not have to export
>> it. We might also get kernel modules that want to call remove_memory() -
>> which will require the device_hotplug_lock as of now.
>>
>> What we could do is
>>
>> a) add_memory() -> _add_memory() and don't export it
>> b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
>> We export that one.
>> c) Use add_memory() in external modules only
>>
>> Similar wrapper would be needed e.g. for remove_memory() later on.
> 
> That would be safer IMO, as it would prevent developers from using
> add_memory() without the lock, say.
> 
> If the lock is always going to be required for add_memory(), make it
> hard (or event impossible) to use the latter without it.
> 

If there are no objections, I'll go into that direction. But I'll wait
for more comments regarding the general concept first.

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug
  2018-08-17  9:41         ` David Hildenbrand
@ 2018-08-17 10:06           ` Greg Kroah-Hartman
  2018-08-17 11:04             ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-17 10:06 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Rafael J. Wysocki, Michal Hocko, Benjamin Herrenschmidt,
	Heiko Carstens, Linux Memory Management List, Paul Mackerras,
	linux-s390, sthemmin, Pavel Tatashin, ACPI Devel Maling List,
	David Rientjes, xen-devel, Len Brown, haiyangz, Dan Williams,
	Andrew Morton, Vlastimil Babka, osalvador, Rafael J. Wysocki,
	Linux Kernel Mailing List, Martin Schwidefsky, devel,
	Vitaly Kuznetsov, linuxppc-dev

On Fri, Aug 17, 2018 at 11:41:24AM +0200, David Hildenbrand wrote:
> On 17.08.2018 11:03, Rafael J. Wysocki wrote:
> > On Fri, Aug 17, 2018 at 10:56 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
> >>> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> >>>> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> >>>>
> >>>> Well require to call add_memory()/add_memory_resource() with
> >>>> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> >>>> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> >>>> lock device hotplug.
> >>>>
> >>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >>>> [modify patch description]
> >>>> Signed-off-by: David Hildenbrand <david@redhat.com>
> >>>> ---
> >>>>  drivers/base/core.c | 2 ++
> >>>>  1 file changed, 2 insertions(+)
> >>>>
> >>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >>>> index 04bbcd779e11..9010b9e942b5 100644
> >>>> --- a/drivers/base/core.c
> >>>> +++ b/drivers/base/core.c
> >>>> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
> >>>>  {
> >>>>      mutex_lock(&device_hotplug_lock);
> >>>>  }
> >>>> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
> >>>>
> >>>>  void unlock_device_hotplug(void)
> >>>>  {
> >>>>      mutex_unlock(&device_hotplug_lock);
> >>>>  }
> >>>> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
> >>>
> >>> If these are going to be "global" symbols, let's properly name them.
> >>> device_hotplug_lock/unlock would be better.  But I am _really_ nervous
> >>> about letting stuff outside of the driver core mess with this, as people
> >>> better know what they are doing.
> >>
> >> The only "problem" is that we have kernel modules (for paravirtualized
> >> devices) that call add_memory(). This is Hyper-V right now, but we might
> >> have other ones in the future. Without them we would not have to export
> >> it. We might also get kernel modules that want to call remove_memory() -
> >> which will require the device_hotplug_lock as of now.
> >>
> >> What we could do is
> >>
> >> a) add_memory() -> _add_memory() and don't export it
> >> b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
> >> We export that one.
> >> c) Use add_memory() in external modules only
> >>
> >> Similar wrapper would be needed e.g. for remove_memory() later on.
> > 
> > That would be safer IMO, as it would prevent developers from using
> > add_memory() without the lock, say.
> > 
> > If the lock is always going to be required for add_memory(), make it
> > hard (or event impossible) to use the latter without it.
> > 
> 
> If there are no objections, I'll go into that direction. But I'll wait
> for more comments regarding the general concept first.

It is the middle of the merge window, and maintainers are really busy
right now.  I doubt you will get many review comments just yet...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug
  2018-08-17 10:06           ` Greg Kroah-Hartman
@ 2018-08-17 11:04             ` David Hildenbrand
  2018-08-17 11:28               ` Heiko Carstens
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2018-08-17 11:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Rafael J. Wysocki, Michal Hocko, Benjamin Herrenschmidt,
	Heiko Carstens, Linux Memory Management List, Paul Mackerras,
	linux-s390, sthemmin, Pavel Tatashin, ACPI Devel Maling List,
	David Rientjes, xen-devel, Len Brown, haiyangz, Dan Williams,
	Andrew Morton, Vlastimil Babka, osalvador, Rafael J. Wysocki,
	Linux Kernel Mailing List, Martin Schwidefsky, devel,
	Vitaly Kuznetsov, linuxppc-dev

On 17.08.2018 12:06, Greg Kroah-Hartman wrote:
> On Fri, Aug 17, 2018 at 11:41:24AM +0200, David Hildenbrand wrote:
>> On 17.08.2018 11:03, Rafael J. Wysocki wrote:
>>> On Fri, Aug 17, 2018 at 10:56 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
>>>>> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
>>>>>> From: Vitaly Kuznetsov <vkuznets@redhat.com>
>>>>>>
>>>>>> Well require to call add_memory()/add_memory_resource() with
>>>>>> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
>>>>>> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
>>>>>> lock device hotplug.
>>>>>>
>>>>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>>>>> [modify patch description]
>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>> ---
>>>>>>  drivers/base/core.c | 2 ++
>>>>>>  1 file changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>>>>> index 04bbcd779e11..9010b9e942b5 100644
>>>>>> --- a/drivers/base/core.c
>>>>>> +++ b/drivers/base/core.c
>>>>>> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
>>>>>>  {
>>>>>>      mutex_lock(&device_hotplug_lock);
>>>>>>  }
>>>>>> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
>>>>>>
>>>>>>  void unlock_device_hotplug(void)
>>>>>>  {
>>>>>>      mutex_unlock(&device_hotplug_lock);
>>>>>>  }
>>>>>> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
>>>>>
>>>>> If these are going to be "global" symbols, let's properly name them.
>>>>> device_hotplug_lock/unlock would be better.  But I am _really_ nervous
>>>>> about letting stuff outside of the driver core mess with this, as people
>>>>> better know what they are doing.
>>>>
>>>> The only "problem" is that we have kernel modules (for paravirtualized
>>>> devices) that call add_memory(). This is Hyper-V right now, but we might
>>>> have other ones in the future. Without them we would not have to export
>>>> it. We might also get kernel modules that want to call remove_memory() -
>>>> which will require the device_hotplug_lock as of now.
>>>>
>>>> What we could do is
>>>>
>>>> a) add_memory() -> _add_memory() and don't export it
>>>> b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
>>>> We export that one.
>>>> c) Use add_memory() in external modules only
>>>>
>>>> Similar wrapper would be needed e.g. for remove_memory() later on.
>>>
>>> That would be safer IMO, as it would prevent developers from using
>>> add_memory() without the lock, say.
>>>
>>> If the lock is always going to be required for add_memory(), make it
>>> hard (or event impossible) to use the latter without it.
>>>
>>
>> If there are no objections, I'll go into that direction. But I'll wait
>> for more comments regarding the general concept first.
> 
> It is the middle of the merge window, and maintainers are really busy
> right now.  I doubt you will get many review comments just yet...
> 

This has been broken since 2015, so I guess it can wait a bit :)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug
  2018-08-17 11:04             ` David Hildenbrand
@ 2018-08-17 11:28               ` Heiko Carstens
  2018-08-17 11:56                 ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Heiko Carstens @ 2018-08-17 11:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Michal Hocko,
	Benjamin Herrenschmidt, Linux Memory Management List,
	Paul Mackerras, linux-s390, sthemmin, Pavel Tatashin,
	ACPI Devel Maling List, David Rientjes, xen-devel, Len Brown,
	haiyangz, Dan Williams, Andrew Morton, Vlastimil Babka,
	osalvador, Rafael J. Wysocki, Linux Kernel Mailing List,
	Martin Schwidefsky, devel, Vitaly Kuznetsov, linuxppc-dev

On Fri, Aug 17, 2018 at 01:04:58PM +0200, David Hildenbrand wrote:
> >> If there are no objections, I'll go into that direction. But I'll wait
> >> for more comments regarding the general concept first.
> > 
> > It is the middle of the merge window, and maintainers are really busy
> > right now.  I doubt you will get many review comments just yet...
> > 
> 
> This has been broken since 2015, so I guess it can wait a bit :)

I hope you figured out what needs to be locked why. Your patch description
seems to be "only" about locking order ;)

I tried to figure out and document that partially with 55adc1d05dca ("mm:
add private lock to serialize memory hotplug operations"), and that wasn't
easy to figure out. I was especially concerned about sprinkling
lock/unlock_device_hotplug() calls, which has the potential to make it the
next BKL thing.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug
  2018-08-17 11:28               ` Heiko Carstens
@ 2018-08-17 11:56                 ` David Hildenbrand
  2018-08-17 17:02                   ` Greg Kroah-Hartman
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2018-08-17 11:56 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Michal Hocko,
	Benjamin Herrenschmidt, Linux Memory Management List,
	Paul Mackerras, linux-s390, sthemmin, Pavel Tatashin,
	ACPI Devel Maling List, David Rientjes, xen-devel, Len Brown,
	haiyangz, Dan Williams, Andrew Morton, Vlastimil Babka,
	osalvador, Rafael J. Wysocki, Linux Kernel Mailing List,
	Martin Schwidefsky, devel, Vitaly Kuznetsov, linuxppc-dev

On 17.08.2018 13:28, Heiko Carstens wrote:
> On Fri, Aug 17, 2018 at 01:04:58PM +0200, David Hildenbrand wrote:
>>>> If there are no objections, I'll go into that direction. But I'll wait
>>>> for more comments regarding the general concept first.
>>>
>>> It is the middle of the merge window, and maintainers are really busy
>>> right now.  I doubt you will get many review comments just yet...
>>>
>>
>> This has been broken since 2015, so I guess it can wait a bit :)
> 
> I hope you figured out what needs to be locked why. Your patch description
> seems to be "only" about locking order ;)

Well I hope so, too ... but there is a reason for the RFC mark ;) There
is definitely a lot of magic in the current code. And that's why it is
also not that obvious that locking is wrong.

To avoid/fix the locking order problem was the motivation for the
original patch that dropped mem_hotplug_lock on one path. So I focused
on that in my description.

> 
> I tried to figure out and document that partially with 55adc1d05dca ("mm:
> add private lock to serialize memory hotplug operations"), and that wasn't
> easy to figure out. I was especially concerned about sprinkling

Haven't seen that so far as that was reworked by 3f906ba23689
("mm/memory-hotplug: switch locking to a percpu rwsem"). Thanks for the
pointer. There is a long history to all this.

> lock/unlock_device_hotplug() calls, which has the potential to make it the
> next BKL thing.

Well, the thing with memory hotplug and device_hotplug_lock is that

a) ACPI already holds it while adding/removing memory via add_memory()
b) we hold it during online/offline of memory (via sysfs calls to
   device_online()/device_offline())

So it is already pretty much involved in all memory hotplug/unplug
activities on x86 (except paravirt). And as far as I understand, there
are good reasons to hold the lock in core.c and ACPI. (as mentioned by
Rafael)

The exceptions are add_memory() called on s390x, hyper-v, xen and ppc
(including manual probing). And device_online()/device_offline() called
from the kernel.

Holding device_hotplug_lock when adding/removing memory from the system
doesn't sound too wrong (especially as devices are created/removed). At
least that way (documenting and following the rules in the patch
description) we might at least get locking right.


I am very open to other suggestions (but as noted by Greg, many
maintainers might be busy by know).

E.g. When adding the memory block devices, we know that there won't be a
driver to attach to (as there are no drivers for the "memory" subsystem)
- the bus_probe_device() function that takes the device_lock() could
pretty much be avoided for that case. But burying such special cases
down in core driver code definitely won't make locking related to memory
hotplug easier.

Thanks for having a look!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug
  2018-08-17 11:56                 ` David Hildenbrand
@ 2018-08-17 17:02                   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 15+ messages in thread
From: Greg Kroah-Hartman @ 2018-08-17 17:02 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Heiko Carstens, Michal Hocko, Rafael J. Wysocki,
	Benjamin Herrenschmidt, Linux Memory Management List,
	Paul Mackerras, linux-s390, sthemmin, Pavel Tatashin,
	ACPI Devel Maling List, David Rientjes, xen-devel, Len Brown,
	haiyangz, Dan Williams, Vitaly Kuznetsov, Vlastimil Babka,
	osalvador, Rafael J. Wysocki, Linux Kernel Mailing List,
	Martin Schwidefsky, devel, Andrew Morton, linuxppc-dev

On Fri, Aug 17, 2018 at 01:56:35PM +0200, David Hildenbrand wrote:
> E.g. When adding the memory block devices, we know that there won't be a
> driver to attach to (as there are no drivers for the "memory" subsystem)
> - the bus_probe_device() function that takes the device_lock() could
> pretty much be avoided for that case. But burying such special cases
> down in core driver code definitely won't make locking related to memory
> hotplug easier.

You don't have to have a driver for a device if you don't want to, or
you can just have a default one for all memory devices if you somehow
need it.  No reason to not do this if it makes things easier for you.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-08-17 17:02 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-17  7:58 [PATCH RFC 0/2] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
2018-08-17  7:59 ` [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug David Hildenbrand
2018-08-17  8:41   ` Greg Kroah-Hartman
2018-08-17  8:56     ` David Hildenbrand
2018-08-17  9:03       ` Rafael J. Wysocki
2018-08-17  9:41         ` David Hildenbrand
2018-08-17 10:06           ` Greg Kroah-Hartman
2018-08-17 11:04             ` David Hildenbrand
2018-08-17 11:28               ` Heiko Carstens
2018-08-17 11:56                 ` David Hildenbrand
2018-08-17 17:02                   ` Greg Kroah-Hartman
2018-08-17  8:57     ` Rafael J. Wysocki
2018-08-17  7:59 ` [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
2018-08-17  8:20   ` Rafael J. Wysocki
2018-08-17  8:27     ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).