linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock
@ 2018-08-21 10:44 David Hildenbrand
  2018-08-21 10:44 ` [PATCH RFCv2 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock David Hildenbrand
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: David Hildenbrand @ 2018-08-21 10:44 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, David Hildenbrand, Andrew Morton, Balbir Singh,
	Benjamin Herrenschmidt, Boris Ostrovsky, Dan Williams,
	Greg Kroah-Hartman, Haiyang Zhang, Heiko Carstens, John Allen,
	Jonathan Corbet, Joonsoo Kim, Juergen Gross, Kate Stewart,
	K. Y. Srinivasan, Len Brown, Martin Schwidefsky,
	Mathieu Malaterre, Michael Ellerman, Michael Neuling,
	Michal Hocko, Nathan Fontenot, Oscar Salvador, Paul Mackerras,
	Pavel Tatashin, Philippe Ombredanne, Rafael J. Wysocki,
	Rashmica Gupta, Stephen Hemminger, Thomas Gleixner,
	Vlastimil Babka, YASUAKI ISHIMATSU

This is the same approach as in the first RFC, but this time without
exporting device_hotplug_lock (requested by Greg) and with some more
details and documentation regarding locking. Tested only on x86 so far.

--------------------------------------------------------------------------

Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock. And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
	echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
	echo 1 > /sys/devices/system/memory/memory9/online
Will not take the mem_hotplug_lock. However the device_lock() and
device_hotplug_lock.

E.g. via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock. So we can
have concurrent callers in online_pages(). We e.g. touch in online_pages()
basically unprotected zone->present_pages then.

Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible. We
would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.

I propose the general rules (documentation added in patch 6):

1. add_memory/add_memory_resource() must only be called with
   device_hotplug_lock.
2. remove_memory() must only be called with device_hotplug_lock. This is
   already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
   device_hotplug_lock. This is already documented and true for now in core
   code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
   online_pages/offline_pages.

To me, this looks way cleaner than what we have right now (and easier to
verify). And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.


RFC -> RFCv2:
- Don't export device_hotplug_lock, provide proper remove_memory/add_memory
  wrappers.
- Split up the patches a bit.
- Try to improve powernv memtrace locking
- Add some documentation for locking that matches my knowledge

David Hildenbrand (6):
  mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
  mm/memory_hotplug: make add_memory() take the device_hotplug_lock
  mm/memory_hotplug: fix online/offline_pages called w.o.
    mem_hotplug_lock
  powerpc/powernv: hold device_hotplug_lock when calling device_online()
  powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages()
  memory-hotplug.txt: Add some details about locking internals

 Documentation/memory-hotplug.txt              | 39 +++++++++++-
 arch/powerpc/platforms/powernv/memtrace.c     | 14 +++--
 .../platforms/pseries/hotplug-memory.c        |  8 +--
 drivers/acpi/acpi_memhotplug.c                |  4 +-
 drivers/base/memory.c                         | 22 +++----
 drivers/xen/balloon.c                         |  3 +
 include/linux/memory_hotplug.h                |  4 +-
 mm/memory_hotplug.c                           | 59 +++++++++++++++----
 8 files changed, 115 insertions(+), 38 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH RFCv2 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
  2018-08-21 10:44 [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
@ 2018-08-21 10:44 ` David Hildenbrand
  2018-08-30 19:35   ` Pasha Tatashin
  2018-08-21 10:44 ` [PATCH RFCv2 2/6] mm/memory_hotplug: make add_memory() " David Hildenbrand
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2018-08-21 10:44 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, David Hildenbrand, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rafael J. Wysocki, Len Brown, Rashmica Gupta,
	Michael Neuling, Balbir Singh, Nathan Fontenot, John Allen,
	Andrew Morton, Michal Hocko, Dan Williams, Joonsoo Kim,
	Vlastimil Babka, Pavel Tatashin, Greg Kroah-Hartman,
	Oscar Salvador, YASUAKI ISHIMATSU, Mathieu Malaterre

remove_memory() is exported right now but requires the
device_hotplug_lock, which is not exported. So let's provide a variant
that takes the lock and only export that one.

The lock is already held in
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
So, let's use the locked variant.

The lock is not held (but taken in)
	arch/powerpc/platforms/powernv/memtrace.c
So let's keep using the (now) locked variant.

Apart from that, there are not other users in the tree.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/memtrace.c       | 2 --
 arch/powerpc/platforms/pseries/hotplug-memory.c | 6 +++---
 drivers/acpi/acpi_memhotplug.c                  | 2 +-
 include/linux/memory_hotplug.h                  | 3 ++-
 mm/memory_hotplug.c                             | 9 ++++++++-
 5 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 51dc398ae3f7..8f1cd4f3bfd5 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -90,9 +90,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
 			  change_memblock_state);
 
-	lock_device_hotplug();
 	remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
-	unlock_device_hotplug();
 
 	return true;
 }
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..b3f54466e25f 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -334,7 +334,7 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
 	nid = memory_add_physaddr_to_nid(base);
 
 	for (i = 0; i < sections_per_block; i++) {
-		remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
+		__remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
 		base += MIN_MEMORY_BLOCK_SIZE;
 	}
 
@@ -423,7 +423,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 	block_sz = pseries_memory_block_size();
 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
-	remove_memory(nid, lmb->base_addr, block_sz);
+	__remove_memory(nid, lmb->base_addr, block_sz);
 
 	/* Update memory regions for memory remove */
 	memblock_remove(lmb->base_addr, block_sz);
@@ -710,7 +710,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
 	rc = dlpar_online_lmb(lmb);
 	if (rc) {
-		remove_memory(nid, lmb->base_addr, block_sz);
+		__remove_memory(nid, lmb->base_addr, block_sz);
 		dlpar_remove_device_tree_lmb(lmb);
 	} else {
 		lmb->flags |= DRCONF_MEM_ASSIGNED;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..811148415993 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device)
 			nid = memory_add_physaddr_to_nid(info->start_addr);
 
 		acpi_unbind_memory_blocks(info);
-		remove_memory(nid, info->start_addr, info->length);
+		__remove_memory(nid, info->start_addr, info->length);
 		list_del(&info->list);
 		kfree(info);
 	}
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 34a28227068d..1f096852f479 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -301,6 +301,7 @@ extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
 extern void try_offline_node(int nid);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern void remove_memory(int nid, u64 start, u64 size);
+extern void __remove_memory(int nid, u64 start, u64 size);
 
 #else
 static inline bool is_mem_section_removable(unsigned long pfn,
@@ -317,6 +318,7 @@ static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 }
 
 static inline void remove_memory(int nid, u64 start, u64 size) {}
+static inline void __remove_memory(int nid, u64 start, u64 size) {}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 extern void __ref free_area_init_core_hotplug(int nid);
@@ -330,7 +332,6 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern bool is_memblock_offlined(struct memory_block *mem);
-extern void remove_memory(int nid, u64 start, u64 size);
 extern int sparse_add_one_section(struct pglist_data *pgdat,
 		unsigned long start_pfn, struct vmem_altmap *altmap);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9eea6e809a4e..898e13d4d87d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1872,7 +1872,7 @@ EXPORT_SYMBOL(try_offline_node);
  * and online/offline operations before this call, as required by
  * try_offline_node().
  */
-void __ref remove_memory(int nid, u64 start, u64 size)
+void __ref __remove_memory(int nid, u64 start, u64 size)
 {
 	int ret;
 
@@ -1901,5 +1901,12 @@ void __ref remove_memory(int nid, u64 start, u64 size)
 
 	mem_hotplug_done();
 }
+
+void __ref remove_memory(int nid, u64 start, u64 size)
+{
+	lock_device_hotplug();
+	__remove_memory(nid, start, size);
+	unlock_device_hotplug();
+}
 EXPORT_SYMBOL_GPL(remove_memory);
 #endif /* CONFIG_MEMORY_HOTREMOVE */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH RFCv2 2/6] mm/memory_hotplug: make add_memory() take the device_hotplug_lock
  2018-08-21 10:44 [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
  2018-08-21 10:44 ` [PATCH RFCv2 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock David Hildenbrand
@ 2018-08-21 10:44 ` David Hildenbrand
  2018-08-30 19:36   ` Pasha Tatashin
  2018-08-21 10:44 ` [PATCH RFCv2 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2018-08-21 10:44 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, David Hildenbrand, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rafael J. Wysocki, Len Brown,
	Greg Kroah-Hartman, Boris Ostrovsky, Juergen Gross,
	Nathan Fontenot, John Allen, Andrew Morton, Michal Hocko,
	Dan Williams, Joonsoo Kim, Vlastimil Babka, Oscar Salvador,
	Mathieu Malaterre, Pavel Tatashin, YASUAKI ISHIMATSU

add_memory() currently does not take the device_hotplug_lock, however
is aleady called under the lock from
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.

In general, we should hold the device_hotplug_lock when adding memory
to synchronize against online/offline request (e.g. from user space) -
which already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock"). add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.

The lock is not held yet in
	drivers/xen/balloon.c
	arch/powerpc/platforms/powernv/memtrace.c
	drivers/s390/char/sclp_cmd.c
	drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.

Don't export add_memory_resource(), as it once was exported to be used
by XEN, which is never built as a module. If somebody requires it, we
also have to export a locked variant (as device_hotplug_lock is never
exported).

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 .../platforms/pseries/hotplug-memory.c        |  2 +-
 drivers/acpi/acpi_memhotplug.c                |  2 +-
 drivers/base/memory.c                         |  9 ++++++--
 drivers/xen/balloon.c                         |  3 +++
 include/linux/memory_hotplug.h                |  1 +
 mm/memory_hotplug.c                           | 22 ++++++++++++++++---
 6 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index b3f54466e25f..2e6f41dc103a 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -702,7 +702,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
 	/* Add the memory */
-	rc = add_memory(nid, lmb->base_addr, block_sz);
+	rc = __add_memory(nid, lmb->base_addr, block_sz);
 	if (rc) {
 		dlpar_remove_device_tree_lmb(lmb);
 		return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 811148415993..8fe0960ea572 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
-		result = add_memory(node, info->start_addr, info->length);
+		result = __add_memory(node, info->start_addr, info->length);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c8a1cb0b6136..5b0375be7f65 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -521,15 +521,20 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
 	if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
 		return -EINVAL;
 
+	ret = lock_device_hotplug_sysfs();
+	if (ret)
+		goto out;
+
 	nid = memory_add_physaddr_to_nid(phys_addr);
-	ret = add_memory(nid, phys_addr,
-			 MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+	ret = __add_memory(nid, phys_addr,
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
 
 	if (ret)
 		goto out;
 
 	ret = count;
 out:
+	unlock_device_hotplug();
 	return ret;
 }
 
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index e12bb256036f..6bab019a82b1 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -395,7 +395,10 @@ static enum bp_state reserve_additional_memory(void)
 	 * callers drop the mutex before trying again.
 	 */
 	mutex_unlock(&balloon_mutex);
+	/* add_memory_resource() requires the device_hotplug lock */
+	lock_device_hotplug();
 	rc = add_memory_resource(nid, resource, memhp_auto_online);
+	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
 	if (rc) {
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 1f096852f479..ffd9cd10fcf3 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -324,6 +324,7 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 extern void __ref free_area_init_core_hotplug(int nid);
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
+extern int __add_memory(int nid, u64 start, u64 size);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource, bool online);
 extern int arch_add_memory(int nid, u64 start, u64 size,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 898e13d4d87d..e2b5c751e3ea 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1111,7 +1111,12 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
-/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
+/*
+ * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
+ * and online/offline operations (triggered e.g. by sysfs).
+ *
+ * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
+ */
 int __ref add_memory_resource(int nid, struct resource *res, bool online)
 {
 	u64 start, size;
@@ -1180,9 +1185,9 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	mem_hotplug_done();
 	return ret;
 }
-EXPORT_SYMBOL_GPL(add_memory_resource);
 
-int __ref add_memory(int nid, u64 start, u64 size)
+/* requires device_hotplug_lock, see add_memory_resource() */
+int __ref __add_memory(int nid, u64 start, u64 size)
 {
 	struct resource *res;
 	int ret;
@@ -1196,6 +1201,17 @@ int __ref add_memory(int nid, u64 start, u64 size)
 		release_memory_resource(res);
 	return ret;
 }
+
+int add_memory(int nid, u64 start, u64 size)
+{
+	int rc;
+
+	lock_device_hotplug();
+	rc = __add_memory(nid, start, size);
+	unlock_device_hotplug();
+
+	return rc;
+}
 EXPORT_SYMBOL_GPL(add_memory);
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH RFCv2 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
  2018-08-21 10:44 [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
  2018-08-21 10:44 ` [PATCH RFCv2 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock David Hildenbrand
  2018-08-21 10:44 ` [PATCH RFCv2 2/6] mm/memory_hotplug: make add_memory() " David Hildenbrand
@ 2018-08-21 10:44 ` David Hildenbrand
  2018-08-30 19:37   ` Pasha Tatashin
       [not found]   ` <70372ef5-e332-6c07-f08c-50f8808bde6d@gmail.com>
  2018-08-21 10:44 ` [PATCH RFCv2 4/6] powerpc/powernv: hold device_hotplug_lock when calling device_online() David Hildenbrand
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 20+ messages in thread
From: David Hildenbrand @ 2018-08-21 10:44 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, David Hildenbrand, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rafael J. Wysocki, Len Brown,
	Greg Kroah-Hartman, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Martin Schwidefsky, Heiko Carstens,
	Boris Ostrovsky, Juergen Gross, Rashmica Gupta, Michael Neuling,
	Balbir Singh, Kate Stewart, Thomas Gleixner, Philippe Ombredanne,
	Andrew Morton, Michal Hocko, Pavel Tatashin, Vlastimil Babka,
	Dan Williams, Oscar Salvador, YASUAKI ISHIMATSU,
	Mathieu Malaterre

There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
fix concurrent memory hot-add deadlock"), which tried to fix a possible
lock inversion reported and discussed in [1] due to the two locks
	a) device_lock()
	b) mem_hotplug_lock

While add_memory() first takes b), followed by a) during
bus_probe_device(), onlining of memory from user space first took b),
followed by a), exposing a possible deadlock.

In [1], and it was decided to not make use of device_hotplug_lock, but
rather to enforce a locking order.

The problems I spotted related to this:

1. Memory block device attributes: While .state first calls
   mem_hotplug_begin() and the calls device_online() - which takes
   device_lock() - .online does no longer call mem_hotplug_begin(), so
   effectively calls online_pages() without mem_hotplug_lock.

2. device_online() should be called under device_hotplug_lock, however
   onlining memory during add_memory() does not take care of that.

In addition, I think there is also something wrong about the locking in

3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
   without locks. This was introduced after 30467e0b3be. And skimming over
   the code, I assume it could need some more care in regards to locking
   (e.g. device_online() called without device_hotplug_lock - but I'll
   not touch that for now).

Now that we hold the device_hotplug_lock when
- adding memory (e.g. via add_memory()/add_memory_resource())
- removing memory (e.g. via remove_memory())
- device_online()/device_offline()

We can move mem_hotplug_lock usage back into
online_pages()/offline_pages().

Why is mem_hotplug_lock still needed? Essentially to make
get_online_mems()/put_online_mems() be very fast (relying on
device_hotplug_lock would be very slow), and to serialize against
addition of memory that does not create memory block devices (hmm).

[1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
    2015-February/065324.html

This patch is partly based on a patch by Vitaly Kuznetsov.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c | 13 +------------
 mm/memory_hotplug.c   | 28 ++++++++++++++++++++--------
 2 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 5b0375be7f65..04be13539eb8 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -228,7 +228,6 @@ static bool pages_correctly_probed(unsigned long start_pfn)
 /*
  * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
  * OK to have direct references to sparsemem variables in here.
- * Must already be protected by mem_hotplug_begin().
  */
 static int
 memory_block_action(unsigned long phys_index, unsigned long action, int online_type)
@@ -294,7 +293,6 @@ static int memory_subsys_online(struct device *dev)
 	if (mem->online_type < 0)
 		mem->online_type = MMOP_ONLINE_KEEP;
 
-	/* Already under protection of mem_hotplug_begin() */
 	ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
 
 	/* clear online_type */
@@ -341,19 +339,11 @@ store_mem_state(struct device *dev,
 		goto err;
 	}
 
-	/*
-	 * Memory hotplug needs to hold mem_hotplug_begin() for probe to find
-	 * the correct memory block to online before doing device_online(dev),
-	 * which will take dev->mutex.  Take the lock early to prevent an
-	 * inversion, memory_subsys_online() callbacks will be implemented by
-	 * assuming it's already protected.
-	 */
-	mem_hotplug_begin();
-
 	switch (online_type) {
 	case MMOP_ONLINE_KERNEL:
 	case MMOP_ONLINE_MOVABLE:
 	case MMOP_ONLINE_KEEP:
+		/* mem->online_type is protected by device_hotplug_lock */
 		mem->online_type = online_type;
 		ret = device_online(&mem->dev);
 		break;
@@ -364,7 +354,6 @@ store_mem_state(struct device *dev,
 		ret = -EINVAL; /* should never happen */
 	}
 
-	mem_hotplug_done();
 err:
 	unlock_device_hotplug();
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e2b5c751e3ea..a2c6c87d83f3 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -881,7 +881,6 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
 	return zone;
 }
 
-/* Must be protected by mem_hotplug_begin() or a device_lock */
 int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type)
 {
 	unsigned long flags;
@@ -893,6 +892,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	struct memory_notify arg;
 	struct memory_block *mem;
 
+	mem_hotplug_begin();
+
 	/*
 	 * We can't use pfn_to_nid() because nid might be stored in struct page
 	 * which is not yet initialized. Instead, we find nid from memory block.
@@ -957,6 +958,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 
 	if (onlined_pages)
 		memory_notify(MEM_ONLINE, &arg);
+	mem_hotplug_done();
 	return 0;
 
 failed_addition:
@@ -964,6 +966,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 		 (unsigned long long) pfn << PAGE_SHIFT,
 		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
 	memory_notify(MEM_CANCEL_ONLINE, &arg);
+	mem_hotplug_done();
 	return ret;
 }
 #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
@@ -1168,20 +1171,20 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	/* create new memmap entry */
 	firmware_map_add_hotplug(start, start + size, "System RAM");
 
+	/* device_online() will take the lock when calling online_pages() */
+	mem_hotplug_done();
+
 	/* online pages if requested */
 	if (online)
 		walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1),
 				  NULL, online_memory_block);
 
-	goto out;
-
+	return ret;
 error:
 	/* rollback pgdat allocation and others */
 	if (new_node)
 		rollback_node_hotadd(nid);
 	memblock_remove(start, size);
-
-out:
 	mem_hotplug_done();
 	return ret;
 }
@@ -1621,10 +1624,16 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		return -EINVAL;
 	if (!IS_ALIGNED(end_pfn, pageblock_nr_pages))
 		return -EINVAL;
+
+	mem_hotplug_begin();
+
 	/* This makes hotplug much easier...and readable.
 	   we assume this for now. .*/
-	if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, &valid_end))
+	if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start,
+				  &valid_end)) {
+		mem_hotplug_done();
 		return -EINVAL;
+	}
 
 	zone = page_zone(pfn_to_page(valid_start));
 	node = zone_to_nid(zone);
@@ -1633,8 +1642,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
 				       MIGRATE_MOVABLE, true);
-	if (ret)
+	if (ret) {
+		mem_hotplug_done();
 		return ret;
+	}
 
 	arg.start_pfn = start_pfn;
 	arg.nr_pages = nr_pages;
@@ -1705,6 +1716,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	writeback_set_ratelimit();
 
 	memory_notify(MEM_OFFLINE, &arg);
+	mem_hotplug_done();
 	return 0;
 
 failed_removal:
@@ -1714,10 +1726,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	memory_notify(MEM_CANCEL_OFFLINE, &arg);
 	/* pushback to free area */
 	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	mem_hotplug_done();
 	return ret;
 }
 
-/* Must be protected by mem_hotplug_begin() or a device_lock */
 int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
 	return __offline_pages(start_pfn, start_pfn + nr_pages);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH RFCv2 4/6] powerpc/powernv: hold device_hotplug_lock when calling device_online()
  2018-08-21 10:44 [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
                   ` (2 preceding siblings ...)
  2018-08-21 10:44 ` [PATCH RFCv2 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
@ 2018-08-21 10:44 ` David Hildenbrand
  2018-08-30 19:38   ` Pasha Tatashin
  2018-08-21 10:44 ` [PATCH RFCv2 5/6] powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages() David Hildenbrand
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2018-08-21 10:44 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, David Hildenbrand, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rashmica Gupta, Balbir Singh, Michael Neuling

device_online() should be called with device_hotplug_lock() held.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/memtrace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 8f1cd4f3bfd5..ef7181d4fe68 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -229,9 +229,11 @@ static int memtrace_online(void)
 		 * we need to online the memory ourselves.
 		 */
 		if (!memhp_auto_online) {
+			lock_device_hotplug();
 			walk_memory_range(PFN_DOWN(ent->start),
 					  PFN_UP(ent->start + ent->size - 1),
 					  NULL, online_mem_block);
+			unlock_device_hotplug();
 		}
 
 		/*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH RFCv2 5/6] powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages()
  2018-08-21 10:44 [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
                   ` (3 preceding siblings ...)
  2018-08-21 10:44 ` [PATCH RFCv2 4/6] powerpc/powernv: hold device_hotplug_lock when calling device_online() David Hildenbrand
@ 2018-08-21 10:44 ` David Hildenbrand
  2018-08-30 19:38   ` Pasha Tatashin
  2018-08-21 10:44 ` [PATCH RFCv2 6/6] memory-hotplug.txt: Add some details about locking internals David Hildenbrand
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2018-08-21 10:44 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, David Hildenbrand, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rashmica Gupta, Balbir Singh, Michael Neuling

Let's perform all checking + offlining + removing under
device_hotplug_lock, so nobody can mess with these devices via
sysfs concurrently.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/memtrace.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index ef7181d4fe68..473e59842ec5 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -74,9 +74,13 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 {
 	u64 end_pfn = start_pfn + nr_pages - 1;
 
+	lock_device_hotplug();
+
 	if (walk_memory_range(start_pfn, end_pfn, NULL,
-	    check_memblock_online))
+	    check_memblock_online)) {
+		unlock_device_hotplug();
 		return false;
+	}
 
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
 			  change_memblock_state);
@@ -84,14 +88,16 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 	if (offline_pages(start_pfn, nr_pages)) {
 		walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
 				  change_memblock_state);
+		unlock_device_hotplug();
 		return false;
 	}
 
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
 			  change_memblock_state);
 
-	remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
+	__remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
 
+	unlock_device_hotplug();
 	return true;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH RFCv2 6/6] memory-hotplug.txt: Add some details about locking internals
  2018-08-21 10:44 [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
                   ` (4 preceding siblings ...)
  2018-08-21 10:44 ` [PATCH RFCv2 5/6] powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages() David Hildenbrand
@ 2018-08-21 10:44 ` David Hildenbrand
  2018-08-30 19:38   ` Pasha Tatashin
  2018-08-30 12:31 ` [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
  2018-08-31 20:54 ` Oscar Salvador
  7 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2018-08-21 10:44 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, David Hildenbrand, Jonathan Corbet, Michal Hocko,
	Andrew Morton

Let's document the magic a bit, especially why device_hotplug_lock is
required when adding/removing memory and how it all play together with
requests to online/offline memory from user space.

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 Documentation/memory-hotplug.txt | 39 +++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 7f49ebf3ddb2..03aaad7d7373 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -3,7 +3,7 @@ Memory Hotplug
 ==============
 
 :Created:							Jul 28 2007
-:Updated: Add description of notifier of memory hotplug:	Oct 11 2007
+:Updated: Add some details about locking internals:		Aug 20 2018
 
 This document is about memory hotplug including how-to-use and current status.
 Because Memory Hotplug is still under development, contents of this text will
@@ -495,6 +495,43 @@ further processing of the notification queue.
 
 NOTIFY_STOP stops further processing of the notification queue.
 
+
+Locking Internals
+=================
+
+When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
+the device_hotplug_lock should be held to:
+
+- synchronize against online/offline requests (e.g. via sysfs). This way, memory
+  block devices can only be accessed (.online/.state attributes) by user
+  space once memory has been fully added. And when removing memory, we
+  know nobody is in critical sections.
+- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)
+
+Especially, there is a possible lock inversion that is avoided using
+device_hotplug_lock when adding memory and user space tries to online that
+memory faster than expected:
+
+- device_online() will first take the device_lock(), followed by
+  mem_hotplug_lock
+- add_memory_resource() will first take the mem_hotplug_lock, followed by
+  the device_lock() (while creating the devices, during bus_add_device()).
+
+As the device is visible to user space before taking the device_lock(), this
+can result in a lock inversion.
+
+onlining/offlining of memory should be done via device_online()/
+device_offline() - to make sure it is properly synchronized to actions
+via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type)
+
+When adding/removing/onlining/offlining memory or adding/removing
+heterogeneous/device memory, we should always hold the mem_hotplug_lock to
+serialise memory hotplug (e.g. access to global/zone variables).
+
+In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) allows
+for a quite efficient get_online_mems/put_online_mems implementation.
+
+
 Future Work
 ===========
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock
  2018-08-21 10:44 [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
                   ` (5 preceding siblings ...)
  2018-08-21 10:44 ` [PATCH RFCv2 6/6] memory-hotplug.txt: Add some details about locking internals David Hildenbrand
@ 2018-08-30 12:31 ` David Hildenbrand
  2018-08-30 15:54   ` Pasha Tatashin
  2018-08-31 20:54 ` Oscar Salvador
  7 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2018-08-30 12:31 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, Andrew Morton, Balbir Singh, Benjamin Herrenschmidt,
	Boris Ostrovsky, Dan Williams, Greg Kroah-Hartman, Haiyang Zhang,
	Heiko Carstens, John Allen, Jonathan Corbet, Joonsoo Kim,
	Juergen Gross, Kate Stewart, K. Y. Srinivasan, Len Brown,
	Martin Schwidefsky, Mathieu Malaterre, Michael Ellerman,
	Michael Neuling, Michal Hocko, Nathan Fontenot, Oscar Salvador,
	Paul Mackerras, Pavel Tatashin, Philippe Ombredanne,
	Rafael J. Wysocki, Rashmica Gupta, Stephen Hemminger,
	Thomas Gleixner, Vlastimil Babka, YASUAKI ISHIMATSU

On 21.08.2018 12:44, David Hildenbrand wrote:
> This is the same approach as in the first RFC, but this time without
> exporting device_hotplug_lock (requested by Greg) and with some more
> details and documentation regarding locking. Tested only on x86 so far.
> 

I'll be on vacation for two weeks starting on Saturday. If there are no
comments I'll send this as !RFC once I return.

Thanks!

> --------------------------------------------------------------------------
> 
> Reading through the code and studying how mem_hotplug_lock is to be used,
> I noticed that there are two places where we can end up calling
> device_online()/device_offline() - online_pages()/offline_pages() without
> the mem_hotplug_lock. And there are other places where we call
> device_online()/device_offline() without the device_hotplug_lock.
> 
> While e.g.
> 	echo "online" > /sys/devices/system/memory/memory9/state
> is fine, e.g.
> 	echo 1 > /sys/devices/system/memory/memory9/online
> Will not take the mem_hotplug_lock. However the device_lock() and
> device_hotplug_lock.
> 
> E.g. via memory_probe_store(), we can end up calling
> add_memory()->online_pages() without the device_hotplug_lock. So we can
> have concurrent callers in online_pages(). We e.g. touch in online_pages()
> basically unprotected zone->present_pages then.
> 
> Looks like there is a longer history to that (see Patch #2 for details),
> and fixing it to work the way it was intended is not really possible. We
> would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
> sounds wrong.
> 
> Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
> More details can be found in patch 3 and patch 6.
> 
> I propose the general rules (documentation added in patch 6):
> 
> 1. add_memory/add_memory_resource() must only be called with
>    device_hotplug_lock.
> 2. remove_memory() must only be called with device_hotplug_lock. This is
>    already documented and holds for all callers.
> 3. device_online()/device_offline() must only be called with
>    device_hotplug_lock. This is already documented and true for now in core
>    code. Other callers (related to memory hotplug) have to be fixed up.
> 4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
>    online_pages/offline_pages.
> 
> To me, this looks way cleaner than what we have right now (and easier to
> verify). And looking at the documentation of remove_memory, using
> lock_device_hotplug also for add_memory() feels natural.
> 
> 
> RFC -> RFCv2:
> - Don't export device_hotplug_lock, provide proper remove_memory/add_memory
>   wrappers.
> - Split up the patches a bit.
> - Try to improve powernv memtrace locking
> - Add some documentation for locking that matches my knowledge
> 
> David Hildenbrand (6):
>   mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
>   mm/memory_hotplug: make add_memory() take the device_hotplug_lock
>   mm/memory_hotplug: fix online/offline_pages called w.o.
>     mem_hotplug_lock
>   powerpc/powernv: hold device_hotplug_lock when calling device_online()
>   powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages()
>   memory-hotplug.txt: Add some details about locking internals
> 
>  Documentation/memory-hotplug.txt              | 39 +++++++++++-
>  arch/powerpc/platforms/powernv/memtrace.c     | 14 +++--
>  .../platforms/pseries/hotplug-memory.c        |  8 +--
>  drivers/acpi/acpi_memhotplug.c                |  4 +-
>  drivers/base/memory.c                         | 22 +++----
>  drivers/xen/balloon.c                         |  3 +
>  include/linux/memory_hotplug.h                |  4 +-
>  mm/memory_hotplug.c                           | 59 +++++++++++++++----
>  8 files changed, 115 insertions(+), 38 deletions(-)
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock
  2018-08-30 12:31 ` [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
@ 2018-08-30 15:54   ` Pasha Tatashin
  0 siblings, 0 replies; 20+ messages in thread
From: Pasha Tatashin @ 2018-08-30 15:54 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, Andrew Morton, Balbir Singh, Benjamin Herrenschmidt,
	Boris Ostrovsky, Dan Williams, Greg Kroah-Hartman, Haiyang Zhang,
	Heiko Carstens, John Allen, Jonathan Corbet, Joonsoo Kim,
	Juergen Gross, Kate Stewart, KY Srinivasan, Len Brown,
	Martin Schwidefsky, Mathieu Malaterre, Michael Ellerman,
	Michael Neuling, Michal Hocko, Nathan Fontenot, Oscar Salvador,
	Paul Mackerras, Philippe Ombredanne, Rafael J. Wysocki,
	Rashmica Gupta, Stephen Hemminger, Thomas Gleixner,
	Vlastimil Babka, YASUAKI ISHIMATSU



On 8/30/18 8:31 AM, David Hildenbrand wrote:
> On 21.08.2018 12:44, David Hildenbrand wrote:
>> This is the same approach as in the first RFC, but this time without
>> exporting device_hotplug_lock (requested by Greg) and with some more
>> details and documentation regarding locking. Tested only on x86 so far.
>>
> 
> I'll be on vacation for two weeks starting on Saturday. If there are no
> comments I'll send this as !RFC once I return.
>
I am studying this series, will send my comments later today.

Pavel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFCv2 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
  2018-08-21 10:44 ` [PATCH RFCv2 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock David Hildenbrand
@ 2018-08-30 19:35   ` Pasha Tatashin
  2018-08-31 13:12     ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Pasha Tatashin @ 2018-08-30 19:35 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rafael J. Wysocki, Len Brown, Rashmica Gupta, Michael Neuling,
	Balbir Singh, Nathan Fontenot, John Allen, Andrew Morton,
	Michal Hocko, Dan Williams, Joonsoo Kim, Vlastimil Babka,
	Greg Kroah-Hartman, Oscar Salvador, YASUAKI ISHIMATSU,
	Mathieu Malaterre

> +
> +void __ref remove_memory(int nid, u64 start, u64 size)

Remove __ref, otherwise looks good:

Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>

> +{
> +	lock_device_hotplug();
> +	__remove_memory(nid, start, size);
> +	unlock_device_hotplug();
> +}
>  EXPORT_SYMBOL_GPL(remove_memory);
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFCv2 2/6] mm/memory_hotplug: make add_memory() take the device_hotplug_lock
  2018-08-21 10:44 ` [PATCH RFCv2 2/6] mm/memory_hotplug: make add_memory() " David Hildenbrand
@ 2018-08-30 19:36   ` Pasha Tatashin
  0 siblings, 0 replies; 20+ messages in thread
From: Pasha Tatashin @ 2018-08-30 19:36 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman,
	Boris Ostrovsky, Juergen Gross, Nathan Fontenot, John Allen,
	Andrew Morton, Michal Hocko, Dan Williams, Joonsoo Kim,
	Vlastimil Babka, Oscar Salvador, Mathieu Malaterre,
	YASUAKI ISHIMATSU

On 8/21/18 6:44 AM, David Hildenbrand wrote:
> add_memory() currently does not take the device_hotplug_lock, however
> is aleady called under the lock from
> 	arch/powerpc/platforms/pseries/hotplug-memory.c
> 	drivers/acpi/acpi_memhotplug.c
> to synchronize against CPU hot-remove and similar.
> 
> In general, we should hold the device_hotplug_lock when adding memory
> to synchronize against online/offline request (e.g. from user space) -
> which already resulted in lock inversions due to device_lock() and
> mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
> hot-add deadlock"). add_memory()/add_memory_resource() will create memory
> block devices, so this really feels like the right thing to do.
> 
> Holding the device_hotplug_lock makes sure that a memory block device
> can really only be accessed (e.g. via .online/.state) from user space,
> once the memory has been fully added to the system.
> 
> The lock is not held yet in
> 	drivers/xen/balloon.c
> 	arch/powerpc/platforms/powernv/memtrace.c
> 	drivers/s390/char/sclp_cmd.c
> 	drivers/hv/hv_balloon.c
> So, let's either use the locked variants or take the lock.
> 
> Don't export add_memory_resource(), as it once was exported to be used
> by XEN, which is never built as a module. If somebody requires it, we
> also have to export a locked variant (as device_hotplug_lock is never
> exported).

Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFCv2 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
  2018-08-21 10:44 ` [PATCH RFCv2 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
@ 2018-08-30 19:37   ` Pasha Tatashin
       [not found]   ` <70372ef5-e332-6c07-f08c-50f8808bde6d@gmail.com>
  1 sibling, 0 replies; 20+ messages in thread
From: Pasha Tatashin @ 2018-08-30 19:37 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Martin Schwidefsky,
	Heiko Carstens, Boris Ostrovsky, Juergen Gross, Rashmica Gupta,
	Michael Neuling, Balbir Singh, Kate Stewart, Thomas Gleixner,
	Philippe Ombredanne, Andrew Morton, Michal Hocko,
	Vlastimil Babka, Dan Williams, Oscar Salvador, YASUAKI ISHIMATSU,
	Mathieu Malaterre

On 8/21/18 6:44 AM, David Hildenbrand wrote:
> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
> fix concurrent memory hot-add deadlock"), which tried to fix a possible
> lock inversion reported and discussed in [1] due to the two locks
> 	a) device_lock()
> 	b) mem_hotplug_lock
> 
> While add_memory() first takes b), followed by a) during
> bus_probe_device(), onlining of memory from user space first took b),
> followed by a), exposing a possible deadlock.
> 
> In [1], and it was decided to not make use of device_hotplug_lock, but
> rather to enforce a locking order.
> 
> The problems I spotted related to this:
> 
> 1. Memory block device attributes: While .state first calls
>    mem_hotplug_begin() and the calls device_online() - which takes
>    device_lock() - .online does no longer call mem_hotplug_begin(), so
>    effectively calls online_pages() without mem_hotplug_lock.
> 
> 2. device_online() should be called under device_hotplug_lock, however
>    onlining memory during add_memory() does not take care of that.
> 
> In addition, I think there is also something wrong about the locking in
> 
> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>    without locks. This was introduced after 30467e0b3be. And skimming over
>    the code, I assume it could need some more care in regards to locking
>    (e.g. device_online() called without device_hotplug_lock - but I'll
>    not touch that for now).
> 
> Now that we hold the device_hotplug_lock when
> - adding memory (e.g. via add_memory()/add_memory_resource())
> - removing memory (e.g. via remove_memory())
> - device_online()/device_offline()
> 
> We can move mem_hotplug_lock usage back into
> online_pages()/offline_pages().
> 
> Why is mem_hotplug_lock still needed? Essentially to make
> get_online_mems()/put_online_mems() be very fast (relying on
> device_hotplug_lock would be very slow), and to serialize against
> addition of memory that does not create memory block devices (hmm).
> 
> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
>     2015-February/065324.html
> 
> This patch is partly based on a patch by Vitaly Kuznetsov.

Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFCv2 4/6] powerpc/powernv: hold device_hotplug_lock when calling device_online()
  2018-08-21 10:44 ` [PATCH RFCv2 4/6] powerpc/powernv: hold device_hotplug_lock when calling device_online() David Hildenbrand
@ 2018-08-30 19:38   ` Pasha Tatashin
  0 siblings, 0 replies; 20+ messages in thread
From: Pasha Tatashin @ 2018-08-30 19:38 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rashmica Gupta, Balbir Singh, Michael Neuling

Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>

On 8/21/18 6:44 AM, David Hildenbrand wrote:
> device_online() should be called with device_hotplug_lock() held.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: Michael Neuling <mikey@neuling.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/powerpc/platforms/powernv/memtrace.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> index 8f1cd4f3bfd5..ef7181d4fe68 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -229,9 +229,11 @@ static int memtrace_online(void)
>  		 * we need to online the memory ourselves.
>  		 */
>  		if (!memhp_auto_online) {
> +			lock_device_hotplug();
>  			walk_memory_range(PFN_DOWN(ent->start),
>  					  PFN_UP(ent->start + ent->size - 1),
>  					  NULL, online_mem_block);
> +			unlock_device_hotplug();
>  		}
>  
>  		/*
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFCv2 5/6] powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages()
  2018-08-21 10:44 ` [PATCH RFCv2 5/6] powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages() David Hildenbrand
@ 2018-08-30 19:38   ` Pasha Tatashin
  0 siblings, 0 replies; 20+ messages in thread
From: Pasha Tatashin @ 2018-08-30 19:38 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rashmica Gupta, Balbir Singh, Michael Neuling

Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>

On 8/21/18 6:44 AM, David Hildenbrand wrote:
> Let's perform all checking + offlining + removing under
> device_hotplug_lock, so nobody can mess with these devices via
> sysfs concurrently.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: Michael Neuling <mikey@neuling.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/powerpc/platforms/powernv/memtrace.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> index ef7181d4fe68..473e59842ec5 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -74,9 +74,13 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
>  {
>  	u64 end_pfn = start_pfn + nr_pages - 1;
>  
> +	lock_device_hotplug();
> +
>  	if (walk_memory_range(start_pfn, end_pfn, NULL,
> -	    check_memblock_online))
> +	    check_memblock_online)) {
> +		unlock_device_hotplug();
>  		return false;
> +	}
>  
>  	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
>  			  change_memblock_state);
> @@ -84,14 +88,16 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
>  	if (offline_pages(start_pfn, nr_pages)) {
>  		walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
>  				  change_memblock_state);
> +		unlock_device_hotplug();
>  		return false;
>  	}
>  
>  	walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
>  			  change_memblock_state);
>  
> -	remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
> +	__remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
>  
> +	unlock_device_hotplug();
>  	return true;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFCv2 6/6] memory-hotplug.txt: Add some details about locking internals
  2018-08-21 10:44 ` [PATCH RFCv2 6/6] memory-hotplug.txt: Add some details about locking internals David Hildenbrand
@ 2018-08-30 19:38   ` Pasha Tatashin
  0 siblings, 0 replies; 20+ messages in thread
From: Pasha Tatashin @ 2018-08-30 19:38 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, Jonathan Corbet, Michal Hocko, Andrew Morton


Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>

On 8/21/18 6:44 AM, David Hildenbrand wrote:
> Let's document the magic a bit, especially why device_hotplug_lock is
> required when adding/removing memory and how it all play together with
> requests to online/offline memory from user space.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFCv2 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
  2018-08-30 19:35   ` Pasha Tatashin
@ 2018-08-31 13:12     ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2018-08-31 13:12 UTC (permalink / raw)
  To: Pasha Tatashin, linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rafael J. Wysocki, Len Brown, Rashmica Gupta, Michael Neuling,
	Balbir Singh, Nathan Fontenot, John Allen, Andrew Morton,
	Michal Hocko, Dan Williams, Joonsoo Kim, Vlastimil Babka,
	Greg Kroah-Hartman, Oscar Salvador, YASUAKI ISHIMATSU,
	Mathieu Malaterre

On 30.08.2018 21:35, Pasha Tatashin wrote:
>> +
>> +void __ref remove_memory(int nid, u64 start, u64 size)
> 
> Remove __ref, otherwise looks good:

Indeed, will do.

Thanks for the review. Will resend in two weeks when I'm back from vacation.

Cheers!

> 
> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
> 
>> +{
>> +	lock_device_hotplug();
>> +	__remove_memory(nid, start, size);
>> +	unlock_device_hotplug();
>> +}
>>  EXPORT_SYMBOL_GPL(remove_memory);
>>  #endif /* CONFIG_MEMORY_HOTREMOVE */


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock
  2018-08-21 10:44 [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
                   ` (6 preceding siblings ...)
  2018-08-30 12:31 ` [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
@ 2018-08-31 20:54 ` Oscar Salvador
  2018-09-01 14:03   ` David Hildenbrand
  7 siblings, 1 reply; 20+ messages in thread
From: Oscar Salvador @ 2018-08-31 20:54 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-doc, linuxppc-dev, linux-acpi,
	xen-devel, devel, Andrew Morton, Balbir Singh,
	Benjamin Herrenschmidt, Boris Ostrovsky, Dan Williams,
	Greg Kroah-Hartman, Haiyang Zhang, Heiko Carstens, John Allen,
	Jonathan Corbet, Joonsoo Kim, Juergen Gross, Kate Stewart,
	K. Y. Srinivasan, Len Brown, Martin Schwidefsky,
	Mathieu Malaterre, Michael Ellerman, Michael Neuling,
	Michal Hocko, Nathan Fontenot, Oscar Salvador, Paul Mackerras,
	Pavel Tatashin, Philippe Ombredanne, Rafael J. Wysocki,
	Rashmica Gupta, Stephen Hemminger, Thomas Gleixner,
	Vlastimil Babka, YASUAKI ISHIMATSU

On Tue, Aug 21, 2018 at 12:44:12PM +0200, David Hildenbrand wrote:
> This is the same approach as in the first RFC, but this time without
> exporting device_hotplug_lock (requested by Greg) and with some more
> details and documentation regarding locking. Tested only on x86 so far.

Hi David,

I would like to review this but I am on vacation, so I will not be able to get to it
soon.
I plan to do it once I am back.

Thanks
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock
  2018-08-31 20:54 ` Oscar Salvador
@ 2018-09-01 14:03   ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2018-09-01 14:03 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-mm, linux-kernel, linux-doc, linuxppc-dev, linux-acpi,
	xen-devel, devel, Andrew Morton, Balbir Singh,
	Benjamin Herrenschmidt, Boris Ostrovsky, Dan Williams,
	Greg Kroah-Hartman, Haiyang Zhang, Heiko Carstens, John Allen,
	Jonathan Corbet, Joonsoo Kim, Juergen Gross, Kate Stewart,
	K. Y. Srinivasan, Len Brown, Martin Schwidefsky,
	Mathieu Malaterre, Michael Ellerman, Michael Neuling,
	Michal Hocko, Nathan Fontenot, Oscar Salvador, Paul Mackerras,
	Pavel Tatashin, Philippe Ombredanne, Rafael J. Wysocki,
	Rashmica Gupta, Stephen Hemminger, Thomas Gleixner,
	Vlastimil Babka, YASUAKI ISHIMATSU

On 31.08.2018 22:54, Oscar Salvador wrote:
> On Tue, Aug 21, 2018 at 12:44:12PM +0200, David Hildenbrand wrote:
>> This is the same approach as in the first RFC, but this time without
>> exporting device_hotplug_lock (requested by Greg) and with some more
>> details and documentation regarding locking. Tested only on x86 so far.
> 
> Hi David,
> 
> I would like to review this but I am on vacation, so I will not be able to get to it
> soon.
> I plan to do it once I am back.

Sure, I won't be resending within next two weeks either way, as I am
also on vacation.

Have a nice vacation!

> 
> Thanks
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFCv2 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
       [not found]   ` <70372ef5-e332-6c07-f08c-50f8808bde6d@gmail.com>
@ 2018-09-17  7:32     ` David Hildenbrand
  2018-09-25  1:26       ` Rashmica Gupta
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2018-09-17  7:32 UTC (permalink / raw)
  To: Rashmica, linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Martin Schwidefsky, Heiko Carstens, Boris Ostrovsky,
	Juergen Gross, Michael Neuling, Balbir Singh, Kate Stewart,
	Thomas Gleixner, Philippe Ombredanne, Andrew Morton,
	Michal Hocko, Pavel Tatashin, Vlastimil Babka, Dan Williams,
	Oscar Salvador, YASUAKI ISHIMATSU, Mathieu Malaterre

Am 03.09.18 um 02:36 schrieb Rashmica:
> Hi David,
> 
> 
> On 21/08/18 20:44, David Hildenbrand wrote:
> 
>> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
>> fix concurrent memory hot-add deadlock"), which tried to fix a possible
>> lock inversion reported and discussed in [1] due to the two locks
>> 	a) device_lock()
>> 	b) mem_hotplug_lock
>>
>> While add_memory() first takes b), followed by a) during
>> bus_probe_device(), onlining of memory from user space first took b),
>> followed by a), exposing a possible deadlock.
> 
> Do you mean "onlining of memory from user space first took a),
> followed by b)"? 

Very right, thanks.

> 
>> In [1], and it was decided to not make use of device_hotplug_lock, but
>> rather to enforce a locking order.
>>
>> The problems I spotted related to this:
>>
>> 1. Memory block device attributes: While .state first calls
>>    mem_hotplug_begin() and the calls device_online() - which takes
>>    device_lock() - .online does no longer call mem_hotplug_begin(), so
>>    effectively calls online_pages() without mem_hotplug_lock.
>>
>> 2. device_online() should be called under device_hotplug_lock, however
>>    onlining memory during add_memory() does not take care of that.
>>
>> In addition, I think there is also something wrong about the locking in
>>
>> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>>    without locks. This was introduced after 30467e0b3be. And skimming over
>>    the code, I assume it could need some more care in regards to locking
>>    (e.g. device_online() called without device_hotplug_lock - but I'll
>>    not touch that for now).
> 
> Can you mention that you fixed this in later patches?

Sure!

> 
> 
> The series looks good to me. Feel free to add my reviewed-by:
> 
> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
> 

Thanks, r-b only for this patch or all of the series?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFCv2 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
  2018-09-17  7:32     ` David Hildenbrand
@ 2018-09-25  1:26       ` Rashmica Gupta
  0 siblings, 0 replies; 20+ messages in thread
From: Rashmica Gupta @ 2018-09-25  1:26 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Martin Schwidefsky, Heiko Carstens, Boris Ostrovsky,
	Juergen Gross, Michael Neuling, Balbir Singh, Kate Stewart,
	Thomas Gleixner, Philippe Ombredanne, Andrew Morton,
	Michal Hocko, Pavel Tatashin, Vlastimil Babka, Dan Williams,
	Oscar Salvador, YASUAKI ISHIMATSU, Mathieu Malaterre

On Mon, 2018-09-17 at 09:32 +0200, David Hildenbrand wrote:
> Am 03.09.18 um 02:36 schrieb Rashmica:
> > Hi David,
> > 
> > 
> > On 21/08/18 20:44, David Hildenbrand wrote:
> > 
> > > There seem to be some problems as result of 30467e0b3be ("mm,
> > > hotplug:
> > > fix concurrent memory hot-add deadlock"), which tried to fix a
> > > possible
> > > lock inversion reported and discussed in [1] due to the two locks
> > > 	a) device_lock()
> > > 	b) mem_hotplug_lock
> > > 
> > > While add_memory() first takes b), followed by a) during
> > > bus_probe_device(), onlining of memory from user space first took
> > > b),
> > > followed by a), exposing a possible deadlock.
> > 
> > Do you mean "onlining of memory from user space first took a),
> > followed by b)"? 
> 
> Very right, thanks.
> 
> > 
> > > In [1], and it was decided to not make use of
> > > device_hotplug_lock, but
> > > rather to enforce a locking order.
> > > 
> > > The problems I spotted related to this:
> > > 
> > > 1. Memory block device attributes: While .state first calls
> > >    mem_hotplug_begin() and the calls device_online() - which
> > > takes
> > >    device_lock() - .online does no longer call
> > > mem_hotplug_begin(), so
> > >    effectively calls online_pages() without mem_hotplug_lock.
> > > 
> > > 2. device_online() should be called under device_hotplug_lock,
> > > however
> > >    onlining memory during add_memory() does not take care of
> > > that.
> > > 
> > > In addition, I think there is also something wrong about the
> > > locking in
> > > 
> > > 3. arch/powerpc/platforms/powernv/memtrace.c calls
> > > offline_pages()
> > >    without locks. This was introduced after 30467e0b3be. And
> > > skimming over
> > >    the code, I assume it could need some more care in regards to
> > > locking
> > >    (e.g. device_online() called without device_hotplug_lock - but
> > > I'll
> > >    not touch that for now).
> > 
> > Can you mention that you fixed this in later patches?
> 
> Sure!
> 
> > 
> > 
> > The series looks good to me. Feel free to add my reviewed-by:
> > 
> > Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
> > 
> 
> Thanks, r-b only for this patch or all of the series?

Sorry, I somehow missed this. To all of the series.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-09-25  1:27 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-21 10:44 [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
2018-08-21 10:44 ` [PATCH RFCv2 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock David Hildenbrand
2018-08-30 19:35   ` Pasha Tatashin
2018-08-31 13:12     ` David Hildenbrand
2018-08-21 10:44 ` [PATCH RFCv2 2/6] mm/memory_hotplug: make add_memory() " David Hildenbrand
2018-08-30 19:36   ` Pasha Tatashin
2018-08-21 10:44 ` [PATCH RFCv2 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
2018-08-30 19:37   ` Pasha Tatashin
     [not found]   ` <70372ef5-e332-6c07-f08c-50f8808bde6d@gmail.com>
2018-09-17  7:32     ` David Hildenbrand
2018-09-25  1:26       ` Rashmica Gupta
2018-08-21 10:44 ` [PATCH RFCv2 4/6] powerpc/powernv: hold device_hotplug_lock when calling device_online() David Hildenbrand
2018-08-30 19:38   ` Pasha Tatashin
2018-08-21 10:44 ` [PATCH RFCv2 5/6] powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages() David Hildenbrand
2018-08-30 19:38   ` Pasha Tatashin
2018-08-21 10:44 ` [PATCH RFCv2 6/6] memory-hotplug.txt: Add some details about locking internals David Hildenbrand
2018-08-30 19:38   ` Pasha Tatashin
2018-08-30 12:31 ` [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
2018-08-30 15:54   ` Pasha Tatashin
2018-08-31 20:54 ` Oscar Salvador
2018-09-01 14:03   ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).