linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock
@ 2018-09-18 11:48 David Hildenbrand
  2018-09-18 11:48 ` [PATCH v1 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock David Hildenbrand
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: David Hildenbrand @ 2018-09-18 11:48 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, David Hildenbrand, Andrew Morton, Balbir Singh,
	Benjamin Herrenschmidt, Boris Ostrovsky, Dan Williams,
	Greg Kroah-Hartman, Haiyang Zhang, Heiko Carstens, John Allen,
	Jonathan Corbet, Joonsoo Kim, Juergen Gross, Kate Stewart,
	K. Y. Srinivasan, Len Brown, Martin Schwidefsky,
	Mathieu Malaterre, Michael Ellerman, Michael Neuling,
	Michal Hocko, Nathan Fontenot, Oscar Salvador, Paul Mackerras,
	Pavel Tatashin, Pavel Tatashin, Philippe Ombredanne,
	Rafael J. Wysocki, Rashmica Gupta, Stephen Hemminger,
	Thomas Gleixner, Vlastimil Babka, YASUAKI ISHIMATSU

Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock. And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
	echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
	echo 1 > /sys/devices/system/memory/memory9/online
Will not take the mem_hotplug_lock. However the device_lock() and
device_hotplug_lock.

E.g. via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock. So we can
have concurrent callers in online_pages(). We e.g. touch in online_pages()
basically unprotected zone->present_pages then.

Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible. We
would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.

I propose the general rules (documentation added in patch 6):

1. add_memory/add_memory_resource() must only be called with
   device_hotplug_lock.
2. remove_memory() must only be called with device_hotplug_lock. This is
   already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
   device_hotplug_lock. This is already documented and true for now in core
   code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
   online_pages/offline_pages.

To me, this looks way cleaner than what we have right now (and easier to
verify). And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.


RFCv2 -> v1:
- Dropped an unnecessary _ref from remove_memory() in patch #1
- Minor patch description fixes.
- Added rb's

RFC -> RFCv2:
- Don't export device_hotplug_lock, provide proper remove_memory/add_memory
  wrappers.
- Split up the patches a bit.
- Try to improve powernv memtrace locking
- Add some documentation for locking that matches my knowledge

David Hildenbrand (6):
  mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
  mm/memory_hotplug: make add_memory() take the device_hotplug_lock
  mm/memory_hotplug: fix online/offline_pages called w.o.
    mem_hotplug_lock
  powerpc/powernv: hold device_hotplug_lock when calling device_online()
  powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages()
  memory-hotplug.txt: Add some details about locking internals

 Documentation/memory-hotplug.txt              | 39 +++++++++++-
 arch/powerpc/platforms/powernv/memtrace.c     | 14 +++--
 .../platforms/pseries/hotplug-memory.c        |  8 +--
 drivers/acpi/acpi_memhotplug.c                |  4 +-
 drivers/base/memory.c                         | 22 +++----
 drivers/xen/balloon.c                         |  3 +
 include/linux/memory_hotplug.h                |  4 +-
 mm/memory_hotplug.c                           | 59 +++++++++++++++----
 8 files changed, 115 insertions(+), 38 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v1 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
  2018-09-18 11:48 [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
@ 2018-09-18 11:48 ` David Hildenbrand
  2018-09-18 21:18   ` Rafael J. Wysocki
  2018-09-18 11:48 ` [PATCH v1 2/6] mm/memory_hotplug: make add_memory() " David Hildenbrand
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2018-09-18 11:48 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, David Hildenbrand, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rafael J. Wysocki, Len Brown, Rashmica Gupta,
	Michael Neuling, Balbir Singh, Nathan Fontenot, John Allen,
	Andrew Morton, Michal Hocko, Dan Williams, Joonsoo Kim,
	Vlastimil Babka, Pavel Tatashin, Greg Kroah-Hartman,
	Oscar Salvador, YASUAKI ISHIMATSU, Mathieu Malaterre

remove_memory() is exported right now but requires the
device_hotplug_lock, which is not exported. So let's provide a variant
that takes the lock and only export that one.

The lock is already held in
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
So, let's use the locked variant.

The lock is not held (but taken in)
	arch/powerpc/platforms/powernv/memtrace.c
So let's keep using the (now) locked variant.

Apart from that, there are not other users in the tree.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Cc: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/memtrace.c       | 2 --
 arch/powerpc/platforms/pseries/hotplug-memory.c | 6 +++---
 drivers/acpi/acpi_memhotplug.c                  | 2 +-
 include/linux/memory_hotplug.h                  | 3 ++-
 mm/memory_hotplug.c                             | 9 ++++++++-
 5 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 51dc398ae3f7..8f1cd4f3bfd5 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -90,9 +90,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
 			  change_memblock_state);
 
-	lock_device_hotplug();
 	remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
-	unlock_device_hotplug();
 
 	return true;
 }
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..b3f54466e25f 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -334,7 +334,7 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
 	nid = memory_add_physaddr_to_nid(base);
 
 	for (i = 0; i < sections_per_block; i++) {
-		remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
+		__remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
 		base += MIN_MEMORY_BLOCK_SIZE;
 	}
 
@@ -423,7 +423,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 	block_sz = pseries_memory_block_size();
 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
-	remove_memory(nid, lmb->base_addr, block_sz);
+	__remove_memory(nid, lmb->base_addr, block_sz);
 
 	/* Update memory regions for memory remove */
 	memblock_remove(lmb->base_addr, block_sz);
@@ -710,7 +710,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
 	rc = dlpar_online_lmb(lmb);
 	if (rc) {
-		remove_memory(nid, lmb->base_addr, block_sz);
+		__remove_memory(nid, lmb->base_addr, block_sz);
 		dlpar_remove_device_tree_lmb(lmb);
 	} else {
 		lmb->flags |= DRCONF_MEM_ASSIGNED;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..811148415993 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device)
 			nid = memory_add_physaddr_to_nid(info->start_addr);
 
 		acpi_unbind_memory_blocks(info);
-		remove_memory(nid, info->start_addr, info->length);
+		__remove_memory(nid, info->start_addr, info->length);
 		list_del(&info->list);
 		kfree(info);
 	}
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 34a28227068d..1f096852f479 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -301,6 +301,7 @@ extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
 extern void try_offline_node(int nid);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern void remove_memory(int nid, u64 start, u64 size);
+extern void __remove_memory(int nid, u64 start, u64 size);
 
 #else
 static inline bool is_mem_section_removable(unsigned long pfn,
@@ -317,6 +318,7 @@ static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 }
 
 static inline void remove_memory(int nid, u64 start, u64 size) {}
+static inline void __remove_memory(int nid, u64 start, u64 size) {}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 extern void __ref free_area_init_core_hotplug(int nid);
@@ -330,7 +332,6 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern bool is_memblock_offlined(struct memory_block *mem);
-extern void remove_memory(int nid, u64 start, u64 size);
 extern int sparse_add_one_section(struct pglist_data *pgdat,
 		unsigned long start_pfn, struct vmem_altmap *altmap);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 38d94b703e9d..b8b1bd970322 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1873,7 +1873,7 @@ EXPORT_SYMBOL(try_offline_node);
  * and online/offline operations before this call, as required by
  * try_offline_node().
  */
-void __ref remove_memory(int nid, u64 start, u64 size)
+void __ref __remove_memory(int nid, u64 start, u64 size)
 {
 	int ret;
 
@@ -1902,5 +1902,12 @@ void __ref remove_memory(int nid, u64 start, u64 size)
 
 	mem_hotplug_done();
 }
+
+void remove_memory(int nid, u64 start, u64 size)
+{
+	lock_device_hotplug();
+	__remove_memory(nid, start, size);
+	unlock_device_hotplug();
+}
 EXPORT_SYMBOL_GPL(remove_memory);
 #endif /* CONFIG_MEMORY_HOTREMOVE */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v1 2/6] mm/memory_hotplug: make add_memory() take the device_hotplug_lock
  2018-09-18 11:48 [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
  2018-09-18 11:48 ` [PATCH v1 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock David Hildenbrand
@ 2018-09-18 11:48 ` David Hildenbrand
  2018-09-18 21:19   ` Rafael J. Wysocki
  2018-09-18 11:48 ` [PATCH v1 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2018-09-18 11:48 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, David Hildenbrand, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rafael J. Wysocki, Len Brown,
	Greg Kroah-Hartman, Boris Ostrovsky, Juergen Gross,
	Nathan Fontenot, John Allen, Andrew Morton, Michal Hocko,
	Dan Williams, Joonsoo Kim, Vlastimil Babka, Oscar Salvador,
	Mathieu Malaterre, Pavel Tatashin, YASUAKI ISHIMATSU

add_memory() currently does not take the device_hotplug_lock, however
is aleady called under the lock from
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.

In general, we should hold the device_hotplug_lock when adding memory
to synchronize against online/offline request (e.g. from user space) -
which already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock"). add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.

The lock is not held yet in
	drivers/xen/balloon.c
	arch/powerpc/platforms/powernv/memtrace.c
	drivers/s390/char/sclp_cmd.c
	drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.

Don't export add_memory_resource(), as it once was exported to be used
by XEN, which is never built as a module. If somebody requires it, we
also have to export a locked variant (as device_hotplug_lock is never
exported).

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 .../platforms/pseries/hotplug-memory.c        |  2 +-
 drivers/acpi/acpi_memhotplug.c                |  2 +-
 drivers/base/memory.c                         |  9 ++++++--
 drivers/xen/balloon.c                         |  3 +++
 include/linux/memory_hotplug.h                |  1 +
 mm/memory_hotplug.c                           | 22 ++++++++++++++++---
 6 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index b3f54466e25f..2e6f41dc103a 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -702,7 +702,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
 	/* Add the memory */
-	rc = add_memory(nid, lmb->base_addr, block_sz);
+	rc = __add_memory(nid, lmb->base_addr, block_sz);
 	if (rc) {
 		dlpar_remove_device_tree_lmb(lmb);
 		return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 811148415993..8fe0960ea572 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
-		result = add_memory(node, info->start_addr, info->length);
+		result = __add_memory(node, info->start_addr, info->length);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 817320c7c4c1..40cac122ec73 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -519,15 +519,20 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
 	if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
 		return -EINVAL;
 
+	ret = lock_device_hotplug_sysfs();
+	if (ret)
+		goto out;
+
 	nid = memory_add_physaddr_to_nid(phys_addr);
-	ret = add_memory(nid, phys_addr,
-			 MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+	ret = __add_memory(nid, phys_addr,
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
 
 	if (ret)
 		goto out;
 
 	ret = count;
 out:
+	unlock_device_hotplug();
 	return ret;
 }
 
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index e12bb256036f..6bab019a82b1 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -395,7 +395,10 @@ static enum bp_state reserve_additional_memory(void)
 	 * callers drop the mutex before trying again.
 	 */
 	mutex_unlock(&balloon_mutex);
+	/* add_memory_resource() requires the device_hotplug lock */
+	lock_device_hotplug();
 	rc = add_memory_resource(nid, resource, memhp_auto_online);
+	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
 	if (rc) {
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 1f096852f479..ffd9cd10fcf3 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -324,6 +324,7 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 extern void __ref free_area_init_core_hotplug(int nid);
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
+extern int __add_memory(int nid, u64 start, u64 size);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource, bool online);
 extern int arch_add_memory(int nid, u64 start, u64 size,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b8b1bd970322..ef5444145c88 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1111,7 +1111,12 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
-/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
+/*
+ * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
+ * and online/offline operations (triggered e.g. by sysfs).
+ *
+ * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
+ */
 int __ref add_memory_resource(int nid, struct resource *res, bool online)
 {
 	u64 start, size;
@@ -1180,9 +1185,9 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	mem_hotplug_done();
 	return ret;
 }
-EXPORT_SYMBOL_GPL(add_memory_resource);
 
-int __ref add_memory(int nid, u64 start, u64 size)
+/* requires device_hotplug_lock, see add_memory_resource() */
+int __ref __add_memory(int nid, u64 start, u64 size)
 {
 	struct resource *res;
 	int ret;
@@ -1196,6 +1201,17 @@ int __ref add_memory(int nid, u64 start, u64 size)
 		release_memory_resource(res);
 	return ret;
 }
+
+int add_memory(int nid, u64 start, u64 size)
+{
+	int rc;
+
+	lock_device_hotplug();
+	rc = __add_memory(nid, start, size);
+	unlock_device_hotplug();
+
+	return rc;
+}
 EXPORT_SYMBOL_GPL(add_memory);
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v1 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
  2018-09-18 11:48 [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
  2018-09-18 11:48 ` [PATCH v1 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock David Hildenbrand
  2018-09-18 11:48 ` [PATCH v1 2/6] mm/memory_hotplug: make add_memory() " David Hildenbrand
@ 2018-09-18 11:48 ` David Hildenbrand
  2018-09-18 11:48 ` [PATCH v1 4/6] powerpc/powernv: hold device_hotplug_lock when calling device_online() David Hildenbrand
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2018-09-18 11:48 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, David Hildenbrand, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rafael J. Wysocki, Len Brown,
	Greg Kroah-Hartman, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Martin Schwidefsky, Heiko Carstens,
	Boris Ostrovsky, Juergen Gross, Rashmica Gupta, Michael Neuling,
	Balbir Singh, Kate Stewart, Thomas Gleixner, Philippe Ombredanne,
	Andrew Morton, Michal Hocko, Pavel Tatashin, Vlastimil Babka,
	Dan Williams, Oscar Salvador, YASUAKI ISHIMATSU,
	Mathieu Malaterre

There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
fix concurrent memory hot-add deadlock"), which tried to fix a possible
lock inversion reported and discussed in [1] due to the two locks
	a) device_lock()
	b) mem_hotplug_lock

While add_memory() first takes b), followed by a) during
bus_probe_device(), onlining of memory from user space first took a),
followed by b), exposing a possible deadlock.

In [1], and it was decided to not make use of device_hotplug_lock, but
rather to enforce a locking order.

The problems I spotted related to this:

1. Memory block device attributes: While .state first calls
   mem_hotplug_begin() and the calls device_online() - which takes
   device_lock() - .online does no longer call mem_hotplug_begin(), so
   effectively calls online_pages() without mem_hotplug_lock.

2. device_online() should be called under device_hotplug_lock, however
   onlining memory during add_memory() does not take care of that.

In addition, I think there is also something wrong about the locking in

3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
   without locks. This was introduced after 30467e0b3be. And skimming over
   the code, I assume it could need some more care in regards to locking
   (e.g. device_online() called without device_hotplug_lock. This will
   be addressed in the following patches.

Now that we hold the device_hotplug_lock when
- adding memory (e.g. via add_memory()/add_memory_resource())
- removing memory (e.g. via remove_memory())
- device_online()/device_offline()

We can move mem_hotplug_lock usage back into
online_pages()/offline_pages().

Why is mem_hotplug_lock still needed? Essentially to make
get_online_mems()/put_online_mems() be very fast (relying on
device_hotplug_lock would be very slow), and to serialize against
addition of memory that does not create memory block devices (hmm).

[1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
    2015-February/065324.html

This patch is partly based on a patch by Vitaly Kuznetsov.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Cc: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c | 13 +------------
 mm/memory_hotplug.c   | 28 ++++++++++++++++++++--------
 2 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 40cac122ec73..0e5985682642 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -228,7 +228,6 @@ static bool pages_correctly_probed(unsigned long start_pfn)
 /*
  * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
  * OK to have direct references to sparsemem variables in here.
- * Must already be protected by mem_hotplug_begin().
  */
 static int
 memory_block_action(unsigned long phys_index, unsigned long action, int online_type)
@@ -294,7 +293,6 @@ static int memory_subsys_online(struct device *dev)
 	if (mem->online_type < 0)
 		mem->online_type = MMOP_ONLINE_KEEP;
 
-	/* Already under protection of mem_hotplug_begin() */
 	ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
 
 	/* clear online_type */
@@ -341,19 +339,11 @@ store_mem_state(struct device *dev,
 		goto err;
 	}
 
-	/*
-	 * Memory hotplug needs to hold mem_hotplug_begin() for probe to find
-	 * the correct memory block to online before doing device_online(dev),
-	 * which will take dev->mutex.  Take the lock early to prevent an
-	 * inversion, memory_subsys_online() callbacks will be implemented by
-	 * assuming it's already protected.
-	 */
-	mem_hotplug_begin();
-
 	switch (online_type) {
 	case MMOP_ONLINE_KERNEL:
 	case MMOP_ONLINE_MOVABLE:
 	case MMOP_ONLINE_KEEP:
+		/* mem->online_type is protected by device_hotplug_lock */
 		mem->online_type = online_type;
 		ret = device_online(&mem->dev);
 		break;
@@ -364,7 +354,6 @@ store_mem_state(struct device *dev,
 		ret = -EINVAL; /* should never happen */
 	}
 
-	mem_hotplug_done();
 err:
 	unlock_device_hotplug();
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ef5444145c88..497e9315ca6f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -881,7 +881,6 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
 	return zone;
 }
 
-/* Must be protected by mem_hotplug_begin() or a device_lock */
 int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type)
 {
 	unsigned long flags;
@@ -893,6 +892,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	struct memory_notify arg;
 	struct memory_block *mem;
 
+	mem_hotplug_begin();
+
 	/*
 	 * We can't use pfn_to_nid() because nid might be stored in struct page
 	 * which is not yet initialized. Instead, we find nid from memory block.
@@ -957,6 +958,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 
 	if (onlined_pages)
 		memory_notify(MEM_ONLINE, &arg);
+	mem_hotplug_done();
 	return 0;
 
 failed_addition:
@@ -964,6 +966,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 		 (unsigned long long) pfn << PAGE_SHIFT,
 		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
 	memory_notify(MEM_CANCEL_ONLINE, &arg);
+	mem_hotplug_done();
 	return ret;
 }
 #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
@@ -1168,20 +1171,20 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	/* create new memmap entry */
 	firmware_map_add_hotplug(start, start + size, "System RAM");
 
+	/* device_online() will take the lock when calling online_pages() */
+	mem_hotplug_done();
+
 	/* online pages if requested */
 	if (online)
 		walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1),
 				  NULL, online_memory_block);
 
-	goto out;
-
+	return ret;
 error:
 	/* rollback pgdat allocation and others */
 	if (new_node)
 		rollback_node_hotadd(nid);
 	memblock_remove(start, size);
-
-out:
 	mem_hotplug_done();
 	return ret;
 }
@@ -1622,10 +1625,16 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		return -EINVAL;
 	if (!IS_ALIGNED(end_pfn, pageblock_nr_pages))
 		return -EINVAL;
+
+	mem_hotplug_begin();
+
 	/* This makes hotplug much easier...and readable.
 	   we assume this for now. .*/
-	if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, &valid_end))
+	if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start,
+				  &valid_end)) {
+		mem_hotplug_done();
 		return -EINVAL;
+	}
 
 	zone = page_zone(pfn_to_page(valid_start));
 	node = zone_to_nid(zone);
@@ -1634,8 +1643,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
 				       MIGRATE_MOVABLE, true);
-	if (ret)
+	if (ret) {
+		mem_hotplug_done();
 		return ret;
+	}
 
 	arg.start_pfn = start_pfn;
 	arg.nr_pages = nr_pages;
@@ -1706,6 +1717,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	writeback_set_ratelimit();
 
 	memory_notify(MEM_OFFLINE, &arg);
+	mem_hotplug_done();
 	return 0;
 
 failed_removal:
@@ -1715,10 +1727,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	memory_notify(MEM_CANCEL_OFFLINE, &arg);
 	/* pushback to free area */
 	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	mem_hotplug_done();
 	return ret;
 }
 
-/* Must be protected by mem_hotplug_begin() or a device_lock */
 int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
 	return __offline_pages(start_pfn, start_pfn + nr_pages);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v1 4/6] powerpc/powernv: hold device_hotplug_lock when calling device_online()
  2018-09-18 11:48 [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
                   ` (2 preceding siblings ...)
  2018-09-18 11:48 ` [PATCH v1 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
@ 2018-09-18 11:48 ` David Hildenbrand
  2018-09-18 11:48 ` [PATCH v1 5/6] powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages() David Hildenbrand
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2018-09-18 11:48 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, David Hildenbrand, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rashmica Gupta, Balbir Singh, Michael Neuling

device_online() should be called with device_hotplug_lock() held.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/memtrace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 8f1cd4f3bfd5..ef7181d4fe68 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -229,9 +229,11 @@ static int memtrace_online(void)
 		 * we need to online the memory ourselves.
 		 */
 		if (!memhp_auto_online) {
+			lock_device_hotplug();
 			walk_memory_range(PFN_DOWN(ent->start),
 					  PFN_UP(ent->start + ent->size - 1),
 					  NULL, online_mem_block);
+			unlock_device_hotplug();
 		}
 
 		/*
-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v1 5/6] powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages()
  2018-09-18 11:48 [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
                   ` (3 preceding siblings ...)
  2018-09-18 11:48 ` [PATCH v1 4/6] powerpc/powernv: hold device_hotplug_lock when calling device_online() David Hildenbrand
@ 2018-09-18 11:48 ` David Hildenbrand
  2018-09-18 11:48 ` [PATCH v1 6/6] memory-hotplug.txt: Add some details about locking internals David Hildenbrand
  2018-09-19  1:22 ` [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock Balbir Singh
  6 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2018-09-18 11:48 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, David Hildenbrand, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rashmica Gupta, Balbir Singh, Michael Neuling

Let's perform all checking + offlining + removing under
device_hotplug_lock, so nobody can mess with these devices via
sysfs concurrently.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/memtrace.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index ef7181d4fe68..473e59842ec5 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -74,9 +74,13 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 {
 	u64 end_pfn = start_pfn + nr_pages - 1;
 
+	lock_device_hotplug();
+
 	if (walk_memory_range(start_pfn, end_pfn, NULL,
-	    check_memblock_online))
+	    check_memblock_online)) {
+		unlock_device_hotplug();
 		return false;
+	}
 
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
 			  change_memblock_state);
@@ -84,14 +88,16 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 	if (offline_pages(start_pfn, nr_pages)) {
 		walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
 				  change_memblock_state);
+		unlock_device_hotplug();
 		return false;
 	}
 
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
 			  change_memblock_state);
 
-	remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
+	__remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
 
+	unlock_device_hotplug();
 	return true;
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v1 6/6] memory-hotplug.txt: Add some details about locking internals
  2018-09-18 11:48 [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
                   ` (4 preceding siblings ...)
  2018-09-18 11:48 ` [PATCH v1 5/6] powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages() David Hildenbrand
@ 2018-09-18 11:48 ` David Hildenbrand
  2018-09-19  1:22 ` [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock Balbir Singh
  6 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2018-09-18 11:48 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-doc, linuxppc-dev, linux-acpi, xen-devel,
	devel, David Hildenbrand, Jonathan Corbet, Michal Hocko,
	Andrew Morton

Let's document the magic a bit, especially why device_hotplug_lock is
required when adding/removing memory and how it all play together with
requests to online/offline memory from user space.

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 Documentation/memory-hotplug.txt | 39 +++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 7f49ebf3ddb2..03aaad7d7373 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -3,7 +3,7 @@ Memory Hotplug
 ==============
 
 :Created:							Jul 28 2007
-:Updated: Add description of notifier of memory hotplug:	Oct 11 2007
+:Updated: Add some details about locking internals:		Aug 20 2018
 
 This document is about memory hotplug including how-to-use and current status.
 Because Memory Hotplug is still under development, contents of this text will
@@ -495,6 +495,43 @@ further processing of the notification queue.
 
 NOTIFY_STOP stops further processing of the notification queue.
 
+
+Locking Internals
+=================
+
+When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
+the device_hotplug_lock should be held to:
+
+- synchronize against online/offline requests (e.g. via sysfs). This way, memory
+  block devices can only be accessed (.online/.state attributes) by user
+  space once memory has been fully added. And when removing memory, we
+  know nobody is in critical sections.
+- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)
+
+Especially, there is a possible lock inversion that is avoided using
+device_hotplug_lock when adding memory and user space tries to online that
+memory faster than expected:
+
+- device_online() will first take the device_lock(), followed by
+  mem_hotplug_lock
+- add_memory_resource() will first take the mem_hotplug_lock, followed by
+  the device_lock() (while creating the devices, during bus_add_device()).
+
+As the device is visible to user space before taking the device_lock(), this
+can result in a lock inversion.
+
+onlining/offlining of memory should be done via device_online()/
+device_offline() - to make sure it is properly synchronized to actions
+via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type)
+
+When adding/removing/onlining/offlining memory or adding/removing
+heterogeneous/device memory, we should always hold the mem_hotplug_lock to
+serialise memory hotplug (e.g. access to global/zone variables).
+
+In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) allows
+for a quite efficient get_online_mems/put_online_mems implementation.
+
+
 Future Work
 ===========
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
  2018-09-18 11:48 ` [PATCH v1 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock David Hildenbrand
@ 2018-09-18 21:18   ` Rafael J. Wysocki
  0 siblings, 0 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2018-09-18 21:18 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux Memory Management List, Linux Kernel Mailing List,
	open list:DOCUMENTATION, linuxppc-dev, ACPI Devel Maling List,
	xen-devel, devel, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rafael J. Wysocki, Len Brown, rashmica.g,
	Michael Neuling, Balbir Singh, nfont, jallen, Andrew Morton,
	Michal Hocko, Dan Williams, Joonsoo Kim, Vlastimil Babka,
	Pavel Tatashin, Greg Kroah-Hartman, osalvador, yasu.isimatu,
	malat

On Tue, Sep 18, 2018 at 1:48 PM David Hildenbrand <david@redhat.com> wrote:
>
> remove_memory() is exported right now but requires the
> device_hotplug_lock, which is not exported. So let's provide a variant
> that takes the lock and only export that one.
>
> The lock is already held in
>         arch/powerpc/platforms/pseries/hotplug-memory.c
>         drivers/acpi/acpi_memhotplug.c
> So, let's use the locked variant.
>
> The lock is not held (but taken in)
>         arch/powerpc/platforms/powernv/memtrace.c
> So let's keep using the (now) locked variant.
>
> Apart from that, there are not other users in the tree.
>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Michael Neuling <mikey@neuling.org>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Cc: John Allen <jallen@linux.vnet.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
> Cc: Mathieu Malaterre <malat@debian.org>
> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/powerpc/platforms/powernv/memtrace.c       | 2 --
>  arch/powerpc/platforms/pseries/hotplug-memory.c | 6 +++---
>  drivers/acpi/acpi_memhotplug.c                  | 2 +-
>  include/linux/memory_hotplug.h                  | 3 ++-
>  mm/memory_hotplug.c                             | 9 ++++++++-
>  5 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> index 51dc398ae3f7..8f1cd4f3bfd5 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -90,9 +90,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
>         walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
>                           change_memblock_state);
>
> -       lock_device_hotplug();
>         remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
> -       unlock_device_hotplug();
>
>         return true;
>  }
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index c1578f54c626..b3f54466e25f 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -334,7 +334,7 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
>         nid = memory_add_physaddr_to_nid(base);
>
>         for (i = 0; i < sections_per_block; i++) {
> -               remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
> +               __remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
>                 base += MIN_MEMORY_BLOCK_SIZE;
>         }
>
> @@ -423,7 +423,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
>         block_sz = pseries_memory_block_size();
>         nid = memory_add_physaddr_to_nid(lmb->base_addr);
>
> -       remove_memory(nid, lmb->base_addr, block_sz);
> +       __remove_memory(nid, lmb->base_addr, block_sz);
>
>         /* Update memory regions for memory remove */
>         memblock_remove(lmb->base_addr, block_sz);
> @@ -710,7 +710,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>
>         rc = dlpar_online_lmb(lmb);
>         if (rc) {
> -               remove_memory(nid, lmb->base_addr, block_sz);
> +               __remove_memory(nid, lmb->base_addr, block_sz);
>                 dlpar_remove_device_tree_lmb(lmb);
>         } else {
>                 lmb->flags |= DRCONF_MEM_ASSIGNED;
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 6b0d3ef7309c..811148415993 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device)
>                         nid = memory_add_physaddr_to_nid(info->start_addr);
>
>                 acpi_unbind_memory_blocks(info);
> -               remove_memory(nid, info->start_addr, info->length);
> +               __remove_memory(nid, info->start_addr, info->length);
>                 list_del(&info->list);
>                 kfree(info);
>         }
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 34a28227068d..1f096852f479 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -301,6 +301,7 @@ extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
>  extern void try_offline_node(int nid);
>  extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
>  extern void remove_memory(int nid, u64 start, u64 size);
> +extern void __remove_memory(int nid, u64 start, u64 size);
>
>  #else
>  static inline bool is_mem_section_removable(unsigned long pfn,
> @@ -317,6 +318,7 @@ static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
>  }
>
>  static inline void remove_memory(int nid, u64 start, u64 size) {}
> +static inline void __remove_memory(int nid, u64 start, u64 size) {}
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>
>  extern void __ref free_area_init_core_hotplug(int nid);
> @@ -330,7 +332,6 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>                 unsigned long nr_pages, struct vmem_altmap *altmap);
>  extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
>  extern bool is_memblock_offlined(struct memory_block *mem);
> -extern void remove_memory(int nid, u64 start, u64 size);
>  extern int sparse_add_one_section(struct pglist_data *pgdat,
>                 unsigned long start_pfn, struct vmem_altmap *altmap);
>  extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 38d94b703e9d..b8b1bd970322 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1873,7 +1873,7 @@ EXPORT_SYMBOL(try_offline_node);
>   * and online/offline operations before this call, as required by
>   * try_offline_node().
>   */
> -void __ref remove_memory(int nid, u64 start, u64 size)
> +void __ref __remove_memory(int nid, u64 start, u64 size)
>  {
>         int ret;
>
> @@ -1902,5 +1902,12 @@ void __ref remove_memory(int nid, u64 start, u64 size)
>
>         mem_hotplug_done();
>  }
> +
> +void remove_memory(int nid, u64 start, u64 size)
> +{
> +       lock_device_hotplug();
> +       __remove_memory(nid, start, size);
> +       unlock_device_hotplug();
> +}
>  EXPORT_SYMBOL_GPL(remove_memory);
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
> --

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 2/6] mm/memory_hotplug: make add_memory() take the device_hotplug_lock
  2018-09-18 11:48 ` [PATCH v1 2/6] mm/memory_hotplug: make add_memory() " David Hildenbrand
@ 2018-09-18 21:19   ` Rafael J. Wysocki
  0 siblings, 0 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2018-09-18 21:19 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux Memory Management List, Linux Kernel Mailing List,
	open list:DOCUMENTATION, linuxppc-dev, ACPI Devel Maling List,
	xen-devel, devel, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rafael J. Wysocki, Len Brown,
	Greg Kroah-Hartman, Boris Ostrovsky, Juergen Gross, nfont,
	jallen, Andrew Morton, Michal Hocko, Dan Williams, Joonsoo Kim,
	Vlastimil Babka, osalvador, malat, pavel.tatashin, yasu.isimatu

On Tue, Sep 18, 2018 at 1:48 PM David Hildenbrand <david@redhat.com> wrote:
>
> add_memory() currently does not take the device_hotplug_lock, however
> is aleady called under the lock from
>         arch/powerpc/platforms/pseries/hotplug-memory.c
>         drivers/acpi/acpi_memhotplug.c
> to synchronize against CPU hot-remove and similar.
>
> In general, we should hold the device_hotplug_lock when adding memory
> to synchronize against online/offline request (e.g. from user space) -
> which already resulted in lock inversions due to device_lock() and
> mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
> hot-add deadlock"). add_memory()/add_memory_resource() will create memory
> block devices, so this really feels like the right thing to do.
>
> Holding the device_hotplug_lock makes sure that a memory block device
> can really only be accessed (e.g. via .online/.state) from user space,
> once the memory has been fully added to the system.
>
> The lock is not held yet in
>         drivers/xen/balloon.c
>         arch/powerpc/platforms/powernv/memtrace.c
>         drivers/s390/char/sclp_cmd.c
>         drivers/hv/hv_balloon.c
> So, let's either use the locked variants or take the lock.
>
> Don't export add_memory_resource(), as it once was exported to be used
> by XEN, which is never built as a module. If somebody requires it, we
> also have to export a locked variant (as device_hotplug_lock is never
> exported).
>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Cc: John Allen <jallen@linux.vnet.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Mathieu Malaterre <malat@debian.org>
> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  .../platforms/pseries/hotplug-memory.c        |  2 +-
>  drivers/acpi/acpi_memhotplug.c                |  2 +-
>  drivers/base/memory.c                         |  9 ++++++--
>  drivers/xen/balloon.c                         |  3 +++
>  include/linux/memory_hotplug.h                |  1 +
>  mm/memory_hotplug.c                           | 22 ++++++++++++++++---
>  6 files changed, 32 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index b3f54466e25f..2e6f41dc103a 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -702,7 +702,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>         nid = memory_add_physaddr_to_nid(lmb->base_addr);
>
>         /* Add the memory */
> -       rc = add_memory(nid, lmb->base_addr, block_sz);
> +       rc = __add_memory(nid, lmb->base_addr, block_sz);
>         if (rc) {
>                 dlpar_remove_device_tree_lmb(lmb);
>                 return rc;
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 811148415993..8fe0960ea572 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>                 if (node < 0)
>                         node = memory_add_physaddr_to_nid(info->start_addr);
>
> -               result = add_memory(node, info->start_addr, info->length);
> +               result = __add_memory(node, info->start_addr, info->length);
>
>                 /*
>                  * If the memory block has been used by the kernel, add_memory()
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 817320c7c4c1..40cac122ec73 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -519,15 +519,20 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
>         if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
>                 return -EINVAL;
>
> +       ret = lock_device_hotplug_sysfs();
> +       if (ret)
> +               goto out;
> +
>         nid = memory_add_physaddr_to_nid(phys_addr);
> -       ret = add_memory(nid, phys_addr,
> -                        MIN_MEMORY_BLOCK_SIZE * sections_per_block);
> +       ret = __add_memory(nid, phys_addr,
> +                          MIN_MEMORY_BLOCK_SIZE * sections_per_block);
>
>         if (ret)
>                 goto out;
>
>         ret = count;
>  out:
> +       unlock_device_hotplug();
>         return ret;
>  }
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index e12bb256036f..6bab019a82b1 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -395,7 +395,10 @@ static enum bp_state reserve_additional_memory(void)
>          * callers drop the mutex before trying again.
>          */
>         mutex_unlock(&balloon_mutex);
> +       /* add_memory_resource() requires the device_hotplug lock */
> +       lock_device_hotplug();
>         rc = add_memory_resource(nid, resource, memhp_auto_online);
> +       unlock_device_hotplug();
>         mutex_lock(&balloon_mutex);
>
>         if (rc) {
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 1f096852f479..ffd9cd10fcf3 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -324,6 +324,7 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
>  extern void __ref free_area_init_core_hotplug(int nid);
>  extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
>                 void *arg, int (*func)(struct memory_block *, void *));
> +extern int __add_memory(int nid, u64 start, u64 size);
>  extern int add_memory(int nid, u64 start, u64 size);
>  extern int add_memory_resource(int nid, struct resource *resource, bool online);
>  extern int arch_add_memory(int nid, u64 start, u64 size,
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index b8b1bd970322..ef5444145c88 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1111,7 +1111,12 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>         return device_online(&mem->dev);
>  }
>
> -/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
> +/*
> + * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
> + * and online/offline operations (triggered e.g. by sysfs).
> + *
> + * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
> + */
>  int __ref add_memory_resource(int nid, struct resource *res, bool online)
>  {
>         u64 start, size;
> @@ -1180,9 +1185,9 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
>         mem_hotplug_done();
>         return ret;
>  }
> -EXPORT_SYMBOL_GPL(add_memory_resource);
>
> -int __ref add_memory(int nid, u64 start, u64 size)
> +/* requires device_hotplug_lock, see add_memory_resource() */
> +int __ref __add_memory(int nid, u64 start, u64 size)
>  {
>         struct resource *res;
>         int ret;
> @@ -1196,6 +1201,17 @@ int __ref add_memory(int nid, u64 start, u64 size)
>                 release_memory_resource(res);
>         return ret;
>  }
> +
> +int add_memory(int nid, u64 start, u64 size)
> +{
> +       int rc;
> +
> +       lock_device_hotplug();
> +       rc = __add_memory(nid, start, size);
> +       unlock_device_hotplug();
> +
> +       return rc;
> +}
>  EXPORT_SYMBOL_GPL(add_memory);
>
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> --

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock
  2018-09-18 11:48 [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
                   ` (5 preceding siblings ...)
  2018-09-18 11:48 ` [PATCH v1 6/6] memory-hotplug.txt: Add some details about locking internals David Hildenbrand
@ 2018-09-19  1:22 ` Balbir Singh
  2018-09-19  7:35   ` David Hildenbrand
  6 siblings, 1 reply; 12+ messages in thread
From: Balbir Singh @ 2018-09-19  1:22 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-doc, linuxppc-dev, linux-acpi,
	xen-devel, devel, Andrew Morton, Benjamin Herrenschmidt,
	Boris Ostrovsky, Dan Williams, Greg Kroah-Hartman, Haiyang Zhang,
	Heiko Carstens, John Allen, Jonathan Corbet, Joonsoo Kim,
	Juergen Gross, Kate Stewart, K. Y. Srinivasan, Len Brown,
	Martin Schwidefsky, Mathieu Malaterre, Michael Ellerman,
	Michael Neuling, Michal Hocko, Nathan Fontenot, Oscar Salvador,
	Paul Mackerras, Pavel Tatashin, Pavel Tatashin,
	Philippe Ombredanne, Rafael J. Wysocki, Rashmica Gupta,
	Stephen Hemminger, Thomas Gleixner, Vlastimil Babka,
	YASUAKI ISHIMATSU

On Tue, Sep 18, 2018 at 01:48:16PM +0200, David Hildenbrand wrote:
> Reading through the code and studying how mem_hotplug_lock is to be used,
> I noticed that there are two places where we can end up calling
> device_online()/device_offline() - online_pages()/offline_pages() without
> the mem_hotplug_lock. And there are other places where we call
> device_online()/device_offline() without the device_hotplug_lock.
> 
> While e.g.
> 	echo "online" > /sys/devices/system/memory/memory9/state
> is fine, e.g.
> 	echo 1 > /sys/devices/system/memory/memory9/online
> Will not take the mem_hotplug_lock. However the device_lock() and
> device_hotplug_lock.
> 
> E.g. via memory_probe_store(), we can end up calling
> add_memory()->online_pages() without the device_hotplug_lock. So we can
> have concurrent callers in online_pages(). We e.g. touch in online_pages()
> basically unprotected zone->present_pages then.
> 
> Looks like there is a longer history to that (see Patch #2 for details),
> and fixing it to work the way it was intended is not really possible. We
> would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
> sounds wrong.
> 
> Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
> More details can be found in patch 3 and patch 6.
> 
> I propose the general rules (documentation added in patch 6):
> 
> 1. add_memory/add_memory_resource() must only be called with
>    device_hotplug_lock.
> 2. remove_memory() must only be called with device_hotplug_lock. This is
>    already documented and holds for all callers.
> 3. device_online()/device_offline() must only be called with
>    device_hotplug_lock. This is already documented and true for now in core
>    code. Other callers (related to memory hotplug) have to be fixed up.
> 4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
>    online_pages/offline_pages.
> 
> To me, this looks way cleaner than what we have right now (and easier to
> verify). And looking at the documentation of remove_memory, using
> lock_device_hotplug also for add_memory() feels natural.
>

That seems reasonable, but also implies that device_online() would hold
back add/remove memory, could you please also document what mode
read/write the locks need to be held? For example can the device_hotplug_lock
be held in read mode while add/remove memory via (mem_hotplug_lock) is held
in write mode?

Balbir Singh.
 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock
  2018-09-19  1:22 ` [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock Balbir Singh
@ 2018-09-19  7:35   ` David Hildenbrand
  2018-09-23  2:34     ` Balbir Singh
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2018-09-19  7:35 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, linux-kernel, linux-doc, linuxppc-dev, linux-acpi,
	xen-devel, devel, Andrew Morton, Benjamin Herrenschmidt,
	Boris Ostrovsky, Dan Williams, Greg Kroah-Hartman, Haiyang Zhang,
	Heiko Carstens, John Allen, Jonathan Corbet, Joonsoo Kim,
	Juergen Gross, Kate Stewart, K. Y. Srinivasan, Len Brown,
	Martin Schwidefsky, Mathieu Malaterre, Michael Ellerman,
	Michael Neuling, Michal Hocko, Nathan Fontenot, Oscar Salvador,
	Paul Mackerras, Pavel Tatashin, Pavel Tatashin,
	Philippe Ombredanne, Rafael J. Wysocki, Rashmica Gupta,
	Stephen Hemminger, Thomas Gleixner, Vlastimil Babka,
	YASUAKI ISHIMATSU

Am 19.09.18 um 03:22 schrieb Balbir Singh:
> On Tue, Sep 18, 2018 at 01:48:16PM +0200, David Hildenbrand wrote:
>> Reading through the code and studying how mem_hotplug_lock is to be used,
>> I noticed that there are two places where we can end up calling
>> device_online()/device_offline() - online_pages()/offline_pages() without
>> the mem_hotplug_lock. And there are other places where we call
>> device_online()/device_offline() without the device_hotplug_lock.
>>
>> While e.g.
>> 	echo "online" > /sys/devices/system/memory/memory9/state
>> is fine, e.g.
>> 	echo 1 > /sys/devices/system/memory/memory9/online
>> Will not take the mem_hotplug_lock. However the device_lock() and
>> device_hotplug_lock.
>>
>> E.g. via memory_probe_store(), we can end up calling
>> add_memory()->online_pages() without the device_hotplug_lock. So we can
>> have concurrent callers in online_pages(). We e.g. touch in online_pages()
>> basically unprotected zone->present_pages then.
>>
>> Looks like there is a longer history to that (see Patch #2 for details),
>> and fixing it to work the way it was intended is not really possible. We
>> would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
>> sounds wrong.
>>
>> Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
>> More details can be found in patch 3 and patch 6.
>>
>> I propose the general rules (documentation added in patch 6):
>>
>> 1. add_memory/add_memory_resource() must only be called with
>>    device_hotplug_lock.
>> 2. remove_memory() must only be called with device_hotplug_lock. This is
>>    already documented and holds for all callers.
>> 3. device_online()/device_offline() must only be called with
>>    device_hotplug_lock. This is already documented and true for now in core
>>    code. Other callers (related to memory hotplug) have to be fixed up.
>> 4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
>>    online_pages/offline_pages.
>>
>> To me, this looks way cleaner than what we have right now (and easier to
>> verify). And looking at the documentation of remove_memory, using
>> lock_device_hotplug also for add_memory() feels natural.
>>
> 
> That seems reasonable, but also implies that device_online() would hold
> back add/remove memory, could you please also document what mode
> read/write the locks need to be held? For example can the device_hotplug_lock
> be held in read mode while add/remove memory via (mem_hotplug_lock) is held
> in write mode?

device_hotplug_lock is an ordinary mutex. So no option there.

Only mem_hotplug_lock is a per CPU RW mutex. And as of now it only
exists to not require get_online_mems()/put_online_mems() to take the
device_hotplug_lock. Which is perfectly valid, because these users only
care about memory (not any other devices) not suddenly vanish. And that
RW lock makes things fast.

Any modifications (online/offline/add/remove) require the
mem_hotplug_lock in write.

I can add some more details to documentation in patch #6.

"... we should always hold the mem_hotplug_lock (via
mem_hotplug_begin/mem_hotplug_done) in write mode to serialize memory
hotplug" ..."

"In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in
read mode allows for a quite efficient get_online_mems/put_online_mems
implementation, so code accessing memory can protect from that memory
vanishing."

Would that work for you?

Thanks!

> 
> Balbir Singh.
>  
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock
  2018-09-19  7:35   ` David Hildenbrand
@ 2018-09-23  2:34     ` Balbir Singh
  0 siblings, 0 replies; 12+ messages in thread
From: Balbir Singh @ 2018-09-23  2:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-doc, linuxppc-dev, linux-acpi,
	xen-devel, devel, Andrew Morton, Benjamin Herrenschmidt,
	Boris Ostrovsky, Dan Williams, Greg Kroah-Hartman, Haiyang Zhang,
	Heiko Carstens, John Allen, Jonathan Corbet, Joonsoo Kim,
	Juergen Gross, Kate Stewart, K. Y. Srinivasan, Len Brown,
	Martin Schwidefsky, Mathieu Malaterre, Michael Ellerman,
	Michael Neuling, Michal Hocko, Nathan Fontenot, Oscar Salvador,
	Paul Mackerras, Pavel Tatashin, Pavel Tatashin,
	Philippe Ombredanne, Rafael J. Wysocki, Rashmica Gupta,
	Stephen Hemminger, Thomas Gleixner, Vlastimil Babka,
	YASUAKI ISHIMATSU

On Wed, Sep 19, 2018 at 09:35:07AM +0200, David Hildenbrand wrote:
> Am 19.09.18 um 03:22 schrieb Balbir Singh:
> > On Tue, Sep 18, 2018 at 01:48:16PM +0200, David Hildenbrand wrote:
> >> Reading through the code and studying how mem_hotplug_lock is to be used,
> >> I noticed that there are two places where we can end up calling
> >> device_online()/device_offline() - online_pages()/offline_pages() without
> >> the mem_hotplug_lock. And there are other places where we call
> >> device_online()/device_offline() without the device_hotplug_lock.
> >>
> >> While e.g.
> >> 	echo "online" > /sys/devices/system/memory/memory9/state
> >> is fine, e.g.
> >> 	echo 1 > /sys/devices/system/memory/memory9/online
> >> Will not take the mem_hotplug_lock. However the device_lock() and
> >> device_hotplug_lock.
> >>
> >> E.g. via memory_probe_store(), we can end up calling
> >> add_memory()->online_pages() without the device_hotplug_lock. So we can
> >> have concurrent callers in online_pages(). We e.g. touch in online_pages()
> >> basically unprotected zone->present_pages then.
> >>
> >> Looks like there is a longer history to that (see Patch #2 for details),
> >> and fixing it to work the way it was intended is not really possible. We
> >> would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
> >> sounds wrong.
> >>
> >> Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
> >> More details can be found in patch 3 and patch 6.
> >>
> >> I propose the general rules (documentation added in patch 6):
> >>
> >> 1. add_memory/add_memory_resource() must only be called with
> >>    device_hotplug_lock.
> >> 2. remove_memory() must only be called with device_hotplug_lock. This is
> >>    already documented and holds for all callers.
> >> 3. device_online()/device_offline() must only be called with
> >>    device_hotplug_lock. This is already documented and true for now in core
> >>    code. Other callers (related to memory hotplug) have to be fixed up.
> >> 4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
> >>    online_pages/offline_pages.
> >>
> >> To me, this looks way cleaner than what we have right now (and easier to
> >> verify). And looking at the documentation of remove_memory, using
> >> lock_device_hotplug also for add_memory() feels natural.
> >>
> > 
> > That seems reasonable, but also implies that device_online() would hold
> > back add/remove memory, could you please also document what mode
> > read/write the locks need to be held? For example can the device_hotplug_lock
> > be held in read mode while add/remove memory via (mem_hotplug_lock) is held
> > in write mode?
> 
> device_hotplug_lock is an ordinary mutex. So no option there.
> 
> Only mem_hotplug_lock is a per CPU RW mutex. And as of now it only
> exists to not require get_online_mems()/put_online_mems() to take the
> device_hotplug_lock. Which is perfectly valid, because these users only
> care about memory (not any other devices) not suddenly vanish. And that
> RW lock makes things fast.
> 
> Any modifications (online/offline/add/remove) require the
> mem_hotplug_lock in write.
> 
> I can add some more details to documentation in patch #6.
> 
> "... we should always hold the mem_hotplug_lock (via
> mem_hotplug_begin/mem_hotplug_done) in write mode to serialize memory
> hotplug" ..."
> 
> "In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in
> read mode allows for a quite efficient get_online_mems/put_online_mems
> implementation, so code accessing memory can protect from that memory
> vanishing."
> 
> Would that work for you?

Yes, Thanks

Balbir Singh.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-09-23  2:35 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-18 11:48 [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
2018-09-18 11:48 ` [PATCH v1 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock David Hildenbrand
2018-09-18 21:18   ` Rafael J. Wysocki
2018-09-18 11:48 ` [PATCH v1 2/6] mm/memory_hotplug: make add_memory() " David Hildenbrand
2018-09-18 21:19   ` Rafael J. Wysocki
2018-09-18 11:48 ` [PATCH v1 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
2018-09-18 11:48 ` [PATCH v1 4/6] powerpc/powernv: hold device_hotplug_lock when calling device_online() David Hildenbrand
2018-09-18 11:48 ` [PATCH v1 5/6] powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages() David Hildenbrand
2018-09-18 11:48 ` [PATCH v1 6/6] memory-hotplug.txt: Add some details about locking internals David Hildenbrand
2018-09-19  1:22 ` [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock Balbir Singh
2018-09-19  7:35   ` David Hildenbrand
2018-09-23  2:34     ` Balbir Singh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).