* [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
@ 2020-01-10 4:30 Dan Williams
2020-01-10 9:10 ` David Hildenbrand
0 siblings, 1 reply; 22+ messages in thread
From: Dan Williams @ 2020-01-10 4:30 UTC (permalink / raw)
To: akpm
Cc: stable, Vishal Verma, David Hildenbrand, Pavel Tatashin,
Michal Hocko, Dave Hansen, linux-mm, linux-kernel
The daxctl unit test for the dax_kmem driver currently triggers the
lockdep splat below. It results from the fact that
remove_memory_block_devices() is invoked under the mem_hotplug_lock()
causing lockdep entanglements with cpu_hotplug_lock().
The mem_hotplug_lock() is not needed to synchronize the memory block
device sysfs interface vs the page online state, that is already handled
by lock_device_hotplug(). Specifically lock_device_hotplug()
is sufficient to allow try_remove_memory() to check the offline
state of the memblocks and be assured that subsequent online attempts
will be blocked. The device_online() path checks mem->section_count
before allowing any state manipulations and mem->section_count is
cleared in remove_memory_block_devices().
The add_memory() path does create memblock devices under the lock, but
there is no lockdep report on that path, so it is left alone for now.
This change is only possible thanks to the recent change that refactored
memory block device removal out of arch_remove_memory() (commit
4c4b7f9ba948 mm/memory_hotplug: remove memory block devices before
arch_remove_memory()).
======================================================
WARNING: possible circular locking dependency detected
5.5.0-rc3+ #230 Tainted: G OE
------------------------------------------------------
lt-daxctl/6459 is trying to acquire lock:
ffff99c7f0003510 (kn->count#241){++++}, at: kernfs_remove_by_name_ns+0x41/0x80
but task is already holding lock:
ffffffffa76a5450 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x20/0xe0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (mem_hotplug_lock.rw_sem){++++}:
__lock_acquire+0x39c/0x790
lock_acquire+0xa2/0x1b0
get_online_mems+0x3e/0xb0
kmem_cache_create_usercopy+0x2e/0x260
kmem_cache_create+0x12/0x20
ptlock_cache_init+0x20/0x28
start_kernel+0x243/0x547
secondary_startup_64+0xb6/0xc0
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
__lock_acquire+0x39c/0x790
lock_acquire+0xa2/0x1b0
cpus_read_lock+0x3e/0xb0
online_pages+0x37/0x300
memory_subsys_online+0x17d/0x1c0
device_online+0x60/0x80
state_store+0x65/0xd0
kernfs_fop_write+0xcf/0x1c0
vfs_write+0xdb/0x1d0
ksys_write+0x65/0xe0
do_syscall_64+0x5c/0xa0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (kn->count#241){++++}:
check_prev_add+0x98/0xa40
validate_chain+0x576/0x860
__lock_acquire+0x39c/0x790
lock_acquire+0xa2/0x1b0
__kernfs_remove+0x25f/0x2e0
kernfs_remove_by_name_ns+0x41/0x80
remove_files.isra.0+0x30/0x70
sysfs_remove_group+0x3d/0x80
sysfs_remove_groups+0x29/0x40
device_remove_attrs+0x39/0x70
device_del+0x16a/0x3f0
device_unregister+0x16/0x60
remove_memory_block_devices+0x82/0xb0
try_remove_memory+0xb5/0x130
remove_memory+0x26/0x40
dev_dax_kmem_remove+0x44/0x6a [kmem]
device_release_driver_internal+0xe4/0x1c0
unbind_store+0xef/0x120
kernfs_fop_write+0xcf/0x1c0
vfs_write+0xdb/0x1d0
ksys_write+0x65/0xe0
do_syscall_64+0x5c/0xa0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
other info that might help us debug this:
Chain exists of:
kn->count#241 --> cpu_hotplug_lock.rw_sem --> mem_hotplug_lock.rw_sem
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(mem_hotplug_lock.rw_sem);
lock(cpu_hotplug_lock.rw_sem);
lock(mem_hotplug_lock.rw_sem);
lock(kn->count#241);
*** DEADLOCK ***
No fixes tag as this seems to have been a long standing issue that
likely predated the addition of kernfs lockdep annotations.
Cc: <stable@vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
mm/memory_hotplug.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 55ac23ef11c1..a4e7dadded08 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1763,8 +1763,6 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
BUG_ON(check_hotplug_memory_range(start, size));
- mem_hotplug_begin();
-
/*
* All memory blocks must be offlined before removing memory. Check
* whether all memory blocks in question are offline and return error
@@ -1777,9 +1775,17 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
/* remove memmap entry */
firmware_map_remove(start, start + size, "System RAM");
- /* remove memory block devices before removing memory */
+ /*
+ * Remove memory block devices before removing memory, and do
+ * not hold the mem_hotplug_lock() over kobject removal
+ * operations. lock_device_hotplug() keeps the
+ * check_memblock_offlined_cb result valid until the entire
+ * removal process is complete.
+ */
remove_memory_block_devices(start, size);
+ mem_hotplug_begin();
+
arch_remove_memory(nid, start, size, NULL);
memblock_free(start, size);
memblock_remove(start, size);
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-10 4:30 [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat Dan Williams
@ 2020-01-10 9:10 ` David Hildenbrand
2020-01-10 16:42 ` Dan Williams
0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2020-01-10 9:10 UTC (permalink / raw)
To: Dan Williams, akpm
Cc: stable, Vishal Verma, Pavel Tatashin, Michal Hocko, Dave Hansen,
linux-mm, linux-kernel
On 10.01.20 05:30, Dan Williams wrote:
> The daxctl unit test for the dax_kmem driver currently triggers the
> lockdep splat below. It results from the fact that
> remove_memory_block_devices() is invoked under the mem_hotplug_lock()
> causing lockdep entanglements with cpu_hotplug_lock().
>
> The mem_hotplug_lock() is not needed to synchronize the memory block
> device sysfs interface vs the page online state, that is already handled
> by lock_device_hotplug(). Specifically lock_device_hotplug()
> is sufficient to allow try_remove_memory() to check the offline
> state of the memblocks and be assured that subsequent online attempts
> will be blocked. The device_online() path checks mem->section_count
> before allowing any state manipulations and mem->section_count is
> cleared in remove_memory_block_devices().
>
> The add_memory() path does create memblock devices under the lock, but
> there is no lockdep report on that path, so it is left alone for now.
>
> This change is only possible thanks to the recent change that refactored
> memory block device removal out of arch_remove_memory() (commit
> 4c4b7f9ba948 mm/memory_hotplug: remove memory block devices before
> arch_remove_memory()).
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.5.0-rc3+ #230 Tainted: G OE
> ------------------------------------------------------
> lt-daxctl/6459 is trying to acquire lock:
> ffff99c7f0003510 (kn->count#241){++++}, at: kernfs_remove_by_name_ns+0x41/0x80
>
> but task is already holding lock:
> ffffffffa76a5450 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x20/0xe0
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (mem_hotplug_lock.rw_sem){++++}:
> __lock_acquire+0x39c/0x790
> lock_acquire+0xa2/0x1b0
> get_online_mems+0x3e/0xb0
> kmem_cache_create_usercopy+0x2e/0x260
> kmem_cache_create+0x12/0x20
> ptlock_cache_init+0x20/0x28
> start_kernel+0x243/0x547
> secondary_startup_64+0xb6/0xc0
>
> -> #1 (cpu_hotplug_lock.rw_sem){++++}:
> __lock_acquire+0x39c/0x790
> lock_acquire+0xa2/0x1b0
> cpus_read_lock+0x3e/0xb0
> online_pages+0x37/0x300
> memory_subsys_online+0x17d/0x1c0
> device_online+0x60/0x80
> state_store+0x65/0xd0
> kernfs_fop_write+0xcf/0x1c0
> vfs_write+0xdb/0x1d0
> ksys_write+0x65/0xe0
> do_syscall_64+0x5c/0xa0
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #0 (kn->count#241){++++}:
> check_prev_add+0x98/0xa40
> validate_chain+0x576/0x860
> __lock_acquire+0x39c/0x790
> lock_acquire+0xa2/0x1b0
> __kernfs_remove+0x25f/0x2e0
> kernfs_remove_by_name_ns+0x41/0x80
> remove_files.isra.0+0x30/0x70
> sysfs_remove_group+0x3d/0x80
> sysfs_remove_groups+0x29/0x40
> device_remove_attrs+0x39/0x70
> device_del+0x16a/0x3f0
> device_unregister+0x16/0x60
> remove_memory_block_devices+0x82/0xb0
> try_remove_memory+0xb5/0x130
> remove_memory+0x26/0x40
> dev_dax_kmem_remove+0x44/0x6a [kmem]
> device_release_driver_internal+0xe4/0x1c0
> unbind_store+0xef/0x120
> kernfs_fop_write+0xcf/0x1c0
> vfs_write+0xdb/0x1d0
> ksys_write+0x65/0xe0
> do_syscall_64+0x5c/0xa0
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> other info that might help us debug this:
>
> Chain exists of:
> kn->count#241 --> cpu_hotplug_lock.rw_sem --> mem_hotplug_lock.rw_sem
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(mem_hotplug_lock.rw_sem);
> lock(cpu_hotplug_lock.rw_sem);
> lock(mem_hotplug_lock.rw_sem);
> lock(kn->count#241);
>
> *** DEADLOCK ***
>
> No fixes tag as this seems to have been a long standing issue that
> likely predated the addition of kernfs lockdep annotations.
>
> Cc: <stable@vger.kernel.org>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> mm/memory_hotplug.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 55ac23ef11c1..a4e7dadded08 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1763,8 +1763,6 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>
> BUG_ON(check_hotplug_memory_range(start, size));
>
> - mem_hotplug_begin();
> -
> /*
> * All memory blocks must be offlined before removing memory. Check
> * whether all memory blocks in question are offline and return error
> @@ -1777,9 +1775,17 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
> /* remove memmap entry */
> firmware_map_remove(start, start + size, "System RAM");
>
> - /* remove memory block devices before removing memory */
> + /*
> + * Remove memory block devices before removing memory, and do
> + * not hold the mem_hotplug_lock() over kobject removal
> + * operations. lock_device_hotplug() keeps the
> + * check_memblock_offlined_cb result valid until the entire
> + * removal process is complete.
> + */
Maybe shorten that to
/*
* Remove memory block devices before removing memory. Protected
* by the device_hotplug_lock only.
*/
AFAIK, the device hotplug lock is sufficient here. The memory hotplug
lock / cpu hotplug lock is only needed when calling into arch code
(especially for PPC). We hold both locks when onlining/offlining memory.
> remove_memory_block_devices(start, size);
>
> + mem_hotplug_begin();
> +
> arch_remove_memory(nid, start, size, NULL);
> memblock_free(start, size);
> memblock_remove(start, size);
>
I'd suggest to do the same in the adding part right away (if easily
possible) to make it clearer. I properly documented the semantics of
add_memory_block_devices()/remove_memory_block_devices() already (that
they need the device hotplug lock).
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-10 9:10 ` David Hildenbrand
@ 2020-01-10 16:42 ` Dan Williams
0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2020-01-10 16:42 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
On Fri, Jan 10, 2020 at 1:10 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 10.01.20 05:30, Dan Williams wrote:
> > The daxctl unit test for the dax_kmem driver currently triggers the
> > lockdep splat below. It results from the fact that
> > remove_memory_block_devices() is invoked under the mem_hotplug_lock()
> > causing lockdep entanglements with cpu_hotplug_lock().
> >
> > The mem_hotplug_lock() is not needed to synchronize the memory block
> > device sysfs interface vs the page online state, that is already handled
> > by lock_device_hotplug(). Specifically lock_device_hotplug()
> > is sufficient to allow try_remove_memory() to check the offline
> > state of the memblocks and be assured that subsequent online attempts
> > will be blocked. The device_online() path checks mem->section_count
> > before allowing any state manipulations and mem->section_count is
> > cleared in remove_memory_block_devices().
> >
> > The add_memory() path does create memblock devices under the lock, but
> > there is no lockdep report on that path, so it is left alone for now.
> >
> > This change is only possible thanks to the recent change that refactored
> > memory block device removal out of arch_remove_memory() (commit
> > 4c4b7f9ba948 mm/memory_hotplug: remove memory block devices before
> > arch_remove_memory()).
> >
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 5.5.0-rc3+ #230 Tainted: G OE
> > ------------------------------------------------------
> > lt-daxctl/6459 is trying to acquire lock:
> > ffff99c7f0003510 (kn->count#241){++++}, at: kernfs_remove_by_name_ns+0x41/0x80
> >
> > but task is already holding lock:
> > ffffffffa76a5450 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x20/0xe0
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #2 (mem_hotplug_lock.rw_sem){++++}:
> > __lock_acquire+0x39c/0x790
> > lock_acquire+0xa2/0x1b0
> > get_online_mems+0x3e/0xb0
> > kmem_cache_create_usercopy+0x2e/0x260
> > kmem_cache_create+0x12/0x20
> > ptlock_cache_init+0x20/0x28
> > start_kernel+0x243/0x547
> > secondary_startup_64+0xb6/0xc0
> >
> > -> #1 (cpu_hotplug_lock.rw_sem){++++}:
> > __lock_acquire+0x39c/0x790
> > lock_acquire+0xa2/0x1b0
> > cpus_read_lock+0x3e/0xb0
> > online_pages+0x37/0x300
> > memory_subsys_online+0x17d/0x1c0
> > device_online+0x60/0x80
> > state_store+0x65/0xd0
> > kernfs_fop_write+0xcf/0x1c0
> > vfs_write+0xdb/0x1d0
> > ksys_write+0x65/0xe0
> > do_syscall_64+0x5c/0xa0
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > -> #0 (kn->count#241){++++}:
> > check_prev_add+0x98/0xa40
> > validate_chain+0x576/0x860
> > __lock_acquire+0x39c/0x790
> > lock_acquire+0xa2/0x1b0
> > __kernfs_remove+0x25f/0x2e0
> > kernfs_remove_by_name_ns+0x41/0x80
> > remove_files.isra.0+0x30/0x70
> > sysfs_remove_group+0x3d/0x80
> > sysfs_remove_groups+0x29/0x40
> > device_remove_attrs+0x39/0x70
> > device_del+0x16a/0x3f0
> > device_unregister+0x16/0x60
> > remove_memory_block_devices+0x82/0xb0
> > try_remove_memory+0xb5/0x130
> > remove_memory+0x26/0x40
> > dev_dax_kmem_remove+0x44/0x6a [kmem]
> > device_release_driver_internal+0xe4/0x1c0
> > unbind_store+0xef/0x120
> > kernfs_fop_write+0xcf/0x1c0
> > vfs_write+0xdb/0x1d0
> > ksys_write+0x65/0xe0
> > do_syscall_64+0x5c/0xa0
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > other info that might help us debug this:
> >
> > Chain exists of:
> > kn->count#241 --> cpu_hotplug_lock.rw_sem --> mem_hotplug_lock.rw_sem
> >
> > Possible unsafe locking scenario:
> >
> > CPU0 CPU1
> > ---- ----
> > lock(mem_hotplug_lock.rw_sem);
> > lock(cpu_hotplug_lock.rw_sem);
> > lock(mem_hotplug_lock.rw_sem);
> > lock(kn->count#241);
> >
> > *** DEADLOCK ***
> >
> > No fixes tag as this seems to have been a long standing issue that
> > likely predated the addition of kernfs lockdep annotations.
> >
> > Cc: <stable@vger.kernel.org>
> > Cc: Vishal Verma <vishal.l.verma@intel.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> > mm/memory_hotplug.c | 12 +++++++++---
> > 1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 55ac23ef11c1..a4e7dadded08 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1763,8 +1763,6 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
> >
> > BUG_ON(check_hotplug_memory_range(start, size));
> >
> > - mem_hotplug_begin();
> > -
> > /*
> > * All memory blocks must be offlined before removing memory. Check
> > * whether all memory blocks in question are offline and return error
> > @@ -1777,9 +1775,17 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
> > /* remove memmap entry */
> > firmware_map_remove(start, start + size, "System RAM");
> >
> > - /* remove memory block devices before removing memory */
> > + /*
> > + * Remove memory block devices before removing memory, and do
> > + * not hold the mem_hotplug_lock() over kobject removal
> > + * operations. lock_device_hotplug() keeps the
> > + * check_memblock_offlined_cb result valid until the entire
> > + * removal process is complete.
> > + */
>
> Maybe shorten that to
>
> /*
> * Remove memory block devices before removing memory. Protected
> * by the device_hotplug_lock only.
> */
Why make someone dig for the reasons this lock is sufficient?
>
> AFAIK, the device hotplug lock is sufficient here. The memory hotplug
> lock / cpu hotplug lock is only needed when calling into arch code
> (especially for PPC). We hold both locks when onlining/offlining memory.
>
> > remove_memory_block_devices(start, size);
> >
> > + mem_hotplug_begin();
> > +
> > arch_remove_memory(nid, start, size, NULL);
> > memblock_free(start, size);
> > memblock_remove(start, size);
> >
>
> I'd suggest to do the same in the adding part right away (if easily
> possible) to make it clearer.
Let's let this fix percolate upstream for a bit to make sure there was
no protection the mem_hotplug_begin() was inadvertently providing.
> I properly documented the semantics of
> add_memory_block_devices()/remove_memory_block_devices() already (that
> they need the device hotplug lock).
I see that, but I prefer lockdep_assert_held() in the code rather than
comments. I'll send a patch to fix that up.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
@ 2020-01-10 16:42 ` Dan Williams
0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2020-01-10 16:42 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
On Fri, Jan 10, 2020 at 1:10 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 10.01.20 05:30, Dan Williams wrote:
> > The daxctl unit test for the dax_kmem driver currently triggers the
> > lockdep splat below. It results from the fact that
> > remove_memory_block_devices() is invoked under the mem_hotplug_lock()
> > causing lockdep entanglements with cpu_hotplug_lock().
> >
> > The mem_hotplug_lock() is not needed to synchronize the memory block
> > device sysfs interface vs the page online state, that is already handled
> > by lock_device_hotplug(). Specifically lock_device_hotplug()
> > is sufficient to allow try_remove_memory() to check the offline
> > state of the memblocks and be assured that subsequent online attempts
> > will be blocked. The device_online() path checks mem->section_count
> > before allowing any state manipulations and mem->section_count is
> > cleared in remove_memory_block_devices().
> >
> > The add_memory() path does create memblock devices under the lock, but
> > there is no lockdep report on that path, so it is left alone for now.
> >
> > This change is only possible thanks to the recent change that refactored
> > memory block device removal out of arch_remove_memory() (commit
> > 4c4b7f9ba948 mm/memory_hotplug: remove memory block devices before
> > arch_remove_memory()).
> >
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 5.5.0-rc3+ #230 Tainted: G OE
> > ------------------------------------------------------
> > lt-daxctl/6459 is trying to acquire lock:
> > ffff99c7f0003510 (kn->count#241){++++}, at: kernfs_remove_by_name_ns+0x41/0x80
> >
> > but task is already holding lock:
> > ffffffffa76a5450 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x20/0xe0
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #2 (mem_hotplug_lock.rw_sem){++++}:
> > __lock_acquire+0x39c/0x790
> > lock_acquire+0xa2/0x1b0
> > get_online_mems+0x3e/0xb0
> > kmem_cache_create_usercopy+0x2e/0x260
> > kmem_cache_create+0x12/0x20
> > ptlock_cache_init+0x20/0x28
> > start_kernel+0x243/0x547
> > secondary_startup_64+0xb6/0xc0
> >
> > -> #1 (cpu_hotplug_lock.rw_sem){++++}:
> > __lock_acquire+0x39c/0x790
> > lock_acquire+0xa2/0x1b0
> > cpus_read_lock+0x3e/0xb0
> > online_pages+0x37/0x300
> > memory_subsys_online+0x17d/0x1c0
> > device_online+0x60/0x80
> > state_store+0x65/0xd0
> > kernfs_fop_write+0xcf/0x1c0
> > vfs_write+0xdb/0x1d0
> > ksys_write+0x65/0xe0
> > do_syscall_64+0x5c/0xa0
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > -> #0 (kn->count#241){++++}:
> > check_prev_add+0x98/0xa40
> > validate_chain+0x576/0x860
> > __lock_acquire+0x39c/0x790
> > lock_acquire+0xa2/0x1b0
> > __kernfs_remove+0x25f/0x2e0
> > kernfs_remove_by_name_ns+0x41/0x80
> > remove_files.isra.0+0x30/0x70
> > sysfs_remove_group+0x3d/0x80
> > sysfs_remove_groups+0x29/0x40
> > device_remove_attrs+0x39/0x70
> > device_del+0x16a/0x3f0
> > device_unregister+0x16/0x60
> > remove_memory_block_devices+0x82/0xb0
> > try_remove_memory+0xb5/0x130
> > remove_memory+0x26/0x40
> > dev_dax_kmem_remove+0x44/0x6a [kmem]
> > device_release_driver_internal+0xe4/0x1c0
> > unbind_store+0xef/0x120
> > kernfs_fop_write+0xcf/0x1c0
> > vfs_write+0xdb/0x1d0
> > ksys_write+0x65/0xe0
> > do_syscall_64+0x5c/0xa0
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > other info that might help us debug this:
> >
> > Chain exists of:
> > kn->count#241 --> cpu_hotplug_lock.rw_sem --> mem_hotplug_lock.rw_sem
> >
> > Possible unsafe locking scenario:
> >
> > CPU0 CPU1
> > ---- ----
> > lock(mem_hotplug_lock.rw_sem);
> > lock(cpu_hotplug_lock.rw_sem);
> > lock(mem_hotplug_lock.rw_sem);
> > lock(kn->count#241);
> >
> > *** DEADLOCK ***
> >
> > No fixes tag as this seems to have been a long standing issue that
> > likely predated the addition of kernfs lockdep annotations.
> >
> > Cc: <stable@vger.kernel.org>
> > Cc: Vishal Verma <vishal.l.verma@intel.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> > mm/memory_hotplug.c | 12 +++++++++---
> > 1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 55ac23ef11c1..a4e7dadded08 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1763,8 +1763,6 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
> >
> > BUG_ON(check_hotplug_memory_range(start, size));
> >
> > - mem_hotplug_begin();
> > -
> > /*
> > * All memory blocks must be offlined before removing memory. Check
> > * whether all memory blocks in question are offline and return error
> > @@ -1777,9 +1775,17 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
> > /* remove memmap entry */
> > firmware_map_remove(start, start + size, "System RAM");
> >
> > - /* remove memory block devices before removing memory */
> > + /*
> > + * Remove memory block devices before removing memory, and do
> > + * not hold the mem_hotplug_lock() over kobject removal
> > + * operations. lock_device_hotplug() keeps the
> > + * check_memblock_offlined_cb result valid until the entire
> > + * removal process is complete.
> > + */
>
> Maybe shorten that to
>
> /*
> * Remove memory block devices before removing memory. Protected
> * by the device_hotplug_lock only.
> */
Why make someone dig for the reasons this lock is sufficient?
>
> AFAIK, the device hotplug lock is sufficient here. The memory hotplug
> lock / cpu hotplug lock is only needed when calling into arch code
> (especially for PPC). We hold both locks when onlining/offlining memory.
>
> > remove_memory_block_devices(start, size);
> >
> > + mem_hotplug_begin();
> > +
> > arch_remove_memory(nid, start, size, NULL);
> > memblock_free(start, size);
> > memblock_remove(start, size);
> >
>
> I'd suggest to do the same in the adding part right away (if easily
> possible) to make it clearer.
Let's let this fix percolate upstream for a bit to make sure there was
no protection the mem_hotplug_begin() was inadvertently providing.
> I properly documented the semantics of
> add_memory_block_devices()/remove_memory_block_devices() already (that
> they need the device hotplug lock).
I see that, but I prefer lockdep_assert_held() in the code rather than
comments. I'll send a patch to fix that up.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-10 16:42 ` Dan Williams
(?)
@ 2020-01-10 16:54 ` David Hildenbrand
2020-01-10 16:57 ` David Hildenbrand
2020-01-10 17:24 ` Dan Williams
-1 siblings, 2 replies; 22+ messages in thread
From: David Hildenbrand @ 2020-01-10 16:54 UTC (permalink / raw)
To: Dan Williams
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
On 10.01.20 17:42, Dan Williams wrote:
> On Fri, Jan 10, 2020 at 1:10 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 10.01.20 05:30, Dan Williams wrote:
>>> The daxctl unit test for the dax_kmem driver currently triggers the
>>> lockdep splat below. It results from the fact that
>>> remove_memory_block_devices() is invoked under the mem_hotplug_lock()
>>> causing lockdep entanglements with cpu_hotplug_lock().
>>>
>>> The mem_hotplug_lock() is not needed to synchronize the memory block
>>> device sysfs interface vs the page online state, that is already handled
>>> by lock_device_hotplug(). Specifically lock_device_hotplug()
>>> is sufficient to allow try_remove_memory() to check the offline
>>> state of the memblocks and be assured that subsequent online attempts
>>> will be blocked. The device_online() path checks mem->section_count
>>> before allowing any state manipulations and mem->section_count is
>>> cleared in remove_memory_block_devices().
>>>
>>> The add_memory() path does create memblock devices under the lock, but
>>> there is no lockdep report on that path, so it is left alone for now.
>>>
>>> This change is only possible thanks to the recent change that refactored
>>> memory block device removal out of arch_remove_memory() (commit
>>> 4c4b7f9ba948 mm/memory_hotplug: remove memory block devices before
>>> arch_remove_memory()).
>>>
>>> ======================================================
>>> WARNING: possible circular locking dependency detected
>>> 5.5.0-rc3+ #230 Tainted: G OE
>>> ------------------------------------------------------
>>> lt-daxctl/6459 is trying to acquire lock:
>>> ffff99c7f0003510 (kn->count#241){++++}, at: kernfs_remove_by_name_ns+0x41/0x80
>>>
>>> but task is already holding lock:
>>> ffffffffa76a5450 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x20/0xe0
>>>
>>> which lock already depends on the new lock.
>>>
>>>
>>> the existing dependency chain (in reverse order) is:
>>>
>>> -> #2 (mem_hotplug_lock.rw_sem){++++}:
>>> __lock_acquire+0x39c/0x790
>>> lock_acquire+0xa2/0x1b0
>>> get_online_mems+0x3e/0xb0
>>> kmem_cache_create_usercopy+0x2e/0x260
>>> kmem_cache_create+0x12/0x20
>>> ptlock_cache_init+0x20/0x28
>>> start_kernel+0x243/0x547
>>> secondary_startup_64+0xb6/0xc0
>>>
>>> -> #1 (cpu_hotplug_lock.rw_sem){++++}:
>>> __lock_acquire+0x39c/0x790
>>> lock_acquire+0xa2/0x1b0
>>> cpus_read_lock+0x3e/0xb0
>>> online_pages+0x37/0x300
>>> memory_subsys_online+0x17d/0x1c0
>>> device_online+0x60/0x80
>>> state_store+0x65/0xd0
>>> kernfs_fop_write+0xcf/0x1c0
>>> vfs_write+0xdb/0x1d0
>>> ksys_write+0x65/0xe0
>>> do_syscall_64+0x5c/0xa0
>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>
>>> -> #0 (kn->count#241){++++}:
>>> check_prev_add+0x98/0xa40
>>> validate_chain+0x576/0x860
>>> __lock_acquire+0x39c/0x790
>>> lock_acquire+0xa2/0x1b0
>>> __kernfs_remove+0x25f/0x2e0
>>> kernfs_remove_by_name_ns+0x41/0x80
>>> remove_files.isra.0+0x30/0x70
>>> sysfs_remove_group+0x3d/0x80
>>> sysfs_remove_groups+0x29/0x40
>>> device_remove_attrs+0x39/0x70
>>> device_del+0x16a/0x3f0
>>> device_unregister+0x16/0x60
>>> remove_memory_block_devices+0x82/0xb0
>>> try_remove_memory+0xb5/0x130
>>> remove_memory+0x26/0x40
>>> dev_dax_kmem_remove+0x44/0x6a [kmem]
>>> device_release_driver_internal+0xe4/0x1c0
>>> unbind_store+0xef/0x120
>>> kernfs_fop_write+0xcf/0x1c0
>>> vfs_write+0xdb/0x1d0
>>> ksys_write+0x65/0xe0
>>> do_syscall_64+0x5c/0xa0
>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>
>>> other info that might help us debug this:
>>>
>>> Chain exists of:
>>> kn->count#241 --> cpu_hotplug_lock.rw_sem --> mem_hotplug_lock.rw_sem
>>>
>>> Possible unsafe locking scenario:
>>>
>>> CPU0 CPU1
>>> ---- ----
>>> lock(mem_hotplug_lock.rw_sem);
>>> lock(cpu_hotplug_lock.rw_sem);
>>> lock(mem_hotplug_lock.rw_sem);
>>> lock(kn->count#241);
>>>
>>> *** DEADLOCK ***
>>>
>>> No fixes tag as this seems to have been a long standing issue that
>>> likely predated the addition of kernfs lockdep annotations.
>>>
>>> Cc: <stable@vger.kernel.org>
>>> Cc: Vishal Verma <vishal.l.verma@intel.com>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>>> ---
>>> mm/memory_hotplug.c | 12 +++++++++---
>>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 55ac23ef11c1..a4e7dadded08 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1763,8 +1763,6 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>>>
>>> BUG_ON(check_hotplug_memory_range(start, size));
>>>
>>> - mem_hotplug_begin();
>>> -
>>> /*
>>> * All memory blocks must be offlined before removing memory. Check
>>> * whether all memory blocks in question are offline and return error
>>> @@ -1777,9 +1775,17 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>>> /* remove memmap entry */
>>> firmware_map_remove(start, start + size, "System RAM");
>>>
>>> - /* remove memory block devices before removing memory */
>>> + /*
>>> + * Remove memory block devices before removing memory, and do
>>> + * not hold the mem_hotplug_lock() over kobject removal
>>> + * operations. lock_device_hotplug() keeps the
>>> + * check_memblock_offlined_cb result valid until the entire
>>> + * removal process is complete.
>>> + */
>>
>> Maybe shorten that to
>>
>> /*
>> * Remove memory block devices before removing memory. Protected
>> * by the device_hotplug_lock only.
>> */
>
> Why make someone dig for the reasons this lock is sufficient?
I think 5 LOC of comment are too much for something that is documented
e.g., in Documentation/core-api/memory-hotplug.rst ("Locking
Internals"). But whatever you prefer.
>
>>
>> AFAIK, the device hotplug lock is sufficient here. The memory hotplug
>> lock / cpu hotplug lock is only needed when calling into arch code
>> (especially for PPC). We hold both locks when onlining/offlining memory.
>>
>>> remove_memory_block_devices(start, size);
>>>
>>> + mem_hotplug_begin();
>>> +
>>> arch_remove_memory(nid, start, size, NULL);
>>> memblock_free(start, size);
>>> memblock_remove(start, size);
>>>
>>
>> I'd suggest to do the same in the adding part right away (if easily
>> possible) to make it clearer.
>
> Let's let this fix percolate upstream for a bit to make sure there was
> no protection the mem_hotplug_begin() was inadvertently providing.
Yeah, why not.
>
>> I properly documented the semantics of
>> add_memory_block_devices()/remove_memory_block_devices() already (that
>> they need the device hotplug lock).
>
> I see that, but I prefer lockdep_assert_held() in the code rather than
> comments. I'll send a patch to fix that up.
That won't work as early boot code from ACPI won't hold it while it adds
memory. And we decided (especially Michal :) ) to keep it like that.
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-10 16:54 ` David Hildenbrand
@ 2020-01-10 16:57 ` David Hildenbrand
2020-01-10 17:24 ` Dan Williams
1 sibling, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2020-01-10 16:57 UTC (permalink / raw)
To: Dan Williams
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
On 10.01.20 17:54, David Hildenbrand wrote:
> On 10.01.20 17:42, Dan Williams wrote:
>> On Fri, Jan 10, 2020 at 1:10 AM David Hildenbrand <david@redhat.com> wrote:
>>>
>>> On 10.01.20 05:30, Dan Williams wrote:
>>>> The daxctl unit test for the dax_kmem driver currently triggers the
>>>> lockdep splat below. It results from the fact that
>>>> remove_memory_block_devices() is invoked under the mem_hotplug_lock()
>>>> causing lockdep entanglements with cpu_hotplug_lock().
>>>>
>>>> The mem_hotplug_lock() is not needed to synchronize the memory block
>>>> device sysfs interface vs the page online state, that is already handled
>>>> by lock_device_hotplug(). Specifically lock_device_hotplug()
>>>> is sufficient to allow try_remove_memory() to check the offline
>>>> state of the memblocks and be assured that subsequent online attempts
>>>> will be blocked. The device_online() path checks mem->section_count
>>>> before allowing any state manipulations and mem->section_count is
>>>> cleared in remove_memory_block_devices().
>>>>
>>>> The add_memory() path does create memblock devices under the lock, but
>>>> there is no lockdep report on that path, so it is left alone for now.
>>>>
>>>> This change is only possible thanks to the recent change that refactored
>>>> memory block device removal out of arch_remove_memory() (commit
>>>> 4c4b7f9ba948 mm/memory_hotplug: remove memory block devices before
>>>> arch_remove_memory()).
>>>>
>>>> ======================================================
>>>> WARNING: possible circular locking dependency detected
>>>> 5.5.0-rc3+ #230 Tainted: G OE
>>>> ------------------------------------------------------
>>>> lt-daxctl/6459 is trying to acquire lock:
>>>> ffff99c7f0003510 (kn->count#241){++++}, at: kernfs_remove_by_name_ns+0x41/0x80
>>>>
>>>> but task is already holding lock:
>>>> ffffffffa76a5450 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x20/0xe0
>>>>
>>>> which lock already depends on the new lock.
>>>>
>>>>
>>>> the existing dependency chain (in reverse order) is:
>>>>
>>>> -> #2 (mem_hotplug_lock.rw_sem){++++}:
>>>> __lock_acquire+0x39c/0x790
>>>> lock_acquire+0xa2/0x1b0
>>>> get_online_mems+0x3e/0xb0
>>>> kmem_cache_create_usercopy+0x2e/0x260
>>>> kmem_cache_create+0x12/0x20
>>>> ptlock_cache_init+0x20/0x28
>>>> start_kernel+0x243/0x547
>>>> secondary_startup_64+0xb6/0xc0
>>>>
>>>> -> #1 (cpu_hotplug_lock.rw_sem){++++}:
>>>> __lock_acquire+0x39c/0x790
>>>> lock_acquire+0xa2/0x1b0
>>>> cpus_read_lock+0x3e/0xb0
>>>> online_pages+0x37/0x300
>>>> memory_subsys_online+0x17d/0x1c0
>>>> device_online+0x60/0x80
>>>> state_store+0x65/0xd0
>>>> kernfs_fop_write+0xcf/0x1c0
>>>> vfs_write+0xdb/0x1d0
>>>> ksys_write+0x65/0xe0
>>>> do_syscall_64+0x5c/0xa0
>>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>
>>>> -> #0 (kn->count#241){++++}:
>>>> check_prev_add+0x98/0xa40
>>>> validate_chain+0x576/0x860
>>>> __lock_acquire+0x39c/0x790
>>>> lock_acquire+0xa2/0x1b0
>>>> __kernfs_remove+0x25f/0x2e0
>>>> kernfs_remove_by_name_ns+0x41/0x80
>>>> remove_files.isra.0+0x30/0x70
>>>> sysfs_remove_group+0x3d/0x80
>>>> sysfs_remove_groups+0x29/0x40
>>>> device_remove_attrs+0x39/0x70
>>>> device_del+0x16a/0x3f0
>>>> device_unregister+0x16/0x60
>>>> remove_memory_block_devices+0x82/0xb0
>>>> try_remove_memory+0xb5/0x130
>>>> remove_memory+0x26/0x40
>>>> dev_dax_kmem_remove+0x44/0x6a [kmem]
>>>> device_release_driver_internal+0xe4/0x1c0
>>>> unbind_store+0xef/0x120
>>>> kernfs_fop_write+0xcf/0x1c0
>>>> vfs_write+0xdb/0x1d0
>>>> ksys_write+0x65/0xe0
>>>> do_syscall_64+0x5c/0xa0
>>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>
>>>> other info that might help us debug this:
>>>>
>>>> Chain exists of:
>>>> kn->count#241 --> cpu_hotplug_lock.rw_sem --> mem_hotplug_lock.rw_sem
>>>>
>>>> Possible unsafe locking scenario:
>>>>
>>>> CPU0 CPU1
>>>> ---- ----
>>>> lock(mem_hotplug_lock.rw_sem);
>>>> lock(cpu_hotplug_lock.rw_sem);
>>>> lock(mem_hotplug_lock.rw_sem);
>>>> lock(kn->count#241);
>>>>
>>>> *** DEADLOCK ***
>>>>
>>>> No fixes tag as this seems to have been a long standing issue that
>>>> likely predated the addition of kernfs lockdep annotations.
>>>>
>>>> Cc: <stable@vger.kernel.org>
>>>> Cc: Vishal Verma <vishal.l.verma@intel.com>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>>>> ---
>>>> mm/memory_hotplug.c | 12 +++++++++---
>>>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>> index 55ac23ef11c1..a4e7dadded08 100644
>>>> --- a/mm/memory_hotplug.c
>>>> +++ b/mm/memory_hotplug.c
>>>> @@ -1763,8 +1763,6 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>>>>
>>>> BUG_ON(check_hotplug_memory_range(start, size));
>>>>
>>>> - mem_hotplug_begin();
>>>> -
>>>> /*
>>>> * All memory blocks must be offlined before removing memory. Check
>>>> * whether all memory blocks in question are offline and return error
>>>> @@ -1777,9 +1775,17 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>>>> /* remove memmap entry */
>>>> firmware_map_remove(start, start + size, "System RAM");
>>>>
>>>> - /* remove memory block devices before removing memory */
>>>> + /*
>>>> + * Remove memory block devices before removing memory, and do
>>>> + * not hold the mem_hotplug_lock() over kobject removal
>>>> + * operations. lock_device_hotplug() keeps the
>>>> + * check_memblock_offlined_cb result valid until the entire
>>>> + * removal process is complete.
>>>> + */
>>>
>>> Maybe shorten that to
>>>
>>> /*
>>> * Remove memory block devices before removing memory. Protected
>>> * by the device_hotplug_lock only.
>>> */
>>
>> Why make someone dig for the reasons this lock is sufficient?
>
> I think 5 LOC of comment are too much for something that is documented
> e.g., in Documentation/core-api/memory-hotplug.rst ("Locking
> Internals"). But whatever you prefer.
>
>>
>>>
>>> AFAIK, the device hotplug lock is sufficient here. The memory hotplug
>>> lock / cpu hotplug lock is only needed when calling into arch code
>>> (especially for PPC). We hold both locks when onlining/offlining memory.
>>>
>>>> remove_memory_block_devices(start, size);
>>>>
>>>> + mem_hotplug_begin();
>>>> +
>>>> arch_remove_memory(nid, start, size, NULL);
>>>> memblock_free(start, size);
>>>> memblock_remove(start, size);
>>>>
>>>
>>> I'd suggest to do the same in the adding part right away (if easily
>>> possible) to make it clearer.
>>
>> Let's let this fix percolate upstream for a bit to make sure there was
>> no protection the mem_hotplug_begin() was inadvertently providing.
>
> Yeah, why not.
>
>>
>>> I properly documented the semantics of
>>> add_memory_block_devices()/remove_memory_block_devices() already (that
>>> they need the device hotplug lock).
>>
>> I see that, but I prefer lockdep_assert_held() in the code rather than
>> comments. I'll send a patch to fix that up.
>
> That won't work as early boot code from ACPI won't hold it while it adds
> memory. And we decided (especially Michal :) ) to keep it like that.
>
Was only thinking about the adding part, it could work for the removal
part, though :)
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-10 16:54 ` David Hildenbrand
@ 2020-01-10 17:24 ` Dan Williams
2020-01-10 17:24 ` Dan Williams
1 sibling, 0 replies; 22+ messages in thread
From: Dan Williams @ 2020-01-10 17:24 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
On Fri, Jan 10, 2020 at 8:54 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 10.01.20 17:42, Dan Williams wrote:
> > On Fri, Jan 10, 2020 at 1:10 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 10.01.20 05:30, Dan Williams wrote:
> >>> The daxctl unit test for the dax_kmem driver currently triggers the
> >>> lockdep splat below. It results from the fact that
> >>> remove_memory_block_devices() is invoked under the mem_hotplug_lock()
> >>> causing lockdep entanglements with cpu_hotplug_lock().
> >>>
> >>> The mem_hotplug_lock() is not needed to synchronize the memory block
> >>> device sysfs interface vs the page online state, that is already handled
> >>> by lock_device_hotplug(). Specifically lock_device_hotplug()
> >>> is sufficient to allow try_remove_memory() to check the offline
> >>> state of the memblocks and be assured that subsequent online attempts
> >>> will be blocked. The device_online() path checks mem->section_count
> >>> before allowing any state manipulations and mem->section_count is
> >>> cleared in remove_memory_block_devices().
> >>>
> >>> The add_memory() path does create memblock devices under the lock, but
> >>> there is no lockdep report on that path, so it is left alone for now.
> >>>
> >>> This change is only possible thanks to the recent change that refactored
> >>> memory block device removal out of arch_remove_memory() (commit
> >>> 4c4b7f9ba948 mm/memory_hotplug: remove memory block devices before
> >>> arch_remove_memory()).
> >>>
> >>> ======================================================
> >>> WARNING: possible circular locking dependency detected
> >>> 5.5.0-rc3+ #230 Tainted: G OE
> >>> ------------------------------------------------------
> >>> lt-daxctl/6459 is trying to acquire lock:
> >>> ffff99c7f0003510 (kn->count#241){++++}, at: kernfs_remove_by_name_ns+0x41/0x80
> >>>
> >>> but task is already holding lock:
> >>> ffffffffa76a5450 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x20/0xe0
> >>>
> >>> which lock already depends on the new lock.
> >>>
> >>>
> >>> the existing dependency chain (in reverse order) is:
> >>>
> >>> -> #2 (mem_hotplug_lock.rw_sem){++++}:
> >>> __lock_acquire+0x39c/0x790
> >>> lock_acquire+0xa2/0x1b0
> >>> get_online_mems+0x3e/0xb0
> >>> kmem_cache_create_usercopy+0x2e/0x260
> >>> kmem_cache_create+0x12/0x20
> >>> ptlock_cache_init+0x20/0x28
> >>> start_kernel+0x243/0x547
> >>> secondary_startup_64+0xb6/0xc0
> >>>
> >>> -> #1 (cpu_hotplug_lock.rw_sem){++++}:
> >>> __lock_acquire+0x39c/0x790
> >>> lock_acquire+0xa2/0x1b0
> >>> cpus_read_lock+0x3e/0xb0
> >>> online_pages+0x37/0x300
> >>> memory_subsys_online+0x17d/0x1c0
> >>> device_online+0x60/0x80
> >>> state_store+0x65/0xd0
> >>> kernfs_fop_write+0xcf/0x1c0
> >>> vfs_write+0xdb/0x1d0
> >>> ksys_write+0x65/0xe0
> >>> do_syscall_64+0x5c/0xa0
> >>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>>
> >>> -> #0 (kn->count#241){++++}:
> >>> check_prev_add+0x98/0xa40
> >>> validate_chain+0x576/0x860
> >>> __lock_acquire+0x39c/0x790
> >>> lock_acquire+0xa2/0x1b0
> >>> __kernfs_remove+0x25f/0x2e0
> >>> kernfs_remove_by_name_ns+0x41/0x80
> >>> remove_files.isra.0+0x30/0x70
> >>> sysfs_remove_group+0x3d/0x80
> >>> sysfs_remove_groups+0x29/0x40
> >>> device_remove_attrs+0x39/0x70
> >>> device_del+0x16a/0x3f0
> >>> device_unregister+0x16/0x60
> >>> remove_memory_block_devices+0x82/0xb0
> >>> try_remove_memory+0xb5/0x130
> >>> remove_memory+0x26/0x40
> >>> dev_dax_kmem_remove+0x44/0x6a [kmem]
> >>> device_release_driver_internal+0xe4/0x1c0
> >>> unbind_store+0xef/0x120
> >>> kernfs_fop_write+0xcf/0x1c0
> >>> vfs_write+0xdb/0x1d0
> >>> ksys_write+0x65/0xe0
> >>> do_syscall_64+0x5c/0xa0
> >>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>>
> >>> other info that might help us debug this:
> >>>
> >>> Chain exists of:
> >>> kn->count#241 --> cpu_hotplug_lock.rw_sem --> mem_hotplug_lock.rw_sem
> >>>
> >>> Possible unsafe locking scenario:
> >>>
> >>> CPU0 CPU1
> >>> ---- ----
> >>> lock(mem_hotplug_lock.rw_sem);
> >>> lock(cpu_hotplug_lock.rw_sem);
> >>> lock(mem_hotplug_lock.rw_sem);
> >>> lock(kn->count#241);
> >>>
> >>> *** DEADLOCK ***
> >>>
> >>> No fixes tag as this seems to have been a long standing issue that
> >>> likely predated the addition of kernfs lockdep annotations.
> >>>
> >>> Cc: <stable@vger.kernel.org>
> >>> Cc: Vishal Verma <vishal.l.verma@intel.com>
> >>> Cc: David Hildenbrand <david@redhat.com>
> >>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> >>> Cc: Michal Hocko <mhocko@suse.com>
> >>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> >>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >>> ---
> >>> mm/memory_hotplug.c | 12 +++++++++---
> >>> 1 file changed, 9 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> >>> index 55ac23ef11c1..a4e7dadded08 100644
> >>> --- a/mm/memory_hotplug.c
> >>> +++ b/mm/memory_hotplug.c
> >>> @@ -1763,8 +1763,6 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
> >>>
> >>> BUG_ON(check_hotplug_memory_range(start, size));
> >>>
> >>> - mem_hotplug_begin();
> >>> -
> >>> /*
> >>> * All memory blocks must be offlined before removing memory. Check
> >>> * whether all memory blocks in question are offline and return error
> >>> @@ -1777,9 +1775,17 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
> >>> /* remove memmap entry */
> >>> firmware_map_remove(start, start + size, "System RAM");
> >>>
> >>> - /* remove memory block devices before removing memory */
> >>> + /*
> >>> + * Remove memory block devices before removing memory, and do
> >>> + * not hold the mem_hotplug_lock() over kobject removal
> >>> + * operations. lock_device_hotplug() keeps the
> >>> + * check_memblock_offlined_cb result valid until the entire
> >>> + * removal process is complete.
> >>> + */
> >>
> >> Maybe shorten that to
> >>
> >> /*
> >> * Remove memory block devices before removing memory. Protected
> >> * by the device_hotplug_lock only.
> >> */
> >
> > Why make someone dig for the reasons this lock is sufficient?
>
> I think 5 LOC of comment are too much for something that is documented
> e.g., in Documentation/core-api/memory-hotplug.rst ("Locking
> Internals"). But whatever you prefer.
Sure, lets beef up that doc to clarify this case and refer to it.
>
> >
> >>
> >> AFAIK, the device hotplug lock is sufficient here. The memory hotplug
> >> lock / cpu hotplug lock is only needed when calling into arch code
> >> (especially for PPC). We hold both locks when onlining/offlining memory.
> >>
> >>> remove_memory_block_devices(start, size);
> >>>
> >>> + mem_hotplug_begin();
> >>> +
> >>> arch_remove_memory(nid, start, size, NULL);
> >>> memblock_free(start, size);
> >>> memblock_remove(start, size);
> >>>
> >>
> >> I'd suggest to do the same in the adding part right away (if easily
> >> possible) to make it clearer.
> >
> > Let's let this fix percolate upstream for a bit to make sure there was
> > no protection the mem_hotplug_begin() was inadvertently providing.
>
> Yeah, why not.
>
> >
> >> I properly documented the semantics of
> >> add_memory_block_devices()/remove_memory_block_devices() already (that
> >> they need the device hotplug lock).
> >
> > I see that, but I prefer lockdep_assert_held() in the code rather than
> > comments. I'll send a patch to fix that up.
>
> That won't work as early boot code from ACPI won't hold it while it adds
> memory. And we decided (especially Michal :) ) to keep it like that.
So then the comment is actively misleading for that case. I would
expect an explicit _unlocked path for that case with a comment about
why it's special. Is there already a comment to that effect somewhere?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
@ 2020-01-10 17:24 ` Dan Williams
0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2020-01-10 17:24 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
On Fri, Jan 10, 2020 at 8:54 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 10.01.20 17:42, Dan Williams wrote:
> > On Fri, Jan 10, 2020 at 1:10 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 10.01.20 05:30, Dan Williams wrote:
> >>> The daxctl unit test for the dax_kmem driver currently triggers the
> >>> lockdep splat below. It results from the fact that
> >>> remove_memory_block_devices() is invoked under the mem_hotplug_lock()
> >>> causing lockdep entanglements with cpu_hotplug_lock().
> >>>
> >>> The mem_hotplug_lock() is not needed to synchronize the memory block
> >>> device sysfs interface vs the page online state, that is already handled
> >>> by lock_device_hotplug(). Specifically lock_device_hotplug()
> >>> is sufficient to allow try_remove_memory() to check the offline
> >>> state of the memblocks and be assured that subsequent online attempts
> >>> will be blocked. The device_online() path checks mem->section_count
> >>> before allowing any state manipulations and mem->section_count is
> >>> cleared in remove_memory_block_devices().
> >>>
> >>> The add_memory() path does create memblock devices under the lock, but
> >>> there is no lockdep report on that path, so it is left alone for now.
> >>>
> >>> This change is only possible thanks to the recent change that refactored
> >>> memory block device removal out of arch_remove_memory() (commit
> >>> 4c4b7f9ba948 mm/memory_hotplug: remove memory block devices before
> >>> arch_remove_memory()).
> >>>
> >>> ======================================================
> >>> WARNING: possible circular locking dependency detected
> >>> 5.5.0-rc3+ #230 Tainted: G OE
> >>> ------------------------------------------------------
> >>> lt-daxctl/6459 is trying to acquire lock:
> >>> ffff99c7f0003510 (kn->count#241){++++}, at: kernfs_remove_by_name_ns+0x41/0x80
> >>>
> >>> but task is already holding lock:
> >>> ffffffffa76a5450 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x20/0xe0
> >>>
> >>> which lock already depends on the new lock.
> >>>
> >>>
> >>> the existing dependency chain (in reverse order) is:
> >>>
> >>> -> #2 (mem_hotplug_lock.rw_sem){++++}:
> >>> __lock_acquire+0x39c/0x790
> >>> lock_acquire+0xa2/0x1b0
> >>> get_online_mems+0x3e/0xb0
> >>> kmem_cache_create_usercopy+0x2e/0x260
> >>> kmem_cache_create+0x12/0x20
> >>> ptlock_cache_init+0x20/0x28
> >>> start_kernel+0x243/0x547
> >>> secondary_startup_64+0xb6/0xc0
> >>>
> >>> -> #1 (cpu_hotplug_lock.rw_sem){++++}:
> >>> __lock_acquire+0x39c/0x790
> >>> lock_acquire+0xa2/0x1b0
> >>> cpus_read_lock+0x3e/0xb0
> >>> online_pages+0x37/0x300
> >>> memory_subsys_online+0x17d/0x1c0
> >>> device_online+0x60/0x80
> >>> state_store+0x65/0xd0
> >>> kernfs_fop_write+0xcf/0x1c0
> >>> vfs_write+0xdb/0x1d0
> >>> ksys_write+0x65/0xe0
> >>> do_syscall_64+0x5c/0xa0
> >>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>>
> >>> -> #0 (kn->count#241){++++}:
> >>> check_prev_add+0x98/0xa40
> >>> validate_chain+0x576/0x860
> >>> __lock_acquire+0x39c/0x790
> >>> lock_acquire+0xa2/0x1b0
> >>> __kernfs_remove+0x25f/0x2e0
> >>> kernfs_remove_by_name_ns+0x41/0x80
> >>> remove_files.isra.0+0x30/0x70
> >>> sysfs_remove_group+0x3d/0x80
> >>> sysfs_remove_groups+0x29/0x40
> >>> device_remove_attrs+0x39/0x70
> >>> device_del+0x16a/0x3f0
> >>> device_unregister+0x16/0x60
> >>> remove_memory_block_devices+0x82/0xb0
> >>> try_remove_memory+0xb5/0x130
> >>> remove_memory+0x26/0x40
> >>> dev_dax_kmem_remove+0x44/0x6a [kmem]
> >>> device_release_driver_internal+0xe4/0x1c0
> >>> unbind_store+0xef/0x120
> >>> kernfs_fop_write+0xcf/0x1c0
> >>> vfs_write+0xdb/0x1d0
> >>> ksys_write+0x65/0xe0
> >>> do_syscall_64+0x5c/0xa0
> >>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>>
> >>> other info that might help us debug this:
> >>>
> >>> Chain exists of:
> >>> kn->count#241 --> cpu_hotplug_lock.rw_sem --> mem_hotplug_lock.rw_sem
> >>>
> >>> Possible unsafe locking scenario:
> >>>
> >>> CPU0 CPU1
> >>> ---- ----
> >>> lock(mem_hotplug_lock.rw_sem);
> >>> lock(cpu_hotplug_lock.rw_sem);
> >>> lock(mem_hotplug_lock.rw_sem);
> >>> lock(kn->count#241);
> >>>
> >>> *** DEADLOCK ***
> >>>
> >>> No fixes tag as this seems to have been a long standing issue that
> >>> likely predated the addition of kernfs lockdep annotations.
> >>>
> >>> Cc: <stable@vger.kernel.org>
> >>> Cc: Vishal Verma <vishal.l.verma@intel.com>
> >>> Cc: David Hildenbrand <david@redhat.com>
> >>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> >>> Cc: Michal Hocko <mhocko@suse.com>
> >>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> >>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >>> ---
> >>> mm/memory_hotplug.c | 12 +++++++++---
> >>> 1 file changed, 9 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> >>> index 55ac23ef11c1..a4e7dadded08 100644
> >>> --- a/mm/memory_hotplug.c
> >>> +++ b/mm/memory_hotplug.c
> >>> @@ -1763,8 +1763,6 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
> >>>
> >>> BUG_ON(check_hotplug_memory_range(start, size));
> >>>
> >>> - mem_hotplug_begin();
> >>> -
> >>> /*
> >>> * All memory blocks must be offlined before removing memory. Check
> >>> * whether all memory blocks in question are offline and return error
> >>> @@ -1777,9 +1775,17 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
> >>> /* remove memmap entry */
> >>> firmware_map_remove(start, start + size, "System RAM");
> >>>
> >>> - /* remove memory block devices before removing memory */
> >>> + /*
> >>> + * Remove memory block devices before removing memory, and do
> >>> + * not hold the mem_hotplug_lock() over kobject removal
> >>> + * operations. lock_device_hotplug() keeps the
> >>> + * check_memblock_offlined_cb result valid until the entire
> >>> + * removal process is complete.
> >>> + */
> >>
> >> Maybe shorten that to
> >>
> >> /*
> >> * Remove memory block devices before removing memory. Protected
> >> * by the device_hotplug_lock only.
> >> */
> >
> > Why make someone dig for the reasons this lock is sufficient?
>
> I think 5 LOC of comment are too much for something that is documented
> e.g., in Documentation/core-api/memory-hotplug.rst ("Locking
> Internals"). But whatever you prefer.
Sure, lets beef up that doc to clarify this case and refer to it.
>
> >
> >>
> >> AFAIK, the device hotplug lock is sufficient here. The memory hotplug
> >> lock / cpu hotplug lock is only needed when calling into arch code
> >> (especially for PPC). We hold both locks when onlining/offlining memory.
> >>
> >>> remove_memory_block_devices(start, size);
> >>>
> >>> + mem_hotplug_begin();
> >>> +
> >>> arch_remove_memory(nid, start, size, NULL);
> >>> memblock_free(start, size);
> >>> memblock_remove(start, size);
> >>>
> >>
> >> I'd suggest to do the same in the adding part right away (if easily
> >> possible) to make it clearer.
> >
> > Let's let this fix percolate upstream for a bit to make sure there was
> > no protection the mem_hotplug_begin() was inadvertently providing.
>
> Yeah, why not.
>
> >
> >> I properly documented the semantics of
> >> add_memory_block_devices()/remove_memory_block_devices() already (that
> >> they need the device hotplug lock).
> >
> > I see that, but I prefer lockdep_assert_held() in the code rather than
> > comments. I'll send a patch to fix that up.
>
> That won't work as early boot code from ACPI won't hold it while it adds
> memory. And we decided (especially Michal :) ) to keep it like that.
So then the comment is actively misleading for that case. I would
expect an explicit _unlocked path for that case with a comment about
why it's special. Is there already a comment to that effect somewhere?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-10 17:24 ` Dan Williams
(?)
@ 2020-01-10 17:29 ` David Hildenbrand
2020-01-10 17:33 ` Dan Williams
-1 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2020-01-10 17:29 UTC (permalink / raw)
To: Dan Williams
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
>>> Why make someone dig for the reasons this lock is sufficient?
>>
>> I think 5 LOC of comment are too much for something that is documented
>> e.g., in Documentation/core-api/memory-hotplug.rst ("Locking
>> Internals"). But whatever you prefer.
>
> Sure, lets beef up that doc to clarify this case and refer to it.
Referring is a good idea. We should change the "is advised" for the device_online()
to a "is required" or similar. Back then I wasn't sure how it all worked in
detail...
>>>> I properly documented the semantics of
>>>> add_memory_block_devices()/remove_memory_block_devices() already (that
>>>> they need the device hotplug lock).
>>>
>>> I see that, but I prefer lockdep_assert_held() in the code rather than
>>> comments. I'll send a patch to fix that up.
>>
>> That won't work as early boot code from ACPI won't hold it while it adds
>> memory. And we decided (especially Michal :) ) to keep it like that.
>
> So then the comment is actively misleading for that case. I would
> expect an explicit _unlocked path for that case with a comment about
> why it's special. Is there already a comment to that effect somewhere?
>
__add_memory() - the locked variant - is called from the same ACPI location
either locked or unlocked. I added a comment back then after a longe
discussion with Michal:
drivers/acpi/scan.c:
/*
* Although we call __add_memory() that is documented to require the
* device_hotplug_lock, it is not necessary here because this is an
* early code when userspace or any other code path cannot trigger
* hotplug/hotunplug operations.
*/
It really is a special case, though.
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-10 17:29 ` David Hildenbrand
@ 2020-01-10 17:33 ` Dan Williams
0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2020-01-10 17:33 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
On Fri, Jan 10, 2020 at 9:29 AM David Hildenbrand <david@redhat.com> wrote:
[..]
> > So then the comment is actively misleading for that case. I would
> > expect an explicit _unlocked path for that case with a comment about
> > why it's special. Is there already a comment to that effect somewhere?
> >
>
> __add_memory() - the locked variant - is called from the same ACPI location
> either locked or unlocked. I added a comment back then after a longe
> discussion with Michal:
>
> drivers/acpi/scan.c:
> /*
> * Although we call __add_memory() that is documented to require the
> * device_hotplug_lock, it is not necessary here because this is an
> * early code when userspace or any other code path cannot trigger
> * hotplug/hotunplug operations.
> */
>
>
> It really is a special case, though.
That's a large comment block when we could have just taken the lock.
There's probably many other code paths in the kernel where some locks
are not necessary before userspace is up, but the code takes the lock
anyway to minimize the code maintenance burden. Is there really a
compelling reason to be clever here?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
@ 2020-01-10 17:33 ` Dan Williams
0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2020-01-10 17:33 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
On Fri, Jan 10, 2020 at 9:29 AM David Hildenbrand <david@redhat.com> wrote:
[..]
> > So then the comment is actively misleading for that case. I would
> > expect an explicit _unlocked path for that case with a comment about
> > why it's special. Is there already a comment to that effect somewhere?
> >
>
> __add_memory() - the locked variant - is called from the same ACPI location
> either locked or unlocked. I added a comment back then after a longe
> discussion with Michal:
>
> drivers/acpi/scan.c:
> /*
> * Although we call __add_memory() that is documented to require the
> * device_hotplug_lock, it is not necessary here because this is an
> * early code when userspace or any other code path cannot trigger
> * hotplug/hotunplug operations.
> */
>
>
> It really is a special case, though.
That's a large comment block when we could have just taken the lock.
There's probably many other code paths in the kernel where some locks
are not necessary before userspace is up, but the code takes the lock
anyway to minimize the code maintenance burden. Is there really a
compelling reason to be clever here?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-10 17:33 ` Dan Williams
(?)
@ 2020-01-10 17:36 ` David Hildenbrand
2020-01-10 17:39 ` Dan Williams
-1 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2020-01-10 17:36 UTC (permalink / raw)
To: Dan Williams
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
On 10.01.20 18:33, Dan Williams wrote:
> On Fri, Jan 10, 2020 at 9:29 AM David Hildenbrand <david@redhat.com> wrote:
> [..]
>>> So then the comment is actively misleading for that case. I would
>>> expect an explicit _unlocked path for that case with a comment about
>>> why it's special. Is there already a comment to that effect somewhere?
>>>
>>
>> __add_memory() - the locked variant - is called from the same ACPI location
>> either locked or unlocked. I added a comment back then after a longe
>> discussion with Michal:
>>
>> drivers/acpi/scan.c:
>> /*
>> * Although we call __add_memory() that is documented to require the
>> * device_hotplug_lock, it is not necessary here because this is an
>> * early code when userspace or any other code path cannot trigger
>> * hotplug/hotunplug operations.
>> */
>>
>>
>> It really is a special case, though.
>
> That's a large comment block when we could have just taken the lock.
> There's probably many other code paths in the kernel where some locks
> are not necessary before userspace is up, but the code takes the lock
> anyway to minimize the code maintenance burden. Is there really a
> compelling reason to be clever here?
It was a lengthy discussion back then and I was sharing your opinion. I
even had a patch ready to enforce that we are holding the lock (that's
how I identified that specific case in the first place).
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-10 17:36 ` David Hildenbrand
@ 2020-01-10 17:39 ` Dan Williams
0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2020-01-10 17:39 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
On Fri, Jan 10, 2020 at 9:36 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 10.01.20 18:33, Dan Williams wrote:
> > On Fri, Jan 10, 2020 at 9:29 AM David Hildenbrand <david@redhat.com> wrote:
> > [..]
> >>> So then the comment is actively misleading for that case. I would
> >>> expect an explicit _unlocked path for that case with a comment about
> >>> why it's special. Is there already a comment to that effect somewhere?
> >>>
> >>
> >> __add_memory() - the locked variant - is called from the same ACPI location
> >> either locked or unlocked. I added a comment back then after a longe
> >> discussion with Michal:
> >>
> >> drivers/acpi/scan.c:
> >> /*
> >> * Although we call __add_memory() that is documented to require the
> >> * device_hotplug_lock, it is not necessary here because this is an
> >> * early code when userspace or any other code path cannot trigger
> >> * hotplug/hotunplug operations.
> >> */
> >>
> >>
> >> It really is a special case, though.
> >
> > That's a large comment block when we could have just taken the lock.
> > There's probably many other code paths in the kernel where some locks
> > are not necessary before userspace is up, but the code takes the lock
> > anyway to minimize the code maintenance burden. Is there really a
> > compelling reason to be clever here?
>
> It was a lengthy discussion back then and I was sharing your opinion. I
> even had a patch ready to enforce that we are holding the lock (that's
> how I identified that specific case in the first place).
Ok, apologies I missed that opportunity to back you up. Michal, is
this still worth it?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
@ 2020-01-10 17:39 ` Dan Williams
0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2020-01-10 17:39 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
On Fri, Jan 10, 2020 at 9:36 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 10.01.20 18:33, Dan Williams wrote:
> > On Fri, Jan 10, 2020 at 9:29 AM David Hildenbrand <david@redhat.com> wrote:
> > [..]
> >>> So then the comment is actively misleading for that case. I would
> >>> expect an explicit _unlocked path for that case with a comment about
> >>> why it's special. Is there already a comment to that effect somewhere?
> >>>
> >>
> >> __add_memory() - the locked variant - is called from the same ACPI location
> >> either locked or unlocked. I added a comment back then after a longe
> >> discussion with Michal:
> >>
> >> drivers/acpi/scan.c:
> >> /*
> >> * Although we call __add_memory() that is documented to require the
> >> * device_hotplug_lock, it is not necessary here because this is an
> >> * early code when userspace or any other code path cannot trigger
> >> * hotplug/hotunplug operations.
> >> */
> >>
> >>
> >> It really is a special case, though.
> >
> > That's a large comment block when we could have just taken the lock.
> > There's probably many other code paths in the kernel where some locks
> > are not necessary before userspace is up, but the code takes the lock
> > anyway to minimize the code maintenance burden. Is there really a
> > compelling reason to be clever here?
>
> It was a lengthy discussion back then and I was sharing your opinion. I
> even had a patch ready to enforce that we are holding the lock (that's
> how I identified that specific case in the first place).
Ok, apologies I missed that opportunity to back you up. Michal, is
this still worth it?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-10 17:39 ` Dan Williams
(?)
@ 2020-01-10 17:42 ` David Hildenbrand
2020-01-10 21:27 ` Dan Williams
-1 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2020-01-10 17:42 UTC (permalink / raw)
To: Dan Williams
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
On 10.01.20 18:39, Dan Williams wrote:
> On Fri, Jan 10, 2020 at 9:36 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 10.01.20 18:33, Dan Williams wrote:
>>> On Fri, Jan 10, 2020 at 9:29 AM David Hildenbrand <david@redhat.com> wrote:
>>> [..]
>>>>> So then the comment is actively misleading for that case. I would
>>>>> expect an explicit _unlocked path for that case with a comment about
>>>>> why it's special. Is there already a comment to that effect somewhere?
>>>>>
>>>>
>>>> __add_memory() - the locked variant - is called from the same ACPI location
>>>> either locked or unlocked. I added a comment back then after a longe
>>>> discussion with Michal:
>>>>
>>>> drivers/acpi/scan.c:
>>>> /*
>>>> * Although we call __add_memory() that is documented to require the
>>>> * device_hotplug_lock, it is not necessary here because this is an
>>>> * early code when userspace or any other code path cannot trigger
>>>> * hotplug/hotunplug operations.
>>>> */
>>>>
>>>>
>>>> It really is a special case, though.
>>>
>>> That's a large comment block when we could have just taken the lock.
>>> There's probably many other code paths in the kernel where some locks
>>> are not necessary before userspace is up, but the code takes the lock
>>> anyway to minimize the code maintenance burden. Is there really a
>>> compelling reason to be clever here?
>>
>> It was a lengthy discussion back then and I was sharing your opinion. I
>> even had a patch ready to enforce that we are holding the lock (that's
>> how I identified that specific case in the first place).
>
> Ok, apologies I missed that opportunity to back you up. Michal, is
> this still worth it?
>
For your reference (roughly 5 months ago, so not that old)
https://lkml.kernel.org/r/20190724143017.12841-1-david@redhat.com
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-10 17:42 ` David Hildenbrand
@ 2020-01-10 21:27 ` Dan Williams
0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2020-01-10 21:27 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
On Fri, Jan 10, 2020 at 9:42 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 10.01.20 18:39, Dan Williams wrote:
> > On Fri, Jan 10, 2020 at 9:36 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 10.01.20 18:33, Dan Williams wrote:
> >>> On Fri, Jan 10, 2020 at 9:29 AM David Hildenbrand <david@redhat.com> wrote:
> >>> [..]
> >>>>> So then the comment is actively misleading for that case. I would
> >>>>> expect an explicit _unlocked path for that case with a comment about
> >>>>> why it's special. Is there already a comment to that effect somewhere?
> >>>>>
> >>>>
> >>>> __add_memory() - the locked variant - is called from the same ACPI location
> >>>> either locked or unlocked. I added a comment back then after a longe
> >>>> discussion with Michal:
> >>>>
> >>>> drivers/acpi/scan.c:
> >>>> /*
> >>>> * Although we call __add_memory() that is documented to require the
> >>>> * device_hotplug_lock, it is not necessary here because this is an
> >>>> * early code when userspace or any other code path cannot trigger
> >>>> * hotplug/hotunplug operations.
> >>>> */
> >>>>
> >>>>
> >>>> It really is a special case, though.
> >>>
> >>> That's a large comment block when we could have just taken the lock.
> >>> There's probably many other code paths in the kernel where some locks
> >>> are not necessary before userspace is up, but the code takes the lock
> >>> anyway to minimize the code maintenance burden. Is there really a
> >>> compelling reason to be clever here?
> >>
> >> It was a lengthy discussion back then and I was sharing your opinion. I
> >> even had a patch ready to enforce that we are holding the lock (that's
> >> how I identified that specific case in the first place).
> >
> > Ok, apologies I missed that opportunity to back you up. Michal, is
> > this still worth it?
> >
>
> For your reference (roughly 5 months ago, so not that old)
>
> https://lkml.kernel.org/r/20190724143017.12841-1-david@redhat.com
Oh, now I see the problem. You need to add that lock so far away from
the __add_memory() to avoid lock inversion problems with the
acpi_scan_lock. The organization I was envisioning would not work
without deeper refactoring.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
@ 2020-01-10 21:27 ` Dan Williams
0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2020-01-10 21:27 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin,
Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
On Fri, Jan 10, 2020 at 9:42 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 10.01.20 18:39, Dan Williams wrote:
> > On Fri, Jan 10, 2020 at 9:36 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 10.01.20 18:33, Dan Williams wrote:
> >>> On Fri, Jan 10, 2020 at 9:29 AM David Hildenbrand <david@redhat.com> wrote:
> >>> [..]
> >>>>> So then the comment is actively misleading for that case. I would
> >>>>> expect an explicit _unlocked path for that case with a comment about
> >>>>> why it's special. Is there already a comment to that effect somewhere?
> >>>>>
> >>>>
> >>>> __add_memory() - the locked variant - is called from the same ACPI location
> >>>> either locked or unlocked. I added a comment back then after a longe
> >>>> discussion with Michal:
> >>>>
> >>>> drivers/acpi/scan.c:
> >>>> /*
> >>>> * Although we call __add_memory() that is documented to require the
> >>>> * device_hotplug_lock, it is not necessary here because this is an
> >>>> * early code when userspace or any other code path cannot trigger
> >>>> * hotplug/hotunplug operations.
> >>>> */
> >>>>
> >>>>
> >>>> It really is a special case, though.
> >>>
> >>> That's a large comment block when we could have just taken the lock.
> >>> There's probably many other code paths in the kernel where some locks
> >>> are not necessary before userspace is up, but the code takes the lock
> >>> anyway to minimize the code maintenance burden. Is there really a
> >>> compelling reason to be clever here?
> >>
> >> It was a lengthy discussion back then and I was sharing your opinion. I
> >> even had a patch ready to enforce that we are holding the lock (that's
> >> how I identified that specific case in the first place).
> >
> > Ok, apologies I missed that opportunity to back you up. Michal, is
> > this still worth it?
> >
>
> For your reference (roughly 5 months ago, so not that old)
>
> https://lkml.kernel.org/r/20190724143017.12841-1-david@redhat.com
Oh, now I see the problem. You need to add that lock so far away from
the __add_memory() to avoid lock inversion problems with the
acpi_scan_lock. The organization I was envisioning would not work
without deeper refactoring.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-10 21:27 ` Dan Williams
(?)
@ 2020-01-24 12:45 ` Michal Hocko
2020-01-24 18:04 ` Dan Williams
-1 siblings, 1 reply; 22+ messages in thread
From: Michal Hocko @ 2020-01-24 12:45 UTC (permalink / raw)
To: Dan Williams
Cc: David Hildenbrand, Andrew Morton, stable, Vishal Verma,
Pavel Tatashin, Dave Hansen, Linux MM, Linux Kernel Mailing List
On Fri 10-01-20 13:27:24, Dan Williams wrote:
> On Fri, Jan 10, 2020 at 9:42 AM David Hildenbrand <david@redhat.com> wrote:
[...]
> > For your reference (roughly 5 months ago, so not that old)
> >
> > https://lkml.kernel.org/r/20190724143017.12841-1-david@redhat.com
>
> Oh, now I see the problem. You need to add that lock so far away from
> the __add_memory() to avoid lock inversion problems with the
> acpi_scan_lock. The organization I was envisioning would not work
> without deeper refactoring.
Sorry to come back to this late. Has this been resolved?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-24 12:45 ` Michal Hocko
@ 2020-01-24 18:04 ` Dan Williams
0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2020-01-24 18:04 UTC (permalink / raw)
To: Michal Hocko
Cc: David Hildenbrand, Andrew Morton, stable, Vishal Verma,
Pavel Tatashin, Dave Hansen, Linux MM, Linux Kernel Mailing List
On Fri, Jan 24, 2020 at 4:56 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 10-01-20 13:27:24, Dan Williams wrote:
> > On Fri, Jan 10, 2020 at 9:42 AM David Hildenbrand <david@redhat.com> wrote:
> [...]
> > > For your reference (roughly 5 months ago, so not that old)
> > >
> > > https://lkml.kernel.org/r/20190724143017.12841-1-david@redhat.com
> >
> > Oh, now I see the problem. You need to add that lock so far away from
> > the __add_memory() to avoid lock inversion problems with the
> > acpi_scan_lock. The organization I was envisioning would not work
> > without deeper refactoring.
>
> Sorry to come back to this late. Has this been resolved?
The mem_hotplug_lock lockdep splat fix in this patch has not landed.
David and I have not quite come to consensus on how to resolve online
racing removal. IIUC David wants that invalidation to be
pages_correctly_probed(), I would prefer it to be directly tied to the
object, struct memory_block, that remove_memory_block_devices() has
modified, mem->section_count = 0.
...or are you referring to the discussion about acpi_scan_lock()? I
came around to agreeing with your position that documenting was better
than adding superfluous locking especially because the
acpi_scan_lock() is take so far away from where the device_hotplug
lock is needed.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
@ 2020-01-24 18:04 ` Dan Williams
0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2020-01-24 18:04 UTC (permalink / raw)
To: Michal Hocko
Cc: David Hildenbrand, Andrew Morton, stable, Vishal Verma,
Pavel Tatashin, Dave Hansen, Linux MM, Linux Kernel Mailing List
On Fri, Jan 24, 2020 at 4:56 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 10-01-20 13:27:24, Dan Williams wrote:
> > On Fri, Jan 10, 2020 at 9:42 AM David Hildenbrand <david@redhat.com> wrote:
> [...]
> > > For your reference (roughly 5 months ago, so not that old)
> > >
> > > https://lkml.kernel.org/r/20190724143017.12841-1-david@redhat.com
> >
> > Oh, now I see the problem. You need to add that lock so far away from
> > the __add_memory() to avoid lock inversion problems with the
> > acpi_scan_lock. The organization I was envisioning would not work
> > without deeper refactoring.
>
> Sorry to come back to this late. Has this been resolved?
The mem_hotplug_lock lockdep splat fix in this patch has not landed.
David and I have not quite come to consensus on how to resolve online
racing removal. IIUC David wants that invalidation to be
pages_correctly_probed(), I would prefer it to be directly tied to the
object, struct memory_block, that remove_memory_block_devices() has
modified, mem->section_count = 0.
...or are you referring to the discussion about acpi_scan_lock()? I
came around to agreeing with your position that documenting was better
than adding superfluous locking especially because the
acpi_scan_lock() is take so far away from where the device_hotplug
lock is needed.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-24 18:04 ` Dan Williams
(?)
@ 2020-01-24 18:13 ` David Hildenbrand
-1 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2020-01-24 18:13 UTC (permalink / raw)
To: Dan Williams, Michal Hocko
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin, Dave Hansen,
Linux MM, Linux Kernel Mailing List
On 24.01.20 19:04, Dan Williams wrote:
> On Fri, Jan 24, 2020 at 4:56 AM Michal Hocko <mhocko@kernel.org> wrote:
>>
>> On Fri 10-01-20 13:27:24, Dan Williams wrote:
>>> On Fri, Jan 10, 2020 at 9:42 AM David Hildenbrand <david@redhat.com> wrote:
>> [...]
>>>> For your reference (roughly 5 months ago, so not that old)
>>>>
>>>> https://lkml.kernel.org/r/20190724143017.12841-1-david@redhat.com
>>>
>>> Oh, now I see the problem. You need to add that lock so far away from
>>> the __add_memory() to avoid lock inversion problems with the
>>> acpi_scan_lock. The organization I was envisioning would not work
>>> without deeper refactoring.
>>
>> Sorry to come back to this late. Has this been resolved?
>
> The mem_hotplug_lock lockdep splat fix in this patch has not landed.
> David and I have not quite come to consensus on how to resolve online
> racing removal. IIUC David wants that invalidation to be
> pages_correctly_probed(), I would prefer it to be directly tied to the
> object, struct memory_block, that remove_memory_block_devices() has
> modified, mem->section_count = 0.
FWIW, there is no such race possible - esp. zombie devices (see
https://lore.kernel.org/lkml/1580c2bb-5e94-121d-8153-c8a7230b764b@redhat.com/).
(I'm planning to send a patch to remove mem->section_count soon)
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
2020-01-24 18:04 ` Dan Williams
(?)
(?)
@ 2020-01-27 13:47 ` Michal Hocko
-1 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2020-01-27 13:47 UTC (permalink / raw)
To: Dan Williams
Cc: David Hildenbrand, Andrew Morton, stable, Vishal Verma,
Pavel Tatashin, Dave Hansen, Linux MM, Linux Kernel Mailing List
On Fri 24-01-20 10:04:52, Dan Williams wrote:
> On Fri, Jan 24, 2020 at 4:56 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Fri 10-01-20 13:27:24, Dan Williams wrote:
> > > On Fri, Jan 10, 2020 at 9:42 AM David Hildenbrand <david@redhat.com> wrote:
> > [...]
> > > > For your reference (roughly 5 months ago, so not that old)
> > > >
> > > > https://lkml.kernel.org/r/20190724143017.12841-1-david@redhat.com
> > >
> > > Oh, now I see the problem. You need to add that lock so far away from
> > > the __add_memory() to avoid lock inversion problems with the
> > > acpi_scan_lock. The organization I was envisioning would not work
> > > without deeper refactoring.
> >
> > Sorry to come back to this late. Has this been resolved?
>
> The mem_hotplug_lock lockdep splat fix in this patch has not landed.
> David and I have not quite come to consensus on how to resolve online
> racing removal. IIUC David wants that invalidation to be
> pages_correctly_probed(), I would prefer it to be directly tied to the
> object, struct memory_block, that remove_memory_block_devices() has
> modified, mem->section_count = 0.
I was asking about this part and I can see you have already posted a
patch[1] and I do not see any reason for not merging it.
[1] http://lkml.kernel.org/r/157991441887.2763922.4770790047389427325.stgit@dwillia2-desk3.amr.corp.intel.com
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2020-01-27 13:47 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-10 4:30 [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat Dan Williams
2020-01-10 9:10 ` David Hildenbrand
2020-01-10 16:42 ` Dan Williams
2020-01-10 16:42 ` Dan Williams
2020-01-10 16:54 ` David Hildenbrand
2020-01-10 16:57 ` David Hildenbrand
2020-01-10 17:24 ` Dan Williams
2020-01-10 17:24 ` Dan Williams
2020-01-10 17:29 ` David Hildenbrand
2020-01-10 17:33 ` Dan Williams
2020-01-10 17:33 ` Dan Williams
2020-01-10 17:36 ` David Hildenbrand
2020-01-10 17:39 ` Dan Williams
2020-01-10 17:39 ` Dan Williams
2020-01-10 17:42 ` David Hildenbrand
2020-01-10 21:27 ` Dan Williams
2020-01-10 21:27 ` Dan Williams
2020-01-24 12:45 ` Michal Hocko
2020-01-24 18:04 ` Dan Williams
2020-01-24 18:04 ` Dan Williams
2020-01-24 18:13 ` David Hildenbrand
2020-01-27 13:47 ` Michal Hocko
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.