[v1] ACPI / scan: Acquire device_hotplug_lock in acpi_scan_init()

Message ID: 20190724143017.12841-1-david@redhat.com
State: In Next
Commit: b1de1c3cd1ef7fab921756c5ae07490f640aa99a
Series: [v1] ACPI / scan: Acquire device_hotplug_lock in acpi_scan_init()

Commit Message

David Hildenbrand July 24, 2019, 2:30 p.m. UTC
We end up calling __add_memory() without the device hotplug lock held.
(I used a local patch to assert in __add_memory() that the
 device_hotplug_lock is held - I might upstream that as well soon)

[   26.771684]        create_memory_block_devices+0xa4/0x140
[   26.772952]        add_memory_resource+0xde/0x200
[   26.773987]        __add_memory+0x6e/0xa0
[   26.775161]        acpi_memory_device_add+0x149/0x2b0
[   26.776263]        acpi_bus_attach+0xf1/0x1f0
[   26.777247]        acpi_bus_attach+0x66/0x1f0
[   26.778268]        acpi_bus_attach+0x66/0x1f0
[   26.779073]        acpi_bus_attach+0x66/0x1f0
[   26.780143]        acpi_bus_scan+0x3e/0x90
[   26.780844]        acpi_scan_init+0x109/0x257
[   26.781638]        acpi_init+0x2ab/0x30d
[   26.782248]        do_one_initcall+0x58/0x2cf
[   26.783181]        kernel_init_freeable+0x1bd/0x247
[   26.784345]        kernel_init+0x5/0xf1
[   26.785314]        ret_from_fork+0x3a/0x50

So perform the locking just like in acpi_device_hotplug().

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/acpi/scan.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Rafael J. Wysocki July 25, 2019, 9:11 a.m. UTC | #1
On Wednesday, July 24, 2019 4:30:17 PM CEST David Hildenbrand wrote:
> We end up calling __add_memory() without the device hotplug lock held.
> (I used a local patch to assert in __add_memory() that the
>  device_hotplug_lock is held - I might upstream that as well soon)
> 
> [   26.771684]        create_memory_block_devices+0xa4/0x140
> [   26.772952]        add_memory_resource+0xde/0x200
> [   26.773987]        __add_memory+0x6e/0xa0
> [   26.775161]        acpi_memory_device_add+0x149/0x2b0
> [   26.776263]        acpi_bus_attach+0xf1/0x1f0
> [   26.777247]        acpi_bus_attach+0x66/0x1f0
> [   26.778268]        acpi_bus_attach+0x66/0x1f0
> [   26.779073]        acpi_bus_attach+0x66/0x1f0
> [   26.780143]        acpi_bus_scan+0x3e/0x90
> [   26.780844]        acpi_scan_init+0x109/0x257
> [   26.781638]        acpi_init+0x2ab/0x30d
> [   26.782248]        do_one_initcall+0x58/0x2cf
> [   26.783181]        kernel_init_freeable+0x1bd/0x247
> [   26.784345]        kernel_init+0x5/0xf1
> [   26.785314]        ret_from_fork+0x3a/0x50
> 
> So perform the locking just like in acpi_device_hotplug().
> 
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/acpi/scan.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 0e28270b0fd8..cbc9d64b48dd 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -2204,7 +2204,9 @@ int __init acpi_scan_init(void)
>  	acpi_gpe_apply_masked_gpes();
>  	acpi_update_all_gpes();
>  
> +	lock_device_hotplug();
>  	mutex_lock(&acpi_scan_lock);
> +
>  	/*
>  	 * Enumerate devices in the ACPI namespace.
>  	 */
> @@ -2232,6 +2234,7 @@ int __init acpi_scan_init(void)
>  
>   out:
>  	mutex_unlock(&acpi_scan_lock);
> +	unlock_device_hotplug();
>  	return result;
>  }
>  
>
Oscar Salvador July 25, 2019, 9:18 a.m. UTC | #2
On Wed, Jul 24, 2019 at 04:30:17PM +0200, David Hildenbrand wrote:
> We end up calling __add_memory() without the device hotplug lock held.
> (I used a local patch to assert in __add_memory() that the
>  device_hotplug_lock is held - I might upstream that as well soon)
> 
> [   26.771684]        create_memory_block_devices+0xa4/0x140
> [   26.772952]        add_memory_resource+0xde/0x200
> [   26.773987]        __add_memory+0x6e/0xa0
> [   26.775161]        acpi_memory_device_add+0x149/0x2b0
> [   26.776263]        acpi_bus_attach+0xf1/0x1f0
> [   26.777247]        acpi_bus_attach+0x66/0x1f0
> [   26.778268]        acpi_bus_attach+0x66/0x1f0
> [   26.779073]        acpi_bus_attach+0x66/0x1f0
> [   26.780143]        acpi_bus_scan+0x3e/0x90
> [   26.780844]        acpi_scan_init+0x109/0x257
> [   26.781638]        acpi_init+0x2ab/0x30d
> [   26.782248]        do_one_initcall+0x58/0x2cf
> [   26.783181]        kernel_init_freeable+0x1bd/0x247
> [   26.784345]        kernel_init+0x5/0xf1
> [   26.785314]        ret_from_fork+0x3a/0x50
> 
> So perform the locking just like in acpi_device_hotplug().
> 
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Given that the call comes from an __init function, i.e. while booting, I wonder
how bad it is.
Anyway, let us be consistent:

Reviewed-by: Oscar Salvador <osalvador@suse.de>


> ---
>  drivers/acpi/scan.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 0e28270b0fd8..cbc9d64b48dd 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -2204,7 +2204,9 @@ int __init acpi_scan_init(void)
>  	acpi_gpe_apply_masked_gpes();
>  	acpi_update_all_gpes();
>  
> +	lock_device_hotplug();
>  	mutex_lock(&acpi_scan_lock);
> +
>  	/*
>  	 * Enumerate devices in the ACPI namespace.
>  	 */
> @@ -2232,6 +2234,7 @@ int __init acpi_scan_init(void)
>  
>   out:
>  	mutex_unlock(&acpi_scan_lock);
> +	unlock_device_hotplug();
>  	return result;
>  }
>  
> -- 
> 2.21.0
>
Rafael J. Wysocki July 25, 2019, 9:22 a.m. UTC | #3
On Thu, Jul 25, 2019 at 11:18 AM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Wed, Jul 24, 2019 at 04:30:17PM +0200, David Hildenbrand wrote:
> > We end up calling __add_memory() without the device hotplug lock held.
> > (I used a local patch to assert in __add_memory() that the
> >  device_hotplug_lock is held - I might upstream that as well soon)
> >
> > [   26.771684]        create_memory_block_devices+0xa4/0x140
> > [   26.772952]        add_memory_resource+0xde/0x200
> > [   26.773987]        __add_memory+0x6e/0xa0
> > [   26.775161]        acpi_memory_device_add+0x149/0x2b0
> > [   26.776263]        acpi_bus_attach+0xf1/0x1f0
> > [   26.777247]        acpi_bus_attach+0x66/0x1f0
> > [   26.778268]        acpi_bus_attach+0x66/0x1f0
> > [   26.779073]        acpi_bus_attach+0x66/0x1f0
> > [   26.780143]        acpi_bus_scan+0x3e/0x90
> > [   26.780844]        acpi_scan_init+0x109/0x257
> > [   26.781638]        acpi_init+0x2ab/0x30d
> > [   26.782248]        do_one_initcall+0x58/0x2cf
> > [   26.783181]        kernel_init_freeable+0x1bd/0x247
> > [   26.784345]        kernel_init+0x5/0xf1
> > [   26.785314]        ret_from_fork+0x3a/0x50
> >
> > So perform the locking just like in acpi_device_hotplug().
> >
> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > Cc: Len Brown <lenb@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Oscar Salvador <osalvador@suse.de>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Given that that call comes from a __init function, so while booting, I wonder
> how bad it is.

Yes, it probably does not matter.

> Anyway, let us be consistent:

Right.
David Hildenbrand July 25, 2019, 9:23 a.m. UTC | #4
On 25.07.19 11:22, Rafael J. Wysocki wrote:
> On Thu, Jul 25, 2019 at 11:18 AM Oscar Salvador <osalvador@suse.de> wrote:
>>
>> On Wed, Jul 24, 2019 at 04:30:17PM +0200, David Hildenbrand wrote:
>>> We end up calling __add_memory() without the device hotplug lock held.
>>> (I used a local patch to assert in __add_memory() that the
>>>  device_hotplug_lock is held - I might upstream that as well soon)
>>>
>>> [   26.771684]        create_memory_block_devices+0xa4/0x140
>>> [   26.772952]        add_memory_resource+0xde/0x200
>>> [   26.773987]        __add_memory+0x6e/0xa0
>>> [   26.775161]        acpi_memory_device_add+0x149/0x2b0
>>> [   26.776263]        acpi_bus_attach+0xf1/0x1f0
>>> [   26.777247]        acpi_bus_attach+0x66/0x1f0
>>> [   26.778268]        acpi_bus_attach+0x66/0x1f0
>>> [   26.779073]        acpi_bus_attach+0x66/0x1f0
>>> [   26.780143]        acpi_bus_scan+0x3e/0x90
>>> [   26.780844]        acpi_scan_init+0x109/0x257
>>> [   26.781638]        acpi_init+0x2ab/0x30d
>>> [   26.782248]        do_one_initcall+0x58/0x2cf
>>> [   26.783181]        kernel_init_freeable+0x1bd/0x247
>>> [   26.784345]        kernel_init+0x5/0xf1
>>> [   26.785314]        ret_from_fork+0x3a/0x50
>>>
>>> So perform the locking just like in acpi_device_hotplug().
>>>
>>> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
>>> Cc: Len Brown <lenb@kernel.org>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Oscar Salvador <osalvador@suse.de>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>
>> Given that that call comes from a __init function, so while booting, I wonder
>> how bad it is.
> 
> Yes, it probably does not matter.

It can at least confuse lockdep, but I agree that this is not stable
material.

> 
>> Anyway, let us be consistent:
> 
> Right.
>
Michal Hocko July 25, 2019, 12:56 p.m. UTC | #5
On Wed 24-07-19 16:30:17, David Hildenbrand wrote:
> We end up calling __add_memory() without the device hotplug lock held.
> (I used a local patch to assert in __add_memory() that the
>  device_hotplug_lock is held - I might upstream that as well soon)
> 
> [   26.771684]        create_memory_block_devices+0xa4/0x140
> [   26.772952]        add_memory_resource+0xde/0x200
> [   26.773987]        __add_memory+0x6e/0xa0
> [   26.775161]        acpi_memory_device_add+0x149/0x2b0
> [   26.776263]        acpi_bus_attach+0xf1/0x1f0
> [   26.777247]        acpi_bus_attach+0x66/0x1f0
> [   26.778268]        acpi_bus_attach+0x66/0x1f0
> [   26.779073]        acpi_bus_attach+0x66/0x1f0
> [   26.780143]        acpi_bus_scan+0x3e/0x90
> [   26.780844]        acpi_scan_init+0x109/0x257
> [   26.781638]        acpi_init+0x2ab/0x30d
> [   26.782248]        do_one_initcall+0x58/0x2cf
> [   26.783181]        kernel_init_freeable+0x1bd/0x247
> [   26.784345]        kernel_init+0x5/0xf1
> [   26.785314]        ret_from_fork+0x3a/0x50
> 
> So perform the locking just like in acpi_device_hotplug().

While playing with the device_hotplug_lock, can we actually document
what it is protecting please? I have a bad feeling that we are adding
this lock just because some other code path does rather than with a good
idea why it is needed. This patch just confirms that. What exactly does
the lock protect from here in an early boot stage.

> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/acpi/scan.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 0e28270b0fd8..cbc9d64b48dd 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -2204,7 +2204,9 @@ int __init acpi_scan_init(void)
>  	acpi_gpe_apply_masked_gpes();
>  	acpi_update_all_gpes();
>  
> +	lock_device_hotplug();
>  	mutex_lock(&acpi_scan_lock);
> +
>  	/*
>  	 * Enumerate devices in the ACPI namespace.
>  	 */
> @@ -2232,6 +2234,7 @@ int __init acpi_scan_init(void)
>  
>   out:
>  	mutex_unlock(&acpi_scan_lock);
> +	unlock_device_hotplug();
>  	return result;
>  }
>  
> -- 
> 2.21.0
David Hildenbrand July 25, 2019, 1:05 p.m. UTC | #6
On 25.07.19 14:56, Michal Hocko wrote:
> On Wed 24-07-19 16:30:17, David Hildenbrand wrote:
>> We end up calling __add_memory() without the device hotplug lock held.
>> (I used a local patch to assert in __add_memory() that the
>>  device_hotplug_lock is held - I might upstream that as well soon)
>>
>> [   26.771684]        create_memory_block_devices+0xa4/0x140
>> [   26.772952]        add_memory_resource+0xde/0x200
>> [   26.773987]        __add_memory+0x6e/0xa0
>> [   26.775161]        acpi_memory_device_add+0x149/0x2b0
>> [   26.776263]        acpi_bus_attach+0xf1/0x1f0
>> [   26.777247]        acpi_bus_attach+0x66/0x1f0
>> [   26.778268]        acpi_bus_attach+0x66/0x1f0
>> [   26.779073]        acpi_bus_attach+0x66/0x1f0
>> [   26.780143]        acpi_bus_scan+0x3e/0x90
>> [   26.780844]        acpi_scan_init+0x109/0x257
>> [   26.781638]        acpi_init+0x2ab/0x30d
>> [   26.782248]        do_one_initcall+0x58/0x2cf
>> [   26.783181]        kernel_init_freeable+0x1bd/0x247
>> [   26.784345]        kernel_init+0x5/0xf1
>> [   26.785314]        ret_from_fork+0x3a/0x50
>>
>> So perform the locking just like in acpi_device_hotplug().
> 
> While playing with the device_hotplug_lock, can we actually document
> what it is protecting please? I have a bad feeling that we are adding
> this lock just because some other code path does rather than with a good
> idea why it is needed. This patch just confirms that. What exactly does
> the lock protect from here in an early boot stage.

We have plenty of documentation already

mm/memory_hotplug.c

git grep -C5 device_hotplug mm/memory_hotplug.c

Also see

Documentation/core-api/memory-hotplug.rst

Regarding the early stage: primarily lockdep as I mentioned.
Michal Hocko July 25, 2019, 1:57 p.m. UTC | #7
On Thu 25-07-19 15:05:02, David Hildenbrand wrote:
> On 25.07.19 14:56, Michal Hocko wrote:
> > On Wed 24-07-19 16:30:17, David Hildenbrand wrote:
> >> We end up calling __add_memory() without the device hotplug lock held.
> >> (I used a local patch to assert in __add_memory() that the
> >>  device_hotplug_lock is held - I might upstream that as well soon)
> >>
> >> [   26.771684]        create_memory_block_devices+0xa4/0x140
> >> [   26.772952]        add_memory_resource+0xde/0x200
> >> [   26.773987]        __add_memory+0x6e/0xa0
> >> [   26.775161]        acpi_memory_device_add+0x149/0x2b0
> >> [   26.776263]        acpi_bus_attach+0xf1/0x1f0
> >> [   26.777247]        acpi_bus_attach+0x66/0x1f0
> >> [   26.778268]        acpi_bus_attach+0x66/0x1f0
> >> [   26.779073]        acpi_bus_attach+0x66/0x1f0
> >> [   26.780143]        acpi_bus_scan+0x3e/0x90
> >> [   26.780844]        acpi_scan_init+0x109/0x257
> >> [   26.781638]        acpi_init+0x2ab/0x30d
> >> [   26.782248]        do_one_initcall+0x58/0x2cf
> >> [   26.783181]        kernel_init_freeable+0x1bd/0x247
> >> [   26.784345]        kernel_init+0x5/0xf1
> >> [   26.785314]        ret_from_fork+0x3a/0x50
> >>
> >> So perform the locking just like in acpi_device_hotplug().
> > 
> > While playing with the device_hotplug_lock, can we actually document
> > what it is protecting please? I have a bad feeling that we are adding
> > this lock just because some other code path does rather than with a good
> > idea why it is needed. This patch just confirms that. What exactly does
> > the lock protect from here in an early boot stage.
> 
> We have plenty of documentation already
> 
> mm/memory_hotplug.c
> 
> git grep -C5 device_hotplug mm/memory_hotplug.c
> 
> Also see
> 
> Documentation/core-api/memory-hotplug.rst

OK, fair enough. I was pointing more at documentation right where the lock
is declared, because that is the place where people usually look for it.
The core-api documentation looks quite nice. And based on that doc it seems
that this patch is actually not needed, because neither online/offline nor
cpu hotplug should be possible that early, unless I am missing something.

> Regarding the early stage: primarily lockdep as I mentioned.

Could you add a lockdep splat that would be fixed by this patch to the
changelog for reference?
David Hildenbrand July 25, 2019, 2:35 p.m. UTC | #8
On 25.07.19 15:57, Michal Hocko wrote:
> On Thu 25-07-19 15:05:02, David Hildenbrand wrote:
>> On 25.07.19 14:56, Michal Hocko wrote:
>>> On Wed 24-07-19 16:30:17, David Hildenbrand wrote:
>>>> We end up calling __add_memory() without the device hotplug lock held.
>>>> (I used a local patch to assert in __add_memory() that the
>>>>  device_hotplug_lock is held - I might upstream that as well soon)
>>>>
>>>> [   26.771684]        create_memory_block_devices+0xa4/0x140
>>>> [   26.772952]        add_memory_resource+0xde/0x200
>>>> [   26.773987]        __add_memory+0x6e/0xa0
>>>> [   26.775161]        acpi_memory_device_add+0x149/0x2b0
>>>> [   26.776263]        acpi_bus_attach+0xf1/0x1f0
>>>> [   26.777247]        acpi_bus_attach+0x66/0x1f0
>>>> [   26.778268]        acpi_bus_attach+0x66/0x1f0
>>>> [   26.779073]        acpi_bus_attach+0x66/0x1f0
>>>> [   26.780143]        acpi_bus_scan+0x3e/0x90
>>>> [   26.780844]        acpi_scan_init+0x109/0x257
>>>> [   26.781638]        acpi_init+0x2ab/0x30d
>>>> [   26.782248]        do_one_initcall+0x58/0x2cf
>>>> [   26.783181]        kernel_init_freeable+0x1bd/0x247
>>>> [   26.784345]        kernel_init+0x5/0xf1
>>>> [   26.785314]        ret_from_fork+0x3a/0x50
>>>>
>>>> So perform the locking just like in acpi_device_hotplug().
>>>
>>> While playing with the device_hotplug_lock, can we actually document
>>> what it is protecting please? I have a bad feeling that we are adding
>>> this lock just because some other code path does rather than with a good
>>> idea why it is needed. This patch just confirms that. What exactly does
>>> the lock protect from here in an early boot stage.
>>
>> We have plenty of documentation already
>>
>> mm/memory_hotplug.c
>>
>> git grep -C5 device_hotplug mm/memory_hotplug.c
>>
>> Also see
>>
>> Documentation/core-api/memory-hotplug.rst
> 
> OK, fair enough. I was more pointing to a documentation right there
> where the lock is declared because that is the place where people
> usually check for documentation. The core-api documentation looks quite
> nice. And based on that doc it seems that this patch is actually not
> needed because neither the online/offline or cpu hotplug should be
> possible that early unless I am missing something.

I really prefer to stick to locking rules as outlined on the
interfaces if it doesn't hurt. Why it is not needed is not clear.

> 
>> Regarding the early stage: primarily lockdep as I mentioned.
> 
> Could you add a lockdep splat that would be fixed by this patch to the
> changelog for reference?
> 

I have one where I enforce what's documented (but that's of course not
upstream and therefore not "real" yet)

commit 263da346cd3cf526de3f5138827fbc3520f2f8e0
Author: David Hildenbrand <david@redhat.com>
Date:   Fri Jun 21 12:05:39 2019 +0200

    mm/memory_hotplug: Assert that the device_hotplug_lock is held
    
    We currently need the device_hotplug_lock(), as documented. Let's assert
    that the lock is held when adding/removing/onlining/offlining memory.
    
    Updated documentation to make this clearer.
    
    Signed-off-by: David Hildenbrand <david@redhat.com>


That patch in turn was the result of debugging a lockdep warning
we can trigger right now (I think it's a false positive that the
device_hotplug_lock happens to prevent - it acts as the tie breaker).
Anyhow, this patch here didn't change it.


1. Start a guest with a DIMM attached
2. Online a memory block of that DIMM
3. Unplug the DIMM

:/# [   22.616108] Offlined Pages 32768
[   22.631567] 
[   22.632337] ======================================================
[   22.635104] WARNING: possible circular locking dependency detected
[   22.637475] 5.3.0-rc1-next-20190723+ #111 Not tainted
[   22.639314] ------------------------------------------------------
[   22.641276] kworker/u4:0/8 is trying to acquire lock:
[   22.642578] (____ptrval____) (kn->count#39){++++}, at: kernfs_remove_by_name_ns+0x3c/0x80
[   22.645004] 
[   22.645004] but task is already holding lock:
[   22.646495] (____ptrval____) (mem_sysfs_mutex){+.+.}, at: remove_memory_block_devices+0x65/0xd0
[   22.648649] 
[   22.648649] which lock already depends on the new lock.
[   22.648649] 
[   22.650488] 
[   22.650488] the existing dependency chain (in reverse order) is:
[   22.651987] 
[   22.651987] -> #4 (mem_sysfs_mutex){+.+.}:
[   22.653254]        __mutex_lock+0x8d/0x930
[   22.654079]        create_memory_block_devices+0xa4/0x140
[   22.655292]        add_memory_resource+0xd6/0x200
[   22.656252]        __add_memory+0x58/0x90
[   22.657096]        acpi_memory_device_add+0x149/0x2b0
[   22.658126]        acpi_bus_attach+0xf1/0x1f0
[   22.658899]        acpi_bus_attach+0x66/0x1f0
[   22.659698]        acpi_bus_attach+0x66/0x1f0
[   22.660482]        acpi_bus_attach+0x66/0x1f0
[   22.661265]        acpi_bus_scan+0x3e/0x90
[   22.662098]        acpi_scan_init+0x104/0x24d
[   22.662920]        acpi_init+0x2ab/0x30d
[   22.663733]        do_one_initcall+0x58/0x2cf
[   22.664727]        kernel_init_freeable+0x1b8/0x242
[   22.665780]        kernel_init+0x5/0xf1
[   22.666494]        ret_from_fork+0x3a/0x50
[   22.667271] 
[   22.667271] -> #3 (mem_hotplug_lock.rw_sem){++++}:
[   22.668378]        get_online_mems+0x39/0xc0
[   22.669327]        kmem_cache_create_usercopy+0x29/0x280
[   22.670369]        kmem_cache_create+0xd/0x10
[   22.671412]        ptlock_cache_init+0x1b/0x23
[   22.672206]        start_kernel+0x225/0x501
[   22.672979]        secondary_startup_64+0xa4/0xb0
[   22.673887] 
[   22.673887] -> #2 (cpu_hotplug_lock.rw_sem){++++}:
[   22.675091]        cpus_read_lock+0x39/0xc0
[   22.675962]        __offline_pages+0x3e/0x7c0
[   22.676997]        memory_subsys_offline+0x3a/0x60
[   22.678073]        device_offline+0x82/0xb0
[   22.679039]        acpi_bus_offline+0xdb/0x150
[   22.679912]        acpi_device_hotplug+0x1b4/0x3a0
[   22.680939]        acpi_hotplug_work_fn+0x15/0x20
[   22.682025]        process_one_work+0x26c/0x5a0
[   22.683019]        worker_thread+0x48/0x3e0
[   22.683942]        kthread+0x103/0x140
[   22.684855]        ret_from_fork+0x3a/0x50
[   22.685841] 
[   22.685841] -> #1 (&device->physical_node_lock){+.+.}:
[   22.687246]        __mutex_lock+0x8d/0x930
[   22.688179]        acpi_get_first_physical_node+0x18/0x60
[   22.689699]        acpi_companion_match+0x3b/0x60
[   22.690989]        acpi_device_uevent_modalias+0x9/0x20
[   22.692626]        platform_uevent+0xd/0x40
[   22.693832]        dev_uevent+0x86/0x1c0
[   22.695133]        uevent_show+0x93/0x100
[   22.695988]        dev_attr_show+0x14/0x40
[   22.697342]        sysfs_kf_seq_show+0xb2/0xf0
[   22.698845]        seq_read+0xd0/0x3f0
[   22.700066]        vfs_read+0xc0/0x170
[   22.701168]        ksys_read+0x63/0xe0
[   22.702392]        do_syscall_64+0x4b/0x1b0
[   22.703979]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   22.705708] 
[   22.705708] -> #0 (kn->count#39){++++}:
[   22.707658]        __lock_acquire+0xe2f/0x1a20
[   22.708877]        lock_acquire+0x95/0x190
[   22.710299]        __kernfs_remove+0x253/0x2f0
[   22.711936]        kernfs_remove_by_name_ns+0x3c/0x80
[   22.713392]        device_del+0x148/0x360
[   22.714685]        device_unregister+0x9/0x20
[   22.716414]        remove_memory_block_devices+0x90/0xd0
[   22.718135]        try_remove_memory+0xc6/0x130
[   22.719669]        __remove_memory+0x5/0xc
[   22.721178]        acpi_memory_device_remove+0x72/0xf0
[   22.723178]        acpi_bus_trim+0x50/0x90
[   22.724537]        acpi_device_hotplug+0x222/0x3a0
[   22.726257]        acpi_hotplug_work_fn+0x15/0x20
[   22.728044]        process_one_work+0x26c/0x5a0
[   22.729825]        worker_thread+0x48/0x3e0
[   22.731128]        kthread+0x103/0x140
[   22.732137]        ret_from_fork+0x3a/0x50
[   22.733368] 
[   22.733368] other info that might help us debug this:
[   22.733368] 
[   22.736178] Chain exists of:
[   22.736178]   kn->count#39 --> mem_hotplug_lock.rw_sem --> mem_sysfs_mutex
[   22.736178] 
[   22.739723]  Possible unsafe locking scenario:
[   22.739723] 
[   22.741143]        CPU0                    CPU1
[   22.741788]        ----                    ----
[   22.742653]   lock(mem_sysfs_mutex);
[   22.743990]                                lock(mem_hotplug_lock.rw_sem);
[   22.746069]                                lock(mem_sysfs_mutex);
[   22.747207]   lock(kn->count#39);
[   22.748132] 
[   22.748132]  *** DEADLOCK ***
[   22.748132] 
[   22.749182] 7 locks held by kworker/u4:0/8:
[   22.750684]  #0: (____ptrval____) ((wq_completion)kacpi_hotplug){+.+.}, at: process_one_work+0x1e9/0x5a0
[   22.753966]  #1: (____ptrval____) ((work_completion)(&hpw->work)){+.+.}, at: process_one_work+0x1e9/0x5a0
[   22.756429]  #2: (____ptrval____) (device_hotplug_lock){+.+.}, at: acpi_device_hotplug+0x2d/0x3a0
[   22.758292]  #3: (____ptrval____) (acpi_scan_lock){+.+.}, at: acpi_device_hotplug+0x3b/0x3a0
[   22.759836]  #4: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: try_remove_memory+0x3b/0x130
[   22.761463]  #5: (____ptrval____) (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x1b/0xf2
[   22.763812]  #6: (____ptrval____) (mem_sysfs_mutex){+.+.}, at: remove_memory_block_devices+0x65/0xd0
Michal Hocko July 25, 2019, 7:19 p.m. UTC | #9
On Thu 25-07-19 16:35:07, David Hildenbrand wrote:
> On 25.07.19 15:57, Michal Hocko wrote:
> > On Thu 25-07-19 15:05:02, David Hildenbrand wrote:
> >> On 25.07.19 14:56, Michal Hocko wrote:
> >>> On Wed 24-07-19 16:30:17, David Hildenbrand wrote:
> >>>> We end up calling __add_memory() without the device hotplug lock held.
> >>>> (I used a local patch to assert in __add_memory() that the
> >>>>  device_hotplug_lock is held - I might upstream that as well soon)
> >>>>
> >>>> [   26.771684]        create_memory_block_devices+0xa4/0x140
> >>>> [   26.772952]        add_memory_resource+0xde/0x200
> >>>> [   26.773987]        __add_memory+0x6e/0xa0
> >>>> [   26.775161]        acpi_memory_device_add+0x149/0x2b0
> >>>> [   26.776263]        acpi_bus_attach+0xf1/0x1f0
> >>>> [   26.777247]        acpi_bus_attach+0x66/0x1f0
> >>>> [   26.778268]        acpi_bus_attach+0x66/0x1f0
> >>>> [   26.779073]        acpi_bus_attach+0x66/0x1f0
> >>>> [   26.780143]        acpi_bus_scan+0x3e/0x90
> >>>> [   26.780844]        acpi_scan_init+0x109/0x257
> >>>> [   26.781638]        acpi_init+0x2ab/0x30d
> >>>> [   26.782248]        do_one_initcall+0x58/0x2cf
> >>>> [   26.783181]        kernel_init_freeable+0x1bd/0x247
> >>>> [   26.784345]        kernel_init+0x5/0xf1
> >>>> [   26.785314]        ret_from_fork+0x3a/0x50
> >>>>
> >>>> So perform the locking just like in acpi_device_hotplug().
> >>>
> >>> While playing with the device_hotplug_lock, can we actually document
> >>> what it is protecting please? I have a bad feeling that we are adding
> >>> this lock just because some other code path does rather than with a good
> >>> idea why it is needed. This patch just confirms that. What exactly does
> >>> the lock protect from here in an early boot stage.
> >>
> >> We have plenty of documentation already
> >>
> >> mm/memory_hotplug.c
> >>
> >> git grep -C5 device_hotplug mm/memory_hotplug.c
> >>
> >> Also see
> >>
> >> Documentation/core-api/memory-hotplug.rst
> > 
> > OK, fair enough. I was more pointing to a documentation right there
> > where the lock is declared because that is the place where people
> > usually check for documentation. The core-api documentation looks quite
> > nice. And based on that doc it seems that this patch is actually not
> > needed because neither the online/offline or cpu hotplug should be
> > possible that early unless I am missing something.
> 
> I really prefer to stick to locking rules as outlined on the
> interfaces if it doesn't hurt. Why it is not needed is not clear.
> 
> > 
> >> Regarding the early stage: primarily lockdep as I mentioned.
> > 
> > Could you add a lockdep splat that would be fixed by this patch to the
> > changelog for reference?
> > 
> 
> I have one where I enforce what's documented (but that's of course not
> upstream and therefore not "real" yet)

Then I propose to not add locking for something that is not a problem.
Really, think about it. People will look at this code and follow the
lead without really knowing why the locking is needed.
device_hotplug_lock has its purpose and if the code in question doesn't
need synchronization for the documented scenarios then the locking
simply shouldn't be there. Adding the lock just because of a
non-existing, and IMHO dubious, lockdep splats is just wrong.

We need to rationalize the locking here, not to add more hacks.
David Hildenbrand July 25, 2019, 8:49 p.m. UTC | #10
On 25.07.19 21:19, Michal Hocko wrote:
> On Thu 25-07-19 16:35:07, David Hildenbrand wrote:
>> On 25.07.19 15:57, Michal Hocko wrote:
>>> On Thu 25-07-19 15:05:02, David Hildenbrand wrote:
>>>> On 25.07.19 14:56, Michal Hocko wrote:
>>>>> On Wed 24-07-19 16:30:17, David Hildenbrand wrote:
>>>>>> We end up calling __add_memory() without the device hotplug lock held.
>>>>>> (I used a local patch to assert in __add_memory() that the
>>>>>>  device_hotplug_lock is held - I might upstream that as well soon)
>>>>>>
>>>>>> [   26.771684]        create_memory_block_devices+0xa4/0x140
>>>>>> [   26.772952]        add_memory_resource+0xde/0x200
>>>>>> [   26.773987]        __add_memory+0x6e/0xa0
>>>>>> [   26.775161]        acpi_memory_device_add+0x149/0x2b0
>>>>>> [   26.776263]        acpi_bus_attach+0xf1/0x1f0
>>>>>> [   26.777247]        acpi_bus_attach+0x66/0x1f0
>>>>>> [   26.778268]        acpi_bus_attach+0x66/0x1f0
>>>>>> [   26.779073]        acpi_bus_attach+0x66/0x1f0
>>>>>> [   26.780143]        acpi_bus_scan+0x3e/0x90
>>>>>> [   26.780844]        acpi_scan_init+0x109/0x257
>>>>>> [   26.781638]        acpi_init+0x2ab/0x30d
>>>>>> [   26.782248]        do_one_initcall+0x58/0x2cf
>>>>>> [   26.783181]        kernel_init_freeable+0x1bd/0x247
>>>>>> [   26.784345]        kernel_init+0x5/0xf1
>>>>>> [   26.785314]        ret_from_fork+0x3a/0x50
>>>>>>
>>>>>> So perform the locking just like in acpi_device_hotplug().
>>>>>
>>>>> While playing with the device_hotplug_lock, can we actually document
>>>>> what it is protecting please? I have a bad feeling that we are adding
>>>>> this lock just because some other code path does rather than with a good
>>>>> idea why it is needed. This patch just confirms that. What exactly does
>>>>> the lock protect from here in an early boot stage.
>>>>
>>>> We have plenty of documentation already
>>>>
>>>> mm/memory_hotplug.c
>>>>
>>>> git grep -C5 device_hotplug mm/memory_hotplug.c
>>>>
>>>> Also see
>>>>
>>>> Documentation/core-api/memory-hotplug.rst
>>>
>>> OK, fair enough. I was more pointing to a documentation right there
>>> where the lock is declared because that is the place where people
>>> usually check for documentation. The core-api documentation looks quite
>>> nice. And based on that doc it seems that this patch is actually not
>>> needed because neither the online/offline or cpu hotplug should be
>>> possible that early unless I am missing something.
>>
>> I really prefer to stick to locking rules as outlined on the
>> interfaces if it doesn't hurt. Why it is not needed is not clear.
>>
>>>
>>>> Regarding the early stage: primarily lockdep as I mentioned.
>>>
>>> Could you add a lockdep splat that would be fixed by this patch to the
>>> changelog for reference?
>>>
>>
>> I have one where I enforce what's documented (but that's of course not
>> upstream and therefore not "real" yet)
> 
> Then I suppose to not add locking for something that is not a problem.
> Really, think about it. People will look at this code and follow the
> lead without really knowing why the locking is needed.
> device_hotplug_lock has its purpose and if the code in question doesn't
> need synchronization for the documented scenarios then the locking
> simply shouldn't be there. Adding the lock just because of a
> non-existing, and IMHO dubious, lockdep splats is just wrong.
> 
> We need to rationalize the locking here, not to add more hacks.

No, sorry. The real hack is calling a function that is *documented* to
be called under lock without it. That is an optimization for a special
case. That is the black magic in the code.

The only alternative I see to this patch is adding a comment like

/*
 * We end up calling __add_memory() without the device_hotplug_lock
 * held. This is fine as we cannot race with other hotplug activities
 * and userspace trying to online memory blocks.
 */

Personally, I don't think that's any better than just grabbing the lock
as we are told to. (honestly, I don't see how optimizing away the lock
here is of *any* help in optimizing our overall memory hotplug locking)

@Rafael, what's your take? lock or comment?
Rafael J. Wysocki July 25, 2019, 9:23 p.m. UTC | #11
On Thu, Jul 25, 2019 at 10:49 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 25.07.19 21:19, Michal Hocko wrote:
> > On Thu 25-07-19 16:35:07, David Hildenbrand wrote:
> >> On 25.07.19 15:57, Michal Hocko wrote:
> >>> On Thu 25-07-19 15:05:02, David Hildenbrand wrote:
> >>>> On 25.07.19 14:56, Michal Hocko wrote:
> >>>>> On Wed 24-07-19 16:30:17, David Hildenbrand wrote:
> >>>>>> We end up calling __add_memory() without the device hotplug lock held.
> >>>>>> (I used a local patch to assert in __add_memory() that the
> >>>>>>  device_hotplug_lock is held - I might upstream that as well soon)
> >>>>>>
> >>>>>> [   26.771684]        create_memory_block_devices+0xa4/0x140
> >>>>>> [   26.772952]        add_memory_resource+0xde/0x200
> >>>>>> [   26.773987]        __add_memory+0x6e/0xa0
> >>>>>> [   26.775161]        acpi_memory_device_add+0x149/0x2b0
> >>>>>> [   26.776263]        acpi_bus_attach+0xf1/0x1f0
> >>>>>> [   26.777247]        acpi_bus_attach+0x66/0x1f0
> >>>>>> [   26.778268]        acpi_bus_attach+0x66/0x1f0
> >>>>>> [   26.779073]        acpi_bus_attach+0x66/0x1f0
> >>>>>> [   26.780143]        acpi_bus_scan+0x3e/0x90
> >>>>>> [   26.780844]        acpi_scan_init+0x109/0x257
> >>>>>> [   26.781638]        acpi_init+0x2ab/0x30d
> >>>>>> [   26.782248]        do_one_initcall+0x58/0x2cf
> >>>>>> [   26.783181]        kernel_init_freeable+0x1bd/0x247
> >>>>>> [   26.784345]        kernel_init+0x5/0xf1
> >>>>>> [   26.785314]        ret_from_fork+0x3a/0x50
> >>>>>>
> >>>>>> So perform the locking just like in acpi_device_hotplug().
> >>>>>
> >>>>> While playing with the device_hotplug_lock, can we actually document
> >>>>> what it is protecting please? I have a bad feeling that we are adding
> >>>>> this lock just because some other code path does rather than with a good
> >>>>> idea why it is needed. This patch just confirms that. What exactly does
> >>>>> the lock protect from here in an early boot stage.
> >>>>
> >>>> We have plenty of documentation already
> >>>>
> >>>> mm/memory_hotplug.c
> >>>>
> >>>> git grep -C5 device_hotplug mm/memory_hotplug.c
> >>>>
> >>>> Also see
> >>>>
> >>>> Documentation/core-api/memory-hotplug.rst
> >>>
> >>> OK, fair enough. I was more pointing to a documentation right there
> >>> where the lock is declared because that is the place where people
> >>> usually check for documentation. The core-api documentation looks quite
> >>> nice. And based on that doc it seems that this patch is actually not
> >>> needed because neither the online/offline or cpu hotplug should be
> >>> possible that early unless I am missing something.
> >>
> >> I really prefer to stick to locking rules as outlined on the
> >> interfaces if it doesn't hurt. Why it is not needed is not clear.
> >>
> >>>
> >>>> Regarding the early stage: primarily lockdep as I mentioned.
> >>>
> >>> Could you add a lockdep splat that would be fixed by this patch to the
> >>> changelog for reference?
> >>>
> >>
> >> I have one where I enforce what's documented (but that's of course not
> >> upstream and therefore not "real" yet)
> >
> > Then I suppose to not add locking for something that is not a problem.
> > Really, think about it. People will look at this code and follow the
> > lead without really knowing why the locking is needed.
> > device_hotplug_lock has its purpose and if the code in question doesn't
> > need synchronization for the documented scenarios then the locking
> > simply shouldn't be there. Adding the lock just because of a
> > non-existing, and IMHO dubious, lockdep splats is just wrong.
> >
> > We need to rationalize the locking here, not to add more hacks.
>
> No, sorry. The real hack is calling a function that is *documented* to
> be called under lock without it. That is an optimization for a special
> case. That is the black magic in the code.
>
> The only alternative I see to this patch is adding a comment like
>
> /*
>  * We end up calling __add_memory() without the device_hotplug_lock
>  * held. This is fine as we cannot race with other hotplug activities
>  * and userspace trying to online memory blocks.
>  */
>
> Personally, I don't think that's any better than just grabbing the lock
> as we are told to. (honestly, I don't see how optimizing away the lock
> here is of *any* help to optimize our overall memory hotplug locking)
>
> @Rafael, what's your take? lock or comment?

Well, I have ACKed your patch already. :-)

That said, adding a comment stating that the lock is acquired mostly
for consistency wouldn't hurt.
David Hildenbrand July 26, 2019, 7:20 a.m. UTC | #12
On 25.07.19 23:23, Rafael J. Wysocki wrote:
> On Thu, Jul 25, 2019 at 10:49 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 25.07.19 21:19, Michal Hocko wrote:
>>> On Thu 25-07-19 16:35:07, David Hildenbrand wrote:
>>>> On 25.07.19 15:57, Michal Hocko wrote:
>>>>> On Thu 25-07-19 15:05:02, David Hildenbrand wrote:
>>>>>> On 25.07.19 14:56, Michal Hocko wrote:
>>>>>>> On Wed 24-07-19 16:30:17, David Hildenbrand wrote:
>>>>>>>> We end up calling __add_memory() without the device hotplug lock held.
>>>>>>>> (I used a local patch to assert in __add_memory() that the
>>>>>>>>  device_hotplug_lock is held - I might upstream that as well soon)
>>>>>>>>
>>>>>>>> [   26.771684]        create_memory_block_devices+0xa4/0x140
>>>>>>>> [   26.772952]        add_memory_resource+0xde/0x200
>>>>>>>> [   26.773987]        __add_memory+0x6e/0xa0
>>>>>>>> [   26.775161]        acpi_memory_device_add+0x149/0x2b0
>>>>>>>> [   26.776263]        acpi_bus_attach+0xf1/0x1f0
>>>>>>>> [   26.777247]        acpi_bus_attach+0x66/0x1f0
>>>>>>>> [   26.778268]        acpi_bus_attach+0x66/0x1f0
>>>>>>>> [   26.779073]        acpi_bus_attach+0x66/0x1f0
>>>>>>>> [   26.780143]        acpi_bus_scan+0x3e/0x90
>>>>>>>> [   26.780844]        acpi_scan_init+0x109/0x257
>>>>>>>> [   26.781638]        acpi_init+0x2ab/0x30d
>>>>>>>> [   26.782248]        do_one_initcall+0x58/0x2cf
>>>>>>>> [   26.783181]        kernel_init_freeable+0x1bd/0x247
>>>>>>>> [   26.784345]        kernel_init+0x5/0xf1
>>>>>>>> [   26.785314]        ret_from_fork+0x3a/0x50
>>>>>>>>
>>>>>>>> So perform the locking just like in acpi_device_hotplug().
>>>>>>>
>>>>>>> While playing with the device_hotplug_lock, can we actually document
>>>>>>> what it is protecting please? I have a bad feeling that we are adding
>>>>>>> this lock just because some other code path does rather than with a good
>>>>>>> idea why it is needed. This patch just confirms that. What exactly does
>>>>>>> the lock protect from here in an early boot stage.
>>>>>>
>>>>>> We have plenty of documentation already
>>>>>>
>>>>>> mm/memory_hotplug.c
>>>>>>
>>>>>> git grep -C5 device_hotplug mm/memory_hotplug.c
>>>>>>
>>>>>> Also see
>>>>>>
>>>>>> Documentation/core-api/memory-hotplug.rst
>>>>>
>>>>> OK, fair enough. I was more pointing to a documentation right there
>>>>> where the lock is declared because that is the place where people
>>>>> usually check for documentation. The core-api documentation looks quite
>>>>> nice. And based on that doc it seems that this patch is actually not
>>>>> needed because neither the online/offline or cpu hotplug should be
>>>>> possible that early unless I am missing something.
>>>>
>>>> I really prefer to stick to locking rules as outlined on the
>>>> interfaces if it doesn't hurt. Why it is not needed is not clear.
>>>>
>>>>>
>>>>>> Regarding the early stage: primarily lockdep as I mentioned.
>>>>>
>>>>> Could you add a lockdep splat that would be fixed by this patch to the
>>>>> changelog for reference?
>>>>>
>>>>
>>>> I have one where I enforce what's documented (but that's of course not
>>>> upstream and therefore not "real" yet)
>>>
>>> Then I suppose to not add locking for something that is not a problem.
>>> Really, think about it. People will look at this code and follow the
>>> lead without really knowing why the locking is needed.
>>> device_hotplug_lock has its purpose and if the code in question doesn't
>>> need synchronization for the documented scenarios then the locking
>>> simply shouldn't be there. Adding the lock just because of a
>>> non-existing, and IMHO dubious, lockdep splats is just wrong.
>>>
>>> We need to rationalize the locking here, not to add more hacks.
>>
>> No, sorry. The real hack is calling a function that is *documented* to
>> be called under lock without it. That is an optimization for a special
>> case. That is the black magic in the code.
>>
>> The only alternative I see to this patch is adding a comment like
>>
>> /*
>>  * We end up calling __add_memory() without the device_hotplug_lock
>>  * held. This is fine as we cannot race with other hotplug activities
>>  * and userspace trying to online memory blocks.
>>  */
>>
>> Personally, I don't think that's any better than just grabbing the lock
>> as we are told to. (honestly, I don't see how optimizing away the lock
>> here is of *any* help to optimize our overall memory hotplug locking)
>>
>> @Rafael, what's your take? lock or comment?
> 
> Well, I have ACKed your patch already. :-)

It's never too late to un-ACK if you changed your mind :)

> 
> That said, adding a comment stating that the lock is acquired mostly
> for consistency wouldn't hurt.
> 

I can certainly do that. Thanks!
Michal Hocko July 26, 2019, 7:57 a.m. UTC | #13
On Thu 25-07-19 22:49:36, David Hildenbrand wrote:
> On 25.07.19 21:19, Michal Hocko wrote:
[...]
> > We need to rationalize the locking here, not to add more hacks.
> 
> No, sorry. The real hack is calling a function that is *documented* to
> be called under lock without it. That is an optimization for a special
> case. That is the black magic in the code.

OK, let me ask differently. What does the device_hotplug_lock actually
protect against in the add_memory path? (Which data structures?)

This function is meant to be used when struct pages and node/zone data
structures should be updated. Why should we even care about some device
concept here? That should all be handled a layer up. Not all memory will
have a user space API to control its online/offline state.
David Hildenbrand July 26, 2019, 8:05 a.m. UTC | #14
On 26.07.19 09:57, Michal Hocko wrote:
> On Thu 25-07-19 22:49:36, David Hildenbrand wrote:
>> On 25.07.19 21:19, Michal Hocko wrote:
> [...]
>>> We need to rationalize the locking here, not to add more hacks.
>>
>> No, sorry. The real hack is calling a function that is *documented* to
>> be called under lock without it. That is an optimization for a special
>> case. That is the black magic in the code.
> 
> OK, let me ask differently. What does the device_hotplug_lock actually
> protects from in the add_memory path? (Which data structures)
> 
> This function is meant to be used when struct pages and node/zone data
> structures should be updated. Why should we even care about some device
> concept here? This should all be handled a layer up. Not all memory will
> have user space API to control online/offline state.

Via add_memory()/__add_memory() we create memory block devices for all
memory. So all memory we create via this function (IOW, hotplug) will
have user space APIs.

Sorry, I can't follow what you are saying here - are you confusing the
function we are talking about with arch_add_memory()? (where I pulled
out the creation of memory block devices)
Michal Hocko July 26, 2019, 8:31 a.m. UTC | #15
On Fri 26-07-19 10:05:58, David Hildenbrand wrote:
> On 26.07.19 09:57, Michal Hocko wrote:
> > On Thu 25-07-19 22:49:36, David Hildenbrand wrote:
> >> On 25.07.19 21:19, Michal Hocko wrote:
> > [...]
> >>> We need to rationalize the locking here, not to add more hacks.
> >>
> >> No, sorry. The real hack is calling a function that is *documented* to
> >> be called under lock without it. That is an optimization for a special
> >> case. That is the black magic in the code.
> > 
> > OK, let me ask differently. What does the device_hotplug_lock actually
> > protects from in the add_memory path? (Which data structures)
> > 
> > This function is meant to be used when struct pages and node/zone data
> > structures should be updated. Why should we even care about some device
> > concept here? This should all be handled a layer up. Not all memory will
> > have user space API to control online/offline state.
> 
> Via add_memory()/__add_memory() we create memory block devices for all
> memory. So all memory we create via this function (IOW, hotplug) will
> have user space APIs.

Oops, I mixed up add_memory with add_pages, which I had in mind while
writing that. Sorry about the confusion.

Anyway, my dislike of the device_hotplug_lock persists. I would really
love to see it go rather than spread even further into the hotplug code.
We should really be striving for mem-hotplug-internal, and ideally
range-based, locking long term.
David Hildenbrand July 26, 2019, 8:36 a.m. UTC | #16
On 26.07.19 10:31, Michal Hocko wrote:
> On Fri 26-07-19 10:05:58, David Hildenbrand wrote:
>> On 26.07.19 09:57, Michal Hocko wrote:
>>> On Thu 25-07-19 22:49:36, David Hildenbrand wrote:
>>>> On 25.07.19 21:19, Michal Hocko wrote:
>>> [...]
>>>>> We need to rationalize the locking here, not to add more hacks.
>>>>
>>>> No, sorry. The real hack is calling a function that is *documented* to
>>>> be called under lock without it. That is an optimization for a special
>>>> case. That is the black magic in the code.
>>>
>>> OK, let me ask differently. What does the device_hotplug_lock actually
>>> protects from in the add_memory path? (Which data structures)
>>>
>>> This function is meant to be used when struct pages and node/zone data
>>> structures should be updated. Why should we even care about some device
>>> concept here? This should all be handled a layer up. Not all memory will
>>> have user space API to control online/offline state.
>>
>> Via add_memory()/__add_memory() we create memory block devices for all
>> memory. So all memory we create via this function (IOW, hotplug) will
>> have user space APIs.
> 
> Ups, I have mixed add_memory with add_pages which I've had in mind while
> writing that. Sorry about the confusion.

No worries :)

> 
> Anyway, my dislike of the device_hotplug_lock persists. I would really
> love to see it go rather than grow even more to the hotplug code. We
> should be really striving for mem hotplug internal and ideally range
> defined locking longterm. 

Yes, and that is a different story, because it will require major
changes to all add_memory() users (especially due to the documented race
conditions). That said, memory hotplug locking is still not ideal.
Michal Hocko July 26, 2019, 8:44 a.m. UTC | #17
On Fri 26-07-19 10:36:42, David Hildenbrand wrote:
> On 26.07.19 10:31, Michal Hocko wrote:
[...]
> > Anyway, my dislike of the device_hotplug_lock persists. I would really
> > love to see it go rather than grow even more to the hotplug code. We
> > should be really striving for mem hotplug internal and ideally range
> > defined locking longterm. 
> 
> Yes, and that is a different story, because it will require major
> changes to all add_memory() users. (esp, due to the documented race
> conditions). Having that said, memory hotplug locking is not ideal yet.

I am really happy to hear that we are on the same page here. Do we have
any document (I am sorry, but I am lagging behind recent development in
this area) that describes the roadblocks to removing device_hotplug_lock?
David Hildenbrand July 26, 2019, 8:57 a.m. UTC | #18
On 26.07.19 10:44, Michal Hocko wrote:
> On Fri 26-07-19 10:36:42, David Hildenbrand wrote:
>> On 26.07.19 10:31, Michal Hocko wrote:
> [...]
>>> Anyway, my dislike of the device_hotplug_lock persists. I would really
>>> love to see it go rather than grow even more to the hotplug code. We
>>> should be really striving for mem hotplug internal and ideally range
>>> defined locking longterm. 
>>
>> Yes, and that is a different story, because it will require major
>> changes to all add_memory() users. (esp, due to the documented race
>> conditions). Having that said, memory hotplug locking is not ideal yet.
> 
> I am really happy to hear that we are on the same page here. Do we have
> any document (I am sorry but I am lacking behind recent development in
> this area) that describes roadblocks to remove device_hotplug_lock?

Only the core-api document I mentioned (there I documented quite a few
of the conditions I identified back then).

I am not sure we can remove it completely from
add_memory()/remove_memory(): we actually create/delete devices there,
which can otherwise race with user space.

Besides that:
- try_offline_node() needs the lock to synchronize against cpu hotplug
- I *assume* try_online_node() needs it as well

Then, there is the possible race with user space onlining memory, which
the lock avoids. Also, the lock currently protects the "online_type"
when onlining memory.

Then, there might be other global variables (possibly
zone/node/section related) that need this lock right now - no
details known.

IOW, we have to be very careful, and it is more involved than it might
seem.

Locking is definitely better (and more reliable!) than one year ago, but
there is still a lot to do (unfortunately, just like in many other areas
of the memory hotplug code :( - say, zone handling when offlining or
failing to online memory).
Michal Hocko July 26, 2019, 10:31 a.m. UTC | #19
On Fri 26-07-19 10:57:52, David Hildenbrand wrote:
> On 26.07.19 10:44, Michal Hocko wrote:
> > On Fri 26-07-19 10:36:42, David Hildenbrand wrote:
> >> On 26.07.19 10:31, Michal Hocko wrote:
> > [...]
> >>> Anyway, my dislike of the device_hotplug_lock persists. I would really
> >>> love to see it go rather than grow even more to the hotplug code. We
> >>> should be really striving for mem hotplug internal and ideally range
> >>> defined locking longterm. 
> >>
> >> Yes, and that is a different story, because it will require major
> >> changes to all add_memory() users. (esp, due to the documented race
> >> conditions). Having that said, memory hotplug locking is not ideal yet.
> > 
> > I am really happy to hear that we are on the same page here. Do we have
> > any document (I am sorry but I am lacking behind recent development in
> > this area) that describes roadblocks to remove device_hotplug_lock?
> 
> Only the core-api document I mentioned (I documented there quite some
> current conditions I identified back then).

That document doesn't describe which _data structures_ are protected by
the lock though. It documents only the current state of locking.

> I am not sure if we can remove it completely from
> add_memory()/remove_memory(): We actually create/delete devices which
> can otherwise create races with user space.

More details would be really appreciated.

> Besides that:
> - try_offline_node() needs the lock to synchronize against cpu hotplug
> - I *assume* try_online_node() needs it as well

more details on why would be great.

> Then, there is the possible race condition with user space onlining
> memory avoided by the lock. Also, currently the lock protects the
> "online_type" when onlining memory.

I do not see the race if the user-API-triggered online/offline takes a
range lock on the affected physical memory range.

> Then, there might be other global variables (eventually
> zone/node/section related) that might need this lock right now - no
> details known.

Zones/nodes have their own locking for spans. Sections should be using
low-level locking, but I am not really sure that is needed if there is
a mem hotplug lock in place (range or global).

> IOW, we have to be very carefully and it is more involved than it might
> seem.

I am not questioning that. And that is why I am asking about a todo list
for that transition.

> Locking is definitely better (and more reliably!) than one year ago, but
> there is definitely a lot to do. (unfortunately, just like in many areas
> in memory hotplug code :( - say zone handling when offlining/failing to
> online memory).

Yeah, the code is shaping up, and I am happy to see that happening. But
please try to understand that I really do not like to see ad-hoc
locking enforcement without a clear locking model in place. This patch
is an example of it. Whoever would like to rationalize the locking
further will stumble over this and scratch their head over why the heck
the locking is there, and my experience tells me that people usually go
along with existing code and make further assumptions based on it, so
we are unlikely to get rid of the locking...
David Hildenbrand July 26, 2019, 10:37 a.m. UTC | #20
On 26.07.19 12:31, Michal Hocko wrote:
> On Fri 26-07-19 10:57:52, David Hildenbrand wrote:
>> On 26.07.19 10:44, Michal Hocko wrote:
>>> On Fri 26-07-19 10:36:42, David Hildenbrand wrote:
>>>> On 26.07.19 10:31, Michal Hocko wrote:
>>> [...]
>>>>> Anyway, my dislike of the device_hotplug_lock persists. I would really
>>>>> love to see it go rather than grow even more to the hotplug code. We
>>>>> should be really striving for mem hotplug internal and ideally range
>>>>> defined locking longterm. 
>>>>
>>>> Yes, and that is a different story, because it will require major
>>>> changes to all add_memory() users. (esp, due to the documented race
>>>> conditions). Having that said, memory hotplug locking is not ideal yet.
>>>
>>> I am really happy to hear that we are on the same page here. Do we have
>>> any document (I am sorry but I am lacking behind recent development in
>>> this area) that describes roadblocks to remove device_hotplug_lock?
>>
>> Only the core-api document I mentioned (I documented there quite some
>> current conditions I identified back then).
> 
> That document doesn't describe which _data structures_ are protected by
> the lock though. It documents only the current state of locking.

Yeah, I also think we should find out more and document it.
Unfortunately, optimizing the locking is not very high on my priority
list (there are more critical things to figure out than optimizing
locking that at least seems to work :)). It is on my list, though.

> 
>> I am not sure if we can remove it completely from
>> add_memory()/remove_memory(): We actually create/delete devices which
>> can otherwise create races with user space.
> 
> More details would be really appreciated.
> 
>> Besides that:
>> - try_offline_node() needs the lock to synchronize against cpu hotplug
>> - I *assume* try_online_node() needs it as well
> 
> more details on why would be great.
> 
>> Then, there is the possible race condition with user space onlining
>> memory avoided by the lock. Also, currently the lock protects the
>> "online_type" when onlining memory.
> 
> I do not see the race, if the user API triggered online/offline takes a
> range lock on the affected physical memory range

Yeah, and that's still future work. Another item on the list.

> 
>> Then, there might be other global variables (eventually
>> zone/node/section related) that might need this lock right now - no
>> details known.
> 
> zones/nodes have their own locking for spans. Sections should be using
> a low level locking but I am not really sure this is needed if there is
> a mem hotplug lock in place (range or global)
> 
>> IOW, we have to be very carefully and it is more involved than it might
>> seem.
> 
> I am not questioning that. And that is why I am asking about a todo list
> for that transition.

I think somebody will have to invest quite some effort to create that
todo list first :) (I'd love to provide more information right now, but
I don't really have more)

> 
>> Locking is definitely better (and more reliably!) than one year ago, but
>> there is definitely a lot to do. (unfortunately, just like in many areas
>> in memory hotplug code :( - say zone handling when offlining/failing to
>> online memory).
> 
> Yeah, the code is shaping up. And I am happy to see that happening. But
> please try to understand that I really do not like to see some ad-hoc
> locking enforcement without a clear locking model in place. This patch
> is an example of it. Whoever would like to rationalize locking further
> will have to stumble over this and scratch head why the hack the locking
> is there and my experience tells me that people usually go along with
> existing code and make further assumptions based on that so we are
> unlikely to get rid of the locking...

I do understand, but we really have to rethink locking in a broader
sense and document it. Here, I am going to add a comment as requested by
Rafael.

Patch

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 0e28270b0fd8..cbc9d64b48dd 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -2204,7 +2204,9 @@  int __init acpi_scan_init(void)
 	acpi_gpe_apply_masked_gpes();
 	acpi_update_all_gpes();
 
+	lock_device_hotplug();
 	mutex_lock(&acpi_scan_lock);
+
 	/*
 	 * Enumerate devices in the ACPI namespace.
 	 */
@@ -2232,6 +2234,7 @@  int __init acpi_scan_init(void)
 
  out:
 	mutex_unlock(&acpi_scan_lock);
+	unlock_device_hotplug();
 	return result;
 }