All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Documentation/power: Update docs about suspend and CPU hotplug
@ 2011-10-11 20:19 Srivatsa S. Bhat
  2011-10-11 22:02 ` Rafael J. Wysocki
  0 siblings, 1 reply; 12+ messages in thread
From: Srivatsa S. Bhat @ 2011-10-11 20:19 UTC (permalink / raw)
  To: rdunlap
  Cc: rjw, len.brown, pavel, a.p.zijlstra, linux-pm, linux-doc, linux-kernel

Update the documentation about the interaction between the suspend (S3) call
path and the CPU hotplug infrastructure.
This patch focusses only on the activities of the freezer, cpu hotplug and
the notifications involved. It outlines how regular CPU hotplug differs from
the way it is invoked during suspend and also tries to explain the locking
involved.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 Documentation/power/00-INDEX                   |    2 
 Documentation/power/suspend-and-cpuhotplug.txt |  113 ++++++++++++++++++++++++
 2 files changed, 115 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/power/suspend-and-cpuhotplug.txt

diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
index 45e9d4a..a4d682f 100644
--- a/Documentation/power/00-INDEX
+++ b/Documentation/power/00-INDEX
@@ -26,6 +26,8 @@ s2ram.txt
 	- How to get suspend to ram working (and debug it when it isn't)
 states.txt
 	- System power management states
+suspend-and-cpuhotplug.txt
+	- Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug
 swsusp-and-swap-files.txt
 	- Using swap files with software suspend (to disk)
 swsusp-dmcrypt.txt
diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
new file mode 100644
index 0000000..d0ba411
--- /dev/null
+++ b/Documentation/power/suspend-and-cpuhotplug.txt
@@ -0,0 +1,113 @@
+Interaction of Suspend code (S3) with the CPU hotplug infrastructure
+   (C) 2011 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>, GPL
+
+
+I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
+
+Well, a picture speaks more than a thousand words... So ASCII art follows :-)
+
+[This depicts the current design in the kernel, and focusses only on the
+interactions between suspend call paths involving the freezer and cpu hotplug
+and also tries to explain the locking involved. It also outlines the
+notifications involved.]
+
+On a high level, the suspend-resume cycle goes like this:
+
+|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
+|tasks |    |     cpus      |    |          |    |     cpus     |    |tasks|
+
+
+More details follow:
+
+Regular CPU hotplug                                   Suspend call path
+-------------------                              ---------------------------
+
+Write 0 (or 1) to                                     Write 'mem' to
+/sys/devices/system/cpu/cpu*/online                   /sys/power/state
+     sysfs file                                          syfs file
+         |                                                   |
+         |                                                   v
+         |                                         Acquire pm_mutex lock
+         |                                                   |
+         |                                                   v
+         |                                Send PM_SUSPEND_PREPARE notifications
+         |                                                   |
+         |                                                   v
+         |                                              Freeze tasks
+         |                                                   |
+         |                                                   |
+         v                                                   v
+     cpu_down()                                 disable_nonboot_cpus() /*start*/
+         |                                                   |
+         v                                                   v
+Acquire cpu_add_remove_lock                     Acquire cpu_add_remove_lock
+         |                                                   |
+         v                                                   v
+If cpu_hotplug_disabled is 1                 Iterate over CURRENTLY online CPUs
+   return gracefully                                         |
+         |                                                   |
+         |                                                   |           ----
+         v                                                   v               |
+           \                                               /                 |
+             --------                             --------                   |
+                      \                         /                            |
+                        --------       --------                              |L
+                                 \____/                                      |
+                                   |                                         |
+                                   v                                         |O
+                              _cpu_down()                                    |
+                     [This takes cpuhotplug.lock                             |
+                      before taking down the CPU                             |
+                      and releases it when done]                             |O
+                   While it is at it, notifications                          |
+                   are sent when notable events occur,                       |
+                   by running all registered callbacks.                      |
+                                   |                                         |O
+                                  / \                                        |
+                                 /   \                                       |
+                                <     >                                      |
+        _______________________/       \_____________________                |P
+       |                                                     |               |
+       v                                                     v               |
+Release cpu_add_remove_lock                      Note down these cpus in     |
+[That's it!, for                                     frozen_cpus mask    ----
+ regular CPU hotplug]                                        |
+                                                             v
+                                                  Disable regular cpu hotplug
+                                              by setting cpu_hotplug_disabled=1
+                                                             |
+                                                             v
+                                                Release cpu_add_remove_lock
+                                                             |
+                                                             v
+                                          /* disable_nonboot_cpus() complete */
+                                                             |
+                                                             v
+                                                        Do suspend
+
+
+Resuming back is likewise, with the counterparts being (in the order of
+execution during resume):
+* enable_nonboot_cpus() which involves:
+   |  Acquire cpu_add_remove_lock
+   |  Reset cpu_hotplug_disabled to 0, thereby enabling regular cpu hotplug
+   |  Call _cpu_up() [for all those cpus in the frozen_cpus mask, in a loop]
+   |  Release cpu_add_remove_lock
+   v
+
+* thaw tasks
+* send PM_POST_SUSPEND notifications
+* Release pm_mutex lock.
+
+It is to be noted here that the pm_mutex lock is acquired at the very
+beginning, when we are just starting out to suspend, and then released only
+after the entire cycle is complete (i.e., suspend + resume).
+
+
+Important files and functions/entry points:
+------------------------------------------
+
+kernel/power/process.c : freeze_processes(), thaw_processes()
+kernel/power/suspend.c : suspend_prepare(), suspend_enter(), suspend_finish()
+kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](), [disable|enable]_nonboot_cpus()
+


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] Documentation/power: Update docs about suspend and CPU hotplug
  2011-10-11 20:19 [PATCH] Documentation/power: Update docs about suspend and CPU hotplug Srivatsa S. Bhat
@ 2011-10-11 22:02 ` Rafael J. Wysocki
  2011-10-12  4:17   ` Srivatsa S. Bhat
  0 siblings, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2011-10-11 22:02 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: rdunlap, len.brown, pavel, a.p.zijlstra, linux-pm, linux-doc,
	linux-kernel

On Tuesday, October 11, 2011, Srivatsa S. Bhat wrote:
> Update the documentation about the interaction between the suspend (S3) call
> path and the CPU hotplug infrastructure.
> This patch focusses only on the activities of the freezer, cpu hotplug and
> the notifications involved. It outlines how regular CPU hotplug differs from
> the way it is invoked during suspend and also tries to explain the locking
> involved.
> 
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
> 
>  Documentation/power/00-INDEX                   |    2 
>  Documentation/power/suspend-and-cpuhotplug.txt |  113 ++++++++++++++++++++++++
>  2 files changed, 115 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/power/suspend-and-cpuhotplug.txt
> 
> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
> index 45e9d4a..a4d682f 100644
> --- a/Documentation/power/00-INDEX
> +++ b/Documentation/power/00-INDEX
> @@ -26,6 +26,8 @@ s2ram.txt
>  	- How to get suspend to ram working (and debug it when it isn't)
>  states.txt
>  	- System power management states
> +suspend-and-cpuhotplug.txt
> +	- Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug
>  swsusp-and-swap-files.txt
>  	- Using swap files with software suspend (to disk)
>  swsusp-dmcrypt.txt
> diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
> new file mode 100644
> index 0000000..d0ba411
> --- /dev/null
> +++ b/Documentation/power/suspend-and-cpuhotplug.txt
> @@ -0,0 +1,113 @@
> +Interaction of Suspend code (S3) with the CPU hotplug infrastructure
> +   (C) 2011 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>, GPL
> +
> +
> +I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
> +
> +Well, a picture speaks more than a thousand words... So ASCII art follows :-)
> +
> +[This depicts the current design in the kernel, and focusses only on the
> +interactions between suspend call paths involving the freezer and cpu hotplug
> +and also tries to explain the locking involved. It also outlines the
> +notifications involved.]
> +
> +On a high level, the suspend-resume cycle goes like this:
> +
> +|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
> +|tasks |    |     cpus      |    |          |    |     cpus     |    |tasks|
> +
> +
> +More details follow:
> +
> +Regular CPU hotplug                                   Suspend call path
> +-------------------                              ---------------------------
> +
> +Write 0 (or 1) to                                     Write 'mem' to
> +/sys/devices/system/cpu/cpu*/online                   /sys/power/state
> +     sysfs file                                          syfs file
> +         |                                                   |
> +         |                                                   v
> +         |                                         Acquire pm_mutex lock
> +         |                                                   |
> +         |                                                   v
> +         |                                Send PM_SUSPEND_PREPARE notifications
> +         |                                                   |
> +         |                                                   v
> +         |                                              Freeze tasks

OK, so something appears to be missing here.  Namely, the task writing to
/sys/devices/system/cpu/cpu*/online should be frozen at this point or
suspend should be aborted.  I suppose neither of these happens and I wonder
why exactly.

Thanks,
Rafael


> +         |                                                   |
> +         |                                                   |
> +         v                                                   v
> +     cpu_down()                                 disable_nonboot_cpus() /*start*/
> +         |                                                   |
> +         v                                                   v
> +Acquire cpu_add_remove_lock                     Acquire cpu_add_remove_lock
> +         |                                                   |
> +         v                                                   v
> +If cpu_hotplug_disabled is 1                 Iterate over CURRENTLY online CPUs
> +   return gracefully                                         |
> +         |                                                   |
> +         |                                                   |           ----
> +         v                                                   v               |
> +           \                                               /                 |
> +             --------                             --------                   |
> +                      \                         /                            |
> +                        --------       --------                              |L
> +                                 \____/                                      |
> +                                   |                                         |
> +                                   v                                         |O
> +                              _cpu_down()                                    |
> +                     [This takes cpuhotplug.lock                             |
> +                      before taking down the CPU                             |
> +                      and releases it when done]                             |O
> +                   While it is at it, notifications                          |
> +                   are sent when notable events occur,                       |
> +                   by running all registered callbacks.                      |
> +                                   |                                         |O
> +                                  / \                                        |
> +                                 /   \                                       |
> +                                <     >                                      |
> +        _______________________/       \_____________________                |P
> +       |                                                     |               |
> +       v                                                     v               |
> +Release cpu_add_remove_lock                      Note down these cpus in     |
> +[That's it!, for                                     frozen_cpus mask    ----
> + regular CPU hotplug]                                        |
> +                                                             v
> +                                                  Disable regular cpu hotplug
> +                                              by setting cpu_hotplug_disabled=1
> +                                                             |
> +                                                             v
> +                                                Release cpu_add_remove_lock
> +                                                             |
> +                                                             v
> +                                          /* disable_nonboot_cpus() complete */
> +                                                             |
> +                                                             v
> +                                                        Do suspend
> +
> +
> +Resuming back is likewise, with the counterparts being (in the order of
> +execution during resume):
> +* enable_nonboot_cpus() which involves:
> +   |  Acquire cpu_add_remove_lock
> +   |  Reset cpu_hotplug_disabled to 0, thereby enabling regular cpu hotplug
> +   |  Call _cpu_up() [for all those cpus in the frozen_cpus mask, in a loop]
> +   |  Release cpu_add_remove_lock
> +   v
> +
> +* thaw tasks
> +* send PM_POST_SUSPEND notifications
> +* Release pm_mutex lock.
> +
> +It is to be noted here that the pm_mutex lock is acquired at the very
> +beginning, when we are just starting out to suspend, and then released only
> +after the entire cycle is complete (i.e., suspend + resume).
> +
> +
> +Important files and functions/entry points:
> +------------------------------------------
> +
> +kernel/power/process.c : freeze_processes(), thaw_processes()
> +kernel/power/suspend.c : suspend_prepare(), suspend_enter(), suspend_finish()
> +kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](), [disable|enable]_nonboot_cpus()
> +
> 
> 
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Documentation/power: Update docs about suspend and CPU hotplug
  2011-10-11 22:02 ` Rafael J. Wysocki
@ 2011-10-12  4:17   ` Srivatsa S. Bhat
  2011-10-12 19:19     ` Rafael J. Wysocki
  0 siblings, 1 reply; 12+ messages in thread
From: Srivatsa S. Bhat @ 2011-10-12  4:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: rdunlap, len.brown, pavel, a.p.zijlstra, linux-pm, linux-doc,
	linux-kernel

On 10/12/2011 03:32 AM, Rafael J. Wysocki wrote:
> On Tuesday, October 11, 2011, Srivatsa S. Bhat wrote:
>> Update the documentation about the interaction between the suspend (S3) call
>> path and the CPU hotplug infrastructure.
>> This patch focusses only on the activities of the freezer, cpu hotplug and
>> the notifications involved. It outlines how regular CPU hotplug differs from
>> the way it is invoked during suspend and also tries to explain the locking
>> involved.
>>
>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> ---
>>
>>  Documentation/power/00-INDEX                   |    2 
>>  Documentation/power/suspend-and-cpuhotplug.txt |  113 ++++++++++++++++++++++++
>>  2 files changed, 115 insertions(+), 0 deletions(-)
>>  create mode 100644 Documentation/power/suspend-and-cpuhotplug.txt
>>
>> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
>> index 45e9d4a..a4d682f 100644
>> --- a/Documentation/power/00-INDEX
>> +++ b/Documentation/power/00-INDEX
>> @@ -26,6 +26,8 @@ s2ram.txt
>>  	- How to get suspend to ram working (and debug it when it isn't)
>>  states.txt
>>  	- System power management states
>> +suspend-and-cpuhotplug.txt
>> +	- Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug
>>  swsusp-and-swap-files.txt
>>  	- Using swap files with software suspend (to disk)
>>  swsusp-dmcrypt.txt
>> diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
>> new file mode 100644
>> index 0000000..d0ba411
>> --- /dev/null
>> +++ b/Documentation/power/suspend-and-cpuhotplug.txt
>> @@ -0,0 +1,113 @@
>> +Interaction of Suspend code (S3) with the CPU hotplug infrastructure
>> +   (C) 2011 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>, GPL
>> +
>> +
>> +I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
>> +
>> +Well, a picture speaks more than a thousand words... So ASCII art follows :-)
>> +
>> +[This depicts the current design in the kernel, and focusses only on the
>> +interactions between suspend call paths involving the freezer and cpu hotplug
>> +and also tries to explain the locking involved. It also outlines the
>> +notifications involved.]
>> +
>> +On a high level, the suspend-resume cycle goes like this:
>> +
>> +|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
>> +|tasks |    |     cpus      |    |          |    |     cpus     |    |tasks|
>> +
>> +
>> +More details follow:
>> +
>> +Regular CPU hotplug                                   Suspend call path
>> +-------------------                              ---------------------------
>> +
>> +Write 0 (or 1) to                                     Write 'mem' to
>> +/sys/devices/system/cpu/cpu*/online                   /sys/power/state
>> +     sysfs file                                          syfs file
>> +         |                                                   |
>> +         |                                                   v
>> +         |                                         Acquire pm_mutex lock
>> +         |                                                   |
>> +         |                                                   v
>> +         |                                Send PM_SUSPEND_PREPARE notifications
>> +         |                                                   |
>> +         |                                                   v
>> +         |                                              Freeze tasks
> 
> OK, so something appears to be missing here.  Namely, the task writing to
> /sys/devices/system/cpu/cpu*/online should be frozen at this point or
> suspend should be aborted.  I suppose neither of these happens and I wonder
> why exactly.
> 

I have a couple of clarifications to make here:
* Firstly, this picture is not meant to represent what happens when regular
  cpu hotplug and suspend run together. That race condition has not been
  brought out here. What it does try to explain is, how the regular cpu
  hotplug path is different from suspend, and where they share common code.
  Please don't think about timing/race condition when reading it. Its just
  meant to explain the call path and locking involved.

* Secondly, this picture explains the *current* design, and *not* the mutual
  exclusion design I have proposed between regular cpu hotplug and suspend.
  The reason being, this doc was written to help everyone understand the
  current locking schemes, to help evaluate my proposal for a different
  scheme (mutual exclusion).

Now, coming to your point, if that task writing to the sysfs file has not
been frozen, then the current kernel doesn't abort suspend, which is why we are
encountering problems, and which is exactly what my patchset tries to solve.
Link to my patchset:
http://thread.gmane.org/gmane.linux.documentation/3414/focus=3414

> 
>> +         |                                                   |
>> +         |                                                   |
>> +         v                                                   v
>> +     cpu_down()                                 disable_nonboot_cpus() /*start*/
>> +         |                                                   |
>> +         v                                                   v
>> +Acquire cpu_add_remove_lock                     Acquire cpu_add_remove_lock
>> +         |                                                   |
>> +         v                                                   v
>> +If cpu_hotplug_disabled is 1                 Iterate over CURRENTLY online CPUs
>> +   return gracefully                                         |
>> +         |                                                   |
>> +         |                                                   |           ----
>> +         v                                                   v               |
>> +           \                                               /                 |
>> +             --------                             --------                   |
>> +                      \                         /                            |
>> +                        --------       --------                              |L
>> +                                 \____/                                      |
>> +                                   |                                         |
>> +                                   v                                         |O
>> +                              _cpu_down()                                    |
>> +                     [This takes cpuhotplug.lock                             |
>> +                      before taking down the CPU                             |
>> +                      and releases it when done]                             |O
>> +                   While it is at it, notifications                          |
>> +                   are sent when notable events occur,                       |
>> +                   by running all registered callbacks.                      |
>> +                                   |                                         |O
>> +                                  / \                                        |
>> +                                 /   \                                       |
>> +                                <     >                                      |
>> +        _______________________/       \_____________________                |P
>> +       |                                                     |               |
>> +       v                                                     v               |
>> +Release cpu_add_remove_lock                      Note down these cpus in     |
>> +[That's it!, for                                     frozen_cpus mask    ----
>> + regular CPU hotplug]                                        |
>> +                                                             v
>> +                                                  Disable regular cpu hotplug
>> +                                              by setting cpu_hotplug_disabled=1
>> +                                                             |
>> +                                                             v
>> +                                                Release cpu_add_remove_lock
>> +                                                             |
>> +                                                             v
>> +                                          /* disable_nonboot_cpus() complete */
>> +                                                             |
>> +                                                             v
>> +                                                        Do suspend
>> +
>> +
>> +Resuming back is likewise, with the counterparts being (in the order of
>> +execution during resume):
>> +* enable_nonboot_cpus() which involves:
>> +   |  Acquire cpu_add_remove_lock
>> +   |  Reset cpu_hotplug_disabled to 0, thereby enabling regular cpu hotplug
>> +   |  Call _cpu_up() [for all those cpus in the frozen_cpus mask, in a loop]
>> +   |  Release cpu_add_remove_lock
>> +   v
>> +
>> +* thaw tasks
>> +* send PM_POST_SUSPEND notifications
>> +* Release pm_mutex lock.
>> +
>> +It is to be noted here that the pm_mutex lock is acquired at the very
>> +beginning, when we are just starting out to suspend, and then released only
>> +after the entire cycle is complete (i.e., suspend + resume).
>> +
>> +
>> +Important files and functions/entry points:
>> +------------------------------------------
>> +
>> +kernel/power/process.c : freeze_processes(), thaw_processes()
>> +kernel/power/suspend.c : suspend_prepare(), suspend_enter(), suspend_finish()
>> +kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](), [disable|enable]_nonboot_cpus()
>> +
>>
>>
>>
> 


-- 
Regards,
Srivatsa S. Bhat  <srivatsa.bhat@linux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Documentation/power: Update docs about suspend and CPU hotplug
  2011-10-12  4:17   ` Srivatsa S. Bhat
@ 2011-10-12 19:19     ` Rafael J. Wysocki
  2011-10-12 21:13       ` Srivatsa S. Bhat
  0 siblings, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2011-10-12 19:19 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: rdunlap, len.brown, pavel, a.p.zijlstra, linux-pm, linux-doc,
	linux-kernel

On Wednesday, October 12, 2011, Srivatsa S. Bhat wrote:
> On 10/12/2011 03:32 AM, Rafael J. Wysocki wrote:
> > On Tuesday, October 11, 2011, Srivatsa S. Bhat wrote:
> >> Update the documentation about the interaction between the suspend (S3) call
> >> path and the CPU hotplug infrastructure.
> >> This patch focusses only on the activities of the freezer, cpu hotplug and
> >> the notifications involved. It outlines how regular CPU hotplug differs from
> >> the way it is invoked during suspend and also tries to explain the locking
> >> involved.
> >>
> >> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> >> ---
> >>
> >>  Documentation/power/00-INDEX                   |    2 
> >>  Documentation/power/suspend-and-cpuhotplug.txt |  113 ++++++++++++++++++++++++
> >>  2 files changed, 115 insertions(+), 0 deletions(-)
> >>  create mode 100644 Documentation/power/suspend-and-cpuhotplug.txt
> >>
> >> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
> >> index 45e9d4a..a4d682f 100644
> >> --- a/Documentation/power/00-INDEX
> >> +++ b/Documentation/power/00-INDEX
> >> @@ -26,6 +26,8 @@ s2ram.txt
> >>  	- How to get suspend to ram working (and debug it when it isn't)
> >>  states.txt
> >>  	- System power management states
> >> +suspend-and-cpuhotplug.txt
> >> +	- Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug
> >>  swsusp-and-swap-files.txt
> >>  	- Using swap files with software suspend (to disk)
> >>  swsusp-dmcrypt.txt
> >> diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
> >> new file mode 100644
> >> index 0000000..d0ba411
> >> --- /dev/null
> >> +++ b/Documentation/power/suspend-and-cpuhotplug.txt
> >> @@ -0,0 +1,113 @@
> >> +Interaction of Suspend code (S3) with the CPU hotplug infrastructure
> >> +   (C) 2011 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>, GPL
> >> +
> >> +
> >> +I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
> >> +
> >> +Well, a picture speaks more than a thousand words... So ASCII art follows :-)
> >> +
> >> +[This depicts the current design in the kernel, and focusses only on the
> >> +interactions between suspend call paths involving the freezer and cpu hotplug
> >> +and also tries to explain the locking involved. It also outlines the
> >> +notifications involved.]
> >> +
> >> +On a high level, the suspend-resume cycle goes like this:
> >> +
> >> +|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
> >> +|tasks |    |     cpus      |    |          |    |     cpus     |    |tasks|
> >> +
> >> +
> >> +More details follow:
> >> +
> >> +Regular CPU hotplug                                   Suspend call path
> >> +-------------------                              ---------------------------
> >> +
> >> +Write 0 (or 1) to                                     Write 'mem' to
> >> +/sys/devices/system/cpu/cpu*/online                   /sys/power/state
> >> +     sysfs file                                          syfs file
> >> +         |                                                   |
> >> +         |                                                   v
> >> +         |                                         Acquire pm_mutex lock
> >> +         |                                                   |
> >> +         |                                                   v
> >> +         |                                Send PM_SUSPEND_PREPARE notifications
> >> +         |                                                   |
> >> +         |                                                   v
> >> +         |                                              Freeze tasks
> > 
> > OK, so something appears to be missing here.  Namely, the task writing to
> > /sys/devices/system/cpu/cpu*/online should be frozen at this point or
> > suspend should be aborted.  I suppose neither of these happens and I wonder
> > why exactly.
> > 
> 
> I have a couple of clarifications to make here:
> * Firstly, this picture is not meant to represent what happens when regular
>   cpu hotplug and suspend run together. That race condition has not been
>   brought out here. What it does try to explain is, how the regular cpu
>   hotplug path is different from suspend, and where they share common code.
>   Please don't think about timing/race condition when reading it. Its just
>   meant to explain the call path and locking involved.

Well, I didn't understand this part.  And the question above is:

> I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?

which kind of suggests something different from what you're saying.  Care to
clarify that in the document?

> * Secondly, this picture explains the *current* design, and *not* the mutual
>   exclusion design I have proposed between regular cpu hotplug and suspend.
>   The reason being, this doc was written to help everyone understand the
>   current locking schemes, to help evaluate my proposal for a different
>   scheme (mutual exclusion).

I understand that.

> Now, coming to your point, if that task writing to the sysfs file has not
> been frozen, then the current kernel doesn't abort suspend, which is why we are
> encountering problems, and which is exactly what my patchset tries to solve.
> Link to my patchset:
> http://thread.gmane.org/gmane.linux.documentation/3414/focus=3414

This isn't my point, actually.  My point is that the task writing to
/sys/devices/system/cpu/cpu*/online should be frozen by the freezer.
If it is not frozen, then the freezer should fail.  If that doesn't
happen, there's a bug that has to be fixed and it is _not_ the lack
of mutual exclusion.  The bug is that, apparently, suspend continues
even though there is an unfrozen user space process in the system.

Do you have any idea why that happens?

Rafael

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Documentation/power: Update docs about suspend and CPU hotplug
  2011-10-12 19:19     ` Rafael J. Wysocki
@ 2011-10-12 21:13       ` Srivatsa S. Bhat
  2011-10-14  6:24         ` [PATCH v2] " Srivatsa S. Bhat
  2011-10-15 22:42         ` [PATCH] " Rafael J. Wysocki
  0 siblings, 2 replies; 12+ messages in thread
From: Srivatsa S. Bhat @ 2011-10-12 21:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: rdunlap, len.brown, pavel, a.p.zijlstra, linux-pm, linux-doc,
	linux-kernel

On 10/13/2011 12:49 AM, Rafael J. Wysocki wrote:
> On Wednesday, October 12, 2011, Srivatsa S. Bhat wrote:
>> On 10/12/2011 03:32 AM, Rafael J. Wysocki wrote:
>>> On Tuesday, October 11, 2011, Srivatsa S. Bhat wrote:
>>>> Update the documentation about the interaction between the suspend (S3) call
>>>> path and the CPU hotplug infrastructure.
>>>> This patch focusses only on the activities of the freezer, cpu hotplug and
>>>> the notifications involved. It outlines how regular CPU hotplug differs from
>>>> the way it is invoked during suspend and also tries to explain the locking
>>>> involved.
>>>>
>>>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>>> ---
>>>>
>>>>  Documentation/power/00-INDEX                   |    2 
>>>>  Documentation/power/suspend-and-cpuhotplug.txt |  113 ++++++++++++++++++++++++
>>>>  2 files changed, 115 insertions(+), 0 deletions(-)
>>>>  create mode 100644 Documentation/power/suspend-and-cpuhotplug.txt
>>>>
>>>> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
>>>> index 45e9d4a..a4d682f 100644
>>>> --- a/Documentation/power/00-INDEX
>>>> +++ b/Documentation/power/00-INDEX
>>>> @@ -26,6 +26,8 @@ s2ram.txt
>>>>  	- How to get suspend to ram working (and debug it when it isn't)
>>>>  states.txt
>>>>  	- System power management states
>>>> +suspend-and-cpuhotplug.txt
>>>> +	- Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug
>>>>  swsusp-and-swap-files.txt
>>>>  	- Using swap files with software suspend (to disk)
>>>>  swsusp-dmcrypt.txt
>>>> diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
>>>> new file mode 100644
>>>> index 0000000..d0ba411
>>>> --- /dev/null
>>>> +++ b/Documentation/power/suspend-and-cpuhotplug.txt
>>>> @@ -0,0 +1,113 @@
>>>> +Interaction of Suspend code (S3) with the CPU hotplug infrastructure
>>>> +   (C) 2011 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>, GPL
>>>> +
>>>> +
>>>> +I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
>>>> +
>>>> +Well, a picture speaks more than a thousand words... So ASCII art follows :-)
>>>> +
>>>> +[This depicts the current design in the kernel, and focusses only on the
>>>> +interactions between suspend call paths involving the freezer and cpu hotplug
>>>> +and also tries to explain the locking involved. It also outlines the
>>>> +notifications involved.]
>>>> +
>>>> +On a high level, the suspend-resume cycle goes like this:
>>>> +
>>>> +|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
>>>> +|tasks |    |     cpus      |    |          |    |     cpus     |    |tasks|
>>>> +
>>>> +
>>>> +More details follow:
>>>> +
>>>> +Regular CPU hotplug                                   Suspend call path
>>>> +-------------------                              ---------------------------
>>>> +
>>>> +Write 0 (or 1) to                                     Write 'mem' to
>>>> +/sys/devices/system/cpu/cpu*/online                   /sys/power/state
>>>> +     sysfs file                                          syfs file
>>>> +         |                                                   |
>>>> +         |                                                   v
>>>> +         |                                         Acquire pm_mutex lock
>>>> +         |                                                   |
>>>> +         |                                                   v
>>>> +         |                                Send PM_SUSPEND_PREPARE notifications
>>>> +         |                                                   |
>>>> +         |                                                   v
>>>> +         |                                              Freeze tasks
>>>
>>> OK, so something appears to be missing here.  Namely, the task writing to
>>> /sys/devices/system/cpu/cpu*/online should be frozen at this point or
>>> suspend should be aborted.  I suppose neither of these happens and I wonder
>>> why exactly.
>>>
>>
>> I have a couple of clarifications to make here:
>> * Firstly, this picture is not meant to represent what happens when regular
>>   cpu hotplug and suspend run together. That race condition has not been
>>   brought out here. What it does try to explain is, how the regular cpu
>>   hotplug path is different from suspend, and where they share common code.
>>   Please don't think about timing/race condition when reading it. Its just
>>   meant to explain the call path and locking involved.
> 
> Well, I didn't understand this part.  And the question above is:
> 
>> I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
> 
> which kind of suggests something different from what you're saying.  Care to
> clarify that in the document?
> 

Ok, I get it. I'll clarify the question and post the next version.

>> * Secondly, this picture explains the *current* design, and *not* the mutual
>>   exclusion design I have proposed between regular cpu hotplug and suspend.
>>   The reason being, this doc was written to help everyone understand the
>>   current locking schemes, to help evaluate my proposal for a different
>>   scheme (mutual exclusion).
> 
> I understand that.
> 
>> Now, coming to your point, if that task writing to the sysfs file has not
>> been frozen, then the current kernel doesn't abort suspend, which is why we are
>> encountering problems, and which is exactly what my patchset tries to solve.
>> Link to my patchset:
>> http://thread.gmane.org/gmane.linux.documentation/3414/focus=3414
> 
> This isn't my point, actually.  My point is that the task writing to
> /sys/devices/system/cpu/cpu*/online should be frozen by the freezer.
> If it is not frozen, then the freezer should fail.  If that doesn't
> happen, there's a bug that has to be fixed and it is _not_ the lack
> of mutual exclusion.  The bug is that, apparently, suspend continues
> even though there is an unfrozen user space process in the system.
> 
> Do you have any idea why that happens?
> 

Sorry, I think I explained it wrong above. The freezer doesn't have any bugs
in this context. If it fails to freeze the tasks, suspend does get aborted.

But the point here is, suppose the task writing to the 'online' sysfs
file has already entered the kernel, and _now_ the freezer started
freezing tasks, it might encounter trouble in freezing that cpu hotplug
operation that has already begun (because the cpu hotplug online operation
waits on the frozen userspace to get microcode). 

So, to clarify again, regular cpu hotplug and the cpu hotplug operations
carried out during suspend are properly serialized in the current kernel.
We have no problems there.

The problem is with freezer and regular cpu hotplug racing with each other,
as illustrated in this scenario:
  * the regular cpu online operation continues its journey from userspace
    into the kernel, since the freezing has not yet begun.
  * then freezer starts and freezes userspace.
  * If cpu online has not yet completed the microcode update stuff by now,
    it will now start waiting on the frozen userspace unfortunately.
  * Now the freezer continues and tries to freeze the remaining tasks. But
    due to this wait mentioned above, the freezer won't be able to freeze
    the cpu online hotplug task and hence freezing of tasks fails.
  [This race condition is where the whole problem lies.]

And if freezing of tasks fails, then suspend gets aborted. So no problems
there again.

Thus, since the race condition between freezer and regular cpu hotplug is
where the problem lies, my patch prevents the two from running simultaneously
by implementing mutual exclusion. But it so happens that since freezer is
almost the first step in the suspend/hibernate path, and since pm_mutex
is acquired in this path at the very beginning and released only after all
the tasks are thawed, I reused this lock to implement the mutual exclusion,
which now translates (from our requirement of mutual exclusion between 
just freezer and regular cpu hotplug) to a mutual exclusion between regular
cpu hotplug and the entire suspend operation.

-- 
Regards,
Srivatsa S. Bhat  <srivatsa.bhat@linux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] Documentation/power: Update docs about suspend and CPU hotplug
  2011-10-12 21:13       ` Srivatsa S. Bhat
@ 2011-10-14  6:24         ` Srivatsa S. Bhat
  2011-10-14 15:44           ` Alan Stern
  2011-10-15 22:42         ` [PATCH] " Rafael J. Wysocki
  1 sibling, 1 reply; 12+ messages in thread
From: Srivatsa S. Bhat @ 2011-10-14  6:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: rdunlap, len.brown, pavel, a.p.zijlstra, linux-pm, linux-doc,
	linux-kernel

Update the documentation about the interaction between the suspend (S3) call
path and the CPU hotplug infrastructure.
This patch focusses only on the activities of the freezer, cpu hotplug and
the notifications involved. It outlines how regular CPU hotplug differs from
the way it is invoked during suspend and also tries to explain the locking
involved.

v2: Clarified the question, to emphasize that the document explains only
    the difference (and similarity) in the two code paths but not what happens
    when race conditions occur between them.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 Documentation/power/00-INDEX                   |    2 
 Documentation/power/suspend-and-cpuhotplug.txt |  118 ++++++++++++++++++++++++
 2 files changed, 120 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/power/suspend-and-cpuhotplug.txt

diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
index 45e9d4a..a4d682f 100644
--- a/Documentation/power/00-INDEX
+++ b/Documentation/power/00-INDEX
@@ -26,6 +26,8 @@ s2ram.txt
 	- How to get suspend to ram working (and debug it when it isn't)
 states.txt
 	- System power management states
+suspend-and-cpuhotplug.txt
+	- Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug
 swsusp-and-swap-files.txt
 	- Using swap files with software suspend (to disk)
 swsusp-dmcrypt.txt
diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
new file mode 100644
index 0000000..be10902
--- /dev/null
+++ b/Documentation/power/suspend-and-cpuhotplug.txt
@@ -0,0 +1,118 @@
+Interaction of Suspend code (S3) with the CPU hotplug infrastructure
+   (C) 2011 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>, GPL
+
+
+I. How does the regular CPU hotplug code differ from how the Suspend-to-RAM
+   infrastructure uses it internally? And where do the two of them share
+   common code?
+
+Well, a picture speaks more than a thousand words... So ASCII art follows :-)
+
+[This depicts the current design in the kernel, and focusses only on the
+interactions involving the freezer and cpu hotplug and also tries to explain
+the locking involved. It outlines the notifications involved as well.
+But please note that here, only the call paths are illustrated, with the aim
+of describing where they take different paths and where they share code.
+What happens when regular CPU hotplug and Suspend-to-RAM race with each other
+is not depicted here.]
+
+On a high level, the suspend-resume cycle goes like this:
+
+|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
+|tasks |    |     cpus      |    |          |    |     cpus     |    |tasks|
+
+
+More details follow:
+
+Regular CPU hotplug                                   Suspend call path
+-------------------                              ---------------------------
+
+Write 0 (or 1) to                                     Write 'mem' to
+/sys/devices/system/cpu/cpu*/online                   /sys/power/state
+     sysfs file                                          syfs file
+         |                                                   |
+         |                                                   v
+         |                                         Acquire pm_mutex lock
+         |                                                   |
+         |                                                   v
+         |                                Send PM_SUSPEND_PREPARE notifications
+         |                                                   |
+         |                                                   v
+         |                                              Freeze tasks
+         |                                                   |
+         |                                                   |
+         v                                                   v
+     cpu_down()                                 disable_nonboot_cpus() /*start*/
+         |                                                   |
+         v                                                   v
+Acquire cpu_add_remove_lock                     Acquire cpu_add_remove_lock
+         |                                                   |
+         v                                                   v
+If cpu_hotplug_disabled is 1                 Iterate over CURRENTLY online CPUs
+   return gracefully                                         |
+         |                                                   |
+         |                                                   |           ----
+         v                                                   v               |
+           \                                               /                 |
+             --------                             --------                   |
+                      \                         /                            |
+                        --------       --------                              |L
+                                 \____/                                      |
+                                   |                                         |
+                                   v                                         |O
+                              _cpu_down()                                    |
+                     [This takes cpuhotplug.lock                             |
+                      before taking down the CPU                             |
+                      and releases it when done]                             |O
+                   While it is at it, notifications                          |
+                   are sent when notable events occur,                       |
+                   by running all registered callbacks.                      |
+                                   |                                         |O
+                                  / \                                        |
+                                 /   \                                       |
+                                <     >                                      |
+        _______________________/       \_____________________                |P
+       |                                                     |               |
+       v                                                     v               |
+Release cpu_add_remove_lock                      Note down these cpus in     |
+[That's it!, for                                     frozen_cpus mask    ----
+ regular CPU hotplug]                                        |
+                                                             v
+                                                  Disable regular cpu hotplug
+                                              by setting cpu_hotplug_disabled=1
+                                                             |
+                                                             v
+                                                Release cpu_add_remove_lock
+                                                             |
+                                                             v
+                                          /* disable_nonboot_cpus() complete */
+                                                             |
+                                                             v
+                                                        Do suspend
+
+
+Resuming back is likewise, with the counterparts being (in the order of
+execution during resume):
+* enable_nonboot_cpus() which involves:
+   |  Acquire cpu_add_remove_lock
+   |  Reset cpu_hotplug_disabled to 0, thereby enabling regular cpu hotplug
+   |  Call _cpu_up() [for all those cpus in the frozen_cpus mask, in a loop]
+   |  Release cpu_add_remove_lock
+   v
+
+* thaw tasks
+* send PM_POST_SUSPEND notifications
+* Release pm_mutex lock.
+
+It is to be noted here that the pm_mutex lock is acquired at the very
+beginning, when we are just starting out to suspend, and then released only
+after the entire cycle is complete (i.e., suspend + resume).
+
+
+Important files and functions/entry points:
+------------------------------------------
+
+kernel/power/process.c : freeze_processes(), thaw_processes()
+kernel/power/suspend.c : suspend_prepare(), suspend_enter(), suspend_finish()
+kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](), [disable|enable]_nonboot_cpus()
+



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] Documentation/power: Update docs about suspend and CPU hotplug
  2011-10-14  6:24         ` [PATCH v2] " Srivatsa S. Bhat
@ 2011-10-14 15:44           ` Alan Stern
  2011-10-14 18:18             ` Srivatsa S. Bhat
  2011-10-17 13:10             ` [PATCH v3] " Srivatsa S. Bhat
  0 siblings, 2 replies; 12+ messages in thread
From: Alan Stern @ 2011-10-14 15:44 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, rdunlap, len.brown, pavel, a.p.zijlstra,
	linux-pm, linux-doc, linux-kernel

On Fri, 14 Oct 2011, Srivatsa S. Bhat wrote:

> Update the documentation about the interaction between the suspend (S3) call
> path and the CPU hotplug infrastructure.
> This patch focusses only on the activities of the freezer, cpu hotplug and
> the notifications involved. It outlines how regular CPU hotplug differs from
> the way it is invoked during suspend and also tries to explain the locking
> involved.
> 
> v2: Clarified the question, to emphasize that the document explains only
>     the difference (and similarity) in the two code paths but not what happens
>     when race conditions occur between them.

Drawing the two diagrams in parallel, the way this does, carries a very
strong implication that the events being described happen in parallel.  
It really would be much better to have two separate diagrams and point
out the common portions.

Also, the document should discuss the issues involved in CPU hotplug.  
In particular, what happens if the CPUs are not all the same, or if the 
requiring differing microcodes.  And whether or not the microcode needs 
to get reloaded (presumably only after hibernation, not after suspend).
And what happens when a CPU is hot-unplugged and replaced with a 
differeing CPU which is then hot-plugged.

Alan Stern


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] Documentation/power: Update docs about suspend and CPU hotplug
  2011-10-14 15:44           ` Alan Stern
@ 2011-10-14 18:18             ` Srivatsa S. Bhat
  2011-10-17 13:10             ` [PATCH v3] " Srivatsa S. Bhat
  1 sibling, 0 replies; 12+ messages in thread
From: Srivatsa S. Bhat @ 2011-10-14 18:18 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, rdunlap, len.brown, pavel, a.p.zijlstra,
	linux-pm, linux-doc, linux-kernel

On 10/14/2011 09:14 PM, Alan Stern wrote:
> On Fri, 14 Oct 2011, Srivatsa S. Bhat wrote:
> 
>> Update the documentation about the interaction between the suspend (S3) call
>> path and the CPU hotplug infrastructure.
>> This patch focusses only on the activities of the freezer, cpu hotplug and
>> the notifications involved. It outlines how regular CPU hotplug differs from
>> the way it is invoked during suspend and also tries to explain the locking
>> involved.
>>
>> v2: Clarified the question, to emphasize that the document explains only
>>     the difference (and similarity) in the two code paths but not what happens
>>     when race conditions occur between them.
> 
> Drawing the two diagrams in parallel, the way this does, carries a very
> strong implication that the events being described happen in parallel.  
> It really would be much better to have two separate diagrams and point
> out the common portions.
> 

I felt that drawing this way side-by-side would make it easier to see where they
differ and where they call the same code. But if this is really causing everyone
to believe that it represents events happening in parallel, then I will think
of separating the two diagrams and still somehow effectively explain what I set
out to explain in this document. Thank you for pointing it out.

> Also, the document should discuss the issues involved in CPU hotplug.  
> In particular, what happens if the CPUs are not all the same, or if the 
> requiring differing microcodes.  And whether or not the microcode needs 
> to get reloaded (presumably only after hibernation, not after suspend).
> And what happens when a CPU is hot-unplugged and replaced with a 
> differeing CPU which is then hot-plugged.
> 

Ok, I'll add another section that deals with the whole CPU hotplug and microcode
story. I'll include it in the next version. Thank you.

-- 
Regards,
Srivatsa S. Bhat  <srivatsa.bhat@linux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Documentation/power: Update docs about suspend and CPU hotplug
  2011-10-12 21:13       ` Srivatsa S. Bhat
  2011-10-14  6:24         ` [PATCH v2] " Srivatsa S. Bhat
@ 2011-10-15 22:42         ` Rafael J. Wysocki
  2011-10-16  0:14           ` Srivatsa S. Bhat
  1 sibling, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2011-10-15 22:42 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: rdunlap, len.brown, pavel, a.p.zijlstra, linux-pm, linux-doc,
	linux-kernel

On Wednesday, October 12, 2011, Srivatsa S. Bhat wrote:
> On 10/13/2011 12:49 AM, Rafael J. Wysocki wrote:
> > On Wednesday, October 12, 2011, Srivatsa S. Bhat wrote:
> >> On 10/12/2011 03:32 AM, Rafael J. Wysocki wrote:
> >>> On Tuesday, October 11, 2011, Srivatsa S. Bhat wrote:
> >>>> Update the documentation about the interaction between the suspend (S3) call
> >>>> path and the CPU hotplug infrastructure.
> >>>> This patch focusses only on the activities of the freezer, cpu hotplug and
> >>>> the notifications involved. It outlines how regular CPU hotplug differs from
> >>>> the way it is invoked during suspend and also tries to explain the locking
> >>>> involved.
> >>>>
> >>>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> >>>> ---
> >>>>
> >>>>  Documentation/power/00-INDEX                   |    2 
> >>>>  Documentation/power/suspend-and-cpuhotplug.txt |  113 ++++++++++++++++++++++++
> >>>>  2 files changed, 115 insertions(+), 0 deletions(-)
> >>>>  create mode 100644 Documentation/power/suspend-and-cpuhotplug.txt
> >>>>
> >>>> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
> >>>> index 45e9d4a..a4d682f 100644
> >>>> --- a/Documentation/power/00-INDEX
> >>>> +++ b/Documentation/power/00-INDEX
> >>>> @@ -26,6 +26,8 @@ s2ram.txt
> >>>>  	- How to get suspend to ram working (and debug it when it isn't)
> >>>>  states.txt
> >>>>  	- System power management states
> >>>> +suspend-and-cpuhotplug.txt
> >>>> +	- Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug
> >>>>  swsusp-and-swap-files.txt
> >>>>  	- Using swap files with software suspend (to disk)
> >>>>  swsusp-dmcrypt.txt
> >>>> diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
> >>>> new file mode 100644
> >>>> index 0000000..d0ba411
> >>>> --- /dev/null
> >>>> +++ b/Documentation/power/suspend-and-cpuhotplug.txt
> >>>> @@ -0,0 +1,113 @@
> >>>> +Interaction of Suspend code (S3) with the CPU hotplug infrastructure
> >>>> +   (C) 2011 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>, GPL
> >>>> +
> >>>> +
> >>>> +I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
> >>>> +
> >>>> +Well, a picture speaks more than a thousand words... So ASCII art follows :-)
> >>>> +
> >>>> +[This depicts the current design in the kernel, and focusses only on the
> >>>> +interactions between suspend call paths involving the freezer and cpu hotplug
> >>>> +and also tries to explain the locking involved. It also outlines the
> >>>> +notifications involved.]
> >>>> +
> >>>> +On a high level, the suspend-resume cycle goes like this:
> >>>> +
> >>>> +|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
> >>>> +|tasks |    |     cpus      |    |          |    |     cpus     |    |tasks|
> >>>> +
> >>>> +
> >>>> +More details follow:
> >>>> +
> >>>> +Regular CPU hotplug                                   Suspend call path
> >>>> +-------------------                              ---------------------------
> >>>> +
> >>>> +Write 0 (or 1) to                                     Write 'mem' to
> >>>> +/sys/devices/system/cpu/cpu*/online                   /sys/power/state
> >>>> +     sysfs file                                          syfs file
> >>>> +         |                                                   |
> >>>> +         |                                                   v
> >>>> +         |                                         Acquire pm_mutex lock
> >>>> +         |                                                   |
> >>>> +         |                                                   v
> >>>> +         |                                Send PM_SUSPEND_PREPARE notifications
> >>>> +         |                                                   |
> >>>> +         |                                                   v
> >>>> +         |                                              Freeze tasks
> >>>
> >>> OK, so something appears to be missing here.  Namely, the task writing to
> >>> /sys/devices/system/cpu/cpu*/online should be frozen at this point or
> >>> suspend should be aborted.  I suppose neither of these happens and I wonder
> >>> why exactly.
> >>>
> >>
> >> I have a couple of clarifications to make here:
> >> * Firstly, this picture is not meant to represent what happens when regular
> >>   cpu hotplug and suspend run together. That race condition has not been
> >>   brought out here. What it does try to explain is, how the regular cpu
> >>   hotplug path is different from suspend, and where they share common code.
> >>   Please don't think about timing/race condition when reading it. Its just
> >>   meant to explain the call path and locking involved.
> > 
> > Well, I didn't understand this part.  And the question above is:
> > 
> >> I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
> > 
> > which kind of suggests something different from what you're saying.  Care to
> > clarify that in the document?
> > 
> 
> Ok, I get it. I'll clarify the question and post the next version.
> 
> >> * Secondly, this picture explains the *current* design, and *not* the mutual
> >>   exclusion design I have proposed between regular cpu hotplug and suspend.
> >>   The reason being, this doc was written to help everyone understand the
> >>   current locking schemes, to help evaluate my proposal for a different
> >>   scheme (mutual exclusion).
> > 
> > I understand that.
> > 
> >> Now, coming to your point, if that task writing to the sysfs file has not
> >> been frozen, then the current kernel doesn't abort suspend, which is why we are
> >> encountering problems, and which is exactly what my patchset tries to solve.
> >> Link to my patchset:
> >> http://thread.gmane.org/gmane.linux.documentation/3414/focus=3414
> > 
> > This isn't my point, actually.  My point is that the task writing to
> > /sys/devices/system/cpu/cpu*/online should be frozen by the freezer.
> > If it is not frozen, then the freezer should fail.  If that doesn't
> > happen, there's a bug that has to be fixed and it is _not_ the lack
> > of mutual exclusion.  The bug is that, apparently, suspend continues
> > even though there is an unfrozen user space process in the system.
> > 
> > Do you have any idea why that happens?
> > 
> 
> Sorry, I think I explained it wrong above. The freezer doesn't have any bugs
> in this context. If it fails to freeze the tasks, suspend does get aborted.
> 
> But the point here is, suppose the task writing to the 'online' sysfs
> file has already entered the kernel, and _now_ the freezer started
> freezing tasks, it might encounter trouble in freezing that cpu hotplug
> operation that has already begun (because the cpu hotplug online operation
> waits on the frozen userspace to get microcode). 
> 
> So, to clarify again, regular cpu hotplug and the cpu hotplug operations
> carried out during suspend are properly serialized in the current kernel.
> We have no problems there.
> 
> The problem is with freezer and regular cpu hotplug racing with each other,
> as illustrated in this scenario:
>   * the regular cpu online operation continues its journey from userspace
>     into the kernel, since the freezing has not yet begun.
>   * then freezer starts and freezes userspace.
>   * If cpu online has not yet completed the microcode update stuff by now,
>     it will now start waiting on the frozen userspace unfortunately.
>   * Now the freezer continues and tries to freeze the remaining tasks. But
>     due to this wait mentioned above, the freezer won't be able to freeze
>     the cpu online hotplug task and hence freezing of tasks fails.
>   [This race condition is where the whole problem lies.]
> 
> And if freezing of tasks fails, then suspend gets aborted. So no problems
> there again.

That pretty much is how it is supposed to be.

Do I understand correctly that you're attempting to make suspend always
succeed if CPU hotplug stress test is run in parallel with it?

Rafael

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Documentation/power: Update docs about suspend and CPU hotplug
  2011-10-15 22:42         ` [PATCH] " Rafael J. Wysocki
@ 2011-10-16  0:14           ` Srivatsa S. Bhat
  0 siblings, 0 replies; 12+ messages in thread
From: Srivatsa S. Bhat @ 2011-10-16  0:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: rdunlap, len.brown, pavel, a.p.zijlstra, linux-pm, linux-doc,
	linux-kernel

On 10/16/2011 04:12 AM, Rafael J. Wysocki wrote:
> On Wednesday, October 12, 2011, Srivatsa S. Bhat wrote:
>> On 10/13/2011 12:49 AM, Rafael J. Wysocki wrote:
>>> On Wednesday, October 12, 2011, Srivatsa S. Bhat wrote:
>>>> On 10/12/2011 03:32 AM, Rafael J. Wysocki wrote:
>>>>> On Tuesday, October 11, 2011, Srivatsa S. Bhat wrote:
>>>>>> Update the documentation about the interaction between the suspend (S3) call
>>>>>> path and the CPU hotplug infrastructure.
>>>>>> This patch focusses only on the activities of the freezer, cpu hotplug and
>>>>>> the notifications involved. It outlines how regular CPU hotplug differs from
>>>>>> the way it is invoked during suspend and also tries to explain the locking
>>>>>> involved.
>>>>>>
>>>>>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>>>>> ---
>>>>>>
>>>>>>  Documentation/power/00-INDEX                   |    2 
>>>>>>  Documentation/power/suspend-and-cpuhotplug.txt |  113 ++++++++++++++++++++++++
>>>>>>  2 files changed, 115 insertions(+), 0 deletions(-)
>>>>>>  create mode 100644 Documentation/power/suspend-and-cpuhotplug.txt
>>>>>>
>>>>>> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
>>>>>> index 45e9d4a..a4d682f 100644
>>>>>> --- a/Documentation/power/00-INDEX
>>>>>> +++ b/Documentation/power/00-INDEX
>>>>>> @@ -26,6 +26,8 @@ s2ram.txt
>>>>>>  	- How to get suspend to ram working (and debug it when it isn't)
>>>>>>  states.txt
>>>>>>  	- System power management states
>>>>>> +suspend-and-cpuhotplug.txt
>>>>>> +	- Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug
>>>>>>  swsusp-and-swap-files.txt
>>>>>>  	- Using swap files with software suspend (to disk)
>>>>>>  swsusp-dmcrypt.txt
>>>>>> diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
>>>>>> new file mode 100644
>>>>>> index 0000000..d0ba411
>>>>>> --- /dev/null
>>>>>> +++ b/Documentation/power/suspend-and-cpuhotplug.txt
>>>>>> @@ -0,0 +1,113 @@
>>>>>> +Interaction of Suspend code (S3) with the CPU hotplug infrastructure
>>>>>> +   (C) 2011 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>, GPL
>>>>>> +
>>>>>> +
>>>>>> +I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
>>>>>> +
>>>>>> +Well, a picture speaks more than a thousand words... So ASCII art follows :-)
>>>>>> +
>>>>>> +[This depicts the current design in the kernel, and focusses only on the
>>>>>> +interactions between suspend call paths involving the freezer and cpu hotplug
>>>>>> +and also tries to explain the locking involved. It also outlines the
>>>>>> +notifications involved.]
>>>>>> +
>>>>>> +On a high level, the suspend-resume cycle goes like this:
>>>>>> +
>>>>>> +|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
>>>>>> +|tasks |    |     cpus      |    |          |    |     cpus     |    |tasks|
>>>>>> +
>>>>>> +
>>>>>> +More details follow:
>>>>>> +
>>>>>> +Regular CPU hotplug                                   Suspend call path
>>>>>> +-------------------                              ---------------------------
>>>>>> +
>>>>>> +Write 0 (or 1) to                                     Write 'mem' to
>>>>>> +/sys/devices/system/cpu/cpu*/online                   /sys/power/state
>>>>>> +     sysfs file                                          syfs file
>>>>>> +         |                                                   |
>>>>>> +         |                                                   v
>>>>>> +         |                                         Acquire pm_mutex lock
>>>>>> +         |                                                   |
>>>>>> +         |                                                   v
>>>>>> +         |                                Send PM_SUSPEND_PREPARE notifications
>>>>>> +         |                                                   |
>>>>>> +         |                                                   v
>>>>>> +         |                                              Freeze tasks
>>>>>
>>>>> OK, so something appears to be missing here.  Namely, the task writing to
>>>>> /sys/devices/system/cpu/cpu*/online should be frozen at this point or
>>>>> suspend should be aborted.  I suppose neither of these happens and I wonder
>>>>> why exactly.
>>>>>
>>>>
>>>> I have a couple of clarifications to make here:
>>>> * Firstly, this picture is not meant to represent what happens when regular
>>>>   cpu hotplug and suspend run together. That race condition has not been
>>>>   brought out here. What it does try to explain is, how the regular cpu
>>>>   hotplug path is different from suspend, and where they share common code.
>>>>   Please don't think about timing/race condition when reading it. Its just
>>>>   meant to explain the call path and locking involved.
>>>
>>> Well, I didn't understand this part.  And the question above is:
>>>
>>>> I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
>>>
>>> which kind of suggests something different from what you're saying.  Care to
>>> clarify that in the document?
>>>
>>
>> Ok, I get it. I'll clarify the question and post the next version.
>>
>>>> * Secondly, this picture explains the *current* design, and *not* the mutual
>>>>   exclusion design I have proposed between regular cpu hotplug and suspend.
>>>>   The reason being, this doc was written to help everyone understand the
>>>>   current locking schemes, to help evaluate my proposal for a different
>>>>   scheme (mutual exclusion).
>>>
>>> I understand that.
>>>
>>>> Now, coming to your point, if that task writing to the sysfs file has not
>>>> been frozen, then the current kernel doesn't abort suspend, which is why we are
>>>> encountering problems, and which is exactly what my patchset tries to solve.
>>>> Link to my patchset:
>>>> http://thread.gmane.org/gmane.linux.documentation/3414/focus=3414
>>>
>>> This isn't my point, actually.  My point is that the task writing to
>>> /sys/devices/system/cpu/cpu*/online should be frozen by the freezer.
>>> If it is not frozen, then the freezer should fail.  If that doesn't
>>> happen, there's a bug that has to be fixed and it is _not_ the lack
>>> of mutual exclusion.  The bug is that, apparently, suspend continues
>>> even though there is an unfrozen user space process in the system.
>>>
>>> Do you have any idea why that happens?
>>>
>>
>> Sorry, I think I explained it wrong above. The freezer doesn't have any bugs
>> in this context. If it fails to freeze the tasks, suspend does get aborted.
>>
>> But the point here is, suppose the task writing to the 'online' sysfs
>> file has already entered the kernel, and _now_ the freezer started
>> freezing tasks, it might encounter trouble in freezing that cpu hotplug
>> operation that has already begun (because the cpu hotplug online operation
>> waits on the frozen userspace to get microcode). 
>>
>> So, to clarify again, regular cpu hotplug and the cpu hotplug operations
>> carried out during suspend are properly serialized in the current kernel.
>> We have no problems there.
>>
>> The problem is with freezer and regular cpu hotplug racing with each other,
>> as illustrated in this scenario:
>>   * the regular cpu online operation continues its journey from userspace
>>     into the kernel, since the freezing has not yet begun.
>>   * then freezer starts and freezes userspace.
>>   * If cpu online has not yet completed the microcode update stuff by now,
>>     it will now start waiting on the frozen userspace unfortunately.
>>   * Now the freezer continues and tries to freeze the remaining tasks. But
>>     due to this wait mentioned above, the freezer won't be able to freeze
>>     the cpu online hotplug task and hence freezing of tasks fails.
>>   [This race condition is where the whole problem lies.]
>>
>> And if freezing of tasks fails, then suspend gets aborted. So no problems
>> there again.
> 
> That pretty much is how it is supposed to be.
> 

Yes, I totally agree, assuming you are referring to suspend being aborted
after a freezer failure.
However, if you are referring to freezing failure when CPU hotplug is run,
then I don't think that's how it is supposed to be. We know the problem
and we have at least 2 solutions. We can surely avoid freezing failures in
this scenario.

> Do I understand correctly that you're attempting to make suspend always
> succeed if CPU hotplug stress test is run in parallel with it?
> 

Short answer:
I was attempting to prevent freezing failures. CPU hotplug was one of the
problematic scenarios I found. So yes, I am attempting to make freezing
(and thereby suspend) to always succeed, even when CPU hotplug is run in
parallel.

Long answer:
Well, overall, I was just attempting to make the suspend infrastructure more
robust. And clearly, hitting task freezing failures is not good, whatever the
scenario maybe. And we also now know that this used to be hit by a situation
that wasn't handled properly.

So all I was trying to do was prevent the freezing failures, either by:
  * solving the problem in the microcode update itself, keeping the freezer
    code unaware of it, since the root cause of the problem was not in the
    freezer code anyway.
  * or, by the other approach of mutually excluding CPU hotplug and suspend.
    This would make both of them aware of each other, since we know they'll
    hit problems if they run together.

If we applied either of these solutions, suspend would always succeed even if
you run CPU hotplug in parallel with it, because we would never hit freezing
failures pertaining to this scenario.

>From your question, I get a slight feeling that you might be hinting that
this whole usecase itself is rather unreal and hence why should we even
bother making suspend succeed in "crazy" scenarios like this (as has already
been debated in some of the threads).
If that is really what you are pointing at, then I have to admit that I don't
feel comfortable with the idea of giving the user a manual telling him what
to do and what not to do, just because the system doesn't take care of certain
scenarios. And moreover, if we know of a loophole, I believe we can be more
or less sure that some code or the other will hit it sooner or later.

What do you think?

-- 
Regards,
Srivatsa S. Bhat  <srivatsa.bhat@linux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3] Documentation/power: Update docs about suspend and CPU hotplug
  2011-10-14 15:44           ` Alan Stern
  2011-10-14 18:18             ` Srivatsa S. Bhat
@ 2011-10-17 13:10             ` Srivatsa S. Bhat
  2011-10-19 22:08               ` Rafael J. Wysocki
  1 sibling, 1 reply; 12+ messages in thread
From: Srivatsa S. Bhat @ 2011-10-17 13:10 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, rdunlap, len.brown, pavel, a.p.zijlstra,
	linux-pm, linux-doc, linux-kernel, Tejun Heo, Borislav Petkov

Update the documentation about the interaction between the suspend (S3) call
path and the CPU hotplug infrastructure.
This patch focusses only on the activities of the freezer, cpu hotplug and
the notifications involved. It outlines how regular CPU hotplug differs from
the way it is invoked during suspend and also tries to explain the locking
involved. In addition to that, it discusses the issue of microcode update
during CPU hotplug operations.

v3:
   * Split the diagram into two, in order to avoid giving the wrong notion
     that this document explains the situation of CPU hotplug and suspend
     running in parallel.
   * Added a short description about CPU microcode update during CPU hotplug
     operations in different scenarios.
   * Added a section on known issues when CPU hotplug and suspend race with
     each other.

v2:
   * Clarified the question, to emphasize that the document explains only
     the difference (and similarity) in the two code paths but not what
     happens when race conditions occur between them.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 Documentation/power/00-INDEX                   |    2 
 Documentation/power/suspend-and-cpuhotplug.txt |  277 ++++++++++++++++++++++++
 2 files changed, 279 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/power/suspend-and-cpuhotplug.txt

diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
index 45e9d4a..a4d682f 100644
--- a/Documentation/power/00-INDEX
+++ b/Documentation/power/00-INDEX
@@ -26,6 +26,8 @@ s2ram.txt
 	- How to get suspend to ram working (and debug it when it isn't)
 states.txt
 	- System power management states
+suspend-and-cpuhotplug.txt
+	- Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug
 swsusp-and-swap-files.txt
 	- Using swap files with software suspend (to disk)
 swsusp-dmcrypt.txt
diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
new file mode 100644
index 0000000..a1acf1b
--- /dev/null
+++ b/Documentation/power/suspend-and-cpuhotplug.txt
@@ -0,0 +1,277 @@
+Interaction of Suspend code (S3) with the CPU hotplug infrastructure
+
+     (C) 2011 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
+
+
+I. How does the regular CPU hotplug code differ from how the Suspend-to-RAM
+   infrastructure uses it internally? And where do they share common code?
+
+Well, a picture is worth a thousand words... So ASCII art follows :-)
+
+[This depicts the current design in the kernel, and focusses only on the
+interactions involving the freezer and CPU hotplug and also tries to explain
+the locking involved. It outlines the notifications involved as well.
+But please note that here, only the call paths are illustrated, with the aim
+of describing where they take different paths and where they share code.
+What happens when regular CPU hotplug and Suspend-to-RAM race with each other
+is not depicted here.]
+
+On a high level, the suspend-resume cycle goes like this:
+
+|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
+|tasks |    |     cpus      |    |          |    |     cpus     |    |tasks|
+
+
+More details follow:
+
+                                Suspend call path
+                                -----------------
+
+                                  Write 'mem' to
+                                /sys/power/state
+                                    syfs file
+                                        |
+                                        v
+                               Acquire pm_mutex lock
+                                        |
+                                        v
+                             Send PM_SUSPEND_PREPARE
+                                   notifications
+                                        |
+                                        v
+                                   Freeze tasks
+                                        |
+                                        |
+                                        v
+                              disable_nonboot_cpus()
+                                   /* start */
+                                        |
+                                        v
+                            Acquire cpu_add_remove_lock
+                                        |
+                                        v
+                             Iterate over CURRENTLY
+                                   online CPUs
+                                        |
+                                        |
+                                        |                ----------
+                                        v                          | L
+             ======>               _cpu_down()                     |
+            |              [This takes cpuhotplug.lock             |
+  Common    |               before taking down the CPU             |
+   code     |               and releases it when done]             | O
+            |            While it is at it, notifications          |
+            |            are sent when notable events occur,       |
+             ======>     by running all registered callbacks.      |
+                                        |                          | O
+                                        |                          |
+                                        |                          |
+                                        v                          |
+                            Note down these cpus in                | P
+                                frozen_cpus mask         ----------
+                                        |
+                                        v
+                           Disable regular cpu hotplug
+                        by setting cpu_hotplug_disabled=1
+                                        |
+                                        v
+                            Release cpu_add_remove_lock
+                                        |
+                                        v
+                       /* disable_nonboot_cpus() complete */
+                                        |
+                                        v
+                                   Do suspend
+
+
+
+Resuming back is likewise, with the counterparts being (in the order of
+execution during resume):
+* enable_nonboot_cpus() which involves:
+   |  Acquire cpu_add_remove_lock
+   |  Reset cpu_hotplug_disabled to 0, thereby enabling regular cpu hotplug
+   |  Call _cpu_up() [for all those cpus in the frozen_cpus mask, in a loop]
+   |  Release cpu_add_remove_lock
+   v
+
+* thaw tasks
+* send PM_POST_SUSPEND notifications
+* Release pm_mutex lock.
+
+
+It is to be noted here that the pm_mutex lock is acquired at the very
+beginning, when we are just starting out to suspend, and then released only
+after the entire cycle is complete (i.e., suspend + resume).
+
+
+
+                          Regular CPU hotplug call path
+                          -----------------------------
+
+                                Write 0 (or 1) to
+                       /sys/devices/system/cpu/cpu*/online
+                                    sysfs file
+                                        |
+                                        |
+                                        v
+                                    cpu_down()
+                                        |
+                                        v
+                           Acquire cpu_add_remove_lock
+                                        |
+                                        v
+                          If cpu_hotplug_disabled is 1
+                                return gracefully
+                                        |
+                                        |
+                                        v
+             ======>                _cpu_down()
+            |              [This takes cpuhotplug.lock
+  Common    |               before taking down the CPU
+   code     |               and releases it when done]
+            |            While it is at it, notifications
+            |           are sent when notable events occur,
+             ======>    by running all registered callbacks.
+                                        |
+                                        |
+                                        v
+                          Release cpu_add_remove_lock
+                               [That's it!, for
+                              regular CPU hotplug]
+
+
+
+So, as can be seen from the two diagrams (the parts marked as "Common code"),
+regular CPU hotplug and the suspend code path converge at the _cpu_down() and
+_cpu_up() functions. They differ in the arguments passed to these functions,
+in that during regular CPU hotplug, 0 is passed for the 'tasks_frozen'
+argument. But during suspend, since the tasks are already frozen by the time
+the non-boot CPUs are offlined or onlined, the _cpu_*() functions are called
+with the 'tasks_frozen' argument set to 1.
+[See below for some known issues regarding this.]
+
+
+Important files and functions/entry points:
+------------------------------------------
+
+kernel/power/process.c : freeze_processes(), thaw_processes()
+kernel/power/suspend.c : suspend_prepare(), suspend_enter(), suspend_finish()
+kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](), [disable|enable]_nonboot_cpus()
+
+
+
+II. What are the issues involved in CPU hotplug?
+    -------------------------------------------
+
+There are some interesting situations involving CPU hotplug and microcode
+update on the CPUs, as discussed below:
+
+[Please bear in mind that the kernel requests the microcode images from
+userspace, using the request_firmware() function defined in
+drivers/base/firmware_class.c]
+
+
+a. When all the CPUs are identical:
+
+   This is the most common situation and it is quite straightforward: we want
+   to apply the same microcode revision to each of the CPUs.
+   To give an example of x86, the collect_cpu_info() function defined in
+   arch/x86/kernel/microcode_core.c helps in discovering the type of the CPU
+   and thereby in applying the correct microcode revision to it.
+   But note that the kernel does not maintain a common microcode image for the
+   all CPUs, in order to handle case 'b' described below.
+
+
+b. When some of the CPUs are different than the rest:
+
+   In this case since we probably need to apply different microcode revisions
+   to different CPUs, the kernel maintains a copy of the correct microcode
+   image for each CPU (after appropriate CPU type/model discovery using
+   functions such as collect_cpu_info()).
+
+
+c. When a CPU is physically hot-unplugged and a new (and possibly different
+   type of) CPU is hot-plugged into the system:
+
+   In the current design of the kernel, whenever a CPU is taken offline during
+   a regular CPU hotplug operation, upon receiving the CPU_DEAD notification
+   (which is sent by the CPU hotplug code), the microcode update driver's
+   callback for that event reacts by freeing the kernel's copy of the
+   microcode image for that CPU.
+
+   Hence, when a new CPU is brought online, since the kernel finds that it
+   doesn't have the microcode image, it does the CPU type/model discovery
+   afresh and then requests the userspace for the appropriate microcode image
+   for that CPU, which is subsequently applied.
+
+   For example, in x86, the mc_cpu_callback() function (which is the microcode
+   update driver's callback registered for CPU hotplug events) calls
+   microcode_update_cpu() which would call microcode_init_cpu() in this case,
+   instead of microcode_resume_cpu() when it finds that the kernel doesn't
+   have a valid microcode image. This ensures that the CPU type/model
+   discovery is performed and the right microcode is applied to the CPU after
+   getting it from userspace.
+
+
+d. Handling microcode update during suspend/hibernate:
+
+   Strictly speaking, during a CPU hotplug operation which does not involve
+   physically removing or inserting CPUs, the CPUs are not actually powered
+   off during a CPU offline. They are just put to the lowest C-states possible.
+   Hence, in such a case, it is not really necessary to re-apply microcode
+   when the CPUs are brought back online, since they wouldn't have lost the
+   image during the CPU offline operation.
+
+   This is the usual scenario encountered during a resume after a suspend.
+   However, in the case of hibernation, since all the CPUs are completely
+   powered off, during restore it becomes necessary to apply the microcode
+   images to all the CPUs.
+
+   [Note that we don't expect someone to physically pull out nodes and insert
+   nodes with a different type of CPUs in-between a suspend-resume or a
+   hibernate/restore cycle.]
+
+   In the current design of the kernel however, during a CPU offline operation
+   as part of the suspend/hibernate cycle (the CPU_DEAD_FROZEN notification),
+   the existing copy of microcode image in the kernel is not freed up.
+   And during the CPU online operations (during resume/restore), since the
+   kernel finds that it already has copies of the microcode images for all the
+   CPUs, it just applies them to the CPUs, avoiding any re-discovery of CPU
+   type/model and the need for validating whether the microcode revisions are
+   right for the CPUs or not (due to the above assumption that physical CPU
+   hotplug will not be done in-between suspend/resume or hibernate/restore
+   cycles).
+
+
+III. Are there any known problems when regular CPU hotplug and suspend race
+     with each other?
+
+Yes, they are listed below:
+
+1. When invoking regular CPU hotplug, the 'tasks_frozen' argument passed to
+   the _cpu_down() and _cpu_up() functions is *always* 0.
+   This might not reflect the true current state of the system, since the
+   tasks could have been frozen by an out-of-band event such as a suspend
+   operation in progress. Hence, it will lead to wrong notifications being
+   sent during the cpu online/offline events (eg, CPU_ONLINE notification
+   instead of CPU_ONLINE_FROZEN) which in turn will lead to execution of
+   inappropriate code by the callbacks registered for such CPU hotplug events.
+
+2. If a regular CPU hotplug stress test happens to race with the freezer due
+   to a suspend operation in progress at the same time, then we could hit the
+   situation described below:
+
+    * A regular cpu online operation continues its journey from userspace
+      into the kernel, since the freezing has not yet begun.
+    * Then freezer gets to work and freezes userspace.
+    * If cpu online has not yet completed the microcode update stuff by now,
+      it will now start waiting on the frozen userspace in the
+      TASK_UNINTERRUPTIBLE state, in order to get the microcode image.
+    * Now the freezer continues and tries to freeze the remaining tasks. But
+      due to this wait mentioned above, the freezer won't be able to freeze
+      the cpu online hotplug task and hence freezing of tasks fails.
+
+   As a result of this task freezing failure, the suspend operation gets
+   aborted.
+
+




^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] Documentation/power: Update docs about suspend and CPU hotplug
  2011-10-17 13:10             ` [PATCH v3] " Srivatsa S. Bhat
@ 2011-10-19 22:08               ` Rafael J. Wysocki
  0 siblings, 0 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2011-10-19 22:08 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Alan Stern, rdunlap, len.brown, pavel, a.p.zijlstra, linux-pm,
	linux-doc, linux-kernel, Tejun Heo, Borislav Petkov

On Monday, October 17, 2011, Srivatsa S. Bhat wrote:
> Update the documentation about the interaction between the suspend (S3) call
> path and the CPU hotplug infrastructure.
> This patch focusses only on the activities of the freezer, cpu hotplug and
> the notifications involved. It outlines how regular CPU hotplug differs from
> the way it is invoked during suspend and also tries to explain the locking
> involved. In addition to that, it discusses the issue of microcode update
> during CPU hotplug operations.
> 
> v3:
>    * Split the diagram into two, in order to avoid giving the wrong notion
>      that this document explains the situation of CPU hotplug and suspend
>      running in parallel.
>    * Added a short description about CPU microcode update during CPU hotplug
>      operations in different scenarios.
>    * Added a section on known issues when CPU hotplug and suspend race with
>      each other.
> 
> v2:
>    * Clarified the question, to emphasize that the document explains only
>      the difference (and similarity) in the two code paths but not what
>      happens when race conditions occur between them.
> 
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Applied to linux-pm/linux-next.

Thanks,
Rafael


> ---
> 
>  Documentation/power/00-INDEX                   |    2 
>  Documentation/power/suspend-and-cpuhotplug.txt |  277 ++++++++++++++++++++++++
>  2 files changed, 279 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/power/suspend-and-cpuhotplug.txt
> 
> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
> index 45e9d4a..a4d682f 100644
> --- a/Documentation/power/00-INDEX
> +++ b/Documentation/power/00-INDEX
> @@ -26,6 +26,8 @@ s2ram.txt
>  	- How to get suspend to ram working (and debug it when it isn't)
>  states.txt
>  	- System power management states
> +suspend-and-cpuhotplug.txt
> +	- Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug
>  swsusp-and-swap-files.txt
>  	- Using swap files with software suspend (to disk)
>  swsusp-dmcrypt.txt
> diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
> new file mode 100644
> index 0000000..a1acf1b
> --- /dev/null
> +++ b/Documentation/power/suspend-and-cpuhotplug.txt
> @@ -0,0 +1,277 @@
> +Interaction of Suspend code (S3) with the CPU hotplug infrastructure
> +
> +     (C) 2011 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> +
> +
> +I. How does the regular CPU hotplug code differ from how the Suspend-to-RAM
> +   infrastructure uses it internally? And where do they share common code?
> +
> +Well, a picture is worth a thousand words... So ASCII art follows :-)
> +
> +[This depicts the current design in the kernel, and focusses only on the
> +interactions involving the freezer and CPU hotplug and also tries to explain
> +the locking involved. It outlines the notifications involved as well.
> +But please note that here, only the call paths are illustrated, with the aim
> +of describing where they take different paths and where they share code.
> +What happens when regular CPU hotplug and Suspend-to-RAM race with each other
> +is not depicted here.]
> +
> +On a high level, the suspend-resume cycle goes like this:
> +
> +|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
> +|tasks |    |     cpus      |    |          |    |     cpus     |    |tasks|
> +
> +
> +More details follow:
> +
> +                                Suspend call path
> +                                -----------------
> +
> +                                  Write 'mem' to
> +                                /sys/power/state
> +                                    syfs file
> +                                        |
> +                                        v
> +                               Acquire pm_mutex lock
> +                                        |
> +                                        v
> +                             Send PM_SUSPEND_PREPARE
> +                                   notifications
> +                                        |
> +                                        v
> +                                   Freeze tasks
> +                                        |
> +                                        |
> +                                        v
> +                              disable_nonboot_cpus()
> +                                   /* start */
> +                                        |
> +                                        v
> +                            Acquire cpu_add_remove_lock
> +                                        |
> +                                        v
> +                             Iterate over CURRENTLY
> +                                   online CPUs
> +                                        |
> +                                        |
> +                                        |                ----------
> +                                        v                          | L
> +             ======>               _cpu_down()                     |
> +            |              [This takes cpuhotplug.lock             |
> +  Common    |               before taking down the CPU             |
> +   code     |               and releases it when done]             | O
> +            |            While it is at it, notifications          |
> +            |            are sent when notable events occur,       |
> +             ======>     by running all registered callbacks.      |
> +                                        |                          | O
> +                                        |                          |
> +                                        |                          |
> +                                        v                          |
> +                            Note down these cpus in                | P
> +                                frozen_cpus mask         ----------
> +                                        |
> +                                        v
> +                           Disable regular cpu hotplug
> +                        by setting cpu_hotplug_disabled=1
> +                                        |
> +                                        v
> +                            Release cpu_add_remove_lock
> +                                        |
> +                                        v
> +                       /* disable_nonboot_cpus() complete */
> +                                        |
> +                                        v
> +                                   Do suspend
> +
> +
> +
> +Resuming back is likewise, with the counterparts being (in the order of
> +execution during resume):
> +* enable_nonboot_cpus() which involves:
> +   |  Acquire cpu_add_remove_lock
> +   |  Reset cpu_hotplug_disabled to 0, thereby enabling regular cpu hotplug
> +   |  Call _cpu_up() [for all those cpus in the frozen_cpus mask, in a loop]
> +   |  Release cpu_add_remove_lock
> +   v
> +
> +* thaw tasks
> +* send PM_POST_SUSPEND notifications
> +* Release pm_mutex lock.
> +
> +
> +It is to be noted here that the pm_mutex lock is acquired at the very
> +beginning, when we are just starting out to suspend, and then released only
> +after the entire cycle is complete (i.e., suspend + resume).
> +
> +
> +
> +                          Regular CPU hotplug call path
> +                          -----------------------------
> +
> +                                Write 0 (or 1) to
> +                       /sys/devices/system/cpu/cpu*/online
> +                                    sysfs file
> +                                        |
> +                                        |
> +                                        v
> +                                    cpu_down()
> +                                        |
> +                                        v
> +                           Acquire cpu_add_remove_lock
> +                                        |
> +                                        v
> +                          If cpu_hotplug_disabled is 1
> +                                return gracefully
> +                                        |
> +                                        |
> +                                        v
> +             ======>                _cpu_down()
> +            |              [This takes cpuhotplug.lock
> +  Common    |               before taking down the CPU
> +   code     |               and releases it when done]
> +            |            While it is at it, notifications
> +            |           are sent when notable events occur,
> +             ======>    by running all registered callbacks.
> +                                        |
> +                                        |
> +                                        v
> +                          Release cpu_add_remove_lock
> +                               [That's it!, for
> +                              regular CPU hotplug]
> +
> +
> +
> +So, as can be seen from the two diagrams (the parts marked as "Common code"),
> +regular CPU hotplug and the suspend code path converge at the _cpu_down() and
> +_cpu_up() functions. They differ in the arguments passed to these functions,
> +in that during regular CPU hotplug, 0 is passed for the 'tasks_frozen'
> +argument. But during suspend, since the tasks are already frozen by the time
> +the non-boot CPUs are offlined or onlined, the _cpu_*() functions are called
> +with the 'tasks_frozen' argument set to 1.
> +[See below for some known issues regarding this.]
> +
> +
> +Important files and functions/entry points:
> +------------------------------------------
> +
> +kernel/power/process.c : freeze_processes(), thaw_processes()
> +kernel/power/suspend.c : suspend_prepare(), suspend_enter(), suspend_finish()
> +kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](), [disable|enable]_nonboot_cpus()
> +
> +
> +
> +II. What are the issues involved in CPU hotplug?
> +    -------------------------------------------
> +
> +There are some interesting situations involving CPU hotplug and microcode
> +update on the CPUs, as discussed below:
> +
> +[Please bear in mind that the kernel requests the microcode images from
> +userspace, using the request_firmware() function defined in
> +drivers/base/firmware_class.c]
> +
> +
> +a. When all the CPUs are identical:
> +
> +   This is the most common situation and it is quite straightforward: we want
> +   to apply the same microcode revision to each of the CPUs.
> +   To give an example of x86, the collect_cpu_info() function defined in
> +   arch/x86/kernel/microcode_core.c helps in discovering the type of the CPU
> +   and thereby in applying the correct microcode revision to it.
> +   But note that the kernel does not maintain a common microcode image for the
> +   all CPUs, in order to handle case 'b' described below.
> +
> +
> +b. When some of the CPUs are different than the rest:
> +
> +   In this case since we probably need to apply different microcode revisions
> +   to different CPUs, the kernel maintains a copy of the correct microcode
> +   image for each CPU (after appropriate CPU type/model discovery using
> +   functions such as collect_cpu_info()).
> +
> +
> +c. When a CPU is physically hot-unplugged and a new (and possibly different
> +   type of) CPU is hot-plugged into the system:
> +
> +   In the current design of the kernel, whenever a CPU is taken offline during
> +   a regular CPU hotplug operation, upon receiving the CPU_DEAD notification
> +   (which is sent by the CPU hotplug code), the microcode update driver's
> +   callback for that event reacts by freeing the kernel's copy of the
> +   microcode image for that CPU.
> +
> +   Hence, when a new CPU is brought online, since the kernel finds that it
> +   doesn't have the microcode image, it does the CPU type/model discovery
> +   afresh and then requests the userspace for the appropriate microcode image
> +   for that CPU, which is subsequently applied.
> +
> +   For example, in x86, the mc_cpu_callback() function (which is the microcode
> +   update driver's callback registered for CPU hotplug events) calls
> +   microcode_update_cpu() which would call microcode_init_cpu() in this case,
> +   instead of microcode_resume_cpu() when it finds that the kernel doesn't
> +   have a valid microcode image. This ensures that the CPU type/model
> +   discovery is performed and the right microcode is applied to the CPU after
> +   getting it from userspace.
> +
> +
> +d. Handling microcode update during suspend/hibernate:
> +
> +   Strictly speaking, during a CPU hotplug operation which does not involve
> +   physically removing or inserting CPUs, the CPUs are not actually powered
> +   off during a CPU offline. They are just put to the lowest C-states possible.
> +   Hence, in such a case, it is not really necessary to re-apply microcode
> +   when the CPUs are brought back online, since they wouldn't have lost the
> +   image during the CPU offline operation.
> +
> +   This is the usual scenario encountered during a resume after a suspend.
> +   However, in the case of hibernation, since all the CPUs are completely
> +   powered off, during restore it becomes necessary to apply the microcode
> +   images to all the CPUs.
> +
> +   [Note that we don't expect someone to physically pull out nodes and insert
> +   nodes with a different type of CPUs in-between a suspend-resume or a
> +   hibernate/restore cycle.]
> +
> +   In the current design of the kernel however, during a CPU offline operation
> +   as part of the suspend/hibernate cycle (the CPU_DEAD_FROZEN notification),
> +   the existing copy of microcode image in the kernel is not freed up.
> +   And during the CPU online operations (during resume/restore), since the
> +   kernel finds that it already has copies of the microcode images for all the
> +   CPUs, it just applies them to the CPUs, avoiding any re-discovery of CPU
> +   type/model and the need for validating whether the microcode revisions are
> +   right for the CPUs or not (due to the above assumption that physical CPU
> +   hotplug will not be done in-between suspend/resume or hibernate/restore
> +   cycles).
> +
> +
> +III. Are there any known problems when regular CPU hotplug and suspend race
> +     with each other?
> +
> +Yes, they are listed below:
> +
> +1. When invoking regular CPU hotplug, the 'tasks_frozen' argument passed to
> +   the _cpu_down() and _cpu_up() functions is *always* 0.
> +   This might not reflect the true current state of the system, since the
> +   tasks could have been frozen by an out-of-band event such as a suspend
> +   operation in progress. Hence, it will lead to wrong notifications being
> +   sent during the cpu online/offline events (eg, CPU_ONLINE notification
> +   instead of CPU_ONLINE_FROZEN) which in turn will lead to execution of
> +   inappropriate code by the callbacks registered for such CPU hotplug events.
> +
> +2. If a regular CPU hotplug stress test happens to race with the freezer due
> +   to a suspend operation in progress at the same time, then we could hit the
> +   situation described below:
> +
> +    * A regular cpu online operation continues its journey from userspace
> +      into the kernel, since the freezing has not yet begun.
> +    * Then freezer gets to work and freezes userspace.
> +    * If cpu online has not yet completed the microcode update stuff by now,
> +      it will now start waiting on the frozen userspace in the
> +      TASK_UNINTERRUPTIBLE state, in order to get the microcode image.
> +    * Now the freezer continues and tries to freeze the remaining tasks. But
> +      due to this wait mentioned above, the freezer won't be able to freeze
> +      the cpu online hotplug task and hence freezing of tasks fails.
> +
> +   As a result of this task freezing failure, the suspend operation gets
> +   aborted.
> +
> +
> 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-10-19 22:05 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-11 20:19 [PATCH] Documentation/power: Update docs about suspend and CPU hotplug Srivatsa S. Bhat
2011-10-11 22:02 ` Rafael J. Wysocki
2011-10-12  4:17   ` Srivatsa S. Bhat
2011-10-12 19:19     ` Rafael J. Wysocki
2011-10-12 21:13       ` Srivatsa S. Bhat
2011-10-14  6:24         ` [PATCH v2] " Srivatsa S. Bhat
2011-10-14 15:44           ` Alan Stern
2011-10-14 18:18             ` Srivatsa S. Bhat
2011-10-17 13:10             ` [PATCH v3] " Srivatsa S. Bhat
2011-10-19 22:08               ` Rafael J. Wysocki
2011-10-15 22:42         ` [PATCH] " Rafael J. Wysocki
2011-10-16  0:14           ` Srivatsa S. Bhat

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.