All of lore.kernel.org
 help / color / mirror / Atom feed
From: Juergen Gross <jgross@suse.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Dario Faggioli <dfaggioli@suse.com>
Subject: Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
Date: Mon, 16 Jul 2018 14:47:07 +0200	[thread overview]
Message-ID: <c678a828-b1d8-884e-44dc-d8e97611d1de@suse.com> (raw)
In-Reply-To: <5B4C8D3702000078001D45EA@suse.com>

On 16/07/18 14:19, Jan Beulich wrote:
>>>> On 16.07.18 at 13:47, <jgross@suse.com> wrote:
>> On 16/07/18 11:17, Jan Beulich wrote:
>>>>>> On 13.07.18 at 11:02, <jgross@suse.com> wrote:
>>>> On 11/07/18 14:04, Jan Beulich wrote:
>>>>> While I've run into the issue with further patches in place which no
>>>>> longer guarantee the per-CPU area to start out as all zeros, the
>>>>> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
>>>>> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
>>>>> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
>>>>> there.
>>>>>
>>>>> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
>>>>> should not happen before cpu_disable_scheduler()). Clearing it in
>>>>> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
>>>>> piece of code twice. Since the field's value shouldn't matter while the
>>>>> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
>>>>> only for other than the suspend/resume case (which gets specially
>>>>> handled in cpupool_cpu_remove()).
>>>>>
>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>> ---
>>>>> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
>>>>>      cpupool_cpu_remove(), but besides that - as per above - likely
>>>>>      being too early, that function has further prereqs to be met. It
>>>>>      also doesn't look as if cpupool_unassign_cpu_helper() could be used
>>>>>      there.
>>>>>
>>>>> --- a/xen/common/cpupool.c
>>>>> +++ b/xen/common/cpupool.c
>>>>> @@ -778,6 +778,8 @@ static int cpu_callback(
>>>>>      {
>>>>>      case CPU_DOWN_FAILED:
>>>>>      case CPU_ONLINE:
>>>>> +        if ( system_state <= SYS_STATE_active )
>>>>> +            per_cpu(cpupool, cpu) = NULL;
>>>>>          rc = cpupool_cpu_add(cpu);
>>>>
>>>> Wouldn't it make more sense to clear the field in cpupool_cpu_add()
>>>> which already is testing system_state?
>>>
>>> Hmm, this may be a matter of taste: I consider the change done here
>>> a prereq to calling the function in the first place. As said in the
>>> description, I actually think this should come earlier, and it's just that
>>> I can't see how to cleanly do so.
> 
> You didn't comment on this one at all, yet it matters for how a v2
> is supposed to look like.

My comment was thought to address this question, too. cpupool_cpu_add()
is handling the special case of resuming explicitly, where the old cpu
assignment to a cpupool is kept. So I believe setting
  per_cpu(cpupool, cpu) = NULL
in the else clause of cpupool_cpu_add() only is better.

>>>> Modifying the condition in cpupool_cpu_add() to
>>>>
>>>>   if ( system_state <= SYS_STATE_active )
>>>>
>>>> at the same time would have the benefit to catch problems in case
>>>> suspending cpus is failing during SYS_STATE_suspend (I'd expect
>>>> triggering the first ASSERT in schedule_cpu_switch() in this case).
>>>
>>> You mean the if() there, not the else? If so - how would the "else"
>>> body then ever be reached? IOW if anything I could only see the
>>> "else" to become "else if ( system_state <= SYS_STATE_active )".
>>
>> Bad wording on my side.
>>
>> I should have written "the condition in cpupool_cpu_add() should match
>> if ( system_state <= SYS_STATE_active )."
>>
>> So: "if ( system_state > SYS_STATE_active )", as the test is for the
>> other case.
> 
> I'd recommend against this, as someone adding a new SYS_STATE_*
> past suspend/resume would quite likely miss this one. The strong
> ordering of states imo should only be used for active and lower states.
> But yes, I could see the if() there to become suspend || resume to
> address the problem you describe.

Yes, this would seem to be a better choice here.

> Coming back to your DOWN_FAILED consideration: Why are you
> thinking this can't happen during suspend? disable_nonboot_cpus()
> uses plain cpu_down() after all.

Right.

DOWN_FAILED is used only once, and that is in cpu_down() after the step
CPU_DOWN_PREPARE returned an error. And CPU_DOWN_PREPARE is only used
for cpufreq driver where it never returns an error, and for cpupools
which don't matter here, as only other components failing at step
CPU_DOWN_PREPARE would lead to calling cpupool/DOWN_FAILED.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  parent reply	other threads:[~2018-07-16 12:47 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
2018-07-11 12:04 ` [PATCH 1/8] cpupools: fix state when downing a CPU failed Jan Beulich
2018-07-11 12:06 ` [PATCH 2/8] x86: distinguish CPU offlining from CPU removal Jan Beulich
2018-07-12 10:53   ` Wei Liu
2018-07-12 11:48     ` Jan Beulich
2018-07-13  8:39       ` Wei Liu
2018-07-12 12:42   ` Andrew Cooper
2018-07-11 12:06 ` [PATCH 3/8] allow cpu_down() to be called earlier Jan Beulich
2018-07-12 10:55   ` Wei Liu
2018-07-12 12:44   ` Andrew Cooper
2018-07-11 12:07 ` [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
2018-07-11 18:11   ` Brian Woods
2018-07-12 13:02   ` Andrew Cooper
2018-07-12 14:22     ` Jan Beulich
2018-07-11 12:09 ` [PATCH 5/8] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
2018-07-12 15:38   ` Andrew Cooper
2018-07-13  8:11     ` Jan Beulich
2018-07-11 12:10 ` [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads Jan Beulich
2018-07-12 15:45   ` Andrew Cooper
2018-07-13  8:13     ` Jan Beulich
2018-07-16 12:37       ` Andrew Cooper
2018-07-16 12:53         ` Jan Beulich
2018-07-16 13:01           ` Andrew Cooper
2018-07-11 12:11 ` [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus=" Jan Beulich
2018-07-11 12:23   ` Andrew Cooper
2018-07-11 15:18   ` Roger Pau Monné
2018-07-11 16:02   ` Wei Liu
2018-07-11 12:12 ` [PATCH 8/8] cpumask: tidy {,z}alloc_cpumask_var() Jan Beulich
2018-07-11 12:20   ` Andrew Cooper
2018-07-12 15:13   ` Wei Liu
     [not found] ` <5B45F26A02000078001D312F@suse.com>
2018-07-13  9:02   ` [PATCH 1/8] cpupools: fix state when downing a CPU failed Juergen Gross
2018-07-16  9:17     ` Jan Beulich
     [not found]     ` <5B4C629002000078001D4346@suse.com>
2018-07-16 11:47       ` Juergen Gross
2018-07-16 12:19         ` Jan Beulich
     [not found]         ` <5B4C8D3702000078001D45EA@suse.com>
2018-07-16 12:47           ` Juergen Gross [this message]
2018-07-16 13:01             ` Jan Beulich
     [not found]             ` <5B4C973D02000078001D4693@suse.com>
2018-07-16 14:21               ` Juergen Gross
2018-07-16 14:26                 ` Jan Beulich
     [not found]                 ` <5B4CAB1202000078001D47BC@suse.com>
2018-07-16 14:53                   ` Juergen Gross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c678a828-b1d8-884e-44dc-d8e97611d1de@suse.com \
    --to=jgross@suse.com \
    --cc=JBeulich@suse.com \
    --cc=dfaggioli@suse.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.