From: Juergen Gross <jgross@suse.com>
To: Andrew Cooper <Andrew.Cooper3@citrix.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: George Dunlap <George.Dunlap@citrix.com>,
	Dario Faggioli <dfaggioli@suse.com>,
	Gao Ruifeng <ruifeng.gao@intel.com>,
	Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH v3 3/3] xen/sched: fix cpu hotplug
Date: Thu, 1 Sep 2022 08:11:14 +0200
Message-ID: <94576d45-39c2-a786-2fe2-5effb16caf68@suse.com>
In-Reply-To: <096ed545-f268-ba45-6333-ed51d20fc99c@citrix.com>


On 01.09.22 00:52, Andrew Cooper wrote:
> On 16/08/2022 11:13, Juergen Gross wrote:
>> Cpu cpu unplugging is calling schedule_cpu_rm() via stop_machine_run()
> 
> Cpu cpu.
> 
>> with interrupts disabled, thus any memory allocation or freeing must
>> be avoided.
>>
>> Since commit 5047cd1d5dea ("xen/common: Use enhanced
>> ASSERT_ALLOC_CONTEXT in xmalloc()") this restriction is being enforced
>> via an assertion, which will now fail.
>>
>> Before that commit cpu unplugging in normal configurations was working
>> just by chance as only the cpu performing schedule_cpu_rm() was doing
>> active work. With core scheduling enabled, however, failures could
>> result from memory allocations not being properly propagated to other
>> cpus' TLBs.
> 
> This isn't accurate, is it?  The problem with initiating a TLB flush
> with IRQs disabled is that you can deadlock against a remote CPU which
> is waiting for you to enable IRQs first to take a TLB flush IPI.

As long as only one cpu is trying to allocate/free memory during the
stop_machine_run() action, the deadlock won't happen.

> How does a memory allocation out of the xenheap result in a TLB flush?
> Even with split heaps, you're only potentially allocating into a new
> slot which was unused...

Yeah, you are right. The main problem would occur only when a virtual
address is changed to point at another physical address, which should be
quite unlikely.

I can drop that paragraph, as it doesn't really help.

> 
>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>> index 228470ac41..ffb2d6202b 100644
>> --- a/xen/common/sched/core.c
>> +++ b/xen/common/sched/core.c
>> @@ -3260,6 +3260,17 @@ static struct cpu_rm_data *schedule_cpu_rm_alloc(unsigned int cpu)
>>       if ( !data )
>>           goto out;
>>   
>> +    if ( aff_alloc )
>> +    {
>> +        if ( !update_node_aff_alloc(&data->affinity) )
> 
> I spent ages trying to figure out what this was doing, before realising
> the problem is the function name.
> 
> alloc (as with free) is the critical piece of information and needs to
> come first.  The fact we typically pass the result to
> update_node_aff(inity) isn't relevant, and becomes actively wrong here
> when we're nowhere near.
> 
> Patch 1 needs to name these helpers:
> 
> bool alloc_affinity_masks(struct affinity_masks *affinity);
> void free_affinity_masks(struct affinity_masks *affinity);
> 
> and then patches 2 and 3 become far easier to follow.
> 
> Similarly in patch 2, the new helpers need to be
> {alloc,free}_cpu_rm_data() to make sense.  These have nothing to do with
> scheduling.
> 
> Also, you shouldn't introduce the helpers static in patch 2 and then
> turn them non-static in patch 3.  That just adds unnecessary churn to
> the complicated patch.

Okay to all of the above.

> 
>> +        {
>> +            XFREE(data);
>> +            goto out;
>> +        }
>> +    }
>> +    else
>> +        memset(&data->affinity, 0, sizeof(data->affinity));
> 
> I honestly don't think it is worth optimising xzalloc() -> xmalloc()
> for the cognitive complexity of having this logic here.

I don't mind either way. This logic is the result of one of Jan's comments.
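
For reference, the two variants being weighed can be sketched in
standalone C. This is only an illustration: calloc()/malloc() stand in
for xzalloc()/xmalloc(), and the struct layouts and function names are
made up for the example, not taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct affinity_masks / struct cpu_rm_data. */
struct affinity { unsigned long hard, soft; };
struct rm_data { unsigned int cpu; struct affinity affinity; };

/* Variant A: zero the whole structure at allocation time. */
struct rm_data *alloc_zeroed(void)
{
    return calloc(1, sizeof(struct rm_data));
}

/* Variant B: plain allocation, then clear only the affinity member. */
struct rm_data *alloc_then_clear(void)
{
    struct rm_data *data = malloc(sizeof(*data));

    if ( data )
        memset(&data->affinity, 0, sizeof(data->affinity));

    return data;
}

int affinity_is_clear(const struct rm_data *data)
{
    return data->affinity.hard == 0 && data->affinity.soft == 0;
}

/* Both variants leave the affinity member cleared. */
void demo(void)
{
    struct rm_data *a = alloc_zeroed();
    struct rm_data *b = alloc_then_clear();

    assert(a && b);
    assert(affinity_is_clear(a) && affinity_is_clear(b));

    free(a);
    free(b);
}
```

Variant B avoids zeroing members that get overwritten anyway, at the
price of the extra branch in the caller.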

> 
>> diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
>> index 58e082eb4c..2506861e4f 100644
>> --- a/xen/common/sched/cpupool.c
>> +++ b/xen/common/sched/cpupool.c
>> @@ -411,22 +411,28 @@ int cpupool_move_domain(struct domain *d, struct cpupool *c)
>>   }
>>   
>>   /* Update affinities of all domains in a cpupool. */
>> -static void cpupool_update_node_affinity(const struct cpupool *c)
>> +static void cpupool_update_node_affinity(const struct cpupool *c,
>> +                                         struct affinity_masks *masks)
>>   {
>> -    struct affinity_masks masks;
>> +    struct affinity_masks local_masks;
>>       struct domain *d;
>>   
>> -    if ( !update_node_aff_alloc(&masks) )
>> -        return;
>> +    if ( !masks )
>> +    {
>> +        if ( !update_node_aff_alloc(&local_masks) )
>> +            return;
>> +        masks = &local_masks;
>> +    }
>>   
>>       rcu_read_lock(&domlist_read_lock);
>>   
>>       for_each_domain_in_cpupool(d, c)
>> -        domain_update_node_aff(d, &masks);
>> +        domain_update_node_aff(d, masks);
>>   
>>       rcu_read_unlock(&domlist_read_lock);
>>   
>> -    update_node_aff_free(&masks);
>> +    if ( masks == &local_masks )
>> +        update_node_aff_free(masks);
>>   }
>>   
>>   /*
> 
> Why do we need this at all?  domain_update_node_aff() already knows what
> to do when passed NULL, so this seems like an awfully complicated no-op.

You do realize that update_node_aff_free() still has work to do when masks
was initially NULL? In that case the masks were allocated locally and need
to be freed again.
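
The pattern in the hunk above, reduced to a standalone sketch. All names
here (struct masks, masks_alloc(), masks_free(), update_all(),
live_allocs) are hypothetical stand-ins, and calloc()/free() replace the
Xen allocators; it is meant to show the ownership rule only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct affinity_masks. */
struct masks { unsigned long *hard, *soft; };

/* Outstanding allocations, tracked only to make the ownership visible. */
static unsigned int live_allocs;

static bool masks_alloc(struct masks *m)
{
    m->hard = calloc(1, sizeof(*m->hard));
    m->soft = calloc(1, sizeof(*m->soft));
    if ( !m->hard || !m->soft )
    {
        free(m->hard);
        free(m->soft);
        return false;
    }
    live_allocs++;
    return true;
}

static void masks_free(struct masks *m)
{
    assert(live_allocs > 0);
    free(m->hard);
    free(m->soft);
    live_allocs--;
}

/* Use the caller's masks if given, else allocate and free a local set. */
bool update_all(struct masks *masks)
{
    struct masks local;

    if ( !masks )
    {
        if ( !masks_alloc(&local) )
            return false;
        masks = &local;
    }

    *masks->hard |= 1;          /* placeholder for the per-domain work */

    if ( masks == &local )      /* free only what was allocated here */
        masks_free(masks);

    return true;
}
```

Calling update_all(NULL) allocates and frees internally, while a caller
passing preallocated masks keeps ownership of them. That second path is
the one which stays free of allocations.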

> 
>> @@ -1008,10 +1016,21 @@ static int cf_check cpu_callback(
>>   {
>>       unsigned int cpu = (unsigned long)hcpu;
>>       int rc = 0;
>> +    static struct cpu_rm_data *mem;
>>   
>>       switch ( action )
>>       {
>>       case CPU_DOWN_FAILED:
>> +        if ( system_state <= SYS_STATE_active )
>> +        {
>> +            if ( mem )
>> +            {
> 
> So, this does compile (and indeed I've tested the result), but I can't
> see how it should.
> 
> mem is guaranteed to be uninitialised at this point, and ...

... it is defined as "static", so it is clearly NULL initially.

> 
>> +                schedule_cpu_rm_free(mem, cpu);
>> +                mem = NULL;
>> +            }
>> +            rc = cpupool_cpu_add(cpu);
>> +        }
>> +        break;
>>       case CPU_ONLINE:
>>           if ( system_state <= SYS_STATE_active )
>>               rc = cpupool_cpu_add(cpu);
>> @@ -1019,12 +1038,31 @@ static int cf_check cpu_callback(
>>       case CPU_DOWN_PREPARE:
>>           /* Suspend/Resume don't change assignments of cpus to cpupools. */
>>           if ( system_state <= SYS_STATE_active )
>> +        {
>>               rc = cpupool_cpu_remove_prologue(cpu);
>> +            if ( !rc )
>> +            {
>> +                ASSERT(!mem);
> 
> ... here, and each subsequent assertion too.
> 
> Given that I tested the patch and it does fix the IRQ assertion, I can
> only imagine that it works by deterministically finding stack rubble
> which happens to be 0.

Not really, as mem isn't on the stack. :-)
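
A small standalone sketch of the point, assuming nothing beyond ISO C:
an object with static storage duration and no explicit initializer
starts out zero-initialized, so a function-local static pointer is NULL
on the first call and keeps its value across calls. The names
(struct rm_data, callback_step()) are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cpu_rm_data. */
struct rm_data { int dummy; };

static struct rm_data storage;

/*
 * Mimics the shape of cpu_callback(): a function-local static pointer
 * that survives between calls. Returns the value it held on entry.
 */
struct rm_data *callback_step(int prepare)
{
    static struct rm_data *mem;   /* static storage: NULL initially */
    struct rm_data *seen = mem;

    if ( prepare )
    {
        assert(!mem);             /* holds on the very first call, too */
        mem = &storage;
    }
    else
        mem = NULL;

    return seen;
}
```

An automatic (stack) variable declared without "static" would have an
indeterminate value instead, which is the case Andrew was thinking of.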


Juergen

Thread overview:
2022-08-16 10:13 [PATCH v3 0/3] xen/sched: fix cpu hotplug Juergen Gross
2022-08-16 10:13 ` [PATCH v3 1/3] xen/sched: introduce cpupool_update_node_affinity() Juergen Gross
2022-08-16 10:13 ` [PATCH v3 2/3] xen/sched: carve out memory allocation and freeing from schedule_cpu_rm() Juergen Gross
2022-09-01 11:17   ` Andrew Cooper
2022-09-01 11:24     ` Juergen Gross
2022-08-16 10:13 ` [PATCH v3 3/3] xen/sched: fix cpu hotplug Juergen Gross
2022-08-31 22:52   ` Andrew Cooper
2022-09-01  6:11     ` Juergen Gross [this message]
2022-09-01 12:01       ` Andrew Cooper
2022-09-01 12:08         ` Juergen Gross
