linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Igor Mammedov <imammedo@redhat.com>
To: Toshi Kani <toshi.kani@hp.com>
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de,
	mingo@redhat.com, hpa@zytor.com, x86@kernel.org, bp@suse.de,
	paul.gortmaker@windriver.com, JBeulich@suse.com,
	prarit@redhat.com, drjones@redhat.com, riel@redhat.com,
	gong.chen@linux.intel.com, andi@firstfloor.org, lenb@kernel.org,
	rjw@rjwysocki.net, linux-acpi@vger.kernel.org
Subject: Re: [PATCH v4 5/5] x86: initialize secondary CPU only if master CPU will wait for it
Date: Fri, 2 May 2014 10:21:20 +0200	[thread overview]
Message-ID: <20140502102120.5415bf5b@nial.usersys.redhat.com> (raw)
In-Reply-To: <1398985916.1789.75.camel@misato.fc.hp.com>

On Thu, 01 May 2014 17:11:56 -0600
Toshi Kani <toshi.kani@hp.com> wrote:

> On Mon, 2014-04-14 at 17:11 +0200, Igor Mammedov wrote:
> > Hang is observed on virtual machines during CPU hotplug,
> > especially in big guests with many CPUs. (It reproducible
> > more often if host is over-committed).
> > 
> > It happens because master CPU gives up waiting on
> > secondary CPU and allows it to run wild. As result
> > AP causes locking or crashing system. For example
> > as described here: https://lkml.org/lkml/2014/3/6/257
> > 
> > If master CPU have sent STARTUP IPI successfully,
> > and AP signalled to master CPU that it's ready
> > to start initialization, make master CPU wait
> > indefinitely till AP is onlined.
> > To ensure that AP won't ever run wild, make it
> > wait at early startup till master CPU confirms its
> > intention to wait for AP.
> 
> Please also add description that the master CPU times out when an AP
> does not start initialization within 10 seconds.
added

> 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > v2:
> >  - ammend comment in cpu_init()
> > v3:
> >  - leave timeouts in do_boot_cpu(), so that master CPU
> >    won't hang if AP doesn't respond, use cpu_initialized_mask
> >    as a way for AP to signal to master CPU that it's ready
> >    to start initialzation.
> > v4:
> >  - move common code in cpu_init() for x32/x64 in shared
> >    helper function wait_for_master_cpu()
> >  - add WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
> >    to wait_formaster_cpu()
> > ---
> >  arch/x86/kernel/cpu/common.c |   27 +++++++-----
> >  arch/x86/kernel/smpboot.c    |   98 +++++++++++++-----------------------------
> >  2 files changed, 46 insertions(+), 79 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index a135239..a4bcbac 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -1221,6 +1221,17 @@ static void dbg_restore_debug_regs(void)
> >  #define dbg_restore_debug_regs()
> >  #endif /* ! CONFIG_KGDB */
> >  
> > +static void wait_for_master_cpu(int cpu)
> > +{
> > +	/*
> > +	 * wait for ACK from master CPU before continuing
> > +	 * with AP initialization
> > +	 */
> > +	WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
> > +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > +		cpu_relax();
> > +}
> > +
> >  /*
> >   * cpu_init() initializes state that is per-CPU. Some data is already
> >   * initialized (naturally) in the bootstrap process, such as the GDT
> > @@ -1236,16 +1247,17 @@ void cpu_init(void)
> >  	struct task_struct *me;
> >  	struct tss_struct *t;
> >  	unsigned long v;
> > -	int cpu;
> > +	int cpu = stack_smp_processor_id();
> >  	int i;
> >  
> > +	wait_for_master_cpu(cpu);
> > +
> >  	/*
> >  	 * Load microcode on this cpu if a valid microcode is available.
> >  	 * This is early microcode loading procedure.
> >  	 */
> >  	load_ucode_ap();
> >  
> > -	cpu = stack_smp_processor_id();
> >  	t = &per_cpu(init_tss, cpu);
> >  	oist = &per_cpu(orig_ist, cpu);
> >  
> > @@ -1257,9 +1269,6 @@ void cpu_init(void)
> >  
> >  	me = current;
> >  
> > -	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
> > -		panic("CPU#%d already initialized!\n", cpu);
> > -
> >  	pr_debug("Initializing CPU#%d\n", cpu);
> >  
> >  	clear_in_cr4(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
> > @@ -1336,13 +1345,9 @@ void cpu_init(void)
> >  	struct tss_struct *t = &per_cpu(init_tss, cpu);
> >  	struct thread_struct *thread = &curr->thread;
> >  
> > -	show_ucode_info_early();
> > +	wait_for_master_cpu(cpu);
> >  
> > -	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask)) {
> > -		printk(KERN_WARNING "CPU#%d already initialized!\n", cpu);
> > -		for (;;)
> > -			local_irq_enable();
> > -	}
> > +	show_ucode_info_early();
> >  
> >  	printk(KERN_INFO "Initializing CPU#%d\n", cpu);
> >  
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index ae2fd97..44903ad 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -111,7 +111,6 @@ atomic_t init_deasserted;
> >  static void smp_callin(void)
> >  {
> >  	int cpuid, phys_id;
> > -	unsigned long timeout;
> >  
> >  	/*
> >  	 * If waken up by an INIT in an 82489DX configuration
> > @@ -130,37 +129,6 @@ static void smp_callin(void)
> >  	 * (This works even if the APIC is not enabled.)
> >  	 */
> >  	phys_id = read_apic_id();
> > -	if (cpumask_test_cpu(cpuid, cpu_callin_mask)) {
> > -		panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__,
> > -					phys_id, cpuid);
> > -	}
> > -	pr_debug("CPU#%d (phys ID: %d) waiting for CALLOUT\n", cpuid, phys_id);
> > -
> > -	/*
> > -	 * STARTUP IPIs are fragile beasts as they might sometimes
> > -	 * trigger some glue motherboard logic. Complete APIC bus
> > -	 * silence for 1 second, this overestimates the time the
> > -	 * boot CPU is spending to send the up to 2 STARTUP IPIs
> > -	 * by a factor of two. This should be enough.
> > -	 */
> > -
> > -	/*
> > -	 * Waiting 2s total for startup (udelay is not yet working)
> > -	 */
> > -	timeout = jiffies + 2*HZ;
> > -	while (time_before(jiffies, timeout)) {
> > -		/*
> > -		 * Has the boot CPU finished it's STARTUP sequence?
> > -		 */
> > -		if (cpumask_test_cpu(cpuid, cpu_callout_mask))
> > -			break;
> > -		cpu_relax();
> > -	}
> > -
> > -	if (!time_before(jiffies, timeout)) {
> > -		panic("%s: CPU%d started up but did not get a callout!\n",
> > -		      __func__, cpuid);
> > -	}
> >  
> >  	/*
> >  	 * the boot CPU has finished the init stage and is spinning
> > @@ -750,8 +718,8 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
> >  	unsigned long start_ip = real_mode_header->trampoline_start;
> >  
> >  	unsigned long boot_error = 0;
> > -	int timeout;
> >  	int cpu0_nmi_registered = 0;
> > +	unsigned long timeout;
> >  
> >  	/* Just in case we booted with a single CPU. */
> >  	alternatives_enable_smp();
> > @@ -799,6 +767,14 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
> >  	}
> >  
> >  	/*
> > +	 * AP might wait on cpu_callout_mask in cpu_init() with
> > +	 * cpu_initialized_mask set if previous attempt to online
> > +	 * it timed-out. Clear cpu_initialized_mask so that after
> > +	 * INIT/SIPI it could start with a clean state.
> > +	 */
> > +	cpumask_clear_cpu(cpu, cpu_initialized_mask);
> 
> I think smp_mb() should be added here to ensure that the target AP sees
> this change.
ok

> 
> > +
> > +	/*
> >  	 * Wake up a CPU in difference cases:
> >  	 * - Use the method in the APIC driver if it's defined
> >  	 * Otherwise,
> > @@ -810,55 +786,41 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
> >  		boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
> >  						     &cpu0_nmi_registered);
> >  
> > +
> >  	if (!boot_error) {
> >  		/*
> > -		 * allow APs to start initializing.
> > +		 * Wait 10s total for a response from AP
> >  		 */
> > -		pr_debug("Before Callout %d\n", cpu);
> > -		cpumask_set_cpu(cpu, cpu_callout_mask);
> > -		pr_debug("After Callout %d\n", cpu);
> > +		boot_error = -1;
> > +		timeout = jiffies + 10*HZ;
> > +		while (time_before(jiffies, timeout)) {
> > +			if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
> > +				/*
> > +				 * Tell AP to proceed with initialization
> > +				 */
> > +				cpumask_set_cpu(cpu, cpu_callout_mask);
> > +				boot_error = 0;
> > +				break;
> > +			}
> > +			udelay(100);
> > +			schedule();
> > +		}
> > +	}
> 
> When 10s passed, the master could set a new flag, ex.
> cpu_callout_error_mask, which wait_for_master_cpu() checks and call
> play_dead() when it is set.  This avoids AP to spin forever when 10s
> becomes not long enough.  But it does not have to be part of this
> patchset, though.
I'm reluctant to add another to already too many cpu_*_mask,
maybe we could reuse cpu_initialized_mask by clearing it on timeout.
This way AP spinning on cpu_callout_mask could notice it and halt itself.

It would be better to make it separate patch on top of this series,
to reduce delay of bugfixes in this series.

> 
> > +	if (!boot_error) {
> >  		/*
> > -		 * Wait 5s total for a response
> > +		 * Wait till AP completes initial initialization
> 
> We should generally avoid such wait w/o a timeout condition, but since
> native_cpu_up() waits till cpu_online(cpu) anyway after this point, this
If we don't wait here and fall through into tight loop waiting on
cpu_online(cpu) in native_cpu_up() or check_tsc_sync_source() then
stop_task for syncing MTTRs initiated from AP won't have a chance
to run on the master CPU.

> seems OK...  I wonder if we need touch_nmi_watchdog(), though.
There wasn't any touch_nmi_watchdog() in the original code and I don't
think we need it here since we are not just spinning on CPU but giving
control back to kernel calling schedule(), which would allow watchdog_task
to do the job if needed.

> Thanks,
> -Toshi
> 
> 


  reply	other threads:[~2014-05-02  8:22 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-14 15:11 [PATCH v4 0/5] x86: fix hang when AP bringup is too slow Igor Mammedov
2014-04-14 15:11 ` [PATCH v4 1/5] x86: fix list corruption on CPU hotplug Igor Mammedov
2014-04-30 21:18   ` Toshi Kani
2014-04-14 15:11 ` [PATCH v4 2/5] x86: fix memory corruption in acpi_unmap_lsapic() Igor Mammedov
2014-04-14 15:11 ` [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined Igor Mammedov
2014-04-15  5:48   ` Rafael J. Wysocki
2014-04-15  6:00     ` Igor Mammedov
2014-04-15  6:04     ` Ingo Molnar
2014-04-15 15:48       ` Rafael J. Wysocki
2014-04-15  5:53   ` Rafael J. Wysocki
2014-04-30 21:25   ` Toshi Kani
2014-05-02 11:32     ` Igor Mammedov
2014-05-02 17:23       ` Toshi Kani
2014-04-14 15:11 ` [PATCH v4 4/5] x86: log error on secondary CPU wakeup failure at ERR level Igor Mammedov
2014-04-30 21:30   ` Toshi Kani
2014-04-14 15:11 ` [PATCH v4 5/5] x86: initialize secondary CPU only if master CPU will wait for it Igor Mammedov
2014-05-01 23:11   ` Toshi Kani
2014-05-02  8:21     ` Igor Mammedov [this message]
2014-05-02 14:52       ` Toshi Kani
2014-05-05 20:26         ` Igor Mammedov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140502102120.5415bf5b@nial.usersys.redhat.com \
    --to=imammedo@redhat.com \
    --cc=JBeulich@suse.com \
    --cc=andi@firstfloor.org \
    --cc=bp@suse.de \
    --cc=drjones@redhat.com \
    --cc=gong.chen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paul.gortmaker@windriver.com \
    --cc=prarit@redhat.com \
    --cc=riel@redhat.com \
    --cc=rjw@rjwysocki.net \
    --cc=tglx@linutronix.de \
    --cc=toshi.kani@hp.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).