From: David Woodhouse <dwmw2@infradead.org>
To: "H. Peter Anvin" <hpa@zytor.com>,
Usama Arif <usama.arif@bytedance.com>,
Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
rcu@vger.kernel.org, mimoja@mimoja.de, hewenliang4@huawei.com,
hushiyuan@huawei.com, luolongjun@huawei.com,
hejingxian@huawei.com, Tom Lendacky <thomas.lendacky@amd.com>,
Sean Christopherson <seanjc@google.com>,
Paul Menzel <pmenzel@molgen.mpg.de>,
Fam Zheng <fam.zheng@bytedance.com>,
Punit Agrawal <punit.agrawal@bytedance.com>,
simon.evans@bytedance.com, liangma@liangbit.com
Subject: Re: [External] Re: [PATCH v4 0/9] Parallel CPU bringup for x86_64
Date: Wed, 01 Feb 2023 17:12:08 +0000
Message-ID: <3b6ac86fdc800cac5806433daf14a9095be101e9.camel@infradead.org>
In-Reply-To: <4C7F2481-0B0B-4399-A8E1-30731EFD02D2@zytor.com>
On Wed, 2023-02-01 at 08:55 -0800, H. Peter Anvin wrote:
> On February 1, 2023 8:38:14 AM PST, Usama Arif <usama.arif@bytedance.com> wrote:
> >
> >
> > On 01/02/2023 15:08, David Woodhouse wrote:
> > > On Wed, 2023-02-01 at 14:40 +0000, Usama Arif wrote:
> > > > On 01/02/2022 20:53, David Woodhouse wrote:
> > > > > Doing the INIT/SIPI/SIPI in parallel for all APs and *then* waiting for
> > > > > them shaves about 80% off the AP bringup time on a 96-thread 2-socket
> > > > > Skylake box (EC2 c5.metal) — from about 500ms to 100ms.
> > > > >
> > > > > There are more wins to be had with further parallelisation, but this is
> > > > > the simple part.
> > > > >
> > > >
> > > > Hi,
> > > >
> > > > We are interested in reducing the boot time of servers (with kexec), and
> > > > smpboot takes up a significant amount of time while booting. When
> > > > testing the patch series (rebased to v6.1) on a server with 128 CPUs
> > > > split across 2 NUMA nodes, it brought down the smpboot time from ~700ms
> > > > to 100ms. Adding another cpuhp state for do_wait_cpu_initialized to make
> > > > sure cpu_init is reached (as done in v1 of the series + using the
> > > > cpu_finishup_mask) brought it down further to ~30ms.
> > > >
> > > > I just wanted to check what was needed to progress the patch series
> > > > further for review? There weren't any comments on v4 of the patch so I
> > > > > couldn't figure out what more is needed. I think it's quite useful to
> > > > > have this working, so I would be really glad to help with anything
> > > > > needed to restart the review.
> > >
> > >
> > > I believe the only thing holding it back was the fact that it broke on
> > > some AMD CPUs.
> > >
> > > We don't *think* there are any remaining software issues; we think it's
> > > hardware. Either an actual hardware race in CPU or chipset, or perhaps
> > > even something as simple as a voltage regulator which can't cope with
> > > an increase in power draw from *all* the CPUs at the same time.
> > >
> > > We have prodded AMD a few times to investigate, but so far to no avail.
> > >
> > > Last time I actually spoke to Thomas in person, I think he agreed that
> > > we should just merge it and disable the parallel mode for the affected
> > > AMD CPUs.
> > >
> >
> > From the comments in v3, it seems to affect multiple generations. Would
> > it be worth proceeding with the patches by disabling it on all AMD CPUs
> > to be on the safe side until the actual cause is found, and then
> > following up later by disabling it only on the affected CPUs? Maybe
> > simply do something like below?
> >
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 0f144773a7fc..6b8884592341 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -1575,7 +1575,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
> >  	 * for SEV-ES guests because they can't use CPUID that early.
> >  	 */
> >  	if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 0x0B ||
> > -	    cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
> > +	    cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT) ||
> > +	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> >  		do_parallel_bringup = false;
> >
> >  	if (do_parallel_bringup) {
> >
> >
> >
> >
> > > If you've already rebased to a newer kernel and tested it, perhaps now
> > > is the time to do just that.
> >
> > If you would like me to repost the rebased patches to restart the reviews (with do_parallel_bringup disabled for AMD), please let me know!
> >

Sounds like you have a far fresher context on it all than I do now, so
yes please, that sounds like a great idea.

I think we still need a sign-off from Thomas on the real mode patch but
as I noted in the last cover letter, now we've *fixed* it perhaps we
can persuade him to concede that it's his? Either that or we post it in
email and hope to trick him into adding a S-o-B in transit as he
applies it...

> > Thanks,
> > Usama
>
> This should be a CPU bug flag in my opinion.

Yeah, probably true. But I think I agree with Usama that we should do
it for all AMD to start with. Best to err on the side of caution.
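
To make the two options concrete, here is a minimal user-space sketch, not kernel code, of how they differ. Everything in it is a hypothetical stand-in: `struct cpu_model`, the `VENDOR_*` values and the `bug_parallel_bringup` field only model `boot_cpu_data`, `X86_VENDOR_AMD` and a bug flag that does not exist today. The common preconditions mirror the check in the quoted diff.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; names and values are illustrative. */
enum vendor { VENDOR_INTEL, VENDOR_AMD };

struct cpu_model {
	enum vendor vendor;
	bool is_32bit;             /* models IS_ENABLED(CONFIG_X86_32) */
	int cpuid_level;           /* models boot_cpu_data.cpuid_level */
	bool guest_state_encrypt;  /* models cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT) */
	bool bug_parallel_bringup; /* assumed bug flag, not an existing X86_BUG_* bit */
};

/* Preconditions shared by both policies, as in the quoted diff. */
static bool parallel_preconditions_ok(const struct cpu_model *c)
{
	return !(c->is_32bit || c->cpuid_level < 0x0B || c->guest_state_encrypt);
}

/* Conservative policy: parallel bringup is off for every AMD CPU. */
static bool parallel_ok_by_vendor(const struct cpu_model *c)
{
	return parallel_preconditions_ok(c) && c->vendor != VENDOR_AMD;
}

/* Narrower policy: off only on CPUs marked with the assumed bug flag. */
static bool parallel_ok_by_bug_flag(const struct cpu_model *c)
{
	return parallel_preconditions_ok(c) && !c->bug_parallel_bringup;
}
```

The vendor check needs no knowledge of which parts are affected, which is why it is the safer starting point. The bug-flag variant only becomes useful once someone can say which CPUs should have the flag set.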
Thread overview: 17+ messages
2022-02-01 20:53 [PATCH v4 0/9] Parallel CPU bringup for x86_64 David Woodhouse
2022-02-01 20:53 ` [PATCH v4 1/9] x86/apic/x2apic: Fix parallel handling of cluster_mask David Woodhouse
2022-02-01 20:53 ` [PATCH v4 2/9] cpu/hotplug: Move idle_thread_get() to <linux/smpboot.h> David Woodhouse
2022-02-01 20:53 ` [PATCH v4 3/9] cpu/hotplug: Add dynamic parallel bringup states before CPUHP_BRINGUP_CPU David Woodhouse
2022-02-01 20:53 ` [PATCH v4 4/9] x86/smpboot: Reference count on smpboot_setup_warm_reset_vector() David Woodhouse
2022-02-01 20:53 ` [PATCH v4 5/9] x86/smpboot: Split up native_cpu_up into separate phases and document them David Woodhouse
2022-02-01 20:53 ` [PATCH v4 6/9] x86/smpboot: Support parallel startup of secondary CPUs David Woodhouse
2022-02-01 20:53 ` [PATCH v4 7/9] x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel David Woodhouse
2022-02-01 20:53 ` [PATCH v4 8/9] x86/mtrr: Avoid repeated save of MTRRs on boot-time CPU bringup David Woodhouse
2022-02-01 20:53 ` [PATCH v4 9/9] x86/smpboot: Serialize topology updates for secondary bringup David Woodhouse
2022-02-07 18:50 ` [PATCH v4 0/9] Parallel CPU bringup for x86_64 Tom Lendacky
2023-02-01 14:40 ` Usama Arif
2023-02-01 15:08 ` David Woodhouse
2023-02-01 16:38 ` [External] " Usama Arif
2023-02-01 16:55 ` H. Peter Anvin
2023-02-01 17:12 ` David Woodhouse [this message]
2023-02-02 10:06 ` David Woodhouse