All of lore.kernel.org
 help / color / mirror / Atom feed
From: Babu Moger <babu.moger@amd.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
	"ehabkost@redhat.com" <ehabkost@redhat.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"rth@twiddle.net" <rth@twiddle.net>
Subject: RE: [PATCH v3 2/3] hw/i386: Add a new check to configure smp dies for EPYC
Date: Thu, 13 Aug 2020 16:10:39 -0500	[thread overview]
Message-ID: <baeab075-a900-f0d1-4ed6-40b7a8a2988a@amd.com> (raw)
In-Reply-To: <20200813155610.02b1c5cc@redhat.com>



> -----Original Message-----
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Thursday, August 13, 2020 8:56 AM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>; ehabkost@redhat.com;
> mst@redhat.com; qemu-devel@nongnu.org; pbonzini@redhat.com;
> rth@twiddle.net
> Subject: Re: [PATCH v3 2/3] hw/i386: Add a new check to configure smp dies for
> EPYC
> 
> On Tue, 11 Aug 2020 16:03:58 -0500
> Babu Moger <babu.moger@amd.com> wrote:
> 
> > On 8/7/20 2:11 PM, Igor Mammedov wrote:
> > > On Fri, 7 Aug 2020 17:52:22 +0100
> > > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >
> > >> On Fri, Aug 07, 2020 at 11:32:51AM -0500, Babu Moger wrote:
> > >>> Adding a new check to warn the users to configure 'dies' when
> > >>> topology is numa configured. It makes it easy to build the
> > >>> topology for EPYC models.
> > >>
> > >> This says you're adding a warning....
> > >>
> > >>>
> > >>> Signed-off-by: Babu Moger <babu.moger@amd.com>
> > >>> ---
> > >>>  hw/i386/x86.c |    7 +++++++
> > >>>  1 file changed, 7 insertions(+)
> > >>>
> > >>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c index
> > >>> 67bee1bcb8..2a6ce56ef1 100644
> > >>> --- a/hw/i386/x86.c
> > >>> +++ b/hw/i386/x86.c
> > >>> @@ -138,6 +138,13 @@ void x86_cpus_init(X86MachineState *x86ms,
> > >>> int default_cpu_version)
> > >>>
> > >>>      /* Check for apicid encoding */
> > >>>      if (cpu_x86_use_epyc_apic_id_encoding(ms->cpu_type)) {
> > >>> +        if ((ms->numa_state->num_nodes > 0) &&
> > >>> +            ms->numa_state->num_nodes != (ms->smp.sockets * x86ms-
> >smp_dies)) {
> > >>> +            error_setg(&error_fatal, "Numa configuration requires smp 'dies' "
> > >>> +                       "parameter. Configure the cpu topology properly with "
> > >>> +                       "max_cpus = sockets * dies * cores * threads");
> > >>
> > >> ...but you're actually making this a fatal error, not a warning.
> > >>
> > >> I'm not sure this is really OK. Wouldn't this mean that existing
> > >> VMs deployed today, risk triggering this fatal error next time they
> > >> are booted, or live migrated.  If it is possible someone is using
> > >> such a config today, I don't think we can break it.
> > >
> > > to begin with, users shouldn't have used 'dies' with initial impl. at all.
> > > (it was Intel introduced option and EPYC's added very similar
> > > internal node_id (removed by the next patch)).
> > > Now we are trying to consolidate this mess and reuse dies for EPYC.
> > >
> > > EPYC was out in the since with 5.0 (though broken), users could
> > > start a VM with such config but that would not be correct EPYC from apicid
> and cpuid point of view.
> > > Guest OS might run if it doesn't know about EPYCs or behave wierdly
> > > (sub optimal|crash|whatever) on seeing unexpected values.
> > >
> > > If we are hell bound on keeping bugs of initial impl, then we should
> > > keep it to 5.1<= machine version and do the right thing for newer ones.
> > > Though I'm not sure we should keep broken variant around (all we
> > > would get from it is bug reports*/complains from users with end result of
> their config anyways).
> > > I'd rather error out with clear error message so user could fix their broken
> config.
> > >
> > > *) there is at least one thread/bz on qemu-devel where users are
> > > trying to run with EPYC and pick up options combination so it would produce
> sensible topology.
> >
> >
> > I am still not sure what is the right approach here.  I can think of
> > couple of options.
> > 1. If smp_dies != num_nodes then go ahead create the configuration
> > with as  many smp_dies and warn(but not error out) users about the mis-
> configuration.
> warning is a bad idea, that usually leads to troubles down the road.
> 
> Provided that code is relatively new and produces misconfigured CPUs and if
> nobody insists on keeping bug around, I'd try to go for erroring out.
> Yes that would break misconfigured configs but that could be fixed by
> reconfiguring on user side.

Ok. I will refresh the patches if there are no other comments. thanks

> 
> > 2. Introduce it as a fix based on  machine version(5.1 >) like Igor
> > mentioned. I am not sure how to achieve that. I can look into that.
> That's a headache for maintaing point of view, so again if nobody insist I'd
> rather avoid it.
> 
> >
> > Thanks
> > Babu
> >
> > >
> > >
> > >> Regards,
> > >> Daniel
> > >
> >



  reply	other threads:[~2020-08-13 21:26 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-07 16:32 [PATCH v3 0/3] Fix couple of issues with AMD topology Babu Moger
2020-08-07 16:32 ` [PATCH v3 1/3] i386: Simplify CPUID_8000_001E for AMD Babu Moger
2020-08-07 18:11   ` Igor Mammedov
2020-08-07 16:32 ` [PATCH v3 2/3] hw/i386: Add a new check to configure smp dies for EPYC Babu Moger
2020-08-07 16:52   ` Daniel P. Berrangé
2020-08-07 17:23     ` Babu Moger
2020-08-07 19:11     ` Igor Mammedov
2020-08-11 21:03       ` Babu Moger
2020-08-13 13:56         ` Igor Mammedov
2020-08-13 21:10           ` Babu Moger [this message]
2020-08-07 19:27   ` Igor Mammedov
2020-08-11 21:04     ` Babu Moger
2020-08-07 16:32 ` [PATCH v3 3/3] hw/i386: Remove node_id, nr_nodes and nodes_per_pkg from topology Babu Moger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=baeab075-a900-f0d1-4ed6-40b7a8a2988a@amd.com \
    --to=babu.moger@amd.com \
    --cc=berrange@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.