All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eduardo Habkost <ehabkost@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: qemu-devel@nongnu.org, kraxel@redhat.com,
	liuxiaojian6@huawei.com, mst@redhat.com, rkrcmar@redhat.com,
	peterx@redhat.com, kevin@koconnor.net, pbonzini@redhat.com,
	lersek@redhat.com, chao.gao@intel.com
Subject: Re: [Qemu-devel] [PATCH v4 11/13] pc: add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs
Date: Thu, 20 Oct 2016 12:15:07 -0200	[thread overview]
Message-ID: <20161020141507.GA5623@thinpad.lan.raisama.net> (raw)
In-Reply-To: <20161020152750.015ca010@nial.brq.redhat.com>

On Thu, Oct 20, 2016 at 03:27:50PM +0200, Igor Mammedov wrote:
> On Thu, 20 Oct 2016 10:27:33 -0200
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Thu, Oct 20, 2016 at 01:27:34PM +0200, Igor Mammedov wrote:
> > > On Wed, 19 Oct 2016 16:29:29 -0200
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > >   
> > > > On Wed, Oct 19, 2016 at 05:18:38PM +0200, Igor Mammedov wrote:  
> > > > > On Wed, 19 Oct 2016 11:15:46 -0200
> > > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > > >     
> > > > > > On Wed, Oct 19, 2016 at 02:05:41PM +0200, Igor Mammedov wrote:    
> > > > > > > Currently firmware uses 1 byte at 0x5F offset in RTC CMOS
> > > > > > > to get number of CPUs present at boot. However 1 byte is
> > > > > > > not enough to handle more than 255 CPUs.  So add a new
> > > > > > > fw_cfg file that would allow QEMU to tell it.
> > > > > > > For compat reasons add file only for machine types that
> > > > > > > support more than 255 CPUs.
> > > > > > > 
> > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>      
> > > > > > [...]    
> > > > > > >  static X86CPU *pc_new_cpu(const char *typename, int64_t apic_id,
> > > > > > >                            Error **errp)
> > > > > > >  {
> > > > > > > @@ -1232,6 +1221,11 @@ static void pc_build_feature_control_file(PCMachineState *pcms)
> > > > > > >      fw_cfg_add_file(pcms->fw_cfg, "etc/msr_feature_control", val, sizeof(*val));
> > > > > > >  }
> > > > > > >  
> > > > > > > +static void rtc_set_cpus_count(ISADevice *rtc, uint16_t cpus_count)
> > > > > > > +{
> > > > > > > +    rtc_set_memory(rtc, 0x5f, cpus_count - 1);      
> > > > > > 
> > > > > > If we have more than 255 CPUs, shouldn't we at least tell the old
> > > > > > BIOS that we have 255, instead of silently truncating bits?    
> > > > > It won't do any good to BIOS as it would hang in AP wakeup due to
> > > > >  (expected != woken up) condition.    
> > > > 
> > > > Even in this case, truncating bits makes it a bit unpredictable:
> > > > having 257 CPUs would set RTC memory to 0, BIOS will believe it
> > > > is a UP system.  
> > > and it will do AP wakeup regardless, where old BIOS will hang
> > > due to unexpectedly woken-up CPUs regardless of value in cmos.  
> > 
> > 0 (1 CPU) seems to be the only value that will not hang (at least
> > it won't hang at the same point). If cmos_cpu_count is 1, SeaBIOS
> > won't even wait for the other CPUs to wake up.
> I don't see it in code
> 
> here is current/old seabios smp init flow:
> 
>     if (BSP HAS APIC DISABLED) {                             
>         // No apic - only the main cpu is present.                               
>         dprintf(1, "No apic - only the main cpu is present.\n");                 
>         CountCPUs= 1;                                                            
>         return;                                                                  
>     }     
> 

We also have these lines:

    u8 apic_id = ebx>>24;
    FoundAPICIDs[apic_id/32] |= (1 << (apic_id % 32));
    CountCPUs = 1;

>     AP WAKEUP regardless of CMOS_BIOS_SMP_COUNT value
> 
>     u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;                       
>     while (cmos_smp_count != CountCPUs) // we are doomed here due to AP race and
>                                         // mostly hang here regardless of cmos_smp_count value
>                                         // as CountCPUs could be anything at this point and counting up

CountCPUs will be always 1 on the first (cmos_smp_count !=
CountCPUs) check, because handle_smp() (which change CountCPUs)
is protected by SMPLock.

>                                         
> 
> and it's been pretty much the same logic throughout history,
> that's why it doesn't matter  if rtc_read(CMOS_BIOS_SMP_COUNT) is 0 or something else.
> 
> SMP support has been introduced by:
> (e97ca7bd1 Forward port bochs smp changes; rename smpdetect.c to smp.c.)
> 
> 
> [...]
> > That said, setting it to 0 (1 CPU) sounds like the best option.
> > But I would be OK with any other value as long as it is
> > predictable.
> So counting above SeaBIOS behavior shall I still set cmos to 0?

In the case we get unpredictable behavior from SeaBIOS for any
value (I will make some tests to confirm that), setting it to 0
still seems more likely to avoid weird things from happening with
BIOSes or OSes we don't control.

> 
> > 
> > >   
> > > > > >     
> > > > > > > +}
> > > > > > > +
> > > > > > >  static
> > > > > > >  void pc_machine_done(Notifier *notifier, void *data)
> > > > > > >  {
> > > > > > > @@ -1240,7 +1234,7 @@ void pc_machine_done(Notifier *notifier, void *data)
> > > > > > >      PCIBus *bus = pcms->bus;
> > > > > > >  
> > > > > > >      /* set the number of CPUs */
> > > > > > > -    rtc_set_memory(pcms->rtc, 0x5f, pc_present_cpus_count(pcms) - 1);
> > > > > > > +    rtc_set_cpus_count(pcms->rtc, le16_to_cpu(pcms->boot_cpus_le));
> > > > > > >  
> > > > > > >      if (bus) {
> > > > > > >          int extra_hosts = 0;
> > > > > > > @@ -1261,8 +1255,15 @@ void pc_machine_done(Notifier *notifier, void *data)
> > > > > > >  
> > > > > > >      acpi_setup();
> > > > > > >      if (pcms->fw_cfg) {
> > > > > > > +        MachineClass *mc = MACHINE_GET_CLASS(pcms);
> > > > > > > +
> > > > > > >          pc_build_smbios(pcms->fw_cfg);
> > > > > > >          pc_build_feature_control_file(pcms);
> > > > > > > +
> > > > > > > +        if (mc->max_cpus > 255) {
> > > > > > > +            fw_cfg_add_file(pcms->fw_cfg, "etc/boot-cpus", &pcms->boot_cpus_le,
> > > > > > > +                            sizeof(pcms->boot_cpus_le));
> > > > > > > +        }
> > > > > > >      }
> > > > > > >  }
> > > > > > >  
> > > > > > > @@ -1786,9 +1787,11 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev,
> > > > > > >          }
> > > > > > >      }
> > > > > > >  
> > > > > > > +    /* increment the number of CPUs */
> > > > > > > +    pcms->boot_cpus_le = cpu_to_le16(le16_to_cpu(pcms->boot_cpus_le) + 1);      
> > > > > > 
> > > > > > Is this really safe? What if the guest is in the middle of a
> > > > > > etc/boot-cpus read?    
> > > > > It's safe for boot CPUs but
> > > > > it's not safe to hotplug cpus CPU during BIOS boot at all
> > > > > as number of CPUs read from boot_cpus might not match number
> > > > > of CPUs that received INIT/SIPI wakeup.
> > > > > This problem is ignored for now, I've dropped related SeaBIOS patch
> > > > > by Kevin's request and would explore it some more to avoid race there.
> > > > > 
> > > > > Anyways,
> > > > > Do you have an idea how to improve reading from pcms->boot_cpus_le and make it atomic?    
> > > > 
> > > > No idea. SeaBIOS seems to use insb to read fw_cfg files, so we
> > > > could be updating the data between two reads. I don't think we
> > > > want to design a fw_cfg guest<->host synchronization mechanism.
> > > > 
> > > > I believe the solution is to not change it at all after booting
> > > > the guest. I suggest initializing/reinitializing fw_cfg data only
> > > > on reset.  
> > > I've checked it once more and value read by BIOS atomically
> > > as both QEMU and SeaBIOS use dma interface to transfer data.
> > > BIOS transfers control to QEMU once via
> > > 
> > >  outl(PORT_QEMU_CFG_DMA_ADDR_LOW)
> > > 
> > > and after that access to pcms->boot_cpus_le is protected by BQL.
> > > So there is no need to invent some sort of synchronization
> > > or limit update to fixed points (machine_done/reset).  
> > 
> > You're right. I was assuming the worst case and looking at the
> > non-DMA path.
> > 
> > In this case, I believe we're safe except when running old BIOS
> > that doesn't support fw_cfg DMA.
> old bios won't read this file (as it doesn't know about it)

Oops, you're right. :)

-- 
Eduardo

  reply	other threads:[~2016-10-20 14:15 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-19 12:05 [Qemu-devel] [PATCH v4 00/13] pc: q35: x2APIC support in kvm_apic mode Igor Mammedov
2016-10-19 12:05 ` [Qemu-devel] [PATCH v4 01/13] pc: acpi: x2APIC support for MADT table and _MAT method Igor Mammedov
2016-10-19 12:54   ` Eduardo Habkost
2016-10-19 12:05 ` [Qemu-devel] [PATCH v4 02/13] pc: acpi: x2APIC support for SRAT table Igor Mammedov
2016-10-19 12:54   ` Eduardo Habkost
2016-10-19 12:05 ` [Qemu-devel] [PATCH v4 03/13] acpi: cphp: force switch to modern cpu hotplug if APIC ID > 254 Igor Mammedov
2016-10-19 12:54   ` Eduardo Habkost
2016-10-19 12:05 ` [Qemu-devel] [PATCH v4 04/13] pc: leave max apic_id_limit only in legacy cpu hotplug code Igor Mammedov
2016-10-19 12:05 ` [Qemu-devel] [PATCH v4 05/13] pc: apic_common: extend APIC ID property to 32bit Igor Mammedov
2016-10-19 12:55   ` Eduardo Habkost
2016-10-19 12:05 ` [Qemu-devel] [PATCH v4 06/13] pc: apic_common: restore APIC ID to initial ID on reset Igor Mammedov
2016-10-19 12:56   ` Eduardo Habkost
2016-10-19 12:05 ` [Qemu-devel] [PATCH v4 07/13] pc: apic_common: reset APIC ID to initial ID when switching into x2APIC mode Igor Mammedov
2016-10-19 12:56   ` Eduardo Habkost
2016-10-19 12:05 ` [Qemu-devel] [PATCH v4 08/13] pc: kvm_apic: pass APIC ID depending on xAPIC/x2APIC mode Igor Mammedov
2016-10-19 12:05 ` [Qemu-devel] [PATCH v4 09/13] pc: clarify FW_CFG_MAX_CPUS usage comment Igor Mammedov
2016-10-19 12:58   ` Eduardo Habkost
2016-10-19 12:05 ` [Qemu-devel] [PATCH v4 10/13] increase MAX_CPUMASK_BITS from 255 to 288 Igor Mammedov
2016-10-19 13:16   ` Eduardo Habkost
2016-10-19 12:05 ` [Qemu-devel] [PATCH v4 11/13] pc: add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs Igor Mammedov
2016-10-19 13:15   ` Eduardo Habkost
2016-10-19 15:18     ` Igor Mammedov
2016-10-19 18:29       ` Eduardo Habkost
2016-10-20 11:27         ` Igor Mammedov
2016-10-20 12:27           ` Eduardo Habkost
2016-10-20 13:27             ` Igor Mammedov
2016-10-20 14:15               ` Eduardo Habkost [this message]
2016-10-20 14:42                 ` Igor Mammedov
2016-10-20 14:49             ` Kevin O'Connor
2016-10-20 14:58   ` [Qemu-devel] [PATCH v5 " Igor Mammedov
2016-10-20 18:11     ` Eduardo Habkost
2016-10-20 18:51     ` [Qemu-devel] [PATCH] fixup! " Eduardo Habkost
2016-10-21  8:53       ` Igor Mammedov
2016-10-21 11:41         ` Eduardo Habkost
2016-10-27 22:08         ` Michael S. Tsirkin
2016-10-28  0:53           ` Eduardo Habkost
2016-10-19 12:05 ` [Qemu-devel] [PATCH v4 12/13] pc: require IRQ remapping and EIM if there could be x2APIC CPUs Igor Mammedov
2016-10-19 13:17   ` Eduardo Habkost
2016-10-19 12:05 ` [Qemu-devel] [PATCH v4 13/13] pc: q35: bump max_cpus to 288 Igor Mammedov
2016-10-19 13:19   ` Eduardo Habkost
2016-10-19 13:29 ` [Qemu-devel] [PATCH v4 00/13] pc: q35: x2APIC support in kvm_apic mode Daniel P. Berrange
2016-10-19 15:22   ` Igor Mammedov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161020141507.GA5623@thinpad.lan.raisama.net \
    --to=ehabkost@redhat.com \
    --cc=chao.gao@intel.com \
    --cc=imammedo@redhat.com \
    --cc=kevin@koconnor.net \
    --cc=kraxel@redhat.com \
    --cc=lersek@redhat.com \
    --cc=liuxiaojian6@huawei.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rkrcmar@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.