From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:40285) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bxE7m-0002zu-Ct for qemu-devel@nongnu.org; Thu, 20 Oct 2016 10:15:19 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bxE7i-0006sB-CX for qemu-devel@nongnu.org; Thu, 20 Oct 2016 10:15:14 -0400 Received: from mx1.redhat.com ([209.132.183.28]:17361) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1bxE7i-0006qm-4s for qemu-devel@nongnu.org; Thu, 20 Oct 2016 10:15:10 -0400 Date: Thu, 20 Oct 2016 12:15:07 -0200 From: Eduardo Habkost Message-ID: <20161020141507.GA5623@thinpad.lan.raisama.net> References: <1476878743-144953-1-git-send-email-imammedo@redhat.com> <1476878743-144953-12-git-send-email-imammedo@redhat.com> <20161019131546.GL5057@thinpad.lan.raisama.net> <20161019171838.4a9343f6@nial.brq.redhat.com> <20161019182929.GR5057@thinpad.lan.raisama.net> <20161020132734.33060b44@nial.brq.redhat.com> <20161020122733.GT5057@thinpad.lan.raisama.net> <20161020152750.015ca010@nial.brq.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161020152750.015ca010@nial.brq.redhat.com> Subject: Re: [Qemu-devel] [PATCH v4 11/13] pc: add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Igor Mammedov Cc: qemu-devel@nongnu.org, kraxel@redhat.com, liuxiaojian6@huawei.com, mst@redhat.com, rkrcmar@redhat.com, peterx@redhat.com, kevin@koconnor.net, pbonzini@redhat.com, lersek@redhat.com, chao.gao@intel.com On Thu, Oct 20, 2016 at 03:27:50PM +0200, Igor Mammedov wrote: > On Thu, 20 Oct 2016 10:27:33 -0200 > Eduardo Habkost wrote: > > > On Thu, Oct 20, 2016 at 01:27:34PM +0200, Igor Mammedov wrote: > > > On Wed, 19 Oct 2016 16:29:29 -0200 > > > Eduardo Habkost wrote: > > > > > > > On Wed, Oct 19, 2016 at 05:18:38PM +0200, Igor Mammedov wrote: > > > > > On Wed, 19 Oct 2016 11:15:46 -0200 > > > > > Eduardo Habkost wrote: > > > > > > > > > > > On Wed, Oct 19, 2016 at 02:05:41PM +0200, Igor Mammedov wrote: > > > > > > > Currently firmware uses 1 byte at 0x5F offset in RTC CMOS > > > > > > > to get number of CPUs present at boot. However 1 byte is > > > > > > > not enough to handle more than 255 CPUs. So add a new > > > > > > > fw_cfg file that would allow QEMU to tell it. > > > > > > > For compat reasons add file only for machine types that > > > > > > > support more than 255 CPUs. > > > > > > > > > > > > > > Signed-off-by: Igor Mammedov > > > > > > [...] > > > > > > > static X86CPU *pc_new_cpu(const char *typename, int64_t apic_id, > > > > > > > Error **errp) > > > > > > > { > > > > > > > @@ -1232,6 +1221,11 @@ static void pc_build_feature_control_file(PCMachineState *pcms) > > > > > > > fw_cfg_add_file(pcms->fw_cfg, "etc/msr_feature_control", val, sizeof(*val)); > > > > > > > } > > > > > > > > > > > > > > +static void rtc_set_cpus_count(ISADevice *rtc, uint16_t cpus_count) > > > > > > > +{ > > > > > > > + rtc_set_memory(rtc, 0x5f, cpus_count - 1); > > > > > > > > > > > > If we have more than 255 CPUs, shouldn't we at least tell the old > > > > > > BIOS that we have 255, instead of silently truncating bits? > > > > > It won't do any good to BIOS as it would hang in AP wakeup due to > > > > > (expected != woken up) condition. > > > > > > > > Even in this case, truncating bits makes it a bit unpredictable: > > > > having 257 CPUs would set RTC memory to 0, BIOS will believe it > > > > is a UP system. > > > and it will do AP wakeup regardless, where old BIOS will hang > > > due to unexpectedly woken-up CPUs regardless of value in cmos. > > > > 0 (1 CPU) seems to be the only value that will not hang (at least > > it won't hang at the same point). If cmos_cpu_count is 1, SeaBIOS > > won't even wait for the other CPUs to wake up. > I don't see it in code > > here is current/old seabios smp init flow: > > if (BSP HAS APIC DISABLED) { > // No apic - only the main cpu is present. > dprintf(1, "No apic - only the main cpu is present.\n"); > CountCPUs= 1; > return; > } > We also have these lines: u8 apic_id = ebx>>24; FoundAPICIDs[apic_id/32] |= (1 << (apic_id % 32)); CountCPUs = 1; > AP WAKEUP regardless of CMOS_BIOS_SMP_COUNT value > > u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1; > while (cmos_smp_count != CountCPUs) // we are doomed here due to AP race and > // mostly hang here regardless of cmos_smp_count value > // as CountCPUs could be anything at this point and counting up CountCPUs will be always 1 on the first (cmos_smp_count != CountCPUs) check, because handle_smp() (which change CountCPUs) is protected by SMPLock. > > > and it's been pretty much the same logic throughout history, > that's why it doesn't matter if rtc_read(CMOS_BIOS_SMP_COUNT) is 0 or something else. > > SMP support has been introduced by: > (e97ca7bd1 Forward port bochs smp changes; rename smpdetect.c to smp.c.) > > > [...] > > That said, setting it to 0 (1 CPU) sounds like the best option. > > But I would be OK with any other value as long as it is > > predictable. > So counting above SeaBIOS behavior shall I still set cmos to 0? In the case we get unpredictable behavior from SeaBIOS for any value (I will make some tests to confirm that), setting it to 0 still seems more likely to avoid weird things from happening with BIOSes or OSes we don't control. > > > > > > > > > > > > > > > > > > > +} > > > > > > > + > > > > > > > static > > > > > > > void pc_machine_done(Notifier *notifier, void *data) > > > > > > > { > > > > > > > @@ -1240,7 +1234,7 @@ void pc_machine_done(Notifier *notifier, void *data) > > > > > > > PCIBus *bus = pcms->bus; > > > > > > > > > > > > > > /* set the number of CPUs */ > > > > > > > - rtc_set_memory(pcms->rtc, 0x5f, pc_present_cpus_count(pcms) - 1); > > > > > > > + rtc_set_cpus_count(pcms->rtc, le16_to_cpu(pcms->boot_cpus_le)); > > > > > > > > > > > > > > if (bus) { > > > > > > > int extra_hosts = 0; > > > > > > > @@ -1261,8 +1255,15 @@ void pc_machine_done(Notifier *notifier, void *data) > > > > > > > > > > > > > > acpi_setup(); > > > > > > > if (pcms->fw_cfg) { > > > > > > > + MachineClass *mc = MACHINE_GET_CLASS(pcms); > > > > > > > + > > > > > > > pc_build_smbios(pcms->fw_cfg); > > > > > > > pc_build_feature_control_file(pcms); > > > > > > > + > > > > > > > + if (mc->max_cpus > 255) { > > > > > > > + fw_cfg_add_file(pcms->fw_cfg, "etc/boot-cpus", &pcms->boot_cpus_le, > > > > > > > + sizeof(pcms->boot_cpus_le)); > > > > > > > + } > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > @@ -1786,9 +1787,11 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev, > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > + /* increment the number of CPUs */ > > > > > > > + pcms->boot_cpus_le = cpu_to_le16(le16_to_cpu(pcms->boot_cpus_le) + 1); > > > > > > > > > > > > Is this really safe? What if the guest is in the middle of a > > > > > > etc/boot-cpus read? > > > > > It's safe for boot CPUs but > > > > > it's not safe to hotplug cpus CPU during BIOS boot at all > > > > > as number of CPUs read from boot_cpus might not match number > > > > > of CPUs that received INIT/SIPI wakeup. > > > > > This problem is ignored for now, I've dropped related SeaBIOS patch > > > > > by Kevin's request and would explore it some more to avoid race there. > > > > > > > > > > Anyways, > > > > > Do you have an idea how to improve reading from pcms->boot_cpus_le and make it atomic? > > > > > > > > No idea. SeaBIOS seems to use insb to read fw_cfg files, so we > > > > could be updating the data between two reads. I don't think we > > > > want to design a fw_cfg guest<->host synchronization mechanism. > > > > > > > > I believe the solution is to not change it at all after booting > > > > the guest. I suggest initializing/reinitializing fw_cfg data only > > > > on reset. > > > I've checked it once more and value read by BIOS atomically > > > as both QEMU and SeaBIOS use dma interface to transfer data. > > > BIOS transfers control to QEMU once via > > > > > > outl(PORT_QEMU_CFG_DMA_ADDR_LOW) > > > > > > and after that access to pcms->boot_cpus_le is protected by BQL. > > > So there is no need to invent some sort of synchronization > > > or limit update to fixed points (machine_done/reset). > > > > You're right. I was assuming the worst case and looking at the > > non-DMA path. > > > > In this case, I believe we're safe except when running old BIOS > > that doesn't support fw_cfg DMA. > old bios won't read this file (as it doesn't know about it) Oops, you're right. :) -- Eduardo