From: vjoss197@gmail.com (Vaibhav Jain)
Date: Mon, 22 Aug 2011 23:10:30 -0700
Subject: Difference between logical and physical cpu hotplug
To: kernelnewbies@lists.kernelnewbies.org
List-Id: kernelnewbies.lists.kernelnewbies.org

On Sun, Aug 21, 2011 at 1:09 PM, Srivatsa Bhat wrote:
>
> On Sat, Aug 20, 2011 at 4:05 AM, Vaibhav Jain wrote:
>
>> On Thu, Aug 18, 2011 at 11:14 AM, Srivatsa Bhat wrote:
>>
>>> On Thu, Aug 18, 2011 at 11:40 PM, Srivatsa Bhat <
>>> bhat.srivatsa at gmail.com> wrote:
>>>
>>>> On Thu, Aug 18, 2011 at 10:44 PM, Vaibhav Jain wrote:
>>>>
>>>>> On Thu, Aug 18, 2011 at 9:02 AM, srivatsa bhat <
>>>>> bhat.srivatsa at gmail.com> wrote:
>>>>>
>>>>>> Hi Vaibhav,
>>>>>>
>>>>>> On Thu, Aug 18, 2011 at 8:24 PM, Vaibhav Jain wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I talked to a friend of mine and he suggested that in a logical
>>>>>>> offline state the CPU is powered on and ready to execute instructions,
>>>>>>> just that the kernel is not aware of it. But in a physical offline
>>>>>>> state the CPU is powered off and cannot run.
>>>>>>> Are you saying something similar?
>>>>>>>
>>>>>> Yes, you are right, mostly.
>>>>>> When you try to logically offline a CPU, the kernel performs task
>>>>>> migration (i.e., moves all the tasks running on that CPU to other CPUs in
>>>>>> the system) and ensures that it doesn't need that CPU anymore. This also
>>>>>> means that, from now on, the context of that CPU need not be saved (because
>>>>>> the kernel has moved that CPU's tasks elsewhere). At this point, it is as if
>>>>>> the kernel is purposefully using only a subset of the available CPUs. This
>>>>>> step is a necessary prerequisite for physically offlining the CPU later on.
>>>>>>
>>>>>> But I don't think CPU power ON or OFF is the differentiating factor
>>>>>> between logical and physical offlining. In logical offline, you still have
>>>>>> the CPUs in the system but you just tell the kernel not to use them. At this
>>>>>> stage, you can power off your CPU, to save power for example.
>>>>>> But in physical offline, from a software perspective, you do
>>>>>> additional work at the firmware level (apart from logical offlining at the
>>>>>> OS level), to ensure that physically unplugging the CPUs will not affect
>>>>>> the running system in any way.
>>>>>>
>>>>>> Please note that you can logically online and offline the same CPUs
>>>>>> over and over again without rebooting the system. Here, while onlining a CPU
>>>>>> which was offlined previously, the kernel follows almost the same sequence
>>>>>> which it normally follows while booting the CPUs during full system booting.
>>>>>>
>>>>>> One more thing to note: to be able to physically hot-plug CPUs,
>>>>>> apart from OS and firmware support, you also need the
>>>>>> hardware to support this feature. That is, the electrical wiring to the
>>>>>> individual CPUs should be such that plugging them in and out does not
>>>>>> interfere with the functioning of the rest of the system. As of today, there
>>>>>> are only a few systems that support physical CPU hotplug. But you can do
>>>>>> logical CPU hotplug easily, by enabling CPU hotplug support
>>>>>> (CONFIG_HOTPLUG_CPU) when configuring the kernel, as you have noted in
>>>>>> one of your previous mails.
>>>>>>
>>>>>> Regards,
>>>>>> Srivatsa S. Bhat
>>>>>>
>>>>>
>>>>> Hi Srivatsa,
>>>>>
>>>>> That was a great explanation! Thanks!
>>>>> I have just one more query. You mentioned above that "the kernel
>>>>> follows almost the same *sequence* which it normally follows while
>>>>> booting the CPUs during full system booting."
>>>>>
>>>>> Can you please explain this sequence a little?
>>>>>
>>>>
>>>> Hi Vaibhav,
>>>>
>>>> I'll try to outline a very high level view of what happens while booting
>>>> an SMP (Symmetric Multi-Processor) system.
>>>> Instead of going through the
>>>> entire boot sequence, let me highlight only the part that is of
>>>> interest in this discussion: booting multiple CPUs.
>>>>
>>>> The "boot processor" is the one which is booted first while booting a
>>>> system. On the x86 architecture, CPU 0 is always the boot processor. Hence,
>>>> as you may have observed, you cannot offline CPU0 using CPU hot-plugging on
>>>> an x86 machine. (On an Intel box, the file /sys/devices/system/cpu/cpu0/online
>>>> is purposefully absent for this reason!) But on other architectures, this
>>>> might not be the case. For example, on the POWER architecture, any processor
>>>> in the system can act as the boot processor.
>>>>
>>>> Once the boot processor does its initialization, the other processors,
>>>> known as "secondary processors" or "application processors" (APs), are
>>>> booted/initialized. Obviously, some synchronization mechanism is
>>>> necessary to ensure that this order is followed. So in Linux, we use two
>>>> bitmasks called "cpu_callout_mask" and "cpu_callin_mask". These bitmasks are
>>>> used to indicate the processors available in the system.
>>>>
>>>> Once the boot processor initializes itself, it updates cpu_callout_mask
>>>> to indicate which secondary processor (or application processor, AP) can
>>>> initialize itself next (for example, the boot processor sets a particular
>>>> bit to 1 in the cpu_callout_mask). On the other hand, the secondary
>>>> processor will have done some very basic initialization by then and will
>>>> be testing the value of 'cpu_callout_mask' in a while loop to see if its
>>>> number has been "called out" by the boot processor. Only after the boot
>>>> processor "calls out" this AP does the AP continue with the rest of its
>>>> initialization and complete it.
>>>>
>>>> Once the AP completes its initialization, it reports back to the boot
>>>> processor by setting its number in the cpu_callin_mask.
>>>> As expected, the
>>>> boot processor will have been waiting in a while loop on cpu_callin_mask to
>>>> see whether this AP booted OK. Once it finds that the cpu_callin_mask bit for
>>>> this AP has been set, the boot processor follows the same procedure to boot
>>>> the other APs: i.e., it updates cpu_callout_mask and waits for the corresponding
>>>> bit to be set in cpu_callin_mask by that AP, and so on. This process
>>>> continues until all the APs are booted up.
>>>>
>>>> Of course, each of these "waiting" times (of both the boot processor and
>>>> the APs) is capped by some preset value, say for example 5 seconds. If some AP
>>>> takes more than that time to boot, the boot processor declares that the AP
>>>> could not boot and takes appropriate action (like clearing its bit in
>>>> cpu_callout_mask and logically removing that AP from its tables, etc.,
>>>> effectively forgetting about that processor). Similarly, while the APs wait
>>>> for the boot processor to call them out, if the boot processor does not call
>>>> them within a given time period, they trigger a kernel panic.
>>>>
>>>> Here are some references, if you are interested in more details:
>>>>
>>>> Linux kernel source code:
>>>> 1. linux/arch/x86/kernel/smpboot.c : start_secondary() and smp_callin()
>>>>    These are the functions executed by the APs (secondary or
>>>>    application processors). Actually, smp_callin() is called within
>>>>    start_secondary(), which is the primary function executed by APs.
>>>>
>>>> 2. linux/arch/x86/kernel/smpboot.c : do_boot_cpu()
>>>>
>>> This is executed by the boot processor. You can look up other
>>> important functions such as native_cpu_up().
>>>
>>> General SMP booting info:
>>> 1. http://www.cheesecake.org/sac/smp.html
>>>
>>> [ Sorry, I accidentally sent the earlier mail before composing the text
>>> fully. ]
>>>
>>> Regards,
>>> Srivatsa S. Bhat
>>>
>>
>> Awesome explanation Srivatsa!! Thanks a lot!!
>> Just had one more doubt. I am a little unclear about how the APs get
>> initialized in the beginning. In the case of the Boot Processor,
>> it's just like a uniprocessor system. But how do the APs start executing
>> code?
>> Could you please explain a little?
>>
>
> Sure. But please note that I will stick to the Intel architecture while
> explaining the details.
>
> The Boot CPU, or Boot-Strap Processor (BSP), is the one which boots the
> Operating System. Then it wakes up the APs (Application Processors) when it
> is the right time.
>
> Let us now explore some background details to understand how all this
> works.
> On uniprocessor systems we use a PIC (Programmable Interrupt Controller),
> such as the 8259A Interrupt Controller chip, to deliver interrupts to the
> processor. On Multi-Processor (MP) systems, we use something known as APICs
> (Advanced Programmable Interrupt Controllers). Every processor has a local
> APIC, and there are one or more I/O APICs in the system that are shared by
> all the processors. As the name suggests, I/O APICs are used to deliver
> interrupts from I/O devices to the processors, via the local APICs.
>
> All local APICs have unique IDs that are assigned either by the hardware or
> the BIOS during the initialization phase. Using the local APIC ID we can
> identify the processors in the system.
>
> Using these local APICs, we can send something known as "Inter-Processor
> Interrupts", or IPIs. As the name suggests, this is a mechanism for one
> processor to interrupt another processor in the system. Note that this
> mechanism can be used by any processor to talk to any other processor in
> the system (no distinction between BSP and APs here).
>
> To kick-start the APs, the BSP sends an INIT IPI to each AP in turn, waits
> for some time for the IPI to be delivered to the AP, and then checks whether
> that AP booted up. Depending on the version of the APIC used, the BSP might
> also have to send two STARTUP IPIs to the AP after the INIT IPI, with some
> time delay after each of the IPIs.
> [ If you have discrete APICs (i.e., the 82489DX), the INIT IPI alone will
> do. If you have an integrated APIC, you also need to send two STARTUP IPIs. ]
> All this is in accordance with the "Universal Start-up Algorithm" for
> starting APs, as specified by Intel. The INIT IPI causes an INIT at the AP
> to which it is delivered.
>
> Now the question is, how do you make the APs execute a particular piece
> of code (i.e., jump to a specified location) on start-up?
> We know that whenever a processor starts after a RESET or INIT, it starts
> executing code from the reset vector (a predefined location).
> However, if you want a processor to immediately jump to an address that you
> have specified, you must use the INIT IPI as part of a "warm reset".
> A warm reset allows you to send an INIT signal to a processor without
> causing the processor to go through the entire BIOS initialization (POST;
> see below for details), and then start the processor's execution at the
> warm-reset vector.
>
> By putting the appropriate pointer (i.e., a pointer to the AP start-up code)
> in the warm-reset vector (system RAM location 40:67h), setting the BIOS
> shutdown code (CMOS register 0Fh) to 0Ah (which tells the BIOS that this
> INIT is part of a warm reset) and then causing an INIT at the processor (via
> the IPIs), the Operating System can cause the processor to jump immediately
> to any location and start executing that code. This is how the BSP can boot
> the APs and make them execute some particular piece of code (in this case,
> the AP start-up code as designed in the OS).
>
> It would be worthwhile to understand what the state of the system
> (and the APs) is before the Operating System gets control from the BIOS
> after the machine is switched ON. The BIOS, upon system start, performs a
> procedure known as POST (Power-On Self Test). This is to check the status of
> all the components/circuitry of the system, including the processors, to
> ensure that they are all functioning properly.
> During this phase the BIOS initializes
> all the circuitry (including all the APICs and the processors) to some known
> configuration and then puts all the APs in the HALT state with interrupts
> disabled. This is to ensure that the APs don't execute Operating System code
> (we want only the BSP to execute the OS code initially). Then the BSP starts
> executing OS code.
>
> To boot the APs, the BSP sends IPIs to them. INIT and STARTUP IPIs cannot be
> masked (note that the APs were in the HALT state, with interrupts disabled).
> Hence the BSP is able to kick-start AP execution, and by using the
> warm-reset mechanism, it can direct the APs to execute some particular piece
> of code at startup. The BSP will have put a pointer to that AP start-up code
> in the warm-reset vector before sending the INIT or STARTUP IPIs to the APs.
>
> You might be wondering how the BSP tells its local APIC which AP it must
> send an IPI to.
> The answer is simple. During BIOS POST, an MP (Multi-Processor)
> Configuration Table is set up (in conjunction with the BSP and APs) in a
> well-known region of memory, which is read by the OS during boot-up.
> This table contains the local APIC IDs of all the processors.
> So, while sending the targeted IPIs using its local APIC, the BSP specifies
> the local APIC ID of the target AP which it wants to interrupt (and boot, in
> this case). This ensures the delivery of the IPI to the correct AP.
>
> In short, this is how a Multi-Processor system gets rolling ... :-)
>
> For more details you can refer to:
>
> 1. Intel Multi-Processor Specification, especially Appendices A and B.
>    http://www.intel.com/design/pentium/datashts/242016.htm
>
> 2. linux/arch/x86/kernel/smpboot.c :
>    do_boot_cpu(), wakeup_secondary_cpu_via_init(), native_cpu_up(),
>    start_secondary()
>
> 3. linux/arch/x86/kernel/head_32.S :
>    startup_32_smp()
>
> 4. linux/arch/x86/kernel/trampoline_32.S
>
> 5. http://tldp.org/HOWTO/Linux-i386-Boot-Code-HOWTO/smpboot.html
>
> Regards,
> Srivatsa S. Bhat
>

Srivatsa, you are awesome! Thanks a lot!!
I am just wondering what it takes to gain this depth of knowledge :)

Thanks
Vaibhav Jain