On Tue, Jul 25, 2017 at 10:52:27AM +0200, Cédric Le Goater wrote:
> On 07/24/2017 12:03 PM, David Gibson wrote:
> > On Mon, Jul 24, 2017 at 05:20:26PM +1000, Benjamin Herrenschmidt wrote:
> >> On Mon, 2017-07-24 at 15:38 +1000, David Gibson wrote:
> >>>
> >>> Can we assign our logical numbers sparsely, or will that cause other
> >>> problems?
> >>
> >> The main issue is that they probably need to be the same between XICS
> >> and XIVE because by the time we get the CAS call to choose between XICS
> >> and XIVE, we have already handed out interrupts and constructed the DT,
> >> no ? Unless we do a real CAS reboot...
> >
> > A real CAS reboot probably isn't unreasonable for this case.
> >
> > I definitely think we need to go one way or the other - either fully
> > unify the irq mapping between xics and xive, or fully separate them.
>
> To be able to change interrupt model at CAS time, we need to unify
> the IRQ numbering.

Not necessarily, though it certainly might make things easier.

> We don't have much choice because the DT is
> already populated.

We could change that, though.

> We also need to share the ICSIRQState flags unless
> we share the interrupt source object between the XIVE and XICS mode.
>
> In my current tree, I made sure that the same IRQ number ranges
> were being used in the XIVE and in the XICS allocator and that the
> ICSIRQState flags of the different sPAPR Interrupt sources (XIVE
> and XICS) were in sync. That works pretty well for reset, migration
> and hotplug, but it is a bit hacky.
>
> C.
>
>
> >> Otherwise, there's no reason they can't be sparse no.
> >>
> >>> Note that for PAPR we also have the question of finding logical
> >>> interrupts for legacy PAPR VIO devices.
> >>
> >> We just make them another range ? With KVM legacy today, I just use the
> >> generic interrupt facility for those. So when you do the ioctl to
> >> "trigger" one, I just do an MMIO to the corresponding page and the
> >> interrupt magically shows up wherever the guest is running the target
> >> vcpu. In fact, I'd like to add a way to mmap that page into qemu so
> >> that qemu can trigger them without an ioctl.
> >
> > Ok.
> >
> >> The guest doesn't care, from the guest perspective they are interrupts
> >> coming from the DT, so they are like PCI etc...
> >
> > Ok.
> >
> >>>> We can fix the number of "generic" interrupts given to a guest. The
> >>>> only requirement from a PAPR perspective is that there should be at
> >>>> least as many as there are possible threads in the guest so they can be
> >>>> used as IPIs.
> >>>
> >>> Ok. If we can do things sparsely, allocating these well away from the
> >>> hw interrupts would make things easier.
> >>>
> >>>> But we may need more for other things. We can make this a machine
> >>>> parameter with a default value of something like 4096. If we call N
> >>>> that number of extra generic interrupts, then the number of generic
> >>>> interrupts would be #possible-vcpu's + N, or something like that.
> >>>
> >>> That seems reasonable.
> >>>
> >>>>>> But it's fundamentally an allocator that sits in the hypervisor, so in
> >>>>>> our case, I would say in the spapr "component" of XIVE, rather than the
> >>>>>> XIVE HW model itself.
> >>>>>
> >>>>> Maybe..
> >>>>
> >>>> You are right in that a mapping is a better term than an allocator
> >>>> here.
> >>>>
> >>>>>> Now what Cedric did, because XIVE is very complex and we need something
> >>>>>> for PAPR quickly, is not a complete HW model, but a somewhat simplified
> >>>>>> one that only handles what PAPR exposes. So in that case where the
> >>>>>> allocator sits is a bit of a TBD...
> >>>>>
> >>>>> Hm, ok. My concern here is that "dynamic" allocation of irqs at the
> >>>>> machine type level needs extreme caution, or the irqs may not be
> >>>>> stable which will generally break migration.
> >>>>
> >>>> Yes you are right. We should probably create a more "static" scheme.
> >>>
> >>> Sounds like we're in violent agreement.
> >>
> >> Yup :)

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
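
For illustration only, here is a minimal sketch of what a "static" scheme for the generic interrupts discussed in the thread might look like: one IPI per possible vcpu, followed by N extra generic interrupts, with every number derived purely from the machine configuration. This is not code from QEMU or from anyone's tree. All names (SketchIrqLayout, sketch_irq_layout(), ...) and the base value 0x1000 are hypothetical and chosen only for the example; the default of 4096 is the figure floated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, for illustration only. */
#define SKETCH_GENERIC_IRQ_BASE      0x1000u /* assumed start of the generic range */
#define SKETCH_EXTRA_GENERIC_DEFAULT 4096u   /* "N", the default suggested in the thread */

/* Layout of the generic interrupt range, fixed by machine configuration. */
typedef struct SketchIrqLayout {
    uint32_t ipi_base;   /* first IPI number */
    uint32_t nr_ipis;    /* one per possible vcpu */
    uint32_t extra_base; /* first of the N extra generic interrupts */
    uint32_t nr_extra;   /* N */
} SketchIrqLayout;

/*
 * Build the layout from configuration alone. Nothing here depends on
 * the order in which devices are created or hotplugged, so the same
 * configuration always yields the same numbers.
 */
static SketchIrqLayout sketch_irq_layout(uint32_t max_vcpus, uint32_t nr_extra)
{
    SketchIrqLayout l;

    l.ipi_base = SKETCH_GENERIC_IRQ_BASE;
    l.nr_ipis = max_vcpus;
    l.extra_base = l.ipi_base + l.nr_ipis;
    l.nr_extra = nr_extra;
    return l;
}

/* Total number of generic interrupts: #possible-vcpus + N. */
static uint32_t sketch_nr_generic(const SketchIrqLayout *l)
{
    return l->nr_ipis + l->nr_extra;
}

/* IPI number for a given vcpu index. */
static uint32_t sketch_ipi_irq(const SketchIrqLayout *l, uint32_t vcpu_index)
{
    assert(vcpu_index < l->nr_ipis);
    return l->ipi_base + vcpu_index;
}
```

With 8 possible vcpus and the default N, sketch_nr_generic() gives 8 + 4096 generic interrupts, and the IPI for vcpu 3 is always SKETCH_GENERIC_IRQ_BASE + 3. Because the numbers are a pure function of the configuration, the source and destination of a migration would compute the same mapping, and the same numbers could be handed out under either XICS or XIVE.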