On Thu, Feb 07, 2019 at 10:13:48AM +0100, Cédric Le Goater wrote: > On 2/7/19 3:48 AM, David Gibson wrote: > > On Wed, Feb 06, 2019 at 08:07:36AM +0100, Cédric Le Goater wrote: > >> On 2/6/19 2:24 AM, David Gibson wrote: > >>> On Wed, Feb 06, 2019 at 12:23:29PM +1100, David Gibson wrote: > >>>> On Tue, Feb 05, 2019 at 02:03:11PM +0100, Cédric Le Goater wrote: > >>>>> On 2/5/19 6:32 AM, David Gibson wrote: > >>>>>> On Mon, Feb 04, 2019 at 05:07:28PM +0100, Cédric Le Goater wrote: > >>>>>>> On 2/4/19 6:21 AM, David Gibson wrote: > >>>>>>>> On Mon, Jan 07, 2019 at 07:43:27PM +0100, Cédric Le Goater wrote: > >>>>>>>>> Theses are use to capure the XIVE EAS table of the KVM device, the > >>>>>>>>> configuration of the source targets. > >>>>>>>>> > >>>>>>>>> Signed-off-by: Cédric Le Goater > >>>>>>>>> --- > >>>>>>>>> arch/powerpc/include/uapi/asm/kvm.h | 11 ++++ > >>>>>>>>> arch/powerpc/kvm/book3s_xive_native.c | 87 +++++++++++++++++++++++++++ > >>>>>>>>> 2 files changed, 98 insertions(+) > >>>>>>>>> > >>>>>>>>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h > >>>>>>>>> index 1a8740629acf..faf024f39858 100644 > >>>>>>>>> --- a/arch/powerpc/include/uapi/asm/kvm.h > >>>>>>>>> +++ b/arch/powerpc/include/uapi/asm/kvm.h > >>>>>>>>> @@ -683,9 +683,20 @@ struct kvm_ppc_cpu_char { > >>>>>>>>> #define KVM_DEV_XIVE_SAVE_EQ_PAGES 4 > >>>>>>>>> #define KVM_DEV_XIVE_GRP_SOURCES 2 /* 64-bit source attributes */ > >>>>>>>>> #define KVM_DEV_XIVE_GRP_SYNC 3 /* 64-bit source attributes */ > >>>>>>>>> +#define KVM_DEV_XIVE_GRP_EAS 4 /* 64-bit eas attributes */ > >>>>>>>>> > >>>>>>>>> /* Layout of 64-bit XIVE source attribute values */ > >>>>>>>>> #define KVM_XIVE_LEVEL_SENSITIVE (1ULL << 0) > >>>>>>>>> #define KVM_XIVE_LEVEL_ASSERTED (1ULL << 1) > >>>>>>>>> > >>>>>>>>> +/* Layout of 64-bit eas attribute values */ > >>>>>>>>> +#define KVM_XIVE_EAS_PRIORITY_SHIFT 0 > >>>>>>>>> +#define KVM_XIVE_EAS_PRIORITY_MASK 0x7 > >>>>>>>>> +#define KVM_XIVE_EAS_SERVER_SHIFT 3 > >>>>>>>>> +#define KVM_XIVE_EAS_SERVER_MASK 0xfffffff8ULL > >>>>>>>>> +#define KVM_XIVE_EAS_MASK_SHIFT 32 > >>>>>>>>> +#define KVM_XIVE_EAS_MASK_MASK 0x100000000ULL > >>>>>>>>> +#define KVM_XIVE_EAS_EISN_SHIFT 33 > >>>>>>>>> +#define KVM_XIVE_EAS_EISN_MASK 0xfffffffe00000000ULL > >>>>>>>>> + > >>>>>>>>> #endif /* __LINUX_KVM_POWERPC_H */ > >>>>>>>>> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c > >>>>>>>>> index f2de1bcf3b35..0468b605baa7 100644 > >>>>>>>>> --- a/arch/powerpc/kvm/book3s_xive_native.c > >>>>>>>>> +++ b/arch/powerpc/kvm/book3s_xive_native.c > >>>>>>>>> @@ -525,6 +525,88 @@ static int kvmppc_xive_native_sync(struct kvmppc_xive *xive, long irq, u64 addr) > >>>>>>>>> return 0; > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> +static int kvmppc_xive_native_set_eas(struct kvmppc_xive *xive, long irq, > >>>>>>>>> + u64 addr) > >>>>>>>> > >>>>>>>> I'd prefer to avoid the name "EAS" here. IIUC these aren't "raw" EAS > >>>>>>>> values, but rather essentially the "source config" in the terminology > >>>>>>>> of the PAPR hcalls. Which, yes, is basically implemented by setting > >>>>>>>> the EAS, but since it's the PAPR architected state that we need to > >>>>>>>> preserve across migration, I'd prefer to stick as close as we can to > >>>>>>>> the PAPR terminology. > >>>>>>> > >>>>>>> But we don't have an equivalent name in the PAPR specs for the tuple > >>>>>>> (prio, server). We could use the generic 'target' name may be ? even > >>>>>>> if this is usually referring to a CPU number. > >>>>>> > >>>>>> Um.. what? That's about terminology for one of the fields in this > >>>>>> thing, not about the name for the thing itself. > >>>>>> > >>>>>>> Or, IVE (Interrupt Vector Entry) ? which makes some sense. > >>>>>>> This is was the former name in HW. I think we recycle it for KVM. > >>>>>> > >>>>>> That's a terrible idea, which will make a confusing situation even > >>>>>> more confusing. > >>>>> > >>>>> Let's use SOURCE_CONFIG and QUEUE_CONFIG. The KVM ioctls are very > >>>>> similar to the hcalls anyhow. > >>>> > >>>> Yes, I think that's a good idea. > >>> > >>> Actually... AIUI the SET_CONFIG hcalls shouldn't be a fast path. > >> > >> No indeed. I have move them to standard hcalls in the current version. > >> > >>> Can > >>> we simplify things further by removing the hcall implementation from > >>> the kernel entirely, and have qemu implement them by basically just > >>> forwarding them to the appropriate SET_CONFIG ioctl()? > >> > >> Yes. I think we could. > > > > Great! > > > >> The hcalls H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG and > >> the KVM ioctls to set the EQ and the SOURCE configuration have a > >> lot in common. I need to look at how we can plug the KVM ioctl in > >> the hcalls under QEMU. > >> > >> We will have to convert the returned error to respect the PAPR > >> specs or have the ioctls return H_* errors. > > > > I don't think returning H_* values from a kernel call is a good idea. > > Converting errors is kinda ugly, but I still think it's the better > > option. Note that we already have something like this for the HPT > > resizing hcalls. > > ok. > > >> Let's dig that idea. If we choose that path, QEMU will have an > >> up-to-date EAT and so we won't need to synchronize its state anymore > >> for migration. > > > > I guess so, though I don't see that as essential. > > > >> H_INT_GET_SOURCE_CONFIG can be implemented in QEMU without any KVM > >> ioctl. > >> > >> H_INT_GET_QUEUE_INFO could be implemented in QEMU. I need to check > >> how we return the address of the END ESB in sPAPR. We haven't paid > >> much attention to these pages because they are not used under Linux > >> and today the address is returned by OPAL. > >> > >> H_INT_GET_QUEUE_CONFIG is a little more problematic because we need > >> to query into the XIVE HW the EQ index and toggle bit. OPAL support > >> is required for that. But we could reduce the KVM support to the > >> ioctl querying these EQ information. > > > > Right, and we'd need an ioctl() like that for migration anyway, yes? > > Yes. it is the same need. > > >> H_INT_ESB could be entirely done under QEMU. > > > > This one can actually happen on fairly hot paths, so I think doing > > that in qemu probably isn't a good idea. > > I agree It would nice to have some performance. > > This hcall is used when LSIs are involved, which is not really a common > configuration. There are no OPAL calls involved. And we are duplicating > code at the KVM level to retrigger the interrupt when the level is still > asserted. > > I will benchmark the two options before making a choice. Ok. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson