linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: "Cédric Le Goater" <clg@kaod.org>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: kvm@vger.kernel.org, kvm-ppc@vger.kernel.org,
	Paul Mackerras <paulus@samba.org>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device
Date: Fri, 8 Feb 2019 08:58:14 +0100	[thread overview]
Message-ID: <9b556f53-fcfb-2ca3-019e-6ced0ec74c2a@kaod.org> (raw)
In-Reply-To: <20190208051523.GD2688@umbus.fritz.box>

On 2/8/19 6:15 AM, David Gibson wrote:
> On Thu, Feb 07, 2019 at 10:03:15AM +0100, Cédric Le Goater wrote:
>> On 2/7/19 3:49 AM, David Gibson wrote:
>>> On Wed, Feb 06, 2019 at 08:21:10AM +0100, Cédric Le Goater wrote:
>>>> On 2/6/19 2:23 AM, David Gibson wrote:
>>>>> On Tue, Feb 05, 2019 at 01:55:40PM +0100, Cédric Le Goater wrote:
>>>>>> On 2/5/19 6:28 AM, David Gibson wrote:
>>>>>>> On Mon, Feb 04, 2019 at 12:30:39PM +0100, Cédric Le Goater wrote:
>>>>>>>> On 2/4/19 5:45 AM, David Gibson wrote:
>>>>>>>>> On Mon, Jan 07, 2019 at 07:43:18PM +0100, Cédric Le Goater wrote:
>>>>>>>>>> This will let the guest create a memory mapping to expose the ESB MMIO
>>>>>>>>>> regions used to control the interrupt sources, to trigger events, to
>>>>>>>>>> EOI or to turn off the sources.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>>>>>>>> ---
>>>>>>>>>>  arch/powerpc/include/uapi/asm/kvm.h   |  4 ++
>>>>>>>>>>  arch/powerpc/kvm/book3s_xive_native.c | 97 +++++++++++++++++++++++++++
>>>>>>>>>>  2 files changed, 101 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
>>>>>>>>>> index 8c876c166ef2..6bb61ba141c2 100644
>>>>>>>>>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>>>>>>>>>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>>>>>>>>>> @@ -675,4 +675,8 @@ struct kvm_ppc_cpu_char {
>>>>>>>>>>  #define  KVM_XICS_PRESENTED		(1ULL << 43)
>>>>>>>>>>  #define  KVM_XICS_QUEUED		(1ULL << 44)
>>>>>>>>>>  
>>>>>>>>>> +/* POWER9 XIVE Native Interrupt Controller */
>>>>>>>>>> +#define KVM_DEV_XIVE_GRP_CTRL		1
>>>>>>>>>> +#define   KVM_DEV_XIVE_GET_ESB_FD	1
>>>>>>>>>
>>>>>>>>> Introducing a new FD for ESB and TIMA seems overkill.  Can't you get
>>>>>>>>> to both with an mmap() directly on the xive device fd?  Using the
>>>>>>>>> offset to distinguish which one to map, obviously.
>>>>>>>>
>>>>>>>> The page offset would define some sort of user API. It seems feasible.
>>>>>>>> But I am not sure this would be practical in the future if we need to 
>>>>>>>> tune the length.
>>>>>>>
>>>>>>> Um.. why not?  I mean, yes the XIVE supports rather a lot of
>>>>>>> interrupts, but we have 64-bits of offset we can play with - we can
>>>>>>> leave room for billions of ESB slots and still have room for billions
>>>>>>> of VPs.
>>>>>>
>>>>>> So the first 4 pages could be the TIMA pages and then would come  
>>>>>> the pages for the interrupt ESBs. I think that we can have different 
>>>>>> vm_fault handler for each mapping.
>>>>>
>>>>> Um.. no, I'm saying you don't need to tightly pack them.  So you could
>>>>> have the ESB pages at 0, the TIMA at, say offset 2^60.
>>>>
>>>> Well, we know that the TIMA is 4 pages wide and is "directly" related
>>>> with the KVM interrupt device. So being at offset 0 seems a good idea.
>>>> While the ESB segment is of a variable size depending on the number
>>>> of IRQs and it can come after I think.
>>>>
>>>>>> I wonder how this will work out with pass-through. As Paul said in 
>>>>>> a previous email, it would be better to let QEMU request a new 
>>>>>> mapping to handle the ESB pages of the device being passed through.
>>>>>> I guess this is not a special case, just another offset and length.
>>>>>
>>>>> Right, if we need multiple "chunks" of ESB pages we can given them
>>>>> each their own terabyte or several.  No need to be stingy with address
>>>>> space.
>>>>
>>>> You can not put them anywhere. They should map the same interrupt range
>>>> of ESB pages, overlapping with the underlying segment of IPI ESB pages. 
>>>
>>> I don't really follow what you're saying here.
>>
>>
>> What we want the guest to access in terms of ESB pages is something like 
>> below, VMA0 being the initial mapping done by QEMU at offset 0x0, the IPI 
>> ESB pages being populated on the demand with the loads and the stores from 
>> the guest :
>>
>>
>>                  0x0                   0x1100  0x1200    0x1300     
>>       
>>          ranges   |       CPU IPIs   .. |  VIO  | PCI LSI |  MSIs
>>        	  
>>                   +-+-+-+-+-+-+-...-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ....
>>  VMA0    IPI ESB  | | | | | | |     | | | | | | | | | | | | | | | | | |
>>           pages   +-+-+-+-+-+-+-...-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ....
>>
>>
>>
>> A device is passed through and the driver requests MSIs. 
>>
>> We now want the guest to access the HW ESB pages for the requested IRQs 
>> but still the initial IPI ESB pages for the others. Something like below : 
>>
>>
>>                  0x0                   0x1100  0x1200    0x1300     
>>       
>>          ranges   |       CPU IPIs   .. |  VIO  | PCI LSI |  MSIs
>>
>>                   +-+-+-+-+-+-+-...-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ....
>>  VMA0    IPI ESB  | | | | | | |     | | | | | | | | | | | | | | | | | |
>>           pages   +-+-+-+-+-+-+-...-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ....
>>                                                                   
>>  VMA1    PHB ESB                                          +-------+
>>           pages                                           | | | | | 
>>                                                           +-------+
> 
> Right, except of course VMA0 will be split into two pieces by
> performing the mmap() over it.
> 
>> The VMA1 is the result of a new mmap() being done at an offset depending on 
>> the first IRQ number requested by the driver.
> 
> Right... that's one way we could do it.  But the irq numbers are all
> dynamically allocated here, so could we instead just put the
> passthrough MSIs in a separate range?  

Hmm, yes. These are still MSIs. I am not sure of the benefits. See below.

> We'd still need a separate
> mmap() for them, but we wouldn't have to deal with mapping over and
> unmapping if the device is removed or whatever.

How would we handle multiples devices being hot-plugged, hot-unplugged 
and hot-replugged ? The ESB pages would be populated the first time 
they are touched and might not be the correct ones if a new device is 
hot-plugged to the machine. 

>> This is because the vm_fault handler uses the page offset to find the 
>> associated KVM IRQ struct containing the addresses of the EOI and trigger 
>> pages in the underlying hardware, which will be the PHB in case of a 
>> passthrough device.  
>>
>> From there, the VMA1 mmap() pointer will be used to create a 'ram device'
>> memory region which will be mapped on top of the initial ESB memory region 
>> in QEMU. This will override the initial IPI ESB pages with the PHB ESB pages 
>> in the guest ESB address space.
> 
> Um.. what?  If that qemu memory range is already mapped into the guest
> we don't need to create new RAM devices or anything for the
> overmapping.  If we overmap in qemu that will just get carried into
> the guest.

yes, that's the goal. 

When the guest accesses the region, the vm_fault handler will be invoked 
and the VMA will be populated with the ESB pages of the device being 
passthrough. When the device is removed from the machine, we only need 
to delete the region from QEMU and munmap() the VMA to clear the mappings.
The underlying pages will be the ones for the XIVE IC IPIs. 

And the IRQ numbers can be safely recycled for another passthrough device.

>> That's the plan I have in mind as suggested by Paul if I understood it well.
>> The mechanics are more complex than the patch zapping the PTEs from the VMA
>> but it's also safer.
> 
> Well, yes, where "safer" means "has the possibility to be correct".

Well, the only problem with the kernel approach is keeping a pointer on 
the VMA. If we could call find_vma(), it would be perfectly safe and much 
more simpler.

C. 
 


  reply	other threads:[~2019-02-08  8:00 UTC|newest]

Thread overview: 135+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
2019-01-07 18:43 ` [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls Cédric Le Goater
2019-01-09  3:33   ` David Gibson
2019-01-09 13:08   ` Michael Ellerman
2019-01-09 13:38     ` Cédric Le Goater
2019-01-07 18:43 ` [PATCH 02/19] powerpc/xive: add OPAL extensions for the XIVE native exploitation support Cédric Le Goater
2019-01-09  4:26   ` David Gibson
2019-01-07 18:43 ` [PATCH 03/19] KVM: PPC: Book3S HV: check the IRQ controller type Cédric Le Goater
2019-01-09  4:27   ` David Gibson
2019-01-22  4:56   ` Paul Mackerras
2019-01-23 16:24     ` Cédric Le Goater
2019-02-04  0:50       ` David Gibson
2019-02-04 10:16         ` Cédric Le Goater
2019-01-07 18:43 ` [PATCH 04/19] KVM: PPC: Book3S HV: export services for the XIVE native exploitation device Cédric Le Goater
2019-01-11  4:09   ` David Gibson
2019-01-07 18:43 ` [PATCH 05/19] KVM: PPC: Book3S HV: add a new KVM device for the XIVE native exploitation mode Cédric Le Goater
2019-01-22  5:05   ` Paul Mackerras
2019-01-23 16:28     ` Cédric Le Goater
2019-01-28 17:35     ` Cédric Le Goater
2019-01-30  4:29       ` Paul Mackerras
2019-01-30  7:01         ` Cédric Le Goater
2019-01-31  3:01           ` Paul Mackerras
2019-02-01 17:03             ` Cédric Le Goater
2019-02-04  4:25   ` David Gibson
2019-02-04 11:19     ` Cédric Le Goater
2019-02-05  5:26       ` David Gibson
2019-01-07 18:43 ` [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device Cédric Le Goater
2019-01-22  5:09   ` Paul Mackerras
2019-01-23 16:48     ` Cédric Le Goater
2019-02-04  4:45   ` David Gibson
2019-02-04 11:30     ` Cédric Le Goater
2019-02-05  5:28       ` David Gibson
2019-02-05 12:55         ` Cédric Le Goater
2019-02-06  1:23           ` David Gibson
2019-02-06  7:21             ` Cédric Le Goater
2019-02-07  2:49               ` David Gibson
2019-02-07  9:03                 ` Cédric Le Goater
2019-02-08  5:15                   ` David Gibson
2019-02-08  7:58                     ` Cédric Le Goater [this message]
2019-02-08 21:53                       ` Paul Mackerras
2019-02-09  9:41                         ` Cédric Le Goater
2019-02-11  2:38                           ` David Gibson
2019-02-11  6:42                             ` Benjamin Herrenschmidt
2019-02-12 22:07                               ` Cédric Le Goater
2019-01-07 18:43 ` [PATCH 07/19] KVM: PPC: Book3S HV: add a GET_TIMA_FD control to " Cédric Le Goater
2019-01-07 18:43 ` [PATCH 08/19] KVM: PPC: Book3S HV: add a VC_BASE control to the " Cédric Le Goater
2019-01-22  5:14   ` Paul Mackerras
2019-01-23 16:56     ` Cédric Le Goater
2019-02-04  4:49       ` David Gibson
2019-02-04 15:36         ` Cédric Le Goater
2019-01-07 18:43 ` [PATCH 09/19] KVM: PPC: Book3S HV: add a SET_SOURCE " Cédric Le Goater
2019-02-04  4:57   ` David Gibson
2019-02-04 19:07     ` Cédric Le Goater
2019-02-05  5:35       ` David Gibson
2019-02-05 13:39         ` Cédric Le Goater
2019-01-07 18:43 ` [PATCH 10/19] KVM: PPC: Book3S HV: add a EISN attribute to kvmppc_xive_irq_state Cédric Le Goater
2019-01-07 18:43 ` [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls Cédric Le Goater
2019-01-22  5:23   ` Paul Mackerras
2019-01-23  6:44     ` Benjamin Herrenschmidt
2019-01-23  8:48       ` Cédric Le Goater
2019-01-23 10:26         ` Paul Mackerras
2019-01-23 10:48           ` Cédric Le Goater
2019-01-23 21:23           ` Benjamin Herrenschmidt
2019-01-07 18:43 ` [PATCH 12/19] KVM: PPC: Book3S HV: record guest queue page address Cédric Le Goater
2019-02-04  5:15   ` David Gibson
2019-02-04 15:37     ` Cédric Le Goater
2019-01-07 18:43 ` [PATCH 13/19] KVM: PPC: Book3S HV: add a SYNC control for the XIVE native migration Cédric Le Goater
2019-02-04  5:17   ` David Gibson
2019-02-04 15:39     ` Cédric Le Goater
2019-01-07 18:43 ` [PATCH 14/19] KVM: PPC: Book3S HV: add a control to make the XIVE EQ pages dirty Cédric Le Goater
2019-02-04  5:18   ` David Gibson
2019-02-04 15:46     ` Cédric Le Goater
2019-02-05  5:30       ` David Gibson
2019-01-07 18:43 ` [PATCH 15/19] KVM: PPC: Book3S HV: add get/set accessors for the source configuration Cédric Le Goater
2019-02-04  5:21   ` David Gibson
2019-02-04 16:07     ` Cédric Le Goater
2019-02-05  5:32       ` David Gibson
2019-02-05 13:03         ` Cédric Le Goater
2019-02-06  1:23           ` David Gibson
2019-02-06  1:24             ` David Gibson
2019-02-06  7:07               ` Cédric Le Goater
2019-02-07  2:48                 ` David Gibson
2019-02-07  9:13                   ` Cédric Le Goater
2019-02-08  5:15                     ` David Gibson
2019-02-14 16:50                       ` Cédric Le Goater
2019-01-07 18:43 ` [PATCH 16/19] KVM: PPC: Book3S HV: add get/set accessors for the EQ configuration Cédric Le Goater
2019-02-04  5:24   ` David Gibson
2019-02-05 17:45     ` Cédric Le Goater
2019-01-07 19:10 ` [PATCH 17/19] KVM: PPC: Book3S HV: add get/set accessors for the VP XIVE state Cédric Le Goater
2019-01-07 19:10   ` [PATCH 18/19] KVM: PPC: Book3S HV: add passthrough support Cédric Le Goater
2019-01-22  5:26     ` Paul Mackerras
2019-01-23  6:45       ` Benjamin Herrenschmidt
2019-01-23 10:30         ` Paul Mackerras
2019-01-23 11:07           ` Cédric Le Goater
2019-01-28  6:13             ` Paul Mackerras
2019-01-28 18:26               ` Cédric Le Goater
2019-01-29  2:45                 ` Paul Mackerras
2019-01-29 13:47                   ` Cédric Le Goater
2019-01-30  6:20                     ` Paul Mackerras
2019-01-30 15:54                       ` Cédric Le Goater
2019-01-31  2:48                         ` Paul Mackerras
2019-01-29  4:12                 ` Paul Mackerras
2019-01-29 17:44                   ` Cédric Le Goater
2019-01-30  5:55                     ` Paul Mackerras
2019-01-30  7:06                       ` Cédric Le Goater
2019-01-23 21:25           ` Benjamin Herrenschmidt
2019-01-24  8:41             ` Cédric Le Goater
2019-01-28  4:43             ` Paul Mackerras
2019-01-29 13:46               ` Cédric Le Goater
2019-01-07 19:10   ` [PATCH 19/19] KVM: introduce a KVM_DELETE_DEVICE ioctl Cédric Le Goater
2019-01-22  5:42     ` Paul Mackerras
2019-01-23 18:39       ` Cédric Le Goater
2019-01-23 21:32         ` Benjamin Herrenschmidt
2019-02-04  5:26   ` [PATCH 17/19] KVM: PPC: Book3S HV: add get/set accessors for the VP XIVE state David Gibson
2019-02-04 18:57     ` Cédric Le Goater
2019-02-05  5:33       ` David Gibson
2019-02-05 11:58         ` Cédric Le Goater
2019-02-06  1:19           ` David Gibson
2019-01-22  4:46 ` [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Paul Mackerras
2019-01-23 19:07   ` Cédric Le Goater
2019-01-23 21:35     ` Benjamin Herrenschmidt
2019-01-26  8:25       ` Cédric Le Goater
2019-02-04  5:36         ` David Gibson
2019-02-05 11:31           ` Cédric Le Goater
2019-02-05 22:13             ` Paul Mackerras
2019-02-06  1:18               ` David Gibson
2019-02-06  7:35                 ` Cédric Le Goater
2019-02-07  2:51                   ` David Gibson
2019-02-07  8:31                     ` Cédric Le Goater
2019-02-08  5:07                       ` David Gibson
2019-02-08  7:38                         ` Cédric Le Goater
2019-01-28  5:51     ` Paul Mackerras
2019-01-29 13:51       ` Cédric Le Goater
2019-01-30  5:40         ` Paul Mackerras
2019-01-30 15:36           ` Cédric Le Goater

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9b556f53-fcfb-2ca3-019e-6ced0ec74c2a@kaod.org \
    --to=clg@kaod.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).