* Xen/arm: Virtual ITS command queue handling
@ 2015-05-05 12:14 Vijay Kilari
  2015-05-05 13:51 ` Stefano Stabellini
                   ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Vijay Kilari @ 2015-05-05 12:14 UTC (permalink / raw)
  To: Stefano Stabellini, Ian Campbell, Stefano Stabellini, Julien Grall
  Cc: Prasun Kapoor, manish.jaggi, xen-devel

Hi,

   As discussed, here is the design doc/txt.

ARM GICv3 provides an ITS (Interrupt Translation Service) to handle
MSI-X interrupts.  Below are various mechanisms to handle
ITS commands.

ITS command completion detection mechanisms:
----------------------------------------------------------------------
1) Append an INT command to receive an interrupt from the ITS hardware after
completion of the ITS command
2) Poll the ITS command queue by reading the CREADR register

The ITS driver running in the guest can follow either one or both of these
approaches to detect command completion.
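
A minimal sketch of the two styles in C follows; the struct its layout and
the helpers (its_enqueue, its_enqueue_int, wait_for_irq) are invented purely
for illustration, not existing code:

    /* Illustrative sketch only: all helper names are invented. */
    static void its_send_cmd_sync(struct its *its, const void *cmd,
                                  bool use_int, unsigned int completion_lpi)
    {
        its_enqueue(its, cmd);            /* write command, bump CWRITER */

        if ( use_int )
        {
            /* 1) Append an INT command; the ITS raises completion_lpi
             *    once everything queued before it has been processed. */
            its_enqueue_int(its, completion_lpi);
            wait_for_irq(completion_lpi);
        }
        else
        {
            /* 2) Poll: CREADR advances past the command once the ITS
             *    has consumed it. */
            while ( readq(its->base + GITS_CREADR) != its->cwriter )
                cpu_relax();
        }
    }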

Assumptions:
--------------------
1) Each VM will have one virtual ITS (vITS)
2) The VM is trapped on a CWRITER write.
3) ITS commands should be processed in order of occurrence.
   Though we release the vITS lock before putting the command in the
   physical ITS queue, there will not be any other VCPU that can trap
   and post another ITS command, because the current VCPU is trapped
   on CWRITER; so another VCPU of the same domain cannot trap on a
   CWRITER update. If this assumption is not valid, then the vITS lock
   should be held until the command is posted to the physical ITS.

Below are the proposed methods to emulate ITS commands in Xen.

Proposal 1:
----------------
Here, when the guest writes a command to the vITS queue and updates the
CWRITER register, it traps into Xen and the steps below are followed to
process the ITS command:

1) Trap on the CWRITER write by the guest
2) Take the vITS lock
3) Read the command written by the guest and translate it
4) Release the vITS lock
5) Take the physical ITS (pITS) lock
6) Write the command to the physical ITS
7) Release the pITS lock
8) Poll the physical CREADR for completion of the command
9) Update the vITS CREADR of the guest
10) If the next command is available, go to step (2);
    else return from the trap
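
As a rough illustration, the loop might look like this in C (all names are
invented placeholders, not existing Xen code):

    /* Illustrative sketch of Proposal 1: synchronous, polling emulation. */
    static int vits_handle_cwriter_write(struct vits *vits, struct pits *pits)
    {
        its_cmd_t cmd, pcmd;

        while ( vits_cmd_pending(vits) )           /* step 10: loop */
        {
            spin_lock(&vits->lock);                /* step 2 */
            vits_read_next_cmd(vits, &cmd);        /* step 3 */
            vits_translate_cmd(vits, &cmd, &pcmd);
            spin_unlock(&vits->lock);              /* step 4 */

            spin_lock(&pits->lock);                /* step 5 */
            pits_post_cmd(pits, &pcmd);            /* step 6 */
            spin_unlock(&pits->lock);              /* step 7 */

            /* Step 8: busy-wait on the physical CREADR, with a timeout
             * in case the physical queue never drains. */
            if ( !pits_poll_for_completion(pits, CMD_TIMEOUT_US) )
                return -ETIMEDOUT;

            vits_update_creadr(vits);              /* step 9 */
        }

        return 0;                                  /* return from trap */
    }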

Cons:
   - The VCPU loops in Xen until all commands written by the guest are
     completed.
   - All the ITS commands written by the guest are translated and processed
     before the VCPU returns from the trap.
   - If the guest floods with ITS commands, the VCPU keeps posting commands
     continuously.

Pros:
   - Only one set of ITS commands sent by one VCPU per domain is processed
     at a time

Handling command queue state:
 - The vITS queue cannot be full, as the VCPU returns only on completion of
   the ITS command.
 - The physical queue is unlikely to be full, as it is 64KB and can thereby
   accommodate 2K ITS commands (each command is 32 bytes).
   If the physical queue is full, then the VCPU will poll, looking for a
   free physical slot, and return an error on timeout.

Behaviour of polling-based and completion-interrupt-based guest drivers:
 - If the completion interrupt (INT) is used by the guest driver, then the
   guest driver will always see an updated CREADR, as commands are completed
   as they are written to the queue.
 - If polling mode is used, the trap on CREADR checks for completion of the
   command.

Proposal 2:
----------------
Here, when the guest writes a command to the vITS queue and updates the
CWRITER register, it traps into Xen and the steps below are followed to
process the ITS command:

- Dom0 creates an ITS completion device with device id (00:00.1) and
  reserves n (256 or so) IRQs (LPIs) for this device.
- One IRQ/LPI (called completion_irq) of this completion device is
  allocated per domain.
- With this IRQ/LPI descriptor we can identify the domain/vITS.
- Info on all the ongoing ITS requests of this domain (those put in the
  pITS queue) is stored in an ITS command status array (called
  its_requests). This is managed per vITS.

1) Trap on the CWRITER write by the guest
2) Take the vITS lock
3) Read all the commands written by the guest and translate them
    - If one of the guest commands is an INT command:
       a) Append an INT command with completion_irq, write this batch as a
          separate request, and go to (3) to process the next commands
    - If more than 'n' commands are sent by the guest, start a timer to
      process the remaining commands
4) Append an INT command with the completion_irq of the current domain
5) Release the vITS lock
6) Take the physical ITS (pITS) lock
7) Write the translated commands to the physical ITS
8) Add an entry to its_requests
9) Release the pITS lock
10) Return from the trap

On receiving the completion interrupt:

1) Take the first pending request from its_requests.
2) Update the vITS CREADR of the guest, indicating completion of the command
   to the guest
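
A rough sketch of this bookkeeping, assuming its_requests is a small
per-vITS ring of in-flight batches (all names invented for illustration):

    /* Illustrative sketch of Proposal 2's completion path. */
    #define MAX_REQS 8

    struct its_request {
        uint32_t last_cmd_idx;  /* vITS CREADR value once this batch is done */
    };

    struct vits {
        spinlock_t lock;
        uint32_t creadr;        /* virtual CREADR exposed to the guest */
        unsigned int req_head, req_tail;
        struct its_request its_requests[MAX_REQS];
    };

    /* Handler for the per-domain completion_irq (an LPI owned by Xen). */
    static void completion_irq_handler(unsigned int irq, void *data)
    {
        struct vits *vits = data;    /* found via the LPI descriptor */
        struct its_request *req;

        spin_lock(&vits->lock);
        req = &vits->its_requests[vits->req_head];        /* step 1 */
        vits->req_head = (vits->req_head + 1) % MAX_REQS;
        vits->creadr = req->last_cmd_idx;                 /* step 2 */
        spin_unlock(&vits->lock);
    }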

Cons:
   - Has the overhead of processing the completion interrupt.
   - Needs a fake device reserved to generate the completion interrupt, and
     one LPI reserved per domain

Pros:
   - The VCPU does not poll in Xen for completion of commands.
   - Handles the guest flooding the command queue with commands, but needs
     a timer

Handling command queue state:
 - The physical queue is unlikely to be full, as it is 64KB and can thereby
   accommodate 2K ITS commands (each command is 32 bytes).
   In case it is full, the VCPU has to poll with a timeout until the
   physical queue has space before it posts the next command.
 - The vITS queue condition should be managed by the guest ITS driver.

Behaviour of polling-based and completion-interrupt-based guest drivers:
 - If the completion interrupt (INT) is used by the guest driver, then Xen
   inserts its completion INT command so that CREADR is updated before the
   guest's INT command is injected
 - If polling mode is used, the trap on CREADR checks for completion of the
   command


Proposal 3: (From Julien)
------------------------------------
Here, when the guest writes a command to the vITS queue and updates the
CWRITER register, it traps into Xen and the steps below are followed to
process the ITS command:

1) Trap on the CWRITER write by the guest
2) Check if the vITS state is IDLE. If not, return from the trap
3) Take the vITS lock
4) Read 'n' commands at a time written by the guest and translate them
5) Set the vITS state to IN_PROGRESS
6) Release the vITS lock
7) Take the physical ITS (pITS) lock
8) Write the translated commands to the physical ITS
9) Release the pITS lock
10) Return from the trap

On a CREADR trap from the guest (which polls for completion):
11) Check if the posted commands are completed.
   If completed:
     - update the CREADR of the vITS
     - set the vITS state to IDLE
     - post the next set of 'n' commands (jump to 1)
   else
     return from the trap.
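
A compact sketch of this state machine (all names invented for illustration):

    /* Illustrative sketch of Proposal 3's IDLE/IN_PROGRESS state machine. */
    enum vits_state { VITS_IDLE, VITS_IN_PROGRESS };

    static void vits_cwriter_trap(struct vits *vits)
    {
        spin_lock(&vits->lock);                      /* step 3 */
        if ( vits->state == VITS_IDLE && vits_cmds_pending(vits) )
        {
            vits_translate_batch(vits, N_CMDS);      /* step 4 */
            vits->state = VITS_IN_PROGRESS;          /* step 5 */
            spin_unlock(&vits->lock);                /* step 6 */
            pits_post_batch(vits);                   /* steps 7-9 */
            return;
        }
        spin_unlock(&vits->lock);                    /* step 2: not IDLE */
    }

    static uint32_t vits_creadr_trap(struct vits *vits)
    {
        if ( vits->state == VITS_IN_PROGRESS && pits_batch_done(vits) )
        {
            vits_update_creadr(vits);                /* step 11 */
            vits->state = VITS_IDLE;
            vits_cwriter_trap(vits);  /* post the next 'n' commands, if any */
        }
        return vits->creadr;
    }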

Cons:
   - The guest ITS driver should always poll on CREADR to know of the
     completion of commands. If the guest does not poll, the vITS CREADR
     will not be updated.

Pros:
   - The VCPU will not poll in Xen
   - Handles the guest flooding with ITS commands by processing 'n'
     commands at a time.

Handling command queue state:
 - The physical queue is unlikely to be full, as it is 64KB and can thereby
   accommodate 2K ITS commands (each command is 32 bytes).
   In case it is full, the VCPU has to poll with a timeout until the
   physical queue has space before it posts the commands.
 - The vITS queue condition should be managed by the guest ITS driver.

Behaviour of polling-based and completion-interrupt-based guest drivers:
 - If the completion interrupt (INT) is used by the guest driver, then the
   guest driver should read CREADR, so that it traps and gets an updated
   CREADR. If CREADR is not polled, CREADR will not be updated.
 - If polling mode is used, the trap on CREADR checks for completion of the
   command


Regards
Vijay


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-05 12:14 Xen/arm: Virtual ITS command queue handling Vijay Kilari
@ 2015-05-05 13:51 ` Stefano Stabellini
  2015-05-05 13:54   ` Julien Grall
  2015-05-05 15:56   ` Vijay Kilari
  2015-05-05 14:09 ` Julien Grall
  2015-05-12 15:02 ` Ian Campbell
  2 siblings, 2 replies; 77+ messages in thread
From: Stefano Stabellini @ 2015-05-05 13:51 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Tue, 5 May 2015, Vijay Kilari wrote:
> Proposal 2:
> ----------------
> Here when guest writes command to vITS queue and updates CWRITER registers,
> it is trapped in XEN and below steps are followed to process ITS command
> 
> - Dom0 creates a ITS completion device with device id (00:00.1) and reserves
>   n number (256 or so) irqs (LPIs) for this device.
> - One irq/LPI (called completion_irq) of this completion device is
> allocated per domain

Good. Is it possible to actually assign an LPI to a domain when/if a PCI
device is assigned to the domain? So that we don't waste LPIs for
domains that are not going to use the vITS?


> - With this irq/LPI descriptor we can identify the domain/vITS.
> - Info of all the ongoing ITS requests(put in pITS Queue) of this domain is
>   stored in ITS command status array (called its_requests). This is
> managed per vITS.
> 
> 1) Trap of CWRITER write by guest
> 2) Take vITS lock
> 3) Read all the commands written by guest, translate it
>     - If one of the guest command is INT command
>        a) Append INT command with completion_irq and write this batch as
>           seperate request and goto (3) to process next commands
>     - If more than 'n' commands are sent by guest, start a timer to process
>       remaining commands
> 4) Append INT command with completion_irq of current domain

I would consider adding a vcpu_block call


> 5) Release vITS lock
> 6) Take physical ITS (pITS) lock
> 7) Write translated cmds to physical ITS
> 8) Add entry in its_requests
> 9) Release pITS lock
> 10) return from trap
>
> One receiving completion interrupt:
> 
> 1) Take the first pending request from its_requests.
> 2) Update vITS CREADER of the guest indicating completion of command to guest

I would add vcpu_unblock


> Cons:
>    - Has overhead of processing completion interrupt.
>    - Need to reserve a fake device to generate completion interrupt and
>      reserve one LPI per-domain
> 
> Pros:
>    - VCPU does not poll in Xen for completion of commands.
>    - Handles guest flooding command queue with commands. But needs timer
> 
> Handling Command queue state:
>  - Physical Queue cannot be full as it 64KB there by it can accomodate
> 1K ITS commands.
>    In case it is full, VCPU has to poll with timeout till physical
> Queue is empty before it post
>    next command
>  - If vITS Queue condition should be managed by guest ITS driver.
> 
> Behaviour of Polling and completion interrupt based guest driver:
>  - If completion interrupt (INT) is used by guest driver, then insert
> Xen completion
>    INT command so that CREADER is updated before guest's INT command is injected
>  - If polling mode is used, trap on CREADER checks for completion of command


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-05 13:51 ` Stefano Stabellini
@ 2015-05-05 13:54   ` Julien Grall
  2015-05-05 15:56   ` Vijay Kilari
  1 sibling, 0 replies; 77+ messages in thread
From: Julien Grall @ 2015-05-05 13:54 UTC (permalink / raw)
  To: Stefano Stabellini, Vijay Kilari
  Cc: Ian Campbell, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Stefano Stabellini

On 05/05/15 14:51, Stefano Stabellini wrote:
>> - With this irq/LPI descriptor we can identify the domain/vITS.
>> - Info of all the ongoing ITS requests(put in pITS Queue) of this domain is
>>   stored in ITS command status array (called its_requests). This is
>> managed per vITS.
>>
>> 1) Trap of CWRITER write by guest
>> 2) Take vITS lock
>> 3) Read all the commands written by guest, translate it
>>     - If one of the guest command is INT command
>>        a) Append INT command with completion_irq and write this batch as
>>           seperate request and goto (3) to process next commands
>>     - If more than 'n' commands are sent by guest, start a timer to process
>>       remaining commands
>> 4) Append INT command with completion_irq of current domain
> 
> I would consider adding a vcpu_block call

I don't think the vcpu_block would improve performance here.

Regards,

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-05 12:14 Xen/arm: Virtual ITS command queue handling Vijay Kilari
  2015-05-05 13:51 ` Stefano Stabellini
@ 2015-05-05 14:09 ` Julien Grall
  2015-05-05 16:09   ` Vijay Kilari
  2015-05-12 15:02 ` Ian Campbell
  2 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-05 14:09 UTC (permalink / raw)
  To: Vijay Kilari, Stefano Stabellini, Ian Campbell,
	Stefano Stabellini, Julien Grall
  Cc: Prasun Kapoor, manish.jaggi, xen-devel

On 05/05/15 13:14, Vijay Kilari wrote:
> Hi,
> 

Hi Vijay,

>    As discussed, here is the design doc/txt.

I will comment on the proposal 2 as it seems to be the preferred one
assuming you are able to find why it's slow.

> Proposal 2:
> ----------------
> Here when guest writes command to vITS queue and updates CWRITER registers,
> it is trapped in XEN and below steps are followed to process ITS command
> 
> - Dom0 creates a ITS completion device with device id (00:00.1) and reserves
>   n number (256 or so) irqs (LPIs) for this device.
> - One irq/LPI (called completion_irq) of this completion device is
> allocated per domain
> - With this irq/LPI descriptor we can identify the domain/vITS.
> - Info of all the ongoing ITS requests(put in pITS Queue) of this domain is
>   stored in ITS command status array (called its_requests). This is
> managed per vITS.
> 
> 1) Trap of CWRITER write by guest
> 2) Take vITS lock
> 3) Read all the commands written by guest, translate it
>     - If one of the guest command is INT command

Why do you need a specific handling for the guest INT command?

>        a) Append INT command with completion_irq and write this batch as
>           seperate request and goto (3) to process next commands
>     - If more than 'n' commands are sent by guest, start a timer to process
>       remaining commands

Hmmm... How are you sure the time for the timer would be enough?

> 4) Append INT command with completion_irq of current domain
> 5) Release vITS lock
> 6) Take physical ITS (pITS) lock
> 7) Write translated cmds to physical ITS
> 8) Add entry in its_requests

You don't explain what is its_requests.

> 9) Release pITS lock
> 10) return from trap
> 
> One receiving completion interrupt:
> 
> 1) Take the first pending request from its_requests.

I'm assuming that you have some kind of array/list to store the pending
request? I think this would be more difficult to manage than only
supporting one batch per domain at any time.

> 2) Update vITS CREADER of the guest indicating completion of command to guest
> 
> Cons:
>    - Has overhead of processing completion interrupt.
>    - Need to reserve a fake device to generate completion interrupt and
>      reserve one LPI per-domain
> 
> Pros:
>    - VCPU does not poll in Xen for completion of commands.
>    - Handles guest flooding command queue with commands. But needs timer
> 
> Handling Command queue state:
>  - Physical Queue cannot be full as it 64KB there by it can accomodate
> 1K ITS commands.

I don't understand this sentence. Why do you think the physical queue
cannot be full?

>    In case it is full, VCPU has to poll with timeout till physical
> Queue is empty before it post
>    next command
>  - If vITS Queue condition should be managed by guest ITS driver.

Same here.

> Behaviour of Polling and completion interrupt based guest driver:
>  - If completion interrupt (INT) is used by guest driver, then insert
> Xen completion
>    INT command so that CREADER is updated before guest's INT command is injected
>  - If polling mode is used, trap on CREADER checks for completion of command
> 

Regards,

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-05 13:51 ` Stefano Stabellini
  2015-05-05 13:54   ` Julien Grall
@ 2015-05-05 15:56   ` Vijay Kilari
  1 sibling, 0 replies; 77+ messages in thread
From: Vijay Kilari @ 2015-05-05 15:56 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Ian Campbell, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Stefano Stabellini

On Tue, May 5, 2015 at 7:21 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Tue, 5 May 2015, Vijay Kilari wrote:
>> Proposal 2:
>> ----------------
>> Here when guest writes command to vITS queue and updates CWRITER registers,
>> it is trapped in XEN and below steps are followed to process ITS command
>>
>> - Dom0 creates a ITS completion device with device id (00:00.1) and reserves
>>   n number (256 or so) irqs (LPIs) for this device.
>> - One irq/LPI (called completion_irq) of this completion device is
>> allocated per domain
>
> Good. Is it possible to actually assign an LPI to a domain when/if a PCI
> device is assigned to the domain? So that we don't waste LPIs for
> domains that are not going to use the vITS?

Yes, we can. On receiving the first MAPD command we can allocate the LPI.

>
>
>> - With this irq/LPI descriptor we can identify the domain/vITS.
>> - Info of all the ongoing ITS requests(put in pITS Queue) of this domain is
>>   stored in ITS command status array (called its_requests). This is
>> managed per vITS.
>>
>> 1) Trap of CWRITER write by guest
>> 2) Take vITS lock
>> 3) Read all the commands written by guest, translate it
>>     - If one of the guest command is INT command
>>        a) Append INT command with completion_irq and write this batch as
>>           seperate request and goto (3) to process next commands
>>     - If more than 'n' commands are sent by guest, start a timer to process
>>       remaining commands
>> 4) Append INT command with completion_irq of current domain
>
> I would consider adding a vcpu_block call
>
>
>> 5) Release vITS lock
>> 6) Take physical ITS (pITS) lock
>> 7) Write translated cmds to physical ITS
>> 8) Add entry in its_requests
>> 9) Release pITS lock
>> 10) return from trap
>>
>> One receiving completion interrupt:
>>
>> 1) Take the first pending request from its_requests.
>> 2) Update vITS CREADER of the guest indicating completion of command to guest
>
> I would add vcpu_unblock
>
>
>> Cons:
>>    - Has overhead of processing completion interrupt.
>>    - Need to reserve a fake device to generate completion interrupt and
>>      reserve one LPI per-domain
>>
>> Pros:
>>    - VCPU does not poll in Xen for completion of commands.
>>    - Handles guest flooding command queue with commands. But needs timer
>>
>> Handling Command queue state:
>>  - Physical Queue cannot be full as it 64KB there by it can accomodate
>> 1K ITS commands.
>>    In case it is full, VCPU has to poll with timeout till physical
>> Queue is empty before it post
>>    next command
>>  - If vITS Queue condition should be managed by guest ITS driver.
>>
>> Behaviour of Polling and completion interrupt based guest driver:
>>  - If completion interrupt (INT) is used by guest driver, then insert
>> Xen completion
>>    INT command so that CREADER is updated before guest's INT command is injected
>>  - If polling mode is used, trap on CREADER checks for completion of command


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-05 14:09 ` Julien Grall
@ 2015-05-05 16:09   ` Vijay Kilari
  2015-05-05 16:27     ` Julien Grall
  0 siblings, 1 reply; 77+ messages in thread
From: Vijay Kilari @ 2015-05-05 16:09 UTC (permalink / raw)
  To: Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Tue, May 5, 2015 at 7:39 PM, Julien Grall <julien.grall@citrix.com> wrote:
> On 05/05/15 13:14, Vijay Kilari wrote:
>> Hi,
>>
>
> Hi Vijay,
>
>>    As discussed, here is the design doc/txt.
>
> I will comment on the proposal 2 as it seems to be the preferred one
> assuming you are able to find why it's slow.
>
>> Proposal 2:
>> ----------------
>> Here when guest writes command to vITS queue and updates CWRITER registers,
>> it is trapped in XEN and below steps are followed to process ITS command
>>
>> - Dom0 creates a ITS completion device with device id (00:00.1) and reserves
>>   n number (256 or so) irqs (LPIs) for this device.
>> - One irq/LPI (called completion_irq) of this completion device is
>> allocated per domain
>> - With this irq/LPI descriptor we can identify the domain/vITS.
>> - Info of all the ongoing ITS requests(put in pITS Queue) of this domain is
>>   stored in ITS command status array (called its_requests). This is
>> managed per vITS.
>>
>> 1) Trap of CWRITER write by guest
>> 2) Take vITS lock
>> 3) Read all the commands written by guest, translate it
>>     - If one of the guest command is INT command
>
> Why do you need a specific handling for the guest INT command?

  If the guest driver is using the interrupt mechanism instead of polling,
then an INT command is sent by the guest. To make sure that CREADR is
updated before the guest's INT command raises an interrupt to the guest,
Xen has to insert its completion interrupt and update CREADR first.

>>        a) Append INT command with completion_irq and write this batch as
>>           seperate request and goto (3) to process next commands
>>     - If more than 'n' commands are sent by guest, start a timer to process
>>       remaining commands
>
> Hmmm... How are you sure the time for the timer would be enough?
>
   I have not thought about how much time. Maybe the number of pending
   commands in the physical queue might give some heuristic on the timer
   value.

>> 4) Append INT command with completion_irq of current domain
>> 5) Release vITS lock
>> 6) Take physical ITS (pITS) lock
>> 7) Write translated cmds to physical ITS
>> 8) Add entry in its_requests
>
> You don't explain what is its_requests.
>
>> 9) Release pITS lock
>> 10) return from trap
>>
>> One receiving completion interrupt:
>>
>> 1) Take the first pending request from its_requests.
>
> I'm assuming that you have some kind of array/list to store the pending
> request? I think this would be more difficult to manage than only
> supporting one batch per domain at any time.

  Yes. If only one batch per domain is processed at a time,
then the array could store only one entry. I will tune it when I implement.

>> 2) Update vITS CREADER of the guest indicating completion of command to guest
>>
>> Cons:
>>    - Has overhead of processing completion interrupt.
>>    - Need to reserve a fake device to generate completion interrupt and
>>      reserve one LPI per-domain
>>
>> Pros:
>>    - VCPU does not poll in Xen for completion of commands.
>>    - Handles guest flooding command queue with commands. But needs timer
>>
>> Handling Command queue state:
>>  - Physical Queue cannot be full as it 64KB there by it can accomodate
>> 1K ITS commands.
>
> I don't understand this sentence. Why do you think the physical queue
> cannot be full?

  I mean that it is unlikely that the physical ITS command queue would be
full, because of its 64KB size. If at all it is full, then the below action
is taken.

>
>>    In case it is full, VCPU has to poll with timeout till physical
>> Queue is empty before it post
>>    next command
>>  - If vITS Queue condition should be managed by guest ITS driver.
>
> Same here.

    The vITS queue is under guest control. If Xen is processing commands
slowly and the guest sees its queue is full, then the guest driver will
handle it.

>
>> Behaviour of Polling and completion interrupt based guest driver:
>>  - If completion interrupt (INT) is used by guest driver, then insert
>> Xen completion
>>    INT command so that CREADER is updated before guest's INT command is injected
>>  - If polling mode is used, trap on CREADER checks for completion of command
>>
>
> Regards,
>
> --
> Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-05 16:09   ` Vijay Kilari
@ 2015-05-05 16:27     ` Julien Grall
  0 siblings, 0 replies; 77+ messages in thread
From: Julien Grall @ 2015-05-05 16:27 UTC (permalink / raw)
  To: Vijay Kilari, Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On 05/05/15 17:09, Vijay Kilari wrote:
> On Tue, May 5, 2015 at 7:39 PM, Julien Grall <julien.grall@citrix.com> wrote:
>> On 05/05/15 13:14, Vijay Kilari wrote:
>>> Proposal 2:
>>> ----------------
>>> Here when guest writes command to vITS queue and updates CWRITER registers,
>>> it is trapped in XEN and below steps are followed to process ITS command
>>>
>>> - Dom0 creates a ITS completion device with device id (00:00.1) and reserves
>>>   n number (256 or so) irqs (LPIs) for this device.
>>> - One irq/LPI (called completion_irq) of this completion device is
>>> allocated per domain
>>> - With this irq/LPI descriptor we can identify the domain/vITS.
>>> - Info of all the ongoing ITS requests(put in pITS Queue) of this domain is
>>>   stored in ITS command status array (called its_requests). This is
>>> managed per vITS.
>>>
>>> 1) Trap of CWRITER write by guest
>>> 2) Take vITS lock
>>> 3) Read all the commands written by guest, translate it
>>>     - If one of the guest command is INT command
>>
>> Why do you need a specific handling for the guest INT command?
> 
>   If guest driver is using interrupt mechanism instead of polling
> then INT command is passed by guest. To make sure that CREADER is updated
> before INT command raises interrupt to guest, Xen has to insert completion
> interrupt and update CREADER

Hmmm, I see what you mean now. Although, if I understand correctly, Xen
would receive two interrupts: one for the completion, and the other for
the guest.

It would be better if we avoid the first by re-using the INT command
from the guest. If it's not too difficult, of course.

>>>        a) Append INT command with completion_irq and write this batch as
>>>           seperate request and goto (3) to process next commands
>>>     - If more than 'n' commands are sent by guest, start a timer to process
>>>       remaining commands
>>
>> Hmmm... How are you sure the time for the timer would be enough?
>>
>    Not thought of how much time. May be the number of pending
>    commands in physical queue might give some hueristic on timer value.

I'm wondering if a tasklet would be better here.

>>> 4) Append INT command with completion_irq of current domain
>>> 5) Release vITS lock
>>> 6) Take physical ITS (pITS) lock
>>> 7) Write translated cmds to physical ITS
>>> 8) Add entry in its_requests
>>
>> You don't explain what is its_requests.
>>
>>> 9) Release pITS lock
>>> 10) return from trap
>>>
>>> One receiving completion interrupt:
>>>
>>> 1) Take the first pending request from its_requests.
>>
>> I'm assuming that you have some kind of array/list to store the pending
>> request? I think this would be more difficult to manage than only
>> supporting one batch per domain at any time.
> 
>   Yes, If only one batch per domain is processed at a time,
> then the array could store only one entry. I will tune it when I implement

You won't need an array in this case...

>>> 2) Update vITS CREADER of the guest indicating completion of command to guest
>>>
>>> Cons:
>>>    - Has overhead of processing completion interrupt.
>>>    - Need to reserve a fake device to generate completion interrupt and
>>>      reserve one LPI per-domain
>>>
>>> Pros:
>>>    - VCPU does not poll in Xen for completion of commands.
>>>    - Handles guest flooding command queue with commands. But needs timer
>>>
>>> Handling Command queue state:
>>>  - Physical Queue cannot be full as it 64KB there by it can accomodate
>>> 1K ITS commands.
>>
>> I don't understand this sentence. Why do you think the physical queue
>> cannot be full?
> 
>   I mean that it is unlikely that physical ITS command Q would be full
> because of 64KB size. If at all if it full then below action is taken

Oh ok. I thought you were saying it's not possible :).

> 
>>
>>>    In case it is full, VCPU has to poll with timeout till physical
>>> Queue is empty before it post
>>>    next command
>>>  - If vITS Queue condition should be managed by guest ITS driver.
>>
>> Same here.
> 
>     vITS Queue is under guest control. If Xen is processing commands slowly
> and if guest sees its queue is full then guest driver will handle it.

This paragraph is easier to understand, thanks.

Regards,

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-05 12:14 Xen/arm: Virtual ITS command queue handling Vijay Kilari
  2015-05-05 13:51 ` Stefano Stabellini
  2015-05-05 14:09 ` Julien Grall
@ 2015-05-12 15:02 ` Ian Campbell
  2015-05-12 17:35   ` Julien Grall
                     ` (2 more replies)
  2 siblings, 3 replies; 77+ messages in thread
From: Ian Campbell @ 2015-05-12 15:02 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Stefano Stabellini

On Tue, 2015-05-05 at 17:44 +0530, Vijay Kilari wrote:
> Hi,
> 
>    As discussed, here is the design doc/txt.

There seems to be no consideration of multiple guests or VCPUs all
accessing one or more vITS in parallel and the associated issues around
fairness etc.

Overall I think there needs to be a stronger logical separation between
the vITS emulation and the stuff which interacts with the pITS
(scheduling, completion handling etc).

I've written up my thinking as a design doc below (it's pandoc and the
pdf version is also at
http://xenbits.xen.org/people/ianc/vits/draftA.pdf FWIW).

Corrections and comments welcome. There are several XXX's in it,
representing open questions or things I wasn't sure about how to handle.

This only really covers command queue virtualisation and not other
aspects (I'm not sure if they need covering or not).

Let's try to use this as a basis for discussion, so we can correct and
amend it to represent what the actual design will be.
Ian.

% Xen on ARM vITS Handling
% Ian Campbell <ian.campbell@citrix.com>
% Draft A

# Introduction

ARM systems containing a GIC version 3 or later may contain one or
more ITS logical blocks. An ITS is used to route Message Signalled
interrupts from devices into an LPI injection on the processor.

The following summarises the ITS hardware design and serves as a set
of assumptions for the vITS software design. (XXX it is entirely
possible I've horribly misunderstood how this stuff fits
together). For full details of the ITS see the "GIC Architecture
Specification".

Message signalled interrupts are translated into an LPI via a
translation table which must be configured for each device which can
generate an MSI. The ITS uses the device id of the originating device
to lookup the corresponding translation table. Devices IDs are
typically described via system firmware, e.g. the ACPI IORT table or
via device tree.

The ITS is configured and managed, including establishing a
Translation Table for each device, via an in memory ring shared
between the CPU and the ITS controller. The ring is managed via the
`GITS_CBASER` register and indexed by `GITS_CWRITER` and `GITS_CREADR`
registers.

A processor adds commands to the shared ring and then updates
`GITS_CWRITER` to make them visible to the ITS controller.

The ITS controller processes commands from the ring and then updates
`GITS_CREADR` to indicate to the processor that the command has been
processed.

Commands are processed sequentially.
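
For concreteness, the producer side of this protocol might look like the
following sketch (commands are 32 bytes each; the helper and field names
are invented):

    /* Illustrative sketch of the driver side of the shared ring. */
    #define ITS_CMD_SIZE 32

    static void its_enqueue(struct its *its, const void *cmd)
    {
        uint64_t cwriter = readq(its->base + GITS_CWRITER);

        memcpy(its->cmd_ring + cwriter, cmd, ITS_CMD_SIZE);
        cwriter = (cwriter + ITS_CMD_SIZE) % its->ring_size;

        wmb();   /* the command must be visible before the CWRITER update */
        writeq(cwriter, its->base + GITS_CWRITER);

        /* The ITS consumes commands in order and advances GITS_CREADR;
         * completion is observed by polling it or via an INT command. */
    }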

Commands sent on the ring include operational commands:

* Routing interrupts to processors;
* Generating interrupts;
* Clearing the pending state of interrupts;
* Synchronising the command queue

and maintenance commands:

* Map device/collection/processor;
* Map virtual interrupt;
* Clean interrupts;
* Discard interrupts;

The ITS provides no specific completion notification
mechanism. Completion is monitored by a combination of a `SYNC`
command and either polling `GITS_CREADR` or notification via an
interrupt generated via the `INT` command.

Note that the interrupt generation via `INT` requires an originating
device ID to be supplied (which is then translated via the ITS into an
LPI). No specific device ID is defined for this purpose and so the OS
software is expected to fabricate one.

Possible ways of inventing such a device ID are:

* Enumerate all device ids in the system and pick another one;
* Use a PCI BDF associated with a non-existent device function (such
  as an unused one relating to the PCI root-bridge) and translate that
  (via firmware tables) into a suitable device id;
* ???

# vITS

A guest domain which is allowed to use ITS functionality (i.e. has
been assigned pass-through devices which can generate MSIs) will be
presented with a virtualised ITS.

Accesses to the vITS registers will trap to Xen and be emulated and a
virtualised Command Queue will be provided.

Commands entered onto the virtual Command Queue will be translated
into physical commands (this translation is described in the GIC
specification).

XXX there are other aspects to virtualising the ITS (LPI collection
management, assignment of LPI ranges to guests). However these are not
currently considered here. XXX Should they be/do they need to be?

## Requirements

Emulation should not block in the hypervisor for extended periods. In
particular Xen should not busy wait on the physical ITS. Doing so
blocks the physical CPU from doing anything else (such as scheduling
other VCPUs).

There may be multiple guests which have a vITS, all targeting the same
underlying pITS. A single guest VCPU should not be able to monopolise
the pITS via its vITS and all guests should be able to make forward
progress.

## Command Queue Virtualisation

The command queue of each vITS is represented by a data structure:

    struct vits_cq {
        list_head schedule_list; /* Queued onto pits.schedule_list */
        uint32_t creadr;         /* Virtual creadr */
        uint32_t cwriter;        /* Virtual cwriter */
        uint32_t progress;       /* Index of last command queued to pits */
        [ Reference to command queue memory ]
    };

Each pITS has an associated data structure:

    struct pits {
        list_head schedule_list; /* Contains list of vits_cq.schedule_lists */
        uint32_t last_creadr;
    };

On write to the virtual `CWRITER` the cwriter field is updated and if
that results in there being new outstanding requests then the vits_cq
is enqueued onto pITS' schedule_list (unless it is already there).

On read from the virtual `CREADR` iff the vits_cq is such that
commands are outstanding then a scheduling pass is attempted (in order
to update `vits_cq.creadr`). The current value of `vits_cq.creadr` is
then returned.
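
A sketch of the two register handlers implied above (illustrative only; it
assumes `vits_cq` also carries a back-pointer to its `pits`):

    static void vits_cwriter_write(struct vits_cq *vcq, uint32_t val)
    {
        vcq->cwriter = val;
        if ( vcq->cwriter != vcq->progress &&   /* new outstanding commands */
             list_empty(&vcq->schedule_list) )
            list_add_tail(&vcq->schedule_list, &vcq->pits->schedule_list);
    }

    static uint32_t vits_creadr_read(struct vits_cq *vcq)
    {
        if ( vcq->creadr != vcq->cwriter )      /* commands outstanding */
            pits_schedule_pass(vcq->pits);      /* may advance vcq->creadr */
        return vcq->creadr;
    }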

### pITS Scheduling

A pITS scheduling pass is attempted:

* On write to any virtual `CWRITER` iff that write results in there
  being new outstanding requests for that vits;
* On read from a virtual `CREADR` iff there are commands outstanding
  on that vits;
* On receipt of an interrupt notification arising from Xen's own use
  of `INT`; (see discussion under Completion)
* On any interrupt injection arising from a guest's use of the `INT`
  command; (XXX perhaps, see discussion under Completion)

Each scheduling pass will:

* Read the physical `CREADR`;
* For each command between `pits.last_creadr` and the new `CREADR`
  value, process completion of that command and update the
  corresponding `vits_cq.creadr`.
* Attempt to refill the pITS Command Queue (see below).
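
Such a pass might be sketched as follows (illustrative; `shadow[]` is the
shadow structure described in the next section):

    static void pits_schedule_pass(struct pits *pits)
    {
        uint32_t creadr = readl(pits->base + GITS_CREADR);

        /* Complete every command between last_creadr and the new CREADR. */
        while ( pits->last_creadr != creadr )
        {
            struct pits_slot *slot =
                &pits->shadow[cmd_index(pits->last_creadr)];

            slot->vcq->creadr = slot->vcmd_idx;  /* command now complete */
            pits->last_creadr = next_cmd_offset(pits, pits->last_creadr);
        }

        pits_refill(pits);  /* see "Filling the pITS Command Queue" below */
    }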

### Filling the pITS Command Queue.

Various algorithms could be used here. For now a simple proposal is
to traverse the `pits.schedule_list` starting from where the last
refill finished (i.e. not from the top of the list each time).

If a `vits_cq` has no pending commands then it is removed from the
list.

If a `vits_cq` has some pending commands then `min(pits-free-slots,
vits-outstanding, VITS_BATCH_SIZE)` will be taken from the vITS
command queue, translated and placed onto the pITS
queue. `vits_cq.progress` will be updated to reflect this.

Each `vits_cq` is handled in turn in this way until the pITS Command
Queue is full or there are no more outstanding commands.

There will likely need to be a data structure which shadows the pITS
Command Queue slots with references to the `vits_cq` which has a
command currently occupying that slot and the corresponding index into
the virtual command queue, for use when completing a command.

`VITS_BATCH_SIZE` should be small, TBD say 4 or 8.
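
An illustrative refill loop under these rules (all helper names invented;
`next_vits_cq()` is assumed to return NULL once the schedule list is empty):

    static void pits_refill(struct pits *pits)
    {
        struct vits_cq *vcq;

        while ( pits_free_slots(pits) &&
                (vcq = next_vits_cq(pits)) != NULL )  /* round-robin cursor */
        {
            unsigned int n = min(min(pits_free_slots(pits),
                                     vits_outstanding(vcq)),
                                 (unsigned int)VITS_BATCH_SIZE);

            /* Translate n commands onto the pITS queue, recording each
             * slot in pits->shadow[] and advancing vcq->progress. */
            pits_post_batch(pits, vcq, n);

            if ( !vits_outstanding(vcq) )
                list_del_init(&vcq->schedule_list);
        }
    }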

Possible simplification: If we arrange that no guest ever has multiple
batches in flight (which can occur if we wrap around the list several
times) then we may be able to simplify the bookkeeping
required. However this may need some careful thought wrt fairness for
guests submitting frequent small batches of commands vs those sending
large batches.

### Completion

It is expected that commands will normally be completed (resulting in
an update of the corresponding `vits_cq.creadr`) via guest read from
`CREADR`. This will trigger a scheduling pass which will ensure the
`vits_cq.creadr` value is up to date before it is returned.

A guest which does completion via the use of `INT` cannot observe
`CREADR` without reading it, so updating on read from `CREADR`
suffices from the point of view of the guests observation of the
state. (Of course we will inject the interrupt at the designated point
and the guest may well then read `CREADR`)

However in order to keep the pITS Command Queue moving along we need
to consider what happens if there are no `INT` based events nor reads
from `CREADR` to drive completion and therefore refilling of the Queue
with other outstanding commands.

A guest which enqueues some commands and then never checks for
completion cannot itself block things because any other guest which
reads `CREADR` will drive completion. However if _no_ guest reads from
`CREADR` then completion will not occur and this must be dealt with.

Even if we include completion on `INT`-based interrupt injection then
it is possible that the pITS queue may not contain any such
interrupts, either because no guest is using them or because the
batching means that none of them are enqueued on the active ring at
the moment.

So we need a fallback to ensure that queue keeps moving. There are
several options:

* A periodic timer in Xen which runs whenever there are outstanding
  commands in the pITS. This is simple but pretty sucky.
* Xen injects its own `INT` commands into the pITS ring. This requires
  figuring out a device ID to use.

The second option is likely to be preferable if the issue of selecting
a device ID can be addressed.

A secondary question is when these `INT` commands should be inserted
into the command stream:

* After each batch taken from a single `vits_cq`;
* After each scheduling pass;
* One active in the command stream at any given time;

The latter should be sufficient: by arranging to insert an `INT` into
the stream at the end of any scheduling pass which occurs while there
is not a currently outstanding `INT`, we have a sufficient backstop to
allow us to refill the ring.
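
A sketch of that backstop (illustrative; `XEN_DEVID`/`XEN_EVENT` stand in
for whatever device ID Xen manages to reserve for itself, which is the open
question above):

    /* Keep exactly one Xen-owned INT outstanding at any time. */
    static void pits_maybe_add_backstop_int(struct pits *pits)
    {
        if ( !pits->backstop_int_outstanding && pits_free_slots(pits) )
        {
            pits_post_int(pits, XEN_DEVID, XEN_EVENT);
            pits->backstop_int_outstanding = true; /* cleared in its IRQ
                                                      handler */
        }
    }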

This assumes that there is no particular benefit to keeping the
`CWRITER` rolling ahead of the pITS's actual processing. This is true
because the ITS operates on commands in the order they appear in the
queue, so there is no need to maintain a runway ahead of the ITS
processing. (XXX If this is a concern perhaps the INT could be
inserted at the head of the final batch of commands in a scheduling
pass instead of the tail).

Xen itself should never need to issue an associated `SYNC` command,
since the individual guests would need to issue those themselves when
they care. The `INT` only serves to allow Xen to enqueue new commands
when there is space on the ring; it has no interest itself in the
actual completion.

### Locking

It may be preferable to use `atomic_t` types for various fields
(e.g. `vits_cq.creadr`) in order to reduce the amount and scope of
locking required.

### Multiple vITS instances in a single guest

As described above each vITS maps to exactly one pITS (while each pITS
serves multiple vITSs).

In principle it might be possible to arrange that a vITS can enqueue
commands to different pITSs depending on e.g. the device id. However
this brings significant additional complexity (what to do with SYNC
commands, how to order completion such that one pITS does not block
another, bookkeeping etc).

In addition the introduction of direct interrupt injection in version
4 GICs may imply a vITS per pITS. (XXX???)

Therefore it is proposed that the restriction that a single vITS maps
to one pITS be retained. If a guest requires access to devices
associated with multiple pITSs then multiple vITS should be
configured.

### vITS for purely software interrupts (e.g. event channels)

It has been proposed that it might be nice to inject event channels as
LPIs in the future. Whether or not that would involve any sort of vITS
is unclear, but if it did then it would likely be a separate emulation
to the vITS emulation used with a pITS and as such is not considered
further here.

# Glossary

* _MSI_: Message Signalled Interrupt
* _ITS_: Interrupt Translation Service
* _GIC_: Generic Interrupt Controller
* _LPI_: Locality-specific Peripheral Interrupt

# References

"GIC Architecture Specification" PRD03-GENC-010745 24.0


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-12 15:02 ` Ian Campbell
@ 2015-05-12 17:35   ` Julien Grall
  2015-05-13 13:23     ` Ian Campbell
  2015-05-13 16:27   ` Vijay Kilari
  2015-05-15 11:45   ` Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling) Ian Campbell
  2 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-12 17:35 UTC (permalink / raw)
  To: Ian Campbell, Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Stefano Stabellini

Hi Ian,

On 12/05/15 16:02, Ian Campbell wrote:
> On Tue, 2015-05-05 at 17:44 +0530, Vijay Kilari wrote:
>> Hi,
>>
>>    As discussed, here is the design doc/txt.
> 
> There seems to be no consideration of multiple guests or VCPUs all
> accessing one or more vITS in parallel and the associated issues around
> fairness etc.
> 
> Overall I think there needs to be a stronger logical separation between
> the vITS emulation and the stuff which interacts with the pITS
> (scheduling, completion handling etc).
> 
> I've written up my thinking as a design doc below (it's pandoc and the
> pdf version is also at
> http://xenbits.xen.org/people/ianc/vits/draftA.pdf FWIW).

Thank you for writing the doc.

> 
> Corrections and comments welcome. There are several XXX's in it,
> representing open questions or things I wasn't sure about how to handle.
> 
> This only really covers command queue virtualisation and not other
> aspects (I'm not sure if they need covering or not).
> 
> Lets try and use this as a basis for discussion so we can correct and
> amend it to represent what the actual design will be
> 
> Ian.
> 
> % Xen on ARM vITS Handling
> % Ian Campbell <ian.campbell@citrix.com>
> % Draft A
> 
> # Introduction
> 
> ARM systems containing a GIC version 3 or later may contain one or
> more ITS logical blocks. An ITS is used to route Message Signalled
> interrupts from devices into an LPI injection on the processor.
> 
> The following summarises the ITS hardware design and serves as a set
> of assumptions for the vITS software design. (XXX it is entirely
> possible I've horribly misunderstood how this stuff fits
> together). For full details of the ITS see the "GIC Architecture
> Specification".

The summary of the ITS hardware design looks good to me.

> Message signalled interrupts are translated into an LPI via a
> translation table which must be configured for each device which can
> generate an MSI. The ITS uses the device id of the originating device
> to lookup the corresponding translation table. Devices IDs are
> typically described via system firmware, e.g. the ACPI IORT table or
> via device tree.
> 
> The ITS is configured and managed, including establishing a
> Translation Table for each device, via an in memory ring shared

s/an in/a/?

> between the CPU and the ITS controller. The ring is managed via the
> `GITS_CBASER` register and indexed by `GITS_CWRITER` and `GITS_CREADR`
> registers.
> 
> A processor adds commands to the shared ring and then updates
> `GITS_CWRITER` to make them visible to the ITS controller.
> 
> The ITS controller processes commands from the ring and then updates
> `GITS_CREADR` to indicate the the processor that the command has been
> processed.
> 
> Commands are processed sequentially.
> 
> Commands sent on the ring include operational commands:
> 
> * Routing interrupts to processors;
> * Generating interrupts;
> * Clearing the pending state of interrupts;
> * Synchronising the command queue
> 
> and maintenance commands:
> 
> * Map device/collection/processor;
> * Map virtual interrupt;
> * Clean interrupts;
> * Discard interrupts;
> 
> The ITS provides no specific completion notification
> mechanism. Completion is monitored by a combination of a `SYNC`
> command and either polling `GITS_CREADR` or notification via an
> interrupt generated via the `INT` command.
> 
> Note that the interrupt generation via `INT` requires an originating
> device ID to be supplied (which is then translated via the ITS into an
> LPI). No specific device ID is defined for this purpose and so the OS
> software is expected to fabricate one.
> 
> Possible ways of inventing such a device ID are:
> 
> * Enumerate all device ids in the system and pick another one;
> * Use a PCI BDF associated with a non-existent device function (such
>   as an unused one relating to the PCI root-bridge) and translate that
>   (via firmware tables) into a suitable device id;
> * ???

I don't have any other ideas in mind.

> # vITS
> 
> A guest domain which is allowed to use ITS functionality (i.e. has
> been assigned pass-through devices which can generate MSIs) will be
> presented with a virtualised ITS.
> 
> Accesses to the vITS registers will trap to Xen and be emulated and a
> virtualised Command Queue will be provided.
> 
> Commands entered onto the virtual Command Queue will be translated
> into physical commands (this translation is described in the GIC
> specification).
> 
> XXX there are other aspects to virtualising the ITS (LPI collection
> management, assignment of LPI ranges to guests).

Another aspect to think about is device management.

> However these are not
> currently considered here. XXX Should they be/do they need to be?

I think those aspects are straightforward and don't require any
specific design doc. We could discuss them during the
implementation (number of LPIs supported, LPI allocations...).

> 
> ## Requirements
> 
> Emulation should not block in the hypervisor for extended periods. In
> particular Xen should not busy wait on the physical ITS. Doing so
> blocks the physical CPU from doing anything else (such as scheduling
> other VCPUS)
> 
> There may be multiple guests which have a vITS, all targeting the same
> underlying pITS. A single guest VCPU should not be able to monopolise
> the pITS via its vITS and all guests should be able to make forward
> progress.
> 
> ## Command Queue Virtualisation
> 
> The command queue of each vITS is represented by a data structure:
> 
>     struct vits_cq {
>         list_head schedule_list; /* Queued onto pits.schedule_list */
>         uint32_t creadr;         /* Virtual creadr */
>         uint32_t cwriter;        /* Virtual cwriter */
>         uint32_t progress;       /* Index of last command queued to pits */
>         [ Reference to command queue memory ]
>     };
> 
> Each pITS has an associated data structure:
> 
>     struct pits {
>         list_head schedule_list; /* Contains list of vitq_cq.schedule_lists */
> 	uint32_t last_creadr;
>     };
> 
> On write to the virtual `CWRITER` the cwriter field is updated and if
> that results in there being new outstanding requests then the vits_cq
> is enqueued onto pITS' schedule_list (unless it is already there).
> 
> On read from the virtual `CREADR` iff the vits_cq is such that

s/iff/if/

> commands are outstanding then a scheduling pass is attempted (in order
> to update `vits_cq.creadr`). The current value of `vitq_cq.creadr` is
> then returned.
> 
> ### pITS Scheduling

I'm not sure if the design document is the right place to talk about it.

If a domain dies during the process, how would it affect the scheduler?

> A pITS scheduling pass is attempted:
> 
> * On write to any virtual `CWRITER` iff that write results in there

s/iff/if/

>   being new outstanding requests for that vits;
> * On read from a virtual `CREADR` iff there are commands outstanding

s/iff/if/

>   on that vits;
> * On receipt of an interrupt notification arising from Xen's own use
>   of `INT`; (see discussion under Completion)
> * On any interrupt injection arising from a guests use of the `INT`
>   command; (XXX perhaps, see discussion under Completion)

With all the solutions suggested, it is very likely that we will try
to execute multiple scheduling passes at the same time.

One way is to wait until the previous pass has finished. But that would
mean that the scheduler would be executed very often.

Or maybe you plan to offload the scheduler to a softirq?

> Each scheduling pass will:
> 
> * Read the physical `CREADR`;
> * For each command between `pits.last_creadr` and the new `CREADR`
>   value process completion of that command and update the
>   corresponding `vits_cq.creadr`.
> * Attempt to refill the pITS Command Queue (see below).
> 
> ### Filling the pITS Command Queue.
> 
> Various algorithms could be used here. For now a simple proposal is
> to traverse the `pits.schedule_list` starting from where the last
> refill finished (i.e not from the top of the list each time).
> 
> If a `vits_cq` has no pending commands then it is removed from the
> list.
> 
> If a `vits_cq` has some pending commands then `min(pits-free-slots,
> vits-outstanding, VITS_BATCH_SIZE)` will be taken from the vITS
> command queue, translated and placed onto the pITS
> queue. `vits_cq.progress` will be updated to reflect this.
> 
> Each `vits_cq` is handled in turn in this way until the pITS Command
> Queue is full or there are no more outstanding commands.
> 
> There will likely need to be a data structure which shadows the pITS
> Command Queue slots with references to the `vits_cq` which has a
> command currently occupying that slot and corresponding the index into
> the virtual command queue, for use when completing a command.
> 
> `VITS_BATCH_SIZE` should be small, TBD say 4 or 8.
> 
> Possible simplification: If we arrange that no guest ever has multiple
> batches in flight (which can occur if we wrap around the list several
> times) then we may be able to simplify the book keeping
> required. However this may need some careful thought wrt fairness for
> guests submitting frequent small batches of commands vs those sending
> large batches.

I'm concerned about the time consumed by filling the pITS Command Queue.

AFAIU the process suggested, Xen will inject small batches as long as the
physical command queue is not full.

Let's take a simple case: only a single domain is using the vITS on the
platform. If it injects a huge number of commands, Xen will split it into
lots of small batches. All the batches will be injected in the same pass as
long as they fit in the physical command queue. Am I correct?

I think we have to restrict the total number of batches (i.e. for all the
domains) injected in a single scheduling pass.

I would even tend to allow only one in-flight batch per domain. That would
limit the possible problem I pointed out.

> 
> ### Completion
> 
> It is expected that commands will normally be completed (resulting in
> an update of the corresponding `vits_cq.creadr`) via guest read from
> `CREADR`. This will trigger a scheduling pass which will ensure the
> `vits_cq.creadr` value is up to date before it is returned.
> 
> A guest which does completion via the use of `INT` cannot observe
> `CREADR` without reading it, so updating on read from `CREADR`
> suffices from the point of view of the guests observation of the
> state. (Of course we will inject the interrupt at the designated point
> and the guest may well then read `CREADR`)
> 
> However in order to keep the pITS Command Queue moving along we need
> to consider what happens if there are no `INT` based events nor reads
> from `CREADR` to drive completion and therefore refilling of the Queue
> with other outstanding commands.
> 
> A guest which enqueues some commands and then never checks for
> completion cannot itself block things because any other guest which
> reads `CREADR` will drive completion. However if _no_ guest reads from
> `CREADR` then completion will not occur and this must be dealt with.
> 
> Even if we include completion on `INT`-base interrupt injection then
> it is possible that the pITS queue may not contain any such
> interrupts, either because no guest is using them or because the
> batching means that none of them are enqueued on the active ring at
> the moment.
> 
> So we need a fallback to ensure that queue keeps moving. There are
> several options:
> 
> * A periodic timer in Xen which runs whenever there are outstanding
>   commands in the pITS. This is simple but pretty sucky.
> * Xen injects its own `INT` commands into the pITS ring. This requires
>   figuring out a device ID to use.
> 
> The second option is likely to be preferable if the issue of selecting
> a device ID can be addressed.
> 
> A secondary question is when these `INT` commands should be inserted
> into the command stream:
> 
> * After each batch taken from a single `vits_cq`;
> * After each scheduling pass;
> * One active in the command stream at any given time;
> 
> The latter should be sufficient, by arranging to insert a `INT` into
> the stream at the end of any scheduling pass which occurs while there
> is not a currently outstanding `INT` we have sufficient backstop to
> allow us to refill the ring.
> 
> This assumes that there is no particular benefit to keeping the
> `CWRITER` rolling ahead of the pITS's actual processing.

I don't understand this assumption. CWRITER will always point to the
last command in the queue.

> This is true
> because the IRS operates on commands in the order they appear in the

s/IRS/ITS/ ?

> queue, so there is no need to maintain a runway ahead of the ITS
> processing. (XXX If this is a concern perhaps the INT could be
> inserted at the head of the final batch of commands in a scheduling
> pass instead of the tail).
> 
> Xen itself should never need to issue an associated `SYNC` command,
> since the individual guests would need to issue those themselves when
> they care. The `INT` only serves to allow Xen to enqueue new commands
> when there is space on the ring, it has no interest itself on the
> actual completion.
> 
> ### Locking
> 
> It may be preferable to use `atomic_t` types for various fields
> (e.g. `vits_cq.creadr`) in order to reduce the amount and scope of
> locking required.
> 
> ### Multiple vITS instances in a single guest
> 
> As described above each vITS maps to exactly one pITS (while each pITS
> servers multiple vITSs).
> 
> In principal it might be possible to arrange that a vITS can enqueue
> commands to different pITSs depending on e.g. the device id. However
> this brings significant additional complexity (what to do with SYNC
> commands, how order completion such that one pITS does not block
> another, book keeping etc).
> 
> In addition the introduction of direct interrupt injection in version
> 4 GICs may imply a vITS per pITS. (XXX???)

GICv4 will directly mark the LPIs pending in the virtual pending table,
which is per-redistributor (i.e. per-vCPU).

LPIs will be received by the guest the same way as SPIs, i.e. trap in
IRQ mode then read ICC_IAR1_EL1 (for GICv3).

So I don't think that GICv4 will require one vITS per pITS.

> 
> Therefore it is proposed that the restriction that a single vITS maps
> to one pITS be retained. If a guest requires access to devices
> associated with multiple pITSs then multiple vITS should be
> configured.

Having multiple vITS per domain brings other issues:
	- How do you know the number of ITS to describe in the device tree at boot?
	- How do you tell the guest that the PCI device is mapped to a
specific vITS?

Regards,

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-12 17:35   ` Julien Grall
@ 2015-05-13 13:23     ` Ian Campbell
  2015-05-13 14:26       ` Julien Grall
  0 siblings, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-13 13:23 UTC (permalink / raw)
  To: Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Tue, 2015-05-12 at 18:35 +0100, Julien Grall wrote:
> > Message signalled interrupts are translated into an LPI via a
> > translation table which must be configured for each device which can
> > generate an MSI. The ITS uses the device id of the originating device
> > to look up the corresponding translation table. Device IDs are
> > typically described via system firmware, e.g. the ACPI IORT table or
> > via device tree.
> > 
> > The ITS is configured and managed, including establishing a
> > Translation Table for each device, via an in memory ring shared
> 
> s/an in/a/?

Either is acceptable IMHO. "an (in memory) ring" is how you would parse
what I've written.

> > # vITS
> > 
> > A guest domain which is allowed to use ITS functionality (i.e. has
> > been assigned pass-through devices which can generate MSIs) will be
> > presented with a virtualised ITS.
> > 
> > Accesses to the vITS registers will trap to Xen and be emulated and a
> > virtualised Command Queue will be provided.
> > 
> > Commands entered onto the virtual Command Queue will be translated
> > into physical commands (this translation is described in the GIC
> > specification).
> > 
> > XXX there are other aspects to virtualising the ITS (LPI collection
> > management, assignment of LPI ranges to guests).
> 
> Another aspect to think about is device management.

Added.

> > However these are not
> > currently considered here. XXX Should they be/do they need to be?
> 
> I think those aspects are straightforward and don't require any
> specific design doc. We could discuss them during the
> implementation (number of LPIs supported, LPI allocation...).

OK

> > On read from the virtual `CREADR` iff the vits_cq is such that
> 
> s/iff/if/

"iff" is a shorthand for "if and only if". Apparently not as common as I
think it is though!

> 
> > commands are outstanding then a scheduling pass is attempted (in order
> > to update `vits_cq.creadr`). The current value of `vits_cq.creadr` is
> > then returned.
> > 
> > ### pITS Scheduling
> 
> I'm not sure if the design document is the right place to talk about it.
> 
> If a domain dies during the process, how would it affect the scheduler?


So I think we have to wait for them to finish.

Vague thoughts:

        We can't free a `vits_cq` while it has things on the physical
        control queue, and we cannot cancel things which are on the
        control queue.

        So we must wait.

        Obviously don't enqueue anything new onto the pITS if
        `d->is_dying`.

        `domain_relinquish_resources()` waits (somehow, with suitable
        continuations etc) for anything which the `vits_cq` has
        outstanding to be completed so that the data structures can be
        cleared.

?

I've added that to a new section "Domain Shutdown" right after
scheduling.
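
A very rough sketch of the relinquish side (all helper and field names
here are invented for illustration):

    /* Sketch only: assumes a per-domain vits_cq pointer and a helper
     * vits_cq_outstanding() which checks for commands still on the
     * physical queue. */
    static int vits_relinquish(struct domain *d)
    {
        struct vits_cq *vcq = d->arch.vits_cq;

        if ( vcq == NULL )
            return 0;

        /* Nothing new is enqueued once d->is_dying is set. */
        if ( vits_cq_outstanding(vcq) )
            return -ERESTART; /* retried via the usual continuation path */

        xfree(vcq);
        d->arch.vits_cq = NULL;
        return 0;
    }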

> >   on that vits;
> > * On receipt of an interrupt notification arising from Xen's own use
> >   of `INT`; (see discussion under Completion)
> > * On any interrupt injection arising from a guest's use of the `INT`
> >   command; (XXX perhaps, see discussion under Completion)
> 
> With all the solutions suggested, it is very likely that we will try
> to execute multiple scheduling passes at the same time.
> 
> One way is to wait until the previous pass has finished. But that would
> mean that the scheduler would be executed very often.
> 
> Or maybe you plan to offload the scheduler to a softirq?

Good point.

A soft irq might be one solution, but it is problematic during emulation
of `CREADR`, when we would like to do a pass immediately to complete any
operations outstanding for the domain doing the read.

Or just using spin_try_lock and not bothering if one is already in
progress might be another. But that has similar problems.

Or we could defer only scheduling from `INT` (either guest or Xen's own)
to a softirq but do ones from `CREADR` emulation synchronously? The
softirq would be run on return from the interrupt handler but multiple
such would be coalesced I think?

I've not updated the doc (apart from a note to remember the issue) while
we think about this.
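
For concreteness, a sketch of that mixed approach (the softirq number
and all helper names are invented):

    /* Sketch: defer INT-driven passes to a softirq, which is run on
     * return to guest and coalesces multiple raises into one pass. */
    static void vits_completion_interrupt(int irq, void *dev_id,
                                          struct cpu_user_regs *regs)
    {
        raise_softirq(VITS_SCHEDULE_SOFTIRQ);
    }

    /* ...but do CREADR-driven passes synchronously, so the value
     * returned to the guest is up to date. */
    static uint32_t vits_emulate_creadr_read(struct vits_cq *vcq)
    {
        vits_schedule_pass(vcq->pits);
        return vcq->creadr;
    }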

> 
> > Each scheduling pass will:
> > 
> > * Read the physical `CREADR`;
> > * For each command between `pits.last_creadr` and the new `CREADR`
> >   value process completion of that command and update the
> >   corresponding `vits_cq.creadr`.
> > * Attempt to refill the pITS Command Queue (see below).
> > 
> > ### Filling the pITS Command Queue.
> > 
> > Various algorithms could be used here. For now a simple proposal is
> > to traverse the `pits.schedule_list` starting from where the last
> > refill finished (i.e. not from the top of the list each time).
> > 
> > If a `vits_cq` has no pending commands then it is removed from the
> > list.
> > 
> > If a `vits_cq` has some pending commands then `min(pits-free-slots,
> > vits-outstanding, VITS_BATCH_SIZE)` will be taken from the vITS
> > command queue, translated and placed onto the pITS
> > queue. `vits_cq.progress` will be updated to reflect this.
> > 
> > Each `vits_cq` is handled in turn in this way until the pITS Command
> > Queue is full or there are no more outstanding commands.
> > 
> > There will likely need to be a data structure which shadows the pITS
> > Command Queue slots with references to the `vits_cq` which has a
> > command currently occupying that slot and the corresponding index into
> > the virtual command queue, for use when completing a command.
> > 
> > `VITS_BATCH_SIZE` should be small, TBD say 4 or 8.
> > 
> > Possible simplification: If we arrange that no guest ever has multiple
> > batches in flight (which can occur if we wrap around the list several
> > times) then we may be able to simplify the book keeping
> > required. However this may need some careful thought wrt fairness for
> > guests submitting frequent small batches of commands vs those sending
> > large batches.
> 
> I'm concerned about the time consumed by filling the pITS Command Queue.
> 
> AFAIU the process suggested, Xen will inject small batches as long as
> the physical command queue is not full.

> Let's take a simple case, only a single domain is using vITS on the
> platform. If it injects a huge number of commands, Xen will split it
> into lots of small batches. All batches will be injected in the same
> pass as long as they fit in the physical command queue. Am I correct?

That's how it is currently written, yes. With the "possible
simplification" above the answer is no, only a batch at a time would be
written for each guest.

BTW, it doesn't have to be a single guest, the sum total of the
injections across all guests could also take a similar amount of time.
Is that a concern?

> I think we have to restrict the total number of batches (i.e. for all
> the domains) injected in the same scheduling pass.
> 
> I would even tend to allow only one in-flight batch per domain. That
> would limit the possible problem I pointed out.

This is the "possible simplification" I think. Since it simplifies other
things (I think) as well as addressing this issue I think it might be a
good idea.
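
As an aside, with that simplification the shadow bookkeeping mentioned
under "Filling the pITS Command Queue" could be as small as this (a
sketch only, field names illustrative):

    /* One entry per physical Command Queue slot, parallel to the ring. */
    struct pits_slot_shadow {
        struct vits_cq *owner; /* vITS whose command occupies the slot */
        uint32_t vidx;         /* index of that command in the virtual queue */
    };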

> 
> > 
> > ### Completion
> > 
> > It is expected that commands will normally be completed (resulting in
> > an update of the corresponding `vits_cq.creadr`) via guest read from
> > `CREADR`. This will trigger a scheduling pass which will ensure the
> > `vits_cq.creadr` value is up to date before it is returned.
> > 
> > A guest which does completion via the use of `INT` cannot observe
> > `CREADR` without reading it, so updating on read from `CREADR`
> > suffices from the point of view of the guest's observation of the
> > state. (Of course we will inject the interrupt at the designated point
> > and the guest may well then read `CREADR`)
> > 
> > However in order to keep the pITS Command Queue moving along we need
> > to consider what happens if there are no `INT` based events nor reads
> > from `CREADR` to drive completion and therefore refilling of the Queue
> > with other outstanding commands.
> > 
> > A guest which enqueues some commands and then never checks for
> > completion cannot itself block things because any other guest which
> > reads `CREADR` will drive completion. However if _no_ guest reads from
> > `CREADR` then completion will not occur and this must be dealt with.
> > 
> > Even if we include completion on `INT`-based interrupt injection then
> > it is possible that the pITS queue may not contain any such
> > interrupts, either because no guest is using them or because the
> > batching means that none of them are enqueued on the active ring at
> > the moment.
> > 
> > So we need a fallback to ensure that queue keeps moving. There are
> > several options:
> > 
> > * A periodic timer in Xen which runs whenever there are outstanding
> >   commands in the pITS. This is simple but pretty sucky.
> > * Xen injects its own `INT` commands into the pITS ring. This requires
> >   figuring out a device ID to use.
> > 
> > The second option is likely to be preferable if the issue of selecting
> > a device ID can be addressed.
> > 
> > A secondary question is when these `INT` commands should be inserted
> > into the command stream:
> > 
> > * After each batch taken from a single `vits_cq`;
> > * After each scheduling pass;
> > * One active in the command stream at any given time;
> > 
> > The latter should be sufficient: by arranging to insert an `INT` into
> > the stream at the end of any scheduling pass which occurs while there
> > is no currently outstanding `INT`, we have a sufficient backstop to
> > allow us to refill the ring.
> > 
> > This assumes that there is no particular benefit to keeping the
> > `CWRITER` rolling ahead of the pITS's actual processing.
> 
> I don't understand this assumption. CWRITER will always point to the
> last command in the queue.

Correct, but that might be ahead of where the pITS has actually gotten
to (which we cannot see).

What I am trying to say here is that there is no point in trying to
eagerly complete things (by checking `CREADR`) such that we can write
new things (and hence push `CWRITER` forward) just to keep ahead of the
pITS' processing.


> 
> > This is true
> > because the IRS operates on commands in the order they appear in the
> 
> s/IRS/ITS/ ?

Yes.

> 
> > queue, so there is no need to maintain a runway ahead of the ITS
> > processing. (XXX If this is a concern perhaps the INT could be
> > inserted at the head of the final batch of commands in a scheduling
> > pass instead of the tail).
> > 
> > Xen itself should never need to issue an associated `SYNC` command,
> > since the individual guests would need to issue those themselves when
> > they care. The `INT` only serves to allow Xen to enqueue new commands
> > when there is space on the ring; it has no interest itself in the
> > actual completion.
> > 
> > ### Locking
> > 
> > It may be preferable to use `atomic_t` types for various fields
> > (e.g. `vits_cq.creadr`) in order to reduce the amount and scope of
> > locking required.
> > 
> > ### Multiple vITS instances in a single guest
> > 
> > As described above each vITS maps to exactly one pITS (while each pITS
> > serves multiple vITSs).
> > 
> > In principle it might be possible to arrange that a vITS can enqueue
> > commands to different pITSs depending on e.g. the device id. However
> > this brings significant additional complexity (what to do with SYNC
> > commands, how to order completion such that one pITS does not block
> > another, book keeping etc).
> > 
> > In addition the introduction of direct interrupt injection in version
> > 4 GICs may imply a vITS per pITS. (XXX???)
> 
> GICv4 will directly mark the LPIs pending in the virtual pending table
> which is per-redistributor (i.e per-vCPU).
> 
> LPIs will be received by the guest the same way as SPIs, i.e. trap in
> IRQ mode then read ICC_IAR1_EL1 (for GICv3).
> 
> So I don't think that GICv4 will require one vITS per pITS.

OK, that's good.

> > Therefore it is proposed that the restriction that a single vITS maps
> > to one pITS be retained. If a guest requires access to devices
> > associated with multiple pITSs then multiple vITS should be
> > configured.
> 
> Having multiple vITS per domain brings other issues:
> 	- How do you know the number of ITS to describe in the device tree at boot?

I'm not sure. I don't think 1 vs N is very different from the question
of 0 vs 1 though, somehow the tools need to know about the pITS setup.

> 	- How do you tell the guest that the PCI device is mapped to a
> specific vITS?

Device Tree or IORT, just like on native and just like we'd have to tell
the guest about that mapping even if there was a single vITS.

I think the complexity of having one vITS target multiple pITSs is going
to be quite high in terms of data structures and the amount of
thinking/tracking scheduler code will have to do, mostly down to out of
order completion of things put in the pITS queue.

Ian.


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-13 13:23     ` Ian Campbell
@ 2015-05-13 14:26       ` Julien Grall
  2015-05-15 10:59         ` Ian Campbell
  0 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-13 14:26 UTC (permalink / raw)
  To: Ian Campbell, Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

Hi Ian,

On 13/05/15 14:23, Ian Campbell wrote:
> On Tue, 2015-05-12 at 18:35 +0100, Julien Grall wrote:
>>> On read from the virtual `CREADR` iff the vits_cq is such that
>>
>> s/iff/if/
> 
> "iff" is a shorthand for "if and only if". Apparently not as common as I
> think it is though!

Oh OK. I wasn't aware of this shorthand.

> 
>>
>>> commands are outstanding then a scheduling pass is attempted (in order
>>> to update `vits_cq.creadr`). The current value of `vits_cq.creadr` is
>>> then returned.
>>>
>>> ### pITS Scheduling
>>
>> I'm not sure if the design document is the right place to talk about it.
>>
>> If a domain dies during the process, how would it affect the scheduler?
> 
> 
> So I think we have to wait for them to finish.
> 
> Vague thoughts:
> 
>         We can't free a `vits_cq` while has things on the physical
>         control
>         queue, and we cannot cancel things which are on the control
>         queue.
>                 
>         So we must wait.
>                 
>         Obviously don't enqueue anything new onto the pits if
>         `d->is_dying`.

Right.

>         `domain_relinquish_resources()` waits (somehow, with suitable
>         continuations etc) for anything which the `vits_cq` has
>         outstanding to be completed so that the datastructures can be
>         cleared.
> 
> ?

I think that would work.

> 
> I've added that to a new section "Domain Shutdown" right after
> scheduling.

Thanks.

> 
>>>   on that vits;
>>> * On receipt of an interrupt notification arising from Xen's own use
>>>   of `INT`; (see discussion under Completion)
>>> * On any interrupt injection arising from a guest's use of the `INT`
>>>   command; (XXX perhaps, see discussion under Completion)
>>
>> With all the solutions suggested, it is very likely that we will try
>> to execute multiple scheduling passes at the same time.
>>
>> One way is to wait until the previous pass has finished. But that would
>> mean that the scheduler would be executed very often.
>>
>> Or maybe you plan to offload the scheduler to a softirq?
> 
> Good point.
> 
> A soft irq might be one solution, but it is problematic during emulation
> of `CREADR`, when we would like to do a pass immediately to complete any
> operations outstanding for the domain doing the read.
> 
> Or just using spin_try_lock and not bothering if one is already in
> progress might be another. But that has similar problems.
> 
> Or we could defer only scheduling from `INT` (either guest or Xen's own)
> to a softirq but do ones from `CREADR` emulation synchronously? The
> softirq would be run on return from the interrupt handler but multiple
> such would be coalesced I think?

I think we could defer the scheduling to a softirq for CREADR too, if
the guest is using:
	- INT completion: vits.creadr would have been correctly updated when
receiving the INT in Xen.
	- polling completion: the guest will loop on CREADR. It will likely get
the info on the next read. The drawback is the guest may lose a few
instruction cycles.

Overall, I don't think it's necessary to have an accurate CREADR.

[..]

>> AFAIU the process suggested, Xen will inject small batches as long as
>> the physical command queue is not full.
> 
>> Let's take a simple case, only a single domain is using vITS on the
>> platform. If it injects a huge number of commands, Xen will split it
>> into lots of small batches. All batches will be injected in the same
>> pass as long as they fit in the physical command queue. Am I correct?
> 
> That's how it is currently written, yes. With the "possible
> simplification" above the answer is no, only a batch at a time would be
> written for each guest.
> 
> BTW, it doesn't have to be a single guest, the sum total of the
> injections across all guests could also take a similar amount of time.
> Is that a concern?

Yes, the example with only one guest was easier to explain.

>> I think we have to restrict the total number of batches (i.e. for all
>> the domains) injected in the same scheduling pass.
>>
>> I would even tend to allow only one in-flight batch per domain. That
>> would limit the possible problem I pointed out.
> 
> This is the "possible simplification" I think. Since it simplifies other
> things (I think) as well as addressing this issue I think it might be a
> good idea.

With the limitation on commands sent per batch, would the fairness you
were talking about in the design doc still be required?

[..]

>>> This assumes that there is no particular benefit to keeping the
>>> `CWRITER` rolling ahead of the pITS's actual processing.
>>
>> I don't understand this assumption. CWRITER will always point to the
>> last command in the queue.
> 
> Correct, but that might be ahead of where the pITS has actually gotten
> to (which we cannot see).
> 
> What I am trying to say here is that there is no point in trying to
> eagerly complete things (by checking `CREADR`) such that we can write
> new things (and hence push `CWRITER` forward) just to keep ahead of the
> pITS' processing.

With your explanation IRL, I better understand this point now. Thanks
for the explanation.

>>> Therefore it is proposed that the restriction that a single vITS maps
>>> to one pITS be retained. If a guest requires access to devices
>>> associated with multiple pITSs then multiple vITS should be
>>> configured.
>>
>> Having multiple vITS per domain brings other issues:
>> 	- How do you know the number of ITS to describe in the device tree at boot?
> 
> I'm not sure. I don't think 1 vs N is very different from the question
> of 0 vs 1 though, somehow the tools need to know about the pITS setup.

I don't see why the tools would need to know the pITS setup.

>> 	- How do you tell the guest that the PCI device is mapped to a
>> specific vITS?
> 
> Device Tree or IORT, just like on native and just like we'd have to tell
> the guest about that mapping even if there was a single vITS.

Right, although the root controller can only be attached to one ITS.

It will be necessary to have multiple root controllers in the guest in
the case where we passthrough devices using different ITSs.

Is pci-back able to expose multiple root controllers?

> I think the complexity of having one vITS target multiple pITSs is going
> to be quite high in terms of data structures and the amount of
> thinking/tracking scheduler code will have to do, mostly down to out of
> order completion of things put in the pITS queue.

I understand the complexity, but exposing one vITS per pITS means that we
are exposing the underlying hardware to the guest.

That brings a lot of complexity to the guest layout, which is right now
static. How do you decide the number of vITSs/root controllers exposed
(think about PCI hotplug)?

Given that PCI passthrough doesn't allow migration, maybe we could use
the layout of the hardware.

If we are going to expose multiple vITS to the guest, we should only use
vITS for guests using PCI passthrough. This is because migration won't be
compatible with it.

Regards,

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-12 15:02 ` Ian Campbell
  2015-05-12 17:35   ` Julien Grall
@ 2015-05-13 16:27   ` Vijay Kilari
  2015-05-15 11:28     ` Ian Campbell
  2015-05-15 11:45   ` Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling) Ian Campbell
  2 siblings, 1 reply; 77+ messages in thread
From: Vijay Kilari @ 2015-05-13 16:27 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Stefano Stabellini

Hi Ian,

   A few thoughts..

On Tue, May 12, 2015 at 8:32 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Tue, 2015-05-05 at 17:44 +0530, Vijay Kilari wrote:
>> Hi,
>>
>>    As discussed, here is the design doc/txt.
>
> There seems to be no consideration of multiple guests or VCPUs all
> accessing one or more vITS in parallel and the associated issues around
> fairness etc.
>
> Overall I think there needs to be a stronger logical separation between
> the vITS emulation and the stuff which interacts with the pITS
> (scheduling, completion handling etc).
>
> I've written up my thinking as a design doc below (it's pandoc and the
> pdf version is also at
> http://xenbits.xen.org/people/ianc/vits/draftA.pdf FWIW).
>
> Corrections and comments welcome. There are several XXX's in it,
> representing open questions or things I wasn't sure about how to handle.
>
> This only really covers command queue virtualisation and not other
> aspects (I'm not sure if they need covering or not).
>
> Let's try and use this as a basis for discussion so we can correct and
> amend it to represent what the actual design will be.
>
> Ian.
>
> % Xen on ARM vITS Handling
> % Ian Campbell <ian.campbell@citrix.com>
> % Draft A
>
> # Introduction
>
> ARM systems containing a GIC version 3 or later may contain one or
> more ITS logical blocks. An ITS is used to route Message Signalled
> interrupts from devices into an LPI injection on the processor.
>
> The following summarises the ITS hardware design and serves as a set
> of assumptions for the vITS software design. (XXX it is entirely
> possible I've horribly misunderstood how this stuff fits
> together). For full details of the ITS see the "GIC Architecture
> Specification".
>
> Message signalled interrupts are translated into an LPI via a
> translation table which must be configured for each device which can
> generate an MSI. The ITS uses the device id of the originating device
> to look up the corresponding translation table. Device IDs are
> typically described via system firmware, e.g. the ACPI IORT table or
> via device tree.
>
> The ITS is configured and managed, including establishing a
> Translation Table for each device, via an in memory ring shared
> between the CPU and the ITS controller. The ring is managed via the
> `GITS_CBASER` register and indexed by `GITS_CWRITER` and `GITS_CREADR`
> registers.
>
> A processor adds commands to the shared ring and then updates
> `GITS_CWRITER` to make them visible to the ITS controller.
>
> The ITS controller processes commands from the ring and then updates
> `GITS_CREADR` to indicate to the processor that the command has been
> processed.
>
> Commands are processed sequentially.
>
> Commands sent on the ring include operational commands:
>
> * Routing interrupts to processors;
> * Generating interrupts;
> * Clearing the pending state of interrupts;
> * Synchronising the command queue
>
> and maintenance commands:
>
> * Map device/collection/processor;
> * Map virtual interrupt;
> * Clean interrupts;
> * Discard interrupts;
>
> The ITS provides no specific completion notification
> mechanism. Completion is monitored by a combination of a `SYNC`
> command and either polling `GITS_CREADR` or notification via an
> interrupt generated via the `INT` command.
>
> Note that the interrupt generation via `INT` requires an originating
> device ID to be supplied (which is then translated via the ITS into an
> LPI). No specific device ID is defined for this purpose and so the OS
> software is expected to fabricate one.
>
> Possible ways of inventing such a device ID are:
>
> * Enumerate all device ids in the system and pick another one;
> * Use a PCI BDF associated with a non-existent device function (such
>   as an unused one relating to the PCI root-bridge) and translate that
>   (via firmware tables) into a suitable device id;
> * ???
>
> # vITS
>
> A guest domain which is allowed to use ITS functionality (i.e. has
> been assigned pass-through devices which can generate MSIs) will be
> presented with a virtualised ITS.
>
> Accesses to the vITS registers will trap to Xen and be emulated and a
> virtualised Command Queue will be provided.
>
> Commands entered onto the virtual Command Queue will be translated
> into physical commands (this translation is described in the GIC
> specification).
>
> XXX there are other aspects to virtualising the ITS (LPI collection
> management, assignment of LPI ranges to guests). However these are not
> currently considered here. XXX Should they be/do they need to be?
>
> ## Requirements
>
> Emulation should not block in the hypervisor for extended periods. In
> particular Xen should not busy wait on the physical ITS. Doing so
> blocks the physical CPU from doing anything else (such as scheduling
> other VCPUs).
>
> There may be multiple guests which have a vITS, all targeting the same
> underlying pITS. A single guest VCPU should not be able to monopolise
> the pITS via its vITS and all guests should be able to make forward
> progress.
>
> ## Command Queue Virtualisation
>
> The command queue of each vITS is represented by a data structure:
>
>     struct vits_cq {
>         list_head schedule_list; /* Queued onto pits.schedule_list */
>         uint32_t creadr;         /* Virtual creadr */
>         uint32_t cwriter;        /* Virtual cwriter */
>         uint32_t progress;       /* Index of last command queued to pits */
>         [ Reference to command queue memory ]
>     };
>
> Each pITS has an associated data structure:
>
>     struct pits {
>         list_head schedule_list; /* Contains list of vits_cq.schedule_lists */
>         uint32_t last_creadr;
>     };
>
> On write to the virtual `CWRITER` the cwriter field is updated and if
> that results in there being new outstanding requests then the vits_cq
> is enqueued onto pITS' schedule_list (unless it is already there).
>
> On read from the virtual `CREADR` iff the vits_cq is such that
> commands are outstanding then a scheduling pass is attempted (in order
> to update `vits_cq.creadr`). The current value of `vits_cq.creadr` is
> then returned.
>
> ### pITS Scheduling
>
> A pITS scheduling pass is attempted:
>
> * On write to any virtual `CWRITER` iff that write results in there
>   being new outstanding requests for that vits;
> * On read from a virtual `CREADR` iff there are commands outstanding
>   on that vits;
> * On receipt of an interrupt notification arising from Xen's own use
>   of `INT`; (see discussion under Completion)

    If the INT notification method is used, then I don't think there is a
need for pITS scheduling on CREADR read.

As we discussed in patch #13, the below steps should suffice to virtualize
the command queue.

1) On each guest CWRITER update, read a batch ('m' commands) of commands,
    translate them and put them on the pITS schedule list. If there are
    more than 'm' commands, create several schedule list entries of 'm'
    commands each. Append an INT command to each schedule list entry.
     1a) If there is no ongoing command from this vITS on the physical
           queue, send to the physical queue.
     1b) If there is an ongoing command, return to the guest.
2) On receiving the completion interrupt, update the guest's CREADR and
    post the next entry from the schedule list to the physical queue.

With this,
   - There will be no overhead of translating commands in interrupt
     context, which is quite heavy because translating an ITS command
     requires validating and updating internal ITS structures.
   - Only one request from a guest will ever be posted to the physical
     queue at a time.
   - Even if a guest floods with a large number of commands, all the
     commands will be translated and queued on the schedule list and
     posted batch by batch.
   - The scheduling pass is called only on CWRITER writes & completion INTs.
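
A rough sketch of steps 1/1a/1b above (the batch size and all helper
names are invented for illustration):

    #define VITS_BATCH 8 /* 'm' above */

    /* Sketch of the CWRITER trap handler: translate in guest context,
     * queue pre-translated batches, keep at most one batch in flight. */
    static int vits_handle_cwriter(struct vits_cq *vcq, uint32_t cwriter)
    {
        vcq->cwriter = cwriter;

        while ( vits_commands_pending(vcq) )
        {
            struct vits_batch *b = vits_translate_batch(vcq, VITS_BATCH);

            if ( b == NULL )
                return -ENOMEM;
            vits_batch_append_int(b); /* completion INT per entry */
            list_add_tail(&b->entry, &vcq->batches);
        }

        if ( !vcq->in_flight )    /* 1a: post if nothing is in flight */
            vits_post_next_batch(vcq);
        return 0;                 /* 1b: otherwise just return to guest */
    }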

> * On any interrupt injection arising from a guest's use of the `INT`
>   command; (XXX perhaps, see discussion under Completion)
>
> Each scheduling pass will:
>
> * Read the physical `CREADR`;
> * For each command between `pits.last_creadr` and the new `CREADR`
>   value process completion of that command and update the
>   corresponding `vits_cq.creadr`.
> * Attempt to refill the pITS Command Queue (see below).
>
> ### Filling the pITS Command Queue.
>
> Various algorithms could be used here. For now a simple proposal is
> to traverse the `pits.schedule_list` starting from where the last
> refill finished (i.e. not from the top of the list each time).
>
> If a `vits_cq` has no pending commands then it is removed from the
> list.
>
> If a `vits_cq` has some pending commands then `min(pits-free-slots,
> vits-outstanding, VITS_BATCH_SIZE)` will be taken from the vITS
> command queue, translated and placed onto the pITS
> queue. `vits_cq.progress` will be updated to reflect this.
>
> Each `vits_cq` is handled in turn in this way until the pITS Command
> Queue is full or there are no more outstanding commands.
>
> There will likely need to be a data structure which shadows the pITS
> Command Queue slots with references to the `vits_cq` which has a
> command currently occupying that slot and the corresponding index into
> the virtual command queue, for use when completing a command.
>
> `VITS_BATCH_SIZE` should be small, TBD say 4 or 8.
>
> Possible simplification: If we arrange that no guest ever has multiple
> batches in flight (which can occur if we wrap around the list several
> times) then we may be able to simplify the book keeping
> required. However this may need some careful thought wrt fairness for
> guests submitting frequent small batches of commands vs those sending
> large batches.

  If one LPI of the dummy device is assigned to each VM, then the book
keeping per vITS becomes simple.

>
> ### Completion
>
> It is expected that commands will normally be completed (resulting in
> an update of the corresponding `vits_cq.creadr`) via guest read from
> `CREADR`. This will trigger a scheduling pass which will ensure the
> `vits_cq.creadr` value is up to date before it is returned.
>
    If the guest is reading CREADR to know the completion of commands,
there is no need for a scheduling pass if INT is used.

> A guest which does completion via the use of `INT` cannot observe
> `CREADR` without reading it, so updating on read from `CREADR`
> suffices from the point of view of the guest's observation of the
> state. (Of course we will inject the interrupt at the designated point
> and the guest may well then read `CREADR`)

   Append Xen's completion INT before the guest's INT command, which
will update CREADR correctly before the guest receives the INT.

>
> However in order to keep the pITS Command Queue moving along we need
> to consider what happens if there are no `INT` based events nor reads
> from `CREADR` to drive completion and therefore refilling of the Queue
> with other outstanding commands.
>
> A guest which enqueues some commands and then never checks for
> completion cannot itself block things because any other guest which
> reads `CREADR` will drive completion. However if _no_ guest reads from
> `CREADR` then completion will not occur and this must be dealt with.
>
   Do you mean a guest's read of CREADR should check all the vITSs of
other guests to post pending commands?

> Even if we include completion on `INT`-based interrupt injection then
> it is possible that the pITS queue may not contain any such
> interrupts, either because no guest is using them or because the
> batching means that none of them are enqueued on the active ring at
> the moment.
>
> So we need a fallback to ensure that queue keeps moving. There are
> several options:
>
> * A periodic timer in Xen which runs whenever there are outstanding
>   commands in the pITS. This is simple but pretty sucky.
> * Xen injects its own `INT` commands into the pITS ring. This requires
>   figuring out a device ID to use.
>
> The second option is likely to be preferable if the issue of selecting
> a device ID can be addressed.
>
> A secondary question is when these `INT` commands should be inserted
> into the command stream:
>
> * After each batch taken from a single `vits_cq`;
> * After each scheduling pass;
> * One active in the command stream at any given time;
>
> The latter should be sufficient: by arranging to insert an `INT` into
> the stream at the end of any scheduling pass which occurs while there
> is no currently outstanding `INT`, we have a sufficient backstop to
> allow us to refill the ring.
>
> This assumes that there is no particular benefit to keeping the
> `CWRITER` rolling ahead of the pITS's actual processing. This is true
> because the IRS operates on commands in the order they appear in the
> queue, so there is no need to maintain a runway ahead of the ITS
> processing. (XXX If this is a concern perhaps the INT could be
> inserted at the head of the final batch of commands in a scheduling
> pass instead of the tail).
>
> Xen itself should never need to issue an associated `SYNC` command,
> since the individual guests would need to issue those themselves when
> they care. The `INT` only serves to allow Xen to enqueue new commands
> when there is space on the ring; it has no interest itself in the
> actual completion.
>
> ### Locking
>
> It may be preferable to use `atomic_t` types for various fields
> (e.g. `vits_cq.creadr`) in order to reduce the amount and scope of
> locking required.
>
> ### Multiple vITS instances in a single guest
>
> As described above each vITS maps to exactly one pITS (while each pITS
> serves multiple vITSs).
>

  IMO, one vITS per domain should be OK. For each command, based
on the device ID, the vITS will query the PCI framework to know the
physical ITS to which this device is attached, and the command will be
sent to that particular pITS.

There are some exceptions like SYNC and INVALL which do not have a
device id. In this case these commands are sent to all pITSs in the
platform. (XXX: If a command is sent to all pITSs, how do we identify
when the command has been processed on all of them?)
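
One possible shape for that cross-pITS completion tracking (a sketch;
nr_pits, pits_enqueue() and the callback wiring are invented, while
atomic_set()/atomic_dec_and_test() are the usual Xen primitives):

    /* Sketch: retire a broadcast command only once every pITS has
     * consumed its copy of it. */
    static void vits_broadcast_cmd(struct vits *v, const void *cmd)
    {
        unsigned int i;

        atomic_set(&v->broadcast_pending, nr_pits);
        for ( i = 0; i < nr_pits; i++ )
            pits_enqueue(&pits[i], cmd); /* completion calls the below */
    }

    static void vits_broadcast_done(struct vits *v)
    {
        if ( atomic_dec_and_test(&v->broadcast_pending) )
            vits_advance_creadr(v); /* safe to advance past the command */
    }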

> In principle it might be possible to arrange that a vITS can enqueue
> commands to different pITSs depending on e.g. the device id. However
> this brings significant additional complexity (what to do with SYNC
> commands, how to order completion such that one pITS does not block
> another, book keeping etc).
>
> In addition the introduction of direct interrupt injection in version
> 4 GICs may imply a vITS per pITS. (XXX???)
>
> Therefore it is proposed that the restriction that a single vITS maps
> to one pITS be retained. If a guest requires access to devices
> associated with multiple pITSs then multiple vITS should be
> configured.
>
> ### vITS for purely software interrupts (e.g. event channels)
>
> It has been proposed that it might be nice to inject event channels as
> LPIs in the future. Whether or not that would involve any sort of vITS
> is unclear, but if it did then it would likely be a separate emulation
> to the vITS emulation used with a pITS and as such is not considered
> further here.
>
> # Glossary
>
> * _MSI_: Message Signalled Interrupt
> * _ITS_: Interrupt Translation Service
> * _GIC_: Generic Interrupt Controller
> * _LPI_: Locality-specific Peripheral Interrupt
>
> # References
>
> "GIC Architecture Specification" PRD03-GENC-010745 24.0
>
>


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-13 14:26       ` Julien Grall
@ 2015-05-15 10:59         ` Ian Campbell
  2015-05-15 11:26           ` Vijay Kilari
  2015-05-15 12:19           ` Julien Grall
  0 siblings, 2 replies; 77+ messages in thread
From: Ian Campbell @ 2015-05-15 10:59 UTC (permalink / raw)
  To: Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Wed, 2015-05-13 at 15:26 +0100, Julien Grall wrote:
> >>>   on that vits;
> >>> * On receipt of an interrupt notification arising from Xen's own use
> >>>   of `INT`; (see discussion under Completion)
> >>> * On any interrupt injection arising from a guest's use of the `INT`
> >>>   command; (XXX perhaps, see discussion under Completion)
> >>
> >> With all the solutions suggested, it is very likely that we will try
> >> to execute multiple scheduling passes at the same time.
> >>
> >> One way is to wait until the previous pass has finished. But that would
> >> mean that the scheduler would be executed very often.
> >>
> >> Or maybe you plan to offload the scheduler to a softirq?
> > 
> > Good point.
> > 
> > A soft irq might be one solution, but it is problematic during emulation
> > of `CREADR`, when we would like to do a pass immediately to complete any
> > operations outstanding for the domain doing the read.
> > 
> > Or just using spin_try_lock and not bothering if one is already in
> > progress might be another. But that has similar problems.
> > 
> > Or we could defer only scheduling from `INT` (either guest or Xen's own)
> > to a softirq but do ones from `CREADR` emulation synchronously? The
> > softirq would be run on return from the interrupt handler but multiple
> > such would be coalesced I think?
> 
> I think we could defer the scheduling to a softirq for CREADR too, if
> the guest is using:
> 	- INT completion: vits.creadr would have been correctly updated when
> receiving the INT in Xen.
> 	- polling completion: the guest will loop on CREADR. It will likely get
> the info on the next read. The drawback is the guest may lose a few
> instruction cycles.
> 
> Overall, I don't think it's necessary to have an accurate CREADR.

Yes, deferring the update by one exit+enter might be tolerable. I added
after this list:
        This may result in lots of contention on the scheduler
        locking. Therefore we consider that in each case all which happens is
        triggering of a softirq which will be processed on return to guest,
        and just once even for multiple events. This is considered OK for the
        `CREADR` case because at worst the value read will be one cycle out of
        date.
        


> 
> [..]
> 
> >> AFAIU the process suggested, Xen will inject small batches as long as
> >> the physical command queue is not full.
> > 
> >> Let's take a simple case, only a single domain is using vITS on the
> >> platform. If it injects a huge number of commands, Xen will split it
> >> into lots of small batches. All batches will be injected in the same
> >> pass as long as they fit in the physical command queue. Am I correct?
> > 
> > That's how it is currently written, yes. With the "possible
> > simplification" above the answer is no, only a batch at a time would be
> > written for each guest.
> > 
> > BTW, it doesn't have to be a single guest, the sum total of the
> > injections across all guests could also take a similar amount of time.
> > Is that a concern?
> 
> Yes, the example with only one guest was easier to explain.

So as well as limiting the number of commands in each domain's batch we
also want to limit the total number of batches?

> >> I think we have to restrict the total number of batches (i.e. for all
> >> the domains) injected in the same scheduling pass.
> >>
> >> I would even tend to allow only one in-flight batch per domain. That
> >> would limit the possible problem I pointed out.
> > 
> > This is the "possible simplification" I think. Since it simplifies other
> > things (I think) as well as addressing this issue I think it might be a
> > good idea.
> 
> With the limitation on commands sent per batch, would the fairness you
> were talking about in the design doc still be required?

I think we still want to schedule the guests in a strict round-robin
manner, to avoid one guest monopolising things.
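
To make that concrete, a sketch of a round-robin refill pass (the
resume cursor and all helpers are invented):

    /* Sketch: refill in strict round-robin order, resuming from where
     * the previous pass stopped rather than from the head of the list. */
    static void pits_refill(struct pits *p)
    {
        struct vits_cq *vcq = p->resume ?: first_vits(p);
        struct vits_cq *start = vcq;

        do {
            if ( pits_queue_full(p) )
                break;
            if ( vits_commands_pending(vcq) )
                post_one_batch(p, vcq); /* at most one batch per guest */
            vcq = next_vits(p, vcq);
        } while ( vcq != start );

        p->resume = vcq;
    }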

> >>> Therefore it is proposed that the restriction that a single vITS maps
> >>> to one pITS be retained. If a guest requires access to devices
> >>> associated with multiple pITSs then multiple vITS should be
> >>> configured.
> >>
> >> Having multiple vITS per domain brings other issues:
> >> 	- How do you know the number of ITS to describe in the device tree at boot?
> > 
> > I'm not sure. I don't think 1 vs N is very different from the question
> > of 0 vs 1 though, somehow the tools need to know about the pITS setup.
> 
> I don't see why the tools would need to know the pITS setup.

Even with only a single vITS the tools need to know if the system has 0,
1, or more pITSs, to know whether to create a vITS at all or not.

> >> 	- How do you tell the guest that the PCI device is mapped to a
> >> specific vITS?
> > 
> > Device Tree or IORT, just like on native and just like we'd have to tell
> > the guest about that mapping even if there was a single vITS.
> 
> Right, although the root controller can only be attached to one ITS.
> 
> It will be necessary to have multiple root controllers in the guest in
> the case where we passthrough devices using different ITSs.
> 
> Is pci-back able to expose multiple root controllers?

In principle the xenstore protocol supports it, but AFAIK all toolstacks
have only ever used "bus" 0, so I wouldn't be surprised if there were
bugs lurking.

But we could fix those, I don't think it is a requirement that this
stuff suddenly springs into life on ARM even with existing kernels.

> > I think the complexity of having one vITS target multiple pITSs is going
> > to be quite high in terms of data structures and the amount of
> > thinking/tracking scheduler code will have to do, mostly down to out of
> > order completion of things put in the pITS queue.
> 
> I understand the complexity, but exposing one vITS per pITS means that we
> are exposing the underlying hardware to the guest.

Some aspect of it, yes, but it is still a virtual ITS.

> That brings a lot of complexity to the guest layout, which is right now
> static. How do you decide the number of vITSs/root controllers exposed
> (think about PCI hotplug)?
> 
> Given that PCI passthrough doesn't allow migration, maybe we could use
> the layout of the hardware.

That's an option.

> If we are going to expose multiple vITS to the guest, we should only use
> vITS for guests using PCI passthrough. This is because migration won't be
> compatible with it.

It would be possible to support one s/w-only vITS for migration, i.e. the
evtchn thing at the end, but for the general case that is correct. On
x86 I believe that if you hot unplug all passthrough devices you can
migrate and then plug in other devices at the other end.

Anyway, more generally there are certainly problems with multiple vITS.
However there are also problems with a single vITS feeding multiple
pITSs:

      * What to do with global commands? Inject to all pITS and then
        synchronise on them all finishing.
      * Handling of out of order completion of commands queued with
        different pITS, since the vITS must appear to complete in order.
        Apart from the book keeping question it makes scheduling more
        interesting:
              * What if you have a pITS with slots available, and the
                guest command queue contains commands which could go to
                the pITS, but behind ones which are targeting another
                pITS which has no slots
              * What if one pITS is very busy and another is mostly idle
                and a guest submits one command to the busy one
                (contending with other guests) followed by a load of
                commands targeting the idle one. Those commands would be
                held up in this situation.
              * Reasoning about fairness may be harder.

I've put both your list and mine into the next revision of the document.
I think this remains an important open question.
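
To make the second problem concrete, the vITS would need per-command
completion state and could only ever retire a fully completed prefix,
something like this sketch (the shadow/done bookkeeping is invented):

    /* Sketch: physical completions may arrive out of order across
     * pITSs, but the virtual CREADR may only ever advance in order. */
    static void vits_try_advance_creadr(struct vits_cq *vcq)
    {
        while ( vcq->creadr != vcq->cwriter &&
                vcq->shadow[cmd_index(vcq, vcq->creadr)].done )
            vcq->creadr = next_cmd_offset(vcq, vcq->creadr);
    }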


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 10:59         ` Ian Campbell
@ 2015-05-15 11:26           ` Vijay Kilari
  2015-05-15 11:30             ` Ian Campbell
  2015-05-15 12:19           ` Julien Grall
  1 sibling, 1 reply; 77+ messages in thread
From: Vijay Kilari @ 2015-05-15 11:26 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Fri, May 15, 2015 at 4:29 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Wed, 2015-05-13 at 15:26 +0100, Julien Grall wrote:
>> >>>   on that vits;
>> >>> * On receipt of an interrupt notification arising from Xen's own use
>> >>>   of `INT`; (see discussion under Completion)
>> >>> * On any interrupt injection arising from a guest's use of the `INT`
>> >>>   command; (XXX perhaps, see discussion under Completion)
>> >>
>> >> With all the solutions suggested, it is very likely that we will try
>> >> to execute multiple scheduling passes at the same time.
>> >>
>> >> One way is to wait until the previous pass has finished. But that would
>> >> mean that the scheduler would be executed very often.
>> >>
>> >> Or maybe you plan to offload the scheduler to a softirq?
>> >
>> > Good point.
>> >
>> > A soft irq might be one solution, but it is problematic during emulation
>> > of `CREADR`, when we would like to do a pass immediately to complete any
>> > operations outstanding for the domain doing the read.
>> >
>> > Or just using spin_try_lock and not bothering if one is already in
>> > progress might be another. But that has similar problems.
>> >
>> > Or we could defer only scheduling from `INT` (either guest or Xen's own)
>> > to a softirq but do ones from `CREADR` emulation synchronously? The
>> > softirq would be run on return from the interrupt handler but multiple
>> > such would be coalesced I think?
>>
>> I think we could defer the scheduling to a softirq for CREADR too, if
>> the guest is using:
>>       - INT completion: vits.creadr would have been correctly updated when
>> receiving the INT in Xen.
>>       - polling completion: the guest will loop on CREADR. It will likely get
>> the info on the next read. The drawback is the guest may lose a few
>> instruction cycles.
>>
>> Overall, I don't think it's necessary to have an accurate CREADR.
>
> Yes, deferring the update by one exit+enter might be tolerable. I added
> after this list:
>         This may result in lots of contention on the scheduler
>         locking. Therefore we consider that in each case all which happens is
>         triggering of a softirq which will be processed on return to guest,
> and just once even for multiple events. This is considered OK for the
>         `CREADR` case because at worst the value read will be one cycle out of
>         date.
>
>
>
>>
>> [..]
>>
>> >> AFAIU the process suggested, Xen will inject small batches as long as
>> >> the physical command queue is not full.
>> >
>> >> Let's take a simple case, only a single domain is using vITS on the
>> >> platform. If it injects a huge number of commands, Xen will split it
>> >> into lots of small batches. All batches will be injected in the same
>> >> pass as long as they fit in the physical command queue. Am I correct?
>> >
>> > That's how it is currently written, yes. With the "possible
>> > simplification" above the answer is no, only a batch at a time would be
>> > written for each guest.
>> >
>> > BTW, it doesn't have to be a single guest, the sum total of the
>> > injections across all guests could also take a similar amount of time.
>> > Is that a concern?
>>
>> Yes, the example with only one guest was easier to explain.
>
> So as well as limiting the number of commands in each domain's batch we
> also want to limit the total number of batches?
>
>> >> I think we have to restrict the total number of batches (i.e. for all
>> >> the domains) injected in the same scheduling pass.
>> >>
>> >> I would even tend to allow only one in-flight batch per domain. That
>> >> would limit the possible problem I pointed out.
>> >
>> > This is the "possible simplification" I think. Since it simplifies other
>> > things (I think) as well as addressing this issue I think it might be a
>> > good idea.
>>
>> With the limitation on commands sent per batch, would the fairness you
>> were talking about in the design doc still be required?
>
> I think we still want to schedule the guests in a strict round-robin
> manner, to avoid one guest monopolising things.
>
>> >>> Therefore it is proposed that the restriction that a single vITS maps
>> >>> to one pITS be retained. If a guest requires access to devices
>> >>> associated with multiple pITSs then multiple vITS should be
>> >>> configured.
>> >>
>> >> Having multiple vITS per domain brings other issues:
>> >>    - How do you know the number of ITS to describe in the device tree at boot?
>> >
>> > I'm not sure. I don't think 1 vs N is very different from the question
>> > of 0 vs 1 though, somehow the tools need to know about the pITS setup.
>>
>> I don't see why the tools would need to know the pITS setup.
>
> Even with only a single vITS the tools need to know if the system has 0,
> 1, or more pITSs, to know whether to create a vITS at all or not.
>
>> >>    - How do you tell the guest that the PCI device is mapped to a
>> >> specific vITS?
>> >
>> > Device Tree or IORT, just like on native and just like we'd have to tell
>> > the guest about that mapping even if there was a single vITS.
>>
>> Right, although the root controller can only be attached to one ITS.
>>
>> It will be necessary to have multiple root controllers in the guest in
>> the case where we passthrough devices using different ITSs.
>>
>> Is pci-back able to expose multiple root controllers?
>
> In principle the xenstore protocol supports it, but AFAIK all toolstacks
> have only ever used "bus" 0, so I wouldn't be surprised if there were
> bugs lurking.
>
> But we could fix those, I don't think it is a requirement that this
> stuff suddenly springs into life on ARM even with existing kernels.
>
>> > I think the complexity of having one vITS target multiple pITSs is going
>> > to be quite high in terms of data structures and the amount of
>> > thinking/tracking scheduler code will have to do, mostly down to out of
>> > order completion of things put in the pITS queue.
>>
>> I understand the complexity, but exposing one vITS per pITS means that we
>> are exposing the underlying hardware to the guest.
>
> Some aspect of it, yes, but it is still a virtual ITS.
>
>> That brings a lot of complexity to the guest layout, which is right now
>> static. How do you decide the number of vITSs/root controllers exposed
>> (think about PCI hotplug)?
>>
>> Given that PCI passthrough doesn't allow migration, maybe we could use
>> the layout of the hardware.
>
> That's an option.
>
>> If we are going to expose multiple vITS to the guest, we should only use
>> vITS for guests using PCI passthrough. This is because migration won't be
>> compatible with it.
>
> It would be possible to support one s/w-only vITS for migration, i.e. the
> evtchn thing at the end, but for the general case that is correct. On
> x86 I believe that if you hot unplug all passthrough devices you can
> migrate and then plug in other devices at the other end.
>
> Anyway, more generally there are certainly problems with multiple vITS.
> However there are also problems with a single vITS feeding multiple
> pITSs:
>
>       * What to do with global commands? Inject to all pITS and then
>         synchronise on them all finishing.
>       * Handling of out of order completion of commands queued with
>         different pITS, since the vITS must appear to complete in order.
>         Apart from the book keeping question it makes scheduling more
>         interesting:
>               * What if you have a pITS with slots available, and the
>                 guest command queue contains commands which could go to
>                 the pITS, but behind ones which are targeting another
>                 pITS which has no slots
>               * What if one pITS is very busy and another is mostly idle
>                 and a guest submits one command to the busy one
>                 (contending with other guests) followed by a load of
>                 commands targeting the idle one. Those commands would be
>                 held up in this situation.
>               * Reasoning about fairness may be harder.
>
> I've put both your list and mine into the next revision of the document.
> I think this remains an important open question.
>

Handling of a single vITS and multiple pITSs can be made simple.

All ITS commands except SYNC & INVALL have a device id which will
help us to know to which pITS they should be sent.

SYNC & INVALL can be dropped by Xen on guest request and Xen can
append SYNC & INVALL wherever they are required.
(Ex: the Linux driver adds SYNC for required commands.)
With this assumption, all ITS commands are mapped to a pITS
and there is no need for synchronization across pITSs.
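
In code that routing might look something like this (a sketch; the
helper names are invented):

    /* Sketch: pick the pITS for a translated command by device id;
     * guest SYNC/INVALL are dropped and re-inserted by Xen as needed. */
    static struct pits *vits_route_cmd(const its_cmd_t *cmd)
    {
        switch ( its_cmd_opcode(cmd) )
        {
        case ITS_CMD_SYNC:
        case ITS_CMD_INVALL:
            return NULL; /* dropped: Xen appends its own where required */
        default:
            return devid_to_pits(its_cmd_devid(cmd));
        }
    }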


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-13 16:27   ` Vijay Kilari
@ 2015-05-15 11:28     ` Ian Campbell
  2015-05-15 12:38       ` Vijay Kilari
  0 siblings, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-15 11:28 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Stefano Stabellini

On Wed, 2015-05-13 at 21:57 +0530, Vijay Kilari wrote:
> > * On receipt of an interrupt notification arising from Xen's own use
> >   of `INT`; (see discussion under Completion)
> 
>     If the INT notification method is used, then I don't think there is a
> need for pITS scheduling on CREADR read.
> 
> As we discussed in patch #13, the below steps should suffice to virtualize
> the command queue.
> 
> 1) On each guest CWRITER update, read a batch ('m' commands) of commands,
>     translate them and put them on the pITS schedule list. If there are
>     more than 'm' commands, create several schedule list entries of 'm'
>     commands each. Append an INT command to each schedule list entry.

How many INT commands do you mean here?

>      1a) If there is no ongoing command from this vITS on the physical
>            queue, send to the physical queue.
>      1b) If there is an ongoing command, return to the guest.
> 2) On receiving the completion interrupt, update the guest's CREADR and
>     post the next entry from the schedule list to the physical queue.
> 
> With this,
>    - There will be no overhead of translating commands in interrupt
> context, which is quite heavy because translating an ITS command
> requires validating and updating internal ITS structures.

Can you give some examples of the heaviest translations please, so I can
get a feel for how expensive we are actually talking here.

>    - Only one request from a guest will ever be posted to the physical
>      queue at a time.
>    - Even if a guest floods with a large number of commands, all the
>      commands will be translated and queued on the schedule list and
>      posted batch by batch.
>    - The scheduling pass is called only on CWRITER writes & completion INTs.

I think the main difference in what you propose here is that commands
are queued in pre-translated form to be injected (cheaply) during
scheduling as opposed to being left on the guest queue and translated
directly into the pits queue.

I think `INT` vs `CREADR` scheduling is largely orthogonal to that.

Julien proposed moving scheduling to a softirq, which gets it out of IRQ
context (good) but doesn't necessarily account the translation to the
guest, which is a benefit of your approach. (I think things which happen
in a softirq are implicitly accounted to current, whoever that may be.)

On the downside pretranslation adds memory overhead and reintroduces the
issue of a potentially long synchronous translation during `CWRITER`
handling.

We could pretranslate a batch of commands into a s/w queue rather than
into the pits queue, but then we are back to where do we refill that
queue from.

The first draft wasn't particularly clear on when translation occurs
(although I intended it to be during scheduling). I shall add some
treatment of that to the next draft.
> 
> > * On any interrupt injection arising from a guests use of the `INT`
> >   command; (XXX perhaps, see discussion under Completion)
> >
> > Each scheduling pass will:
> >
> > * Read the physical `CREADR`;
> > * For each command between `pits.last_creadr` and the new `CREADR`
> >   value process completion of that command and update the
> >   corresponding `vits_cq.creadr`.
> > * Attempt to refill the pITS Command Queue (see below).
> >
> > ### Filling the pITS Command Queue.
> >
> > Various algorithms could be used here. For now a simple proposal is
> > to traverse the `pits.schedule_list` starting from where the last
> > refill finished (i.e not from the top of the list each time).
> >
> > If a `vits_cq` has no pending commands then it is removed from the
> > list.
> >
> > If a `vits_cq` has some pending commands then `min(pits-free-slots,
> > vits-outstanding, VITS_BATCH_SIZE)` will be taken from the vITS
> > command queue, translated and placed onto the pITS
> > queue. `vits_cq.progress` will be updated to reflect this.
> >
> > Each `vits_cq` is handled in turn in this way until the pITS Command
> > Queue is full or there are no more outstanding commands.
> >
> > There will likely need to be a data structure which shadows the pITS
> > Command Queue slots with references to the `vits_cq` which has a
> > command currently occupying that slot and the corresponding index into
> > the virtual command queue, for use when completing a command.
> >
> > `VITS_BATCH_SIZE` should be small, TBD say 4 or 8.
> >
> > Possible simplification: If we arrange that no guest ever has multiple
> > batches in flight (which can occur if we wrap around the list several
> > times) then we may be able to simplify the book keeping
> > required. However this may need some careful thought wrt fairness for
> > guests submitting frequent small batches of commands vs those sending
> > large batches.
> 
>   If one LPI of the dummy device is assigned to each VM, then book keeping
> per vITS becomes simple

What dummy device do you mean? What simplifications does it imply?

> 
> >
> > ### Completion
> >
> > It is expected that commands will normally be completed (resulting in
> > an update of the corresponding `vits_cq.creadr`) via guest read from
> > `CREADR`. This will trigger a scheduling pass which will ensure the
> > `vits_cq.creadr` value is up to date before it is returned.
> >
>     If the guest reads CREADR to know completion of a command, there is no
> need of a scheduling pass if INT is used.

We cannot know apriori which scheme a guest is going to use, nor do we
have the freedom to mandate a particular scheme, or even that the guest
uses the same scheme for every batch of commands.

So we need to design a system which works whether all guests use only
INT or all guests using only CREADR polling or anything in between.

A scheduling pass is not needed on INT injection (either Xen's or the
guests) in order to update `CREADR` (as you suggest), however it may be
necessary in order to keep the pITS command queue moving by scheduling
any outstanding commands. Consider the case of a guest which receives an
INT but does not subsequently read `CREADR` (at all or in a timely
manner).

> > A guest which does completion via the use of `INT` cannot observe
> > `CREADR` without reading it, so updating on read from `CREADR`
> > suffices from the point of view of the guest's observation of the
> > state. (Of course we will inject the interrupt at the designated point
> > and the guest may well then read `CREADR`)
> 
>    Append a Xen completion INT before the guest INT command, which
> will update CREADR correctly before the guest receives its INT

That means two interrupts. And there is no need because even with the
guest's own completion INT it won't see things until it reads CREADR
itself.

> > However in order to keep the pITS Command Queue moving along we need
> > to consider what happens if there are no `INT` based events nor reads
> > from `CREADR` to drive completion and therefore refilling of the Queue
> > with other outstanding commands.
> >
> > A guest which enqueues some commands and then never checks for
> > completion cannot itself block things because any other guest which
> > reads `CREADR` will drive completion. However if _no_ guest reads from
> > `CREADR` then completion will not occur and this must be dealt with.
> >
>    Do you mean a guest read of CREADR should check all the vITSs of other
> guests to post pending commands?

In the proposal `CREADR` kicks off a scheduling pass, which is
independent of any particular vITS and operates only on the list of
scheduled vits, decoupling the vits from the pits scheduling.

> 
> > Even if we include completion on `INT`-base interrupt injection then
> > it is possible that the pITS queue may not contain any such
> > interrupts, either because no guest is using them or because the
> > batching means that none of them are enqueued on the active ring at
> > the moment.
> >
> > So we need a fallback to ensure that queue keeps moving. There are
> > several options:
> >
> > * A periodic timer in Xen which runs whenever there are outstanding
> >   commands in the pITS. This is simple but pretty sucky.
> > * Xen injects its own `INT` commands into the pITS ring. This requires
> >   figuring out a device ID to use.
> >
> > The second option is likely to be preferable if the issue of selecting
> > a device ID can be addressed.
> >
> > A secondary question is when these `INT` commands should be inserted
> > into the command stream:
> >
> > * After each batch taken from a single `vits_cq`;
> > * After each scheduling pass;
> > * One active in the command stream at any given time;
> >
> > The latter should be sufficient: by arranging to insert an `INT` into
> > the stream at the end of any scheduling pass which occurs while there
> > is not a currently outstanding `INT`, we have a sufficient backstop to
> > allow us to refill the ring.
> >
> > This assumes that there is no particular benefit to keeping the
> > `CWRITER` rolling ahead of the pITS's actual processing. This is true
> > because the ITS operates on commands in the order they appear in the
> > queue, so there is no need to maintain a runway ahead of the ITS
> > processing. (XXX If this is a concern perhaps the INT could be
> > inserted at the head of the final batch of commands in a scheduling
> > pass instead of the tail).
> >
> > Xen itself should never need to issue an associated `SYNC` command,
> > since the individual guests would need to issue those themselves when
> > they care. The `INT` only serves to allow Xen to enqueue new commands
> > when there is space on the ring, it has no interest itself in the
> > actual completion.
> >
> > ### Locking
> >
> > It may be preferable to use `atomic_t` types for various fields
> > (e.g. `vits_cq.creadr`) in order to reduce the amount and scope of
> > locking required.
> >
> > ### Multiple vITS instances in a single guest
> >
> > As described above each vITS maps to exactly one pITS (while each pITS
> > serves multiple vITSs).
> >
> 
>   IMO, one vITS per domain should be OK. For each command, based
> on the device ID, the vITS will query the PCI framework to know the
> physical ITS to which this device is attached, and the command will be
> sent to that particular pITS.
> 
> There are some exceptions, like SYNC and INVALL, which do not have a
> device id. In this case these commands are sent to all pITSs in the platform.
> (XXX: If a command is sent to all pITSs, how to identify whether the command
> has been processed on all of them?)

That's one potential issue. I mentioned a couple of others in my reply
to Julien just now.

Draft B will have more discussion of these cases, but so far no firm
solution I think.

Ian.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 11:26           ` Vijay Kilari
@ 2015-05-15 11:30             ` Ian Campbell
  2015-05-15 12:03               ` Julien Grall
  0 siblings, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-15 11:30 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Fri, 2015-05-15 at 16:56 +0530, Vijay Kilari wrote:
> On Fri, May 15, 2015 at 4:29 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> > On Wed, 2015-05-13 at 15:26 +0100, Julien Grall wrote:
> >> >>>   on that vits;
> >> >>> * On receipt of an interrupt notification arising from Xen's own use
> >> >>>   of `INT`; (see discussion under Completion)
> >> >>> * On any interrupt injection arising from a guests use of the `INT`
> >> >>>   command; (XXX perhaps, see discussion under Completion)
> >> >>
>> >> With all the solutions suggested, it is very likely that we will try
>> >> to execute multiple scheduling passes at the same time.
> >> >>
>> >> One way is to wait until the previous pass has finished. But that would
> >> >> mean that the scheduler would be executed very often.
> >> >>
> >> >> Or maybe you plan to offload the scheduler in a softirq?
> >> >
> >> > Good point.
> >> >
> >> > A soft irq might be one solution, but it is problematic during emulation
> >> > of `CREADR`, when we would like to do a pass immediately to complete any
> >> > operations outstanding for the domain doing the read.
> >> >
> >> > Or just using spin_try_lock and not bothering if one is already in
> >> > progress might be another. But has similar problems.
> >> >
> >> > Or we could defer only scheduling from `INT` (either guest or Xen's own)
> >> > to a softirq but do ones from `CREADR` emulation synchronously? The
> >> > softirq would be run on return from the interrupt handler but multiple
> >> > such would be coalesced I think?
> >>
> >> I think we could defer the scheduling to a softirq for CREADR too, if
> >> the guest is using:
>>       - INT completion: vits.creadr would have been correctly updated when
>> receiving the INT in xen.
>>       - polling completion: the guest will loop on CREADR. It will likely get
>> the info on the next read. The drawback is the guest may lose a few
>> instruction cycles.
> >>
> >> Overall, I don't think it's necessary to have an accurate CREADR.
> >
> > Yes, deferring the update by one exit+enter might be tolerable. I added
> > after this list:
> >         This may result in lots of contention on the scheduler
> >         locking. Therefore we consider that in each case all which happens is
> >         triggering of a softirq which will be processed on return to guest,
> >         and just once even for multiple events. This is considered OK for the
> >         `CREADR` case because at worst the value read will be one cycle out of
> >         date.
> >
> >
> >
> >>
> >> [..]
> >>
>> >> AFAIU the process suggested, Xen will inject small batches as long as the
> >> >> physical command queue is not full.
> >> >
> >> >> Let's take a simple case, only a single domain is using vITS on the
> >> >> platform. If it injects a huge number of commands, Xen will split it
> >> >> with lots of small batch. All batch will be injected in the same pass as
> >> >> long as it fits in the physical command queue. Am I correct?
> >> >
> >> > That's how it is currently written, yes. With the "possible
> >> > simplification" above the answer is no, only a batch at a time would be
> >> > written for each guest.
> >> >
> >> > BTW, it doesn't have to be a single guest, the sum total of the
> >> > injections across all guests could also take a similar amount of time.
> >> > Is that a concern?
> >>
> >> Yes, the example with only a guest was easier to explain.
> >
> > So as well as limiting the number of commands in each domain's batch we
> > also want to limit the total number of batches?
> >
> >> >> I think we have to restrict total number of batch (i.e for all the
> >> >> domain) injected in a same scheduling pass.
> >> >>
> >> >> I would even tend to allow only one in flight batch per domain. That
> >> >> would limit the possible problem I pointed out.
> >> >
> >> > This is the "possible simplification" I think. Since it simplifies other
> >> > things (I think) as well as addressing this issue I think it might be a
> >> > good idea.
> >>
>> With the limitation on commands sent per batch, would the fairness you
>> were talking about in the design doc still be required?
> >
> I think we still want to schedule the guests in a strict round robin
> > manner, to avoid one guest monopolising things.
> >
> >> >>> Therefore it is proposed that the restriction that a single vITS maps
> >> >>> to one pITS be retained. If a guest requires access to devices
> >> >>> associated with multiple pITSs then multiple vITS should be
> >> >>> configured.
> >> >>
> >> >> Having multiple vITS per domain brings other issues:
> >> >>    - How do you know the number of ITS to describe in the device tree at boot?
> >> >
> >> > I'm not sure. I don't think 1 vs N is very different from the question
> >> > of 0 vs 1 though, somehow the tools need to know about the pITS setup.
> >>
> >> I don't see why the tools would require to know the pITS setup.
> >
> > Even with only a single vits the tools need to know if the system has 0,
> > 1, or more pits, to know whether to create a vits at all or not.
> >
> >> >>    - How do you tell to the guest that the PCI device is mapped to a
> >> >> specific vITS?
> >> >
> >> > Device Tree or IORT, just like on native and just like we'd have to tell
> >> > the guest about that mapping even if there was a single vITS.
> >>
> >> Right, although the root controller can only be attached to one ITS.
> >>
>> It will be necessary to have multiple root controllers in the guest in
>> the case where we passthrough devices using different ITSs.
> >>
>> Is pci-back able to expose multiple root controllers?
> >
> > In principle the xenstore protocol supports it, but AFAIK all toolstacks
> > have only ever used "bus" 0, so I wouldn't be surprised if there were
> > bugs lurking.
> >
> > But we could fix those, I don't think it is a requirement that this
> > stuff suddenly springs into life on ARM even with existing kernels.
> >
> >> > I think the complexity of having one vITS target multiple pITSs is going
> >> > to be quite high in terms of data structures and the amount of
> >> > thinking/tracking scheduler code will have to do, mostly down to out of
> >> > order completion of things put in the pITS queue.
> >>
> >> I understand the complexity, but exposing on vITS per pITS means that we
> >> are exposing the underlying hardware to the guest.
> >
> > Some aspect of it, yes, but it is still a virtual ITS.
> >
>> That brings a lot of complexity to the guest layout, which is right now
> >> static. How do you decide the number of vITS/root controller exposed
> >> (think about PCI hotplug)?
> >>
> >> Given that PCI passthrough doesn't allow migration, maybe we could use
> >> the layout of the hardware.
> >
> > That's an option.
> >
> >> If we are going to expose multiple vITS to the guest, we should only use
>> vITS for guests using PCI passthrough. This is because migration won't be
> >> compatible with it.
> >
> > It would be possible to support one s/w only vits for migration, i.e. the
> > evtchn thing at the end, but for the general case that is correct. On
> > x86 I believe that if you hot unplug all passthrough devices you can
> > migrate and then plug in other devices at the other end.
> >
> > Anyway, more generally there are certainly problems with multiple vITS.
> > However there are also problems with a single vITS feeding multiple
> > pITSs:
> >
> >       * What to do with global commands? Inject to all pITS and then
> >         synchronise on them all finishing.
> >       * Handling of out of order completion of commands queued with
> >         different pITS, since the vITS must appear to complete in order.
> >         Apart from the book keeping question it makes scheduling more
> >         interesting:
> >               * What if you have a pITS with slots available, and the
> >                 guest command queue contains commands which could go to
> >                 the pITS, but behind ones which are targeting another
> >                 pITS which has no slots
> >               * What if one pITS is very busy and another is mostly idle
> >                 and a guest submits one command to the busy one
> >                 (contending with other guest) followed by a load of
> >                 commands targeting the idle one. Those commands would be
> >                 held up in this situation.
> >               * Reasoning about fairness may be harder.
> >
> > I've put both your list and mine into the next revision of the document.
> > I think this remains an important open question.
> >
> 
> Handling of a single vITS and multiple pITSs can be made simple.
>
> All ITS commands except SYNC & INVALL have a device id which will
> tell us to which pITS each should be sent.
>
> SYNC & INVALL sent by the guest can be dropped by Xen,
> and Xen can append SYNC & INVALL wherever they are required.
> (Ex: the Linux driver adds SYNC for the commands which require it.)
> With this assumption, all ITS commands map to exactly one pITS
> and there is no need of synchronization across pITSs.

You've ignored the second bullet and its three sub-bullets, I think.

Ian.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-12 15:02 ` Ian Campbell
  2015-05-12 17:35   ` Julien Grall
  2015-05-13 16:27   ` Vijay Kilari
@ 2015-05-15 11:45   ` Ian Campbell
  2015-05-15 14:55     ` Julien Grall
  2 siblings, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-15 11:45 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Stefano Stabellini

On Tue, 2015-05-12 at 16:02 +0100, Ian Campbell wrote:
> I've written up my thinking as a design doc below (it's pandoc and the
> pdf version is also at
> http://xenbits.xen.org/people/ianc/vits/draftA.pdf FWIW).

Here is a second draft based on the feedback so far. Also at
http://xenbits.xen.org/people/ianc/vits/draftB.{pdf,html}.

So far I think we are mostly at the stage of gathering open questions and
enumerating the issues rather than actually beginning to reach any
conclusions. That's OK (and part of the purpose).

Ian.
-----

% Xen on ARM vITS Handling
% Ian Campbell <ian.campbell@citrix.com>
% Draft B

# Changelog

## Since Draft A

* Added discussion of when/where command translation occurs.
* Contention on scheduler lock, suggestion to use SOFTIRQ.
* Handling of domain shutdown.
* More detailed discussion of multiple vs single vits pros/cons.

# Introduction

ARM systems containing a GIC version 3 or later may contain one or
more ITS logical blocks. An ITS is used to route Message Signalled
interrupts from devices into an LPI injection on the processor.

The following summarises the ITS hardware design and serves as a set
of assumptions for the vITS software design. (XXX it is entirely
possible I've horribly misunderstood how this stuff fits
together). For full details of the ITS see the "GIC Architecture
Specification".

Message signalled interrupts are translated into an LPI via a
translation table which must be configured for each device which can
generate an MSI. The ITS uses the device id of the originating device
to look up the corresponding translation table. Device IDs are
typically described via system firmware, e.g. the ACPI IORT table or
via device tree.

The ITS is configured and managed, including establishing a
Translation Table for each device, via an in memory ring shared
between the CPU and the ITS controller. The ring is managed via the
`GITS_CBASER` register and indexed by `GITS_CWRITER` and `GITS_CREADR`
registers.

A processor adds commands to the shared ring and then updates
`GITS_CWRITER` to make them visible to the ITS controller.

The ITS controller processes commands from the ring and then updates
`GITS_CREADR` to indicate to the processor that the command has been
processed.

Commands are processed sequentially.

Commands sent on the ring include operational commands:

* Routing interrupts to processors;
* Generating interrupts;
* Clearing the pending state of interrupts;
* Synchronising the command queue

and maintenance commands:

* Map device/collection/processor;
* Map virtual interrupt;
* Clean interrupts;
* Discard interrupts;

The ITS provides no specific completion notification
mechanism. Completion is monitored by a combination of a `SYNC`
command and either polling `GITS_CREADR` or notification via an
interrupt generated via the `INT` command.

Note that the interrupt generation via `INT` requires an originating
device ID to be supplied (which is then translated via the ITS into an
LPI). No specific device ID is defined for this purpose and so the OS
software is expected to fabricate one.

Possible ways of inventing such a device ID are:

* Enumerate all device ids in the system and pick another one;
* Use a PCI BDF associated with a non-existent device function (such
  as an unused one relating to the PCI root-bridge) and translate that
  (via firmware tables) into a suitable device id;
* ???

# vITS

A guest domain which is allowed to use ITS functionality (i.e. has
been assigned pass-through devices which can generate MSIs) will be
presented with a virtualised ITS.

Accesses to the vITS registers will trap to Xen and be emulated and a
virtualised Command Queue will be provided.

Commands entered onto the virtual Command Queue will be translated
into physical commands (this translation is described in the GIC
specification).

XXX there are other aspects to virtualising the ITS (LPI collection
management, assignment of LPI ranges to guests, device
management). However these are not currently considered here. XXX
Should they be/do they need to be?

## Requirements

Emulation should not block in the hypervisor for extended periods. In
particular Xen should not busy wait on the physical ITS. Doing so
blocks the physical CPU from doing anything else (such as scheduling
other VCPUs).

There may be multiple guests which have a vITS, all targeting the same
underlying pITS. A single guest VCPU should not be able to monopolise
the pITS via its vITS and all guests should be able to make forward
progress.

## Command Queue Virtualisation

The command queue of each vITS is represented by a data structure:

    struct vits_cq {
        list_head schedule_list; /* Queued onto pits.schedule_list */
        uint32_t creadr;         /* Virtual creadr */
        uint32_t cwriter;        /* Virtual cwriter */
        uint32_t progress;       /* Index of last command queued to pits */
        [ Reference to command queue memory ]
    };

Each pITS has an associated data structure:

    struct pits {
        list_head schedule_list; /* Contains list of vits_cq.schedule_lists */
        uint32_t last_creadr;
    };

On write to the virtual `CWRITER` the cwriter field is updated and if
that results in there being new outstanding requests then the vits_cq
is enqueued onto pITS' schedule_list (unless it is already there).

On read from the virtual `CREADR` iff the vits_cq is such that
commands are outstanding then a scheduling pass is attempted (in order
to update `vits_cq.creadr`). The current value of `vits_cq.creadr` is
then returned.
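
As a rough sketch of the two traps just described (assuming, beyond the
structures above, a `pits` back-pointer in `vits_cq`, hypothetical
Linux-style `list_*` helpers, and the `pits_schedule_pass()` given under
pITS Scheduling below):

    /* Trapped write to virtual CWRITER: record it and, if commands are
       now outstanding, queue this vITS for a scheduling pass. */
    void vits_handle_cwriter(struct vits_cq *vcq, uint32_t cwriter)
    {
        vcq->cwriter = cwriter;
        if ( vcq->cwriter != vcq->progress &&      /* new outstanding cmds */
             list_empty(&vcq->schedule_list) )     /* not already queued */
            list_add_tail(&vcq->schedule_list, &vcq->pits->schedule_list);
    }

    /* Trapped read of virtual CREADR: attempt a pass first so that the
       value returned is up to date. */
    uint32_t vits_handle_creadr(struct vits_cq *vcq)
    {
        if ( vcq->creadr != vcq->cwriter )         /* commands outstanding */
            pits_schedule_pass(vcq->pits);
        return vcq->creadr;
    }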

### Command translation

In order to virtualise the Command Queue each command must be
translated (this is described in the GIC spec).

Translation of certain commands can be expensive (XXX citation
needed).

Translation can be done in two places:

* During scheduling.
* On write to `CWRITER`, into a per `vits_cq` queue which the
  scheduler then propagates to the pits.

Doing the translate during scheduling means that potentially expensive
operations may be accounted to `current`, who may have nothing to do
with those operations (this is true whether it is IRQ context or
SOFTIRQ context).

Doing the translate during `CWRITER` emulation accounts it to the
right place, but introduces a potentially long synchronous operation
which ties down a VCPU. Introducing batching here means we have
essentially the same issue wrt when to replenish the translated queue
as doing translate during scheduling.

Translate during `CWRITER` also has memory overheads. Unclear if they
are at a problematic scale or not.

XXX need a solution for this.

XXX Can we arrange a scheme where a pretranslated queue is replenished
(in batches) only on return to a vcpu owned by that guest (getting
accounting right)? This would involve some careful logic to kick vcpus
at particular times, and presumably some spurious wake ups.

### pITS Scheduling

A pITS scheduling pass is attempted:

* On write to any virtual `CWRITER` iff that write results in there
  being new outstanding requests for that vits;
* On read from a virtual `CREADR` iff there are commands outstanding
  on that vits;
* On receipt of an interrupt notification arising from Xen's own use
  of `INT`; (see discussion under Completion)
* On any interrupt injection arising from a guests use of the `INT`
  command; (XXX perhaps, see discussion under Completion)

This may result in lots of contention on the scheduler
locking. Therefore we consider that in each case all which happens is
triggering of a softirq which will be processed on return to guest,
and just once even for multiple events.

Such deferral could be considered OK (XXX ???) for the `CREADR` case
because at worst the value read will be one cycle out of date. A guest
which receives an `INT` notification might reasonably expect a
subsequent read of `CREADR` to reflect that. However that should be
covered by the softirq processing which would occur on entry to the
guest to inject the `INT`.

Each scheduling pass will:

* Read the physical `CREADR`;
* For each command between `pits.last_creadr` and the new `CREADR`
  value process completion of that command and update the
  corresponding `vits_cq.creadr`.
* Attempt to refill the pITS Command Queue (see below).
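
In outline this might look as follows (a sketch only: `pits_read_creadr()`,
`next_slot()` and `next_vslot()` are assumed helpers handling the register
access and ring wrap, the `shadow[]` array anticipates the shadowing
structure described under "Filling the pITS Command Queue" below, and
`pits_refill()` is given there too):

    /* One scheduling pass: retire completed pITS slots, then refill. */
    void pits_schedule_pass(struct pits *p)
    {
        uint32_t creadr = pits_read_creadr(p);   /* physical CREADR */
        uint32_t slot;

        for ( slot = p->last_creadr; slot != creadr; slot = next_slot(p, slot) )
        {
            struct vits_cq *vcq = p->shadow[slot].vits;

            /* Advance the owning vITS's virtual CREADR past this command. */
            vcq->creadr = next_vslot(vcq, p->shadow[slot].vcmd_index);
        }
        p->last_creadr = creadr;

        pits_refill(p);
    }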

### Domain Shutdown

We can't free a `vits_cq` while it has things on the physical control
queue, and we cannot cancel things which are on the control queue.

So we must wait.

Obviously don't enqueue anything new onto the pits if `d->is_dying`.

`domain_relinquish_resources()` waits (somehow, with suitable
continuations etc) for anything which the `vits_cq` has outstanding to
be completed so that the datastructures can be cleared.

### Filling the pITS Command Queue.

Various algorithms could be used here. For now a simple proposal is
to traverse the `pits.schedule_list` starting from where the last
refill finished (i.e not from the top of the list each time).

If a `vits_cq` has no pending commands then it is removed from the
list.

If a `vits_cq` has some pending commands then `min(pits-free-slots,
vits-outstanding, VITS_BATCH_SIZE)` will be taken from the vITS
command queue, translated and placed onto the pITS
queue. `vits_cq.progress` will be updated to reflect this.

Each `vits_cq` is handled in turn in this way until the pITS Command
Queue is full or there are no more outstanding commands.

There will likely need to be a data structure which shadows the pITS
Command Queue slots with references to the `vits_cq` which has a
command currently occupying that slot and the corresponding index into
the virtual command queue, for use when completing a command.

`VITS_BATCH_SIZE` should be small, TBD say 4 or 8.
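
A sketch of the refill, including the shadowing structure suggested above
(the helpers and the `MIN()` macro are assumptions; locking and error
handling are omitted):

    struct pits_slot {
        struct vits_cq *vits;   /* vITS whose command occupies this slot */
        uint32_t vcmd_index;    /* its index in the virtual command queue */
    };

    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    void pits_refill(struct pits *p)
    {
        struct vits_cq *vcq;

        /* Resume from where the last refill stopped, not the list head. */
        while ( pits_free_slots(p) && (vcq = pits_next_scheduled(p)) != NULL )
        {
            uint32_t n = MIN(MIN(pits_free_slots(p), vits_outstanding(vcq)),
                             VITS_BATCH_SIZE);

            while ( n-- )
            {
                uint32_t slot = pits_claim_slot(p);

                /* Translate one virtual command into the physical slot. */
                vits_translate_cmd(vcq, vcq->progress, pits_slot_addr(p, slot));
                p->shadow[slot].vits = vcq;
                p->shadow[slot].vcmd_index = vcq->progress++;
            }

            if ( !vits_outstanding(vcq) )
                list_del_init(&vcq->schedule_list);  /* nothing pending */
        }
    }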

Possible simplification: If we arrange that no guest ever has multiple
batches in flight (which can occur if we wrap around the list several
times) then we may be able to simplify the book keeping
required. However this may need some careful thought wrt fairness for
guests submitting frequent small batches of commands vs those sending
large batches.

XXX concern: Time spent filling the pITS queue could be significant if
guests are allowed to fill the ring completely.

### Completion

It is expected that commands will normally be completed (resulting in
an update of the corresponding `vits_cq.creadr`) via guest read from
`CREADR`. This will trigger a scheduling pass which will ensure the
`vits_cq.creadr` value is up to date before it is returned.

A guest which does completion via the use of `INT` cannot observe
`CREADR` without reading it, so updating on read from `CREADR`
suffices from the point of view of the guest's observation of the
state. (Of course we will inject the interrupt at the designated point
and the guest may well then read `CREADR`)

However in order to keep the pITS Command Queue moving along we need
to consider what happens if there are no `INT` based events nor reads
from `CREADR` to drive completion and therefore refilling of the Queue
with other outstanding commands.

A guest which enqueues some commands and then never checks for
completion cannot itself block things because any other guest which
reads `CREADR` will drive completion. However if _no_ guest reads from
`CREADR` then completion will not occur and this must be dealt with.

Even if we include completion on `INT`-base interrupt injection then
it is possible that the pITS queue may not contain any such
interrupts, either because no guest is using them or because the
batching means that none of them are enqueued on the active ring at
the moment.

So we need a fallback to ensure that queue keeps moving. There are
several options:

* A periodic timer in Xen which runs whenever there are outstanding
  commands in the pITS. This is simple but pretty sucky.
* Xen injects its own `INT` commands into the pITS ring. This requires
  figuring out a device ID to use.

The second option is likely to be preferable if the issue of selecting
a device ID can be addressed.

A secondary question is when these `INT` commands should be inserted
into the command stream:

* After each batch taken from a single `vits_cq`;
* After each scheduling pass;
* One active in the command stream at any given time;

The latter should be sufficient: by arranging to insert an `INT` into
the stream at the end of any scheduling pass which occurs while there
is not a currently outstanding `INT`, we have a sufficient backstop to
allow us to refill the ring.
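
Illustratively, at the end of each scheduling pass (assuming an
`xen_int_outstanding` flag on `struct pits`, a fabricated Xen device id as
discussed earlier, and hypothetical helpers):

    /* Keep at most one Xen `INT` in flight as a refill backstop. */
    static void pits_arm_backstop(struct pits *p)
    {
        if ( !p->xen_int_outstanding && pits_free_slots(p) )
        {
            pits_enqueue_int(p, XEN_FABRICATED_DEVID, XEN_COMPLETION_EVENT);
            p->xen_int_outstanding = true;  /* cleared by Xen's LPI handler */
        }
    }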

This assumes that there is no particular benefit to keeping the
`CWRITER` rolling ahead of the pITS's actual processing. This is true
because the ITS operates on commands in the order they appear in the
queue, so there is no need to maintain a runway ahead of the ITS
processing. (XXX If this is a concern perhaps the INT could be
inserted at the head of the final batch of commands in a scheduling
pass instead of the tail).

Xen itself should never need to issue an associated `SYNC` command,
since the individual guests would need to issue those themselves when
they care. The `INT` only serves to allow Xen to enqueue new commands
when there is space on the ring, it has no interest itself in the
actual completion.

### Locking

It may be preferable to use `atomic_t` types for various fields
(e.g. `vits_cq.creadr`) in order to reduce the amount and scope of
locking required.

### Multiple vITS instances in a single guest

As described above each vITS maps to exactly one pITS (while each pITS
serves multiple vITSs).

It could be possible to arrange that a vITS can enqueue commands to
different pITSs depending on e.g. the device id.

However each approach has issues.

In 1 vITS per pITS:

* Exposing one vITS per pITS means that we are exposing something about
  the underlying hardware to the guest.
* Adds complexity to the guest layout, which is right now static. How
  do you decide the number of vITSs/root controllers exposed:
    * Hotplug is tricky
* Toolstack needs greater knowledge of the host layout
* Given that PCI passthrough doesn't allow migration, maybe we could
  use the layout of the hardware.

In 1 vITS for all pITS:

* What to do with global commands? Inject to all pITS and then
  synchronise on them all finishing.
* Handling of out of order completion of commands queued with
  different pITS, since the vITS must appear to complete in
  order. Apart from the book keeping question it makes scheduling more
  interesting:
    * What if you have a pITS with slots available, and the guest command
      queue contains commands which could go to the pITS, but behind ones
      which are targeting another pITS which has no slots
    * What if one pITS is very busy and another is mostly idle and a
      guest submits one command to the busy one (contending with other
      guest) followed by a load of commands targeting the idle one. Those
      commands would be held up in this situation.
    * Reasoning about fairness may be harder.

XXX need a solution/decision here.

In addition the introduction of direct interrupt injection in version
4 GICs may imply a vITS per pITS. (Update: it seems not)

### vITS for purely software interrupts (e.g. event channels)

It has been proposed that it might be nice to inject event channels as
LPIs in the future. Whether or not that would involve any sort of vITS
is unclear, but if it did then it would likely be a separate emulation
to the vITS emulation used with a pITS and as such is not considered
further here.

# Glossary

* _MSI_: Message Signalled Interrupt
* _ITS_: Interrupt Translation Service
* _GIC_: Generic Interrupt Controller
* _LPI_: Locality-specific Peripheral Interrupt

# References

"GIC Architecture Specification" PRD03-GENC-010745 24.0

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 11:30             ` Ian Campbell
@ 2015-05-15 12:03               ` Julien Grall
  2015-05-15 12:47                 ` Vijay Kilari
  0 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-15 12:03 UTC (permalink / raw)
  To: Ian Campbell, Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On 15/05/15 12:30, Ian Campbell wrote:
>> Handling of a single vITS and multiple pITSs can be made simple.
>>
>> All ITS commands except SYNC & INVALL have a device id which will
>> tell us to which pITS each should be sent.
>>
>> SYNC & INVALL sent by the guest can be dropped by Xen,
>> and Xen can append SYNC & INVALL wherever they are required.
>> (Ex: the Linux driver adds SYNC for the commands which require it.)
>> With this assumption, all ITS commands map to exactly one pITS
>> and there is no need of synchronization across pITSs.
> 
> You've ignored the second bullet and its three sub-bullets, I think.

Aside from ignoring the second bullet, it's not possible to drop a
SYNC/INVALL command sent by the guest like that. How can you decide when a
SYNC is required or not? Why would dropping an "optional" SYNC be fine? The
spec only says "This command specifies that all actions for the specified
re-distributor must be completed"...

Linux is not a good example of respecting the spec. Developers may
decide to place SYNC differently in newly necessary places and we won't be
able to handle it correctly in Xen (see the vGICv3 re-dist example...).

If we go with one vITS for multiple pITSs we would have to send the
SYNC/INVALL commands to every pITS.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 10:59         ` Ian Campbell
  2015-05-15 11:26           ` Vijay Kilari
@ 2015-05-15 12:19           ` Julien Grall
  2015-05-15 12:58             ` Ian Campbell
  1 sibling, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-15 12:19 UTC (permalink / raw)
  To: Ian Campbell, Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On 15/05/15 11:59, Ian Campbell wrote:
>>>> AFAIU the process suggested, Xen will inject small batches as long as the
>>>> physical command queue is not full.
>>>
>>>> Let's take a simple case, only a single domain is using vITS on the
>>>> platform. If it injects a huge number of commands, Xen will split it
>>>> with lots of small batch. All batch will be injected in the same pass as
>>>> long as it fits in the physical command queue. Am I correct?
>>>
>>> That's how it is currently written, yes. With the "possible
>>> simplification" above the answer is no, only a batch at a time would be
>>> written for each guest.
>>>
>>> BTW, it doesn't have to be a single guest, the sum total of the
>>> injections across all guests could also take a similar amount of time.
>>> Is that a concern?
>>
>> Yes, the example with only a guest was easier to explain.
> 
> So as well as limiting the number of commands in each domain's batch we
> also want to limit the total number of batches?

Right. We want to have a "short" scheduling pass no matter the size of
the queue.

>>>> I think we have to restrict total number of batch (i.e for all the
>>>> domain) injected in a same scheduling pass.
>>>>
>>>> I would even tend to allow only one in flight batch per domain. That
>>>> would limit the possible problem I pointed out.
>>>
>>> This is the "possible simplification" I think. Since it simplifies other
>>> things (I think) as well as addressing this issue I think it might be a
>>> good idea.
>>
>> With the limitation on commands sent per batch, would the fairness you
>> were talking about in the design doc still be required?
> 
> I think we still want to schedule the guests in a strict round robin
> manner, to avoid one guest monopolising things.

I agree, although I was talking about the fairness you mentioned in
"However this may need some careful thought wrt fairness for
guests submitting frequent small batches of commands vs those sending
large batches."

>>>>> Therefore it is proposed that the restriction that a single vITS maps
>>>>> to one pITS be retained. If a guest requires access to devices
>>>>> associated with multiple pITSs then multiple vITS should be
>>>>> configured.
>>>>
>>>> Having multiple vITS per domain brings other issues:
>>>> 	- How do you know the number of ITS to describe in the device tree at boot?
>>>
>>> I'm not sure. I don't think 1 vs N is very different from the question
>>> of 0 vs 1 though, somehow the tools need to know about the pITS setup.
>>
>> I don't see why the tools would require to know the pITS setup.
> 
> Even with only a single vits the tools need to know if the system has 0,
> 1, or more pits, to know whether to create a vits at all or not.

In the 1 vITS solution no, it's only necessary to add a new gic define
for the gic_version field in xen_arch_domainconfig.

Although, I agree that in a multiple vITS configuration we would need to
know the number of vITS to create (not necessarily the number of pITS).

>>>> 	- How do you tell to the guest that the PCI device is mapped to a
>>>> specific vITS?
>>>
>>> Device Tree or IORT, just like on native and just like we'd have to tell
>>> the guest about that mapping even if there was a single vITS.
>>
>> Right, although the root controller can only be attached to one ITS.
>>
>> It will be necessary to have multiple root controllers in the guest in
>> the case where we passthrough devices using different ITSs.
>>
>> Is pci-back able to expose multiple root controllers?
> 
> In principle the xenstore protocol supports it, but AFAIK all toolstacks
> have only ever used "bus" 0, so I wouldn't be surprised if there were
> bugs lurking.
> 
> But we could fix those, I don't think it is a requirement that this
> stuff suddenly springs into life on ARM even with existing kernels.

Right.

> 
>>> I think the complexity of having one vITS target multiple pITSs is going
>>> to be quite high in terms of data structures and the amount of
>>> thinking/tracking scheduler code will have to do, mostly down to out of
>>> order completion of things put in the pITS queue.
>>
>> I understand the complexity, but exposing one vITS per pITS means that we
>> are exposing the underlying hardware to the guest.
> 
> Some aspect of it, yes, but it is still a virtual ITS.

Yes and no. It makes the migration case more complex (even without PCI
passthrough). See below.

>> If we are going to expose multiple vITS to the guest, we should only use
>> vITS for guests using PCI passthrough. This is because migration won't be
>> compatible with it.
> 
> It would be possible to support one s/w only vits for migration, i.e. the
> evtchn thing at the end, but for the general case that is correct. On
> x86 I believe that if you hot unplug all passthrough devices you can
> migrate and then plug in other devices at the other end.

What about migration on a platform having fewer/more pITSs (AFAIU on cavium
it may be possible because there is only one node)? If we want to
migrate the vITS we would have to handle the case where there is a mismatch,
which brings us back to the solution with one vITS.

As said in your event channel paragraph, we should put aside the event
channels injected by the vITS for now. It was only a suggestion and it
will require more thought than the vITS emulation.

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 11:28     ` Ian Campbell
@ 2015-05-15 12:38       ` Vijay Kilari
  2015-05-15 13:06         ` Ian Campbell
  2015-05-15 13:17         ` Julien Grall
  0 siblings, 2 replies; 77+ messages in thread
From: Vijay Kilari @ 2015-05-15 12:38 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Stefano Stabellini

On Fri, May 15, 2015 at 4:58 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Wed, 2015-05-13 at 21:57 +0530, Vijay Kilari wrote:
>> > * On receipt of an interrupt notification arising from Xen's own use
>> >   of `INT`; (see discussion under Completion)
>>
>>     If the INT notification method is used, then I don't think there is a
>> need for pITS scheduling on CREADR read.
>>
>> As we discussed in patch #13, the below steps should suffice to virtualize
>> the command queue.
>>
>> 1) On each guest CWRITER update, read a batch ('m' commands) of commands,
>>     translate them and put them on the pITS schedule list. If there are more
>>     than 'm' commands, split them into multiple schedule list entries. Append
>>     an INT command to each schedule list entry
>
> How many INT commands do you mean here?

   One INT command (Xen's completion INT) per batch

>
>>      1a) If there is no ongoing command from this vITS on the physical queue,
>>            send it to the physical queue.
>>      1b) If there is an ongoing command, return to the guest.
>> 2) On receiving the completion interrupt, update the guest's CREADR and post
>>     the next command from the schedule list to the physical queue.
>>
>> With this,
>>    - There will be no overhead of translating commands in interrupt context,
>> which is quite heavy because translating an ITS command requires validating
>> and updating internal ITS structures.
>
> Can you give some examples of the heaviest translations please so I can
> get a feel for actually how expensive we are talking here.
>
    For example, to translate MAPVI device_ID, event_ID, vID, vCID:

    1) Read the command from the vITS command queue
    2) Validate the device_ID by looking at the device list attached
       to that domain (vITS)
    3) Validate the vCID (virtual Collection ID) by checking it against
       the re-distributor addresses/cpu numbers of this domain
    4) Allocate a physical LPI for the vID (virtual LPI) from the lpi map
       of this device
           - Check if the virtual LPI is already allocated for this device
           - If not, allocate it
           - Update the lpi entries for this device
    5) Allocate memory for the physical LPI descriptor (add a radix tree
       entry) and populate it
    6) Call route_irq_to_guest() for this LPI
    7) Format the physical ITS command and send it to the pITS
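
Roughly, in pseudo-C (every helper below is invented just to illustrate
the steps; the real step 6 is the existing route_irq_to_guest(), whose
exact signature I elide):

    static int vits_translate_mapvi(struct vits *v, const struct its_cmd *vcmd,
                                    struct its_cmd *pcmd)
    {
        struct its_device *dev = vits_get_device(v, vcmd->devid);     /* 2 */
        uint32_t plpi;

        if ( !dev )
            return -ENODEV;
        if ( !vits_vcid_valid(v, vcmd->vcid) )                        /* 3 */
            return -EINVAL;
        plpi = lpi_map_get(dev, vcmd->vid);                           /* 4 */
        if ( !plpi )
            plpi = lpi_map_alloc(dev, vcmd->vid);
        if ( lpi_desc_alloc(plpi) )                                   /* 5 */
            return -ENOMEM;
        route_lpi_to_guest(v->d, plpi);                               /* 6 */
        its_build_mapvi(pcmd, dev->phys_devid, vcmd->event, plpi);    /* 7 */
        return 0;   /* the caller sends pcmd to the pITS */
    }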

>>    - Only one request from a guest will ever be posted to the physical queue
>>    - Even if a guest floods with a large number of commands, all the commands
>>      will be translated and queued in the schedule list and posted batch by batch
>>    - The scheduling pass is called only on CWRITER writes & the completion INT.
>
> I think the main difference in what you propose here is that commands
> are queued in pre-translated form to be injected (cheaply) during
> scheduling as opposed to being left on the guest queue and translated
> directly into the pits queue.
>
> I think `INT` vs `CREADR` scheduling is largely orthogonal to that.
>
> Julien proposed moving scheduling to a softirq, which gets it out of IRQ
> context (good) but does not necessarily account the translation to the
> guest, which is a benefit of your approach. (I think things which happen
> in a softirq are implicitly accounted to current, whoever that may be.)
>
   One softirq that looks at all the vITSs and posts the commands to the pITS,
or one softirq per vITS?

> On the downside pretranslation adds memory overhead and reintroduces the
> issue of a potentially long synchronous translation during `CWRITER`
> handling.

   The memory that is allocated is freed after completion of that batch.
  The translation duration depends on how many commands the guest
writes before updating CWRITER.

>
> We could pretranslate a batch of commands into a s/w queue rather than
> into the pits queue, but then we are back to where do we refill that
> queue from.
>
> The first draft wasn't particularly clear on when translation occurs
> (although I intended it to be during scheduling). I shall add some
> treatment of that to the next draft.
>>
>> > * On any interrupt injection arising from a guests use of the `INT`
>> >   command; (XXX perhaps, see discussion under Completion)
>> >
>> > Each scheduling pass will:
>> >
>> > * Read the physical `CREADR`;
>> > * For each command between `pits.last_creadr` and the new `CREADR`
>> >   value process completion of that command and update the
>> >   corresponding `vits_cq.creadr`.
>> > * Attempt to refill the pITS Command Queue (see below).
>> >
>> > ### Filling the pITS Command Queue.
>> >
>> > Various algorithms could be used here. For now a simple proposal is
>> > to traverse the `pits.schedule_list` starting from where the last
>> > refill finished (i.e not from the top of the list each time).
>> >
>> > If a `vits_cq` has no pending commands then it is removed from the
>> > list.
>> >
>> > If a `vits_cq` has some pending commands then `min(pits-free-slots,
>> > vits-outstanding, VITS_BATCH_SIZE)` will be taken from the vITS
>> > command queue, translated and placed onto the pITS
>> > queue. `vits_cq.progress` will be updated to reflect this.
>> >
>> > Each `vits_cq` is handled in turn in this way until the pITS Command
>> > Queue is full or there are no more outstanding commands.
>> >
>> > There will likely need to be a data structure which shadows the pITS
>> > Command Queue slots with references to the `vits_cq` which has a
>> > command currently occupying that slot and the corresponding index into
>> > the virtual command queue, for use when completing a command.
>> >
>> > `VITS_BATCH_SIZE` should be small, TBD say 4 or 8.
>> >
>> > Possible simplification: If we arrange that no guest ever has multiple
>> > batches in flight (which can occur if we wrap around the list several
>> > times) then we may be able to simplify the book keeping
>> > required. However this may need some careful thought wrt fairness for
>> > guests submitting frequent small batches of commands vs those sending
>> > large batches.
>>
>>   If one LPI of the dummy device is assigned to each VM, then book keeping
>> per vITS becomes simple
>
> What dummy device do you mean? What simplifications does it imply?
>

  I mean a fake (non-existent) device to generate the completion INT.
Using a unique completion INT for every vITS, the book keeping would be
simple. This helps to identify the vITS on receiving a completion INT
(completion INT <=> vITS mapping).

>>
>> >
>> > ### Completion
>> >
>> > It is expected that commands will normally be completed (resulting in
>> > an update of the corresponding `vits_cq.creadr`) via guest read from
>> > `CREADR`. This will trigger a scheduling pass which will ensure the
>> > `vits_cq.creadr` value is up to date before it is returned.
>> >
>>     If the guest reads CREADR to know completion of a command, there is no
>> need of a scheduling pass if INT is used.
>
> We cannot know apriori which scheme a guest is going to use, nor do we
> have the freedom to mandate a particular scheme, or even that the guest
> uses the same scheme for every batch of commands.
>
> So we need to design a system which works whether all guests use only
> INT or all guests using only CREADR polling or anything in between.
>
> A scheduling pass is not needed on INT injection (either Xen's or the
> guests) in order to update `CREADR` (as you suggest), however it may be
> necessary in order to keep the pITS command queue moving by scheduling
> any outstanding commands. Consider the case of a guest which receives an
> INT but does not subsequently read `CREADR` (at all or in a timely
> manner).

  Scheduling outstanding commands and updating CREADR
is always done by Xen's completion INT.
So even if the guest does not read CREADR it does not matter.

One corner case I can think of is if the guest is using the INT method to know
the completion of a command and the guest's INT command is received before
Xen's completion INT arrives; in that case the guest might see an old CREADR.
To handle this scenario, we can prefix Xen's completion INT before the guest's
INT command.
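
Something like this, when posting a translated batch (all names here are
invented for illustration):

    /* Post one translated batch; if it ends with the guest's own INT,
     * prefix Xen's completion INT so the virtual CREADR is updated
     * before the guest's INT is delivered. */
    static void pits_post_batch(struct pits *p, struct vits_batch *b)
    {
        unsigned int n = b->n_cmds - (batch_ends_with_int(b) ? 1 : 0);

        pits_enqueue_cmds(p, b->cmds, n);               /* the batch body */
        if ( batch_ends_with_int(b) )
        {
            pits_enqueue_int(p, XEN_FABRICATED_DEVID,   /* Xen's INT first */
                             XEN_COMPLETION_EVENT);
            pits_enqueue_cmds(p, &b->cmds[n], 1);       /* then guest's INT */
        }
    }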

>
>> > A guest which does completion via the use of `INT` cannot observe
>> > `CREADR` without reading it, so updating on read from `CREADR`
>> > suffices from the point of view of the guest's observation of the
>> > state. (Of course we will inject the interrupt at the designated point
>> > and the guest may well then read `CREADR`)
>>
>>    Append a Xen completion INT before the guest INT command, which
>> will update CREADR correctly before the guest receives its INT
>
> That means two interrupts. And there is no need because even with the
> guest's own completion INT it won't see things until it reads CREADR
> itself.
>
>> > However in order to keep the pITS Command Queue moving along we need
>> > to consider what happens if there are no `INT` based events nor reads
>> > from `CREADR` to drive completion and therefore refilling of the Queue
>> > with other outstanding commands.
>> >
>> > A guest which enqueues some commands and then never checks for
>> > completion cannot itself block things because any other guest which
>> > reads `CREADR` will drive completion. However if _no_ guest reads from
>> > `CREADR` then completion will not occur and this must be dealt with.
>> >
>>    Do you mean a guest read of CREADR should check all the vITSs of other
>> guests to post pending commands?
>
> In the proposal `CREADR` kicks off a scheduling pass, which is
> independent of any particular vITS and operates only on the list of
> scheduled vits, decoupling the vits from the pits scheduling.
>
>>
>> > Even if we include completion on `INT`-base interrupt injection then
>> > it is possible that the pITS queue may not contain any such
>> > interrupts, either because no guest is using them or because the
>> > batching means that none of them are enqueued on the active ring at
>> > the moment.
>> >
>> > So we need a fallback to ensure that queue keeps moving. There are
>> > several options:
>> >
>> > * A periodic timer in Xen which runs whenever there are outstanding
>> >   commands in the pITS. This is simple but pretty sucky.
>> > * Xen injects its own `INT` commands into the pITS ring. This requires
>> >   figuring out a device ID to use.
>> >
>> > The second option is likely to be preferable if the issue of selecting
>> > a device ID can be addressed.
>> >
>> > A secondary question is when these `INT` commands should be inserted
>> > into the command stream:
>> >
>> > * After each batch taken from a single `vits_cq`;
>> > * After each scheduling pass;
>> > * One active in the command stream at any given time;
>> >
>> > The latter should be sufficient: by arranging to insert an `INT` into
>> > the stream at the end of any scheduling pass which occurs while there
>> > is not a currently outstanding `INT`, we have a sufficient backstop to
>> > allow us to refill the ring.
>> >
>> > This assumes that there is no particular benefit to keeping the
>> > `CWRITER` rolling ahead of the pITS's actual processing. This is true
>> > because the ITS operates on commands in the order they appear in the
>> > queue, so there is no need to maintain a runway ahead of the ITS
>> > processing. (XXX If this is a concern perhaps the INT could be
>> > inserted at the head of the final batch of commands in a scheduling
>> > pass instead of the tail).
>> >
>> > Xen itself should never need to issue an associated `SYNC` command,
>> > since the individual guests would need to issue those themselves when
>> > they care. The `INT` only serves to allow Xen to enqueue new commands
>> > when there is space on the ring, it has no interest itself in the
>> > actual completion.
>> >
>> > ### Locking
>> >
>> > It may be preferable to use `atomic_t` types for various fields
>> > (e.g. `vits_cq.creadr`) in order to reduce the amount and scope of
>> > locking required.
>> >
>> > ### Multiple vITS instances in a single guest
>> >
>> > As described above each vITS maps to exactly one pITS (while each pITS
>> > serves multiple vITSs).
>> >
>>
>>   IMO, one vITS per domain should be OK. For each command, based
>> on the device ID, the vITS will query the PCI framework to know the
>> physical ITS to which this device is attached, and the command will be
>> sent to that particular pITS.
>>
>> There are some exceptions, like SYNC and INVALL, which do not have a
>> device id. In this case these commands are sent to all pITSs in the platform.
>> (XXX: If a command is sent to all pITSs, how to identify whether the command
>> has been processed on all of them?)
>
> That's one potential issue. I mentioned a couple of others in my reply
> to Julien just now.
>
> Draft B will have more discussion of these cases, but so far no firm
> solution I think.
>
> Ian.
>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 12:03               ` Julien Grall
@ 2015-05-15 12:47                 ` Vijay Kilari
  2015-05-15 12:52                   ` Julien Grall
  2015-05-15 12:53                   ` Ian Campbell
  0 siblings, 2 replies; 77+ messages in thread
From: Vijay Kilari @ 2015-05-15 12:47 UTC (permalink / raw)
  To: Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Fri, May 15, 2015 at 5:33 PM, Julien Grall <julien.grall@citrix.com> wrote:
> On 15/05/15 12:30, Ian Campbell wrote:
>>> Handling of a single vITS and multiple pITSs can be made simple.
>>>
>>> All ITS commands except SYNC & INVALL have a device id which will
>>> tell us to which pITS each should be sent.
>>>
>>> SYNC & INVALL sent by the guest can be dropped by Xen,
>>> and Xen can append SYNC & INVALL wherever they are required.
>>> (Ex: the Linux driver adds SYNC for the commands which require it.)
>>> With this assumption, all ITS commands map to exactly one pITS
>>> and there is no need of synchronization across pITSs.
>>
>> You've ignored the second bullet and its three sub-bullets, I think.
>
   Why can't we group the batch of commands based on the pITS they have
to be sent to?

> Aside from ignoring the second bullet, it's not possible to drop a
> SYNC/INVALL command sent by the guest like that. How can you decide when a
> SYNC is required or not? Why would dropping an "optional" SYNC be fine? The
> spec only says "This command specifies that all actions for the specified
> re-distributor must be completed"...

 If Xen is sending SYNC/INVALL commands to the pITS based on the commands
Xen is sending on the pITS, there is no harm in ignoring the guest's commands.

SYNC/INVALL always depend on previous ITS commands.
IMO, these commands alone do not have any significance.

>
> Linux is not a good example of respecting the spec. Developers may
> decide to place SYNC differently in newly necessary places and we won't be
> able to handle it correctly in Xen (see the vGICv3 re-dist example...).
>
> If we go with one vITS for multiple pITSs we would have to send the
> SYNC/INVALL commands to every pITS.
>
> Regards,
>
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 12:47                 ` Vijay Kilari
@ 2015-05-15 12:52                   ` Julien Grall
  2015-05-15 12:53                   ` Ian Campbell
  1 sibling, 0 replies; 77+ messages in thread
From: Julien Grall @ 2015-05-15 12:52 UTC (permalink / raw)
  To: Vijay Kilari, Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On 15/05/15 13:47, Vijay Kilari wrote:
>> Aside from ignoring the second bullet, it's not possible to drop a
>> SYNC/INVALL command sent by the guest like that. How can you decide when
>> a SYNC is required or not? Why would dropping an "optional" SYNC be
>> fine? The spec only says "This command specifies that all actions for
>> the specified re-distributor must be completed"...
> 
>  If Xen is sending SYNC/INVALL commands to the pITS based on the commands
> Xen is sending to the pITS, there is no harm in ignoring the guest's commands.
> 
> SYNC/INVALL always depend on previous ITS commands.
> IMO, these commands alone do not have any significance.

The SYNC command ensures that any commands before it have been completed...

The guest can decide to put one after only a single command or after a
batch of commands.

You have to respect it and not let Xen guess when it's necessary to have
one.

Regards,

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 12:47                 ` Vijay Kilari
  2015-05-15 12:52                   ` Julien Grall
@ 2015-05-15 12:53                   ` Ian Campbell
  2015-05-15 13:14                     ` Vijay Kilari
  1 sibling, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-15 12:53 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Fri, 2015-05-15 at 18:17 +0530, Vijay Kilari wrote:
> On Fri, May 15, 2015 at 5:33 PM, Julien Grall <julien.grall@citrix.com> wrote:
> > On 15/05/15 12:30, Ian Campbell wrote:
> >>> Handling of a single vITS and multiple pITS can be made simple.
> >>>
> >>> All ITS commands except SYNC & INVALL have a device id which will
> >>> help us to know to which pITS they should be sent.
> >>>
> >>> SYNC & INVALL can be dropped by Xen on guest request,
> >>>  and Xen can append SYNC & INVALL wherever they are required.
> >>> (Ex: the Linux driver adds SYNC for the commands that require it.)
> >>> With this assumption, all ITS commands are mapped to a pITS
> >>> and no synchronization across pITS is needed
> >>
> >> You've ignored the second bullet and its three sub-bullets, I think.
> >
>    Why can't we group the batch of commands based on the pITS they have
> to be sent to?

Are you suggesting that each batch we send should be synchronous? (i.e.
end with SYNC+INT) That doesn't seem at all desirable.

> > Aside from ignoring the second bullet, it's not possible to drop a
> > SYNC/INVALL command sent by the guest like that. How can you decide when
> > a SYNC is required or not? Why would dropping an "optional" SYNC be
> > fine? The spec only says "This command specifies that all actions for
> > the specified re-distributor must be completed"...
> 
>  If Xen is sending SYNC/INVALL commands to the pITS based on the commands
> Xen is sending to the pITS, there is no harm in ignoring the guest's commands.
> 
> SYNC/INVALL always depend on previous ITS commands.
> IMO, these commands alone do not have any significance.
> 
> >
> > Linux is not a good example for respecting the spec. Developers may
> > decide to put SYNC in different, newly necessary places and we won't be
> > able to handle it correctly in Xen (see the vGICv3 re-dist example...).
> >
> > If we go with one vITS for multiple pITS we would have to send the
> > SYNC/INVALL commands to every pITS.
> >
> > Regards,
> >
> > --
> > Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 12:19           ` Julien Grall
@ 2015-05-15 12:58             ` Ian Campbell
  2015-05-15 13:24               ` Julien Grall
  0 siblings, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-15 12:58 UTC (permalink / raw)
  To: Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Fri, 2015-05-15 at 13:19 +0100, Julien Grall wrote:
> On 15/05/15 11:59, Ian Campbell wrote:
> >>>> AFAIU the process suggested, Xen will inject small batch as long as the
> >>>> physical command queue is not full.
> >>>
> >>>> Let's take a simple case, only a single domain is using vITS on the
> >>>> platform. If it injects a huge number of commands, Xen will split it
> >>>> with lots of small batch. All batch will be injected in the same pass as
> >>>> long as it fits in the physical command queue. Am I correct?
> >>>
> >>> That's how it is currently written, yes. With the "possible
> >>> simplification" above the answer is no, only a batch at a time would be
> >>> written for each guest.
> >>>
> >>> BTW, it doesn't have to be a single guest, the sum total of the
> >>> injections across all guests could also take a similar amount of time.
> >>> Is that a concern?
> >>
> >> Yes, the example with only a guest was easier to explain.
> > 
> > So as well as limiting the number of commands in each domain's batch we
> > also want to limit the total number of batches?
> 
> Right. We want to have a "short" scheduling pass no matter the size of
> the queue.
> 
> >>>> I think we have to restrict total number of batch (i.e for all the
> >>>> domain) injected in a same scheduling pass.
> >>>>
> >>>> I would even tend to allow only one in flight batch per domain. That
> >>>> would limit the possible problem I pointed out.
> >>>
> >>> This is the "possible simplification" I think. Since it simplifies other
> >>> things (I think) as well as addressing this issue I think it might be a
> >>> good idea.
> >>
> >> With the limitation of command send per batch, would the fairness you
> >> were talking on the design doc still required?
> > 
> > I think we still want to schedule the guests in a strict round-robin
> > manner, to avoid one guest monopolising things.
> 
> I agree, although I was talking about the fairness you mentionned in
> "However this may need some careful thought wrt fairness for
> guests submitting frequent small batches of commands vs those sending
> large batches."

Ah, yes.

The trade-off here is between the number of INT+scheduling passes vs the
time spent in each pass. Smaller batches would mean more INTs and more
overhead there.

So I think limiting batch sizes is OK, but we may need to tweak the
sizing a bit based on experience.
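
To make the knob concrete, here is a rough sketch of what a size-limited,
INT-terminated batch might look like (all structure and helper names are
hypothetical, purely to illustrate the shape, not an actual implementation):

    /* Sketch only: take at most VITS_BATCH_SIZE commands from a vITS,
     * then terminate the batch with SYNC+INT so Xen gets exactly one
     * completion interrupt per batch.  All names are hypothetical. */
    #define VITS_BATCH_SIZE 4   /* TBD: tune based on experience */

    static void vits_fill_one_batch(struct vits_cq *vcq, struct pits *pits)
    {
        unsigned int i;

        for ( i = 0; i < VITS_BATCH_SIZE && vits_cmd_pending(vcq); i++ )
        {
            its_cmd_t cmd;

            vits_pop_and_translate(vcq, &cmd);  /* guest cmd -> phys cmd */
            pits_enqueue(pits, &cmd);
        }

        /* Terminate so completion of the whole batch can be detected. */
        pits_enqueue_sync(pits);
        pits_enqueue_int(pits, XEN_COMPLETION_LPI);
    }

A smaller VITS_BATCH_SIZE means more SYNC+INT pairs on the physical ring;
a larger one means longer, less fair passes.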

> >>>>> Therefore it is proposed that the restriction that a single vITS maps
> >>>>> to one pITS be retained. If a guest requires access to devices
> >>>>> associated with multiple pITSs then multiple vITS should be
> >>>>> configured.
> >>>>
> >>>> Having multiple vITS per domain brings other issues:
> >>>> 	- How do you know the number of ITS to describe in the device tree at boot?
> >>>
> >>> I'm not sure. I don't think 1 vs N is very different from the question
> >>> of 0 vs 1 though, somehow the tools need to know about the pITS setup.
> >>
> >> I don't see why the tools would require to know the pITS setup.
> > 
> > Even with only a single vits the tools need to know if the system has 0,
> > 1, or more pits, to know whether to create a vits at all or not.
> 
> In the 1 vITS solution no, it's only necessary to add a new gic define
> for the gic_version field in xen_arch_domainconfig.

Would we expose a vITS to guests on a host which has no pITS at all?
What would happen if the guest tried to use it? That's the 0 vITS case,
and once you can distinguish 0 from 1 distinguishing larger numbers
isn't a huge stretch.

> >> If we are going to expose multiple vITS to the guest, we should only use
> >> vITS for guest using PCI passthrough. This is because migration won't be
> >> compatible with it.
> > 
> > It would be possible to support one s/w only vits for migration, i.e the
> > evtchn thing at the end, but for the general case that is correct. On
> > x86 I believe that if you hot unplug all passthrough devices you can
> > migrate and then plug in other devices at the other end.
> 
> What about migration on platforms having fewer/more pITS (AFAIU on cavium
> it may be possible because there is only one node)? If we want to
> migrate the vITS we would have to handle the case where there is a
> mismatch, which brings us to the solution with one vITS.

At the moment I don't think we are expecting to do heterogeneous
migration. But perhaps we should plan for that eventuality, since one
day it seems people would want to at least move to a newer version of
the same silicon family for upgrade purposes.

> As said in your event channel paragraph, we should put aside the event
> channel injected by the vITS for now. It was only a suggestion and it
> will require more thought than the vITS emulation.


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 12:38       ` Vijay Kilari
@ 2015-05-15 13:06         ` Ian Campbell
  2015-05-15 13:17         ` Julien Grall
  1 sibling, 0 replies; 77+ messages in thread
From: Ian Campbell @ 2015-05-15 13:06 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Stefano Stabellini

On Fri, 2015-05-15 at 18:08 +0530, Vijay Kilari wrote:
> On Fri, May 15, 2015 at 4:58 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> > On Wed, 2015-05-13 at 21:57 +0530, Vijay Kilari wrote:
> >> > * On receipt of an interrupt notification arising from Xen's own use
> >> >   of `INT`; (see discussion under Completion)
> >>
> >>     If the INT notification method is used, then I don't think there is a need
> >> for pITS scheduling on CREADR read.
> >>
> >> As we discussed in patch #13, the steps below should suffice to virtualize
> >> the command queue.
> >>
> >> 1) On each guest CWRITER update, read a batch ('m' commands) of commands,
> >>     translate it and put it on the pITS schedule list. If there are more than 'm'
> >>     commands, create m/n entries in the schedule list. Append an INT command to each
> >>      schedule list entry
> >
> > How many INT commands do you mean here?
> 
>    One INT command (Xen's completion INT) per batch
> 
> >
> >>      1a) If there is no ongoing command from this vITS on the physical queue,
> >>            send to the physical queue.
> >>      1b) If there is an ongoing command, return to the guest.
> >> 2) On receiving the completion interrupt, update the guest's CREADR and post the next
> >>     command from the schedule list to the physical queue.
> >>
> >> With this,
> >>    - There will be no overhead of translating commands in interrupt context,
> >> which is quite heavy because translating an ITS command requires validating
> >> and updating internal ITS structures.
> >
> > Can you give some examples of the heaviest translations please so I can
> > get a feel for actually how expensive we are talking here.
> >
>     For example to translate MAPVI device_ID, event_ID, vID, vCID
[...]

Thanks.

> >>    - Always only one request from the guest will be posted to the physical queue
> >>    - Even if the guest floods with a large number of commands, all the commands
> >>      will be translated and queued in the schedule list and posted batch by batch
> >>    - The scheduling pass is called only on CWRITER & completion INT.
> >
> > I think the main difference in what you propose here is that commands
> > are queued in pre-translated form to be injected (cheaply) during
> > scheduling as opposed to being left on the guest queue and translated
> > directly into the pits queue.
> >
> > I think `INT` vs `CREADR` scheduling is largely orthogonal to that.
> >
> > Julien proposed moving scheduling to a softirq, which gets it out of IRQ
> > context (good) but doesn't necessarily account the translation to the
> > guest, which is a benefit of your approach. (I think things which happen
> > in a softirq are implicitly accounted to current, whoever that may be)
> >
>    one softirq that looks at all the vITSs and posts the commands to the pITS?
> or one softirq per vITS?

The former.

However in draft B I proposed that we might need something more like the
latter for accounting purposes, either the actual scheduling pass or a
per-vITS translation pass.
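
For reference, "the former" could be as simple as this sketch.
open_softirq()/raise_softirq() are the existing Xen softirq interfaces;
VITS_SCHEDULE_SOFTIRQ, the pITS list and the helpers are hypothetical:

    /* Sketch: one softirq performing the scheduling pass for all pITS.
     * The trap handlers and the completion interrupt handler only raise
     * the softirq; the real work happens on return to guest context. */
    static void vits_schedule_softirq(void)
    {
        struct pits *pits;

        list_for_each_entry ( pits, &host_pits_list, entry )
            pits_schedule_pass(pits); /* retire completions, refill queue */
    }

    void vits_init(void)
    {
        open_softirq(VITS_SCHEDULE_SOFTIRQ, vits_schedule_softirq);
    }

    /* e.g. in the CWRITER/CREADR traps and the completion INT handler: */
    /*     raise_softirq(VITS_SCHEDULE_SOFTIRQ);                        */

As noted above, this does mean the translation work is accounted to
whichever vcpu happens to be current when the softirq runs.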

> > On the downside pretranslation adds memory overhead and reintroduces the
> > issue of a potentially long synchronous translation during `CWRITER`
> > handling.
> 
>    Memory that is allocated is freed after completion of that batch.

It is still overhead.

>   The translation duration depends on how many commands the guest
> writes before updating CWRITER.

Xen cannot trust a guest not to write an enormous batch. We need to
think in terms of malicious guest behaviour, i.e. guests deliberately
trying to subvert or DoS the system; we cannot assume a well-behaved guest.
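
One way to bound the damage is to never let the CWRITER trap itself do
more than a fixed amount of work. A sketch (field and helper names
hypothetical; the ring arithmetic assumes byte offsets and 32-byte
commands):

    /* Sketch: bound the work done synchronously in the CWRITER trap.
     * A guest can still queue a huge ring, but it only ever gets
     * VITS_BATCH_SIZE commands translated per scheduling pass. */
    #define ITS_CMD_SIZE 32U              /* bytes per ITS command */

    static void vits_cwriter_write(struct vits_cq *vcq, uint64_t cwriter)
    {
        unsigned int pending, todo;

        vcq->cwriter = cwriter;           /* already range-checked */

        pending = ((vcq->cwriter - vcq->creadr + vcq->size) % vcq->size)
                  / ITS_CMD_SIZE;
        todo = min(pending, (unsigned int)VITS_BATCH_SIZE);

        /* Translate at most 'todo' commands now; the rest are picked up
         * by later scheduling passes, so a malicious guest cannot tie
         * down a pcpu for the whole ring. */
        vits_translate_commands(vcq, todo);
    }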

> >> > Possible simplification: If we arrange that no guest ever has multiple
> >> > batches in flight (which can occur if we wrap around the list several
> >> > times) then we may be able to simplify the book keeping
> >> > required. However this may need some careful thought wrt fairness for
> >> > guests submitting frequent small batches of commands vs those sending
> >> > large batches.
> >>
> >>   If one LPI of the dummy device is assigned to each VM, then the book keeping
> >> per vITS becomes simple
> >
> > What dummy device do you mean? What simplifications does it imply?
> >
> 
>   I mean a fake (non-existent) device to generate the completion INT.
> Using a unique completion INT for every vITS, the book keeping would be
> simple. This helps to identify the vITS on receiving a completion INT
> (completion INT <=> vITS mapping)

It already seems tricky enough to find one INT; would finding N (for
potentially large N) be possible?

However, given the synchronous nature of things I think one suffices: you
can fairly easily keep the vits on a list in the order they appear on
the ring etc.
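
Something like this sketch of the book keeping (structure and helper
names hypothetical):

    /* Sketch: batches are retired strictly in the order they were placed
     * on the physical ring, so a single completion INT suffices. */
    struct vits_batch {
        struct list_head entry;    /* position in pits->inflight FIFO */
        struct vits_cq *vcq;       /* owning virtual command queue */
        uint64_t vcreadr;          /* vits_cq.creadr value once this
                                      batch has completed */
        uint64_t pcreadr;          /* physical CREADR value at which the
                                      batch is known to be complete */
    };

    static void pits_retire_completed(struct pits *pits, uint64_t creadr)
    {
        struct vits_batch *b, *tmp;

        list_for_each_entry_safe ( b, tmp, &pits->inflight, entry )
        {
            if ( !batch_is_complete(b, creadr) )
                break;             /* FIFO: nothing later is done either */
            b->vcq->creadr = b->vcreadr;
            list_del(&b->entry);
            xfree(b);
        }
    }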

> 
> >>
> >> >
> >> > ### Completion
> >> >
> >> > It is expected that commands will normally be completed (resulting in
> >> > an update of the corresponding `vits_cq.creadr`) via guest read from
> >> > `CREADR`. This will trigger a scheduling pass which will ensure the
> >> > `vits_cq.creadr` value is up to date before it is returned.
> >> >
> >>     If the guest is reading CREADR to know the completion of a command, there is
> >> no need for a scheduling pass if INT is used.
> >
> > We cannot know a priori which scheme a guest is going to use, nor do we
> > have the freedom to mandate a particular scheme, or even that the guest
> > uses the same scheme for every batch of commands.
> >
> > So we need to design a system which works whether all guests use only
> > INT or all guests using only CREADR polling or anything in between.
> >
> > A scheduling pass is not needed on INT injection (either Xen's or the
> > guests) in order to update `CREADR` (as you suggest), however it may be
> > necessary in order to keep the pITS command queue moving by scheduling
> > any outstanding commands. Consider the case of a guest which receives an
> > INT but does not subsequently read `CREADR` (at all or in a timely
> > manner).
> 
>   Scheduling outstanding commands and updating CREADR
> is always done on Xen's completion INT.
> So even if the guest does not read CREADR, it does not matter.
> 
> One corner case I can think of is if the guest is using the INT method to know the
> completion of a command and the guest's INT command is received before
> Xen's completion INT arrives; in that case the guest might see an old CREADR.
> To handle this scenario, we can prefix the guest's INT command with Xen's
> completion INT.

Or do the processing on guest INT command too, which is in the draft
proposal I think.

Ian.


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 12:53                   ` Ian Campbell
@ 2015-05-15 13:14                     ` Vijay Kilari
  2015-05-15 13:24                       ` Ian Campbell
  0 siblings, 1 reply; 77+ messages in thread
From: Vijay Kilari @ 2015-05-15 13:14 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Fri, May 15, 2015 at 6:23 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Fri, 2015-05-15 at 18:17 +0530, Vijay Kilari wrote:
>> On Fri, May 15, 2015 at 5:33 PM, Julien Grall <julien.grall@citrix.com> wrote:
>> > On 15/05/15 12:30, Ian Campbell wrote:
>> >>> Handling of a single vITS and multiple pITS can be made simple.
>> >>>
>> >>> All ITS commands except SYNC & INVALL have a device id which will
>> >>> help us to know to which pITS they should be sent.
>> >>>
>> >>> SYNC & INVALL can be dropped by Xen on guest request,
>> >>>  and Xen can append SYNC & INVALL wherever they are required.
>> >>> (Ex: the Linux driver adds SYNC for the commands that require it.)
>> >>> With this assumption, all ITS commands are mapped to a pITS
>> >>> and no synchronization across pITS is needed
>> >>
>> >> You've ignored the second bullet and its three sub-bullets, I think.
>> >
>>    Why can't we group the batch of commands based on the pITS they have
>> to be sent to?
>
> Are you suggesting that each batch we send should be synchronous? (i.e.
> end with SYNC+INT) That doesn't seem at all desirable.

Not only at the end of a batch; a SYNC can be appended after every
command within the batch as needed.

Also, to handle the second bullet, a batch of commands might have to be
sent to multiple pITS. In that case the batch of ITS commands is split
across pITS and we have
to wait for all the pITS to complete. Managing this would be difficult.
For this I propose that batches be created/split such that each batch
contains commands related to one pITS (see the sketch below). But it
leads to small batches of commands.
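
A sketch of that splitting (helper names hypothetical; commands without a
device id, e.g. SYNC/INVALL, would need separate handling):

    /* Sketch: cut a batch whenever the target pITS changes, so that each
     * batch maps to exactly one physical command queue. */
    static unsigned int vits_take_batch(struct vits_cq *vcq,
                                        struct pits **target)
    {
        unsigned int n = 0;
        struct pits *first = NULL;

        while ( n < VITS_BATCH_SIZE && vits_cmd_pending(vcq) )
        {
            uint32_t devid = vits_peek_devid(vcq);  /* peek, don't pop */
            struct pits *p = devid_to_pits(vcq->domain, devid);

            if ( !first )
                first = p;
            else if ( p != first )
                break;          /* next command targets another pITS */

            vits_pop_and_translate(vcq, first);
            n++;
        }

        *target = first;
        return n;
    }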

>> > Aside from ignoring the second bullet, it's not possible to drop a
>> > SYNC/INVALL command sent by the guest like that. How can you decide when
>> > a SYNC is required or not? Why would dropping an "optional" SYNC be
>> > fine? The spec only says "This command specifies that all actions for
>> > the specified re-distributor must be completed"...
>>
>>  If Xen is sending SYNC/INVALL commands to the pITS based on the commands
>> Xen is sending to the pITS, there is no harm in ignoring the guest's commands.
>>
>> SYNC/INVALL always depend on previous ITS commands.
>> IMO, these commands alone do not have any significance.
>>
>> >
>> > Linux is not a good example for respecting the spec. Developers may
>> > decide to put SYNC in different, newly necessary places and we won't be
>> > able to handle it correctly in Xen (see the vGICv3 re-dist example...).
>> >
>> > If we go with one vITS for multiple pITS we would have to send the
>> > SYNC/INVALL commands to every pITS.
>> >
>> > Regards,
>> >
>> > --
>> > Julien Grall
>
>


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 12:38       ` Vijay Kilari
  2015-05-15 13:06         ` Ian Campbell
@ 2015-05-15 13:17         ` Julien Grall
  1 sibling, 0 replies; 77+ messages in thread
From: Julien Grall @ 2015-05-15 13:17 UTC (permalink / raw)
  To: Vijay Kilari, Ian Campbell
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Stefano Stabellini

On 15/05/15 13:38, Vijay Kilari wrote:
>> Can you give some examples of the heaviest translations please so I can
>> get a feel for actually how expensive we are talking here.
>>
>     For example to translate MAPVI device_ID, event_ID, vID, vCID
> 
>     1) Read from vITS command queue

Not expensive

>     2) Validate device_ID is valid by looking at device list attached
> to that domain (vITS)

It can be reduced by using a tree rather than a list.

>     3) Validate vCID (virtual Collection ID) by checking against
> re-distributor address/cpu numbers
>         of this domain

Validating vCID can be O(1) if you use only the cpu numbers (see
GITS_TYPER.PTA = 0).

>     4) Allocate physical LPI for the vID (virtual LPI) from lpi map of
> this device
>            - Check if virtual LPI is already allocated from this device.
>            - If not allocate it

Not expensive. Only looking in a bitmap.

>            - Update lpi entries for this device

What do you mean by updating the LPI entries for this device?

>     5) Allocate memory for physical LPI descriptor (Add radix tree
> entry) and populate it
>     6) Call route_irq_to_guest() for this LPI

This could be done earlier by pre-allocating a chunk of LPIs.

If memory usage is a concern, I think we could allocate one IRQ
descriptor per chunk of LPIs and manage it ourselves.

>     7) Format physical ITS command and send to pITS

Not expensive.

Overall, I don't think the commands are so expensive if we take time to
think about how to optimize the emulation.
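
For instance, step 2 above can become O(log(n)) by keeping the domain's
assigned devices in Xen's radix tree rather than a list.
radix_tree_lookup() is the existing Xen API; the per-domain tree and the
its_device structure are hypothetical:

    /* Sketch: validate/fetch a device by device_ID in O(log(n)).  The
     * tree is populated once, at device assignment time. */
    static struct its_device *vits_find_device(struct domain *d,
                                               uint32_t devid)
    {
        return radix_tree_lookup(&d->arch.vits_devices, devid);
    }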

Regards,

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 13:14                     ` Vijay Kilari
@ 2015-05-15 13:24                       ` Ian Campbell
  2015-05-15 13:44                         ` Julien Grall
  0 siblings, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-15 13:24 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Fri, 2015-05-15 at 18:44 +0530, Vijay Kilari wrote:
> On Fri, May 15, 2015 at 6:23 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> > On Fri, 2015-05-15 at 18:17 +0530, Vijay Kilari wrote:
> >> On Fri, May 15, 2015 at 5:33 PM, Julien Grall <julien.grall@citrix.com> wrote:
> >> > On 15/05/15 12:30, Ian Campbell wrote:
> >> >>> Handling of a single vITS and multiple pITS can be made simple.
> >> >>>
> >> >>> All ITS commands except SYNC & INVALL have a device id which will
> >> >>> help us to know to which pITS they should be sent.
> >> >>>
> >> >>> SYNC & INVALL can be dropped by Xen on guest request,
> >> >>>  and Xen can append SYNC & INVALL wherever they are required.
> >> >>> (Ex: the Linux driver adds SYNC for the commands that require it.)
> >> >>> With this assumption, all ITS commands are mapped to a pITS
> >> >>> and no synchronization across pITS is needed
> >> >>
> >> >> You've ignored the second bullet and its three sub-bullets, I think.
> >> >
> >>    Why can't we group the batch of commands based on the pITS they have
> >> to be sent to?
> >
> > Are you suggesting that each batch we send should be synchronous? (i.e.
> > end with SYNC+INT) That doesn't seem at all desirable.
> 
> Not only at the end of a batch; a SYNC can be appended after every
> command within the batch as needed.

Could be, but something to avoid I think?

> Also, to handle the second bullet, a batch of commands might have to be
> sent to multiple pITS. In that case the batch of ITS commands is split
> across pITS and we have
> to wait for all the pITS to complete. Managing this would be difficult.
> For this I propose that batches be created/split such that each batch
> contains commands related to one pITS. But it leads to small batches of commands.

That's not a bad idea; commonly I would expect commands for one device
to come in a short batch anyway. So long as the thing copes if not, I
think this might work.

> 
> >> > Aside from ignoring the second bullet, it's not possible to drop a
> >> > SYNC/INVALL command sent by the guest like that. How can you decide when
> >> > a SYNC is required or not? Why would dropping an "optional" SYNC be
> >> > fine? The spec only says "This command specifies that all actions for
> >> > the specified re-distributor must be completed"...
> >>
> >>  If Xen is sending SYNC/INVALL commands to the pITS based on the commands
> >> Xen is sending to the pITS, there is no harm in ignoring the guest's commands.
> >>
> >> SYNC/INVALL always depend on previous ITS commands.
> >> IMO, these commands alone do not have any significance.
> >>
> >> >
> >> > Linux is not a good example for respecting the spec. Developers may
> >> > decide to put SYNC in different, newly necessary places and we won't be
> >> > able to handle it correctly in Xen (see the vGICv3 re-dist example...).
> >> >
> >> > If we go with one vITS for multiple pITS we would have to send the
> >> > SYNC/INVALL commands to every pITS.
> >> >
> >> > Regards,
> >> >
> >> > --
> >> > Julien Grall
> >
> >
> 


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 12:58             ` Ian Campbell
@ 2015-05-15 13:24               ` Julien Grall
  2015-05-19 12:14                 ` Ian Campbell
  0 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-15 13:24 UTC (permalink / raw)
  To: Ian Campbell, Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

Hi Ian,

On 15/05/15 13:58, Ian Campbell wrote:
>>>>>>> Therefore it is proposed that the restriction that a single vITS maps
>>>>>>> to one pITS be retained. If a guest requires access to devices
>>>>>>> associated with multiple pITSs then multiple vITS should be
>>>>>>> configured.
>>>>>>
>>>>>> Having multiple vITS per domain brings other issues:
>>>>>> 	- How do you know the number of ITS to describe in the device tree at boot?
>>>>>
>>>>> I'm not sure. I don't think 1 vs N is very different from the question
>>>>> of 0 vs 1 though, somehow the tools need to know about the pITS setup.
>>>>
>>>> I don't see why the tools would require to know the pITS setup.
>>>
>>> Even with only a single vits the tools need to know if the system has 0,
>>> 1, or more pits, to know whether to create a vits at all or not.
>>
>> In the 1 vITS solution no, it's only necessary to add a new gic define
>> for the gic_version field in xen_arch_domainconfig.
> 
> Would we expose a vITS to guests on a host which has no pITS at all?

No, Xen will check if we can support vITS. See an example with my "GICv2
on GICv3" series. Obviously, we don't allow vGICv3 on GICv2.

>>>> If we are going to expose multiple vITS to the guest, we should only use
>>>> vITS for guest using PCI passthrough. This is because migration won't be
>>>> compatible with it.
>>>
>>> It would be possible to support one s/w only vits for migration, i.e the
>>> evtchn thing at the end, but for the general case that is correct. On
>>> x86 I believe that if you hot unplug all passthrough devices you can
>>> migrate and then plug in other devices at the other end.
>>
>> What about migration on platforms having fewer/more pITS (AFAIU on cavium
>> it may be possible because there is only one node)? If we want to
>> migrate the vITS we would have to handle the case where there is a
>> mismatch, which brings us to the solution with one vITS.
> 
> At the moment I don't think we are expecting to do heterogeneous
> migration. But perhaps we should plan for that eventuality, since one
> day it seems people would want to at least move to a newer version of
> the same silicon family for upgrade purposes.

I was thinking of migration within the same version of the silicon.

AFAICT, cavium can be shipped with 1 or 2 nodes. This will result in
having 1 or 2 ITSs.

Migration wouldn't be possible between servers using different numbers of
nodes.

Regards,

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 13:24                       ` Ian Campbell
@ 2015-05-15 13:44                         ` Julien Grall
  2015-05-15 14:04                           ` Vijay Kilari
  2015-05-15 14:05                           ` Ian Campbell
  0 siblings, 2 replies; 77+ messages in thread
From: Julien Grall @ 2015-05-15 13:44 UTC (permalink / raw)
  To: Ian Campbell, Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On 15/05/15 14:24, Ian Campbell wrote:
> On Fri, 2015-05-15 at 18:44 +0530, Vijay Kilari wrote:
>> On Fri, May 15, 2015 at 6:23 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
>>> On Fri, 2015-05-15 at 18:17 +0530, Vijay Kilari wrote:
>>>> On Fri, May 15, 2015 at 5:33 PM, Julien Grall <julien.grall@citrix.com> wrote:
>>>>> On 15/05/15 12:30, Ian Campbell wrote:
>>>>>>> Handling of a single vITS and multiple pITS can be made simple.
>>>>>>>
>>>>>>> All ITS commands except SYNC & INVALL have a device id which will
>>>>>>> help us to know to which pITS they should be sent.
>>>>>>>
>>>>>>> SYNC & INVALL can be dropped by Xen on guest request,
>>>>>>>  and Xen can append SYNC & INVALL wherever they are required.
>>>>>>> (Ex: the Linux driver adds SYNC for the commands that require it.)
>>>>>>> With this assumption, all ITS commands are mapped to a pITS
>>>>>>> and no synchronization across pITS is needed
>>>>>>
>>>>>> You've ignored the second bullet and its three sub-bullets, I think.
>>>>>
>>>>    Why can't we group the batch of commands based on the pITS they have
>>>> to be sent to?
>>>
>>> Are you suggesting that each batch we send should be synchronous? (i.e.
>>> end with SYNC+INT) That doesn't seem at all desirable.
>>
>> Not only at the end of a batch; a SYNC can be appended after every
>> command within the batch as needed.
> 
> Could be, but something to avoid I think?

That would slow down the ITS processing (a SYNC waits until the
previous commands have executed).

Also, what about INVALL? Sending it every time would be horrible for
performance because it flushes the ITS cache.

>> Also, to handle the second bullet, a batch of commands might have to be
>> sent to multiple pITS. In that case the batch of ITS commands is split
>> across pITS and we have
>> to wait for all the pITS to complete. Managing this would be difficult.
>> For this I propose that batches be created/split such that each batch
>> contains commands related to one pITS. But it leads to small batches of commands.

If I understand correctly, even with multiple pITS only a single batch
per domain would be in-flight, right?

> That's not a bad idea; commonly I would expect commands for one device
> to come in a short batch anyway. So long as the thing copes if not, I
> think this might work.

This doesn't work well; we would need to read/validate a command twice.
The first time to get the devID and notice we need to create a separate
batch, the second time to effectively queue the command.

Given that validation is the part where the emulation will spend most of
the time, we should avoid doing it twice.

Although, if we cache the validation we may send the wrong command/data
if the guest decides to write to the command queue at the same time.

Regards,

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 13:44                         ` Julien Grall
@ 2015-05-15 14:04                           ` Vijay Kilari
  2015-05-15 15:05                             ` Julien Grall
  2015-05-15 14:05                           ` Ian Campbell
  1 sibling, 1 reply; 77+ messages in thread
From: Vijay Kilari @ 2015-05-15 14:04 UTC (permalink / raw)
  To: Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Fri, May 15, 2015 at 7:14 PM, Julien Grall <julien.grall@citrix.com> wrote:
> On 15/05/15 14:24, Ian Campbell wrote:
>> On Fri, 2015-05-15 at 18:44 +0530, Vijay Kilari wrote:
>>> On Fri, May 15, 2015 at 6:23 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
>>>> On Fri, 2015-05-15 at 18:17 +0530, Vijay Kilari wrote:
>>>>> On Fri, May 15, 2015 at 5:33 PM, Julien Grall <julien.grall@citrix.com> wrote:
>>>>>> On 15/05/15 12:30, Ian Campbell wrote:
>>>>>>>> Handling of a single vITS and multiple pITS can be made simple.
>>>>>>>>
>>>>>>>> All ITS commands except SYNC & INVALL have a device id which will
>>>>>>>> help us to know to which pITS they should be sent.
>>>>>>>>
>>>>>>>> SYNC & INVALL can be dropped by Xen on guest request,
>>>>>>>>  and Xen can append SYNC & INVALL wherever they are required.
>>>>>>>> (Ex: the Linux driver adds SYNC for the commands that require it.)
>>>>>>>> With this assumption, all ITS commands are mapped to a pITS
>>>>>>>> and no synchronization across pITS is needed
>>>>>>>
>>>>>>> You've ignored the second bullet and its three sub-bullets, I think.
>>>>>>
>>>>>    Why can't we group the batch of commands based on the pITS they have
>>>>> to be sent to?
>>>>
>>>> Are you suggesting that each batch we send should be synchronous? (i.e.
>>>> end with SYNC+INT) That doesn't seem at all desirable.
>>>
>>> Not only at the end of a batch; a SYNC can be appended after every
>>> command within the batch as needed.
>>
>> Could be, but something to avoid I think?
>
> That would slow down the ITS processing (a SYNC waits until the
> previous commands have executed).
>
> Also, what about INVALL? Sending it every time would be horrible for
> performance because it flushes the ITS cache.

INVALL is not required every time. It can be sent only as mentioned in the
spec note, e.g. for MOVI:

Note: this command is expected to be used by software when it changed
the re-configuration of an LPI in memory to ensure any cached copies of
the old configuration are discarded.

>
>>> Also, to handle the second bullet, a batch of commands might have to be
>>> sent to multiple pITS. In that case the batch of ITS commands is split
>>> across pITS and we have
>>> to wait for all the pITS to complete. Managing this would be difficult.
>>> For this I propose that batches be created/split such that each batch
>>> contains commands related to one pITS. But it leads to small batches of commands.
>
> If I understand correctly, even with multiple pITS only a single batch
> per domain would be in-flight, right?
>
>> That's not a bad idea; commonly I would expect commands for one device
>> to come in a short batch anyway. So long as the thing copes if not, I
>> think this might work.
>
> This doesn't work well; we would need to read/validate a command twice.
> The first time to get the devID and notice we need to create a separate
> batch, the second time to effectively queue the command.
>
> Given that validation is the part where the emulation will spend most of
> the time, we should avoid doing it twice.
>
> Although, if we cache the validation we may send the wrong command/data
> if the guest decides to write to the command queue at the same time.

The devID in the first command of the batch will decide the pITS, and all
subsequent commands will be added to the same batch if the
devID is the same. I don't think the mapping of devID to pITS can be
changed by the guest at any time


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 13:44                         ` Julien Grall
  2015-05-15 14:04                           ` Vijay Kilari
@ 2015-05-15 14:05                           ` Ian Campbell
  1 sibling, 0 replies; 77+ messages in thread
From: Ian Campbell @ 2015-05-15 14:05 UTC (permalink / raw)
  To: Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Fri, 2015-05-15 at 14:44 +0100, Julien Grall wrote:
> On 15/05/15 14:24, Ian Campbell wrote:
> > On Fri, 2015-05-15 at 18:44 +0530, Vijay Kilari wrote:
> >> On Fri, May 15, 2015 at 6:23 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> >>> On Fri, 2015-05-15 at 18:17 +0530, Vijay Kilari wrote:
> >>>> On Fri, May 15, 2015 at 5:33 PM, Julien Grall <julien.grall@citrix.com> wrote:
> >>>>> On 15/05/15 12:30, Ian Campbell wrote:
> >>>>>>> Handling of a single vITS and multiple pITS can be made simple.
> >>>>>>>
> >>>>>>> All ITS commands except SYNC & INVALL have a device id which will
> >>>>>>> help us to know to which pITS they should be sent.
> >>>>>>>
> >>>>>>> SYNC & INVALL can be dropped by Xen on guest request,
> >>>>>>>  and Xen can append SYNC & INVALL wherever they are required.
> >>>>>>> (Ex: the Linux driver adds SYNC for the commands that require it.)
> >>>>>>> With this assumption, all ITS commands are mapped to a pITS
> >>>>>>> and no synchronization across pITS is needed
> >>>>>>
> >>>>>> You've ignored the second bullet and its three sub-bullets, I think.
> >>>>>
> >>>>    Why can't we group the batch of commands based on the pITS they have
> >>>> to be sent to?
> >>>
> >>> Are you suggesting that each batch we send should be synchronous? (i.e.
> >>> end with SYNC+INT) That doesn't seem at all desirable.
> >>
> >> Not only at the end of a batch; a SYNC can be appended after every
> >> command within the batch as needed.
> > 
> > Could be, but something to avoid I think?
> 
> That would slow down the ITS processing (a SYNC waits until the
> previous commands have executed).
> 
> Also, what about INVALL? Sending it every time would be horrible for
> performance because it flushes the ITS cache.
> 
> >> Also, to handle the second bullet, a batch of commands might have to be
> >> sent to multiple pITS. In that case the batch of ITS commands is split
> >> across pITS and we have
> >> to wait for all the pITS to complete. Managing this would be difficult.
> >> For this I propose that batches be created/split such that each batch
> >> contains commands related to one pITS. But it leads to small batches of commands.
> 
> If I understand correctly, even with multiple pITS only a single batch
> per domain would be in-flight, right?
> 
> > That's not a bad idea; commonly I would expect commands for one device
> > to come in a short batch anyway. So long as the thing copes if not, I
> > think this might work.
> 
> This doesn't work well; we would need to read/validate a command twice.
> The first time to get the devID and notice we need to create a separate
> batch, the second time to effectively queue the command.
> 
> Given that validation is the part where the emulation will spend most of
> the time, we should avoid doing it twice.

Which can trivially be arranged by not doing it the dumb way. At worst
you remember the first translation which mismatched and use it again
next time.

Or you do the translations in batches into a queue and then dequeue into
the physical command queue based on the target devices.

Thinking about global commands a bit, you could make those somewhat less
painful by remembering on a per `vits_cq` basis which pits it
has sent commands to since the last invalidate, and eliding the global
command for any pits the guest didn't touch. Doesn't help against a
malicious guest in the worst case but does improve things in the common
case.
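
E.g. a sketch of the eliding (bitmap and helper names hypothetical; the
dirty bit would be set in the translation path whenever a command is
queued to that pITS):

    /* Sketch: broadcast a global command (SYNC/INVALL) only to the pITS
     * this vits_cq has actually touched since the last invalidate. */
    static void vits_broadcast_global(struct vits_cq *vcq, its_cmd_t *cmd)
    {
        unsigned int i;

        for_each_set_bit ( i, vcq->pits_dirty, nr_host_pits )
            pits_enqueue(host_pits[i], cmd);

        bitmap_zero(vcq->pits_dirty, nr_host_pits);  /* all clean again */
    }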

> Although, if we cache the validation we may send the wrong command/data
> if the guest decides to write to the command queue at the same time.

A guest which modifies its command queue after having advanced CWRITER
past that point deserves whatever it gets.

Ian.


* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-15 11:45   ` Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling) Ian Campbell
@ 2015-05-15 14:55     ` Julien Grall
  2015-05-19 12:10       ` Ian Campbell
  0 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-15 14:55 UTC (permalink / raw)
  To: Ian Campbell, Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Stefano Stabellini

Hi Ian,

On 15/05/15 12:45, Ian Campbell wrote:
> On Tue, 2015-05-12 at 16:02 +0100, Ian Campbell wrote:
>> I've written up my thinking as a design doc below (it's pandoc and the
>> pdf version is also at
>> http://xenbits.xen.org/people/ianc/vits/draftA.pdf FWIW).
> 
> Here is a second draft based on the feedback so far. Also at
> http://xenbits.xen.org/people/ianc/vits/draftB.{pdf,html}.
> 
> So far I think we are mostly at the stage of gather open questions and
> enumerate the issues rather than actually beginning reaching any
> conclusion. That's OK (and part of the purpose).
> 
> Ian.
> -----
> 
> % Xen on ARM vITS Handling
> % Ian Campbell <ian.campbell@citrix.com>
> % Draft B
> 
> # Changelog
> 
> ## Since Draft A
> 
> * Added discussion of when/where command translation occurs.
> * Contention on scheduler lock, suggestion to use SOFTIRQ.
> * Handling of domain shutdown.
> * More detailed discussion of multiple vs single vits pros/cons.
> 
> # Introduction
> 
> ARM systems containing a GIC version 3 or later may contain one or
> more ITS logical blocks. An ITS is used to route Message Signalled
> interrupts from devices into an LPI injection on the processor.
> 
> The following summarises the ITS hardware design and serves as a set
> of assumptions for the vITS software design. (XXX it is entirely
> possible I've horribly misunderstood how this stuff fits
> together). For full details of the ITS see the "GIC Architecture
> Specification".

I read the spec again today and noticed that I was wrong on the maximum
size of the command queue. The field GITS_CBASER.Size encodes the number
of 4KB pages minus one. The field is 8 bits wide, which means the maximum
size is 2^8 * 4KB = 1MB.

Given that each command is 32 bytes, we would have a maximum of 32768
commands in the queue.

Although I don't think that changes the design, as processing such a
number of commands in one go can be very slow.

[..]

> ### Command translation
> 
> In order to virtualise the Command Queue each command must be
> translated (this is described in the GIC spec).
> 
> Translation of certain commands can be expensive (XXX citation
> needed).

The term "expensive" is subjective. I think we can end up to cheap
translation if we properly pre-allocate information (such as device,
LPIs...). We can have all the informations before the guest as boot or
during hotplug part. It wouldn't take more memory than it should use.

During command translation, we would just need to enable the device/LPIs.

The remaining expensive part would be the validation. I think we can
improve most of them of O(1) (such as collection checking) or O(log(n))
(such as device checking).
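
E.g. with GITS_TYPER.PTA = 0 the collection check reduces to a bounds
check. A sketch, assuming one virtual collection per vcpu (an assumption,
not something the design has fixed yet):

    /* Sketch: O(1) validation of a guest collection ID when collection
     * targets are expressed as processor numbers (GITS_TYPER.PTA = 0). */
    static bool vits_vcid_is_valid(const struct domain *d, uint16_t vcid)
    {
        return vcid < d->max_vcpus;
    }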

> Translation can be done in two places:
> 
> * During scheduling.
> * On write to `CWRITER`, into a per `vits_cq` queue which the
>   scheduler then propagates to the pits.
> 
> Doing the translate during scheduling means that potentially expensive
> operations may be accounted to `current`, who may have nothing to do
> with those operations (this is true whether it is IRQ context or
> SOFTIRQ context).
> 
> Doing the translate during `CWRITER` emulation accounts it to the
> right place, but introduces a potentially long synchronous operation
> which ties down a VCPU. Introducing batching here means we have
> essentially the same issue wrt when to replenish the translated queue
> as doing translate during scheduling.
> 
> Translate during `CWRITER` also has memory overheads. Unclear if they
> are at a problematic scale or not.
> 
> XXX need a solution for this.

Command translation can be improved. It may be good to add a section
explaining how translation of command foo can be done.

> ### pITS Scheduling
> 
> A pITS scheduling pass is attempted:
> 
> * On write to any virtual `CWRITER` iff that write results in there
>   being new outstanding requests for that vits;
> * On read from a virtual `CREADR` iff there are commands outstanding
>   on that vits;
> * On receipt of an interrupt notification arising from Xen's own use
>   of `INT`; (see discussion under Completion)
> * On any interrupt injection arising from a guests use of the `INT`
>   command; (XXX perhaps, see discussion under Completion)
> 
> This may result in lots of contention on the scheduler
> locking. Therefore we consider that in each case all which happens is
> triggering of a softirq which will be processed on return to guest,
> and just once even for multiple events.
> 
> Such deferal could be considered OK (XXX ???) for the `CREADR` case

deferral?

> because at worst the value read will be one cycle out of date. A guest
> which receives an `INT` notification might reasonably expect a
> subsequent read of `CREADR` to reflect that. However that should be
> covered by the softint processing which would occur on entry to the
> guest to inject the `INT`.
> 
> Each scheduling pass will:
> 
> * Read the physical `CREADR`;
> * For each command between `pits.last_creadr` and the new `CREADR`
>   value process completion of that command and update the
>   corresponding `vits_cq.creadr`.
> * Attempt to refill the pITS Command Queue (see below).

[..]

> ### Filling the pITS Command Queue.
> 
> Various algorithms could be used here. For now a simple proposal is
> to traverse the `pits.schedule_list` starting from where the last
> refill finished (i.e not from the top of the list each time).
> 
> If a `vits_cq` has no pending commands then it is removed from the
> list.
> 
> If a `vits_cq` has some pending commands then `min(pits-free-slots,
> vits-outstanding, VITS_BATCH_SIZE)` will be taken from the vITS
> command queue, translated and placed onto the pITS
> queue. `vits_cq.progress` will be updated to reflect this.
> 
> Each `vits_cq` is handled in turn in this way until the pITS Command
> Queue is full or there are no more outstanding commands.
> 
> There will likely need to be a data structure which shadows the pITS
> Command Queue slots with references to the `vits_cq` which has a
> command currently occupying that slot and corresponding the index into
> the virtual command queue, for use when completing a command.
> 
> `VITS_BATCH_SIZE` should be small, TBD say 4 or 8.
> 
> Possible simplification: If we arrange that no guest ever has multiple
> batches in flight (which can occur if we wrap around the list several
> times) then we may be able to simplify the book keeping
> required. However this may need some careful thought wrt fairness for
> guests submitting frequent small batches of commands vs those sending
> large batches.
> 
> XXX concern: Time spent filling the pITS queue could be significant if
> guests are allowed to fill the ring completely.

I guess you sent this design before the end of the discussion? I think
that limiting the number of batches/commands sent per pass would allow a
short pass.

[..]

> ### Multiple vITS instances in a single guest
> 
> As described above each vITS maps to exactly one pITS (while each pITS
> serves multiple vITSs).
> 
> It could be possible to arrange that a vITS can enqueue commands to
> different pITSs depending on e.g. the device id.
> 
> However each approach has issues.
> 
> In 1 vITS per pITS:
> 
> * Exposing on vITS per pITS means that we are exposing something about

s/on/one/

>   the underlying hardware to the guest.
> * Adds complexity to the guest layout, which is right now static. How
>   do you decide the number of vITS/root controller exposed:
>     * Hotplug is tricky
> * Toolstack needs greater knowledge of the host layout
> * Given that PCI passthrough doesn't allow migration, maybe we could
>   use the layout of the hardware.
> 
> In 1 vITS for all pITS:
> 
> * What to do with global commands? Inject to all pITS and then
>   synchronise on them all finishing.
> * Handling of out of order completion of commands queued with
>   different pITS, since the vITS must appear to complete in
>   order. Apart from the book keeping question it makes scheduling more
>   interesting:
>     * What if you have a pITS with slots available, and the guest command
>       queue contains commands which could go to the pITS, but behind ones
>       which are targetting another pITS which has no slots
>     * What if one pITS is very busy and another is mostly idle and a
>       guest submits one command to the busy one (contending with other
>       guest) followed by a load of commands targeting the idle one. Those
>       commands would be held up in this situation.
>     * Reasoning about fairness may be harder.
> 
> XXX need a solution/decision here.

> In addition the introduction of direct interrupt injection in version
> 4 GICs may imply a vITS per pITS. (Update: it seems not)

Other items to add: NUMA and I/O NUMA. I don't know much about it but I
think the first solution would be more suitable.

Regards,
-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 14:04                           ` Vijay Kilari
@ 2015-05-15 15:05                             ` Julien Grall
  2015-05-15 15:38                               ` Ian Campbell
  0 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-15 15:05 UTC (permalink / raw)
  To: Vijay Kilari, Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On 15/05/15 15:04, Vijay Kilari wrote:
> On Fri, May 15, 2015 at 7:14 PM, Julien Grall <julien.grall@citrix.com> wrote:
>> On 15/05/15 14:24, Ian Campbell wrote:
>>> On Fri, 2015-05-15 at 18:44 +0530, Vijay Kilari wrote:
>>>> On Fri, May 15, 2015 at 6:23 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
>>>>> On Fri, 2015-05-15 at 18:17 +0530, Vijay Kilari wrote:
>>>>>> On Fri, May 15, 2015 at 5:33 PM, Julien Grall <julien.grall@citrix.com> wrote:
>>>>>>> On 15/05/15 12:30, Ian Campbell wrote:
>>>>>>>>> Handling of a single vITS and multiple pITS can be made simple.
>>>>>>>>>
>>>>>>>>> All ITS commands except SYNC & INVALL have a device id which will
>>>>>>>>> help us to know to which pITS they should be sent.
>>>>>>>>>
>>>>>>>>> SYNC & INVALL can be dropped by Xen on guest request,
>>>>>>>>>  and Xen can append SYNC & INVALL wherever they are required.
>>>>>>>>> (Ex: the Linux driver adds SYNC for the commands that require it.)
>>>>>>>>> With this assumption, all ITS commands are mapped to a pITS
>>>>>>>>> and no synchronization across pITS is needed
>>>>>>>>
>>>>>>>> You've ignored the second bullet and its three sub-bullets, I think.
>>>>>>>
>>>>>>    Why can't we group the batch of commands based on the pITS they have
>>>>>> to be sent to?
>>>>>
>>>>> Are you suggesting that each batch we send should be synchronous? (i.e.
>>>>> end with SYNC+INT) That doesn't seem at all desirable.
>>>>
>>>> Not only at the end of a batch; a SYNC can be appended after every
>>>> command within the batch as needed.
>>>
>>> Could be, but something to avoid I think?
>>
>> That would slow down the ITS processing (a SYNC waits until the
>> previous commands have executed).
>>
>> Also, what about INVALL? Sending it every time would be horrible for
>> performance because it flushes the ITS cache.
> 
> INVALL is not required every time. It can be sent only as mentioned in the
> spec note, e.g. for MOVI:
> 
> Note: this command is expected to be used by software when it changed
> the re-configuration of an LPI in memory to ensure any cached copies of
> the old configuration are discarded.

INVALL is used when a large number of LPIs have been reconfigured. Sending
one per MOVI is not efficient at all and will slow down all the
interrupts for a few milliseconds. We need to use them with caution.

Usually a guest will send one for multiple MOVI commands.

Regards,

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 15:05                             ` Julien Grall
@ 2015-05-15 15:38                               ` Ian Campbell
  2015-05-15 17:31                                 ` Julien Grall
  0 siblings, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-15 15:38 UTC (permalink / raw)
  To: Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Julien Grall, Stefano Stabellini

On Fri, 2015-05-15 at 16:05 +0100, Julien Grall wrote:
> On 15/05/15 15:04, Vijay Kilari wrote:
> > On Fri, May 15, 2015 at 7:14 PM, Julien Grall <julien.grall@citrix.com> wrote:
> >> On 15/05/15 14:24, Ian Campbell wrote:
> >>> On Fri, 2015-05-15 at 18:44 +0530, Vijay Kilari wrote:
> >>>> On Fri, May 15, 2015 at 6:23 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> >>>>> On Fri, 2015-05-15 at 18:17 +0530, Vijay Kilari wrote:
> >>>>>> On Fri, May 15, 2015 at 5:33 PM, Julien Grall <julien.grall@citrix.com> wrote:
> >>>>>>> On 15/05/15 12:30, Ian Campbell wrote:
> >>>>>>>>> Handling of a single vITS and multiple pITS can be made simple.
> >>>>>>>>>
> >>>>>>>>> All ITS commands except SYNC & INVALL have a device id which will
> >>>>>>>>> help us to know to which pITS they should be sent.
> >>>>>>>>>
> >>>>>>>>> SYNC & INVALL can be dropped by Xen on guest request,
> >>>>>>>>>  and Xen can append SYNC & INVALL wherever they are required.
> >>>>>>>>> (Ex: the Linux driver adds SYNC for the commands that require it.)
> >>>>>>>>> With this assumption, all ITS commands are mapped to a pITS
> >>>>>>>>> and no synchronization across pITS is needed
> >>>>>>>>
> >>>>>>>> You've ignored the second bullet and its three sub-bullets, I think.
> >>>>>>>
> >>>>>>    Why can't we group the batch of commands based on the pITS they have
> >>>>>> to be sent to?
> >>>>>
> >>>>> Are you suggesting that each batch we send should be synchronous? (i.e.
> >>>>> end with SYNC+INT) That doesn't seem at all desirable.
> >>>>
> >>>> Not only at the end of a batch; a SYNC can be appended after every
> >>>> command within the batch as needed.
> >>>
> >>> Could be, but something to avoid I think?
> >>
> >> That would slow down the ITS processing (a SYNC waits until the
> >> previous commands have executed).
> >>
> >> Also, what about INVALL? Sending it every time would be horrible for
> >> performance because it flushes the ITS cache.
> > 
> > INVALL is not required every time. It can be sent only as mentioned in the
> > spec note, e.g. for MOVI:
> > 
> > Note: this command is expected to be used by software when it changed
> > the re-configuration of an LPI in memory to ensure any cached copies of
> > the old configuration are discarded.
> 
> INVALL is used when a large number of LPIs have been reconfigured. Sending
> one per MOVI is not efficient at all and will slow down all the
> interrupts for a few milliseconds. We need to use them with caution.
> 
> Usually a guest will send one for multiple MOVI commands.

We should be prepared for a guest which does nothing but send INVALL
commands (i.e. trying to DoS the host).

I mentioned earlier about maybe needing to track which pITSs a SYNC
goes to (based on what SYNCs have happened already and what commands the
guest has sent since).

Do we also need to track which LPIs a guest has fiddled with in order to
decide (perhaps via a threshold) whether to use INVALL vs a small number
of targeted INV commands?
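
I.e. something along these lines (sketch; the threshold, the bitmap and
all the helpers are hypothetical and would need measuring/tuning):

    /* Sketch: choose between per-LPI INV and a single INVALL based on
     * how many LPIs the guest has reconfigured since the last flush. */
    #define INVALL_THRESHOLD 32

    static void vits_flush_lpi_cfg(struct vits_cq *vcq, struct pits *pits)
    {
        if ( vcq->nr_dirty_lpis > INVALL_THRESHOLD )
            pits_enqueue_invall(pits);
        else
        {
            unsigned int lpi;

            for_each_set_bit ( lpi, vcq->dirty_lpis, vcq->nr_lpis )
                pits_enqueue_inv(pits, lpi);
        }

        bitmap_zero(vcq->dirty_lpis, vcq->nr_lpis);
        vcq->nr_dirty_lpis = 0;
    }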

Ian.


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 15:38                               ` Ian Campbell
@ 2015-05-15 17:31                                 ` Julien Grall
  2015-05-16  4:03                                   ` Vijay Kilari
  0 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-15 17:31 UTC (permalink / raw)
  To: Ian Campbell, Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On 15/05/15 16:38, Ian Campbell wrote:
> On Fri, 2015-05-15 at 16:05 +0100, Julien Grall wrote:
>> On 15/05/15 15:04, Vijay Kilari wrote:
>>> On Fri, May 15, 2015 at 7:14 PM, Julien Grall <julien.grall@citrix.com> wrote:
>>>> On 15/05/15 14:24, Ian Campbell wrote:
>>>>> On Fri, 2015-05-15 at 18:44 +0530, Vijay Kilari wrote:
>>>>>> On Fri, May 15, 2015 at 6:23 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
>>>>>>> On Fri, 2015-05-15 at 18:17 +0530, Vijay Kilari wrote:
>>>>>>>> On Fri, May 15, 2015 at 5:33 PM, Julien Grall <julien.grall@citrix.com> wrote:
>>>>>>>>> On 15/05/15 12:30, Ian Campbell wrote:
>>>>>>>>>>> Handling of a single vITS and multiple pITS can be made simple.
>>>>>>>>>>>
>>>>>>>>>>> All ITS commands except SYNC & INVALL have a device id which will
>>>>>>>>>>> help us to know to which pITS they should be sent.
>>>>>>>>>>>
>>>>>>>>>>> SYNC & INVALL can be dropped by Xen on guest request,
>>>>>>>>>>>  and Xen can append SYNC & INVALL wherever they are required.
>>>>>>>>>>> (Ex: the Linux driver adds SYNC for the commands that require it.)
>>>>>>>>>>> With this assumption, all ITS commands are mapped to a pITS
>>>>>>>>>>> and no synchronization across pITS is needed
>>>>>>>>>>
>>>>>>>>>> You've ignored the second bullet and its three sub-bullets, I think.
>>>>>>>>>
>>>>>>>>    Why can't we group the batch of commands based on the pITS they have
>>>>>>>> to be sent to?
>>>>>>>
>>>>>>> Are you suggesting that each batch we send should be synchronous? (i.e.
>>>>>>> end with SYNC+INT) That doesn't seem at all desirable.
>>>>>>
>>>>>> Not only at the end of a batch; a SYNC can be appended after every
>>>>>> command within the batch as needed.
>>>>>
>>>>> Could be, but something to avoid I think?
>>>>
>>>> That would slow down the ITS processing (a SYNC waits until the
>>>> previous commands have executed).
>>>>
>>>> Also, what about INVALL? Sending it every time would be horrible for
>>>> performance because it flushes the ITS cache.
>>>
>>> INVALL is not required every time. It can be sent only as mentioned in the
>>> spec note, e.g. for MOVI.

BTW, when you quote the spec, can you give the section number/version of
the spec? So far, I'm not able to find anything about the relationship
between MOVI and INVALL in my spec.

INV* commands are sent in order to ask the ITS to reload the
configuration tables (see 4.8.4 PRD03-GENC-010745 24.0):

"The effects of this caching are not visible to software except when
reconfiguring an LPI, in which case an explicit invalidate command must
be issued (e.g. an ITS INV command or a write to GICR_INVLPIR)
Note: this means hardware must manage its caches automatically when
moving interrupts"

So it looks to me like the INV* commands are only necessary when the
configuration tables are changed.

FWIW, Linux is using INVALL when a collection is mapped and INV when the
LPI configuration is changed. I don't see any INV* command after MOVI,
so it confirms what the spec says.

>>> Note: this command is expected to be used by software when it changed
>>> the re-configuration
>>> of an LPI in memory to ensure any cached copies of the old
>>> configuration are discarded.
>>
>> INVALL is used when a large number of LPIs have been reconfigured. Sending
>> one per MOVI is not efficient at all and will slow down all the
>> interrupts for a few milliseconds. We need to use them with caution.
>>
>> Usually a guest will send one for multiple MOVI commands.
> 
> We should be prepared for a guest which does nothing but send INVALL
> commands (i.e. trying to DoS the host).
> 
> I mentioned earlier that we may need to track which pITSs a SYNC
> goes to (based on what SYNCs have happened already and what commands the
> guest has sent since).
> 
> Do we also need to track which LPIs a guest has fiddled with in order to
> decide (perhaps via a threshold) whether to use a single INVALL vs a small
> number of targeted INVALLs?

I did some reading about the INV* commands (INV and INVALL). The
interesting section in GICv3 is 4.8.4 PRD03-GENC-010745 24.0.

They are only used to ensure the ITS re-reads the LPI configuration
table. I'm not talking about the pending table, as the spec (4.8.5) says
that it's maintained solely by a re-distributor. It's up to the
implementation to provide a mechanism to sync the memory (useful for
Power Management).

The LPI configuration table is used to enable/disable the LPI and set
the priority. Only the enable/disable bit needs to be replicated to the
hardware.

The pITS LPI configuration table is managed by Xen. Each guest will
provide its own LPI configuration table to the vITS.

The emulation of the INV* commands will depend on how we decide to
emulate the LPI configuration table.

Solution 1: Trap every access to the guest LPI configuration table

For every write access, when the vLPI is valid (i.e. associated with a
device/interrupt), Xen will toggle the enable bit in the hardware LPI
configuration table and send an INV. This requires being able to
translate the vLPI to a (device, ID) pair.

The guest's INVALL/INV commands could be ignored, simply incrementing
CREADR, because CREADR only indicates that a command has been executed,
not that it has fully completed. A SYNC would be required from the guest
in order to ensure completion.

Therefore we would need to take more care with SYNC, maybe by injecting
a SYNC when it's necessary.

Note that we would need Xen to send commands on behalf of the guest
(i.e. not as part of the guest's command queue).
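
As a rough illustration of solution 1, the trap handler could look
something like the sketch below (all types and helpers are hypothetical
stand-ins, not existing Xen code):

    #include <stdint.h>
    #include <stdbool.h>

    struct domain;
    struct pits;
    struct its_device { struct pits *pits; uint32_t devid; };

    /* Hypothetical helpers: translate a vLPI and drive the pITS. */
    bool vits_translate_lpi(struct domain *d, uint32_t vlpi,
                            struct its_device **dev, uint32_t *plpi);
    void pits_set_lpi_enable(struct pits *p, uint32_t plpi, bool enable);
    void pits_send_inv(struct pits *p, uint32_t devid, uint32_t plpi);

    #define LPI_PROP_ENABLE (1u << 0)

    /* Trap handler: the guest wrote 'new_prop' for virtual LPI 'vlpi'. */
    static void vits_handle_cfg_write(struct domain *d, uint32_t vlpi,
                                      uint8_t new_prop)
    {
        struct its_device *dev;
        uint32_t plpi;

        /* Only act on vLPIs mapped to a physical (device, ID) pair. */
        if ( !vits_translate_lpi(d, vlpi, &dev, &plpi) )
            return;

        /* Mirror only the enable bit into the pITS configuration table. */
        pits_set_lpi_enable(dev->pits, plpi, new_prop & LPI_PROP_ENABLE);

        /* Make the ITS reload it, outside the guest's command queue. */
        pits_send_inv(dev->pits, dev->devid, plpi);
    }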

Solution 2: Emulate "properly" the INV and INVALL commands

While emulating INV is easy (read the guest's configuration table entry
and replicate it into the pITS LPI configuration table), INVALL requires
reading most of the LPI configuration table.

AFAICT there is no limit on the size of that table (it is driven by
GITS_TYPER.IDbits).
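
To make the cost concern concrete, a naive INVALL emulation along those
lines might be the following sketch (all helpers are hypothetical):

    #include <stdint.h>

    struct domain;

    uint32_t vits_nr_lpis(struct domain *d);   /* up to 2^IDbits entries */
    uint32_t vits_lpi_collection(struct domain *d, uint32_t lpi);
    void vits_replicate_cfg(struct domain *d, uint32_t lpi);

    /* Walk the whole guest LPI configuration table and replicate every
     * entry of the given collection into the pITS table. The table size
     * is driven by GITS_TYPER.IDbits, so this loop can be very long. */
    static void vits_emulate_invall(struct domain *d, uint32_t collection)
    {
        uint32_t lpi;

        for ( lpi = 0; lpi < vits_nr_lpis(d); lpi++ )
        {
            if ( vits_lpi_collection(d, lpi) == collection )
                vits_replicate_cfg(d, lpi);
        }
    }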

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 17:31                                 ` Julien Grall
@ 2015-05-16  4:03                                   ` Vijay Kilari
  2015-05-16  8:49                                     ` Julien Grall
  0 siblings, 1 reply; 77+ messages in thread
From: Vijay Kilari @ 2015-05-16  4:03 UTC (permalink / raw)
  To: Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Fri, May 15, 2015 at 11:01 PM, Julien Grall <julien.grall@citrix.com> wrote:
> [...]
>
> BTW, when you quote the spec, can you give the section number/version of
> the spec? So far, I'm not able to find anything about the relation
> between MOVI and INVALL in my spec.
>

See 5.13.19 INVALL collection of PRD03-GENC-010745 20.0

> [...]
> Solution 1: Trap every access to the guest LPIs configuration table
>
   Trapping on the guest LPI configuration table is mandatory to
enable/disable an LPI in the LPI pending table. There is no ITS command
for this. In my RFC patches I have done this, where Xen calls the
irq_hw_controller's set_affinity, which will send an INVALL command.

> [...]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-16  4:03                                   ` Vijay Kilari
@ 2015-05-16  8:49                                     ` Julien Grall
  2015-05-19 11:38                                       ` Vijay Kilari
  0 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-16  8:49 UTC (permalink / raw)
  To: Vijay Kilari, Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

Hi,

On 16/05/2015 05:03, Vijay Kilari wrote:
> [...]
>> BTW, when you quote the spec, can you give the section number/version of
>> the spec? So far, I'm not able to find anything about the relation
>> between MOVI and INVALL in my spec.
>>
>
> See 5.13.19 INVALL collection of PRD03-GENC-010745 20.0

Still nothing about MOVI... How did you deduce it?

The spec only says:

"this command is expected to be used by software when it changed the 
re-configuration of an LPI in memory
to ensure any cached copies of the old configuration are discarded."

>> [...]
>> Solution 1: Trap every access to the guest LPIs configuration table
>>
>     Trapping on the guest LPI configuration table is mandatory to
> enable/disable an LPI in the LPI pending table. There is no ITS command
> for this. In my RFC patches I have done this, where Xen calls the
> irq_hw_controller's set_affinity, which will send an INVALL command.

Trapping is not mandatory. The ITS may not read the LPI configuration 
table until an INV/INVALL command has been sent.

The vITS is not forced to enable/disable the LPIs until one of these 
commands is sent.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-16  8:49                                     ` Julien Grall
@ 2015-05-19 11:38                                       ` Vijay Kilari
  2015-05-19 11:48                                         ` Ian Campbell
  2015-05-19 11:55                                         ` Ian Campbell
  0 siblings, 2 replies; 77+ messages in thread
From: Vijay Kilari @ 2015-05-19 11:38 UTC (permalink / raw)
  To: Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

Hi Ian,

   If we want to target 4.6, then I think we should draw a conclusion.

On Sat, May 16, 2015 at 2:19 PM, Julien Grall <julien.grall@citrix.com> wrote:
> [...]
>> See 5.13.19 INVALL collection of PRD03-GENC-010745 20.0
>
>
> Still nothing about MOVI... How did you deduce it?

 I have quoted it as an example where INVALL might be needed.

> [...]
>>> Solution 1: Trap every access to the guest LPIs configuration table
>>>
>>     Trapping on the guest LPI configuration table is mandatory to
>> enable/disable an LPI in the LPI pending table. There is no ITS command
>> for this. In my RFC patches I have done this, where Xen calls the
>> irq_hw_controller's set_affinity, which will send an INVALL command.
>
>
> Trapping is not mandatory. The ITS may not read the LPI configuration
> table until an INV/INVALL command has been sent.
>
> The vITS is not forced to enable/disable the LPIs until one of these
> commands is sent.

  If so, in case INV/INVALL is not sent, then the LPI configuration will
never be applied, which is slightly different from the behaviour
without Xen.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 11:38                                       ` Vijay Kilari
@ 2015-05-19 11:48                                         ` Ian Campbell
  2015-05-19 11:55                                         ` Ian Campbell
  1 sibling, 0 replies; 77+ messages in thread
From: Ian Campbell @ 2015-05-19 11:48 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Tue, 2015-05-19 at 17:08 +0530, Vijay Kilari wrote:
> Hi Ian,
> 
>    If we want to target 4.6, then I think we should draw a conclusion

I'm waiting for this subthread to reach some sort of conclusion before
posting another draft.

Ian.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 11:38                                       ` Vijay Kilari
  2015-05-19 11:48                                         ` Ian Campbell
@ 2015-05-19 11:55                                         ` Ian Campbell
  2015-05-19 12:10                                           ` Vijay Kilari
  1 sibling, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-19 11:55 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Tue, 2015-05-19 at 17:08 +0530, Vijay Kilari wrote:
> [...]
> > Trapping is not mandatory. The ITS may not read the LPI configuration
> > table until an INV/INVALL command has been sent.
> >
> > The vITS is not forced to enable/disable the LPIs until one of these
> > commands is sent.
> 
>   If so, in case INV/INVALL is not sent, then the LPI configuration will
> never be applied, which is slightly different from the behaviour
> without Xen.

If a guest issues (for example) a MOVI which is not followed by an
INV/INVALL on native then what would trigger the LPI configuration to be
applied by the h/w?

If a guest is required to send an INV/INVALL in order for some change to
take effect and it does not do so then it is buggy, isn't it?

IOW all Xen needs to do is to propagate any guest-initiated INV/INVALL
as/when it occurs in the command queue. I don't think we need to
fabricate an additional INV/INVALL while emulating a MOVI.

What am I missing?

Ian.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-15 14:55     ` Julien Grall
@ 2015-05-19 12:10       ` Ian Campbell
  2015-05-19 13:37         ` Julien Grall
  0 siblings, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-19 12:10 UTC (permalink / raw)
  To: Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Fri, 2015-05-15 at 15:55 +0100, Julien Grall wrote:
[...]
> > Translation of certain commands can be expensive (XXX citation
> > needed).
> 
> The term "expensive" is subjective. I think we can end up with cheap
> translation if we properly pre-allocate the information (such as devices,
> LPIs...). We can have all the information before the guest boots or
> during the hotplug path. It wouldn't take more memory than it should use.
> 
> During command translation, we would just need to enable the device/LPIs.
> 
> The remaining expensive part would be the validation. I think we can
> improve most of it to O(1) (such as collection checking) or O(log(n))
> (such as device checking).
[...]
> > XXX need a solution for this.
> 
> Command translation can be improved. It may be good to add a section
> explaining how the translation of command foo can be done.

I think that is covered by the spec; however, if there are operations
which form part of this which are potentially expensive, we should
outline in our design how they will be dealt with.

Perhaps you or Vijay could propose some additional text covering:

      * What the potentially expensive operations during a translation
        are.
      * How we are going to deal with those operations, including:
              * What data structure is used
              * What start of day setup is required to enable this
              * What operations are therefore required at translation
                time

> > ### Filling the pITS Command Queue.
> > 
> > Various algorithms could be used here. For now a simple proposal is
> > to traverse the `pits.schedule_list` starting from where the last
> > refill finished (i.e not from the top of the list each time).
> > 
> > If a `vits_cq` has no pending commands then it is removed from the
> > list.
> > 
> > If a `vits_cq` has some pending commands then `min(pits-free-slots,
> > vits-outstanding, VITS_BATCH_SIZE)` will be taken from the vITS
> > command queue, translated and placed onto the pITS
> > queue. `vits_cq.progress` will be updated to reflect this.
> > 
> > Each `vits_cq` is handled in turn in this way until the pITS Command
> > Queue is full or there are no more outstanding commands.
> > 
> > There will likely need to be a data structure which shadows the pITS
> > Command Queue slots with references to the `vits_cq` which has a
> > command currently occupying that slot and the corresponding index into
> > the virtual command queue, for use when completing a command.
> > 
> > `VITS_BATCH_SIZE` should be small, TBD say 4 or 8.
> > 
> > Possible simplification: If we arrange that no guest ever has multiple
> > batches in flight (which can occur if we wrap around the list several
> > times) then we may be able to simplify the book keeping
> > required. However this may need some careful thought wrt fairness for
> > guests submitting frequent small batches of commands vs those sending
> > large batches.
> > 
> > XXX concern: Time spent filling the pITS queue could be significant if
> > guests are allowed to fill the ring completely.
> 
> I guess you sent this design before the end of the discussion?

Probably.

>  I think
> that limiting the number of batches/commands sent per pass would keep
> each pass small.

I think we have a few choices:

      * Limit to one batch per vits at a time
      * Limit to some total number of batches per scheduling pass
      * Time bound the scheduling procedure

Do we have a preference?
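
Whichever we pick, the fill loop looks roughly the same; here is a
sketch combining the draft's `min(pits-free-slots, vits-outstanding,
VITS_BATCH_SIZE)` rule with a per-pass batch limit (the helpers and the
limit value are hypothetical, not existing code):

    #include <stdint.h>

    struct pits;
    struct vits_cq { unsigned int progress; };

    /* Hypothetical helpers over the structures described in the draft. */
    struct vits_cq *pits_schedule_next(struct pits *p);
    unsigned int pits_free_slots(struct pits *p);
    unsigned int vits_outstanding(struct vits_cq *v);
    void translate_and_enqueue(struct pits *p, struct vits_cq *v,
                               unsigned int n);

    #define VITS_BATCH_SIZE      8   /* from the draft: "say 4 or 8" */
    #define MAX_BATCHES_PER_PASS 4   /* hypothetical per-pass bound  */

    static void pits_fill_queue(struct pits *pits)
    {
        unsigned int batches;

        for ( batches = 0; batches < MAX_BATCHES_PER_PASS; batches++ )
        {
            struct vits_cq *v = pits_schedule_next(pits);
            unsigned int n;

            if ( !v )
                break;                  /* no more outstanding work */

            n = pits_free_slots(pits);
            if ( n > vits_outstanding(v) )
                n = vits_outstanding(v);
            if ( n > VITS_BATCH_SIZE )
                n = VITS_BATCH_SIZE;
            if ( n == 0 )
                break;                  /* pITS queue is full */

            translate_and_enqueue(pits, v, n);
            v->progress += n;
        }
    }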


> >   the underlying hardware to the guest.
> > * Adds complexity to the guest layout, which is right now static. How
> >   do you decide the number of vITS/root controller exposed:
> >     * Hotplug is tricky
> > * Toolstack needs greater knowledge of the host layout
> > * Given that PCI passthrough doesn't allow migration, maybe we could
> >   use the layout of the hardware.
> > 
> > In 1 vITS for all pITS:
> > 
> > * What to do with global commands? Inject to all pITS and then
> >   synchronise on them all finishing.
> > * Handling of out of order completion of commands queued with
> >   different pITS, since the vITS must appear to complete in
> >   order. Apart from the book keeping question it makes scheduling more
> >   interesting:
> >     * What if you have a pITS with slots available, and the guest command
> >       queue contains commands which could go to the pITS, but behind ones
> >       which are targeting another pITS which has no slots
> >     * What if one pITS is very busy and another is mostly idle and a
> >       guest submits one command to the busy one (contending with other
> >       guests) followed by a load of commands targeting the idle one. Those
> >       commands would be held up in this situation.
> >     * Reasoning about fairness may be harder.
> > 
> > XXX need a solution/decision here.
> 
> > In addition the introduction of direct interrupt injection in version
> > 4 GICs may imply a vITS per pITS. (Update: it seems not)
> 
> Other items to add: NUMA and I/O NUMA. I don't know much about it but I
> think the first solution would be more suitable.

first solution == ?

Ian.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 11:55                                         ` Ian Campbell
@ 2015-05-19 12:10                                           ` Vijay Kilari
  2015-05-19 12:19                                             ` Ian Campbell
  0 siblings, 1 reply; 77+ messages in thread
From: Vijay Kilari @ 2015-05-19 12:10 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Tue, May 19, 2015 at 5:25 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> [...]
>>
>>   If so, in case INV/INVALL is not sent, then the LPI configuration will
>> never be applied, which is slightly different from the behaviour
>> without Xen.
>
> If a guest issues (for example) a MOVI which is not followed by an
> INV/INVALL on native then what would trigger the LPI configuration to be
> applied by the h/w?
>
> If a guest is required to send an INV/INVALL in order for some change to
> take effect and it does not do so then it is buggy, isn't it?

agreed.

>
> IOW all Xen needs to do is to propagate any guest-initiated INV/INVALL
> as/when it occurs in the command queue. I don't think we need to
> fabricate an additional INV/INVALL while emulating a MOVI.
>
> What am I missing?

Back to the point:

INV has a device id, so it is not an issue.
INVALL does not have a device id, so we cannot know which pITS to send it to.
For that reason Xen is expected to insert INVALL at the proper
places, similar to SYNC, and ignore the guest's INV/INVALL.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-15 13:24               ` Julien Grall
@ 2015-05-19 12:14                 ` Ian Campbell
  2015-05-19 13:27                   ` Julien Grall
  0 siblings, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-19 12:14 UTC (permalink / raw)
  To: Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Fri, 2015-05-15 at 14:24 +0100, Julien Grall wrote:
> Hi Ian,
> 
> On 15/05/15 13:58, Ian Campbell wrote:
> >>>>>>> Therefore it is proposed that the restriction that a single vITS maps
> >>>>>>> to one pITS be retained. If a guest requires access to devices
> >>>>>>> associated with multiple pITSs then multiple vITS should be
> >>>>>>> configured.
> >>>>>>
> >>>>>> Having multiple vITS per domain brings other issues:
> >>>>>> 	- How do you know the number of ITS to describe in the device tree at boot?
> >>>>>
> >>>>> I'm not sure. I don't think 1 vs N is very different from the question
> >>>>> of 0 vs 1 though, somehow the tools need to know about the pITS setup.
> >>>>
> >>>> I don't see why the tools would need to know the pITS setup.
> >>>
> >>> Even with only a single vits the tools need to know if the system has 0,
> >>> 1, or more pits, to know whether to create a vits at all or not.
> >>
> >> In the 1 vITS solution no, it's only necessary to add a new gic define
> >> for the gic_version field in xen_arch_domainconfig.
> > 
> > Would we expose a vITS to guests on a host which has no pITS at all?
> 
> No, Xen will check if we can support vITS. See an example with my "GICv2
> on GICv3" series. Obviously, we don't allow vGICv3 on GICv2.

Did you mean to refer to "arm: Allow the user to specify the GIC
version" or some other part of that series?

I suppose you are proposing a new flag vits=yes|no passed as part of the
domain config which Xen can then update to indicate yes or no? Or is
there more to it than that? Could Xen not equally well expose nr_vits
back to the tools?

Ian.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 12:10                                           ` Vijay Kilari
@ 2015-05-19 12:19                                             ` Ian Campbell
  2015-05-19 12:48                                               ` Vijay Kilari
  0 siblings, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-19 12:19 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Tue, 2015-05-19 at 17:40 +0530, Vijay Kilari wrote:
> > If a guest issues (for example) a MOVI which is not followed by an
> > INV/INVALL on native then what would trigger the LPI configuration to be
> > applied by the h/w?
> >
> > If a guest is required to send an INV/INVALL in order for some change to
> > take effect and it does not do so then it is buggy, isn't it?
> 
> agreed.
> 
> >
> > IOW all Xen needs to do is to propagate any guest-initiated INV/INVALL
> > as/when it occurs in the command queue. I don't think we need to
> > fabricate an additional INV/INVALL while emulating a MOVI.
> >
> > What am I missing?
> 
> Back to the point:
> 
> INV has a device id, so it is not an issue.
> INVALL does not have a device id, so we cannot know which pITS to send it to.
> For that reason Xen is expected to insert INVALL at the proper
> places, similar to SYNC, and ignore the guest's INV/INVALL.

Why wouldn't Xen just insert an INVALL into all relevant pITS in
response to an INVALL from the guest?

If you are proposing something different then please be explicit about
what you mean by "proper places similar to SYNC", ideally by proposing
some new text which I can use in the document.

Ian.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 12:19                                             ` Ian Campbell
@ 2015-05-19 12:48                                               ` Vijay Kilari
  2015-05-19 13:12                                                 ` Ian Campbell
  2015-05-19 14:05                                                 ` Julien Grall
  0 siblings, 2 replies; 77+ messages in thread
From: Vijay Kilari @ 2015-05-19 12:48 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Tue, May 19, 2015 at 5:49 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Tue, 2015-05-19 at 17:40 +0530, Vijay Kilari wrote:
>> > If a guest issues (for example) a MOVI which is not followed by an
>> > INV/INVALL on native then what would trigger the LPI configuration to be
>> > applied by the h/w?
>> >
>> > If a guest is required to send an INV/INVALL in order for some change to
>> > take effect and it does not do so then it is buggy, isn't it?
>>
>> agreed.
>>
>> >
>> > IOW all Xen needs to do is to propagate any guest-initiated INV/INVALL
>> > as/when it occurs in the command queue. I don't think we need to
>> > fabricate an additional INV/INVALL while emulating a MOVI.
>> >
>> > What am I missing?
>>
>> Back to the point:
>>
>> INV has a device id, so it is not an issue.
>> INVALL does not have a device id, so we cannot know which pITS to send it to.
>> For that reason Xen is expected to insert INVALL at the proper
>> places, similar to SYNC, and ignore the guest's INV/INVALL.
>
> Why wouldn't Xen just insert an INVALL into all relevant pITS in
> response to an INVALL from the guest?

If INVALL is sent on all pITS, then we need to wait for all pITS to complete
the command before we update the CREADR of the vITS.

>
> If you are proposing something different then please be explicit about
> what you mean by "proper places similar to SYNC", ideally by proposing
> some new text which I can use in the document.

If the platform has more than 1 pITS, the ITS commands are mapped
from the vITS to a pITS using the device ID provided with the ITS command.

However, SYNC and INVALL do not have a device ID.
In that case there are two ways to handle them:
1) The guest's SYNC and INVALL will be sent to a pITS based on the
   guest's previous ITS commands
2) Xen will insert/append SYNC and INVALL to the guest's ITS commands
   wherever required and ignore the guest's SYNC and INVALL commands
   (see the sketch below)

IMO (2) would be better, as approach (1) might fail to handle the
scenario where the guest sends only SYNC & INVALL commands.
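
A rough sketch of approach (2) follows; the command layout, opcode
values and helpers are hypothetical stand-ins, not an existing
implementation:

    #include <stdint.h>

    typedef struct { uint64_t raw[4]; } its_cmd_t;  /* 32-byte command */

    struct pits;

    /* Hypothetical decode/route/enqueue helpers. */
    unsigned int its_cmd_opcode(const its_cmd_t *cmd);
    uint32_t its_cmd_devid(const its_cmd_t *cmd);
    int its_cmd_needs_sync(const its_cmd_t *cmd);
    struct pits *pits_for_devid(uint32_t devid);
    void pits_enqueue(struct pits *p, const its_cmd_t *cmd);
    void pits_enqueue_sync(struct pits *p);

    #define ITS_CMD_SYNC   0x05
    #define ITS_CMD_INVALL 0x0d

    /* Translate one guest command: route by device id, drop the
     * guest's SYNC/INVALL, and let Xen append its own where needed. */
    static void vits_translate_one(const its_cmd_t *cmd)
    {
        struct pits *pits;

        switch ( its_cmd_opcode(cmd) )
        {
        case ITS_CMD_SYNC:
        case ITS_CMD_INVALL:
            return;               /* ignored: Xen emits these itself */
        default:
            pits = pits_for_devid(its_cmd_devid(cmd));
            pits_enqueue(pits, cmd);
            /* Naive placement: a SYNC after each command whose effects
             * must be visible before the next one. */
            if ( its_cmd_needs_sync(cmd) )
                pits_enqueue_sync(pits);
        }
    }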

Regards
Vijay

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 12:48                                               ` Vijay Kilari
@ 2015-05-19 13:12                                                 ` Ian Campbell
  2015-05-19 14:05                                                 ` Julien Grall
  1 sibling, 0 replies; 77+ messages in thread
From: Ian Campbell @ 2015-05-19 13:12 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Tue, 2015-05-19 at 18:18 +0530, Vijay Kilari wrote:
> On Tue, May 19, 2015 at 5:49 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> > [...]
> >
> > Why wouldn't Xen just insert an INVALL into all relevant pITS in
> > response to an INVALL from the guest?
> 
> If INVALL is sent on all pITS, then we need to wait for all pITS to complete
> the command before we update the CREADR of the vITS.

Correct, but doesn't that already naturally fall out of any scheme which
maps one vits onto multiple pits? It's not specific to INVALL that we
need to consider the progress of all pITS before updating the vITS.
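
For instance, the book keeping could look something like this sketch:
only advance the vITS CREADR once every pITS involved in a guest
command has completed its part (all names are hypothetical):

    #include <stdint.h>

    struct vits { uint64_t creadr; };

    /* Shadow state for one guest command fanned out to several pITS. */
    struct vits_cmd_shadow {
        unsigned long pits_pending;   /* bit per pITS still working   */
        uint64_t      vcreadr_next;   /* CREADR value once all finish */
    };

    /* Called when pITS 'pits_id' completes its part of the command. */
    static void vits_pits_completed(struct vits *vits,
                                    struct vits_cmd_shadow *s,
                                    unsigned int pits_id)
    {
        s->pits_pending &= ~(1UL << pits_id);

        if ( s->pits_pending == 0 )
            vits->creadr = s->vcreadr_next;  /* now visible to guest */
    }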

> >
> > If you are proposing something different then please be explicit about
> > what you mean by "proper places similar to SYNC", ideally by proposing
> > some new text which I can use in the document.
> 
> If the platform has more than 1 pITS, the ITS commands are mapped
> from the vITS to a pITS using the device ID provided with the ITS command.
> 
> However, SYNC and INVALL do not have a device ID.
> In that case there are two ways to handle them:
> 1) The guest's SYNC and INVALL will be sent to a pITS based on the
>    guest's previous ITS commands
> 2) Xen will insert/append SYNC and INVALL to the guest's ITS commands
>    wherever required and ignore the guest's SYNC and INVALL commands
> 
> IMO (2) would be better, as approach (1) might fail to handle the
> scenario where the guest sends only SYNC & INVALL commands.

That depends on what "where-ever required" evaluates to. Please be
explicit here.

It sounds like this needs to be something which is handled as a new
chapter on translation, in a subsection dealing with non-device specific
command handling.

Ian.


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 12:14                 ` Ian Campbell
@ 2015-05-19 13:27                   ` Julien Grall
  2015-05-19 13:36                     ` Ian Campbell
  0 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-19 13:27 UTC (permalink / raw)
  To: Ian Campbell, Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

Hi Ian,

On 19/05/15 13:14, Ian Campbell wrote:
> On Fri, 2015-05-15 at 14:24 +0100, Julien Grall wrote:
>> Hi Ian,
>>
>> On 15/05/15 13:58, Ian Campbell wrote:
>>>>>>>>> Therefore it is proposed that the restriction that a single vITS maps
>>>>>>>>> to one pITS be retained. If a guest requires access to devices
>>>>>>>>> associated with multiple pITSs then multiple vITS should be
>>>>>>>>> configured.
>>>>>>>>
>>>>>>>> Having multiple vITS per domain brings other issues:
>>>>>>>> 	- How do you know the number of ITS to describe in the device tree at boot?
>>>>>>>
>>>>>>> I'm not sure. I don't think 1 vs N is very different from the question
>>>>>>> of 0 vs 1 though, somehow the tools need to know about the pITS setup.
>>>>>>
>>>>>> I don't see why the tools would require to know the pITS setup.
>>>>>
>>>>> Even with only a single vits the tools need to know if the system has 0,
>>>>> 1, or more pits, to know whether to create a vits at all or not.
>>>>
>>>> In the 1 vITS solution no, it's only necessary to add a new gic define
>>>> for the gic_version field in xen_arch_domainconfig.
>>>
>>> Would we expose a vITS to guests on a host which has no pITS at all?
>>
>> No, Xen will check if we can support vITS. See an example with my "GICv2
>> on GICv3" series. Obviously, we don't allow vGICv3 on GICv2.
> 
> Did you mean to refer to "arm: Allow the user to specify the GIC
> version" or some other part of that series?

Yes, I meant this patch.

> I suppose you are proposing a new flag vits=yes|no passed as part of the
> domain config which Xen can then update to indicate yes or no? Or is
> there more to it than that? Could Xen not equally well expose nr_vits
> back to the tools?

A new flag, or extending the gic_version parameter (gic_version = "v3-its").

With the multiple vITS we would have to retrieve the number of vITS.
Maybe by extending the xen_arch_domainconfig?

Regards,

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 13:27                   ` Julien Grall
@ 2015-05-19 13:36                     ` Ian Campbell
  2015-05-19 13:46                       ` Julien Grall
  2015-05-19 13:54                       ` Ian Campbell
  0 siblings, 2 replies; 77+ messages in thread
From: Ian Campbell @ 2015-05-19 13:36 UTC (permalink / raw)
  To: Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Tue, 2015-05-19 at 14:27 +0100, Julien Grall wrote:
> With the multiple vITS we would have to retrieve the number of vITS.
> Maybe by extending the xen_arch_domainconfig?

I'm sure we can find a way.

The important question is whether we want to go for an N:N vits:pits
mapping or 1:N.

So far I think we are leaning (slightly?) towards the 1:N model, if we
can come up with a satisfactory answer for what to do with global
commands.

Ian.


* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-19 12:10       ` Ian Campbell
@ 2015-05-19 13:37         ` Julien Grall
  2015-05-19 13:51           ` Ian Campbell
  0 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-19 13:37 UTC (permalink / raw)
  To: Ian Campbell, Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

Hi Ian,

On 19/05/15 13:10, Ian Campbell wrote:
> On Fri, 2015-05-15 at 15:55 +0100, Julien Grall wrote:
> [...]
>>> Translation of certain commands can be expensive (XXX citation
>>> needed).
>>
>> The term "expensive" is subjective. I think we can end up to cheap
>> translation if we properly pre-allocate information (such as device,
>> LPIs...). We can have all the informations before the guest as boot or
>> during hotplug part. It wouldn't take more memory than it should use.
>>
>> During command translation, we would just need to enable the device/LPIs.
>>
>> The remaining expensive part would be the validation. I think we can
>> improve most of them of O(1) (such as collection checking) or O(log(n))
>> (such as device checking).
> [...]
>>> XXX need a solution for this.
>>
>> Command translation can be improved. It may be good too add a section
>> explaining how translation of command foo can be done.
> 
> I think that is covered by the spec, however if there are operations
> which form part of this which are potentially expensive we should
> outline in our design how this will be dealt with.
> 
> Perhaps you or Vijay could propose some additional text covering:
>       * What the potentially expensive operations during a translation
>         are.
>       * How we are going to deal with those operations, including:
>               * What data structure is used
>               * What start of day setup is required to enable this
>               * What operations are therefore required at translation
>                 time

I don't have much time to work on a proposal. I would be happy if Vijay
could do it.

>>  I think
>> that limiting the number of batch/command sent per pass would allow a
>> small pass.
> 
> I think we have a few choices:
> 
>       * Limit to one batch per vits at a time
>       * Limit to some total number of batches per scheduling pass
>       * Time bound the scheduling procedure
>
> Do we have a preference?

Time bound may be difficult to implement. I think we would have to limit
batches per vITS (for code simplification) and the total number of
batches per scheduling pass at the same time.
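
A rough sketch of what such a doubly-bounded pass could look like
(illustrative names only, not existing code):

/*
 * Illustrative only: a scheduling pass bounded both per-vITS and in
 * total, as discussed above. All names are hypothetical.
 */
#define MAX_BATCHES_PER_VITS  1
#define MAX_BATCHES_PER_PASS  8

static void pits_scheduling_pass(struct pits *pits)
{
    unsigned int total = 0;
    struct vits *vits;

    list_for_each_entry ( vits, &pits->active_vits, sched_link )
    {
        unsigned int batches = 0;

        while ( batches < MAX_BATCHES_PER_VITS &&
                total < MAX_BATCHES_PER_PASS &&
                vits_has_pending_batch(vits) &&
                pits_queue_has_room(pits) )
        {
            vits_submit_one_batch(vits, pits);
            batches++;
            total++;
        }

        if ( total >= MAX_BATCHES_PER_PASS )
            break;
    }
}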

>>>   the underlying hardware to the guest.
>>> * Adds complexity to the guest layout, which is right now static. How
>>>   do you decide the number of vITS/root controller exposed:
>>>     * Hotplug is tricky
>>> * Toolstack needs greater knowledge of the host layout
>>> * Given that PCI passthrough doesn't allow migration, maybe we could
>>>   use the layout of the hardware.
>>>
>>> In 1 vITS for all pITS:
>>>
>>> * What to do with global commands? Inject to all pITS and then
>>>   synchronise on them all finishing.
>>> * Handling of out of order completion of commands queued with
>>>   different pITS, since the vITS must appear to complete in
>>>   order. Apart from the book keeping question it makes scheduling more
>>>   interesting:
>>>     * What if you have a pITS with slots available, and the guest command
>>>       queue contains commands which could go to the pITS, but behind ones
>>>       which are targetting another pITS which has no slots
>>>     * What if one pITS is very busy and another is mostly idle and a
>>>       guest submits one command to the busy one (contending with other
>>>       guest) followed by a load of commands targeting the idle one. Those
>>>       commands would be held up in this situation.
>>>     * Reasoning about fairness may be harder.
>>>
>>> XXX need a solution/decision here.
>>
>>> In addition the introduction of direct interrupt injection in version
>>> 4 GICs may imply a vITS per pITS. (Update: it seems not)
>>
>> Other items to add: NUMA and I/O NUMA. I don't know much about it but I
>> think the first solution would be more suitable.
> 
> first solution == ?

1 vITS per pITS.

Regards,

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 13:36                     ` Ian Campbell
@ 2015-05-19 13:46                       ` Julien Grall
  2015-05-19 13:54                       ` Ian Campbell
  1 sibling, 0 replies; 77+ messages in thread
From: Julien Grall @ 2015-05-19 13:46 UTC (permalink / raw)
  To: Ian Campbell, Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On 19/05/15 14:36, Ian Campbell wrote:
> On Tue, 2015-05-19 at 14:27 +0100, Julien Grall wrote:
>> With the multiple vITS we would have to retrieve the number of vITS.
>> Maybe by extending the xen_arch_domainconfig?
> 
> I'm sure we can find a way.
> 
> The important question is whether we want to go for a N:N vits:pits
> mapping or 1:N.
> 
> So far I think we are leaning (slightly?) towards the 1:N model, if we
> can come up with a satisfactory answer for what to do with global
> commands.

I was leaning toward the 1:1 model :).

I think the 1:N model will result in more complex scheduling and would
slow down the emulation in environments where each domain is using a
different pITS.

Also there is the question of I/O NUMA.

Regards,

-- 
Julien Grall


* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-19 13:37         ` Julien Grall
@ 2015-05-19 13:51           ` Ian Campbell
  2015-05-22 12:16             ` Vijay Kilari
  0 siblings, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-19 13:51 UTC (permalink / raw)
  To: Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Tue, 2015-05-19 at 14:37 +0100, Julien Grall wrote:
> Hi Ian,
> 
> On 19/05/15 13:10, Ian Campbell wrote:
> > On Fri, 2015-05-15 at 15:55 +0100, Julien Grall wrote:
> > [...]
> >>> Translation of certain commands can be expensive (XXX citation
> >>> needed).
> >>
> >> The term "expensive" is subjective. I think we can end up to cheap
> >> translation if we properly pre-allocate information (such as device,
> >> LPIs...). We can have all the informations before the guest as boot or
> >> during hotplug part. It wouldn't take more memory than it should use.
> >>
> >> During command translation, we would just need to enable the device/LPIs.
> >>
> >> The remaining expensive part would be the validation. I think we can
> >> improve most of them of O(1) (such as collection checking) or O(log(n))
> >> (such as device checking).
> > [...]
> >>> XXX need a solution for this.
> >>
> >> Command translation can be improved. It may be good too add a section
> >> explaining how translation of command foo can be done.
> > 
> > I think that is covered by the spec, however if there are operations
> > which form part of this which are potentially expensive we should
> > outline in our design how this will be dealt with.
> > 
> > Perhaps you or Vijay could propose some additional text covering:
> >       * What the potentially expensive operations during a translation
> >         are.
> >       * How we are going to deal with those operations, including:
> >               * What data structure is used
> >               * What start of day setup is required to enable this
> >               * What operations are therefore required at translation
> >                 time
> 
> I don't have much time to work on a proposal. I would be happy if Vijay
> could do it.

OK, Vijay, could you make a proposal here please.

> 
> >>  I think
> >> that limiting the number of batch/command sent per pass would allow a
> >> small pass.
> > 
> > I think we have a few choices:
> > 
> >       * Limit to one batch per vits at a time
> >       * Limit to some total number of batches per scheduling pass
> >       * Time bound the scheduling procedure
> >
> > Do we have a preference?
> 
> Time bound may be difficult to implement.

Yes, I don't think that one is realistic.

>  I think we would have to limit
> batches per vITS (for code simplification) and the total number of
> batches per scheduling pass at the same time.

OK.

> >>>   the underlying hardware to the guest.
> >>> * Adds complexity to the guest layout, which is right now static. How
> >>>   do you decide the number of vITS/root controller exposed:
> >>>     * Hotplug is tricky
> >>> * Toolstack needs greater knowledge of the host layout
> >>> * Given that PCI passthrough doesn't allow migration, maybe we could
> >>>   use the layout of the hardware.
> >>>
> >>> In 1 vITS for all pITS:
> >>>
> >>> * What to do with global commands? Inject to all pITS and then
> >>>   synchronise on them all finishing.
> >>> * Handling of out of order completion of commands queued with
> >>>   different pITS, since the vITS must appear to complete in
> >>>   order. Apart from the book keeping question it makes scheduling more
> >>>   interesting:
> >>>     * What if you have a pITS with slots available, and the guest command
> >>>       queue contains commands which could go to the pITS, but behind ones
> >>>       which are targetting another pITS which has no slots
> >>>     * What if one pITS is very busy and another is mostly idle and a
> >>>       guest submits one command to the busy one (contending with other
> >>>       guest) followed by a load of commands targeting the idle one. Those
> >>>       commands would be held up in this situation.
> >>>     * Reasoning about fairness may be harder.
> >>>
> >>> XXX need a solution/decision here.
> >>
> >>> In addition the introduction of direct interrupt injection in version
> >>> 4 GICs may imply a vITS per pITS. (Update: it seems not)
> >>
> >> Other items to add: NUMA and I/O NUMA. I don't know much about it but I
> >> think the first solution would be more suitable.
> > 
> > first solution == ?
> 
> 1 vITS per pITS.

Ah, yes.


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 13:36                     ` Ian Campbell
  2015-05-19 13:46                       ` Julien Grall
@ 2015-05-19 13:54                       ` Ian Campbell
  2015-05-19 14:04                         ` Vijay Kilari
  2015-05-19 14:06                         ` Julien Grall
  1 sibling, 2 replies; 77+ messages in thread
From: Ian Campbell @ 2015-05-19 13:54 UTC (permalink / raw)
  To: Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Tue, 2015-05-19 at 14:36 +0100, Ian Campbell wrote:
> On Tue, 2015-05-19 at 14:27 +0100, Julien Grall wrote:
> > With the multiple vITS we would have to retrieve the number of vITS.
> > Maybe by extending the xen_arch_domainconfig?
> 
> I'm sure we can find a way.
> 
> The important question is whether we want to go for a N:N vits:pits
> mapping or 1:N.
> 
> So far I think we are leaning (slightly?) towards the 1:N model, if we
> can come up with a satisfactory answer for what to do with global
> commands.

Actually, Julien just mentioned NUMA which I think is a strong argument
for the N:N model.

We need to make a choice here one way or another, since it has knock-on
effects on other parts, e.g. the handling of SYNC and INVALL etc.

Given that N:N seems likely to be simpler from the Xen side, and in any
case doesn't preclude us moving to a 1:N model (or even a 2:N model etc.)
in the future, how about we start with that?

If there is agreement in taking this direction then I will adjust the
relevant sections of the document to reflect this.

Ian.


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 13:54                       ` Ian Campbell
@ 2015-05-19 14:04                         ` Vijay Kilari
  2015-05-19 14:18                           ` Ian Campbell
  2015-05-19 14:06                         ` Julien Grall
  1 sibling, 1 reply; 77+ messages in thread
From: Vijay Kilari @ 2015-05-19 14:04 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Tue, May 19, 2015 at 7:24 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Tue, 2015-05-19 at 14:36 +0100, Ian Campbell wrote:
>> On Tue, 2015-05-19 at 14:27 +0100, Julien Grall wrote:
>> > With the multiple vITS we would have to retrieve the number of vITS.
>> > Maybe by extending the xen_arch_domainconfig?
>>
>> I'm sure we can find a way.
>>
>> The important question is whether we want to go for a N:N vits:pits
>> mapping or 1:N.
>>
>> So far I think we are leaning (slightly?) towards the 1:N model, if we
>> can come up with a satisfactory answer for what to do with global
>> commands.
>
> Actually, Julien just mentioned NUMA which I think is a strong argument
> for the N:N model.
>
> We need to make a choice here one way or another, since it has knock on
> effects on other parts, e.g the handling of SYNC and INVALL etc.
>
> Given that N:N seems likely to be simpler from the Xen side and in any
> case doesn't preclude us moving to a 1:N model (or even a 2:N model etc)
> in the future how about we start with that?
>
> If there is agreement in taking this direction then I will adjust the
> relevant sections of the document to reflect this.

Yes, this makes the Xen side simple. The most important points to discuss are:

1) How does Xen map vITS to pITS? its0 -> vits0?
2) When a PCI device is assigned to a DomU, how does the DomU choose
    the vITS to send commands to?  AFAIK, the BDF of the assigned device
    is different from the actual BDF in the DomU.


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 12:48                                               ` Vijay Kilari
  2015-05-19 13:12                                                 ` Ian Campbell
@ 2015-05-19 14:05                                                 ` Julien Grall
  2015-05-19 14:48                                                   ` Ian Campbell
  1 sibling, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-19 14:05 UTC (permalink / raw)
  To: Vijay Kilari, Ian Campbell
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On 19/05/15 13:48, Vijay Kilari wrote:
> On Tue, May 19, 2015 at 5:49 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
>> On Tue, 2015-05-19 at 17:40 +0530, Vijay Kilari wrote:
>>>> If a guest issues (for example) a MOVI which is not followed by an
>>>> INV/INVALL on native then what would trigger the LPI configuration to be
>>>> applied by the h/w?
>>>>
>>>> If a guest is required to send an INV/INVALL in order for some change to
>>>> take effect and it does not do so then it is buggy, isn't it?
>>>
>>> agreed.
>>>
>>>>
>>>> IOW all Xen needs to do is to propagate any guest initiated INV/INVALL
>>>> as/when it occurs in the command queue. I don't think we need to
>>>> fabricate an additional INV/INVALL while emulating a MOVI.
>>>>
>>>> What am I missing?
>>>
>>> back to point:
>>>
>>> INV has device id so not an issue.
>>> INVALL does not have device id to know pITS to send.
>>> For that reason Xen is expected to insert INVALL at proper
>>> places similar to SYNC and ignore INV/INVALL of guest.
>>
>> Why wouldn't Xen just insert an INVALL in to all relevant pITS in
>> response to an INVALL from the guest?
> 
> If INVALL is sent on all pITS, then we need to wait for all pITS to complete
> the command before we update CREADR of vITS.
> 
>>
>> If you are proposing something different then please be explicit by what
>> you mean by "proper places similar to SYNC". Ideally by proposing some
>> new text which I can use in the document.
> 
> If the platform has more than one pITS, the ITS commands are mapped
> from vITS to pITS using the device ID provided with the ITS command.
> 
> However, SYNC and INVALL do not carry a device ID.
> In that case there are two ways to handle them:
> 1) SYNC and INVALL of the guest are sent to the pITS targeted by the
>    guest's previous ITS commands
> 2) Xen inserts/appends SYNC and INVALL to the guest's ITS commands
>    wherever required and ignores the guest's own SYNC and INVALL
>    commands
> 
> IMO (2) would be better, as approach (1) might fail to handle a
> scenario wherein the guest sends only SYNC & INVALL commands.

When the guest sends a SYNC, it expects all the commands to be completed.
If you send SYNC only when you think it's required, we will end up with
unexpected behavior.

Now, for INVALL, as said in a previous mail, it's never required after an
instruction. It's used to ask the ITS to invalidate its cache of the LPI
configuration.

Software would be buggy if no INV/INVALL is sent after changing the LPI
configuration table.

As suggested in a previous mail, I think we can get rid of sending
INV/INVALL commands to the pITS by trapping the LPI configuration table:

For every write access, when the vLPI is valid (i.e. associated with a
device/interrupt), Xen will toggle the enable bit in the hardware LPI
configuration table, send an INV * and sync its internal state. This
requires being able to translate the vLPI to a (device, ID).

INVALL/INV commands could be ignored, directly incrementing CREADR (with
some care), because they only ensure that the command has been executed,
not fully completed. A SYNC would be required from the guest in order to
ensure completion.

Therefore we would need more care for the SYNC. Maybe by injecting a
SYNC when it's necessary.

Note that we would need Xen to send commands on behalf of the guest (i.e.
not as part of the guest's command queue).

With this solution, it would be possible to have a small window of time
where the pITS doesn't use the correct configuration (i.e. the
interrupt is not yet enabled/disabled). Xen is able to cope with that
and will queue the interrupt to the guest.
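
A sketch of the trap handler (all helpers, fields and constants below
are hypothetical; the vLPI to (device, ID) translation and the INV
emission are only stubbed):

/*
 * Sketch of the trap on the guest's LPI configuration table. One byte
 * of the table describes one LPI, starting at LPI 8192.
 */
static int vits_cfg_table_write(struct domain *d, paddr_t offset,
                                uint8_t val)
{
    uint32_t vlpi = 8192 + offset;
    struct vlpi_info *info = vlpi_to_device_id(d, vlpi);

    if ( !info )        /* vLPI not associated with a device yet */
        return 0;

    /* Mirror the enable bit into the hardware configuration table. */
    if ( val & LPI_PROP_ENABLED )
        lpi_cfg_table[info->plpi - 8192] |= LPI_PROP_ENABLED;
    else
        lpi_cfg_table[info->plpi - 8192] &= ~LPI_PROP_ENABLED;

    /* INV on behalf of the guest, outside its command queue. */
    its_send_inv(info->its, info->devid, info->eventid);

    return 0;
}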

Regards,

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 13:54                       ` Ian Campbell
  2015-05-19 14:04                         ` Vijay Kilari
@ 2015-05-19 14:06                         ` Julien Grall
  1 sibling, 0 replies; 77+ messages in thread
From: Julien Grall @ 2015-05-19 14:06 UTC (permalink / raw)
  To: Ian Campbell, Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On 19/05/15 14:54, Ian Campbell wrote:
> On Tue, 2015-05-19 at 14:36 +0100, Ian Campbell wrote:
>> On Tue, 2015-05-19 at 14:27 +0100, Julien Grall wrote:
>>> With the multiple vITS we would have to retrieve the number of vITS.
>>> Maybe by extending the xen_arch_domainconfig?
>>
>> I'm sure we can find a way.
>>
>> The important question is whether we want to go for a N:N vits:pits
>> mapping or 1:N.
>>
>> So far I think we are leaning (slightly?) towards the 1:N model, if we
>> can come up with a satisfactory answer for what to do with global
>> commands.
> 
> Actually, Julien just mentioned NUMA which I think is a strong argument
> for the N:N model.
> 
> We need to make a choice here one way or another, since it has knock on
> effects on other parts, e.g the handling of SYNC and INVALL etc.
> 
> Given that N:N seems likely to be simpler from the Xen side and in any
> case doesn't preclude us moving to a 1:N model (or even a 2:N model etc)
> in the future how about we start with that?

+1.

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 14:04                         ` Vijay Kilari
@ 2015-05-19 14:18                           ` Ian Campbell
  2015-05-21 12:37                             ` Manish Jaggi
  0 siblings, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-19 14:18 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Tue, 2015-05-19 at 19:34 +0530, Vijay Kilari wrote:
> On Tue, May 19, 2015 at 7:24 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> > On Tue, 2015-05-19 at 14:36 +0100, Ian Campbell wrote:
> >> On Tue, 2015-05-19 at 14:27 +0100, Julien Grall wrote:
> >> > With the multiple vITS we would have to retrieve the number of vITS.
> >> > Maybe by extending the xen_arch_domainconfig?
> >>
> >> I'm sure we can find a way.
> >>
> >> The important question is whether we want to go for a N:N vits:pits
> >> mapping or 1:N.
> >>
> >> So far I think we are leaning (slightly?) towards the 1:N model, if we
> >> can come up with a satisfactory answer for what to do with global
> >> commands.
> >
> > Actually, Julien just mentioned NUMA which I think is a strong argument
> > for the N:N model.
> >
> > We need to make a choice here one way or another, since it has knock on
> > effects on other parts, e.g the handling of SYNC and INVALL etc.
> >
> > Given that N:N seems likely to be simpler from the Xen side and in any
> > case doesn't preclude us moving to a 1:N model (or even a 2:N model etc)
> > in the future how about we start with that?
> >
> > If there is agreement in taking this direction then I will adjust the
> > relevant sections of the document to reflect this.
> 
> Yes, this makes the Xen side simple. The most important points to discuss are:
> 
> 1) How does Xen map vITS to pITS? its0 -> vits0?

The choices are basically either Xen chooses and the tools get told (or
"Just Know" the result), or the tools choose and set up the mapping in
Xen via hypercalls.

> 2) When a PCI device is assigned to a DomU, how does the DomU choose
>     the vITS to send commands to?  AFAIK, the BDF of the assigned device
>     is different from the actual BDF in the DomU.

AIUI this is described in the firmware tables.

e.g. in DT via the msi-parent phandle on the PCI root complex or
individual device.

Is there an assumption here that a single PCI root bridge is associated
with a single ITS block? Or can different devices on a PCI bus use
different ITS blocks?

Ian.


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 14:05                                                 ` Julien Grall
@ 2015-05-19 14:48                                                   ` Ian Campbell
  2015-05-19 15:44                                                     ` Julien Grall
  0 siblings, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-19 14:48 UTC (permalink / raw)
  To: Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Tue, 2015-05-19 at 15:05 +0100, Julien Grall wrote:
> On 19/05/15 13:48, Vijay Kilari wrote:
> > On Tue, May 19, 2015 at 5:49 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> >> On Tue, 2015-05-19 at 17:40 +0530, Vijay Kilari wrote:
> >>>> If a guest issues (for example) a MOVI which is not followed by an
> >>>> INV/INVALL on native then what would trigger the LPI configuration to be
> >>>> applied by the h/w?
> >>>>
> >>>> If a guest is required to send an INV/INVALL in order for some change to
> >>>> take effect and it does not do so then it is buggy, isn't it?
> >>>
> >>> agreed.
> >>>
> >>>>
> >>>> IOW all Xen needs to do is to propagate any guest initiated INV/INVALL
> >>>> as/when it occurs in the command queue. I don't think we need to
> >>>> fabricate an additional INV/INVALL while emulating a MOVI.
> >>>>
> >>>> What am I missing?
> >>>
> >>> back to point:
> >>>
> >>> INV has device id so not an issue.
> >>> INVALL does not have device id to know pITS to send.
> >>> For that reason Xen is expected to insert INVALL at proper
> >>> places similar to SYNC and ignore INV/INVALL of guest.
> >>
> >> Why wouldn't Xen just insert an INVALL in to all relevant pITS in
> >> response to an INVALL from the guest?
> > 
> > If INVALL is sent on all pITS, then we need to wait for all pITS to complete
> > the command before we update CREADR of vITS.
> > 
> >>
> >> If you are proposing something different then please be explicit by what
> >> you mean by "proper places similar to SYNC". Ideally by proposing some
> >> new text which I can use in the document.
> > 
> > If the platform has more than 1 pITS, The ITS commands are mapped
> > from vITS to pITS using device ID provided with ITS command.
> > 
> > However SYNC and INVALL does not have device ID.
> > In such case there could be two ways to handle
> > 1) SYNC and INVALL of guest will be sent to pITS based on previous ITS commands
> >     of guest
> > 2) Xen will insert/append SYNC and INVALL to guest ITS commands
> > where-ever required and ignore guest
> >    SYNC and INVALL commands
> > 
> > IMO (2) would be better as approach (1) might fail to handle
> > scenario where-in guest is sending only SYNC & INVALL commands.
> 
> When the guest sends a SYNC, it expects all the commands to be completed.
> If you send SYNC only when you think it's required, we will end up with
> unexpected behavior.
> 
> Now, for INVALL, as said in a previous mail, it's never required after an
> instruction. It's used to ask the ITS to invalidate its cache of the LPI
> configuration.
> 
> Software would be buggy if no INV/INVALL is sent after changing the LPI
> configuration table.

Specifically _guest_ software.

AIUI the ITS is not required to reread the LPI cfg table unless an
INV/INVALL is issued, but it is allowed to do so if it wants, i.e. it
could pick up the config change at any point after the write to the cfg
table. Is that correct?

If so, then as long as it cannot blow up in Xen's face (i.e. an interrupt
storm) I think between a write to the LPI config table and the next
associated INV/INVALL we are entitled either to continue using the old
config until the INV/INVALL, to immediately enact the change, or anything
in the middle. I think this gives a fair bit of flexibility.

You've proposed something at the "immediately enact" end of the
spectrum.

> As suggested in a previous mail, I think we can get rid of sending
> INV/INVALL commands to the pITS by trapping the LPI configuration table:

The motivation here is simply to avoid the potential negative impact on
the system of a guest which fills its command queue with INVALL
commands?

I think we don't especially care about INV since they are targeted. We
care about INVALL because they are global. INV handling comes along for
the ride though.

> For every write access, when the vLPI is valid (i.e. associated with a
> device/interrupt), Xen will toggle the enable bit in the hardware LPI
> configuration table, send an INV * and sync its internal state. This
> requires being able to translate the vLPI to a (device, ID).

"INV *"? You don't mean INVALL I think, but rather INV of the specific
device?

One possible downside is that you will convert this guest vits
interaction:
        for all LPIs
                enable LPI
        INVALL

Into this pits interaction:
        for all LPIs
                enable LPI
                INV LPI

Also, sequences of events which toggle things back and forth before
invalidating are similarly made more synchronous. (Such sequences seem
dumb to me, but kernel side abstractions sometimes lead to such things).

> INVALL/INV commands could be ignored, directly incrementing CREADR (with
> some care), because they only ensure that the command has been executed,
> not fully completed. A SYNC would be required from the guest in order to
> ensure completion.
> 
> Therefore we would need more care for the SYNC. Maybe by injecting a
> SYNC when it's necessary.
> 
> Note that we would need Xen to send commands on behalf of the guest (i.e.
> not as part of the guest's command queue).

A guest may do this:
        Enqueue command A
        Enqueue command B
        Change LPI1 cfg table
        Change LPI2 cfg table
        Enqueue command C
        Enqueue command D
        Enqueue INV LPI2
        Enqueue INV LPI1

With your change this would end up going to the PITS as:
        Enqueue command A
        Enqueue command B
        Change LPI1 cfg table
        Enqueue INV LPI1
        Change LPI2 cfg table
        Enqueue INV LPI2
        Enqueue command C
        Enqueue command D

Note that the INV's have been reordered WRT command C and D as well as
each other. Are there sequences of commands where this may make a
semantic difference?

What if command C is a SYNC for example?

> With this solution, it would be possible to have a small window of time
> where the pITS doesn't use the correct configuration (i.e. the
> interrupt is not yet enabled/disabled). Xen is able to cope with that
> and will queue the interrupt to the guest.

I think it is inherent in the h/w design that an LPI may still be
delivered after the cfg table has changed or even the INV enqueued; it
is only guaranteed to take effect with a sync following the INV.

I had in mind a lazier scheme which I'll mention for completeness not
because I necessarily think it is better.

For each vits we maintain a bit map which marks LPI cfg table entries as
dirty. Possibly a count of dirty entries too.

On trap of cfg table write we propagate the change to the physical table
and set the corresponding dirty bit (and count++ if we are doing that)

On INV we insert the corresponding INV to the PITS iff
test_and_clear(dirty, LPI) and count--. If the bit is not set then we
just eat the INV.

On INVALL we insert INVALL iff there are bits set in the bitmap (or use
count != 0 instead, if we chose to maintain that) and clear the
bitmap. If no bits are set in the bitmap then we eat the INVALL.

Extension: If we are tracking count then we may choose to switch INVALL
into one or more INV's up to some threshold of dirtiness.
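
In sketch form (the vits fields and pits_enqueue_* helpers are
hypothetical; the bit helpers are the usual ones):

/*
 * Sketch of the lazy scheme above: a per-vITS dirty bitmap of LPI cfg
 * entries plus a count, consumed by INV/INVALL.
 */
static void vits_cfg_write(struct vits *vits, uint32_t vlpi, uint8_t val)
{
    propagate_cfg_to_hw(vits, vlpi, val);       /* physical table */
    if ( !test_and_set_bit(vlpi, vits->cfg_dirty) )
        vits->cfg_dirty_count++;
}

static void vits_inv(struct vits *vits, uint32_t vlpi)
{
    if ( test_and_clear_bit(vlpi, vits->cfg_dirty) )
    {
        vits->cfg_dirty_count--;
        pits_enqueue_inv(vits, vlpi);           /* propagate to pITS */
    }
    /* else: nothing dirty for this LPI, eat the INV. */
}

static void vits_invall(struct vits *vits)
{
    if ( vits->cfg_dirty_count == 0 )
        return;                                 /* eat the INVALL */

    bitmap_zero(vits->cfg_dirty, vits->nr_lpis);
    vits->cfg_dirty_count = 0;
    pits_enqueue_invall(vits);
}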

I've been trying to think of ways of extending this to reduce the
number/impact of guest SYNC. Other than tracking whether there have been
0 or !0 commands since the last SYNC (and squashing the extras) I
haven't thought of a cunning scheme.

Ian.


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 14:48                                                   ` Ian Campbell
@ 2015-05-19 15:44                                                     ` Julien Grall
  0 siblings, 0 replies; 77+ messages in thread
From: Julien Grall @ 2015-05-19 15:44 UTC (permalink / raw)
  To: Ian Campbell, Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On 19/05/15 15:48, Ian Campbell wrote:
>> Software would be buggy if no INV/INVALL is sent after changing the LPI
>> configuration table.
> 
> Specifically _guest_ software.
> 
> AIUI the ITS is not required to reread the LPI cfg table unless an
> INV/INVALL is issued, but it is allowed to do so if it wants, i.e. it
> could pick up the config change at any point after the write to the cfg
> table. Is that correct?

Yes.

> If so, then as long as it cannot blow up in Xen's face (i.e. an interrupt
> storm) I think between a write to the LPI config table and the next
> associated INV/INVALL we are entitled either to continue using the old
> config until the INV/INVALL, to immediately enact the change, or anything
> in the middle. I think this gives a fair bit of flexibility.

The interrupt is deprivileged by Xen and EOIed by the guest. I don't think
it's possible to produce an interrupt storm.

> You've proposed something at the "immediately enact" end of the
> spectrum.

Yes, it's one suggestion among others.

>> As suggested in a previous mail, I think we can get rid of sending
>> INV/INVALL commands to the pITS by trapping the LPI configuration table:
> 
> The motivation here is simply to avoid the potential negative impact on
> the system of a guest which fills its command queue with INVALL
> commands?

Right.

> I think we don't especially care about INV since they are targeted. We
> care about INVALL because they are global. INV handling comes along for
> the ride though.
> 
>> For every write access, when the vLPI is valid (i.e. associated with a
>> device/interrupt), Xen will toggle the enable bit in the hardware LPI
>> configuration table, send an INV * and sync its internal state. This
>> requires being able to translate the vLPI to a (device, ID).
> 
> "INV *"? You don't mean INVALL I think, but rather INV of the specific
> device?

Yes, I mean the INV command.

> 
> One possible downside is that you will convert this guest vits
> interaction:
>         for all LPIs
>                 enable LPI
>         INVALL
> 
> Into this pits interaction:
>         for all LPIs
>                 enable LPI
>                 INV LPI
> 
> Also, sequences of events which toggle things back and forth before
> invalidating are similarly made more synchronous. (Such sequences seem
> dumb to me, but kernel side abstractions sometimes lead to such things).

Correct, this will result in sending many more commands to the ITS.

>> INVALL/INV commands could be ignored, directly incrementing CREADR (with
>> some care), because they only ensure that the command has been executed,
>> not fully completed. A SYNC would be required from the guest in order to
>> ensure completion.
>>
>> Therefore we would need more care for the SYNC. Maybe by injecting a
>> SYNC when it's necessary.
>>
>> Note that we would need Xen to send commands on behalf of the guest (i.e.
>> not as part of the guest's command queue).
> 
> A guest may do this:
>         Enqueue command A
>         Enqueue command B
>         Change LPI1 cfg table
>         Change LPI2 cfg table
>         Enqueue command C
>         Enqueue command D
>         Enqueue INV LPI2
>         Enqueue INV LPI1
> 
> With your change this would end up going to the PITS as:
>         Enqueue command A
>         Enqueue command B
>         Change LPI1 cfg table
>         Enqueue INV LPI1
>         Change LPI2 cfg table
>         Enqueue INV LPI2
>         Enqueue command C
>         Enqueue command D
> 
> Note that the INV's have been reordered WRT command C and D as well as
> each other. Are there sequences of commands where this may make a
> semantic difference?

AFAICT, the commands' semantics don't depend on the state of the
LPI configuration.

> What if command C is a SYNC for example?

That would not be a problem. As soon as the OS writes into the LPI
configuration, it can expect that the ITS will pick up the change at any time.

>> With this solution, it would be possible to have a small window of time
>> where the pITS doesn't use the correct configuration (i.e. the
>> interrupt is not yet enabled/disabled). Xen is able to cope with that
>> and will queue the interrupt to the guest.
> 
> I think it is inherent in the h/w design that an LPI may still be
> delivered after the cfg table has changed or even the INV enqueued; it
> is only guaranteed to take effect with a sync following the INV.

Right.

> I had in mind a lazier scheme which I'll mention for completeness not
> because I necessarily think it is better.

I wasn't expecting to have a correct solution from the beginning ;). It
was more a first step towards a better one such as yours.

> For each vits we maintain a bit map which marks LPI cfg table entries as
> dirty. Possibly a count of dirty entries too.
> 
> On trap of cfg table write we propagate the change to the physical table
> and set the corresponding dirty bit (and count++ if we are doing that)
> 
> On INV we insert the corresponding INV to the PITS iff
> test_and_clear(dirty, LPI) and count--. If the bit is not set then we
> just eat the INV.

The bitmap is global to the host, right?

If not, we may end up sending multiple INVALLs from different domains in
a row. Although I don't know if we could improve that.

> On INVALL we insert INVALL iff there are bits set in the bitmap (or use
> count != 0 instead, if we chose to maintain that) and clear the
> bitmap. If no bits are set in the bitmap then we eat the INVALL.
> 
> Extension: If we are tracking count then we may choose to switch INVALL
> into one or more INV's up to some threshold of dirtiness.

It may be more expensive to do the conversion LPIs -> (Dev, ID) than
doing the INVALL.

> I've been trying to think of ways of extending this to reduce the
> number/impact of guest SYNC. Other than tracking whether there have been
> 0 or !0 commands since the last SYNC (and squashing the extras) I
> haven't thought of a cunning scheme.

Also, this discussion reminds me of another point. So far we've assumed
that Xen doesn't send a single command itself. However, when a guest is
destroyed we may need to wipe a part of the LPI configuration table and
therefore send an INVALL.

Regards,

-- 
Julien Grall


* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-19 14:18                           ` Ian Campbell
@ 2015-05-21 12:37                             ` Manish Jaggi
  2015-05-26 13:04                               ` Ian Campbell
  0 siblings, 1 reply; 77+ messages in thread
From: Manish Jaggi @ 2015-05-21 12:37 UTC (permalink / raw)
  To: Ian Campbell, Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini



On Tuesday 19 May 2015 07:18 AM, Ian Campbell wrote:
> On Tue, 2015-05-19 at 19:34 +0530, Vijay Kilari wrote:
>> On Tue, May 19, 2015 at 7:24 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
>>> On Tue, 2015-05-19 at 14:36 +0100, Ian Campbell wrote:
>>>> On Tue, 2015-05-19 at 14:27 +0100, Julien Grall wrote:
>>>>> With the multiple vITS we would have to retrieve the number of vITS.
>>>>> Maybe by extending the xen_arch_domainconfig?
>>>> I'm sure we can find a way.
>>>>
>>>> The important question is whether we want to go for a N:N vits:pits
>>>> mapping or 1:N.
>>>>
>>>> So far I think we are leaning (slightly?) towards the 1:N model, if we
>>>> can come up with a satisfactory answer for what to do with global
>>>> commands.
>>> Actually, Julien just mentioned NUMA which I think is a strong argument
>>> for the N:N model.
>>>
>>> We need to make a choice here one way or another, since it has knock on
>>> effects on other parts, e.g the handling of SYNC and INVALL etc.
>>>
>>> Given that N:N seems likely to be simpler from the Xen side and in any
>>> case doesn't preclude us moving to a 1:N model (or even a 2:N model etc)
>>> in the future how about we start with that?
>>>
>>> If there is agreement in taking this direction then I will adjust the
>>> relevant sections of the document to reflect this.
>> Yes, this make Xen side simple. Most important point to discuss is
>>
>> 1) How Xen maps vITS to pITS. its0 -> vits0?
> The choices are basically either Xen chooses and the tools get told (or
> "Just Know" the result), or the tools choose and setup the mapping in
> Xen via hypercalls.
>
This could be one possible flow:
-1- Xen code parses the pci node and creates a pci_hostbridge structure
which stores the device_tree ptr
(using this pointer the msi-parent (i.e. the respective its) can be retrieved)
-2- dom0 invokes a hypercall to register the pci_hostbridge (seg_no:cfg_addr)
-3- Xen now knows which its the device id (seg:bus:dev.fn) belongs to.
Using a helper function the its node for a seg_no can be retrieved.
-4- When a device is assigned to a domU, we introduce a new hypercall
map_guest_bdf which would let Xen know
how, for a guest, a virtual sbdf maps to a physical sbdf
-5- domU is booted with a single virtual its node in the device tree. The
front-end driver attaches this its as msi-parent
-6- When domU accesses to the ITS are trapped in Xen, the its can be
retrieved using a helper function, say
get_phys_its_for_guest(guest_id, guest_sbdf, /*[out]*/its_ptr *its)

AFAIK this is numa safe. (A sketch of such a lookup follows below.)
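
A minimal sketch of step -6- (the sbdf map structure and all helpers
except get_domain_by_id/put_domain are hypothetical; the mapping is
assumed to have been populated by the map_guest_bdf hypercall of -4-):

/* Sketch only; resolve a guest's virtual sbdf to the physical ITS. */
static int get_phys_its_for_guest(domid_t guest_id, uint32_t guest_sbdf,
                                  struct its_node **its)
{
    struct domain *d = get_domain_by_id(guest_id);
    const struct sbdf_mapping *map;
    int rc = -ENODEV;

    if ( !d )
        return -EINVAL;

    map = find_sbdf_mapping(d, guest_sbdf);   /* from map_guest_bdf */
    if ( map )
    {
        /* The segment number identifies the host bridge registered
         * in step -2-, and hence its msi-parent (the its node). */
        *its = its_node_for_segment(sbdf_to_seg(map->phys_sbdf));
        if ( *its )
            rc = 0;
    }

    put_domain(d);
    return rc;
}
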
>> 2) When PCI device is assigned to DomU, how does domU choose
>>      vITS to send commands.  AFAIK, the BDF of assigned device
>>      is different from actual BDF in DomU.
> AIUI this is described in the firmware tables.
>
> e.g. in DT via the msi-parent phandle on the PCI root complex or
> individual device.
>
> Is there an assumption here that a single PCI root bridge is associated
> with a single ITS block? Or can different devices on a PCI bus use
> different ITS blocks?
>
> Ian.


* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-19 13:51           ` Ian Campbell
@ 2015-05-22 12:16             ` Vijay Kilari
  2015-05-22 12:49               ` Julien Grall
                                 ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Vijay Kilari @ 2015-05-22 12:16 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Tue, May 19, 2015 at 7:21 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Tue, 2015-05-19 at 14:37 +0100, Julien Grall wrote:
>> Hi Ian,
>>
>> On 19/05/15 13:10, Ian Campbell wrote:
>> > On Fri, 2015-05-15 at 15:55 +0100, Julien Grall wrote:
>> > [...]
>> >>> Translation of certain commands can be expensive (XXX citation
>> >>> needed).
>> >>
>> >> The term "expensive" is subjective. I think we can end up to cheap
>> >> translation if we properly pre-allocate information (such as device,
>> >> LPIs...). We can have all the informations before the guest as boot or
>> >> during hotplug part. It wouldn't take more memory than it should use.
>> >>
>> >> During command translation, we would just need to enable the device/LPIs.
>> >>
>> >> The remaining expensive part would be the validation. I think we can
>> >> improve most of them of O(1) (such as collection checking) or O(log(n))
>> >> (such as device checking).
>> > [...]
>> >>> XXX need a solution for this.
>> >>
>> >> Command translation can be improved. It may be good too add a section
>> >> explaining how translation of command foo can be done.
>> >
>> > I think that is covered by the spec, however if there are operations
>> > which form part of this which are potentially expensive we should
>> > outline in our design how this will be dealt with.
>> >
>> > Perhaps you or Vijay could propose some additional text covering:
>> >       * What the potentially expensive operations during a translation
>> >         are.
>> >       * How we are going to deal with those operations, including:
>> >               * What data structure is used
>> >               * What start of day setup is required to enable this
>> >               * What operations are therefore required at translation
>> >                 time
>>
>> I don't have much time to work on a proposal. I would be happy if Vijay
>> do it.
>
> OK, Vijay could you make a proposal here please.

__text__

1) Command translation:
-----------------------------------

 - ITS commands contain device ID, Event ID (vID), Collection ID (vCID),
   and Target Address (vTA) parameters
 - All these parameters should be validated
 - These parameters should be translated from Virtual to Physical

Of the existing GICv3 ITS commands, MAPC, MAPD and MAPVI/MAPI are the
time-consuming commands, as they create entries in the Xen ITS structures
which are used to validate other ITS commands.

1.1 MAPC command translation
-----------------------------------------------
   Format: MAPC vCID, vTA

   -  vTA is validated against the Re-distributor address by searching the
      Redistributor region / CPU number based on GITS_TYPER.PAtype, and the
      Physical Collection ID & Physical Target Address are retrieved
   -  Each vITS will have a cid_map (struct cid_mapping) which holds the
      mapping of Virtual Collection ID, Virtual Target Address and Physical
      Collection ID.
   -  The MAPC pCID, pTA physical ITS command is generated

   Here there is no overhead; the cid_map entries (approx 32 entries)
   are preallocated when the vITS is created. (A sketch of cid_map
   follows below.)
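
One possible layout for the preallocated cid_map and its O(1) lookup
(a sketch only; field names are assumptions, not the actual patches):

/* Sketch of a preallocated cid_map; names are assumptions. */
#define VITS_MAX_COLLECTIONS 32          /* "approx 32 entries" above */

struct cid_mapping {
    bool     valid;
    uint64_t vta;                        /* virtual target address */
    uint16_t pcid;                       /* physical collection ID */
};

struct vits_cid_map {
    /* Indexed by vCID, so MAPC insertion and lookup are O(1). */
    struct cid_mapping entry[VITS_MAX_COLLECTIONS];
};

/* MAPC vCID, vTA: pcid comes from validating vTA against the
 * redistributor layout (not shown). */
static int vits_mapc(struct vits_cid_map *map, uint16_t vcid,
                     uint64_t vta, uint16_t pcid)
{
    if ( vcid >= VITS_MAX_COLLECTIONS )
        return -EINVAL;

    map->entry[vcid] = (struct cid_mapping){
        .valid = true, .vta = vta, .pcid = pcid,
    };
    return 0;
}

/* Used later (e.g. by MAPVI) to translate vCID -> pCID. */
static int vits_get_pcid(const struct vits_cid_map *map, uint16_t vcid,
                         uint16_t *pcid)
{
    if ( vcid >= VITS_MAX_COLLECTIONS || !map->entry[vcid].valid )
        return -ENOENT;

    *pcid = map->entry[vcid].pcid;
    return 0;
}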

1.2 MAPD Command translation:
-----------------------------------------------
   Format: MAPD device, ITT IPA, ITT Size

   MAPD is sent with the Validation bit set if the device needs to be
   added, and with it reset when the device is removed

If the Validation bit is set:
   - Allocate memory for the its_device struct
   - Validate ITT IPA & ITT size and update the its_device struct
   - Find the number of vectors (nrvecs) for this device by querying a PCI
     helper function
   - Allocate nrvecs LPIs
   - Allocate memory for a struct vlpi_map for this device. This vlpi_map
     holds the mapping of Virtual LPI to Physical LPI and ID.
   - Find the physical ITS node to which this device is assigned
   - Call p2m_lookup on the ITT IPA addr and get the physical ITT address
   - Validate the ITT Size
   - Generate/format the physical ITS command: MAPD, ITT PA, ITT Size

   Here the overhead is the memory allocation for its_device and vlpi_map
   (possible layouts are sketched below)

If the Validation bit is not set:
    - Validate if the device exits by checking the vITS device list
    - Clear all vlpis assigned for this device
    - Remove this device from the vITS list
    - Free the memory
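
A possible shape for the per-device state described above (a sketch;
field names are illustrative, not the actual patch series):

/* Sketch of the per-device state created by MAPD. */
struct vlpi_map {
    uint32_t  nrvecs;       /* vectors the device may use */
    uint32_t *plpi;         /* vLPI index -> physical LPI, 0 if unused */
};

struct its_device {
    uint32_t         devid;     /* device ID used in ITS commands */
    paddr_t          itt_addr;  /* physical ITT address (p2m_lookup) */
    uint32_t         itt_size;
    struct vlpi_map  vlpis;     /* vLPI <-> pLPI mapping */
    struct list_head entry;     /* linkage in the vITS device list */
};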

1.3 MAPVI/MAPI Command translation:
-----------------------------------------------
   Format: MAPVI device, ID, vID, vCID

- Validate if the device exits by checking the vITS device list
- Validate vCID and get pCID by searching the cid_map
- Check if vID already has an entry in the vlpi_entries of this device.
  If not, allot a pID from the vlpi_map of this device and update
  vlpi_entries with the new pID
- Allocate an irq descriptor and add it to the RB tree
- Call route_irq_to_guest() for this pID
- Generate/format the physical ITS command: MAPVI device ID, pID, pCID

Here the overhead is allotting a physical ID, allocating memory for the
irq descriptor and routing the interrupt (a rough sketch of this path
follows below)

All other ITS commands like MOVI, DISCARD, INV, INVALL, INT, CLEAR and
SYNC are just validated and the physical command generated
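
Putting 1.1-1.3 together, the MAPVI path could look roughly like this
(hypothetical helpers; route_lpi_to_guest stands in for the descriptor
allocation, RB-tree insertion and route_irq_to_guest() steps; error
handling trimmed):

/* Rough sketch of the MAPVI translation path. */
static int vits_mapvi(struct vits *vits, uint32_t devid, uint32_t id,
                      uint32_t vid, uint16_t vcid)
{
    struct its_device *dev = vits_find_device(vits, devid);
    uint16_t pcid;
    uint32_t pid;
    int rc;

    if ( !dev )
        return -ENODEV;

    rc = vits_get_pcid(&vits->cid_map, vcid, &pcid);  /* see 1.1 */
    if ( rc )
        return rc;

    if ( vid >= dev->vlpis.nrvecs )
        return -EINVAL;

    pid = dev->vlpis.plpi[vid];
    if ( !pid )                              /* first MAPVI for vid */
        pid = dev->vlpis.plpi[vid] = alloc_plpi(dev);

    /* Covers irq descriptor allocation, RB-tree insertion and
     * route_irq_to_guest() for pid. */
    rc = route_lpi_to_guest(vits->domain, pid);
    if ( rc )
        return rc;

    return pits_enqueue_mapvi(dev, devid, id, pid, pcid);
}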

__text__

We can discuss and add how to reduce translation time.

>>
>> >>  I think
>> >> that limiting the number of batch/command sent per pass would allow a
>> >> small pass.
>> >
>> > I think we have a few choices:
>> >
>> >       * Limit to one batch per vits at a time
>> >       * Limit to some total number of batches per scheduling pass
>> >       * Time bound the scheduling procedure
>> >
>> > Do we have a preference?
>>
>> Time bound may be difficult to implement.
>
> Yes, I don't think that one is realistic.
>
>>  I think we would have to limit
>> batch per vITS (for code simplification) and total number of batch per
>> scheduling pass at the same time.
>
> OK.
>
>> >>>   the underlying hardware to the guest.
>> >>> * Adds complexity to the guest layout, which is right now static. How
>> >>>   do you decide the number of vITS/root controller exposed:
>> >>>     * Hotplug is tricky
>> >>> * Toolstack needs greater knowledge of the host layout
>> >>> * Given that PCI passthrough doesn't allow migration, maybe we could
>> >>>   use the layout of the hardware.
>> >>>
>> >>> In 1 vITS for all pITS:
>> >>>
>> >>> * What to do with global commands? Inject to all pITS and then
>> >>>   synchronise on them all finishing.
>> >>> * Handling of out of order completion of commands queued with
>> >>>   different pITS, since the vITS must appear to complete in
>> >>>   order. Apart from the book keeping question it makes scheduling more
>> >>>   interesting:
>> >>>     * What if you have a pITS with slots available, and the guest command
>> >>>       queue contains commands which could go to the pITS, but behind ones
>> >>>       which are targetting another pITS which has no slots
>> >>>     * What if one pITS is very busy and another is mostly idle and a
>> >>>       guest submits one command to the busy one (contending with other
>> >>>       guest) followed by a load of commands targeting the idle one. Those
>> >>>       commands would be held up in this situation.
>> >>>     * Reasoning about fairness may be harder.
>> >>>
>> >>> XXX need a solution/decision here.
>> >>
>> >>> In addition the introduction of direct interrupt injection in version
>> >>> 4 GICs may imply a vITS per pITS. (Update: it seems not)
>> >>
>> >> Other items to add: NUMA and I/O NUMA. I don't know much about it but I
>> >> think the first solution would be more suitable.
>> >
>> > first solution == ?
>>
>> 1 vITS per pITS.
>
> Ah, yes.
>
>


* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-22 12:16             ` Vijay Kilari
@ 2015-05-22 12:49               ` Julien Grall
  2015-05-22 13:58                 ` Vijay Kilari
  2015-05-24 10:35               ` Julien Grall
  2015-05-27 11:22               ` Ian Campbell
  2 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-22 12:49 UTC (permalink / raw)
  To: Vijay Kilari, Ian Campbell
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

Hi Vijay,

On 22/05/15 13:16, Vijay Kilari wrote:
> On Tue, May 19, 2015 at 7:21 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
>> On Tue, 2015-05-19 at 14:37 +0100, Julien Grall wrote:
>>> Hi Ian,
>>>
>>> On 19/05/15 13:10, Ian Campbell wrote:
>>>> On Fri, 2015-05-15 at 15:55 +0100, Julien Grall wrote:
>>>> [...]
>>>>>> Translation of certain commands can be expensive (XXX citation
>>>>>> needed).
>>>>>
>>>>> The term "expensive" is subjective. I think we can end up to cheap
>>>>> translation if we properly pre-allocate information (such as device,
>>>>> LPIs...). We can have all the informations before the guest as boot or
>>>>> during hotplug part. It wouldn't take more memory than it should use.
>>>>>
>>>>> During command translation, we would just need to enable the device/LPIs.
>>>>>
>>>>> The remaining expensive part would be the validation. I think we can
>>>>> improve most of them of O(1) (such as collection checking) or O(log(n))
>>>>> (such as device checking).
>>>> [...]
>>>>>> XXX need a solution for this.
>>>>>
>>>>> Command translation can be improved. It may be good too add a section
>>>>> explaining how translation of command foo can be done.
>>>>
>>>> I think that is covered by the spec, however if there are operations
>>>> which form part of this which are potentially expensive we should
>>>> outline in our design how this will be dealt with.
>>>>
>>>> Perhaps you or Vijay could propose some additional text covering:
>>>>       * What the potentially expensive operations during a translation
>>>>         are.
>>>>       * How we are going to deal with those operations, including:
>>>>               * What data structure is used
>>>>               * What start of day setup is required to enable this
>>>>               * What operations are therefore required at translation
>>>>                 time
>>>
>>> I don't have much time to work on a proposal. I would be happy if Vijay
>>> do it.
>>
>> OK, Vijay could you make a proposal here please.
> 
> __text__
> 
> 1) Command translation:
> -----------------------------------
> 
>  - ITS commands contains device ID, Event ID (vID), Collection ID
> (vCID), Target Address (vTA)
>     parameters
>  - All these parameters should be validated
>  - These parameters should be translated from Virtual to Physical
> 
> Of the existing GICv3 ITS commands, MAPC, MAPD, MAPVI/MAPI are the time
> consuming commands as these commands creates entry in the Xen ITS structures,
> which are used to validate other ITS commands.
> 
> 1.1 MAPC command translation
> -----------------------------------------------
>    Format: MAPC vCID, vTA
> 
>    -  vTA is validated against Re-distributor address by searching
> Redistributor region /
>        CPU number based on GITS_TYPER.PAtype and Physical Collection
> ID & Physical
>        Target address are retrieved
>    -  Each vITS will have cid_map (struct cid_mapping) which holds mapping of
>       Virtual Collection ID, Virtual Targets address and Physical Collection ID.

How is the vCID mapped to the pCID? How would that fit with interrupt
migration?

>    -  MAPC pCID, pTA physical ITS command is generated
> 
>    Here there is no overhead, the cid_map entries (approx 32 entries)
> are preallocated when
>    vITS is created.

Wrong, there is an overhead with your solution. If you have
GITS_TYPER.PTA == 1 (i.e. using the physical address of the
re-distributors) you have to loop through all the re-distributors, which
may take long.

As suggested in a previous mail, there is no reason to have
GITS_TYPER.PTA differ per domain, and we can choose the best value for us.

In our case, GITS_TYPER.PTA = 0 (i.e. using linear processor numbers) is
the best one.

> 1.2 MAPD Command translation:
> -----------------------------------------------
>    Format: MAPD device, ITT IPA, ITT Size
> 
>    MAPD is sent with Validation bit set if device needs to be added
> and reset when device is removed
> 
> If Validation bit is set:

     - Check if the device is assigned to the domain

>    - Allocate memory for its_device struct

Allocation can't be done in interrupt context.

>    - Validate ITT IPA & ITT size and update its_device struct
>    - Find number of vectors(nrvecs) for this device by querying PCI
> helper function

This could be read only once when the device is added to Xen via the
hypercall PHYSDEV_*pci*.

>    - Allocate nrvecs number of LPI
>    - Allocate memory for struct vlpi_map for this device. This
> vlpi_map holds mapping
>      of Virtual LPI to Physical LPI and ID.
>    - Find physical ITS node for which this device is assigned

Not necessary in a 1 vITS = 1 pITS model, which seems to be the solution
we will choose.

>    - Call p2m_lookup on ITT IPA addr and get physical ITT address
>    - Validate ITT Size

You already do it in "validate ITT IPA & ITT size...". Although all the
checks should be done before any allocation.

>    - Generate/format physical ITS command: MAPD, ITT PA, ITT Size
> 
>    Here the overhead is with memory allocation for its_device and vlpi_map

As suggested earlier, the memory allocation of its_device and vlpi_map
can be done when the device is assigned to the domain or added to Xen.

The only things you would have to do here are checking the ITT size and
marking the device enabled.
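Something along those lines (a sketch only; the its_device layout and
find_its_device() are invented for illustration):

/* MAPD fast path when its_device is pre-allocated at assignment time:
 * no allocation happens during command emulation. */
struct its_device {
    uint32_t devid;
    paddr_t itt_addr;       /* physical ITT address, set at assign time */
    unsigned int itt_size;  /* number of ID bits the ITT can hold */
    bool enabled;
};

struct its_device *find_its_device(uint32_t devid); /* hypothetical */

static int vits_handle_mapd(uint32_t devid, unsigned int size)
{
    struct its_device *dev = find_its_device(devid);

    if ( !dev || size > dev->itt_size )
        return -EINVAL;     /* unknown device or ITT too small */

    dev->enabled = true;    /* the only state change on this path */

    return 0;
}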

> 
> If Validation bit is not set:
>     - Validate if the device exits by checking vITS device list

Using a list can be very expensive... I would use a radix tree.
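For instance (a sketch; the API is assumed to be the one in
xen/include/xen/radix-tree.h):

/* Device lookup in a radix tree keyed by deviceID, rather than
 * walking a list in O(n). */
static struct radix_tree_root its_devices; /* radix_tree_init() at setup */

static int vits_add_device(uint32_t devid, struct its_device *dev)
{
    return radix_tree_insert(&its_devices, devid, dev);
}

static struct its_device *vits_find_device(uint32_t devid)
{
    return radix_tree_lookup(&its_devices, devid);
}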

>     - Clear all vlpis assigned for this device

What happens to the interrupts assigned to this device? Are they
disabled? Unrouted?

>     - Remove this device from vITS list
>     - Free memory
>
> 1.3 MAPVI/MAPI Command translation:
> -----------------------------------------------
>    Format: MAPVI device, ID, vID, vCID
> 
> - Validate if the device exits by checking vITS device list

exists

> - Validate vCID and get pCID by searching cid_map
> - if vID does not have entry in vlpi_entries of this device
>   If not, Allot pID from vlpi_map of this device and update
> vlpi_entries with new pID
> - Allocate irq descriptor and add to RB tree
> - call route_irq_to_guest() for this pID
> - Generate/format physical ITS command: MAPVI device ID, pID, pCID
> 
> Here the overhead is allot physical ID, allocate memory for
> irq descriptor and  routing interrupt

An overhead which can be removed by routing the IRQ when the device is
assigned.

> All other ITS command like MOVI, DISCARD, INV, INVALL, INT, CLEAR,
> SYNC just validate and generate physical command

With the data structure you suggested that's not the case: the validation
can be very expensive.

> __text__
> 
> We can discuss and add how to reduce translation time.

I've suggested multiple ways to reduce translation time in my previous
mail. It would have been nice to include them in your proposal...

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-22 12:49               ` Julien Grall
@ 2015-05-22 13:58                 ` Vijay Kilari
  2015-05-22 14:35                   ` Julien Grall
  0 siblings, 1 reply; 77+ messages in thread
From: Vijay Kilari @ 2015-05-22 13:58 UTC (permalink / raw)
  To: Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Fri, May 22, 2015 at 6:19 PM, Julien Grall <julien.grall@citrix.com> wrote:
> Hi Vijay,
>
> On 22/05/15 13:16, Vijay Kilari wrote:
>> On Tue, May 19, 2015 at 7:21 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
>>> On Tue, 2015-05-19 at 14:37 +0100, Julien Grall wrote:
>>>> Hi Ian,
>>>>
>>>> On 19/05/15 13:10, Ian Campbell wrote:
>>>>> On Fri, 2015-05-15 at 15:55 +0100, Julien Grall wrote:
>>>>> [...]
>>>>>>> Translation of certain commands can be expensive (XXX citation
>>>>>>> needed).
>>>>>>
>>>>>> The term "expensive" is subjective. I think we can end up to cheap
>>>>>> translation if we properly pre-allocate information (such as device,
>>>>>> LPIs...). We can have all the informations before the guest as boot or
>>>>>> during hotplug part. It wouldn't take more memory than it should use.
>>>>>>
>>>>>> During command translation, we would just need to enable the device/LPIs.
>>>>>>
>>>>>> The remaining expensive part would be the validation. I think we can
>>>>>> improve most of them of O(1) (such as collection checking) or O(log(n))
>>>>>> (such as device checking).
>>>>> [...]
>>>>>>> XXX need a solution for this.
>>>>>>
>>>>>> Command translation can be improved. It may be good too add a section
>>>>>> explaining how translation of command foo can be done.
>>>>>
>>>>> I think that is covered by the spec, however if there are operations
>>>>> which form part of this which are potentially expensive we should
>>>>> outline in our design how this will be dealt with.
>>>>>
>>>>> Perhaps you or Vijay could propose some additional text covering:
>>>>>       * What the potentially expensive operations during a translation
>>>>>         are.
>>>>>       * How we are going to deal with those operations, including:
>>>>>               * What data structure is used
>>>>>               * What start of day setup is required to enable this
>>>>>               * What operations are therefore required at translation
>>>>>                 time
>>>>
>>>> I don't have much time to work on a proposal. I would be happy if Vijay
>>>> do it.
>>>
>>> OK, Vijay could you make a proposal here please.
>>
>> __text__
>>
>> 1) Command translation:
>> -----------------------------------
>>
>>  - ITS commands contains device ID, Event ID (vID), Collection ID
>> (vCID), Target Address (vTA)
>>     parameters
>>  - All these parameters should be validated
>>  - These parameters should be translated from Virtual to Physical
>>
>> Of the existing GICv3 ITS commands, MAPC, MAPD, MAPVI/MAPI are the time
>> consuming commands as these commands creates entry in the Xen ITS structures,
>> which are used to validate other ITS commands.
>>
>> 1.1 MAPC command translation
>> -----------------------------------------------
>>    Format: MAPC vCID, vTA
>>
>>    -  vTA is validated against Re-distributor address by searching
>> Redistributor region /
>>        CPU number based on GITS_TYPER.PAtype and Physical Collection
>> ID & Physical
>>        Target address are retrieved
>>    -  Each vITS will have cid_map (struct cid_mapping) which holds mapping of
>>       Virtual Collection ID, Virtual Targets address and Physical Collection ID.
>
> How the vCID is mapped to the pCID? How would that fit with interrupt
> migration?

The physical ITS driver creates one collection ID (pCID) per CPU.
A DomU's vCID should always be in the range 0 to MAXVCPUS, as
GITS_TYPER.PTA is set to 0 (as suggested by you below).

So migration should be within 0 - 8. There is scope for improvement
here: migrate the interrupt to the pCPU on which the vCPU is running.
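Something like this (a sketch; the field names are illustrative):

/* Per-vITS collection map, pre-allocated at vITS creation.
 * With GITS_TYPER.PTA = 0, the vTA is simply a vCPU number. */
struct cid_mapping {
    bool valid;
    uint16_t vta;   /* virtual target: vCPU number */
    uint16_t pcid;  /* physical collection ID */
};

static struct cid_mapping cid_map[MAX_VIRT_VCPUS]; /* indexed by vCID */

/* O(1) lookup: vCIDs are constrained to 0 .. MAX_VIRT_VCPUS - 1. */
static struct cid_mapping *get_cid_mapping(uint16_t vcid)
{
    if ( vcid >= MAX_VIRT_VCPUS || !cid_map[vcid].valid )
        return NULL;

    return &cid_map[vcid];
}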

>
>>    -  MAPC pCID, pTA physical ITS command is generated
>>
>>    Here there is no overhead, the cid_map entries (approx 32 entries)
>> are preallocated when
>>    vITS is created.
>
> Wrong, there is an overhead with your solution. If you have
> GITS_TYPER.PTA == 1 (i.e using the physical address of re-distributors)
> you have to loop through all the re-distributors which may be long.
>
> As suggested on a previous mail, there is no reason to have
> GITS_TYPER.PTA different per domain and we can choose the best value for us.
>
> In our case, GITS_TYPER.PTA = 0 (i.e using linear processors numbers) is
> the best one.
>
Agreed.

>> 1.2 MAPD Command translation:
>> -----------------------------------------------
>>    Format: MAPD device, ITT IPA, ITT Size
>>
>>    MAPD is sent with Validation bit set if device needs to be added
>> and reset when device is removed
>>
>> If Validation bit is set:
>
>      - Check if the device is assigned to the domain
>
>>    - Allocate memory for its_device struct
>
> Allocation can't be done in interrupt context.

Can't we allocate in softirq context?

>
>>    - Validate ITT IPA & ITT size and update its_device struct
>>    - Find number of vectors(nrvecs) for this device by querying PCI
>> helper function
>
> This could be read only once when the device is added to Xen via the
> hypercall PHYSDEV_*pci*

If so, this value should be stored in the pci_dev struct.
>
>>    - Allocate nrvecs number of LPI
>>    - Allocate memory for struct vlpi_map for this device. This
>> vlpi_map holds mapping
>>      of Virtual LPI to Physical LPI and ID.
>>    - Find physical ITS node for which this device is assigned
>
> Not necessary in a 1 vITS = 1 pITS which seem to be the solution we will
> choose.
>
>>    - Call p2m_lookup on ITT IPA addr and get physical ITT address
>>    - Validate ITT Size
>
> You already do it in "validate ITT IPA & ITT size...". Although all the
> checks should be done before any allocation.
>
>>    - Generate/format physical ITS command: MAPD, ITT PA, ITT Size
>>
>>    Here the overhead is with memory allocation for its_device and vlpi_map
>
> As suggested earlier, the memory allocate of its_device and vlpi_map can
> be done when the device is assigned to the domain or added to Xen
>
> The only things you would have to do here is checking the ITT size and
> mark the device enable.
>
>>
>> If Validation bit is not set:
>>     - Validate if the device exits by checking vITS device list
>
> Using a list can be very expensive... I would use a radix tree.
>
>>     - Clear all vlpis assigned for this device
>
> What happens for interrupt assigned to this device? Are they disabled?
> unroute?

    They should be disabled with an LPI configuration table update. I
think release_irq is called.
>
>>     - Remove this device from vITS list
>>     - Free memory
>>
>> 1.3 MAPVI/MAPI Command translation:
>> -----------------------------------------------
>>    Format: MAPVI device, ID, vID, vCID
>>
>> - Validate if the device exits by checking vITS device list
>
> exists
>
>> - Validate vCID and get pCID by searching cid_map
>> - if vID does not have entry in vlpi_entries of this device
>>   If not, Allot pID from vlpi_map of this device and update
>> vlpi_entries with new pID
>> - Allocate irq descriptor and add to RB tree
>> - call route_irq_to_guest() for this pID
>> - Generate/format physical ITS command: MAPVI device ID, pID, pCID
>>
>> Here the overhead is allot physical ID, allocate memory for
>> irq descriptor and  routing interrupt
>
> An overhead which can be removed by routing the IRQ when the device is
> assigned.

   But routing requires the pID, which is not known when the device is
assigned. nrvecs could be as high as 256/2K, so we cannot route all the
pIDs at assignment time.

>
>> All other ITS command like MOVI, DISCARD, INV, INVALL, INT, CLEAR,
>> SYNC just validate and generate physical command
>
> With the data structure you suggested it's not the case, the validation
> can be very expensive.

Which data structure?

>
>> __text__
>>
>> We can discuss and add how to reduce translation time.
>
> I've suggested multiple way to reduce translation time over my previous
> mail. It would have been nice to include them in your proposal...
>
> Regards,
>
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-22 13:58                 ` Vijay Kilari
@ 2015-05-22 14:35                   ` Julien Grall
  2015-05-22 14:54                     ` Vijay Kilari
  0 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-22 14:35 UTC (permalink / raw)
  To: Vijay Kilari, Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On 22/05/15 14:58, Vijay Kilari wrote:
> On Fri, May 22, 2015 at 6:19 PM, Julien Grall <julien.grall@citrix.com> wrote:
>>> 1) Command translation:
>>> -----------------------------------
>>>
>>>  - ITS commands contains device ID, Event ID (vID), Collection ID
>>> (vCID), Target Address (vTA)
>>>     parameters
>>>  - All these parameters should be validated
>>>  - These parameters should be translated from Virtual to Physical
>>>
>>> Of the existing GICv3 ITS commands, MAPC, MAPD, MAPVI/MAPI are the time
>>> consuming commands as these commands creates entry in the Xen ITS structures,
>>> which are used to validate other ITS commands.
>>>
>>> 1.1 MAPC command translation
>>> -----------------------------------------------
>>>    Format: MAPC vCID, vTA
>>>
>>>    -  vTA is validated against Re-distributor address by searching
>>> Redistributor region /
>>>        CPU number based on GITS_TYPER.PAtype and Physical Collection
>>> ID & Physical
>>>        Target address are retrieved
>>>    -  Each vITS will have cid_map (struct cid_mapping) which holds mapping of
>>>       Virtual Collection ID, Virtual Targets address and Physical Collection ID.
>>
>> How the vCID is mapped to the pCID? How would that fit with interrupt
>> migration?
> 
> Physical ITS driver create one collection ID (pCID) per CPU.
> DomU's vCID should always 0 to MAXVCPUS as GITS.TYPER.PTAtype is set to 0.
> (as suggested by you below)

Why do you speak about GITS_TYPER.PTA? No matter the value of this
field, there will always be no more than MAXVCPUS collections.

> So Migration should be within 0 - 8. Here there is scope for improvement
> to migration to pCPU on which vCPU is running.

Are you aware that the physical collection may contain interrupts from
other domains and Xen?

>>> 1.2 MAPD Command translation:
>>> -----------------------------------------------
>>>    Format: MAPD device, ITT IPA, ITT Size
>>>
>>>    MAPD is sent with Validation bit set if device needs to be added
>>> and reset when device is removed
>>>
>>> If Validation bit is set:

Another concern about MAPD: how do you handle a guest that wants to
change the ITT by calling MAPD again?

>>      - Check if the device is assigned to the domain
>>
>>>    - Allocate memory for its_device struct
>>
>> Allocation can't be done in interrupt context.
> 
> Can't we allocate in softirq context?

It should be possible in softirq context. Still, we want something quick.

> 
>>
>>>    - Validate ITT IPA & ITT size and update its_device struct
>>>    - Find number of vectors(nrvecs) for this device by querying PCI
>>> helper function
>>
>> This could be read only once when the device is added to Xen via the
>> hypercall PHYSDEV_*pci*
> 
> If so, this value should be in pci_dev struct

Or in a specific its_device structure in the ITS... because the
{,v}ITS code has to be as device agnostic as possible.

> .
>>
>>>    - Allocate nrvecs number of LPI
>>>    - Allocate memory for struct vlpi_map for this device. This
>>> vlpi_map holds mapping
>>>      of Virtual LPI to Physical LPI and ID.
>>>    - Find physical ITS node for which this device is assigned
>>
>> Not necessary in a 1 vITS = 1 pITS which seem to be the solution we will
>> choose.
>>
>>>    - Call p2m_lookup on ITT IPA addr and get physical ITT address
>>>    - Validate ITT Size
>>
>> You already do it in "validate ITT IPA & ITT size...". Although all the
>> checks should be done before any allocation.
>>
>>>    - Generate/format physical ITS command: MAPD, ITT PA, ITT Size
>>>
>>>    Here the overhead is with memory allocation for its_device and vlpi_map
>>
>> As suggested earlier, the memory allocate of its_device and vlpi_map can
>> be done when the device is assigned to the domain or added to Xen
>>
>> The only things you would have to do here is checking the ITT size and
>> mark the device enable.
>>
>>>
>>> If Validation bit is not set:
>>>     - Validate if the device exits by checking vITS device list
>>
>> Using a list can be very expensive... I would use a radix tree.
>>
>>>     - Clear all vlpis assigned for this device
>>
>> What happens for interrupt assigned to this device? Are they disabled?
>> unroute?
> 
>     Should be disable with LPI configuration table update. I think
> release_irq is called

So calling release_irq on every associated LPI? That sounds very slow.

>>
>>>     - Remove this device from vITS list
>>>     - Free memory
>>>
>>> 1.3 MAPVI/MAPI Command translation:
>>> -----------------------------------------------
>>>    Format: MAPVI device, ID, vID, vCID
>>>
>>> - Validate if the device exits by checking vITS device list
>>
>> exists
>>
>>> - Validate vCID and get pCID by searching cid_map
>>> - if vID does not have entry in vlpi_entries of this device
>>>   If not, Allot pID from vlpi_map of this device and update
>>> vlpi_entries with new pID
>>> - Allocate irq descriptor and add to RB tree
>>> - call route_irq_to_guest() for this pID
>>> - Generate/format physical ITS command: MAPVI device ID, pID, pCID
>>>
>>> Here the overhead is allot physical ID, allocate memory for
>>> irq descriptor and  routing interrupt
>>
>> An overhead which can be removed by routing the IRQ when the device is
>> assigned.
> 
>    But, routing requires pID which is not known when device is assigned.
> nrvecs could be as high as 256/2K so cannot route all the pID when assigned.

Why? You just need to allocate a chunk of pIDs and have an optimized
function to route multiple IRQs at once. We could also improve the way
we store IRQ descs.
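For instance (a sketch; lpi_alloc_chunk() and route_lpi_to_guest() are
placeholders, not existing functions):

/* Allocate a contiguous chunk of pLPIs for the device and route them
 * all when the device is assigned, instead of one by one at MAPVI time. */
static int assign_device_lpis(struct domain *d, struct its_device *dev,
                              unsigned int nrvecs)
{
    int first = lpi_alloc_chunk(nrvecs);    /* contiguous pLPI range */
    unsigned int i;

    if ( first < 0 )
        return first;

    dev->first_plpi = first;                /* invented field */

    for ( i = 0; i < nrvecs; i++ )
    {
        int rc = route_lpi_to_guest(d, first + i);

        if ( rc )
            return rc;
    }

    return 0;
}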

>>
>>> All other ITS command like MOVI, DISCARD, INV, INVALL, INT, CLEAR,
>>> SYNC just validate and generate physical command
>>
>> With the data structure you suggested it's not the case, the validation
>> can be very expensive.
> 
> which data structure?

The list ...

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-22 14:35                   ` Julien Grall
@ 2015-05-22 14:54                     ` Vijay Kilari
  0 siblings, 0 replies; 77+ messages in thread
From: Vijay Kilari @ 2015-05-22 14:54 UTC (permalink / raw)
  To: Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Fri, May 22, 2015 at 8:05 PM, Julien Grall <julien.grall@citrix.com> wrote:
> On 22/05/15 14:58, Vijay Kilari wrote:
>> On Fri, May 22, 2015 at 6:19 PM, Julien Grall <julien.grall@citrix.com> wrote:
>>>> 1) Command translation:
>>>> -----------------------------------
>>>>
>>>>  - ITS commands contains device ID, Event ID (vID), Collection ID
>>>> (vCID), Target Address (vTA)
>>>>     parameters
>>>>  - All these parameters should be validated
>>>>  - These parameters should be translated from Virtual to Physical
>>>>
>>>> Of the existing GICv3 ITS commands, MAPC, MAPD, MAPVI/MAPI are the time
>>>> consuming commands as these commands creates entry in the Xen ITS structures,
>>>> which are used to validate other ITS commands.
>>>>
>>>> 1.1 MAPC command translation
>>>> -----------------------------------------------
>>>>    Format: MAPC vCID, vTA
>>>>
>>>>    -  vTA is validated against Re-distributor address by searching
>>>> Redistributor region /
>>>>        CPU number based on GITS_TYPER.PAtype and Physical Collection
>>>> ID & Physical
>>>>        Target address are retrieved
>>>>    -  Each vITS will have cid_map (struct cid_mapping) which holds mapping of
>>>>       Virtual Collection ID, Virtual Targets address and Physical Collection ID.
>>>
>>> How the vCID is mapped to the pCID? How would that fit with interrupt
>>> migration?
>>
>> Physical ITS driver create one collection ID (pCID) per CPU.
>> DomU's vCID should always 0 to MAXVCPUS as GITS.TYPER.PTAtype is set to 0.
>> (as suggested by you below)
>
> Why do you speak about GITS_TYPER.PTA? No matter the value of this
> field, there will be always no more than MAXVPCUS collections
>
>> So Migration should be within 0 - 8. Here there is scope for improvement
>> to migration to pCPU on which vCPU is running.
>
> Are you aware that the physical collection may contain interrupt from
> other domain and Xen?
>

Yes. Collection IDs are not unique.

>>>> 1.2 MAPD Command translation:
>>>> -----------------------------------------------
>>>>    Format: MAPD device, ITT IPA, ITT Size
>>>>
>>>>    MAPD is sent with Validation bit set if device needs to be added
>>>> and reset when device is removed
>>>>
>>>> If Validation bit is set:
>
> More other concerns about MAPD. How do you handle a guest who wants to
> change the ITT by calling again MAPD?
>
>>>      - Check if the device is assigned to the domain
>>>
>>>>    - Allocate memory for its_device struct
>>>
>>> Allocation can't be done in interrupt context.
>>
>> Can't we allocate in softirq context?
>
> It should be possible in softirq. Although, we still want something quick.
>
>>
>>>
>>>>    - Validate ITT IPA & ITT size and update its_device struct
>>>>    - Find number of vectors(nrvecs) for this device by querying PCI
>>>> helper function
>>>
>>> This could be read only once when the device is added to Xen via the
>>> hypercall PHYSDEV_*pci*
>>
>> If so, this value should be in pci_dev struct
>
> Or a in a specific its_device structure in the ITS... because the
> {,v}ITS code has to be device agnostic as much as possible.

   Yes, the nrvecs value is copied to the its_device structure. However,
it has to be queried only once, at device creation time. If we
pre-allocate the its_device struct, then nrvecs can be updated at
pre-allocation time.
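i.e. something like this (a sketch; pci_query_msi_vectors() is a
hypothetical helper and the its_device fields are illustrative):

/* Query the vector count once, when the device is added via the
 * PHYSDEVOP_pci_device_add hypercall, and cache it for the vITS. */
static void its_device_preallocate(struct its_device *dev, uint32_t devid)
{
    dev->devid = devid;
    dev->nrvecs = pci_query_msi_vectors(devid); /* queried only once */
}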

>
>> .
>>>
>>>>    - Allocate nrvecs number of LPI
>>>>    - Allocate memory for struct vlpi_map for this device. This
>>>> vlpi_map holds mapping
>>>>      of Virtual LPI to Physical LPI and ID.
>>>>    - Find physical ITS node for which this device is assigned
>>>
>>> Not necessary in a 1 vITS = 1 pITS which seem to be the solution we will
>>> choose.
>>>
>>>>    - Call p2m_lookup on ITT IPA addr and get physical ITT address
>>>>    - Validate ITT Size
>>>
>>> You already do it in "validate ITT IPA & ITT size...". Although all the
>>> checks should be done before any allocation.
>>>
>>>>    - Generate/format physical ITS command: MAPD, ITT PA, ITT Size
>>>>
>>>>    Here the overhead is with memory allocation for its_device and vlpi_map
>>>
>>> As suggested earlier, the memory allocate of its_device and vlpi_map can
>>> be done when the device is assigned to the domain or added to Xen
>>>
>>> The only things you would have to do here is checking the ITT size and
>>> mark the device enable.
>>>
>>>>
>>>> If Validation bit is not set:
>>>>     - Validate if the device exits by checking vITS device list
>>>
>>> Using a list can be very expensive... I would use a radix tree.
>>>
>>>>     - Clear all vlpis assigned for this device
>>>
>>> What happens for interrupt assigned to this device? Are they disabled?
>>> unroute?
>>
>>     Should be disable with LPI configuration table update. I think
>> release_irq is called
>
> So calling release_irq on every LPIs associated? That sounds very long.
>
>>>
>>>>     - Remove this device from vITS list
>>>>     - Free memory
>>>>
>>>> 1.3 MAPVI/MAPI Command translation:
>>>> -----------------------------------------------
>>>>    Format: MAPVI device, ID, vID, vCID
>>>>
>>>> - Validate if the device exits by checking vITS device list
>>>
>>> exists
>>>
>>>> - Validate vCID and get pCID by searching cid_map
>>>> - if vID does not have entry in vlpi_entries of this device
>>>>   If not, Allot pID from vlpi_map of this device and update
>>>> vlpi_entries with new pID
>>>> - Allocate irq descriptor and add to RB tree
>>>> - call route_irq_to_guest() for this pID
>>>> - Generate/format physical ITS command: MAPVI device ID, pID, pCID
>>>>
>>>> Here the overhead is allot physical ID, allocate memory for
>>>> irq descriptor and  routing interrupt
>>>
>>> An overhead which can be removed by routing the IRQ when the device is
>>> assigned.
>>
>>    But, routing requires pID which is not known when device is assigned.
>> nrvecs could be as high as 256/2K so cannot route all the pID when assigned.
>
> Why? You just need to allocate a chunk of pID and having an optimized
> function to route multiple IRQ at once. We could also improve the way to
> store IRQ desc.

 You suggest creating one irq descriptor for all LPIs of the device?

>
>>>
>>>> All other ITS command like MOVI, DISCARD, INV, INVALL, INT, CLEAR,
>>>> SYNC just validate and generate physical command
>>>
>>> With the data structure you suggested it's not the case, the validation
>>> can be very expensive.
>>
>> which data structure?
>
> The list ...

We can turn the its_device list into an RB-tree.

>
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-22 12:16             ` Vijay Kilari
  2015-05-22 12:49               ` Julien Grall
@ 2015-05-24 10:35               ` Julien Grall
  2015-05-25  9:06                 ` Vijay Kilari
  2015-05-27 11:22                 ` Ian Campbell
  2015-05-27 11:22               ` Ian Campbell
  2 siblings, 2 replies; 77+ messages in thread
From: Julien Grall @ 2015-05-24 10:35 UTC (permalink / raw)
  To: Vijay Kilari, Ian Campbell
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

Hi Vijay,

On 22/05/2015 13:16, Vijay Kilari wrote:
> On Tue, May 19, 2015 at 7:21 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
>> On Tue, 2015-05-19 at 14:37 +0100, Julien Grall wrote:
>>> Hi Ian,
>>>
>>> On 19/05/15 13:10, Ian Campbell wrote:
>>>> On Fri, 2015-05-15 at 15:55 +0100, Julien Grall wrote:
>>>> [...]
>>>>>> Translation of certain commands can be expensive (XXX citation
>>>>>> needed).
>>>>>
>>>>> The term "expensive" is subjective. I think we can end up to cheap
>>>>> translation if we properly pre-allocate information (such as device,
>>>>> LPIs...). We can have all the informations before the guest as boot or
>>>>> during hotplug part. It wouldn't take more memory than it should use.
>>>>>
>>>>> During command translation, we would just need to enable the device/LPIs.
>>>>>
>>>>> The remaining expensive part would be the validation. I think we can
>>>>> improve most of them of O(1) (such as collection checking) or O(log(n))
>>>>> (such as device checking).
>>>> [...]
>>>>>> XXX need a solution for this.
>>>>>
>>>>> Command translation can be improved. It may be good too add a section
>>>>> explaining how translation of command foo can be done.
>>>>
>>>> I think that is covered by the spec, however if there are operations
>>>> which form part of this which are potentially expensive we should
>>>> outline in our design how this will be dealt with.
>>>>
>>>> Perhaps you or Vijay could propose some additional text covering:
>>>>        * What the potentially expensive operations during a translation
>>>>          are.
>>>>        * How we are going to deal with those operations, including:
>>>>                * What data structure is used
>>>>                * What start of day setup is required to enable this
>>>>                * What operations are therefore required at translation
>>>>                  time
>>>
>>> I don't have much time to work on a proposal. I would be happy if Vijay
>>> do it.
>>
>> OK, Vijay could you make a proposal here please.
>
> __text__

I took a second look at your proposal.

> 1) Command translation:
> -----------------------------------
>
>   - ITS commands contains device ID, Event ID (vID), Collection ID
> (vCID), Target Address (vTA)
>      parameters
>   - All these parameters should be validated
>   - These parameters should be translated from Virtual to Physical
>
> Of the existing GICv3 ITS commands, MAPC, MAPD, MAPVI/MAPI are the time
> consuming commands as these commands creates entry in the Xen ITS structures,
> which are used to validate other ITS commands.
>
> 1.1 MAPC command translation
> -----------------------------------------------
>     Format: MAPC vCID, vTA
>
>     -  vTA is validated against Re-distributor address by searching
> Redistributor region /
>         CPU number based on GITS_TYPER.PAtype and Physical Collection
> ID & Physical
>         Target address are retrieved
>     -  Each vITS will have cid_map (struct cid_mapping) which holds mapping of
>        Virtual Collection ID, Virtual Targets address and Physical Collection ID.
>     -  MAPC pCID, pTA physical ITS command is generated
>
>     Here there is no overhead, the cid_map entries (approx 32 entries)
> are preallocated when
>     vITS is created.

How did you decide on the 32 entries? The ITS must provide at least
N + 1 collections, where N is the number of processors.

Also, how do you handle collection re-mapping?


>
> 1.2 MAPD Command translation:
> -----------------------------------------------
>     Format: MAPD device, ITT IPA, ITT Size
>
>     MAPD is sent with Validation bit set if device needs to be added
> and reset when device is removed
>
> If Validation bit is set:
>     - Allocate memory for its_device struct
>     - Validate ITT IPA & ITT size and update its_device struct
>     - Find number of vectors(nrvecs) for this device by querying PCI
> helper function
>     - Allocate nrvecs number of LPI
>     - Allocate memory for struct vlpi_map for this device. This
> vlpi_map holds mapping
>       of Virtual LPI to Physical LPI and ID.
>     - Find physical ITS node for which this device is assigned
>
>     - Call p2m_lookup on ITT IPA addr and get physical ITT address
>     - Validate ITT Size
>     - Generate/format physical ITS command: MAPD, ITT PA, ITT Size
>
>     Here the overhead is with memory allocation for its_device and vlpi_map

What about device remapping?

> If Validation bit is not set:
>      - Validate if the device exits by checking vITS device list
>      - Clear all vlpis assigned for this device
>      - Remove this device from vITS list
>      - Free memory
>
> 1.3 MAPVI/MAPI Command translation:
> -----------------------------------------------
>     Format: MAPVI device, ID, vID, vCID
>
> - Validate if the device exits by checking vITS device list
> - Validate vCID and get pCID by searching cid_map
> - if vID does not have entry in vlpi_entries of this device
>    If not, Allot pID from vlpi_map of this device and update
> vlpi_entries with new pID
> - Allocate irq descriptor and add to RB tree
> - call route_irq_to_guest() for this pID
> - Generate/format physical ITS command: MAPVI device ID, pID, pCID
>
> Here the overhead is allot physical ID, allocate memory for
> irq descriptor and  routing interrupt
>
> All other ITS command like MOVI, DISCARD, INV, INVALL, INT, CLEAR,
> SYNC just validate and generate physical command

Interrupt remapping?

> __text__
>
> We can discuss and add how to reduce translation time.

I wrote up my thoughts on the validation bits (see below) and added
some definitions useful for people who don't have the spec.

Emulation of ITS commands
=========================

# Introduction

This document is based on the section 5.13 of GICv3 specification
(PRD03-GENC-010745 24.0). The goal is to provide insight of the cost
to emulate ITS commands in Xen.

The ITS provides 12 commands in order to manage interrupt collection,
device and interrupts.

# Definitions

## Device identifier

Each device using the ITS is associated with a unique identifier. It's
discoverable via the firmware and a specific algorithm (not described here).

The number of identifiers is variable and can be discovered via
GITS_TYPER.Devbits. This field allows the ITS to have up to 2^32 devices.

## Collection

Each interrupt is a member of an Interrupt Collection. This allows
software to manage large numbers of physical interrupts with a small
number of commands rather than issuing one command per interrupt.

On a system with N processors, the ITS must provide at least N+1 
collections.

## Target Addresses

The Target Address corresponds to a specific re-distributor. The format
of this field depends on the value of the bit GITS_TYPER.PTA:
     - 1: the base address of the re-distributor target is used
     - 0: a unique processor number is used. The mapping between the
     processor affinity value (MPIDR) and the processor number can be
     discoverable via GICR_TYPER.ProcessorNumber.

# Validation of the parameters

Each command contains parameters that need to be validated before any
use in Xen or before they are passed to the hardware.

This section will describe the validation of the main parameters.

## Device ID

This parameter is used in commands which manage the device and the
interrupts associated with this device. Checking whether a device is
present and retrieving its data structure must be fast.

The device identifiers may not be assigned contiguously and the maximum
number is very high (2^32). Possible efficient data structures would be:
     1) List: the lookup/deletion is O(n) and the insertion cost depends
on whether the devices must be kept sorted by identifier. The memory
overhead is 18 bytes per element.
     2) Red-black tree: all the operations are O(log(n)). The memory
overhead is 24 bytes per element.

Solution 2) seems the more suitable for fast deviceID validation, even
though the memory overhead is a bit higher compared to the list.
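A lookup could look like the following (a sketch only; the rbtree API is
assumed to be the one in xen/include/xen/rbtree.h, and the its_device
layout is illustrative):

struct its_device {
    struct rb_node node;    /* keyed by devid */
    uint32_t devid;
    /* ... ITT details, vlpi_map, etc. ... */
};

static struct rb_root devices = RB_ROOT;

static struct its_device *its_device_find(uint32_t devid)
{
    struct rb_node *n = devices.rb_node;

    while ( n )
    {
        struct its_device *dev = rb_entry(n, struct its_device, node);

        if ( devid < dev->devid )
            n = n->rb_left;
        else if ( devid > dev->devid )
            n = n->rb_right;
        else
            return dev;     /* O(log(n)) */
    }

    return NULL;
}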

## Collection

This parameter is used in commands which manage collections and
interrupts in order to move them from one CPU to another. The ITS is
only required to implement N + 1 collections, where N is the number of
processors on the platform. Furthermore, the identifiers are always
contiguous.

If we decide to implement the strict minimum (i.e. N + 1), an array is
enough and will allow operations in O(1).
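For instance (a sketch; struct vits and its fields are illustrative):

/* Strict minimum: N + 1 collections with contiguous identifiers,
 * so a flat array indexed by the collection ID gives O(1) access. */
struct its_collection {
    bool valid;
    uint16_t vcpu_id;   /* target vCPU, assuming GITS_TYPER.PTA = 0 */
};

/* d->max_vcpus + 1 entries, allocated when the vITS is created. */
static bool collection_valid(const struct vits *vits, unsigned int cid)
{
    return cid < vits->nr_collections && vits->collections[cid].valid;
}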

## Target Address

This parameter is used in commands to manage collections. It's a unique
identifier per processor. The format differs depending on the value
of the bit GITS_TYPER.PTA (see definition). The value of the field is
pre-defined by the ITS and the software has to handle the 2 cases.

The solution with GITS_TYPER.PTA set to one will require some computation
in order to find the VCPU associated with the redistributor address. It 
will be similar to get_vcpu_from_rdist in the vGICv3 emulation 
(xen/arch/arm/vgic-v3.c).

On the other hand, setting GITS_TYPER.PTA to zero gives us control over
the linear processor number, which could simply be the vcpu_id (always
linear).
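With that choice, resolving a Target Address becomes trivial (a sketch,
using Xen's struct domain):

static struct vcpu *get_vcpu_from_vta(struct domain *d, uint64_t vta)
{
    /* The linear processor number is defined to be the vcpu_id. */
    if ( vta >= d->max_vcpus )
        return NULL;

    return d->vcpu[vta];
}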

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-24 10:35               ` Julien Grall
@ 2015-05-25  9:06                 ` Vijay Kilari
  2015-05-25  9:32                   ` Julien Grall
  2015-05-27 11:22                 ` Ian Campbell
  1 sibling, 1 reply; 77+ messages in thread
From: Vijay Kilari @ 2015-05-25  9:06 UTC (permalink / raw)
  To: Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Sun, May 24, 2015 at 4:05 PM, Julien Grall <julien.grall@citrix.com> wrote:
> Hi Vijay,
>
>
> On 22/05/2015 13:16, Vijay Kilari wrote:
>>
>> On Tue, May 19, 2015 at 7:21 PM, Ian Campbell <ian.campbell@citrix.com>
>> wrote:
>>>
>>> On Tue, 2015-05-19 at 14:37 +0100, Julien Grall wrote:
>>>>
>>>> Hi Ian,
>>>>
>>>> On 19/05/15 13:10, Ian Campbell wrote:
>>>>>
>>>>> On Fri, 2015-05-15 at 15:55 +0100, Julien Grall wrote:
>>>>> [...]
>>>>>>>
>>>>>>> Translation of certain commands can be expensive (XXX citation
>>>>>>> needed).
>>>>>>
>>>>>>
>>>>>> The term "expensive" is subjective. I think we can end up to cheap
>>>>>> translation if we properly pre-allocate information (such as device,
>>>>>> LPIs...). We can have all the informations before the guest as boot or
>>>>>> during hotplug part. It wouldn't take more memory than it should use.
>>>>>>
>>>>>> During command translation, we would just need to enable the
>>>>>> device/LPIs.
>>>>>>
>>>>>> The remaining expensive part would be the validation. I think we can
>>>>>> improve most of them of O(1) (such as collection checking) or
>>>>>> O(log(n))
>>>>>> (such as device checking).
>>>>>
>>>>> [...]
>>>>>>>
>>>>>>> XXX need a solution for this.
>>>>>>
>>>>>>
>>>>>> Command translation can be improved. It may be good too add a section
>>>>>> explaining how translation of command foo can be done.
>>>>>
>>>>>
>>>>> I think that is covered by the spec, however if there are operations
>>>>> which form part of this which are potentially expensive we should
>>>>> outline in our design how this will be dealt with.
>>>>>
>>>>> Perhaps you or Vijay could propose some additional text covering:
>>>>>        * What the potentially expensive operations during a translation
>>>>>          are.
>>>>>        * How we are going to deal with those operations, including:
>>>>>                * What data structure is used
>>>>>                * What start of day setup is required to enable this
>>>>>                * What operations are therefore required at translation
>>>>>                  time
>>>>
>>>>
>>>> I don't have much time to work on a proposal. I would be happy if Vijay
>>>> do it.
>>>
>>>
>>> OK, Vijay could you make a proposal here please.
>>
>>
>> __text__
>
>
> I gave a second look to your proposal.
>
>> 1) Command translation:
>> -----------------------------------
>>
>>   - ITS commands contains device ID, Event ID (vID), Collection ID
>> (vCID), Target Address (vTA)
>>      parameters
>>   - All these parameters should be validated
>>   - These parameters should be translated from Virtual to Physical
>>
>> Of the existing GICv3 ITS commands, MAPC, MAPD, MAPVI/MAPI are the time
>> consuming commands as these commands creates entry in the Xen ITS
>> structures,
>> which are used to validate other ITS commands.
>>
>> 1.1 MAPC command translation
>> -----------------------------------------------
>>     Format: MAPC vCID, vTA
>>
>>     -  vTA is validated against Re-distributor address by searching
>> Redistributor region /
>>         CPU number based on GITS_TYPER.PAtype and Physical Collection
>> ID & Physical
>>         Target address are retrieved
>>     -  Each vITS will have cid_map (struct cid_mapping) which holds
>> mapping of
>>        Virtual Collection ID, Virtual Targets address and Physical
>> Collection ID.
>>     -  MAPC pCID, pTA physical ITS command is generated
>>
>>     Here there is no overhead, the cid_map entries (approx 32 entries)
>> are preallocated when
>>     vITS is created.
>
>
> How did you decide the 32 entries? The ITS must at least provide N + 1
> collection when N is the number of processors.

It should be MAX_VIRT_VCPUS.

>
> Also, how do you handle collection re-mapping?

There is one collection per CPU. The vTA of MAPC should fall within the
vCPU range (GITS_TYPER.PTA is 0).

In case of remapping, if the vCID does not exist in cid_map, then a new
entry (vCID, pCID, vTA) is made.

If the vCID exists, the existing entry is updated with the new pCID and vTA.

However, this cid_map should then be used to inject the interrupt to the
right pCPU where the vCPU is running.
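Roughly (a sketch, reusing the cid_map layout posted earlier; the vCID
is assumed to be validated already):

/* MAPC emulation updates or creates the cid_map entry; a re-mapping
 * simply overwrites the previous mapping. */
static void vits_handle_mapc(uint16_t vcid, uint16_t vta, uint16_t pcid)
{
    struct cid_mapping *m = &cid_map[vcid];

    m->vta   = vta;
    m->pcid  = pcid;
    m->valid = true;
}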

>
>
>>
>> 1.2 MAPD Command translation:
>> -----------------------------------------------
>>     Format: MAPD device, ITT IPA, ITT Size
>>
>>     MAPD is sent with Validation bit set if device needs to be added
>> and reset when device is removed
>>
>> If Validation bit is set:
>>     - Allocate memory for its_device struct
>>     - Validate ITT IPA & ITT size and update its_device struct
>>     - Find number of vectors(nrvecs) for this device by querying PCI
>> helper function
>>     - Allocate nrvecs number of LPI
>>     - Allocate memory for struct vlpi_map for this device. This
>> vlpi_map holds mapping
>>       of Virtual LPI to Physical LPI and ID.
>>     - Find physical ITS node for which this device is assigned
>>
>>     - Call p2m_lookup on ITT IPA addr and get physical ITT address
>>     - Validate ITT Size
>>     - Generate/format physical ITS command: MAPD, ITT PA, ITT Size
>>
>>     Here the overhead is with memory allocation for its_device and
>> vlpi_map
>
>
> What about device remapping?

IMO, the device cannot be remapped in place. It has to be removed (MAPD
with valid bit 0) so that the ITS HW can remove the entries, and then
added again with a new MAPD command.

>
>> If Validation bit is not set:
>>      - Validate if the device exits by checking vITS device list
>>      - Clear all vlpis assigned for this device
>>      - Remove this device from vITS list
>>      - Free memory
>>
>> 1.3 MAPVI/MAPI Command translation:
>> -----------------------------------------------
>>     Format: MAPVI device, ID, vID, vCID
>>
>> - Validate if the device exits by checking vITS device list
>> - Validate vCID and get pCID by searching cid_map
>> - if vID does not have entry in vlpi_entries of this device
>>    If not, Allot pID from vlpi_map of this device and update
>> vlpi_entries with new pID
>> - Allocate irq descriptor and add to RB tree
>> - call route_irq_to_guest() for this pID
>> - Generate/format physical ITS command: MAPVI device ID, pID, pCID
>>
>> Here the overhead is allot physical ID, allocate memory for
>> irq descriptor and  routing interrupt
>>
>> All other ITS command like MOVI, DISCARD, INV, INVALL, INT, CLEAR,
>> SYNC just validate and generate physical command
>
>
> Interrupt remapping?

Interrupt mapping is done with the MAPVI/MAPI commands. Here, as per
spec section 4.9.22, (Device, vID) should be unique to generate a pID.
So in case of remapping, unless (Device, vID) changes, no new pID is
generated.

If the vCID is changed, a new pCID is generated based on the MAPC command.
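i.e. something like (a sketch; the vlpi_map layout, INVALID_LPI and
lpi_alloc() are illustrative):

/* (device, vID) keys the pID: a re-mapping of an already-mapped vID
 * reuses the existing pID instead of allotting a new one. */
static int vits_get_plpi(struct its_device *dev, uint32_t vid)
{
    if ( dev->vlpi_map[vid] != INVALID_LPI )
        return dev->vlpi_map[vid];          /* remap: keep existing pID */

    dev->vlpi_map[vid] = lpi_alloc(dev);    /* first map: allot a pID */

    return dev->vlpi_map[vid];
}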

>
>> __text__
>>
>> We can discuss and add how to reduce translation time.
>
>
> I wrote my though for the validation bits (see below) and add some
> definitions useful for people which don't have the spec.
>
> Emulation of ITS commands
> =========================
>
> # Introduction
>
> This document is based on the section 5.13 of GICv3 specification
> (PRD03-GENC-010745 24.0). The goal is to provide insight of the cost
> to emulate ITS commands in Xen.
>
> The ITS provides 12 commands in order to manage interrupt collection,
> device and interrupts.
>
> # Definitions
>
> ## Device identifier
>
> Each device using the ITS is associated to an unique identifier. It's
> discoverable via the firwmare and a specific algorithm (not described here).
>
> The number of identifiers is variable and can be discovered via
> GITS_TYPER.Devbits. The field allow this ITS to have up to 2^32 device.
>
> ## Collection
>
> Each interrupt is a member of an Interrupt Collection. This allows software
> to manage large numbers of physical interrupts with a small number of
> commands rather than issuing command per interrupt.
>
> On a system with N processors, the ITS must provide at least N+1
> collections.
>
> ## Target Addresses
>
> The Target Address correspond to a specific re-distributor. The format of
> this field depend on the value of the bit GITS_TYPER.PTA:
>     - 1: the base address of the re-distributor target is used
>     - 0: a unique processor number is used. The mapping between the
>     processor affinity value (MPIDR) and the processor number can be
>     discoverable via GICR_TYPER.ProcessorNumber.
>
> # Validation of the parameters
>
> Each command contains parameters that needs to be validated before any usage
> in Xen or passing to the hardware.
>
> This section will describe the validation of the main parameters.
>
> ## Device ID
>
> This parameter is used in commands which manage the device and the
> interrupts associated to this device. Checking if a device is present and
> retrieving the data structure must be fast.
>
> The device identifiers may not be assigned contiguously and the maximum
> number is very high (2^32). The possible efficient data structure would be:
>     1) List: The lookup/deletion is in O(n) and the insertion will depend if
> the device should be sort following their identifier. The memory overhead is
> 18 bytes per element.
>     2) Red-black tree: All the operations are O(log(n)). The memory overhead
> is 24 bytes per element.
>
> The solution 2) seems the more suitable for having fast deviceID validation
> even though the memory overhead is a bit higher compare to the list.
>
> ## Collection
>
> This parameter is used in commands which manage collections and interrupt in
> order to move them for one CPU to another. The ITS is only mandatory to
> implement N + 1 collections where N is the number of processor on the
> platform. Furthermore, the identifier are always contiguous.
>
> If we decide to implement the strict minimum (i.e N + 1), an array is
> enough and will allow operations in O(1).
>
> ## Target Address
>
> This parameter is used in commands to manage collection. It's a unique
> identifier per processor. The format is different following the value
> of the bit GITS_TYPER.PTA (see definition). The value of the field
> pre-defined by the ITS and the software has to handle the 2 cases.
>
> The solution with GITS_TYPER.PTA set to one will require some computation
> in order to find the VCPU associated with the redistributor address. It will
> be similar to get_vcpu_from_rdist in the vGICv3 emulation
> (xen/arch/arm/vgic-v3.c).
>
> On another hand, setting GITS_TYPER.PTA to zero will give us control to
> decide the linear process number  which could simply be the vcpu_id (always
> linear).
>
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-25  9:06                 ` Vijay Kilari
@ 2015-05-25  9:32                   ` Julien Grall
  2015-05-25 10:40                     ` Vijay Kilari
  0 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-25  9:32 UTC (permalink / raw)
  To: Vijay Kilari, Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini



On 25/05/2015 11:06, Vijay Kilari wrote:
> On Sun, May 24, 2015 at 4:05 PM, Julien Grall <julien.grall@citrix.com> wrote:
>>> 1) Command translation:
>>> -----------------------------------
>>>
>>>    - ITS commands contains device ID, Event ID (vID), Collection ID
>>> (vCID), Target Address (vTA)
>>>       parameters
>>>    - All these parameters should be validated
>>>    - These parameters should be translated from Virtual to Physical
>>>
>>> Of the existing GICv3 ITS commands, MAPC, MAPD, MAPVI/MAPI are the time
>>> consuming commands as these commands creates entry in the Xen ITS
>>> structures,
>>> which are used to validate other ITS commands.
>>>
>>> 1.1 MAPC command translation
>>> -----------------------------------------------
>>>      Format: MAPC vCID, vTA
>>>
>>>      -  vTA is validated against Re-distributor address by searching
>>> Redistributor region /
>>>          CPU number based on GITS_TYPER.PAtype and Physical Collection
>>> ID & Physical
>>>          Target address are retrieved
>>>      -  Each vITS will have cid_map (struct cid_mapping) which holds
>>> mapping of
>>>         Virtual Collection ID, Virtual Targets address and Physical
>>> Collection ID.
>>>      -  MAPC pCID, pTA physical ITS command is generated
>>>
>>>      Here there is no overhead, the cid_map entries (approx 32 entries)
>>> are preallocated when
>>>      vITS is created.
>>
>>
>> How did you decide the 32 entries? The ITS must at least provide N + 1
>> collection when N is the number of processors.
>
> It should be MAX_VIRT_VCPUS.

Why not allocate dynamically rather than waste memory?

>>
>> Also, how do you handle collection re-mapping?
>
> There is one collection per cpu. The vTA of MAPC should fall within
> vcpus range (GITS_TYPE.PTAtype is 0).

It's not what I asked...

> In case of remapping,  if the vCID does not exists in cid_map,
> then new entry is made (vCID, pCID, vTA)
>
> If vCID exists, the existing entry is updated with pCID, vTA
>
> However this cid_map should be used to inject to right pCPU where
> vCPU is running.

What do you mean by injecting? The MAPC should never be injected to the
physical CPU. As I said earlier, the collection is shared between all
the vCPUs and Xen.

>>
>>
>>>
>>> 1.2 MAPD Command translation:
>>> -----------------------------------------------
>>>      Format: MAPD device, ITT IPA, ITT Size
>>>
>>>      MAPD is sent with Validation bit set if device needs to be added
>>> and reset when device is removed
>>>
>>> If Validation bit is set:
>>>      - Allocate memory for its_device struct
>>>      - Validate ITT IPA & ITT size and update its_device struct
>>>      - Find number of vectors(nrvecs) for this device by querying PCI
>>> helper function
>>>      - Allocate nrvecs number of LPI
>>>      - Allocate memory for struct vlpi_map for this device. This
>>> vlpi_map holds mapping
>>>        of Virtual LPI to Physical LPI and ID.
>>>      - Find physical ITS node for which this device is assigned
>>>
>>>      - Call p2m_lookup on ITT IPA addr and get physical ITT address
>>>      - Validate ITT Size
>>>      - Generate/format physical ITS command: MAPD, ITT PA, ITT Size
>>>
>>>      Here the overhead is with memory allocation for its_device and
>>> vlpi_map
>>
>>
>> What about device remapping?
>
> IMO, device cannot be remapped. It has to removed (MAPD with valid bit 0)
> so that ITS HW can remove the entries and added with new MAPD command.

Your opinion is not the spec...

Device remapping is allowed by the spec (see 4.9.18 "Re-mapping and
Un-mapping devices" in PRD03-GENC-010745 24.0). So even if it were not
possible (with a spec reference as proof), you would have to protect
against it...

>>
>>> If Validation bit is not set:
>>>       - Validate if the device exits by checking vITS device list
>>>       - Clear all vlpis assigned for this device
>>>       - Remove this device from vITS list
>>>       - Free memory
>>>
>>> 1.3 MAPVI/MAPI Command translation:
>>> -----------------------------------------------
>>>      Format: MAPVI device, ID, vID, vCID
>>>
>>> - Validate if the device exits by checking vITS device list
>>> - Validate vCID and get pCID by searching cid_map
>>> - if vID does not have entry in vlpi_entries of this device
>>>     If not, Allot pID from vlpi_map of this device and update
>>> vlpi_entries with new pID
>>> - Allocate irq descriptor and add to RB tree
>>> - call route_irq_to_guest() for this pID
>>> - Generate/format physical ITS command: MAPVI device ID, pID, pCID
>>>
>>> Here the overhead is allot physical ID, allocate memory for
>>> irq descriptor and  routing interrupt
>>>
>>> All other ITS command like MOVI, DISCARD, INV, INVALL, INT, CLEAR,
>>> SYNC just validate and generate physical command
>>
>>
>> Interrupt remapping?
>
> Interrupt mapping is with MAP command. Here as per spec 4.9.22,
> Device, vID should be unique to generate pID. So in case of
> remapping unless Device, vID is changed, new pID is not generated.

4.9.22 for which version of the spec?

A new pID may not be re-generated, but some care needs to be taken when
a vID is remapped (see 4.9.17 "Re-mapping and Un-mapping Interrupts" in
PRD03-GENC-010745 24.0).

> If vCID is changed, a new pCID is generated based on MAPC command

Which is wrong...

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-25  9:32                   ` Julien Grall
@ 2015-05-25 10:40                     ` Vijay Kilari
  2015-05-25 12:44                       ` Julien Grall
  0 siblings, 1 reply; 77+ messages in thread
From: Vijay Kilari @ 2015-05-25 10:40 UTC (permalink / raw)
  To: Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Julien Grall, Stefano Stabellini

On Mon, May 25, 2015 at 3:02 PM, Julien Grall
<julien.grall.oss@gmail.com> wrote:
>
>
> On 25/05/2015 11:06, Vijay Kilari wrote:
>>
>> On Sun, May 24, 2015 at 4:05 PM, Julien Grall <julien.grall@citrix.com>
>> wrote:
>>>>
>>>> 1) Command translation:
>>>> -----------------------------------
>>>>
>>>>    - ITS commands contains device ID, Event ID (vID), Collection ID
>>>> (vCID), Target Address (vTA)
>>>>       parameters
>>>>    - All these parameters should be validated
>>>>    - These parameters should be translated from Virtual to Physical
>>>>
>>>> Of the existing GICv3 ITS commands, MAPC, MAPD, MAPVI/MAPI are the time
>>>> consuming commands as these commands creates entry in the Xen ITS
>>>> structures,
>>>> which are used to validate other ITS commands.
>>>>
>>>> 1.1 MAPC command translation
>>>> -----------------------------------------------
>>>>      Format: MAPC vCID, vTA
>>>>
>>>>      -  vTA is validated against Re-distributor address by searching
>>>> Redistributor region /
>>>>          CPU number based on GITS_TYPER.PAtype and Physical Collection
>>>> ID & Physical
>>>>          Target address are retrieved
>>>>      -  Each vITS will have cid_map (struct cid_mapping) which holds
>>>> mapping of
>>>>         Virtual Collection ID, Virtual Targets address and Physical
>>>> Collection ID.
>>>>      -  MAPC pCID, pTA physical ITS command is generated
>>>>
>>>>      Here there is no overhead, the cid_map entries (approx 32 entries)
>>>> are preallocated when
>>>>      vITS is created.
>>>
>>>
>>>
>>> How did you decide the 32 entries? The ITS must at least provide N + 1
>>> collection when N is the number of processors.
>>
>>
>> It should be MAX_VIRT_VCPUS.
>
>
> Why not allocating dynamically rather than wasting memory?
>
>>>
>>> Also, how do you handle collection re-mapping?
>>
>>
>> There is one collection per cpu. The vTA of MAPC should fall within
>> vcpus range (GITS_TYPE.PTAtype is 0).
>
>
> It's not what I asked...
>
>> In case of remapping,  if the vCID does not exists in cid_map,
>> then new entry is made (vCID, pCID, vTA)
>>
>> If vCID exists, the existing entry is updated with pCID, vTA
>>
>> However this cid_map should be used to inject to right pCPU where
>> vCPU is running.
>
>
> What do you mean by injecting? The MAPC should never be injected to the
> physical CPU. As I said earlier, the collection is shared with all the vCPU
> and Xen.
>

It does not mean the MAPC is sent to a physical CPU.

All interrupts mapped to a collection are taken on CPUs 0 to nr_vcpus.
When a vCID is mapped to a pCID, all pCIDs fall in the range 0 to nr_vcpus.

So, irrespective of which physical CPUs the vCPUs are running on, all
interrupts are routed to pCPUs 0 to nr_vcpus.

Similar to the patch below, which was done for SPIs, LPIs should also be
injected this way.

http://lists.xen.org/archives/html/xen-devel/2014-09/msg04176.html
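For LPIs that could look like this (a sketch; vgic_vcpu_inject_irq()
exists in Xen, the cid_map lookup is the one sketched earlier):

/* On receiving a pLPI, look up the target vCPU of the vCID from
 * cid_map and inject the virtual LPI there. */
static void vits_forward_lpi(struct domain *d, unsigned int virq,
                             uint16_t vcid)
{
    struct cid_mapping *m = get_cid_mapping(vcid);

    if ( m )
        vgic_vcpu_inject_irq(d->vcpu[m->vta], virq);
}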

Correct me if I have misunderstood your question.

>>>
>>>
>>>>
>>>> 1.2 MAPD Command translation:
>>>> -----------------------------------------------
>>>>      Format: MAPD device, ITT IPA, ITT Size
>>>>
>>>>      MAPD is sent with Validation bit set if device needs to be added
>>>> and reset when device is removed
>>>>
>>>> If Validation bit is set:
>>>>      - Allocate memory for its_device struct
>>>>      - Validate ITT IPA & ITT size and update its_device struct
>>>>      - Find number of vectors(nrvecs) for this device by querying PCI
>>>> helper function
>>>>      - Allocate nrvecs number of LPI
>>>>      - Allocate memory for struct vlpi_map for this device. This
>>>> vlpi_map holds mapping
>>>>        of Virtual LPI to Physical LPI and ID.
>>>>      - Find physical ITS node for which this device is assigned
>>>>
>>>>      - Call p2m_lookup on ITT IPA addr and get physical ITT address
>>>>      - Validate ITT Size
>>>>      - Generate/format physical ITS command: MAPD, ITT PA, ITT Size
>>>>
>>>>      Here the overhead is with memory allocation for its_device and
>>>> vlpi_map
>>>
>>>
>>>
>>> What about device remapping?
>>
>>
>> IMO, device cannot be remapped. It has to removed (MAPD with valid bit 0)
>> so that ITS HW can remove the entries and added with new MAPD command.
>
>
> Your opinion is not the spec...
>
> Device remapping is allowed by the spec (see 4.9.18 "Re-mapping and
> Un-mapping devices in PRD03-GENC-010745 24.0). So even it's not possible
> (with a spec ref in proof), you have to protect it...

I am not saying that as just my opinion; I mean the same as described in
4.9.18. To unmap the device, MAPD should be sent with valid bit 0, which
will remove the device from the list; it is added again with a MAPD with
valid bit 1.

>
>>>
>>>> If Validation bit is not set:
>>>>       - Validate if the device exits by checking vITS device list
>>>>       - Clear all vlpis assigned for this device
>>>>       - Remove this device from vITS list
>>>>       - Free memory
>>>>
>>>> 1.3 MAPVI/MAPI Command translation:
>>>> -----------------------------------------------
>>>>      Format: MAPVI device, ID, vID, vCID
>>>>
>>>> - Validate if the device exists by checking the vITS device list
>>>> - Validate vCID and get pCID by searching cid_map
>>>> - If vID does not have an entry in vlpi_entries of this device,
>>>>   allot a pID from the vlpi_map of this device and update
>>>> vlpi_entries with the new pID
>>>> - Allocate irq descriptor and add to RB tree
>>>> - call route_irq_to_guest() for this pID
>>>> - Generate/format physical ITS command: MAPVI device ID, pID, pCID
>>>>
>>>> Here the overhead is allotting a physical ID, allocating memory for the
>>>> irq descriptor and routing the interrupt
>>>>
>>>> All other ITS commands like MOVI, DISCARD, INV, INVALL, INT, CLEAR,
>>>> SYNC just validate and generate the physical command
>>>
>>>
>>>
>>> Interrupt remapping?
>>
>>
>> Interrupt mapping is done with the MAP command. Here, as per spec 4.9.22,
>> (Device, vID) should be unique to generate a pID. So in case of
>> remapping, unless (Device, vID) is changed, a new pID is not generated.
>
>
> 4.9.22 for which version of the spec?

24.0

>
> new pID may not be re-generated but there is some care to take when a vID
> is remapped. (see 4.9.17 "Re-mapping and Un-mapping Interrupts" in
> PRD03-GENC-010745 24.0).
>
>> If vCID is changed, a new pCID is generated based on MAPC command
>
>
> Which is wrong...

When you say vID is remapped, then vCID should be different, right?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-25 10:40                     ` Vijay Kilari
@ 2015-05-25 12:44                       ` Julien Grall
  2015-05-25 13:38                         ` Vijay Kilari
  0 siblings, 1 reply; 77+ messages in thread
From: Julien Grall @ 2015-05-25 12:44 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Julien Grall, Stefano Stabellini

Hi,

On 25/05/2015 12:40, Vijay Kilari wrote:
> On Mon, May 25, 2015 at 3:02 PM, Julien Grall
> <julien.grall.oss@gmail.com> wrote:
>>
>>
>> On 25/05/2015 11:06, Vijay Kilari wrote:
>>>
>>> On Sun, May 24, 2015 at 4:05 PM, Julien Grall <julien.grall@citrix.com>
>>> wrote:
>>>>>
>>>>> 1) Command translation:
>>>>> -----------------------------------
>>>>>
>>>>>     - ITS commands contain device ID, Event ID (vID), Collection ID
>>>>> (vCID), Target Address (vTA)
>>>>>        parameters
>>>>>     - All these parameters should be validated
>>>>>     - These parameters should be translated from Virtual to Physical
>>>>>
>>>>> Of the existing GICv3 ITS commands, MAPC, MAPD, MAPVI/MAPI are the time
>>>>> consuming commands as these commands create entries in the Xen ITS
>>>>> structures,
>>>>> which are used to validate other ITS commands.
>>>>>
>>>>> 1.1 MAPC command translation
>>>>> -----------------------------------------------
>>>>>       Format: MAPC vCID, vTA
>>>>>
>>>>>       -  vTA is validated against Re-distributor address by searching
>>>>> Redistributor region /
>>>>>           CPU number based on GITS_TYPER.PAtype and Physical Collection
>>>>> ID & Physical
>>>>>           Target address are retrieved
>>>>>       -  Each vITS will have cid_map (struct cid_mapping) which holds
>>>>> mapping of
>>>>>          Virtual Collection ID, Virtual Targets address and Physical
>>>>> Collection ID.
>>>>>       -  MAPC pCID, pTA physical ITS command is generated
>>>>>
>>>>>       Here there is no overhead, the cid_map entries (approx 32 entries)
>>>>> are preallocated when
>>>>>       vITS is created.
>>>>
>>>>
>>>>
>>>> How did you decide the 32 entries? The ITS must at least provide N + 1
>>>> collections, where N is the number of processors.
>>>
>>>
>>> It should be MAX_VIRT_VCPUS.
>>
>>
>> Why not allocating dynamically rather than wasting memory?
>>
>>>>
>>>> Also, how do you handle collection re-mapping?
>>>
>>>
>>> There is one collection per CPU. The vTA of MAPC should fall within
>>> the vCPU range (GITS_TYPER.PTA is 0).
>>
>>
>> It's not what I asked...
>>
>>> In case of remapping, if the vCID does not exist in cid_map,
>>> then a new entry is made (vCID, pCID, vTA)
>>>
>>> If vCID exists, the existing entry is updated with pCID, vTA
>>>
>>> However, this cid_map should be used to inject to the right pCPU where
>>> the vCPU is running.
>>
>>
>> What do you mean by injecting? The MAPC should never be injected to the
>> physical CPU. As I said earlier, the collection is shared with all the vCPU
>> and Xen.
>>
>
> It does not mean MAPC is sent to the physical CPU.
>
> All interrupts mapped to a collection are taken on CPUs 0 to nr_vcpus:
> when a vCID is mapped to a pCID, all pCIDs fall in the range 0 to nr_vcpus.

vCID can be higher than the number of VCPUs (the vITS has to support
nr_vcpus + 1 collections).

Also, the number of physical collections may be lower than the number of
virtual collections because the user created a guest with num vCPUs > num pCPUs.

> So, irrespective of which physical CPUs the vCPUs are running on, all
> interrupts are routed to pCPUs 0 to nr_vcpus.
>
> Similar to the below patch done for SPIs, LPIs should also be injected.

I know that LPIs should be injected...

>
> http://lists.xen.org/archives/html/xen-devel/2014-09/msg04176.html
>
> Correct me if I have not understood your question correctly.

AFAIU your proposal, the function mapping(vCID) will always return the 
same pCID, right?

[..]

>>>> What about device remapping?
>>>
>>>
>>> IMO, device cannot be remapped. It has to be removed (MAPD with valid bit 0)
>>> so that the ITS HW can remove the entries, and added with a new MAPD command.
>>
>>
>> Your opinion is not the spec...
>>
>> Device remapping is allowed by the spec (see 4.9.18 "Re-mapping and
>> Un-mapping devices" in PRD03-GENC-010745 24.0). So even if it's not possible
>> (with a spec ref as proof), you have to protect against it...
>
> I am not saying that is my opinion; I mean the same as stated in 4.9.18.

IMO === In My Opinion... I can't guess that you were talking about 4.9.18.

> To unmap the device, the MAPD should be sent with valid bit 0, which will

s/unmap/re-map/ ?

> remove the device from the list; it is added again on MAPD with valid bit 1

I can't see where the spec says that 2 MAPDs (one with V=1 and the other
with V=0) are required. The section 4.9.18 contains an 'or':

"Issue a mapping command (MAPD; see section 5.13.11) or an un-mapping 
command"

This is related to "Interrupts can be re-mapped or un-mapped".

4.9.18 and 5.13.11 (PRD03-GENC-010745 24.0) are only speaking about a 
single MAPD:

"Note: software might issue a MAPD command to re-map an already mapped 
device and the ITS must invalidate all cached data for that device."

>>
>> new pID may not be re-generated but there is some care to take when a vID
>> is remapped. (see 4.9.17 "Re-mapping and Un-mapping Interrupts" in
>> PRD03-GENC-010745 24.0).
>>
>>> If vCID is changed, a new pCID is generated based on MAPC command
>>
>>
>> Which is wrong...
>
> When you say vID is remapped, then vCID should be different, right?

Yes. I was confused by "MAPC command" at the end.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-25 12:44                       ` Julien Grall
@ 2015-05-25 13:38                         ` Vijay Kilari
  2015-05-25 17:11                           ` Julien Grall
  0 siblings, 1 reply; 77+ messages in thread
From: Vijay Kilari @ 2015-05-25 13:38 UTC (permalink / raw)
  To: Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Mon, May 25, 2015 at 6:14 PM, Julien Grall <julien.grall@citrix.com> wrote:
> Hi,
>
>
> On 25/05/2015 12:40, Vijay Kilari wrote:
>>
>> On Mon, May 25, 2015 at 3:02 PM, Julien Grall
>> <julien.grall.oss@gmail.com> wrote:
>>>
>>>
>>>
>>> On 25/05/2015 11:06, Vijay Kilari wrote:
>>>>
>>>>
>>>> On Sun, May 24, 2015 at 4:05 PM, Julien Grall <julien.grall@citrix.com>
>>>> wrote:
>>>>>>
>>>>>>
>>>>>> 1) Command translation:
>>>>>> -----------------------------------
>>>>>>
>>>>>>     - ITS commands contain device ID, Event ID (vID), Collection ID
>>>>>> (vCID), Target Address (vTA)
>>>>>>        parameters
>>>>>>     - All these parameters should be validated
>>>>>>     - These parameters should be translated from Virtual to Physical
>>>>>>
>>>>>> Of the existing GICv3 ITS commands, MAPC, MAPD, MAPVI/MAPI are the
>>>>>> time
>>>>>> consuming commands as these commands create entries in the Xen ITS
>>>>>> structures,
>>>>>> which are used to validate other ITS commands.
>>>>>>
>>>>>> 1.1 MAPC command translation
>>>>>> -----------------------------------------------
>>>>>>       Format: MAPC vCID, vTA
>>>>>>
>>>>>>       -  vTA is validated against Re-distributor address by searching
>>>>>> Redistributor region /
>>>>>>           CPU number based on GITS_TYPER.PAtype and Physical
>>>>>> Collection
>>>>>> ID & Physical
>>>>>>           Target address are retrieved
>>>>>>       -  Each vITS will have cid_map (struct cid_mapping) which holds
>>>>>> mapping of
>>>>>>          Virtual Collection ID, Virtual Targets address and Physical
>>>>>> Collection ID.
>>>>>>       -  MAPC pCID, pTA physical ITS command is generated
>>>>>>
>>>>>>       Here there is no overhead, the cid_map entries (approx 32
>>>>>> entries)
>>>>>> are preallocated when
>>>>>>       vITS is created.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> How did you decide the 32 entries? The ITS must at least provide N + 1
>>>>> collections, where N is the number of processors.
>>>>
>>>>
>>>>
>>>> It should be MAX_VIRT_VCPUS.
>>>
>>>
>>>
>>> Why not allocating dynamically rather than wasting memory?
>>>
>>>>>
>>>>> Also, how do you handle collection re-mapping?
>>>>
>>>>
>>>>
>>>> There is one collection per CPU. The vTA of MAPC should fall within
>>>> the vCPU range (GITS_TYPER.PTA is 0).
>>>
>>>
>>>
>>> It's not what I asked...
>>>
>>>> In case of remapping, if the vCID does not exist in cid_map,
>>>> then a new entry is made (vCID, pCID, vTA)
>>>>
>>>> If vCID exists, the existing entry is updated with pCID, vTA
>>>>
>>>> However, this cid_map should be used to inject to the right pCPU where
>>>> the vCPU is running.
>>>
>>>
>>>
>>> What do you mean by injecting? The MAPC should never be injected to the
>>> physical CPU. As I said earlier, the collection is shared with all the
>>> vCPU
>>> and Xen.
>>>
>>
>> It does not mean MAPC is sent to the physical CPU.
>>
>> All interrupts mapped to a collection are taken on CPUs 0 to nr_vcpus:
>> when a vCID is mapped to a pCID, all pCIDs fall in the range 0 to nr_vcpus.
>
>
> vCID can be higher than the number of VCPUs (the vITS has to support
> nr_vcpus + 1 collections).
>
> Also, the number of physical collections may be lower than the number of
> virtual collections because the user created a guest with num vCPUs > num pCPUs.
>
>> So, irrespective of which physical CPUs the vCPUs are running on, all
>> interrupts are routed to pCPUs 0 to nr_vcpus.
>>
>> Similar to the below patch done for SPIs, LPIs should also be injected.
>
>
> I know that LPIs should be injected...
>
>>
>> http://lists.xen.org/archives/html/xen-devel/2014-09/msg04176.html
>>
>> Correct me if I have not understood your question correctly.
>
>
> AFAIU your proposal, the function mapping(vCID) will always return the same
> pCID, right?

Yes, vCID to pCID is mapped

>
> [..]
>
>>>>> What about device remapping?
>>>>
>>>>
>>>>
>>>> IMO, device cannot be remapped. It has to be removed (MAPD with valid
>>>> bit 0)
>>>> so that the ITS HW can remove the entries, and added with a new MAPD command.
>>>
>>>
>>>
>>> Your opinion is not the spec...
>>>
>>> Device remapping is allowed by the spec (see 4.9.18 "Re-mapping and
>>> Un-mapping devices" in PRD03-GENC-010745 24.0). So even if it's not possible
>>> (with a spec ref as proof), you have to protect against it...
>>
>>
>> I am not saying that is my opinion; I mean the same as stated in 4.9.18.
>
>
> IMO === In My Opinion... I can't guess that you were talking about 4.9.18.
>
>> To unmap the device, the MAPD should be sent with valid bit 0, which will
>
>
> s/unmap/re-map/ ?
>
>> remove the device from the list; it is added again on MAPD with valid bit 1
>
>
> I can't see where the spec says that 2 MAPDs (one with V=1 and the other with
> V=0) are required. The section 4.9.18 contains an 'or':
>
> "Issue a mapping command (MAPD; see section 5.13.11) or an un-mapping
> command"
>
> This is related to "Interrupts can be re-mapped or un-mapped".
>
> 4.9.18 and 5.13.11 (PRD03-GENC-010745 24.0) are only speaking about a single
> MAPD:
>
> "Note: software might issue a MAPD command to re-map an already mapped
> device and the ITS must invalidate all cached data for that device."
>

OK, I had missed this 'or'. If so, MAPD always overwrites the old info.

>>>
>>> new pID may not be re-generated but there is some care to take when a
>>> vID
>>> is remapped. (see 4.9.17 "Re-mapping and Un-mapping Interrupts" in
>>> PRD03-GENC-010745 24.0).
>>>
>>>> If vCID is changed, a new pCID is generated based on MAPC command
>>>
>>>
>>>
>>> Which is wrong...
>>
>>
>> When you say vID is remapped, then vCID should be different, right?
>
>
> Yes. I was confused by "MAPC command" at the end.
>
> Regards,
>
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-25 13:38                         ` Vijay Kilari
@ 2015-05-25 17:11                           ` Julien Grall
  0 siblings, 0 replies; 77+ messages in thread
From: Julien Grall @ 2015-05-25 17:11 UTC (permalink / raw)
  To: Vijay Kilari, Julien Grall
  Cc: Ian Campbell, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

Hi,

On 25/05/2015 15:38, Vijay Kilari wrote:
> On Mon, May 25, 2015 at 6:14 PM, Julien Grall <julien.grall@citrix.com> wrote:
>> AFAIU your proposal, the function mapping(vCID) will always return the same
>> pCID, right?
>
> Yes, vCID to pCID is mapped

But how? Let's say we have a function vCID_to_pCID which takes a vCID as
parameter and returns the corresponding pCID. Is this function pure (i.e.
the function always evaluates to the same result and doesn't depend on any
hidden information)?

Don't forget that any interrupt associated with a collection should be
moved with the collection. So depending on how you decide to map the vCID
to the pCID, you may also need to move all the interrupts one by one.

MOVALL only moves the pending interrupts from one vCPU to another vCPU
(BTW this could be very expensive).
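
To illustrate, a sketch of what the one-by-one move amounts to (the
its_send_* helpers and the event list are invented for illustration, not
real Xen APIs):

    #include <stdint.h>

    /* Invented helpers, not real Xen APIs: emit a physical MOVI/SYNC. */
    void its_send_movi(uint32_t devid, uint32_t event, uint32_t pcid);
    void its_send_sync(uint32_t pcid);

    struct vits_event {
        uint32_t devid, event;     /* (device, event) mapped via this vCID */
        struct vits_event *next;
    };

    /* Remapping a vCID to a different pCID: every event currently routed
     * through that collection has to be moved individually. */
    static void remap_collection(struct vits_event *events, uint32_t new_pcid)
    {
        struct vits_event *ev;

        for ( ev = events; ev; ev = ev->next )
            its_send_movi(ev->devid, ev->event, new_pcid); /* O(n) commands */

        its_send_sync(new_pcid);   /* wait for the moves to take effect */
    }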


[..]

>> I can't see where the spec says that 2 MAPDs (one with V=1 and the other with
>> V=0) are required. The section 4.9.18 contains an 'or':
>>
>> "Issue a mapping command (MAPD; see section 5.13.11) or an un-mapping
>> command"
>>
>> This is related to "Interrupts can be re-mapped or un-mapped".
>>
>> 4.9.18 and 5.13.11 (PRD03-GENC-010745 24.0) are only speaking about a single
>> MAPD:
>>
>> "Note: software might issue a MAPD command to re-map an already mapped
>> device and the ITS must invalidate all cached data for that device."
>>
>
> OK, I had missed this 'or'. If so, MAPD always overwrites the old info.

You have to ensure that all interrupts related to this device have been 
disabled before using the new ITT.

You can't trust that the guest did it correctly before re-issuing MAPD.
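
Roughly, I would expect the MAPD emulation to look like this sketch (all
types and helper names are invented for illustration; the point is the
unconditional teardown on re-map):

    #include <stdbool.h>
    #include <stdint.h>

    /* All types and helpers below are invented for illustration. */
    struct vits;
    struct its_device;

    struct its_device *vits_find_device(struct vits *vits, uint32_t devid);
    void vits_discard_device_irqs(struct its_device *dev); /* disable+unroute */
    void vits_free_device(struct vits *vits, struct its_device *dev);
    int  vits_map_device(struct vits *vits, uint32_t devid,
                         uint64_t itt_ipa, unsigned int itt_size);

    /* Handle a guest MAPD: V=1 maps (or re-maps), V=0 unmaps. */
    static int vits_handle_mapd(struct vits *vits, uint32_t devid,
                                uint64_t itt_ipa, unsigned int itt_size,
                                bool valid)
    {
        struct its_device *dev = vits_find_device(vits, devid);

        /* Re-map or unmap: tear down the old mapping first.  Never trust
         * that the guest disabled the device's interrupts beforehand. */
        if ( dev )
        {
            vits_discard_device_irqs(dev);
            vits_free_device(vits, dev);
        }

        if ( !valid )
            return 0;

        return vits_map_device(vits, devid, itt_ipa, itt_size);
    }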

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-21 12:37                             ` Manish Jaggi
@ 2015-05-26 13:04                               ` Ian Campbell
  2015-06-01 22:57                                 ` Manish Jaggi
  0 siblings, 1 reply; 77+ messages in thread
From: Ian Campbell @ 2015-05-26 13:04 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Julien Grall, Stefano Stabellini

On Thu, 2015-05-21 at 05:37 -0700, Manish Jaggi wrote:
> 
> On Tuesday 19 May 2015 07:18 AM, Ian Campbell wrote:
> > On Tue, 2015-05-19 at 19:34 +0530, Vijay Kilari wrote:
> >> On Tue, May 19, 2015 at 7:24 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> >>> On Tue, 2015-05-19 at 14:36 +0100, Ian Campbell wrote:
> >>>> On Tue, 2015-05-19 at 14:27 +0100, Julien Grall wrote:
> >>>>> With the multiple vITS we would have to retrieve the number of vITS.
> >>>>> Maybe by extending the xen_arch_domainconfig?
> >>>> I'm sure we can find a way.
> >>>>
> >>>> The important question is whether we want to go for a N:N vits:pits
> >>>> mapping or 1:N.
> >>>>
> >>>> So far I think we are leaning (slightly?) towards the 1:N model, if we
> >>>> can come up with a satisfactory answer for what to do with global
> >>>> commands.
> >>> Actually, Julien just mentioned NUMA which I think is a strong argument
> >>> for the N:N model.
> >>>
> >>> We need to make a choice here one way or another, since it has knock on
> >>> effects on other parts, e.g the handling of SYNC and INVALL etc.
> >>>
> >>> Given that N:N seems likely to be simpler from the Xen side and in any
> >>> case doesn't preclude us moving to a 1:N model (or even a 2:N model etc)
> >>> in the future how about we start with that?
> >>>
> >>> If there is agreement in taking this direction then I will adjust the
> >>> relevant sections of the document to reflect this.
>>>> Yes, this makes the Xen side simple. The most important point to discuss is
> >>
> >> 1) How Xen maps vITS to pITS. its0 -> vits0?
> > The choices are basically either Xen chooses and the tools get told (or
> > "Just Know" the result), or the tools choose and setup the mapping in
> > Xen via hypercalls.
> >
> This could be one possible flow:
> -1- xen code parses the pci node and creates a pci_hostbridge structure 
> which stores the device_tree ptr.
> (using this pointer msi-parent (or respective its) can be retrieved)
> -2- dom0 invokes a hypercall to register pci_hostbridge (seg_no:cfg_addr)
> -3- Xen now knows which its serves the device id (seg:bus:dev.fn).
> Using a helper function its node for a seg_no can be retrieved.
> -4- When a device is assigned to a domU, we introduce a new hypercall 
> map_guest_bdf which would let xen know
> that for a guest how a virtual sbdf maps to a physical sbdf

This is an extension to XEN_DOMCTL_assign_device, I think. An extension
because that hypercall currently only receives the physical SBDF.

I wonder how x86 knows the virtual SBDF. Perhaps it has no need to for
some reason.

Anyway, the general shape of this plan seems plausible enough.
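
For concreteness, a sketch of the shape this could take (the structure
and field names are purely illustrative, not the actual interface):

    #include <stdint.h>

    struct its;                  /* opaque pITS handle */

    /* Hypothetical extension of XEN_DOMCTL_assign_device carrying the
     * guest-visible SBDF alongside the physical one; the real payload
     * today only has the machine SBDF. */
    struct xen_domctl_assign_device_ext {
        uint32_t machine_sbdf;   /* physical seg:bus:dev.fn (existing) */
        uint32_t guest_sbdf;     /* virtual seg:bus:dev.fn (the new bit) */
    };

    /* Xen-side record established at assignment time, so a trapped vITS
     * access can go from guest SBDF to the right physical ITS. */
    struct sbdf_mapping {
        uint32_t guest_sbdf;
        uint32_t machine_sbdf;
        struct its *its;         /* pITS serving the physical device */
    };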

> -5- domU is booted with a single virtual its node in device tree. Front 
> end driver  attaches this its as msi-parent
> -6- When domU accesses for ITS are trapped in Xen, using the helper 
> function say
> get_phys_its_for_guest(guest_id, guest_sbdf, /*[out]*/its_ptr *its)
> 
> its can be retrieved.
> AFAIK this is numa safe.
> >> 2) When PCI device is assigned to DomU, how does domU choose
> >>      vITS to send commands.  AFAIK, the BDF of assigned device
> >>      is different from actual BDF in DomU.
> > AIUI this is described in the firmware tables.
> >
> > e.g. in DT via the msi-parent phandle on the PCI root complex or
> > individual device.
> >
> > Is there an assumption here that a single PCI root bridge is associated
> > with a single ITS block? Or can different devices on a PCI bus use
> > different ITS blocks?
> >
> > Ian.
> >
> >
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-22 12:16             ` Vijay Kilari
  2015-05-22 12:49               ` Julien Grall
  2015-05-24 10:35               ` Julien Grall
@ 2015-05-27 11:22               ` Ian Campbell
  2 siblings, 0 replies; 77+ messages in thread
From: Ian Campbell @ 2015-05-27 11:22 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Stefano Stabellini, Prasun Kapoor, manish.jaggi, Julien Grall,
	xen-devel, Julien Grall, Stefano Stabellini

On Fri, 2015-05-22 at 17:46 +0530, Vijay Kilari wrote:

> > OK, Vijay could you make a proposal here please.
> 
> __text__

Thanks, I tried to incorporate / merge this with the stuff Julien
proposed later, and to update based on the discussion in this thread.

Please check the next draft since I'm sure I must have either missed
something or mismerged concepts etc. I've also left some XXX where I
wasn't sure what the conclusion was.

I think the main suggestion was to allocate various data structures at
passthrough setup time rather than at command xlate time.

Please check draft C, which I hope to post shortly.

Ian.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling)
  2015-05-24 10:35               ` Julien Grall
  2015-05-25  9:06                 ` Vijay Kilari
@ 2015-05-27 11:22                 ` Ian Campbell
  1 sibling, 0 replies; 77+ messages in thread
From: Ian Campbell @ 2015-05-27 11:22 UTC (permalink / raw)
  To: Julien Grall
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Stefano Stabellini

On Sun, 2015-05-24 at 11:35 +0100, Julien Grall wrote:
[...]

> I wrote my thoughts for the validation bits (see below) and added some
> definitions useful for people who don't have the spec.

Thanks for this.

> 
> Emulation of ITS commands
> =========================
> 
> # Introduction
> 
> This document is based on section 5.13 of the GICv3 specification
> (PRD03-GENC-010745 24.0). The goal is to provide insight into the cost
> of emulating ITS commands in Xen.
> 
> The ITS provides 12 commands in order to manage interrupt collections,
> devices and interrupts.
> 
> # Definitions

I tried to integrate your text here into the introduction section of the
VITs doc.

[...]
> # Validation of the parameters

And this bit I used as the basis for a new "ITS Command Translation"
chapter.

I have some questions which I have inserted into the next draft of the
doc with XXX markers. I intend to post a new draft very soon, rather
than wait for any discussion, so you might prefer to wait and answer
them in that thread.

> Each command contains parameters that need to be validated before any
> use in Xen or before being passed to the hardware.
> 
> This section will describe the validation of the main parameters.
> 
> ## Device ID
> 
> This parameter is used in commands which manage the device and the 
> interrupts associated with this device. Checking if a device is present
> and retrieving the data structure must be fast.
> 
> The device identifiers may not be assigned contiguously and the maximum 
> number is very high (2^32).

That's true for the host, but I think the lookup here needs to be based
on the virtual device id, not necessarily the physical one, so we have
the opportunity to arrange things for our convenience.

In particular we could arrange for device ids to be contiguous (or in a
small number of ranges) and we know that N is going to be much lower
than 2^32 in practice.

So I think we could almost get away with either a simple array or a much
simpler M-level look up (for small M, say 2).

Or is there some constraint which means we cannot virtualise the device
id?
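
To illustrate what I mean by a 2-level lookup (a sketch only; sizes and
names are picked arbitrarily for illustration):

    #include <stddef.h>
    #include <stdint.h>

    #define DEVTAB_L1_BITS 10
    #define DEVTAB_L2_BITS 10                /* 2^20 virtual device IDs */
    #define DEVTAB_L2_SIZE (1u << DEVTAB_L2_BITS)

    struct its_device;                       /* opaque per-device state */

    struct vdevid_table {
        /* First level allocated up front; second levels on demand. */
        struct its_device **l1[1u << DEVTAB_L1_BITS];
    };

    static struct its_device *vdevid_lookup(const struct vdevid_table *t,
                                            uint32_t vdevid)
    {
        uint32_t i1 = vdevid >> DEVTAB_L2_BITS;
        uint32_t i2 = vdevid & (DEVTAB_L2_SIZE - 1);

        if ( i1 >= (1u << DEVTAB_L1_BITS) || !t->l1[i1] )
            return NULL;                     /* out of range or unmapped */
        return t->l1[i1][i2];                /* NULL if slot not populated */
    }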

>  The possible efficient data structure would be:
>      1) List: The lookup/deletion is in O(n) and the insertion will 
> depend on whether the devices should be sorted by identifier. The
> memory overhead is 18 bytes per element.
>      2) Red-black tree: All the operations are O(log(n)). The memory 
> overhead is 24 bytes per element.
> 
> The solution 2) seems the more suitable for having fast deviceID 
> validation even though the memory overhead is a bit higher compared to
> the list.

Vijay's text discussed Event ID too. I've added some words about that;
they may be rubbish, so please check the next draft.

> ## Collection
> 
> This parameter is used in commands which manage collections and
> interrupts in order to move them from one CPU to another. The ITS is only
> required to implement N + 1 collections, where N is the number of
> processors on the platform. Furthermore, the identifiers are always
> contiguous.
> 
> If we decide to implement the strict minimum (i.e N + 1), an array is
> enough and will allow operations in O(1).

May not even need that since [0..NR_CPUS+1] would allow us to go
straight to either vcpu->collection_id or domain->collection_id (the
latter being the +1).
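
i.e. a lookup as trivial as this sketch (the collection_id fields are
hypothetical; nothing like them exists in struct vcpu/struct domain today):

    /* Sketch only: vCID in [0, max_vcpus) is that vCPU's collection;
     * vCID == max_vcpus is the "+1" domain-global one. */
    #define INVALID_COLLECTION 0xffffU       /* made-up sentinel */

    static unsigned int vcid_to_pcid(const struct domain *d, uint32_t vcid)
    {
        if ( vcid < d->max_vcpus )
            return d->vcpu[vcid]->collection_id;  /* per-vCPU collection */
        if ( vcid == d->max_vcpus )
            return d->collection_id;              /* the domain-global +1 */
        return INVALID_COLLECTION;                /* reject out of range */
    }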

> 
> ## Target Address
> 
>        This parameter is used in commands to manage collections.

It's also, I think, the output of the ITS Translation table?

>                              It's a unique
> identifier per processor. The format differs depending on the value
> of the bit GITS_TYPER.PTA (see definition). The value of the field is
> pre-defined by the ITS and the software has to handle the 2 cases.

IOW the bit is r/o and fixed by the ITS implementor?

> The solution with GITS_TYPER.PTA set to one will require some computation
> in order to find the VCPU associated with the redistributor address. It 
> will be similar to get_vcpu_from_rdist in the vGICv3 emulation 
> (xen/arch/arm/vgic-v3.c).
> 
> On the other hand, setting GITS_TYPER.PTA to zero will give us control to
> decide the linear processor number, which could simply be the vcpu_id (always
> linear).

Does this get more complicated with large numbers of vcpus on gic v3
(i.e. once AFFR>0 gets involved)?
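
For the simple case at least, the PTA == 0 translation Julien describes
would be (sketch in Xen context; the function name is invented):

    /* GITS_TYPER.PTA == 0: the target address field is a linear processor
     * number, which we define to be the vcpu_id, so no affinity decoding
     * is needed for this case. */
    static struct vcpu *vta_to_vcpu(struct domain *d, uint64_t vta)
    {
        if ( vta >= d->max_vcpus )
            return NULL;
        return d->vcpu[vta];
    }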

> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-05-26 13:04                               ` Ian Campbell
@ 2015-06-01 22:57                                 ` Manish Jaggi
  2015-06-02  8:29                                   ` Ian Campbell
  0 siblings, 1 reply; 77+ messages in thread
From: Manish Jaggi @ 2015-06-01 22:57 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Julien Grall, Stefano Stabellini



On Tuesday 26 May 2015 06:04 AM, Ian Campbell wrote:
> On Thu, 2015-05-21 at 05:37 -0700, Manish Jaggi wrote:
>> On Tuesday 19 May 2015 07:18 AM, Ian Campbell wrote:
>>> On Tue, 2015-05-19 at 19:34 +0530, Vijay Kilari wrote:
>>>> On Tue, May 19, 2015 at 7:24 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
>>>>> On Tue, 2015-05-19 at 14:36 +0100, Ian Campbell wrote:
>>>>>> On Tue, 2015-05-19 at 14:27 +0100, Julien Grall wrote:
>>>>>>> With the multiple vITS we would have to retrieve the number of vITS.
>>>>>>> Maybe by extending the xen_arch_domainconfig?
>>>>>> I'm sure we can find a way.
>>>>>>
>>>>>> The important question is whether we want to go for a N:N vits:pits
>>>>>> mapping or 1:N.
>>>>>>
>>>>>> So far I think we are leaning (slightly?) towards the 1:N model, if we
>>>>>> can come up with a satisfactory answer for what to do with global
>>>>>> commands.
>>>>> Actually, Julien just mentioned NUMA which I think is a strong argument
>>>>> for the N:N model.
>>>>>
>>>>> We need to make a choice here one way or another, since it has knock on
>>>>> effects on other parts, e.g the handling of SYNC and INVALL etc.
>>>>>
>>>>> Given that N:N seems likely to be simpler from the Xen side and in any
>>>>> case doesn't preclude us moving to a 1:N model (or even a 2:N model etc)
>>>>> in the future how about we start with that?
>>>>>
>>>>> If there is agreement in taking this direction then I will adjust the
>>>>> relevant sections of the document to reflect this.
>>>> Yes, this makes the Xen side simple. The most important point to discuss is
>>>>
>>>> 1) How Xen maps vITS to pITS. its0 -> vits0?
>>> The choices are basically either Xen chooses and the tools get told (or
>>> "Just Know" the result), or the tools choose and setup the mapping in
>>> Xen via hypercalls.
>>>
>> This could be one possible flow:
>> -1- xen code parses the pci node and creates a pci_hostbridge structure
>> which stores the device_tree ptr.
>> (using this pointer msi-parent (or respective its) can be retrieved)
>> -2- dom0 invokes a hypercall to register pci_hostbridge (seg_no:cfg_addr)
>> -3- Xen now knows which its serves the device id (seg:bus:dev.fn).
>> Using a helper function its node for a seg_no can be retrieved.
>> -4- When a device is assigned to a domU, we introduce a new hypercall
>> map_guest_bdf which would let xen know
>> that for a guest how a virtual sbdf maps to a physical sbdf
> This is an extension to XEN_DOMCTL_assign_device, I think. An extension
> because that hypercall currently only receives the physical SBDF.
>
> I wonder how x86 knows the virtual SBDF. Perhaps it has no need to for
> some reason.
>
> Anyway, the general shape of this plan seems plausible enough.

Could you modify http://xenbits.xen.org/people/ianc/vits/draftC.html (section 5, vITS to pITS mapping) based on this approach?

>> -5- domU is booted with a single virtual its node in device tree. Front
>> end driver  attaches this its as msi-parent
>> -6- When domU accesses for ITS are trapped in Xen, using the helper
>> function say
>> get_phys_its_for_guest(guest_id, guest_sbdf, /*[out]*/its_ptr *its)
>>
>> its can be retrieved.
>> AFAIK this is numa safe.
>>>> 2) When PCI device is assigned to DomU, how does domU choose
>>>>       vITS to send commands.  AFAIK, the BDF of assigned device
>>>>       is different from actual BDF in DomU.
>>> AIUI this is described in the firmware tables.
>>>
>>> e.g. in DT via the msi-parent phandle on the PCI root complex or
>>> individual device.
>>>
>>> Is there an assumption here that a single PCI root bridge is associated
>>> with a single ITS block? Or can different devices on a PCI bus use
>>> different ITS blocks?
>>>
>>> Ian.
>>>
>>>
>



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Xen/arm: Virtual ITS command queue handling
  2015-06-01 22:57                                 ` Manish Jaggi
@ 2015-06-02  8:29                                   ` Ian Campbell
  0 siblings, 0 replies; 77+ messages in thread
From: Ian Campbell @ 2015-06-02  8:29 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: Vijay Kilari, Stefano Stabellini, Prasun Kapoor, manish.jaggi,
	Julien Grall, xen-devel, Julien Grall, Stefano Stabellini

On Mon, 2015-06-01 at 15:57 -0700, Manish Jaggi wrote:

> > Anyway, the general shape of this plan seems plausible enough.
> Could you modify http://xenbits.xen.org/people/ianc/vits/draftC.html (section 5, vITS to pITS mapping) based on this approach?

I'm updating things as I go and feedback will be reflected in the next
draft.


> > > -5- domU is booted with a single virtual its node in device tree. Front 
> > > end driver  attaches this its as msi-parent
> > > -6- When domU accesses for ITS are trapped in Xen, using the helper 
> > > function say
> > > get_phys_its_for_guest(guest_id, guest_sbdf, /*[out]*/its_ptr *its)
> > > 
> > > its can be retrieved.
> > > AFAIK this is numa safe.
> > > > > 2) When PCI device is assigned to DomU, how does domU choose
> > > > >      vITS to send commands.  AFAIK, the BDF of assigned device
> > > > >      is different from actual BDF in DomU.
> > > > AIUI this is described in the firmware tables.
> > > > 
> > > > e.g. in DT via the msi-parent phandle on the PCI root complex or
> > > > individual device.
> > > > 
> > > > Is there an assumption here that a single PCI root bridge is associated
> > > > with a single ITS block? Or can different devices on a PCI bus use
> > > > different ITS blocks?
> > > > 
> > > > Ian.
> > > > 
> > > > 
> > 
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

end of thread, other threads:[~2015-06-02  8:29 UTC | newest]

Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-05 12:14 Xen/arm: Virtual ITS command queue handling Vijay Kilari
2015-05-05 13:51 ` Stefano Stabellini
2015-05-05 13:54   ` Julien Grall
2015-05-05 15:56   ` Vijay Kilari
2015-05-05 14:09 ` Julien Grall
2015-05-05 16:09   ` Vijay Kilari
2015-05-05 16:27     ` Julien Grall
2015-05-12 15:02 ` Ian Campbell
2015-05-12 17:35   ` Julien Grall
2015-05-13 13:23     ` Ian Campbell
2015-05-13 14:26       ` Julien Grall
2015-05-15 10:59         ` Ian Campbell
2015-05-15 11:26           ` Vijay Kilari
2015-05-15 11:30             ` Ian Campbell
2015-05-15 12:03               ` Julien Grall
2015-05-15 12:47                 ` Vijay Kilari
2015-05-15 12:52                   ` Julien Grall
2015-05-15 12:53                   ` Ian Campbell
2015-05-15 13:14                     ` Vijay Kilari
2015-05-15 13:24                       ` Ian Campbell
2015-05-15 13:44                         ` Julien Grall
2015-05-15 14:04                           ` Vijay Kilari
2015-05-15 15:05                             ` Julien Grall
2015-05-15 15:38                               ` Ian Campbell
2015-05-15 17:31                                 ` Julien Grall
2015-05-16  4:03                                   ` Vijay Kilari
2015-05-16  8:49                                     ` Julien Grall
2015-05-19 11:38                                       ` Vijay Kilari
2015-05-19 11:48                                         ` Ian Campbell
2015-05-19 11:55                                         ` Ian Campbell
2015-05-19 12:10                                           ` Vijay Kilari
2015-05-19 12:19                                             ` Ian Campbell
2015-05-19 12:48                                               ` Vijay Kilari
2015-05-19 13:12                                                 ` Ian Campbell
2015-05-19 14:05                                                 ` Julien Grall
2015-05-19 14:48                                                   ` Ian Campbell
2015-05-19 15:44                                                     ` Julien Grall
2015-05-15 14:05                           ` Ian Campbell
2015-05-15 12:19           ` Julien Grall
2015-05-15 12:58             ` Ian Campbell
2015-05-15 13:24               ` Julien Grall
2015-05-19 12:14                 ` Ian Campbell
2015-05-19 13:27                   ` Julien Grall
2015-05-19 13:36                     ` Ian Campbell
2015-05-19 13:46                       ` Julien Grall
2015-05-19 13:54                       ` Ian Campbell
2015-05-19 14:04                         ` Vijay Kilari
2015-05-19 14:18                           ` Ian Campbell
2015-05-21 12:37                             ` Manish Jaggi
2015-05-26 13:04                               ` Ian Campbell
2015-06-01 22:57                                 ` Manish Jaggi
2015-06-02  8:29                                   ` Ian Campbell
2015-05-19 14:06                         ` Julien Grall
2015-05-13 16:27   ` Vijay Kilari
2015-05-15 11:28     ` Ian Campbell
2015-05-15 12:38       ` Vijay Kilari
2015-05-15 13:06         ` Ian Campbell
2015-05-15 13:17         ` Julien Grall
2015-05-15 11:45   ` Xen on ARM vITS Handling Draft B (Was Re: Xen/arm: Virtual ITS command queue handling) Ian Campbell
2015-05-15 14:55     ` Julien Grall
2015-05-19 12:10       ` Ian Campbell
2015-05-19 13:37         ` Julien Grall
2015-05-19 13:51           ` Ian Campbell
2015-05-22 12:16             ` Vijay Kilari
2015-05-22 12:49               ` Julien Grall
2015-05-22 13:58                 ` Vijay Kilari
2015-05-22 14:35                   ` Julien Grall
2015-05-22 14:54                     ` Vijay Kilari
2015-05-24 10:35               ` Julien Grall
2015-05-25  9:06                 ` Vijay Kilari
2015-05-25  9:32                   ` Julien Grall
2015-05-25 10:40                     ` Vijay Kilari
2015-05-25 12:44                       ` Julien Grall
2015-05-25 13:38                         ` Vijay Kilari
2015-05-25 17:11                           ` Julien Grall
2015-05-27 11:22                 ` Ian Campbell
2015-05-27 11:22               ` Ian Campbell
