Re: FIFO-based event channel ABI design (draft B)

From: David Vrabel <david.vrabel@citrix.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: FIFO-based event channel ABI design (draft B)
Date: Fri, 15 Feb 2013 18:46:51 +0000	[thread overview]
Message-ID: <511E829B.8020800@citrix.com> (raw)
In-Reply-To: <1360952378.16636.195.camel@zion.uk.xensource.com>

On 15/02/13 18:19, Wei Liu wrote:
> Apart from the TAIL field, I have some comments on the hypercall
> interface. Please see inline comments.
> 
> On Fri, 2013-02-15 at 14:32 +0000, David Vrabel wrote:
>>
>> ### `EVTCHNOP_init_control`
>>
>> This call initializes a single VCPU's control block.
>>
>> A guest should call this during initial VCPU bring up.  The guest must
>> have already successfully registered a vcpu_info structure and the
>> control block must be in the same page.
>>
>> If this call fails on the boot VCPU, the guest should continue to use
>> the 2-level event channel ABI for all VCPUs.  If this call fails on
>> any non-boot VCPU then the VCPU will be unable to receive events and
>> the guest should offline the VCPU.
>>
> 
> Why offline this CPU? This call only registers control block, we can
> switch back to use 2-level for all CPUs.

Because switching back for the other VCPUs is potentially impossible (we
may have hotplugged this new VCPU and more than 4096 event channels may
be currently in use).  It would also require migrating events between
the new and old data structures which is hard.

I would expect this call to never fail in normal operation (except on
the boot VCPU where support may be missing).  i.e., it will only fail if
xenheap or map space is exhausted.  Both the xenheap and the map space
should be large enough so we run out of other resources (e.g,, domheap)
first.

> This is not a right or wrong question, but consider, how many vcpus will
> a guest have? If the number is small, user would probably choose to have
> all their cpus online over the new event interface.
> 
>>> Note: This only initializes the control block. At least one page
>>> needs to be added to the event arrary with `EVTCHNOP_expand_array`.
>>
>>     struct evtchnop_init_control {
>>         uint64_t control_pfn;
>>         uint32_t offset;
>>         uint32_t vcpu;
>>     };
>>
>> Field          Purpose
>> -----          -------
>> `control_pfn`  [in] The PFN or GMFN of the page containing the control
>> block.
>> `offset`       [in] Offset in bytes from the start of the page to the
>> beginning of the control block.
>> `vcpu`         [in] The VCPU number.
>>
>> Error code  Reason
>> ----------  ------
>> EINVAL      `vcpu` is invalid or already initialized.
>> EINVAL      `control_pfn` is not a valid frame for the domain.
>> EINVAL      `control_pfn` is not the same frame as the vcpu_info structure.
> 
> Hmm... then you can omit control_pfn entirely?

I left it in so the requirement that the two structures share pages can
be relaxed in the future.

And it will catch callers that haven't laid out their structures
correctly which is useful.

>> EINVAL      `offset` is not a multiple of 8 or the control block would
>> cross a page boundary.
>> ENOMEM      Insufficient memory to allocate internal structures.
>>
>> ### `EVTCHNOP_expand_array`
>>
>> This call expands the event array by appending an additional page.
>>
> 
> A bit implementation detail:
> 
> What state should guest / hypervisor be in when issuing this call?

The boot VCPU shall have successfully initialized its control block.  I
don't think there are any other restrictions on this call.

>> ### `EVTCHNOP_set_limit`
>>
>> This privileged call sets the highest port number a domain can bind an
>> event channel to.  The default for dom0 is the maximum supported
>> ($2^{17}-1$).  Other domains default to 1023 (requiring only a single
>> page for their event array).
>>
>> The limit only affects future attempts to bind event channels.  Event
>> channels that are already bound are not affected.
>>
>> It is recommended that the toolstack only calls this during domain
>> creation before the guest is started.
>>
> 
> Is it OK to shrink port limit?

Yes, but note the affect this has:

"The limit only affects future attempts to bind event channels.  Event
channels that are already bound are not affected."

>> Upcall
>> ------
>>
>> When Xen places an event on an empty queue it sets the queue as ready
>> in the control block.  If the ready bit transitions from 0 to 1, a new
>> event is signalled to the guest.
>>
>> The guest uses the control block's ready field to find the highest
>> priority queue with pending events.  The bit in the ready word is
>> cleared when the queue is emptied.
>>
>> Higher priority events do not need to preempt lower priority event
>> handlers so the guest can handle events by taking one event off the
>> currently ready queue with highest priority.
>>
>>     r = C.ready
>>     while r
>>         q = find_first_set_bit(r)
>>         consume_one_event_from_queue(q)
>>         if C.queue[q].H == 0
>>             C.ready[q] = 0
>>             r[b] = 0
> 
> What is b? Could be clear_bit(r,q)?

Typo, oops.

r[q] = 0

i.e., clear the q'th bit of r, so clear_bit(q, r) yes.

Typos not withstanding, would the pseudocode be clearer if it used Linux
style bit ops?

David