From: Andrew Bennieston <andrew.bennieston@citrix.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: annie.li@oracle.com, xen-devel@lists.xen.org,
	Ian Campbell <Ian.Campbell@citrix.com>,
	stefano.stabellini@eu.citrix.com,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: Interesting observation with network event notification and batching
Date: Mon, 17 Jun 2013 11:56:09 +0100
Message-ID: <51BEEB49.4060704@citrix.com>
In-Reply-To: <20130617104646.GD1757@zion.uk.xensource.com>

On 17/06/13 11:46, Wei Liu wrote:
> On Mon, Jun 17, 2013 at 10:56:12AM +0100, Andrew Bennieston wrote:
>> On 17/06/13 10:38, Ian Campbell wrote:
>>> On Sun, 2013-06-16 at 10:54 +0100, Wei Liu wrote:
>>>>>> Konrad, IIRC you once mentioned you discovered something with event
>>>>>> notification, what's that?
>>>>>
>>>>> They were bizarre. I naively expected the number of physical NIC
>>>>> interrupts to be around the same as the VIF's, or less. And I figured
>>>>> that the number of interrupts would stay constant regardless of the
>>>>> size of the packets. In other words, #packets == #interrupts.
>>>>>
>>>>
>>>> It could be that the frontend notifies the backend for every packet it
>>>> sends. This is not desirable and I don't expect the ring to behave that
>>>> way.
>>
>> I have observed this kind of behaviour during network performance
>> tests in which I periodically checked the ring state during an iperf
>> session. It looked to me like the frontend was sending notifications
>> far too often, but that the backend was sending them very
>> infrequently, so the Tx (from guest) ring was mostly empty and the
>> Rx (to guest) ring was mostly full. This has the effect of both the
>> frontend and the backend having to block occasionally, waiting for the
>> other end to clear or fill a ring, even though there is more data
>> available.
>>
>> My initial theory was that this was caused in part by the shared
>> event channel, however I expect that Wei is testing on top of a
>> kernel with his split event channel features?
>>
>
> Yes, with split event channels.
>
> And looking at the interrupt counts during the tests, frontend TX has a
> six-figure interrupt count while frontend RX has only a two-figure count.
>
>>>
>>> It is probably worth checking that things are working how we think they
>>> should. i.e. that netback's calls to RING_FINAL_CHECK_FOR_.. and
>>> netfront's calls to RING_PUSH_..._AND_CHECK_NOTIFY are placed at
>>> suitable points to maximise batching.
>>>
>>> Is the RING_FINAL_CHECK_FOR_REQUESTS inside the xen_netbk_tx_build_gops
>>> loop right? This would push the req_event pointer to just after the last
>>> request, meaning the next request enqueued by the frontend would cause a
>>> notification -- even though the backend is actually still continuing to
>>> process requests and would have picked up that packet without further
>>> notification. In this case there is a fair bit of work left in the
>>> backend for this iteration, i.e. plenty of opportunity for the frontend
>>> to queue more requests.
>>>
>>> The comments in ring.h say:
>>>   *  These macros will set the req_event/rsp_event field to trigger a
>>>   *  notification on the very next message that is enqueued. If you want to
>>>   *  create batches of work (i.e., only receive a notification after several
>>>   *  messages have been enqueued) then you will need to create a customised
>>>   *  version of the FINAL_CHECK macro in your own code, which sets the event
>>>   *  field appropriately.
>>>
>>> Perhaps we want to just use RING_HAS_UNCONSUMED_REQUESTS in that loop
>>> (and other similar loops) and add a FINAL check at the very end?
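
If I've understood the suggestion correctly, the restructured consumer loop
would look roughly like this. This is only a sketch against the ring.h
macros (the ring field names follow netback, but drain_tx_ring() and
process_one_request() are made-up placeholders), not a patch:

  /* Built on xen/include/public/io/ring.h and netif.h.  Drain everything
   * already on the ring without touching req_event, then arm the event
   * pointer once for the whole batch. */
  static void drain_tx_ring(struct xenvif *vif)
  {
          struct xen_netif_tx_request txreq;
          RING_IDX cons;
          int more_to_do;

          do {
                  while (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) {
                          cons = vif->tx.req_cons;
                          memcpy(&txreq, RING_GET_REQUEST(&vif->tx, cons),
                                 sizeof(txreq));
                          vif->tx.req_cons = ++cons;
                          process_one_request(vif, &txreq);
                  }
                  /* Only now set req_event (to req_cons + 1).  The macro
                   * re-checks after a barrier and sets more_to_do if new
                   * requests raced in, so we loop again instead of waiting
                   * for a notification the frontend may never send. */
                  RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
          } while (more_to_do);
  }

That way the frontend would only be asked to notify once per batch rather
than once per request.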
>>>
>>>>> But it was odd and I didn't go deeper in it to figure out what
>>>>> was happening. And also to figure out if for the VIF we could
>>>>> do something of #packets != #interrupts.  And hopefully some
>>>>> mechanism to adjust so that the amount of interrupts would
>>>>> be lesser per packets (hand waving here).
>>>>
>>>> I'm trying to do this now.
>>>
>>> What scheme do you have in mind?
>>
>> As I mentioned above, filling a ring completely appears to be almost
>> as bad as sending too many notifications. The ideal scheme may
>> involve trying to balance the ring at some "half-full" state,
>> depending on the capacity for the front- and backends to process
>> requests and responses.
>>
>
> I don't think filling the ring completely causes any problem; that's just
> conceptually the same as the "half-full" state if you need to throttle the
> ring.
My understanding was that filling the ring will cause the producer to 
sleep until slots become available (i.e. until the consumer notifies it 
that it has removed something from the ring).

I'm just concerned that overly aggressive batching may lead to a 
situation where the consumer is sitting idle, waiting for a notification 
that the producer hasn't yet sent because it can still fill more slots 
on the ring. When the ring is completely full, the producer would have 
to wait for the ring to partially empty. At this point, the consumer 
would hold off notifying because it can still batch more processing, so 
the producer is left waiting. (Repeat as required). It would be better 
to have both producer and consumer running concurrently.
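
For reference, the frontend-side pattern I'm describing is roughly the
following (again just a sketch; the field and helper names are illustrative
rather than the exact netfront ones):

  /* Queue one TX request and notify only if the backend asked for it.
   * np->tx is a struct xen_netif_tx_front_ring; fill_tx_request() and
   * tx_irq stand in for the real netfront details. */
  static int queue_tx_request(struct netfront_info *np, struct sk_buff *skb)
  {
          struct xen_netif_tx_request *tx;
          int notify;

          if (RING_FULL(&np->tx))
                  return -EBUSY;  /* producer backs off until slots free up */

          tx = RING_GET_REQUEST(&np->tx, np->tx.req_prod_pvt);
          fill_tx_request(tx, skb);
          np->tx.req_prod_pvt++;

          /* Publishes req_prod and raises notify only if the backend's
           * req_event falls inside the newly pushed range, so how often we
           * interrupt the backend depends entirely on where it last left
           * req_event. */
          RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&np->tx, notify);
          if (notify)
                  notify_remote_via_irq(np->tx_irq);

          return 0;
  }

The -EBUSY path is exactly the "producer left waiting" case above: once we
hit it, the frontend is stuck until the backend pushes TX responses and
notifies us.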

I mention this mainly so that we don't end up with a swing to the polar 
opposite of what we have now, which (to my mind) is just as bad. Clearly 
this is an edge case, but if there's a reason I'm missing that this 
can't happen (e.g. after a period of inactivity) then don't hesitate to 
point it out :)

(Perhaps "half-full" was misleading... the optimal state may be "just 
enough room for one more packet", or something along those lines...)
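
One way to get that kind of middle ground with the existing machinery might
be the customised event-pointer variant hinted at in the ring.h comment Ian
quoted, i.e. asking to be notified only once N further requests are
outstanding. A purely illustrative, untested sketch:

  /* Like RING_FINAL_CHECK_FOR_REQUESTS, but asks the producer to notify
   * only after (_batch) further requests instead of on the very next one.
   * The batch size would have to be capped so the producer is never left
   * waiting on a full ring. */
  #define RING_FINAL_CHECK_FOR_REQUESTS_BATCHED(_r, _work_to_do, _batch)  \
          do {                                                            \
                  (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r);       \
                  if (_work_to_do)                                        \
                          break;                                          \
                  (_r)->sring->req_event = (_r)->req_cons + (_batch);     \
                  mb();                                                   \
                  (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r);       \
          } while (0)

The hard part is choosing _batch so that the consumer still wakes up well
before the producer hits a full ring.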

Andrew

Thread overview: 27+ messages
2013-06-12 10:14 Interesting observation with network event notification and batching Wei Liu
2013-06-14 18:53 ` Konrad Rzeszutek Wilk
2013-06-16  9:54   ` Wei Liu
2013-06-17  9:38     ` Ian Campbell
2013-06-17  9:56       ` Andrew Bennieston
2013-06-17 10:46         ` Wei Liu
2013-06-17 10:56           ` Andrew Bennieston [this message]
2013-06-17 11:08             ` Ian Campbell
2013-06-17 11:55               ` Andrew Bennieston
2013-06-17 10:06       ` Jan Beulich
2013-06-17 10:16         ` Ian Campbell
2013-06-17 10:35       ` Wei Liu
2013-06-17 11:34         ` annie li
2013-06-16 12:46   ` Wei Liu
2013-06-28 16:15 ` Wei Liu
2013-07-01  7:48   ` annie li
2013-07-01  8:54     ` Wei Liu
2013-07-01 14:29       ` Stefano Stabellini
2013-07-01 14:39         ` Wei Liu
2013-07-01 14:54           ` Stefano Stabellini
2013-07-01 15:59       ` annie li
2013-07-01 16:06         ` Wei Liu
2013-07-01 16:53           ` Andrew Bennieston
2013-07-01 17:55             ` Wei Liu
2013-07-03 15:18             ` Wei Liu
2013-07-01 14:19     ` Stefano Stabellini
2013-07-01 15:59       ` annie li
