Date: Wed, 3 Feb 2016 13:02:34 -0500 (EST)
From: Ladi Prosek
To: Amit Shah
Cc: pagupta@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] rng-random: implement request queue
Message-ID: <831310576.31418757.1454522554452.JavaMail.zimbra@redhat.com>
In-Reply-To: <20160203123639.GA20527@grmbl.mre>
References: <1453465198-11000-1-git-send-email-lprosek@redhat.com>
 <20160203123639.GA20527@grmbl.mre>

Hi Amit,

----- Original Message -----
> Hi Ladi,
>
> Adding Pankaj to CC, he too looked at this recently.
>
> On (Fri) 22 Jan 2016 [13:19:58], Ladi Prosek wrote:
> > If the guest adds a buffer to the virtio queue while another buffer
> > is still pending and hasn't been filled and returned by the rng
> > device, rng-random internally discards the pending request, which
> > leads to the second buffer getting stuck in the queue. For the guest
> > this manifests as delayed completion of reads from virtio-rng, i.e.
> > a read is completed only after another read is issued.
> >
> > This patch adds an internal queue of requests, analogous to what
> > rng-egd uses, to make sure that requests and responses are balanced
> > and correctly ordered.
>
> ... and this can lead to breaking migration (the queue of requests on
> the host needs to be migrated, else the new host will have no idea of
> the queue).

I was under the impression that clearing the queue pre-migration, as
implemented by the RngBackendClass::cancel_requests callback, is enough.
If it weren't, the rng-egd backend would already be broken, as its
queueing logic is pretty much identical.

/**
 * rng_backend_cancel_requests:
 * @s: the backend to cancel all pending requests in
 *
 * Cancels all pending requests submitted by @rng_backend_request_entropy. This
 * should be used by a device during reset or in preparation for live migration
 * to stop tracking any request.
 */
void rng_backend_cancel_requests(RngBackend *s);

Upon closer inspection, though, this function appears to have no callers.
Either I'm missing something or there's another bug to be fixed.

> I think we should limit the queue size to 1 instead. Multiple rng
> requests should not be common, because if we did have entropy, we'd
> just service the guest request and be done with it. If we haven't
> replied to the guest, it just means that the host itself is waiting
> for more entropy, or is waiting for the timeout before the guest's
> ratelimit is lifted.

The scenario I had in mind is multiple processes in the guest requesting
entropy at the same time, no rate limiting, and a fast entropy source on
the host. Being able to queue up requests would definitely help boost
performance. I think I even benchmarked it, but I must have lost the
numbers; I can set it up again and rerun the benchmark if you're
interested.
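In case a concrete picture helps the discussion, below is a rough sketch of
what the queueing amounts to on the rng-random side. It is a simplification,
not the patch verbatim: the request struct and the s->requests field are
names I'm making up here (s->requests stands for a new GSList member on
RndRandom), and error handling and partial reads are trimmed.

/* Sketch only, meant to read like backends/rng-random.c; names and
 * details are approximate, not the actual patch. */
typedef struct RngRandomRequest {
    EntropyReceiveFunc *receive_entropy;
    void *opaque;
    size_t size;
} RngRandomRequest;

static void entropy_available(void *opaque)
{
    RndRandom *s = RNG_RANDOM(opaque);
    RngRandomRequest *req = s->requests->data;  /* oldest request first */
    uint8_t buffer[req->size];
    ssize_t len;

    len = read(s->fd, buffer, req->size);
    if (len < 0 && errno == EAGAIN) {
        return;
    }
    g_assert(len != -1);

    /* complete the request at the head of the queue and pop it */
    req->receive_entropy(req->opaque, buffer, len);
    s->requests = g_slist_remove(s->requests, req);
    g_free(req);

    if (!s->requests) {
        /* nothing left to serve, stop polling the fd */
        qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
    }
}

static void rng_random_request_entropy(RngBackend *b, size_t size,
                                       EntropyReceiveFunc *receive_entropy,
                                       void *opaque)
{
    RndRandom *s = RNG_RANDOM(b);
    RngRandomRequest *req = g_new0(RngRandomRequest, 1);

    req->receive_entropy = receive_entropy;
    req->opaque = opaque;
    req->size = size;

    /* append instead of discarding whatever is already pending */
    s->requests = g_slist_append(s->requests, req);

    qemu_set_fd_handler(s->fd, entropy_available, NULL, s);
}

The bookkeeping is essentially what rng-egd already does for its requests;
the only real difference is that the entropy comes from the file descriptor
instead of the chardev, which is why I'd expect the migration story to be no
worse than what we have today.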
> So, instead of fixing this using a queue, how about limiting the size
> of the vq to have just one element at a time?

I don't believe that this is a good solution. Although perfectly valid
spec-wise, I can see how a one-element queue could confuse less-than-perfect
driver implementations. Additionally, the driver would have to implement
some kind of guest-side queueing logic and serialize its requests, or else
drop them when the virtqueue is full. Overall, I don't think it's completely
crazy to call it a breaking change.

> Thanks,
>
>			Amit
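P.S. Just to make sure we are talking about the same thing: I read "limiting
the size of the vq" as creating the virtio-rng queue with a single
descriptor, roughly as in the sketch below. This is illustrative only, not a
proposed patch, and if I remember right the queue is currently created with
8 entries.

/* hw/virtio/virtio-rng.c -- illustrative sketch of the one-element-vq idea */
static void virtio_rng_device_realize(DeviceState *dev, Error **errp)
{
    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
    VirtIORNG *vrng = VIRTIO_RNG(dev);

    /* ... existing setup unchanged ... */

    /* a one-entry ring: the guest can never have more than one buffer
     * outstanding, so concurrent readers in the guest have to be
     * serialized by the driver itself */
    vrng->vq = virtio_add_queue(vdev, 1, handle_input);

    /* ... */
}

With a ring that small, a second reader's buffer simply doesn't fit until
the first one has been returned, so the driver either queues guest-side or
drops the request, which is the breaking-change concern above.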