Date: Tue, 6 Aug 2013 16:29:40 +0200
From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC] [PATCHv6 05/16] aio / timers: Split QEMUClock into QEMUClock and QEMUTimerList
To: Alex Bligh
Cc: Kevin Wolf, Anthony Liguori, qemu-devel@nongnu.org, liu ping fan, Stefan Hajnoczi, Paolo Bonzini, MORITA Kazutaka, rth@twiddle.net
Message-ID: <20130806142940.GB4373@stefanha-thinkpad.redhat.com>
In-Reply-To: <5AE028D8D3CBDB45EFBBD20A@nimrod.local>

On Tue, Aug 06, 2013 at 01:49:03PM +0100, Alex Bligh wrote:
> Stefan,
>
> > Although in the short term it's easier to keep the old API where
> > QEMUClock also acts as a timer list, I don't think it's a good idea
> > to do that.
> >
> > It makes timers harder to understand for everyone because we now
> > have timerlists plus an extra layer that makes QEMUClock look like
> > a timerlist.
>
> I think I disagree here.
>
> At the very least we should put the conversion to use the new API
> into a separate patch (possibly a separate patch set).  It's
> fantastically intrusive.

Yes, it should be a separate patch.

> However it touches so much (it touches just about every driver) that
> it's going to be really hard for people to maintain a driver that
> works with two versions of QEMU.  This is really useful (having just
> backported the modern Ceph driver to QEMU 1.0 and 1.3).  Moreover,
> it's going to break any developer with an existing non-upstreamed
> branch.

That's true, and there are other instances of this, like multi-queue
networking or qemu_get_clock_ns().  But upstream cannot be held back
because of downstreams, and especially not because of out-of-tree
branches.

> The change is pretty mechanical, so what I'd suggest is that we keep
> the old API around too (as deprecated) for a little while.  At some
> stage we can fix existing code, and after that point not accept
> patches that use the old API.
>
> Even if the period is just a month (i.e. the old API goes before
> 1.7), why break things unnecessarily?

Nothing upstream breaks.  Only out-of-tree code breaks, but that's
life.  What's important is that upstream will be clean and easy to
understand and debug.  Given how undocumented the QEMU codebase is,
leaving legacy API layers around just does more to confuse new
contributors.  That's why I'd really like to transition now instead
of leaving things in a more complex state than before.

> > What is the point of this change?  It's not clear how using
> > qemu_clocks[] is an improvement over rt_clock, vm_clock, and
> > host_clock.
>
> Because you can iterate through all the clocks with a for() loop, as
> is done in six places in qemu-timer.c by the end of the patch series
> (look for QEMU_CLOCK_MAX).  This also allows us to use a
> timerlistgroup (a set of timerlists, one for each clock) so each
> AioContext can have a clock of each type, as Paolo requested in
> response to v4.

Later in the patch series I realized how this gets used.  Thanks for
explaining.  We end up with AioContext->tlg and
default_timerlistgroup.

Regarding naming, I think default_timerlistgroup should be called
main_loop_timerlistgroup instead.  The meaning of "default" is not
obvious.
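To make sure we mean the same thing by the array indexing, here is a
minimal sketch of my mental model.  The type names, the TimerListGroup
typedef, and the deadline helpers below are my guesses, not
necessarily what your series does:

#include <stdint.h>

typedef enum {
    QEMU_CLOCK_REALTIME,
    QEMU_CLOCK_VIRTUAL,
    QEMU_CLOCK_HOST,
    QEMU_CLOCK_MAX          /* array bound, not a real clock */
} QEMUClockType;

typedef struct QEMUTimerList QEMUTimerList;

/* One timerlist per clock type; AioContext and the main loop would
 * each embed one of these as ->tlg.
 */
typedef QEMUTimerList *TimerListGroup[QEMU_CLOCK_MAX];

/* Per-list deadline, assumed to return -1 for "no timer pending". */
int64_t timerlist_deadline_ns(QEMUTimerList *tl);

/* The for() loop pattern: visit every clock's timerlist, here to
 * find the soonest deadline across all clocks.
 */
int64_t timerlistgroup_deadline_ns(TimerListGroup tlg)
{
    int64_t deadline = -1;
    QEMUClockType type;

    for (type = 0; type < QEMU_CLOCK_MAX; type++) {
        int64_t d = timerlist_deadline_ns(tlg[type]);

        if (d >= 0 && (deadline == -1 || d < deadline)) {
            deadline = d;
        }
    }
    return deadline;
}

If that roughly matches the series, then the indexing below makes
sense to me.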
Now let's think about how callers will create QEMUTimers:

1. AioContext

   timer_new(ctx->tlg[QEMU_CLOCK_RT], SCALE_NS, cb, opaque);

   Or with a wrapper:

   QEMUTimer *aio_new_timer(AioContext *ctx, QEMUClockType type,
                            int scale, QEMUTimerCB *cb, void *opaque)
   {
       return timer_new(ctx->tlg[type], scale, cb, opaque);
   }

   aio_new_timer(ctx, QEMU_CLOCK_RT, SCALE_NS, cb, opaque);

2. main-loop

   /* without legacy qemu_timer_new() */
   timer_new(main_loop_tlg[QEMU_CLOCK_RT], SCALE_NS, cb, opaque);

   Or with a wrapper:

   QEMUTimer *qemu_new_timer(QEMUClockType type, int scale,
                             QEMUTimerCB *cb, void *opaque)
   {
       return timer_new(main_loop_tlg[type], scale, cb, opaque);
   }

   qemu_new_timer(QEMU_CLOCK_RT, SCALE_NS, cb, opaque);

Is this what you have in mind too?

> > Or in other words: why is timerlist_new necessary?
>
> I think that question is entirely orthogonal.  This creates a new
> timerlist.  In v4 each AioContext had its own timerlist.  Now it has
> its own timerlistgroup, with one timerlist of each clock type.  The
> constructor timerlist_new is there to initialise the timerlist,
> which broadly is a malloc(), setting ->clock, and inserting it onto
> the list of timerlists associated with that clock.  How would we
> avoid this if we still want multiple timerlists per clock?

I think I'm okay with the array indexing.  Anyway, here were my
thoughts:

I guess I was thinking about keeping global clock sources for rt, vm,
and host.  Then you always use timerlist_new_from_clock() and you
don't need timerlist_new() at all.  But this doesn't allow for the
array indexing that you do in TimerListGroup later.  I didn't know
that at this point in the patch series.
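To check my understanding, timerlist_new then boils down to something
like this, per your description above.  This is only a sketch: the
field names, the clock->timerlists QLIST, and the qemu_clocks[] array
are my assumptions, not necessarily the code in the series (g_malloc0
from glib, QLIST_* from qemu/queue.h):

struct QEMUTimerList {
    QEMUClock *clock;                  /* the clock we time against */
    QLIST_ENTRY(QEMUTimerList) list;   /* entry in clock->timerlists */
    QEMUTimer *active_timers;          /* pending timers, soonest first */
};

QEMUTimerList *timerlist_new(QEMUClockType type)
{
    QEMUTimerList *tl = g_malloc0(sizeof(*tl));    /* the malloc() */

    tl->clock = qemu_clocks[type];                 /* set ->clock */
    /* register with the clock so it can walk all its timerlists */
    QLIST_INSERT_HEAD(&tl->clock->timerlists, tl, list);
    return tl;
}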
> >>  struct QEMUTimer {
> >>      int64_t expire_time;       /* in nanoseconds */
> >> -    QEMUClock *clock;
> >> +    QEMUTimerList *tl;
> >
> > 'timer_list' is easier to read than just 'tl'.
>
> It caused a pile of line wrap issues which made the patch harder to
> read, so I shortened it.  I can put it back if you like.

Are you sure it's the QEMUTimer->tl field that causes line wraps?  I
took a quick look and it seemed like only the QEMUTimerList *tl
function argument to the deadline functions could cause line wrap.

The argument variable is unrelated and can stay short since it has a
very narrow context: the reader can be expected to remember the tl
argument while reading the code for the function.

> >> +bool qemu_clock_use_for_deadline(QEMUClock *clock)
> >> +{
> >> +    return !(use_icount && (clock->type == QEMU_CLOCK_VIRTUAL));
> >> +}
> >
> > Please use doc comments (see include/qom/object.h for example doc
> > comment syntax).  No idea why this function is needed or what it
> > will be used for.
>
> I will comment it, but it mostly does what it says on the tin.  Per
> Paolo's comment, the vm_clock should not be used for calculation of
> deadlines to ppoll etc. if use_icount is true, because it's not
> actually in nanoseconds; rather, qemu_notify() or aio_notify() get
> called by the vcpu thread when the relevant instruction counter is
> exceeded.

I didn't know that, but the explanation makes sense.  Definitely
something that could be in a comment.  Perhaps it's best to introduce
this small helper function in the patch that actually calls it.

Stefan
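P.S.  Based on your explanation, the doc comment could read something
like the following.  The wording is just a sketch for you to adapt:

/**
 * qemu_clock_use_for_deadline:
 * @clock: the clock to query
 *
 * Determine whether a clock should be used for deadline calculations
 * (e.g. the timeout handed to ppoll).  When icount is enabled, the
 * virtual clock does not count in nanoseconds; instead the vcpu
 * thread calls qemu_notify()/aio_notify() itself when the relevant
 * instruction counter is exceeded, so the virtual clock must be
 * excluded.
 *
 * Returns: true if the clock's deadlines should be included.
 */
bool qemu_clock_use_for_deadline(QEMUClock *clock);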