ptimer use of bottom-half handlers

* ptimer use of bottom-half handlers
@ 2019-09-27 10:01 Peter Maydell
  2019-09-27 17:40 ` Richard Henderson
  2019-10-04 17:40 ` Peter Maydell
  0 siblings, 2 replies; 5+ messages in thread
From: Peter Maydell @ 2019-09-27 10:01 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Paolo Bonzini, Philippe Mathieu-Daudé

Investigating https://bugs.launchpad.net/qemu/+bug/1777777 I
found what seems to be a design flaw in the ptimer code.

The ptimer API is basically a down-counter (with a lot of bells
and whistles), with functions to do things like "read current count"
and "write current count". When the counter hits zero it typically
reloads, and the ptimer code notifies a callback function in the
device that's using ptimer. In the current implementation this
is done using a QEMU bottom-half handler, so ptimer_trigger()
calls replay_bh_schedule_event() on a QEMUBH that it was passed
by the device when it was initialized. The comment says this is
"to avoid reentrancy issues". That's a bit vague, but I assume the
idea is to avoid the code of the device's 'triggered' callback
being called from within eg ptimer_set_count(), because the
device might be in the middle of changing multiple parts of the
ptimer's state, its own state might not be consistent, etc.

Unfortunately, use of the bottom-half mechanism might have worked
back when the ptimer was first written, but these days we don't
run the iothread in lockstep with a single vcpu thread, so you
can get races, like:

 * the QEMU timer ptimer uses calls the ptimer_tick() function
 * ptimer code updates its own state for the counter rollover,
   and schedules the bottom-half handler to run
 * the vcpu thread executes guest code that does "read the counter
   value", which calls ptimer_get_count() and gets the new
   rolled-over value
 * the vcpu thread executes guest code that does "read an
   interrupt-pending register", which looks at a bit of state
   that's updated by the device's bottom-half handler, and
   incorrectly gets a value indicating "no rollover happened"
 * ...then the bottom-half handler runs and the device code
   updates its interrupt state, but too late.

I'm not sure what the best fix here is, but it's hard to see
how we can really continue to use bottom-half handlers here.

Possibilities I thought of:
 (1) make ptimer_trigger() just directly call the device's
     callback function. We'd need to audit all the devices
     to fix them up to handle the cases where the callback
     gets called while the device is in the middle of
     reconfiguring the timer.
 (2) call the device's callback function directly when the
     ptimer triggers from the QEMU timer expiry. But for
     the case of "a call to ptimer_set_count() etc caused
     the timer to trigger", don't call the callback, instead
     return a boolean from those functions which tells the
     caller that the timer triggered, and they need to deal
     with it (by calling their callback function when they've
     finished messing with the timer).

In either case we would need to gradually convert all (~30)
of the devices currently using ptimer over to use the new
mechanism, which in all cases would require some examination
and modification of the timer code to deal with the new
semantics. (I'm thinking of a patch series structure along
the lines of "patch 1 renames ptimer_init to ptimer_init_bh,
patch 2 introduces new ptimer_init function with new
semantics, patches 3..n change one device at a time,
patch n+1 deletes the now-unused ptimer_init_bh".)

I think overall I favour option 2, which is a bit more
syntactically invasive in terms of changing API signatures etc,
but semantically easier to reason about (because the
callback-function in the device is still not called when
the device might be partway through doing an update to
the ptimer state that changes multiple parameters of the
ptimer).

Is there another cleverer fix that I haven't thought of?

thanks
-- PMM

^ permalink raw reply	[flat|nested] 5+ messages in thread