From: Anthony Liguori <anthony@codemonkey.ws>
To: Avi Kivity <avi@redhat.com>
Cc: qemu-devel <qemu-devel@nongnu.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Gleb Natapov <gleb@redhat.com>, KVM list <kvm@vger.kernel.org>
Subject: Re: [Qemu-devel] [RFC] Next gen kvm api
Date: Mon, 06 Feb 2012 07:33:06 -0600
Message-ID: <4F2FD692.5060708@codemonkey.ws>
In-Reply-To: <4F2F9E89.7090607@redhat.com>
On 02/06/2012 03:34 AM, Avi Kivity wrote:
> On 02/05/2012 06:36 PM, Anthony Liguori wrote:
>> On 02/05/2012 03:51 AM, Gleb Natapov wrote:
>>> On Sun, Feb 05, 2012 at 11:44:43AM +0200, Avi Kivity wrote:
>>>> On 02/05/2012 11:37 AM, Gleb Natapov wrote:
>>>>> On Thu, Feb 02, 2012 at 06:09:54PM +0200, Avi Kivity wrote:
>>>>>> Device model
>>>>>> ------------
>>>>>> Currently kvm virtualizes or emulates a set of x86 cores, with or
>>>>>> without local APICs, a 24-input IOAPIC, a PIC, a PIT, and a number of
>>>>>> PCI devices assigned from the host. The API allows emulating the
>>>>>> local APICs in userspace.
>>>>>>
>>>>>> The new API will do away with the IOAPIC/PIC/PIT emulation and defer
>>>>>> them to userspace. Note: this may cause a regression for older
>>>>>> guests that don't support MSI or kvmclock. Device assignment will be
>>>>>> done using VFIO, that is, without direct kvm involvement.
>>>>>>
>>>>> So are we officially saying that KVM is only for modern guest
>>>>> virtualization?
>>>>
>>>> No, but older guests may have reduced performance in some workloads
>>>> (e.g. RHEL4 gettimeofday() intensive workloads).
>>>>
>>> Reduced performance is what I mean. Obviously old guests will
>>> continue working.
>>
>> An interesting solution to this problem would be an in-kernel device VM.
>
> It's interesting, yes, but has a very high barrier to implementation.
>
>>
>> Most of the time, the hot register is just one register within a more
>> complex device. The reads are often side-effect free and trivially
>> computed from some device state + host time.
>
> Look at arch/x86/kvm/i8254.c:pit_ioport_read() for a counterexample.
> There are also interactions with other devices (for example the
> apic/ioapic interaction via the apic bus).

Hrm, maybe I'm missing it, but the path that would be hot is:

    if (!status_latched && !count_latched) {
        value = kpit_elapsed();
        // manipulate count based on mode
        // mask value depending on read_state
    }

This path is side-effect free and applies relatively simple math to a time
counter.

The idea would be to allow the filter to decline an I/O request depending on
existing state. Anything that modifies state (like reading a latched counter)
would drop to userspace.
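
To make the filter idea concrete, here is a minimal sketch in plain C. It is
not the kvm code: struct pit_channel, pit_read_filter() and the
FILTER_* results are hypothetical names, only a simplified periodic (mode 2
style) countdown is modeled, and the state is assumed to live in memory that
userspace keeps up to date. The filter takes the state as const, so it can
only answer reads that need no state change; everything else is punted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PIT_FREQ_HZ  1193182ULL     /* nominal i8254 input clock */
#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical access modes; the real device has more read states. */
enum pit_access { PIT_ACCESS_LSB, PIT_ACCESS_MSB, PIT_ACCESS_WORD };

/* Hypothetical channel state, assumed to be shared with userspace. */
struct pit_channel {
    bool status_latched;
    bool count_latched;
    enum pit_access access;
    uint16_t initial_count;   /* 0 means 65536, as on real hardware */
    uint64_t load_time_ns;    /* host time when the count was loaded */
};

enum filter_result { FILTER_HANDLED, FILTER_TO_USERSPACE };

/* Convert elapsed nanoseconds to PIT ticks without overflowing 64 bits. */
uint64_t pit_ticks(uint64_t elapsed_ns)
{
    return (elapsed_ns / NSEC_PER_SEC) * PIT_FREQ_HZ +
           (elapsed_ns % NSEC_PER_SEC) * PIT_FREQ_HZ / NSEC_PER_SEC;
}

/*
 * Answer a counter read if that can be done without touching state.
 * The const pointer enforces "side-effect free" at compile time.
 */
enum filter_result pit_read_filter(const struct pit_channel *ch,
                                   uint64_t now_ns, uint8_t *value)
{
    assert(ch != NULL && value != NULL);

    /* Reading latched data clears the latch: a side effect, so punt. */
    if (ch->status_latched || ch->count_latched)
        return FILTER_TO_USERSPACE;

    /* A 16-bit access toggles the LSB/MSB flip-flop: also a side effect. */
    if (ch->access == PIT_ACCESS_WORD)
        return FILTER_TO_USERSPACE;

    /* Simplified periodic countdown from the programmed value. */
    uint32_t period = ch->initial_count ? ch->initial_count : 65536;
    uint64_t ticks = pit_ticks(now_ns - ch->load_time_ns);
    uint16_t count = (uint16_t)(period - ticks % period);

    /* Mask the value according to the access mode. */
    *value = (ch->access == PIT_ACCESS_LSB) ? (uint8_t)(count & 0xff)
                                            : (uint8_t)(count >> 8);
    return FILTER_HANDLED;
}
```

Note that the two-byte access mode has to be punted in this sketch, since
each read flips which byte comes next. How often guests use that mode on the
hot path would decide whether a purely side-effect-free filter is enough.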
>
>>
>> If userspace had a way to upload bytecode to the kernel that was
>> executed for a PIO operation, it could either pass the operation to
>> userspace or handle it within the kernel when possible without taking
>> a heavy weight exit.
>>
>> If the bytecode can access variables in a shared memory area, it could
>> be pretty efficient to work with.
>>
>> This means that the kernel never has to deal with specific in-kernel
>> devices but that userspace can accelerate as many of its devices as
>> it sees fit.
>
> I would really love to have this, but the problem is that we'd need a
> general purpose bytecode VM with binding to some kernel APIs. The
> bytecode VM, if made general enough to host more complicated devices,
> would likely be much larger than the actual code we have in the kernel now.
I think the question is whether BPF is good enough as it stands. I'm not really
sure. I agree that inventing a new bytecode VM is probably not worth it.
>>
>> This could replace ioeventfd as a mechanism (which would allow
>> clearing the notify flag before writing to an eventfd).
>>
>> We could potentially just use BPF for this.
>
> BPF generally just computes a predicate.

Can it modify a packet in place? I think a predicate is about right (can this
I/O operation be handled in the kernel or not), but the question is whether
there's a way to produce an output as a side effect.

> We could overload the scratch
> area for storing internal state and for read results, though (and have
> an "mmio scratch register" for reading the time).
Right.
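
As a rough illustration of "predicate plus scratch area", here is a toy
interpreter in plain C. It is deliberately not BPF and uses no kernel API:
the opcodes, struct io_insn, struct io_ctx and io_filter_run() are all made
up for this sketch. The 16-word scratch area mirrors the size of classic
BPF's scratch memory, and the time field stands in for the proposed "mmio
scratch register". The return value is the predicate (handled in the kernel
or not), and the read result is produced as a side effect in ctx->result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCRATCH_WORDS 16

/* Hypothetical opcodes, loosely modeled on an accumulator machine. */
enum io_op {
    OP_LD_PORT,      /* acc = port being accessed */
    OP_LD_TIME,      /* acc = time register */
    OP_SUB_SCRATCH,  /* acc -= scratch[k] */
    OP_JEQ_IMM,      /* if acc != k, skip the next jf instructions */
    OP_RET_HANDLED,  /* result = acc, handled in the kernel */
    OP_RET_PUNT      /* pass the access to userspace */
};

struct io_insn {
    uint8_t op;
    uint8_t jf;
    uint32_t k;
};

struct io_ctx {
    uint16_t port;
    uint32_t time;                    /* stand-in for a time register */
    uint32_t scratch[SCRATCH_WORDS];  /* state shared with userspace */
    uint32_t result;                  /* read value, valid if handled */
};

/* Returns true if the access was handled; false means punt to userspace. */
bool io_filter_run(const struct io_insn *prog, size_t len, struct io_ctx *ctx)
{
    uint32_t acc = 0;

    assert(prog != NULL && ctx != NULL);

    /* Jumps only go forward, so the loop always terminates. */
    for (size_t pc = 0; pc < len; pc++) {
        const struct io_insn *in = &prog[pc];

        switch (in->op) {
        case OP_LD_PORT:
            acc = ctx->port;
            break;
        case OP_LD_TIME:
            acc = ctx->time;
            break;
        case OP_SUB_SCRATCH:
            if (in->k >= SCRATCH_WORDS)
                return false;         /* bad index: punt */
            acc -= ctx->scratch[in->k];
            break;
        case OP_JEQ_IMM:
            if (acc != in->k)
                pc += in->jf;
            break;
        case OP_RET_HANDLED:
            ctx->result = acc;
            return true;
        case OP_RET_PUNT:
        default:
            return false;
        }
    }
    return false;                     /* fell off the end: punt */
}

/* Example: reads of port 0x40 return time - scratch[0]; all else punts. */
const struct io_insn example_prog[] = {
    { OP_LD_PORT,     0, 0    },
    { OP_JEQ_IMM,     3, 0x40 },
    { OP_LD_TIME,     0, 0    },
    { OP_SUB_SCRATCH, 0, 0    },
    { OP_RET_HANDLED, 0, 0    },
    { OP_RET_PUNT,    0, 0    },
};
const size_t example_prog_len = sizeof(example_prog) / sizeof(example_prog[0]);
```

With port 0x40, time 1000 and scratch[0] set to 400, the example program
reports the access as handled with a result of 600; any other port falls
through to the punt. The program never writes the scratch area, so userspace
stays the only writer of device state.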
Regards,
Anthony Liguori