* [Qemu-devel] [qemu-web PATCH] Add "Understanding QEMU devices" blog post
@ 2018-02-09 19:54 Eric Blake
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Blake @ 2018-02-09 19:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, pbonzini

Last July, Eric Blake wrote a nice summary for newcomers about what
QEMU has to do to emulate devices for the guests. So far, we missed
integrating this somewhere into the QEMU web site or wiki, so let's
publish this now as a nice blog post for the users.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1516958370-3955-1-git-send-email-thuth@redhat.com>

Incorporate editing suggestions made by others on the list,
particularly from Paolo Bonzini.

Signed-off-by: Eric Blake <eblake@redhat.com>
---

I finally took the time to make the edits suggested when Thomas
first proposed adding my email to the blog.

Thanks again for helping this text improve and reach a broader
audience!

 _posts/2018-02-09-understanding-qemu-devices.md | 179 ++++++++++++++++++++++++
 1 file changed, 179 insertions(+)
 create mode 100644 _posts/2018-02-09-understanding-qemu-devices.md

diff --git a/_posts/2018-02-09-understanding-qemu-devices.md b/_posts/2018-02-09-understanding-qemu-devices.md
new file mode 100644
index 0000000..25130b7
--- /dev/null
+++ b/_posts/2018-02-09-understanding-qemu-devices.md
@@ -0,0 +1,179 @@
+---
+layout: post
+title:  "Understanding QEMU devices"
+date:   2018-02-09 13:30:00 -0600
+author: Eric Blake
+categories: blog
+---
+Here are some notes that may help newcomers understand what is
+actually happening with QEMU devices:
+
+With QEMU, one thing to remember is that we are trying to emulate what
+an Operating System (OS) would see on bare-metal hardware.  Most
+bare-metal machines are basically giant memory maps, where software
+poking at a particular address will have a particular side effect (the
+most common side effect is, of course, accessing memory; but other
+common regions in memory include the register banks for controlling
+particular pieces of hardware, like the hard drive or a network card,
+or even the CPU itself).  The end-goal of emulation is to allow a
+user-space program, using only normal memory accesses, to manage all
+of the side-effects that a guest OS is expecting.
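+
+As a purely illustrative sketch (the base address and register offsets
+below are invented, not taken from any real device), this is roughly
+what "poking at a particular address" looks like from a guest driver's
+point of view:
+
+```c
+#include <stdint.h>
+
+/* Hypothetical MMIO register block for an imaginary device. */
+#define DEV_MMIO_BASE   0xfe000000UL
+#define REG_STATUS      0x00    /* reading it has a side effect */
+#define REG_COMMAND     0x04    /* writing it kicks the device */
+
+static inline void reg_write32(unsigned long off, uint32_t val)
+{
+    /* volatile: the access itself is the point, not the stored value */
+    *(volatile uint32_t *)(DEV_MMIO_BASE + off) = val;
+}
+
+static inline uint32_t reg_read32(unsigned long off)
+{
+    return *(volatile uint32_t *)(DEV_MMIO_BASE + off);
+}
+```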
+
+As an implementation detail, some hardware, like x86, actually has two
+memory spaces, where I/O space uses different assembly codes than
+normal; QEMU has to emulate these alternative accesses.  Similarly,
+many modern CPUs provide themselves a bank of CPU-local registers
+within the memory map, such as for an interrupt controller.
+
+With certain hardware, we have virtualization hooks where the CPU
+itself makes it easy to trap on just the problematic assembly
+instructions (those that access I/O space or CPU internal registers,
+and therefore require side effects different from a normal memory
+access).  The guest executes the same assembly sequence as on bare
+metal, but each such instruction causes a trap that lets user-space
+QEMU react to it with ordinary user-space memory accesses before
+returning control to the guest.  This is supported in QEMU through
+"accelerators".
+
+Virtualizing accelerators, such as KVM, can let a guest run nearly
+as fast as bare metal, where the slowdowns are caused by each trap
+from guest back to QEMU (a vmexit) to handle a difficult assembly
+instruction or memory address.  QEMU also supports other virtualizing
+accelerators (such as
+[HAXM](https://www.qemu.org/2017/11/22/haxm-usage-windows/) or macOS's
+Hypervisor.framework).
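+
+To make the trap-and-react loop concrete, here is a heavily condensed
+sketch of the user-space side of the KVM API; QEMU's real code is far
+more involved, and guest memory setup, register initialization and all
+error handling are omitted:
+
+```c
+#include <fcntl.h>
+#include <linux/kvm.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+void run_vcpu(void)
+{
+    int kvm  = open("/dev/kvm", O_RDWR);
+    int vm   = ioctl(kvm, KVM_CREATE_VM, 0);
+    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
+    int size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
+    struct kvm_run *run = mmap(NULL, size, PROT_READ | PROT_WRITE,
+                               MAP_SHARED, vcpu, 0);
+
+    for (;;) {
+        ioctl(vcpu, KVM_RUN, 0);        /* guest runs until it traps */
+        switch (run->exit_reason) {
+        case KVM_EXIT_MMIO:             /* guest touched emulated MMIO */
+            /* hand run->mmio.phys_addr / run->mmio.data to a device model */
+            break;
+        case KVM_EXIT_IO:               /* guest used an x86 I/O port */
+            /* hand run->io.port to a device model */
+            break;
+        default:
+            return;
+        }
+    }
+}
+```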
+
+QEMU also has a TCG accelerator, which takes the guest assembly
+instructions and compiles them on the fly into comparable host
+instructions or calls to host helper routines; while not as fast as
+hardware acceleration, it allows cross-hardware emulation, such as
+running ARM code on x86.
+
+The next thing to realize is what is happening when an OS is accessing
+various hardware resources.  For example, most operating systems ship
+with a driver that knows how to manage an IDE disk - the driver is
+merely software that is programmed to make specific I/O requests to a
+specific subset of the memory map (wherever the IDE bus lives, which
+is specific to the hardware board).  When the IDE controller
+hardware receives those I/O requests it then performs the appropriate
+actions (via DMA transfers or other hardware action) to copy data from
+memory to persistent storage (writing to disk) or from persistent
+storage to memory (reading from the disk).
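+
+For a flavor of what such a driver does, here is a condensed sketch of
+a 28-bit LBA PIO sector read on the legacy x86 primary IDE channel
+(ports 0x1f0-0x1f7).  It is written as user-space-style C using the
+outb()/inb()/insw() helpers from <sys/io.h> for readability; a real
+driver lives in the kernel, checks status far more carefully, handles
+errors, and normally uses DMA instead:
+
+```c
+#include <stdint.h>
+#include <sys/io.h>
+
+static void ide_read_sector(uint32_t lba, uint16_t *buf)
+{
+    outb(1, 0x1f2);                           /* sector count */
+    outb(lba & 0xff, 0x1f3);                  /* LBA bits 0-7 */
+    outb((lba >> 8) & 0xff, 0x1f4);           /* LBA bits 8-15 */
+    outb((lba >> 16) & 0xff, 0x1f5);          /* LBA bits 16-23 */
+    outb(0xe0 | ((lba >> 24) & 0x0f), 0x1f6); /* drive 0, LBA mode */
+    outb(0x20, 0x1f7);                        /* command: READ SECTORS */
+
+    while (!(inb(0x1f7) & 0x08))              /* wait for DRQ */
+        ;
+    insw(0x1f0, buf, 256);                    /* one 512-byte sector */
+}
+```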
+
+When you first buy bare-metal hardware, your disk is uninitialized; you
+install the OS that uses the driver to make enough bare-metal accesses
+to the IDE hardware portion of the memory map to then turn the disk into
+a set of partitions and filesystems on top of those partitions.
+
+So, how does QEMU emulate this? In the big memory map it provides to
+the guest, it emulates an IDE disk at the same address as bare-metal
+would.  When the guest OS driver issues particular memory writes to
+the IDE control registers in order to copy data from memory to
+persistent storage, the QEMU accelerator traps accesses to that memory
+region, and passes the request on to the QEMU IDE controller device
+model.  The device model then parses the I/O requests, and emulates
+them by issuing host system calls.  The result is that guest memory is
+copied into host storage.
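+
+The following toy device model is not QEMU's actual IDE code, just the
+general shape of what happens after the trap: a callback receives the
+register access, updates device state, and turns a requested transfer
+into ordinary host system calls (the register offsets and structure
+names here are invented):
+
+```c
+#include <stdint.h>
+#include <unistd.h>
+
+struct toy_disk {
+    int      backing_fd;   /* host file backing the guest disk image */
+    uint64_t cur_sector;   /* latched by an earlier register write */
+    void    *dma_buf;      /* guest data already copied here */
+};
+
+static void toy_disk_reg_write(struct toy_disk *d, uint64_t addr,
+                               uint64_t val)
+{
+    switch (addr) {
+    case 0x0:              /* invented "sector number" register */
+        d->cur_sector = val;
+        break;
+    case 0x8:              /* invented "start write" doorbell */
+        /* the emulated side effect is just a host system call */
+        pwrite(d->backing_fd, d->dma_buf, 512, d->cur_sector * 512);
+        break;
+    }
+}
+```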
+
+On the host side, the easiest way to emulate persistent storage is to
+treat a file in the host filesystem as raw data (a 1:1 mapping of
+offsets in the host file to disk offsets being accessed by the guest
+driver), but QEMU actually has the ability to glue together a lot of
+different host formats (raw,
+[qcow2](https://git.qemu.org/?p=qemu.git;a=blob;f=docs/interop/qcow2.txt),
+qed,
+[vhdx](https://www.microsoft.com/en-us/download/details.aspx?id=34750),
+...) and protocols (file system, block device,
+[NBD](https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md),
+[Ceph](https://ceph.com/), [gluster](https://www.gluster.org/), ...)
+where any combination of host format and protocol can serve as the
+backend that is then tied to the QEMU emulation providing the guest
+device.
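+
+QEMU's real block layer (BlockDriver and friends) is far richer, but
+the layering idea can be sketched as two small tables of function
+pointers, where any "format" can sit on top of any "protocol" (all
+names below are invented for illustration):
+
+```c
+#include <sys/types.h>
+
+struct proto_ops {             /* where the bytes live: file, NBD, ... */
+    ssize_t (*pread)(void *opaque, void *buf, size_t len, off_t off);
+    ssize_t (*pwrite)(void *opaque, const void *buf, size_t len, off_t off);
+};
+
+struct format_ops {            /* how they are laid out: raw, qcow2, ... */
+    off_t (*map)(void *opaque, off_t guest_off);  /* guest -> host offset */
+};
+
+struct blockdev {
+    const struct format_ops *fmt;
+    const struct proto_ops  *proto;
+    void *fmt_opaque, *proto_opaque;
+};
+
+static ssize_t blockdev_read(struct blockdev *bd, void *buf,
+                             size_t len, off_t guest_off)
+{
+    off_t host_off = bd->fmt->map(bd->fmt_opaque, guest_off);
+    return bd->proto->pread(bd->proto_opaque, buf, len, host_off);
+}
+```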
+
+Thus, when you tell QEMU to use a host qcow2 file, the guest does not
+have to know qcow2, but merely has its normal driver make the same
+register reads and writes as it would on bare metal, which cause vmexits
+into QEMU code, then QEMU maps those accesses into reads and writes in
+the appropriate offsets of the qcow2 file.  When you first install the
+guest, all the guest sees is a blank uninitialized linear disk
+(regardless of whether that disk is linear in the host, as in raw
+format, or optimized for random access, as in the qcow2 format); it is
+up to the guest OS to decide how to partition its view of the hardware
+and install filesystems on top of that, and QEMU does not care what
+filesystems the guest is using, only what pattern of raw disk I/O
+register control sequences are issued.
+
+The next thing to realize is that emulating IDE is not always the most
+efficient approach.  Every time the guest writes to the control registers, it
+has to go through special handling, and vmexits slow down
+emulation. Of course, different hardware models have different
+performance characteristics when virtualized.  In general, however,
+what works best for real hardware does not necessarily work best for
+virtualization, and until recently, hardware was not designed to
+operate fast when emulated by software such as QEMU.  Therefore, QEMU
+includes paravirtualized devices that are designed specifically for
+this purpose.
+
+The meaning of "paravirtualization" here is slightly different from
+the original one of "virtualization through cooperation between the
+guest and host".  The QEMU developers have produced a specification
+for a set of hardware registers and the behavior for those registers
+which are designed to result in the minimum number of vmexits possible
+while still accomplishing what a hard disk must do, namely,
+transferring data between normal guest memory and persistent storage.
+This specification is called virtio; using it requires installation of
+a virtio driver in the guest.  While no physical device exists that
+follows the same register layout as virtio, the concept is the same: a
+virtio disk behaves like a memory-mapped register bank, where the
+guest OS driver then knows what sequence of register commands to write
+into that bank to cause data to be copied in and out of other guest
+memory.  Much of the speedup in virtio comes from its design - the guest
+sets aside a portion of regular memory for the bulk of its command
+queue, and only has to kick a single register to then tell QEMU to
+read the command queue (fewer mapped register accesses mean fewer
+vmexits), coupled with handshaking guarantees that the guest driver
+won't be changing the normal memory while QEMU is acting on it.
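+
+The actual ring layout is defined by the virtio specification; the
+following sketch (with invented structure names) only shows why the
+design saves vmexits - many requests are queued with plain memory
+writes, and only the final doorbell write traps:
+
+```c
+#include <stdint.h>
+
+struct fake_desc {
+    uint64_t guest_addr;     /* buffer in ordinary guest RAM */
+    uint32_t len;
+    uint32_t flags;
+};
+
+struct fake_queue {
+    struct fake_desc ring[256];  /* lives in plain guest memory */
+    uint16_t produced;           /* written by the guest driver */
+};
+
+/* Guest side: queue many requests with ordinary memory writes (no
+ * traps), then notify the device with a single register write. */
+static void submit(struct fake_queue *q, volatile uint32_t *doorbell,
+                   const struct fake_desc *reqs, int n)
+{
+    for (int i = 0; i < n; i++)
+        q->ring[(q->produced + i) % 256] = reqs[i];
+    q->produced += n;            /* plain RAM access: no vmexit */
+    *doorbell = 0;               /* the only trapping access: one vmexit */
+}
+```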
+
+As an aside, just like recent hardware is fairly efficient to emulate,
+virtio is evolving to be more efficient to implement in hardware, of
+course without sacrificing performance for emulation or
+virtualization.  Therefore, in the future, you could stumble upon
+physical virtio devices as well.
+
+In a similar vein, many operating systems have support for a number of
+network cards, a common example being the e1000 card on the PCI bus.
+On bare metal, an OS will probe PCI space, see that a bank of
+registers with the signature for e1000 is populated, and load the
+driver that then knows what register sequences to write in order to
+let the hardware card transfer network traffic in and out of the
+guest.  So QEMU has, as one of its many network card emulations, an
+e1000 device, which is mapped to the same guest memory region as a
+real one would live on bare metal.
+
+And once again, the e1000 register layout tends to require a lot of
+register writes (and thus vmexits) for the amount of work the hardware
+performs, so the QEMU developers have added the virtio-net card (a PCI
+hardware specification, although no bare-metal hardware exists yet
+that actually implements it), such that installing a virtio-net driver
+in the guest OS can then minimize the number of vmexits while still
+getting the same side-effects of sending network traffic.  If you tell
+QEMU to start a guest with a virtio-net card, then the guest OS will
+probe PCI space and see a bank of registers with the virtio-net
+signature, and load the appropriate driver like it would for any other
+PCI hardware.
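+
+In spirit, the guest's probe logic is just an ID match against PCI
+config space.  The vendor/device IDs below are the commonly used ones
+for the e1000 and legacy virtio-net devices, shown for illustration;
+the authoritative values come from pci.ids and the virtio
+specification:
+
+```c
+#include <stdint.h>
+
+struct pci_id { uint16_t vendor, device; };
+
+/* Returns the driver name to load for one PCI function, or NULL.
+ * pci_id would be filled in from PCI config space by real probe code. */
+static const char *pick_driver(struct pci_id id)
+{
+    if (id.vendor == 0x8086 && id.device == 0x100e)
+        return "e1000";          /* Intel 82540EM, as emulated by QEMU */
+    if (id.vendor == 0x1af4 && id.device == 0x1000)
+        return "virtio-net";     /* Red Hat vendor ID, legacy virtio-net */
+    return NULL;
+}
+```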
+
+In summary, even though QEMU was first written as a way of emulating
+hardware memory maps in order to virtualize a guest OS, it turns out
+that the fastest virtualization also depends on virtual hardware: a
+memory map of registers with particular documented side effects that has
+no bare-metal counterpart.  And at the end of the day, all
+virtualization really means is running a particular set of assembly
+instructions (the guest OS) to manipulate locations within a giant
+memory map for causing a particular set of side effects, where QEMU is
+just a user-space application providing a memory map and mimicking the
+same side effects you would get when executing those guest instructions
+on the appropriate bare metal hardware.
+
+(This post is a slight update on an
+[email](https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg05939.html)
+originally posted to the qemu-devel list back in July 2017).
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [qemu-web PATCH] Add "Understanding QEMU devices" blog post
  2018-01-29 11:27     ` Thomas Huth
@ 2018-01-30 23:02       ` Paolo Bonzini
  0 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2018-01-30 23:02 UTC (permalink / raw)
  To: Thomas Huth, Eric Blake, qemu-devel

On 29/01/2018 12:27, Thomas Huth wrote:
> On 26.01.2018 15:46, Eric Blake wrote:
>> On 01/26/2018 06:40 AM, Paolo Bonzini wrote:
>>> On 26/01/2018 10:19, Thomas Huth wrote:
>>>> Last July, Eric Blake wrote a nice summary for newcomers about what
>>>> QEMU has to do to emulate devices for the guests. So far, we missed
>>>> to integrate this somewhere into the QEM web site or wiki, so let's
>>>> publish this now as a nice blog post for the users.
>>>
>>> It's very nice!  Some proofreading and corrections follow.
>>
>> Thanks for digging up my original email, and enhancing it (I guess the
>> fact that I don't blog very often, and stick to email, means that I rely
>> on others helping to polish my gems for the masses).
> 
> Sure ... Would you like to give it a try this time and continue with the
> patch, or shall I continue and send a v2?
> 
>>>> +++ b/_posts/2018-01-26-understanding-qemu-devices.md
>>>> @@ -0,0 +1,139 @@
>>>> +---
>>>> +layout: post
>>>> +title:  "Understanding QEMU devices"
>>>> +date:   2018-01-26 10:00:00 +0100
>>
>> That's when you're posting it online, but should it also mention when I
>> first started these thoughts in email form?
> 
> At least in the header we should have the current date, or the blog post
> will not show up at the top of the blog. Not sure whether there is an
> alternate field for the original date ... maybe we could rather add an
> "originally written in 2017 ..." sentence at the bottom of the post instead?

I don't think it's relevant, as long as it's brought up to date when we
post it to the web.

Paolo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [qemu-web PATCH] Add "Understanding QEMU devices" blog post
  2018-01-26 14:46   ` Eric Blake
  2018-01-29 11:11     ` Paolo Bonzini
@ 2018-01-29 11:27     ` Thomas Huth
  2018-01-30 23:02       ` Paolo Bonzini
  1 sibling, 1 reply; 8+ messages in thread
From: Thomas Huth @ 2018-01-29 11:27 UTC (permalink / raw)
  To: Eric Blake, Paolo Bonzini, qemu-devel


On 26.01.2018 15:46, Eric Blake wrote:
> On 01/26/2018 06:40 AM, Paolo Bonzini wrote:
>> On 26/01/2018 10:19, Thomas Huth wrote:
>>> Last July, Eric Blake wrote a nice summary for newcomers about what
>>> QEMU has to do to emulate devices for the guests. So far, we missed
>>> to integrate this somewhere into the QEM web site or wiki, so let's
>>> publish this now as a nice blog post for the users.
>>
>> It's very nice!  Some proofreading and corrections follow.
> 
> Thanks for digging up my original email, and enhancing it (I guess the
> fact that I don't blog very often, and stick to email, means that I rely
> on others helping to polish my gems for the masses).

Sure ... Would you like to give it a try this time and continue with the
patch, or shall I continue and send a v2?

>>> +++ b/_posts/2018-01-26-understanding-qemu-devices.md
>>> @@ -0,0 +1,139 @@
>>> +---
>>> +layout: post
>>> +title:  "Understanding QEMU devices"
>>> +date:   2018-01-26 10:00:00 +0100
> 
> That's when you're posting it online, but should it also mention when I
> first started these thoughts in email form?

At least in the header we should have the current date, or the blog post
will not show up at the top of the blog. Not sure whether there is an
alternate field for the original date ... maybe we could rather add an
"originally written in 2017 ..." sentence at the bottom of the post instead?

 Thomas



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [qemu-web PATCH] Add "Understanding QEMU devices" blog post
  2018-01-26 14:46   ` Eric Blake
@ 2018-01-29 11:11     ` Paolo Bonzini
  2018-01-29 11:27     ` Thomas Huth
  1 sibling, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2018-01-29 11:11 UTC (permalink / raw)
  To: Eric Blake, Thomas Huth, qemu-devel

On 26/01/2018 15:46, Eric Blake wrote:
> On 01/26/2018 06:40 AM, Paolo Bonzini wrote:
>> On 26/01/2018 10:19, Thomas Huth wrote:
>>> Last July, Eric Blake wrote a nice summary for newcomers about what
>>> QEMU has to do to emulate devices for the guests. So far, we missed
>>> to integrate this somewhere into the QEM web site or wiki, so let's
>>> publish this now as a nice blog post for the users.
>>
>> It's very nice!  Some proofreading and corrections follow.
> 
> Thanks for digging up my original email, and enhancing it (I guess the
> fact that I don't blog very often, and stick to email, means that I rely
> on others helping to polish my gems for the masses).
> 
>>> +++ b/_posts/2018-01-26-understanding-qemu-devices.md
>>> @@ -0,0 +1,139 @@
>>> +---
>>> +layout: post
>>> +title:  "Understanding QEMU devices"
>>> +date:   2018-01-26 10:00:00 +0100
> 
> That's when you're posting it online, but should it also mention when I
> first started these thoughts in email form?
> 
>>> +author: Eric Blake
>>> +categories: blog
>>> +---
>>> +Here are some notes that may help newcomers understand what is actually
>>> +happening with QEMU devices:
>>> +
>>> +With QEMU, one thing to remember is that we are trying to emulate what
>>> +an OS would see on bare-metal hardware.  All bare-metal machines are
>>
>> s/All/Most/ (s390 anyone? :))
> 
> Also, s/OS/Operating System (OS)/ to make the acronym easier to follow
> in the rest of the document.
> 
>>
>>> +basically giant memory maps, where software poking at a particular
>>> +address will have a particular side effect (the most common side effect
>>> +is, of course, accessing memory; but other common regions in memory
>>> +include the register banks for controlling particular pieces of
>>> +hardware, like the hard drive or a network card, or even the CPU
>>> +itself).  The end-goal of emulation is to allow a user-space program,
>>> +using only normal memory accesses, to manage all of the side-effects
>>> +that a guest OS is expecting.
>>> +
>>> +As an implementation detail, some hardware, like x86, actually has two
>>> +memory spaces, where I/O space uses different assembly codes than
>>> +normal; QEMU has to emulate these alternative accesses.  Similarly, many
>>> +modern hardware is so complex that the CPU itself provides both
>>> +specialized assembly instructions and a bank of registers within the
>>> +memory map (a classic example being the management of the MMU, or
>>> +separation between Ring 0 kernel code and Ring 3 userspace code - if
>>> +that's not crazy enough, there's nested virtualization).
>>
>> I'd say the interrupt controllers are a better example so:
>>
>> Similarly, many modern CPUs provide themselves a bank of CPU-local
>> registers within the memory map, such as for an interrupt controller.
> 
> Is it still worth a mention of nested virtualization?

No, nested virtualization is just two layers of doing the same thing. :)

Paolo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [qemu-web PATCH] Add "Understanding QEMU devices" blog post
  2018-01-26 12:40 ` Paolo Bonzini
@ 2018-01-26 14:46   ` Eric Blake
  2018-01-29 11:11     ` Paolo Bonzini
  2018-01-29 11:27     ` Thomas Huth
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Blake @ 2018-01-26 14:46 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Huth, qemu-devel


On 01/26/2018 06:40 AM, Paolo Bonzini wrote:
> On 26/01/2018 10:19, Thomas Huth wrote:
>> Last July, Eric Blake wrote a nice summary for newcomers about what
>> QEMU has to do to emulate devices for the guests. So far, we missed
>> to integrate this somewhere into the QEM web site or wiki, so let's
>> publish this now as a nice blog post for the users.
> 
> It's very nice!  Some proofreading and corrections follow.

Thanks for digging up my original email, and enhancing it (I guess the
fact that I don't blog very often, and stick to email, means that I rely
on others helping to polish my gems for the masses).

>> +++ b/_posts/2018-01-26-understanding-qemu-devices.md
>> @@ -0,0 +1,139 @@
>> +---
>> +layout: post
>> +title:  "Understanding QEMU devices"
>> +date:   2018-01-26 10:00:00 +0100

That's when you're posting it online, but should it also mention when I
first started these thoughts in email form?

>> +author: Eric Blake
>> +categories: blog
>> +---
>> +Here are some notes that may help newcomers understand what is actually
>> +happening with QEMU devices:
>> +
>> +With QEMU, one thing to remember is that we are trying to emulate what
>> +an OS would see on bare-metal hardware.  All bare-metal machines are
> 
> s/All/Most/ (s390 anyone? :))

Also, s/OS/Operating System (OS)/ to make the acronym easier to follow
in the rest of the document.

> 
>> +basically giant memory maps, where software poking at a particular
>> +address will have a particular side effect (the most common side effect
>> +is, of course, accessing memory; but other common regions in memory
>> +include the register banks for controlling particular pieces of
>> +hardware, like the hard drive or a network card, or even the CPU
>> +itself).  The end-goal of emulation is to allow a user-space program,
>> +using only normal memory accesses, to manage all of the side-effects
>> +that a guest OS is expecting.
>> +
>> +As an implementation detail, some hardware, like x86, actually has two
>> +memory spaces, where I/O space uses different assembly codes than
>> +normal; QEMU has to emulate these alternative accesses.  Similarly, many
>> +modern hardware is so complex that the CPU itself provides both
>> +specialized assembly instructions and a bank of registers within the
>> +memory map (a classic example being the management of the MMU, or
>> +separation between Ring 0 kernel code and Ring 3 userspace code - if
>> +that's not crazy enough, there's nested virtualization).
> 
> I'd say the interrupt controllers are a better example so:
> 
> Similarly, many modern CPUs provide themselves a bank of CPU-local
> registers within the memory map, such as for an interrupt controller.

Is it still worth a mention of nested virtualization?

> 
> And then a paragraph break.
> 
>> +With certain
>> +hardware, we have virtualization hooks where the CPU itself makes it
>> +easy to trap on just the problematic assembly instructions (those that
>> +access I/O space or CPU internal registers, and therefore require side
>> +effects different than a normal memory access), so that the guest just
>> +executes the same assembly sequence as on bare metal, but that execution
>> +then causes a trap to let user-space QEMU then react to the instructions
>> +using just its normal user-space memory accesses before returning
>> +control to the guest.  This is the kvm accelerator, and can let a guest
> 
> This is supported in QEMU through "accelerators" such as KVM.

Yeah, when I first wrote the email, we didn't have as many accelerators
in qemu.git :)

> 
>> +run nearly as fast as bare metal, where the slowdowns are caused by each
>> +trap from guest back to QEMU (a vmexit) to handle a difficult assembly
>> +instruction or memory address.  QEMU also supports a TCG accelerator,
> 
> QEMU also supports other virtualizing accelerators (such as
> [HAXM](https://www.qemu.org/2017/11/22/haxm-usage-windows/) or macOS's
> Hypervisor.framework) and also TCG,
> 
>> +which takes the guest assembly instructions and compiles it on the fly
>> +into comparable host instructions or calls to host helper routines (not
>> +as fast, but results in QEMU being able to do cross-hardware emulation).
> 
> While not as fast, TCG is able to do cross-hardware emulation, such as
> running ARM code on x86. (Removing the parentheses)
> 
>> +The next thing to realize is what is happening when an OS is accessing
>> +various hardware resources.  For example, most OS ship with a driver
> 
> most operating systems
> 
>> +that knows how to manage an IDE disk - the driver is merely software
>> +that is programmed to make specific I/O requests to a specific subset of
>> +the memory map (wherever the IDE bus lives, as hard-coded by the
>> +hardware board designers),
> 
> (wherever the IDE bus lives, which is specific the hardware board).

specific to the

> 
>  in order to make the disk drive hardware then
>> +obey commands to copy data from memory to persistent storage (writing to
>> +disk) or from persistent storage to memory (reading from the disk).
> 
> When the IDE controller hardware receives those I/O requests it
> communicates with the disk drive hardware, ultimately resulting in data
> being copied from memory...
> 
>> +When you first buy bare-metal hardware, your disk is uninitialized; you
>> +install the OS that uses the driver to make enough bare-metal accesses
>> +to the IDE hardware portion of the memory map to then turn the disk into
>> +a set of partitions and filesystems on top of those partitions.
>> +
>> +So, how does QEMU emulate this? In the big memory map it provides to the
>> +guest, it emulates an IDE disk at the same address as bare-metal would.
>> +When the guest OS driver issues particular memory writes to the IDE
>> +control registers in order to copy data from memory to persistent
>> +storage, QEMU traps on those writes (whether via kvm hypervisor assist,
>> +or by noticing during TCG translation that the addresses being accessed
>> +are special),
> 
> the accelerator knows that these writes must trap (remove everything in
> parentheses) and passes them to the QEMU IDE controller _device model_.
> 
>  and emulates the same side effects by issuing host
>> +commands to copy the specified guest memory into host storage.
> 
> The device model parses the I/O requests, then emulates them by issuing
> host system calls.  The result is that guest memory is copied into host
> storage.

Works for me.  Thanks for helping clarify my concepts.

> 
> (New paragraph).
> 
>>  On the
>> +host side, the easiest way to emulate persistent storage is via treating
>> +a file in the host filesystem as raw data (a 1:1 mapping of offsets in
>> +the host file to disk offsets being accessed by the guest driver), but
>> +QEMU actually has the ability to glue together a lot of different host
>> +formats (raw, qcow2, qed, vhdx, ...) and protocols (file system, block
>> +device, NBD, sheepdog, gluster, ...) where any combination of host
> 
> Can we link NBD, sheepdog and gluster?  Maybe Ceph instead of Sheepdog.
> 
>> +format and protocol can serve as the backend that is then tied to the
>> +QEMU emulation providing the guest device.
>> +
>> +Thus, when you tell QEMU to use a host qcow2 file, the guest does not
>> +have to know qcow2, but merely has its normal driver make the same
>> +register reads and writes as it would on bare metal, which cause vmexits
>> +into QEMU code, then QEMU maps those accesses into reads and writes in
>> +the appropriate offsets of the qcow2 file.  When you first install the
>> +guest, all the guest sees is a blank uninitialized linear disk
>> +(regardless of whether that disk is linear in the host, as in raw
>> +format, or optimized for random access, as in the qcow2 format); it is
>> +up to the guest OS to decide how to partition its view of the hardware
>> +and install filesystems on top of that, and QEMU does not care what
>> +filesystems the guest is using, only what pattern of raw disk I/O
>> +register control sequences are issued.
>> +
>> +The next thing to realize is that emulating IDE is not always the most
>> +efficient.  Every time the guest writes to the control registers, it has
>> +to go through special handling, and vmexits slow down emulation.  One
>> +way to speed this up is through paravirtualization, or cooperation
>> +between the guest and host.
> 
> Replace last sentence with:
> 
> Of course, different hardware models have different performance
> characteristics when virtualized.  In general, however, what works best
> for real hardware does not necessarily work best for virtualization and,
> until recently, hardware was not designed to operate fast when emulated
> by software such as QEMU.  Therefore, QEMU includes _paravirtualized_
> devices that _are_ designed specifically for this purpose.
> 
> The meaning of "paravirtualization" here is slightly different from the
> original one of "virtualization through cooperation between the guest
> and host".  (Continue with next sentence in the same paragraph).
> 
>>  The QEMU developers have produced a
>> +specification for a set of hardware registers and the behavior for those
>> +registers which are designed to result in the minimum number of vmexits
>> +possible while still accomplishing what a hard disk must do, namely,
>> +transferring data between normal guest memory and persistent storage.
>> +This specification is called virtio; using it requires installation of a
>> +virtio driver in the guest.  While there is no known hardware that
> 
> s/there is no known hardware/no physical device exists/
> 
>> +follows the same register layout as virtio, the concept is the same: a
>> +virtio disk behaves like a memory-mapped register bank, where the guest
>> +OS driver then knows what sequence of register commands to write into
>> +that bank to cause data to be copied in and out of other guest memory.
>> +Much of the speedups in virtio come by its design - the guest sets aside
>> +a portion of regular memory for the bulk of its command queue, and only
>> +has to kick a single register to then tell QEMU to read the command
>> +queue (fewer mapped register accesses mean fewer vmexits), coupled with
>> +handshaking guarantees that the guest driver won't be changing the
>> +normal memory while QEMU is acting on it.
> 
> Maybe add a short paragraph here like:
> 
> As an aside, just like recent hardware is fairly efficient to emulate,
> virtio is evolving to be also efficient to implement in hardware, of
> course without sacrificing performance for emulation or virtualization.
> Therefore, in the future you could stumble upon physical virtio devices
> as well.
> 
>> +In a similar vein, many OS have support for a number of network cards, a
>> +common example being the e1000 card on the PCI bus.  On bare metal, an
>> +OS will probe PCI space, see that a bank of registers with the signature
>> +for e1000 is populated, and load the driver that then knows what
>> +register sequences to write in order to let the hardware card transfer
>> +network traffic in and out of the guest.  So QEMU has, as one of its
>> +many network card emulations, an e1000 device, which is mapped to the
>> +same guest memory region as a real one would live on bare metal.  And
>> +once again, the e1000 register layout tends to require a lot of register
>> +writes (and thus vmexits) for the amount of work the hardware performs,
>> +so the QEMU developers have added the virtio-net card (a PCI hardware
>> +specification, although no bare-metal hardware exists that actually
>> +implements it), such that installing a virtio-net driver in the guest OS
>> +can then minimize the number of vmexits while still getting the same
>> +side-effects of sending network traffic.  If you tell QEMU to start a
>> +guest with a virtio-net card, then the guest OS will probe PCI space and
>> +see a bank of registers with the virtio-net signature, and load the
>> +appropriate driver like it would for any other PCI hardware.
>> +
>> +In summary, even though QEMU was first written as a way of emulating
>> +hardware memory maps in order to virtualize a guest OS, it turns out
>> +that the fastest virtualization also depends on virtual hardware: a
>> +memory map of registers with particular documented side effects that has
>> +no bare-metal counterpart.  And at the end of the day, all
>> +virtualization really means is running a particular set of assembly
>> +instructions (the guest OS) to manipulate locations within a giant
>> +memory map for causing a particular set of side effects, where QEMU is
>> +just a user-space application providing a memory map and mimicking the
>> +same side effects you would get when executing those guest instructions
>> +on the appropriate bare metal hardware.
>>
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [qemu-web PATCH] Add "Understanding QEMU devices" blog post
  2018-01-26  9:19 Thomas Huth
  2018-01-26 12:40 ` Paolo Bonzini
@ 2018-01-26 13:06 ` Kashyap Chamarthy
  1 sibling, 0 replies; 8+ messages in thread
From: Kashyap Chamarthy @ 2018-01-26 13:06 UTC (permalink / raw)
  To: Thomas Huth; +Cc: qemu-devel, Paolo Bonzini

On Fri, Jan 26, 2018 at 10:19:30AM +0100, Thomas Huth wrote:
> Last July, Eric Blake wrote a nice summary for newcomers about what
> QEMU has to do to emulate devices for the guests. So far, we missed
> to integrate this somewhere into the QEM web site or wiki, so let's
> publish this now as a nice blog post for the users.

Good catch.  I had it bookmarked to do so, but forgot.

> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  _posts/2018-01-26-understanding-qemu-devices.md | 139 ++++++++++++++++++++++++
>  1 file changed, 139 insertions(+)
>  create mode 100644 _posts/2018-01-26-understanding-qemu-devices.md

The core content is very helpful; but for better readability, would be
nice to break some of the large 22-line paragraphs (as they can be
overwhelming) down into slightly readable chunks.

If you don't prefer to do it, can take a stab at it, since I asked about
it.  If you think it's not worth it, I'm fine letting it go, since this
is a strict improvement as-is.

> diff --git a/_posts/2018-01-26-understanding-qemu-devices.md b/_posts/2018-01-26-understanding-qemu-devices.md
> new file mode 100644
> index 0000000..b436ef0
> --- /dev/null
> +++ b/_posts/2018-01-26-understanding-qemu-devices.md
> @@ -0,0 +1,139 @@
> +---
> +layout: post
> +title:  "Understanding QEMU devices"
> +date:   2018-01-26 10:00:00 +0100
> +author: Eric Blake
> +categories: blog

[...]

-- 
/kashyap

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [qemu-web PATCH] Add "Understanding QEMU devices" blog post
  2018-01-26  9:19 Thomas Huth
@ 2018-01-26 12:40 ` Paolo Bonzini
  2018-01-26 14:46   ` Eric Blake
  2018-01-26 13:06 ` Kashyap Chamarthy
  1 sibling, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2018-01-26 12:40 UTC (permalink / raw)
  To: Thomas Huth, qemu-devel

On 26/01/2018 10:19, Thomas Huth wrote:
> Last July, Eric Blake wrote a nice summary for newcomers about what
> QEMU has to do to emulate devices for the guests. So far, we missed
> to integrate this somewhere into the QEM web site or wiki, so let's
> publish this now as a nice blog post for the users.

It's very nice!  Some proofreading and corrections follow.

> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  _posts/2018-01-26-understanding-qemu-devices.md | 139 ++++++++++++++++++++++++
>  1 file changed, 139 insertions(+)
>  create mode 100644 _posts/2018-01-26-understanding-qemu-devices.md
> 
> diff --git a/_posts/2018-01-26-understanding-qemu-devices.md b/_posts/2018-01-26-understanding-qemu-devices.md
> new file mode 100644
> index 0000000..b436ef0
> --- /dev/null
> +++ b/_posts/2018-01-26-understanding-qemu-devices.md
> @@ -0,0 +1,139 @@
> +---
> +layout: post
> +title:  "Understanding QEMU devices"
> +date:   2018-01-26 10:00:00 +0100
> +author: Eric Blake
> +categories: blog
> +---
> +Here are some notes that may help newcomers understand what is actually
> +happening with QEMU devices:
> +
> +With QEMU, one thing to remember is that we are trying to emulate what
> +an OS would see on bare-metal hardware.  All bare-metal machines are

s/All/Most/ (s390 anyone? :))

> +basically giant memory maps, where software poking at a particular
> +address will have a particular side effect (the most common side effect
> +is, of course, accessing memory; but other common regions in memory
> +include the register banks for controlling particular pieces of
> +hardware, like the hard drive or a network card, or even the CPU
> +itself).  The end-goal of emulation is to allow a user-space program,
> +using only normal memory accesses, to manage all of the side-effects
> +that a guest OS is expecting.
> +
> +As an implementation detail, some hardware, like x86, actually has two
> +memory spaces, where I/O space uses different assembly codes than
> +normal; QEMU has to emulate these alternative accesses.  Similarly, many
> +modern hardware is so complex that the CPU itself provides both
> +specialized assembly instructions and a bank of registers within the
> +memory map (a classic example being the management of the MMU, or
> +separation between Ring 0 kernel code and Ring 3 userspace code - if
> +that's not crazy enough, there's nested virtualization).

I'd say the interrupt controllers are a better example so:

Similarly, many modern CPUs provide themselves a bank of CPU-local
registers within the memory map, such as for an interrupt controller.

And then a paragraph break.

> +With certain
> +hardware, we have virtualization hooks where the CPU itself makes it
> +easy to trap on just the problematic assembly instructions (those that
> +access I/O space or CPU internal registers, and therefore require side
> +effects different than a normal memory access), so that the guest just
> +executes the same assembly sequence as on bare metal, but that execution
> +then causes a trap to let user-space QEMU then react to the instructions
> +using just its normal user-space memory accesses before returning
> +control to the guest.  This is the kvm accelerator, and can let a guest

This is supported in QEMU through "accelerators" such as KVM.

> +run nearly as fast as bare metal, where the slowdowns are caused by each
> +trap from guest back to QEMU (a vmexit) to handle a difficult assembly
> +instruction or memory address.  QEMU also supports a TCG accelerator,

QEMU also supports other virtualizing accelerators (such as
[HAXM](https://www.qemu.org/2017/11/22/haxm-usage-windows/) or macOS's
Hypervisor.framework) and also TCG,

> +which takes the guest assembly instructions and compiles it on the fly
> +into comparable host instructions or calls to host helper routines (not
> +as fast, but results in QEMU being able to do cross-hardware emulation).

While not as fast, TCG is able to do cross-hardware emulation, such as
running ARM code on x86. (Removing the parentheses)

> +The next thing to realize is what is happening when an OS is accessing
> +various hardware resources.  For example, most OS ship with a driver

most operating systems

> +that knows how to manage an IDE disk - the driver is merely software
> +that is programmed to make specific I/O requests to a specific subset of
> +the memory map (wherever the IDE bus lives, as hard-coded by the
> +hardware board designers),

(wherever the IDE bus lives, which is specific the hardware board).

 in order to make the disk drive hardware then
> +obey commands to copy data from memory to persistent storage (writing to
> +disk) or from persistent storage to memory (reading from the disk).

When the IDE controller hardware receives those I/O requests it
communicates with the disk drive hardware, ultimately resulting in data
being copied from memory...

> +When you first buy bare-metal hardware, your disk is uninitialized; you
> +install the OS that uses the driver to make enough bare-metal accesses
> +to the IDE hardware portion of the memory map to then turn the disk into
> +a set of partitions and filesystems on top of those partitions.
> +
> +So, how does QEMU emulate this? In the big memory map it provides to the
> +guest, it emulates an IDE disk at the same address as bare-metal would.
> +When the guest OS driver issues particular memory writes to the IDE
> +control registers in order to copy data from memory to persistent
> +storage, QEMU traps on those writes (whether via kvm hypervisor assist,
> +or by noticing during TCG translation that the addresses being accessed
> +are special),

the accelerator knows that these writes must trap (remove everything in
parentheses) and passes them to the QEMU IDE controller _device model_.

 and emulates the same side effects by issuing host
> +commands to copy the specified guest memory into host storage.

The device model parses the I/O requests, then emulates them by issuing
host system calls.  The result is that guest memory is copied into host
storage.

(New paragraph).

>  On the
> +host side, the easiest way to emulate persistent storage is via treating
> +a file in the host filesystem as raw data (a 1:1 mapping of offsets in
> +the host file to disk offsets being accessed by the guest driver), but
> +QEMU actually has the ability to glue together a lot of different host
> +formats (raw, qcow2, qed, vhdx, ...) and protocols (file system, block
> +device, NBD, sheepdog, gluster, ...) where any combination of host

Can we link NBD, sheepdog and gluster?  Maybe Ceph instead of Sheepdog.

> +format and protocol can serve as the backend that is then tied to the
> +QEMU emulation providing the guest device.
> +
> +Thus, when you tell QEMU to use a host qcow2 file, the guest does not
> +have to know qcow2, but merely has its normal driver make the same
> +register reads and writes as it would on bare metal, which cause vmexits
> +into QEMU code, then QEMU maps those accesses into reads and writes in
> +the appropriate offsets of the qcow2 file.  When you first install the
> +guest, all the guest sees is a blank uninitialized linear disk
> +(regardless of whether that disk is linear in the host, as in raw
> +format, or optimized for random access, as in the qcow2 format); it is
> +up to the guest OS to decide how to partition its view of the hardware
> +and install filesystems on top of that, and QEMU does not care what
> +filesystems the guest is using, only what pattern of raw disk I/O
> +register control sequences are issued.
> +
> +The next thing to realize is that emulating IDE is not always the most
> +efficient.  Every time the guest writes to the control registers, it has
> +to go through special handling, and vmexits slow down emulation.  One
> +way to speed this up is through paravirtualization, or cooperation
> +between the guest and host.

Replace last sentence with:

Of course, different hardware models have different performance
characteristics when virtualized.  In general, however, what works best
for real hardware does not necessarily work best for virtualization and,
until recently, hardware was not designed to operate fast when emulated
by software such as QEMU.  Therefore, QEMU includes _paravirtualized_
devices that _are_ designed specifically for this purpose.

The meaning of "paravirtualization" here is slightly different from the
original one of "virtualization through cooperation between the guest
and host".  (Continue with next sentence in the same paragraph).

>  The QEMU developers have produced a
> +specification for a set of hardware registers and the behavior for those
> +registers which are designed to result in the minimum number of vmexits
> +possible while still accomplishing what a hard disk must do, namely,
> +transferring data between normal guest memory and persistent storage.
> +This specification is called virtio; using it requires installation of a
> +virtio driver in the guest.  While there is no known hardware that

s/there is no known hardware/no physical device exists/

> +follows the same register layout as virtio, the concept is the same: a
> +virtio disk behaves like a memory-mapped register bank, where the guest
> +OS driver then knows what sequence of register commands to write into
> +that bank to cause data to be copied in and out of other guest memory.
> +Much of the speedups in virtio come by its design - the guest sets aside
> +a portion of regular memory for the bulk of its command queue, and only
> +has to kick a single register to then tell QEMU to read the command
> +queue (fewer mapped register accesses mean fewer vmexits), coupled with
> +handshaking guarantees that the guest driver won't be changing the
> +normal memory while QEMU is acting on it.

Maybe add a short paragraph here like:

As an aside, just like recent hardware is fairly efficient to emulate,
virtio is evolving to be also efficient to implement in hardware, of
course without sacrificing performance for emulation or virtualization.
Therefore, in the future you could stumble upon physical virtio devices
as well.

> +In a similar vein, many OS have support for a number of network cards, a
> +common example being the e1000 card on the PCI bus.  On bare metal, an
> +OS will probe PCI space, see that a bank of registers with the signature
> +for e1000 is populated, and load the driver that then knows what
> +register sequences to write in order to let the hardware card transfer
> +network traffic in and out of the guest.  So QEMU has, as one of its
> +many network card emulations, an e1000 device, which is mapped to the
> +same guest memory region as a real one would live on bare metal.  And
> +once again, the e1000 register layout tends to require a lot of register
> +writes (and thus vmexits) for the amount of work the hardware performs,
> +so the QEMU developers have added the virtio-net card (a PCI hardware
> +specification, although no bare-metal hardware exists that actually
> +implements it), such that installing a virtio-net driver in the guest OS
> +can then minimize the number of vmexits while still getting the same
> +side-effects of sending network traffic.  If you tell QEMU to start a
> +guest with a virtio-net card, then the guest OS will probe PCI space and
> +see a bank of registers with the virtio-net signature, and load the
> +appropriate driver like it would for any other PCI hardware.
> +
> +In summary, even though QEMU was first written as a way of emulating
> +hardware memory maps in order to virtualize a guest OS, it turns out
> +that the fastest virtualization also depends on virtual hardware: a
> +memory map of registers with particular documented side effects that has
> +no bare-metal counterpart.  And at the end of the day, all
> +virtualization really means is running a particular set of assembly
> +instructions (the guest OS) to manipulate locations within a giant
> +memory map for causing a particular set of side effects, where QEMU is
> +just a user-space application providing a memory map and mimicking the
> +same side effects you would get when executing those guest instructions
> +on the appropriate bare metal hardware.
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Qemu-devel] [qemu-web PATCH] Add "Understanding QEMU devices" blog post
@ 2018-01-26  9:19 Thomas Huth
  2018-01-26 12:40 ` Paolo Bonzini
  2018-01-26 13:06 ` Kashyap Chamarthy
  0 siblings, 2 replies; 8+ messages in thread
From: Thomas Huth @ 2018-01-26  9:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Eric Blake

Last July, Eric Blake wrote a nice summary for newcomers about what
QEMU has to do to emulate devices for the guests. So far, we missed
to integrate this somewhere into the QEM web site or wiki, so let's
publish this now as a nice blog post for the users.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 _posts/2018-01-26-understanding-qemu-devices.md | 139 ++++++++++++++++++++++++
 1 file changed, 139 insertions(+)
 create mode 100644 _posts/2018-01-26-understanding-qemu-devices.md

diff --git a/_posts/2018-01-26-understanding-qemu-devices.md b/_posts/2018-01-26-understanding-qemu-devices.md
new file mode 100644
index 0000000..b436ef0
--- /dev/null
+++ b/_posts/2018-01-26-understanding-qemu-devices.md
@@ -0,0 +1,139 @@
+---
+layout: post
+title:  "Understanding QEMU devices"
+date:   2018-01-26 10:00:00 +0100
+author: Eric Blake
+categories: blog
+---
+Here are some notes that may help newcomers understand what is actually
+happening with QEMU devices:
+
+With QEMU, one thing to remember is that we are trying to emulate what
+an OS would see on bare-metal hardware.  All bare-metal machines are
+basically giant memory maps, where software poking at a particular
+address will have a particular side effect (the most common side effect
+is, of course, accessing memory; but other common regions in memory
+include the register banks for controlling particular pieces of
+hardware, like the hard drive or a network card, or even the CPU
+itself).  The end-goal of emulation is to allow a user-space program,
+using only normal memory accesses, to manage all of the side-effects
+that a guest OS is expecting.
+
+As an implementation detail, some hardware, like x86, actually has two
+memory spaces, where I/O space uses different assembly codes than
+normal; QEMU has to emulate these alternative accesses.  Similarly, many
+modern hardware is so complex that the CPU itself provides both
+specialized assembly instructions and a bank of registers within the
+memory map (a classic example being the management of the MMU, or
+separation between Ring 0 kernel code and Ring 3 userspace code - if
+that's not crazy enough, there's nested virtualization).  With certain
+hardware, we have virtualization hooks where the CPU itself makes it
+easy to trap on just the problematic assembly instructions (those that
+access I/O space or CPU internal registers, and therefore require side
+effects different than a normal memory access), so that the guest just
+executes the same assembly sequence as on bare metal, but that execution
+then causes a trap to let user-space QEMU then react to the instructions
+using just its normal user-space memory accesses before returning
+control to the guest.  This is the kvm accelerator, and can let a guest
+run nearly as fast as bare metal, where the slowdowns are caused by each
+trap from guest back to QEMU (a vmexit) to handle a difficult assembly
+instruction or memory address.  QEMU also supports a TCG accelerator,
+which takes the guest assembly instructions and compiles it on the fly
+into comparable host instructions or calls to host helper routines (not
+as fast, but results in QEMU being able to do cross-hardware emulation).
+
+The next thing to realize is what is happening when an OS is accessing
+various hardware resources.  For example, most OS ship with a driver
+that knows how to manage an IDE disk - the driver is merely software
+that is programmed to make specific I/O requests to a specific subset of
+the memory map (wherever the IDE bus lives, as hard-coded by the
+hardware board designers), in order to make the disk drive hardware then
+obey commands to copy data from memory to persistent storage (writing to
+disk) or from persistent storage to memory (reading from the disk).
+When you first buy bare-metal hardware, your disk is uninitialized; you
+install the OS that uses the driver to make enough bare-metal accesses
+to the IDE hardware portion of the memory map to then turn the disk into
+a set of partitions and filesystems on top of those partitions.
+
+So, how does QEMU emulate this? In the big memory map it provides to the
+guest, it emulates an IDE disk at the same address as bare-metal would.
+When the guest OS driver issues particular memory writes to the IDE
+control registers in order to copy data from memory to persistent
+storage, QEMU traps on those writes (whether via kvm hypervisor assist,
+or by noticing during TCG translation that the addresses being accessed
+are special), and emulates the same side effects by issuing host
+commands to copy the specified guest memory into host storage.  On the
+host side, the easiest way to emulate persistent storage is via treating
+a file in the host filesystem as raw data (a 1:1 mapping of offsets in
+the host file to disk offsets being accessed by the guest driver), but
+QEMU actually has the ability to glue together a lot of different host
+formats (raw, qcow2, qed, vhdx, ...) and protocols (file system, block
+device, NBD, sheepdog, gluster, ...) where any combination of host
+format and protocol can serve as the backend that is then tied to the
+QEMU emulation providing the guest device.
+
+Thus, when you tell QEMU to use a host qcow2 file, the guest does not
+have to know qcow2, but merely has its normal driver make the same
+register reads and writes as it would on bare metal, which cause vmexits
+into QEMU code, then QEMU maps those accesses into reads and writes in
+the appropriate offsets of the qcow2 file.  When you first install the
+guest, all the guest sees is a blank uninitialized linear disk
+(regardless of whether that disk is linear in the host, as in raw
+format, or optimized for random access, as in the qcow2 format); it is
+up to the guest OS to decide how to partition its view of the hardware
+and install filesystems on top of that, and QEMU does not care what
+filesystems the guest is using, only what pattern of raw disk I/O
+register control sequences are issued.
+
+The next thing to realize is that emulating IDE is not always the most
+efficient.  Every time the guest writes to the control registers, it has
+to go through special handling, and vmexits slow down emulation.  One
+way to speed this up is through paravirtualization, or cooperation
+between the guest and host.  The QEMU developers have produced a
+specification for a set of hardware registers and the behavior for those
+registers which are designed to result in the minimum number of vmexits
+possible while still accomplishing what a hard disk must do, namely,
+transferring data between normal guest memory and persistent storage.
+This specification is called virtio; using it requires installation of a
+virtio driver in the guest.  While there is no known hardware that
+follows the same register layout as virtio, the concept is the same: a
+virtio disk behaves like a memory-mapped register bank, where the guest
+OS driver then knows what sequence of register commands to write into
+that bank to cause data to be copied in and out of other guest memory.
+Much of the speedups in virtio come by its design - the guest sets aside
+a portion of regular memory for the bulk of its command queue, and only
+has to kick a single register to then tell QEMU to read the command
+queue (fewer mapped register accesses mean fewer vmexits), coupled with
+handshaking guarantees that the guest driver won't be changing the
+normal memory while QEMU is acting on it.
+
+In a similar vein, many OS have support for a number of network cards, a
+common example being the e1000 card on the PCI bus.  On bare metal, an
+OS will probe PCI space, see that a bank of registers with the signature
+for e1000 is populated, and load the driver that then knows what
+register sequences to write in order to let the hardware card transfer
+network traffic in and out of the guest.  So QEMU has, as one of its
+many network card emulations, an e1000 device, which is mapped to the
+same guest memory region as a real one would live on bare metal.  And
+once again, the e1000 register layout tends to require a lot of register
+writes (and thus vmexits) for the amount of work the hardware performs,
+so the QEMU developers have added the virtio-net card (a PCI hardware
+specification, although no bare-metal hardware exists that actually
+implements it), such that installing a virtio-net driver in the guest OS
+can then minimize the number of vmexits while still getting the same
+side-effects of sending network traffic.  If you tell QEMU to start a
+guest with a virtio-net card, then the guest OS will probe PCI space and
+see a bank of registers with the virtio-net signature, and load the
+appropriate driver like it would for any other PCI hardware.
+
+In summary, even though QEMU was first written as a way of emulating
+hardware memory maps in order to virtualize a guest OS, it turns out
+that the fastest virtualization also depends on virtual hardware: a
+memory map of registers with particular documented side effects that has
+no bare-metal counterpart.  And at the end of the day, all
+virtualization really means is running a particular set of assembly
+instructions (the guest OS) to manipulate locations within a giant
+memory map for causing a particular set of side effects, where QEMU is
+just a user-space application providing a memory map and mimicking the
+same side effects you would get when executing those guest instructions
+on the appropriate bare metal hardware.
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread
