From: Claudio Fontana
Subject: Re: [PATCH v3 2/2] docs: update ivshmem device spec
Date: Fri, 8 Aug 2014 11:04:03 +0200
Message-ID: <53E49283.8020404@huawei.com>
References: <1407488118-11245-1-git-send-email-david.marchand@6wind.com> <1407488118-11245-3-git-send-email-david.marchand@6wind.com>
In-Reply-To: <1407488118-11245-3-git-send-email-david.marchand@6wind.com>
To: David Marchand, qemu-devel@nongnu.org
Cc: kvm@vger.kernel.org, armbru@redhat.com, pbonzini@redhat.com, jani.kokkonen@huawei.com, cam@cs.ualberta.ca

Hello David,

On 08.08.2014 10:55, David Marchand wrote:
> Add some notes on the parts needed to use ivshmem devices: more specifically,
> explain the purpose of an ivshmem server and the basic concept to use the
> ivshmem devices in guests.
> Move some parts of the documentation and re-organise it.
>
> Signed-off-by: David Marchand

You did not include my Reviewed-by: tag, did you change this from v2?

Ciao,

Claudio

> ---
>  docs/specs/ivshmem_device_spec.txt | 124 +++++++++++++++++++++++++++---------
>  1 file changed, 93 insertions(+), 31 deletions(-)
>
> diff --git a/docs/specs/ivshmem_device_spec.txt b/docs/specs/ivshmem_device_spec.txt
> index 667a862..f5f2b95 100644
> --- a/docs/specs/ivshmem_device_spec.txt
> +++ b/docs/specs/ivshmem_device_spec.txt
> @@ -2,30 +2,103 @@
>  Device Specification for Inter-VM shared memory device
>  ------------------------------------------------------
>
> -The Inter-VM shared memory device is designed to share a region of memory to
> -userspace in multiple virtual guests. The memory region does not belong to any
> -guest, but is a POSIX memory object on the host. Optionally, the device may
> -support sending interrupts to other guests sharing the same memory region.
> +The Inter-VM shared memory device is designed to share a memory region (created
> +on the host via the POSIX shared memory API) between multiple QEMU processes
> +running different guests. In order for all guests to be able to pick up the
> +shared memory area, it is modeled by QEMU as a PCI device exposing said memory
> +to the guest as a PCI BAR.
> +The memory region does not belong to any guest, but is a POSIX memory object on
> +the host. The host can access this shared memory if needed.
> +
> +The device also provides an optional communication mechanism between guests
> +sharing the same memory object. More details about that in the section 'Guest to
> +guest communication' section.
>
>
>  The Inter-VM PCI device
>  -----------------------
>
> -*BARs*
> +From the VM point of view, the ivshmem PCI device supports three BARs.
> +
> +- BAR0 is a 1 Kbyte MMIO region to support registers and interrupts when MSI is
> +  not used.
> +- BAR1 is used for MSI-X when it is enabled in the device.
> +- BAR2 is used to access the shared memory object.
> +
> +It is your choice how to use the device but you must choose between two
> +behaviors :
> +
> +- basically, if you only need the shared memory part, you will map BAR2.
> +  This way, you have access to the shared memory in guest and can use it as you
> +  see fit (memnic, for example, uses it in userland
> +  http://dpdk.org/browse/memnic).
> +
> +- BAR0 and BAR1 are used to implement an optional communication mechanism
> +  through interrupts in the guests. If you need an event mechanism between the
> +  guests accessing the shared memory, you will most likely want to write a
> +  kernel driver that will handle interrupts. See details in the section 'Guest
> +  to guest communication' section.
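
For the "shared memory only" case quoted above, a short example might help
readers. What follows is only a sketch under assumptions that are not part of
the patch: a Linux guest, the BAR reached through the sysfs "resource2" file of
the PCI device, and a helper name (map_pci_resource) made up for illustration.
The PCI address in the usage note is hypothetical as well.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a whole PCI BAR exposed as a sysfs "resourceN" file (assumption: Linux
 * guest; the helper name is hypothetical). Returns the mapping or NULL on
 * failure; on success *len receives the size of the mapping.
 */
void *map_pci_resource(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && len != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    /* The file size is taken as the size of the region to map. */
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}
```

A guest application would call it as, for instance,
map_pci_resource("/sys/bus/pci/devices/0000:00:04.0/resource2", &len), using
whatever address the ivshmem device actually has in that guest.
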
> +
> +The behavior is chosen when starting your QEMU processes:
> +- no communication mechanism needed, the first QEMU to start creates the shared
> +  memory on the host, subsequent QEMU processes will use it.
> +
> +- communication mechanism needed, an ivshmem server must be started before any
> +  QEMU processes, then each QEMU process connects to the server unix socket.
> +
> +For more details on the QEMU ivshmem parameters, see qemu-doc documentation.
> +
> +
> +Guest to guest communication
> +----------------------------
> +
> +This section details the communication mechanism between the guests accessing
> +the ivhsmem shared memory.
>
> -The device supports three BARs. BAR0 is a 1 Kbyte MMIO region to support
> -registers. BAR1 is used for MSI-X when it is enabled in the device. BAR2 is
> -used to map the shared memory object from the host. The size of BAR2 is
> -specified when the guest is started and must be a power of 2 in size.
> +*ivshmem server*
>
> -*Registers*
> +This server code is available in qemu.git/contrib/ivshmem-server.
>
> -The device currently supports 4 registers of 32-bits each. Registers
> -are used for synchronization between guests sharing the same memory object when
> -interrupts are supported (this requires using the shared memory server).
> +The server must be started on the host before any guest.
> +It creates a shared memory object then waits for clients to connect on an unix
> +socket.
>
> -The server assigns each VM an ID number and sends this ID number to the QEMU
> -process when the guest starts.
> +For each client (QEMU processes) that connects to the server:
> +- the server assigns an ID for this client and sends this ID to him as the first
> +  message,
> +- the server sends a fd to the shared memory object to this client,
> +- the server creates a new set of host eventfds associated to the new client and
> +  sends this set to all already connected clients,
> +- finally, the server sends all the eventfds sets for all clients to the new
> +  client.
> +
> +The server signals all clients when one of them disconnects.
> +
> +The client IDs are limited to 16 bits because of the current implementation (see
> +Doorbell register in 'PCI device registers' subsection). Hence on 65536 clients
> +are supported.
> +
> +All the file descriptors (fd to the shared memory, eventfds for each client)
> +are passed to clients using SCM_RIGHTS over the server unix socket.
> +
> +Apart from the current ivshmem implementation in QEMU, an ivshmem client has
> +been provided in qemu.git/contrib/ivshmem-client for debug.
> +
> +*QEMU as an ivshmem client*
> +
> +At initialisation, when creating the ivshmem device, QEMU gets its ID from the
> +server then make it available through BAR0 IVPosition register for the VM to use
> +(see 'PCI device registers' subsection).
> +QEMU then uses the fd to the shared memory to map it to BAR2.
> +eventfds for all other clients received from the server are stored to implement
> +BAR0 Doorbell register (see 'PCI device registers' subsection).
> +Finally, eventfds assigned to this QEMU process are used to send interrupts in
> +this VM.
> +
> +*PCI device registers*
> +
> +From the VM point of view, the ivshmem PCI device supports 4 registers of
> +32-bits each.
>
>  enum ivshmem_registers {
>      IntrMask = 0,
> @@ -49,8 +122,8 @@ bit to 0 and unmasked by setting the first bit to 1.
>  IVPosition Register: The IVPosition register is read-only and reports the
>  guest's ID number. The guest IDs are non-negative integers. When using the
>  server, since the server is a separate process, the VM ID will only be set when
> -the device is ready (shared memory is received from the server and accessible via
> -the device). If the device is not ready, the IVPosition will return -1.
> +the device is ready (shared memory is received from the server and accessible
> +via the device). If the device is not ready, the IVPosition will return -1.
>  Applications should ensure that they have a valid VM ID before accessing the
>  shared memory.
>
> @@ -59,8 +132,8 @@ Doorbell register. The doorbell register is 32-bits, logically divided into
>  two 16-bit fields. The high 16-bits are the guest ID to interrupt and the low
>  16-bits are the interrupt vector to trigger. The semantics of the value
>  written to the doorbell depends on whether the device is using MSI or a regular
> -pin-based interrupt. In short, MSI uses vectors while regular interrupts set the
> -status register.
> +pin-based interrupt. In short, MSI uses vectors while regular interrupts set
> +the status register.
>
>  Regular Interrupts
>
> @@ -71,7 +144,7 @@ interrupt in the destination guest.
>
>  Message Signalled Interrupts
>
> -A ivshmem device may support multiple MSI vectors. If so, the lower 16-bits
> +An ivshmem device may support multiple MSI vectors. If so, the lower 16-bits
>  written to the Doorbell register must be between 0 and the maximum number of
>  vectors the guest supports. The lower 16 bits written to the doorbell is the
>  MSI vector that will be raised in the destination guest. The number of MSI
> @@ -83,14 +156,3 @@ interrupt itself should be communicated via the shared memory region. Devices
>  supporting multiple MSI vectors can use different vectors to indicate different
>  events have occurred. The semantics of interrupt vectors are left to the
>  user's discretion.
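
The Doorbell and IVPosition descriptions quoted above could be illustrated with
a few lines of C. This is only a sketch of the layout the text describes (high
16 bits: destination guest ID, low 16 bits: vector; IVPosition reads -1 until
the device is ready). The function names are made up for illustration, and the
caller is assumed to supply a pointer to the already mapped Doorbell register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the 32-bit value to write to the Doorbell register:
 * high 16 bits = guest ID to interrupt, low 16 bits = interrupt vector. */
uint32_t ivshmem_doorbell_value(uint16_t peer_id, uint16_t vector)
{
    return ((uint32_t)peer_id << 16) | vector;
}

/* IVPosition returns -1 while the device is not ready; read as an unsigned
 * 32-bit register, that is 0xFFFFFFFF. Any other value is treated as an ID. */
int ivshmem_id_is_valid(uint32_t ivposition)
{
    return ivposition != UINT32_MAX;
}

/* Ring a peer through a mapped Doorbell register (hypothetical helper; the
 * pointer must reference the Doorbell register inside the BAR0 mapping). */
void ivshmem_ring(volatile uint32_t *doorbell, uint16_t peer_id,
                  uint16_t vector)
{
    assert(doorbell != NULL);
    *doorbell = ivshmem_doorbell_value(peer_id, vector);
}
```

With MSI the low half selects the vector raised in the destination guest; with
regular interrupts the written value ends up in the status register instead, as
the quoted text says.
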
> -
> -
> -Usage in the Guest
> -------------------
> -
> -The shared memory device is intended to be used with the provided UIO driver.
> -Very little configuration is needed. The guest should map BAR0 to access the
> -registers (an array of 32-bit ints allows simple writing) and map BAR2 to
> -access the shared memory region itself. The size of the shared memory region
> -is specified when the guest (or shared memory server) is started. A guest may
> -map the whole shared memory region or only part of it.
>

-- 
Claudio Fontana
Server Virtualization Architect
Huawei Technologies Duesseldorf GmbH
Riesstraße 25 - 80992 München

office: +49 89 158834 4135
mobile: +49 15253060158