* [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
@ 2022-10-17  7:47 Xuan Zhuo
  2022-10-17  7:47 ` [virtio-dev] [PATCH 1/2] Reserve device id for ISM device Xuan Zhuo
                   ` (3 more replies)
  0 siblings, 4 replies; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-17  7:47 UTC (permalink / raw)
  To: virtio-dev
  Cc: hans, herongguang, zmlcc, dust.li, tonylu, zhenzao, helinguo,
	gerry, xuanzhuo, mst, cohuck, jasowang

Hello everyone,

# Background

Nowadays, a common scenario is to accelerate communication between different VMs
and containers, including lightweight virtual-machine-based containers. One way
to achieve this is to colocate them on the same host. However, inter-VM
communication through the network stack is not optimal in performance and may
also waste extra CPU cycles. This scenario has been discussed many times, but
there is still no generic solution available [1] [2] [3].

With a PoC [5] based on pci-ivshmem + SMC (Shared Memory Communications [4]),
we found that by changing the communication channel between VMs from TCP to SMC
with shared memory, we can achieve superior performance for a common
socket-based application [5]:
  - latency reduced by about 50%
  - throughput increased by about 300%
  - CPU consumption reduced by about 50%

Since there is no particularly suitable shared memory management solution that
matches the needs of SMC (see ## Comparison with existing technology), and virtio
is the standard for communication in the virtualization world, we want to
implement a virtio-ism device based on virtio, which can support on-demand
memory sharing across VMs and containers, or between a VM and a container. To
match the needs of SMC, the virtio-ism device needs to support:

1. Dynamic provision: shared memory regions are dynamically allocated and
   provisioned.
2. Multi-region management: the shared memory is divided into regions,
   and a peer may allocate one or more regions from the same shared memory
   device.
3. Permission control: the permissions of each region can be set separately.

# Virtio ism device

ISM devices provide the ability to share memory between different guests on a
host. Memory that a guest obtains from an ism device can be shared with multiple
peers at the same time, and this sharing relationship can be dynamically created
and released.

The shared memory obtained from the device is divided into multiple ism regions
for sharing. The ISM device provides a mechanism to notify other referrers of an
ism region of content update events.

# Usage (SMC as example)

Here is one possible use case (a rough code sketch follows the list):

1. SMC calls the interface ism_alloc_region() of the ism driver, which returns
   the location of a memory region in the PCI space and a token.
2. The ism driver mmaps the memory region and returns it to SMC together with
   the token.
3. SMC passes the token to the connected peer.
4. The peer calls the ism driver interface ism_attach_region(token) to
   get the location of the shared memory in its PCI space.
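
As a rough illustration, the flow above could look like the sketch below on the
driver-facing side. All names and signatures here (ism_map_region(), the buffer
size, the setup helpers) are assumptions made up for this example, not the
actual POC interface:

    #include <stddef.h>

    #define ISM_TOKEN_SIZE 64

    struct ism_region;

    /* hypothetical driver-facing prototypes mirroring steps 1-4 above */
    struct ism_region *ism_alloc_region(size_t size, unsigned char token[ISM_TOKEN_SIZE]);
    struct ism_region *ism_attach_region(const unsigned char token[ISM_TOKEN_SIZE]);
    void *ism_map_region(struct ism_region *r);

    /* allocating side (steps 1-3): allocate, map, and publish the token */
    static void *setup_tx_buffer(unsigned char token_out[ISM_TOKEN_SIZE])
    {
            struct ism_region *r = ism_alloc_region(64 * 1024, token_out);

            return r ? ism_map_region(r) : NULL;  /* token_out is sent to the peer */
    }

    /* attaching side (step 4): attach by token and map the same region */
    static void *setup_rx_buffer(const unsigned char token[ISM_TOKEN_SIZE])
    {
            struct ism_region *r = ism_attach_region(token);

            return r ? ism_map_region(r) : NULL;
    }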


# About hot plugging of the ism device

   Hot plugging of devices is a heavyweight, failure-prone, time-consuming, and
   poorly scalable operation, so we don't plan to support it for now.

# Comparison with existing technology

## ivshmem or ivshmem 2.0 of Qemu

   1. ivshmem 1.0 exposes one large piece of memory that is visible to all VMs
   that use the device, so it does not provide sufficient isolation for our use.

   2. ivshmem 2.0 provides shared memory that belongs to one VM and is read-only
   for all other VMs that use the ivshmem 2.0 shared memory device, which also
   does not meet our needs in terms of security.

## vhost-pci and virtiovhostuser

   Neither supports dynamic allocation, and therefore they are not suitable for SMC.

# Design

   This is a structure diagram based on ism sharing between two VMs.

    |-------------------------------------------------------------------------------------------------------------|
    | |------------------------------------------------|       |------------------------------------------------| |
    | | Guest                                          |       | Guest                                          | |
    | |                                                |       |                                                | |
    | |   ----------------                             |       |   ----------------                             | |
    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
    | |    |  |                -------------------     |       |    |  |                --------------------    | |
    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
    | |    |  |                -------------------     |       |    |  |                --------------------    | |
    | |                                |               |       |                               |                | |
    | |                                |               |       |                               |                | |
    | | Qemu                           |               |       | Qemu                          |                | |
    | |--------------------------------+---------------|       |-------------------------------+----------------| |
    |                                  |                                                       |                  |
    |                                  |                                                       |                  |
    |                                  |------------------------------+------------------------|                  |
    |                                                                 |                                           |
    |                                                                 |                                           |
    |                                                   --------------------------                                |
    |                                                    | M1 |   | M2 |   | M3 |                                 |
    |                                                   --------------------------                                |
    |                                                                                                             |
    | HOST                                                                                                        |
    ---------------------------------------------------------------------------------------------------------------

# POC code

   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
   Qemu:   https://github.com/fengidri/qemu/commits/ism

If there are any problems, please point them out.

Hope to hear from you, thank you.

[1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
[2] https://dl.acm.org/doi/10.1145/2847562
[3] https://hal.archives-ouvertes.fr/hal-00368622/document
[4] https://lwn.net/Articles/711071/
[5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/


Xuan Zhuo (2):
  Reserve device id for ISM device
  virtio-ism: introduce new device virtio-ism

 content.tex    |   3 +
 virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 343 insertions(+)
 create mode 100644 virtio-ism.tex

--
2.32.0.3.g01195cf9f





* [virtio-dev] [PATCH 1/2] Reserve device id for ISM device
  2022-10-17  7:47 [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device Xuan Zhuo
@ 2022-10-17  7:47 ` Xuan Zhuo
  2022-10-17  7:47 ` [PATCH 2/2] virtio-ism: introduce new device virtio-ism Xuan Zhuo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-17  7:47 UTC (permalink / raw)
  To: virtio-dev
  Cc: hans, herongguang, zmlcc, dust.li, tonylu, zhenzao, helinguo,
	gerry, xuanzhuo, mst, cohuck, jasowang

Use device ID 43

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
Signed-off-by: Hans Zhang <hans@linux.alibaba.com>
Signed-off-by: He Rongguang <herongguang@linux.alibaba.com>
---
 content.tex | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/content.tex b/content.tex
index e863709..cd006c3 100644
--- a/content.tex
+++ b/content.tex
@@ -2990,6 +2990,8 @@ \chapter{Device Types}\label{sec:Device Types}
 \hline
 42         &   RDMA device \\
 \hline
+43         &   ISM device \\
+\hline
 \end{tabular}
 
 Some of the devices above are unspecified by this document,
-- 
2.32.0.3.g01195cf9f





* [PATCH 2/2] virtio-ism: introduce new device virtio-ism
  2022-10-17  7:47 [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device Xuan Zhuo
  2022-10-17  7:47 ` [virtio-dev] [PATCH 1/2] Reserve device id for ISM device Xuan Zhuo
@ 2022-10-17  7:47 ` Xuan Zhuo
  2022-10-17  8:17 ` [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device Jason Wang
  2022-10-18  7:32 ` Jan Kiszka
  3 siblings, 0 replies; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-17  7:47 UTC (permalink / raw)
  To: virtio-dev
  Cc: hans, herongguang, zmlcc, dust.li, tonylu, zhenzao, helinguo,
	gerry, xuanzhuo, mst, cohuck, jasowang

The virtio-ism device provides and manages ism regions in host memory.
These ism regions can be allocated, attached, and detached by the driver.
After allocation, every ism region can be shared with other VMs by means
of a token. The driver reaches the memory regions on the host through the
memory exposed by the device.

|-------------------------------------------------------------------------------------------------------------|
| |------------------------------------------------|       |------------------------------------------------| |
| | Guest                                          |       | Guest                                          | |
| |                                                |       |                                                | |
| |   ----------------                             |       |   ----------------                             | |
| |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
| |   ----------------       |      |      |       |       |   ----------------               |      |      | |
| |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
| |    |  |                  |      |      |       |       |    |  |                          |      |      | |
| |    |  |                -------------------     |       |    |  |                --------------------    | |
| |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
| |    |  |                -------------------     |       |    |  |                --------------------    | |
| |                                |               |       |                               |                | |
| |                                |               |       |                               |                | |
| | Qemu                           |               |       | Qemu                          |                | |
| |--------------------------------+---------------|       |-------------------------------+----------------| |
|                                  |                                                       |                  |
|                                  |                                                       |                  |
|                                  |------------------------------+------------------------|                  |
|                                                                 |                                           |
|                                                                 |                                           |
|                                                   --------------------------                                |
|                                                    | M1 |   | M2 |   | M3 |                                 |
|                                                   --------------------------                                |
|                                                                                                             |
| HOST                                                                                                        |
---------------------------------------------------------------------------------------------------------------

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
Signed-off-by: Hans Zhang <hans@linux.alibaba.com>
Signed-off-by: He Rongguang <herongguang@linux.alibaba.com>
---
 content.tex    |   1 +
 virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 341 insertions(+)
 create mode 100644 virtio-ism.tex

diff --git a/content.tex b/content.tex
index cd006c3..dc99f77 100644
--- a/content.tex
+++ b/content.tex
@@ -6853,6 +6853,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
 \input{virtio-scmi.tex}
 \input{virtio-gpio.tex}
 \input{virtio-pmem.tex}
+\input{virtio-ism.tex}
 
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
diff --git a/virtio-ism.tex b/virtio-ism.tex
new file mode 100644
index 0000000..28ce8ec
--- /dev/null
+++ b/virtio-ism.tex
@@ -0,0 +1,340 @@
+\section{ISM Device}\label{sec:Device Types / ISM Device}
+
+ISM devices provide the ability to share memory between different guests on a
+host. Memory that a guest obtains from an ism device can be shared with multiple
+peers at the same time, and this sharing relationship can be dynamically created
+and released.
+
+The shared memory obtained from the device is divided into multiple ism regions
+for sharing. The size of each ism region is \field{region_size} (the actual
+available memory may be smaller). The ism region is the unit in which the driver
+operates on the shared memory.
+
+The ISM device provides a mechanism to notify other referrers of an ism region
+of content update events.
+
+
+\subsection{Device ID}\label{sec:Device Types / ISM Device / Device ID}
+  43
+
+\subsection{Virtqueues}\label{sec:Device Types / ISM Device / Virtqueues}
+\begin{description}
+\item[0] controlq
+\item[1] eventq
+\end{description}
+
+eventq only exists if VIRTIO_ISM_F_EVENT_VQ is negotiated.
+
+\subsection{Feature bits}\label{sec:Device Types / ISM Device / Feature bits}
+
+\begin{description}
+\item[VIRTIO_ISM_F_EVENT_VQ (0)] The ISM driver uses eventq to receive ism region update events.
+\item[VIRTIO_ISM_F_EVENT_IRQ (1)] Each ism region is directly bound to an interrupt to receive update events.
+\end{description}
+
+\subsection{Device configuration layout}\label{sec:Device Types / ISM Device / Device configuration layout}
+
+\begin{lstlisting}
+struct virtio_ism_config {
+	le64 dev_id;
+	le64 region_size;
+	le64 notify_size;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{dev_id}]      the ID of the device.
+\item[\field{region_size}] the size of every ism region.
+\item[\field{notify_size}] the size of each notify address.
+
+\end{description}
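+
+For example, a driver could read these fields as in the non-normative sketch
+below; the configuration read accessor used here follows the Linux virtio
+driver convention and is only an assumption for this example.
+
+\begin{lstlisting}
+/* Hypothetical sketch: fetch the ISM device configuration. */
+u64 dev_id      = virtio_cread64(vdev, offsetof(struct virtio_ism_config, dev_id));
+u64 region_size = virtio_cread64(vdev, offsetof(struct virtio_ism_config, region_size));
+u64 notify_size = virtio_cread64(vdev, offsetof(struct virtio_ism_config, notify_size));
+\end{lstlisting}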
+
+\subsection{Event}\label{sec:Device Types / ISM Device / Device Operation / Event}
+
+When VIRTIO_ISM_F_EVENT_VQ or VIRTIO_ISM_F_EVENT_IRQ is negotiated, the ism
+device supports event notification of ism region updates. After the device
+receives the notification from the driver, it MUST notify other guests that
+refer to this ism region.
+
+If VIRTIO_ISM_F_EVENT_VQ is negotiated, the following structure is received on the eventq.
+
+\begin{lstlisting}
+struct virtio_ism_event {
+    le64 num;
+    le64 offset[];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{num}] The number of ism regions with update events.
+\item[\field{offset}] The offset of ism regions with update events.
+\end{description}
+
+If VIRTIO_ISM_F_EVENT_IRQ is negotiated, when the driver receives an interrupt,
+it means that the ism region associated with it has been updated.
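+
+As a non-normative illustration, a driver could process a used eventq buffer of
+length \texttt{len} as in the following sketch; the byte-order helper and the
+callback name are assumptions for this example.
+
+\begin{lstlisting}
+/* Hypothetical sketch: walk one eventq buffer.
+ * le64_to_cpu() follows the Linux convention; ism_region_updated()
+ * is an assumed driver callback. */
+void ism_handle_event(struct virtio_ism_event *ev, size_t len)
+{
+        u64 i, num = le64_to_cpu(ev->num);
+
+        if (len < sizeof(*ev) + num * sizeof(ev->offset[0]))
+                return; /* truncated or malformed buffer */
+
+        for (i = 0; i < num; i++)
+                ism_region_updated(le64_to_cpu(ev->offset[i]));
+}
+\end{lstlisting}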
+
+
+\subsection{Permissions}\label{sec:Device Types / ISM Device / Device Operation / Permission}
+
+The driver can set independent permissions for each ism region, restricting
+which devices may attach it and what read and write access they have after
+attaching.
+
+By default, an ism region can be attached by any device; the driver can instead
+disallow attachment entirely or allow only specified devices to attach.
+
+The driver can set the default read and write permissions that apply after a
+device attaches, and can also set independent read and write permissions for
+particular devices.
+
+A driver that has the management permission of an ism region can modify the
+permissions of that ism region. By default, only the device that created the
+ism region has this permission.
+
+
+\subsection{Device Initialization}\label{sec:Device Types / ISM Device / Device Initialization}
+
+\devicenormative{\subsubsection}{Device Initialization}{Device Types / ISM Device / Device Initialization}
+
+The device MUST generate a \field{dev_id}. \field{dev_id} remains unchanged
+across device reset. \field{dev_id} MUST NOT be 0.
+
+The device shares memory to the guest based on shared memory regions
+\ref{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}.
+However, it does not need to allocate physical memory during initialization.
+
+The \field{shmid} of a region MUST be one of the following
+\begin{lstlisting}
+enum virtio_ism_shm_id {
+        VIRTIO_ISM_SHM_ID_UNDEFINED = 0,
+        VIRTIO_ISM_SHM_ID_REGIONS   = 1,
+        VIRTIO_ISM_SHM_ID_NOTIFY    = 2,
+};
+\end{lstlisting}
+
+The shared memory whose \field{shmid} is VIRTIO_ISM_SHM_ID_REGIONS is used to
+implement ism regions. If there are multiple shared memory regions whose
+\field{shmid} is VIRTIO_ISM_SHM_ID_REGIONS, they are treated as one contiguous
+memory area in the order they are obtained.
+
+If VIRTIO_ISM_F_EVENT_VQ or VIRTIO_ISM_F_EVENT_IRQ is negotiated, the device
+MUST also provide a shared memory region with VIRTIO_ISM_SHM_ID_NOTIFY to the
+driver. This memory area is used for notifications: each ism region MUST have a
+corresponding notify address inside this area, and the size of each notify
+address is \field{notify_size}.
+
+\drivernormative{\subsubsection}{Device Initialization}{Device Types / ISM Device / Device Initialization}
+
+The driver MUST query all shared memory regions supported by the device.
+(see \ref{sec:Basic Facilities of a Virtio Device / Shared Memory Regions})
+
+The driver uses \field{offset} to reference an ism region.
+
+If VIRTIO_ISM_F_EVENT_VQ is negotiated, then the driver MUST initialize eventq
+to get update events for the ism region.
+
+If VIRTIO_ISM_F_EVENT_IRQ is negotiated, the driver MUST set up interrupts to
+receive update events for ism regions, and MUST inform the device of the
+interrupt vector associated with each ism region.
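+
+For illustration only, once the shared memory areas have been mapped, a driver
+might locate an ism region and its notify address as sketched below. The
+per-index layout of notify addresses is an assumption for this example; the
+device only guarantees that each ism region has a notify address of size
+\field{notify_size} inside the VIRTIO_ISM_SHM_ID_NOTIFY area.
+
+\begin{lstlisting}
+/* Hypothetical sketch; regions_base and notify_base are the mapped
+ * VIRTIO_ISM_SHM_ID_REGIONS and VIRTIO_ISM_SHM_ID_NOTIFY areas. */
+void *ism_region_addr(void *regions_base, u64 offset)
+{
+        return (char *)regions_base + offset;
+}
+
+/* Assumption: notify addresses are laid out by region index. */
+void *ism_notify_addr(void *notify_base, u64 offset,
+                      u64 region_size, u64 notify_size)
+{
+        u64 index = offset / region_size;
+
+        return (char *)notify_base + index * notify_size;
+}
+\end{lstlisting}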
+
+\subsection{Control Virtqueue}\label{sec:Device Types / ISM Device / Device Operation / Control Virtqueue}
+
+The driver uses the control virtqueue to send commands that implement operations
+on ism regions and some global configuration.
+
+All commands are of the following form:
+\begin{lstlisting}
+struct virtio_ism_ctrl {
+        u8 class;
+        u8 command;
+        u8 command-specific-data[];
+        u8 ack;
+        u8 command-specific-data-reply[];
+};
+
+/* ack values */
+#define VIRTIO_ISM_OK     0
+#define VIRTIO_ISM_ERR    1
+\end{lstlisting}
+
+The \field{class}, \field{command} and command-specific-data are set by the
+driver, and the device sets the \field{ack} byte and optionally
+\field{command-specific-data-reply}. There is little the driver can
+do except issue a diagnostic if \field{ack} is not VIRTIO_ISM_OK.
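+
+For illustration, a complete request for the alloc command (defined in the next
+subsection) would be laid out on the control virtqueue as in the non-normative
+sketch below; the wrapper structure name is an assumption for this example.
+
+\begin{lstlisting}
+/* Hypothetical sketch: an ALLOC request as it appears on controlq.
+ * The driver-written fields come first, followed by the device-written
+ * ack and reply. */
+struct virtio_ism_ctrl_alloc_req {
+        u8 class;                                  /* VIRTIO_ISM_CTRL_ALLOC */
+        u8 command;                                /* VIRTIO_ISM_CTRL_ALLOC_REGION */
+        struct virtio_ism_ctrl_alloc data;         /* le64 size */
+        u8 ack;                                    /* written by the device */
+        struct virtio_ism_ctrl_alloc_reply reply;  /* token and le64 offset */
+};
+\end{lstlisting}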
+
+\subsection{Device Operation}\label{sec:Device Types / ISM Device / Device Operation}
+
+\subsubsection{Alloc ISM Region}\label{sec:Device Types / ISM Device / Device Operation / Alloc ISM Region}
+
+Based on controlq, the driver can request an ism region to be allocated.
+
+The ism region obtained from the device will carry a token, which can be passed
+to other guests for attaching to this ism region.
+
+\begin{lstlisting}
+
+#define VIRTIO_ISM_TOKEN_SIZE 64
+
+struct virtio_ism_ctrl_alloc {
+       le64 size;
+};
+
+struct virtio_ism_ctrl_alloc_reply {
+       u8 token[VIRTIO_ISM_TOKEN_SIZE];
+       le64 offset;
+};
+
+#define VIRTIO_ISM_CTRL_ALLOC  0
+ #define VIRTIO_ISM_CTRL_ALLOC_REGION 0
+\end{lstlisting}
+
+
+\devicenormative{\subparagraph}{Alloc ISM Region}{Device Types / ISM Device / Device Operation / Alloc ISM Region}
+
+The device sets \field{ack} to VIRTIO_ISM_OK after successfully assigning the
+physical ism region. At the same time, a new token MUST be dynamically created
+for this ism region. \field{offset} is the location of this ism region in shared
+memory.
+
+If there is no free area of the shared memory space, the device MUST set
+\field{ack} to VIRTIO_ISM_ERR.
+
+If new physical memory cannot be allocated, the device MUST set
+\field{ack} to VIRTIO_ISM_ERR.
+
+The device MUST clear the new ism region before committing it to the guest.
+
+If \field{size} is greater than \field{region_size}, the device MUST set
+\field{ack} to VIRTIO_ISM_ERR.
+
+If \field{size} is smaller than \field{region_size}, the ism region also
+occupies \field{region_size} in the shared memory space.
+
+\drivernormative{\subparagraph}{Alloc ISM Region}{Device Types / ISM Device / Device Operation / Alloc ISM Region}
+
+After the alloc request is successful, the driver MUST only use the range
+\field{offset} to \field{offset} + \field{size} - 1.
+
+\subsubsection{Attach ISM Region}\label{sec:Device Types / ISM Device / Device Operation / Attach ISM Region}
+
+Based on controlq, the driver can request to attach an ism region with a
+specified token.
+
+\begin{lstlisting}
+struct virtio_ism_ctrl_attach {
+       u8 token[VIRTIO_ISM_TOKEN_SIZE];
+};
+
+struct virtio_ism_ctrl_attach_reply {
+       le64 offset;
+};
+
+#define VIRTIO_ISM_CTRL_ATTACH  1
+ #define VIRTIO_ISM_CTRL_ATTACH_REGION 0
+\end{lstlisting}
+\devicenormative{\subparagraph}{Attach ISM Region}{Device Types / ISM Device / Device Operation / Attach ISM Region}
+
+If there is no free area of the shared memory space, the device MUST set
+\field{ack} to VIRTIO_ISM_ERR.
+
+If the ism region specified by \field{token} does not exist, the device MUST set
+\field{ack} to VIRTIO_ISM_ERR.
+
+After the attach operation, an ism region can ONLY be shared between these two
+guests. Even if one of them detaches, as long as the ism region is not
+completely released, it can only be re-attached by the previous guests and
+cannot be shared with other guests.
+
+\subsubsection{Detach ISM Region}\label{sec:Device Types / ISM Device / Device Operation / Detach ISM Region}
+Based on controlq, the driver can release its reference to an ism region.
+
+\begin{lstlisting}
+struct virtio_ism_ctrl_detach {
+       le64 offset;
+};
+
+#define VIRTIO_ISM_CTRL_DETACH  2
+ #define VIRTIO_ISM_CTRL_DETACH_REGION 0
+\end{lstlisting}
+
+\devicenormative{\subparagraph}{Detach ISM Region}{Device Types / ISM Device / Device Operation / Detach ISM Region}
+
+If the location specified by \field{offset} is not assigned an ism region,
+the device MUST set \field{ack} to VIRTIO_ISM_ERR.
+
+The device MUST release the physical memory of the ism region specified by
+\field{offset} from the guest.
+
+The device can only fully release an ism region after all devices have released
+references to the ism region.
+
+\subsubsection{Grant ISM Region}\label{sec:Device Types / ISM Device / Device Operation / Grant ISM Region}
+Based on controlq, the driver can set the access permissions for each ism
+region.
+
+\begin{lstlisting}
+struct virtio_ism_ctrl_grant {
+       le64 offset;
+       le64 peer_dev_id;
+       le64 permissions;
+};
+
+#define VIRTIO_ISM_CTRL_GRANT  3
+ #define VIRTIO_ISM_CTRL_GRANT_SET 0
+
+#define VIRTIO_ISM_PERM_READ       (1 << 0)
+#define VIRTIO_ISM_PERM_WRITE      (1 << 1)
+#define VIRTIO_ISM_PERM_ATTACH     (1 << 2)
+#define VIRTIO_ISM_PERM_MANAGE     (1 << 3)
+#define VIRTIO_ISM_PERM_DENY_OTHER (1 << 4)
+
+\end{lstlisting}
+
+\begin{description}
+\item[VIRTIO_ISM_PERM_READ] read permission
+\item[VIRTIO_ISM_PERM_WRITE] write permission
+\item[VIRTIO_ISM_PERM_ATTACH] attach permission
+\item[VIRTIO_ISM_PERM_MANAGE] Management permission; a device with this
+    permission can modify the permissions of this ism region. By default, only
+    the allocating device has this permission.
+\item[VIRTIO_ISM_PERM_DENY_OTHER] Unspecified devices do not have attach
+    permission.
+
+\end{description}
+
+Permission control is divided into two categories: permissions for a specified
+device, and default permissions that apply when no device is specified.
+
+If \field{peer_dev_id} is 0, it is used to configure the default device
+permissions.
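+
+As a non-normative example, to let only the peer device whose \field{dev_id} is
+5 attach an ism region with read-only access, the driver could issue two grant
+commands as sketched below (one possible interpretation of the flags above):
+
+\begin{lstlisting}
+/* Hypothetical sketch: field values for struct virtio_ism_ctrl_grant. */
+
+/* 1. default permissions: unspecified devices may not attach */
+grant.offset      = region_offset;
+grant.peer_dev_id = 0;
+grant.permissions = VIRTIO_ISM_PERM_DENY_OTHER;
+
+/* 2. device 5 may attach and read, but not write or manage */
+grant.offset      = region_offset;
+grant.peer_dev_id = 5;
+grant.permissions = VIRTIO_ISM_PERM_ATTACH | VIRTIO_ISM_PERM_READ;
+\end{lstlisting}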
+
+\devicenormative{\subparagraph}{Grant ISM Region}{Device Types / ISM Device / Device Operation / Grant ISM Region}
+
+If the location specified by \field{offset} is not assigned an ism region,
+the device MUST set \field{ack} to VIRTIO_ISM_ERR.
+
+The device MUST respond to the driver's request based on the permissions the
+device has.
+
+\subsubsection{Inform Event IRQ Vector}\label{sec:Device Types / ISM Device / Device Operation / Inform Event IRQ Vector}
+
+If VIRTIO_ISM_F_EVENT_IRQ is negotiated, the driver should tell the device which
+interrupt vector to use for event notification of each ism region.
+
+\begin{lstlisting}
+struct virtio_ism_ctrl_irq_vector {
+       le64 offset;
+       le64 vector;
+};
+
+#define VIRTIO_ISM_CTRL_EVENT_VECTOR  4
+ #define VIRTIO_ISM_CTRL_EVENT_VECTOR_SET 0
+\end{lstlisting}
+
+
+\devicenormative{\subparagraph}{Inform Event IRQ Vector}{Device Types / ISM Device / Device Operation / Inform Event IRQ Vector}
+
+The device MUST record the relationship between the ism region and the vector
+reported by the driver, and notify the driver via the corresponding vector
+when the ism region is updated.
+
+
-- 
2.32.0.3.g01195cf9f



* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-17  7:47 [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device Xuan Zhuo
  2022-10-17  7:47 ` [virtio-dev] [PATCH 1/2] Reserve device id for ISM device Xuan Zhuo
  2022-10-17  7:47 ` [PATCH 2/2] virtio-ism: introduce new device virtio-ism Xuan Zhuo
@ 2022-10-17  8:17 ` Jason Wang
  2022-10-17 12:26   ` Xuan Zhuo
                     ` (2 more replies)
  2022-10-18  7:32 ` Jan Kiszka
  3 siblings, 3 replies; 61+ messages in thread
From: Jason Wang @ 2022-10-17  8:17 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

Adding Stefan.


On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Hello everyone,
>
> # Background
>
> Nowadays, there is a common scenario to accelerate communication between
> different VMs and containers, including light weight virtual machine based
> containers. One way to achieve this is to colocate them on the same host.
> However, the performance of inter-VM communication through network stack is not
> optimal and may also waste extra CPU cycles. This scenario has been discussed
> many times, but still no generic solution available [1] [2] [3].
>
> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> We found that by changing the communication channel between VMs from TCP to SMC
> with shared memory, we can achieve superior performance for a common
> socket-based application[5]:
>   - latency reduced by about 50%
>   - throughput increased by about 300%
>   - CPU consumption reduced by about 50%
>
> Since there is no particularly suitable shared memory management solution
> matches the need for SMC(See ## Comparison with existing technology), and virtio
> is the standard for communication in the virtualization world, we want to
> implement a virtio-ism device based on virtio, which can support on-demand
> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> the virtio-ism device need to support:
>
> 1. Dynamic provision: shared memory regions are dynamically allocated and
>    provisioned.
> 2. Multi-region management: the shared memory is divided into regions,
>    and a peer may allocate one or more regions from the same shared memory
>    device.
> 3. Permission control: The permission of each region can be set seperately.

Looks like virtio-ROCE

https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/

and virtio-vhost-user can satisfy the requirement?

>
> # Virtio ism device
>
> ISM devices provide the ability to share memory between different guests on a
> host. A guest's memory got from ism device can be shared with multiple peers at
> the same time. This shared relationship can be dynamically created and released.
>
> The shared memory obtained from the device is divided into multiple ism regions
> for share. ISM device provides a mechanism to notify other ism region referrers
> of content update events.
>
> # Usage (SMC as example)
>
> Maybe there is one of possible use cases:
>
> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>    location of a memory region in the PCI space and a token.
> 2. The ism driver mmap the memory region and return to SMC with the token
> 3. SMC passes the token to the connected peer
> 3. the peer calls the ism driver interface ism_attach_region(token) to
>    get the location of the PCI space of the shared memory
>
>
> # About hot plugging of the ism device
>
>    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>    less scalable operation. So, we don't plan to support it for now.
>
> # Comparison with existing technology
>
> ## ivshmem or ivshmem 2.0 of Qemu
>
>    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>    use this VM, so the security is not enough.
>
>    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>    other VMs that use the ivshmem 2.0 shared memory device, which also does not
>    meet our needs in terms of security.
>
> ## vhost-pci and virtiovhostuser
>
>    Does not support dynamic allocation and therefore not suitable for SMC.

I think this is an implementation issue, we can support VHOST IOTLB
message then the regions could be added/removed on demand.

Thanks

>
> # Design
>
>    This is a structure diagram based on ism sharing between two vms.
>
>     |-------------------------------------------------------------------------------------------------------------|
>     | |------------------------------------------------|       |------------------------------------------------| |
>     | | Guest                                          |       | Guest                                          | |
>     | |                                                |       |                                                | |
>     | |   ----------------                             |       |   ----------------                             | |
>     | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
>     | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
>     | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
>     | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
>     | |    |  |                -------------------     |       |    |  |                --------------------    | |
>     | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
>     | |    |  |                -------------------     |       |    |  |                --------------------    | |
>     | |                                |               |       |                               |                | |
>     | |                                |               |       |                               |                | |
>     | | Qemu                           |               |       | Qemu                          |                | |
>     | |--------------------------------+---------------|       |-------------------------------+----------------| |
>     |                                  |                                                       |                  |
>     |                                  |                                                       |                  |
>     |                                  |------------------------------+------------------------|                  |
>     |                                                                 |                                           |
>     |                                                                 |                                           |
>     |                                                   --------------------------                                |
>     |                                                    | M1 |   | M2 |   | M3 |                                 |
>     |                                                   --------------------------                                |
>     |                                                                                                             |
>     | HOST                                                                                                        |
>     ---------------------------------------------------------------------------------------------------------------
>
> # POC code
>
>    Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>    Qemu:   https://github.com/fengidri/qemu/commits/ism
>
> If there are any problems, please point them out.
>
> Hope to hear from you, thank you.
>
> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> [2] https://dl.acm.org/doi/10.1145/2847562
> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> [4] https://lwn.net/Articles/711071/
> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>
>
> Xuan Zhuo (2):
>   Reserve device id for ISM device
>   virtio-ism: introduce new device virtio-ism
>
>  content.tex    |   3 +
>  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 343 insertions(+)
>  create mode 100644 virtio-ism.tex
>
> --
> 2.32.0.3.g01195cf9f
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>



* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-17  8:17 ` [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device Jason Wang
@ 2022-10-17 12:26   ` Xuan Zhuo
  2022-10-18  6:54     ` Jason Wang
  2022-10-18  3:15   ` dust.li
  2022-10-19  2:34   ` Xuan Zhuo
  2 siblings, 1 reply; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-17 12:26 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> Adding Stefan.
>
>
> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > Hello everyone,
> >
> > # Background
> >
> > Nowadays, there is a common scenario to accelerate communication between
> > different VMs and containers, including light weight virtual machine based
> > containers. One way to achieve this is to colocate them on the same host.
> > However, the performance of inter-VM communication through network stack is not
> > optimal and may also waste extra CPU cycles. This scenario has been discussed
> > many times, but still no generic solution available [1] [2] [3].
> >
> > With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > We found that by changing the communication channel between VMs from TCP to SMC
> > with shared memory, we can achieve superior performance for a common
> > socket-based application[5]:
> >   - latency reduced by about 50%
> >   - throughput increased by about 300%
> >   - CPU consumption reduced by about 50%
> >
> > Since there is no particularly suitable shared memory management solution
> > matches the need for SMC(See ## Comparison with existing technology), and virtio
> > is the standard for communication in the virtualization world, we want to
> > implement a virtio-ism device based on virtio, which can support on-demand
> > memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > the virtio-ism device need to support:
> >
> > 1. Dynamic provision: shared memory regions are dynamically allocated and
> >    provisioned.
> > 2. Multi-region management: the shared memory is divided into regions,
> >    and a peer may allocate one or more regions from the same shared memory
> >    device.
> > 3. Permission control: The permission of each region can be set seperately.
>
> Looks like virtio-ROCE
>
> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
>
> and virtio-vhost-user can satisfy the requirement?
>
> >
> > # Virtio ism device
> >
> > ISM devices provide the ability to share memory between different guests on a
> > host. A guest's memory got from ism device can be shared with multiple peers at
> > the same time. This shared relationship can be dynamically created and released.
> >
> > The shared memory obtained from the device is divided into multiple ism regions
> > for share. ISM device provides a mechanism to notify other ism region referrers
> > of content update events.
> >
> > # Usage (SMC as example)
> >
> > Maybe there is one of possible use cases:
> >
> > 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> >    location of a memory region in the PCI space and a token.
> > 2. The ism driver mmap the memory region and return to SMC with the token
> > 3. SMC passes the token to the connected peer
> > 3. the peer calls the ism driver interface ism_attach_region(token) to
> >    get the location of the PCI space of the shared memory
> >
> >
> > # About hot plugging of the ism device
> >
> >    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> >    less scalable operation. So, we don't plan to support it for now.
> >
> > # Comparison with existing technology
> >
> > ## ivshmem or ivshmem 2.0 of Qemu
> >
> >    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> >    use this VM, so the security is not enough.
> >
> >    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> >    other VMs that use the ivshmem 2.0 shared memory device, which also does not
> >    meet our needs in terms of security.
> >
> > ## vhost-pci and virtiovhostuser
> >
> >    Does not support dynamic allocation and therefore not suitable for SMC.
>
> I think this is an implementation issue, we can support VHOST IOTLB
> message then the regions could be added/removed on demand.


1. After the attacker connects with the victim, if the attacker never drops its
   reference, the memory stays occupied under virtio-vhost-user. In the case of
   ism devices, the victim can directly release its own reference, and the
   maliciously referenced region only occupies the attacker's resources.

2. The ism device of a VM can be shared with many (1000+) VMs at the same
   time, which is a challenge for virtio-vhost-user.

3. The sharing relationships of ism are added dynamically, while
   virtio-vhost-user determines the sharing relationship at startup.

4. Regarding security, the device under virtio-vhost-user may mmap more memory,
   while ism only maps one region to other devices.

Thanks.

>
> Thanks
>
> >
> > # Design
> >
> >    This is a structure diagram based on ism sharing between two vms.
> >
> >     |-------------------------------------------------------------------------------------------------------------|
> >     | |------------------------------------------------|       |------------------------------------------------| |
> >     | | Guest                                          |       | Guest                                          | |
> >     | |                                                |       |                                                | |
> >     | |   ----------------                             |       |   ----------------                             | |
> >     | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> >     | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> >     | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> >     | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> >     | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> >     | |                                |               |       |                               |                | |
> >     | |                                |               |       |                               |                | |
> >     | | Qemu                           |               |       | Qemu                          |                | |
> >     | |--------------------------------+---------------|       |-------------------------------+----------------| |
> >     |                                  |                                                       |                  |
> >     |                                  |                                                       |                  |
> >     |                                  |------------------------------+------------------------|                  |
> >     |                                                                 |                                           |
> >     |                                                                 |                                           |
> >     |                                                   --------------------------                                |
> >     |                                                    | M1 |   | M2 |   | M3 |                                 |
> >     |                                                   --------------------------                                |
> >     |                                                                                                             |
> >     | HOST                                                                                                        |
> >     ---------------------------------------------------------------------------------------------------------------
> >
> > # POC code
> >
> >    Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> >    Qemu:   https://github.com/fengidri/qemu/commits/ism
> >
> > If there are any problems, please point them out.
> >
> > Hope to hear from you, thank you.
> >
> > [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > [2] https://dl.acm.org/doi/10.1145/2847562
> > [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > [4] https://lwn.net/Articles/711071/
> > [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> >
> >
> > Xuan Zhuo (2):
> >   Reserve device id for ISM device
> >   virtio-ism: introduce new device virtio-ism
> >
> >  content.tex    |   3 +
> >  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 343 insertions(+)
> >  create mode 100644 virtio-ism.tex
> >
> > --
> > 2.32.0.3.g01195cf9f
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >
>




* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-17  8:17 ` [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device Jason Wang
  2022-10-17 12:26   ` Xuan Zhuo
@ 2022-10-18  3:15   ` dust.li
  2022-10-18  7:29     ` Jason Wang
  2022-10-19  2:34   ` Xuan Zhuo
  2 siblings, 1 reply; 61+ messages in thread
From: dust.li @ 2022-10-18  3:15 UTC (permalink / raw)
  To: Jason Wang, Xuan Zhuo
  Cc: virtio-dev, hans, herongguang, zmlcc, tonylu, zhenzao, helinguo,
	gerry, mst, cohuck, Stefan Hajnoczi

On Mon, Oct 17, 2022 at 04:17:31PM +0800, Jason Wang wrote:
>Adding Stefan.
>
>
>On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>
>> Hello everyone,
>>
>> # Background
>>
>> Nowadays, there is a common scenario to accelerate communication between
>> different VMs and containers, including light weight virtual machine based
>> containers. One way to achieve this is to colocate them on the same host.
>> However, the performance of inter-VM communication through network stack is not
>> optimal and may also waste extra CPU cycles. This scenario has been discussed
>> many times, but still no generic solution available [1] [2] [3].
>>
>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
>> We found that by changing the communication channel between VMs from TCP to SMC
>> with shared memory, we can achieve superior performance for a common
>> socket-based application[5]:
>>   - latency reduced by about 50%
>>   - throughput increased by about 300%
>>   - CPU consumption reduced by about 50%
>>
>> Since there is no particularly suitable shared memory management solution
>> matches the need for SMC(See ## Comparison with existing technology), and virtio
>> is the standard for communication in the virtualization world, we want to
>> implement a virtio-ism device based on virtio, which can support on-demand
>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
>> the virtio-ism device need to support:
>>
>> 1. Dynamic provision: shared memory regions are dynamically allocated and
>>    provisioned.
>> 2. Multi-region management: the shared memory is divided into regions,
>>    and a peer may allocate one or more regions from the same shared memory
>>    device.
>> 3. Permission control: The permission of each region can be set seperately.
>
>Looks like virtio-ROCE
>
>https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/

Thanks for your reply!

Yes, RoCE is OK for SMC and can support all those features.
And SMC already supports RoCE now.

The biggest advantage of virtio-ism compared to RoCE is performance
when 2 VMs are on the same host. With RoCE, the RDMA device still needs
to do a memory copy to transfer the data from one VM to another, regardless
of whether the device is implemented in software or hardware.
But with this virtio-ism device, the memory can be truly shared between
2 VMs, and no memory copy is needed in the datapath.


>
>and virtio-vhost-user can satisfy the requirement?

Xuan Zhuo has already listed the reasons, but I want to say something
more about that.

We thought about virtio-vhost-user before, and I think the biggest
difference between virtio-vhost-user and the virtio-ism device is where
the shared memory comes from.

IIUC, with virtio-vhost-user, the shared memory belongs to the front-end
VM and is mapped into the back-end VM. But with the virtio-ism device, the
shared memory comes from the device and is mapped into both VMs.

So, with virtio-vhost-user, if the front-end VM wants to disconnect from
the back-end VM, it has no way to do it. If the front-end VM has
disconnected and released its reference to the shared memory, but the
back-end VM hasn't (intentionally or unintentionally), the front-end VM
cannot reuse that memory. This creates a big security hole.

With virtio-ism, we can avoid that by using a backend server to account
for the shared memory usage of each VM. Since the shared memory belongs
to the device, any VM that has released its reference to the shared
memory will no longer be accounted for, and thus can allocate new memory
from the device.

Thanks.

>
>>
>> # Virtio ism device
>>
>> ISM devices provide the ability to share memory between different guests on a
>> host. A guest's memory got from ism device can be shared with multiple peers at
>> the same time. This shared relationship can be dynamically created and released.
>>
>> The shared memory obtained from the device is divided into multiple ism regions
>> for share. ISM device provides a mechanism to notify other ism region referrers
>> of content update events.
>>
>> # Usage (SMC as example)
>>
>> Maybe there is one of possible use cases:
>>
>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>>    location of a memory region in the PCI space and a token.
>> 2. The ism driver mmap the memory region and return to SMC with the token
>> 3. SMC passes the token to the connected peer
>> 3. the peer calls the ism driver interface ism_attach_region(token) to
>>    get the location of the PCI space of the shared memory
>>
>>
>> # About hot plugging of the ism device
>>
>>    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>>    less scalable operation. So, we don't plan to support it for now.
>>
>> # Comparison with existing technology
>>
>> ## ivshmem or ivshmem 2.0 of Qemu
>>
>>    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>>    use this VM, so the security is not enough.
>>
>>    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>>    other VMs that use the ivshmem 2.0 shared memory device, which also does not
>>    meet our needs in terms of security.
>>
>> ## vhost-pci and virtiovhostuser
>>
>>    Does not support dynamic allocation and therefore not suitable for SMC.
>
>I think this is an implementation issue, we can support VHOST IOTLB
>message then the regions could be added/removed on demand.
>
>Thanks
>
>>
>> # Design
>>
>>    This is a structure diagram based on ism sharing between two vms.
>>
>>     |-------------------------------------------------------------------------------------------------------------|
>>     | |------------------------------------------------|       |------------------------------------------------| |
>>     | | Guest                                          |       | Guest                                          | |
>>     | |                                                |       |                                                | |
>>     | |   ----------------                             |       |   ----------------                             | |
>>     | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
>>     | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
>>     | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
>>     | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
>>     | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>     | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
>>     | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>     | |                                |               |       |                               |                | |
>>     | |                                |               |       |                               |                | |
>>     | | Qemu                           |               |       | Qemu                          |                | |
>>     | |--------------------------------+---------------|       |-------------------------------+----------------| |
>>     |                                  |                                                       |                  |
>>     |                                  |                                                       |                  |
>>     |                                  |------------------------------+------------------------|                  |
>>     |                                                                 |                                           |
>>     |                                                                 |                                           |
>>     |                                                   --------------------------                                |
>>     |                                                    | M1 |   | M2 |   | M3 |                                 |
>>     |                                                   --------------------------                                |
>>     |                                                                                                             |
>>     | HOST                                                                                                        |
>>     ---------------------------------------------------------------------------------------------------------------
>>
>> # POC code
>>
>>    Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>>    Qemu:   https://github.com/fengidri/qemu/commits/ism
>>
>> If there are any problems, please point them out.
>>
>> Hope to hear from you, thank you.
>>
>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>> [2] https://dl.acm.org/doi/10.1145/2847562
>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>> [4] https://lwn.net/Articles/711071/
>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>>
>>
>> Xuan Zhuo (2):
>>   Reserve device id for ISM device
>>   virtio-ism: introduce new device virtio-ism
>>
>>  content.tex    |   3 +
>>  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 343 insertions(+)
>>  create mode 100644 virtio-ism.tex
>>
>> --
>> 2.32.0.3.g01195cf9f
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-17 12:26   ` Xuan Zhuo
@ 2022-10-18  6:54     ` Jason Wang
  2022-10-18  8:33       ` Gerry
                         ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Jason Wang @ 2022-10-18  6:54 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > Adding Stefan.
> >
> >
> > On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > Hello everyone,
> > >
> > > # Background
> > >
> > > Nowadays, there is a common scenario to accelerate communication between
> > > different VMs and containers, including light weight virtual machine based
> > > containers. One way to achieve this is to colocate them on the same host.
> > > However, the performance of inter-VM communication through network stack is not
> > > optimal and may also waste extra CPU cycles. This scenario has been discussed
> > > many times, but still no generic solution available [1] [2] [3].
> > >
> > > With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > > We found that by changing the communication channel between VMs from TCP to SMC
> > > with shared memory, we can achieve superior performance for a common
> > > socket-based application[5]:
> > >   - latency reduced by about 50%
> > >   - throughput increased by about 300%
> > >   - CPU consumption reduced by about 50%
> > >
> > > Since there is no particularly suitable shared memory management solution
> > > matches the need for SMC(See ## Comparison with existing technology), and virtio
> > > is the standard for communication in the virtualization world, we want to
> > > implement a virtio-ism device based on virtio, which can support on-demand
> > > memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > > the virtio-ism device need to support:
> > >
> > > 1. Dynamic provision: shared memory regions are dynamically allocated and
> > >    provisioned.
> > > 2. Multi-region management: the shared memory is divided into regions,
> > >    and a peer may allocate one or more regions from the same shared memory
> > >    device.
> > > 3. Permission control: The permission of each region can be set seperately.
> >
> > Looks like virtio-ROCE
> >
> > https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> >
> > and virtio-vhost-user can satisfy the requirement?
> >
> > >
> > > # Virtio ism device
> > >
> > > ISM devices provide the ability to share memory between different guests on a
> > > host. A guest's memory got from ism device can be shared with multiple peers at
> > > the same time. This shared relationship can be dynamically created and released.
> > >
> > > The shared memory obtained from the device is divided into multiple ism regions
> > > for share. ISM device provides a mechanism to notify other ism region referrers
> > > of content update events.
> > >
> > > # Usage (SMC as example)
> > >
> > > Maybe there is one of possible use cases:
> > >
> > > 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> > >    location of a memory region in the PCI space and a token.
> > > 2. The ism driver mmap the memory region and return to SMC with the token
> > > 3. SMC passes the token to the connected peer
> > > 3. the peer calls the ism driver interface ism_attach_region(token) to
> > >    get the location of the PCI space of the shared memory
> > >
> > >
> > > # About hot plugging of the ism device
> > >
> > >    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> > >    less scalable operation. So, we don't plan to support it for now.
> > >
> > > # Comparison with existing technology
> > >
> > > ## ivshmem or ivshmem 2.0 of Qemu
> > >
> > >    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> > >    use this VM, so the security is not enough.
> > >
> > >    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> > >    other VMs that use the ivshmem 2.0 shared memory device, which also does not
> > >    meet our needs in terms of security.
> > >
> > > ## vhost-pci and virtiovhostuser
> > >
> > >    Does not support dynamic allocation and therefore not suitable for SMC.
> >
> > I think this is an implementation issue, we can support VHOST IOTLB
> > message then the regions could be added/removed on demand.
>
>
> 1. After the attacker connects with the victim, if the attacker does not
>    dereference memory, the memory will be occupied under virtiovhostuser. In the
>    case of ism devices, the victim can directly release the reference, and the
>    maliciously referenced region only occupies the attacker's resources

Let's define the security boundary here. E.g. do we trust the device or
not? If yes, in the case of virtiovhostuser, can we simply do
VHOST_IOTLB_UNMAP and then safely release the memory from the
attacker?
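
For context, the VHOST_IOTLB_MAP/UNMAP operations referred to here presumably
correspond to the VHOST_IOTLB_UPDATE and VHOST_IOTLB_INVALIDATE message types
of the Linux vhost uAPI. A minimal sketch of revoking a single shared region
that way (the addresses are placeholders, not part of the proposal):

    #include <linux/vhost.h>   /* struct vhost_iotlb_msg, VHOST_IOTLB_* */

    /* Sketch only: revoke one previously shared region by invalidating
     * exactly that IOVA range rather than the whole guest mapping. */
    static void revoke_region(struct vhost_iotlb_msg *msg,
                              __u64 region_iova, __u64 region_size)
    {
            msg->iova = region_iova;          /* start of the one region */
            msg->size = region_size;          /* only this region */
            msg->type = VHOST_IOTLB_INVALIDATE;
            /* msg is then carried in a vhost/vhost-user IOTLB message */
    }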

>
> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
>    time, which is a challenge for virtiovhostuser

Please elaborate more on the challenges; what makes
virtiovhostuser different?

>
> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
>    determines the sharing relationship at startup.

Not necessarily with IOTLB API?

>
> 4. For security issues, the device under virtiovhostuser may mmap more memory,
>    while ism only maps one region to other devices

With VHOST_IOTLB_MAP, the map could be done per region.

Thanks

>
> Thanks.
>
> >
> > Thanks
> >
> > >
> > > # Design
> > >
> > >    This is a structure diagram based on ism sharing between two vms.
> > >
> > >     |-------------------------------------------------------------------------------------------------------------|
> > >     | |------------------------------------------------|       |------------------------------------------------| |
> > >     | | Guest                                          |       | Guest                                          | |
> > >     | |                                                |       |                                                | |
> > >     | |   ----------------                             |       |   ----------------                             | |
> > >     | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> > >     | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> > >     | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> > >     | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > >     | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > >     | |                                |               |       |                               |                | |
> > >     | |                                |               |       |                               |                | |
> > >     | | Qemu                           |               |       | Qemu                          |                | |
> > >     | |--------------------------------+---------------|       |-------------------------------+----------------| |
> > >     |                                  |                                                       |                  |
> > >     |                                  |                                                       |                  |
> > >     |                                  |------------------------------+------------------------|                  |
> > >     |                                                                 |                                           |
> > >     |                                                                 |                                           |
> > >     |                                                   --------------------------                                |
> > >     |                                                    | M1 |   | M2 |   | M3 |                                 |
> > >     |                                                   --------------------------                                |
> > >     |                                                                                                             |
> > >     | HOST                                                                                                        |
> > >     ---------------------------------------------------------------------------------------------------------------
> > >
> > > # POC code
> > >
> > >    Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > >    Qemu:   https://github.com/fengidri/qemu/commits/ism
> > >
> > > If there are any problems, please point them out.
> > >
> > > Hope to hear from you, thank you.
> > >
> > > [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > > [2] https://dl.acm.org/doi/10.1145/2847562
> > > [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > > [4] https://lwn.net/Articles/711071/
> > > [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> > >
> > >
> > > Xuan Zhuo (2):
> > >   Reserve device id for ISM device
> > >   virtio-ism: introduce new device virtio-ism
> > >
> > >  content.tex    |   3 +
> > >  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 343 insertions(+)
> > >  create mode 100644 virtio-ism.tex
> > >
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-18  3:15   ` dust.li
@ 2022-10-18  7:29     ` Jason Wang
  0 siblings, 0 replies; 61+ messages in thread
From: Jason Wang @ 2022-10-18  7:29 UTC (permalink / raw)
  To: dust.li, Xuan Zhuo
  Cc: virtio-dev, hans, herongguang, zmlcc, tonylu, zhenzao, helinguo,
	gerry, mst, cohuck, Stefan Hajnoczi, Xie Yongji


On 2022/10/18 11:15, dust.li wrote:
> On Mon, Oct 17, 2022 at 04:17:31PM +0800, Jason Wang wrote:
>> Adding Stefan.
>>
>>
>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>> Hello everyone,
>>>
>>> # Background
>>>
>>> Nowadays, there is a common scenario to accelerate communication between
>>> different VMs and containers, including light weight virtual machine based
>>> containers. One way to achieve this is to colocate them on the same host.
>>> However, the performance of inter-VM communication through network stack is not
>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
>>> many times, but still no generic solution available [1] [2] [3].
>>>
>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
>>> We found that by changing the communication channel between VMs from TCP to SMC
>>> with shared memory, we can achieve superior performance for a common
>>> socket-based application[5]:
>>>    - latency reduced by about 50%
>>>    - throughput increased by about 300%
>>>    - CPU consumption reduced by about 50%
>>>
>>> Since there is no particularly suitable shared memory management solution
>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
>>> is the standard for communication in the virtualization world, we want to
>>> implement a virtio-ism device based on virtio, which can support on-demand
>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
>>> the virtio-ism device need to support:
>>>
>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
>>>     provisioned.
>>> 2. Multi-region management: the shared memory is divided into regions,
>>>     and a peer may allocate one or more regions from the same shared memory
>>>     device.
>>> 3. Permission control: The permission of each region can be set seperately.
>> Looks like virtio-ROCE
>>
>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> Thanks for your reply!
>
> Yes, RoCE is OK for SMC and can support all those features.
> And SMC already supports RoCE now.
>
> The biggest advantage of virtio-ism compared to RoCE is performance
> when 2 VMs are on the same host. With RoCE, the RDMA device still needs
> to do a memory copy to transfer the data from one VM to another, regardless
> of whether the device is implemented in software or hardware.
> But with this virtio-ism device, the memory can be truly shared between
> 2 VMs, and no memory copy is needed in the datapath.


Adding Yong Ji for more thoughts.


>
>
>> and virtio-vhost-user can satisfy the requirement?
> XuanZhuo has already listed the reasons, but I want to say something
> more about that.
>
> We thought about virtio-vhost-user before, and I think the biggest
> difference between virtio-vhost-user and the virtio-ism device is where
> the shared memory comes from.
>
> IIUC, with virtio-vhost-user, the shared memory belongs to the front-end
> VM and is mapped to the backend VM. But with the virtio-ism device, the shared
> memory is from the device and is mapped to both VMs.


It doesn't differ from the view of the host (qemu), does it? Even if it does,
it should not be hard to mandate that virtio-vhost-user use memory belonging
to the device.


>
> So, with virtio-vhost-user, if the front-end VM wants to disconnect from
> the back-end VM, it has no way to do it. If the front-end VM has
> disconnected and released its reference to the shared memory, but the
> back-end VM hasn't (intentionally or unintentionally), the front-end VM
> cannot reuse that memory.


This can be mandated by the hypervisor (Qemu), can't it?

Thanks


> This creates a big security hole.
>
> With virtio-ism, we can avoid that by using a backend server to account for
> the shared memory usage of each VM. Since the shared memory belongs
> to the device, any VM that has released its reference to the shared
> memory will no longer be accounted for, and thus can allocate new memory
> from the device.
>
> Thanks.
>
>>> # Virtio ism device
>>>
>>> ISM devices provide the ability to share memory between different guests on a
>>> host. A guest's memory got from ism device can be shared with multiple peers at
>>> the same time. This shared relationship can be dynamically created and released.
>>>
>>> The shared memory obtained from the device is divided into multiple ism regions
>>> for share. ISM device provides a mechanism to notify other ism region referrers
>>> of content update events.
>>>
>>> # Usage (SMC as example)
>>>
>>> Maybe there is one of possible use cases:
>>>
>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>>>     location of a memory region in the PCI space and a token.
>>> 2. The ism driver mmap the memory region and return to SMC with the token
>>> 3. SMC passes the token to the connected peer
>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
>>>     get the location of the PCI space of the shared memory
>>>
>>>
>>> # About hot plugging of the ism device
>>>
>>>     Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>>>     less scalable operation. So, we don't plan to support it for now.
>>>
>>> # Comparison with existing technology
>>>
>>> ## ivshmem or ivshmem 2.0 of Qemu
>>>
>>>     1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>>>     use this VM, so the security is not enough.
>>>
>>>     2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>>>     other VMs that use the ivshmem 2.0 shared memory device, which also does not
>>>     meet our needs in terms of security.
>>>
>>> ## vhost-pci and virtiovhostuser
>>>
>>>     Does not support dynamic allocation and therefore not suitable for SMC.
>> I think this is an implementation issue, we can support VHOST IOTLB
>> message then the regions could be added/removed on demand.
>>
>> Thanks
>>
>>> # Design
>>>
>>>     This is a structure diagram based on ism sharing between two vms.
>>>
>>>      |-------------------------------------------------------------------------------------------------------------|
>>>      | |------------------------------------------------|       |------------------------------------------------| |
>>>      | | Guest                                          |       | Guest                                          | |
>>>      | |                                                |       |                                                | |
>>>      | |   ----------------                             |       |   ----------------                             | |
>>>      | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
>>>      | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
>>>      | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
>>>      | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
>>>      | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>      | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
>>>      | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>      | |                                |               |       |                               |                | |
>>>      | |                                |               |       |                               |                | |
>>>      | | Qemu                           |               |       | Qemu                          |                | |
>>>      | |--------------------------------+---------------|       |-------------------------------+----------------| |
>>>      |                                  |                                                       |                  |
>>>      |                                  |                                                       |                  |
>>>      |                                  |------------------------------+------------------------|                  |
>>>      |                                                                 |                                           |
>>>      |                                                                 |                                           |
>>>      |                                                   --------------------------                                |
>>>      |                                                    | M1 |   | M2 |   | M3 |                                 |
>>>      |                                                   --------------------------                                |
>>>      |                                                                                                             |
>>>      | HOST                                                                                                        |
>>>      ---------------------------------------------------------------------------------------------------------------
>>>
>>> # POC code
>>>
>>>     Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>>>     Qemu:   https://github.com/fengidri/qemu/commits/ism
>>>
>>> If there are any problems, please point them out.
>>>
>>> Hope to hear from you, thank you.
>>>
>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>>> [2] https://dl.acm.org/doi/10.1145/2847562
>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>>> [4] https://lwn.net/Articles/711071/
>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>>>
>>>
>>> Xuan Zhuo (2):
>>>    Reserve device id for ISM device
>>>    virtio-ism: introduce new device virtio-ism
>>>
>>>   content.tex    |   3 +
>>>   virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 343 insertions(+)
>>>   create mode 100644 virtio-ism.tex
>>>
>>> --
>>> 2.32.0.3.g01195cf9f
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-17  7:47 [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device Xuan Zhuo
                   ` (2 preceding siblings ...)
  2022-10-17  8:17 ` [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device Jason Wang
@ 2022-10-18  7:32 ` Jan Kiszka
  2022-11-14 21:30   ` Jan Kiszka
  3 siblings, 1 reply; 61+ messages in thread
From: Jan Kiszka @ 2022-10-18  7:32 UTC (permalink / raw)
  To: Xuan Zhuo, virtio-dev
  Cc: hans, herongguang, zmlcc, dust.li, tonylu, zhenzao, helinguo,
	gerry, mst, cohuck, jasowang

On 17.10.22 09:47, Xuan Zhuo wrote:
> Hello everyone,
> 
> # Background
> 
> Nowadays, there is a common scenario to accelerate communication between
> different VMs and containers, including light weight virtual machine based
> containers. One way to achieve this is to colocate them on the same host.
> However, the performance of inter-VM communication through network stack is not
> optimal and may also waste extra CPU cycles. This scenario has been discussed
> many times, but still no generic solution available [1] [2] [3].
> 
> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> We found that by changing the communication channel between VMs from TCP to SMC
> with shared memory, we can achieve superior performance for a common
> socket-based application[5]:
>   - latency reduced by about 50%
>   - throughput increased by about 300%
>   - CPU consumption reduced by about 50%
> 
> Since there is no particularly suitable shared memory management solution
> matches the need for SMC(See ## Comparison with existing technology), and virtio
> is the standard for communication in the virtualization world, we want to
> implement a virtio-ism device based on virtio, which can support on-demand
> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> the virtio-ism device need to support:
> 
> 1. Dynamic provision: shared memory regions are dynamically allocated and
>    provisioned.
> 2. Multi-region management: the shared memory is divided into regions,
>    and a peer may allocate one or more regions from the same shared memory
>    device.
> 3. Permission control: The permission of each region can be set seperately.
> 
> # Virtio ism device
> 
> ISM devices provide the ability to share memory between different guests on a
> host. A guest's memory got from ism device can be shared with multiple peers at
> the same time. This shared relationship can be dynamically created and released.
> 
> The shared memory obtained from the device is divided into multiple ism regions
> for share. ISM device provides a mechanism to notify other ism region referrers
> of content update events.
> 
> # Usage (SMC as example)
> 
> Maybe there is one of possible use cases:
> 
> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>    location of a memory region in the PCI space and a token.
> 2. The ism driver mmap the memory region and return to SMC with the token
> 3. SMC passes the token to the connected peer
> 3. the peer calls the ism driver interface ism_attach_region(token) to
>    get the location of the PCI space of the shared memory
> 
> 
> # About hot plugging of the ism device
> 
>    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>    less scalable operation. So, we don't plan to support it for now.
> 
> # Comparison with existing technology
> 
> ## ivshmem or ivshmem 2.0 of Qemu
> 
>    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>    use this VM, so the security is not enough.
> 
>    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>    other VMs that use the ivshmem 2.0 shared memory device, which also does not
>    meet our needs in terms of security.

This is addressed by establishing separate links between VMs (modeled
with separate devices). That is a trade-off between simplicity of the
model and convenience, for sure.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-18  6:54     ` Jason Wang
@ 2022-10-18  8:33       ` Gerry
  2022-10-19  3:55         ` Jason Wang
  2022-10-18  8:55       ` He Rongguang
  2022-10-19  6:43       ` Xuan Zhuo
  2 siblings, 1 reply; 61+ messages in thread
From: Gerry @ 2022-10-18  8:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu,
	zhenzao, helinguo, mst, cohuck, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 12813 bytes --]



> On 2022-10-18 at 14:54, Jason Wang <jasowang@redhat.com> wrote:
> 
> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com <mailto:xuanzhuo@linux.alibaba.com>> wrote:
>> 
>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>> Adding Stefan.
>>> 
>>> 
>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>> 
>>>> Hello everyone,
>>>> 
>>>> # Background
>>>> 
>>>> Nowadays, there is a common scenario to accelerate communication between
>>>> different VMs and containers, including light weight virtual machine based
>>>> containers. One way to achieve this is to colocate them on the same host.
>>>> However, the performance of inter-VM communication through network stack is not
>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
>>>> many times, but still no generic solution available [1] [2] [3].
>>>> 
>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
>>>> We found that by changing the communication channel between VMs from TCP to SMC
>>>> with shared memory, we can achieve superior performance for a common
>>>> socket-based application[5]:
>>>>  - latency reduced by about 50%
>>>>  - throughput increased by about 300%
>>>>  - CPU consumption reduced by about 50%
>>>> 
>>>> Since there is no particularly suitable shared memory management solution
>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
>>>> is the standard for communication in the virtualization world, we want to
>>>> implement a virtio-ism device based on virtio, which can support on-demand
>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
>>>> the virtio-ism device need to support:
>>>> 
>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
>>>>   provisioned.
>>>> 2. Multi-region management: the shared memory is divided into regions,
>>>>   and a peer may allocate one or more regions from the same shared memory
>>>>   device.
>>>> 3. Permission control: The permission of each region can be set seperately.
>>> 
>>> Looks like virtio-ROCE
>>> 
>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
>>> 
>>> and virtio-vhost-user can satisfy the requirement?
>>> 
>>>> 
>>>> # Virtio ism device
>>>> 
>>>> ISM devices provide the ability to share memory between different guests on a
>>>> host. A guest's memory got from ism device can be shared with multiple peers at
>>>> the same time. This shared relationship can be dynamically created and released.
>>>> 
>>>> The shared memory obtained from the device is divided into multiple ism regions
>>>> for share. ISM device provides a mechanism to notify other ism region referrers
>>>> of content update events.
>>>> 
>>>> # Usage (SMC as example)
>>>> 
>>>> Maybe there is one of possible use cases:
>>>> 
>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>>>>   location of a memory region in the PCI space and a token.
>>>> 2. The ism driver mmap the memory region and return to SMC with the token
>>>> 3. SMC passes the token to the connected peer
>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
>>>>   get the location of the PCI space of the shared memory
>>>> 
>>>> 
>>>> # About hot plugging of the ism device
>>>> 
>>>>   Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>>>>   less scalable operation. So, we don't plan to support it for now.
>>>> 
>>>> # Comparison with existing technology
>>>> 
>>>> ## ivshmem or ivshmem 2.0 of Qemu
>>>> 
>>>>   1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>>>>   use this VM, so the security is not enough.
>>>> 
>>>>   2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>>>>   other VMs that use the ivshmem 2.0 shared memory device, which also does not
>>>>   meet our needs in terms of security.
>>>> 
>>>> ## vhost-pci and virtiovhostuser
>>>> 
>>>>   Does not support dynamic allocation and therefore not suitable for SMC.
>>> 
>>> I think this is an implementation issue, we can support VHOST IOTLB
>>> message then the regions could be added/removed on demand.
>> 
>> 
>> 1. After the attacker connects with the victim, if the attacker does not
>>   dereference memory, the memory will be occupied under virtiovhostuser. In the
>>   case of ism devices, the victim can directly release the reference, and the
>>   maliciously referenced region only occupies the attacker's resources
> 
> Let's define the security boundary here. E.g do we trust the device or
> not? If yes, in the case of virtiovhostuser, can we simple do
> VHOST_IOTLB_UNMAP then we can safely release the memory from the
> attacker.
Thanks, Jason:)
In our design, there are several roles involved:
1) a virtio-ism-smc front-end driver
2) a Virtio-ism backend device driver and its associated vmm
3) a global device manager
4) a group of remote/peer virtio-ism backend devices/vmms
5) a group of remote/peer virtio-ism-smc front-end drivers

Among these, we treat 1, 2 and 3 as trusted, and 4 and 5 as untrusted.
Because 4 and 5 are untrusted, we can’t guarantee that IOTLB invalidate requests have been executed as expected.
Say, when disconnecting an SMC connection, a malicious peer may ignore the IOTLB invalidation request and keep accessing the shared memory region.

We have considered the IOTLB based design but encountered several issues:
1) It depends on the way guest vm memory is provisioned. We need a memory resource descriptor to support vhost-user IOTLB messages, so we can’t support VMs backed by anonymous memory.
2) Lack of fine-grained access control over the memory resource descriptor. When sending a memory resource descriptor to an untrusted peer, we can’t enforce region-based access control. Memfd supports file-level seal operations, but still lacks region-based permission control. A hugetlbfs-based fd doesn’t support sealing at all.
3) Lack of a reliable way to reclaim granted access permissions from untrusted peers, as stated above.
4) How to implement resource accounting. Say a vm has shared some memory regions from peers, and those peers exited unexpectedly; then that shared memory will be accounted to the victim vm and may cause unexpected OOM.

Based on the above considerations, we adopted another design and introduced the device manager to solve the above issues:
1) The device manager is the owner of the memory buffers.
2) The device manager creates a memfd for each memory buffer/region and configures seals according to the requested access permissions (see the sketch after this list).
3) When a guest vm reclaims a shared memory buffer, the device manager will provision a new memfd to the guest vm, and it will take responsibility for reclaiming the old buffer from the peers and eventually releasing it.
4) Simplified control communication channel. Every guest vm only needs to talk to the device manager and does not need to discover and communicate with other peers.
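
A minimal sketch of step 2 above, using only the standard memfd sealing API;
the region size, seal set and helper name are assumptions for illustration,
not part of the proposal:

    #define _GNU_SOURCE
    #include <sys/mman.h>      /* memfd_create, MFD_ALLOW_SEALING */
    #include <fcntl.h>         /* F_ADD_SEALS, F_SEAL_* */
    #include <unistd.h>        /* ftruncate, close */

    /* Device-manager side: back one ism region with a sealed memfd.
     * Note the seals are file-level, matching the limitation mentioned
     * in issue 2) above: there is no per-region seal inside one fd. */
    static int create_region_fd(size_t region_size, int read_only)
    {
            int seals = F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_SEAL;
            int fd;

            fd = memfd_create("virtio-ism-region", MFD_ALLOW_SEALING);
            if (fd < 0)
                    return -1;
            if (ftruncate(fd, region_size) < 0)
                    goto err;
            if (read_only)
                    seals |= F_SEAL_WRITE;   /* whole fd becomes read-only */
            if (fcntl(fd, F_ADD_SEALS, seals) < 0)
                    goto err;
            return fd;                       /* handed to the guest's VMM to map */
    err:
            close(fd);
            return -1;
    }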

Thanks,
Gerry

> 
>> 
>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
>>   time, which is a challenge for virtiovhostuser
> 
> Please elaborate more the the challenges, anything make
> virtiovhostuser different?
> 
>> 
>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
>>   determines the sharing relationship at startup.
> 
> Not necessarily with IOTLB API?
> 
>> 
>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
>>   while ism only maps one region to other devices
> 
> With VHOST_IOTLB_MAP, the map could be done per region.
> 
> Thanks
> 
>> 
>> Thanks.
>> 
>>> 
>>> Thanks
>>> 
>>>> 
>>>> # Design
>>>> 
>>>>   This is a structure diagram based on ism sharing between two vms.
>>>> 
>>>>    |-------------------------------------------------------------------------------------------------------------|
>>>>    | |------------------------------------------------|       |------------------------------------------------| |
>>>>    | | Guest                                          |       | Guest                                          | |
>>>>    | |                                                |       |                                                | |
>>>>    | |   ----------------                             |       |   ----------------                             | |
>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>    | |                                |               |       |                               |                | |
>>>>    | |                                |               |       |                               |                | |
>>>>    | | Qemu                           |               |       | Qemu                          |                | |
>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
>>>>    |                                  |                                                       |                  |
>>>>    |                                  |                                                       |                  |
>>>>    |                                  |------------------------------+------------------------|                  |
>>>>    |                                                                 |                                           |
>>>>    |                                                                 |                                           |
>>>>    |                                                   --------------------------                                |
>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
>>>>    |                                                   --------------------------                                |
>>>>    |                                                                                                             |
>>>>    | HOST                                                                                                        |
>>>>    ---------------------------------------------------------------------------------------------------------------
>>>> 
>>>> # POC code
>>>> 
>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>>>>   Qemu:   https://github.com/fengidri/qemu/commits/ism
>>>> 
>>>> If there are any problems, please point them out.
>>>> 
>>>> Hope to hear from you, thank you.
>>>> 
>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>>>> [2] https://dl.acm.org/doi/10.1145/2847562
>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>>>> [4] https://lwn.net/Articles/711071/
>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>>>> 
>>>> 
>>>> Xuan Zhuo (2):
>>>>  Reserve device id for ISM device
>>>>  virtio-ism: introduce new device virtio-ism
>>>> 
>>>> content.tex    |   3 +
>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>> 2 files changed, 343 insertions(+)
>>>> create mode 100644 virtio-ism.tex
>>>> 
>>>> --
>>>> 2.32.0.3.g01195cf9f
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>> 
>>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org <mailto:virtio-dev-unsubscribe@lists.oasis-open.org>
>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org <mailto:virtio-dev-help@lists.oasis-open.org>

[-- Attachment #2: Type: text/html, Size: 38738 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-18  6:54     ` Jason Wang
  2022-10-18  8:33       ` Gerry
@ 2022-10-18  8:55       ` He Rongguang
  2022-10-19  4:16         ` Jason Wang
  2022-10-19  6:43       ` Xuan Zhuo
  2 siblings, 1 reply; 61+ messages in thread
From: He Rongguang @ 2022-10-18  8:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-dev, hans, zmlcc, dust.li, tonylu, zhenzao, helinguo,
	gerry, mst, cohuck, Stefan Hajnoczi, Xuan Zhuo



On 2022/10/18 14:54, Jason Wang wrote:
> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>
>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>> Adding Stefan.
>>>
>>>
>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>
>>>> Hello everyone,
>>>>
>>>> # Background
>>>>
>>>> Nowadays, there is a common scenario to accelerate communication between
>>>> different VMs and containers, including light weight virtual machine based
>>>> containers. One way to achieve this is to colocate them on the same host.
>>>> However, the performance of inter-VM communication through network stack is not
>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
>>>> many times, but still no generic solution available [1] [2] [3].
>>>>
>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
>>>> We found that by changing the communication channel between VMs from TCP to SMC
>>>> with shared memory, we can achieve superior performance for a common
>>>> socket-based application[5]:
>>>>    - latency reduced by about 50%
>>>>    - throughput increased by about 300%
>>>>    - CPU consumption reduced by about 50%
>>>>
>>>> Since there is no particularly suitable shared memory management solution
>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
>>>> is the standard for communication in the virtualization world, we want to
>>>> implement a virtio-ism device based on virtio, which can support on-demand
>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
>>>> the virtio-ism device need to support:
>>>>
>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
>>>>     provisioned.
>>>> 2. Multi-region management: the shared memory is divided into regions,
>>>>     and a peer may allocate one or more regions from the same shared memory
>>>>     device.
>>>> 3. Permission control: The permission of each region can be set seperately.
>>>
>>> Looks like virtio-ROCE
>>>
>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
>>>
>>> and virtio-vhost-user can satisfy the requirement?
>>>
>>>>
>>>> # Virtio ism device
>>>>
>>>> ISM devices provide the ability to share memory between different guests on a
>>>> host. A guest's memory got from ism device can be shared with multiple peers at
>>>> the same time. This shared relationship can be dynamically created and released.
>>>>
>>>> The shared memory obtained from the device is divided into multiple ism regions
>>>> for share. ISM device provides a mechanism to notify other ism region referrers
>>>> of content update events.
>>>>
>>>> # Usage (SMC as example)
>>>>
>>>> Maybe there is one of possible use cases:
>>>>
>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>>>>     location of a memory region in the PCI space and a token.
>>>> 2. The ism driver mmap the memory region and return to SMC with the token
>>>> 3. SMC passes the token to the connected peer
>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
>>>>     get the location of the PCI space of the shared memory
>>>>
>>>>
>>>> # About hot plugging of the ism device
>>>>
>>>>     Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>>>>     less scalable operation. So, we don't plan to support it for now.
>>>>
>>>> # Comparison with existing technology
>>>>
>>>> ## ivshmem or ivshmem 2.0 of Qemu
>>>>
>>>>     1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>>>>     use this VM, so the security is not enough.
>>>>
>>>>     2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>>>>     other VMs that use the ivshmem 2.0 shared memory device, which also does not
>>>>     meet our needs in terms of security.
>>>>
>>>> ## vhost-pci and virtiovhostuser
>>>>
>>>>     Does not support dynamic allocation and therefore not suitable for SMC.
>>>
>>> I think this is an implementation issue, we can support VHOST IOTLB
>>> message then the regions could be added/removed on demand.
>>
>>
>> 1. After the attacker connects with the victim, if the attacker does not
>>     dereference memory, the memory will be occupied under virtiovhostuser. In the
>>     case of ism devices, the victim can directly release the reference, and the
>>     maliciously referenced region only occupies the attacker's resources
> 
> Let's define the security boundary here. E.g do we trust the device or
> not? If yes, in the case of virtiovhostuser, can we simple do
> VHOST_IOTLB_UNMAP then we can safely release the memory from the
> attacker.
> 
>>
>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
>>     time, which is a challenge for virtiovhostuser
> 
> Please elaborate more the the challenges, anything make
> virtiovhostuser different?

Hi, besides that, I think there's another distinctive difference between 
virtio-ism+smc and virtiovhostuser: in virtiovhostuser, one end is the 
frontend (virtio-net device) and the other end is the vhost backend, so it's 
a one-frontend-to-one-backend model. In our business scenario, we 
need a dynamic network communication model, in which one end that 
runs for a long time may connect and communicate with a just-booted VM. 
That is, the ends are equal, so there are no frontend or vhost backend 
roles as in vhost, and each end may appear and disappear dynamically, 
not provisioned in advance.

> 
>>
>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
>>     determines the sharing relationship at startup.
> 
> Not necessarily with IOTLB API?
> 
>>
>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
>>     while ism only maps one region to other devices
> 
> With VHOST_IOTLB_MAP, the map could be done per region.
> 
> Thanks
> 
>>
>> Thanks.
>>
>>>
>>> Thanks
>>>
>>>>
>>>> # Design
>>>>
>>>>     This is a structure diagram based on ism sharing between two vms.
>>>>
>>>>      |-------------------------------------------------------------------------------------------------------------|
>>>>      | |------------------------------------------------|       |------------------------------------------------| |
>>>>      | | Guest                                          |       | Guest                                          | |
>>>>      | |                                                |       |                                                | |
>>>>      | |   ----------------                             |       |   ----------------                             | |
>>>>      | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
>>>>      | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
>>>>      | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
>>>>      | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
>>>>      | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>      | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
>>>>      | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>      | |                                |               |       |                               |                | |
>>>>      | |                                |               |       |                               |                | |
>>>>      | | Qemu                           |               |       | Qemu                          |                | |
>>>>      | |--------------------------------+---------------|       |-------------------------------+----------------| |
>>>>      |                                  |                                                       |                  |
>>>>      |                                  |                                                       |                  |
>>>>      |                                  |------------------------------+------------------------|                  |
>>>>      |                                                                 |                                           |
>>>>      |                                                                 |                                           |
>>>>      |                                                   --------------------------                                |
>>>>      |                                                    | M1 |   | M2 |   | M3 |                                 |
>>>>      |                                                   --------------------------                                |
>>>>      |                                                                                                             |
>>>>      | HOST                                                                                                        |
>>>>      ---------------------------------------------------------------------------------------------------------------
>>>>
>>>> # POC code
>>>>
>>>>     Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>>>>     Qemu:   https://github.com/fengidri/qemu/commits/ism
>>>>
>>>> If there are any problems, please point them out.
>>>>
>>>> Hope to hear from you, thank you.
>>>>
>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>>>> [2] https://dl.acm.org/doi/10.1145/2847562
>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>>>> [4] https://lwn.net/Articles/711071/
>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>>>>
>>>>
>>>> Xuan Zhuo (2):
>>>>    Reserve device id for ISM device
>>>>    virtio-ism: introduce new device virtio-ism
>>>>
>>>>   content.tex    |   3 +
>>>>   virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>   2 files changed, 343 insertions(+)
>>>>   create mode 100644 virtio-ism.tex
>>>>
>>>> --
>>>> 2.32.0.3.g01195cf9f
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-17  8:17 ` [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device Jason Wang
  2022-10-17 12:26   ` Xuan Zhuo
  2022-10-18  3:15   ` dust.li
@ 2022-10-19  2:34   ` Xuan Zhuo
  2022-10-19  3:56     ` Jason Wang
  2 siblings, 1 reply; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-19  2:34 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:


Hi Jason,

I think there may be some problems with the direction we are discussing. Our
goal is to add a new ism device. As far as the spec is concerned, we are not
concerned with the implementation of the backend.

The direction we should discuss is what the difference is between the ism device
and other devices such as virtio-net, and whether it is necessary to introduce
this new device. How to share the backend with other devices is another problem.

Our goal is to dynamically obtain a piece of memory to share with other vms.

In a connection, this memory will be used repeatedly. As far as SMC is concerned,
it will use it as a ring. Of course, we also need a notify mechanism.
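
A rough sketch of that flow, using the ism_alloc_region()/ism_attach_region()
interfaces named in the cover letter; the signatures, the token type and the
ism_notify() helper are assumptions for illustration only, not the proposed
spec:

    #include <stddef.h>
    #include <string.h>

    struct ism_token;   /* opaque handle passed to the peer out of band,
                         * e.g. over the existing TCP connection in SMC */

    /* Assumed driver-facing interfaces, not the actual proposal. */
    void *ism_alloc_region(size_t size, struct ism_token *token);
    void *ism_attach_region(const struct ism_token *token);
    void  ism_notify(const struct ism_token *token);

    /* Side A: allocate once, then send the token to the peer. */
    static void *side_a_setup(struct ism_token *token, size_t size)
    {
            return ism_alloc_region(size, token);
    }

    /* Side B: attach the same region using the received token. */
    static void *side_b_setup(const struct ism_token *token)
    {
            return ism_attach_region(token);
    }

    /* Either side reuses the region as a ring and kicks the peer
     * after each update. */
    static void producer_send(struct ism_token *token, void *ring,
                              const void *data, size_t len)
    {
            memcpy(ring, data, len);  /* simplistic: real code tracks a write cursor */
            ism_notify(token);        /* assumed notify mechanism */
    }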

That's what we're aiming for, so we should first discuss whether this
requirement is reasonable. I think it's a feature currently not supported by
other devices specified by the current virtio spec.

Thanks.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-18  8:33       ` Gerry
@ 2022-10-19  3:55         ` Jason Wang
  2022-10-19  5:29           ` Gerry
  0 siblings, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-10-19  3:55 UTC (permalink / raw)
  To: Gerry
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu,
	zhenzao, helinguo, mst, cohuck, Stefan Hajnoczi


On 2022/10/18 16:33, Gerry wrote:
>
>
>> 2022年10月18日 14:54,Jason Wang <jasowang@redhat.com> 写道:
>>
>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo 
>> <xuanzhuo@linux.alibaba.com> wrote:
>>>
>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> 
>>> wrote:
>>>> Adding Stefan.
>>>>
>>>>
>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo 
>>>> <xuanzhuo@linux.alibaba.com> wrote:
>>>>>
>>>>> Hello everyone,
>>>>>
>>>>> # Background
>>>>>
>>>>> Nowadays, there is a common scenario to accelerate communication 
>>>>> between
>>>>> different VMs and containers, including light weight virtual 
>>>>> machine based
>>>>> containers. One way to achieve this is to colocate them on the 
>>>>> same host.
>>>>> However, the performance of inter-VM communication through network 
>>>>> stack is not
>>>>> optimal and may also waste extra CPU cycles. This scenario has 
>>>>> been discussed
>>>>> many times, but still no generic solution available [1] [2] [3].
>>>>>
>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based 
>>>>> PoC[5],
>>>>> We found that by changing the communication channel between VMs 
>>>>> from TCP to SMC
>>>>> with shared memory, we can achieve superior performance for a common
>>>>> socket-based application[5]:
>>>>>  - latency reduced by about 50%
>>>>>  - throughput increased by about 300%
>>>>>  - CPU consumption reduced by about 50%
>>>>>
>>>>> Since there is no particularly suitable shared memory management 
>>>>> solution
>>>>> matches the need for SMC(See ## Comparison with existing 
>>>>> technology), and virtio
>>>>> is the standard for communication in the virtualization world, we 
>>>>> want to
>>>>> implement a virtio-ism device based on virtio, which can support 
>>>>> on-demand
>>>>> memory sharing across VMs, containers or VM-container. To match 
>>>>> the needs of SMC,
>>>>> the virtio-ism device need to support:
>>>>>
>>>>> 1. Dynamic provision: shared memory regions are dynamically 
>>>>> allocated and
>>>>>   provisioned.
>>>>> 2. Multi-region management: the shared memory is divided into regions,
>>>>>   and a peer may allocate one or more regions from the same shared 
>>>>> memory
>>>>>   device.
>>>>> 3. Permission control: The permission of each region can be set 
>>>>> seperately.
>>>>
>>>> Looks like virtio-ROCE
>>>>
>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
>>>>
>>>> and virtio-vhost-user can satisfy the requirement?
>>>>
>>>>>
>>>>> # Virtio ism device
>>>>>
>>>>> ISM devices provide the ability to share memory between different 
>>>>> guests on a
>>>>> host. A guest's memory got from ism device can be shared with 
>>>>> multiple peers at
>>>>> the same time. This shared relationship can be dynamically created 
>>>>> and released.
>>>>>
>>>>> The shared memory obtained from the device is divided into 
>>>>> multiple ism regions
>>>>> for share. ISM device provides a mechanism to notify other ism 
>>>>> region referrers
>>>>> of content update events.
>>>>>
>>>>> # Usage (SMC as example)
>>>>>
>>>>> Maybe there is one of possible use cases:
>>>>>
>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to 
>>>>> return the
>>>>>   location of a memory region in the PCI space and a token.
>>>>> 2. The ism driver mmap the memory region and return to SMC with 
>>>>> the token
>>>>> 3. SMC passes the token to the connected peer
>>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
>>>>>   get the location of the PCI space of the shared memory
>>>>>
>>>>>
>>>>> # About hot plugging of the ism device
>>>>>
>>>>>   Hot plugging of devices is a heavier, possibly failed, 
>>>>> time-consuming, and
>>>>>   less scalable operation. So, we don't plan to support it for now.
>>>>>
>>>>> # Comparison with existing technology
>>>>>
>>>>> ## ivshmem or ivshmem 2.0 of Qemu
>>>>>
>>>>>   1. ivshmem 1.0 is a large piece of memory that can be seen by 
>>>>> all devices that
>>>>>   use this VM, so the security is not enough.
>>>>>
>>>>>   2. ivshmem 2.0 is a shared memory belonging to a VM that can be 
>>>>> read-only by all
>>>>>   other VMs that use the ivshmem 2.0 shared memory device, which 
>>>>> also does not
>>>>>   meet our needs in terms of security.
>>>>>
>>>>> ## vhost-pci and virtiovhostuser
>>>>>
>>>>>   Does not support dynamic allocation and therefore not suitable 
>>>>> for SMC.
>>>>
>>>> I think this is an implementation issue, we can support VHOST IOTLB
>>>> message then the regions could be added/removed on demand.
>>>
>>>
>>> 1. After the attacker connects with the victim, if the attacker does not
>>>   dereference memory, the memory will be occupied under 
>>> virtiovhostuser. In the
>>>   case of ism devices, the victim can directly release the 
>>> reference, and the
>>>   maliciously referenced region only occupies the attacker's resources
>>
>> Let's define the security boundary here. E.g. do we trust the device or
>> not? If yes, in the case of virtiovhostuser, can we simply do
>> VHOST_IOTLB_UNMAP so that we can safely release the memory from the
>> attacker?
> Thanks, Jason:)
> In our design, there are several roles involved:
> 1) a virtio-ism-smc front-end driver
> 2) a Virtio-ism backend device driver and its associated vmm
> 3) a global device manager
> 4) a group of remote/peer virtio-ism backend devices/vmms
> 5) a group of remote/peer virtio-ism-smc front-end drivers
>
> Among them, we treat 1, 2 and 3 as trusted, and 4 and 5 as untrusted.


It looks to me VIRTIO_ISM_PERM_MANAGE violates what you've described 
here. E.g. what happens if 1 grants this permission to 5?


> Because 4 and 5 are untrusted, we can’t guarantee that IOTLB Invalidate 
> requests have been executed as expected.


Interesting, I wonder how this is guaranteed by ISM. Anything that can 
work for ISM but not IOTLB? Note that the only difference for me is the 
device API. We can hook anything that works for ISM to IOTLB.
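
For reference, the vhost IOTLB message being referred to here looks
roughly like the following (per struct vhost_iotlb_msg in
linux/vhost_types.h; shown only as a sketch of what an ISM-style
map/unmap/permission operation would have to carry if it were hooked
to the IOTLB path):

    struct vhost_iotlb_msg {
            __u64 iova;   /* address of the region in the device's I/O space */
            __u64 size;   /* region size                                      */
            __u64 uaddr;  /* backing address in the VMM's address space       */
            __u8  perm;   /* VHOST_ACCESS_RO / _WO / _RW                      */
            __u8  type;   /* VHOST_IOTLB_UPDATE / VHOST_IOTLB_INVALIDATE /... */
    };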


> Say when disconnecting an SMC connection, a malicious peer may ignore 
> the IOTLB invalidation request and keep accessing the shared memory region.
>
> We have considered the IOTLB based design but encountered several issues:
> 1) It depends on the way to provision guest vm memory. We need a 
> memory resource descriptor to support vhost-user IOTLB messages, thus 
> we can’t support anonymous-memory-based VMs.


The hypervisor (Qemu) is free to hook the IOTLB message to any kind of memory 
backend, isn't it? E.g. Qemu can choose to implement IOTLB on its own 
instead of forwarding it to another VM.


> 2) Lack of fine-grained access control for the memory resource descriptor. 
> When sending a memory resource descriptor to an untrusted peer, we can’t 
> enforce region-based access control. Memfd supports file-level seal 
> operations, but still lacks region-based permission control. 
> A hugetlbfs-based fd doesn’t support seals at all.


So in the above, you said 4 and 5 are untrusted. If yes, how can you 
enforce region-based access control (the memory is still mapped by the 
untrusted VMM)? And again, virtio-vhost-user is not limited to 
memfd/hugetlbfs; it can do what you've done in your prototype (hooking to 
/dev/shm).


> 3) Lack of reliable way to reclaim granted access permissions from 
> untrusted peers, as stated above.


It would be better to explain how this "reclaim" works.


> 4) How to implement resource accounting. Say a vm has shared some memory 
> regions from peers, and those peers exited unexpectedly; then that 
> shared memory will be accounted to the victim vm, and may cause 
> unexpected OOM.
>
> Based on the above considerations, we adopted another design and 
> introduced the device manager to solve the above issues:
> 1) the device manager is the owner of memory buffers.


I don't see the definition of "device manager" in your proposal; this needs 
to be clarified in both the spec and the changelog or the cover letter.


> 2) the device manager creates a memfd for each memory buffer/region, 
> and configures SEALs according to the requested access permissions.


Ok, but this seems not what you've implemented in your qemu prototype?


> 3) When a guest vm reclaims a shared memory buffer, the device manager 
> will provision a new memfd to the guest vm.


How can this be done for the untrusted peers?


> And it will take the responsibility to reclaim the old buffer from the 
> peer and eventually release the old buffer.
> 4) Simplify the control communication channel. Every guest vm only 
> needs to talk with the device manager and does not need to discover and 
> communicate with other peers.


Not sure, but it's better not to mandate any model in the application layer.

Thanks


>
> Thanks,
> Gerry
>
>>
>>>
>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at 
>>> the same
>>>   time, which is a challenge for virtiovhostuser
>>
>> Please elaborate more on the challenges, anything that makes
>> virtiovhostuser different?
>>
>>>
>>> 3. The sharing relationship of ism is dynamically increased, and 
>>> virtiovhostuser
>>>   determines the sharing relationship at startup.
>>
>> Not necessarily with IOTLB API?
>>
>>>
>>> 4. For security issues, the device under virtiovhostuser may mmap 
>>> more memory,
>>>   while ism only maps one region to other devices
>>
>> With VHOST_IOTLB_MAP, the map could be done per region.
>>
>> Thanks
>>
>>>
>>> Thanks.
>>>
>>>>
>>>> Thanks
>>>>
>>>>>
>>>>> # Design
>>>>>
>>>>>   This is a structure diagram based on ism sharing between two vms.
>>>>>
>>>>>    |-------------------------------------------------------------------------------------------------------------|
>>>>>    | |------------------------------------------------|       |------------------------------------------------| |
>>>>>    | | Guest                                          |       | Guest                                          | |
>>>>>    | |                                                |       |                                                | |
>>>>>    | |   ----------------                             |       |   ----------------                             | |
>>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
>>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
>>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
>>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>>    | |                                |               |       |                               |                | |
>>>>>    | |                                |               |       |                               |                | |
>>>>>    | | Qemu                           |               |       | Qemu                          |                | |
>>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
>>>>>    |                                  |                                                       |                  |
>>>>>    |                                  |                                                       |                  |
>>>>>    |                                  |------------------------------+------------------------|                  |
>>>>>    |                                                                 |                                           |
>>>>>    |                                                                 |                                           |
>>>>>    |                                                   --------------------------                                |
>>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
>>>>>    |                                                   --------------------------                                |
>>>>>    |                                                                                                             |
>>>>>    | HOST                                                                                                        |
>>>>>    ---------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> # POC code
>>>>>
>>>>>   Kernel: 
>>>>> https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>>>>>   Qemu: https://github.com/fengidri/qemu/commits/ism
>>>>>
>>>>> If there are any problems, please point them out.
>>>>>
>>>>> Hope to hear from you, thank you.
>>>>>
>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>>>>> [4] https://lwn.net/Articles/711071/
>>>>> [5] 
>>>>> https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>>>>>
>>>>>
>>>>> Xuan Zhuo (2):
>>>>>  Reserve device id for ISM device
>>>>>  virtio-ism: introduce new device virtio-ism
>>>>>
>>>>> content.tex    |   3 +
>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> 2 files changed, 343 insertions(+)
>>>>> create mode 100644 virtio-ism.tex
>>>>>
>>>>> --
>>>>> 2.32.0.3.g01195cf9f
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: 
>>>>> virtio-dev-unsubscribe@lists.oasis-open.org 
>>>>> <mailto:virtio-dev-unsubscribe@lists.oasis-open.org>
>>>>> For additional commands, e-mail: 
>>>>> virtio-dev-help@lists.oasis-open.org 
>>>>> <mailto:virtio-dev-help@lists.oasis-open.org>
>>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:virtio-dev-unsubscribe@lists.oasis-open.org
>>> For additional commands, e-mail:virtio-dev-help@lists.oasis-open.org
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  2:34   ` Xuan Zhuo
@ 2022-10-19  3:56     ` Jason Wang
  2022-10-19  4:08       ` Xuan Zhuo
  2022-10-19  4:30       ` Xuan Zhuo
  0 siblings, 2 replies; 61+ messages in thread
From: Jason Wang @ 2022-10-19  3:56 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
>
>
> Hi Jason,
>
> I think there may be some problems with the direction we are discussing.

Probably not.

As long as we are focusing on technology, there's nothing wrong from my
perspective. And this is how the community works. Your idea needs to
be justified, and people are free to raise any technical questions,
especially considering you've posted a spec change with prototype
code, not just the idea.

> Our
> goal is to add a new ism device. As far as the spec is concerned, we are not
> concerned with the implementation of the backend.
>
> The direction we should discuss is what is the difference between the ism device
> and other devices such as virtio-net, and whether it is necessary to introduce
> this new device.

This is somehow what I want to ask, actually it's not a comparison
with virtio-net but:

- virtio-roce
- virtio-vhost-user
- virtio-(p)mem

or whether we can simply add features to those devices to achieve what
you want to do here.

> How to share the backend with other devices is another problem.

Yes, anything that is used for your virtio-ism prototype can be used
for other devices.

>
> Our goal is to dynamically obtain a piece of memory to share with other vms.

So at this level, I don't see the exact difference compared to
virtio-vhost-user. Let's just focus on the API that carries on the
semantic:

- map/unmap
- permission update

The only missing piece is the per region notification.

>
> In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> it will use it as a ring. Of course, we also need a notify mechanism.
>
> That's what we're aiming for, so we should first discuss whether this
> requirement is reasonable.

So unless somebody said "no", it is fine until now.

> I think it's a feature currently not supported by
> other devices specified by the current virtio spec.

Probably, but we've already had rfcs for roce and vhost-user.

Thanks

>
> Thanks.
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  3:56     ` Jason Wang
@ 2022-10-19  4:08       ` Xuan Zhuo
  2022-10-19  4:36         ` Jason Wang
  2022-10-19  4:30       ` Xuan Zhuo
  1 sibling, 1 reply; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-19  4:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > Hi Jason,
> >
> > I think there may be some problems with the direction we are discussing.
>
> Probably not.
>
> As far as we are focusing on technology, there's nothing wrong from my
> perspective. And this is how the community works. Your idea needs to
> be justified and people are free to raise any technical questions
> especially considering you've posted a spec change with prototype
> codes but not only the idea.
>
> > Our
> > goal is to add an new ism device. As far as the spec is concerned, we are not
> > concerned with the implementation of the backend.
> >
> > The direction we should discuss is what is the difference between the ism device
> > and other devices such as virtio-net, and whether it is necessary to introduce
> > this new device.
>
> This is somehow what I want to ask, actually it's not a comparison
> with virtio-net but:
>
> - virtio-roce
> - virtio-vhost-user
> - virtio-(p)mem
>
> or whether we can simply add features to those devices to achieve what
> you want to do here.


Yes, this is my priority to discuss.

At the moment, I think the most similar to ism is the Vhost-user Device Backend
of virtio-vhost-user.

My understanding of it is to map any virtio device to another vm as a vvu
device.

In terms of design purpose, I think the two are different.

Of course, you might want to extend it; it does have some similarities and uses
a lot of similar techniques. So we can indeed discuss in this direction: whether
the vvu device can be extended to achieve the purpose of ism, or whether the
design goals can be agreed upon.

Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
Should device/driver APIs remain independent?

Thanks.


>
> > How to share the backend with other deivce is another problem.
>
> Yes, anything that is used for your virito-ism prototype can be used
> for other devices.
>
> >
> > Our goal is to dynamically obtain a piece of memory to share with other vms.
>
> So at this level, I don't see the exact difference compared to
> virtio-vhost-user. Let's just focus on the API that carries on the
> semantic:
>
> - map/unmap
> - permission update
>
> The only missing piece is the per region notification.
>
> >
> > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > it will use it as a ring. Of course, we also need a notify mechanism.
> >
> > That's what we're aiming for, so we should first discuss whether this
> > requirement is reasonable.
>
> So unless somebody said "no", it is fine until now.
>
> > I think it's a feature currently not supported by
> > other devices specified by the current virtio spce.
>
> Probably, but we've already had rfcs for roce and vhost-user.
>
> Thanks
>
> >
> > Thanks.
> >
> >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-18  8:55       ` He Rongguang
@ 2022-10-19  4:16         ` Jason Wang
  0 siblings, 0 replies; 61+ messages in thread
From: Jason Wang @ 2022-10-19  4:16 UTC (permalink / raw)
  To: He Rongguang
  Cc: virtio-dev, hans, zmlcc, dust.li, tonylu, zhenzao, helinguo,
	gerry, mst, cohuck, Stefan Hajnoczi, Xuan Zhuo


On 2022/10/18 16:55, He Rongguang wrote:
>
>
>> On 2022/10/18 14:54, Jason Wang wrote:
>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo 
>> <xuanzhuo@linux.alibaba.com> wrote:
>>>
>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> 
>>> wrote:
>>>> Adding Stefan.
>>>>
>>>>
>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo 
>>>> <xuanzhuo@linux.alibaba.com> wrote:
>>>>>
>>>>> Hello everyone,
>>>>>
>>>>> # Background
>>>>>
>>>>> Nowadays, there is a common scenario to accelerate communication 
>>>>> between
>>>>> different VMs and containers, including light weight virtual 
>>>>> machine based
>>>>> containers. One way to achieve this is to colocate them on the 
>>>>> same host.
>>>>> However, the performance of inter-VM communication through network 
>>>>> stack is not
>>>>> optimal and may also waste extra CPU cycles. This scenario has 
>>>>> been discussed
>>>>> many times, but still no generic solution available [1] [2] [3].
>>>>>
>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based 
>>>>> PoC[5],
>>>>> We found that by changing the communication channel between VMs 
>>>>> from TCP to SMC
>>>>> with shared memory, we can achieve superior performance for a common
>>>>> socket-based application[5]:
>>>>>    - latency reduced by about 50%
>>>>>    - throughput increased by about 300%
>>>>>    - CPU consumption reduced by about 50%
>>>>>
>>>>> Since there is no particularly suitable shared memory management 
>>>>> solution
>>>>> matches the need for SMC(See ## Comparison with existing 
>>>>> technology), and virtio
>>>>> is the standard for communication in the virtualization world, we 
>>>>> want to
>>>>> implement a virtio-ism device based on virtio, which can support 
>>>>> on-demand
>>>>> memory sharing across VMs, containers or VM-container. To match 
>>>>> the needs of SMC,
>>>>> the virtio-ism device need to support:
>>>>>
>>>>> 1. Dynamic provision: shared memory regions are dynamically 
>>>>> allocated and
>>>>>     provisioned.
>>>>> 2. Multi-region management: the shared memory is divided into 
>>>>> regions,
>>>>>     and a peer may allocate one or more regions from the same 
>>>>> shared memory
>>>>>     device.
>>>>> 3. Permission control: The permission of each region can be set 
>>>>> seperately.
>>>>
>>>> Looks like virtio-ROCE
>>>>
>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/ 
>>>>
>>>>
>>>> and virtio-vhost-user can satisfy the requirement?
>>>>
>>>>>
>>>>> # Virtio ism device
>>>>>
>>>>> ISM devices provide the ability to share memory between different 
>>>>> guests on a
>>>>> host. A guest's memory got from ism device can be shared with 
>>>>> multiple peers at
>>>>> the same time. This shared relationship can be dynamically created 
>>>>> and released.
>>>>>
>>>>> The shared memory obtained from the device is divided into 
>>>>> multiple ism regions
>>>>> for share. ISM device provides a mechanism to notify other ism 
>>>>> region referrers
>>>>> of content update events.
>>>>>
>>>>> # Usage (SMC as example)
>>>>>
>>>>> Maybe there is one of possible use cases:
>>>>>
>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to 
>>>>> return the
>>>>>     location of a memory region in the PCI space and a token.
>>>>> 2. The ism driver mmap the memory region and return to SMC with 
>>>>> the token
>>>>> 3. SMC passes the token to the connected peer
>>>>> 3. the peer calls the ism driver interface 
>>>>> ism_attach_region(token) to
>>>>>     get the location of the PCI space of the shared memory
>>>>>
>>>>>
>>>>> # About hot plugging of the ism device
>>>>>
>>>>>     Hot plugging of devices is a heavier, possibly failed, 
>>>>> time-consuming, and
>>>>>     less scalable operation. So, we don't plan to support it for now.
>>>>>
>>>>> # Comparison with existing technology
>>>>>
>>>>> ## ivshmem or ivshmem 2.0 of Qemu
>>>>>
>>>>>     1. ivshmem 1.0 is a large piece of memory that can be seen by 
>>>>> all devices that
>>>>>     use this VM, so the security is not enough.
>>>>>
>>>>>     2. ivshmem 2.0 is a shared memory belonging to a VM that can 
>>>>> be read-only by all
>>>>>     other VMs that use the ivshmem 2.0 shared memory device, which 
>>>>> also does not
>>>>>     meet our needs in terms of security.
>>>>>
>>>>> ## vhost-pci and virtiovhostuser
>>>>>
>>>>>     Does not support dynamic allocation and therefore not suitable 
>>>>> for SMC.
>>>>
>>>> I think this is an implementation issue, we can support VHOST IOTLB
>>>> message then the regions could be added/removed on demand.
>>>
>>>
>>> 1. After the attacker connects with the victim, if the attacker does 
>>> not
>>>     dereference memory, the memory will be occupied under 
>>> virtiovhostuser. In the
>>>     case of ism devices, the victim can directly release the 
>>> reference, and the
>>>     maliciously referenced region only occupies the attacker's 
>>> resources
>>
>> Let's define the security boundary here. E.g do we trust the device or
>> not? If yes, in the case of virtiovhostuser, can we simple do
>> VHOST_IOTLB_UNMAP then we can safely release the memory from the
>> attacker.
>>
>>>
>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at 
>>> the same
>>>     time, which is a challenge for virtiovhostuser
>>
>> Please elaborate more the the challenges, anything make
>> virtiovhostuser different?
>
> Hi, besides that, I think there's another distinctive difference 
> between virtio-ism+smc and virtiovhostuser: in virtiovhostuser, one 
> end is the frontend (virtio-net device) and the other end is the vhost 
> backend, so it's a one-frontend-to-one-backend model, whereas in our 
> business scenario we need a dynamic network communication model, in 
> which one end that runs for a long time may connect and communicate 
> with a just-booted VM; i.e., each end is equal, there are no frontend 
> or vhost backend roles as in vhost, and each end may appear and 
> disappear dynamically, not provisioned in advance.


Ok, please describe them in the changelog at least.

Note that what I want to say is, virtio-vhost-user could be tweaked to 
achieve the same goal. For the dynamic provision, it could be something 
like having a 0 for the VHOST_IOTLB_UPDATE message (similar to how mmap() 
works). I wonder if we can unify them.

Thanks


>
>>
>>>
>>> 3. The sharing relationship of ism is dynamically increased, and 
>>> virtiovhostuser
>>>     determines the sharing relationship at startup.
>>
>> Not necessarily with IOTLB API?
>>
>>>
>>> 4. For security issues, the device under virtiovhostuser may mmap 
>>> more memory,
>>>     while ism only maps one region to other devices
>>
>> With VHOST_IOTLB_MAP, the map could be done per region.
>>
>> Thanks
>>
>>>
>>> Thanks.
>>>
>>>>
>>>> Thanks
>>>>
>>>>>
>>>>> # Design
>>>>>
>>>>>     This is a structure diagram based on ism sharing between two vms.
>>>>>
>>>>>     |-------------------------------------------------------------------------------------------------------------|
>>>>>     | |------------------------------------------------|       |------------------------------------------------| |
>>>>>     | | Guest                                          |       | Guest                                          | |
>>>>>     | |                                                |       |                                                | |
>>>>>     | |   ----------------                             |       |   ----------------                             | |
>>>>>     | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
>>>>>     | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
>>>>>     | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
>>>>>     | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
>>>>>     | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>>     | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
>>>>>     | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>>     | |                                |               |       |                               |                | |
>>>>>     | |                                |               |       |                               |                | |
>>>>>     | | Qemu                           |               |       | Qemu                          |                | |
>>>>>     | |--------------------------------+---------------|       |-------------------------------+----------------| |
>>>>>     |                                  |                                                       |                  |
>>>>>     |                                  |                                                       |                  |
>>>>>     |                                  |------------------------------+------------------------|                  |
>>>>>     |                                                                 |                                           |
>>>>>     |                                                                 |                                           |
>>>>>     |                                                   --------------------------                                |
>>>>>     |                                                    | M1 |   | M2 |   | M3 |                                 |
>>>>>     |                                                   --------------------------                                |
>>>>>     |                                                                                                             |
>>>>>     | HOST                                                                                                        |
>>>>>     ---------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> # POC code
>>>>>
>>>>>     Kernel: 
>>>>> https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>>>>>     Qemu:   https://github.com/fengidri/qemu/commits/ism
>>>>>
>>>>> If there are any problems, please point them out.
>>>>>
>>>>> Hope to hear from you, thank you.
>>>>>
>>>>> [1] 
>>>>> https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>>>>> [4] https://lwn.net/Articles/711071/
>>>>> [5] 
>>>>> https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>>>>>
>>>>>
>>>>> Xuan Zhuo (2):
>>>>>    Reserve device id for ISM device
>>>>>    virtio-ism: introduce new device virtio-ism
>>>>>
>>>>>   content.tex    |   3 +
>>>>>   virtio-ism.tex | 340 
>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>   2 files changed, 343 insertions(+)
>>>>>   create mode 100644 virtio-ism.tex
>>>>>
>>>>> -- 
>>>>> 2.32.0.3.g01195cf9f
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  3:56     ` Jason Wang
  2022-10-19  4:08       ` Xuan Zhuo
@ 2022-10-19  4:30       ` Xuan Zhuo
  2022-10-19  5:10         ` Jason Wang
  1 sibling, 1 reply; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-19  4:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > Hi Jason,
> >
> > I think there may be some problems with the direction we are discussing.
>
> Probably not.
>
> As far as we are focusing on technology, there's nothing wrong from my
> perspective. And this is how the community works. Your idea needs to
> be justified and people are free to raise any technical questions
> especially considering you've posted a spec change with prototype
> codes but not only the idea.
>
> > Our
> > goal is to add an new ism device. As far as the spec is concerned, we are not
> > concerned with the implementation of the backend.
> >
> > The direction we should discuss is what is the difference between the ism device
> > and other devices such as virtio-net, and whether it is necessary to introduce
> > this new device.
>
> This is somehow what I want to ask, actually it's not a comparison
> with virtio-net but:
>
> - virtio-roce
> - virtio-vhost-user
> - virtio-(p)mem
>
> or whether we can simply add features to those devices to achieve what
> you want to do here.
>
> > How to share the backend with other deivce is another problem.
>
> Yes, anything that is used for your virito-ism prototype can be used
> for other devices.
>
> >
> > Our goal is to dynamically obtain a piece of memory to share with other vms.
>
> So at this level, I don't see the exact difference compared to
> virtio-vhost-user. Let's just focus on the API that carries on the
> semantic:
>
> - map/unmap
> - permission update
>
> The only missing piece is the per region notification.



I want to know how we can share a region based on vvu:

|---------|       |---------------|
|         |       |               |
|  -----  |       |  -------      |
|  | ? |  |       |  | vvu |      |
|---------|       |---------------|
     |                  |
     |                  |
     |------------------|

Can you describe this process in the vvu scenario you are considering?


The flow of ism we consider is as follows (a rough sketch of these calls
follows the list):
    1. SMC calls the interface ism_alloc_region() of the ism driver, which returns the
       location of a memory region in the PCI space and a token.
    2. The ism driver mmaps the memory region and returns to SMC with the token.
    3. SMC passes the token to the connected peer.
    4. The peer calls the ism driver interface ism_attach_region(token) to
       get the location of the PCI space of the shared memory.
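
A very rough sketch of what this flow could look like from the SMC side
(the names ism_alloc_region()/ism_attach_region() come from the steps
above; the exact signatures, the ism_region layout and the smc_* helpers
are illustrative assumptions, not the proposed API):

    /* Hypothetical driver-facing API, for illustration only. */
    struct ism_region {
            void     *addr;   /* location of the region in PCI (BAR) space */
            size_t    len;    /* region size                               */
            uint64_t  token;  /* token passed to the peer out of band      */
    };

    /* Local side: allocate a region, mmap it, send the token to the peer. */
    struct ism_region r;
    if (!ism_alloc_region(ism_dev, REGION_SIZE, &r))
            smc_send_token(conn, r.token);

    /* Peer side: attach to the same region with the received token. */
    struct ism_region pr;
    if (!ism_attach_region(ism_dev, token, &pr))
            smc_setup_ring(pr.addr, pr.len);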

Thanks.


>
> >
> > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > it will use it as a ring. Of course, we also need a notify mechanism.
> >
> > That's what we're aiming for, so we should first discuss whether this
> > requirement is reasonable.
>
> So unless somebody said "no", it is fine until now.
>
> > I think it's a feature currently not supported by
> > other devices specified by the current virtio spce.
>
> Probably, but we've already had rfcs for roce and vhost-user.
>
> Thanks
>
> >
> > Thanks.
> >
> >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  4:08       ` Xuan Zhuo
@ 2022-10-19  4:36         ` Jason Wang
  2022-10-19  6:02           ` Xuan Zhuo
  0 siblings, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-10-19  4:36 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > Hi Jason,
> > >
> > > I think there may be some problems with the direction we are discussing.
> >
> > Probably not.
> >
> > As far as we are focusing on technology, there's nothing wrong from my
> > perspective. And this is how the community works. Your idea needs to
> > be justified and people are free to raise any technical questions
> > especially considering you've posted a spec change with prototype
> > codes but not only the idea.
> >
> > > Our
> > > goal is to add an new ism device. As far as the spec is concerned, we are not
> > > concerned with the implementation of the backend.
> > >
> > > The direction we should discuss is what is the difference between the ism device
> > > and other devices such as virtio-net, and whether it is necessary to introduce
> > > this new device.
> >
> > This is somehow what I want to ask, actually it's not a comparison
> > with virtio-net but:
> >
> > - virtio-roce
> > - virtio-vhost-user
> > - virtio-(p)mem
> >
> > or whether we can simply add features to those devices to achieve what
> > you want to do here.
>
>
> Yes, this is my priority to discuss.
>
> At the moment, I think the most similar to ism is the Vhost-user Device Backend
> of virtio-vhost-user.
>
> My understanding of it is to map any virtio device to another vm as a vvu
> device.

Yes, so a possible way is to have a device with memory zone/region
provision and management then map it via virtio-vhost-user.

>
> From this design purpose, I think the two are different.
>
> Of course, you might want to extend it, it does have some similarities and uses
> a lot of similar techniques.

I don't have any preference so far. If you think your idea makes more
sense, then try your best to justify it in the list.

> So we can really discuss in this direction, whether
> the vvu device can be extended to achieve the purpose of ism, or whether the
> design goals can be agreed.

I've added Stefan in the loop, let's hear from him.

>
> Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
> Should device/driver APIs remain independent?

Btw, you mentioned that one possible user of ism is the smc, but I
don't see how it connects to that with your prototype driver.

Thanks

>
> Thanks.
>
>
> >
> > > How to share the backend with other deivce is another problem.
> >
> > Yes, anything that is used for your virito-ism prototype can be used
> > for other devices.
> >
> > >
> > > Our goal is to dynamically obtain a piece of memory to share with other vms.
> >
> > So at this level, I don't see the exact difference compared to
> > virtio-vhost-user. Let's just focus on the API that carries on the
> > semantic:
> >
> > - map/unmap
> > - permission update
> >
> > The only missing piece is the per region notification.
> >
> > >
> > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > > it will use it as a ring. Of course, we also need a notify mechanism.
> > >
> > > That's what we're aiming for, so we should first discuss whether this
> > > requirement is reasonable.
> >
> > So unless somebody said "no", it is fine until now.
> >
> > > I think it's a feature currently not supported by
> > > other devices specified by the current virtio spce.
> >
> > Probably, but we've already had rfcs for roce and vhost-user.
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  4:30       ` Xuan Zhuo
@ 2022-10-19  5:10         ` Jason Wang
  2022-10-19  6:13           ` Xuan Zhuo
  0 siblings, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-10-19  5:10 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, Oct 19, 2022 at 12:35 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > Hi Jason,
> > >
> > > I think there may be some problems with the direction we are discussing.
> >
> > Probably not.
> >
> > As far as we are focusing on technology, there's nothing wrong from my
> > perspective. And this is how the community works. Your idea needs to
> > be justified and people are free to raise any technical questions
> > especially considering you've posted a spec change with prototype
> > codes but not only the idea.
> >
> > > Our
> > > goal is to add an new ism device. As far as the spec is concerned, we are not
> > > concerned with the implementation of the backend.
> > >
> > > The direction we should discuss is what is the difference between the ism device
> > > and other devices such as virtio-net, and whether it is necessary to introduce
> > > this new device.
> >
> > This is somehow what I want to ask, actually it's not a comparison
> > with virtio-net but:
> >
> > - virtio-roce
> > - virtio-vhost-user
> > - virtio-(p)mem
> >
> > or whether we can simply add features to those devices to achieve what
> > you want to do here.
> >
> > > How to share the backend with other deivce is another problem.
> >
> > Yes, anything that is used for your virito-ism prototype can be used
> > for other devices.
> >
> > >
> > > Our goal is to dynamically obtain a piece of memory to share with other vms.
> >
> > So at this level, I don't see the exact difference compared to
> > virtio-vhost-user. Let's just focus on the API that carries on the
> > semantic:
> >
> > - map/unmap
> > - permission update
> >
> > The only missing piece is the per region notification.
>
>
>
> I want to know how we can share a region based on vvu:
>
> |---------|       |---------------|
> |         |       |               |
> |  -----  |       |  -------      |
> |  | ? |  |       |  | vvu |      |
> |---------|       |---------------|
>      |                  |
>      |                  |
>      |------------------|
>
> Can you describe this process in the vvu scenario you are considering?
>
>
> The flow of ism we consider is as follows:
>     1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>        location of a memory region in the PCI space and a token.

Can virtio-vhost-user be backed by the memory you've used for ISM?
It's just a matter of the command name:

VHOST_IOTLB_UPDATE (or another message) vs VIRTIO_ISM_CTRL_ALLOC.

Or we can consider this from another angle: can virtio-vhost-user be
built on top of ISM?

>     2. The ism driver mmap the memory region and return to SMC with the token

This part should be the same as long as we add a token to a specific region.

>     3. SMC passes the token to the connected peer

Should be the same.

>     4. the peer calls the ism driver interface ism_attach_region(token) to
>        get the location of the PCI space of the shared memory

Ditto.

Thanks

>
> Thanks.
>
>
> >
> > >
> > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > > it will use it as a ring. Of course, we also need a notify mechanism.
> > >
> > > That's what we're aiming for, so we should first discuss whether this
> > > requirement is reasonable.
> >
> > So unless somebody said "no", it is fine until now.
> >
> > > I think it's a feature currently not supported by
> > > other devices specified by the current virtio spce.
> >
> > Probably, but we've already had rfcs for roce and vhost-user.
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  3:55         ` Jason Wang
@ 2022-10-19  5:29           ` Gerry
  0 siblings, 0 replies; 61+ messages in thread
From: Gerry @ 2022-10-19  5:29 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu,
	zhenzao, helinguo, mst, cohuck, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 17479 bytes --]



> On Oct 19, 2022, at 11:55, Jason Wang <jasowang@redhat.com> wrote:
> 
> 
> On 2022/10/18 16:33, Gerry wrote:
>> 
>> 
>>> On Oct 18, 2022, at 14:54, Jason Wang <jasowang@redhat.com> wrote:
>>> 
>>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>> 
>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>> Adding Stefan.
>>>>> 
>>>>> 
>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>> 
>>>>>> Hello everyone,
>>>>>> 
>>>>>> # Background
>>>>>> 
>>>>>> Nowadays, there is a common scenario to accelerate communication between
>>>>>> different VMs and containers, including light weight virtual machine based
>>>>>> containers. One way to achieve this is to colocate them on the same host.
>>>>>> However, the performance of inter-VM communication through network stack is not
>>>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
>>>>>> many times, but still no generic solution available [1] [2] [3].
>>>>>> 
>>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
>>>>>> We found that by changing the communication channel between VMs from TCP to SMC
>>>>>> with shared memory, we can achieve superior performance for a common
>>>>>> socket-based application[5]:
>>>>>>  - latency reduced by about 50%
>>>>>>  - throughput increased by about 300%
>>>>>>  - CPU consumption reduced by about 50%
>>>>>> 
>>>>>> Since there is no particularly suitable shared memory management solution
>>>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
>>>>>> is the standard for communication in the virtualization world, we want to
>>>>>> implement a virtio-ism device based on virtio, which can support on-demand
>>>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
>>>>>> the virtio-ism device need to support:
>>>>>> 
>>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
>>>>>>   provisioned.
>>>>>> 2. Multi-region management: the shared memory is divided into regions,
>>>>>>   and a peer may allocate one or more regions from the same shared memory
>>>>>>   device.
>>>>>> 3. Permission control: The permission of each region can be set seperately.
>>>>> 
>>>>> Looks like virtio-ROCE
>>>>> 
>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
>>>>> 
>>>>> and virtio-vhost-user can satisfy the requirement?
>>>>> 
>>>>>> 
>>>>>> # Virtio ism device
>>>>>> 
>>>>>> ISM devices provide the ability to share memory between different guests on a
>>>>>> host. A guest's memory got from ism device can be shared with multiple peers at
>>>>>> the same time. This shared relationship can be dynamically created and released.
>>>>>> 
>>>>>> The shared memory obtained from the device is divided into multiple ism regions
>>>>>> for share. ISM device provides a mechanism to notify other ism region referrers
>>>>>> of content update events.
>>>>>> 
>>>>>> # Usage (SMC as example)
>>>>>> 
>>>>>> Maybe there is one of possible use cases:
>>>>>> 
>>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>>>>>>   location of a memory region in the PCI space and a token.
>>>>>> 2. The ism driver mmap the memory region and return to SMC with the token
>>>>>> 3. SMC passes the token to the connected peer
>>>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
>>>>>>   get the location of the PCI space of the shared memory
>>>>>> 
>>>>>> 
>>>>>> # About hot plugging of the ism device
>>>>>> 
>>>>>>   Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>>>>>>   less scalable operation. So, we don't plan to support it for now.
>>>>>> 
>>>>>> # Comparison with existing technology
>>>>>> 
>>>>>> ## ivshmem or ivshmem 2.0 of Qemu
>>>>>> 
>>>>>>   1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>>>>>>   use this VM, so the security is not enough.
>>>>>> 
>>>>>>   2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>>>>>>   other VMs that use the ivshmem 2.0 shared memory device, which also does not
>>>>>>   meet our needs in terms of security.
>>>>>> 
>>>>>> ## vhost-pci and virtiovhostuser
>>>>>> 
>>>>>>   Does not support dynamic allocation and therefore not suitable for SMC.
>>>>> 
>>>>> I think this is an implementation issue, we can support VHOST IOTLB
>>>>> message then the regions could be added/removed on demand.
>>>> 
>>>> 
>>>> 1. After the attacker connects with the victim, if the attacker does not
>>>>   dereference memory, the memory will be occupied under virtiovhostuser. In the
>>>>   case of ism devices, the victim can directly release the reference, and the
>>>>   maliciously referenced region only occupies the attacker's resources
>>> 
>>> Let's define the security boundary here. E.g do we trust the device or
>>> not? If yes, in the case of virtiovhostuser, can we simple do
>>> VHOST_IOTLB_UNMAP then we can safely release the memory from the
>>> attacker.
>> Thanks, Jason:)
>> In our the design, there are several roles involved:
>> 1) a virtio-ism-smc front-end driver
>> 2) a Virtio-ism backend device driver and its associated vmm
>> 3) a global device manager
>> 4) a group of remote/peer virtio-ism backend devices/vmms
>> 5) a group of remote/peer virtio-ism-smc front-end drivers
>> 
>> Among which , we treat 1, 2 and 3 as trusted, 4 and 5 as untrusted.
> 
> 
> It looks to me VIRTIO_ISM_PERM_MANAGE violates what you've described here. E.g what happens if 1 grant this permission to 5?

My mistake, I missed some background information.
We split the communication into a control plane and a data plane. The above threat model is for the control plane. Once a peer has been granted permission to access a memory region, it is trusted to read/write that memory region.

> 
> 
>> Because 4 and 5 are untrusted, we can’t guarantee that IOTLB Invalidate 
> 
> 
> Interesting, I wonder how this is guaranteed by ISM. Anything that can work for ISM but not IOTLB? Note that the only difference for me is the device API. We can hook anything that works for ISM to IOTLB.
The difference is who the resource owner is.
For an IOTLB-based design, the guest vm is the resource owner, so it can only reclaim a shared memory region from peers.
For our design, the device manager is the resource owner, and the guest vm allocates/frees memory regions from the device manager. So for each SMC connection a new memory region is allocated/freed; a memory region won’t be reused for SMC connections with different (local, peer) pairs.

> 
> 
>> Say when disconnecting an SMC connection, a malicious peer may ignore the IOTLB invalidation request and keep access the shared memory region.
>> 
>> We have considered the IOTLB based design but encountered several issues:
>> 1) It depends on the way to provision guest vm memory. We need a memory resource descriptor to support vhost-user IOTLB messages, thus can’t support anonymous memory based vm.
> 
> 
> Hypervisor (Qemu) is free to hook IOTLB message to any kind of memory backend, isn't? E.g Qemu can choose to implement IOTLB by its own instead of forwarding it to another VM.
A memory resource file descriptor is needed to share a memory region among VMs. If the guest memory is provisioned by anonymously mapped memory, it can’t be shared with other VMs.
In other words, vhost may work with process virtual addresses, while vhost-user always works with file descriptors.
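
As a minimal illustration of that difference with standard Linux APIs
(this is just a host-side sketch, not the proposed device interface):
an anonymous mapping has no fd that could be handed to another VMM,
while a memfd-backed mapping does.

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <unistd.h>

    static void example(size_t sz)
    {
            /* Anonymous guest RAM: there is no fd, so nothing can be passed
             * over SCM_RIGHTS to another process/VMM for fd-based sharing. */
            void *anon = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            /* memfd-backed guest RAM: the fd can be sent to another process
             * and mmap()ed there, which fd-based sharing schemes rely on. */
            int fd = memfd_create("guest-ram", MFD_CLOEXEC);
            ftruncate(fd, sz);
            void *ram = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED,
                             fd, 0);
            (void)anon; (void)ram;
    }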

> 
> 
>> 2) Lack of fine-grain access control of memory resource descriptor. When send a memory resource descriptor to an untrusted peer, we can’t enforce region based access control. Memfd supports file level seal operations, but still lack of region based permission control. Hugetlbfs based fd doesn’t support seal at all.
> 
> 
> So in the above, you said 4 and 5 are untrusted. If yes how you can enforce regioned based access control (the memory is still mapped by the untrsuted VMM)? And again, virtio-vhost-user is not limited to memfd/hugetlbfs, it can do want you've done in your protoype (hooking to /dev/shm).
Let’s take an example. Say the vmm provisions 1GB of memory to guest A through a memfd, of which 1MB is allocated by guest A as shared memory that it wants to share with vm B.
We lack the technology to share the memfd with guest B while restricting guest B to access only the shared 1MB region.
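
For reference, this is what the file-level sealing mentioned above looks
like with standard Linux APIs; note that F_ADD_SEALS applies to the whole
memfd, which is exactly the missing per-region granularity (a sketch of
existing kernel APIs, not the proposed device interface):

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int sealed_buffer(void)
    {
            int fd = memfd_create("ism-buffer",
                                  MFD_CLOEXEC | MFD_ALLOW_SEALING);
            ftruncate(fd, 1UL << 30);          /* e.g. 1GB backing file */

            /* Seals are per-file: this forbids growing/shrinking the entire
             * memfd; there is no way to seal only a 1MB sub-range of it. */
            fcntl(fd, F_ADD_SEALS, F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL);
            return fd;
    }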

> 
> 
>> 3) Lack of reliable way to reclaim granted access permissions from untrusted peers, as stated above.
> 
> 
> It would be better to explain how this "reclaim" works.
> 
> 
>> 4) How implement resource accounting. Say a vm has shared some memory regions from peers, and those peers exited unexpectedly, then those shared memory will be accounted to the victim vm, and may cause unexpected OOM.
>> 
>> Based on the above consideration, we adopted another design and introduced the device manager to solve above issues:
>> 1) the device manager is the owner of memory buffers.
> 
> 
> I don't see the definition "device manager" in your proposal, this needs to be clarified in both the spec and the changelog or the cover letter.
We will add a description of the “device manager” in the next version.

> 
> 
>> 2) the device manager creates a memfd for each memory buffer/region, and configure SEALs according to requested access permissions.
> 
> 
> Ok, but this seems not what you've implemented in your qemu prototype?
Not yet, we are still working on it.
BTW, is a rust/golang-based prototype acceptable to the Qemu community, or must it be written in C?

> 
> 
>> 3) When a guest vm reclaims a shared memory buffer, the device manager will provision a new memfd to the guest vm.
> 
> 
> How can this be done for the untrusted peers?
Please refer to above explanations:)

> 
> 
>> And it will take the responsibility to reclaim the old buffer from peer and eventually release the old buffer.
>> 4) Simplify the control communication channel. Every guest vm only needs to talk with the device manager and no need to discover and communicate with other peers.
> 
> 
> Not sure but it's better not mandate any model in the application layer.
Made another mistake here; it should be stated as:

Every backend device driver only needs to talk with the device manager and does not need to discover and communicate with other peers.

It may help to give more information about the design goals we want to achieve:
1) extend pci-ivshmem v1/v2 to support more complex usage scenarios like SMC.
2) virtio-ism is a generic shmem device and should support pci-ivshmem v1/v2 usage scenarios.
3) SMC is the first/example usage scenario.
4) support communications among vm<->vm, vm<->runC container, and runC container<->runC container.

Design goal 4 forces us to design virtio-ism instead of pci-ivshmem v2/v3.
The virtio/vDPA/vDUSE stack provides us a perfect way to implement a userspace shmem device to support runC containers.
And currently there’s no way to emulate PCI devices in userspace.


> 
> Thanks
> 
> 
>> 
>> Thanks,
>> Gerry
>> 
>>> 
>>>> 
>>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
>>>>   time, which is a challenge for virtiovhostuser
>>> 
>>> Please elaborate more the the challenges, anything make
>>> virtiovhostuser different?
>>> 
>>>> 
>>>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
>>>>   determines the sharing relationship at startup.
>>> 
>>> Not necessarily with IOTLB API?
>>> 
>>>> 
>>>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
>>>>   while ism only maps one region to other devices
>>> 
>>> With VHOST_IOTLB_MAP, the map could be done per region.
>>> 
>>> Thanks
>>> 
>>>> 
>>>> Thanks.
>>>> 
>>>>> 
>>>>> Thanks
>>>>> 
>>>>>> 
>>>>>> # Design
>>>>>> 
>>>>>>   This is a structure diagram based on ism sharing between two vms.
>>>>>> 
>>>>>>    |-------------------------------------------------------------------------------------------------------------|
>>>>>>    | |------------------------------------------------|       |------------------------------------------------| |
>>>>>>    | | Guest                                          |       | Guest                                          | |
>>>>>>    | |                                                |       |                                                | |
>>>>>>    | |   ----------------                             |       |   ----------------                             | |
>>>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
>>>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
>>>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
>>>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>>>    | |                                |               |       |                               |                | |
>>>>>>    | |                                |               |       |                               |                | |
>>>>>>    | | Qemu                           |               |       | Qemu                          |                | |
>>>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
>>>>>>    |                                  |                                                       |                  |
>>>>>>    |                                  |                                                       |                  |
>>>>>>    |                                  |------------------------------+------------------------|                  |
>>>>>>    |                                                                 |                                           |
>>>>>>    |                                                                 |                                           |
>>>>>>    |                                                   --------------------------                                |
>>>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
>>>>>>    |                                                   --------------------------                                |
>>>>>>    |                                                                                                             |
>>>>>>    | HOST                                                                                                        |
>>>>>>    ---------------------------------------------------------------------------------------------------------------
>>>>>> 
>>>>>> # POC code
>>>>>> 
>>>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>>>>>>   Qemu: https://github.com/fengidri/qemu/commits/ism
>>>>>> 
>>>>>> If there are any problems, please point them out.
>>>>>> 
>>>>>> Hope to hear from you, thank you.
>>>>>> 
>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>>>>>> [4] https://lwn.net/Articles/711071/
>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>>>>>> 
>>>>>> 
>>>>>> Xuan Zhuo (2):
>>>>>>  Reserve device id for ISM device
>>>>>>  virtio-ism: introduce new device virtio-ism
>>>>>> 
>>>>>> content.tex    |   3 +
>>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>> 2 files changed, 343 insertions(+)
>>>>>> create mode 100644 virtio-ism.tex
>>>>>> 
>>>>>> --
>>>>>> 2.32.0.3.g01195cf9f
>>>>>> 
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>>> 
>>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

[-- Attachment #2: Type: text/html, Size: 56014 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  4:36         ` Jason Wang
@ 2022-10-19  6:02           ` Xuan Zhuo
  2022-10-19  8:07             ` Tony Lu
  0 siblings, 1 reply; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-19  6:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > >
> > > > Hi Jason,
> > > >
> > > > I think there may be some problems with the direction we are discussing.
> > >
> > > Probably not.
> > >
> > > As far as we are focusing on technology, there's nothing wrong from my
> > > perspective. And this is how the community works. Your idea needs to
> > > be justified and people are free to raise any technical questions
> > > especially considering you've posted a spec change with prototype
> > > codes but not only the idea.
> > >
> > > > Our
> > > > goal is to add an new ism device. As far as the spec is concerned, we are not
> > > > concerned with the implementation of the backend.
> > > >
> > > > The direction we should discuss is what is the difference between the ism device
> > > > and other devices such as virtio-net, and whether it is necessary to introduce
> > > > this new device.
> > >
> > > This is somehow what I want to ask, actually it's not a comparison
> > > with virtio-net but:
> > >
> > > - virtio-roce
> > > - virtio-vhost-user
> > > - virtio-(p)mem
> > >
> > > or whether we can simply add features to those devices to achieve what
> > > you want to do here.
> >
> >
> > Yes, this is my priority to discuss.
> >
> > At the moment, I think the most similar to ism is the Vhost-user Device Backend
> > of virtio-vhost-user.
> >
> > My understanding of it is to map any virtio device to another vm as a vvu
> > device.
>
> Yes, so a possible way is to have a device with memory zone/region
> provision and management then map it via virtio-vhost-user.


Yes, that is a possibility. virtio-vhost-user makes me think that what the two
devices could share is the implementation of the map operation.

But providing the interface to the upper layer inside the VM is, I think, the
job of the ism device.

One of the reasons I didn't use virtio-vhost-user directly is that the guest in
the other VM can operate the vvu device, whereas we want both sides to see an
equal ism device.

So I want to agree on one question first: who provides the upper layer with the
ability to share a memory region?

Our answer is a new ism device. How that device achieves the memory sharing is,
I think, the second question.
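
To make the first question concrete, this is roughly the interface we have in
mind for the upper layer (SMC) inside the guest. It is only a sketch for
discussion: the names follow the ism_alloc_region()/ism_attach_region() flow
from the cover letter, but the exact signatures below are my assumptions, not
something already in the spec patch.

    /* Hypothetical in-guest ism driver API; signatures are assumptions. */
    #include <stddef.h>
    #include <stdint.h>

    struct ism_region {
            uint64_t token;   /* identifier a peer uses to attach this region   */
            void    *addr;    /* mapping of the region in the device memory BAR */
            size_t   size;
    };

    /* Allocate one region from the device, map it, return a token for peers. */
    int ism_alloc_region(size_t size, struct ism_region *out);

    /* Attach a region that a peer allocated, identified by its token. */
    int ism_attach_region(uint64_t token, struct ism_region *out);

    /* Drop our reference; the region is freed when the last referrer detaches. */
    int ism_detach_region(struct ism_region *region);

    /* Notify other referrers of this region that its content was updated. */
    int ism_notify(struct ism_region *region);

With such an interface, the upper layer only deals with tokens and mapped
addresses; how the sharing is implemented underneath (vvu or something else)
stays hidden from it.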


>
> >
> > From this design purpose, I think the two are different.
> >
> > Of course, you might want to extend it, it does have some similarities and uses
> > a lot of similar techniques.
>
> I don't have any preference so far. If you think your idea makes more
> sense, then try your best to justify it in the list.
>
> > So we can really discuss in this direction, whether
> > the vvu device can be extended to achieve the purpose of ism, or whether the
> > design goals can be agreed.
>
> I've added Stefan in the loop, let's hear from him.
>
> >
> > Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
> > Should device/driver APIs remain independent?
>
> Btw, you mentioned that one possible user of ism is the smc, but I
> don't see how it connects to that with your prototype driver.

Yes, we originally had plans for that, but this posting only targeted the virtio
spec, so that part was not included. Maybe we should have included it. @Tony

A brief introduction: SMC currently has a corresponding driver in
s390/net/ism_drv.c, and we will replace that driver in the virtualization
scenario.

Thanks.

>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > > How to share the backend with other deivce is another problem.
> > >
> > > Yes, anything that is used for your virito-ism prototype can be used
> > > for other devices.
> > >
> > > >
> > > > Our goal is to dynamically obtain a piece of memory to share with other vms.
> > >
> > > So at this level, I don't see the exact difference compared to
> > > virtio-vhost-user. Let's just focus on the API that carries on the
> > > semantic:
> > >
> > > - map/unmap
> > > - permission update
> > >
> > > The only missing piece is the per region notification.
> > >
> > > >
> > > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > > > it will use it as a ring. Of course, we also need a notify mechanism.
> > > >
> > > > That's what we're aiming for, so we should first discuss whether this
> > > > requirement is reasonable.
> > >
> > > So unless somebody said "no", it is fine until now.
> > >
> > > > I think it's a feature currently not supported by
> > > > other devices specified by the current virtio spce.
> > >
> > > Probably, but we've already had rfcs for roce and vhost-user.
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  5:10         ` Jason Wang
@ 2022-10-19  6:13           ` Xuan Zhuo
  0 siblings, 0 replies; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-19  6:13 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, 19 Oct 2022 13:10:49 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Oct 19, 2022 at 12:35 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > >
> > > > Hi Jason,
> > > >
> > > > I think there may be some problems with the direction we are discussing.
> > >
> > > Probably not.
> > >
> > > As far as we are focusing on technology, there's nothing wrong from my
> > > perspective. And this is how the community works. Your idea needs to
> > > be justified and people are free to raise any technical questions
> > > especially considering you've posted a spec change with prototype
> > > codes but not only the idea.
> > >
> > > > Our
> > > > goal is to add an new ism device. As far as the spec is concerned, we are not
> > > > concerned with the implementation of the backend.
> > > >
> > > > The direction we should discuss is what is the difference between the ism device
> > > > and other devices such as virtio-net, and whether it is necessary to introduce
> > > > this new device.
> > >
> > > This is somehow what I want to ask, actually it's not a comparison
> > > with virtio-net but:
> > >
> > > - virtio-roce
> > > - virtio-vhost-user
> > > - virtio-(p)mem
> > >
> > > or whether we can simply add features to those devices to achieve what
> > > you want to do here.
> > >
> > > > How to share the backend with other deivce is another problem.
> > >
> > > Yes, anything that is used for your virito-ism prototype can be used
> > > for other devices.
> > >
> > > >
> > > > Our goal is to dynamically obtain a piece of memory to share with other vms.
> > >
> > > So at this level, I don't see the exact difference compared to
> > > virtio-vhost-user. Let's just focus on the API that carries on the
> > > semantic:
> > >
> > > - map/unmap
> > > - permission update
> > >
> > > The only missing piece is the per region notification.
> >
> >
> >
> > I want to know how we can share a region based on vvu:
> >
> > |---------|       |---------------|
> > |         |       |               |
> > |  -----  |       |  -------      |
> > |  | ? |  |       |  | vvu |      |
> > |---------|       |---------------|
> >      |                  |
> >      |                  |
> >      |------------------|
> >
> > Can you describe this process in the vvu scenario you are considering?
> >
> >
> > The flow of ism we consider is as follows:
> >     1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> >        location of a memory region in the PCI space and a token.
>
> Can virtio-vhost-user be backed on the memory you've used for ISM?
> It's just a name of the command:

I think there is such a possibility, although there are some points of
contention.

I understand there are several possibilities:

1. Our current approach

     |-----------|       |---------------|
     |           |       |               |
     |  -------  |       |  -------      |
     |  | ism |  |       |  | ism |      |
     |-----------|       |---------------|
          |                  |
          |                  |
          |------------------|
                [ism protocol]

2. by vhost-user protocol

     |-----------|       |---------------|
     |           |       |               |
     |  -------  |       |  -------      |
     |  | ism |  |       |  | ism |      |
     |-----------|       |---------------|
          |                  |
          |                  |
          |------------------|
                [vhost-user]

3. by virtio-vhost-user

     |-----------|       |---------------|
     |           |       |               |
     |  -------  |       |  -------      |
     |  | ism |  |       |  | ism |      |
     |  -------  |       |  -------      |
     |  | vvu |  |       |  | vvu |      |
     |-----------|       |---------------|
          |                  |
          |                  |
          |------------------|
                [vhost-user]


We currently have the following requirements for the ism protocol:

1. Dynamic creation
2. Region-based sharing
3. Security

I had thought that compatibility with the vhost-user protocol would be
difficult, but you seem to think it is possible. Let's think about it again.
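
Whichever of the three layouts above ends up carrying it, the protocol surface
those three requirements imply is small. A rough sketch, purely for discussion
(the command names and fields below are my assumptions, not what the spec patch
defines):

    /* Hypothetical ism control requests; a sketch, not the proposed spec. */
    #include <stdint.h>

    enum ism_ctrl_cmd {
            ISM_CTRL_ALLOC  = 1,   /* 1. dynamic creation: allocate a region       */
            ISM_CTRL_ATTACH = 2,   /* 2. region-based sharing: attach by token     */
            ISM_CTRL_DETACH = 3,
            ISM_CTRL_GRANT  = 4,   /* 3. security: per-region, per-peer permission */
    };

    struct ism_ctrl_alloc_reply {
            uint64_t token;        /* handed to the peer out of band (e.g. by SMC) */
            uint64_t offset;       /* location inside the device memory region     */
            uint64_t size;
    };

The question is then whether vhost-user (or virtio-vhost-user) can carry
equivalent per-region operations.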


Thanks


>
> VHOST_IOTLB_UPDATE(or other) vs VIRTIO_ISM_CTRL_ALLOC.
>
> Or we can consider the form another angle, can virtio-vhost-user be
> built on top of ISM?
>
> >     2. The ism driver mmap the memory region and return to SMC with the token
>
> This part should be the same as long as we add token to a specific region.
>
> >     3. SMC passes the token to the connected peer
>
> Should be the same.
>
> >     4. the peer calls the ism driver interface ism_attach_region(token) to
> >        get the location of the PCI space of the shared memory
>
> Ditto.
>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > >
> > > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > > > it will use it as a ring. Of course, we also need a notify mechanism.
> > > >
> > > > That's what we're aiming for, so we should first discuss whether this
> > > > requirement is reasonable.
> > >
> > > So unless somebody said "no", it is fine until now.
> > >
> > > > I think it's a feature currently not supported by
> > > > other devices specified by the current virtio spce.
> > >
> > > Probably, but we've already had rfcs for roce and vhost-user.
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-18  6:54     ` Jason Wang
  2022-10-18  8:33       ` Gerry
  2022-10-18  8:55       ` He Rongguang
@ 2022-10-19  6:43       ` Xuan Zhuo
  2022-10-19  8:01         ` Jason Wang
  2 siblings, 1 reply; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-19  6:43 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > Adding Stefan.
> > >
> > >
> > > On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > Hello everyone,
> > > >
> > > > # Background
> > > >
> > > > Nowadays, there is a common scenario to accelerate communication between
> > > > different VMs and containers, including light weight virtual machine based
> > > > containers. One way to achieve this is to colocate them on the same host.
> > > > However, the performance of inter-VM communication through network stack is not
> > > > optimal and may also waste extra CPU cycles. This scenario has been discussed
> > > > many times, but still no generic solution available [1] [2] [3].
> > > >
> > > > With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > > > We found that by changing the communication channel between VMs from TCP to SMC
> > > > with shared memory, we can achieve superior performance for a common
> > > > socket-based application[5]:
> > > >   - latency reduced by about 50%
> > > >   - throughput increased by about 300%
> > > >   - CPU consumption reduced by about 50%
> > > >
> > > > Since there is no particularly suitable shared memory management solution
> > > > matches the need for SMC(See ## Comparison with existing technology), and virtio
> > > > is the standard for communication in the virtualization world, we want to
> > > > implement a virtio-ism device based on virtio, which can support on-demand
> > > > memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > > > the virtio-ism device need to support:
> > > >
> > > > 1. Dynamic provision: shared memory regions are dynamically allocated and
> > > >    provisioned.
> > > > 2. Multi-region management: the shared memory is divided into regions,
> > > >    and a peer may allocate one or more regions from the same shared memory
> > > >    device.
> > > > 3. Permission control: The permission of each region can be set seperately.
> > >
> > > Looks like virtio-ROCE
> > >
> > > https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> > >
> > > and virtio-vhost-user can satisfy the requirement?
> > >
> > > >
> > > > # Virtio ism device
> > > >
> > > > ISM devices provide the ability to share memory between different guests on a
> > > > host. A guest's memory got from ism device can be shared with multiple peers at
> > > > the same time. This shared relationship can be dynamically created and released.
> > > >
> > > > The shared memory obtained from the device is divided into multiple ism regions
> > > > for share. ISM device provides a mechanism to notify other ism region referrers
> > > > of content update events.
> > > >
> > > > # Usage (SMC as example)
> > > >
> > > > Maybe there is one of possible use cases:
> > > >
> > > > 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> > > >    location of a memory region in the PCI space and a token.
> > > > 2. The ism driver mmap the memory region and return to SMC with the token
> > > > 3. SMC passes the token to the connected peer
> > > > 3. the peer calls the ism driver interface ism_attach_region(token) to
> > > >    get the location of the PCI space of the shared memory
> > > >
> > > >
> > > > # About hot plugging of the ism device
> > > >
> > > >    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> > > >    less scalable operation. So, we don't plan to support it for now.
> > > >
> > > > # Comparison with existing technology
> > > >
> > > > ## ivshmem or ivshmem 2.0 of Qemu
> > > >
> > > >    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> > > >    use this VM, so the security is not enough.
> > > >
> > > >    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> > > >    other VMs that use the ivshmem 2.0 shared memory device, which also does not
> > > >    meet our needs in terms of security.
> > > >
> > > > ## vhost-pci and virtiovhostuser
> > > >
> > > >    Does not support dynamic allocation and therefore not suitable for SMC.
> > >
> > > I think this is an implementation issue, we can support VHOST IOTLB
> > > message then the regions could be added/removed on demand.
> >
> >
> > 1. After the attacker connects with the victim, if the attacker does not
> >    dereference memory, the memory will be occupied under virtiovhostuser. In the
> >    case of ism devices, the victim can directly release the reference, and the
> >    maliciously referenced region only occupies the attacker's resources
>
> Let's define the security boundary here. E.g do we trust the device or
> not? If yes, in the case of virtiovhostuser, can we simple do
> VHOST_IOTLB_UNMAP then we can safely release the memory from the
> attacker.
>
> >
> > 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> >    time, which is a challenge for virtiovhostuser
>
> Please elaborate more the the challenges, anything make
> virtiovhostuser different?

My understanding (please point out any mistakes) is that one vvu device
corresponds to one vm. If we share memory with 1000 VMs, do we have 1000 vvu
devices?


>
> >
> > 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> >    determines the sharing relationship at startup.
>
> Not necessarily with IOTLB API?

Unlike virtio-vhost-user, which shares one vm's memory with another vm, we
provide the same host memory to both vms. So the implementation of this part is
much simpler. This is why we gave up on virtio-vhost-user at the beginning.
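
As a plain illustration of what "the same host memory" means here (this is not
the POC's QEMU code, just ordinary POSIX calls showing the backing model we
assume): the host owns one shared backing object and each VMM process maps the
same range, so neither guest owns the region.

    /* Sketch: one host backing object mapped twice, standing in for the two
     * QEMU processes that would expose it to their guests as device memory.
     * Illustrative only; the POC does this inside QEMU, not in a separate tool. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            size_t region_sz = 1 << 20;   /* one ism region, e.g. 1 MiB */
            int fd = memfd_create("ism-region", MFD_CLOEXEC);

            if (fd < 0 || ftruncate(fd, region_sz) < 0)
                    return 1;

            /* Each QEMU would receive the same fd and mmap() it, then expose
             * the mapping to its guest through the ism device memory. */
            char *map_a = mmap(NULL, region_sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            char *map_b = mmap(NULL, region_sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (map_a == MAP_FAILED || map_b == MAP_FAILED)
                    return 1;

            strcpy(map_a, "written through the first mapping");
            printf("seen through the second mapping: %s\n", map_b);
            return 0;
    }

With virtio-vhost-user, by contrast, the shared pages come out of one guest's
own memory, which is where the asymmetry we want to avoid comes from.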

Thanks.


>
> >
> > 4. For security issues, the device under virtiovhostuser may mmap more memory,
> >    while ism only maps one region to other devices
>
> With VHOST_IOTLB_MAP, the map could be done per region.
>
> Thanks
>
> >
> > Thanks.
> >
> > >
> > > Thanks
> > >
> > > >
> > > > # Design
> > > >
> > > >    This is a structure diagram based on ism sharing between two vms.
> > > >
> > > >     |-------------------------------------------------------------------------------------------------------------|
> > > >     | |------------------------------------------------|       |------------------------------------------------| |
> > > >     | | Guest                                          |       | Guest                                          | |
> > > >     | |                                                |       |                                                | |
> > > >     | |   ----------------                             |       |   ----------------                             | |
> > > >     | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> > > >     | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> > > >     | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> > > >     | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > >     | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > >     | |                                |               |       |                               |                | |
> > > >     | |                                |               |       |                               |                | |
> > > >     | | Qemu                           |               |       | Qemu                          |                | |
> > > >     | |--------------------------------+---------------|       |-------------------------------+----------------| |
> > > >     |                                  |                                                       |                  |
> > > >     |                                  |                                                       |                  |
> > > >     |                                  |------------------------------+------------------------|                  |
> > > >     |                                                                 |                                           |
> > > >     |                                                                 |                                           |
> > > >     |                                                   --------------------------                                |
> > > >     |                                                    | M1 |   | M2 |   | M3 |                                 |
> > > >     |                                                   --------------------------                                |
> > > >     |                                                                                                             |
> > > >     | HOST                                                                                                        |
> > > >     ---------------------------------------------------------------------------------------------------------------
> > > >
> > > > # POC code
> > > >
> > > >    Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > > >    Qemu:   https://github.com/fengidri/qemu/commits/ism
> > > >
> > > > If there are any problems, please point them out.
> > > >
> > > > Hope to hear from you, thank you.
> > > >
> > > > [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > > > [2] https://dl.acm.org/doi/10.1145/2847562
> > > > [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > > > [4] https://lwn.net/Articles/711071/
> > > > [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> > > >
> > > >
> > > > Xuan Zhuo (2):
> > > >   Reserve device id for ISM device
> > > >   virtio-ism: introduce new device virtio-ism
> > > >
> > > >  content.tex    |   3 +
> > > >  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 343 insertions(+)
> > > >  create mode 100644 virtio-ism.tex
> > > >
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  6:43       ` Xuan Zhuo
@ 2022-10-19  8:01         ` Jason Wang
  2022-10-19  8:03           ` Gerry
  2022-10-19  8:13           ` Xuan Zhuo
  0 siblings, 2 replies; 61+ messages in thread
From: Jason Wang @ 2022-10-19  8:01 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > Adding Stefan.
> > > >
> > > >
> > > > On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > Hello everyone,
> > > > >
> > > > > # Background
> > > > >
> > > > > Nowadays, there is a common scenario to accelerate communication between
> > > > > different VMs and containers, including light weight virtual machine based
> > > > > containers. One way to achieve this is to colocate them on the same host.
> > > > > However, the performance of inter-VM communication through network stack is not
> > > > > optimal and may also waste extra CPU cycles. This scenario has been discussed
> > > > > many times, but still no generic solution available [1] [2] [3].
> > > > >
> > > > > With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > > > > We found that by changing the communication channel between VMs from TCP to SMC
> > > > > with shared memory, we can achieve superior performance for a common
> > > > > socket-based application[5]:
> > > > >   - latency reduced by about 50%
> > > > >   - throughput increased by about 300%
> > > > >   - CPU consumption reduced by about 50%
> > > > >
> > > > > Since there is no particularly suitable shared memory management solution
> > > > > matches the need for SMC(See ## Comparison with existing technology), and virtio
> > > > > is the standard for communication in the virtualization world, we want to
> > > > > implement a virtio-ism device based on virtio, which can support on-demand
> > > > > memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > > > > the virtio-ism device need to support:
> > > > >
> > > > > 1. Dynamic provision: shared memory regions are dynamically allocated and
> > > > >    provisioned.
> > > > > 2. Multi-region management: the shared memory is divided into regions,
> > > > >    and a peer may allocate one or more regions from the same shared memory
> > > > >    device.
> > > > > 3. Permission control: The permission of each region can be set seperately.
> > > >
> > > > Looks like virtio-ROCE
> > > >
> > > > https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> > > >
> > > > and virtio-vhost-user can satisfy the requirement?
> > > >
> > > > >
> > > > > # Virtio ism device
> > > > >
> > > > > ISM devices provide the ability to share memory between different guests on a
> > > > > host. A guest's memory got from ism device can be shared with multiple peers at
> > > > > the same time. This shared relationship can be dynamically created and released.
> > > > >
> > > > > The shared memory obtained from the device is divided into multiple ism regions
> > > > > for share. ISM device provides a mechanism to notify other ism region referrers
> > > > > of content update events.
> > > > >
> > > > > # Usage (SMC as example)
> > > > >
> > > > > Maybe there is one of possible use cases:
> > > > >
> > > > > 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> > > > >    location of a memory region in the PCI space and a token.
> > > > > 2. The ism driver mmap the memory region and return to SMC with the token
> > > > > 3. SMC passes the token to the connected peer
> > > > > 3. the peer calls the ism driver interface ism_attach_region(token) to
> > > > >    get the location of the PCI space of the shared memory
> > > > >
> > > > >
> > > > > # About hot plugging of the ism device
> > > > >
> > > > >    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> > > > >    less scalable operation. So, we don't plan to support it for now.
> > > > >
> > > > > # Comparison with existing technology
> > > > >
> > > > > ## ivshmem or ivshmem 2.0 of Qemu
> > > > >
> > > > >    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> > > > >    use this VM, so the security is not enough.
> > > > >
> > > > >    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> > > > >    other VMs that use the ivshmem 2.0 shared memory device, which also does not
> > > > >    meet our needs in terms of security.
> > > > >
> > > > > ## vhost-pci and virtiovhostuser
> > > > >
> > > > >    Does not support dynamic allocation and therefore not suitable for SMC.
> > > >
> > > > I think this is an implementation issue, we can support VHOST IOTLB
> > > > message then the regions could be added/removed on demand.
> > >
> > >
> > > 1. After the attacker connects with the victim, if the attacker does not
> > >    dereference memory, the memory will be occupied under virtiovhostuser. In the
> > >    case of ism devices, the victim can directly release the reference, and the
> > >    maliciously referenced region only occupies the attacker's resources
> >
> > Let's define the security boundary here. E.g do we trust the device or
> > not? If yes, in the case of virtiovhostuser, can we simple do
> > VHOST_IOTLB_UNMAP then we can safely release the memory from the
> > attacker.
> >
> > >
> > > 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> > >    time, which is a challenge for virtiovhostuser
> >
> > Please elaborate more the the challenges, anything make
> > virtiovhostuser different?
>
> My understanding (please point out any mistakes) is that one vvu device
> corresponds to one vm. If we share memory with 1000 VMs, do we have 1000 vvu
> devices?

There could be some misunderstanding here. With 1000 VMs, you still
need 1000 virtio-ism devices, I think.

>
>
> >
> > >
> > > 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> > >    determines the sharing relationship at startup.
> >
> > Not necessarily with IOTLB API?
>
> Unlike virtio-vhost-user, which shares one vm's memory with another vm, we
> provide the same host memory to both vms. So the implementation of this part is
> much simpler. This is why we gave up on virtio-vhost-user at the beginning.

Ok, just to make sure we're on the same page: at the spec level,
virtio-vhost-user doesn't (and can't) require the backend to be implemented
in another VM. So it should be ok to use it for sharing memory
between a guest and the host.

Thanks

>
> Thanks.
>
>
> >
> > >
> > > 4. For security issues, the device under virtiovhostuser may mmap more memory,
> > >    while ism only maps one region to other devices
> >
> > With VHOST_IOTLB_MAP, the map could be done per region.
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > # Design
> > > > >
> > > > >    This is a structure diagram based on ism sharing between two vms.
> > > > >
> > > > >     |-------------------------------------------------------------------------------------------------------------|
> > > > >     | |------------------------------------------------|       |------------------------------------------------| |
> > > > >     | | Guest                                          |       | Guest                                          | |
> > > > >     | |                                                |       |                                                | |
> > > > >     | |   ----------------                             |       |   ----------------                             | |
> > > > >     | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> > > > >     | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> > > > >     | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> > > > >     | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > >     | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > >     | |                                |               |       |                               |                | |
> > > > >     | |                                |               |       |                               |                | |
> > > > >     | | Qemu                           |               |       | Qemu                          |                | |
> > > > >     | |--------------------------------+---------------|       |-------------------------------+----------------| |
> > > > >     |                                  |                                                       |                  |
> > > > >     |                                  |                                                       |                  |
> > > > >     |                                  |------------------------------+------------------------|                  |
> > > > >     |                                                                 |                                           |
> > > > >     |                                                                 |                                           |
> > > > >     |                                                   --------------------------                                |
> > > > >     |                                                    | M1 |   | M2 |   | M3 |                                 |
> > > > >     |                                                   --------------------------                                |
> > > > >     |                                                                                                             |
> > > > >     | HOST                                                                                                        |
> > > > >     ---------------------------------------------------------------------------------------------------------------
> > > > >
> > > > > # POC code
> > > > >
> > > > >    Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > > > >    Qemu:   https://github.com/fengidri/qemu/commits/ism
> > > > >
> > > > > If there are any problems, please point them out.
> > > > >
> > > > > Hope to hear from you, thank you.
> > > > >
> > > > > [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > > > > [2] https://dl.acm.org/doi/10.1145/2847562
> > > > > [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > > > > [4] https://lwn.net/Articles/711071/
> > > > > [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> > > > >
> > > > >
> > > > > Xuan Zhuo (2):
> > > > >   Reserve device id for ISM device
> > > > >   virtio-ism: introduce new device virtio-ism
> > > > >
> > > > >  content.tex    |   3 +
> > > > >  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > >  2 files changed, 343 insertions(+)
> > > > >  create mode 100644 virtio-ism.tex
> > > > >
> > > > > --
> > > > > 2.32.0.3.g01195cf9f
> > > > >
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > >
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >
> >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  8:01         ` Jason Wang
@ 2022-10-19  8:03           ` Gerry
  2022-10-19  8:14             ` Xuan Zhuo
  2022-10-19  8:21             ` Dust Li
  2022-10-19  8:13           ` Xuan Zhuo
  1 sibling, 2 replies; 61+ messages in thread
From: Gerry @ 2022-10-19  8:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu,
	zhenzao, helinguo, mst, cohuck, Stefan Hajnoczi



> 2022年10月19日 16:01,Jason Wang <jasowang@redhat.com> 写道:
> 
> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>> 
>> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>> 
>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>> Adding Stefan.
>>>>> 
>>>>> 
>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>> 
>>>>>> Hello everyone,
>>>>>> 
>>>>>> # Background
>>>>>> 
>>>>>> Nowadays, there is a common scenario to accelerate communication between
>>>>>> different VMs and containers, including light weight virtual machine based
>>>>>> containers. One way to achieve this is to colocate them on the same host.
>>>>>> However, the performance of inter-VM communication through network stack is not
>>>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
>>>>>> many times, but still no generic solution available [1] [2] [3].
>>>>>> 
>>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
>>>>>> We found that by changing the communication channel between VMs from TCP to SMC
>>>>>> with shared memory, we can achieve superior performance for a common
>>>>>> socket-based application[5]:
>>>>>>  - latency reduced by about 50%
>>>>>>  - throughput increased by about 300%
>>>>>>  - CPU consumption reduced by about 50%
>>>>>> 
>>>>>> Since there is no particularly suitable shared memory management solution
>>>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
>>>>>> is the standard for communication in the virtualization world, we want to
>>>>>> implement a virtio-ism device based on virtio, which can support on-demand
>>>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
>>>>>> the virtio-ism device need to support:
>>>>>> 
>>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
>>>>>>   provisioned.
>>>>>> 2. Multi-region management: the shared memory is divided into regions,
>>>>>>   and a peer may allocate one or more regions from the same shared memory
>>>>>>   device.
>>>>>> 3. Permission control: The permission of each region can be set seperately.
>>>>> 
>>>>> Looks like virtio-ROCE
>>>>> 
>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
>>>>> 
>>>>> and virtio-vhost-user can satisfy the requirement?
>>>>> 
>>>>>> 
>>>>>> # Virtio ism device
>>>>>> 
>>>>>> ISM devices provide the ability to share memory between different guests on a
>>>>>> host. A guest's memory got from ism device can be shared with multiple peers at
>>>>>> the same time. This shared relationship can be dynamically created and released.
>>>>>> 
>>>>>> The shared memory obtained from the device is divided into multiple ism regions
>>>>>> for share. ISM device provides a mechanism to notify other ism region referrers
>>>>>> of content update events.
>>>>>> 
>>>>>> # Usage (SMC as example)
>>>>>> 
>>>>>> Maybe there is one of possible use cases:
>>>>>> 
>>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>>>>>>   location of a memory region in the PCI space and a token.
>>>>>> 2. The ism driver mmap the memory region and return to SMC with the token
>>>>>> 3. SMC passes the token to the connected peer
>>>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
>>>>>>   get the location of the PCI space of the shared memory
>>>>>> 
>>>>>> 
>>>>>> # About hot plugging of the ism device
>>>>>> 
>>>>>>   Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>>>>>>   less scalable operation. So, we don't plan to support it for now.
>>>>>> 
>>>>>> # Comparison with existing technology
>>>>>> 
>>>>>> ## ivshmem or ivshmem 2.0 of Qemu
>>>>>> 
>>>>>>   1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>>>>>>   use this VM, so the security is not enough.
>>>>>> 
>>>>>>   2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>>>>>>   other VMs that use the ivshmem 2.0 shared memory device, which also does not
>>>>>>   meet our needs in terms of security.
>>>>>> 
>>>>>> ## vhost-pci and virtiovhostuser
>>>>>> 
>>>>>>   Does not support dynamic allocation and therefore not suitable for SMC.
>>>>> 
>>>>> I think this is an implementation issue, we can support VHOST IOTLB
>>>>> message then the regions could be added/removed on demand.
>>>> 
>>>> 
>>>> 1. After the attacker connects with the victim, if the attacker does not
>>>>   dereference memory, the memory will be occupied under virtiovhostuser. In the
>>>>   case of ism devices, the victim can directly release the reference, and the
>>>>   maliciously referenced region only occupies the attacker's resources
>>> 
>>> Let's define the security boundary here. E.g do we trust the device or
>>> not? If yes, in the case of virtiovhostuser, can we simple do
>>> VHOST_IOTLB_UNMAP then we can safely release the memory from the
>>> attacker.
>>> 
>>>> 
>>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
>>>>   time, which is a challenge for virtiovhostuser
>>> 
>>> Please elaborate more the the challenges, anything make
>>> virtiovhostuser different?
>> 
>> My understanding (please point out any mistakes) is that one vvu device
>> corresponds to one vm. If we share memory with 1000 VMs, do we have 1000 vvu
>> devices?
> 
> There could be some misunderstanding here. With 1000 VMs, you still
> need 1000 virtio-ism devices, I think.
We are trying to achieve one virtio-ism device per vm instead of one virtio-ism device per SMC connection.

> 
>> 
>> 
>>> 
>>>> 
>>>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
>>>>   determines the sharing relationship at startup.
>>> 
>>> Not necessarily with IOTLB API?
>> 
>> Unlike virtio-vhost-user, which shares one vm's memory with another vm, we
>> provide the same host memory to both vms. So the implementation of this part is
>> much simpler. This is why we gave up on virtio-vhost-user at the beginning.
> 
> Ok, just to make sure we're on the same page: at the spec level,
> virtio-vhost-user doesn't (and can't) require the backend to be implemented
> in another VM. So it should be ok to use it for sharing memory
> between a guest and the host.
> 
> Thanks
> 
>> 
>> Thanks.
>> 
>> 
>>> 
>>>> 
>>>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
>>>>   while ism only maps one region to other devices
>>> 
>>> With VHOST_IOTLB_MAP, the map could be done per region.
>>> 
>>> Thanks
>>> 
>>>> 
>>>> Thanks.
>>>> 
>>>>> 
>>>>> Thanks
>>>>> 
>>>>>> 
>>>>>> # Design
>>>>>> 
>>>>>>   This is a structure diagram based on ism sharing between two vms.
>>>>>> 
>>>>>>    |-------------------------------------------------------------------------------------------------------------|
>>>>>>    | |------------------------------------------------|       |------------------------------------------------| |
>>>>>>    | | Guest                                          |       | Guest                                          | |
>>>>>>    | |                                                |       |                                                | |
>>>>>>    | |   ----------------                             |       |   ----------------                             | |
>>>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
>>>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
>>>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
>>>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>>>    | |                                |               |       |                               |                | |
>>>>>>    | |                                |               |       |                               |                | |
>>>>>>    | | Qemu                           |               |       | Qemu                          |                | |
>>>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
>>>>>>    |                                  |                                                       |                  |
>>>>>>    |                                  |                                                       |                  |
>>>>>>    |                                  |------------------------------+------------------------|                  |
>>>>>>    |                                                                 |                                           |
>>>>>>    |                                                                 |                                           |
>>>>>>    |                                                   --------------------------                                |
>>>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
>>>>>>    |                                                   --------------------------                                |
>>>>>>    |                                                                                                             |
>>>>>>    | HOST                                                                                                        |
>>>>>>    ---------------------------------------------------------------------------------------------------------------
>>>>>> 
>>>>>> # POC code
>>>>>> 
>>>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>>>>>>   Qemu:   https://github.com/fengidri/qemu/commits/ism
>>>>>> 
>>>>>> If there are any problems, please point them out.
>>>>>> 
>>>>>> Hope to hear from you, thank you.
>>>>>> 
>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>>>>>> [4] https://lwn.net/Articles/711071/
>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>>>>>> 
>>>>>> 
>>>>>> Xuan Zhuo (2):
>>>>>>  Reserve device id for ISM device
>>>>>>  virtio-ism: introduce new device virtio-ism
>>>>>> 
>>>>>> content.tex    |   3 +
>>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>> 2 files changed, 343 insertions(+)
>>>>>> create mode 100644 virtio-ism.tex
>>>>>> 
>>>>>> --
>>>>>> 2.32.0.3.g01195cf9f
>>>>>> 
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>>> 
>>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>> 
>>> 
>> 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  6:02           ` Xuan Zhuo
@ 2022-10-19  8:07             ` Tony Lu
  2022-10-19  9:04               ` Jason Wang
  0 siblings, 1 reply; 61+ messages in thread
From: Tony Lu @ 2022-10-19  8:07 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, dust.li,
	zhenzao, helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, Oct 19, 2022 at 02:02:21PM +0800, Xuan Zhuo wrote:
> On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > >
> > > > > Hi Jason,
> > > > >
> > > > > I think there may be some problems with the direction we are discussing.
> > > >
> > > > Probably not.
> > > >
> > > > As far as we are focusing on technology, there's nothing wrong from my
> > > > perspective. And this is how the community works. Your idea needs to
> > > > be justified and people are free to raise any technical questions
> > > > especially considering you've posted a spec change with prototype
> > > > codes but not only the idea.
> > > >
> > > > > Our
> > > > > goal is to add an new ism device. As far as the spec is concerned, we are not
> > > > > concerned with the implementation of the backend.
> > > > >
> > > > > The direction we should discuss is what is the difference between the ism device
> > > > > and other devices such as virtio-net, and whether it is necessary to introduce
> > > > > this new device.
> > > >
> > > > This is somehow what I want to ask, actually it's not a comparison
> > > > with virtio-net but:
> > > >
> > > > - virtio-roce
> > > > - virtio-vhost-user
> > > > - virtio-(p)mem
> > > >
> > > > or whether we can simply add features to those devices to achieve what
> > > > you want to do here.
> > >
> > >
> > > Yes, this is my priority to discuss.
> > >
> > > At the moment, I think the most similar to ism is the Vhost-user Device Backend
> > > of virtio-vhost-user.
> > >
> > > My understanding of it is to map any virtio device to another vm as a vvu
> > > device.
> >
> > Yes, so a possible way is to have a device with memory zone/region
> > provision and management then map it via virtio-vhost-user.
> 
> 
> Yes, that is a possibility. virtio-vhost-user makes me think that what the two
> devices could share is the implementation of the map operation.
> 
> But providing the interface to the upper layer inside the VM is, I think, the
> job of the ism device.
> 
> One of the reasons I didn't use virtio-vhost-user directly is that the guest in
> the other VM can operate the vvu device, whereas we want both sides to see an
> equal ism device.
> 
> So I want to agree on one question first: who provides the upper layer with the
> ability to share a memory region?
> 
> Our answer is a new ism device. How that device achieves the memory sharing is,
> I think, the second question.
> 
> 
> >
> > >
> > > From this design purpose, I think the two are different.
> > >
> > > Of course, you might want to extend it, it does have some similarities and uses
> > > a lot of similar techniques.
> >
> > I don't have any preference so far. If you think your idea makes more
> > sense, then try your best to justify it in the list.
> >
> > > So we can really discuss in this direction, whether
> > > the vvu device can be extended to achieve the purpose of ism, or whether the
> > > design goals can be agreed.
> >
> > I've added Stefan in the loop, let's hear from him.
> >
> > >
> > > Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
> > > Should device/driver APIs remain independent?
> >
> > Btw, you mentioned that one possible user of ism is the smc, but I
> > don't see how it connects to that with your prototype driver.
> 
> Yes, we originally had plans for that, but this posting only targeted the virtio
> spec, so that part was not included. Maybe we should have included it. @Tony
> 
> A brief introduction: SMC currently has a corresponding driver in
> s390/net/ism_drv.c, and we will replace that driver in the virtualization
> scenario.
> 
> Thanks.
> 

SMC is a network protocol that is modeled around shared memory rather than
packets. The basic interfaces required of an SMC device are:

  - alloc / free memory region: each connection peer dynamically gets two
    memory regions, used as the sending and receiving ring buffers.
  - attach / detach memory region: the remote side attaches the locally
    allocated sending region as its receiving region, and vice versa.
  - notify: tell the peer to read data and update its cursor.

Then the device can be registered as an SMC ISM device. Of course, SMC itself
also needs some modification to adapt to it.
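
To make that concrete, here is a sketch of the ops such a device would have to
provide, mirroring the three groups above. It is illustrative only: this is
neither the existing s390 smcd_ops nor the final virtio-ism driver API, and the
names and signatures are assumptions.

    /* Hypothetical ops a virtio-ism backed SMC ISM device could register.
     * Names and signatures mirror the list above and are assumptions. */
    #include <stddef.h>
    #include <stdint.h>

    struct ism_smc_ops {
            /* alloc / free: sending and receiving ring buffers per connection */
            int  (*alloc_region)(size_t size, uint64_t *token, void **vaddr);
            void (*free_region)(uint64_t token);

            /* attach / detach: map the region the remote peer allocated */
            int  (*attach_region)(uint64_t token, void **vaddr);
            void (*detach_region)(uint64_t token);

            /* notify: tell the peer to read data and update its cursor */
            int  (*notify)(uint64_t token);
    };

In a guest, SMC would call these through its ISM device abstraction instead of
through s390/net/ism_drv.c.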

Cheers,
Tony Lu

> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > > How to share the backend with other deivce is another problem.
> > > >
> > > > Yes, anything that is used for your virito-ism prototype can be used
> > > > for other devices.
> > > >
> > > > >
> > > > > Our goal is to dynamically obtain a piece of memory to share with other vms.
> > > >
> > > > So at this level, I don't see the exact difference compared to
> > > > virtio-vhost-user. Let's just focus on the API that carries on the
> > > > semantic:
> > > >
> > > > - map/unmap
> > > > - permission update
> > > >
> > > > The only missing piece is the per region notification.
> > > >
> > > > >
> > > > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > > > > it will use it as a ring. Of course, we also need a notify mechanism.
> > > > >
> > > > > That's what we're aiming for, so we should first discuss whether this
> > > > > requirement is reasonable.
> > > >
> > > > So unless somebody said "no", it is fine until now.
> > > >
> > > > > I think it's a feature currently not supported by
> > > > > other devices specified by the current virtio spce.
> > > >
> > > > Probably, but we've already had rfcs for roce and vhost-user.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > >
> > >
> >


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  8:01         ` Jason Wang
  2022-10-19  8:03           ` Gerry
@ 2022-10-19  8:13           ` Xuan Zhuo
  2022-10-19  8:15             ` Xuan Zhuo
  1 sibling, 1 reply; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-19  8:13 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, 19 Oct 2022 16:01:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > Adding Stefan.
> > > > >
> > > > >
> > > > > On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > Hello everyone,
> > > > > >
> > > > > > # Background
> > > > > >
> > > > > > Nowadays, there is a common scenario to accelerate communication between
> > > > > > different VMs and containers, including light weight virtual machine based
> > > > > > containers. One way to achieve this is to colocate them on the same host.
> > > > > > However, the performance of inter-VM communication through network stack is not
> > > > > > optimal and may also waste extra CPU cycles. This scenario has been discussed
> > > > > > many times, but still no generic solution available [1] [2] [3].
> > > > > >
> > > > > > With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > > > > > We found that by changing the communication channel between VMs from TCP to SMC
> > > > > > with shared memory, we can achieve superior performance for a common
> > > > > > socket-based application[5]:
> > > > > >   - latency reduced by about 50%
> > > > > >   - throughput increased by about 300%
> > > > > >   - CPU consumption reduced by about 50%
> > > > > >
> > > > > > Since there is no particularly suitable shared memory management solution
> > > > > > matches the need for SMC(See ## Comparison with existing technology), and virtio
> > > > > > is the standard for communication in the virtualization world, we want to
> > > > > > implement a virtio-ism device based on virtio, which can support on-demand
> > > > > > memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > > > > > the virtio-ism device need to support:
> > > > > >
> > > > > > 1. Dynamic provision: shared memory regions are dynamically allocated and
> > > > > >    provisioned.
> > > > > > 2. Multi-region management: the shared memory is divided into regions,
> > > > > >    and a peer may allocate one or more regions from the same shared memory
> > > > > >    device.
> > > > > > 3. Permission control: The permission of each region can be set seperately.
> > > > >
> > > > > Looks like virtio-ROCE
> > > > >
> > > > > https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> > > > >
> > > > > and virtio-vhost-user can satisfy the requirement?
> > > > >
> > > > > >
> > > > > > # Virtio ism device
> > > > > >
> > > > > > ISM devices provide the ability to share memory between different guests on a
> > > > > > host. A guest's memory got from ism device can be shared with multiple peers at
> > > > > > the same time. This shared relationship can be dynamically created and released.
> > > > > >
> > > > > > The shared memory obtained from the device is divided into multiple ism regions
> > > > > > for share. ISM device provides a mechanism to notify other ism region referrers
> > > > > > of content update events.
> > > > > >
> > > > > > # Usage (SMC as example)
> > > > > >
> > > > > > Maybe there is one of possible use cases:
> > > > > >
> > > > > > 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> > > > > >    location of a memory region in the PCI space and a token.
> > > > > > 2. The ism driver mmap the memory region and return to SMC with the token
> > > > > > 3. SMC passes the token to the connected peer
> > > > > > 3. the peer calls the ism driver interface ism_attach_region(token) to
> > > > > >    get the location of the PCI space of the shared memory
> > > > > >
> > > > > >
> > > > > > # About hot plugging of the ism device
> > > > > >
> > > > > >    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> > > > > >    less scalable operation. So, we don't plan to support it for now.
> > > > > >
> > > > > > # Comparison with existing technology
> > > > > >
> > > > > > ## ivshmem or ivshmem 2.0 of Qemu
> > > > > >
> > > > > >    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> > > > > >    use this VM, so the security is not enough.
> > > > > >
> > > > > >    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> > > > > >    other VMs that use the ivshmem 2.0 shared memory device, which also does not
> > > > > >    meet our needs in terms of security.
> > > > > >
> > > > > > ## vhost-pci and virtiovhostuser
> > > > > >
> > > > > >    Does not support dynamic allocation and therefore not suitable for SMC.
> > > > >
> > > > > I think this is an implementation issue, we can support VHOST IOTLB
> > > > > message then the regions could be added/removed on demand.
> > > >
> > > >
> > > > 1. After the attacker connects with the victim, if the attacker does not
> > > >    dereference memory, the memory will be occupied under virtiovhostuser. In the
> > > >    case of ism devices, the victim can directly release the reference, and the
> > > >    maliciously referenced region only occupies the attacker's resources
> > >
> > > Let's define the security boundary here. E.g do we trust the device or
> > > not? If yes, in the case of virtiovhostuser, can we simple do
> > > VHOST_IOTLB_UNMAP then we can safely release the memory from the
> > > attacker.
> > >
> > > >
> > > > 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> > > >    time, which is a challenge for virtiovhostuser
> > >
> > > Please elaborate more the the challenges, anything make
> > > virtiovhostuser different?
> >
> > I understand (please point out any mistakes), one vvu device corresponds to one
> > vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
>
> There could be some misunderstanding here. With 1000 VMs, you still
> need 1000 virtio-ism devices I think.

No, only one virtio-ism device is needed.

Thanks.

>
> >
> >
> > >
> > > >
> > > > 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> > > >    determines the sharing relationship at startup.
> > >
> > > Not necessarily with IOTLB API?
> >
> > Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
> > provide the same memory on the host to two vms. So the implementation of this
> > part will be much simpler. This is why we gave up virtio-vhost-user at the
> > beginning.
>
> Ok, just to make sure we're at the same page. From spec level,
> virtio-vhost-user doesn't (can't) limit the backend to be implemented
> in another VM. So it should be ok to be used for sharing memory
> between a guest and host.
>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > >
> > > > 4. For security issues, the device under virtiovhostuser may mmap more memory,
> > > >    while ism only maps one region to other devices
> > >
> > > With VHOST_IOTLB_MAP, the map could be done per region.
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > # Design
> > > > > >
> > > > > >    This is a structure diagram based on ism sharing between two vms.
> > > > > >
> > > > > >     |-------------------------------------------------------------------------------------------------------------|
> > > > > >     | |------------------------------------------------|       |------------------------------------------------| |
> > > > > >     | | Guest                                          |       | Guest                                          | |
> > > > > >     | |                                                |       |                                                | |
> > > > > >     | |   ----------------                             |       |   ----------------                             | |
> > > > > >     | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> > > > > >     | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> > > > > >     | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> > > > > >     | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> > > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > >     | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> > > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > >     | |                                |               |       |                               |                | |
> > > > > >     | |                                |               |       |                               |                | |
> > > > > >     | | Qemu                           |               |       | Qemu                          |                | |
> > > > > >     | |--------------------------------+---------------|       |-------------------------------+----------------| |
> > > > > >     |                                  |                                                       |                  |
> > > > > >     |                                  |                                                       |                  |
> > > > > >     |                                  |------------------------------+------------------------|                  |
> > > > > >     |                                                                 |                                           |
> > > > > >     |                                                                 |                                           |
> > > > > >     |                                                   --------------------------                                |
> > > > > >     |                                                    | M1 |   | M2 |   | M3 |                                 |
> > > > > >     |                                                   --------------------------                                |
> > > > > >     |                                                                                                             |
> > > > > >     | HOST                                                                                                        |
> > > > > >     ---------------------------------------------------------------------------------------------------------------
> > > > > >
> > > > > > # POC code
> > > > > >
> > > > > >    Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > > > > >    Qemu:   https://github.com/fengidri/qemu/commits/ism
> > > > > >
> > > > > > If there are any problems, please point them out.
> > > > > >
> > > > > > Hope to hear from you, thank you.
> > > > > >
> > > > > > [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > > > > > [2] https://dl.acm.org/doi/10.1145/2847562
> > > > > > [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > > > > > [4] https://lwn.net/Articles/711071/
> > > > > > [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> > > > > >
> > > > > >
> > > > > > Xuan Zhuo (2):
> > > > > >   Reserve device id for ISM device
> > > > > >   virtio-ism: introduce new device virtio-ism
> > > > > >
> > > > > >  content.tex    |   3 +
> > > > > >  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > >  2 files changed, 343 insertions(+)
> > > > > >  create mode 100644 virtio-ism.tex
> > > > > >
> > > > > > --
> > > > > > 2.32.0.3.g01195cf9f
> > > > > >
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > >
> > > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  8:03           ` Gerry
@ 2022-10-19  8:14             ` Xuan Zhuo
  2022-10-19  8:21             ` Dust Li
  1 sibling, 0 replies; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-19  8:14 UTC (permalink / raw)
  To: Gerry
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, mst, cohuck, Stefan Hajnoczi, Jason Wang

On Wed, 19 Oct 2022 16:03:42 +0800, Gerry <gerry@linux.alibaba.com> wrote:
>
>
> > On Oct 19, 2022, at 16:01, Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>
> >> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>>>
> >>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >>>>> Adding Stefan.
> >>>>>
> >>>>>
> >>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>>>>>
> >>>>>> Hello everyone,
> >>>>>>
> >>>>>> # Background
> >>>>>>
> >>>>>> Nowadays, there is a common scenario to accelerate communication between
> >>>>>> different VMs and containers, including light weight virtual machine based
> >>>>>> containers. One way to achieve this is to colocate them on the same host.
> >>>>>> However, the performance of inter-VM communication through network stack is not
> >>>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
> >>>>>> many times, but still no generic solution available [1] [2] [3].
> >>>>>>
> >>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> >>>>>> We found that by changing the communication channel between VMs from TCP to SMC
> >>>>>> with shared memory, we can achieve superior performance for a common
> >>>>>> socket-based application[5]:
> >>>>>>  - latency reduced by about 50%
> >>>>>>  - throughput increased by about 300%
> >>>>>>  - CPU consumption reduced by about 50%
> >>>>>>
> >>>>>> Since there is no particularly suitable shared memory management solution
> >>>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
> >>>>>> is the standard for communication in the virtualization world, we want to
> >>>>>> implement a virtio-ism device based on virtio, which can support on-demand
> >>>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> >>>>>> the virtio-ism device need to support:
> >>>>>>
> >>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
> >>>>>>   provisioned.
> >>>>>> 2. Multi-region management: the shared memory is divided into regions,
> >>>>>>   and a peer may allocate one or more regions from the same shared memory
> >>>>>>   device.
> >>>>>> 3. Permission control: The permission of each region can be set seperately.
> >>>>>
> >>>>> Looks like virtio-ROCE
> >>>>>
> >>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> >>>>>
> >>>>> and virtio-vhost-user can satisfy the requirement?
> >>>>>
> >>>>>>
> >>>>>> # Virtio ism device
> >>>>>>
> >>>>>> ISM devices provide the ability to share memory between different guests on a
> >>>>>> host. A guest's memory got from ism device can be shared with multiple peers at
> >>>>>> the same time. This shared relationship can be dynamically created and released.
> >>>>>>
> >>>>>> The shared memory obtained from the device is divided into multiple ism regions
> >>>>>> for share. ISM device provides a mechanism to notify other ism region referrers
> >>>>>> of content update events.
> >>>>>>
> >>>>>> # Usage (SMC as example)
> >>>>>>
> >>>>>> Maybe there is one of possible use cases:
> >>>>>>
> >>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> >>>>>>   location of a memory region in the PCI space and a token.
> >>>>>> 2. The ism driver mmap the memory region and return to SMC with the token
> >>>>>> 3. SMC passes the token to the connected peer
> >>>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
> >>>>>>   get the location of the PCI space of the shared memory
> >>>>>>
> >>>>>>
> >>>>>> # About hot plugging of the ism device
> >>>>>>
> >>>>>>   Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> >>>>>>   less scalable operation. So, we don't plan to support it for now.
> >>>>>>
> >>>>>> # Comparison with existing technology
> >>>>>>
> >>>>>> ## ivshmem or ivshmem 2.0 of Qemu
> >>>>>>
> >>>>>>   1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> >>>>>>   use this VM, so the security is not enough.
> >>>>>>
> >>>>>>   2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> >>>>>>   other VMs that use the ivshmem 2.0 shared memory device, which also does not
> >>>>>>   meet our needs in terms of security.
> >>>>>>
> >>>>>> ## vhost-pci and virtiovhostuser
> >>>>>>
> >>>>>>   Does not support dynamic allocation and therefore not suitable for SMC.
> >>>>>
> >>>>> I think this is an implementation issue, we can support VHOST IOTLB
> >>>>> message then the regions could be added/removed on demand.
> >>>>
> >>>>
> >>>> 1. After the attacker connects with the victim, if the attacker does not
> >>>>   dereference memory, the memory will be occupied under virtiovhostuser. In the
> >>>>   case of ism devices, the victim can directly release the reference, and the
> >>>>   maliciously referenced region only occupies the attacker's resources
> >>>
> >>> Let's define the security boundary here. E.g do we trust the device or
> >>> not? If yes, in the case of virtiovhostuser, can we simple do
> >>> VHOST_IOTLB_UNMAP then we can safely release the memory from the
> >>> attacker.
> >>>
> >>>>
> >>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> >>>>   time, which is a challenge for virtiovhostuser
> >>>
> >>> Please elaborate more the the challenges, anything make
> >>> virtiovhostuser different?
> >>
> >> I understand (please point out any mistakes), one vvu device corresponds to one
> >> vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
> >
> > There could be some misunderstanding here. With 1000 VMs, you still
> > need 1000 virtio-ism devices I think.
> We are trying to achieve one virtio-ism device per vm instead of one virtio-ism device per SMC connection.


That is already the case in the current design.


>
> >
> >>
> >>
> >>>
> >>>>
> >>>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> >>>>   determines the sharing relationship at startup.
> >>>
> >>> Not necessarily with IOTLB API?
> >>
> >> Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
> >> provide the same memory on the host to two vms. So the implementation of this
> >> part will be much simpler. This is why we gave up virtio-vhost-user at the
> >> beginning.
> >
> > Ok, just to make sure we're at the same page. From spec level,
> > virtio-vhost-user doesn't (can't) limit the backend to be implemented
> > in another VM. So it should be ok to be used for sharing memory
> > between a guest and host.
> >
> > Thanks
> >
> >>
> >> Thanks.
> >>
> >>
> >>>
> >>>>
> >>>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
> >>>>   while ism only maps one region to other devices
> >>>
> >>> With VHOST_IOTLB_MAP, the map could be done per region.
> >>>
> >>> Thanks
> >>>
> >>>>
> >>>> Thanks.
> >>>>
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>>>
> >>>>>> # Design
> >>>>>>
> >>>>>>   This is a structure diagram based on ism sharing between two vms.
> >>>>>>
> >>>>>>    |-------------------------------------------------------------------------------------------------------------|
> >>>>>>    | |------------------------------------------------|       |------------------------------------------------| |
> >>>>>>    | | Guest                                          |       | Guest                                          | |
> >>>>>>    | |                                                |       |                                                | |
> >>>>>>    | |   ----------------                             |       |   ----------------                             | |
> >>>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> >>>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> >>>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> >>>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> >>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
> >>>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> >>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
> >>>>>>    | |                                |               |       |                               |                | |
> >>>>>>    | |                                |               |       |                               |                | |
> >>>>>>    | | Qemu                           |               |       | Qemu                          |                | |
> >>>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
> >>>>>>    |                                  |                                                       |                  |
> >>>>>>    |                                  |                                                       |                  |
> >>>>>>    |                                  |------------------------------+------------------------|                  |
> >>>>>>    |                                                                 |                                           |
> >>>>>>    |                                                                 |                                           |
> >>>>>>    |                                                   --------------------------                                |
> >>>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
> >>>>>>    |                                                   --------------------------                                |
> >>>>>>    |                                                                                                             |
> >>>>>>    | HOST                                                                                                        |
> >>>>>>    ---------------------------------------------------------------------------------------------------------------
> >>>>>>
> >>>>>> # POC code
> >>>>>>
> >>>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> >>>>>>   Qemu:   https://github.com/fengidri/qemu/commits/ism
> >>>>>>
> >>>>>> If there are any problems, please point them out.
> >>>>>>
> >>>>>> Hope to hear from you, thank you.
> >>>>>>
> >>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> >>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
> >>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> >>>>>> [4] https://lwn.net/Articles/711071/
> >>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> >>>>>>
> >>>>>>
> >>>>>> Xuan Zhuo (2):
> >>>>>>  Reserve device id for ISM device
> >>>>>>  virtio-ism: introduce new device virtio-ism
> >>>>>>
> >>>>>> content.tex    |   3 +
> >>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>> 2 files changed, 343 insertions(+)
> >>>>>> create mode 100644 virtio-ism.tex
> >>>>>>
> >>>>>> --
> >>>>>> 2.32.0.3.g01195cf9f
> >>>>>>
> >>>>>>
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> >>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >>>>>>
> >>>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> >>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >>>>
> >>>
> >>
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  8:13           ` Xuan Zhuo
@ 2022-10-19  8:15             ` Xuan Zhuo
  2022-10-19  9:11               ` Jason Wang
  0 siblings, 1 reply; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-19  8:15 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi, Jason Wang

On Wed, 19 Oct 2022 16:13:07 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> On Wed, 19 Oct 2022 16:01:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > Adding Stefan.
> > > > > >
> > > > > >
> > > > > > On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > Hello everyone,
> > > > > > >
> > > > > > > # Background
> > > > > > >
> > > > > > > Nowadays, there is a common scenario to accelerate communication between
> > > > > > > different VMs and containers, including light weight virtual machine based
> > > > > > > containers. One way to achieve this is to colocate them on the same host.
> > > > > > > However, the performance of inter-VM communication through network stack is not
> > > > > > > optimal and may also waste extra CPU cycles. This scenario has been discussed
> > > > > > > many times, but still no generic solution available [1] [2] [3].
> > > > > > >
> > > > > > > With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > > > > > > We found that by changing the communication channel between VMs from TCP to SMC
> > > > > > > with shared memory, we can achieve superior performance for a common
> > > > > > > socket-based application[5]:
> > > > > > >   - latency reduced by about 50%
> > > > > > >   - throughput increased by about 300%
> > > > > > >   - CPU consumption reduced by about 50%
> > > > > > >
> > > > > > > Since there is no particularly suitable shared memory management solution
> > > > > > > matches the need for SMC(See ## Comparison with existing technology), and virtio
> > > > > > > is the standard for communication in the virtualization world, we want to
> > > > > > > implement a virtio-ism device based on virtio, which can support on-demand
> > > > > > > memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > > > > > > the virtio-ism device need to support:
> > > > > > >
> > > > > > > 1. Dynamic provision: shared memory regions are dynamically allocated and
> > > > > > >    provisioned.
> > > > > > > 2. Multi-region management: the shared memory is divided into regions,
> > > > > > >    and a peer may allocate one or more regions from the same shared memory
> > > > > > >    device.
> > > > > > > 3. Permission control: The permission of each region can be set seperately.
> > > > > >
> > > > > > Looks like virtio-ROCE
> > > > > >
> > > > > > https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> > > > > >
> > > > > > and virtio-vhost-user can satisfy the requirement?
> > > > > >
> > > > > > >
> > > > > > > # Virtio ism device
> > > > > > >
> > > > > > > ISM devices provide the ability to share memory between different guests on a
> > > > > > > host. A guest's memory got from ism device can be shared with multiple peers at
> > > > > > > the same time. This shared relationship can be dynamically created and released.
> > > > > > >
> > > > > > > The shared memory obtained from the device is divided into multiple ism regions
> > > > > > > for share. ISM device provides a mechanism to notify other ism region referrers
> > > > > > > of content update events.
> > > > > > >
> > > > > > > # Usage (SMC as example)
> > > > > > >
> > > > > > > Maybe there is one of possible use cases:
> > > > > > >
> > > > > > > 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> > > > > > >    location of a memory region in the PCI space and a token.
> > > > > > > 2. The ism driver mmap the memory region and return to SMC with the token
> > > > > > > 3. SMC passes the token to the connected peer
> > > > > > > 3. the peer calls the ism driver interface ism_attach_region(token) to
> > > > > > >    get the location of the PCI space of the shared memory
> > > > > > >
> > > > > > >
> > > > > > > # About hot plugging of the ism device
> > > > > > >
> > > > > > >    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> > > > > > >    less scalable operation. So, we don't plan to support it for now.
> > > > > > >
> > > > > > > # Comparison with existing technology
> > > > > > >
> > > > > > > ## ivshmem or ivshmem 2.0 of Qemu
> > > > > > >
> > > > > > >    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> > > > > > >    use this VM, so the security is not enough.
> > > > > > >
> > > > > > >    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> > > > > > >    other VMs that use the ivshmem 2.0 shared memory device, which also does not
> > > > > > >    meet our needs in terms of security.
> > > > > > >
> > > > > > > ## vhost-pci and virtiovhostuser
> > > > > > >
> > > > > > >    Does not support dynamic allocation and therefore not suitable for SMC.
> > > > > >
> > > > > > I think this is an implementation issue, we can support VHOST IOTLB
> > > > > > message then the regions could be added/removed on demand.
> > > > >
> > > > >
> > > > > 1. After the attacker connects with the victim, if the attacker does not
> > > > >    dereference memory, the memory will be occupied under virtiovhostuser. In the
> > > > >    case of ism devices, the victim can directly release the reference, and the
> > > > >    maliciously referenced region only occupies the attacker's resources
> > > >
> > > > Let's define the security boundary here. E.g do we trust the device or
> > > > not? If yes, in the case of virtiovhostuser, can we simple do
> > > > VHOST_IOTLB_UNMAP then we can safely release the memory from the
> > > > attacker.
> > > >
> > > > >
> > > > > 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> > > > >    time, which is a challenge for virtiovhostuser
> > > >
> > > > Please elaborate more the the challenges, anything make
> > > > virtiovhostuser different?
> > >
> > > I understand (please point out any mistakes), one vvu device corresponds to one
> > > vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
> >
> > There could be some misunderstanding here. With 1000 VMs, you still
> > need 1000 virtio-ism devices I think.
>
> No, only one virtio-ism device is needed.

For example, if the memory of a virtio-ism device is 1G and an ism region is
1M, there are about 1000 ism regions, and these regions can be shared with
different vms.

And the sharing is dynamic: after an ism region has been shared with one vm,
it can later be shared with other vms as well.
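
Putting rough numbers on that (the sizes are purely illustrative example
values, not spec requirements):

/* Example sizes only: 1G of device memory split into 1M ism regions. */
#define EXAMPLE_ISM_DEV_MEM_SIZE   (1ULL << 30)  /* 1G device memory (assumed)  */
#define EXAMPLE_ISM_REGION_SIZE    (1ULL << 20)  /* 1M per ism region (assumed) */
/* Number of regions the device can hand out: 1024 in this example. */
#define EXAMPLE_ISM_NR_REGIONS \
        (EXAMPLE_ISM_DEV_MEM_SIZE / EXAMPLE_ISM_REGION_SIZE)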

Thanks.

>
> Thanks.
>
> >
> > >
> > >
> > > >
> > > > >
> > > > > 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> > > > >    determines the sharing relationship at startup.
> > > >
> > > > Not necessarily with IOTLB API?
> > >
> > > Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
> > > provide the same memory on the host to two vms. So the implementation of this
> > > part will be much simpler. This is why we gave up virtio-vhost-user at the
> > > beginning.
> >
> > Ok, just to make sure we're at the same page. From spec level,
> > virtio-vhost-user doesn't (can't) limit the backend to be implemented
> > in another VM. So it should be ok to be used for sharing memory
> > between a guest and host.
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > >
> > > > > 4. For security issues, the device under virtiovhostuser may mmap more memory,
> > > > >    while ism only maps one region to other devices
> > > >
> > > > With VHOST_IOTLB_MAP, the map could be done per region.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > # Design
> > > > > > >
> > > > > > >    This is a structure diagram based on ism sharing between two vms.
> > > > > > >
> > > > > > >     |-------------------------------------------------------------------------------------------------------------|
> > > > > > >     | |------------------------------------------------|       |------------------------------------------------| |
> > > > > > >     | | Guest                                          |       | Guest                                          | |
> > > > > > >     | |                                                |       |                                                | |
> > > > > > >     | |   ----------------                             |       |   ----------------                             | |
> > > > > > >     | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> > > > > > >     | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> > > > > > >     | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> > > > > > >     | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> > > > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > > >     | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> > > > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > > >     | |                                |               |       |                               |                | |
> > > > > > >     | |                                |               |       |                               |                | |
> > > > > > >     | | Qemu                           |               |       | Qemu                          |                | |
> > > > > > >     | |--------------------------------+---------------|       |-------------------------------+----------------| |
> > > > > > >     |                                  |                                                       |                  |
> > > > > > >     |                                  |                                                       |                  |
> > > > > > >     |                                  |------------------------------+------------------------|                  |
> > > > > > >     |                                                                 |                                           |
> > > > > > >     |                                                                 |                                           |
> > > > > > >     |                                                   --------------------------                                |
> > > > > > >     |                                                    | M1 |   | M2 |   | M3 |                                 |
> > > > > > >     |                                                   --------------------------                                |
> > > > > > >     |                                                                                                             |
> > > > > > >     | HOST                                                                                                        |
> > > > > > >     ---------------------------------------------------------------------------------------------------------------
> > > > > > >
> > > > > > > # POC code
> > > > > > >
> > > > > > >    Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > > > > > >    Qemu:   https://github.com/fengidri/qemu/commits/ism
> > > > > > >
> > > > > > > If there are any problems, please point them out.
> > > > > > >
> > > > > > > Hope to hear from you, thank you.
> > > > > > >
> > > > > > > [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > > > > > > [2] https://dl.acm.org/doi/10.1145/2847562
> > > > > > > [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > > > > > > [4] https://lwn.net/Articles/711071/
> > > > > > > [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> > > > > > >
> > > > > > >
> > > > > > > Xuan Zhuo (2):
> > > > > > >   Reserve device id for ISM device
> > > > > > >   virtio-ism: introduce new device virtio-ism
> > > > > > >
> > > > > > >  content.tex    |   3 +
> > > > > > >  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > >  2 files changed, 343 insertions(+)
> > > > > > >  create mode 100644 virtio-ism.tex
> > > > > > >
> > > > > > > --
> > > > > > > 2.32.0.3.g01195cf9f
> > > > > > >
> > > > > > >
> > > > > > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > > >
> > > > > >
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > >
> > > >
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  8:03           ` Gerry
  2022-10-19  8:14             ` Xuan Zhuo
@ 2022-10-19  8:21             ` Dust Li
  2022-10-19  9:08               ` Jason Wang
  1 sibling, 1 reply; 61+ messages in thread
From: Dust Li @ 2022-10-19  8:21 UTC (permalink / raw)
  To: Gerry, Jason Wang
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, tonylu, zhenzao,
	helinguo, mst, cohuck, Stefan Hajnoczi

On Wed, Oct 19, 2022 at 04:03:42PM +0800, Gerry wrote:
>
>
>> On Oct 19, 2022, at 16:01, Jason Wang <jasowang@redhat.com> wrote:
>> 
>> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>> 
>>> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>> 
>>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>> Adding Stefan.
>>>>>> 
>>>>>> 
>>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>> 
>>>>>>> Hello everyone,
>>>>>>> 
>>>>>>> # Background
>>>>>>> 
>>>>>>> Nowadays, there is a common scenario to accelerate communication between
>>>>>>> different VMs and containers, including light weight virtual machine based
>>>>>>> containers. One way to achieve this is to colocate them on the same host.
>>>>>>> However, the performance of inter-VM communication through network stack is not
>>>>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
>>>>>>> many times, but still no generic solution available [1] [2] [3].
>>>>>>> 
>>>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
>>>>>>> We found that by changing the communication channel between VMs from TCP to SMC
>>>>>>> with shared memory, we can achieve superior performance for a common
>>>>>>> socket-based application[5]:
>>>>>>>  - latency reduced by about 50%
>>>>>>>  - throughput increased by about 300%
>>>>>>>  - CPU consumption reduced by about 50%
>>>>>>> 
>>>>>>> Since there is no particularly suitable shared memory management solution
>>>>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
>>>>>>> is the standard for communication in the virtualization world, we want to
>>>>>>> implement a virtio-ism device based on virtio, which can support on-demand
>>>>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
>>>>>>> the virtio-ism device need to support:
>>>>>>> 
>>>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
>>>>>>>   provisioned.
>>>>>>> 2. Multi-region management: the shared memory is divided into regions,
>>>>>>>   and a peer may allocate one or more regions from the same shared memory
>>>>>>>   device.
>>>>>>> 3. Permission control: The permission of each region can be set seperately.
>>>>>> 
>>>>>> Looks like virtio-ROCE
>>>>>> 
>>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
>>>>>> 
>>>>>> and virtio-vhost-user can satisfy the requirement?
>>>>>> 
>>>>>>> 
>>>>>>> # Virtio ism device
>>>>>>> 
>>>>>>> ISM devices provide the ability to share memory between different guests on a
>>>>>>> host. A guest's memory got from ism device can be shared with multiple peers at
>>>>>>> the same time. This shared relationship can be dynamically created and released.
>>>>>>> 
>>>>>>> The shared memory obtained from the device is divided into multiple ism regions
>>>>>>> for share. ISM device provides a mechanism to notify other ism region referrers
>>>>>>> of content update events.
>>>>>>> 
>>>>>>> # Usage (SMC as example)
>>>>>>> 
>>>>>>> Maybe there is one of possible use cases:
>>>>>>> 
>>>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>>>>>>>   location of a memory region in the PCI space and a token.
>>>>>>> 2. The ism driver mmap the memory region and return to SMC with the token
>>>>>>> 3. SMC passes the token to the connected peer
>>>>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
>>>>>>>   get the location of the PCI space of the shared memory
>>>>>>> 
>>>>>>> 
>>>>>>> # About hot plugging of the ism device
>>>>>>> 
>>>>>>>   Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>>>>>>>   less scalable operation. So, we don't plan to support it for now.
>>>>>>> 
>>>>>>> # Comparison with existing technology
>>>>>>> 
>>>>>>> ## ivshmem or ivshmem 2.0 of Qemu
>>>>>>> 
>>>>>>>   1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>>>>>>>   use this VM, so the security is not enough.
>>>>>>> 
>>>>>>>   2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>>>>>>>   other VMs that use the ivshmem 2.0 shared memory device, which also does not
>>>>>>>   meet our needs in terms of security.
>>>>>>> 
>>>>>>> ## vhost-pci and virtiovhostuser
>>>>>>> 
>>>>>>>   Does not support dynamic allocation and therefore not suitable for SMC.
>>>>>> 
>>>>>> I think this is an implementation issue, we can support VHOST IOTLB
>>>>>> message then the regions could be added/removed on demand.
>>>>> 
>>>>> 
>>>>> 1. After the attacker connects with the victim, if the attacker does not
>>>>>   dereference memory, the memory will be occupied under virtiovhostuser. In the
>>>>>   case of ism devices, the victim can directly release the reference, and the
>>>>>   maliciously referenced region only occupies the attacker's resources
>>>> 
>>>> Let's define the security boundary here. E.g do we trust the device or
>>>> not? If yes, in the case of virtiovhostuser, can we simple do
>>>> VHOST_IOTLB_UNMAP then we can safely release the memory from the
>>>> attacker.
>>>> 
>>>>> 
>>>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
>>>>>   time, which is a challenge for virtiovhostuser
>>>> 
>>>> Please elaborate more the the challenges, anything make
>>>> virtiovhostuser different?
>>> 
>>> I understand (please point out any mistakes), one vvu device corresponds to one
>>> vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
>> 
>> There could be some misunderstanding here. With 1000 VMs, you still
>> need 1000 virtio-ism devices I think.
>We are trying to achieve one virtio-ism device per vm instead of one virtio-ism device per SMC connection.

I think we must achieve this if we want to meet the requirements of SMC.
In SMC, an SMC socket (corresponding to a TCP socket) needs 2 memory
regions (1 for Tx and 1 for Rx). So if we have 1K TCP connections,
we'll need 2K shared memory regions, and those memory regions are
dynamically allocated and freed together with the TCP socket.
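
Just to illustrate that per-connection lifetime (the names here are invented
for the example and are not taken from the SMC or ism code):

/*
 * Each SMC connection owns one Tx and one Rx region, allocated when the
 * connection is established and freed together with it. With roughly 1K
 * concurrent connections that means roughly 2K live regions.
 */
struct ism_region;                      /* opaque, provided by the ism device driver */

struct smc_conn_bufs {
        struct ism_region *tx;          /* local send ring; the peer attaches it as its Rx */
        struct ism_region *rx;          /* peer's send ring, attached locally as our Rx    */
};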

>
>> 
>>> 
>>> 
>>>> 
>>>>> 
>>>>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
>>>>>   determines the sharing relationship at startup.
>>>> 
>>>> Not necessarily with IOTLB API?
>>> 
>>> Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
>>> provide the same memory on the host to two vms. So the implementation of this
>>> part will be much simpler. This is why we gave up virtio-vhost-user at the
>>> beginning.
>> 
>> Ok, just to make sure we're at the same page. From spec level,
>> virtio-vhost-user doesn't (can't) limit the backend to be implemented
>> in another VM. So it should be ok to be used for sharing memory
>> between a guest and host.
>> 
>> Thanks
>> 
>>> 
>>> Thanks.
>>> 
>>> 
>>>> 
>>>>> 
>>>>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
>>>>>   while ism only maps one region to other devices
>>>> 
>>>> With VHOST_IOTLB_MAP, the map could be done per region.
>>>> 
>>>> Thanks
>>>> 
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>>> 
>>>>>>> # Design
>>>>>>> 
>>>>>>>   This is a structure diagram based on ism sharing between two vms.
>>>>>>> 
>>>>>>>    |-------------------------------------------------------------------------------------------------------------|
>>>>>>>    | |------------------------------------------------|       |------------------------------------------------| |
>>>>>>>    | | Guest                                          |       | Guest                                          | |
>>>>>>>    | |                                                |       |                                                | |
>>>>>>>    | |   ----------------                             |       |   ----------------                             | |
>>>>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
>>>>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
>>>>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
>>>>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
>>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
>>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>>>>    | |                                |               |       |                               |                | |
>>>>>>>    | |                                |               |       |                               |                | |
>>>>>>>    | | Qemu                           |               |       | Qemu                          |                | |
>>>>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
>>>>>>>    |                                  |                                                       |                  |
>>>>>>>    |                                  |                                                       |                  |
>>>>>>>    |                                  |------------------------------+------------------------|                  |
>>>>>>>    |                                                                 |                                           |
>>>>>>>    |                                                                 |                                           |
>>>>>>>    |                                                   --------------------------                                |
>>>>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
>>>>>>>    |                                                   --------------------------                                |
>>>>>>>    |                                                                                                             |
>>>>>>>    | HOST                                                                                                        |
>>>>>>>    ---------------------------------------------------------------------------------------------------------------
>>>>>>> 
>>>>>>> # POC code
>>>>>>> 
>>>>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>>>>>>>   Qemu:   https://github.com/fengidri/qemu/commits/ism
>>>>>>> 
>>>>>>> If there are any problems, please point them out.
>>>>>>> 
>>>>>>> Hope to hear from you, thank you.
>>>>>>> 
>>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
>>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>>>>>>> [4] https://lwn.net/Articles/711071/
>>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>>>>>>> 
>>>>>>> 
>>>>>>> Xuan Zhuo (2):
>>>>>>>  Reserve device id for ISM device
>>>>>>>  virtio-ism: introduce new device virtio-ism
>>>>>>> 
>>>>>>> content.tex    |   3 +
>>>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> 2 files changed, 343 insertions(+)
>>>>>>> create mode 100644 virtio-ism.tex
>>>>>>> 
>>>>>>> --
>>>>>>> 2.32.0.3.g01195cf9f
>>>>>>> 
>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>> 
>>>> 
>>> 

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  8:07             ` Tony Lu
@ 2022-10-19  9:04               ` Jason Wang
  2022-10-19  9:10                 ` Gerry
  2022-10-19 10:01                 ` Tony Lu
  0 siblings, 2 replies; 61+ messages in thread
From: Jason Wang @ 2022-10-19  9:04 UTC (permalink / raw)
  To: Tony Lu
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, dust.li,
	zhenzao, helinguo, gerry, mst, cohuck, Stefan Hajnoczi


On 2022/10/19 16:07, Tony Lu wrote:
> On Wed, Oct 19, 2022 at 02:02:21PM +0800, Xuan Zhuo wrote:
>> On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>> On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>> On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>> On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Jason,
>>>>>>
>>>>>> I think there may be some problems with the direction we are discussing.
>>>>> Probably not.
>>>>>
>>>>> As far as we are focusing on technology, there's nothing wrong from my
>>>>> perspective. And this is how the community works. Your idea needs to
>>>>> be justified and people are free to raise any technical questions
>>>>> especially considering you've posted a spec change with prototype
>>>>> codes but not only the idea.
>>>>>
>>>>>> Our
>>>>>> goal is to add an new ism device. As far as the spec is concerned, we are not
>>>>>> concerned with the implementation of the backend.
>>>>>>
>>>>>> The direction we should discuss is what is the difference between the ism device
>>>>>> and other devices such as virtio-net, and whether it is necessary to introduce
>>>>>> this new device.
>>>>> This is somehow what I want to ask, actually it's not a comparison
>>>>> with virtio-net but:
>>>>>
>>>>> - virtio-roce
>>>>> - virtio-vhost-user
>>>>> - virtio-(p)mem
>>>>>
>>>>> or whether we can simply add features to those devices to achieve what
>>>>> you want to do here.
>>>>
>>>> Yes, this is my priority to discuss.
>>>>
>>>> At the moment, I think the most similar to ism is the Vhost-user Device Backend
>>>> of virtio-vhost-user.
>>>>
>>>> My understanding of it is to map any virtio device to another vm as a vvu
>>>> device.
>>> Yes, so a possible way is to have a device with memory zone/region
>>> provision and management then map it via virtio-vhost-user.
>>
>> Yes, there is such a possibility. virtio-vhost-user makes me feel that what can
>> be shared is the function implementation of map.
>>
>> But in the vm to provide the interface to the upper layer, I think this is the
>> work of ism.
>>
>> But one of the reasons why I didn't use virtio-vhost-user directly is that in
>> another vm, the guest can operate the vvu device, which we hope that both sides
>> are equal to the ism device.
>>
>> So I want to agree on a question first: who will provide the upper layer with
>> the ability to share the memory area?
>>
>> Our answer is a new ism device. How does this device achieve memory sharing, I
>> think is the second question.
>>
>>
>>>>  From this design purpose, I think the two are different.
>>>>
>>>> Of course, you might want to extend it, it does have some similarities and uses
>>>> a lot of similar techniques.
>>> I don't have any preference so far. If you think your idea makes more
>>> sense, then try your best to justify it in the list.
>>>
>>>> So we can really discuss in this direction, whether
>>>> the vvu device can be extended to achieve the purpose of ism, or whether the
>>>> design goals can be agreed.
>>> I've added Stefan in the loop, let's hear from him.
>>>
>>>> Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
>>>> Should device/driver APIs remain independent?
>>> Btw, you mentioned that one possible user of ism is the smc, but I
>>> don't see how it connects to that with your prototype driver.
>> Yes, we originally had plans, but the virtio spec was considered for submission,
>> so this was not included. Maybe, we should have included this part @Tony
>>
>> A brief introduction is that SMC currently has a corresponding
>> s390/net/ism_drv.c and we will replace this in the virtualization scenario.


Ok, I see. So I think the goal is to implement something in virtio that
is functionally equivalent to the IBM ISM device.


>>
>> Thanks.
>>
> SMC is a network protocol that is modeled on shared memory rather than
> packets.


After reading more about SMC on the IBM website, I think you meant SMC-D here.
And I wonder whether, in order to have a complete SMC solution, we still need
virtio-ROCE for inter-host communication?


>   The basic interfaces required of an SMC device are:
>
>    - alloc / free memory region: each connection peer dynamically owns two
> 	memory regions, one for the sending and one for the receiving ring buffer.
>    - attach / detach memory region: the remote peer attaches the locally
> 	allocated sending region as its receiving region, and vice versa.
>    - notify: tell the peer to read data and update its cursor.
>
> With these, the device can be registered as an SMC ISM device. Of course,
> SMC itself also requires some modification to adapt to it.


Looking at the s390 ism driver, it requires other operations such as vlan
add/remove or gid query; do we need them as well?

Thanks


>
> Cheers,
> Tony Lu
>
>>> Thanks
>>>
>>>> Thanks.
>>>>
>>>>
>>>>>> How to share the backend with other deivce is another problem.
>>>>> Yes, anything that is used for your virito-ism prototype can be used
>>>>> for other devices.
>>>>>
>>>>>> Our goal is to dynamically obtain a piece of memory to share with other vms.
>>>>> So at this level, I don't see the exact difference compared to
>>>>> virtio-vhost-user. Let's just focus on the API that carries on the
>>>>> semantic:
>>>>>
>>>>> - map/unmap
>>>>> - permission update
>>>>>
>>>>> The only missing piece is the per region notification.
>>>>>
>>>>>> In a connection, this memory will be used repeatedly. As far as SMC is concerned,
>>>>>> it will use it as a ring. Of course, we also need a notify mechanism.
>>>>>>
>>>>>> That's what we're aiming for, so we should first discuss whether this
>>>>>> requirement is reasonable.
>>>>> So unless somebody said "no", it is fine until now.
>>>>>
>>>>>> I think it's a feature currently not supported by
>>>>>> other devices specified by the current virtio spce.
>>>>> Probably, but we've already had rfcs for roce and vhost-user.
>>>>>
>>>>> Thanks
>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  8:21             ` Dust Li
@ 2022-10-19  9:08               ` Jason Wang
  2022-10-19  9:10                 ` Xuan Zhuo
  0 siblings, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-10-19  9:08 UTC (permalink / raw)
  To: dust.li
  Cc: Gerry, Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, tonylu,
	zhenzao, helinguo, mst, cohuck, Stefan Hajnoczi

On Wed, Oct 19, 2022 at 4:21 PM Dust Li <dust.li@linux.alibaba.com> wrote:
>
> On Wed, Oct 19, 2022 at 04:03:42PM +0800, Gerry wrote:
> >
> >
> >> On 19 Oct 2022, at 16:01, Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>>
> >>> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >>>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>>>>
> >>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >>>>>> Adding Stefan.
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>>>>>>
> >>>>>>> Hello everyone,
> >>>>>>>
> >>>>>>> # Background
> >>>>>>>
> >>>>>>> Nowadays, there is a common scenario to accelerate communication between
> >>>>>>> different VMs and containers, including light weight virtual machine based
> >>>>>>> containers. One way to achieve this is to colocate them on the same host.
> >>>>>>> However, the performance of inter-VM communication through network stack is not
> >>>>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
> >>>>>>> many times, but still no generic solution available [1] [2] [3].
> >>>>>>>
> >>>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> >>>>>>> We found that by changing the communication channel between VMs from TCP to SMC
> >>>>>>> with shared memory, we can achieve superior performance for a common
> >>>>>>> socket-based application[5]:
> >>>>>>>  - latency reduced by about 50%
> >>>>>>>  - throughput increased by about 300%
> >>>>>>>  - CPU consumption reduced by about 50%
> >>>>>>>
> >>>>>>> Since there is no particularly suitable shared memory management solution
> >>>>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
> >>>>>>> is the standard for communication in the virtualization world, we want to
> >>>>>>> implement a virtio-ism device based on virtio, which can support on-demand
> >>>>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> >>>>>>> the virtio-ism device need to support:
> >>>>>>>
> >>>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
> >>>>>>>   provisioned.
> >>>>>>> 2. Multi-region management: the shared memory is divided into regions,
> >>>>>>>   and a peer may allocate one or more regions from the same shared memory
> >>>>>>>   device.
> >>>>>>> 3. Permission control: The permission of each region can be set seperately.
> >>>>>>
> >>>>>> Looks like virtio-ROCE
> >>>>>>
> >>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> >>>>>>
> >>>>>> and virtio-vhost-user can satisfy the requirement?
> >>>>>>
> >>>>>>>
> >>>>>>> # Virtio ism device
> >>>>>>>
> >>>>>>> ISM devices provide the ability to share memory between different guests on a
> >>>>>>> host. A guest's memory got from ism device can be shared with multiple peers at
> >>>>>>> the same time. This shared relationship can be dynamically created and released.
> >>>>>>>
> >>>>>>> The shared memory obtained from the device is divided into multiple ism regions
> >>>>>>> for share. ISM device provides a mechanism to notify other ism region referrers
> >>>>>>> of content update events.
> >>>>>>>
> >>>>>>> # Usage (SMC as example)
> >>>>>>>
> >>>>>>> Maybe there is one of possible use cases:
> >>>>>>>
> >>>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> >>>>>>>   location of a memory region in the PCI space and a token.
> >>>>>>> 2. The ism driver mmap the memory region and return to SMC with the token
> >>>>>>> 3. SMC passes the token to the connected peer
> >>>>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
> >>>>>>>   get the location of the PCI space of the shared memory
> >>>>>>>
> >>>>>>>
> >>>>>>> # About hot plugging of the ism device
> >>>>>>>
> >>>>>>>   Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> >>>>>>>   less scalable operation. So, we don't plan to support it for now.
> >>>>>>>
> >>>>>>> # Comparison with existing technology
> >>>>>>>
> >>>>>>> ## ivshmem or ivshmem 2.0 of Qemu
> >>>>>>>
> >>>>>>>   1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> >>>>>>>   use this VM, so the security is not enough.
> >>>>>>>
> >>>>>>>   2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> >>>>>>>   other VMs that use the ivshmem 2.0 shared memory device, which also does not
> >>>>>>>   meet our needs in terms of security.
> >>>>>>>
> >>>>>>> ## vhost-pci and virtiovhostuser
> >>>>>>>
> >>>>>>>   Does not support dynamic allocation and therefore not suitable for SMC.
> >>>>>>
> >>>>>> I think this is an implementation issue, we can support VHOST IOTLB
> >>>>>> message then the regions could be added/removed on demand.
> >>>>>
> >>>>>
> >>>>> 1. After the attacker connects with the victim, if the attacker does not
> >>>>>   dereference memory, the memory will be occupied under virtiovhostuser. In the
> >>>>>   case of ism devices, the victim can directly release the reference, and the
> >>>>>   maliciously referenced region only occupies the attacker's resources
> >>>>
> >>>> Let's define the security boundary here. E.g do we trust the device or
> >>>> not? If yes, in the case of virtiovhostuser, can we simple do
> >>>> VHOST_IOTLB_UNMAP then we can safely release the memory from the
> >>>> attacker.
> >>>>
> >>>>>
> >>>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> >>>>>   time, which is a challenge for virtiovhostuser
> >>>>
> >>>> Please elaborate more the the challenges, anything make
> >>>> virtiovhostuser different?
> >>>
> >>> I understand (please point out any mistakes), one vvu device corresponds to one
> >>> vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
> >>
> >> There could be some misunderstanding here. With 1000 VM, you still
> >> need 1000 virtio-sim devices I think.
> >We are trying to achieve one virtio-ism device per vm instead of one virtio-ism device per SMC connection.

I wonder if we need something to identify a virtio-ism device, since I
guess there's still a chance to have multiple virtio-ism devices per VM
(different service chains etc.).
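
Purely as an illustration of what I mean by "identify" (this is not taken
from the posted spec and the actual proposal may define it differently), it
could be as simple as a field in the device config space:

    /* Hypothetical config layout, only to illustrate a per-device identifier. */
    struct virtio_ism_config {
            __le64 gid;          /* id of this ism device, unique on the host */
            __le64 dev_size;     /* total shareable device memory */
            __le64 region_size;  /* size of a single ism region */
    };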

Thanks

>
> I think we must achieve this if we want to meet the requirements of SMC.
> In SMC, a SMC socket(Corresponding to a TCP socket) need 2 memory
> regions(1 for Tx and 1 for Rx). So if we have 1K TCP connections,
> we'll need 2K share memory regions, and those memory regions are
> dynamically allocated and freed with the TCP socket.
>
> >
> >>
> >>>
> >>>
> >>>>
> >>>>>
> >>>>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> >>>>>   determines the sharing relationship at startup.
> >>>>
> >>>> Not necessarily with IOTLB API?
> >>>
> >>> Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
> >>> provide the same memory on the host to two vms. So the implementation of this
> >>> part will be much simpler. This is why we gave up virtio-vhost-user at the
> >>> beginning.
> >>
> >> Ok, just to make sure we're at the same page. From spec level,
> >> virtio-vhost-user doesn't (can't) limit the backend to be implemented
> >> in another VM. So it should be ok to be used for sharing memory
> >> between a guest and host.
> >>
> >> Thanks
> >>
> >>>
> >>> Thanks.
> >>>
> >>>
> >>>>
> >>>>>
> >>>>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
> >>>>>   while ism only maps one region to other devices
> >>>>
> >>>> With VHOST_IOTLB_MAP, the map could be done per region.
> >>>>
> >>>> Thanks
> >>>>
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>>>
> >>>>>>> # Design
> >>>>>>>
> >>>>>>>   This is a structure diagram based on ism sharing between two vms.
> >>>>>>>
> >>>>>>>    |-------------------------------------------------------------------------------------------------------------|
> >>>>>>>    | |------------------------------------------------|       |------------------------------------------------| |
> >>>>>>>    | | Guest                                          |       | Guest                                          | |
> >>>>>>>    | |                                                |       |                                                | |
> >>>>>>>    | |   ----------------                             |       |   ----------------                             | |
> >>>>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> >>>>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> >>>>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> >>>>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
> >>>>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
> >>>>>>>    | |                                |               |       |                               |                | |
> >>>>>>>    | |                                |               |       |                               |                | |
> >>>>>>>    | | Qemu                           |               |       | Qemu                          |                | |
> >>>>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
> >>>>>>>    |                                  |                                                       |                  |
> >>>>>>>    |                                  |                                                       |                  |
> >>>>>>>    |                                  |------------------------------+------------------------|                  |
> >>>>>>>    |                                                                 |                                           |
> >>>>>>>    |                                                                 |                                           |
> >>>>>>>    |                                                   --------------------------                                |
> >>>>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
> >>>>>>>    |                                                   --------------------------                                |
> >>>>>>>    |                                                                                                             |
> >>>>>>>    | HOST                                                                                                        |
> >>>>>>>    ---------------------------------------------------------------------------------------------------------------
> >>>>>>>
> >>>>>>> # POC code
> >>>>>>>
> >>>>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> >>>>>>>   Qemu:   https://github.com/fengidri/qemu/commits/ism
> >>>>>>>
> >>>>>>> If there are any problems, please point them out.
> >>>>>>>
> >>>>>>> Hope to hear from you, thank you.
> >>>>>>>
> >>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> >>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
> >>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> >>>>>>> [4] https://lwn.net/Articles/711071/
> >>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> >>>>>>>
> >>>>>>>
> >>>>>>> Xuan Zhuo (2):
> >>>>>>>  Reserve device id for ISM device
> >>>>>>>  virtio-ism: introduce new device virtio-ism
> >>>>>>>
> >>>>>>> content.tex    |   3 +
> >>>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>> 2 files changed, 343 insertions(+)
> >>>>>>> create mode 100644 virtio-ism.tex
> >>>>>>>
> >>>>>>> --
> >>>>>>> 2.32.0.3.g01195cf9f
> >>>>>>>
> >>>>>>>
> >>>>>>> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> >>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> >>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >>>>>
> >>>>
> >>>
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  9:04               ` Jason Wang
@ 2022-10-19  9:10                 ` Gerry
  2022-10-19  9:13                   ` Jason Wang
  2022-10-19 10:01                 ` Tony Lu
  1 sibling, 1 reply; 61+ messages in thread
From: Gerry @ 2022-10-19  9:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: Tony Lu, Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc,
	dust.li, zhenzao, helinguo, mst, cohuck, Stefan Hajnoczi



> On 19 Oct 2022, at 17:04, Jason Wang <jasowang@redhat.com> wrote:
> 
> 
> On 2022/10/19 16:07, Tony Lu wrote:
>> On Wed, Oct 19, 2022 at 02:02:21PM +0800, Xuan Zhuo wrote:
>>> On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>> On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>> On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Hi Jason,
>>>>>>> 
>>>>>>> I think there may be some problems with the direction we are discussing.
>>>>>> Probably not.
>>>>>> 
>>>>>> As far as we are focusing on technology, there's nothing wrong from my
>>>>>> perspective. And this is how the community works. Your idea needs to
>>>>>> be justified and people are free to raise any technical questions
>>>>>> especially considering you've posted a spec change with prototype
>>>>>> codes but not only the idea.
>>>>>> 
>>>>>>> Our
>>>>>>> goal is to add an new ism device. As far as the spec is concerned, we are not
>>>>>>> concerned with the implementation of the backend.
>>>>>>> 
>>>>>>> The direction we should discuss is what is the difference between the ism device
>>>>>>> and other devices such as virtio-net, and whether it is necessary to introduce
>>>>>>> this new device.
>>>>>> This is somehow what I want to ask, actually it's not a comparison
>>>>>> with virtio-net but:
>>>>>> 
>>>>>> - virtio-roce
>>>>>> - virtio-vhost-user
>>>>>> - virtio-(p)mem
>>>>>> 
>>>>>> or whether we can simply add features to those devices to achieve what
>>>>>> you want to do here.
>>>>> 
>>>>> Yes, this is my priority to discuss.
>>>>> 
>>>>> At the moment, I think the most similar to ism is the Vhost-user Device Backend
>>>>> of virtio-vhost-user.
>>>>> 
>>>>> My understanding of it is to map any virtio device to another vm as a vvu
>>>>> device.
>>>> Yes, so a possible way is to have a device with memory zone/region
>>>> provision and management then map it via virtio-vhost-user.
>>> 
>>> Yes, there is such a possibility. virtio-vhost-user makes me feel that what can
>>> be shared is the function implementation of map.
>>> 
>>> But in the vm to provide the interface to the upper layer, I think this is the
>>> work of ism.
>>> 
>>> But one of the reasons why I didn't use virtio-vhost-user directly is that in
>>> another vm, the guest can operate the vvu device, which we hope that both sides
>>> are equal to the ism device.
>>> 
>>> So I want to agree on a question first: who will provide the upper layer with
>>> the ability to share the memory area?
>>> 
>>> Our answer is a new ism device. How does this device achieve memory sharing, I
>>> think is the second question.
>>> 
>>> 
>>>>> From this design purpose, I think the two are different.
>>>>> 
>>>>> Of course, you might want to extend it, it does have some similarities and uses
>>>>> a lot of similar techniques.
>>>> I don't have any preference so far. If you think your idea makes more
>>>> sense, then try your best to justify it in the list.
>>>> 
>>>>> So we can really discuss in this direction, whether
>>>>> the vvu device can be extended to achieve the purpose of ism, or whether the
>>>>> design goals can be agreed.
>>>> I've added Stefan in the loop, let's hear from him.
>>>> 
>>>>> Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
>>>>> Should device/driver APIs remain independent?
>>>> Btw, you mentioned that one possible user of ism is the smc, but I
>>>> don't see how it connects to that with your prototype driver.
>>> Yes, we originally had plans, but the virtio spec was considered for submission,
>>> so this was not included. Maybe, we should have included this part @Tony
>>> 
>>> A brief introduction is that SMC currently has a corresponding
>>> s390/net/ism_drv.c and we will replace this in the virtualization scenario.
> 
> 
> Ok, I see. So I think the goal is to implement something in virtio that is functional equivalent to IBM ISM device.
> 
> 
>>> 
>>> Thanks.
>>> 
>> SMC is a network protocol which is modeled by shared memory rather than
>> packet.
> 
> 
> After reading more SMC from IBM website, I think you meant SMC-D here. And I wonder in order to have a complete SMC solution we still need virtio-ROCE for inter host communcation?
Absolutely, a complete solution includes SMC-R for remote peers and SMC-D for local peers :)


> 
>>  Actually the basic required interfaces of SMC device are:
>> 
>>   - alloc / free memory region, each connection peer has two memory
>> 	regions dynamically for sending and receiving ring buffer.
>>   - attach / detach memory region, remote attaches local-allocated
>> 	sending region as receiving region, vice versa.
>>   - notify, tell peer to read data and update cursor.
>> 
>> Then the device can be registered as SMC ISM device. Of course, SMC
>> also requires some modification to adapt it.
> 
> 
> Looking at s390 ism driver it requires other stuffs like vlan add/remove or gid query, do we need them as well?
We plan to get rid of vlan support, but we do need an interface to query the gid etc.
And the virtqueue helps us a lot in implementing a control communication channel to support those operations.
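
A minimal sketch of how a gid query could ride on such a control queue (the
layout and constant names here are made up, not from the current draft):

    struct virtio_ism_ctrl_hdr {
            u8 class;            /* e.g. VIRTIO_ISM_CTRL_GID (made-up value) */
            u8 cmd;              /* e.g. VIRTIO_ISM_CTRL_GID_QUERY (made-up value) */
    };

    struct virtio_ism_ctrl_gid_reply {
            __le64 gid;          /* identifier SMC-D would use for this device */
            u8 ack;              /* ok/error status, written by the device */
    };

    /* The driver places the header in a device-readable buffer and the reply
     * in a device-writable buffer on the control virtqueue, similar in spirit
     * to how virtio-net structures its control commands. */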

> 
> Thanks
> 
> 
>> 
>> Cheers,
>> Tony Lu
>> 
>>>> Thanks
>>>> 
>>>>> Thanks.
>>>>> 
>>>>> 
>>>>>>> How to share the backend with other deivce is another problem.
>>>>>> Yes, anything that is used for your virito-ism prototype can be used
>>>>>> for other devices.
>>>>>> 
>>>>>>> Our goal is to dynamically obtain a piece of memory to share with other vms.
>>>>>> So at this level, I don't see the exact difference compared to
>>>>>> virtio-vhost-user. Let's just focus on the API that carries on the
>>>>>> semantic:
>>>>>> 
>>>>>> - map/unmap
>>>>>> - permission update
>>>>>> 
>>>>>> The only missing piece is the per region notification.
>>>>>> 
>>>>>>> In a connection, this memory will be used repeatedly. As far as SMC is concerned,
>>>>>>> it will use it as a ring. Of course, we also need a notify mechanism.
>>>>>>> 
>>>>>>> That's what we're aiming for, so we should first discuss whether this
>>>>>>> requirement is reasonable.
>>>>>> So unless somebody said "no", it is fine until now.
>>>>>> 
>>>>>>> I think it's a feature currently not supported by
>>>>>>> other devices specified by the current virtio spce.
>>>>>> Probably, but we've already had rfcs for roce and vhost-user.
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>>> Thanks.
>>>>>>> 
>>>>>>> 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  9:08               ` Jason Wang
@ 2022-10-19  9:10                 ` Xuan Zhuo
  2022-10-19  9:15                   ` Jason Wang
  0 siblings, 1 reply; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-19  9:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: Gerry, virtio-dev, hans, herongguang, zmlcc, tonylu, zhenzao,
	helinguo, mst, cohuck, Stefan Hajnoczi, dust.li

On Wed, 19 Oct 2022 17:08:29 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Oct 19, 2022 at 4:21 PM Dust Li <dust.li@linux.alibaba.com> wrote:
> >
> > On Wed, Oct 19, 2022 at 04:03:42PM +0800, Gerry wrote:
> > >
> > >
> > >> On 19 Oct 2022, at 16:01, Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >>>
> > >>> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > >>>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >>>>>
> > >>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > >>>>>> Adding Stefan.
> > >>>>>>
> > >>>>>>
> > >>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >>>>>>>
> > >>>>>>> Hello everyone,
> > >>>>>>>
> > >>>>>>> # Background
> > >>>>>>>
> > >>>>>>> Nowadays, there is a common scenario to accelerate communication between
> > >>>>>>> different VMs and containers, including light weight virtual machine based
> > >>>>>>> containers. One way to achieve this is to colocate them on the same host.
> > >>>>>>> However, the performance of inter-VM communication through network stack is not
> > >>>>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
> > >>>>>>> many times, but still no generic solution available [1] [2] [3].
> > >>>>>>>
> > >>>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > >>>>>>> We found that by changing the communication channel between VMs from TCP to SMC
> > >>>>>>> with shared memory, we can achieve superior performance for a common
> > >>>>>>> socket-based application[5]:
> > >>>>>>>  - latency reduced by about 50%
> > >>>>>>>  - throughput increased by about 300%
> > >>>>>>>  - CPU consumption reduced by about 50%
> > >>>>>>>
> > >>>>>>> Since there is no particularly suitable shared memory management solution
> > >>>>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
> > >>>>>>> is the standard for communication in the virtualization world, we want to
> > >>>>>>> implement a virtio-ism device based on virtio, which can support on-demand
> > >>>>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > >>>>>>> the virtio-ism device need to support:
> > >>>>>>>
> > >>>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
> > >>>>>>>   provisioned.
> > >>>>>>> 2. Multi-region management: the shared memory is divided into regions,
> > >>>>>>>   and a peer may allocate one or more regions from the same shared memory
> > >>>>>>>   device.
> > >>>>>>> 3. Permission control: The permission of each region can be set seperately.
> > >>>>>>
> > >>>>>> Looks like virtio-ROCE
> > >>>>>>
> > >>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> > >>>>>>
> > >>>>>> and virtio-vhost-user can satisfy the requirement?
> > >>>>>>
> > >>>>>>>
> > >>>>>>> # Virtio ism device
> > >>>>>>>
> > >>>>>>> ISM devices provide the ability to share memory between different guests on a
> > >>>>>>> host. A guest's memory got from ism device can be shared with multiple peers at
> > >>>>>>> the same time. This shared relationship can be dynamically created and released.
> > >>>>>>>
> > >>>>>>> The shared memory obtained from the device is divided into multiple ism regions
> > >>>>>>> for share. ISM device provides a mechanism to notify other ism region referrers
> > >>>>>>> of content update events.
> > >>>>>>>
> > >>>>>>> # Usage (SMC as example)
> > >>>>>>>
> > >>>>>>> Maybe there is one of possible use cases:
> > >>>>>>>
> > >>>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> > >>>>>>>   location of a memory region in the PCI space and a token.
> > >>>>>>> 2. The ism driver mmap the memory region and return to SMC with the token
> > >>>>>>> 3. SMC passes the token to the connected peer
> > >>>>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
> > >>>>>>>   get the location of the PCI space of the shared memory
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> # About hot plugging of the ism device
> > >>>>>>>
> > >>>>>>>   Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> > >>>>>>>   less scalable operation. So, we don't plan to support it for now.
> > >>>>>>>
> > >>>>>>> # Comparison with existing technology
> > >>>>>>>
> > >>>>>>> ## ivshmem or ivshmem 2.0 of Qemu
> > >>>>>>>
> > >>>>>>>   1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> > >>>>>>>   use this VM, so the security is not enough.
> > >>>>>>>
> > >>>>>>>   2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> > >>>>>>>   other VMs that use the ivshmem 2.0 shared memory device, which also does not
> > >>>>>>>   meet our needs in terms of security.
> > >>>>>>>
> > >>>>>>> ## vhost-pci and virtiovhostuser
> > >>>>>>>
> > >>>>>>>   Does not support dynamic allocation and therefore not suitable for SMC.
> > >>>>>>
> > >>>>>> I think this is an implementation issue, we can support VHOST IOTLB
> > >>>>>> message then the regions could be added/removed on demand.
> > >>>>>
> > >>>>>
> > >>>>> 1. After the attacker connects with the victim, if the attacker does not
> > >>>>>   dereference memory, the memory will be occupied under virtiovhostuser. In the
> > >>>>>   case of ism devices, the victim can directly release the reference, and the
> > >>>>>   maliciously referenced region only occupies the attacker's resources
> > >>>>
> > >>>> Let's define the security boundary here. E.g do we trust the device or
> > >>>> not? If yes, in the case of virtiovhostuser, can we simple do
> > >>>> VHOST_IOTLB_UNMAP then we can safely release the memory from the
> > >>>> attacker.
> > >>>>
> > >>>>>
> > >>>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> > >>>>>   time, which is a challenge for virtiovhostuser
> > >>>>
> > >>>> Please elaborate more the the challenges, anything make
> > >>>> virtiovhostuser different?
> > >>>
> > >>> I understand (please point out any mistakes), one vvu device corresponds to one
> > >>> vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
> > >>
> > >> There could be some misunderstanding here. With 1000 VM, you still
> > >> need 1000 virtio-sim devices I think.
> > >We are trying to achieve one virtio-ism device per vm instead of one virtio-ism device per SMC connection.
>
> I wonder if we need something to identify a virtio-ism device since I
> guess there's still a chance to have multiple virtio-ism device per VM
> (different service chain etc).

Yes, there will be such a situation: a VM can have multiple virtio-ism devices.

What exactly do you mean by "identify"?

Thanks.


>
> Thanks
>
> >
> > I think we must achieve this if we want to meet the requirements of SMC.
> > In SMC, a SMC socket(Corresponding to a TCP socket) need 2 memory
> > regions(1 for Tx and 1 for Rx). So if we have 1K TCP connections,
> > we'll need 2K share memory regions, and those memory regions are
> > dynamically allocated and freed with the TCP socket.
> >
> > >
> > >>
> > >>>
> > >>>
> > >>>>
> > >>>>>
> > >>>>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> > >>>>>   determines the sharing relationship at startup.
> > >>>>
> > >>>> Not necessarily with IOTLB API?
> > >>>
> > >>> Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
> > >>> provide the same memory on the host to two vms. So the implementation of this
> > >>> part will be much simpler. This is why we gave up virtio-vhost-user at the
> > >>> beginning.
> > >>
> > >> Ok, just to make sure we're at the same page. From spec level,
> > >> virtio-vhost-user doesn't (can't) limit the backend to be implemented
> > >> in another VM. So it should be ok to be used for sharing memory
> > >> between a guest and host.
> > >>
> > >> Thanks
> > >>
> > >>>
> > >>> Thanks.
> > >>>
> > >>>
> > >>>>
> > >>>>>
> > >>>>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
> > >>>>>   while ism only maps one region to other devices
> > >>>>
> > >>>> With VHOST_IOTLB_MAP, the map could be done per region.
> > >>>>
> > >>>> Thanks
> > >>>>
> > >>>>>
> > >>>>> Thanks.
> > >>>>>
> > >>>>>>
> > >>>>>> Thanks
> > >>>>>>
> > >>>>>>>
> > >>>>>>> # Design
> > >>>>>>>
> > >>>>>>>   This is a structure diagram based on ism sharing between two vms.
> > >>>>>>>
> > >>>>>>>    |-------------------------------------------------------------------------------------------------------------|
> > >>>>>>>    | |------------------------------------------------|       |------------------------------------------------| |
> > >>>>>>>    | | Guest                                          |       | Guest                                          | |
> > >>>>>>>    | |                                                |       |                                                | |
> > >>>>>>>    | |   ----------------                             |       |   ----------------                             | |
> > >>>>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> > >>>>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> > >>>>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> > >>>>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> > >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > >>>>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> > >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > >>>>>>>    | |                                |               |       |                               |                | |
> > >>>>>>>    | |                                |               |       |                               |                | |
> > >>>>>>>    | | Qemu                           |               |       | Qemu                          |                | |
> > >>>>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
> > >>>>>>>    |                                  |                                                       |                  |
> > >>>>>>>    |                                  |                                                       |                  |
> > >>>>>>>    |                                  |------------------------------+------------------------|                  |
> > >>>>>>>    |                                                                 |                                           |
> > >>>>>>>    |                                                                 |                                           |
> > >>>>>>>    |                                                   --------------------------                                |
> > >>>>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
> > >>>>>>>    |                                                   --------------------------                                |
> > >>>>>>>    |                                                                                                             |
> > >>>>>>>    | HOST                                                                                                        |
> > >>>>>>>    ---------------------------------------------------------------------------------------------------------------
> > >>>>>>>
> > >>>>>>> # POC code
> > >>>>>>>
> > >>>>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > >>>>>>>   Qemu:   https://github.com/fengidri/qemu/commits/ism
> > >>>>>>>
> > >>>>>>> If there are any problems, please point them out.
> > >>>>>>>
> > >>>>>>> Hope to hear from you, thank you.
> > >>>>>>>
> > >>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > >>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
> > >>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > >>>>>>> [4] https://lwn.net/Articles/711071/
> > >>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Xuan Zhuo (2):
> > >>>>>>>  Reserve device id for ISM device
> > >>>>>>>  virtio-ism: introduce new device virtio-ism
> > >>>>>>>
> > >>>>>>> content.tex    |   3 +
> > >>>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >>>>>>> 2 files changed, 343 insertions(+)
> > >>>>>>> create mode 100644 virtio-ism.tex
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> 2.32.0.3.g01195cf9f
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> ---------------------------------------------------------------------
> > >>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > >>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>> ---------------------------------------------------------------------
> > >>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > >>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >>>>>
> > >>>>
> > >>>
> >
>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  8:15             ` Xuan Zhuo
@ 2022-10-19  9:11               ` Jason Wang
  2022-10-19  9:15                 ` Xuan Zhuo
  0 siblings, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-10-19  9:11 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, Oct 19, 2022 at 4:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 19 Oct 2022 16:13:07 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > On Wed, 19 Oct 2022 16:01:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > Adding Stefan.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > Hello everyone,
> > > > > > > >
> > > > > > > > # Background
> > > > > > > >
> > > > > > > > Nowadays, there is a common scenario to accelerate communication between
> > > > > > > > different VMs and containers, including light weight virtual machine based
> > > > > > > > containers. One way to achieve this is to colocate them on the same host.
> > > > > > > > However, the performance of inter-VM communication through network stack is not
> > > > > > > > optimal and may also waste extra CPU cycles. This scenario has been discussed
> > > > > > > > many times, but still no generic solution available [1] [2] [3].
> > > > > > > >
> > > > > > > > With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > > > > > > > We found that by changing the communication channel between VMs from TCP to SMC
> > > > > > > > with shared memory, we can achieve superior performance for a common
> > > > > > > > socket-based application[5]:
> > > > > > > >   - latency reduced by about 50%
> > > > > > > >   - throughput increased by about 300%
> > > > > > > >   - CPU consumption reduced by about 50%
> > > > > > > >
> > > > > > > > Since there is no particularly suitable shared memory management solution
> > > > > > > > matches the need for SMC(See ## Comparison with existing technology), and virtio
> > > > > > > > is the standard for communication in the virtualization world, we want to
> > > > > > > > implement a virtio-ism device based on virtio, which can support on-demand
> > > > > > > > memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > > > > > > > the virtio-ism device need to support:
> > > > > > > >
> > > > > > > > 1. Dynamic provision: shared memory regions are dynamically allocated and
> > > > > > > >    provisioned.
> > > > > > > > 2. Multi-region management: the shared memory is divided into regions,
> > > > > > > >    and a peer may allocate one or more regions from the same shared memory
> > > > > > > >    device.
> > > > > > > > 3. Permission control: The permission of each region can be set seperately.
> > > > > > >
> > > > > > > Looks like virtio-ROCE
> > > > > > >
> > > > > > > https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> > > > > > >
> > > > > > > and virtio-vhost-user can satisfy the requirement?
> > > > > > >
> > > > > > > >
> > > > > > > > # Virtio ism device
> > > > > > > >
> > > > > > > > ISM devices provide the ability to share memory between different guests on a
> > > > > > > > host. A guest's memory got from ism device can be shared with multiple peers at
> > > > > > > > the same time. This shared relationship can be dynamically created and released.
> > > > > > > >
> > > > > > > > The shared memory obtained from the device is divided into multiple ism regions
> > > > > > > > for share. ISM device provides a mechanism to notify other ism region referrers
> > > > > > > > of content update events.
> > > > > > > >
> > > > > > > > # Usage (SMC as example)
> > > > > > > >
> > > > > > > > Maybe there is one of possible use cases:
> > > > > > > >
> > > > > > > > 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> > > > > > > >    location of a memory region in the PCI space and a token.
> > > > > > > > 2. The ism driver mmap the memory region and return to SMC with the token
> > > > > > > > 3. SMC passes the token to the connected peer
> > > > > > > > 3. the peer calls the ism driver interface ism_attach_region(token) to
> > > > > > > >    get the location of the PCI space of the shared memory
> > > > > > > >
> > > > > > > >
> > > > > > > > # About hot plugging of the ism device
> > > > > > > >
> > > > > > > >    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> > > > > > > >    less scalable operation. So, we don't plan to support it for now.
> > > > > > > >
> > > > > > > > # Comparison with existing technology
> > > > > > > >
> > > > > > > > ## ivshmem or ivshmem 2.0 of Qemu
> > > > > > > >
> > > > > > > >    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> > > > > > > >    use this VM, so the security is not enough.
> > > > > > > >
> > > > > > > >    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> > > > > > > >    other VMs that use the ivshmem 2.0 shared memory device, which also does not
> > > > > > > >    meet our needs in terms of security.
> > > > > > > >
> > > > > > > > ## vhost-pci and virtiovhostuser
> > > > > > > >
> > > > > > > >    Does not support dynamic allocation and therefore not suitable for SMC.
> > > > > > >
> > > > > > > I think this is an implementation issue, we can support VHOST IOTLB
> > > > > > > message then the regions could be added/removed on demand.
> > > > > >
> > > > > >
> > > > > > 1. After the attacker connects with the victim, if the attacker does not
> > > > > >    dereference memory, the memory will be occupied under virtiovhostuser. In the
> > > > > >    case of ism devices, the victim can directly release the reference, and the
> > > > > >    maliciously referenced region only occupies the attacker's resources
> > > > >
> > > > > Let's define the security boundary here. E.g do we trust the device or
> > > > > not? If yes, in the case of virtiovhostuser, can we simple do
> > > > > VHOST_IOTLB_UNMAP then we can safely release the memory from the
> > > > > attacker.
> > > > >
> > > > > >
> > > > > > 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> > > > > >    time, which is a challenge for virtiovhostuser
> > > > >
> > > > > Please elaborate more the the challenges, anything make
> > > > > virtiovhostuser different?
> > > >
> > > > I understand (please point out any mistakes), one vvu device corresponds to one
> > > > vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
> > >
> > > There could be some misunderstanding here. With 1000 VM, you still
> > > need 1000 virtio-sim devices I think.
> >
> > No, just use a virtio-ism device.
>
> For example, if the hardware memory of a virtio-ism is 1G, and an ism region is
> 1M, there are 1000 ism regions, and these ism regions can be shared with
> different vms.

Right, this is what I've understood.

What I want to say is that this might be achieved with virtio-vhost-user
as well, but it may require some changes to the protocol, and I'm not
sure it's worth the effort. And I've started to think about the
possibility of building virtio-vhost-user on top (I don't see any blocker
so far).

Thanks

>
> And it is dynamic. After an ism region is shared with a vm, it can be shared
> with other vms.
>
> Thanks.
>
> >
> > Thanks.
> >
> > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> > > > > >    determines the sharing relationship at startup.
> > > > >
> > > > > Not necessarily with IOTLB API?
> > > >
> > > > Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
> > > > provide the same memory on the host to two vms. So the implementation of this
> > > > part will be much simpler. This is why we gave up virtio-vhost-user at the
> > > > beginning.
> > >
> > > Ok, just to make sure we're at the same page. From spec level,
> > > virtio-vhost-user doesn't (can't) limit the backend to be implemented
> > > in another VM. So it should be ok to be used for sharing memory
> > > between a guest and host.
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > 4. For security issues, the device under virtiovhostuser may mmap more memory,
> > > > > >    while ism only maps one region to other devices
> > > > >
> > > > > With VHOST_IOTLB_MAP, the map could be done per region.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > # Design
> > > > > > > >
> > > > > > > >    This is a structure diagram based on ism sharing between two vms.
> > > > > > > >
> > > > > > > >     |-------------------------------------------------------------------------------------------------------------|
> > > > > > > >     | |------------------------------------------------|       |------------------------------------------------| |
> > > > > > > >     | | Guest                                          |       | Guest                                          | |
> > > > > > > >     | |                                                |       |                                                | |
> > > > > > > >     | |   ----------------                             |       |   ----------------                             | |
> > > > > > > >     | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> > > > > > > >     | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> > > > > > > >     | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> > > > > > > >     | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> > > > > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > > > >     | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> > > > > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > > > >     | |                                |               |       |                               |                | |
> > > > > > > >     | |                                |               |       |                               |                | |
> > > > > > > >     | | Qemu                           |               |       | Qemu                          |                | |
> > > > > > > >     | |--------------------------------+---------------|       |-------------------------------+----------------| |
> > > > > > > >     |                                  |                                                       |                  |
> > > > > > > >     |                                  |                                                       |                  |
> > > > > > > >     |                                  |------------------------------+------------------------|                  |
> > > > > > > >     |                                                                 |                                           |
> > > > > > > >     |                                                                 |                                           |
> > > > > > > >     |                                                   --------------------------                                |
> > > > > > > >     |                                                    | M1 |   | M2 |   | M3 |                                 |
> > > > > > > >     |                                                   --------------------------                                |
> > > > > > > >     |                                                                                                             |
> > > > > > > >     | HOST                                                                                                        |
> > > > > > > >     ---------------------------------------------------------------------------------------------------------------
> > > > > > > >
> > > > > > > > # POC code
> > > > > > > >
> > > > > > > >    Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > > > > > > >    Qemu:   https://github.com/fengidri/qemu/commits/ism
> > > > > > > >
> > > > > > > > If there are any problems, please point them out.
> > > > > > > >
> > > > > > > > Hope to hear from you, thank you.
> > > > > > > >
> > > > > > > > [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > > > > > > > [2] https://dl.acm.org/doi/10.1145/2847562
> > > > > > > > [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > > > > > > > [4] https://lwn.net/Articles/711071/
> > > > > > > > [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> > > > > > > >
> > > > > > > >
> > > > > > > > Xuan Zhuo (2):
> > > > > > > >   Reserve device id for ISM device
> > > > > > > >   virtio-ism: introduce new device virtio-ism
> > > > > > > >
> > > > > > > >  content.tex    |   3 +
> > > > > > > >  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > > >  2 files changed, 343 insertions(+)
> > > > > > > >  create mode 100644 virtio-ism.tex
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > >
> > > > > > > >
> > > > > > > > ---------------------------------------------------------------------
> > > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > >
> > > > >
> > > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  9:10                 ` Gerry
@ 2022-10-19  9:13                   ` Jason Wang
  0 siblings, 0 replies; 61+ messages in thread
From: Jason Wang @ 2022-10-19  9:13 UTC (permalink / raw)
  To: Gerry
  Cc: Tony Lu, Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc,
	dust.li, zhenzao, helinguo, mst, cohuck, Stefan Hajnoczi,
	Yongji Xie

On Wed, Oct 19, 2022 at 5:10 PM Gerry <gerry@linux.alibaba.com> wrote:
>
>
>
> > On 19 Oct 2022, at 17:04, Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2022/10/19 16:07, Tony Lu wrote:
> >> On Wed, Oct 19, 2022 at 02:02:21PM +0800, Xuan Zhuo wrote:
> >>> On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >>>> On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>>>> On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >>>>>> On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>>>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> Hi Jason,
> >>>>>>>
> >>>>>>> I think there may be some problems with the direction we are discussing.
> >>>>>> Probably not.
> >>>>>>
> >>>>>> As far as we are focusing on technology, there's nothing wrong from my
> >>>>>> perspective. And this is how the community works. Your idea needs to
> >>>>>> be justified and people are free to raise any technical questions
> >>>>>> especially considering you've posted a spec change with prototype
> >>>>>> codes but not only the idea.
> >>>>>>
> >>>>>>> Our
> >>>>>>> goal is to add an new ism device. As far as the spec is concerned, we are not
> >>>>>>> concerned with the implementation of the backend.
> >>>>>>>
> >>>>>>> The direction we should discuss is what is the difference between the ism device
> >>>>>>> and other devices such as virtio-net, and whether it is necessary to introduce
> >>>>>>> this new device.
> >>>>>> This is somehow what I want to ask, actually it's not a comparison
> >>>>>> with virtio-net but:
> >>>>>>
> >>>>>> - virtio-roce
> >>>>>> - virtio-vhost-user
> >>>>>> - virtio-(p)mem
> >>>>>>
> >>>>>> or whether we can simply add features to those devices to achieve what
> >>>>>> you want to do here.
> >>>>>
> >>>>> Yes, this is my priority to discuss.
> >>>>>
> >>>>> At the moment, I think the most similar to ism is the Vhost-user Device Backend
> >>>>> of virtio-vhost-user.
> >>>>>
> >>>>> My understanding of it is to map any virtio device to another vm as a vvu
> >>>>> device.
> >>>> Yes, so a possible way is to have a device with memory zone/region
> >>>> provision and management then map it via virtio-vhost-user.
> >>>
> >>> Yes, there is such a possibility. virtio-vhost-user makes me feel that what can
> >>> be shared is the function implementation of map.
> >>>
> >>> But in the vm to provide the interface to the upper layer, I think this is the
> >>> work of ism.
> >>>
> >>> But one of the reasons why I didn't use virtio-vhost-user directly is that in
> >>> another vm, the guest can operate the vvu device, which we hope that both sides
> >>> are equal to the ism device.
> >>>
> >>> So I want to agree on a question first: who will provide the upper layer with
> >>> the ability to share the memory area?
> >>>
> >>> Our answer is a new ism device. How does this device achieve memory sharing, I
> >>> think is the second question.
> >>>
> >>>
> >>>>> From this design purpose, I think the two are different.
> >>>>>
> >>>>> Of course, you might want to extend it, it does have some similarities and uses
> >>>>> a lot of similar techniques.
> >>>> I don't have any preference so far. If you think your idea makes more
> >>>> sense, then try your best to justify it in the list.
> >>>>
> >>>>> So we can really discuss in this direction, whether
> >>>>> the vvu device can be extended to achieve the purpose of ism, or whether the
> >>>>> design goals can be agreed.
> >>>> I've added Stefan in the loop, let's hear from him.
> >>>>
> >>>>> Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
> >>>>> Should device/driver APIs remain independent?
> >>>> Btw, you mentioned that one possible user of ism is the smc, but I
> >>>> don't see how it connects to that with your prototype driver.
> >>> Yes, we originally had plans, but the virtio spec was considered for submission,
> >>> so this was not included. Maybe, we should have included this part @Tony
> >>>
> >>> A brief introduction is that SMC currently has a corresponding
> >>> s390/net/ism_drv.c and we will replace this in the virtualization scenario.
> >
> >
> > Ok, I see. So I think the goal is to implement something in virtio that is functional equivalent to IBM ISM device.
> >
> >
> >>>
> >>> Thanks.
> >>>
> >> SMC is a network protocol which is modeled by shared memory rather than
> >> packet.
> >
> >
> > After reading more SMC from IBM website, I think you meant SMC-D here. And I wonder in order to have a complete SMC solution we still need virtio-ROCE for inter host communcation?
> Absolutely, a complete solution includes SMC-R remote peers and SMC-D for local peers:)

Ok, great.

>
>
> >
> >>  Actually the basic required interfaces of SMC device are:
> >>
> >>   - alloc / free memory region, each connection peer has two memory
> >>      regions dynamically for sending and receiving ring buffer.
> >>   - attach / detach memory region, remote attaches local-allocated
> >>      sending region as receiving region, vice versa.
> >>   - notify, tell peer to read data and update cursor.
> >>
> >> Then the device can be registered as SMC ISM device. Of course, SMC
> >> also requires some modification to adapt it.
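
(For concreteness: the three groups above could be expressed roughly as the ops
table sketched below. The names and signatures are made up for illustration
only; this is not the existing s390 smcd interface and not necessarily what the
spec patch will define.)

    /* Illustrative only: hypothetical ops an "ISM-like" device offers to SMC. */
    #include <linux/types.h>

    struct ism_like_dev;

    struct ism_like_ops {
            /* alloc / free: each connection peer gets two regions
             * (one Tx and one Rx ring buffer), allocated dynamically */
            int   (*alloc_region)(struct ism_like_dev *dev, size_t len, u64 *token);
            void  (*free_region)(struct ism_like_dev *dev, u64 token);

            /* attach / detach: the remote attaches the locally allocated
             * sending region as its receiving region (and vice versa) by token */
            void *(*attach_region)(struct ism_like_dev *dev, u64 token);
            void  (*detach_region)(struct ism_like_dev *dev, u64 token);

            /* notify: tell the peer to read the new data and update its cursor */
            int   (*notify)(struct ism_like_dev *dev, u64 token);
    };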
> >
> >
> > Looking at s390 ism driver it requires other stuffs like vlan add/remove or gid query, do we need them as well?
> We plan to get rid of vlan support,

Please explain this in the changelog or cover letter.

> but we do need interface to query gid etc.

Ok, so let's add that in the next version.

> And the virtio-queue helps us much to implement a control communication channel to support those operations.

Right.
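
(Just as an illustration of the idea, not what the spec patch necessarily
defines: such a control channel could carry small fixed-layout requests over a
virtqueue, similar to other virtio control queues, e.g.:)

    /* Hypothetical control-queue layout; all names here are assumptions. */
    #include <linux/types.h>

    struct virtio_ism_ctrl_hdr {
            __le16 class;           /* e.g. region management, permission, query */
            __le16 cmd;             /* e.g. alloc, attach, detach, query-gid */
    };

    /* The driver places the header plus a command-specific payload in the
     * descriptor chain; the device returns a status byte plus a reply, e.g.: */
    struct virtio_ism_ctrl_alloc_reply {
            __le64 token;           /* token identifying the new region */
            __le64 offset;          /* where the region lives in device memory */
    };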

Thanks

>
> >
> > Thanks
> >
> >
> >>
> >> Cheers,
> >> Tony Lu
> >>
> >>>> Thanks
> >>>>
> >>>>> Thanks.
> >>>>>
> >>>>>
> >>>>>>> How to share the backend with other deivce is another problem.
> >>>>>> Yes, anything that is used for your virito-ism prototype can be used
> >>>>>> for other devices.
> >>>>>>
> >>>>>>> Our goal is to dynamically obtain a piece of memory to share with other vms.
> >>>>>> So at this level, I don't see the exact difference compared to
> >>>>>> virtio-vhost-user. Let's just focus on the API that carries on the
> >>>>>> semantic:
> >>>>>>
> >>>>>> - map/unmap
> >>>>>> - permission update
> >>>>>>
> >>>>>> The only missing piece is the per region notification.
> >>>>>>
> >>>>>>> In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> >>>>>>> it will use it as a ring. Of course, we also need a notify mechanism.
> >>>>>>>
> >>>>>>> That's what we're aiming for, so we should first discuss whether this
> >>>>>>> requirement is reasonable.
> >>>>>> So unless somebody said "no", it is fine until now.
> >>>>>>
> >>>>>>> I think it's a feature currently not supported by
> >>>>>>> other devices specified by the current virtio spce.
> >>>>>> Probably, but we've already had rfcs for roce and vhost-user.
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  9:11               ` Jason Wang
@ 2022-10-19  9:15                 ` Xuan Zhuo
  2022-10-21  2:42                   ` Jason Wang
  0 siblings, 1 reply; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-19  9:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, 19 Oct 2022 17:11:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Oct 19, 2022 at 4:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 19 Oct 2022 16:13:07 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > On Wed, 19 Oct 2022 16:01:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > Adding Stefan.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > Hello everyone,
> > > > > > > > >
> > > > > > > > > # Background
> > > > > > > > >
> > > > > > > > > Nowadays, there is a common scenario to accelerate communication between
> > > > > > > > > different VMs and containers, including light weight virtual machine based
> > > > > > > > > containers. One way to achieve this is to colocate them on the same host.
> > > > > > > > > However, the performance of inter-VM communication through network stack is not
> > > > > > > > > optimal and may also waste extra CPU cycles. This scenario has been discussed
> > > > > > > > > many times, but still no generic solution available [1] [2] [3].
> > > > > > > > >
> > > > > > > > > With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > > > > > > > > We found that by changing the communication channel between VMs from TCP to SMC
> > > > > > > > > with shared memory, we can achieve superior performance for a common
> > > > > > > > > socket-based application[5]:
> > > > > > > > >   - latency reduced by about 50%
> > > > > > > > >   - throughput increased by about 300%
> > > > > > > > >   - CPU consumption reduced by about 50%
> > > > > > > > >
> > > > > > > > > Since there is no particularly suitable shared memory management solution
> > > > > > > > > matches the need for SMC(See ## Comparison with existing technology), and virtio
> > > > > > > > > is the standard for communication in the virtualization world, we want to
> > > > > > > > > implement a virtio-ism device based on virtio, which can support on-demand
> > > > > > > > > memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > > > > > > > > the virtio-ism device need to support:
> > > > > > > > >
> > > > > > > > > 1. Dynamic provision: shared memory regions are dynamically allocated and
> > > > > > > > >    provisioned.
> > > > > > > > > 2. Multi-region management: the shared memory is divided into regions,
> > > > > > > > >    and a peer may allocate one or more regions from the same shared memory
> > > > > > > > >    device.
> > > > > > > > > 3. Permission control: The permission of each region can be set seperately.
> > > > > > > >
> > > > > > > > Looks like virtio-ROCE
> > > > > > > >
> > > > > > > > https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> > > > > > > >
> > > > > > > > and virtio-vhost-user can satisfy the requirement?
> > > > > > > >
> > > > > > > > >
> > > > > > > > > # Virtio ism device
> > > > > > > > >
> > > > > > > > > ISM devices provide the ability to share memory between different guests on a
> > > > > > > > > host. A guest's memory got from ism device can be shared with multiple peers at
> > > > > > > > > the same time. This shared relationship can be dynamically created and released.
> > > > > > > > >
> > > > > > > > > The shared memory obtained from the device is divided into multiple ism regions
> > > > > > > > > for share. ISM device provides a mechanism to notify other ism region referrers
> > > > > > > > > of content update events.
> > > > > > > > >
> > > > > > > > > # Usage (SMC as example)
> > > > > > > > >
> > > > > > > > > Maybe there is one of possible use cases:
> > > > > > > > >
> > > > > > > > > 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> > > > > > > > >    location of a memory region in the PCI space and a token.
> > > > > > > > > 2. The ism driver mmap the memory region and return to SMC with the token
> > > > > > > > > 3. SMC passes the token to the connected peer
> > > > > > > > > 3. the peer calls the ism driver interface ism_attach_region(token) to
> > > > > > > > >    get the location of the PCI space of the shared memory
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > # About hot plugging of the ism device
> > > > > > > > >
> > > > > > > > >    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> > > > > > > > >    less scalable operation. So, we don't plan to support it for now.
> > > > > > > > >
> > > > > > > > > # Comparison with existing technology
> > > > > > > > >
> > > > > > > > > ## ivshmem or ivshmem 2.0 of Qemu
> > > > > > > > >
> > > > > > > > >    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> > > > > > > > >    use this VM, so the security is not enough.
> > > > > > > > >
> > > > > > > > >    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> > > > > > > > >    other VMs that use the ivshmem 2.0 shared memory device, which also does not
> > > > > > > > >    meet our needs in terms of security.
> > > > > > > > >
> > > > > > > > > ## vhost-pci and virtiovhostuser
> > > > > > > > >
> > > > > > > > >    Does not support dynamic allocation and therefore not suitable for SMC.
> > > > > > > >
> > > > > > > > I think this is an implementation issue, we can support VHOST IOTLB
> > > > > > > > message then the regions could be added/removed on demand.
> > > > > > >
> > > > > > >
> > > > > > > 1. After the attacker connects with the victim, if the attacker does not
> > > > > > >    dereference memory, the memory will be occupied under virtiovhostuser. In the
> > > > > > >    case of ism devices, the victim can directly release the reference, and the
> > > > > > >    maliciously referenced region only occupies the attacker's resources
> > > > > >
> > > > > > Let's define the security boundary here. E.g do we trust the device or
> > > > > > not? If yes, in the case of virtiovhostuser, can we simple do
> > > > > > VHOST_IOTLB_UNMAP then we can safely release the memory from the
> > > > > > attacker.
> > > > > >
> > > > > > >
> > > > > > > 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> > > > > > >    time, which is a challenge for virtiovhostuser
> > > > > >
> > > > > > Please elaborate more the the challenges, anything make
> > > > > > virtiovhostuser different?
> > > > >
> > > > > I understand (please point out any mistakes), one vvu device corresponds to one
> > > > > vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
> > > >
> > > > There could be some misunderstanding here. With 1000 VM, you still
> > > > need 1000 virtio-sim devices I think.
> > >
> > > No, just use a virtio-ism device.
> >
> > For example, if the hardware memory of a virtio-ism is 1G, and an ism region is
> > 1M, there are 1000 ism regions, and these ism regions can be shared with
> > different vms.
>
> Right, this is what I've understood.
>
> What I want to say this might be achieved with virtio-vhost-user as
> well. But it may require a some changes on the protocol which I'm not
> sure it's worth to bother. And I've started to think about the
> possibility to build virtio-vhost-user on top (I don't see any blocker
> so far).

Yes, it is theoretically possible to implement this on top of virtio-vhost-user.
But implementing it without depending on virtio-vhost-user is also very simple,
because the physical memory being shared does not come from a VM, but from the
host.
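
(A minimal host-side sketch of what this means, using memfd_create() just as
one example of host-owned, fd-backed memory; this is only an illustration of
the idea, not the actual QEMU backend code:)

    /* Illustration only: the host owns the memory pool, VMs map regions of it. */
    #define _GNU_SOURCE
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define REGION_SIZE (1UL << 20)   /* e.g. 1 MiB per ism region, as in the example above */

    static int create_ism_pool(size_t total_size)
    {
            int fd = memfd_create("virtio-ism-pool", 0);

            if (fd < 0)
                    return -1;
            if (ftruncate(fd, total_size) < 0) {
                    close(fd);
                    return -1;
            }
            return fd;                /* the fd and per-region offsets, not guest RAM, back the regions */
    }

    static void *map_region(int pool_fd, uint64_t region_index)
    {
            /* every backend process maps only the regions it holds a token for */
            return mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED, pool_fd, region_index * REGION_SIZE);
    }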

So I think we have reached an agreement on the relationship between ism and
virtio-vhost-user: ism provides shared memory to the upper layer, and adding
such a device should be necessary (of course, we will also listen to other
people's opinions). How its backend is shared with other VMs is our second
question.

Thanks.



>
> Thanks
>
> >
> > And it is dynamic. After an ism region is shared with a vm, it can be shared
> > with other vms.
> >
> > Thanks.
> >
> > >
> > > Thanks.
> > >
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> > > > > > >    determines the sharing relationship at startup.
> > > > > >
> > > > > > Not necessarily with IOTLB API?
> > > > >
> > > > > Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
> > > > > provide the same memory on the host to two vms. So the implementation of this
> > > > > part will be much simpler. This is why we gave up virtio-vhost-user at the
> > > > > beginning.
> > > >
> > > > Ok, just to make sure we're at the same page. From spec level,
> > > > virtio-vhost-user doesn't (can't) limit the backend to be implemented
> > > > in another VM. So it should be ok to be used for sharing memory
> > > > between a guest and host.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > 4. For security issues, the device under virtiovhostuser may mmap more memory,
> > > > > > >    while ism only maps one region to other devices
> > > > > >
> > > > > > With VHOST_IOTLB_MAP, the map could be done per region.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > # Design
> > > > > > > > >
> > > > > > > > >    This is a structure diagram based on ism sharing between two vms.
> > > > > > > > >
> > > > > > > > >     |-------------------------------------------------------------------------------------------------------------|
> > > > > > > > >     | |------------------------------------------------|       |------------------------------------------------| |
> > > > > > > > >     | | Guest                                          |       | Guest                                          | |
> > > > > > > > >     | |                                                |       |                                                | |
> > > > > > > > >     | |   ----------------                             |       |   ----------------                             | |
> > > > > > > > >     | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> > > > > > > > >     | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> > > > > > > > >     | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> > > > > > > > >     | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> > > > > > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > > > > >     | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> > > > > > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > > > > >     | |                                |               |       |                               |                | |
> > > > > > > > >     | |                                |               |       |                               |                | |
> > > > > > > > >     | | Qemu                           |               |       | Qemu                          |                | |
> > > > > > > > >     | |--------------------------------+---------------|       |-------------------------------+----------------| |
> > > > > > > > >     |                                  |                                                       |                  |
> > > > > > > > >     |                                  |                                                       |                  |
> > > > > > > > >     |                                  |------------------------------+------------------------|                  |
> > > > > > > > >     |                                                                 |                                           |
> > > > > > > > >     |                                                                 |                                           |
> > > > > > > > >     |                                                   --------------------------                                |
> > > > > > > > >     |                                                    | M1 |   | M2 |   | M3 |                                 |
> > > > > > > > >     |                                                   --------------------------                                |
> > > > > > > > >     |                                                                                                             |
> > > > > > > > >     | HOST                                                                                                        |
> > > > > > > > >     ---------------------------------------------------------------------------------------------------------------
> > > > > > > > >
> > > > > > > > > # POC code
> > > > > > > > >
> > > > > > > > >    Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > > > > > > > >    Qemu:   https://github.com/fengidri/qemu/commits/ism
> > > > > > > > >
> > > > > > > > > If there are any problems, please point them out.
> > > > > > > > >
> > > > > > > > > Hope to hear from you, thank you.
> > > > > > > > >
> > > > > > > > > [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > > > > > > > > [2] https://dl.acm.org/doi/10.1145/2847562
> > > > > > > > > [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > > > > > > > > [4] https://lwn.net/Articles/711071/
> > > > > > > > > [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Xuan Zhuo (2):
> > > > > > > > >   Reserve device id for ISM device
> > > > > > > > >   virtio-ism: introduce new device virtio-ism
> > > > > > > > >
> > > > > > > > >  content.tex    |   3 +
> > > > > > > > >  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > >  2 files changed, 343 insertions(+)
> > > > > > > > >  create mode 100644 virtio-ism.tex
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >
> >
>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  9:10                 ` Xuan Zhuo
@ 2022-10-19  9:15                   ` Jason Wang
  2022-10-19  9:23                     ` Xuan Zhuo
  0 siblings, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-10-19  9:15 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Gerry, virtio-dev, hans, herongguang, zmlcc, tonylu, zhenzao,
	helinguo, mst, cohuck, Stefan Hajnoczi, dust.li

On Wed, Oct 19, 2022 at 5:12 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 19 Oct 2022 17:08:29 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Oct 19, 2022 at 4:21 PM Dust Li <dust.li@linux.alibaba.com> wrote:
> > >
> > > On Wed, Oct 19, 2022 at 04:03:42PM +0800, Gerry wrote:
> > > >
> > > >
> > > >> On Oct 19, 2022, at 16:01, Jason Wang <jasowang@redhat.com> wrote:
> > > >>
> > > >> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >>>
> > > >>> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > >>>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >>>>>
> > > >>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > >>>>>> Adding Stefan.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >>>>>>>
> > > >>>>>>> Hello everyone,
> > > >>>>>>>
> > > >>>>>>> # Background
> > > >>>>>>>
> > > >>>>>>> Nowadays, there is a common scenario to accelerate communication between
> > > >>>>>>> different VMs and containers, including light weight virtual machine based
> > > >>>>>>> containers. One way to achieve this is to colocate them on the same host.
> > > >>>>>>> However, the performance of inter-VM communication through network stack is not
> > > >>>>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
> > > >>>>>>> many times, but still no generic solution available [1] [2] [3].
> > > >>>>>>>
> > > >>>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > > >>>>>>> We found that by changing the communication channel between VMs from TCP to SMC
> > > >>>>>>> with shared memory, we can achieve superior performance for a common
> > > >>>>>>> socket-based application[5]:
> > > >>>>>>>  - latency reduced by about 50%
> > > >>>>>>>  - throughput increased by about 300%
> > > >>>>>>>  - CPU consumption reduced by about 50%
> > > >>>>>>>
> > > >>>>>>> Since there is no particularly suitable shared memory management solution
> > > >>>>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
> > > >>>>>>> is the standard for communication in the virtualization world, we want to
> > > >>>>>>> implement a virtio-ism device based on virtio, which can support on-demand
> > > >>>>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > > >>>>>>> the virtio-ism device need to support:
> > > >>>>>>>
> > > >>>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
> > > >>>>>>>   provisioned.
> > > >>>>>>> 2. Multi-region management: the shared memory is divided into regions,
> > > >>>>>>>   and a peer may allocate one or more regions from the same shared memory
> > > >>>>>>>   device.
> > > >>>>>>> 3. Permission control: The permission of each region can be set seperately.
> > > >>>>>>
> > > >>>>>> Looks like virtio-ROCE
> > > >>>>>>
> > > >>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> > > >>>>>>
> > > >>>>>> and virtio-vhost-user can satisfy the requirement?
> > > >>>>>>
> > > >>>>>>>
> > > >>>>>>> # Virtio ism device
> > > >>>>>>>
> > > >>>>>>> ISM devices provide the ability to share memory between different guests on a
> > > >>>>>>> host. A guest's memory got from ism device can be shared with multiple peers at
> > > >>>>>>> the same time. This shared relationship can be dynamically created and released.
> > > >>>>>>>
> > > >>>>>>> The shared memory obtained from the device is divided into multiple ism regions
> > > >>>>>>> for share. ISM device provides a mechanism to notify other ism region referrers
> > > >>>>>>> of content update events.
> > > >>>>>>>
> > > >>>>>>> # Usage (SMC as example)
> > > >>>>>>>
> > > >>>>>>> Maybe there is one of possible use cases:
> > > >>>>>>>
> > > >>>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> > > >>>>>>>   location of a memory region in the PCI space and a token.
> > > >>>>>>> 2. The ism driver mmap the memory region and return to SMC with the token
> > > >>>>>>> 3. SMC passes the token to the connected peer
> > > >>>>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
> > > >>>>>>>   get the location of the PCI space of the shared memory
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> # About hot plugging of the ism device
> > > >>>>>>>
> > > >>>>>>>   Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> > > >>>>>>>   less scalable operation. So, we don't plan to support it for now.
> > > >>>>>>>
> > > >>>>>>> # Comparison with existing technology
> > > >>>>>>>
> > > >>>>>>> ## ivshmem or ivshmem 2.0 of Qemu
> > > >>>>>>>
> > > >>>>>>>   1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> > > >>>>>>>   use this VM, so the security is not enough.
> > > >>>>>>>
> > > >>>>>>>   2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> > > >>>>>>>   other VMs that use the ivshmem 2.0 shared memory device, which also does not
> > > >>>>>>>   meet our needs in terms of security.
> > > >>>>>>>
> > > >>>>>>> ## vhost-pci and virtiovhostuser
> > > >>>>>>>
> > > >>>>>>>   Does not support dynamic allocation and therefore not suitable for SMC.
> > > >>>>>>
> > > >>>>>> I think this is an implementation issue, we can support VHOST IOTLB
> > > >>>>>> message then the regions could be added/removed on demand.
> > > >>>>>
> > > >>>>>
> > > >>>>> 1. After the attacker connects with the victim, if the attacker does not
> > > >>>>>   dereference memory, the memory will be occupied under virtiovhostuser. In the
> > > >>>>>   case of ism devices, the victim can directly release the reference, and the
> > > >>>>>   maliciously referenced region only occupies the attacker's resources
> > > >>>>
> > > >>>> Let's define the security boundary here. E.g do we trust the device or
> > > >>>> not? If yes, in the case of virtiovhostuser, can we simple do
> > > >>>> VHOST_IOTLB_UNMAP then we can safely release the memory from the
> > > >>>> attacker.
> > > >>>>
> > > >>>>>
> > > >>>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> > > >>>>>   time, which is a challenge for virtiovhostuser
> > > >>>>
> > > >>>> Please elaborate more the the challenges, anything make
> > > >>>> virtiovhostuser different?
> > > >>>
> > > >>> I understand (please point out any mistakes), one vvu device corresponds to one
> > > >>> vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
> > > >>
> > > >> There could be some misunderstanding here. With 1000 VM, you still
> > > >> need 1000 virtio-sim devices I think.
> > > >We are trying to achieve one virtio-ism device per vm instead of one virtio-ism device per SMC connection.
> >
> > I wonder if we need something to identify a virtio-ism device since I
> > guess there's still a chance to have multiple virtio-ism device per VM
> > (different service chain etc).
>
> Yes, there will be such a situation, a vm has multiple virtio-ism devices.
>
> What exactly do you mean by "identify"?

E.g. we can differentiate two virtio-net devices by MAC address; do we need
something similar for ism, or is it completely unnecessary (e.g. via the
token or something else)?

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
> > >
> > > I think we must achieve this if we want to meet the requirements of SMC.
> > > In SMC, a SMC socket(Corresponding to a TCP socket) need 2 memory
> > > regions(1 for Tx and 1 for Rx). So if we have 1K TCP connections,
> > > we'll need 2K share memory regions, and those memory regions are
> > > dynamically allocated and freed with the TCP socket.
> > >
> > > >
> > > >>
> > > >>>
> > > >>>
> > > >>>>
> > > >>>>>
> > > >>>>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> > > >>>>>   determines the sharing relationship at startup.
> > > >>>>
> > > >>>> Not necessarily with IOTLB API?
> > > >>>
> > > >>> Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
> > > >>> provide the same memory on the host to two vms. So the implementation of this
> > > >>> part will be much simpler. This is why we gave up virtio-vhost-user at the
> > > >>> beginning.
> > > >>
> > > >> Ok, just to make sure we're at the same page. From spec level,
> > > >> virtio-vhost-user doesn't (can't) limit the backend to be implemented
> > > >> in another VM. So it should be ok to be used for sharing memory
> > > >> between a guest and host.
> > > >>
> > > >> Thanks
> > > >>
> > > >>>
> > > >>> Thanks.
> > > >>>
> > > >>>
> > > >>>>
> > > >>>>>
> > > >>>>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
> > > >>>>>   while ism only maps one region to other devices
> > > >>>>
> > > >>>> With VHOST_IOTLB_MAP, the map could be done per region.
> > > >>>>
> > > >>>> Thanks
> > > >>>>
> > > >>>>>
> > > >>>>> Thanks.
> > > >>>>>
> > > >>>>>>
> > > >>>>>> Thanks
> > > >>>>>>
> > > >>>>>>>
> > > >>>>>>> # Design
> > > >>>>>>>
> > > >>>>>>>   This is a structure diagram based on ism sharing between two vms.
> > > >>>>>>>
> > > >>>>>>>    |-------------------------------------------------------------------------------------------------------------|
> > > >>>>>>>    | |------------------------------------------------|       |------------------------------------------------| |
> > > >>>>>>>    | | Guest                                          |       | Guest                                          | |
> > > >>>>>>>    | |                                                |       |                                                | |
> > > >>>>>>>    | |   ----------------                             |       |   ----------------                             | |
> > > >>>>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> > > >>>>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> > > >>>>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> > > >>>>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> > > >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > >>>>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> > > >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > >>>>>>>    | |                                |               |       |                               |                | |
> > > >>>>>>>    | |                                |               |       |                               |                | |
> > > >>>>>>>    | | Qemu                           |               |       | Qemu                          |                | |
> > > >>>>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
> > > >>>>>>>    |                                  |                                                       |                  |
> > > >>>>>>>    |                                  |                                                       |                  |
> > > >>>>>>>    |                                  |------------------------------+------------------------|                  |
> > > >>>>>>>    |                                                                 |                                           |
> > > >>>>>>>    |                                                                 |                                           |
> > > >>>>>>>    |                                                   --------------------------                                |
> > > >>>>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
> > > >>>>>>>    |                                                   --------------------------                                |
> > > >>>>>>>    |                                                                                                             |
> > > >>>>>>>    | HOST                                                                                                        |
> > > >>>>>>>    ---------------------------------------------------------------------------------------------------------------
> > > >>>>>>>
> > > >>>>>>> # POC code
> > > >>>>>>>
> > > >>>>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > > >>>>>>>   Qemu:   https://github.com/fengidri/qemu/commits/ism
> > > >>>>>>>
> > > >>>>>>> If there are any problems, please point them out.
> > > >>>>>>>
> > > >>>>>>> Hope to hear from you, thank you.
> > > >>>>>>>
> > > >>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > > >>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
> > > >>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > > >>>>>>> [4] https://lwn.net/Articles/711071/
> > > >>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Xuan Zhuo (2):
> > > >>>>>>>  Reserve device id for ISM device
> > > >>>>>>>  virtio-ism: introduce new device virtio-ism
> > > >>>>>>>
> > > >>>>>>> content.tex    |   3 +
> > > >>>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > >>>>>>> 2 files changed, 343 insertions(+)
> > > >>>>>>> create mode 100644 virtio-ism.tex
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>> 2.32.0.3.g01195cf9f
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> ---------------------------------------------------------------------
> > > >>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > >>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>> ---------------------------------------------------------------------
> > > >>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > >>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > >>>>>
> > > >>>>
> > > >>>
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  9:15                   ` Jason Wang
@ 2022-10-19  9:23                     ` Xuan Zhuo
  2022-10-21  2:41                       ` Jason Wang
  0 siblings, 1 reply; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-19  9:23 UTC (permalink / raw)
  To: Jason Wang
  Cc: Gerry, virtio-dev, hans, herongguang, zmlcc, tonylu, zhenzao,
	helinguo, mst, cohuck, Stefan Hajnoczi, dust.li

On Wed, 19 Oct 2022 17:15:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Oct 19, 2022 at 5:12 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 19 Oct 2022 17:08:29 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Oct 19, 2022 at 4:21 PM Dust Li <dust.li@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, Oct 19, 2022 at 04:03:42PM +0800, Gerry wrote:
> > > > >
> > > > >
> > > > >> On Oct 19, 2022, at 16:01, Jason Wang <jasowang@redhat.com> wrote:
> > > > >>
> > > > >> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >>>
> > > > >>> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > >>>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >>>>>
> > > > >>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > >>>>>> Adding Stefan.
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >>>>>>>
> > > > >>>>>>> Hello everyone,
> > > > >>>>>>>
> > > > >>>>>>> # Background
> > > > >>>>>>>
> > > > >>>>>>> Nowadays, there is a common scenario to accelerate communication between
> > > > >>>>>>> different VMs and containers, including light weight virtual machine based
> > > > >>>>>>> containers. One way to achieve this is to colocate them on the same host.
> > > > >>>>>>> However, the performance of inter-VM communication through network stack is not
> > > > >>>>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
> > > > >>>>>>> many times, but still no generic solution available [1] [2] [3].
> > > > >>>>>>>
> > > > >>>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > > > >>>>>>> We found that by changing the communication channel between VMs from TCP to SMC
> > > > >>>>>>> with shared memory, we can achieve superior performance for a common
> > > > >>>>>>> socket-based application[5]:
> > > > >>>>>>>  - latency reduced by about 50%
> > > > >>>>>>>  - throughput increased by about 300%
> > > > >>>>>>>  - CPU consumption reduced by about 50%
> > > > >>>>>>>
> > > > >>>>>>> Since there is no particularly suitable shared memory management solution
> > > > >>>>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
> > > > >>>>>>> is the standard for communication in the virtualization world, we want to
> > > > >>>>>>> implement a virtio-ism device based on virtio, which can support on-demand
> > > > >>>>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > > > >>>>>>> the virtio-ism device need to support:
> > > > >>>>>>>
> > > > >>>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
> > > > >>>>>>>   provisioned.
> > > > >>>>>>> 2. Multi-region management: the shared memory is divided into regions,
> > > > >>>>>>>   and a peer may allocate one or more regions from the same shared memory
> > > > >>>>>>>   device.
> > > > >>>>>>> 3. Permission control: The permission of each region can be set seperately.
> > > > >>>>>>
> > > > >>>>>> Looks like virtio-ROCE
> > > > >>>>>>
> > > > >>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> > > > >>>>>>
> > > > >>>>>> and virtio-vhost-user can satisfy the requirement?
> > > > >>>>>>
> > > > >>>>>>>
> > > > >>>>>>> # Virtio ism device
> > > > >>>>>>>
> > > > >>>>>>> ISM devices provide the ability to share memory between different guests on a
> > > > >>>>>>> host. A guest's memory got from ism device can be shared with multiple peers at
> > > > >>>>>>> the same time. This shared relationship can be dynamically created and released.
> > > > >>>>>>>
> > > > >>>>>>> The shared memory obtained from the device is divided into multiple ism regions
> > > > >>>>>>> for share. ISM device provides a mechanism to notify other ism region referrers
> > > > >>>>>>> of content update events.
> > > > >>>>>>>
> > > > >>>>>>> # Usage (SMC as example)
> > > > >>>>>>>
> > > > >>>>>>> Maybe there is one of possible use cases:
> > > > >>>>>>>
> > > > >>>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> > > > >>>>>>>   location of a memory region in the PCI space and a token.
> > > > >>>>>>> 2. The ism driver mmap the memory region and return to SMC with the token
> > > > >>>>>>> 3. SMC passes the token to the connected peer
> > > > >>>>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
> > > > >>>>>>>   get the location of the PCI space of the shared memory
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> # About hot plugging of the ism device
> > > > >>>>>>>
> > > > >>>>>>>   Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> > > > >>>>>>>   less scalable operation. So, we don't plan to support it for now.
> > > > >>>>>>>
> > > > >>>>>>> # Comparison with existing technology
> > > > >>>>>>>
> > > > >>>>>>> ## ivshmem or ivshmem 2.0 of Qemu
> > > > >>>>>>>
> > > > >>>>>>>   1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> > > > >>>>>>>   use this VM, so the security is not enough.
> > > > >>>>>>>
> > > > >>>>>>>   2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> > > > >>>>>>>   other VMs that use the ivshmem 2.0 shared memory device, which also does not
> > > > >>>>>>>   meet our needs in terms of security.
> > > > >>>>>>>
> > > > >>>>>>> ## vhost-pci and virtiovhostuser
> > > > >>>>>>>
> > > > >>>>>>>   Does not support dynamic allocation and therefore not suitable for SMC.
> > > > >>>>>>
> > > > >>>>>> I think this is an implementation issue, we can support VHOST IOTLB
> > > > >>>>>> message then the regions could be added/removed on demand.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> 1. After the attacker connects with the victim, if the attacker does not
> > > > >>>>>   dereference memory, the memory will be occupied under virtiovhostuser. In the
> > > > >>>>>   case of ism devices, the victim can directly release the reference, and the
> > > > >>>>>   maliciously referenced region only occupies the attacker's resources
> > > > >>>>
> > > > >>>> Let's define the security boundary here. E.g do we trust the device or
> > > > >>>> not? If yes, in the case of virtiovhostuser, can we simple do
> > > > >>>> VHOST_IOTLB_UNMAP then we can safely release the memory from the
> > > > >>>> attacker.
> > > > >>>>
> > > > >>>>>
> > > > >>>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> > > > >>>>>   time, which is a challenge for virtiovhostuser
> > > > >>>>
> > > > >>>> Please elaborate more the the challenges, anything make
> > > > >>>> virtiovhostuser different?
> > > > >>>
> > > > >>> I understand (please point out any mistakes), one vvu device corresponds to one
> > > > >>> vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
> > > > >>
> > > > >> There could be some misunderstanding here. With 1000 VM, you still
> > > > >> need 1000 virtio-sim devices I think.
> > > > >We are trying to achieve one virtio-ism device per vm instead of one virtio-ism device per SMC connection.
> > >
> > > I wonder if we need something to identify a virtio-ism device since I
> > > guess there's still a chance to have multiple virtio-ism device per VM
> > > (different service chain etc).
> >
> > Yes, there will be such a situation, a vm has multiple virtio-ism devices.
> >
> > What exactly do you mean by "identify"?
>
> E.g. we can differentiate two virtio-net devices by MAC address; do we need
> something similar for ism, or is it completely unnecessary (e.g. via the
> token or something else)?

Currently, we have not encountered such a requirement.

It is conceivable that all physically shared ism regions are indexed by tokens.
virtio-ism is just a way to obtain these ism regions, so on the host there is
no need to distinguish between multiple virtio-ism devices under one VM.
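
(Purely as a sketch with invented names, to illustrate the token-indexed view:
conceptually the host keeps a single table of ism regions keyed by token, and
it does not matter which virtio-ism device a guest used to reach a region.)

    /* Illustration only: conceptual host-side region table indexed by token. */
    #include <stddef.h>
    #include <stdint.h>

    struct ism_region_entry {
            uint64_t token;       /* globally unique token handed out to guests */
            uint64_t offset;      /* offset of the region in the host memory pool */
            uint64_t size;
            uint32_t refcount;    /* how many guests currently attach this region */
    };

    /* look up a region by token; the requesting virtio-ism device is irrelevant */
    static struct ism_region_entry *
    ism_lookup(struct ism_region_entry *table, size_t n, uint64_t token)
    {
            for (size_t i = 0; i < n; i++)
                    if (table[i].token == token)
                            return &table[i];
            return NULL;
    }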

Thanks.


>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > I think we must achieve this if we want to meet the requirements of SMC.
> > > > In SMC, a SMC socket(Corresponding to a TCP socket) need 2 memory
> > > > regions(1 for Tx and 1 for Rx). So if we have 1K TCP connections,
> > > > we'll need 2K share memory regions, and those memory regions are
> > > > dynamically allocated and freed with the TCP socket.
> > > >
> > > > >
> > > > >>
> > > > >>>
> > > > >>>
> > > > >>>>
> > > > >>>>>
> > > > >>>>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> > > > >>>>>   determines the sharing relationship at startup.
> > > > >>>>
> > > > >>>> Not necessarily with IOTLB API?
> > > > >>>
> > > > >>> Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
> > > > >>> provide the same memory on the host to two vms. So the implementation of this
> > > > >>> part will be much simpler. This is why we gave up virtio-vhost-user at the
> > > > >>> beginning.
> > > > >>
> > > > >> Ok, just to make sure we're at the same page. From spec level,
> > > > >> virtio-vhost-user doesn't (can't) limit the backend to be implemented
> > > > >> in another VM. So it should be ok to be used for sharing memory
> > > > >> between a guest and host.
> > > > >>
> > > > >> Thanks
> > > > >>
> > > > >>>
> > > > >>> Thanks.
> > > > >>>
> > > > >>>
> > > > >>>>
> > > > >>>>>
> > > > >>>>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
> > > > >>>>>   while ism only maps one region to other devices
> > > > >>>>
> > > > >>>> With VHOST_IOTLB_MAP, the map could be done per region.
> > > > >>>>
> > > > >>>> Thanks
> > > > >>>>
> > > > >>>>>
> > > > >>>>> Thanks.
> > > > >>>>>
> > > > >>>>>>
> > > > >>>>>> Thanks
> > > > >>>>>>
> > > > >>>>>>>
> > > > >>>>>>> # Design
> > > > >>>>>>>
> > > > >>>>>>>   This is a structure diagram based on ism sharing between two vms.
> > > > >>>>>>>
> > > > >>>>>>>    |-------------------------------------------------------------------------------------------------------------|
> > > > >>>>>>>    | |------------------------------------------------|       |------------------------------------------------| |
> > > > >>>>>>>    | | Guest                                          |       | Guest                                          | |
> > > > >>>>>>>    | |                                                |       |                                                | |
> > > > >>>>>>>    | |   ----------------                             |       |   ----------------                             | |
> > > > >>>>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> > > > >>>>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> > > > >>>>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> > > > >>>>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> > > > >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > >>>>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> > > > >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > >>>>>>>    | |                                |               |       |                               |                | |
> > > > >>>>>>>    | |                                |               |       |                               |                | |
> > > > >>>>>>>    | | Qemu                           |               |       | Qemu                          |                | |
> > > > >>>>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
> > > > >>>>>>>    |                                  |                                                       |                  |
> > > > >>>>>>>    |                                  |                                                       |                  |
> > > > >>>>>>>    |                                  |------------------------------+------------------------|                  |
> > > > >>>>>>>    |                                                                 |                                           |
> > > > >>>>>>>    |                                                                 |                                           |
> > > > >>>>>>>    |                                                   --------------------------                                |
> > > > >>>>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
> > > > >>>>>>>    |                                                   --------------------------                                |
> > > > >>>>>>>    |                                                                                                             |
> > > > >>>>>>>    | HOST                                                                                                        |
> > > > >>>>>>>    ---------------------------------------------------------------------------------------------------------------
> > > > >>>>>>>
> > > > >>>>>>> # POC code
> > > > >>>>>>>
> > > > >>>>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > > > >>>>>>>   Qemu:   https://github.com/fengidri/qemu/commits/ism
> > > > >>>>>>>
> > > > >>>>>>> If there are any problems, please point them out.
> > > > >>>>>>>
> > > > >>>>>>> Hope to hear from you, thank you.
> > > > >>>>>>>
> > > > >>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > > > >>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
> > > > >>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > > > >>>>>>> [4] https://lwn.net/Articles/711071/
> > > > >>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> Xuan Zhuo (2):
> > > > >>>>>>>  Reserve device id for ISM device
> > > > >>>>>>>  virtio-ism: introduce new device virtio-ism
> > > > >>>>>>>
> > > > >>>>>>> content.tex    |   3 +
> > > > >>>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > >>>>>>> 2 files changed, 343 insertions(+)
> > > > >>>>>>> create mode 100644 virtio-ism.tex
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>> 2.32.0.3.g01195cf9f
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> ---------------------------------------------------------------------
> > > > >>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > >>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>> ---------------------------------------------------------------------
> > > > >>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > >>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  9:04               ` Jason Wang
  2022-10-19  9:10                 ` Gerry
@ 2022-10-19 10:01                 ` Tony Lu
  2022-10-21  2:47                   ` Jason Wang
  1 sibling, 1 reply; 61+ messages in thread
From: Tony Lu @ 2022-10-19 10:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, dust.li,
	zhenzao, helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, Oct 19, 2022 at 05:04:58PM +0800, Jason Wang wrote:
> 
> 在 2022/10/19 16:07, Tony Lu 写道:
> > On Wed, Oct 19, 2022 at 02:02:21PM +0800, Xuan Zhuo wrote:
> > > On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > Hi Jason,
> > > > > > > 
> > > > > > > I think there may be some problems with the direction we are discussing.
> > > > > > Probably not.
> > > > > > 
> > > > > > As far as we are focusing on technology, there's nothing wrong from my
> > > > > > perspective. And this is how the community works. Your idea needs to
> > > > > > be justified and people are free to raise any technical questions
> > > > > > especially considering you've posted a spec change with prototype
> > > > > > codes but not only the idea.
> > > > > > 
> > > > > > > Our
> > > > > > > goal is to add an new ism device. As far as the spec is concerned, we are not
> > > > > > > concerned with the implementation of the backend.
> > > > > > > 
> > > > > > > The direction we should discuss is what is the difference between the ism device
> > > > > > > and other devices such as virtio-net, and whether it is necessary to introduce
> > > > > > > this new device.
> > > > > > This is somehow what I want to ask, actually it's not a comparison
> > > > > > with virtio-net but:
> > > > > > 
> > > > > > - virtio-roce
> > > > > > - virtio-vhost-user
> > > > > > - virtio-(p)mem
> > > > > > 
> > > > > > or whether we can simply add features to those devices to achieve what
> > > > > > you want to do here.
> > > > > 
> > > > > Yes, this is my priority to discuss.
> > > > > 
> > > > > At the moment, I think the most similar to ism is the Vhost-user Device Backend
> > > > > of virtio-vhost-user.
> > > > > 
> > > > > My understanding of it is to map any virtio device to another vm as a vvu
> > > > > device.
> > > > Yes, so a possible way is to have a device with memory zone/region
> > > > provision and management then map it via virtio-vhost-user.
> > > 
> > > Yes, there is such a possibility. virtio-vhost-user makes me feel that what can
> > > be shared is the function implementation of map.
> > > 
> > > But in the vm to provide the interface to the upper layer, I think this is the
> > > work of ism.
> > > 
> > > But one of the reasons why I didn't use virtio-vhost-user directly is that in
> > > another vm, the guest can operate the vvu device, which we hope that both sides
> > > are equal to the ism device.
> > > 
> > > So I want to agree on a question first: who will provide the upper layer with
> > > the ability to share the memory area?
> > > 
> > > Our answer is a new ism device. How does this device achieve memory sharing, I
> > > think is the second question.
> > > 
> > > 
> > > > >  From this design purpose, I think the two are different.
> > > > > 
> > > > > Of course, you might want to extend it, it does have some similarities and uses
> > > > > a lot of similar techniques.
> > > > I don't have any preference so far. If you think your idea makes more
> > > > sense, then try your best to justify it in the list.
> > > > 
> > > > > So we can really discuss in this direction, whether
> > > > > the vvu device can be extended to achieve the purpose of ism, or whether the
> > > > > design goals can be agreed.
> > > > I've added Stefan in the loop, let's hear from him.
> > > > 
> > > > > Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
> > > > > Should device/driver APIs remain independent?
> > > > Btw, you mentioned that one possible user of ism is the smc, but I
> > > > don't see how it connects to that with your prototype driver.
> > > Yes, we originally had plans, but the virtio spec was considered for submission,
> > > so this was not included. Maybe, we should have included this part @Tony
> > > 
> > > A brief introduction is that SMC currently has a corresponding
> > > s390/net/ism_drv.c and we will replace this in the virtualization scenario.
> 
> 
> Ok, I see. So I think the goal is to implement something in virtio that is
> functional equivalent to IBM ISM device.
> 

Yes, the IBM ISM device does something similar, and it inspired this.

> 
> > > 
> > > Thanks.
> > > 
> > SMC is a network protocol which is modeled by shared memory rather than
> > packet.
> 
> 
> After reading more SMC from IBM website, I think you meant SMC-D here. And I
> wonder in order to have a complete SMC solution we still need virtio-ROCE
> for inter host communcation?
> 

Mostly yes.

SMC-D is one part of the whole SMC solution. SMC supports multiple
underlying devices: -D means an ISM device, -R means an RDMA device. The key
data model is shared memory; SMC uses RDMA (-R) or ISM (-D) to *share*
memory between peers, and it chooses a suitable device on demand
during handshaking. If no suitable device is available, it falls back
to TCP. So virtio-ROCE is not required.
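
(Schematically, and not the actual net/smc code, the per-connection choice made
during handshaking looks like this:)

    /* Illustration only: how SMC picks an underlying transport per connection. */
    enum smc_transport_choice { USE_SMC_D, USE_SMC_R, FALL_BACK_TO_TCP };

    static enum smc_transport_choice pick_transport(int have_ism_path, int have_rdma_path)
    {
            if (have_ism_path)       /* SMC-D: an ISM(-like) device can share memory with the peer */
                    return USE_SMC_D;
            if (have_rdma_path)      /* SMC-R: an RDMA device reaches the remote host */
                    return USE_SMC_R;
            return FALL_BACK_TO_TCP; /* no suitable device: the connection stays plain TCP */
    }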

> 
> >   Actually the basic required interfaces of SMC device are:
> > 
> >    - alloc / free memory region, each connection peer has two memory
> > 	regions dynamically for sending and receiving ring buffer.
> >    - attach / detach memory region, remote attaches local-allocated
> > 	sending region as receiving region, vice versa.
> >    - notify, tell peer to read data and update cursor.
> > 
> > Then the device can be registered as SMC ISM device. Of course, SMC
> > also requires some modification to adapt it.
> 
> 
> Looking at s390 ism driver it requires other stuffs like vlan add/remove or
> gid query, do we need them as well?

vlan is not required in this use case. ISM uses the gid to identify each
other; maybe we could implement that in a virtio way.

To support virtio-ism smoothly, the interfaces of the ISM driver still need
to be adjusted. I will discuss this with the IBM people.
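
(One possible virtio-style way to expose the gid, stated purely as an
assumption for discussion and not as something the current spec patch defines,
is a read-only field in the device configuration space:)

    /* Hypothetical virtio-ism config space layout; all field names are assumptions. */
    #include <linux/types.h>

    struct virtio_ism_config {
            __le64 gid;             /* identity SMC-D peers use to find each other */
            __le64 dev_mem_size;    /* total shareable device memory */
            __le64 region_size;     /* granularity of a single ism region */
    };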

Cheers,
Tony Lu

> 
> Thanks
> 
> 
> > 
> > Cheers,
> > Tony Lu
> > 
> > > > Thanks
> > > > 
> > > > > Thanks.
> > > > > 
> > > > > 
> > > > > > > How to share the backend with other deivce is another problem.
> > > > > > Yes, anything that is used for your virito-ism prototype can be used
> > > > > > for other devices.
> > > > > > 
> > > > > > > Our goal is to dynamically obtain a piece of memory to share with other vms.
> > > > > > So at this level, I don't see the exact difference compared to
> > > > > > virtio-vhost-user. Let's just focus on the API that carries on the
> > > > > > semantic:
> > > > > > 
> > > > > > - map/unmap
> > > > > > - permission update
> > > > > > 
> > > > > > The only missing piece is the per region notification.
> > > > > > 
> > > > > > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > > > > > > it will use it as a ring. Of course, we also need a notify mechanism.
> > > > > > > 
> > > > > > > That's what we're aiming for, so we should first discuss whether this
> > > > > > > requirement is reasonable.
> > > > > > So unless somebody said "no", it is fine until now.
> > > > > > 
> > > > > > > I think it's a feature currently not supported by
> > > > > > > other devices specified by the current virtio spce.
> > > > > > Probably, but we've already had rfcs for roce and vhost-user.
> > > > > > 
> > > > > > Thanks
> > > > > > 
> > > > > > > Thanks.
> > > > > > > 
> > > > > > > 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  9:23                     ` Xuan Zhuo
@ 2022-10-21  2:41                       ` Jason Wang
  2022-10-21  2:53                         ` Gerry
  2022-10-21  3:30                         ` Dust Li
  0 siblings, 2 replies; 61+ messages in thread
From: Jason Wang @ 2022-10-21  2:41 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Gerry, virtio-dev, hans, herongguang, zmlcc, tonylu, zhenzao,
	helinguo, mst, cohuck, Stefan Hajnoczi, dust.li

On Wed, Oct 19, 2022 at 5:27 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 19 Oct 2022 17:15:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Oct 19, 2022 at 5:12 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 19 Oct 2022 17:08:29 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Wed, Oct 19, 2022 at 4:21 PM Dust Li <dust.li@linux.alibaba.com> wrote:
> > > > >
> > > > > On Wed, Oct 19, 2022 at 04:03:42PM +0800, Gerry wrote:
> > > > > >
> > > > > >
> > > > > >> On 2022-10-19 at 16:01, Jason Wang <jasowang@redhat.com> wrote:
> > > > > >>
> > > > > >> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >>>
> > > > > >>> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > >>>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >>>>>
> > > > > >>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > >>>>>> Adding Stefan.
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>> Hello everyone,
> > > > > >>>>>>>
> > > > > >>>>>>> # Background
> > > > > >>>>>>>
> > > > > >>>>>>> Nowadays, there is a common scenario to accelerate communication between
> > > > > >>>>>>> different VMs and containers, including light weight virtual machine based
> > > > > >>>>>>> containers. One way to achieve this is to colocate them on the same host.
> > > > > >>>>>>> However, the performance of inter-VM communication through network stack is not
> > > > > >>>>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
> > > > > >>>>>>> many times, but still no generic solution available [1] [2] [3].
> > > > > >>>>>>>
> > > > > >>>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > > > > >>>>>>> We found that by changing the communication channel between VMs from TCP to SMC
> > > > > >>>>>>> with shared memory, we can achieve superior performance for a common
> > > > > >>>>>>> socket-based application[5]:
> > > > > >>>>>>>  - latency reduced by about 50%
> > > > > >>>>>>>  - throughput increased by about 300%
> > > > > >>>>>>>  - CPU consumption reduced by about 50%
> > > > > >>>>>>>
> > > > > >>>>>>> Since there is no particularly suitable shared memory management solution
> > > > > >>>>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
> > > > > >>>>>>> is the standard for communication in the virtualization world, we want to
> > > > > >>>>>>> implement a virtio-ism device based on virtio, which can support on-demand
> > > > > >>>>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > > > > >>>>>>> the virtio-ism device need to support:
> > > > > >>>>>>>
> > > > > >>>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
> > > > > >>>>>>>   provisioned.
> > > > > >>>>>>> 2. Multi-region management: the shared memory is divided into regions,
> > > > > >>>>>>>   and a peer may allocate one or more regions from the same shared memory
> > > > > >>>>>>>   device.
> > > > > >>>>>>> 3. Permission control: The permission of each region can be set seperately.
> > > > > >>>>>>
> > > > > >>>>>> Looks like virtio-ROCE
> > > > > >>>>>>
> > > > > >>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> > > > > >>>>>>
> > > > > >>>>>> and virtio-vhost-user can satisfy the requirement?
> > > > > >>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> # Virtio ism device
> > > > > >>>>>>>
> > > > > >>>>>>> ISM devices provide the ability to share memory between different guests on a
> > > > > >>>>>>> host. A guest's memory got from ism device can be shared with multiple peers at
> > > > > >>>>>>> the same time. This shared relationship can be dynamically created and released.
> > > > > >>>>>>>
> > > > > >>>>>>> The shared memory obtained from the device is divided into multiple ism regions
> > > > > >>>>>>> for share. ISM device provides a mechanism to notify other ism region referrers
> > > > > >>>>>>> of content update events.
> > > > > >>>>>>>
> > > > > >>>>>>> # Usage (SMC as example)
> > > > > >>>>>>>
> > > > > >>>>>>> Maybe there is one of possible use cases:
> > > > > >>>>>>>
> > > > > >>>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> > > > > >>>>>>>   location of a memory region in the PCI space and a token.
> > > > > >>>>>>> 2. The ism driver mmap the memory region and return to SMC with the token
> > > > > >>>>>>> 3. SMC passes the token to the connected peer
> > > > > >>>>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
> > > > > >>>>>>>   get the location of the PCI space of the shared memory
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> # About hot plugging of the ism device
> > > > > >>>>>>>
> > > > > >>>>>>>   Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> > > > > >>>>>>>   less scalable operation. So, we don't plan to support it for now.
> > > > > >>>>>>>
> > > > > >>>>>>> # Comparison with existing technology
> > > > > >>>>>>>
> > > > > >>>>>>> ## ivshmem or ivshmem 2.0 of Qemu
> > > > > >>>>>>>
> > > > > >>>>>>>   1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> > > > > >>>>>>>   use this VM, so the security is not enough.
> > > > > >>>>>>>
> > > > > >>>>>>>   2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> > > > > >>>>>>>   other VMs that use the ivshmem 2.0 shared memory device, which also does not
> > > > > >>>>>>>   meet our needs in terms of security.
> > > > > >>>>>>>
> > > > > >>>>>>> ## vhost-pci and virtiovhostuser
> > > > > >>>>>>>
> > > > > >>>>>>>   Does not support dynamic allocation and therefore not suitable for SMC.
> > > > > >>>>>>
> > > > > >>>>>> I think this is an implementation issue, we can support VHOST IOTLB
> > > > > >>>>>> message then the regions could be added/removed on demand.
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> 1. After the attacker connects with the victim, if the attacker does not
> > > > > >>>>>   dereference memory, the memory will be occupied under virtiovhostuser. In the
> > > > > >>>>>   case of ism devices, the victim can directly release the reference, and the
> > > > > >>>>>   maliciously referenced region only occupies the attacker's resources
> > > > > >>>>
> > > > > >>>> Let's define the security boundary here. E.g do we trust the device or
> > > > > >>>> not? If yes, in the case of virtiovhostuser, can we simple do
> > > > > >>>> VHOST_IOTLB_UNMAP then we can safely release the memory from the
> > > > > >>>> attacker.
> > > > > >>>>
> > > > > >>>>>
> > > > > >>>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> > > > > >>>>>   time, which is a challenge for virtiovhostuser
> > > > > >>>>
> > > > > >>>> Please elaborate more the the challenges, anything make
> > > > > >>>> virtiovhostuser different?
> > > > > >>>
> > > > > >>> I understand (please point out any mistakes), one vvu device corresponds to one
> > > > > >>> vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
> > > > > >>
> > > > > >> There could be some misunderstanding here. With 1000 VM, you still
> > > > > >> need 1000 virtio-sim devices I think.
> > > > > >We are trying to achieve one virtio-ism device per vm instead of one virtio-ism device per SMC connection.
> > > >
> > > > I wonder if we need something to identify a virtio-ism device since I
> > > > guess there's still a chance to have multiple virtio-ism device per VM
> > > > (different service chain etc).
> > >
> > > Yes, there will be such a situation, a vm has multiple virtio-ism devices.
> > >
> > > What exactly do you mean by "identify"?
> >
> > E.g we can differ two virtio-net through mac address, do we need
> > something similar for ism, or it's completely unncessary (e.g via
> > token or other) ?
>
> Currently, we have not encountered such a request.
>
> It is conceivable that all physical shared memory ism regions are indexed by
> tokens. virtio-ism is a way to obtain these ism regions, so there is no need to
> distinguish multiple virtio-ism devices under one vm on the host.

So consider a case:

VM1 shares ism1 with VM2
VM1 shares ism2 with VM3

How do the application/SMC address the different ism devices in this case?
E.g. if VM1 wants to talk with VM3 it needs to populate regions in ism2,
but how can the application or the protocol know this, and how can a
specific device be addressed (via BDF?)

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > I think we must achieve this if we want to meet the requirements of SMC.
> > > > > In SMC, a SMC socket(Corresponding to a TCP socket) need 2 memory
> > > > > regions(1 for Tx and 1 for Rx). So if we have 1K TCP connections,
> > > > > we'll need 2K share memory regions, and those memory regions are
> > > > > dynamically allocated and freed with the TCP socket.
> > > > >
> > > > > >
> > > > > >>
> > > > > >>>
> > > > > >>>
> > > > > >>>>
> > > > > >>>>>
> > > > > >>>>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> > > > > >>>>>   determines the sharing relationship at startup.
> > > > > >>>>
> > > > > >>>> Not necessarily with IOTLB API?
> > > > > >>>
> > > > > >>> Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
> > > > > >>> provide the same memory on the host to two vms. So the implementation of this
> > > > > >>> part will be much simpler. This is why we gave up virtio-vhost-user at the
> > > > > >>> beginning.
> > > > > >>
> > > > > >> Ok, just to make sure we're at the same page. From spec level,
> > > > > >> virtio-vhost-user doesn't (can't) limit the backend to be implemented
> > > > > >> in another VM. So it should be ok to be used for sharing memory
> > > > > >> between a guest and host.
> > > > > >>
> > > > > >> Thanks
> > > > > >>
> > > > > >>>
> > > > > >>> Thanks.
> > > > > >>>
> > > > > >>>
> > > > > >>>>
> > > > > >>>>>
> > > > > >>>>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
> > > > > >>>>>   while ism only maps one region to other devices
> > > > > >>>>
> > > > > >>>> With VHOST_IOTLB_MAP, the map could be done per region.
> > > > > >>>>
> > > > > >>>> Thanks
> > > > > >>>>
> > > > > >>>>>
> > > > > >>>>> Thanks.
> > > > > >>>>>
> > > > > >>>>>>
> > > > > >>>>>> Thanks
> > > > > >>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> # Design
> > > > > >>>>>>>
> > > > > >>>>>>>   This is a structure diagram based on ism sharing between two vms.
> > > > > >>>>>>>
> > > > > >>>>>>>    |-------------------------------------------------------------------------------------------------------------|
> > > > > >>>>>>>    | |------------------------------------------------|       |------------------------------------------------| |
> > > > > >>>>>>>    | | Guest                                          |       | Guest                                          | |
> > > > > >>>>>>>    | |                                                |       |                                                | |
> > > > > >>>>>>>    | |   ----------------                             |       |   ----------------                             | |
> > > > > >>>>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> > > > > >>>>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> > > > > >>>>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> > > > > >>>>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> > > > > >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > >>>>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> > > > > >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > >>>>>>>    | |                                |               |       |                               |                | |
> > > > > >>>>>>>    | |                                |               |       |                               |                | |
> > > > > >>>>>>>    | | Qemu                           |               |       | Qemu                          |                | |
> > > > > >>>>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
> > > > > >>>>>>>    |                                  |                                                       |                  |
> > > > > >>>>>>>    |                                  |                                                       |                  |
> > > > > >>>>>>>    |                                  |------------------------------+------------------------|                  |
> > > > > >>>>>>>    |                                                                 |                                           |
> > > > > >>>>>>>    |                                                                 |                                           |
> > > > > >>>>>>>    |                                                   --------------------------                                |
> > > > > >>>>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
> > > > > >>>>>>>    |                                                   --------------------------                                |
> > > > > >>>>>>>    |                                                                                                             |
> > > > > >>>>>>>    | HOST                                                                                                        |
> > > > > >>>>>>>    ---------------------------------------------------------------------------------------------------------------
> > > > > >>>>>>>
> > > > > >>>>>>> # POC code
> > > > > >>>>>>>
> > > > > >>>>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > > > > >>>>>>>   Qemu:   https://github.com/fengidri/qemu/commits/ism
> > > > > >>>>>>>
> > > > > >>>>>>> If there are any problems, please point them out.
> > > > > >>>>>>>
> > > > > >>>>>>> Hope to hear from you, thank you.
> > > > > >>>>>>>
> > > > > >>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > > > > >>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
> > > > > >>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > > > > >>>>>>> [4] https://lwn.net/Articles/711071/
> > > > > >>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> Xuan Zhuo (2):
> > > > > >>>>>>>  Reserve device id for ISM device
> > > > > >>>>>>>  virtio-ism: introduce new device virtio-ism
> > > > > >>>>>>>
> > > > > >>>>>>> content.tex    |   3 +
> > > > > >>>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > >>>>>>> 2 files changed, 343 insertions(+)
> > > > > >>>>>>> create mode 100644 virtio-ism.tex
> > > > > >>>>>>>
> > > > > >>>>>>> --
> > > > > >>>>>>> 2.32.0.3.g01195cf9f
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> ---------------------------------------------------------------------
> > > > > >>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > >>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>> ---------------------------------------------------------------------
> > > > > >>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > >>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > >
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19  9:15                 ` Xuan Zhuo
@ 2022-10-21  2:42                   ` Jason Wang
  2022-10-21  3:03                     ` Xuan Zhuo
  0 siblings, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-10-21  2:42 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, Oct 19, 2022 at 5:23 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 19 Oct 2022 17:11:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Oct 19, 2022 at 4:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 19 Oct 2022 16:13:07 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > On Wed, 19 Oct 2022 16:01:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > Adding Stefan.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hello everyone,
> > > > > > > > > >
> > > > > > > > > > # Background
> > > > > > > > > >
> > > > > > > > > > Nowadays, there is a common scenario to accelerate communication between
> > > > > > > > > > different VMs and containers, including light weight virtual machine based
> > > > > > > > > > containers. One way to achieve this is to colocate them on the same host.
> > > > > > > > > > However, the performance of inter-VM communication through network stack is not
> > > > > > > > > > optimal and may also waste extra CPU cycles. This scenario has been discussed
> > > > > > > > > > many times, but still no generic solution available [1] [2] [3].
> > > > > > > > > >
> > > > > > > > > > With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > > > > > > > > > We found that by changing the communication channel between VMs from TCP to SMC
> > > > > > > > > > with shared memory, we can achieve superior performance for a common
> > > > > > > > > > socket-based application[5]:
> > > > > > > > > >   - latency reduced by about 50%
> > > > > > > > > >   - throughput increased by about 300%
> > > > > > > > > >   - CPU consumption reduced by about 50%
> > > > > > > > > >
> > > > > > > > > > Since there is no particularly suitable shared memory management solution
> > > > > > > > > > matches the need for SMC(See ## Comparison with existing technology), and virtio
> > > > > > > > > > is the standard for communication in the virtualization world, we want to
> > > > > > > > > > implement a virtio-ism device based on virtio, which can support on-demand
> > > > > > > > > > memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > > > > > > > > > the virtio-ism device need to support:
> > > > > > > > > >
> > > > > > > > > > 1. Dynamic provision: shared memory regions are dynamically allocated and
> > > > > > > > > >    provisioned.
> > > > > > > > > > 2. Multi-region management: the shared memory is divided into regions,
> > > > > > > > > >    and a peer may allocate one or more regions from the same shared memory
> > > > > > > > > >    device.
> > > > > > > > > > 3. Permission control: The permission of each region can be set seperately.
> > > > > > > > >
> > > > > > > > > Looks like virtio-ROCE
> > > > > > > > >
> > > > > > > > > https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> > > > > > > > >
> > > > > > > > > and virtio-vhost-user can satisfy the requirement?
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > # Virtio ism device
> > > > > > > > > >
> > > > > > > > > > ISM devices provide the ability to share memory between different guests on a
> > > > > > > > > > host. A guest's memory got from ism device can be shared with multiple peers at
> > > > > > > > > > the same time. This shared relationship can be dynamically created and released.
> > > > > > > > > >
> > > > > > > > > > The shared memory obtained from the device is divided into multiple ism regions
> > > > > > > > > > for share. ISM device provides a mechanism to notify other ism region referrers
> > > > > > > > > > of content update events.
> > > > > > > > > >
> > > > > > > > > > # Usage (SMC as example)
> > > > > > > > > >
> > > > > > > > > > Maybe there is one of possible use cases:
> > > > > > > > > >
> > > > > > > > > > 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> > > > > > > > > >    location of a memory region in the PCI space and a token.
> > > > > > > > > > 2. The ism driver mmap the memory region and return to SMC with the token
> > > > > > > > > > 3. SMC passes the token to the connected peer
> > > > > > > > > > 3. the peer calls the ism driver interface ism_attach_region(token) to
> > > > > > > > > >    get the location of the PCI space of the shared memory
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > # About hot plugging of the ism device
> > > > > > > > > >
> > > > > > > > > >    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> > > > > > > > > >    less scalable operation. So, we don't plan to support it for now.
> > > > > > > > > >
> > > > > > > > > > # Comparison with existing technology
> > > > > > > > > >
> > > > > > > > > > ## ivshmem or ivshmem 2.0 of Qemu
> > > > > > > > > >
> > > > > > > > > >    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> > > > > > > > > >    use this VM, so the security is not enough.
> > > > > > > > > >
> > > > > > > > > >    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> > > > > > > > > >    other VMs that use the ivshmem 2.0 shared memory device, which also does not
> > > > > > > > > >    meet our needs in terms of security.
> > > > > > > > > >
> > > > > > > > > > ## vhost-pci and virtiovhostuser
> > > > > > > > > >
> > > > > > > > > >    Does not support dynamic allocation and therefore not suitable for SMC.
> > > > > > > > >
> > > > > > > > > I think this is an implementation issue, we can support VHOST IOTLB
> > > > > > > > > message then the regions could be added/removed on demand.
> > > > > > > >
> > > > > > > >
> > > > > > > > 1. After the attacker connects with the victim, if the attacker does not
> > > > > > > >    dereference memory, the memory will be occupied under virtiovhostuser. In the
> > > > > > > >    case of ism devices, the victim can directly release the reference, and the
> > > > > > > >    maliciously referenced region only occupies the attacker's resources
> > > > > > >
> > > > > > > Let's define the security boundary here. E.g do we trust the device or
> > > > > > > not? If yes, in the case of virtiovhostuser, can we simple do
> > > > > > > VHOST_IOTLB_UNMAP then we can safely release the memory from the
> > > > > > > attacker.
> > > > > > >
> > > > > > > >
> > > > > > > > 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> > > > > > > >    time, which is a challenge for virtiovhostuser
> > > > > > >
> > > > > > > Please elaborate more the the challenges, anything make
> > > > > > > virtiovhostuser different?
> > > > > >
> > > > > > I understand (please point out any mistakes), one vvu device corresponds to one
> > > > > > vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
> > > > >
> > > > > There could be some misunderstanding here. With 1000 VM, you still
> > > > > need 1000 virtio-sim devices I think.
> > > >
> > > > No, just use a virtio-ism device.
> > >
> > > For example, if the hardware memory of a virtio-ism is 1G, and an ism region is
> > > 1M, there are 1000 ism regions, and these ism regions can be shared with
> > > different vms.
> >
> > Right, this is what I've understood.
> >
> > What I want to say this might be achieved with virtio-vhost-user as
> > well. But it may require a some changes on the protocol which I'm not
> > sure it's worth to bother. And I've started to think about the
> > possibility to build virtio-vhost-user on top (I don't see any blocker
> > so far).
>
> Yes, it is theoretically possible to implement based on virtio-vhost-user. But
> when we try to implement it without depending on virtio-vhost-user, this
> implementation is also very simple. Because the physical memory it shares does
> not come from a vm, but from the host.
>
> So I think we have reached an agreement on the relationship between ism and
> virtio-vhost-user. ism is used to provide shared memory to the upper layer, and
> this device should be necessary to add (of course, listen to some other people's
> opinions). And How is its backend shared with other vms? This is our second
> question.

I'm not sure I get the question, but we're sharing memory, not a backend?

Thanks

>
> Thanks.
>
>
>
> >
> > Thanks
> >
> > >
> > > And it is dynamic. After an ism region is shared with a vm, it can be shared
> > > with other vms.
> > >
> > > Thanks.
> > >
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> > > > > > > >    determines the sharing relationship at startup.
> > > > > > >
> > > > > > > Not necessarily with IOTLB API?
> > > > > >
> > > > > > Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
> > > > > > provide the same memory on the host to two vms. So the implementation of this
> > > > > > part will be much simpler. This is why we gave up virtio-vhost-user at the
> > > > > > beginning.
> > > > >
> > > > > Ok, just to make sure we're at the same page. From spec level,
> > > > > virtio-vhost-user doesn't (can't) limit the backend to be implemented
> > > > > in another VM. So it should be ok to be used for sharing memory
> > > > > between a guest and host.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > 4. For security issues, the device under virtiovhostuser may mmap more memory,
> > > > > > > >    while ism only maps one region to other devices
> > > > > > >
> > > > > > > With VHOST_IOTLB_MAP, the map could be done per region.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > # Design
> > > > > > > > > >
> > > > > > > > > >    This is a structure diagram based on ism sharing between two vms.
> > > > > > > > > >
> > > > > > > > > >     |-------------------------------------------------------------------------------------------------------------|
> > > > > > > > > >     | |------------------------------------------------|       |------------------------------------------------| |
> > > > > > > > > >     | | Guest                                          |       | Guest                                          | |
> > > > > > > > > >     | |                                                |       |                                                | |
> > > > > > > > > >     | |   ----------------                             |       |   ----------------                             | |
> > > > > > > > > >     | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> > > > > > > > > >     | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> > > > > > > > > >     | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> > > > > > > > > >     | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> > > > > > > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > > > > > >     | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> > > > > > > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > > > > > >     | |                                |               |       |                               |                | |
> > > > > > > > > >     | |                                |               |       |                               |                | |
> > > > > > > > > >     | | Qemu                           |               |       | Qemu                          |                | |
> > > > > > > > > >     | |--------------------------------+---------------|       |-------------------------------+----------------| |
> > > > > > > > > >     |                                  |                                                       |                  |
> > > > > > > > > >     |                                  |                                                       |                  |
> > > > > > > > > >     |                                  |------------------------------+------------------------|                  |
> > > > > > > > > >     |                                                                 |                                           |
> > > > > > > > > >     |                                                                 |                                           |
> > > > > > > > > >     |                                                   --------------------------                                |
> > > > > > > > > >     |                                                    | M1 |   | M2 |   | M3 |                                 |
> > > > > > > > > >     |                                                   --------------------------                                |
> > > > > > > > > >     |                                                                                                             |
> > > > > > > > > >     | HOST                                                                                                        |
> > > > > > > > > >     ---------------------------------------------------------------------------------------------------------------
> > > > > > > > > >
> > > > > > > > > > # POC code
> > > > > > > > > >
> > > > > > > > > >    Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > > > > > > > > >    Qemu:   https://github.com/fengidri/qemu/commits/ism
> > > > > > > > > >
> > > > > > > > > > If there are any problems, please point them out.
> > > > > > > > > >
> > > > > > > > > > Hope to hear from you, thank you.
> > > > > > > > > >
> > > > > > > > > > [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > > > > > > > > > [2] https://dl.acm.org/doi/10.1145/2847562
> > > > > > > > > > [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > > > > > > > > > [4] https://lwn.net/Articles/711071/
> > > > > > > > > > [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Xuan Zhuo (2):
> > > > > > > > > >   Reserve device id for ISM device
> > > > > > > > > >   virtio-ism: introduce new device virtio-ism
> > > > > > > > > >
> > > > > > > > > >  content.tex    |   3 +
> > > > > > > > > >  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > > >  2 files changed, 343 insertions(+)
> > > > > > > > > >  create mode 100644 virtio-ism.tex
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > ---------------------------------------------------------------------
> > > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-19 10:01                 ` Tony Lu
@ 2022-10-21  2:47                   ` Jason Wang
  2022-10-21  3:05                     ` Tony Lu
  0 siblings, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-10-21  2:47 UTC (permalink / raw)
  To: Tony Lu
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, dust.li,
	zhenzao, helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Wed, Oct 19, 2022 at 6:01 PM Tony Lu <tonylu@linux.alibaba.com> wrote:
>
> On Wed, Oct 19, 2022 at 05:04:58PM +0800, Jason Wang wrote:
> >
> > 在 2022/10/19 16:07, Tony Lu 写道:
> > > On Wed, Oct 19, 2022 at 02:02:21PM +0800, Xuan Zhuo wrote:
> > > > On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Hi Jason,
> > > > > > > >
> > > > > > > > I think there may be some problems with the direction we are discussing.
> > > > > > > Probably not.
> > > > > > >
> > > > > > > As far as we are focusing on technology, there's nothing wrong from my
> > > > > > > perspective. And this is how the community works. Your idea needs to
> > > > > > > be justified and people are free to raise any technical questions
> > > > > > > especially considering you've posted a spec change with prototype
> > > > > > > codes but not only the idea.
> > > > > > >
> > > > > > > > Our
> > > > > > > > goal is to add an new ism device. As far as the spec is concerned, we are not
> > > > > > > > concerned with the implementation of the backend.
> > > > > > > >
> > > > > > > > The direction we should discuss is what is the difference between the ism device
> > > > > > > > and other devices such as virtio-net, and whether it is necessary to introduce
> > > > > > > > this new device.
> > > > > > > This is somehow what I want to ask, actually it's not a comparison
> > > > > > > with virtio-net but:
> > > > > > >
> > > > > > > - virtio-roce
> > > > > > > - virtio-vhost-user
> > > > > > > - virtio-(p)mem
> > > > > > >
> > > > > > > or whether we can simply add features to those devices to achieve what
> > > > > > > you want to do here.
> > > > > >
> > > > > > Yes, this is my priority to discuss.
> > > > > >
> > > > > > At the moment, I think the most similar to ism is the Vhost-user Device Backend
> > > > > > of virtio-vhost-user.
> > > > > >
> > > > > > My understanding of it is to map any virtio device to another vm as a vvu
> > > > > > device.
> > > > > Yes, so a possible way is to have a device with memory zone/region
> > > > > provision and management then map it via virtio-vhost-user.
> > > >
> > > > Yes, there is such a possibility. virtio-vhost-user makes me feel that what can
> > > > be shared is the function implementation of map.
> > > >
> > > > But in the vm to provide the interface to the upper layer, I think this is the
> > > > work of ism.
> > > >
> > > > But one of the reasons why I didn't use virtio-vhost-user directly is that in
> > > > another vm, the guest can operate the vvu device, which we hope that both sides
> > > > are equal to the ism device.
> > > >
> > > > So I want to agree on a question first: who will provide the upper layer with
> > > > the ability to share the memory area?
> > > >
> > > > Our answer is a new ism device. How does this device achieve memory sharing, I
> > > > think is the second question.
> > > >
> > > >
> > > > > >  From this design purpose, I think the two are different.
> > > > > >
> > > > > > Of course, you might want to extend it, it does have some similarities and uses
> > > > > > a lot of similar techniques.
> > > > > I don't have any preference so far. If you think your idea makes more
> > > > > sense, then try your best to justify it in the list.
> > > > >
> > > > > > So we can really discuss in this direction, whether
> > > > > > the vvu device can be extended to achieve the purpose of ism, or whether the
> > > > > > design goals can be agreed.
> > > > > I've added Stefan in the loop, let's hear from him.
> > > > >
> > > > > > Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
> > > > > > Should device/driver APIs remain independent?
> > > > > Btw, you mentioned that one possible user of ism is the smc, but I
> > > > > don't see how it connects to that with your prototype driver.
> > > > Yes, we originally had plans, but the virtio spec was considered for submission,
> > > > so this was not included. Maybe, we should have included this part @Tony
> > > >
> > > > A brief introduction is that SMC currently has a corresponding
> > > > s390/net/ism_drv.c and we will replace this in the virtualization scenario.
> >
> >
> > Ok, I see. So I think the goal is to implement something in virtio that is
> > functional equivalent to IBM ISM device.
> >
>
> Yes, IBM ISM devices do something similar and it inspired this.

Ok, it would be better to mention this in the cover letter of the next
version. That would make things easier for reviewers (IBM has some good
docs about this on its website).

>
> >
> > > >
> > > > Thanks.
> > > >
> > > SMC is a network protocol which is modeled by shared memory rather than
> > > packet.
> >
> >
> > After reading more SMC from IBM website, I think you meant SMC-D here. And I
> > wonder in order to have a complete SMC solution we still need virtio-ROCE
> > for inter host communcation?
> >
>
> Mostly yes.
>
> SMC-D is the part of whole SMC solution. SMC supports multiple
> underlying device, -D means ISM device, -R means RDMA device. The key
> data model is shared memory, SMC uses RDMA (-R) or ISM(-D) to *share*
> memory between peers, and it will choose the suitable device on demand
> during handshaking. If there was no suitable device, it would fall back
> to TCP. So virtio-ROCE is not required.

So for communicating peers on the same host we need SMC-D, and in the
future we need RDMA to offload the communication among peers on
different hosts. Then we get a fully transparent offload no matter
whether the peer is local or not.

>
> >
> > >   Actually the basic required interfaces of SMC device are:
> > >
> > >    - alloc / free memory region, each connection peer has two memory
> > >     regions dynamically for sending and receiving ring buffer.
> > >    - attach / detach memory region, remote attaches local-allocated
> > >     sending region as receiving region, vice versa.
> > >    - notify, tell peer to read data and update cursor.
> > >
> > > Then the device can be registered as SMC ISM device. Of course, SMC
> > > also requires some modification to adapt it.
> >
> >
> > Looking at s390 ism driver it requires other stuffs like vlan add/remove or
> > gid query, do we need them as well?
>
> vlan is not required in this use case. ISM uses gid to identified each
> others, maybe we could implement it in virtio ways.

I'd suggest adding the code to register the driver with SMC/ISM in the
next version (instead of a simple procfs hook). Then people can easily
play with it or review it.
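For illustration only, the registration could be wired up roughly as in
the sketch below. It assumes the smcd_alloc_dev()/smcd_register_dev()
hooks and struct smcd_ops exported by net/smc (include/net/smc.h); the
exact signatures and the set of mandatory callbacks must be checked
against the target kernel, and all virtio_ism_* helpers are hypothetical:

    /*
     * Rough sketch: exposing virtio-ism to SMC as an SMC-D ("ISM")
     * device. smcd_alloc_dev()/smcd_register_dev()/struct smcd_ops come
     * from include/net/smc.h; signatures may differ between kernel
     * versions. The virtio_ism_* helpers are hypothetical.
     */
    static int virtio_ism_smcd_register_dmb(struct smcd_dev *smcd,
                                            struct smcd_dmb *dmb)
    {
            /* back the DMB with a freshly allocated virtio-ism region */
            return -EOPNOTSUPP;     /* placeholder in this sketch */
    }

    static int virtio_ism_smcd_unregister_dmb(struct smcd_dev *smcd,
                                              struct smcd_dmb *dmb)
    {
            /* release the virtio-ism region backing this DMB */
            return 0;
    }

    static const struct smcd_ops virtio_ism_smcd_ops = {
            .register_dmb   = virtio_ism_smcd_register_dmb,
            .unregister_dmb = virtio_ism_smcd_unregister_dmb,
            /* query_remote_gid, signal_event, move_data, ... elided */
    };

    static int virtio_ism_register_smcd(struct virtio_device *vdev)
    {
            struct smcd_dev *smcd;

            smcd = smcd_alloc_dev(&vdev->dev, dev_name(&vdev->dev),
                                  &virtio_ism_smcd_ops, 1024 /* max DMBs */);
            if (!smcd)
                    return -ENOMEM;
            return smcd_register_dev(smcd);
    }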

Thanks

>
> To support virtio-ism smoothly, the interfaces of ISM driver still need
> to be adjusted. I will put it on the table with IBM people.
>
> Cheers,
> Tony Lu
>
> >
> > Thanks
> >
> >
> > >
> > > Cheers,
> > > Tony Lu
> > >
> > > > > Thanks
> > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > > > How to share the backend with other deivce is another problem.
> > > > > > > Yes, anything that is used for your virito-ism prototype can be used
> > > > > > > for other devices.
> > > > > > >
> > > > > > > > Our goal is to dynamically obtain a piece of memory to share with other vms.
> > > > > > > So at this level, I don't see the exact difference compared to
> > > > > > > virtio-vhost-user. Let's just focus on the API that carries on the
> > > > > > > semantic:
> > > > > > >
> > > > > > > - map/unmap
> > > > > > > - permission update
> > > > > > >
> > > > > > > The only missing piece is the per region notification.
> > > > > > >
> > > > > > > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > > > > > > > it will use it as a ring. Of course, we also need a notify mechanism.
> > > > > > > >
> > > > > > > > That's what we're aiming for, so we should first discuss whether this
> > > > > > > > requirement is reasonable.
> > > > > > > So unless somebody said "no", it is fine until now.
> > > > > > >
> > > > > > > > I think it's a feature currently not supported by
> > > > > > > > other devices specified by the current virtio spce.
> > > > > > > Probably, but we've already had rfcs for roce and vhost-user.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-21  2:41                       ` Jason Wang
@ 2022-10-21  2:53                         ` Gerry
  2022-10-21  3:30                         ` Dust Li
  1 sibling, 0 replies; 61+ messages in thread
From: Gerry @ 2022-10-21  2:53 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, tonylu, zhenzao,
	helinguo, mst, cohuck, Stefan Hajnoczi, dust.li

[-- Attachment #1: Type: text/plain, Size: 17314 bytes --]



> On 2022-10-21 at 10:41, Jason Wang <jasowang@redhat.com> wrote:
> 
> On Wed, Oct 19, 2022 at 5:27 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>> 
>> On Wed, 19 Oct 2022 17:15:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>> On Wed, Oct 19, 2022 at 5:12 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>> 
>>>> On Wed, 19 Oct 2022 17:08:29 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>> On Wed, Oct 19, 2022 at 4:21 PM Dust Li <dust.li@linux.alibaba.com> wrote:
>>>>>> 
>>>>>> On Wed, Oct 19, 2022 at 04:03:42PM +0800, Gerry wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> On 2022-10-19 at 16:01, Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>> 
>>>>>>>> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>>>> 
>>>>>>>>> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>>>>> Adding Stefan.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # Background
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Nowadays, there is a common scenario to accelerate communication between
>>>>>>>>>>>>> different VMs and containers, including light weight virtual machine based
>>>>>>>>>>>>> containers. One way to achieve this is to colocate them on the same host.
>>>>>>>>>>>>> However, the performance of inter-VM communication through network stack is not
>>>>>>>>>>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
>>>>>>>>>>>>> many times, but still no generic solution available [1] [2] [3].
>>>>>>>>>>>>> 
>>>>>>>>>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
>>>>>>>>>>>>> We found that by changing the communication channel between VMs from TCP to SMC
>>>>>>>>>>>>> with shared memory, we can achieve superior performance for a common
>>>>>>>>>>>>> socket-based application[5]:
>>>>>>>>>>>>> - latency reduced by about 50%
>>>>>>>>>>>>> - throughput increased by about 300%
>>>>>>>>>>>>> - CPU consumption reduced by about 50%
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Since there is no particularly suitable shared memory management solution
>>>>>>>>>>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
>>>>>>>>>>>>> is the standard for communication in the virtualization world, we want to
>>>>>>>>>>>>> implement a virtio-ism device based on virtio, which can support on-demand
>>>>>>>>>>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
>>>>>>>>>>>>> the virtio-ism device need to support:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
>>>>>>>>>>>>>  provisioned.
>>>>>>>>>>>>> 2. Multi-region management: the shared memory is divided into regions,
>>>>>>>>>>>>>  and a peer may allocate one or more regions from the same shared memory
>>>>>>>>>>>>>  device.
>>>>>>>>>>>>> 3. Permission control: The permission of each region can be set seperately.
>>>>>>>>>>>> 
>>>>>>>>>>>> Looks like virtio-ROCE
>>>>>>>>>>>> 
>>>>>>>>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
>>>>>>>>>>>> 
>>>>>>>>>>>> and virtio-vhost-user can satisfy the requirement?
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # Virtio ism device
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ISM devices provide the ability to share memory between different guests on a
>>>>>>>>>>>>> host. A guest's memory got from ism device can be shared with multiple peers at
>>>>>>>>>>>>> the same time. This shared relationship can be dynamically created and released.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The shared memory obtained from the device is divided into multiple ism regions
>>>>>>>>>>>>> for share. ISM device provides a mechanism to notify other ism region referrers
>>>>>>>>>>>>> of content update events.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # Usage (SMC as example)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Maybe there is one of possible use cases:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>>>>>>>>>>>>>  location of a memory region in the PCI space and a token.
>>>>>>>>>>>>> 2. The ism driver mmap the memory region and return to SMC with the token
>>>>>>>>>>>>> 3. SMC passes the token to the connected peer
>>>>>>>>>>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
>>>>>>>>>>>>>  get the location of the PCI space of the shared memory
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # About hot plugging of the ism device
>>>>>>>>>>>>> 
>>>>>>>>>>>>>  Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>>>>>>>>>>>>>  less scalable operation. So, we don't plan to support it for now.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # Comparison with existing technology
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ## ivshmem or ivshmem 2.0 of Qemu
>>>>>>>>>>>>> 
>>>>>>>>>>>>>  1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>>>>>>>>>>>>>  use this VM, so the security is not enough.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>  2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>>>>>>>>>>>>>  other VMs that use the ivshmem 2.0 shared memory device, which also does not
>>>>>>>>>>>>>  meet our needs in terms of security.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ## vhost-pci and virtiovhostuser
>>>>>>>>>>>>> 
>>>>>>>>>>>>>  Does not support dynamic allocation and therefore not suitable for SMC.
>>>>>>>>>>>> 
>>>>>>>>>>>> I think this is an implementation issue, we can support VHOST IOTLB
>>>>>>>>>>>> message then the regions could be added/removed on demand.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 1. After the attacker connects with the victim, if the attacker does not
>>>>>>>>>>>  dereference memory, the memory will be occupied under virtiovhostuser. In the
>>>>>>>>>>>  case of ism devices, the victim can directly release the reference, and the
>>>>>>>>>>>  maliciously referenced region only occupies the attacker's resources
>>>>>>>>>> 
>>>>>>>>>> Let's define the security boundary here. E.g do we trust the device or
>>>>>>>>>> not? If yes, in the case of virtiovhostuser, can we simple do
>>>>>>>>>> VHOST_IOTLB_UNMAP then we can safely release the memory from the
>>>>>>>>>> attacker.
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
>>>>>>>>>>>  time, which is a challenge for virtiovhostuser
>>>>>>>>>> 
>>>>>>>>>> Please elaborate more the the challenges, anything make
>>>>>>>>>> virtiovhostuser different?
>>>>>>>>> 
>>>>>>>>> I understand (please point out any mistakes), one vvu device corresponds to one
>>>>>>>>> vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
>>>>>>>> 
>>>>>>>> There could be some misunderstanding here. With 1000 VM, you still
>>>>>>>> need 1000 virtio-sim devices I think.
>>>>>>> We are trying to achieve one virtio-ism device per vm instead of one virtio-ism device per SMC connection.
>>>>> 
>>>>> I wonder if we need something to identify a virtio-ism device since I
>>>>> guess there's still a chance to have multiple virtio-ism device per VM
>>>>> (different service chain etc).
>>>> 
>>>> Yes, there will be such a situation, a vm has multiple virtio-ism devices.
>>>> 
>>>> What exactly do you mean by "identify"?
>>> 
>>> E.g we can differ two virtio-net through mac address, do we need
>>> something similar for ism, or it's completely unncessary (e.g via
>>> token or other) ?
>> 
>> Currently, we have not encountered such a request.
>> 
>> It is conceivable that all physical shared memory ism regions are indexed by
>> tokens. virtio-ism is a way to obtain these ism regions, so there is no need to
>> distinguish multiple virtio-ism devices under one vm on the host.
> 
> So consider a case:
> 
> VM1 shares ism1 with VM2
> VM1 shares ism2 with VM3
> 
> How do application/smc address the different ism device in this case?
> E.g if VM1 want to talk with VM3 it needs to populate regions in ism2,
> but how can application or protocol knows this and how can a specific
> device to be addressed (via BDF?)
It works in this way:
1) VM1/VM2/VM3 each have one ISM device, and each ISM device has a cryptographically secure random host id associated with it.
2) when VM1 tries to create a TCP connection with VM2, the associated host id is passed to VM2 through TCP options.
3) when VM2 finds that the host id matches, it assumes VM1/VM2 are on the same physical server.
4) VM2 then allocates a memory buffer from the device manager.
5) the device manager returns a buffer with an associated token.
6) VM2 sends the buffer token back through TCP options.
7) VM1 issues an attach-memory-buffer command to the device manager with the returned buffer token.
8) now VM1 and VM2 have access to the same shared memory buffer.

If VM1 wants to build a new TCP connection with VM3, it goes through the same process with the same ISM device, and gets another memory buffer shared by VM1 and VM3 only.
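Put as a sketch in code (reusing the ism_alloc_region()/ism_attach_region()
names from the cover letter's usage example; struct connection and the
tcp_option_* helpers are hypothetical), the two ends roughly do:

    /*
     * Illustrative sketch of the flow above. ism_alloc_region() and
     * ism_attach_region() follow the cover letter's usage example; the
     * struct connection and tcp_option_* helpers are hypothetical.
     */

    /* VM2: the host id in the peer's TCP options matched our ISM device */
    static void vm2_accept(struct connection *conn)
    {
            conn->rx_region = ism_alloc_region(conn->ism_dev, REGION_SIZE);
            /* hand the region's token back to VM1 via TCP options */
            tcp_option_send_token(conn, conn->rx_region->token);
    }

    /* VM1: the buffer token arrived in the peer's TCP options */
    static void vm1_complete(struct connection *conn, u64 peer_token)
    {
            /* attach the peer-allocated region; both sides now share it */
            conn->tx_region = ism_attach_region(conn->ism_dev, peer_token);
    }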


> 
> Thanks
> 
>> 
>> Thanks.
>> 
>> 
>>> 
>>> Thanks
>>> 
>>>> 
>>>> Thanks.
>>>> 
>>>> 
>>>>> 
>>>>> Thanks
>>>>> 
>>>>>> 
>>>>>> I think we must achieve this if we want to meet the requirements of SMC.
>>>>>> In SMC, a SMC socket(Corresponding to a TCP socket) need 2 memory
>>>>>> regions(1 for Tx and 1 for Rx). So if we have 1K TCP connections,
>>>>>> we'll need 2K share memory regions, and those memory regions are
>>>>>> dynamically allocated and freed with the TCP socket.
>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
>>>>>>>>>>>  determines the sharing relationship at startup.
>>>>>>>>>> 
>>>>>>>>>> Not necessarily with IOTLB API?
>>>>>>>>> 
>>>>>>>>> Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
>>>>>>>>> provide the same memory on the host to two vms. So the implementation of this
>>>>>>>>> part will be much simpler. This is why we gave up virtio-vhost-user at the
>>>>>>>>> beginning.
>>>>>>>> 
>>>>>>>> Ok, just to make sure we're at the same page. From spec level,
>>>>>>>> virtio-vhost-user doesn't (can't) limit the backend to be implemented
>>>>>>>> in another VM. So it should be ok to be used for sharing memory
>>>>>>>> between a guest and host.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
>>>>>>>>>>>  while ism only maps one region to other devices
>>>>>>>>>> 
>>>>>>>>>> With VHOST_IOTLB_MAP, the map could be done per region.
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Thanks.
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # Design
>>>>>>>>>>>>> 
>>>>>>>>>>>>>  This is a structure diagram based on ism sharing between two vms.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>   |-------------------------------------------------------------------------------------------------------------|
>>>>>>>>>>>>>   | |------------------------------------------------|       |------------------------------------------------| |
>>>>>>>>>>>>>   | | Guest                                          |       | Guest                                          | |
>>>>>>>>>>>>>   | |                                                |       |                                                | |
>>>>>>>>>>>>>   | |   ----------------                             |       |   ----------------                             | |
>>>>>>>>>>>>>   | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
>>>>>>>>>>>>>   | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
>>>>>>>>>>>>>   | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
>>>>>>>>>>>>>   | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
>>>>>>>>>>>>>   | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>>>>>>>>>>   | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
>>>>>>>>>>>>>   | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>>>>>>>>>>   | |                                |               |       |                               |                | |
>>>>>>>>>>>>>   | |                                |               |       |                               |                | |
>>>>>>>>>>>>>   | | Qemu                           |               |       | Qemu                          |                | |
>>>>>>>>>>>>>   | |--------------------------------+---------------|       |-------------------------------+----------------| |
>>>>>>>>>>>>>   |                                  |                                                       |                  |
>>>>>>>>>>>>>   |                                  |                                                       |                  |
>>>>>>>>>>>>>   |                                  |------------------------------+------------------------|                  |
>>>>>>>>>>>>>   |                                                                 |                                           |
>>>>>>>>>>>>>   |                                                                 |                                           |
>>>>>>>>>>>>>   |                                                   --------------------------                                |
>>>>>>>>>>>>>   |                                                    | M1 |   | M2 |   | M3 |                                 |
>>>>>>>>>>>>>   |                                                   --------------------------                                |
>>>>>>>>>>>>>   |                                                                                                             |
>>>>>>>>>>>>>   | HOST                                                                                                        |
>>>>>>>>>>>>>   ---------------------------------------------------------------------------------------------------------------
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # POC code
>>>>>>>>>>>>> 
>>>>>>>>>>>>>  Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>>>>>>>>>>>>>  Qemu:   https://github.com/fengidri/qemu/commits/ism
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If there are any problems, please point them out.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hope to hear from you, thank you.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>>>>>>>>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
>>>>>>>>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>>>>>>>>>>>>> [4] https://lwn.net/Articles/711071/
>>>>>>>>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Xuan Zhuo (2):
>>>>>>>>>>>>> Reserve device id for ISM device
>>>>>>>>>>>>> virtio-ism: introduce new device virtio-ism
>>>>>>>>>>>>> 
>>>>>>>>>>>>> content.tex    |   3 +
>>>>>>>>>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>> 2 files changed, 343 insertions(+)
>>>>>>>>>>>>> create mode 100644 virtio-ism.tex
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> 2.32.0.3.g01195cf9f
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>>>>>>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>>>>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>> 
>>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org <mailto:virtio-dev-unsubscribe@lists.oasis-open.org>
>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org <mailto:virtio-dev-help@lists.oasis-open.org>

[-- Attachment #2: Type: text/html, Size: 40200 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-21  2:42                   ` Jason Wang
@ 2022-10-21  3:03                     ` Xuan Zhuo
  2022-10-21  6:35                       ` Jason Wang
  0 siblings, 1 reply; 61+ messages in thread
From: Xuan Zhuo @ 2022-10-21  3:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Fri, 21 Oct 2022 10:42:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Oct 19, 2022 at 5:23 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 19 Oct 2022 17:11:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Oct 19, 2022 at 4:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, 19 Oct 2022 16:13:07 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > On Wed, 19 Oct 2022 16:01:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > Adding Stefan.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hello everyone,
> > > > > > > > > > >
> > > > > > > > > > > # Background
> > > > > > > > > > >
> > > > > > > > > > > Nowadays, there is a common scenario to accelerate communication between
> > > > > > > > > > > different VMs and containers, including light weight virtual machine based
> > > > > > > > > > > containers. One way to achieve this is to colocate them on the same host.
> > > > > > > > > > > However, the performance of inter-VM communication through network stack is not
> > > > > > > > > > > optimal and may also waste extra CPU cycles. This scenario has been discussed
> > > > > > > > > > > many times, but still no generic solution available [1] [2] [3].
> > > > > > > > > > >
> > > > > > > > > > > With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > > > > > > > > > > We found that by changing the communication channel between VMs from TCP to SMC
> > > > > > > > > > > with shared memory, we can achieve superior performance for a common
> > > > > > > > > > > socket-based application[5]:
> > > > > > > > > > >   - latency reduced by about 50%
> > > > > > > > > > >   - throughput increased by about 300%
> > > > > > > > > > >   - CPU consumption reduced by about 50%
> > > > > > > > > > >
> > > > > > > > > > > Since there is no particularly suitable shared memory management solution
> > > > > > > > > > > matches the need for SMC(See ## Comparison with existing technology), and virtio
> > > > > > > > > > > is the standard for communication in the virtualization world, we want to
> > > > > > > > > > > implement a virtio-ism device based on virtio, which can support on-demand
> > > > > > > > > > > memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > > > > > > > > > > the virtio-ism device need to support:
> > > > > > > > > > >
> > > > > > > > > > > 1. Dynamic provision: shared memory regions are dynamically allocated and
> > > > > > > > > > >    provisioned.
> > > > > > > > > > > 2. Multi-region management: the shared memory is divided into regions,
> > > > > > > > > > >    and a peer may allocate one or more regions from the same shared memory
> > > > > > > > > > >    device.
> > > > > > > > > > > 3. Permission control: The permission of each region can be set seperately.
> > > > > > > > > >
> > > > > > > > > > Looks like virtio-ROCE
> > > > > > > > > >
> > > > > > > > > > https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> > > > > > > > > >
> > > > > > > > > > and virtio-vhost-user can satisfy the requirement?
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > # Virtio ism device
> > > > > > > > > > >
> > > > > > > > > > > ISM devices provide the ability to share memory between different guests on a
> > > > > > > > > > > host. A guest's memory got from ism device can be shared with multiple peers at
> > > > > > > > > > > the same time. This shared relationship can be dynamically created and released.
> > > > > > > > > > >
> > > > > > > > > > > The shared memory obtained from the device is divided into multiple ism regions
> > > > > > > > > > > for share. ISM device provides a mechanism to notify other ism region referrers
> > > > > > > > > > > of content update events.
> > > > > > > > > > >
> > > > > > > > > > > # Usage (SMC as example)
> > > > > > > > > > >
> > > > > > > > > > > Maybe there is one of possible use cases:
> > > > > > > > > > >
> > > > > > > > > > > 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> > > > > > > > > > >    location of a memory region in the PCI space and a token.
> > > > > > > > > > > 2. The ism driver mmap the memory region and return to SMC with the token
> > > > > > > > > > > 3. SMC passes the token to the connected peer
> > > > > > > > > > > 3. the peer calls the ism driver interface ism_attach_region(token) to
> > > > > > > > > > >    get the location of the PCI space of the shared memory
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > # About hot plugging of the ism device
> > > > > > > > > > >
> > > > > > > > > > >    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> > > > > > > > > > >    less scalable operation. So, we don't plan to support it for now.
> > > > > > > > > > >
> > > > > > > > > > > # Comparison with existing technology
> > > > > > > > > > >
> > > > > > > > > > > ## ivshmem or ivshmem 2.0 of Qemu
> > > > > > > > > > >
> > > > > > > > > > >    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> > > > > > > > > > >    use this VM, so the security is not enough.
> > > > > > > > > > >
> > > > > > > > > > >    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> > > > > > > > > > >    other VMs that use the ivshmem 2.0 shared memory device, which also does not
> > > > > > > > > > >    meet our needs in terms of security.
> > > > > > > > > > >
> > > > > > > > > > > ## vhost-pci and virtiovhostuser
> > > > > > > > > > >
> > > > > > > > > > >    Does not support dynamic allocation and therefore not suitable for SMC.
> > > > > > > > > >
> > > > > > > > > > I think this is an implementation issue, we can support VHOST IOTLB
> > > > > > > > > > message then the regions could be added/removed on demand.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 1. After the attacker connects with the victim, if the attacker does not
> > > > > > > > >    dereference memory, the memory will be occupied under virtiovhostuser. In the
> > > > > > > > >    case of ism devices, the victim can directly release the reference, and the
> > > > > > > > >    maliciously referenced region only occupies the attacker's resources
> > > > > > > >
> > > > > > > > Let's define the security boundary here. E.g do we trust the device or
> > > > > > > > not? If yes, in the case of virtiovhostuser, can we simple do
> > > > > > > > VHOST_IOTLB_UNMAP then we can safely release the memory from the
> > > > > > > > attacker.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> > > > > > > > >    time, which is a challenge for virtiovhostuser
> > > > > > > >
> > > > > > > > Please elaborate more the the challenges, anything make
> > > > > > > > virtiovhostuser different?
> > > > > > >
> > > > > > > I understand (please point out any mistakes), one vvu device corresponds to one
> > > > > > > vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
> > > > > >
> > > > > > There could be some misunderstanding here. With 1000 VM, you still
> > > > > > need 1000 virtio-sim devices I think.
> > > > >
> > > > > No, just use a virtio-ism device.
> > > >
> > > > For example, if the hardware memory of a virtio-ism is 1G, and an ism region is
> > > > 1M, there are 1000 ism regions, and these ism regions can be shared with
> > > > different vms.
> > >
> > > Right, this is what I've understood.
> > >
> > > What I want to say this might be achieved with virtio-vhost-user as
> > > well. But it may require a some changes on the protocol which I'm not
> > > sure it's worth to bother. And I've started to think about the
> > > possibility to build virtio-vhost-user on top (I don't see any blocker
> > > so far).
> >
> > Yes, it is theoretically possible to implement based on virtio-vhost-user. But
> > when we try to implement it without depending on virtio-vhost-user, this
> > implementation is also very simple. Because the physical memory it shares does
> > not come from a vm, but from the host.
> >
> > So I think we have reached an agreement on the relationship between ism and
> > virtio-vhost-user. ism is used to provide shared memory to the upper layer, and
> > this device should be necessary to add (of course, listen to some other people's
> > opinions). And How is its backend shared with other vms? This is our second
> > question.
>
> I'm not sure I get the question, but we're sharing memory not backend?


In the design of traditional devices such as virtio-net, a piece of memory is
allocated by guest A and then handed over to the backend for use.
virtio-vhost-user allows another guest B to access guest A's memory.

Our approach is that the memory is allocated by the backend. On alloc/attach,
the backend simply inserts that memory into the guest's memory space using
memory_region_add_subregion(). That's why we don't use vhost-user in our
implementation.
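
As a rough QEMU-side sketch of that step (only memory_region_init_ram_ptr()
and memory_region_add_subregion() are real QEMU APIs here; VirtIOISM, mem_bar
and ism_map_region() are placeholder names, not the actual POC code):

    #include "qemu/osdep.h"
    #include "exec/memory.h"

    typedef struct VirtIOISM {
        Object parent_obj;
        MemoryRegion mem_bar;   /* shared-memory window exposed via a BAR */
    } VirtIOISM;

    /* Insert a host-allocated buffer into the guest's address space. */
    static void ism_map_region(VirtIOISM *ism, void *host_ptr,
                               uint64_t offset, uint64_t size)
    {
        MemoryRegion *mr = g_new0(MemoryRegion, 1);

        /* Wrap the host buffer in a MemoryRegion... */
        memory_region_init_ram_ptr(mr, OBJECT(ism), "virtio-ism-region",
                                   size, host_ptr);
        /* ...and map it into the device's shared-memory window. */
        memory_region_add_subregion(&ism->mem_bar, offset, mr);
    }

On detach/free the backend would do the reverse with
memory_region_del_subregion().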

On the other hand, we are also looking at the opposite direction: if the memory
is allocated by one VM from its own guest memory, then we have to use the
vhost-user protocol.

1. The advantage of this direction is that it is more convenient for resource
   management.

2. Using the vhost-user protocol in the backend implementation would be more
   complicated than our current solution.

3. If the peer is malicious, then we have to unmap the peer's memory mapping.
   (This has been discussed in another email, and it should be possible.)

Thanks.





>
> Thanks
>
> >
> > Thanks.
> >
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > And it is dynamic. After an ism region is shared with a vm, it can be shared
> > > > with other vms.
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> > > > > > > > >    determines the sharing relationship at startup.
> > > > > > > >
> > > > > > > > Not necessarily with IOTLB API?
> > > > > > >
> > > > > > > Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
> > > > > > > provide the same memory on the host to two vms. So the implementation of this
> > > > > > > part will be much simpler. This is why we gave up virtio-vhost-user at the
> > > > > > > beginning.
> > > > > >
> > > > > > Ok, just to make sure we're at the same page. From spec level,
> > > > > > virtio-vhost-user doesn't (can't) limit the backend to be implemented
> > > > > > in another VM. So it should be ok to be used for sharing memory
> > > > > > between a guest and host.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 4. For security issues, the device under virtiovhostuser may mmap more memory,
> > > > > > > > >    while ism only maps one region to other devices
> > > > > > > >
> > > > > > > > With VHOST_IOTLB_MAP, the map could be done per region.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > # Design
> > > > > > > > > > >
> > > > > > > > > > >    This is a structure diagram based on ism sharing between two vms.
> > > > > > > > > > >
> > > > > > > > > > >     |-------------------------------------------------------------------------------------------------------------|
> > > > > > > > > > >     | |------------------------------------------------|       |------------------------------------------------| |
> > > > > > > > > > >     | | Guest                                          |       | Guest                                          | |
> > > > > > > > > > >     | |                                                |       |                                                | |
> > > > > > > > > > >     | |   ----------------                             |       |   ----------------                             | |
> > > > > > > > > > >     | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> > > > > > > > > > >     | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> > > > > > > > > > >     | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> > > > > > > > > > >     | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> > > > > > > > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > > > > > > >     | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> > > > > > > > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > > > > > > >     | |                                |               |       |                               |                | |
> > > > > > > > > > >     | |                                |               |       |                               |                | |
> > > > > > > > > > >     | | Qemu                           |               |       | Qemu                          |                | |
> > > > > > > > > > >     | |--------------------------------+---------------|       |-------------------------------+----------------| |
> > > > > > > > > > >     |                                  |                                                       |                  |
> > > > > > > > > > >     |                                  |                                                       |                  |
> > > > > > > > > > >     |                                  |------------------------------+------------------------|                  |
> > > > > > > > > > >     |                                                                 |                                           |
> > > > > > > > > > >     |                                                                 |                                           |
> > > > > > > > > > >     |                                                   --------------------------                                |
> > > > > > > > > > >     |                                                    | M1 |   | M2 |   | M3 |                                 |
> > > > > > > > > > >     |                                                   --------------------------                                |
> > > > > > > > > > >     |                                                                                                             |
> > > > > > > > > > >     | HOST                                                                                                        |
> > > > > > > > > > >     ---------------------------------------------------------------------------------------------------------------
> > > > > > > > > > >
> > > > > > > > > > > # POC code
> > > > > > > > > > >
> > > > > > > > > > >    Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > > > > > > > > > >    Qemu:   https://github.com/fengidri/qemu/commits/ism
> > > > > > > > > > >
> > > > > > > > > > > If there are any problems, please point them out.
> > > > > > > > > > >
> > > > > > > > > > > Hope to hear from you, thank you.
> > > > > > > > > > >
> > > > > > > > > > > [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > > > > > > > > > > [2] https://dl.acm.org/doi/10.1145/2847562
> > > > > > > > > > > [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > > > > > > > > > > [4] https://lwn.net/Articles/711071/
> > > > > > > > > > > [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Xuan Zhuo (2):
> > > > > > > > > > >   Reserve device id for ISM device
> > > > > > > > > > >   virtio-ism: introduce new device virtio-ism
> > > > > > > > > > >
> > > > > > > > > > >  content.tex    |   3 +
> > > > > > > > > > >  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > > > >  2 files changed, 343 insertions(+)
> > > > > > > > > > >  create mode 100644 virtio-ism.tex
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > >
> > > >
> > >
> >
>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-21  2:47                   ` Jason Wang
@ 2022-10-21  3:05                     ` Tony Lu
  2022-10-21  3:07                       ` Jason Wang
  2022-10-21  3:09                       ` Jason Wang
  0 siblings, 2 replies; 61+ messages in thread
From: Tony Lu @ 2022-10-21  3:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, dust.li,
	zhenzao, helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Fri, Oct 21, 2022 at 10:47:29AM +0800, Jason Wang wrote:
> On Wed, Oct 19, 2022 at 6:01 PM Tony Lu <tonylu@linux.alibaba.com> wrote:
> >
> > On Wed, Oct 19, 2022 at 05:04:58PM +0800, Jason Wang wrote:
> > >
> > > 在 2022/10/19 16:07, Tony Lu 写道:
> > > > On Wed, Oct 19, 2022 at 02:02:21PM +0800, Xuan Zhuo wrote:
> > > > > On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi Jason,
> > > > > > > > >
> > > > > > > > > I think there may be some problems with the direction we are discussing.
> > > > > > > > Probably not.
> > > > > > > >
> > > > > > > > As far as we are focusing on technology, there's nothing wrong from my
> > > > > > > > perspective. And this is how the community works. Your idea needs to
> > > > > > > > be justified and people are free to raise any technical questions
> > > > > > > > especially considering you've posted a spec change with prototype
> > > > > > > > codes but not only the idea.
> > > > > > > >
> > > > > > > > > Our
> > > > > > > > > goal is to add an new ism device. As far as the spec is concerned, we are not
> > > > > > > > > concerned with the implementation of the backend.
> > > > > > > > >
> > > > > > > > > The direction we should discuss is what is the difference between the ism device
> > > > > > > > > and other devices such as virtio-net, and whether it is necessary to introduce
> > > > > > > > > this new device.
> > > > > > > > This is somehow what I want to ask, actually it's not a comparison
> > > > > > > > with virtio-net but:
> > > > > > > >
> > > > > > > > - virtio-roce
> > > > > > > > - virtio-vhost-user
> > > > > > > > - virtio-(p)mem
> > > > > > > >
> > > > > > > > or whether we can simply add features to those devices to achieve what
> > > > > > > > you want to do here.
> > > > > > >
> > > > > > > Yes, this is my priority to discuss.
> > > > > > >
> > > > > > > At the moment, I think the most similar to ism is the Vhost-user Device Backend
> > > > > > > of virtio-vhost-user.
> > > > > > >
> > > > > > > My understanding of it is to map any virtio device to another vm as a vvu
> > > > > > > device.
> > > > > > Yes, so a possible way is to have a device with memory zone/region
> > > > > > provision and management then map it via virtio-vhost-user.
> > > > >
> > > > > Yes, there is such a possibility. virtio-vhost-user makes me feel that what can
> > > > > be shared is the function implementation of map.
> > > > >
> > > > > But in the vm to provide the interface to the upper layer, I think this is the
> > > > > work of ism.
> > > > >
> > > > > But one of the reasons why I didn't use virtio-vhost-user directly is that in
> > > > > another vm, the guest can operate the vvu device, which we hope that both sides
> > > > > are equal to the ism device.
> > > > >
> > > > > So I want to agree on a question first: who will provide the upper layer with
> > > > > the ability to share the memory area?
> > > > >
> > > > > Our answer is a new ism device. How does this device achieve memory sharing, I
> > > > > think is the second question.
> > > > >
> > > > >
> > > > > > >  From this design purpose, I think the two are different.
> > > > > > >
> > > > > > > Of course, you might want to extend it, it does have some similarities and uses
> > > > > > > a lot of similar techniques.
> > > > > > I don't have any preference so far. If you think your idea makes more
> > > > > > sense, then try your best to justify it in the list.
> > > > > >
> > > > > > > So we can really discuss in this direction, whether
> > > > > > > the vvu device can be extended to achieve the purpose of ism, or whether the
> > > > > > > design goals can be agreed.
> > > > > > I've added Stefan in the loop, let's hear from him.
> > > > > >
> > > > > > > Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
> > > > > > > Should device/driver APIs remain independent?
> > > > > > Btw, you mentioned that one possible user of ism is the smc, but I
> > > > > > don't see how it connects to that with your prototype driver.
> > > > > Yes, we originally had plans, but the virtio spec was considered for submission,
> > > > > so this was not included. Maybe, we should have included this part @Tony
> > > > >
> > > > > A brief introduction is that SMC currently has a corresponding
> > > > > s390/net/ism_drv.c and we will replace this in the virtualization scenario.
> > >
> > >
> > > Ok, I see. So I think the goal is to implement something in virtio that is
> > > functional equivalent to IBM ISM device.
> > >
> >
> > Yes, IBM ISM devices do something similar and it inspired this.
> 
> Ok, it would be better to mention this in the cover letter of the next
> version. This can ease the reviewers (IBM has some good docs of those
> from the website).
> 

Yes, we will do it.

> >
> > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > SMC is a network protocol which is modeled by shared memory rather than
> > > > packet.
> > >
> > >
> > > After reading more SMC from IBM website, I think you meant SMC-D here. And I
> > > wonder in order to have a complete SMC solution we still need virtio-ROCE
> > > for inter host communcation?
> > >
> >
> > Mostly yes.
> >
> > SMC-D is the part of whole SMC solution. SMC supports multiple
> > underlying device, -D means ISM device, -R means RDMA device. The key
> > data model is shared memory, SMC uses RDMA (-R) or ISM(-D) to *share*
> > memory between peers, and it will choose the suitable device on demand
> > during handshaking. If there was no suitable device, it would fall back
> > to TCP. So virtio-ROCE is not required.
> 
> So the commniting peers on the same host we need SMC-D, in the future
> we need to use RDMA to offload the communication among the peers of
> different hosts. Then we can get fully transparent offload no matter
> the peer is local or not.
> 

Yes, this is what we want to do.

> >
> > >
> > > >   Actually the basic required interfaces of SMC device are:
> > > >
> > > >    - alloc / free memory region, each connection peer has two memory
> > > >     regions dynamically for sending and receiving ring buffer.
> > > >    - attach / detach memory region, remote attaches local-allocated
> > > >     sending region as receiving region, vice versa.
> > > >    - notify, tell peer to read data and update cursor.
> > > >
> > > > Then the device can be registered as SMC ISM device. Of course, SMC
> > > > also requires some modification to adapt it.
> > >
> > >
> > > Looking at s390 ism driver it requires other stuffs like vlan add/remove or
> > > gid query, do we need them as well?
> >
> > vlan is not required in this use case. ISM uses gid to identified each
> > others, maybe we could implement it in virtio ways.
> 
> I'd suggest adding the codes to register the driver to SMC/ISM in the
> next version (instead of a simple procfs hooking). Then people can
> easily play or review.
> 

Ok, I will add the codes in the next version.

Cheers,
Tony Lu

> Thanks
> 
> >
> > To support virtio-ism smoothly, the interfaces of ISM driver still need
> > to be adjusted. I will put it on the table with IBM people.
> >
> > Cheers,
> > Tony Lu
> >
> > >
> > > Thanks
> > >
> > >
> > > >
> > > > Cheers,
> > > > Tony Lu
> > > >
> > > > > > Thanks
> > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > > > > How to share the backend with other deivce is another problem.
> > > > > > > > Yes, anything that is used for your virito-ism prototype can be used
> > > > > > > > for other devices.
> > > > > > > >
> > > > > > > > > Our goal is to dynamically obtain a piece of memory to share with other vms.
> > > > > > > > So at this level, I don't see the exact difference compared to
> > > > > > > > virtio-vhost-user. Let's just focus on the API that carries on the
> > > > > > > > semantic:
> > > > > > > >
> > > > > > > > - map/unmap
> > > > > > > > - permission update
> > > > > > > >
> > > > > > > > The only missing piece is the per region notification.
> > > > > > > >
> > > > > > > > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > > > > > > > > it will use it as a ring. Of course, we also need a notify mechanism.
> > > > > > > > >
> > > > > > > > > That's what we're aiming for, so we should first discuss whether this
> > > > > > > > > requirement is reasonable.
> > > > > > > > So unless somebody said "no", it is fine until now.
> > > > > > > >
> > > > > > > > > I think it's a feature currently not supported by
> > > > > > > > > other devices specified by the current virtio spce.
> > > > > > > > Probably, but we've already had rfcs for roce and vhost-user.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > >
> >


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-21  3:05                     ` Tony Lu
@ 2022-10-21  3:07                       ` Jason Wang
  2022-10-21  3:23                         ` Tony Lu
  2022-10-21  3:09                       ` Jason Wang
  1 sibling, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-10-21  3:07 UTC (permalink / raw)
  To: Tony Lu
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, dust.li,
	zhenzao, helinguo, gerry, mst, cohuck, Stefan Hajnoczi,
	Yongji Xie

On Fri, Oct 21, 2022 at 11:05 AM Tony Lu <tonylu@linux.alibaba.com> wrote:
>
> On Fri, Oct 21, 2022 at 10:47:29AM +0800, Jason Wang wrote:
> > On Wed, Oct 19, 2022 at 6:01 PM Tony Lu <tonylu@linux.alibaba.com> wrote:
> > >
> > > On Wed, Oct 19, 2022 at 05:04:58PM +0800, Jason Wang wrote:
> > > >
> > > > 在 2022/10/19 16:07, Tony Lu 写道:
> > > > > On Wed, Oct 19, 2022 at 02:02:21PM +0800, Xuan Zhuo wrote:
> > > > > > On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Jason,
> > > > > > > > > >
> > > > > > > > > > I think there may be some problems with the direction we are discussing.
> > > > > > > > > Probably not.
> > > > > > > > >
> > > > > > > > > As far as we are focusing on technology, there's nothing wrong from my
> > > > > > > > > perspective. And this is how the community works. Your idea needs to
> > > > > > > > > be justified and people are free to raise any technical questions
> > > > > > > > > especially considering you've posted a spec change with prototype
> > > > > > > > > codes but not only the idea.
> > > > > > > > >
> > > > > > > > > > Our
> > > > > > > > > > goal is to add an new ism device. As far as the spec is concerned, we are not
> > > > > > > > > > concerned with the implementation of the backend.
> > > > > > > > > >
> > > > > > > > > > The direction we should discuss is what is the difference between the ism device
> > > > > > > > > > and other devices such as virtio-net, and whether it is necessary to introduce
> > > > > > > > > > this new device.
> > > > > > > > > This is somehow what I want to ask, actually it's not a comparison
> > > > > > > > > with virtio-net but:
> > > > > > > > >
> > > > > > > > > - virtio-roce
> > > > > > > > > - virtio-vhost-user
> > > > > > > > > - virtio-(p)mem
> > > > > > > > >
> > > > > > > > > or whether we can simply add features to those devices to achieve what
> > > > > > > > > you want to do here.
> > > > > > > >
> > > > > > > > Yes, this is my priority to discuss.
> > > > > > > >
> > > > > > > > At the moment, I think the most similar to ism is the Vhost-user Device Backend
> > > > > > > > of virtio-vhost-user.
> > > > > > > >
> > > > > > > > My understanding of it is to map any virtio device to another vm as a vvu
> > > > > > > > device.
> > > > > > > Yes, so a possible way is to have a device with memory zone/region
> > > > > > > provision and management then map it via virtio-vhost-user.
> > > > > >
> > > > > > Yes, there is such a possibility. virtio-vhost-user makes me feel that what can
> > > > > > be shared is the function implementation of map.
> > > > > >
> > > > > > But in the vm to provide the interface to the upper layer, I think this is the
> > > > > > work of ism.
> > > > > >
> > > > > > But one of the reasons why I didn't use virtio-vhost-user directly is that in
> > > > > > another vm, the guest can operate the vvu device, which we hope that both sides
> > > > > > are equal to the ism device.
> > > > > >
> > > > > > So I want to agree on a question first: who will provide the upper layer with
> > > > > > the ability to share the memory area?
> > > > > >
> > > > > > Our answer is a new ism device. How does this device achieve memory sharing, I
> > > > > > think is the second question.
> > > > > >
> > > > > >
> > > > > > > >  From this design purpose, I think the two are different.
> > > > > > > >
> > > > > > > > Of course, you might want to extend it, it does have some similarities and uses
> > > > > > > > a lot of similar techniques.
> > > > > > > I don't have any preference so far. If you think your idea makes more
> > > > > > > sense, then try your best to justify it in the list.
> > > > > > >
> > > > > > > > So we can really discuss in this direction, whether
> > > > > > > > the vvu device can be extended to achieve the purpose of ism, or whether the
> > > > > > > > design goals can be agreed.
> > > > > > > I've added Stefan in the loop, let's hear from him.
> > > > > > >
> > > > > > > > Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
> > > > > > > > Should device/driver APIs remain independent?
> > > > > > > Btw, you mentioned that one possible user of ism is the smc, but I
> > > > > > > don't see how it connects to that with your prototype driver.
> > > > > > Yes, we originally had plans, but the virtio spec was considered for submission,
> > > > > > so this was not included. Maybe, we should have included this part @Tony
> > > > > >
> > > > > > A brief introduction is that SMC currently has a corresponding
> > > > > > s390/net/ism_drv.c and we will replace this in the virtualization scenario.
> > > >
> > > >
> > > > Ok, I see. So I think the goal is to implement something in virtio that is
> > > > functional equivalent to IBM ISM device.
> > > >
> > >
> > > Yes, IBM ISM devices do something similar and it inspired this.
> >
> > Ok, it would be better to mention this in the cover letter of the next
> > version. This can ease the reviewers (IBM has some good docs of those
> > from the website).
> >
>
> Yes, we will do it.
>
> > >
> > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > SMC is a network protocol which is modeled by shared memory rather than
> > > > > packet.
> > > >
> > > >
> > > > After reading more SMC from IBM website, I think you meant SMC-D here. And I
> > > > wonder in order to have a complete SMC solution we still need virtio-ROCE
> > > > for inter host communcation?
> > > >
> > >
> > > Mostly yes.
> > >
> > > SMC-D is the part of whole SMC solution. SMC supports multiple
> > > underlying device, -D means ISM device, -R means RDMA device. The key
> > > data model is shared memory, SMC uses RDMA (-R) or ISM(-D) to *share*
> > > memory between peers, and it will choose the suitable device on demand
> > > during handshaking. If there was no suitable device, it would fall back
> > > to TCP. So virtio-ROCE is not required.
> >
> > So the commniting peers on the same host we need SMC-D, in the future
> > we need to use RDMA to offload the communication among the peers of
> > different hosts. Then we can get fully transparent offload no matter
> > the peer is local or not.
> >
>
> Yes, this is what we want to do.

Great, Yong Ji posted a ROCE proposal [1]; it would be appreciated if you
could review it and give feedback.

[1] https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/

Thanks

>
> > >
> > > >
> > > > >   Actually the basic required interfaces of SMC device are:
> > > > >
> > > > >    - alloc / free memory region, each connection peer has two memory
> > > > >     regions dynamically for sending and receiving ring buffer.
> > > > >    - attach / detach memory region, remote attaches local-allocated
> > > > >     sending region as receiving region, vice versa.
> > > > >    - notify, tell peer to read data and update cursor.
> > > > >
> > > > > Then the device can be registered as SMC ISM device. Of course, SMC
> > > > > also requires some modification to adapt it.
> > > >
> > > >
> > > > Looking at s390 ism driver it requires other stuffs like vlan add/remove or
> > > > gid query, do we need them as well?
> > >
> > > vlan is not required in this use case. ISM uses gid to identified each
> > > others, maybe we could implement it in virtio ways.
> >
> > I'd suggest adding the codes to register the driver to SMC/ISM in the
> > next version (instead of a simple procfs hooking). Then people can
> > easily play or review.
> >
>
> Ok, I will add the codes in the next version.
>
> Cheers,
> Tony Lu
>
> > Thanks
> >
> > >
> > > To support virtio-ism smoothly, the interfaces of ISM driver still need
> > > to be adjusted. I will put it on the table with IBM people.
> > >
> > > Cheers,
> > > Tony Lu
> > >
> > > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > Cheers,
> > > > > Tony Lu
> > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > > > > How to share the backend with other deivce is another problem.
> > > > > > > > > Yes, anything that is used for your virito-ism prototype can be used
> > > > > > > > > for other devices.
> > > > > > > > >
> > > > > > > > > > Our goal is to dynamically obtain a piece of memory to share with other vms.
> > > > > > > > > So at this level, I don't see the exact difference compared to
> > > > > > > > > virtio-vhost-user. Let's just focus on the API that carries on the
> > > > > > > > > semantic:
> > > > > > > > >
> > > > > > > > > - map/unmap
> > > > > > > > > - permission update
> > > > > > > > >
> > > > > > > > > The only missing piece is the per region notification.
> > > > > > > > >
> > > > > > > > > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > > > > > > > > > it will use it as a ring. Of course, we also need a notify mechanism.
> > > > > > > > > >
> > > > > > > > > > That's what we're aiming for, so we should first discuss whether this
> > > > > > > > > > requirement is reasonable.
> > > > > > > > > So unless somebody said "no", it is fine until now.
> > > > > > > > >
> > > > > > > > > > I think it's a feature currently not supported by
> > > > > > > > > > other devices specified by the current virtio spce.
> > > > > > > > > Probably, but we've already had rfcs for roce and vhost-user.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > >
> > >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-21  3:05                     ` Tony Lu
  2022-10-21  3:07                       ` Jason Wang
@ 2022-10-21  3:09                       ` Jason Wang
  2022-10-21  3:53                         ` Tony Lu
  1 sibling, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-10-21  3:09 UTC (permalink / raw)
  To: Tony Lu
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, dust.li,
	zhenzao, helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Fri, Oct 21, 2022 at 11:05 AM Tony Lu <tonylu@linux.alibaba.com> wrote:
>
> On Fri, Oct 21, 2022 at 10:47:29AM +0800, Jason Wang wrote:
> > On Wed, Oct 19, 2022 at 6:01 PM Tony Lu <tonylu@linux.alibaba.com> wrote:
> > >
> > > On Wed, Oct 19, 2022 at 05:04:58PM +0800, Jason Wang wrote:
> > > >
> > > > 在 2022/10/19 16:07, Tony Lu 写道:
> > > > > On Wed, Oct 19, 2022 at 02:02:21PM +0800, Xuan Zhuo wrote:
> > > > > > On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Jason,
> > > > > > > > > >
> > > > > > > > > > I think there may be some problems with the direction we are discussing.
> > > > > > > > > Probably not.
> > > > > > > > >
> > > > > > > > > As far as we are focusing on technology, there's nothing wrong from my
> > > > > > > > > perspective. And this is how the community works. Your idea needs to
> > > > > > > > > be justified and people are free to raise any technical questions
> > > > > > > > > especially considering you've posted a spec change with prototype
> > > > > > > > > codes but not only the idea.
> > > > > > > > >
> > > > > > > > > > Our
> > > > > > > > > > goal is to add an new ism device. As far as the spec is concerned, we are not
> > > > > > > > > > concerned with the implementation of the backend.
> > > > > > > > > >
> > > > > > > > > > The direction we should discuss is what is the difference between the ism device
> > > > > > > > > > and other devices such as virtio-net, and whether it is necessary to introduce
> > > > > > > > > > this new device.
> > > > > > > > > This is somehow what I want to ask, actually it's not a comparison
> > > > > > > > > with virtio-net but:
> > > > > > > > >
> > > > > > > > > - virtio-roce
> > > > > > > > > - virtio-vhost-user
> > > > > > > > > - virtio-(p)mem
> > > > > > > > >
> > > > > > > > > or whether we can simply add features to those devices to achieve what
> > > > > > > > > you want to do here.
> > > > > > > >
> > > > > > > > Yes, this is my priority to discuss.
> > > > > > > >
> > > > > > > > At the moment, I think the most similar to ism is the Vhost-user Device Backend
> > > > > > > > of virtio-vhost-user.
> > > > > > > >
> > > > > > > > My understanding of it is to map any virtio device to another vm as a vvu
> > > > > > > > device.
> > > > > > > Yes, so a possible way is to have a device with memory zone/region
> > > > > > > provision and management then map it via virtio-vhost-user.
> > > > > >
> > > > > > Yes, there is such a possibility. virtio-vhost-user makes me feel that what can
> > > > > > be shared is the function implementation of map.
> > > > > >
> > > > > > But in the vm to provide the interface to the upper layer, I think this is the
> > > > > > work of ism.
> > > > > >
> > > > > > But one of the reasons why I didn't use virtio-vhost-user directly is that in
> > > > > > another vm, the guest can operate the vvu device, which we hope that both sides
> > > > > > are equal to the ism device.
> > > > > >
> > > > > > So I want to agree on a question first: who will provide the upper layer with
> > > > > > the ability to share the memory area?
> > > > > >
> > > > > > Our answer is a new ism device. How does this device achieve memory sharing, I
> > > > > > think is the second question.
> > > > > >
> > > > > >
> > > > > > > >  From this design purpose, I think the two are different.
> > > > > > > >
> > > > > > > > Of course, you might want to extend it, it does have some similarities and uses
> > > > > > > > a lot of similar techniques.
> > > > > > > I don't have any preference so far. If you think your idea makes more
> > > > > > > sense, then try your best to justify it in the list.
> > > > > > >
> > > > > > > > So we can really discuss in this direction, whether
> > > > > > > > the vvu device can be extended to achieve the purpose of ism, or whether the
> > > > > > > > design goals can be agreed.
> > > > > > > I've added Stefan in the loop, let's hear from him.
> > > > > > >
> > > > > > > > Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
> > > > > > > > Should device/driver APIs remain independent?
> > > > > > > Btw, you mentioned that one possible user of ism is the smc, but I
> > > > > > > don't see how it connects to that with your prototype driver.
> > > > > > Yes, we originally had plans, but the virtio spec was considered for submission,
> > > > > > so this was not included. Maybe, we should have included this part @Tony
> > > > > >
> > > > > > A brief introduction is that SMC currently has a corresponding
> > > > > > s390/net/ism_drv.c and we will replace this in the virtualization scenario.
> > > >
> > > >
> > > > Ok, I see. So I think the goal is to implement something in virtio that is
> > > > functional equivalent to IBM ISM device.
> > > >
> > >
> > > Yes, IBM ISM devices do something similar and it inspired this.
> >
> > Ok, it would be better to mention this in the cover letter of the next
> > version. This can ease the reviewers (IBM has some good docs of those
> > from the website).
> >
>
> Yes, we will do it.

Btw, I wonder about the plan to support live migration. E.g. do we need to
hot-unplug the ism device before the migration so that we can fall back to
TCP/IP?

Thanks

>
> > >
> > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > SMC is a network protocol which is modeled by shared memory rather than
> > > > > packet.
> > > >
> > > >
> > > > After reading more SMC from IBM website, I think you meant SMC-D here. And I
> > > > wonder in order to have a complete SMC solution we still need virtio-ROCE
> > > > for inter host communcation?
> > > >
> > >
> > > Mostly yes.
> > >
> > > SMC-D is the part of whole SMC solution. SMC supports multiple
> > > underlying device, -D means ISM device, -R means RDMA device. The key
> > > data model is shared memory, SMC uses RDMA (-R) or ISM(-D) to *share*
> > > memory between peers, and it will choose the suitable device on demand
> > > during handshaking. If there was no suitable device, it would fall back
> > > to TCP. So virtio-ROCE is not required.
> >
> > So the commniting peers on the same host we need SMC-D, in the future
> > we need to use RDMA to offload the communication among the peers of
> > different hosts. Then we can get fully transparent offload no matter
> > the peer is local or not.
> >
>
> Yes, this is what we want to do.
>
> > >
> > > >
> > > > >   Actually the basic required interfaces of SMC device are:
> > > > >
> > > > >    - alloc / free memory region, each connection peer has two memory
> > > > >     regions dynamically for sending and receiving ring buffer.
> > > > >    - attach / detach memory region, remote attaches local-allocated
> > > > >     sending region as receiving region, vice versa.
> > > > >    - notify, tell peer to read data and update cursor.
> > > > >
> > > > > Then the device can be registered as SMC ISM device. Of course, SMC
> > > > > also requires some modification to adapt it.
> > > >
> > > >
> > > > Looking at s390 ism driver it requires other stuffs like vlan add/remove or
> > > > gid query, do we need them as well?
> > >
> > > vlan is not required in this use case. ISM uses gid to identified each
> > > others, maybe we could implement it in virtio ways.
> >
> > I'd suggest adding the codes to register the driver to SMC/ISM in the
> > next version (instead of a simple procfs hooking). Then people can
> > easily play or review.
> >
>
> Ok, I will add the codes in the next version.
>
> Cheers,
> Tony Lu
>
> > Thanks
> >
> > >
> > > To support virtio-ism smoothly, the interfaces of ISM driver still need
> > > to be adjusted. I will put it on the table with IBM people.
> > >
> > > Cheers,
> > > Tony Lu
> > >
> > > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > Cheers,
> > > > > Tony Lu
> > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > > > > How to share the backend with other deivce is another problem.
> > > > > > > > > Yes, anything that is used for your virito-ism prototype can be used
> > > > > > > > > for other devices.
> > > > > > > > >
> > > > > > > > > > Our goal is to dynamically obtain a piece of memory to share with other vms.
> > > > > > > > > So at this level, I don't see the exact difference compared to
> > > > > > > > > virtio-vhost-user. Let's just focus on the API that carries on the
> > > > > > > > > semantic:
> > > > > > > > >
> > > > > > > > > - map/unmap
> > > > > > > > > - permission update
> > > > > > > > >
> > > > > > > > > The only missing piece is the per region notification.
> > > > > > > > >
> > > > > > > > > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > > > > > > > > > it will use it as a ring. Of course, we also need a notify mechanism.
> > > > > > > > > >
> > > > > > > > > > That's what we're aiming for, so we should first discuss whether this
> > > > > > > > > > requirement is reasonable.
> > > > > > > > > So unless somebody said "no", it is fine until now.
> > > > > > > > >
> > > > > > > > > > I think it's a feature currently not supported by
> > > > > > > > > > other devices specified by the current virtio spce.
> > > > > > > > > Probably, but we've already had rfcs for roce and vhost-user.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > >
> > >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-21  3:07                       ` Jason Wang
@ 2022-10-21  3:23                         ` Tony Lu
  0 siblings, 0 replies; 61+ messages in thread
From: Tony Lu @ 2022-10-21  3:23 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, dust.li,
	zhenzao, helinguo, gerry, mst, cohuck, Stefan Hajnoczi,
	Yongji Xie

On Fri, Oct 21, 2022 at 11:07:27AM +0800, Jason Wang wrote:
> On Fri, Oct 21, 2022 at 11:05 AM Tony Lu <tonylu@linux.alibaba.com> wrote:
> >
> > On Fri, Oct 21, 2022 at 10:47:29AM +0800, Jason Wang wrote:
> > > On Wed, Oct 19, 2022 at 6:01 PM Tony Lu <tonylu@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, Oct 19, 2022 at 05:04:58PM +0800, Jason Wang wrote:
> > > > >
> > > > > 在 2022/10/19 16:07, Tony Lu 写道:
> > > > > > On Wed, Oct 19, 2022 at 02:02:21PM +0800, Xuan Zhuo wrote:
> > > > > > > On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Hi Jason,
> > > > > > > > > > >
> > > > > > > > > > > I think there may be some problems with the direction we are discussing.
> > > > > > > > > > Probably not.
> > > > > > > > > >
> > > > > > > > > > As far as we are focusing on technology, there's nothing wrong from my
> > > > > > > > > > perspective. And this is how the community works. Your idea needs to
> > > > > > > > > > be justified and people are free to raise any technical questions
> > > > > > > > > > especially considering you've posted a spec change with prototype
> > > > > > > > > > codes but not only the idea.
> > > > > > > > > >
> > > > > > > > > > > Our
> > > > > > > > > > > goal is to add an new ism device. As far as the spec is concerned, we are not
> > > > > > > > > > > concerned with the implementation of the backend.
> > > > > > > > > > >
> > > > > > > > > > > The direction we should discuss is what is the difference between the ism device
> > > > > > > > > > > and other devices such as virtio-net, and whether it is necessary to introduce
> > > > > > > > > > > this new device.
> > > > > > > > > > This is somehow what I want to ask, actually it's not a comparison
> > > > > > > > > > with virtio-net but:
> > > > > > > > > >
> > > > > > > > > > - virtio-roce
> > > > > > > > > > - virtio-vhost-user
> > > > > > > > > > - virtio-(p)mem
> > > > > > > > > >
> > > > > > > > > > or whether we can simply add features to those devices to achieve what
> > > > > > > > > > you want to do here.
> > > > > > > > >
> > > > > > > > > Yes, this is my priority to discuss.
> > > > > > > > >
> > > > > > > > > At the moment, I think the most similar to ism is the Vhost-user Device Backend
> > > > > > > > > of virtio-vhost-user.
> > > > > > > > >
> > > > > > > > > My understanding of it is to map any virtio device to another vm as a vvu
> > > > > > > > > device.
> > > > > > > > Yes, so a possible way is to have a device with memory zone/region
> > > > > > > > provision and management then map it via virtio-vhost-user.
> > > > > > >
> > > > > > > Yes, there is such a possibility. virtio-vhost-user makes me feel that what can
> > > > > > > be shared is the function implementation of map.
> > > > > > >
> > > > > > > But in the vm to provide the interface to the upper layer, I think this is the
> > > > > > > work of ism.
> > > > > > >
> > > > > > > But one of the reasons why I didn't use virtio-vhost-user directly is that in
> > > > > > > another vm, the guest can operate the vvu device, which we hope that both sides
> > > > > > > are equal to the ism device.
> > > > > > >
> > > > > > > So I want to agree on a question first: who will provide the upper layer with
> > > > > > > the ability to share the memory area?
> > > > > > >
> > > > > > > Our answer is a new ism device. How does this device achieve memory sharing, I
> > > > > > > think is the second question.
> > > > > > >
> > > > > > >
> > > > > > > > >  From this design purpose, I think the two are different.
> > > > > > > > >
> > > > > > > > > Of course, you might want to extend it, it does have some similarities and uses
> > > > > > > > > a lot of similar techniques.
> > > > > > > > I don't have any preference so far. If you think your idea makes more
> > > > > > > > sense, then try your best to justify it in the list.
> > > > > > > >
> > > > > > > > > So we can really discuss in this direction, whether
> > > > > > > > > the vvu device can be extended to achieve the purpose of ism, or whether the
> > > > > > > > > design goals can be agreed.
> > > > > > > > I've added Stefan in the loop, let's hear from him.
> > > > > > > >
> > > > > > > > > Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
> > > > > > > > > Should device/driver APIs remain independent?
> > > > > > > > Btw, you mentioned that one possible user of ism is the smc, but I
> > > > > > > > don't see how it connects to that with your prototype driver.
> > > > > > > Yes, we originally had plans, but the virtio spec was considered for submission,
> > > > > > > so this was not included. Maybe, we should have included this part @Tony
> > > > > > >
> > > > > > > A brief introduction is that SMC currently has a corresponding
> > > > > > > s390/net/ism_drv.c and we will replace this in the virtualization scenario.
> > > > >
> > > > >
> > > > > Ok, I see. So I think the goal is to implement something in virtio that is
> > > > > functional equivalent to IBM ISM device.
> > > > >
> > > >
> > > > Yes, IBM ISM devices do something similar and it inspired this.
> > >
> > > Ok, it would be better to mention this in the cover letter of the next
> > > version. This can ease the reviewers (IBM has some good docs of those
> > > from the website).
> > >
> >
> > Yes, we will do it.
> >
> > > >
> > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > SMC is a network protocol which is modeled by shared memory rather than
> > > > > > packet.
> > > > >
> > > > >
> > > > > After reading more SMC from IBM website, I think you meant SMC-D here. And I
> > > > > wonder in order to have a complete SMC solution we still need virtio-ROCE
> > > > > for inter host communcation?
> > > > >
> > > >
> > > > Mostly yes.
> > > >
> > > > SMC-D is the part of whole SMC solution. SMC supports multiple
> > > > underlying device, -D means ISM device, -R means RDMA device. The key
> > > > data model is shared memory, SMC uses RDMA (-R) or ISM(-D) to *share*
> > > > memory between peers, and it will choose the suitable device on demand
> > > > during handshaking. If there was no suitable device, it would fall back
> > > > to TCP. So virtio-ROCE is not required.
> > >
> > > So the commniting peers on the same host we need SMC-D, in the future
> > > we need to use RDMA to offload the communication among the peers of
> > > different hosts. Then we can get fully transparent offload no matter
> > > the peer is local or not.
> > >
> >
> > Yes, this is what we want to do.
> 
> Great, Yong Ji posted a ROCE proposal[1], it would be appreciated if
> you can review and give feedback.
> 
> [1] https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/

Sounds great, I will review it. It will expand the usage scenarios of SMC.

Cheers,
Tony Lu

> 
> Thanks
> 
> >
> > > >
> > > > >
> > > > > >   Actually the basic required interfaces of SMC device are:
> > > > > >
> > > > > >    - alloc / free memory region, each connection peer has two memory
> > > > > >     regions dynamically for sending and receiving ring buffer.
> > > > > >    - attach / detach memory region, remote attaches local-allocated
> > > > > >     sending region as receiving region, vice versa.
> > > > > >    - notify, tell peer to read data and update cursor.
> > > > > >
> > > > > > Then the device can be registered as SMC ISM device. Of course, SMC
> > > > > > also requires some modification to adapt it.
> > > > >
> > > > >
> > > > > Looking at s390 ism driver it requires other stuffs like vlan add/remove or
> > > > > gid query, do we need them as well?
> > > >
> > > > vlan is not required in this use case. ISM uses gid to identified each
> > > > others, maybe we could implement it in virtio ways.
> > >
> > > I'd suggest adding the codes to register the driver to SMC/ISM in the
> > > next version (instead of a simple procfs hooking). Then people can
> > > easily play or review.
> > >
> >
> > Ok, I will add the codes in the next version.
> >
> > Cheers,
> > Tony Lu
> >
> > > Thanks
> > >
> > > >
> > > > To support virtio-ism smoothly, the interfaces of ISM driver still need
> > > > to be adjusted. I will put it on the table with IBM people.
> > > >
> > > > Cheers,
> > > > Tony Lu
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Tony Lu
> > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > How to share the backend with other deivce is another problem.
> > > > > > > > > > Yes, anything that is used for your virito-ism prototype can be used
> > > > > > > > > > for other devices.
> > > > > > > > > >
> > > > > > > > > > > Our goal is to dynamically obtain a piece of memory to share with other vms.
> > > > > > > > > > So at this level, I don't see the exact difference compared to
> > > > > > > > > > virtio-vhost-user. Let's just focus on the API that carries on the
> > > > > > > > > > semantic:
> > > > > > > > > >
> > > > > > > > > > - map/unmap
> > > > > > > > > > - permission update
> > > > > > > > > >
> > > > > > > > > > The only missing piece is the per region notification.
> > > > > > > > > >
> > > > > > > > > > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > > > > > > > > > > it will use it as a ring. Of course, we also need a notify mechanism.
> > > > > > > > > > >
> > > > > > > > > > > That's what we're aiming for, so we should first discuss whether this
> > > > > > > > > > > requirement is reasonable.
> > > > > > > > > > So unless somebody said "no", it is fine until now.
> > > > > > > > > >
> > > > > > > > > > > I think it's a feature currently not supported by
> > > > > > > > > > > other devices specified by the current virtio spce.
> > > > > > > > > > Probably, but we've already had rfcs for roce and vhost-user.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > >
> > > >
> >


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-21  2:41                       ` Jason Wang
  2022-10-21  2:53                         ` Gerry
@ 2022-10-21  3:30                         ` Dust Li
  2022-10-21  6:37                           ` Jason Wang
  1 sibling, 1 reply; 61+ messages in thread
From: Dust Li @ 2022-10-21  3:30 UTC (permalink / raw)
  To: Jason Wang, Xuan Zhuo
  Cc: Gerry, virtio-dev, hans, herongguang, zmlcc, tonylu, zhenzao,
	helinguo, mst, cohuck, Stefan Hajnoczi

On Fri, Oct 21, 2022 at 10:41:26AM +0800, Jason Wang wrote:
>On Wed, Oct 19, 2022 at 5:27 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>
>> On Wed, 19 Oct 2022 17:15:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> > On Wed, Oct 19, 2022 at 5:12 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>> > >
>> > > On Wed, 19 Oct 2022 17:08:29 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> > > > On Wed, Oct 19, 2022 at 4:21 PM Dust Li <dust.li@linux.alibaba.com> wrote:
>> > > > >
>> > > > > On Wed, Oct 19, 2022 at 04:03:42PM +0800, Gerry wrote:
>> > > > > >
>> > > > > >
>> > > > > >> 2022年10月19日 16:01,Jason Wang <jasowang@redhat.com> 写道:
>> > > > > >>
>> > > > > >> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>> > > > > >>>
>> > > > > >>> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> > > > > >>>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>> > > > > >>>>>
>> > > > > >>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> > > > > >>>>>> Adding Stefan.
>> > > > > >>>>>>
>> > > > > >>>>>>
>> > > > > >>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>> > > > > >>>>>>>
>> > > > > >>>>>>> Hello everyone,
>> > > > > >>>>>>>
>> > > > > >>>>>>> # Background
>> > > > > >>>>>>>
>> > > > > >>>>>>> Nowadays, there is a common scenario to accelerate communication between
>> > > > > >>>>>>> different VMs and containers, including light weight virtual machine based
>> > > > > >>>>>>> containers. One way to achieve this is to colocate them on the same host.
>> > > > > >>>>>>> However, the performance of inter-VM communication through network stack is not
>> > > > > >>>>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
>> > > > > >>>>>>> many times, but still no generic solution available [1] [2] [3].
>> > > > > >>>>>>>
>> > > > > >>>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
>> > > > > >>>>>>> We found that by changing the communication channel between VMs from TCP to SMC
>> > > > > >>>>>>> with shared memory, we can achieve superior performance for a common
>> > > > > >>>>>>> socket-based application[5]:
>> > > > > >>>>>>>  - latency reduced by about 50%
>> > > > > >>>>>>>  - throughput increased by about 300%
>> > > > > >>>>>>>  - CPU consumption reduced by about 50%
>> > > > > >>>>>>>
>> > > > > >>>>>>> Since there is no particularly suitable shared memory management solution
>> > > > > >>>>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
>> > > > > >>>>>>> is the standard for communication in the virtualization world, we want to
>> > > > > >>>>>>> implement a virtio-ism device based on virtio, which can support on-demand
>> > > > > >>>>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
>> > > > > >>>>>>> the virtio-ism device need to support:
>> > > > > >>>>>>>
>> > > > > >>>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
>> > > > > >>>>>>>   provisioned.
>> > > > > >>>>>>> 2. Multi-region management: the shared memory is divided into regions,
>> > > > > >>>>>>>   and a peer may allocate one or more regions from the same shared memory
>> > > > > >>>>>>>   device.
>> > > > > >>>>>>> 3. Permission control: The permission of each region can be set seperately.
>> > > > > >>>>>>
>> > > > > >>>>>> Looks like virtio-ROCE
>> > > > > >>>>>>
>> > > > > >>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
>> > > > > >>>>>>
>> > > > > >>>>>> and virtio-vhost-user can satisfy the requirement?
>> > > > > >>>>>>
>> > > > > >>>>>>>
>> > > > > >>>>>>> # Virtio ism device
>> > > > > >>>>>>>
>> > > > > >>>>>>> ISM devices provide the ability to share memory between different guests on a
>> > > > > >>>>>>> host. A guest's memory got from ism device can be shared with multiple peers at
>> > > > > >>>>>>> the same time. This shared relationship can be dynamically created and released.
>> > > > > >>>>>>>
>> > > > > >>>>>>> The shared memory obtained from the device is divided into multiple ism regions
>> > > > > >>>>>>> for share. ISM device provides a mechanism to notify other ism region referrers
>> > > > > >>>>>>> of content update events.
>> > > > > >>>>>>>
>> > > > > >>>>>>> # Usage (SMC as example)
>> > > > > >>>>>>>
>> > > > > >>>>>>> Maybe there is one of possible use cases:
>> > > > > >>>>>>>
>> > > > > >>>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>> > > > > >>>>>>>   location of a memory region in the PCI space and a token.
>> > > > > >>>>>>> 2. The ism driver mmap the memory region and return to SMC with the token
>> > > > > >>>>>>> 3. SMC passes the token to the connected peer
>> > > > > >>>>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
>> > > > > >>>>>>>   get the location of the PCI space of the shared memory
>> > > > > >>>>>>>
>> > > > > >>>>>>>
>> > > > > >>>>>>> # About hot plugging of the ism device
>> > > > > >>>>>>>
>> > > > > >>>>>>>   Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>> > > > > >>>>>>>   less scalable operation. So, we don't plan to support it for now.
>> > > > > >>>>>>>
>> > > > > >>>>>>> # Comparison with existing technology
>> > > > > >>>>>>>
>> > > > > >>>>>>> ## ivshmem or ivshmem 2.0 of Qemu
>> > > > > >>>>>>>
>> > > > > >>>>>>>   1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>> > > > > >>>>>>>   use this VM, so the security is not enough.
>> > > > > >>>>>>>
>> > > > > >>>>>>>   2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>> > > > > >>>>>>>   other VMs that use the ivshmem 2.0 shared memory device, which also does not
>> > > > > >>>>>>>   meet our needs in terms of security.
>> > > > > >>>>>>>
>> > > > > >>>>>>> ## vhost-pci and virtiovhostuser
>> > > > > >>>>>>>
>> > > > > >>>>>>>   Does not support dynamic allocation and therefore not suitable for SMC.
>> > > > > >>>>>>
>> > > > > >>>>>> I think this is an implementation issue, we can support VHOST IOTLB
>> > > > > >>>>>> message then the regions could be added/removed on demand.
>> > > > > >>>>>
>> > > > > >>>>>
>> > > > > >>>>> 1. After the attacker connects with the victim, if the attacker does not
>> > > > > >>>>>   dereference memory, the memory will be occupied under virtiovhostuser. In the
>> > > > > >>>>>   case of ism devices, the victim can directly release the reference, and the
>> > > > > >>>>>   maliciously referenced region only occupies the attacker's resources
>> > > > > >>>>
>> > > > > >>>> Let's define the security boundary here. E.g do we trust the device or
>> > > > > >>>> not? If yes, in the case of virtiovhostuser, can we simple do
>> > > > > >>>> VHOST_IOTLB_UNMAP then we can safely release the memory from the
>> > > > > >>>> attacker.
>> > > > > >>>>
>> > > > > >>>>>
>> > > > > >>>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
>> > > > > >>>>>   time, which is a challenge for virtiovhostuser
>> > > > > >>>>
>> > > > > >>>> Please elaborate more the the challenges, anything make
>> > > > > >>>> virtiovhostuser different?
>> > > > > >>>
>> > > > > >>> I understand (please point out any mistakes), one vvu device corresponds to one
>> > > > > >>> vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
>> > > > > >>
>> > > > > >> There could be some misunderstanding here. With 1000 VM, you still
>> > > > > >> need 1000 virtio-sim devices I think.
>> > > > > >We are trying to achieve one virtio-ism device per vm instead of one virtio-ism device per SMC connection.
>> > > >
>> > > > I wonder if we need something to identify a virtio-ism device since I
>> > > > guess there's still a chance to have multiple virtio-ism device per VM
>> > > > (different service chain etc).
>> > >
>> > > Yes, there will be such a situation, a vm has multiple virtio-ism devices.
>> > >
>> > > What exactly do you mean by "identify"?
>> >
>> > E.g we can differ two virtio-net through mac address, do we need
>> > something similar for ism, or it's completely unncessary (e.g via
>> > token or other) ?
>>
>> Currently, we have not encountered such a request.
>>
>> It is conceivable that all physical shared memory ism regions are indexed by
>> tokens. virtio-ism is a way to obtain these ism regions, so there is no need to
>> distinguish multiple virtio-ism devices under one vm on the host.
>
>So consider a case:
>
>VM1 shares ism1 with VM2
>VM1 shares ism2 with VM3
>
>How do application/smc address the different ism device in this case?
>E.g if VM1 want to talk with VM3 it needs to populate regions in ism2,
>but how can application or protocol knows this and how can a specific
>device to be addressed (via BDF?)

In our design, we do have a dev_id for each ISM device.
Currently, we use it for permission management; I think
it can also be used to identify different ISM devices.

The spec says:

+\begin{description}
+\item[\field{dev_id}]      the id of the device.
+\item[\field{region_size}] the size of every ism region
+\item[\field{notify_size}] the size of the notify address.

<...>

+The device MUST regenerate a \field{dev_id}. \field{dev_id} remains unchanged
+during reset. \field{dev_id} MUST NOT be 0;
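
For illustration, the three fields quoted above could sit in the device
config space roughly as sketched below. This is only a sketch derived from
the quoted description; the struct name and the field widths (assumed here
to be little-endian 64-bit) are assumptions, not the actual spec patch.

    #include <linux/types.h>

    /*
     * Sketch of a guest-visible virtio-ism config layout. Only dev_id,
     * region_size and notify_size are taken from the description above;
     * everything else (widths, ordering) is assumed.
     */
    struct virtio_ism_config {
            __le64 dev_id;       /* non-zero device id, stable across reset */
            __le64 region_size;  /* size of every ism region */
            __le64 notify_size;  /* size of the notify address */
    };

With such a dev_id, SMC (or any other upper layer) could address a specific
ISM device in the VM1/VM2/VM3 case above, similar to how the s390 ISM device
is identified by its GID.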

Thanks

>
>Thanks
>
>>
>> Thanks.
>>
>>
>> >
>> > Thanks
>> >
>> > >
>> > > Thanks.
>> > >
>> > >
>> > > >
>> > > > Thanks
>> > > >
>> > > > >
>> > > > > I think we must achieve this if we want to meet the requirements of SMC.
>> > > > > In SMC, a SMC socket(Corresponding to a TCP socket) need 2 memory
>> > > > > regions(1 for Tx and 1 for Rx). So if we have 1K TCP connections,
>> > > > > we'll need 2K share memory regions, and those memory regions are
>> > > > > dynamically allocated and freed with the TCP socket.
>> > > > >
>> > > > > >
>> > > > > >>
>> > > > > >>>
>> > > > > >>>
>> > > > > >>>>
>> > > > > >>>>>
>> > > > > >>>>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
>> > > > > >>>>>   determines the sharing relationship at startup.
>> > > > > >>>>
>> > > > > >>>> Not necessarily with IOTLB API?
>> > > > > >>>
>> > > > > >>> Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
>> > > > > >>> provide the same memory on the host to two vms. So the implementation of this
>> > > > > >>> part will be much simpler. This is why we gave up virtio-vhost-user at the
>> > > > > >>> beginning.
>> > > > > >>
>> > > > > >> Ok, just to make sure we're at the same page. From spec level,
>> > > > > >> virtio-vhost-user doesn't (can't) limit the backend to be implemented
>> > > > > >> in another VM. So it should be ok to be used for sharing memory
>> > > > > >> between a guest and host.
>> > > > > >>
>> > > > > >> Thanks
>> > > > > >>
>> > > > > >>>
>> > > > > >>> Thanks.
>> > > > > >>>
>> > > > > >>>
>> > > > > >>>>
>> > > > > >>>>>
>> > > > > >>>>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
>> > > > > >>>>>   while ism only maps one region to other devices
>> > > > > >>>>
>> > > > > >>>> With VHOST_IOTLB_MAP, the map could be done per region.
>> > > > > >>>>
>> > > > > >>>> Thanks
>> > > > > >>>>
>> > > > > >>>>>
>> > > > > >>>>> Thanks.
>> > > > > >>>>>
>> > > > > >>>>>>
>> > > > > >>>>>> Thanks
>> > > > > >>>>>>
>> > > > > >>>>>>>
>> > > > > >>>>>>> # Design
>> > > > > >>>>>>>
>> > > > > >>>>>>>   This is a structure diagram based on ism sharing between two vms.
>> > > > > >>>>>>>
>> > > > > >>>>>>>    |-------------------------------------------------------------------------------------------------------------|
>> > > > > >>>>>>>    | |------------------------------------------------|       |------------------------------------------------| |
>> > > > > >>>>>>>    | | Guest                                          |       | Guest                                          | |
>> > > > > >>>>>>>    | |                                                |       |                                                | |
>> > > > > >>>>>>>    | |   ----------------                             |       |   ----------------                             | |
>> > > > > >>>>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
>> > > > > >>>>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
>> > > > > >>>>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
>> > > > > >>>>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
>> > > > > >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>> > > > > >>>>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
>> > > > > >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>> > > > > >>>>>>>    | |                                |               |       |                               |                | |
>> > > > > >>>>>>>    | |                                |               |       |                               |                | |
>> > > > > >>>>>>>    | | Qemu                           |               |       | Qemu                          |                | |
>> > > > > >>>>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
>> > > > > >>>>>>>    |                                  |                                                       |                  |
>> > > > > >>>>>>>    |                                  |                                                       |                  |
>> > > > > >>>>>>>    |                                  |------------------------------+------------------------|                  |
>> > > > > >>>>>>>    |                                                                 |                                           |
>> > > > > >>>>>>>    |                                                                 |                                           |
>> > > > > >>>>>>>    |                                                   --------------------------                                |
>> > > > > >>>>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
>> > > > > >>>>>>>    |                                                   --------------------------                                |
>> > > > > >>>>>>>    |                                                                                                             |
>> > > > > >>>>>>>    | HOST                                                                                                        |
>> > > > > >>>>>>>    ---------------------------------------------------------------------------------------------------------------
>> > > > > >>>>>>>
>> > > > > >>>>>>> # POC code
>> > > > > >>>>>>>
>> > > > > >>>>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>> > > > > >>>>>>>   Qemu:   https://github.com/fengidri/qemu/commits/ism
>> > > > > >>>>>>>
>> > > > > >>>>>>> If there are any problems, please point them out.
>> > > > > >>>>>>>
>> > > > > >>>>>>> Hope to hear from you, thank you.
>> > > > > >>>>>>>
>> > > > > >>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>> > > > > >>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
>> > > > > >>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>> > > > > >>>>>>> [4] https://lwn.net/Articles/711071/
>> > > > > >>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>> > > > > >>>>>>>
>> > > > > >>>>>>>
>> > > > > >>>>>>> Xuan Zhuo (2):
>> > > > > >>>>>>>  Reserve device id for ISM device
>> > > > > >>>>>>>  virtio-ism: introduce new device virtio-ism
>> > > > > >>>>>>>
>> > > > > >>>>>>> content.tex    |   3 +
>> > > > > >>>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
>> > > > > >>>>>>> 2 files changed, 343 insertions(+)
>> > > > > >>>>>>> create mode 100644 virtio-ism.tex
>> > > > > >>>>>>>
>> > > > > >>>>>>> --
>> > > > > >>>>>>> 2.32.0.3.g01195cf9f
>> > > > > >>>>>>>
>> > > > > >>>>>>>
>> > > > > >>>>>>> ---------------------------------------------------------------------
>> > > > > >>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>> > > > > >>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>> > > > > >>>>>>>
>> > > > > >>>>>>
>> > > > > >>>>>
>> > > > > >>>>> ---------------------------------------------------------------------
>> > > > > >>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>> > > > > >>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>> > > > > >>>>>
>> > > > > >>>>
>> > > > > >>>
>> > > > >
>> > > >
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>> > >
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-21  3:09                       ` Jason Wang
@ 2022-10-21  3:53                         ` Tony Lu
  2022-10-21  4:54                           ` Dust Li
  0 siblings, 1 reply; 61+ messages in thread
From: Tony Lu @ 2022-10-21  3:53 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, dust.li,
	zhenzao, helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Fri, Oct 21, 2022 at 11:09:19AM +0800, Jason Wang wrote:
> On Fri, Oct 21, 2022 at 11:05 AM Tony Lu <tonylu@linux.alibaba.com> wrote:
> >
> > On Fri, Oct 21, 2022 at 10:47:29AM +0800, Jason Wang wrote:
> > > On Wed, Oct 19, 2022 at 6:01 PM Tony Lu <tonylu@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, Oct 19, 2022 at 05:04:58PM +0800, Jason Wang wrote:
> > > > >
> > > > > 在 2022/10/19 16:07, Tony Lu 写道:
> > > > > > On Wed, Oct 19, 2022 at 02:02:21PM +0800, Xuan Zhuo wrote:
> > > > > > > On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Hi Jason,
> > > > > > > > > > >
> > > > > > > > > > > I think there may be some problems with the direction we are discussing.
> > > > > > > > > > Probably not.
> > > > > > > > > >
> > > > > > > > > > As far as we are focusing on technology, there's nothing wrong from my
> > > > > > > > > > perspective. And this is how the community works. Your idea needs to
> > > > > > > > > > be justified and people are free to raise any technical questions
> > > > > > > > > > especially considering you've posted a spec change with prototype
> > > > > > > > > > codes but not only the idea.
> > > > > > > > > >
> > > > > > > > > > > Our
> > > > > > > > > > > goal is to add an new ism device. As far as the spec is concerned, we are not
> > > > > > > > > > > concerned with the implementation of the backend.
> > > > > > > > > > >
> > > > > > > > > > > The direction we should discuss is what is the difference between the ism device
> > > > > > > > > > > and other devices such as virtio-net, and whether it is necessary to introduce
> > > > > > > > > > > this new device.
> > > > > > > > > > This is somehow what I want to ask, actually it's not a comparison
> > > > > > > > > > with virtio-net but:
> > > > > > > > > >
> > > > > > > > > > - virtio-roce
> > > > > > > > > > - virtio-vhost-user
> > > > > > > > > > - virtio-(p)mem
> > > > > > > > > >
> > > > > > > > > > or whether we can simply add features to those devices to achieve what
> > > > > > > > > > you want to do here.
> > > > > > > > >
> > > > > > > > > Yes, this is my priority to discuss.
> > > > > > > > >
> > > > > > > > > At the moment, I think the most similar to ism is the Vhost-user Device Backend
> > > > > > > > > of virtio-vhost-user.
> > > > > > > > >
> > > > > > > > > My understanding of it is to map any virtio device to another vm as a vvu
> > > > > > > > > device.
> > > > > > > > Yes, so a possible way is to have a device with memory zone/region
> > > > > > > > provision and management then map it via virtio-vhost-user.
> > > > > > >
> > > > > > > Yes, there is such a possibility. virtio-vhost-user makes me feel that what can
> > > > > > > be shared is the function implementation of map.
> > > > > > >
> > > > > > > But in the vm to provide the interface to the upper layer, I think this is the
> > > > > > > work of ism.
> > > > > > >
> > > > > > > But one of the reasons why I didn't use virtio-vhost-user directly is that in
> > > > > > > another vm, the guest can operate the vvu device, which we hope that both sides
> > > > > > > are equal to the ism device.
> > > > > > >
> > > > > > > So I want to agree on a question first: who will provide the upper layer with
> > > > > > > the ability to share the memory area?
> > > > > > >
> > > > > > > Our answer is a new ism device. How does this device achieve memory sharing, I
> > > > > > > think is the second question.
> > > > > > >
> > > > > > >
> > > > > > > > >  From this design purpose, I think the two are different.
> > > > > > > > >
> > > > > > > > > Of course, you might want to extend it, it does have some similarities and uses
> > > > > > > > > a lot of similar techniques.
> > > > > > > > I don't have any preference so far. If you think your idea makes more
> > > > > > > > sense, then try your best to justify it in the list.
> > > > > > > >
> > > > > > > > > So we can really discuss in this direction, whether
> > > > > > > > > the vvu device can be extended to achieve the purpose of ism, or whether the
> > > > > > > > > design goals can be agreed.
> > > > > > > > I've added Stefan in the loop, let's hear from him.
> > > > > > > >
> > > > > > > > > Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
> > > > > > > > > Should device/driver APIs remain independent?
> > > > > > > > Btw, you mentioned that one possible user of ism is the smc, but I
> > > > > > > > don't see how it connects to that with your prototype driver.
> > > > > > > Yes, we originally had plans, but the virtio spec was considered for submission,
> > > > > > > so this was not included. Maybe, we should have included this part @Tony
> > > > > > >
> > > > > > > A brief introduction is that SMC currently has a corresponding
> > > > > > > s390/net/ism_drv.c and we will replace this in the virtualization scenario.
> > > > >
> > > > >
> > > > > Ok, I see. So I think the goal is to implement something in virtio that is
> > > > > functional equivalent to IBM ISM device.
> > > > >
> > > >
> > > > Yes, IBM ISM devices do something similar and it inspired this.
> > >
> > > Ok, it would be better to mention this in the cover letter of the next
> > > version. This can ease the reviewers (IBM has some good docs of those
> > > from the website).
> > >
> >
> > Yes, we will do it.
> 
> Btw, I wonder about the plan to support live migration. E.g do we need
> to hot unplug the ism device before the migration then we can fallback
> to TCP/IP ?
> 

From the point of view of SMC, SMC-R maintains multiple links (RDMA QPs), so it
can live-migrate existing connections to a new link.

Currently, yes, for SMC-D the device needs to be hot unplugged before migration.

Cheers,
Tony Lu


> Thanks
> 
> >
> > > >
> > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > SMC is a network protocol which is modeled by shared memory rather than
> > > > > > packet.
> > > > >
> > > > >
> > > > > After reading more SMC from IBM website, I think you meant SMC-D here. And I
> > > > > wonder in order to have a complete SMC solution we still need virtio-ROCE
> > > > > for inter host communcation?
> > > > >
> > > >
> > > > Mostly yes.
> > > >
> > > > SMC-D is the part of whole SMC solution. SMC supports multiple
> > > > underlying device, -D means ISM device, -R means RDMA device. The key
> > > > data model is shared memory, SMC uses RDMA (-R) or ISM(-D) to *share*
> > > > memory between peers, and it will choose the suitable device on demand
> > > > during handshaking. If there was no suitable device, it would fall back
> > > > to TCP. So virtio-ROCE is not required.
> > >
> > > So the commniting peers on the same host we need SMC-D, in the future
> > > we need to use RDMA to offload the communication among the peers of
> > > different hosts. Then we can get fully transparent offload no matter
> > > the peer is local or not.
> > >
> >
> > Yes, this is what we want to do.
> >
> > > >
> > > > >
> > > > > >   Actually the basic required interfaces of SMC device are:
> > > > > >
> > > > > >    - alloc / free memory region, each connection peer has two memory
> > > > > >     regions dynamically for sending and receiving ring buffer.
> > > > > >    - attach / detach memory region, remote attaches local-allocated
> > > > > >     sending region as receiving region, vice versa.
> > > > > >    - notify, tell peer to read data and update cursor.
> > > > > >
> > > > > > Then the device can be registered as SMC ISM device. Of course, SMC
> > > > > > also requires some modification to adapt it.
> > > > >
> > > > >
> > > > > Looking at s390 ism driver it requires other stuffs like vlan add/remove or
> > > > > gid query, do we need them as well?
> > > >
> > > > vlan is not required in this use case. ISM uses gid to identified each
> > > > others, maybe we could implement it in virtio ways.
> > >
> > > I'd suggest adding the codes to register the driver to SMC/ISM in the
> > > next version (instead of a simple procfs hooking). Then people can
> > > easily play or review.
> > >
> >
> > Ok, I will add the codes in the next version.
> >
> > Cheers,
> > Tony Lu
> >
> > > Thanks
> > >
> > > >
> > > > To support virtio-ism smoothly, the interfaces of ISM driver still need
> > > > to be adjusted. I will put it on the table with IBM people.
> > > >
> > > > Cheers,
> > > > Tony Lu
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Tony Lu
> > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > How to share the backend with other deivce is another problem.
> > > > > > > > > > Yes, anything that is used for your virito-ism prototype can be used
> > > > > > > > > > for other devices.
> > > > > > > > > >
> > > > > > > > > > > Our goal is to dynamically obtain a piece of memory to share with other vms.
> > > > > > > > > > So at this level, I don't see the exact difference compared to
> > > > > > > > > > virtio-vhost-user. Let's just focus on the API that carries on the
> > > > > > > > > > semantic:
> > > > > > > > > >
> > > > > > > > > > - map/unmap
> > > > > > > > > > - permission update
> > > > > > > > > >
> > > > > > > > > > The only missing piece is the per region notification.
> > > > > > > > > >
> > > > > > > > > > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > > > > > > > > > > it will use it as a ring. Of course, we also need a notify mechanism.
> > > > > > > > > > >
> > > > > > > > > > > That's what we're aiming for, so we should first discuss whether this
> > > > > > > > > > > requirement is reasonable.
> > > > > > > > > > So unless somebody said "no", it is fine until now.
> > > > > > > > > >
> > > > > > > > > > > I think it's a feature currently not supported by
> > > > > > > > > > > other devices specified by the current virtio spce.
> > > > > > > > > > Probably, but we've already had rfcs for roce and vhost-user.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > >
> > > >
> >



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-21  3:53                         ` Tony Lu
@ 2022-10-21  4:54                           ` Dust Li
  2022-10-21  5:13                             ` Tony Lu
  0 siblings, 1 reply; 61+ messages in thread
From: Dust Li @ 2022-10-21  4:54 UTC (permalink / raw)
  To: Tony Lu, Jason Wang
  Cc: Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Fri, Oct 21, 2022 at 11:53:10AM +0800, Tony Lu wrote:
>On Fri, Oct 21, 2022 at 11:09:19AM +0800, Jason Wang wrote:
>> On Fri, Oct 21, 2022 at 11:05 AM Tony Lu <tonylu@linux.alibaba.com> wrote:
>> >
>> > On Fri, Oct 21, 2022 at 10:47:29AM +0800, Jason Wang wrote:
>> > > On Wed, Oct 19, 2022 at 6:01 PM Tony Lu <tonylu@linux.alibaba.com> wrote:
>> > > >
>> > > > On Wed, Oct 19, 2022 at 05:04:58PM +0800, Jason Wang wrote:
>> > > > >
>> > > > > 在 2022/10/19 16:07, Tony Lu 写道:
>> > > > > > On Wed, Oct 19, 2022 at 02:02:21PM +0800, Xuan Zhuo wrote:
>> > > > > > > On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> > > > > > > > On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>> > > > > > > > > On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> > > > > > > > > > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>> > > > > > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > Hi Jason,
>> > > > > > > > > > >
>> > > > > > > > > > > I think there may be some problems with the direction we are discussing.
>> > > > > > > > > > Probably not.
>> > > > > > > > > >
>> > > > > > > > > > As far as we are focusing on technology, there's nothing wrong from my
>> > > > > > > > > > perspective. And this is how the community works. Your idea needs to
>> > > > > > > > > > be justified and people are free to raise any technical questions
>> > > > > > > > > > especially considering you've posted a spec change with prototype
>> > > > > > > > > > codes but not only the idea.
>> > > > > > > > > >
>> > > > > > > > > > > Our
>> > > > > > > > > > > goal is to add an new ism device. As far as the spec is concerned, we are not
>> > > > > > > > > > > concerned with the implementation of the backend.
>> > > > > > > > > > >
>> > > > > > > > > > > The direction we should discuss is what is the difference between the ism device
>> > > > > > > > > > > and other devices such as virtio-net, and whether it is necessary to introduce
>> > > > > > > > > > > this new device.
>> > > > > > > > > > This is somehow what I want to ask, actually it's not a comparison
>> > > > > > > > > > with virtio-net but:
>> > > > > > > > > >
>> > > > > > > > > > - virtio-roce
>> > > > > > > > > > - virtio-vhost-user
>> > > > > > > > > > - virtio-(p)mem
>> > > > > > > > > >
>> > > > > > > > > > or whether we can simply add features to those devices to achieve what
>> > > > > > > > > > you want to do here.
>> > > > > > > > >
>> > > > > > > > > Yes, this is my priority to discuss.
>> > > > > > > > >
>> > > > > > > > > At the moment, I think the most similar to ism is the Vhost-user Device Backend
>> > > > > > > > > of virtio-vhost-user.
>> > > > > > > > >
>> > > > > > > > > My understanding of it is to map any virtio device to another vm as a vvu
>> > > > > > > > > device.
>> > > > > > > > Yes, so a possible way is to have a device with memory zone/region
>> > > > > > > > provision and management then map it via virtio-vhost-user.
>> > > > > > >
>> > > > > > > Yes, there is such a possibility. virtio-vhost-user makes me feel that what can
>> > > > > > > be shared is the function implementation of map.
>> > > > > > >
>> > > > > > > But in the vm to provide the interface to the upper layer, I think this is the
>> > > > > > > work of ism.
>> > > > > > >
>> > > > > > > But one of the reasons why I didn't use virtio-vhost-user directly is that in
>> > > > > > > another vm, the guest can operate the vvu device, which we hope that both sides
>> > > > > > > are equal to the ism device.
>> > > > > > >
>> > > > > > > So I want to agree on a question first: who will provide the upper layer with
>> > > > > > > the ability to share the memory area?
>> > > > > > >
>> > > > > > > Our answer is a new ism device. How does this device achieve memory sharing, I
>> > > > > > > think is the second question.
>> > > > > > >
>> > > > > > >
>> > > > > > > > >  From this design purpose, I think the two are different.
>> > > > > > > > >
>> > > > > > > > > Of course, you might want to extend it, it does have some similarities and uses
>> > > > > > > > > a lot of similar techniques.
>> > > > > > > > I don't have any preference so far. If you think your idea makes more
>> > > > > > > > sense, then try your best to justify it in the list.
>> > > > > > > >
>> > > > > > > > > So we can really discuss in this direction, whether
>> > > > > > > > > the vvu device can be extended to achieve the purpose of ism, or whether the
>> > > > > > > > > design goals can be agreed.
>> > > > > > > > I've added Stefan in the loop, let's hear from him.
>> > > > > > > >
>> > > > > > > > > Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
>> > > > > > > > > Should device/driver APIs remain independent?
>> > > > > > > > Btw, you mentioned that one possible user of ism is the smc, but I
>> > > > > > > > don't see how it connects to that with your prototype driver.
>> > > > > > > Yes, we originally had plans, but the virtio spec was considered for submission,
>> > > > > > > so this was not included. Maybe, we should have included this part @Tony
>> > > > > > >
>> > > > > > > A brief introduction is that SMC currently has a corresponding
>> > > > > > > s390/net/ism_drv.c and we will replace this in the virtualization scenario.
>> > > > >
>> > > > >
>> > > > > Ok, I see. So I think the goal is to implement something in virtio that is
>> > > > > functional equivalent to IBM ISM device.
>> > > > >
>> > > >
>> > > > Yes, IBM ISM devices do something similar and it inspired this.
>> > >
>> > > Ok, it would be better to mention this in the cover letter of the next
>> > > version. This can ease the reviewers (IBM has some good docs of those
>> > > from the website).
>> > >
>> >
>> > Yes, we will do it.
>> 
>> Btw, I wonder about the plan to support live migration. E.g do we need
>> to hot unplug the ism device before the migration then we can fallback
>> to TCP/IP ?
>> 
>
>>>From the point view of SMC, SMC-R maintains multiple link (RDMA QP), it
>can live migrate existed connections to new link.
>
>Currently, yes, for SMC-D.

I think Jason means VM live migration from one host to another. Am I
right, Jason?

In that case, the shared memory from the ISM device is no longer valid,
so I think we have to hot unplug the device before the migration to notify
SMC that the SMC-D link is no longer usable.
IIUC, SMC-D doesn't support a transparent fallback to TCP/IP in this case
now. But I think we could make that happen, since SMC already supports link
migration between different RDMA devices.
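
As a minimal sketch of the hot-unplug path described above: assuming a future
version of the driver registers with SMC through the kernel's smcd_* interface
in include/net/smc.h (as drivers/s390/net/ism_drv.c does), hot-unplugging the
virtio-ism device would run the driver's remove callback, SMC would terminate
the SMC-D connections using that device, and the peers would have to
re-establish them (over TCP) after the migration. The struct and naming below
are hypothetical; the current prototype does not register with SMC yet.

    #include <linux/slab.h>
    #include <linux/virtio.h>
    #include <linux/virtio_config.h>
    #include <net/smc.h>

    /* Hypothetical driver state; the real prototype will differ. */
    struct virtio_ism {
            struct virtio_device *vdev;
            struct smcd_dev *smcd;
    };

    static void virtio_ism_remove(struct virtio_device *vdev)
    {
            struct virtio_ism *ism = vdev->priv;

            /* Unregister from SMC first: SMC terminates the SMC-D
             * connections that use this device. */
            smcd_unregister_dev(ism->smcd);
            smcd_free_dev(ism->smcd);

            virtio_reset_device(vdev);
            vdev->config->del_vqs(vdev);
            kfree(ism);
    }

A transparent fallback would then be an SMC-level feature (moving an
established SMC-D connection to TCP or to another device) rather than
something the device itself provides.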

Thanks

>
>Cheers,
>Tony Lu
>
>
>> Thanks
>> 
>> >
>> > > >
>> > > > >
>> > > > > > >
>> > > > > > > Thanks.
>> > > > > > >
>> > > > > > SMC is a network protocol which is modeled by shared memory rather than
>> > > > > > packet.
>> > > > >
>> > > > >
>> > > > > After reading more SMC from IBM website, I think you meant SMC-D here. And I
>> > > > > wonder in order to have a complete SMC solution we still need virtio-ROCE
>> > > > > for inter host communcation?
>> > > > >
>> > > >
>> > > > Mostly yes.
>> > > >
>> > > > SMC-D is the part of whole SMC solution. SMC supports multiple
>> > > > underlying device, -D means ISM device, -R means RDMA device. The key
>> > > > data model is shared memory, SMC uses RDMA (-R) or ISM(-D) to *share*
>> > > > memory between peers, and it will choose the suitable device on demand
>> > > > during handshaking. If there was no suitable device, it would fall back
>> > > > to TCP. So virtio-ROCE is not required.
>> > >
>> > > So the commniting peers on the same host we need SMC-D, in the future
>> > > we need to use RDMA to offload the communication among the peers of
>> > > different hosts. Then we can get fully transparent offload no matter
>> > > the peer is local or not.
>> > >
>> >
>> > Yes, this is what we want to do.
>> >
>> > > >
>> > > > >
>> > > > > >   Actually the basic required interfaces of SMC device are:
>> > > > > >
>> > > > > >    - alloc / free memory region, each connection peer has two memory
>> > > > > >     regions dynamically for sending and receiving ring buffer.
>> > > > > >    - attach / detach memory region, remote attaches local-allocated
>> > > > > >     sending region as receiving region, vice versa.
>> > > > > >    - notify, tell peer to read data and update cursor.
>> > > > > >
>> > > > > > Then the device can be registered as SMC ISM device. Of course, SMC
>> > > > > > also requires some modification to adapt it.
>> > > > >
>> > > > >
>> > > > > Looking at s390 ism driver it requires other stuffs like vlan add/remove or
>> > > > > gid query, do we need them as well?
>> > > >
>> > > > vlan is not required in this use case. ISM uses gid to identified each
>> > > > others, maybe we could implement it in virtio ways.
>> > >
>> > > I'd suggest adding the codes to register the driver to SMC/ISM in the
>> > > next version (instead of a simple procfs hooking). Then people can
>> > > easily play or review.
>> > >
>> >
>> > Ok, I will add the codes in the next version.
>> >
>> > Cheers,
>> > Tony Lu
>> >
>> > > Thanks
>> > >
>> > > >
>> > > > To support virtio-ism smoothly, the interfaces of ISM driver still need
>> > > > to be adjusted. I will put it on the table with IBM people.
>> > > >
>> > > > Cheers,
>> > > > Tony Lu
>> > > >
>> > > > >
>> > > > > Thanks
>> > > > >
>> > > > >
>> > > > > >
>> > > > > > Cheers,
>> > > > > > Tony Lu
>> > > > > >
>> > > > > > > > Thanks
>> > > > > > > >
>> > > > > > > > > Thanks.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > > > How to share the backend with other deivce is another problem.
>> > > > > > > > > > Yes, anything that is used for your virito-ism prototype can be used
>> > > > > > > > > > for other devices.
>> > > > > > > > > >
>> > > > > > > > > > > Our goal is to dynamically obtain a piece of memory to share with other vms.
>> > > > > > > > > > So at this level, I don't see the exact difference compared to
>> > > > > > > > > > virtio-vhost-user. Let's just focus on the API that carries on the
>> > > > > > > > > > semantic:
>> > > > > > > > > >
>> > > > > > > > > > - map/unmap
>> > > > > > > > > > - permission update
>> > > > > > > > > >
>> > > > > > > > > > The only missing piece is the per region notification.
>> > > > > > > > > >
>> > > > > > > > > > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
>> > > > > > > > > > > it will use it as a ring. Of course, we also need a notify mechanism.
>> > > > > > > > > > >
>> > > > > > > > > > > That's what we're aiming for, so we should first discuss whether this
>> > > > > > > > > > > requirement is reasonable.
>> > > > > > > > > > So unless somebody said "no", it is fine until now.
>> > > > > > > > > >
>> > > > > > > > > > > I think it's a feature currently not supported by
>> > > > > > > > > > > other devices specified by the current virtio spce.
>> > > > > > > > > > Probably, but we've already had rfcs for roce and vhost-user.
>> > > > > > > > > >
>> > > > > > > > > > Thanks
>> > > > > > > > > >
>> > > > > > > > > > > Thanks.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > >
>> >


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-21  4:54                           ` Dust Li
@ 2022-10-21  5:13                             ` Tony Lu
  2022-10-21  6:38                               ` Jason Wang
  0 siblings, 1 reply; 61+ messages in thread
From: Tony Lu @ 2022-10-21  5:13 UTC (permalink / raw)
  To: Dust Li
  Cc: Jason Wang, Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc,
	zhenzao, helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Fri, Oct 21, 2022 at 12:54:22PM +0800, Dust Li wrote:
> On Fri, Oct 21, 2022 at 11:53:10AM +0800, Tony Lu wrote:
> >On Fri, Oct 21, 2022 at 11:09:19AM +0800, Jason Wang wrote:
> >> On Fri, Oct 21, 2022 at 11:05 AM Tony Lu <tonylu@linux.alibaba.com> wrote:
> >> >
> >> > On Fri, Oct 21, 2022 at 10:47:29AM +0800, Jason Wang wrote:
> >> > > On Wed, Oct 19, 2022 at 6:01 PM Tony Lu <tonylu@linux.alibaba.com> wrote:
> >> > > >
> >> > > > On Wed, Oct 19, 2022 at 05:04:58PM +0800, Jason Wang wrote:
> >> > > > >
> >> > > > > 在 2022/10/19 16:07, Tony Lu 写道:
> >> > > > > > On Wed, Oct 19, 2022 at 02:02:21PM +0800, Xuan Zhuo wrote:
> >> > > > > > > On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >> > > > > > > > On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >> > > > > > > > > On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >> > > > > > > > > > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >> > > > > > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > Hi Jason,
> >> > > > > > > > > > >
> >> > > > > > > > > > > I think there may be some problems with the direction we are discussing.
> >> > > > > > > > > > Probably not.
> >> > > > > > > > > >
> >> > > > > > > > > > As far as we are focusing on technology, there's nothing wrong from my
> >> > > > > > > > > > perspective. And this is how the community works. Your idea needs to
> >> > > > > > > > > > be justified and people are free to raise any technical questions
> >> > > > > > > > > > especially considering you've posted a spec change with prototype
> >> > > > > > > > > > codes but not only the idea.
> >> > > > > > > > > >
> >> > > > > > > > > > > Our
> >> > > > > > > > > > > goal is to add an new ism device. As far as the spec is concerned, we are not
> >> > > > > > > > > > > concerned with the implementation of the backend.
> >> > > > > > > > > > >
> >> > > > > > > > > > > The direction we should discuss is what is the difference between the ism device
> >> > > > > > > > > > > and other devices such as virtio-net, and whether it is necessary to introduce
> >> > > > > > > > > > > this new device.
> >> > > > > > > > > > This is somehow what I want to ask, actually it's not a comparison
> >> > > > > > > > > > with virtio-net but:
> >> > > > > > > > > >
> >> > > > > > > > > > - virtio-roce
> >> > > > > > > > > > - virtio-vhost-user
> >> > > > > > > > > > - virtio-(p)mem
> >> > > > > > > > > >
> >> > > > > > > > > > or whether we can simply add features to those devices to achieve what
> >> > > > > > > > > > you want to do here.
> >> > > > > > > > >
> >> > > > > > > > > Yes, this is my priority to discuss.
> >> > > > > > > > >
> >> > > > > > > > > At the moment, I think the most similar to ism is the Vhost-user Device Backend
> >> > > > > > > > > of virtio-vhost-user.
> >> > > > > > > > >
> >> > > > > > > > > My understanding of it is to map any virtio device to another vm as a vvu
> >> > > > > > > > > device.
> >> > > > > > > > Yes, so a possible way is to have a device with memory zone/region
> >> > > > > > > > provision and management then map it via virtio-vhost-user.
> >> > > > > > >
> >> > > > > > > Yes, there is such a possibility. virtio-vhost-user makes me feel that what can
> >> > > > > > > be shared is the function implementation of map.
> >> > > > > > >
> >> > > > > > > But in the vm to provide the interface to the upper layer, I think this is the
> >> > > > > > > work of ism.
> >> > > > > > >
> >> > > > > > > But one of the reasons why I didn't use virtio-vhost-user directly is that in
> >> > > > > > > another vm, the guest can operate the vvu device, which we hope that both sides
> >> > > > > > > are equal to the ism device.
> >> > > > > > >
> >> > > > > > > So I want to agree on a question first: who will provide the upper layer with
> >> > > > > > > the ability to share the memory area?
> >> > > > > > >
> >> > > > > > > Our answer is a new ism device. How does this device achieve memory sharing, I
> >> > > > > > > think is the second question.
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > > >  From this design purpose, I think the two are different.
> >> > > > > > > > >
> >> > > > > > > > > Of course, you might want to extend it, it does have some similarities and uses
> >> > > > > > > > > a lot of similar techniques.
> >> > > > > > > > I don't have any preference so far. If you think your idea makes more
> >> > > > > > > > sense, then try your best to justify it in the list.
> >> > > > > > > >
> >> > > > > > > > > So we can really discuss in this direction, whether
> >> > > > > > > > > the vvu device can be extended to achieve the purpose of ism, or whether the
> >> > > > > > > > > design goals can be agreed.
> >> > > > > > > > I've added Stefan in the loop, let's hear from him.
> >> > > > > > > >
> >> > > > > > > > > Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
> >> > > > > > > > > Should device/driver APIs remain independent?
> >> > > > > > > > Btw, you mentioned that one possible user of ism is the smc, but I
> >> > > > > > > > don't see how it connects to that with your prototype driver.
> >> > > > > > > Yes, we originally had plans, but the virtio spec was considered for submission,
> >> > > > > > > so this was not included. Maybe, we should have included this part @Tony
> >> > > > > > >
> >> > > > > > > A brief introduction is that SMC currently has a corresponding
> >> > > > > > > s390/net/ism_drv.c and we will replace this in the virtualization scenario.
> >> > > > >
> >> > > > >
> >> > > > > Ok, I see. So I think the goal is to implement something in virtio that is
> >> > > > > functional equivalent to IBM ISM device.
> >> > > > >
> >> > > >
> >> > > > Yes, IBM ISM devices do something similar and it inspired this.
> >> > >
> >> > > Ok, it would be better to mention this in the cover letter of the next
> >> > > version. This can ease the reviewers (IBM has some good docs of those
> >> > > from the website).
> >> > >
> >> >
> >> > Yes, we will do it.
> >> 
> >> Btw, I wonder about the plan to support live migration. E.g do we need
> >> to hot unplug the ism device before the migration then we can fallback
> >> to TCP/IP ?
> >> 
> >
> >From the point view of SMC, SMC-R maintains multiple link (RDMA QP), it
> >can live migrate existed connections to new link.
> >
> >Currently, yes, for SMC-D.
> 
> I think Jason means VM live migration from one Host to another. Am I
> right, Jason ?
> 
> In that case, the share memory from the ISM device is no longer valid,
> I think we have to hot unplug before the migration to notify SMC that
> the SMC-D link is no longer usable.

Yes, this is what I mean ;-) SMC-D needs to unplug the device.

> IIUC, SMC-D doesn't support transparently fallback to TCP/IP in this case
> now. But I think we could make that happen, since SMC already support link
> migration between different RDMA devices.

Yes, currently SMC-D doesn't support migration to another device or
fallback. And SMC-R supports migration to another link, no fallback.

Cheers,
Tony Lu

> Thanks
> 
> >
> >Cheers,
> >Tony Lu
> >
> >
> >> Thanks
> >> 
> >> >
> >> > > >
> >> > > > >
> >> > > > > > >
> >> > > > > > > Thanks.
> >> > > > > > >
> >> > > > > > SMC is a network protocol which is modeled by shared memory rather than
> >> > > > > > packet.
> >> > > > >
> >> > > > >
> >> > > > > After reading more SMC from IBM website, I think you meant SMC-D here. And I
> >> > > > > wonder in order to have a complete SMC solution we still need virtio-ROCE
> >> > > > > for inter host communcation?
> >> > > > >
> >> > > >
> >> > > > Mostly yes.
> >> > > >
> >> > > > SMC-D is the part of whole SMC solution. SMC supports multiple
> >> > > > underlying device, -D means ISM device, -R means RDMA device. The key
> >> > > > data model is shared memory, SMC uses RDMA (-R) or ISM(-D) to *share*
> >> > > > memory between peers, and it will choose the suitable device on demand
> >> > > > during handshaking. If there was no suitable device, it would fall back
> >> > > > to TCP. So virtio-ROCE is not required.
> >> > >
> >> > > So the commniting peers on the same host we need SMC-D, in the future
> >> > > we need to use RDMA to offload the communication among the peers of
> >> > > different hosts. Then we can get fully transparent offload no matter
> >> > > the peer is local or not.
> >> > >
> >> >
> >> > Yes, this is what we want to do.
> >> >
> >> > > >
> >> > > > >
> >> > > > > >   Actually the basic required interfaces of SMC device are:
> >> > > > > >
> >> > > > > >    - alloc / free memory region, each connection peer has two memory
> >> > > > > >     regions dynamically for sending and receiving ring buffer.
> >> > > > > >    - attach / detach memory region, remote attaches local-allocated
> >> > > > > >     sending region as receiving region, vice versa.
> >> > > > > >    - notify, tell peer to read data and update cursor.
> >> > > > > >
> >> > > > > > Then the device can be registered as SMC ISM device. Of course, SMC
> >> > > > > > also requires some modification to adapt it.
> >> > > > >
> >> > > > >
> >> > > > > Looking at s390 ism driver it requires other stuffs like vlan add/remove or
> >> > > > > gid query, do we need them as well?
> >> > > >
> >> > > > vlan is not required in this use case. ISM uses gid to identified each
> >> > > > others, maybe we could implement it in virtio ways.
> >> > >
> >> > > I'd suggest adding the codes to register the driver to SMC/ISM in the
> >> > > next version (instead of a simple procfs hooking). Then people can
> >> > > easily play or review.
> >> > >
> >> >
> >> > Ok, I will add the codes in the next version.
> >> >
> >> > Cheers,
> >> > Tony Lu
> >> >
> >> > > Thanks
> >> > >
> >> > > >
> >> > > > To support virtio-ism smoothly, the interfaces of ISM driver still need
> >> > > > to be adjusted. I will put it on the table with IBM people.
> >> > > >
> >> > > > Cheers,
> >> > > > Tony Lu
> >> > > >
> >> > > > >
> >> > > > > Thanks
> >> > > > >
> >> > > > >
> >> > > > > >
> >> > > > > > Cheers,
> >> > > > > > Tony Lu
> >> > > > > >
> >> > > > > > > > Thanks
> >> > > > > > > >
> >> > > > > > > > > Thanks.
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > > > How to share the backend with other deivce is another problem.
> >> > > > > > > > > > Yes, anything that is used for your virito-ism prototype can be used
> >> > > > > > > > > > for other devices.
> >> > > > > > > > > >
> >> > > > > > > > > > > Our goal is to dynamically obtain a piece of memory to share with other vms.
> >> > > > > > > > > > So at this level, I don't see the exact difference compared to
> >> > > > > > > > > > virtio-vhost-user. Let's just focus on the API that carries on the
> >> > > > > > > > > > semantic:
> >> > > > > > > > > >
> >> > > > > > > > > > - map/unmap
> >> > > > > > > > > > - permission update
> >> > > > > > > > > >
> >> > > > > > > > > > The only missing piece is the per region notification.
> >> > > > > > > > > >
> >> > > > > > > > > > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> >> > > > > > > > > > > it will use it as a ring. Of course, we also need a notify mechanism.
> >> > > > > > > > > > >
> >> > > > > > > > > > > That's what we're aiming for, so we should first discuss whether this
> >> > > > > > > > > > > requirement is reasonable.
> >> > > > > > > > > > So unless somebody said "no", it is fine until now.
> >> > > > > > > > > >
> >> > > > > > > > > > > I think it's a feature currently not supported by
> >> > > > > > > > > > > other devices specified by the current virtio spce.
> >> > > > > > > > > > Probably, but we've already had rfcs for roce and vhost-user.
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks
> >> > > > > > > > > >
> >> > > > > > > > > > > Thanks.
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > >
> >> >


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-21  3:03                     ` Xuan Zhuo
@ 2022-10-21  6:35                       ` Jason Wang
  0 siblings, 0 replies; 61+ messages in thread
From: Jason Wang @ 2022-10-21  6:35 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtio-dev, hans, herongguang, zmlcc, dust.li, tonylu, zhenzao,
	helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Fri, Oct 21, 2022 at 11:26 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Fri, 21 Oct 2022 10:42:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Oct 19, 2022 at 5:23 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 19 Oct 2022 17:11:21 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Wed, Oct 19, 2022 at 4:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Wed, 19 Oct 2022 16:13:07 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > On Wed, 19 Oct 2022 16:01:42 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > > Adding Stefan.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hello everyone,
> > > > > > > > > > > >
> > > > > > > > > > > > # Background
> > > > > > > > > > > >
> > > > > > > > > > > > Nowadays, there is a common scenario to accelerate communication between
> > > > > > > > > > > > different VMs and containers, including light weight virtual machine based
> > > > > > > > > > > > containers. One way to achieve this is to colocate them on the same host.
> > > > > > > > > > > > However, the performance of inter-VM communication through network stack is not
> > > > > > > > > > > > optimal and may also waste extra CPU cycles. This scenario has been discussed
> > > > > > > > > > > > many times, but still no generic solution available [1] [2] [3].
> > > > > > > > > > > >
> > > > > > > > > > > > With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> > > > > > > > > > > > We found that by changing the communication channel between VMs from TCP to SMC
> > > > > > > > > > > > with shared memory, we can achieve superior performance for a common
> > > > > > > > > > > > socket-based application[5]:
> > > > > > > > > > > >   - latency reduced by about 50%
> > > > > > > > > > > >   - throughput increased by about 300%
> > > > > > > > > > > >   - CPU consumption reduced by about 50%
> > > > > > > > > > > >
> > > > > > > > > > > > Since there is no particularly suitable shared memory management solution
> > > > > > > > > > > > matches the need for SMC(See ## Comparison with existing technology), and virtio
> > > > > > > > > > > > is the standard for communication in the virtualization world, we want to
> > > > > > > > > > > > implement a virtio-ism device based on virtio, which can support on-demand
> > > > > > > > > > > > memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> > > > > > > > > > > > the virtio-ism device need to support:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Dynamic provision: shared memory regions are dynamically allocated and
> > > > > > > > > > > >    provisioned.
> > > > > > > > > > > > 2. Multi-region management: the shared memory is divided into regions,
> > > > > > > > > > > >    and a peer may allocate one or more regions from the same shared memory
> > > > > > > > > > > >    device.
> > > > > > > > > > > > 3. Permission control: The permission of each region can be set seperately.
> > > > > > > > > > >
> > > > > > > > > > > Looks like virtio-ROCE
> > > > > > > > > > >
> > > > > > > > > > > https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> > > > > > > > > > >
> > > > > > > > > > > and virtio-vhost-user can satisfy the requirement?
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > # Virtio ism device
> > > > > > > > > > > >
> > > > > > > > > > > > ISM devices provide the ability to share memory between different guests on a
> > > > > > > > > > > > host. A guest's memory got from ism device can be shared with multiple peers at
> > > > > > > > > > > > the same time. This shared relationship can be dynamically created and released.
> > > > > > > > > > > >
> > > > > > > > > > > > The shared memory obtained from the device is divided into multiple ism regions
> > > > > > > > > > > > for share. ISM device provides a mechanism to notify other ism region referrers
> > > > > > > > > > > > of content update events.
> > > > > > > > > > > >
> > > > > > > > > > > > # Usage (SMC as example)
> > > > > > > > > > > >
> > > > > > > > > > > > Maybe there is one of possible use cases:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> > > > > > > > > > > >    location of a memory region in the PCI space and a token.
> > > > > > > > > > > > 2. The ism driver mmap the memory region and return to SMC with the token
> > > > > > > > > > > > 3. SMC passes the token to the connected peer
> > > > > > > > > > > > 3. the peer calls the ism driver interface ism_attach_region(token) to
> > > > > > > > > > > >    get the location of the PCI space of the shared memory
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > # About hot plugging of the ism device
> > > > > > > > > > > >
> > > > > > > > > > > >    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> > > > > > > > > > > >    less scalable operation. So, we don't plan to support it for now.
> > > > > > > > > > > >
> > > > > > > > > > > > # Comparison with existing technology
> > > > > > > > > > > >
> > > > > > > > > > > > ## ivshmem or ivshmem 2.0 of Qemu
> > > > > > > > > > > >
> > > > > > > > > > > >    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> > > > > > > > > > > >    use this VM, so the security is not enough.
> > > > > > > > > > > >
> > > > > > > > > > > >    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> > > > > > > > > > > >    other VMs that use the ivshmem 2.0 shared memory device, which also does not
> > > > > > > > > > > >    meet our needs in terms of security.
> > > > > > > > > > > >
> > > > > > > > > > > > ## vhost-pci and virtiovhostuser
> > > > > > > > > > > >
> > > > > > > > > > > >    Does not support dynamic allocation and therefore not suitable for SMC.
> > > > > > > > > > >
> > > > > > > > > > > I think this is an implementation issue, we can support VHOST IOTLB
> > > > > > > > > > > message then the regions could be added/removed on demand.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 1. After the attacker connects with the victim, if the attacker does not
> > > > > > > > > >    dereference memory, the memory will be occupied under virtiovhostuser. In the
> > > > > > > > > >    case of ism devices, the victim can directly release the reference, and the
> > > > > > > > > >    maliciously referenced region only occupies the attacker's resources
> > > > > > > > >
> > > > > > > > > Let's define the security boundary here. E.g do we trust the device or
> > > > > > > > > not? If yes, in the case of virtiovhostuser, can we simple do
> > > > > > > > > VHOST_IOTLB_UNMAP then we can safely release the memory from the
> > > > > > > > > attacker.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> > > > > > > > > >    time, which is a challenge for virtiovhostuser
> > > > > > > > >
> > > > > > > > > Please elaborate more the the challenges, anything make
> > > > > > > > > virtiovhostuser different?
> > > > > > > >
> > > > > > > > I understand (please point out any mistakes), one vvu device corresponds to one
> > > > > > > > vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
> > > > > > >
> > > > > > > There could be some misunderstanding here. With 1000 VM, you still
> > > > > > > need 1000 virtio-sim devices I think.
> > > > > >
> > > > > > No, just use a virtio-ism device.
> > > > >
> > > > > For example, if the hardware memory of a virtio-ism is 1G, and an ism region is
> > > > > 1M, there are 1000 ism regions, and these ism regions can be shared with
> > > > > different vms.
> > > >
> > > > Right, this is what I've understood.
> > > >
> > > > What I want to say this might be achieved with virtio-vhost-user as
> > > > well. But it may require a some changes on the protocol which I'm not
> > > > sure it's worth to bother. And I've started to think about the
> > > > possibility to build virtio-vhost-user on top (I don't see any blocker
> > > > so far).
> > >
> > > Yes, it is theoretically possible to implement based on virtio-vhost-user. But
> > > when we try to implement it without depending on virtio-vhost-user, this
> > > implementation is also very simple. Because the physical memory it shares does
> > > not come from a vm, but from the host.
> > >
> > > So I think we have reached an agreement on the relationship between ism and
> > > virtio-vhost-user. ism is used to provide shared memory to the upper layer, and
> > > this device should be necessary to add (of course, listen to some other people's
> > > opinions). And How is its backend shared with other vms? This is our second
> > > question.
> >
> > I'm not sure I get the question, but we're sharing memory not backend?
>
>
> In the design of traditional devices such as virtio-net, a piece of memory is
> allocated by guest A and then handed over to the backend for use.
> virtio-vhost-user allows another guest B to access guest A's memory.

If you meant the RFC patch that was posted, yes. But actually,
virtio-vhost-user could also be used to implement e.g. the host handing
over memory for the guest to use?

>
> Our approach is that the memory is allocated by the backend. When alloc/attach,
> just insert the memory into the guest's memory space using
> memory_region_add_subregion(). That's why we don't use vhost-user in our
> implementation.
>
> On the other hand, we are also looking in the other direction. If the memory is
> allocated by one vm in the guest, then we have to use the vhost-user protocol.

Probably not? It would work just as if all the regions were
pre-allocated, as in the ISM case.

Similarly, if we use virtio-vhost-user, we just need a new IOTLB
message to allocate memory (or reuse VHOST_IOTLB_UPDATE).
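
Just to sketch what I mean here (illustrative only; struct
vhost_iotlb_msg and the VHOST_IOTLB_*/VHOST_ACCESS_* values are the
existing uapi in <linux/vhost_types.h>, while the ism_region struct and
the helper are made-up names for the example):

    #include <linux/vhost_types.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical per-region bookkeeping on the backend side. */
    struct ism_region {
            uint64_t guest_iova;  /* where the region appears in the guest */
            uint64_t host_uaddr;  /* backing memory on the host side */
            uint64_t size;
    };

    /*
     * "Attach region" becomes a plain IOTLB update and "detach" an
     * invalidate, so no new message type is strictly required.
     */
    static void ism_region_to_iotlb(const struct ism_region *r, bool attach,
                                    struct vhost_iotlb_msg *msg)
    {
            memset(msg, 0, sizeof(*msg));
            msg->iova  = r->guest_iova;
            msg->size  = r->size;
            msg->uaddr = r->host_uaddr;
            msg->perm  = VHOST_ACCESS_RW;
            msg->type  = attach ? VHOST_IOTLB_UPDATE : VHOST_IOTLB_INVALIDATE;
    }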

>
> 1. The advantage of this is that it will be more convenient in resource
>    management
>
> 2. Using the vhost-user protocol on the backend implementation will be more
>    complicated than our current solution.
>
> 3. If the peer is malicious, then we have to unmap the memory mapping of the
>    peer. (This has been discussed in another email, and it should be possible.)

This only works if the peer's VMM is trusted.

Thanks

>
> Thanks.
>
>
>
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > And it is dynamic. After an ism region is shared with a vm, it can be shared
> > > > > with other vms.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> > > > > > > > > >    determines the sharing relationship at startup.
> > > > > > > > >
> > > > > > > > > Not necessarily with IOTLB API?
> > > > > > > >
> > > > > > > > Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
> > > > > > > > provide the same memory on the host to two vms. So the implementation of this
> > > > > > > > part will be much simpler. This is why we gave up virtio-vhost-user at the
> > > > > > > > beginning.
> > > > > > >
> > > > > > > Ok, just to make sure we're at the same page. From spec level,
> > > > > > > virtio-vhost-user doesn't (can't) limit the backend to be implemented
> > > > > > > in another VM. So it should be ok to be used for sharing memory
> > > > > > > between a guest and host.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 4. For security issues, the device under virtiovhostuser may mmap more memory,
> > > > > > > > > >    while ism only maps one region to other devices
> > > > > > > > >
> > > > > > > > > With VHOST_IOTLB_MAP, the map could be done per region.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > # Design
> > > > > > > > > > > >
> > > > > > > > > > > >    This is a structure diagram based on ism sharing between two vms.
> > > > > > > > > > > >
> > > > > > > > > > > >     |-------------------------------------------------------------------------------------------------------------|
> > > > > > > > > > > >     | |------------------------------------------------|       |------------------------------------------------| |
> > > > > > > > > > > >     | | Guest                                          |       | Guest                                          | |
> > > > > > > > > > > >     | |                                                |       |                                                | |
> > > > > > > > > > > >     | |   ----------------                             |       |   ----------------                             | |
> > > > > > > > > > > >     | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> > > > > > > > > > > >     | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> > > > > > > > > > > >     | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> > > > > > > > > > > >     | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> > > > > > > > > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > > > > > > > >     | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> > > > > > > > > > > >     | |    |  |                -------------------     |       |    |  |                --------------------    | |
> > > > > > > > > > > >     | |                                |               |       |                               |                | |
> > > > > > > > > > > >     | |                                |               |       |                               |                | |
> > > > > > > > > > > >     | | Qemu                           |               |       | Qemu                          |                | |
> > > > > > > > > > > >     | |--------------------------------+---------------|       |-------------------------------+----------------| |
> > > > > > > > > > > >     |                                  |                                                       |                  |
> > > > > > > > > > > >     |                                  |                                                       |                  |
> > > > > > > > > > > >     |                                  |------------------------------+------------------------|                  |
> > > > > > > > > > > >     |                                                                 |                                           |
> > > > > > > > > > > >     |                                                                 |                                           |
> > > > > > > > > > > >     |                                                   --------------------------                                |
> > > > > > > > > > > >     |                                                    | M1 |   | M2 |   | M3 |                                 |
> > > > > > > > > > > >     |                                                   --------------------------                                |
> > > > > > > > > > > >     |                                                                                                             |
> > > > > > > > > > > >     | HOST                                                                                                        |
> > > > > > > > > > > >     ---------------------------------------------------------------------------------------------------------------
> > > > > > > > > > > >
> > > > > > > > > > > > # POC code
> > > > > > > > > > > >
> > > > > > > > > > > >    Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> > > > > > > > > > > >    Qemu:   https://github.com/fengidri/qemu/commits/ism
> > > > > > > > > > > >
> > > > > > > > > > > > If there are any problems, please point them out.
> > > > > > > > > > > >
> > > > > > > > > > > > Hope to hear from you, thank you.
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> > > > > > > > > > > > [2] https://dl.acm.org/doi/10.1145/2847562
> > > > > > > > > > > > [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> > > > > > > > > > > > [4] https://lwn.net/Articles/711071/
> > > > > > > > > > > > [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Xuan Zhuo (2):
> > > > > > > > > > > >   Reserve device id for ISM device
> > > > > > > > > > > >   virtio-ism: introduce new device virtio-ism
> > > > > > > > > > > >
> > > > > > > > > > > >  content.tex    |   3 +
> > > > > > > > > > > >  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > > > > >  2 files changed, 343 insertions(+)
> > > > > > > > > > > >  create mode 100644 virtio-ism.tex
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > >
> > > > >
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-21  3:30                         ` Dust Li
@ 2022-10-21  6:37                           ` Jason Wang
  2022-10-21  9:26                             ` Dust Li
  0 siblings, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-10-21  6:37 UTC (permalink / raw)
  To: dust.li
  Cc: Xuan Zhuo, Gerry, virtio-dev, hans, herongguang, zmlcc, tonylu,
	zhenzao, helinguo, mst, cohuck, Stefan Hajnoczi

On Fri, Oct 21, 2022 at 11:30 AM Dust Li <dust.li@linux.alibaba.com> wrote:
>
> On Fri, Oct 21, 2022 at 10:41:26AM +0800, Jason Wang wrote:
> >On Wed, Oct 19, 2022 at 5:27 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >>
> >> On Wed, 19 Oct 2022 17:15:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >> > On Wed, Oct 19, 2022 at 5:12 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >> > >
> >> > > On Wed, 19 Oct 2022 17:08:29 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >> > > > On Wed, Oct 19, 2022 at 4:21 PM Dust Li <dust.li@linux.alibaba.com> wrote:
> >> > > > >
> >> > > > > On Wed, Oct 19, 2022 at 04:03:42PM +0800, Gerry wrote:
> >> > > > > >
> >> > > > > >
> >> > > > > >> On Oct 19, 2022, at 16:01, Jason Wang <jasowang@redhat.com> wrote:
> >> > > > > >>
> >> > > > > >> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >> > > > > >>>
> >> > > > > >>> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >> > > > > >>>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >> > > > > >>>>>
> >> > > > > >>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >> > > > > >>>>>> Adding Stefan.
> >> > > > > >>>>>>
> >> > > > > >>>>>>
> >> > > > > >>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> Hello everyone,
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> # Background
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> Nowadays, there is a common scenario to accelerate communication between
> >> > > > > >>>>>>> different VMs and containers, including light weight virtual machine based
> >> > > > > >>>>>>> containers. One way to achieve this is to colocate them on the same host.
> >> > > > > >>>>>>> However, the performance of inter-VM communication through network stack is not
> >> > > > > >>>>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
> >> > > > > >>>>>>> many times, but still no generic solution available [1] [2] [3].
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> >> > > > > >>>>>>> We found that by changing the communication channel between VMs from TCP to SMC
> >> > > > > >>>>>>> with shared memory, we can achieve superior performance for a common
> >> > > > > >>>>>>> socket-based application[5]:
> >> > > > > >>>>>>>  - latency reduced by about 50%
> >> > > > > >>>>>>>  - throughput increased by about 300%
> >> > > > > >>>>>>>  - CPU consumption reduced by about 50%
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> Since there is no particularly suitable shared memory management solution
> >> > > > > >>>>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
> >> > > > > >>>>>>> is the standard for communication in the virtualization world, we want to
> >> > > > > >>>>>>> implement a virtio-ism device based on virtio, which can support on-demand
> >> > > > > >>>>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> >> > > > > >>>>>>> the virtio-ism device need to support:
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
> >> > > > > >>>>>>>   provisioned.
> >> > > > > >>>>>>> 2. Multi-region management: the shared memory is divided into regions,
> >> > > > > >>>>>>>   and a peer may allocate one or more regions from the same shared memory
> >> > > > > >>>>>>>   device.
> >> > > > > >>>>>>> 3. Permission control: The permission of each region can be set seperately.
> >> > > > > >>>>>>
> >> > > > > >>>>>> Looks like virtio-ROCE
> >> > > > > >>>>>>
> >> > > > > >>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> >> > > > > >>>>>>
> >> > > > > >>>>>> and virtio-vhost-user can satisfy the requirement?
> >> > > > > >>>>>>
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> # Virtio ism device
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> ISM devices provide the ability to share memory between different guests on a
> >> > > > > >>>>>>> host. A guest's memory got from ism device can be shared with multiple peers at
> >> > > > > >>>>>>> the same time. This shared relationship can be dynamically created and released.
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> The shared memory obtained from the device is divided into multiple ism regions
> >> > > > > >>>>>>> for share. ISM device provides a mechanism to notify other ism region referrers
> >> > > > > >>>>>>> of content update events.
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> # Usage (SMC as example)
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> Maybe there is one of possible use cases:
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> >> > > > > >>>>>>>   location of a memory region in the PCI space and a token.
> >> > > > > >>>>>>> 2. The ism driver mmap the memory region and return to SMC with the token
> >> > > > > >>>>>>> 3. SMC passes the token to the connected peer
> >> > > > > >>>>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
> >> > > > > >>>>>>>   get the location of the PCI space of the shared memory
> >> > > > > >>>>>>>
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> # About hot plugging of the ism device
> >> > > > > >>>>>>>
> >> > > > > >>>>>>>   Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> >> > > > > >>>>>>>   less scalable operation. So, we don't plan to support it for now.
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> # Comparison with existing technology
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> ## ivshmem or ivshmem 2.0 of Qemu
> >> > > > > >>>>>>>
> >> > > > > >>>>>>>   1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> >> > > > > >>>>>>>   use this VM, so the security is not enough.
> >> > > > > >>>>>>>
> >> > > > > >>>>>>>   2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> >> > > > > >>>>>>>   other VMs that use the ivshmem 2.0 shared memory device, which also does not
> >> > > > > >>>>>>>   meet our needs in terms of security.
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> ## vhost-pci and virtiovhostuser
> >> > > > > >>>>>>>
> >> > > > > >>>>>>>   Does not support dynamic allocation and therefore not suitable for SMC.
> >> > > > > >>>>>>
> >> > > > > >>>>>> I think this is an implementation issue, we can support VHOST IOTLB
> >> > > > > >>>>>> message then the regions could be added/removed on demand.
> >> > > > > >>>>>
> >> > > > > >>>>>
> >> > > > > >>>>> 1. After the attacker connects with the victim, if the attacker does not
> >> > > > > >>>>>   dereference memory, the memory will be occupied under virtiovhostuser. In the
> >> > > > > >>>>>   case of ism devices, the victim can directly release the reference, and the
> >> > > > > >>>>>   maliciously referenced region only occupies the attacker's resources
> >> > > > > >>>>
> >> > > > > >>>> Let's define the security boundary here. E.g do we trust the device or
> >> > > > > >>>> not? If yes, in the case of virtiovhostuser, can we simple do
> >> > > > > >>>> VHOST_IOTLB_UNMAP then we can safely release the memory from the
> >> > > > > >>>> attacker.
> >> > > > > >>>>
> >> > > > > >>>>>
> >> > > > > >>>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
> >> > > > > >>>>>   time, which is a challenge for virtiovhostuser
> >> > > > > >>>>
> >> > > > > >>>> Please elaborate more the the challenges, anything make
> >> > > > > >>>> virtiovhostuser different?
> >> > > > > >>>
> >> > > > > >>> I understand (please point out any mistakes), one vvu device corresponds to one
> >> > > > > >>> vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
> >> > > > > >>
> >> > > > > >> There could be some misunderstanding here. With 1000 VM, you still
> >> > > > > >> need 1000 virtio-sim devices I think.
> >> > > > > >We are trying to achieve one virtio-ism device per vm instead of one virtio-ism device per SMC connection.
> >> > > >
> >> > > > I wonder if we need something to identify a virtio-ism device since I
> >> > > > guess there's still a chance to have multiple virtio-ism device per VM
> >> > > > (different service chain etc).
> >> > >
> >> > > Yes, there will be such a situation, a vm has multiple virtio-ism devices.
> >> > >
> >> > > What exactly do you mean by "identify"?
> >> >
> >> > E.g we can differ two virtio-net through mac address, do we need
> >> > something similar for ism, or it's completely unncessary (e.g via
> >> > token or other) ?
> >>
> >> Currently, we have not encountered such a request.
> >>
> >> It is conceivable that all physical shared memory ism regions are indexed by
> >> tokens. virtio-ism is a way to obtain these ism regions, so there is no need to
> >> distinguish multiple virtio-ism devices under one vm on the host.
> >
> >So consider a case:
> >
> >VM1 shares ism1 with VM2
> >VM1 shares ism2 with VM3
> >
> >How do application/smc address the different ism device in this case?
> >E.g if VM1 want to talk with VM3 it needs to populate regions in ism2,
> >but how can application or protocol knows this and how can a specific
> >device to be addressed (via BDF?)
>
> In our design, we do have a dev_id for each ISM device.
> Currently, we used it to do permission management, I think
> it can be used to identify different ISM devices.
>
> The spec says:
>
> +\begin{description}
> +\item[\field{dev_id}]      the id of the device.

I see, so we need some clarification. E.g. is it a UUID or not?
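
For example, if it were a UUID, the config could end up looking roughly
like this (purely illustrative, not a spec proposal; the field names
follow the snippet you quoted, the types are guesses and the spec would
express the multi-byte fields as le64):

    #include <stdint.h>

    struct virtio_ism_config {
            uint8_t  dev_id[16];   /* UUID rather than a bare integer id */
            uint64_t region_size;  /* the size of every ism region */
            uint64_t notify_size;  /* the size of the notify address */
    };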

Thanks

> +\item[\field{region_size}] the size of the every ism region
> +\item[\field{notify_size}] the size of the notify address.
>
> <...>
>
> +The device MUST regenerate a \field{dev_id}. \field{dev_id} remains unchanged
> +during reset. \field{dev_id} MUST NOT be 0;
>
> Thanks
>
> >
> >Thanks
> >
> >>
> >> Thanks.
> >>
> >>
> >> >
> >> > Thanks
> >> >
> >> > >
> >> > > Thanks.
> >> > >
> >> > >
> >> > > >
> >> > > > Thanks
> >> > > >
> >> > > > >
> >> > > > > I think we must achieve this if we want to meet the requirements of SMC.
> >> > > > > In SMC, a SMC socket(Corresponding to a TCP socket) need 2 memory
> >> > > > > regions(1 for Tx and 1 for Rx). So if we have 1K TCP connections,
> >> > > > > we'll need 2K share memory regions, and those memory regions are
> >> > > > > dynamically allocated and freed with the TCP socket.
> >> > > > >
> >> > > > > >
> >> > > > > >>
> >> > > > > >>>
> >> > > > > >>>
> >> > > > > >>>>
> >> > > > > >>>>>
> >> > > > > >>>>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
> >> > > > > >>>>>   determines the sharing relationship at startup.
> >> > > > > >>>>
> >> > > > > >>>> Not necessarily with IOTLB API?
> >> > > > > >>>
> >> > > > > >>> Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
> >> > > > > >>> provide the same memory on the host to two vms. So the implementation of this
> >> > > > > >>> part will be much simpler. This is why we gave up virtio-vhost-user at the
> >> > > > > >>> beginning.
> >> > > > > >>
> >> > > > > >> Ok, just to make sure we're at the same page. From spec level,
> >> > > > > >> virtio-vhost-user doesn't (can't) limit the backend to be implemented
> >> > > > > >> in another VM. So it should be ok to be used for sharing memory
> >> > > > > >> between a guest and host.
> >> > > > > >>
> >> > > > > >> Thanks
> >> > > > > >>
> >> > > > > >>>
> >> > > > > >>> Thanks.
> >> > > > > >>>
> >> > > > > >>>
> >> > > > > >>>>
> >> > > > > >>>>>
> >> > > > > >>>>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
> >> > > > > >>>>>   while ism only maps one region to other devices
> >> > > > > >>>>
> >> > > > > >>>> With VHOST_IOTLB_MAP, the map could be done per region.
> >> > > > > >>>>
> >> > > > > >>>> Thanks
> >> > > > > >>>>
> >> > > > > >>>>>
> >> > > > > >>>>> Thanks.
> >> > > > > >>>>>
> >> > > > > >>>>>>
> >> > > > > >>>>>> Thanks
> >> > > > > >>>>>>
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> # Design
> >> > > > > >>>>>>>
> >> > > > > >>>>>>>   This is a structure diagram based on ism sharing between two vms.
> >> > > > > >>>>>>>
> >> > > > > >>>>>>>    |-------------------------------------------------------------------------------------------------------------|
> >> > > > > >>>>>>>    | |------------------------------------------------|       |------------------------------------------------| |
> >> > > > > >>>>>>>    | | Guest                                          |       | Guest                                          | |
> >> > > > > >>>>>>>    | |                                                |       |                                                | |
> >> > > > > >>>>>>>    | |   ----------------                             |       |   ----------------                             | |
> >> > > > > >>>>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
> >> > > > > >>>>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
> >> > > > > >>>>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
> >> > > > > >>>>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
> >> > > > > >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
> >> > > > > >>>>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
> >> > > > > >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
> >> > > > > >>>>>>>    | |                                |               |       |                               |                | |
> >> > > > > >>>>>>>    | |                                |               |       |                               |                | |
> >> > > > > >>>>>>>    | | Qemu                           |               |       | Qemu                          |                | |
> >> > > > > >>>>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
> >> > > > > >>>>>>>    |                                  |                                                       |                  |
> >> > > > > >>>>>>>    |                                  |                                                       |                  |
> >> > > > > >>>>>>>    |                                  |------------------------------+------------------------|                  |
> >> > > > > >>>>>>>    |                                                                 |                                           |
> >> > > > > >>>>>>>    |                                                                 |                                           |
> >> > > > > >>>>>>>    |                                                   --------------------------                                |
> >> > > > > >>>>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
> >> > > > > >>>>>>>    |                                                   --------------------------                                |
> >> > > > > >>>>>>>    |                                                                                                             |
> >> > > > > >>>>>>>    | HOST                                                                                                        |
> >> > > > > >>>>>>>    ---------------------------------------------------------------------------------------------------------------
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> # POC code
> >> > > > > >>>>>>>
> >> > > > > >>>>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
> >> > > > > >>>>>>>   Qemu:   https://github.com/fengidri/qemu/commits/ism
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> If there are any problems, please point them out.
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> Hope to hear from you, thank you.
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> >> > > > > >>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
> >> > > > > >>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
> >> > > > > >>>>>>> [4] https://lwn.net/Articles/711071/
> >> > > > > >>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
> >> > > > > >>>>>>>
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> Xuan Zhuo (2):
> >> > > > > >>>>>>>  Reserve device id for ISM device
> >> > > > > >>>>>>>  virtio-ism: introduce new device virtio-ism
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> content.tex    |   3 +
> >> > > > > >>>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
> >> > > > > >>>>>>> 2 files changed, 343 insertions(+)
> >> > > > > >>>>>>> create mode 100644 virtio-ism.tex
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> --
> >> > > > > >>>>>>> 2.32.0.3.g01195cf9f
> >> > > > > >>>>>>>
> >> > > > > >>>>>>>
> >> > > > > >>>>>>> ---------------------------------------------------------------------
> >> > > > > >>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> >> > > > > >>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >> > > > > >>>>>>>
> >> > > > > >>>>>>
> >> > > > > >>>>>
> >> > > > > >>>>> ---------------------------------------------------------------------
> >> > > > > >>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> >> > > > > >>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >> > > > > >>>>>
> >> > > > > >>>>
> >> > > > > >>>
> >> > > > >
> >> > > >
> >> > >
> >> > > ---------------------------------------------------------------------
> >> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> >> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >> > >
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> >> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >>
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-21  5:13                             ` Tony Lu
@ 2022-10-21  6:38                               ` Jason Wang
  0 siblings, 0 replies; 61+ messages in thread
From: Jason Wang @ 2022-10-21  6:38 UTC (permalink / raw)
  To: Tony Lu
  Cc: Dust Li, Xuan Zhuo, virtio-dev, hans, herongguang, zmlcc,
	zhenzao, helinguo, gerry, mst, cohuck, Stefan Hajnoczi

On Fri, Oct 21, 2022 at 1:13 PM Tony Lu <tonylu@linux.alibaba.com> wrote:
>
> On Fri, Oct 21, 2022 at 12:54:22PM +0800, Dust Li wrote:
> > On Fri, Oct 21, 2022 at 11:53:10AM +0800, Tony Lu wrote:
> > >On Fri, Oct 21, 2022 at 11:09:19AM +0800, Jason Wang wrote:
> > >> On Fri, Oct 21, 2022 at 11:05 AM Tony Lu <tonylu@linux.alibaba.com> wrote:
> > >> >
> > >> > On Fri, Oct 21, 2022 at 10:47:29AM +0800, Jason Wang wrote:
> > >> > > On Wed, Oct 19, 2022 at 6:01 PM Tony Lu <tonylu@linux.alibaba.com> wrote:
> > >> > > >
> > >> > > > On Wed, Oct 19, 2022 at 05:04:58PM +0800, Jason Wang wrote:
> > >> > > > >
> > >> > > > > 在 2022/10/19 16:07, Tony Lu 写道:
> > >> > > > > > On Wed, Oct 19, 2022 at 02:02:21PM +0800, Xuan Zhuo wrote:
> > >> > > > > > > On Wed, 19 Oct 2022 12:36:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > >> > > > > > > > On Wed, Oct 19, 2022 at 12:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >> > > > > > > > > On Wed, 19 Oct 2022 11:56:52 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > >> > > > > > > > > > On Wed, Oct 19, 2022 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >> > > > > > > > > > > On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Hi Jason,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > I think there may be some problems with the direction we are discussing.
> > >> > > > > > > > > > Probably not.
> > >> > > > > > > > > >
> > >> > > > > > > > > > As far as we are focusing on technology, there's nothing wrong from my
> > >> > > > > > > > > > perspective. And this is how the community works. Your idea needs to
> > >> > > > > > > > > > be justified and people are free to raise any technical questions
> > >> > > > > > > > > > especially considering you've posted a spec change with prototype
> > >> > > > > > > > > > codes but not only the idea.
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Our
> > >> > > > > > > > > > > goal is to add an new ism device. As far as the spec is concerned, we are not
> > >> > > > > > > > > > > concerned with the implementation of the backend.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > The direction we should discuss is what is the difference between the ism device
> > >> > > > > > > > > > > and other devices such as virtio-net, and whether it is necessary to introduce
> > >> > > > > > > > > > > this new device.
> > >> > > > > > > > > > This is somehow what I want to ask, actually it's not a comparison
> > >> > > > > > > > > > with virtio-net but:
> > >> > > > > > > > > >
> > >> > > > > > > > > > - virtio-roce
> > >> > > > > > > > > > - virtio-vhost-user
> > >> > > > > > > > > > - virtio-(p)mem
> > >> > > > > > > > > >
> > >> > > > > > > > > > or whether we can simply add features to those devices to achieve what
> > >> > > > > > > > > > you want to do here.
> > >> > > > > > > > >
> > >> > > > > > > > > Yes, this is my priority to discuss.
> > >> > > > > > > > >
> > >> > > > > > > > > At the moment, I think the most similar to ism is the Vhost-user Device Backend
> > >> > > > > > > > > of virtio-vhost-user.
> > >> > > > > > > > >
> > >> > > > > > > > > My understanding of it is to map any virtio device to another vm as a vvu
> > >> > > > > > > > > device.
> > >> > > > > > > > Yes, so a possible way is to have a device with memory zone/region
> > >> > > > > > > > provision and management then map it via virtio-vhost-user.
> > >> > > > > > >
> > >> > > > > > > Yes, there is such a possibility. virtio-vhost-user makes me feel that what can
> > >> > > > > > > be shared is the function implementation of map.
> > >> > > > > > >
> > >> > > > > > > But in the vm to provide the interface to the upper layer, I think this is the
> > >> > > > > > > work of ism.
> > >> > > > > > >
> > >> > > > > > > But one of the reasons why I didn't use virtio-vhost-user directly is that in
> > >> > > > > > > another vm, the guest can operate the vvu device, which we hope that both sides
> > >> > > > > > > are equal to the ism device.
> > >> > > > > > >
> > >> > > > > > > So I want to agree on a question first: who will provide the upper layer with
> > >> > > > > > > the ability to share the memory area?
> > >> > > > > > >
> > >> > > > > > > Our answer is a new ism device. How does this device achieve memory sharing, I
> > >> > > > > > > think is the second question.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > > >  From this design purpose, I think the two are different.
> > >> > > > > > > > >
> > >> > > > > > > > > Of course, you might want to extend it, it does have some similarities and uses
> > >> > > > > > > > > a lot of similar techniques.
> > >> > > > > > > > I don't have any preference so far. If you think your idea makes more
> > >> > > > > > > > sense, then try your best to justify it in the list.
> > >> > > > > > > >
> > >> > > > > > > > > So we can really discuss in this direction, whether
> > >> > > > > > > > > the vvu device can be extended to achieve the purpose of ism, or whether the
> > >> > > > > > > > > design goals can be agreed.
> > >> > > > > > > > I've added Stefan in the loop, let's hear from him.
> > >> > > > > > > >
> > >> > > > > > > > > Or, in the direction of memory sharing in the backend, can ism and vvu be merged?
> > >> > > > > > > > > Should device/driver APIs remain independent?
> > >> > > > > > > > Btw, you mentioned that one possible user of ism is the smc, but I
> > >> > > > > > > > don't see how it connects to that with your prototype driver.
> > >> > > > > > > Yes, we originally had plans, but the virtio spec was considered for submission,
> > >> > > > > > > so this was not included. Maybe, we should have included this part @Tony
> > >> > > > > > >
> > >> > > > > > > A brief introduction is that SMC currently has a corresponding
> > >> > > > > > > s390/net/ism_drv.c and we will replace this in the virtualization scenario.
> > >> > > > >
> > >> > > > >
> > >> > > > > Ok, I see. So I think the goal is to implement something in virtio that is
> > >> > > > > functional equivalent to IBM ISM device.
> > >> > > > >
> > >> > > >
> > >> > > > Yes, IBM ISM devices do something similar and it inspired this.
> > >> > >
> > >> > > Ok, it would be better to mention this in the cover letter of the next
> > >> > > version. This can ease the reviewers (IBM has some good docs of those
> > >> > > from the website).
> > >> > >
> > >> >
> > >> > Yes, we will do it.
> > >>
> > >> Btw, I wonder about the plan to support live migration. E.g do we need
> > >> to hot unplug the ism device before the migration then we can fallback
> > >> to TCP/IP ?
> > >>
> > >
> > >From the point view of SMC, SMC-R maintains multiple link (RDMA QP), it
> > >can live migrate existed connections to new link.
> > >
> > >Currently, yes, for SMC-D.
> >
> > I think Jason means VM live migration from one Host to another. Am I
> > right, Jason ?

Yes.

> >
> > In that case, the share memory from the ISM device is no longer valid,
> > I think we have to hot unplug before the migration to notify SMC that
> > the SMC-D link is no longer usable.
>
> Yes, this is what I mean ;-) SMC-D needs to unplug the device.
>
> > IIUC, SMC-D doesn't support transparently fallback to TCP/IP in this case
> > now. But I think we could make that happen, since SMC already support link
> > migration between different RDMA devices.
>
> Yes, currently SMC-D doesn't support migration to another device or
> fallback. And SMC-R supports migration to another link, no fallback.

Ok. I see.

Thanks

>
> Cheers,
> Tony Lu
>
> > Thanks
> >
> > >
> > >Cheers,
> > >Tony Lu
> > >
> > >
> > >> Thanks
> > >>
> > >> >
> > >> > > >
> > >> > > > >
> > >> > > > > > >
> > >> > > > > > > Thanks.
> > >> > > > > > >
> > >> > > > > > SMC is a network protocol which is modeled by shared memory rather than
> > >> > > > > > packet.
> > >> > > > >
> > >> > > > >
> > >> > > > > After reading more SMC from IBM website, I think you meant SMC-D here. And I
> > >> > > > > wonder in order to have a complete SMC solution we still need virtio-ROCE
> > >> > > > > for inter host communcation?
> > >> > > > >
> > >> > > >
> > >> > > > Mostly yes.
> > >> > > >
> > >> > > > SMC-D is the part of whole SMC solution. SMC supports multiple
> > >> > > > underlying device, -D means ISM device, -R means RDMA device. The key
> > >> > > > data model is shared memory, SMC uses RDMA (-R) or ISM(-D) to *share*
> > >> > > > memory between peers, and it will choose the suitable device on demand
> > >> > > > during handshaking. If there was no suitable device, it would fall back
> > >> > > > to TCP. So virtio-ROCE is not required.
> > >> > >
> > >> > > So the commniting peers on the same host we need SMC-D, in the future
> > >> > > we need to use RDMA to offload the communication among the peers of
> > >> > > different hosts. Then we can get fully transparent offload no matter
> > >> > > the peer is local or not.
> > >> > >
> > >> >
> > >> > Yes, this is what we want to do.
> > >> >
> > >> > > >
> > >> > > > >
> > >> > > > > >   Actually the basic required interfaces of SMC device are:
> > >> > > > > >
> > >> > > > > >    - alloc / free memory region, each connection peer has two memory
> > >> > > > > >     regions dynamically for sending and receiving ring buffer.
> > >> > > > > >    - attach / detach memory region, remote attaches local-allocated
> > >> > > > > >     sending region as receiving region, vice versa.
> > >> > > > > >    - notify, tell peer to read data and update cursor.
> > >> > > > > >
> > >> > > > > > Then the device can be registered as SMC ISM device. Of course, SMC
> > >> > > > > > also requires some modification to adapt it.
> > >> > > > >
> > >> > > > >
> > >> > > > > Looking at s390 ism driver it requires other stuffs like vlan add/remove or
> > >> > > > > gid query, do we need them as well?
> > >> > > >
> > >> > > > vlan is not required in this use case. ISM uses gid to identified each
> > >> > > > others, maybe we could implement it in virtio ways.
> > >> > >
> > >> > > I'd suggest adding the codes to register the driver to SMC/ISM in the
> > >> > > next version (instead of a simple procfs hooking). Then people can
> > >> > > easily play or review.
> > >> > >
> > >> >
> > >> > Ok, I will add the codes in the next version.
> > >> >
> > >> > Cheers,
> > >> > Tony Lu
> > >> >
> > >> > > Thanks
> > >> > >
> > >> > > >
> > >> > > > To support virtio-ism smoothly, the interfaces of ISM driver still need
> > >> > > > to be adjusted. I will put it on the table with IBM people.
> > >> > > >
> > >> > > > Cheers,
> > >> > > > Tony Lu
> > >> > > >
> > >> > > > >
> > >> > > > > Thanks
> > >> > > > >
> > >> > > > >
> > >> > > > > >
> > >> > > > > > Cheers,
> > >> > > > > > Tony Lu
> > >> > > > > >
> > >> > > > > > > > Thanks
> > >> > > > > > > >
> > >> > > > > > > > > Thanks.
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > > > How to share the backend with other deivce is another problem.
> > >> > > > > > > > > > Yes, anything that is used for your virito-ism prototype can be used
> > >> > > > > > > > > > for other devices.
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Our goal is to dynamically obtain a piece of memory to share with other vms.
> > >> > > > > > > > > > So at this level, I don't see the exact difference compared to
> > >> > > > > > > > > > virtio-vhost-user. Let's just focus on the API that carries the
> > >> > > > > > > > > > semantics:
> > >> > > > > > > > > >
> > >> > > > > > > > > > - map/unmap
> > >> > > > > > > > > > - permission update
> > >> > > > > > > > > >
> > >> > > > > > > > > > The only missing piece is the per region notification.
> > >> > > > > > > > > >
> > >> > > > > > > > > > > In a connection, this memory will be used repeatedly. As far as SMC is concerned,
> > >> > > > > > > > > > > it will use it as a ring. Of course, we also need a notify mechanism.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > That's what we're aiming for, so we should first discuss whether this
> > >> > > > > > > > > > > requirement is reasonable.
> > >> > > > > > > > > > So unless somebody says "no", it is fine so far.
> > >> > > > > > > > > >
> > >> > > > > > > > > > > I think it's a feature currently not supported by
> > >> > > > > > > > > > > other devices specified by the current virtio spec.
> > >> > > > > > > > > > Probably, but we've already had RFCs for RoCE and vhost-user.
> > >> > > > > > > > > >
> > >> > > > > > > > > > Thanks
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Thanks.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > >
> > >> >
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-21  6:37                           ` Jason Wang
@ 2022-10-21  9:26                             ` Dust Li
  0 siblings, 0 replies; 61+ messages in thread
From: Dust Li @ 2022-10-21  9:26 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, Gerry, virtio-dev, hans, herongguang, zmlcc, tonylu,
	zhenzao, helinguo, mst, cohuck, Stefan Hajnoczi

On Fri, Oct 21, 2022 at 02:37:20PM +0800, Jason Wang wrote:
>On Fri, Oct 21, 2022 at 11:30 AM Dust Li <dust.li@linux.alibaba.com> wrote:
>>
>> On Fri, Oct 21, 2022 at 10:41:26AM +0800, Jason Wang wrote:
>> >On Wed, Oct 19, 2022 at 5:27 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>> >>
>> >> On Wed, 19 Oct 2022 17:15:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> >> > On Wed, Oct 19, 2022 at 5:12 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>> >> > >
>> >> > > On Wed, 19 Oct 2022 17:08:29 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> >> > > > On Wed, Oct 19, 2022 at 4:21 PM Dust Li <dust.li@linux.alibaba.com> wrote:
>> >> > > > >
>> >> > > > > On Wed, Oct 19, 2022 at 04:03:42PM +0800, Gerry wrote:
>> >> > > > > >
>> >> > > > > >
>> >> > > > > >> 2022年10月19日 16:01,Jason Wang <jasowang@redhat.com> 写道:
>> >> > > > > >>
>> >> > > > > >> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>> >> > > > > >>>
>> >> > > > > >>> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> >> > > > > >>>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>> >> > > > > >>>>>
>> >> > > > > >>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> >> > > > > >>>>>> Adding Stefan.
>> >> > > > > >>>>>>
>> >> > > > > >>>>>>
>> >> > > > > >>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> Hello everyone,
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> # Background
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> Nowadays, there is a common scenario to accelerate communication between
>> >> > > > > >>>>>>> different VMs and containers, including light weight virtual machine based
>> >> > > > > >>>>>>> containers. One way to achieve this is to colocate them on the same host.
>> >> > > > > >>>>>>> However, the performance of inter-VM communication through network stack is not
>> >> > > > > >>>>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
>> >> > > > > >>>>>>> many times, but still no generic solution available [1] [2] [3].
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
>> >> > > > > >>>>>>> We found that by changing the communication channel between VMs from TCP to SMC
>> >> > > > > >>>>>>> with shared memory, we can achieve superior performance for a common
>> >> > > > > >>>>>>> socket-based application[5]:
>> >> > > > > >>>>>>>  - latency reduced by about 50%
>> >> > > > > >>>>>>>  - throughput increased by about 300%
>> >> > > > > >>>>>>>  - CPU consumption reduced by about 50%
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> Since there is no particularly suitable shared memory management solution
>> >> > > > > >>>>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
>> >> > > > > >>>>>>> is the standard for communication in the virtualization world, we want to
>> >> > > > > >>>>>>> implement a virtio-ism device based on virtio, which can support on-demand
>> >> > > > > >>>>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
>> >> > > > > >>>>>>> the virtio-ism device need to support:
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
>> >> > > > > >>>>>>>   provisioned.
>> >> > > > > >>>>>>> 2. Multi-region management: the shared memory is divided into regions,
>> >> > > > > >>>>>>>   and a peer may allocate one or more regions from the same shared memory
>> >> > > > > >>>>>>>   device.
>> >> > > > > >>>>>>> 3. Permission control: The permission of each region can be set seperately.
>> >> > > > > >>>>>>
>> >> > > > > >>>>>> Looks like virtio-ROCE
>> >> > > > > >>>>>>
>> >> > > > > >>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
>> >> > > > > >>>>>>
>> >> > > > > >>>>>> and virtio-vhost-user can satisfy the requirement?
>> >> > > > > >>>>>>
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> # Virtio ism device
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> ISM devices provide the ability to share memory between different guests on a
>> >> > > > > >>>>>>> host. A guest's memory got from ism device can be shared with multiple peers at
>> >> > > > > >>>>>>> the same time. This shared relationship can be dynamically created and released.
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> The shared memory obtained from the device is divided into multiple ism regions
>> >> > > > > >>>>>>> for share. ISM device provides a mechanism to notify other ism region referrers
>> >> > > > > >>>>>>> of content update events.
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> # Usage (SMC as example)
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> Maybe there is one of possible use cases:
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>> >> > > > > >>>>>>>   location of a memory region in the PCI space and a token.
>> >> > > > > >>>>>>> 2. The ism driver mmap the memory region and return to SMC with the token
>> >> > > > > >>>>>>> 3. SMC passes the token to the connected peer
>> >> > > > > >>>>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
>> >> > > > > >>>>>>>   get the location of the PCI space of the shared memory
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> # About hot plugging of the ism device
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>>   Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>> >> > > > > >>>>>>>   less scalable operation. So, we don't plan to support it for now.
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> # Comparison with existing technology
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> ## ivshmem or ivshmem 2.0 of Qemu
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>>   1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>> >> > > > > >>>>>>>   use this VM, so the security is not enough.
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>>   2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>> >> > > > > >>>>>>>   other VMs that use the ivshmem 2.0 shared memory device, which also does not
>> >> > > > > >>>>>>>   meet our needs in terms of security.
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> ## vhost-pci and virtiovhostuser
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>>   Does not support dynamic allocation and therefore not suitable for SMC.
>> >> > > > > >>>>>>
>> >> > > > > >>>>>> I think this is an implementation issue, we can support VHOST IOTLB
>> >> > > > > >>>>>> message then the regions could be added/removed on demand.
>> >> > > > > >>>>>
>> >> > > > > >>>>>
>> >> > > > > >>>>> 1. After the attacker connects with the victim, if the attacker does not
>> >> > > > > >>>>>   dereference memory, the memory will be occupied under virtiovhostuser. In the
>> >> > > > > >>>>>   case of ism devices, the victim can directly release the reference, and the
>> >> > > > > >>>>>   maliciously referenced region only occupies the attacker's resources
>> >> > > > > >>>>
>> >> > > > > >>>> Let's define the security boundary here. E.g. do we trust the device or
>> >> > > > > >>>> not? If yes, in the case of virtiovhostuser, can we simply do
>> >> > > > > >>>> VHOST_IOTLB_UNMAP so that we can safely release the memory from the
>> >> > > > > >>>> attacker?
>> >> > > > > >>>>
>> >> > > > > >>>>>
>> >> > > > > >>>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
>> >> > > > > >>>>>   time, which is a challenge for virtiovhostuser
>> >> > > > > >>>>
>> >> > > > > >>>> Please elaborate more on the challenges; is there anything that makes
>> >> > > > > >>>> virtiovhostuser different?
>> >> > > > > >>>
>> >> > > > > >>> As I understand it (please point out any mistakes), one vvu device corresponds to one
>> >> > > > > >>> vm. If we share memory with 1000 vms, do we have 1000 vvu devices?
>> >> > > > > >>
>> >> > > > > >> There could be some misunderstanding here. With 1000 VMs, you still
>> >> > > > > >> need 1000 virtio-ism devices, I think.
>> >> > > > > >We are trying to achieve one virtio-ism device per vm instead of one virtio-ism device per SMC connection.
>> >> > > >
>> >> > > > I wonder if we need something to identify a virtio-ism device, since I
>> >> > > > guess there's still a chance to have multiple virtio-ism devices per VM
>> >> > > > (different service chains, etc.).
>> >> > >
>> >> > > Yes, there will be such a situation, a vm has multiple virtio-ism devices.
>> >> > >
>> >> > > What exactly do you mean by "identify"?
>> >> >
>> >> > E.g. we can distinguish two virtio-net devices through the mac address; do we need
>> >> > something similar for ism, or is it completely unnecessary (e.g. via a
>> >> > token or other means)?
>> >>
>> >> Currently, we have not encountered such a request.
>> >>
>> >> It is conceivable that all physical shared memory ism regions are indexed by
>> >> tokens. virtio-ism is a way to obtain these ism regions, so there is no need to
>> >> distinguish multiple virtio-ism devices under one vm on the host.
>> >
>> >So consider a case:
>> >
>> >VM1 shares ism1 with VM2
>> >VM1 shares ism2 with VM3
>> >
>> >How does the application/SMC address the different ism devices in this case?
>> >E.g. if VM1 wants to talk with VM3 it needs to populate regions in ism2,
>> >but how can the application or protocol know this, and how can a specific
>> >device be addressed (via BDF?)
>>
>> In our design, we do have a dev_id for each ISM device.
>> Currently, we use it for permission management; I think
>> it can also be used to identify different ISM devices.
>>
>> The spec says:
>>
>> +\begin{description}
>> +\item[\field{dev_id}]      the id of the device.
>
>I see, we need some clarification. E.g. is it a UUID or not?

Got it, will address this in the next version

Thanks

>
>Thanks
>
>> +\item[\field{region_size}] the size of every ism region
>> +\item[\field{notify_size}] the size of the notify address.
>>
>> <...>
>>
>> +The device MUST regenerate a \field{dev_id}. \field{dev_id} remains unchanged
>> +during reset. \field{dev_id} MUST NOT be 0;
>>
>> Thanks
>>
>> >
>> >Thanks
>> >
>> >>
>> >> Thanks.
>> >>
>> >>
>> >> >
>> >> > Thanks
>> >> >
>> >> > >
>> >> > > Thanks.
>> >> > >
>> >> > >
>> >> > > >
>> >> > > > Thanks
>> >> > > >
>> >> > > > >
>> >> > > > > I think we must achieve this if we want to meet the requirements of SMC.
>> >> > > > > In SMC, an SMC socket (corresponding to a TCP socket) needs 2 memory
>> >> > > > > regions (1 for Tx and 1 for Rx). So if we have 1K TCP connections,
>> >> > > > > we'll need 2K shared memory regions, and those memory regions are
>> >> > > > > dynamically allocated and freed with the TCP socket.
>> >> > > > >
>> >> > > > > >
>> >> > > > > >>
>> >> > > > > >>>
>> >> > > > > >>>
>> >> > > > > >>>>
>> >> > > > > >>>>>
>> >> > > > > >>>>> 3. The sharing relationships of ism are established dynamically, while virtiovhostuser
>> >> > > > > >>>>>   determines the sharing relationship at startup.
>> >> > > > > >>>>
>> >> > > > > >>>> Not necessarily with IOTLB API?
>> >> > > > > >>>
>> >> > > > > >>> Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
>> >> > > > > >>> provide the same memory on the host to two vms. So the implementation of this
>> >> > > > > >>> part will be much simpler. This is why we gave up virtio-vhost-user at the
>> >> > > > > >>> beginning.
>> >> > > > > >>
>> >> > > > > >> Ok, just to make sure we're on the same page. From the spec level,
>> >> > > > > >> virtio-vhost-user doesn't (can't) limit the backend to be implemented
>> >> > > > > >> in another VM. So it should be ok to be used for sharing memory
>> >> > > > > >> between a guest and host.
>> >> > > > > >>
>> >> > > > > >> Thanks
>> >> > > > > >>
>> >> > > > > >>>
>> >> > > > > >>> Thanks.
>> >> > > > > >>>
>> >> > > > > >>>
>> >> > > > > >>>>
>> >> > > > > >>>>>
>> >> > > > > >>>>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
>> >> > > > > >>>>>   while ism only maps one region to other devices
>> >> > > > > >>>>
>> >> > > > > >>>> With VHOST_IOTLB_MAP, the map could be done per region.
>> >> > > > > >>>>
>> >> > > > > >>>> Thanks
>> >> > > > > >>>>
>> >> > > > > >>>>>
>> >> > > > > >>>>> Thanks.
>> >> > > > > >>>>>
>> >> > > > > >>>>>>
>> >> > > > > >>>>>> Thanks
>> >> > > > > >>>>>>
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> # Design
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>>   This is a structure diagram based on ism sharing between two vms.
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>>    |-------------------------------------------------------------------------------------------------------------|
>> >> > > > > >>>>>>>    | |------------------------------------------------|       |------------------------------------------------| |
>> >> > > > > >>>>>>>    | | Guest                                          |       | Guest                                          | |
>> >> > > > > >>>>>>>    | |                                                |       |                                                | |
>> >> > > > > >>>>>>>    | |   ----------------                             |       |   ----------------                             | |
>> >> > > > > >>>>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
>> >> > > > > >>>>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
>> >> > > > > >>>>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
>> >> > > > > >>>>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
>> >> > > > > >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>> >> > > > > >>>>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
>> >> > > > > >>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>> >> > > > > >>>>>>>    | |                                |               |       |                               |                | |
>> >> > > > > >>>>>>>    | |                                |               |       |                               |                | |
>> >> > > > > >>>>>>>    | | Qemu                           |               |       | Qemu                          |                | |
>> >> > > > > >>>>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
>> >> > > > > >>>>>>>    |                                  |                                                       |                  |
>> >> > > > > >>>>>>>    |                                  |                                                       |                  |
>> >> > > > > >>>>>>>    |                                  |------------------------------+------------------------|                  |
>> >> > > > > >>>>>>>    |                                                                 |                                           |
>> >> > > > > >>>>>>>    |                                                                 |                                           |
>> >> > > > > >>>>>>>    |                                                   --------------------------                                |
>> >> > > > > >>>>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
>> >> > > > > >>>>>>>    |                                                   --------------------------                                |
>> >> > > > > >>>>>>>    |                                                                                                             |
>> >> > > > > >>>>>>>    | HOST                                                                                                        |
>> >> > > > > >>>>>>>    ---------------------------------------------------------------------------------------------------------------
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> # POC code
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>> >> > > > > >>>>>>>   Qemu:   https://github.com/fengidri/qemu/commits/ism
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> If there are any problems, please point them out.
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> Hope to hear from you, thank you.
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>> >> > > > > >>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
>> >> > > > > >>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>> >> > > > > >>>>>>> [4] https://lwn.net/Articles/711071/
>> >> > > > > >>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> Xuan Zhuo (2):
>> >> > > > > >>>>>>>  Reserve device id for ISM device
>> >> > > > > >>>>>>>  virtio-ism: introduce new device virtio-ism
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> content.tex    |   3 +
>> >> > > > > >>>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
>> >> > > > > >>>>>>> 2 files changed, 343 insertions(+)
>> >> > > > > >>>>>>> create mode 100644 virtio-ism.tex
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>> --
>> >> > > > > >>>>>>> 2.32.0.3.g01195cf9f
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>>
>> >> > > > > >>>>>>
>> >> > > > > >>>>>
>> >> > > > > >>>>
>> >> > > > > >>>
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-10-18  7:32 ` Jan Kiszka
@ 2022-11-14 21:30   ` Jan Kiszka
  2022-11-16  2:13     ` Xuan Zhuo
  0 siblings, 1 reply; 61+ messages in thread
From: Jan Kiszka @ 2022-11-14 21:30 UTC (permalink / raw)
  To: Xuan Zhuo, virtio-dev
  Cc: hans, herongguang, zmlcc, dust.li, tonylu, zhenzao, helinguo,
	gerry, mst, cohuck, jasowang

On 18.10.22 09:32, Jan Kiszka wrote:
> On 17.10.22 09:47, Xuan Zhuo wrote:
>> Hello everyone,
>>
>> # Background
>>
>> Nowadays, there is a common scenario to accelerate communication between
>> different VMs and containers, including light weight virtual machine based
>> containers. One way to achieve this is to colocate them on the same host.
>> However, the performance of inter-VM communication through network stack is not
>> optimal and may also waste extra CPU cycles. This scenario has been discussed
>> many times, but still no generic solution available [1] [2] [3].
>>
>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
>> We found that by changing the communication channel between VMs from TCP to SMC
>> with shared memory, we can achieve superior performance for a common
>> socket-based application[5]:
>>   - latency reduced by about 50%
>>   - throughput increased by about 300%
>>   - CPU consumption reduced by about 50%
>>
>> Since there is no particularly suitable shared memory management solution
>> matches the need for SMC(See ## Comparison with existing technology), and virtio
>> is the standard for communication in the virtualization world, we want to
>> implement a virtio-ism device based on virtio, which can support on-demand
>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
>> the virtio-ism device need to support:
>>
>> 1. Dynamic provision: shared memory regions are dynamically allocated and
>>    provisioned.
>> 2. Multi-region management: the shared memory is divided into regions,
>>    and a peer may allocate one or more regions from the same shared memory
>>    device.
>> 3. Permission control: The permission of each region can be set seperately.
>>
>> # Virtio ism device
>>
>> ISM devices provide the ability to share memory between different guests on a
>> host. A guest's memory got from ism device can be shared with multiple peers at
>> the same time. This shared relationship can be dynamically created and released.
>>
>> The shared memory obtained from the device is divided into multiple ism regions
>> for share. ISM device provides a mechanism to notify other ism region referrers
>> of content update events.
>>
>> # Usage (SMC as example)
>>
>> Maybe there is one of possible use cases:
>>
>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>>    location of a memory region in the PCI space and a token.
>> 2. The ism driver mmap the memory region and return to SMC with the token
>> 3. SMC passes the token to the connected peer
>> 3. the peer calls the ism driver interface ism_attach_region(token) to
>>    get the location of the PCI space of the shared memory
>>
>>
>> # About hot plugging of the ism device
>>
>>    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>>    less scalable operation. So, we don't plan to support it for now.
>>
>> # Comparison with existing technology
>>
>> ## ivshmem or ivshmem 2.0 of Qemu
>>
>>    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>>    use this VM, so the security is not enough.
>>
>>    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>>    other VMs that use the ivshmem 2.0 shared memory device, which also does not
>>    meet our needs in terms of security.
> 
> This is addressed by establishing separate links between VMs (modeled
> with separate devices). That is a trade-off between simplicity of the
> model and convenience, for sure.

BTW, simplicity can also bring security because it reduces the trusted
code base.

Another feature of ivshmem-v2 is permitting direct access to essential
resources of the device from /unprivileged/ userspace, including to the
event triggering registers. Is your model designed for that as well? It
not only permits VM-to-VM, it actually makes app-to-app (in VMs) very cheap.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-11-14 21:30   ` Jan Kiszka
@ 2022-11-16  2:13     ` Xuan Zhuo
  2022-11-23 15:27       ` Jan Kiszka
  0 siblings, 1 reply; 61+ messages in thread
From: Xuan Zhuo @ 2022-11-16  2:13 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: hans, herongguang, zmlcc, dust.li, tonylu, zhenzao, helinguo,
	gerry, mst, cohuck, jasowang, virtio-dev

On Mon, 14 Nov 2022 22:30:53 +0100, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 18.10.22 09:32, Jan Kiszka wrote:
> > On 17.10.22 09:47, Xuan Zhuo wrote:
> >> Hello everyone,
> >>
> >> # Background
> >>
> >> Nowadays, there is a common scenario to accelerate communication between
> >> different VMs and containers, including light weight virtual machine based
> >> containers. One way to achieve this is to colocate them on the same host.
> >> However, the performance of inter-VM communication through network stack is not
> >> optimal and may also waste extra CPU cycles. This scenario has been discussed
> >> many times, but still no generic solution available [1] [2] [3].
> >>
> >> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> >> We found that by changing the communication channel between VMs from TCP to SMC
> >> with shared memory, we can achieve superior performance for a common
> >> socket-based application[5]:
> >>   - latency reduced by about 50%
> >>   - throughput increased by about 300%
> >>   - CPU consumption reduced by about 50%
> >>
> >> Since there is no particularly suitable shared memory management solution
> >> matches the need for SMC(See ## Comparison with existing technology), and virtio
> >> is the standard for communication in the virtualization world, we want to
> >> implement a virtio-ism device based on virtio, which can support on-demand
> >> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> >> the virtio-ism device need to support:
> >>
> >> 1. Dynamic provision: shared memory regions are dynamically allocated and
> >>    provisioned.
> >> 2. Multi-region management: the shared memory is divided into regions,
> >>    and a peer may allocate one or more regions from the same shared memory
> >>    device.
> >> 3. Permission control: The permission of each region can be set seperately.
> >>
> >> # Virtio ism device
> >>
> >> ISM devices provide the ability to share memory between different guests on a
> >> host. A guest's memory got from ism device can be shared with multiple peers at
> >> the same time. This shared relationship can be dynamically created and released.
> >>
> >> The shared memory obtained from the device is divided into multiple ism regions
> >> for share. ISM device provides a mechanism to notify other ism region referrers
> >> of content update events.
> >>
> >> # Usage (SMC as example)
> >>
> >> Maybe there is one of possible use cases:
> >>
> >> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> >>    location of a memory region in the PCI space and a token.
> >> 2. The ism driver mmap the memory region and return to SMC with the token
> >> 3. SMC passes the token to the connected peer
> >> 3. the peer calls the ism driver interface ism_attach_region(token) to
> >>    get the location of the PCI space of the shared memory
> >>
> >>
> >> # About hot plugging of the ism device
> >>
> >>    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> >>    less scalable operation. So, we don't plan to support it for now.
> >>
> >> # Comparison with existing technology
> >>
> >> ## ivshmem or ivshmem 2.0 of Qemu
> >>
> >>    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> >>    use this VM, so the security is not enough.
> >>
> >>    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> >>    other VMs that use the ivshmem 2.0 shared memory device, which also does not
> >>    meet our needs in terms of security.
> >
> > This is addressed by establishing separate links between VMs (modeled
> > with separate devices). That is a trade-off between simplicity of the
> > model and convenience, for sure.
>
> BTW, simplicity can also bring security because it reduces the trusted
> code base.
>
> Another feature of ivshmem-v2 is permitting direct access to essential
> resources of the device from /unprivileged/ userspace, including to the
> event triggering registers. Is your model designed for that as well? It
> not only permits VM-to-VM, it actually makes app-to-app (in VMs) very cheap.

Yes, there are two actual application scenarios or design goals:

* As we mentioned above, integrating with SMC inside the Linux kernel to achieve high-speed communication.
* Virtio-ISM also has an interface under /dev. Ordinary users can also directly
  obtain SHM region resources and share them with apps in other
  VMs.

https://github.com/fengidri/linux-kernel-virtio-ism/commit/55a8ed21344e26f574dd81b0213b0d61d80e2ecb
https://github.com/fengidri/linux-kernel-virtio-ism/commit/6518739f9a9a36f25d5709da940b7a7938f8e0ee
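
To make the /dev usage concrete, here is a minimal user-space sketch of how an
app could allocate and map a region through such a device node. The node name,
ioctl number and request layout below are placeholders invented for the sketch,
not the actual POC ABI; the real interface is the one in the commits above.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical request layout, only for illustration. */
struct ism_alloc_req {
	uint64_t size;    /* in:  requested region size          */
	uint64_t token;   /* out: token to pass to the peer      */
	uint64_t offset;  /* out: mmap offset of the new region  */
};

#define ISM_IOC_ALLOC _IOWR('i', 0x01, struct ism_alloc_req)  /* placeholder */

int main(void)
{
	struct ism_alloc_req req = { .size = 1 << 20 };
	int fd = open("/dev/virtio-ism0", O_RDWR);             /* assumed node name */

	if (fd < 0 || ioctl(fd, ISM_IOC_ALLOC, &req) < 0)
		return 1;

	/* Map the region; req.token is handed to the peer out of band,
	 * and the peer attaches with a corresponding "attach by token" call. */
	void *shm = mmap(NULL, req.size, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, req.offset);
	if (shm == MAP_FAILED)
		return 1;

	printf("token=%llu region mapped at %p\n",
	       (unsigned long long)req.token, shm);

	munmap(shm, req.size);
	close(fd);
	return 0;
}

The peer side would be symmetric: attach by token, then mmap the returned
offset.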

Thanks.

>
> Jan
>
> --
> Siemens AG, Technology
> Competence Center Embedded Linux
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-11-16  2:13     ` Xuan Zhuo
@ 2022-11-23 15:27       ` Jan Kiszka
  2022-11-24  2:32         ` Xuan Zhuo
  0 siblings, 1 reply; 61+ messages in thread
From: Jan Kiszka @ 2022-11-23 15:27 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: hans, herongguang, zmlcc, dust.li, tonylu, zhenzao, helinguo,
	gerry, mst, cohuck, jasowang, virtio-dev

On 16.11.22 03:13, Xuan Zhuo wrote:
> On Mon, 14 Nov 2022 22:30:53 +0100, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 18.10.22 09:32, Jan Kiszka wrote:
>>> On 17.10.22 09:47, Xuan Zhuo wrote:
>>>> Hello everyone,
>>>>
>>>> # Background
>>>>
>>>> Nowadays, there is a common scenario to accelerate communication between
>>>> different VMs and containers, including light weight virtual machine based
>>>> containers. One way to achieve this is to colocate them on the same host.
>>>> However, the performance of inter-VM communication through network stack is not
>>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
>>>> many times, but still no generic solution available [1] [2] [3].
>>>>
>>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
>>>> We found that by changing the communication channel between VMs from TCP to SMC
>>>> with shared memory, we can achieve superior performance for a common
>>>> socket-based application[5]:
>>>>   - latency reduced by about 50%
>>>>   - throughput increased by about 300%
>>>>   - CPU consumption reduced by about 50%
>>>>
>>>> Since there is no particularly suitable shared memory management solution
>>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
>>>> is the standard for communication in the virtualization world, we want to
>>>> implement a virtio-ism device based on virtio, which can support on-demand
>>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
>>>> the virtio-ism device need to support:
>>>>
>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
>>>>    provisioned.
>>>> 2. Multi-region management: the shared memory is divided into regions,
>>>>    and a peer may allocate one or more regions from the same shared memory
>>>>    device.
>>>> 3. Permission control: The permission of each region can be set seperately.
>>>>
>>>> # Virtio ism device
>>>>
>>>> ISM devices provide the ability to share memory between different guests on a
>>>> host. A guest's memory got from ism device can be shared with multiple peers at
>>>> the same time. This shared relationship can be dynamically created and released.
>>>>
>>>> The shared memory obtained from the device is divided into multiple ism regions
>>>> for share. ISM device provides a mechanism to notify other ism region referrers
>>>> of content update events.
>>>>
>>>> # Usage (SMC as example)
>>>>
>>>> Maybe there is one of possible use cases:
>>>>
>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>>>>    location of a memory region in the PCI space and a token.
>>>> 2. The ism driver mmap the memory region and return to SMC with the token
>>>> 3. SMC passes the token to the connected peer
>>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
>>>>    get the location of the PCI space of the shared memory
>>>>
>>>>
>>>> # About hot plugging of the ism device
>>>>
>>>>    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>>>>    less scalable operation. So, we don't plan to support it for now.
>>>>
>>>> # Comparison with existing technology
>>>>
>>>> ## ivshmem or ivshmem 2.0 of Qemu
>>>>
>>>>    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>>>>    use this VM, so the security is not enough.
>>>>
>>>>    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>>>>    other VMs that use the ivshmem 2.0 shared memory device, which also does not
>>>>    meet our needs in terms of security.
>>>
>>> This is addressed by establishing separate links between VMs (modeled
>>> with separate devices). That is a trade-off between simplicity of the
>>> model and convenience, for sure.
>>
>> BTW, simplicity can also bring security because it reduces the trusted
>> code base.
>>
>> Another feature of ivshmem-v2 is permitting direct access to essential
>> resources of the device from /unprivileged/ userspace, including to the
>> event triggering registers. Is your model designed for that as well? It
>> not only permits VM-to-VM, it actually makes app-to-app (in VMs) very cheap.
> 
> Yes, there are two actual application scenarios or design goals:
> 
> * As we mentioned above, integrating with SMC inside the Linux kernel to achieve high-speed communication.
> * Virtio-ISM also has an interface under /dev. Ordinary users can also directly
>   obtain SHM region resources and share them with apps in other
>   VMs.
> 
> https://github.com/fengidri/linux-kernel-virtio-ism/commit/55a8ed21344e26f574dd81b0213b0d61d80e2ecb
> https://github.com/fengidri/linux-kernel-virtio-ism/commit/6518739f9a9a36f25d5709da940b7a7938f8e0ee
> 

An example of the missing detach notification in ISM.

And the model of ivshmem-v2 permits syscall-free notification (sending,
not IRQ-based receiving, obviously). On reception, it avoids one vmexit
to throttle incoming events if you have a continuously firing sender in
combination with a non-reacting unprivileged receiver task.

Would be great to combine the best of all worlds here. But specifically
missing life-cycle management on the detach side makes the ISM not
better than legacy ivshmem IMHO.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
  2022-11-23 15:27       ` Jan Kiszka
@ 2022-11-24  2:32         ` Xuan Zhuo
  0 siblings, 0 replies; 61+ messages in thread
From: Xuan Zhuo @ 2022-11-24  2:32 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: hans, herongguang, zmlcc, dust.li, tonylu, zhenzao, helinguo,
	gerry, mst, cohuck, jasowang, virtio-dev

On Wed, 23 Nov 2022 16:27:00 +0100, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 16.11.22 03:13, Xuan Zhuo wrote:
> > On Mon, 14 Nov 2022 22:30:53 +0100, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> >> On 18.10.22 09:32, Jan Kiszka wrote:
> >>> On 17.10.22 09:47, Xuan Zhuo wrote:
> >>>> Hello everyone,
> >>>>
> >>>> # Background
> >>>>
> >>>> Nowadays, there is a common scenario to accelerate communication between
> >>>> different VMs and containers, including light weight virtual machine based
> >>>> containers. One way to achieve this is to colocate them on the same host.
> >>>> However, the performance of inter-VM communication through network stack is not
> >>>> optimal and may also waste extra CPU cycles. This scenario has been discussed
> >>>> many times, but still no generic solution available [1] [2] [3].
> >>>>
> >>>> With pci-ivshmem + SMC(Shared Memory Communications: [4]) based PoC[5],
> >>>> We found that by changing the communication channel between VMs from TCP to SMC
> >>>> with shared memory, we can achieve superior performance for a common
> >>>> socket-based application[5]:
> >>>>   - latency reduced by about 50%
> >>>>   - throughput increased by about 300%
> >>>>   - CPU consumption reduced by about 50%
> >>>>
> >>>> Since there is no particularly suitable shared memory management solution
> >>>> matches the need for SMC(See ## Comparison with existing technology), and virtio
> >>>> is the standard for communication in the virtualization world, we want to
> >>>> implement a virtio-ism device based on virtio, which can support on-demand
> >>>> memory sharing across VMs, containers or VM-container. To match the needs of SMC,
> >>>> the virtio-ism device need to support:
> >>>>
> >>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
> >>>>    provisioned.
> >>>> 2. Multi-region management: the shared memory is divided into regions,
> >>>>    and a peer may allocate one or more regions from the same shared memory
> >>>>    device.
> >>>> 3. Permission control: The permission of each region can be set seperately.
> >>>>
> >>>> # Virtio ism device
> >>>>
> >>>> ISM devices provide the ability to share memory between different guests on a
> >>>> host. A guest's memory got from ism device can be shared with multiple peers at
> >>>> the same time. This shared relationship can be dynamically created and released.
> >>>>
> >>>> The shared memory obtained from the device is divided into multiple ism regions
> >>>> for share. ISM device provides a mechanism to notify other ism region referrers
> >>>> of content update events.
> >>>>
> >>>> # Usage (SMC as example)
> >>>>
> >>>> Maybe there is one of possible use cases:
> >>>>
> >>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
> >>>>    location of a memory region in the PCI space and a token.
> >>>> 2. The ism driver mmap the memory region and return to SMC with the token
> >>>> 3. SMC passes the token to the connected peer
> >>>> 3. the peer calls the ism driver interface ism_attach_region(token) to
> >>>>    get the location of the PCI space of the shared memory
> >>>>
> >>>>
> >>>> # About hot plugging of the ism device
> >>>>
> >>>>    Hot plugging of devices is a heavier, possibly failed, time-consuming, and
> >>>>    less scalable operation. So, we don't plan to support it for now.
> >>>>
> >>>> # Comparison with existing technology
> >>>>
> >>>> ## ivshmem or ivshmem 2.0 of Qemu
> >>>>
> >>>>    1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
> >>>>    use this VM, so the security is not enough.
> >>>>
> >>>>    2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
> >>>>    other VMs that use the ivshmem 2.0 shared memory device, which also does not
> >>>>    meet our needs in terms of security.
> >>>
> >>> This is addressed by establishing separate links between VMs (modeled
> >>> with separate devices). That is a trade-off between simplicity of the
> >>> model and convenience, for sure.
> >>
> >> BTW, simplicity can also bring security because it reduces the trusted
> >> code base.
> >>
> >> Another feature of ivshmem-v2 is permitting direct access to essential
> >> resources of the device from /unprivileged/ userspace, including to the
> >> event triggering registers. Is your model designed for that as well? It
> >> not only permits VM-to-VM, it actually makes app-to-app (in VMs) very cheap.
> >
> > Yes, there are two actual application scenarios or design goals:
> >
> > * As we mentioned above, integrating with SMC inside the Linux kernel to achieve high-speed communication.
> > * Virtio-ISM also has an interface under /dev. Ordinary users can also directly
> >   obtain SHM region resources and share them with apps in other
> >   VMs.
> >
> > https://github.com/fengidri/linux-kernel-virtio-ism/commit/55a8ed21344e26f574dd81b0213b0d61d80e2ecb
> > https://github.com/fengidri/linux-kernel-virtio-ism/commit/6518739f9a9a36f25d5709da940b7a7938f8e0ee
> >
>
> An example of the missing detach notification in ISM.

Yes, I agree. I also think this is a good point. We should introduce
life-cycle status management in the next version, for example attach,
detach, and permission-change notifications.

I think using a virtqueue to receive these messages may be a more
appropriate way.

The current ISM spec defines a vq that can be used to receive SHM update
events. We can add some new event types, for example:

struct event {
	u32 ev_type; // update event, detach event or attach event
	......
}
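
As a rough sketch (the field names and event types below are placeholders made
up here, not the final spec), an event could carry the region token and the
peer that triggered it, in the usual virtio struct style:

struct virtio_ism_event {
	le32 ev_type;   /* e.g. VIRTIO_ISM_EV_UPDATE / _ATTACH / _DETACH / _PERM (placeholders) */
	le32 reserved;
	le64 token;     /* which ism region the event refers to */
	le64 peer_id;   /* dev_id of the peer that attached/detached, if applicable */
};

The driver would then demultiplex on ev_type and deliver attach/detach and
permission-change events to the owner of the region.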


>
> And the model of ivshmem-v2 permits syscall-free notification (sending,
> not IRQ-based receiving, obviously). On reception, it avoids one vmexit
> to throttle incoming events if you have continuously firing sender in
> combination with a non-reacting unprivileged receiver task.

I guess you are talking about the SHM update event. We did not introduce
a similar model in the ISM framework, because we expect the user to implement
such a mechanism inside the shared memory itself, as SMC does.

If the user uses the shared memory from user space, I think it is also convenient
to use part of the shared memory as a notification area.
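
Purely as an illustration (none of this is in the spec; RING_SIZE and the
layout are assumptions of the sketch), the start of a shared region could
carry the cursors and the rest could serve as the data ring:

#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE (64 * 1024)              /* assumed data area size for the sketch */

struct shm_notify_area {
	_Atomic uint64_t write_cursor;     /* advanced by the producer               */
	_Atomic uint64_t read_cursor;      /* advanced by the consumer               */
	char pad[48];                      /* keep the data ring cache-line aligned  */
	char data[RING_SIZE];              /* payload ring                           */
};

/* Producer side: copy the payload, then publish it with a release store so a
 * consumer polling write_cursor sees the data before the new cursor value.
 * The sketch assumes len fits before the wrap-around point. */
static void shm_notify_produce(struct shm_notify_area *a,
			       const void *buf, uint64_t len)
{
	uint64_t w = atomic_load_explicit(&a->write_cursor, memory_order_relaxed);

	memcpy(&a->data[w % RING_SIZE], buf, len);
	atomic_store_explicit(&a->write_cursor, w + len, memory_order_release);
}

The consumer simply polls (or sleeps and is woken by the device notification)
and advances read_cursor after consuming the data.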

>
> Would be great to combine the best of all worlds here. But specifically
> missing life-cycle management on the detach side makes the ISM not
> better than legacy ivshmem IMHO.

I will add such a mechanism in the next version.

Thanks.


>
> Jan
>
> --
> Siemens AG, Technology
> Competence Center Embedded Linux
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2022-11-24  2:32 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-17  7:47 [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device Xuan Zhuo
2022-10-17  7:47 ` [virtio-dev] [PATCH 1/2] Reserve device id for ISM device Xuan Zhuo
2022-10-17  7:47 ` [PATCH 2/2] virtio-ism: introduce new device virtio-ism Xuan Zhuo
2022-10-17  8:17 ` [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device Jason Wang
2022-10-17 12:26   ` Xuan Zhuo
2022-10-18  6:54     ` Jason Wang
2022-10-18  8:33       ` Gerry
2022-10-19  3:55         ` Jason Wang
2022-10-19  5:29           ` Gerry
2022-10-18  8:55       ` He Rongguang
2022-10-19  4:16         ` Jason Wang
2022-10-19  6:43       ` Xuan Zhuo
2022-10-19  8:01         ` Jason Wang
2022-10-19  8:03           ` Gerry
2022-10-19  8:14             ` Xuan Zhuo
2022-10-19  8:21             ` Dust Li
2022-10-19  9:08               ` Jason Wang
2022-10-19  9:10                 ` Xuan Zhuo
2022-10-19  9:15                   ` Jason Wang
2022-10-19  9:23                     ` Xuan Zhuo
2022-10-21  2:41                       ` Jason Wang
2022-10-21  2:53                         ` Gerry
2022-10-21  3:30                         ` Dust Li
2022-10-21  6:37                           ` Jason Wang
2022-10-21  9:26                             ` Dust Li
2022-10-19  8:13           ` Xuan Zhuo
2022-10-19  8:15             ` Xuan Zhuo
2022-10-19  9:11               ` Jason Wang
2022-10-19  9:15                 ` Xuan Zhuo
2022-10-21  2:42                   ` Jason Wang
2022-10-21  3:03                     ` Xuan Zhuo
2022-10-21  6:35                       ` Jason Wang
2022-10-18  3:15   ` dust.li
2022-10-18  7:29     ` Jason Wang
2022-10-19  2:34   ` Xuan Zhuo
2022-10-19  3:56     ` Jason Wang
2022-10-19  4:08       ` Xuan Zhuo
2022-10-19  4:36         ` Jason Wang
2022-10-19  6:02           ` Xuan Zhuo
2022-10-19  8:07             ` Tony Lu
2022-10-19  9:04               ` Jason Wang
2022-10-19  9:10                 ` Gerry
2022-10-19  9:13                   ` Jason Wang
2022-10-19 10:01                 ` Tony Lu
2022-10-21  2:47                   ` Jason Wang
2022-10-21  3:05                     ` Tony Lu
2022-10-21  3:07                       ` Jason Wang
2022-10-21  3:23                         ` Tony Lu
2022-10-21  3:09                       ` Jason Wang
2022-10-21  3:53                         ` Tony Lu
2022-10-21  4:54                           ` Dust Li
2022-10-21  5:13                             ` Tony Lu
2022-10-21  6:38                               ` Jason Wang
2022-10-19  4:30       ` Xuan Zhuo
2022-10-19  5:10         ` Jason Wang
2022-10-19  6:13           ` Xuan Zhuo
2022-10-18  7:32 ` Jan Kiszka
2022-11-14 21:30   ` Jan Kiszka
2022-11-16  2:13     ` Xuan Zhuo
2022-11-23 15:27       ` Jan Kiszka
2022-11-24  2:32         ` Xuan Zhuo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.