From: Gerry
Subject: Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
Date: Fri, 21 Oct 2022 10:53:57 +0800
To: Jason Wang
Cc: Xuan Zhuo, virtio-dev@lists.oasis-open.org, hans@linux.alibaba.com, herongguang@linux.alibaba.com, zmlcc@linux.alibaba.com, tonylu@linux.alibaba.com, zhenzao@linux.alibaba.com, helinguo@linux.alibaba.com, mst@redhat.com, cohuck@redhat.com, Stefan Hajnoczi, dust.li@linux.alibaba.com

> On Oct 21, 2022, at 10:41, Jason Wang wrote:
>
> On Wed, Oct 19, 2022 at 5:27 PM Xuan Zhuo wrote:
>>
>> On Wed, 19 Oct 2022 17:15:23 +0800, Jason Wang wrote:
>>> On Wed, Oct 19, 2022 at 5:12 PM Xuan Zhuo wrote:
>>>>
>>>> On Wed, 19 Oct 2022 17:08:29 +0800, Jason Wang wrote:
>>>>> On Wed, Oct 19, 2022 at 4:21 PM Dust Li wrote:
>>>>>>
>>>>>> On Wed, Oct 19, 2022 at 04:03:42PM +0800, Gerry wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Oct 19, 2022, at 16:01, Jason Wang wrote:
>>>>>>>>
>>>>>>>> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo wrote:
>>>>>>>>>
>>>>>>>>> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang wrote:
>>>>>>>>>> On Mon, Oct 17,
2022 at 8:31 PM Xuan Zhuo wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang wrote:
>>>>>>>>>>>> Adding Stefan.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Background
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nowadays, there is a common scenario of accelerating communication between
>>>>>>>>>>>>> different VMs and containers, including lightweight virtual-machine-based
>>>>>>>>>>>>> containers. One way to achieve this is to colocate them on the same host.
>>>>>>>>>>>>> However, the performance of inter-VM communication through the network stack
>>>>>>>>>>>>> is not optimal and may also waste extra CPU cycles. This scenario has been
>>>>>>>>>>>>> discussed many times, but there is still no generic solution available [1] [2] [3].
>>>>>>>>>>>>>
>>>>>>>>>>>>> With a pci-ivshmem + SMC (Shared Memory Communications [4]) based PoC [5],
>>>>>>>>>>>>> we found that by changing the communication channel between VMs from TCP to
>>>>>>>>>>>>> SMC with shared memory, we can achieve superior performance for a common
>>>>>>>>>>>>> socket-based application [5]:
>>>>>>>>>>>>>   - latency reduced by about 50%
>>>>>>>>>>>>>   - throughput increased by about 300%
>>>>>>>>>>>>>   - CPU consumption reduced by about 50%
>>>>>>>>>>>>>
>>>>>>>>>>>>> Since there is no particularly suitable shared memory management solution
>>>>>>>>>>>>> that matches the needs of SMC (see "## Comparison with existing technology"),
>>>>>>>>>>>>> and virtio is the standard for communication in the virtualization world, we
>>>>>>>>>>>>> want to implement a virtio-ism device based on virtio, which can support
>>>>>>>>>>>>> on-demand memory sharing across VMs, containers, or VM-container. To match
>>>>>>>>>>>>> the needs of SMC, the virtio-ism device needs to support:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1.
Dynamic provision: shared memory regions are dynamically allocated and
>>>>>>>>>>>>>    provisioned.
>>>>>>>>>>>>> 2. Multi-region management: the shared memory is divided into regions,
>>>>>>>>>>>>>    and a peer may allocate one or more regions from the same shared memory
>>>>>>>>>>>>>    device.
>>>>>>>>>>>>> 3. Permission control: the permission of each region can be set separately.
>>>>>>>>>>>>
>>>>>>>>>>>> Looks like virtio-ROCE
>>>>>>>>>>>>
>>>>>>>>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
>>>>>>>>>>>>
>>>>>>>>>>>> and virtio-vhost-user can satisfy the requirement?
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Virtio ism device
>>>>>>>>>>>>>
>>>>>>>>>>>>> ISM devices provide the ability to share memory between different guests on
>>>>>>>>>>>>> a host. Memory that a guest gets from an ism device can be shared with
>>>>>>>>>>>>> multiple peers at the same time. This sharing relationship can be
>>>>>>>>>>>>> dynamically created and released.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The shared memory obtained from the device is divided into multiple ism
>>>>>>>>>>>>> regions for sharing. The ISM device provides a mechanism to notify other
>>>>>>>>>>>>> ism region referrers of content update events.
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Usage (SMC as example)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is one possible use case:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return
>>>>>>>>>>>>>    the location of a memory region in the PCI space and a token.
>>>>>>>>>>>>> 2. The ism driver mmaps the memory region and returns the token to SMC.
>>>>>>>>>>>>> 3. SMC passes the token to the connected peer.
>>>>>>>>>>>>> 4.
the peer calls the ism driver interface ism_attach_region(token) to
>>>>>>>>>>>>>    get the location of the PCI space of the shared memory.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> # About hot plugging of the ism device
>>>>>>>>>>>>>
>>>>>>>>>>>>>   Hot plugging of devices is a heavyweight, possibly failing, time-consuming,
>>>>>>>>>>>>>   and less scalable operation, so we don't plan to support it for now.
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Comparison with existing technology
>>>>>>>>>>>>>
>>>>>>>>>>>>> ## ivshmem or ivshmem 2.0 of QEMU
>>>>>>>>>>>>>
>>>>>>>>>>>>>   1. ivshmem 1.0 exposes one large piece of memory that can be seen by all
>>>>>>>>>>>>>   VMs that use the device, so the security is not sufficient.
>>>>>>>>>>>>>
>>>>>>>>>>>>>   2. ivshmem 2.0 is shared memory belonging to one VM that is read-only for
>>>>>>>>>>>>>   all other VMs that use the ivshmem 2.0 shared memory device, which also
>>>>>>>>>>>>>   does not meet our needs in terms of security.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ## vhost-pci and virtiovhostuser
>>>>>>>>>>>>>
>>>>>>>>>>>>>   They do not support dynamic allocation and are therefore not suitable
>>>>>>>>>>>>>   for SMC.
>>>>>>>>>>>>
>>>>>>>>>>>> I think this is an implementation issue; if we support the VHOST IOTLB
>>>>>>>>>>>> message, the regions could be added/removed on demand.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 1. After the attacker connects with the victim, if the attacker does not
>>>>>>>>>>>    release its reference to the memory, the memory stays occupied under
>>>>>>>>>>>    virtiovhostuser. In the case of ism devices, the victim can directly
>>>>>>>>>>>    release the reference, and the maliciously referenced region only
>>>>>>>>>>>    occupies the attacker's resources.
>>>>>>>>>>
>>>>>>>>>> Let's define the security boundary here. E.g. do we trust the device or
>>>>>>>>>> not? If yes, in the case of virtiovhostuser, can we simply do
>>>>>>>>>> VHOST_IOTLB_UNMAP so that we can safely release the memory from the
>>>>>>>>>> attacker?
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the
>>>>>>>>>>>    same time, which is a challenge for virtiovhostuser.
>>>>>>>>>>
>>>>>>>>>> Please elaborate more on the challenges; does anything make
>>>>>>>>>> virtiovhostuser different?
>>>>>>>>>
>>>>>>>>> As I understand it (please point out any mistakes), one vvu device
>>>>>>>>> corresponds to one VM. If we share memory with 1000 VMs, do we have 1000
>>>>>>>>> vvu devices?
>>>>>>>>
>>>>>>>> There could be some misunderstanding here. With 1000 VMs, you still
>>>>>>>> need 1000 virtio-ism devices, I think.
>>>>>>> We are trying to achieve one virtio-ism device per VM instead of one
>>>>>>> virtio-ism device per SMC connection.
>>>>>
>>>>> I wonder if we need something to identify a virtio-ism device, since I
>>>>> guess there's still a chance to have multiple virtio-ism devices per VM
>>>>> (different service chains etc.).
>>>>
>>>> Yes, there will be such a situation: a VM has multiple virtio-ism devices.
>>>>
>>>> What exactly do you mean by "identify"?
>>>
>>> E.g. we can differentiate two virtio-net devices through the MAC address;
>>> do we need something similar for ism, or is it completely unnecessary
>>> (e.g. via a token or other means)?
>>
>> Currently, we have not encountered such a request.
>>
>> It is conceivable that all physical shared memory ism regions are indexed
>> by tokens. virtio-ism is a way to obtain these ism regions, so there is no
>> need to distinguish multiple virtio-ism devices under one VM on the host.
>
> So consider a case:
>
> VM1 shares ism1 with VM2
> VM1 shares ism2 with VM3
>
> How do applications/SMC address the different ism devices in this case?
> E.g. if VM1 wants to talk with VM3 it needs to populate regions in ism2,
> but how can the application or protocol know this, and how can a specific
> device be addressed (via BDF?)
It works in this way:

1) VM1/VM2/VM3 each has one ISM device, and each ISM device has a
   cryptographically secure random host id associated with it.
2) When VM1 tries to create a TCP connection with VM2, the associated host id
   is passed to VM2 through TCP options.
3) When VM2 finds that the host id matches, it assumes VM1/VM2 are on the
   same physical server.
4) VM2 then allocates a memory buffer from the device manager.
5) The device manager returns a buffer with an associated token.
6) VM2 sends the buffer token back through TCP options.
7) VM1 issues an attach-memory-buffer command to the device manager with the
   returned buffer token.
8) Now VM1 and VM2 have access to the same shared memory buffer.

If VM1 wants to build a new TCP connection with VM3, it goes through the same
process through the same ISM device and gets another memory buffer, shared by
VM1 and VM3 only.

>
> Thanks
>
>>
>> Thanks.
>>
>>>
>>> Thanks
>>>
>>>>
>>>> Thanks.
>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>>> I think we must achieve this if we want to meet the requirements of SMC.
>>>>>> In SMC, an SMC socket (corresponding to a TCP socket) needs 2 memory
>>>>>> regions (1 for Tx and 1 for Rx). So if we have 1K TCP connections,
>>>>>> we'll need 2K shared memory regions, and those memory regions are
>>>>>> dynamically allocated and freed with the TCP socket.
>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 3. The sharing relationship of ism is dynamically increased, and
>>>>>>>>>>>    virtiovhostuser determines the sharing relationship at startup.
>>>>>>>>>>
>>>>>>>>>> Not necessarily with the IOTLB API?
>>>>>>>>>
>>>>>>>>> Unlike virtio-vhost-user, which shares the memory of one VM with another
>>>>>>>>> VM, we provide the same memory on the host to two VMs. So the
>>>>>>>>> implementation of this part will be much simpler. This is why we gave up
>>>>>>>>> virtio-vhost-user at the beginning.
>>>>>>>>
>>>>>>>> OK, just to make sure we're on the same page: at the spec level,
>>>>>>>> virtio-vhost-user doesn't (can't) limit the backend to be implemented
>>>>>>>> in another VM. So it should be OK to use it for sharing memory
>>>>>>>> between a guest and a host.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 4. For security issues, the device under virtiovhostuser may mmap more
>>>>>>>>>>>    memory, while ism only maps one region to other devices.
>>>>>>>>>>
>>>>>>>>>> With VHOST_IOTLB_MAP, the map could be done per region.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>>> # Design
>>>>>>>>>>>>>
>>>>>>>>>>>>>   This is a structure diagram based on ism sharing between two VMs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> |---------------------------------------------------------------------------------------------|
>>>>>>>>>>>>> | |----------------------------------------|       |----------------------------------------| |
>>>>>>>>>>>>> | | Guest                                  |       | Guest                                  | |
>>>>>>>>>>>>> | |                                        |       |                                        | |
>>>>>>>>>>>>> | |  ------------                          |       |  ------------                          | |
>>>>>>>>>>>>> | |  | driver   |  [M1]  [M2]  [M3]        |       |  | driver   |        [M2]  [M3]        | |
>>>>>>>>>>>>> | |  ------------   |map  |map  |map       |       |  ------------         |map  |map       | |
>>>>>>>>>>>>> | |   |cq|          |     |     |          |       |   |cq|                |     |          | |
>>>>>>>>>>>>> | |   |  |      -----------------          |       |   |  |      -----------------          | |
>>>>>>>>>>>>> | |---|--|------| device memory |----------|       |---|--|------| device memory |----------| |
>>>>>>>>>>>>> | |   |  |      -----------------          |       |   |  |      -----------------          | |
>>>>>>>>>>>>> | |   |  |             |                   |       |   |  |             |                   | |
>>>>>>>>>>>>> | | Qemu               |                   |       | Qemu               |                   | |
>>>>>>>>>>>>> | |--------------------+-------------------|       |--------------------+-------------------| |
>>>>>>>>>>>>> |                      |                                                |                     |
>>>>>>>>>>>>> |                      |-----------------------+------------------------|                     |
>>>>>>>>>>>>> |                                              |                                              |
>>>>>>>>>>>>> |                                    ------  ------  ------                                   |
>>>>>>>>>>>>> |                                    | M1 |  | M2 |  | M3 |                                   |
>>>>>>>>>>>>> |                                    ------  ------  ------                                   |
>>>>>>>>>>>>> |                                                                                             |
>>>>>>>>>>>>> | HOST                                                                                        |
>>>>>>>>>>>>> -----------------------------------------------------------------------------------------------
>>>>>>>>>>>>>
>>>>>>>>>>>>> # POC code
>>>>>>>>>>>>>
>>>>>>>>>>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>>>>>>>>>>>>>   Qemu:   https://github.com/fengidri/qemu/commits/ism
>>>>>>>>>>>>>
>>>>>>>>>>>>> If there are any problems, please point them out.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hope to hear from you, thank you.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>>>>>>>>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
>>>>>>>>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>>>>>>>>>>>>> [4] https://lwn.net/Articles/711071/
>>>>>>>>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Xuan Zhuo (2):
>>>>>>>>>>>>>   Reserve device id for ISM device
>>>>>>>>>>>>>   virtio-ism: introduce new device virtio-ism
>>>>>>>>>>>>>
>>>>>>>>>>>>>  content.tex    |   3 +
>>>>>>>>>>>>>  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>  2 files changed, 343 insertions(+)
>>>>>>>>>>>>>  create mode 100644 virtio-ism.tex
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> 2.32.0.3.g01195cf9f
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>>>>>>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
