From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from lists.oasis-open.org (oasis-open.org [10.110.1.242]) by lists.oasis-open.org (Postfix) with ESMTP id C29E798656C; Wed, 19 Oct 2022 08:21:42 +0000 (UTC)
Date: Wed, 19 Oct 2022 16:21:36 +0800
From: Dust Li
Message-ID: <20221019082136.GA63658@linux.alibaba.com>
Reply-To: dust.li@linux.alibaba.com
References: <20221017074724.89569-1-xuanzhuo@linux.alibaba.com> <1666009602.9397366-1-xuanzhuo@linux.alibaba.com> <1666161802.3034256-2-xuanzhuo@linux.alibaba.com> <90A95AD3-DCC6-474C-A0E6-13347B13A2B3@linux.alibaba.com>
MIME-Version: 1.0
In-Reply-To: <90A95AD3-DCC6-474C-A0E6-13347B13A2B3@linux.alibaba.com>
Subject: Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
To: Gerry, Jason Wang
Cc: Xuan Zhuo, virtio-dev@lists.oasis-open.org, hans@linux.alibaba.com, herongguang@linux.alibaba.com, zmlcc@linux.alibaba.com, tonylu@linux.alibaba.com, zhenzao@linux.alibaba.com, helinguo@linux.alibaba.com, mst@redhat.com, cohuck@redhat.com, Stefan Hajnoczi

On Wed, Oct 19, 2022 at 04:03:42PM +0800, Gerry wrote:
>
>
>> On Oct 19, 2022, at 16:01, Jason Wang wrote:
>>
>> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo wrote:
>>>
>>> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang wrote:
>>>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo wrote:
>>>>>
>>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang wrote:
>>>>>> Adding Stefan.
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo wrote:
>>>>>>>
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> # Background
>>>>>>>
>>>>>>> Nowadays, there is a common scenario to accelerate communication between
>>>>>>> different VMs and containers, including lightweight virtual machine based
>>>>>>> containers. One way to achieve this is to colocate them on the same host.
>>>>>>> However, the performance of inter-VM communication through the network stack
>>>>>>> is not optimal and may also waste extra CPU cycles. This scenario has been
>>>>>>> discussed many times, but there is still no generic solution available
>>>>>>> [1] [2] [3].
>>>>>>>
>>>>>>> With a pci-ivshmem + SMC (Shared Memory Communications: [4]) based PoC [5],
>>>>>>> we found that by changing the communication channel between VMs from TCP to SMC
>>>>>>> with shared memory, we can achieve superior performance for a common
>>>>>>> socket-based application [5]:
>>>>>>> - latency reduced by about 50%
>>>>>>> - throughput increased by about 300%
>>>>>>> - CPU consumption reduced by about 50%
>>>>>>>
>>>>>>> Since no existing shared memory management solution matches the needs of
>>>>>>> SMC (see ## Comparison with existing technology), and virtio is the standard
>>>>>>> for communication in the virtualization world, we want to implement a
>>>>>>> virtio-ism device based on virtio, which can support on-demand memory sharing
>>>>>>> across VMs, containers or VM-container. To match the needs of SMC, the
>>>>>>> virtio-ism device needs to support:
>>>>>>>
>>>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
>>>>>>>    provisioned.
>>>>>>> 2. Multi-region management: the shared memory is divided into regions,
>>>>>>>    and a peer may allocate one or more regions from the same shared memory
>>>>>>>    device.
>>>>>>> 3. Permission control: the permission of each region can be set separately.
>>>>>>
>>>>>> Looks like virtio-ROCE
>>>>>>
>>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
>>>>>>
>>>>>> and virtio-vhost-user can satisfy the requirement?
>>>>>>
>>>>>>>
>>>>>>> # Virtio ism device
>>>>>>>
>>>>>>> ISM devices provide the ability to share memory between different guests on a
>>>>>>> host. Memory that a guest obtains from the ism device can be shared with
>>>>>>> multiple peers at the same time. This sharing relationship can be dynamically
>>>>>>> created and released.
>>>>>>>
>>>>>>> The shared memory obtained from the device is divided into multiple ism regions
>>>>>>> for sharing. The ISM device provides a mechanism to notify other ism region
>>>>>>> referrers of content update events.
>>>>>>>
>>>>>>> # Usage (SMC as example)
>>>>>>>
>>>>>>> Here is one possible use case:
>>>>>>>
>>>>>>> 1. SMC calls the interface ism_alloc_region() of the ism driver to return the
>>>>>>>    location of a memory region in the PCI space and a token.
>>>>>>> 2. The ism driver mmaps the memory region and returns it to SMC with the token.
>>>>>>> 3. SMC passes the token to the connected peer.
>>>>>>> 4. The peer calls the ism driver interface ism_attach_region(token) to
>>>>>>>    get the location of the PCI space of the shared memory.
>>>>>>>
>>>>>>>
>>>>>>> # About hot plugging of the ism device
>>>>>>>
>>>>>>> Hot plugging of devices is a heavier, possibly failed, time-consuming, and
>>>>>>> less scalable operation. So, we don't plan to support it for now.
>>>>>>>
>>>>>>> # Comparison with existing technology
>>>>>>>
>>>>>>> ## ivshmem or ivshmem 2.0 of Qemu
>>>>>>>
>>>>>>> 1. ivshmem 1.0 is a large piece of memory that can be seen by all devices that
>>>>>>>    use this VM, so the security is not enough.
>>>>>>>
>>>>>>> 2. ivshmem 2.0 is a shared memory belonging to a VM that can be read-only by all
>>>>>>>    other VMs that use the ivshmem 2.0 shared memory device, which also does not
>>>>>>>    meet our needs in terms of security.
>>>>>>>
>>>>>>> ## vhost-pci and virtiovhostuser
>>>>>>>
>>>>>>> Does not support dynamic allocation and is therefore not suitable for SMC.
>>>>>>
>>>>>> I think this is an implementation issue, we can support VHOST IOTLB
>>>>>> message then the regions could be added/removed on demand.
>>>>>
>>>>>
>>>>> 1. After the attacker connects with the victim, if the attacker does not
>>>>>    dereference memory, the memory will be occupied under virtiovhostuser. In the
>>>>>    case of ism devices, the victim can directly release the reference, and the
>>>>>    maliciously referenced region only occupies the attacker's resources
>>>>
>>>> Let's define the security boundary here. E.g. do we trust the device or
>>>> not? If yes, in the case of virtiovhostuser, can we simply do
>>>> VHOST_IOTLB_UNMAP then we can safely release the memory from the
>>>> attacker.
>>>>
>>>>>
>>>>> 2. The ism device of a VM can be shared with multiple (1000+) VMs at the same
>>>>>    time, which is a challenge for virtiovhostuser
>>>>
>>>> Please elaborate more on the challenges, anything make
>>>> virtiovhostuser different?
>>>
>>> I understand (please point out any mistakes), one vvu device corresponds to one
>>> vm. If we share memory with 1000 vm, do we have 1000 vvu devices?
>>
>> There could be some misunderstanding here. With 1000 VM, you still
>> need 1000 virtio-ism devices I think.
> We are trying to achieve one virtio-ism device per vm instead of one virtio-ism device per SMC connection.

I think we must achieve this if we want to meet the requirements of SMC.
In SMC, an SMC socket (corresponding to a TCP socket) needs 2 memory
regions (1 for Tx and 1 for Rx).
So if we have 1K TCP connections, we'll need 2K shared memory regions, and
those memory regions are dynamically allocated and freed with the TCP socket.

>
>>
>>>
>>>
>>>>
>>>>>
>>>>> 3. The sharing relationship of ism is dynamically increased, and virtiovhostuser
>>>>>    determines the sharing relationship at startup.
>>>>
>>>> Not necessarily with IOTLB API?
>>>
>>> Unlike virtio-vhost-user, which shares the memory of a vm with another vm, we
>>> provide the same memory on the host to two vms. So the implementation of this
>>> part will be much simpler. This is why we gave up virtio-vhost-user at the
>>> beginning.
>>
>> Ok, just to make sure we're on the same page. From spec level,
>> virtio-vhost-user doesn't (can't) limit the backend to be implemented
>> in another VM. So it should be ok to be used for sharing memory
>> between a guest and host.
>>
>> Thanks
>>
>>>
>>> Thanks.
>>>
>>>
>>>>
>>>>>
>>>>> 4. For security issues, the device under virtiovhostuser may mmap more memory,
>>>>>    while ism only maps one region to other devices
>>>>
>>>> With VHOST_IOTLB_MAP, the map could be done per region.
>>>>
>>>> Thanks
>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>>
>>>>>>> # Design
>>>>>>>
>>>>>>> This is a structure diagram based on ism sharing between two vms.
>>>>>>>
>>>>>>> |-------------------------------------------------------------------------------------------------------------|
>>>>>>> | |------------------------------------------------|       |------------------------------------------------| |
>>>>>>> | | Guest                                          |       | Guest                                          | |
>>>>>>> | |                                                |       |                                                | |
>>>>>>> | |   ----------------                             |       |   ----------------                             | |
>>>>>>> | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |            [M2]   [M3]      | |
>>>>>>> | |   ----------------      |      |      |        |       |   ----------------             |      |        | |
>>>>>>> | |    |cq|                 |map   |map   |map     |       |    |cq|                        |map   |map     | |
>>>>>>> | |    |  |                 |      |      |        |       |    |  |                        |      |        | |
>>>>>>> | |    |  |               -------------------      |       |    |  |              --------------------      | |
>>>>>>> | |----|--|---------------|  device memory  |------|       |----|--|--------------|  device memory   |------| |
>>>>>>> | |    |  |               -------------------      |       |    |  |              --------------------      | |
>>>>>>> | |                                |               |       |                               |                | |
>>>>>>> | |                                |               |       |                               |                | |
>>>>>>> | | Qemu                           |               |       | Qemu                          |                | |
>>>>>>> | |--------------------------------+---------------|       |-------------------------------+----------------| |
>>>>>>> |                                  |                                                       |                  |
>>>>>>> |                                  |                                                       |                  |
>>>>>>> |                                  |------------------------------+------------------------|                  |
>>>>>>> |                                                                 |                                           |
>>>>>>> |                                                                 |                                           |
>>>>>>> |                                                    --------------------------                               |
>>>>>>> |                                                     | M1 |   | M2 |   | M3 |                                |
>>>>>>> |                                                    --------------------------                               |
>>>>>>> |                                                                                                             |
>>>>>>> | HOST                                                                                                        |
>>>>>>> ---------------------------------------------------------------------------------------------------------------
>>>>>>>
>>>>>>> # POC code
>>>>>>>
>>>>>>> Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>>>>>>> Qemu: https://github.com/fengidri/qemu/commits/ism
>>>>>>>
>>>>>>> If there are any problems, please point them out.
>>>>>>>
>>>>>>> Hope to hear from you, thank you.
>>>>>>>
>>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
>>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>>>>>>> [4] https://lwn.net/Articles/711071/
>>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>>>>>>>
>>>>>>>
>>>>>>> Xuan Zhuo (2):
>>>>>>>   Reserve device id for ISM device
>>>>>>>   virtio-ism: introduce new device virtio-ism
>>>>>>>
>>>>>>>  content.tex    |   3 +
>>>>>>>  virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>  2 files changed, 343 insertions(+)
>>>>>>>  create mode 100644 virtio-ism.tex
>>>>>>>
>>>>>>> --
>>>>>>> 2.32.0.3.g01195cf9f
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
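
To make the region count discussed in this thread concrete, here is a minimal
Python sketch of one device per VM serving many SMC sockets. It is only a toy
model under stated assumptions, not the driver or the spec: the class names,
the integer tokens, free_region() and regions_for() are hypothetical, and it
assumes each socket allocates both of its regions (Tx and Rx) on its own
device. Only the alloc/attach naming follows ism_alloc_region() and
ism_attach_region(token) from the cover letter.

```python
import itertools


class IsmDevice:
    """Toy stand-in for one virtio-ism device (one per VM). Hypothetical model."""

    def __init__(self):
        self._next_token = itertools.count(1)
        self.regions = {}  # token -> set of peers attached to that region

    def alloc_region(self):
        """Allocate a region on demand and return its token."""
        token = next(self._next_token)
        self.regions[token] = set()
        return token

    def attach_region(self, token, peer):
        """Record that `peer` attached to the region named by `token`."""
        self.regions[token].add(peer)

    def free_region(self, token):
        """Release a region when its socket is closed (assumed helper)."""
        del self.regions[token]


class SmcSocket:
    """One SMC socket: one Tx region plus one Rx region (assumed ownership)."""

    def __init__(self, dev, peer):
        self.dev = dev
        self.tx = dev.alloc_region()
        self.rx = dev.alloc_region()
        # In the real flow the tokens are passed to the peer, which attaches.
        dev.attach_region(self.tx, peer)
        dev.attach_region(self.rx, peer)

    def close(self):
        self.dev.free_region(self.tx)
        self.dev.free_region(self.rx)


def regions_for(connections):
    """Open `connections` sockets on a single device.

    Returns (live regions while open, live regions after closing all).
    """
    dev = IsmDevice()
    socks = [SmcSocket(dev, peer=f"vm{i}") for i in range(connections)]
    live = len(dev.regions)
    for sock in socks:
        sock.close()
    return live, len(dev.regions)


if __name__ == "__main__":
    print(regions_for(1000))
```

With 1000 connections the single device holds 2000 live regions, and none
remain once every socket is closed, which is the per-socket dynamic
allocation and release that a per-connection device would not give us.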