From: Yongji Xie
Date: Fri, 23 Oct 2020 10:55:44 +0800
Subject: Re: [External] Re: [RFC 0/4] Introduce VDUSE - vDPA Device in Userspace
To: Jason Wang
Cc: "Michael S. Tsirkin", akpm@linux-foundation.org, linux-mm@kvack.org,
    virtualization@lists.linux-foundation.org
References: <20201019145623.671-1-xieyongji@bytedance.com>
    <6cff5900-42ee-a0f5-0d5f-9383646c27d9@redhat.com>
In-Reply-To: <6cff5900-42ee-a0f5-0d5f-9383646c27d9@redhat.com>

On Tue, Oct 20, 2020 at 5:13 PM Jason Wang <jasowang@redhat.com> wrote:
> On 2020/10/20 4:35 PM, Yongji Xie wrote:
> > On Tue, Oct 20, 2020 at 4:01 PM Jason Wang <jasowang@redhat.com> wrote:
> > > On 2020/10/20 3:39 PM, Yongji Xie wrote:
> > > > On Tue, Oct 20, 2020 at 11:20 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > On 2020/10/19 10:56 PM, Xie Yongji wrote:
> > > > > > This series introduces a framework which can be used to
> > > > > > implement vDPA devices in a userspace program. To implement
> > > > > > it, the work consists of two parts: control path emulating
> > > > > > and data path offloading.
> > > > > >
> > > > > > In the control path, the VDUSE driver will make use of a
> > > > > > message mechanism to forward the actions (get/set features,
> > > > > > get/set status, get/set config space and set virtqueue
> > > > > > states) from the virtio-vdpa driver to userspace. Userspace
> > > > > > can use read()/write() to receive/reply to those control
> > > > > > messages.
> > > > > >
> > > > > > In the data path, the VDUSE driver implements an MMU-based
> > > > > > on-chip IOMMU driver which supports both direct mapping and
> > > > > > indirect mapping with a bounce buffer. Then userspace can
> > > > > > access that IOVA space via mmap(). Besides, the eventfd
> > > > > > mechanism is used to trigger interrupts and forward
> > > > > > virtqueue kicks.
> > > > >
> > > > > This is pretty interesting!
> > > > >
> > > > > For vhost-vdpa, it should work, but for virtio-vdpa, I think we
> > > > > should carefully deal with the IOMMU/DMA ops stuff.
> > > > >
> > > > > I notice that neither dma_map nor set_map is implemented in
> > > > > vduse_vdpa_config_ops, which means you want to let vhost-vDPA
> > > > > deal with the IOMMU domain stuff. Any reason for doing that?
> > > >
> > > > Actually, this series only focuses on the virtio-vdpa case for
> > > > now. To support vhost-vdpa, as you said, we need to implement
> > > > dma_map/dma_unmap. But there is a limit that the VM's memory
> > > > can't be anonymous pages, which are forbidden in
> > > > vm_insert_page(). Maybe we need to add some limits on vhost-vdpa?
> > >
> > > I'm not sure I get this. Any reason that you want to use
> > > vm_insert_page() on the VM's memory? Or do you mean you want to
> > > implement some kind of zero-copy?
> >
> > If my understanding is right, we will have a QEMU (VM) process and a
> > device emulation process in the vhost-vdpa case, right? When I/O
> > happens, the virtio driver in the VM will put the IOVA into the vring
> > and the device emulation process will get the IOVA from the vring.
> > Then the device emulation process will translate the IOVA to its VA
> > to access the DMA buffer, which resides in the VM's memory. That
> > means the device emulation process needs to access the VM's memory,
> > so we should use vm_insert_page() to build the page table of the
> > device emulation process.
>
> Ok, I get you now. So it looks to me that the real issue is not the
> limitation to anonymous pages; see the comment above vm_insert_page():
>
> "
>  * The page has to be a nice clean _individual_ kernel allocation.
> "
>
> So I suspect that using vm_insert_page() to share pages between
> processes is not legal. We need input from MM experts.

Yes, vm_insert_page() can't be used in this case. So could we add the
shmfd into the vhost iotlb msg and pass it to the device emulation
process as a new iova_domain, just like vhost-user does?

Thanks,
Yongji

> > > I guess from the software device implementation in user space it
> > > only needs to receive IOVA ranges and map them in its own address
> > > space.
> >
> > How to map them in its own address space if we don't use
> > vm_insert_page()?

