Subject: Re: [External] Re: [RFC 0/4] Introduce VDUSE - vDPA Device in Userspace
To: Yongji Xie
Cc: "Michael S. Tsirkin", akpm@linux-foundation.org, linux-mm@kvack.org, virtualization@lists.linux-foundation.org
References: <20201019145623.671-1-xieyongji@bytedance.com> <6cff5900-42ee-a0f5-0d5f-9383646c27d9@redhat.com>
From: Jason Wang
Message-ID: <427448f0-58ba-0730-d199-6c8cd818ea63@redhat.com>
Date: Fri, 23 Oct 2020 16:44:45 +0800

On 2020/10/23 10:55 AM, Yongji Xie wrote:
>
> On Tue, Oct 20, 2020 at 5:13 PM Jason Wang wrote:
>
>     On 2020/10/20 4:35 PM, Yongji Xie wrote:
>     >
>     > On Tue, Oct 20, 2020 at 4:01 PM Jason Wang wrote:
>     >
>     >     On 2020/10/20 3:39 PM, Yongji Xie wrote:
>     >     >
>     >     > On Tue, Oct 20, 2020 at 11:20 AM Jason Wang wrote:
>     >     >
>     >     >     On 2020/10/19 10:56 PM, Xie Yongji wrote:
>     >     >     > This series introduces a framework that can be used to implement
>     >     >     > vDPA devices in a userspace program.
>     >     >     > To implement it, the work consists of two parts: control path
>     >     >     > emulation and data path offloading.
>     >     >     >
>     >     >     > In the control path, the VDUSE driver uses a message mechanism
>     >     >     > to forward the actions (get/set features, get/set status,
>     >     >     > get/set config space and set virtqueue states) from the
>     >     >     > virtio-vdpa driver to userspace. Userspace can use read()/write()
>     >     >     > to receive and reply to those control messages.
>     >     >     >
>     >     >     > In the data path, the VDUSE driver implements an MMU-based
>     >     >     > on-chip IOMMU driver which supports both direct mapping and
>     >     >     > indirect mapping with a bounce buffer. Userspace can then access
>     >     >     > that IOVA space via mmap(). In addition, the eventfd mechanism
>     >     >     > is used to trigger interrupts and forward virtqueue kicks.
>     >     >
>     >     >     This is pretty interesting!
>     >     >
>     >     >     For vhost-vdpa, it should work, but for virtio-vdpa, I think we
>     >     >     should carefully deal with the IOMMU/DMA ops stuff.
>     >     >
>     >     >     I notice that neither dma_map nor set_map is implemented in
>     >     >     vduse_vdpa_config_ops. This means you want to let vhost-vDPA deal
>     >     >     with the IOMMU domain stuff. Any reason for doing that?
>     >     >
>     >     > Actually, this series only focuses on the virtio-vdpa case for now.
>     >     > To support vhost-vdpa, as you said, we need to implement
>     >     > dma_map/dma_unmap. But there is a limitation: the VM's memory can't
>     >     > be anonymous pages, which are forbidden in vm_insert_page(). Maybe
>     >     > we need to add some limits on vhost-vdpa?
>     >
>     >     I'm not sure I get this. Any reason that you want to use
>     >     vm_insert_page() on the VM's memory? Or do you mean you want to
>     >     implement some kind of zero-copy?
>     >
>     > If my understanding is right, we will have a QEMU (VM) process and a
>     > device emulation process in the vhost-vdpa case, right? When I/O
>     > happens, the virtio driver in the VM will put the IOVA into the vring
>     > and the device emulation process will get the IOVA from the vring. Then
>     > the device emulation process will translate the IOVA to its VA to access
>     > the DMA buffer, which resides in the VM's memory. That means the device
>     > emulation process needs to access the VM's memory, so we should use
>     > vm_insert_page() to build the page table of the device emulation process.
>
>     Ok, I get you now.
>     So it looks to me that the real issue is not the limitation on
>     anonymous pages; rather, see the comment above vm_insert_page():
>
>     "
>      * The page has to be a nice clean _individual_ kernel allocation.
>     "
>
>     So I suspect that using vm_insert_page() to share pages between
>     processes is legal. We need inputs from MM experts.
>
> Yes, vm_insert_page() can't be used in this case. So could we add the
> shmfd into the vhost iotlb msg and pass it to the device emulation
> process as a new iova_domain, just like vhost-user does?
>
> Thanks,
> Yongji

I think vhost-user did that via SET_MEM_TABLE, which is not supported by
vDPA. Note that the current IOTLB message will be used when vIOMMU is
enabled.

This needs more thought. I will come back if I have any thoughts.

Thanks

>
>     >
>     >     I guess from the software device implementation in user space it
>     >     only needs to receive IOVA ranges and map them in its own address
>     >     space.
>     >
>     > How to map them in its own address space if we don't use
>     > vm_insert_page()?
>
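
To make the "receive IOVA ranges and map them in its own address space"
idea concrete, here is a rough userspace-only sketch. It is not code from
the RFC and it does not use any vDPA or vhost interface. It assumes the
device emulation process has somehow been handed a shared-memory fd plus
an (iova, size) pair, which the thread only proposes; IovaTable, IovaRegion
and make_shm are hypothetical names, and Python is used purely for
illustration. The process maps the fd itself with mmap(), so no
vm_insert_page() is involved, and then translates an IOVA to an offset in
its own mapping.

```python
import mmap
import os
import tempfile
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class IovaRegion:
    iova: int        # start of the range in I/O virtual address space
    size: int        # length of the range in bytes
    view: mmap.mmap  # this process's own mapping of the backing memory


class IovaTable:
    """Hypothetical per-device table of IOVA ranges mapped by this process."""

    def __init__(self):
        self._regions = []  # kept sorted by start IOVA

    def add(self, iova, size, shm_fd, offset=0):
        # Map the shared-memory fd into our own address space.
        view = mmap.mmap(shm_fd, size, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE, offset=offset)
        self._regions.append(IovaRegion(iova, size, view))
        self._regions.sort(key=lambda r: r.iova)

    def translate(self, iova, length):
        """Return (mapping, offset) covering [iova, iova + length)."""
        starts = [r.iova for r in self._regions]
        i = bisect_right(starts, iova) - 1
        if i >= 0:
            r = self._regions[i]
            if iova + length <= r.iova + r.size:
                return r.view, iova - r.iova
        raise LookupError(f"IOVA range {iova:#x}+{length} is not mapped")


def make_shm(size):
    """Create an fd backed by shareable memory (stand-in for the shmfd)."""
    try:
        fd = os.memfd_create("guest-ram")
    except (AttributeError, OSError):
        # Fallback where memfd_create is unavailable: an unlinked temp file.
        with tempfile.TemporaryFile() as tmp:
            fd = os.dup(tmp.fileno())
    os.ftruncate(fd, size)
    return fd


SIZE = 2 * mmap.PAGESIZE
IOVA_BASE = 0x100000  # arbitrary example value

shm_fd = make_shm(SIZE)
guest_view = mmap.mmap(shm_fd, SIZE)  # stands in for the VM process's mapping

table = IovaTable()
table.add(iova=IOVA_BASE, size=SIZE, shm_fd=shm_fd)

guest_view[16:21] = b"hello"          # the "driver" fills a DMA buffer
view, off = table.translate(IOVA_BASE + 16, 5)
payload = bytes(view[off:off + 5])    # the "device" reads via its own mapping
```

Both mappings here live in one process only to keep the sketch
self-contained; in the scenario discussed above the fd would have to be
passed to a separate device emulation process, and how that is done is
exactly the open question.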