From: Yongji Xie
Date: Fri, 25 Dec 2020 19:36:28 +0800
Subject: Re: Re: [RFC v2 09/13] vduse: Add support for processing vhost iotlb message
To: Jason Wang
Cc: "Michael S. Tsirkin", Stefan Hajnoczi, sgarzare@redhat.com,
 Parav Pandit, akpm@linux-foundation.org, Randy Dunlap, Matthew Wilcox,
 viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net,
 virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
 kvm@vger.kernel.org, linux-aio@kvack.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org
References: <20201222145221.711-1-xieyongji@bytedance.com>
 <20201222145221.711-10-xieyongji@bytedance.com>
 <6818a214-d587-4f0b-7de6-13c4e7e94ab6@redhat.com>
 <595fe7d6-7876-26e4-0b7c-1d63ca6d7a97@redhat.com>

On Fri, Dec 25, 2020 at 3:02 PM Jason Wang wrote:
>
>
> On 2020/12/25 10:37 AM, Yongji Xie wrote:
> > On Thu, Dec 24, 2020 at 3:37 PM Yongji Xie wrote:
> >> On Thu, Dec 24, 2020 at 10:41 AM Jason Wang wrote:
> >>>
> >>> On 2020/12/23 8:14 PM, Yongji Xie wrote:
> >>>> On Wed, Dec 23, 2020 at 5:05 PM Jason Wang wrote:
> >>>>> On 2020/12/22 10:52 PM, Xie Yongji wrote:
> >>>>>> To support vhost-vdpa bus driver, we need a way to share the
> >>>>>> vhost-vdpa backend process's memory with the userspace VDUSE process.
> >>>>>>
> >>>>>> This patch tries to make use of the vhost iotlb message to achieve
> >>>>>> that. We will get the shm file from the iotlb message and pass it
> >>>>>> to the userspace VDUSE process.
> >>>>>>
> >>>>>> Signed-off-by: Xie Yongji
> >>>>>> ---
> >>>>>>  Documentation/driver-api/vduse.rst |  15 +++-
> >>>>>>  drivers/vdpa/vdpa_user/vduse_dev.c | 147 ++++++++++++++++++++++++++++++++++++-
> >>>>>>  include/uapi/linux/vduse.h         |  11 +++
> >>>>>>  3 files changed, 171 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/Documentation/driver-api/vduse.rst b/Documentation/driver-api/vduse.rst
> >>>>>> index 623f7b040ccf..48e4b1ba353f 100644
> >>>>>> --- a/Documentation/driver-api/vduse.rst
> >>>>>> +++ b/Documentation/driver-api/vduse.rst
> >>>>>> @@ -46,13 +46,26 @@ The following types of messages are provided by the VDUSE framework now:
> >>>>>>
> >>>>>>  - VDUSE_GET_CONFIG: Read from device specific configuration space
> >>>>>>
> >>>>>> +- VDUSE_UPDATE_IOTLB: Update the memory mapping in device IOTLB
> >>>>>> +
> >>>>>> +- VDUSE_INVALIDATE_IOTLB: Invalidate the memory mapping in device IOTLB
> >>>>>> +
> >>>>>>  Please see include/linux/vdpa.h for details.
> >>>>>>
> >>>>>> -In the data path, VDUSE framework implements a MMU-based on-chip IOMMU
> >>>>>> +The data path of userspace vDPA device is implemented in different ways
> >>>>>> +depending on the vdpa bus to which it is attached.
> >>>>>> +
> >>>>>> +In virtio-vdpa case, VDUSE framework implements a MMU-based on-chip IOMMU
> >>>>>>  driver which supports mapping the kernel dma buffer to a userspace iova
> >>>>>>  region dynamically. The userspace iova region can be created by passing
> >>>>>>  the userspace vDPA device fd to mmap(2).
> >>>>>>
> >>>>>> +In vhost-vdpa case, the dma buffer is reside in a userspace memory region
> >>>>>> +which will be shared to the VDUSE userspace processs via the file
> >>>>>> +descriptor in VDUSE_UPDATE_IOTLB message. And the corresponding address
> >>>>>> +mapping (IOVA of dma buffer <-> VA of the memory region) is also included
> >>>>>> +in this message.
> >>>>>> +
> >>>>>>  Besides, the eventfd mechanism is used to trigger interrupt callbacks and
> >>>>>>  receive virtqueue kicks in userspace. The following ioctls on the userspace
> >>>>>>  vDPA device fd are provided to support that:
> >>>>>> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> >>>>>> index b974333ed4e9..d24aaacb6008 100644
> >>>>>> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> >>>>>> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> >>>>>> @@ -34,6 +34,7 @@
> >>>>>>
> >>>>>>  struct vduse_dev_msg {
> >>>>>>      struct vduse_dev_request req;
> >>>>>> +    struct file *iotlb_file;
> >>>>>>      struct vduse_dev_response resp;
> >>>>>>      struct list_head list;
> >>>>>>      wait_queue_head_t waitq;
> >>>>>> @@ -325,12 +326,80 @@ static int vduse_dev_set_vq_state(struct vduse_dev *dev,
> >>>>>>      return ret;
> >>>>>>  }
> >>>>>>
> >>>>>> +static int vduse_dev_update_iotlb(struct vduse_dev *dev, struct file *file,
> >>>>>> +                                  u64 offset, u64 iova, u64 size, u8 perm)
> >>>>>> +{
> >>>>>> +    struct vduse_dev_msg *msg;
> >>>>>> +    int ret;
> >>>>>> +
> >>>>>> +    if (!size)
> >>>>>> +        return -EINVAL;
> >>>>>> +
> >>>>>> +    msg = vduse_dev_new_msg(dev, VDUSE_UPDATE_IOTLB);
> >>>>>> +    msg->req.size = sizeof(struct vduse_iotlb);
> >>>>>> +    msg->req.iotlb.offset = offset;
> >>>>>> +    msg->req.iotlb.iova = iova;
> >>>>>> +    msg->req.iotlb.size = size;
> >>>>>> +    msg->req.iotlb.perm = perm;
> >>>>>> +    msg->req.iotlb.fd = -1;
> >>>>>> +    msg->iotlb_file = get_file(file);
> >>>>>> +
> >>>>>> +    ret = vduse_dev_msg_sync(dev, msg);
> >>>>> My feeling is that we should provide a consistent API for the userspace
> >>>>> device to use.
> >>>>>
> >>>>> E.g. we'd better carry the IOTLB message for both virtio/vhost drivers.
> >>>>>
> >>>>> It looks to me that for virtio drivers we can still use the UPDATE_IOTLB
> >>>>> message by using the VDUSE file as msg->iotlb_file here.
> >>>>>
> >>>> It's OK for me.
> >>>> One problem is when to transfer the UPDATE_IOTLB message in virtio
> >>>> cases.
> >>>
> >>> Instead of generating IOTLB messages for userspace, how about
> >>> recording the mappings (which is a common case for devices that have
> >>> an on-chip IOMMU, e.g. mlx5e and the vdpa simulator)? Then we can
> >>> introduce an ioctl for userspace to query them.
> >>>
> >> If so, the IOTLB UPDATE is actually triggered by the ioctl, but
> >> IOTLB_INVALIDATE is triggered by the message. Is that a little odd? Or
> >> how about triggering it when userspace calls mmap() on the device fd?
> >>
> > Oh sorry, it looks like mmap() needs to be called in the IOTLB UPDATE
> > message handler. Is it possible for the vdpa device to know which vdpa
> > bus it is attached to?
>
>
> We'd better not. It's kind of a layer violation.
>

OK. Now I think both the ioctl and the message are needed. The ioctl is
useful when the VDUSE userspace daemon is restarted and has to query the
recorded mappings again, and the IOTLB_UPDATE message could be generated
during the first DMA mapping in the virtio-vdpa case.

Thanks,
Yongji
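
To make the combined scheme a bit more concrete, here is a rough
userspace model of it. This is only a sketch under my own assumptions, not
the VDUSE kernel code: every demo_* name is hypothetical, the table is a
fixed-size array, and a counter stands in for queueing an IOTLB_UPDATE
message. The device records each mapping, counts a message only the first
time a range is mapped, and lets a restarted daemon look mappings up
through a query call that plays the role of the ioctl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ENTRIES 16

/* Hypothetical record of one IOVA mapping; not the vduse_iotlb UAPI struct. */
struct demo_iotlb_entry {
    uint64_t iova;   /* start of the IOVA range */
    uint64_t size;   /* length of the range in bytes */
    uint64_t offset; /* offset into the backing file */
    uint8_t perm;    /* access permission bits */
};

/* Hypothetical per-device state: recorded mappings plus a message counter. */
struct demo_dev {
    struct demo_iotlb_entry entries[DEMO_MAX_ENTRIES];
    size_t nr_entries;
    unsigned int nr_update_msgs; /* stands in for queued IOTLB_UPDATE messages */
};

/* Query path (the "ioctl"): find the recorded mapping that covers iova. */
static const struct demo_iotlb_entry *
demo_query_iotlb(const struct demo_dev *dev, uint64_t iova)
{
    for (size_t i = 0; i < dev->nr_entries; i++) {
        const struct demo_iotlb_entry *e = &dev->entries[i];

        if (iova >= e->iova && iova - e->iova < e->size)
            return e;
    }
    return NULL;
}

/*
 * Message path: called on a DMA mapping. Returns 1 if the mapping is new
 * (it is recorded and one update message is counted), 0 if iova already
 * falls inside a recorded mapping, and -1 on a zero size or a full table.
 */
static int demo_dma_map(struct demo_dev *dev, uint64_t iova, uint64_t size,
                        uint64_t offset, uint8_t perm)
{
    if (!size)
        return -1; /* same idea as the !size check in the quoted patch */
    if (demo_query_iotlb(dev, iova))
        return 0;  /* already announced: no second message */
    if (dev->nr_entries == DEMO_MAX_ENTRIES)
        return -1;

    dev->entries[dev->nr_entries++] =
        (struct demo_iotlb_entry){ iova, size, offset, perm };
    dev->nr_update_msgs++;
    assert(dev->nr_entries <= DEMO_MAX_ENTRIES);
    return 1;
}
```

In this model a running daemon learns about a mapping from the single
message counted in nr_update_msgs, while a daemon that was restarted and
missed that message can call demo_query_iotlb() to recover it. Invalidation
and overlapping ranges are left out on purpose.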