Subject: Re: [PATCH v5 08/11] vduse: Implement an MMU-based IOMMU driver
From: Jason Wang
To: Yongji Xie
Cc: Jens Axboe, Jonathan Corbet, kvm@vger.kernel.org, "Michael S. Tsirkin",
    netdev@vger.kernel.org, Randy Dunlap, Matthew Wilcox,
    virtualization@lists.linux-foundation.org, Christoph Hellwig, Bob Liu,
    bcrl@kvack.org, viro@zeniv.linux.org.uk, Stefan Hajnoczi,
    linux-fsdevel@vger.kernel.org, Dan Carpenter, Mika Penttilä
Date: Fri, 26 Mar 2021 14:16:05 +0800
Message-ID: <75e3b941-dfd2-ebbc-d752-8f25c1f14cab@redhat.com>
References: <20210315053721.189-1-xieyongji@bytedance.com>
 <20210315053721.189-9-xieyongji@bytedance.com>
 <7c90754b-681d-f3bf-514c-756abfcf3d23@redhat.com>
 <2db71996-037e-494d-6ef0-de3ff164d3c3@redhat.com>


On 2021/3/26 1:14 PM, Yongji Xie wrote:
>>>>>>> +     }
>>>>>>> +     map->bounce_page = page;
>>>>>>> +
>>>>>>> +     /* paired with vduse_domain_map_page() */
>>>>>>> +     smp_mb();
>>>>>> So this is suspicious. It's better to explain it like: we need to make
>>>>>> sure A is done after B.
>>>>> OK. I see. It's used to protect this pattern:
>>>>>
>>>>>     vduse_domain_alloc_bounce_page:         vduse_domain_map_page:
>>>>>         write map->bounce_page                  write map->orig_phys
>>>>>         mb()                                    mb()
>>>>>         read map->orig_phys                     read map->bounce_page
>>>>>
>>>>> This makes sure there is always a path that does the bouncing.
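
Just for reference, a minimal sketch of that pairing as I read it (kernel-style
sketch, illustrative names; do_bounce() only stands in for the actual copy
between orig_phys and the bounce page, this is not the patch code):

struct bounce_entry {
	struct page *bounce_page;	/* published by the page-fault path */
	unsigned long orig_phys;	/* published by the dma_map path */
};

/* placeholder for the copy between orig_phys and bounce_page */
static void do_bounce(struct bounce_entry *e) { }

/* page-fault side, cf. vduse_domain_alloc_bounce_page() */
static void fault_side(struct bounce_entry *e, struct page *page)
{
	e->bounce_page = page;		/* publish our write ... */
	smp_mb();			/* ... full barrier ... */
	if (e->orig_phys)		/* ... then read the other side */
		do_bounce(e);
}

/* dma_map side, cf. vduse_domain_map_page() */
static void map_side(struct bounce_entry *e, unsigned long orig_phys)
{
	e->orig_phys = orig_phys;	/* publish our write ... */
	smp_mb();			/* ... pairs with the barrier above ... */
	if (e->bounce_page)		/* ... then read the other side */
		do_bounce(e);
}

Whichever side runs second is guaranteed to observe the other side's write,
so at least one of the two paths performs the bounce.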
>>>> Ok.


>>>>>> And it looks to me that the iotlb_lock is sufficient to do the
>>>>>> synchronization here. E.g. is there any reason you don't take it in
>>>>>> vduse_domain_map_bounce_page()?

>>>>> Yes, we can. But the performance in multi-queue cases will go down if
>>>>> we use iotlb_lock on this critical path.

>>>>>> And what's more, is there any way to avoid holding the spinlock
>>>>>> during the bouncing?

>>>>> Looks like we can't. In the case that multiple page faults happen on
>>>>> the same page, we should make sure the bouncing is done before any
>>>>> page fault handler returns.
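
To make that constraint concrete, a rough sketch of the fault path with the
copy done under the lock (illustrative structure and helper names, usual
mm/spinlock headers assumed, not the actual patch):

struct bounce_entry {
	spinlock_t lock;
	struct page *bounce_page;	/* filled on the first fault */
	unsigned long orig_phys;	/* set by the dma_map path */
};

static struct page *bounce_fault(struct bounce_entry *e)
{
	struct page *newpage = alloc_page(GFP_KERNEL);	/* fault path may sleep */

	if (!newpage)
		return NULL;

	spin_lock(&e->lock);
	if (!e->bounce_page) {
		/* copy the original buffer into the bounce page ("bouncing") */
		memcpy(page_address(newpage),
		       phys_to_virt(e->orig_phys), PAGE_SIZE);
		e->bounce_page = newpage;
		newpage = NULL;			/* ownership transferred */
	}
	spin_unlock(&e->lock);	/* copy is finished before any handler returns */

	if (newpage)		/* lost the race against a concurrent fault */
		__free_page(newpage);
	return e->bounce_page;
}

The second faulting CPU waits on the lock until the copy has finished, which
is exactly why the bouncing ends up inside the critical section.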
>>>> So it looks to me that all this extra complexity comes from the fact
>>>> that bounce_page and orig_phys are set in different places, so we need
>>>> to do the bouncing in two places.
>>>>
>>>> I wonder how much we can gain from the "lazy" bouncing in the page
>>>> fault handler. The buffer mapped via dma_ops from the virtio driver is
>>>> expected to be accessed by userspace soon. It looks to me we could do
>>>> all of that during dma_map(), and then things would be greatly
>>>> simplified.

>>> If so, we need to allocate lots of pages from the pool reserved for
>>> atomic memory allocation requests.
>> This should be fine; a lot of drivers try to allocate pages in atomic
>> context. The point is to simplify the code so that it's easy to determine
>> its correctness, and then we can add optimizations on top simply by
>> benchmarking the difference.
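
To make the trade-off concrete, the eager variant could look roughly like
this (illustrative, reusing the bounce_entry layout from the sketch above;
error handling, IOVA handling and the copy-back on unmap are all omitted):

static dma_addr_t bounce_map_page(struct bounce_entry *e, struct page *orig,
				  unsigned long offset, size_t size)
{
	/* dma_map may be called from atomic context, hence GFP_ATOMIC */
	struct page *page = alloc_page(GFP_ATOMIC);

	if (!page)
		return DMA_MAPPING_ERROR;

	/* bounce at map time, so the fault handler has nothing left to do */
	memcpy(page_address(page), page_address(orig) + offset, size);

	e->orig_phys = page_to_phys(orig) + offset;
	e->bounce_page = page;		/* no barrier pairing needed any more */

	return page_to_phys(page);	/* address handling simplified here */
}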

> OK. I will do it this way in the next version.

>> E.g. we have several places that access orig_phys:
>>
>> 1) map_page(), write
>> 2) unmap_page(), write
>> 3) page fault handler, read
>>
>> It's not clear to me how they are synchronized. Or if they are
>> synchronized implicitly (via the iova allocator?), we'd better document it.
> Yes.

>> Or simply use a spinlock (which is the preferable way I'd like to go). We
>> probably don't need to worry too much about the cost of the spinlock since
>> the iova allocator uses it heavily.

> Actually, the iova allocator implements a per-CPU cache to optimize it.
>
> Thanks,
> Yongji


Right, but having a quick glance, I guess what you mean is that usually
there's no lock contention except during CPU hot-plug. This can work, but
the problem is that such synchronization depends on the internal
implementation of the IOVA allocator, which is kind of fragile. I still
think we should do that on our own.
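
Just to spell out what "on our own" could look like, a rough sketch (again
using the locked bounce_entry from the earlier sketch, nothing taken from the
IOVA allocator internals):

static void bounce_set_orig(struct bounce_entry *e, unsigned long phys)
{
	spin_lock(&e->lock);
	e->orig_phys = phys;			/* 1) map_page(), write */
	spin_unlock(&e->lock);
}

static void bounce_clear_orig(struct bounce_entry *e)
{
	spin_lock(&e->lock);
	e->orig_phys = 0;			/* 2) unmap_page(), write */
	spin_unlock(&e->lock);
}

static bool bounce_orig_mapped(struct bounce_entry *e)
{
	bool mapped;

	spin_lock(&e->lock);
	mapped = e->orig_phys != 0;		/* 3) page fault handler, read */
	spin_unlock(&e->lock);

	return mapped;
}

That keeps the synchronization local to the bounce map and independent of
whatever the IOVA allocator does internally.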

Thanks