From: Zhen Lei
To: Joerg Roedel, iommu, Robin Murphy, David Woodhouse, Sudeep Dutt, Ashutosh Dixit, linux-kernel
CC: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei
Subject: [PATCH 0/7] iommu/iova: improve the allocation performance of dma64
Date: Wed, 22 Mar 2017 14:27:40 +0800
Message-ID: <1490164067-12552-1-git-send-email-thunder.leizhen@huawei.com>

64-bit devices are very common now, but currently we only define a
cached32_node to optimize the allocation performance of dma32. I also
saw that some dma64 drivers choose to allocate iovas from the dma32
space first, maybe because of the current dma64 performance problem or
for some other reason. For example (in drivers/iommu/amd_iommu.c):

static unsigned long dma_ops_alloc_iova(......
{
	......
	if (dma_mask > DMA_BIT_MASK(32))
		pfn = alloc_iova_fast(&dma_dom->iovad, pages,
				      IOVA_PFN(DMA_BIT_MASK(32)));

	if (!pfn)
		pfn = alloc_iova_fast(&dma_dom->iovad, pages,
				      IOVA_PFN(dma_mask));

For the details of why dma64 iova allocation performance is so bad,
please refer to the description of patch 5. In this patch series, I
added a cached64_node to manage the dma64 iova space (iova >= 4G); it
has the same effect as cached32_node (iova < 4G).
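To illustrate the idea, below is a minimal userspace sketch of the
per-region allocation hint. It is only a toy model, not the kernel
code: the real iova.c tracks allocated ranges in an rbtree and caches
an rb_node per region, whereas this sketch caches a bare pfn hint and
never frees. All names here (demo_iovad, demo_alloc_iova,
DEMO_32BIT_PFN) are hypothetical, and a 64-bit unsigned long is
assumed.

#include <stdio.h>

/* Last pfn below 4G, assuming 4K pages (hypothetical constant). */
#define DEMO_32BIT_PFN 0xfffffUL

/*
 * Toy model of the cached-node idea: keep one allocation hint per
 * region so a dma64 request does not rescan the whole space from its
 * limit_pfn on every allocation.
 */
struct demo_iovad {
	unsigned long cached32_pfn;	/* next candidate pfn below 4G */
	unsigned long cached64_pfn;	/* next candidate pfn above 4G */
};

/* Allocate 'size' pfns top-down below 'limit_pfn'. */
static unsigned long demo_alloc_iova(struct demo_iovad *iovad,
				     unsigned long size,
				     unsigned long limit_pfn)
{
	/* Pick the hint by region, as cached32_node/cached64_node do. */
	unsigned long *cached = (limit_pfn > DEMO_32BIT_PFN) ?
				&iovad->cached64_pfn : &iovad->cached32_pfn;

	if (*cached < size)
		return 0;	/* region exhausted in this toy model */

	*cached -= size;	/* resume right below the previous allocation */
	return *cached;
}

int main(void)
{
	struct demo_iovad iovad = {
		.cached32_pfn = DEMO_32BIT_PFN + 1,
		.cached64_pfn = 1UL << 40,	/* some 64-bit limit */
	};

	/* Two dma64 allocations: the second continues below the first. */
	printf("dma64 #1: pfn 0x%lx\n", demo_alloc_iova(&iovad, 8, 1UL << 40));
	printf("dma64 #2: pfn 0x%lx\n", demo_alloc_iova(&iovad, 8, 1UL << 40));
	/* A dma32 allocation uses the other hint and is unaffected. */
	printf("dma32 #1: pfn 0x%lx\n", demo_alloc_iova(&iovad, 8, DEMO_32BIT_PFN));
	return 0;
}

Each dma64 allocation resumes right below the previous one instead of
walking down from limit_pfn again, which is the effect the
cached64_node is meant to achieve for the real rbtree search.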
Below is the performance data before and after my patch series:

(before)$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35898
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.2 sec  7.88 MBytes  6.48 Mbits/sec
[  5] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35900
[  5]  0.0-10.3 sec  7.88 MBytes  6.43 Mbits/sec
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35902
[  4]  0.0-10.3 sec  7.88 MBytes  6.43 Mbits/sec

(after)$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36330
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
[  5] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36332
[  5]  0.0-10.0 sec  1.10 GBytes   939 Mbits/sec
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36334
[  4]  0.0-10.0 sec  1.10 GBytes   938 Mbits/sec

Zhen Lei (7):
  iommu/iova: fix incorrect variable types
  iommu/iova: cut down judgement times
  iommu/iova: insert start_pfn boundary of dma32
  iommu/iova: adjust __cached_rbnode_insert_update
  iommu/iova: to optimize the allocation performance of dma64
  iommu/iova: move the calculation of pad mask out of loop
  iommu/iova: fix iovad->dma_32bit_pfn as the last pfn of dma32

 drivers/iommu/amd_iommu.c        |   7 +-
 drivers/iommu/dma-iommu.c        |  22 ++----
 drivers/iommu/intel-iommu.c      |  11 +--
 drivers/iommu/iova.c             | 143 +++++++++++++++++++++------------------
 drivers/misc/mic/scif/scif_rma.c |   3 +-
 include/linux/iova.h             |   7 +-
 6 files changed, 94 insertions(+), 99 deletions(-)

-- 
2.5.0