From: Zhen Lei <thunder.leizhen@huawei.com>
To: Joerg Roedel <joro@8bytes.org>,
    iommu <iommu@lists.linux-foundation.org>,
    Robin Murphy <robin.murphy@arm.com>,
    David Woodhouse <dwmw2@infradead.org>,
    Sudeep Dutt <sudeep.dutt@intel.com>,
    Ashutosh Dixit <ashutosh.dixit@intel.com>,
    linux-kernel <linux-kernel@vger.kernel.org>
Cc: Zefan Li <lizefan@huawei.com>,
    Xinwei Hu <huxinwei@huawei.com>,
    Tianhong Ding <dingtianhong@huawei.com>,
    Hanjun Guo <guohanjun@huawei.com>,
    Zhen Lei <thunder.leizhen@huawei.com>
Subject: [PATCH 0/7] iommu/iova: improve the allocation performance of dma64
Date: Wed, 22 Mar 2017 14:27:40 +0800
Message-ID: <1490164067-12552-1-git-send-email-thunder.leizhen@huawei.com>

64-bit devices are very common now. But currently we only define a
cached32_node to optimize the allocation performance of dma32, and I saw
that some dma64 drivers choose to allocate iova from the dma32 space first,
maybe because of the current dma64 performance problem or for some other
reason.

For example (in drivers/iommu/amd_iommu.c):

	static unsigned long dma_ops_alloc_iova(......
	{
		......
		if (dma_mask > DMA_BIT_MASK(32))
			pfn = alloc_iova_fast(&dma_dom->iovad, pages,
					      IOVA_PFN(DMA_BIT_MASK(32)));

		if (!pfn)
			pfn = alloc_iova_fast(&dma_dom->iovad, pages,
					      IOVA_PFN(dma_mask));

For the details of why dma64 iova allocation performance is very bad,
please refer to the description of patch 5.

In this patch series, I added a cached64_node to manage the dma64 iova
space (iova >= 4G); it has the same effect as cached32_node (iova < 4G).
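To make the effect of such a cached node concrete, here is a minimal
sketch. It is not the kernel code and not taken from this series: every
toy_* name and constant is hypothetical, a flat array stands in for the
rbtree, ranges are never freed, and only the space above 4G is modelled.
It assumes the cost being avoided is a search that restarts at the top of
the space on every allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RANGES 128
#define TOY_START_PFN  0x100000ULL        /* first 4 KiB page frame at 4G */
#define TOY_LIMIT_PFN  ((1ULL << 36) - 1) /* assumed 48-bit address space */

/* One allocated range of page frame numbers, [lo, hi] inclusive. */
struct toy_iova {
	uint64_t lo, hi;
};

/* Toy model of the space above 4G only; ranges are never freed. */
struct toy_domain {
	struct toy_iova node[TOY_MAX_RANGES]; /* highest range first */
	size_t count;
	int use_cache;    /* nonzero: resume below the newest range */
	uint64_t visited; /* ranges inspected while searching */
};

static void toy_init(struct toy_domain *d, int use_cache)
{
	d->count = 0;
	d->use_cache = use_cache;
	d->visited = 0;
}

/* Allocate size page frames top-down; returns the lowest pfn, 0 on failure. */
static uint64_t toy_alloc(struct toy_domain *d, uint64_t size)
{
	uint64_t top = TOY_LIMIT_PFN;
	size_t i;

	assert(size > 0);
	if (d->count == TOY_MAX_RANGES)
		return 0;

	if (d->use_cache && d->count > 0) {
		/* Cached hint: free space starts right below the newest range. */
		top = d->node[d->count - 1].lo - 1;
	} else {
		/* No hint: step over every allocated range from the top. */
		for (i = 0; i < d->count; i++) {
			d->visited++;
			top = d->node[i].lo - 1;
		}
	}

	/* Fail if the remaining space above 4G is too small. */
	if (top < TOY_START_PFN || top - TOY_START_PFN + 1 < size)
		return 0;

	d->node[d->count].hi = top;
	d->node[d->count].lo = top - size + 1;
	return d->node[d->count++].lo;
}
```

In this toy model, 100 allocations of 8 pages each return the same
addresses with or without the hint, but the uncached domain inspects
0 + 1 + ... + 99 = 4950 ranges in total while the cached one inspects none.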

Below is the performance data before and after my patch series:

(before)$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35898
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.2 sec  7.88 MBytes  6.48 Mbits/sec
[  5] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35900
[  5]  0.0-10.3 sec  7.88 MBytes  6.43 Mbits/sec
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35902
[  4]  0.0-10.3 sec  7.88 MBytes  6.43 Mbits/sec

(after)$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36330
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
[  5] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36332
[  5]  0.0-10.0 sec  1.10 GBytes   939 Mbits/sec
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36334
[  4]  0.0-10.0 sec  1.10 GBytes   938 Mbits/sec

Zhen Lei (7):
  iommu/iova: fix incorrect variable types
  iommu/iova: cut down judgement times
  iommu/iova: insert start_pfn boundary of dma32
  iommu/iova: adjust __cached_rbnode_insert_update
  iommu/iova: to optimize the allocation performance of dma64
  iommu/iova: move the caculation of pad mask out of loop
  iommu/iova: fix iovad->dma_32bit_pfn as the last pfn of dma32

 drivers/iommu/amd_iommu.c        |   7 +-
 drivers/iommu/dma-iommu.c        |  22 ++----
 drivers/iommu/intel-iommu.c      |  11 +--
 drivers/iommu/iova.c             | 143 +++++++++++++++++++++------------------
 drivers/misc/mic/scif/scif_rma.c |   3 +-
 include/linux/iova.h             |   7 +-
 6 files changed, 94 insertions(+), 99 deletions(-)

-- 
2.5.0