From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751214AbdFESDY (ORCPT); Mon, 5 Jun 2017 14:03:24 -0400
Received: from mail-qt0-f182.google.com ([209.85.216.182]:35036 "EHLO
	mail-qt0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751162AbdFESDX (ORCPT); Mon, 5 Jun 2017 14:03:23 -0400
Subject: Re: Device address specific mapping of arm,mmu-500
From: Ray Jui <ray.jui@broadcom.com>
To: Will Deacon
Cc: Marc Zyngier, Robin Murphy, Mark Rutland, Joerg Roedel,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux-foundation.org,
	"linux-kernel@vger.kernel.org"
References: <1b79efe2-6835-7a7a-f5ad-361391a7b967@broadcom.com>
	<20170530151437.GC23067@arm.com>
	<81637642-22d9-4868-156f-052f64bd042f@broadcom.com>
	<226bcebc-3902-90d3-24e5-51f2e1f3affb@arm.com>
	<16e5fc9d-b014-af7c-dcda-527522ac5cc9@arm.com>
	<7bd03bf8-71d5-a974-bea2-a38b4349c547@broadcom.com>
	<20170531124418.GE9723@arm.com>
Message-ID: <498100e8-e94e-4a65-a9e1-ae59bd59fe2d@broadcom.com>
Date: Mon, 5 Jun 2017 11:03:15 -0700
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101
	Thunderbird/54.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Will/Robin,

Just want to check with you on this again. Do you have a very rough
timeline for when the excessive locking in the IOMMU driver may be fixed
(so that we can get back the expected ~95% of performance)?

Thanks,

Ray

On 5/31/17 10:32 AM, Ray Jui wrote:
> Hi Will,
>
> On 5/31/17 5:44 AM, Will Deacon wrote:
>> On Tue, May 30, 2017 at 11:13:36PM -0700, Ray Jui wrote:
>>> I did a little more digging myself and I think I now understand what
>>> you meant by identity mapping, i.e., configuring the MMU-500 with a
>>> 1:1 mapping between the DMA address and the IOVA address.
>>>
>>> I think that should work. In the end, due to this MSI write parsing
>>> issue in our PCIe controller, the reason to use the IOMMU is to allow
>>> the cache attributes (AxCACHE) of the MSI writes towards the GICv3
>>> ITS to be modified by the IOMMU to be device type, while leaving the
>>> rest of the inbound reads/writes from/to DDR with a more optimized
>>> cache attribute setting, so that I/O coherency can remain enabled for
>>> the PCIe controller. In fact, the PCIe controller itself is fully
>>> capable of DMA to/from the full address space of our SoC, including
>>> both DDR and any device memory.
>>>
>>> The 1:1 mapping will still pose some translation overhead like you
>>> suggested; however, the overhead of allocating page tables and
>>> locking will be gone. This sounds like the best possible option I
>>> have currently.
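For concreteness, a minimal sketch of what such a 1:1 map could look
like through the generic Linux IOMMU API of this era. This is
illustrative only: the domain is assumed to be allocated and attached
already, and DRAM_BASE/DRAM_SIZE and the ITS doorbell address are
made-up placeholders, not values taken from this thread.

#include <linux/iommu.h>
#include <linux/sizes.h>

/* Placeholder addresses, for illustration only */
#define DRAM_BASE	0x80000000UL
#define DRAM_SIZE	SZ_2G
#define ITS_DOORBELL	0x63c30000UL

static int build_identity_map(struct iommu_domain *domain)
{
	int ret;

	/*
	 * 1:1 (IOVA == PA) mapping of DRAM with normal cacheable
	 * attributes, so I/O coherency stays effective for DMA
	 * to/from memory.
	 */
	ret = iommu_map(domain, DRAM_BASE, DRAM_BASE, DRAM_SIZE,
			IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE);
	if (ret)
		return ret;

	/*
	 * The GICv3 ITS doorbell page is mapped 1:1 as well, but as
	 * MMIO (no IOMMU_CACHE), so the SMMU rewrites the AxCACHE
	 * attributes of MSI writes to Device type.
	 */
	return iommu_map(domain, ITS_DOORBELL, ITS_DOORBELL, SZ_64K,
			 IOMMU_WRITE | IOMMU_MMIO);
}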
>>
>> It might end up being pretty invasive to work around a hardware bug,
>> so we'll have to see what it looks like. Ideally, we could just use
>> the SMMU for everything as-is and work on clawing back the lost
>> performance (it should be possible to get ~95% of the perf if we sort
>> out the locking, which we *are* working on).
>>
>
> If 95% of the performance can be recovered by fixing the locking in
> the driver, then that's great news.
>
> If you have anything that you want me to help test, feel free to send
> it out. I will be more than happy to test it and report back the
> performance numbers. :)
>
>>> May I ask, how do I start trying to get this identity mapping to
>>> work as an experiment and proof of concept? Any pointer or advice is
>>> highly appreciated, as you can see I'm not very experienced with
>>> this. I found that Will recently added IOMMU_DOMAIN_IDENTITY support
>>> to the arm-smmu driver, but I suppose that is to bypass the SMMU
>>> completely, instead of still going through the MMU with a 1:1
>>> translation. Is my understanding correct?
>>
>> Yes, I don't think IOMMU_DOMAIN_IDENTITY is what you need, because
>> you actually need per-page control of memory attributes.
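To illustrate the distinction (a sketch against the generic IOMMU API
as it stood around v4.12, not actual driver code): an identity domain
makes the SMMU bypass translation entirely, so there are no page-table
entries whose memory attributes could be overridden. Per-page attribute
control needs a translating domain populated with an explicit 1:1 map
instead, e.g.:

#include <linux/iommu.h>

static struct iommu_domain *setup_translating_1to1(struct device *dev)
{
	/*
	 * iommu_domain_alloc() hands back an IOMMU_DOMAIN_UNMANAGED
	 * (i.e. translating) domain, unlike IOMMU_DOMAIN_IDENTITY,
	 * which would bypass the SMMU altogether.
	 */
	struct iommu_domain *domain = iommu_domain_alloc(dev->bus);

	if (!domain)
		return NULL;

	if (iommu_attach_device(domain, dev)) {
		iommu_domain_free(domain);
		return NULL;
	}

	/* Populate with the 1:1 map from the earlier sketch */
	if (build_identity_map(domain)) {
		iommu_detach_device(domain, dev);
		iommu_domain_free(domain);
		return NULL;
	}

	return domain;
}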
>>
>> Robin might have a better idea, but I think you'll have to hack
>> dma-iommu.c so that you can have a version of the DMA ops that (see
>> the sketch below this list):
>>
>> * Initialises the identity map (I guess as normal WB cacheable?)
>> * Reserves and maps the MSI region appropriately
>> * Just returns the physical address as the DMA address for map
>>   requests (returning an error for the MSI region)
>> * Does nothing for unmap requests
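A skeleton of what those DMA ops might look like, assuming the
struct dma_map_ops layout and arm64 DMA_ERROR_CODE definition of this
era (~v4.12); the MSI window below is a made-up placeholder, and a
real implementation would also need .alloc/.free/.map_sg and friends:

#include <linux/dma-mapping.h>
#include <linux/mm.h>
#include <linux/sizes.h>

/* Placeholder MSI window carved out of the identity map */
#define MSI_IOVA_BASE	0x08000000UL
#define MSI_IOVA_SIZE	SZ_1M

static dma_addr_t identity_map_page(struct device *dev, struct page *page,
				    unsigned long offset, size_t size,
				    enum dma_data_direction dir,
				    unsigned long attrs)
{
	phys_addr_t phys = page_to_phys(page) + offset;

	/* The MSI region is reserved: never hand it out as DMA space */
	if (phys >= MSI_IOVA_BASE && phys < MSI_IOVA_BASE + MSI_IOVA_SIZE)
		return DMA_ERROR_CODE;

	/* Identity map: the DMA address is simply the physical address */
	return (dma_addr_t)phys;
}

static void identity_unmap_page(struct device *dev, dma_addr_t handle,
				size_t size, enum dma_data_direction dir,
				unsigned long attrs)
{
	/* Nothing to do: no per-mapping page-table or locking work */
}

static const struct dma_map_ops identity_dma_ops = {
	.map_page	= identity_map_page,
	.unmap_page	= identity_unmap_page,
	/* .alloc, .free, .map_sg, etc. omitted from this sketch */
};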
>>
>> But my strong preference would be to fix the locking overhead in the
>> SMMU driver so that the perf hit is acceptable.
>
> Yes, I agree, we want to be able to use the SMMU the intended way. Do
> you have a timeline for when the locking issue may be fixed (or
> improved)? Depending on the timeline, on our side we may still need to
> go with identity mapping as a temporary solution until the fix lands.
>
>>
>> Will
>>
>
> Thanks,
>
> Ray