From mboxrd@z Thu Jan 1 00:00:00 1970
From: Robin Murphy
Subject: Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache
Date: Mon, 21 Jan 2019 13:25:49 +0000
Message-ID:
References: <20190121055335.15430-1-vivek.gautam@codeaurora.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Content-Language: en-GB
Sender: linux-kernel-owner@vger.kernel.org
To: Ard Biesheuvel, Vivek Gautam
Cc: Will Deacon, Joerg Roedel, "list@263.net:IOMMU DRIVERS" Joerg Roedel iommu@lists.linux-foundation.org, pdaly@codeaurora.org, linux-arm-msm, Linux Kernel Mailing List, Tomasz Figa, Jordan Crouse, pratikp@codeaurora.org, linux-arm-kernel
List-Id: linux-arm-msm@vger.kernel.org

On 21/01/2019 10:50, Ard Biesheuvel wrote:
> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam wrote:
>>
>> Hi,
>>
>>
>> On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
>> wrote:
>>>
>>> On Mon, 21 Jan 2019 at 06:54, Vivek Gautam wrote:
>>>>
>>>> Qualcomm SoCs have an additional level of cache called as
>>>> System cache, aka. Last level cache (LLC). This cache sits right
>>>> before the DDR, and is tightly coupled with the memory controller.
>>>> The clients using this cache request their slices from this
>>>> system cache, make it active, and can then start using it.
>>>> For these clients with smmu, to start using the system cache for
>>>> buffers and, related page tables [1], memory attributes need to be
>>>> set accordingly. This series add the required support.
>>>>
>>>
>>> Does this actually improve performance on reads from a device? The
>>> non-cache coherent DMA routines perform an unconditional D-cache
>>> invalidate by VA to the PoC before reading from the buffers filled by
>>> the device, and I would expect the PoC to be defined as lying beyond
>>> the LLC to still guarantee the architected behavior.
>>
>> We have seen performance improvements when running Manhattan
>> GFXBench benchmarks.
>>
>
> Ah ok, that makes sense, since in that case, the data flow is mostly
> to the device, not from the device.
>
>> As for the PoC, from my knowledge on sdm845 the system cache, aka
>> Last level cache (LLC) lies beyond the point of coherency.
>> Non-cache coherent buffers will not be cached to system cache also, and
>> no additional software cache maintenance ops are required for system cache.
>> Pratik can add more if I am missing something.
>>
>> To take care of the memory attributes from DMA APIs side, we can add a
>> DMA_ATTR definition to take care of any dma non-coherent APIs calls.
>>
>
> So does the device use the correct inner non-cacheable, outer
> writeback cacheable attributes if the SMMU is in pass-through?
>
> We have been looking into another use case where the fact that the
> SMMU overrides memory attributes is causing issues (WC mappings used
> by the radeon and amdgpu driver). So if the SMMU would honour the
> existing attributes, would you still need the SMMU changes?

Even if we could force a stage 2 mapping with the weakest pagetable
attributes (such that combining would work), there would still need to
be a way to set the TCR attributes appropriately if this behaviour is
wanted for the SMMU's own table walks as well.

Robin.