archive mirror
 help / color / mirror / Atom feed
From: Robin Murphy <>
To: Sai Prakash Ranjan <>,
	Krishna Reddy <>
Cc:,,, Will Deacon <>,,
	Thierry Reding <>
Subject: Re: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list
Date: Tue, 15 Jun 2021 14:53:45 +0100	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On 2021-06-15 12:51, Sai Prakash Ranjan wrote:
> Hi Krishna,
> On 2021-06-14 23:18, Krishna Reddy wrote:
>>> Right but we won't know until we profile the specific usecases or try 
>>> them in
>>> generic workload to see if they affect the performance. Sure, over 
>>> invalidation is
>>> a concern where multiple buffers can be mapped to same context and 
>>> the cache
>>> is not usable at the time for lookup and such but we don't do it for 
>>> small buffers
>>> and only for large buffers which means thousands of TLB entry 
>>> mappings in
>>> which case TLBIASID is preferred (note: I mentioned the HW team
>>> recommendation to use it for anything greater than 128 TLB entries) 
>>> in my
>>> earlier reply. And also note that we do this only for partial walk 
>>> flush, we are not
>>> arbitrarily changing all the TLBIs to ASID based.
>> Most of the heavy bw use cases does involve processing larger buffers.
>> When the physical memory is allocated dis-contiguously at page_size
>> (let's use 4KB here)
>> granularity, each aligned 2MB chunks IOVA unmap would involve
>> performing a TLBIASID
>> as 2MB is not a leaf. Essentially, It happens all the time during
>> large buffer unmaps and
>> potentially impact active traffic on other large buffers. Depending on 
>> how much
>> latency HW engines can absorb, the overflow/underflow issues for ISO
>> engines can be
>> sporadic and vendor specific.
>> Performing TLBIASID as default for all SoCs is not a safe operation.
> Ok so from what I gather from this is that its not easy to test for the
> negative impact and you don't have data on such yet and the behaviour is
> very vendor specific. To add on qcom impl, we have several performance
> improvements for TLB cache invalidations in HW like wait-for-safe(for 
> realtime
> clients such as camera and display) and few others to allow for cache
> lookups/updates when TLBI is in progress for the same context bank, so 
> atleast
> we are good here.
>>> I am no camera expert but from what the camera team mentioned is that 
>>> there
>>> is a thread which frees memory(large unused memory buffers) 
>>> periodically which
>>> ends up taking around 100+ms and causing some camera test failures with
>>> frame drops. Parallel efforts are already being made to optimize this 
>>> usage of
>>> thread but as I mentioned previously, this is *not a camera 
>>> specific*, lets say
>>> someone else invokes such large unmaps, it's going to face the same 
>>> issue.
>> From the above, It doesn't look like the root cause of frame drops is
>> fully understood.
>> Why is 100+ms delay causing camera frame drop?  Is the same thread
>> submitting the buffers
>> to camera after unmap is complete? If not, how is the unmap latency
>> causing issue here?
> Ok since you are interested in camera usecase, I have requested for more 
> details
> from the camera team and will give it once they comeback. However I 
> don't think
> its good to have unmap latency at all and that is being addressed by 
> this patch.
>>> > If unmap is queued and performed on a back ground thread, would it
>>> > resolve the frame drops?
>>> Not sure I understand what you mean by queuing on background thread 
>>> but with
>>> that or not, we still do the same number of TLBIs and hop through
>>> iommu->io-pgtable->arm-smmu to perform the the unmap, so how will that
>>> help?
>> I mean adding the unmap requests into a queue and processing them from
>> a different thread.
>> It is not to reduce the TLBIs. But, not to block subsequent buffer
>> allocation, IOVA map requests, if they
>> are being requested from same thread that is performing unmap. If
>> unmap is already performed from
>> a different thread, then the issue still need to be root caused to
>> understand it fully. Check for any
>> serialization issues.
> This patch is to optimize unmap latency because of large number of mmio 
> writes(TLBIVAs)
> wasting CPU cycles and not to fix camera issue which can probably be 
> solved by
> parallelization. It seems to me like you are ok with the unmap latency 
> in general
> which we are not and want to avoid that latency.
> Hi @Robin, from these discussions it seems they are not ok with the change
> for all SoC vendor implementations and do not have any data on such impact.
> As I mentioned above, on QCOM platforms we do have several optimizations 
> in HW
> for TLBIs and would like to make use of it and reduce the unmap latency.
> What do you think, should this be made implementation specific?

Yes, it sounds like there's enough uncertainty for now that this needs 
to be an opt-in feature. However, I still think that non-strict mode 
could use it generically, since that's all about over-invalidating to 
save time on individual unmaps - and relatively non-deterministic - already.

So maybe we have a second set of iommu_flush_ops, or just a flag 
somewhere to control the tlb_flush_walk functions internally, and the 
choice can be made in the iommu_get_dma_strict() test, but also forced 
on all the time by your init_context hook. What do you reckon?


  reply	other threads:[~2021-06-15 13:53 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-09 14:53 [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list Sai Prakash Ranjan
2021-06-09 18:44 ` Robin Murphy
2021-06-10  5:24   ` Sai Prakash Ranjan
2021-06-10  9:08     ` Robin Murphy
2021-06-10  9:36       ` Sai Prakash Ranjan
2021-06-10 11:33         ` Robin Murphy
2021-06-10 11:54           ` Sai Prakash Ranjan
2021-06-10 15:29             ` Robin Murphy
2021-06-10 15:51               ` Sai Prakash Ranjan
2021-06-11  0:37               ` Krishna Reddy
2021-06-11  0:54                 ` Sai Prakash Ranjan
2021-06-11 16:49                   ` Krishna Reddy
2021-06-12  2:46                     ` Sai Prakash Ranjan
2021-06-14 17:48                       ` Krishna Reddy
2021-06-15 11:51                         ` Sai Prakash Ranjan
2021-06-15 13:53                           ` Robin Murphy [this message]
2021-06-16  6:58                             ` Sai Prakash Ranjan
2021-06-16  9:03                               ` Sai Prakash Ranjan
2021-06-17 21:18                                 ` Krishna Reddy
2021-06-18  2:47                                   ` Sai Prakash Ranjan
2021-06-18  4:04                           ` Sai Prakash Ranjan
2021-06-10 12:03           ` Thierry Reding

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).