From: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
To: Krishna Reddy <vdumpa@nvidia.com>, Robin Murphy <robin.murphy@arm.com>
Cc: linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
	iommu@lists.linux-foundation.org, Will Deacon <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org,
	Thierry Reding <treding@nvidia.com>
Subject: Re: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list
Date: Sat, 12 Jun 2021 08:16:30 +0530
Message-ID: <f749ba0957b516ab5f0ea57033d308c7@codeaurora.org>
In-Reply-To: <BY5PR12MB376480219C42E5FCE0FE0FFBB3349@BY5PR12MB3764.namprd12.prod.outlook.com>

Hi Krishna,

On 2021-06-11 22:19, Krishna Reddy wrote:
> Hi Sai,
>> >> > No, the unmap latency is not just in some test case written, the
>> >> > issue is very real and we have workloads where camera is reporting
>> >> > frame drops because of this unmap latency in the order of 100s of
>> >> > milliseconds.
>
>> Not exactly, this issue is not specific to camera. If you look at the
>> numbers in the commit text, even for the test device it's the same
>> observation. It depends on the buffer size we are unmapping, which
>> affects the number of TLBIs issued. I am not aware of any such HW side
>> bw issues for camera specifically on QCOM devices.
> 
> It is clear that reducing the number of TLBIs reduces the unmap API
> latency. But it is at the expense of throwing away valid TLB entries.
> Quantifying the impact of arbitrary invalidation of valid TLB entries
> at context level is not straightforward and is use case dependent. The
> side effects might be rare or won't be known until they are noticed.

Right, but we won't know until we profile the specific use cases or try
them in a generic workload to see if they affect performance. Sure,
over-invalidation is a concern where multiple buffers are mapped to the
same context and the cache is not usable for lookups at that time, but
we don't do it for small buffers, only for large buffers, which means
thousands of TLB entry mappings, in which case TLBIASID is preferred
(note: in my earlier reply I mentioned the HW team recommendation to use
it for anything greater than 128 TLB entries). Also note that we do this
only for partial walk flush; we are not arbitrarily changing all the
TLBIs to ASID based.
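
The heuristic described above can be sketched roughly as follows. This is
a minimal illustration and not the patch itself: the names flush_kind,
choose_flush and ASID_FLUSH_THRESHOLD are hypothetical, and the value 128
is the HW team recommendation mentioned in this thread, not a constant
taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: the HW team recommendation quoted in this thread. */
#define ASID_FLUSH_THRESHOLD 128

/* Hypothetical labels for the two invalidation strategies discussed. */
enum flush_kind {
	FLUSH_BY_RANGE,	/* one TLBI per granule in the range */
	FLUSH_BY_ASID,	/* a single TLBIASID for the whole context */
};

/*
 * Pick a strategy for invalidating `size` bytes mapped at `granule`-byte
 * granularity. Only a partial walk flush of a large range is switched to
 * ASID-based invalidation; everything else stays range based.
 */
static enum flush_kind choose_flush(size_t size, size_t granule,
				    bool partial_walk)
{
	size_t nr_entries;

	assert(granule != 0);
	nr_entries = (size + granule - 1) / granule;	/* round up */

	if (partial_walk && nr_entries > ASID_FLUSH_THRESHOLD)
		return FLUSH_BY_ASID;
	return FLUSH_BY_RANGE;
}
```

With 4 KiB granules, a 32 MiB partial walk flush covers 8192 entries and
would take the ASID path in this sketch, while a 64 KiB one (16 entries)
would stay range based.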

> Can you provide more details on how the unmap latency is causing
> camera to drop frames?
> Is unmap performed in the perf path?

I am no camera expert, but from what the camera team mentioned, there is
a thread which frees memory (large unused memory buffers) periodically,
which ends up taking around 100+ ms and causes some camera test failures
with frame drops. Parallel efforts are already being made to optimize
this usage of the thread, but as I mentioned previously, this is *not
camera specific*: if someone else invokes such large unmaps, they are
going to face the same issue.

> If unmap is queued and performed on a background thread, would it
> resolve the frame drops?

Not sure I understand what you mean by queuing on a background thread,
but with or without that, we still do the same number of TLBIs and hop
through iommu->io-pgtable->arm-smmu to perform the unmap, so how will
that help?
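
To illustrate why the number of TLBIs tracks the size of what is being
unmapped and not the thread doing it, here is a rough sketch. It assumes,
purely for illustration, one invalidation per granule for each segment of
a scatter-gather list; count_range_tlbis is a hypothetical helper and not
a function from the driver.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of invalidations needed for a scatter-gather
 * list if each granule of each segment gets its own TLBI (an assumption for
 * illustration, not the driver's exact behaviour).
 */
static size_t count_range_tlbis(const size_t *seg_bytes, size_t nr_segs,
				size_t granule)
{
	size_t i, total = 0;

	assert(granule != 0);
	for (i = 0; i < nr_segs; i++)
		total += (seg_bytes[i] + granule - 1) / granule; /* round up */
	return total;
}
```

The result depends only on the segment sizes and the granule, so running
the same unmap from a background thread would issue the same number of
invalidations.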

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
